Point your OpenAI, Anthropic, or other client at Pioneer. We find where your current model falls short, then build and route to small specialist models that are more accurate, cheaper, and faster — automatically.

What you can do with Pioneer

Drop in, run inference as normal

Point your existing OpenAI or Anthropic client at Pioneer — same API, same code. No migration required. Unlimited for Pro users until August 2026.

Automatic gap detection

Pioneer clusters your traffic by use case and surfaces exactly where your current model is leaving accuracy, cost, or latency on the table.

Specialist model training

Pioneer trains and evaluates a fleet of small fine-tuned models on your behalf — Qwen, Llama, DeepSeek, GLiNER, and more. Zero MLOps from you.

You control routing

Pioneer surfaces lift, cost, and latency data for each specialist model. You decide when and how to route traffic to them.
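
As a rough sketch of the drop-in setup described under "Drop in" above, the snippet below builds the two settings an OpenAI-style client needs and passes them to the OpenAI Python SDK. The base URL and the environment variable names `PIONEER_BASE_URL` and `PIONEER_API_KEY` are placeholders invented for illustration, not values taken from Pioneer's documentation; substitute the values from your own Pioneer account.

```python
import os

# Hypothetical placeholder: replace with the base URL from your Pioneer account.
DEFAULT_BASE_URL = "https://pioneer.example.invalid/v1"


def client_settings(env=None):
    """Return keyword arguments for an OpenAI-compatible client.

    PIONEER_BASE_URL and PIONEER_API_KEY are assumed variable names.
    """
    env = os.environ if env is None else env
    return {
        "base_url": env.get("PIONEER_BASE_URL", DEFAULT_BASE_URL),
        "api_key": env.get("PIONEER_API_KEY", ""),
    }


def make_openai_client(env=None):
    """Create an OpenAI SDK client that targets the configured base URL."""
    from openai import OpenAI  # third-party: pip install openai

    return OpenAI(**client_settings(env))
```

Under these assumptions, only the client construction changes; the calls your application already makes on the client stay as they are.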

Your model retrains itself while you sleep.

Adaptive Inference

Adaptive Inference is Pioneer’s continuous improvement loop. It mines your live production failures for high-signal examples, automatically retrains a specialist model, and promotes improved checkpoints behind the same endpoint. No redeployment required.

Download your assets

Your weights and training datasets are yours. Download them at any time — bring them to any other platform or fine-tune further on your own infrastructure.

Bring your own evals

Use Pioneer’s built-in evaluation suite or plug in your own. Every retraining run is benchmarked before any traffic is promoted.

Full audit trail

Every Adaptive Inference run generates a full PDF report — training data, eval deltas, rollout stages, and checkpoint history included.
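
To make the "benchmarked before any traffic is promoted" idea concrete, here is a minimal sketch of a promotion gate. It is not Pioneer's actual logic: the `EvalResult` type, the choice of F1 as the deciding metric, and the `min_lift` threshold are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EvalResult:
    """Hypothetical summary of one evaluation run."""
    checkpoint: str
    f1: float


def should_promote(current: EvalResult, candidate: EvalResult, min_lift: float = 0.0) -> bool:
    """Return True only when the candidate beats the current checkpoint by more than min_lift."""
    return candidate.f1 - current.f1 > min_lift


def pick_checkpoint(current: EvalResult, candidate: EvalResult, min_lift: float = 0.0) -> EvalResult:
    """Keep serving the current checkpoint unless the candidate clears the gate."""
    return candidate if should_promote(current, candidate, min_lift) else current
```

A gate like this keeps the endpoint on the existing checkpoint whenever a retraining run fails to improve on it.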

Supported model families

Pioneer supports two classes of models: encoder models for structured extraction tasks, and decoder models for generative tasks.

Encoder models
  • GLiNER2 Large — A small, efficient model purpose-built for named entity recognition, text classification, and structured JSON extraction. GLiNER is the recommended starting point for agent text processing, document parsing, and routing workflows.
  • GLiGuard 300M — Fastino’s lightweight content moderation and safety classification model. Fast, low-overhead, and tunable on your own safety taxonomy.
  • GLiNER2-PII — Optimized for personally identifiable information detection and redaction across structured and unstructured text.
Decoder models (LLMs)
  • Qwen3 32B — Strong at coding, multilingual tasks, and complex multi-step reasoning, with thinking-mode support. Ideal for global products.
  • Llama — Meta’s open-source model family. Well-suited for RAG, summarization, and general-purpose chat.
  • DeepSeek V4 Pro — Coding-first model with extended chain-of-thought reasoning. Capable at code generation, structured reasoning, and agentic planning tasks.
  • Gemma — Google’s lightweight open model for fast, low-latency coding tasks.
  • Nemotron — NVIDIA’s high-throughput coding and reasoning model, tuned for production code generation.
  • Kimi K2.6 — Moonshot’s 256K context model under a modified MIT license. Strong at long-context retrieval and reasoning tasks.
Proprietary models (inference only)
  • Claude Sonnet 4.6 / Claude Opus 4.7 — Anthropic’s frontier models, available via Pioneer’s Anthropic-compatible endpoint.
  • GPT-4.1 / GPT-5.5 — OpenAI’s frontier models, available via Pioneer’s OpenAI-compatible endpoint.
To see all available base models, call GET /base-models. You can filter by task type or inference support.
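
The filtered listing call could be assembled roughly as follows. Only the `GET /base-models` path comes from the text above; the query parameter names `task_type` and `inference` are guesses, and the base URL is supplied by the caller, so check the API reference for the real names before relying on this.

```python
from urllib.parse import urlencode


def base_models_url(base_url, task_type=None, inference=None):
    """Build the URL for GET /base-models with optional filters.

    The query parameter names are hypothetical.
    """
    params = {}
    if task_type is not None:
        params["task_type"] = task_type
    if inference is not None:
        params["inference"] = "true" if inference else "false"
    url = base_url.rstrip("/") + "/base-models"
    query = urlencode(params)
    return f"{url}?{query}" if query else url
```

The function only builds the URL; send it with whichever HTTP client and authentication your project already uses.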

How it fits into your workflow

Pioneer follows a straightforward lifecycle: upload your data → run inference → fine-tune a specialist model → evaluate performance → deploy to production.

1. Upload your dataset

Upload a labeled dataset or use Pioneer’s synthetic data generation to create training examples from a domain description and label list.

2. Run inference

Send requests via POST /inference. Use any base model or your own fine-tuned model — Pioneer exposes OpenAI- and Anthropic-compatible endpoints for drop-in compatibility.

3. Fine-tune

Start a training job with POST /felix/training-jobs. Pioneer runs LoRA fine-tuning on your data and returns F1, precision, and recall metrics on completion.

4. Evaluate

Run an evaluation with POST /felix/evaluations to benchmark your model against the base and surface lift, cost, and latency data before routing traffic to it.

5. Deploy

Deploy your fine-tuned model to production — no cold-start setup required. Route traffic to it via POST /inference using your training job ID, on-demand.
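
The three POST endpoints named in the steps above can be sketched as plain HTTP requests. The paths are taken from this page; everything else is an assumption made for illustration: the bearer-token header (see the Authentication page for the real scheme), every payload field name, and the caller-supplied base URL. The requests are built but not sent.

```python
import json
from urllib.request import Request

# Paths as given in the workflow steps.
INFERENCE_PATH = "/inference"
TRAINING_JOBS_PATH = "/felix/training-jobs"
EVALUATIONS_PATH = "/felix/evaluations"


def build_post(base_url, path, payload, api_key):
    """Build (but do not send) a JSON POST request.

    Bearer-token auth is an assumption, not a documented fact.
    """
    return Request(
        url=base_url.rstrip("/") + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


def lifecycle_requests(base_url, api_key, dataset_id, base_model, job_id):
    """Return the fine-tune, evaluate, and inference requests in order.

    All payload field names are hypothetical. In practice job_id would
    come from the response to the training-job request.
    """
    return [
        build_post(base_url, TRAINING_JOBS_PATH,
                   {"dataset_id": dataset_id, "base_model": base_model}, api_key),
        build_post(base_url, EVALUATIONS_PATH,
                   {"model_id": job_id, "dataset_id": dataset_id}, api_key),
        build_post(base_url, INFERENCE_PATH,
                   {"model_id": job_id, "text": "example input"}, api_key),
    ]
```

Each `Request` can be passed to `urllib.request.urlopen` once the placeholders are replaced with real values.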

Next steps

Quickstart

Make your first API call in under five minutes.

Authentication

Generate your API key and authenticate requests.