Pioneer supports parameter-efficient (LoRA) post-training on a wide range of open-source decoder models — from compact 1B-parameter models to 70B+ frontier models. You bring your training data, choose a base model that fits your task and budget, and Pioneer handles the infrastructure, routing, and serving. The result is a fine-tuned adapter you can call over the same API, with no GPU management required. All three post-training algorithms — supervised fine-tuning (sft), GRPO (grpo), and DPO (dpo) — are created through the same POST /felix/training-jobs endpoint and selected with the training_algorithm parameter.

Choose an algorithm

| Algorithm | training_algorithm | What it optimizes | Dataset signal |
| --- | --- | --- | --- |
| Supervised fine-tuning | sft (default) | Imitate the assistant turns in your examples | Chat messages |
| GRPO | grpo | Maximize a reward over sampled completions | prompt + answer (+ a reward function) |
| DPO | dpo | Prefer chosen responses over rejected ones | prompt + chosen + rejected |
Omitting training_algorithm is equivalent to sft, so existing requests keep working unchanged.
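As a quick illustration of the dataset signals listed above, the sketch below guesses which algorithm a single row can feed from the keys it carries. The field names (messages, prompt, answer, chosen, rejected) come from the table; the helper name infer_algorithm and the precedence order are assumptions made for this example, not part of the Pioneer API.

```python
# Hypothetical helper: map the keys present in a dataset row to an algorithm.
# Field names follow the table above; the precedence order is an assumption.

def infer_algorithm(row: dict) -> str:
    """Return "sft", "grpo", or "dpo" for a single dataset row."""
    keys = set(row)
    if {"prompt", "chosen", "rejected"} <= keys:
        return "dpo"   # preference pair: a better and a worse response
    if {"prompt", "answer"} <= keys:
        return "grpo"  # prompt plus a reference answer for the reward function
    if "messages" in keys:
        return "sft"   # chat transcript whose assistant turns are imitated
    raise ValueError(f"unrecognized row shape: {sorted(keys)}")


rows = [
    {"messages": [{"role": "user", "content": "Hi"},
                  {"role": "assistant", "content": "Hello! How can I help?"}]},
    {"prompt": "What is 6 * 7?", "answer": "42"},
    {"prompt": "Explain LoRA briefly.",
     "chosen": "LoRA trains small low-rank matrices on top of frozen weights.",
     "rejected": "LoRA is a kind of radio."},
]

for row in rows:
    print(infer_algorithm(row))
```

Running a check like this over a dataset before submitting a job can catch a mismatch between the data and the training_algorithm value early.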

SFT

You have example outputs you want the model to imitate (conversations, instruction-response pairs). The default and simplest path.

GRPO

“Good” is a programmatic check — exact answers, numeric correctness, JSON validity, a rubric. The model explores and is reinforced toward higher reward.

DPO

You have preference pairs — a better and a worse response per prompt — rather than a single gold answer.
All three algorithms are LoRA-based. A completed job produces a low-rank adapter that is hot-swapped onto the shared base model at serve time and exposed behind the same inference endpoints as base models — reference the training job’s id as the model_id at inference time. training_type defaults to "lora" and is the only supported value for decoder LLMs; "full" is reserved for GLiNER encoder models.

End-to-end walkthrough

1. Choose a decoder base model

Use GET /base-models to see the full current catalog, filtered to models that support training:
curl "https://api.pioneer.ai/base-models?task_type=decoder&supports_training=true" \
  -H "X-API-Key: YOUR_API_KEY"
The table below shows a selection of popular options. Context window size matters if your training examples or inference prompts are long.
| Model ID | Label | Context |
| --- | --- | --- |
| Qwen/Qwen3-32B | Qwen3 32B | 131K |
| Qwen/Qwen3-30B-A3B-Instruct-2507 | Qwen3 30B A3B Instruct | 262K |
| Qwen/Qwen3-8B | Qwen3 8B | 131K |
| Qwen/Qwen3-4B-Instruct-2507 | Qwen3 4B Instruct | 262K |
| Qwen/Qwen2.5-7B-Instruct | Qwen2.5 7B Instruct | 131K |
| Qwen/Qwen2.5-14B-Instruct | Qwen2.5 14B Instruct | 131K |
| meta-llama/Llama-3.3-70B-Instruct | Llama 3.3 70B Instruct | 131K |
| meta-llama/Llama-3.1-8B-Instruct | Llama 3.1 8B Instruct | 131K |
| meta-llama/Llama-3.1-70B-Instruct | Llama 3.1 70B Instruct | 131K |
| meta-llama/Llama-3.2-3B-Instruct | Llama 3.2 3B Instruct | 131K |
| deepseek-ai/DeepSeek-V3.1 | DeepSeek V3.1 | 163K |
| google/gemma-4-31b-it | Gemma 4 31B IT | 128K |
| openai/gpt-oss-120b | GPT-OSS 120B | 131K |
Choosing a model size: Smaller models (1B–8B) train and respond faster and cost less. Larger models (30B–70B) handle complex reasoning and longer inputs more reliably. Start with Qwen/Qwen3-8B or meta-llama/Llama-3.1-8B-Instruct for most tasks and scale up if needed.
Not every model supports every algorithm; see Supported models below for the SFT/GRPO/DPO matrix.
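To make the context-window consideration concrete, here is a small sketch that shortlists models whose window can hold a given prompt plus generation budget. The context figures are copied from the table in this step; treating "131K" as 131,000 tokens is a deliberately conservative assumption, and the function name models_that_fit is hypothetical. Token counts are supplied by the caller, since tokenization differs per model.

```python
# Context windows copied from the table in this step, in thousands of tokens.
CONTEXT_K = {
    "Qwen/Qwen3-32B": 131,
    "Qwen/Qwen3-30B-A3B-Instruct-2507": 262,
    "Qwen/Qwen3-8B": 131,
    "Qwen/Qwen3-4B-Instruct-2507": 262,
    "Qwen/Qwen2.5-7B-Instruct": 131,
    "Qwen/Qwen2.5-14B-Instruct": 131,
    "meta-llama/Llama-3.3-70B-Instruct": 131,
    "meta-llama/Llama-3.1-8B-Instruct": 131,
    "meta-llama/Llama-3.1-70B-Instruct": 131,
    "meta-llama/Llama-3.2-3B-Instruct": 131,
    "deepseek-ai/DeepSeek-V3.1": 163,
    "google/gemma-4-31b-it": 128,
    "openai/gpt-oss-120b": 131,
}


def models_that_fit(prompt_tokens: int, max_new_tokens: int) -> list[str]:
    """List models whose context window covers prompt plus generation.

    Assumption: "K" is read as 1,000 tokens, which slightly understates
    windows that are really powers of two.
    """
    needed = prompt_tokens + max_new_tokens
    return [model for model, k in CONTEXT_K.items() if needed <= k * 1000]


# A 150K-token prompt with a 4K generation budget rules out the 131K models.
print(models_that_fit(150_000, 4_000))
```

The live catalog from GET /base-models remains the source of truth; a hard-coded table like this one goes stale as models are added or retired.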
2. Prepare your training data

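The dataset shape depends on the algorithm you picked. The SFT shape is shown below; GRPO rows carry prompt and answer fields, and DPO rows carry prompt, chosen, and rejected fields.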
# Each row: {"messages": [{"role": "user" | "assistant" | "system", "content": "..."}]}
# Generate synthetically:
curl -X POST https://api.pioneer.ai/generate \
  -H "X-API-Key: YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "task_type": "decoder",
    "dataset_name": "my-sft-dataset",
    "num_examples": 200,
    "domain_description": "Customer support for a SaaS product"
  }'
See the Synthetic Data guide for the full set of /generate options, including auto-labelling existing text. Once generated or uploaded, wait until the dataset status is ready before starting training.
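If you are assembling SFT rows yourself rather than generating them, a local sanity check along these lines may help before uploading. It is a minimal sketch: the row shape follows the comment in the example above, while the function names, the specific checks, and the choice of JSON Lines as the serialization are assumptions for illustration rather than documented Pioneer requirements.

```python
import json

# Roles named in the SFT row shape shown in this step.
ALLOWED_ROLES = {"system", "user", "assistant"}


def validate_sft_row(row: dict) -> list[str]:
    """Return a list of problems found in one SFT row (empty means OK)."""
    messages = row.get("messages")
    if not isinstance(messages, list) or not messages:
        return ["'messages' must be a non-empty list"]
    problems = []
    for i, msg in enumerate(messages):
        if msg.get("role") not in ALLOWED_ROLES:
            problems.append(f"message {i}: unknown role {msg.get('role')!r}")
        content = msg.get("content")
        if not isinstance(content, str) or not content.strip():
            problems.append(f"message {i}: content must be a non-empty string")
    # SFT imitates assistant turns, so a row without one teaches nothing.
    if not any(msg.get("role") == "assistant" for msg in messages):
        problems.append("no assistant turn to imitate")
    return problems


def to_jsonl(rows: list[dict]) -> str:
    """Serialize rows one JSON object per line (assumed upload format)."""
    return "\n".join(json.dumps(row, ensure_ascii=False) for row in rows)


good = {"messages": [
    {"role": "user", "content": "How do I reset my password?"},
    {"role": "assistant", "content": "Open Settings, then choose Reset password."},
]}
bad = {"messages": [{"role": "customer", "content": ""}]}

print(validate_sft_row(good))
print(validate_sft_row(bad))
print(to_jsonl([good]))
```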
3. Start a training job

Submit your training job with POST /felix/training-jobs. The training_algorithm parameter selects SFT, GRPO, or DPO.
curl -X POST https://api.pioneer.ai/felix/training-jobs \
  -H "X-API-Key: YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model_name": "my-sft-model",
    "base_model": "Qwen/Qwen3-8B",
    "training_type": "lora",
    "datasets": [{"name": "my-sft-dataset", "version": "1"}],
    "lora_r": 16,
    "lora_alpha": 32,
    "learning_rate": 2e-5,
    "nr_epochs": 3
  }'
Pioneer routes your job automatically to the best available provider. The response includes your job ID:
{ "id": "uuid-of-training-job", "status": "requested" }
rl_config is required when training_algorithm is grpo or dpo and must be omitted for sft. Every key inside rl_config is optional and falls back to a TRL-aligned server default except reward_type, which is required for GRPO.
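The rl_config rules above can be encoded in a small request-body builder, sketched here. The top-level field names mirror the curl example in this step; placing rl_config as a top-level key next to training_algorithm is an assumption, and build_training_job is a hypothetical helper, not a Pioneer SDK function. The body it returns would be sent as JSON to POST /felix/training-jobs.

```python
def build_training_job(model_name, base_model, dataset,
                       algorithm="sft", rl_config=None, **hyperparams):
    """Build a request body, enforcing the rl_config rules described above."""
    if algorithm not in {"sft", "grpo", "dpo"}:
        raise ValueError(f"unknown training_algorithm: {algorithm!r}")
    body = {
        "model_name": model_name,
        "base_model": base_model,
        "training_type": "lora",   # the only supported value for decoder LLMs
        "datasets": [dataset],
        **hyperparams,             # e.g. lora_r, lora_alpha, learning_rate
    }
    if algorithm == "sft":
        if rl_config is not None:
            raise ValueError("rl_config must be omitted for sft")
        return body                # sft is the default, so the key is left out
    if rl_config is None:
        raise ValueError(f"rl_config is required for {algorithm}")
    if algorithm == "grpo" and "reward_type" not in rl_config:
        raise ValueError("rl_config.reward_type is required for grpo")
    body["training_algorithm"] = algorithm
    body["rl_config"] = dict(rl_config)  # assumed to sit at the top level
    return body


grpo_body = build_training_job(
    "my-grpo-model", "Qwen/Qwen3-8B",
    {"name": "my-grpo-dataset", "version": "1"},   # hypothetical dataset name
    algorithm="grpo",
    rl_config={"reward_type": "numeric_match"},
    lora_r=16, lora_alpha=32,
)
print(grpo_body)
```

Validating on the client side like this only saves a round trip; the server remains the authority on what it accepts.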
4. Poll until training is complete

Check job status by polling GET /felix/training-jobs/:id.
curl https://api.pioneer.ai/felix/training-jobs/YOUR_JOB_ID \
  -H "X-API-Key: YOUR_API_KEY"
Status transitions: requested → running → complete → deployed (or failed / stopped). The terminal success state is deployed, reached automatically once the adapter is live behind the inference endpoints.
You can also stream training logs while the job is running:
curl https://api.pioneer.ai/felix/training-jobs/YOUR_JOB_ID/logs \
  -H "X-API-Key: YOUR_API_KEY"
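A polling loop for this step might look like the sketch below. It treats deployed, failed, and stopped as terminal, as described above. The status fetch is passed in as a callable so the loop can be exercised without network access; make_fetcher shows one way to back it with the GET endpoint using only the standard library, on the assumption that the response JSON carries a status field like the creation response does. The function names, the 10-second interval, and the timeout are all illustrative choices.

```python
import json
import time
import urllib.request

TERMINAL = {"deployed", "failed", "stopped"}


def make_fetcher(job_id: str, api_key: str):
    """Return a callable that reads the job's status (assumed 'status' field)."""
    def fetch() -> str:
        req = urllib.request.Request(
            f"https://api.pioneer.ai/felix/training-jobs/{job_id}",
            headers={"X-API-Key": api_key},
        )
        with urllib.request.urlopen(req) as resp:
            return json.load(resp)["status"]
    return fetch


def poll_until_done(fetch_status, interval_s=10.0, timeout_s=3600.0,
                    sleep=time.sleep, clock=time.monotonic) -> list[str]:
    """Poll until a terminal status; return the distinct statuses seen in order."""
    deadline = clock() + timeout_s
    seen: list[str] = []
    while True:
        status = fetch_status()
        if not seen or seen[-1] != status:
            seen.append(status)          # record each transition once
        if status in TERMINAL:
            return seen
        if clock() >= deadline:
            raise TimeoutError(f"job still {status!r} after {timeout_s}s")
        sleep(interval_s)


# Offline demonstration with a scripted sequence instead of real HTTP calls.
scripted = iter(["requested", "running", "running", "complete", "deployed"])
history = poll_until_done(lambda: next(scripted), sleep=lambda _: None)
print(history)
```

In real use, the first argument would be make_fetcher("YOUR_JOB_ID", "YOUR_API_KEY"), and a final status of failed or stopped should be handled before attempting inference.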
5. Run inference on your fine-tuned model

Once the job status is deployed, use your job ID as the model_id (or model) on any of the three inference interfaces.
Pioneer native API. Use "task": "generate" for decoder models:
curl -X POST https://api.pioneer.ai/inference \
  -H "X-API-Key: YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model_id": "YOUR_JOB_ID",
    "task": "generate",
    "messages": [{"role": "user", "content": "Summarize this article: ..."}]
  }'
OpenAI-compatible endpoint — drop-in replacement for the OpenAI SDK:
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_API_KEY",
    base_url="https://api.pioneer.ai/v1"
)

response = client.chat.completions.create(
    model="YOUR_JOB_ID",
    messages=[{"role": "user", "content": "Summarize this article: ..."}]
)
print(response.choices[0].message.content)
Anthropic-compatible endpoint:
curl -X POST https://api.pioneer.ai/v1/messages \
  -H "X-API-Key: YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "YOUR_JOB_ID",
    "max_tokens": 1024,
    "messages": [{"role": "user", "content": "Summarize this article: ..."}]
  }'
Streaming is supported on all three interfaces.
Downloading your trained model weights is available on the Pro plan and above. Use GET /felix/training-jobs/:id/download to retrieve the weights once training is complete.

LoRA hyperparameters

LoRA capacity and the core optimization settings are configurable; the defaults are sensible starting points for SFT and the RL algorithms alike.
| Field | Default | Purpose |
| --- | --- | --- |
| lora_r | 16 | LoRA rank (adapter capacity). Raise it for harder tasks or larger datasets. |
| lora_alpha | 32 | LoRA scaling factor (typically ~2× lora_r). |
| lora_dropout | 0.1 | Dropout applied to the adapter during training. |
| learning_rate | 2e-5 | Peak AdamW learning rate. |
| batch_size | 4 | Per-step batch size. |
| nr_epochs | 100 | Epoch ceiling; early stopping usually halts well before this. |
| validation_data_percentage | 0.2 | Fraction of the dataset held out for validation. |
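To give a feel for what these defaults imply, the sketch below works through three pieces of arithmetic: the adapter scaling alpha / r from the standard LoRA formulation, the number of trainable parameters a rank-r adapter adds to one weight matrix, and the size of the validation split. It assumes Pioneer's trainer follows the standard LoRA definition; the 4096×4096 projection is a hypothetical layer size chosen for illustration, and the rounding used for the split is an assumption.

```python
def lora_scaling(lora_r: int, lora_alpha: int) -> float:
    """Standard LoRA scales the low-rank update B @ A by alpha / r."""
    return lora_alpha / lora_r


def lora_param_count(d_in: int, d_out: int, lora_r: int) -> int:
    """Trainable parameters for one adapted matrix: A is r x d_in, B is d_out x r."""
    return lora_r * (d_in + d_out)


def split_sizes(n_examples: int, validation_fraction: float) -> tuple[int, int]:
    """Return (train, validation) counts; the rounding rule is an assumption."""
    n_val = round(n_examples * validation_fraction)
    return n_examples - n_val, n_val


# Defaults from the table above.
print(lora_scaling(16, 32))                 # update is scaled by alpha / r

# Hypothetical 4096 x 4096 projection: adapter size vs. the frozen matrix.
adapter = lora_param_count(4096, 4096, 16)
full = 4096 * 4096
print(adapter, full, adapter / full)

# 200 examples with validation_data_percentage = 0.2.
print(split_sizes(200, 0.2))
```

Doubling lora_r doubles the adapter's parameter count; if lora_alpha is kept at roughly 2× lora_r as the table suggests, the scaling factor stays the same.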

GRPO reward functions

GRPO (Group Relative Policy Optimization) samples multiple completions per prompt and reinforces the ones that score highest against a reward function. Set rl_config.reward_type to one of:
| reward_type | Scores a completion as correct when… |
| --- | --- |
| exact_match | the normalized completion equals answer |
| contains_substring | answer appears anywhere in the completion |
| numeric_match | the extracted number matches answer (handles #### 42, \boxed{42}, “the answer is 42”) |
| choice_match | the final multiple-choice letter matches answer |
| regex_match | the completion matches the supplied pattern |
| json_match | the parsed JSON deep-equals answer |
| json_loose_match | the parsed JSON loosely matches answer |
| rougeL_match | ROUGE-L against the answer reference(s) |
| llm_as_judge | a judge model scores the completion against a rubric |
When reward_type is llm_as_judge, Pioneer mints and manages the judge credential for you — you never supply an API key. Optional judge knobs include llm_judge_model, llm_judge_rubric, and llm_judge_score_scale.
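For intuition about how a reward turns into a training signal, the sketch below gives local approximations of three of the reward types and then computes group-relative advantages in the usual GRPO style (each reward minus the group mean, divided by the group standard deviation). These are not Pioneer's server-side implementations: the normalization rules, the number-extraction patterns, and all function names are assumptions, intended only for sanity-checking a dataset offline.

```python
import re
import statistics

_NUMBER = r"-?\d+(?:\.\d+)?"


def exact_match(completion: str, answer: str) -> float:
    """1.0 when both strings agree after lowercasing and collapsing whitespace."""
    def norm(s: str) -> str:
        return " ".join(s.lower().split())
    return 1.0 if norm(completion) == norm(answer) else 0.0


def contains_substring(completion: str, answer: str) -> float:
    return 1.0 if answer in completion else 0.0


def extract_number(completion: str):
    """Try '#### 42', '\\boxed{42}', 'answer is 42', then the last number seen."""
    patterns = (
        r"####\s*(" + _NUMBER + r")",
        r"\\boxed\{(" + _NUMBER + r")\}",
        r"answer is\s*(" + _NUMBER + r")",
    )
    for pattern in patterns:
        match = re.search(pattern, completion)
        if match:
            return float(match.group(1))
    found = re.findall(_NUMBER, completion)
    return float(found[-1]) if found else None


def numeric_match(completion: str, answer: str) -> float:
    value = extract_number(completion)
    return 1.0 if value is not None and value == float(answer) else 0.0


def group_relative_advantages(rewards: list[float]) -> list[float]:
    """Standardize rewards within one prompt's group of sampled completions."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    if std == 0:
        return [0.0] * len(rewards)   # identical rewards carry no signal
    return [(r - mean) / std for r in rewards]


completions = [
    "6 * 7 is forty-two. #### 42",
    r"So the result is \boxed{41}",
    "The answer is 42.",
    "I am not sure.",
]
rewards = [numeric_match(c, "42") for c in completions]
print(rewards)
print(group_relative_advantages(rewards))
```

A group in which every completion earns the same reward yields all-zero advantages, which is one reason to pick a reward_type that actually separates good completions from bad ones on your data.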

Supported models

The canonical, live list is always GET /base-models?supports_training=true. As of this writing:
Base model | SFT | GRPO | DPO
Qwen/Qwen3-8B
Qwen/Qwen3-32B
Qwen/Qwen3-4B-Instruct-2507
Qwen/Qwen3-4B-Base
Qwen/Qwen3-1.7B-Base
meta-llama/Llama-3.1-8B-Instruct
HuggingFaceTB/SmolLM3-3B-Base
google/gemma-4-31b-it
meta-llama/Llama-3.2-3B-Instruct
Qwen/Qwen2.5-7B-Instruct
GRPO and DPO are available on the subset of models that have been verified end-to-end for reinforcement learning. Every trainable decoder supports SFT. Models not marked for GRPO or DPO accept sft only; submitting grpo or dpo for them returns a 422.
GLiNER encoder models (fastino/gliner2-base-v1, fastino/gliner2-large-v1, fastino/gliner2-multi-v1, fastino/gliner2-multi-large-v1) are also trainable through the same endpoint — see the encoder fine-tuning guides for NER, classification, and structured extraction.

Serverless inference for base models

If you want to run inference on a base model without fine-tuning, several models are available as serverless endpoints with no startup latency:
| Model ID | Label | Context |
| --- | --- | --- |
| Qwen/Qwen3-235B-A22B-Instruct-2507 | Qwen3 235B A22B Instruct | 262K |
| Qwen/Qwen3-8B | Qwen3 8B | 131K |
| deepseek-ai/DeepSeek-V3.1 | DeepSeek V3.1 | 163K |
| openai/gpt-oss-120b | GPT-OSS 120B | 131K |
| meta-llama/Llama-3.3-70B-Instruct | Llama 3.3 70B Instruct | 131K |
| moonshotai/Kimi-K2.6 | Kimi K2.6 | 262K |
Use GET /base-models?task_type=decoder&supports_inference=true to see the current serverless catalog.

Next steps