Choose a decoder base model
Use The table below shows a selection of popular options. Context window size matters if your training examples or inference prompts are long.
Choosing a model size: Smaller models (1B–8B) train and respond faster and cost less. Larger models (30B–70B) handle complex reasoning and longer inputs more reliably. Start with
GET /base-models to see the full current catalog, filtered to models that support training:| Model ID | Label | Context |
|---|---|---|
Qwen/Qwen3-32B | Qwen3 32B | 131K |
Qwen/Qwen3-30B-A3B-Instruct-2507 | Qwen3 30B A3B Instruct | 262K |
Qwen/Qwen3-8B | Qwen3 8B | 131K |
Qwen/Qwen3-4B-Instruct-2507 | Qwen3 4B Instruct | 262K |
Qwen/Qwen2.5-7B-Instruct | Qwen2.5 7B Instruct | 131K |
Qwen/Qwen2.5-14B-Instruct | Qwen2.5 14B Instruct | 131K |
meta-llama/Llama-3.3-70B-Instruct | Llama 3.3 70B Instruct | 131K |
meta-llama/Llama-3.1-8B-Instruct | Llama 3.1 8B Instruct | 131K |
meta-llama/Llama-3.1-70B-Instruct | Llama 3.1 70B Instruct | 131K |
meta-llama/Llama-3.2-3B-Instruct | Llama 3.2 3B Instruct | 131K |
deepseek-ai/DeepSeek-V3.1 | DeepSeek V3.1 | 163K |
google/gemma-4-31b-it | Gemma 4 31B IT | 128K |
openai/gpt-oss-120b | GPT-OSS 120B | 131K |
Qwen/Qwen3-8B or meta-llama/Llama-3.1-8B-Instruct for most tasks and scale up if needed.Prepare your training data
LLM fine-tuning uses the See the Synthetic Data guide for full details on generation options.Once generated or uploaded, confirm your dataset is ready:
decoder task type. Your dataset should contain prompt-completion pairs — conversations, instruction-response examples, or any input-output format relevant to your task.You can generate synthetic decoder training data with Pioneer’s /generate endpoint:Start a training job
Submit a training job with your chosen decoder base model and Pioneer routes your job automatically to the best available provider. The response includes your job ID:
training_type: "lora".Poll until training is complete
Check job status by polling Status transitions:
GET /felix/training-jobs/:id.requested → running → complete (or failed / stopped).You can also retrieve training logs while the job is running:Run inference on your fine-tuned model
Once the job status is OpenAI-compatible endpoint — drop-in replacement for the OpenAI SDK:Anthropic-compatible endpoint:Streaming is supported on all three interfaces.
"complete", use your job ID as the model identifier. Pioneer supports three inference interfaces.Pioneer native API — use "task": "generate" for decoder models:Downloading your trained model weights is available on the Pro plan and above. Use
GET /felix/training-jobs/:id/download to retrieve the weights once training is complete.Serverless inference for base models
If you want to run inference on a base model without fine-tuning, several models are available as serverless endpoints with no startup latency:| Model ID | Label | Context |
|---|---|---|
Qwen/Qwen3-235B-A22B-Instruct-2507 | Qwen3 235B A22B Instruct | 262K |
Qwen/Qwen3-8B | Qwen3 8B | 131K |
deepseek-ai/DeepSeek-V3.1 | DeepSeek V3.1 | 163K |
openai/gpt-oss-120b | GPT-OSS 120B | 131K |
meta-llama/Llama-3.3-70B-Instruct | Llama 3.3 70B Instruct | 131K |
moonshotai/Kimi-K2.6 | Kimi K2.6 | 262K |
GET /base-models?task_type=decoder&supports_inference=true to see the current serverless catalog.
Next steps
- Synthetic Data — generate training data without manual annotation
- Adaptive Inference — automatically retrain on live production data
- Agent Skills — let an AI coding agent manage training and inference for you

