GET /base-models to query the live list, which always reflects current availability and capabilities.
Encoder models (GLiNER)
GLiNER models perform named entity recognition and structured extraction. Most encoder models support both LoRA and full fine-tuning and are served on-demand after training.
| Model ID | Label | Input | Output | Training | Inference |
|---|---|---|---|---|---|
fastino/gliner2-base-v1 | GLiNER2 Base | $0.15 | $0.15 | LoRA, Full | On-demand |
fastino/gliner2-large-v1 | GLiNER2 Large | $0.15 | $0.15 | LoRA, Full | On-demand |
fastino/gliner2-multi-v1 | GLiNER2 Multi | $0.15 | $0.15 | LoRA, Full | On-demand |
fastino/gliner2-multi-large-v1 | GLiNER2 Multi Large | $0.15 | $0.15 | LoRA, Full | On-demand |
fastino/gliner2-multi-v1 and fastino/gliner2-multi-large-v1 are multilingual variants suitable for non-English text.
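The guidance above can be turned into a small selection helper. This is a minimal sketch: the model IDs come from the GLiNER2 table, but the function name `pick_gliner2_model` is hypothetical, and it assumes that the non-multi variants should only be used for English text and that any other language goes to a multilingual variant.

```python
# Model IDs copied from the GLiNER2 table above, keyed by (language family, size).
GLINER2_MODELS = {
    ("en", "base"): "fastino/gliner2-base-v1",
    ("en", "large"): "fastino/gliner2-large-v1",
    ("multi", "base"): "fastino/gliner2-multi-v1",
    ("multi", "large"): "fastino/gliner2-multi-large-v1",
}


def pick_gliner2_model(language: str, large: bool = False) -> str:
    """Return a GLiNER2 model ID for an ISO 639-1 language code (hypothetical helper).

    Assumption: English text uses the base/large variants and every other
    language is routed to a multilingual variant.
    """
    family = "en" if language.lower() == "en" else "multi"
    size = "large" if large else "base"
    return GLINER2_MODELS[(family, size)]
```

For example, `pick_gliner2_model("de", large=True)` selects `fastino/gliner2-multi-large-v1`.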
Decoder models — training
These LLMs are available for LoRA fine-tuning via POST /felix/training-jobs. When you submit a job, Pioneer automatically routes it to the best available provider.
| Model ID | Label | Input | Output | Context |
|---|---|---|---|---|
Qwen/Qwen3-32B | Qwen3 32B | $0.90 | $0.90 | 131K |
Qwen/Qwen3-8B | Qwen3 8B | $0.20 | $0.20 | 131K |
Qwen/Qwen3-4B-Instruct-2507 | Qwen3 4B Instruct | $0.20 | $0.20 | 262K |
Qwen/Qwen3-4B | Qwen3 4B Base | $0.15 | $0.15 | 131K |
Qwen/Qwen3-1.7B | Qwen3 1.7B Base | $0.10 | $0.10 | 131K |
Qwen/Qwen3.5-9B | Qwen3.5 9B | $0.30 | $0.30 | 32K |
Qwen/Qwen3.6-27B | Qwen3.6 27B | $0.60 | $0.60 | 32K |
google/gemma-4-31b-it | Gemma 4 31B IT | $0.50 | $0.50 | 128K |
google/gemma-4-12b-it | Gemma 4 12B IT | $0.25 | $0.25 | 128K |
google/gemma-4-e2b-it | Gemma 4 E2B IT | $0.10 | $0.10 | 128K |
google/gemma-4-e4b-it | Gemma 4 E4B IT | $0.20 | $0.20 | 128K |
google/gemma-3-4b | Gemma 3 4B | $0.15 | $0.15 | 128K |
meta-llama/Llama-3.3-70B-Instruct | Llama 3.3 70B Instruct | $0.90 | $0.90 | 131K |
meta-llama/Llama-3.1-8B-Instruct | Llama 3.1 8B Instruct | $0.02 | $0.05 | 131K |
meta-llama/Llama-3.2-3B-Instruct | Llama 3.2 3B Instruct | $0.051 | $0.335 | 131K |
meta-llama/Llama-3.2-1B-Instruct | Llama 3.2 1B Instruct | $0.027 | $0.201 | 131K |
HuggingFaceTB/SmolLM3-3B | SmolLM3 3B Base | $0.15 | $0.15 | 131K |
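The training endpoint described above can be called with only the Python standard library. The sketch below builds, but does not send, a POST request to /felix/training-jobs. The host (`API_BASE`), the Bearer authorization header, and every payload field name (`base_model`, `dataset_id`, `training_type`) are hypothetical placeholders, not the documented request schema; check the training-jobs API reference before relying on them.

```python
import json
import urllib.request

# Hypothetical host: replace with your actual Pioneer API base URL.
API_BASE = "https://api.example.com"


def build_training_job_request(api_key: str, base_model: str, dataset_id: str) -> urllib.request.Request:
    """Build (without sending) a POST /felix/training-jobs request."""
    # Hypothetical payload field names; only the endpoint path comes from the docs.
    payload = {
        "base_model": base_model,   # a Model ID from the table, e.g. "Qwen/Qwen3-8B"
        "dataset_id": dataset_id,   # hypothetical reference to your training data
        "training_type": "lora",    # decoder models are listed for LoRA fine-tuning
    }
    return urllib.request.Request(
        url=f"{API_BASE}/felix/training-jobs",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",  # assumed auth scheme
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

To actually submit the job, pass the request to `urllib.request.urlopen(req)` and parse the JSON response.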
Decoder models — serverless inference
These models are pre-deployed and available for inference immediately, with no fine-tuning required and no startup latency. Prices are per 1M tokens.
Anthropic
| Model ID | Label | Input | Output | Context |
|---|---|---|---|---|
claude-haiku-4-5 | Claude Haiku 4.5 | $1.00 | $5.00 | 200K |
claude-sonnet-4-6 | Claude Sonnet 4.6 | $3.00 | $15.00 | 1M |
claude-opus-4-1 | Claude Opus 4.1 | $15.00 | $75.00 | 200K |
claude-opus-4-5 | Claude Opus 4.5 | $5.00 | $25.00 | 200K |
claude-opus-4-6 | Claude Opus 4.6 | $5.00 | $25.00 | 1M |
claude-opus-4-7 | Claude Opus 4.7 | $5.00 | $25.00 | 1M |
claude-opus-4-8 | Claude Opus 4.8 | $5.00 | $25.00 | 1M |
claude-fable-5 | Claude Fable 5 | $10.00 | $50.00 | 1M |
OpenAI
| Model ID | Label | Input | Output | Context |
|---|---|---|---|---|
gpt-4o | GPT-4o | $2.50 | $10.00 | 128K |
gpt-4o-mini | GPT-4o mini | $0.15 | $0.60 | 128K |
gpt-4.1 | GPT-4.1 | $2.00 | $8.00 | 1M |
gpt-4.1-mini | GPT-4.1 mini | $0.40 | $1.60 | 1M |
gpt-4.1-nano | GPT-4.1 nano | $0.10 | $0.40 | 1M |
gpt-5-mini | GPT-5 mini | $0.25 | $2.00 | 400K |
gpt-5-nano | GPT-5 nano | $0.05 | $0.40 | 400K |
gpt-5.1 | GPT-5.1 | $1.25 | $10.00 | 400K |
gpt-5.4 | GPT-5.4 | $2.50 | $15.00 | 400K |
gpt-5.4-mini | GPT-5.4 mini | $0.75 | $4.50 | 400K |
gpt-5.4-nano | GPT-5.4 nano | $0.20 | $1.25 | 400K |
gpt-5.5 | GPT-5.5 | $5.00 | $30.00 | 400K |
openai/gpt-oss-20b | GPT-OSS 20B | $0.07 | $0.30 | 131K |
openai/gpt-oss-120b | GPT-OSS 120B | $0.15 | $0.60 | 131K |
Google
| Model ID | Label | Input | Output | Context |
|---|---|---|---|---|
gemini-3.1-pro | Gemini 3.1 Pro | $2.00 | $12.00 | 1M |
gemini-3.5-flash | Gemini 3.5 Flash | $1.50 | $9.00 | 1M |
gemini-3-flash | Gemini 3 Flash | $0.50 | $3.00 | 1M |
Mistral
| Model ID | Label | Input | Output | Context |
|---|---|---|---|---|
mistralai/Mistral-Medium-3.5 | Mistral Medium 3.5 | $1.50 | $7.50 | 131K |
mistralai/Mistral-Small-4 | Mistral Small 4 | $0.15 | $0.60 | 131K |
mistralai/Mistral-Nemo | Mistral Nemo | $0.02 | $0.03 | 131K |
DeepSeek
| Model ID | Label | Input | Output | Context |
|---|---|---|---|---|
deepseek-ai/DeepSeek-V4-Pro | DeepSeek V4 Pro | $0.435 | $0.87 | 163K |
deepseek-ai/DeepSeek-V4-Flash | DeepSeek V4 Flash | $0.10 | $0.20 | 163K |
Qwen
| Model ID | Label | Input | Output | Context |
|---|---|---|---|---|
Qwen/Qwen3-32B | Qwen3 32B | $0.08 | $0.28 | 131K |
Qwen/Qwen3.6-Flash | Qwen3.6 Flash | $0.1875 | $1.125 | 131K |
Qwen/Qwen3.6-35B-A3B | Qwen3.6 35B A3B | $0.14 | $1.00 | 131K |
Qwen/Qwen3.6-Plus | Qwen3.6 Plus | $0.33 | $1.95 | 131K |
Qwen/Qwen3.6-Max-Preview | Qwen3.6 Max Preview | $1.04 | $6.24 | 131K |
Qwen/Qwen3.7-Max | Qwen3.7 Max | $1.25 | $3.75 | 131K |
Meta (Llama)
| Model ID | Label | Input | Output | Context |
|---|---|---|---|---|
meta-llama/Llama-3.2-1B-Instruct | Llama 3.2 1B Instruct | $0.027 | $0.201 | 131K |
meta-llama/Llama-3.2-3B-Instruct | Llama 3.2 3B Instruct | $0.051 | $0.335 | 131K |
meta-llama/Llama-3.1-8B-Instruct | Llama 3.1 8B Instruct | $0.02 | $0.05 | 131K |
meta-llama/Llama-3.3-70B-Instruct | Llama 3.3 70B Instruct | $0.90 | $0.90 | 131K |
Others
| Model ID | Label | Input | Output | Context |
|---|---|---|---|---|
nvidia/nemotron-3-ultra | NVIDIA Nemotron 3 Ultra | $0.50 | $2.50 | 131K |
zai-org/GLM-5.1 | GLM 5.1 | $0.98 | $3.08 | 202.8K |
moonshotai/Kimi-K2.6 | Kimi K2.6 | $0.684 | $3.42 | 262K |
MiMo-V2.5-Pro | MiMo V2.5 Pro | $0.435 | $0.87 | 131K |
MiniMax/M2.7 | MiniMax M2.7 | $0.279 | $1.20 | 131K |
MiniMax/M3 | MiniMax M3 | $0.30 | $1.20 | 131K |
liquidai/LFM2-24B-A2B | LFM2 24B A2B | $0.03 | $0.12 | 131K |
fastino/gliguard-llm-guardrails-300m | GLiGuard LLM Guardrails 300M | $0.15 | $0.15 | — |
fastino/gliner2-privacy-filter-pii | GLiNER2 Privacy Filter PII | $0.15 | $0.15 | — |
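Because serverless prices are quoted per 1M tokens, the cost of a call is (input_tokens × input_price + output_tokens × output_price) / 1,000,000. The sketch below applies that formula to a few rows copied from the tables above; the dictionary and function names are illustrative, and the prices should be refreshed from the live catalog rather than hard-coded.

```python
# Serverless prices per 1M tokens (input, output), copied from a few rows above.
PRICES_PER_1M = {
    "claude-haiku-4-5": (1.00, 5.00),
    "gpt-4.1-mini": (0.40, 1.60),
    "deepseek-ai/DeepSeek-V4-Flash": (0.10, 0.20),
}


def estimate_cost(model_id: str, input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost of one serverless call from per-1M-token prices."""
    input_price, output_price = PRICES_PER_1M[model_id]
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000
```

For example, 200,000 input tokens and 50,000 output tokens on claude-haiku-4-5 cost $0.20 + $0.25 = $0.45.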
On-demand vs. serverless inference
Pioneer offers two ways to serve predictions, and the right choice depends on your workflow.
Serverless inference uses pre-deployed base model endpoints. There is no startup delay, and you are billed per token. This is the default for the models in the serverless tables above and is ideal when you want to call a frontier model without fine-tuning.
On-demand inference provisions a dedicated GPU after fine-tuning completes. Your LoRA adapter is loaded onto the GPU and served exclusively for your requests. Pioneer routes inference calls to an on-demand deployment automatically when you pass a training job ID as model_id.
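The choice between the two modes comes down to what you pass as model_id. The sketch below shows that decision on the client side. Only the model_id field comes from this page; the `prompt` field, the job ID format, and the returned mode label are hypothetical, and no inference endpoint path is assumed.

```python
from typing import Optional, Tuple


def build_inference_body(prompt: str,
                         training_job_id: Optional[str] = None,
                         base_model: str = "gpt-4.1-mini") -> Tuple[dict, str]:
    """Return a hypothetical request body and a client-side label for the mode it targets."""
    if training_job_id is not None:
        # A training job ID as model_id routes to the on-demand deployment of your adapter.
        return {"model_id": training_job_id, "prompt": prompt}, "on-demand"
    # A Model ID from the serverless tables is served by a pre-deployed endpoint.
    return {"model_id": base_model, "prompt": prompt}, "serverless"
```

Keeping the base model as a default means the same code path can fall back to serverless inference until a fine-tuned deployment exists.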
Querying the live catalog
The tables above may lag behind newly added models. Use GET /base-models to get the current catalog at runtime.
Each model entry in the response reports its capabilities through fields that include supports_training and supports_inference. Use the model ID value directly in training job requests and inference calls.
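A minimal sketch of querying the catalog and filtering it by capability. Only the GET /base-models path and the supports_training and supports_inference field names come from this page. The host, the Bearer token scheme, the assumption that the body is a JSON list of model objects, and the `id` key holding the Model ID are all hypothetical.

```python
import json
import urllib.request
from typing import Any, Dict, List

API_BASE = "https://api.example.com"  # hypothetical host


def fetch_base_models(api_key: str) -> List[Dict[str, Any]]:
    """Fetch the live catalog (assumes the body is a JSON list of model objects)."""
    req = urllib.request.Request(
        f"{API_BASE}/base-models",
        headers={"Authorization": f"Bearer {api_key}"},  # assumed auth scheme
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def select_ids(catalog: List[Dict[str, Any]], *, training: bool = False,
               inference: bool = False) -> List[str]:
    """Return IDs of models that support every requested capability."""
    ids = []
    for model in catalog:
        if training and not model.get("supports_training", False):
            continue
        if inference and not model.get("supports_inference", False):
            continue
        ids.append(model["id"])  # "id" is an assumed key name
    return ids
```

Filtering at runtime, rather than hard-coding IDs from the tables, keeps training and inference calls aligned with what the catalog currently offers.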