Pioneer supports two model families: encoder models (GLiNER) for structured extraction tasks like named entity recognition, and decoder models (LLMs) for text generation, classification, and open-ended prompting. The tables below are a snapshot of the current catalog — use GET /base-models to query the live list, which always reflects current availability and capabilities.

Encoder models (GLiNER)

GLiNER models perform named entity recognition and structured extraction. Most encoder models support both LoRA and full fine-tuning, and are served on-demand after training.
| Model ID | Label | Input | Output | Training | Inference |
| --- | --- | --- | --- | --- | --- |
| fastino/gliner2-base-v1 | GLiNER2 Base | $0.15 | $0.15 | LoRA, Full | On-demand |
| fastino/gliner2-large-v1 | GLiNER2 Large | $0.15 | $0.15 | LoRA, Full | On-demand |
| fastino/gliner2-multi-v1 | GLiNER2 Multi | $0.15 | $0.15 | LoRA, Full | On-demand |
| fastino/gliner2-multi-large-v1 | GLiNER2 Multi Large | $0.15 | $0.15 | LoRA, Full | On-demand |

fastino/gliner2-multi-v1 and fastino/gliner2-multi-large-v1 are multilingual variants suitable for non-English text.

Decoder models — training

These LLMs are available for LoRA fine-tuning via POST /felix/training-jobs. When you submit a job, Pioneer automatically routes it to the best available provider.
| Model ID | Label | Input | Output | Context |
| --- | --- | --- | --- | --- |
| Qwen/Qwen3-32B | Qwen3 32B | $0.90 | $0.90 | 131K |
| Qwen/Qwen3-8B | Qwen3 8B | $0.20 | $0.20 | 131K |
| Qwen/Qwen3-4B-Instruct-2507 | Qwen3 4B Instruct | $0.20 | $0.20 | 262K |
| Qwen/Qwen3-4B | Qwen3 4B Base | $0.15 | $0.15 | 131K |
| Qwen/Qwen3-1.7B | Qwen3 1.7B Base | $0.10 | $0.10 | 131K |
| Qwen/Qwen3.5-9B | Qwen3.5 9B | $0.30 | $0.30 | 32K |
| Qwen/Qwen3.6-27B | Qwen3.6 27B | $0.60 | $0.60 | 32K |
| google/gemma-4-31b-it | Gemma 4 31B IT | $0.50 | $0.50 | 128K |
| google/gemma-4-12b-it | Gemma 4 12B IT | $0.25 | $0.25 | 128K |
| google/gemma-4-e2b-it | Gemma 4 E2B IT | $0.10 | $0.10 | 128K |
| google/gemma-4-e4b-it | Gemma 4 E4B IT | $0.20 | $0.20 | 128K |
| google/gemma-3-4b | Gemma 3 4B | $0.15 | $0.15 | 128K |
| meta-llama/Llama-3.3-70B-Instruct | Llama 3.3 70B Instruct | $0.90 | $0.90 | 131K |
| meta-llama/Llama-3.1-8B-Instruct | Llama 3.1 8B Instruct | $0.02 | $0.05 | 131K |
| meta-llama/Llama-3.2-3B-Instruct | Llama 3.2 3B Instruct | $0.051 | $0.335 | 131K |
| meta-llama/Llama-3.2-1B-Instruct | Llama 3.2 1B Instruct | $0.027 | $0.201 | 131K |
| HuggingFaceTB/SmolLM3-3B | SmolLM3 3B Base | $0.15 | $0.15 | 131K |

Decoder models — serverless inference

These models are pre-deployed and available for inference immediately — no fine-tuning required and no startup latency. Prices are per 1M tokens.
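
Because prices are quoted per 1M tokens, the cost of a request is the token count times the listed price, divided by one million. The sketch below shows that arithmetic as a shell helper; `estimate_cost` is a hypothetical name used only for illustration, and the example prices are the claude-haiku-4-5 row from the Anthropic table.

```shell
# estimate_cost INPUT_TOKENS OUTPUT_TOKENS INPUT_PRICE OUTPUT_PRICE
# Hypothetical helper: prices are USD per 1M tokens, as listed in the tables.
estimate_cost() {
  awk -v in_tok="$1" -v out_tok="$2" -v in_price="$3" -v out_price="$4" \
    'BEGIN { printf "%.4f\n", (in_tok * in_price + out_tok * out_price) / 1000000 }'
}

# claude-haiku-4-5: $1.00 input and $5.00 output per 1M tokens.
# 200,000 input tokens and 50,000 output tokens:
estimate_cost 200000 50000 1.00 5.00   # prints 0.4500
```

This covers token charges only. Any other charges are outside what the tables describe.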

Anthropic

| Model ID | Label | Input | Output | Context |
| --- | --- | --- | --- | --- |
| claude-haiku-4-5 | Claude Haiku 4.5 | $1.00 | $5.00 | 200K |
| claude-sonnet-4-6 | Claude Sonnet 4.6 | $3.00 | $15.00 | 1M |
| claude-opus-4-1 | Claude Opus 4.1 | $15.00 | $75.00 | 200K |
| claude-opus-4-5 | Claude Opus 4.5 | $5.00 | $25.00 | 200K |
| claude-opus-4-6 | Claude Opus 4.6 | $5.00 | $25.00 | 1M |
| claude-opus-4-7 | Claude Opus 4.7 | $5.00 | $25.00 | 1M |
| claude-opus-4-8 | Claude Opus 4.8 | $5.00 | $25.00 | 1M |
| claude-fable-5 | Claude Fable 5 | $10.00 | $50.00 | 1M |

OpenAI

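| Model ID | Label | Input | Output | Context |
| --- | --- | --- | --- | --- |
| gpt-4o | GPT-4o | $2.50 | $10.00 | 128K |
| gpt-4o-mini | GPT-4o mini | $0.15 | $0.60 | 128K |
| gpt-4.1 | GPT-4.1 | $2.00 | $8.00 | 1M |
| gpt-4.1-mini | GPT-4.1 mini | $0.40 | $1.60 | 1M |
| gpt-4.1-nano | GPT-4.1 nano | $0.10 | $0.40 | 1M |
| gpt-5-mini | GPT-5 mini | $0.25 | $2.00 | 400K |
| gpt-5-nano | GPT-5 nano | $0.05 | $0.40 | 400K |
| gpt-5.1 | GPT-5.1 | $1.25 | $10.00 | 400K |
| gpt-5.4 | GPT-5.4 | $2.50 | $15.00 | 400K |
| gpt-5.4-mini | GPT-5.4 mini | $0.75 | $4.50 | 400K |
| gpt-5.4-nano | GPT-5.4 nano | $0.20 | $1.25 | 400K |
| gpt-5.5 | GPT-5.5 | $5.00 | $30.00 | 400K |
| openai/gpt-oss-20b | GPT-OSS 20B | $0.07 | $0.30 | 131K |
| openai/gpt-oss-120b | GPT-OSS 120B | $0.15 | $0.60 | 131K |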

Google

| Model ID | Label | Input | Output | Context |
| --- | --- | --- | --- | --- |
| gemini-3.1-pro | Gemini 3.1 Pro | $2.00 | $12.00 | 1M |
| gemini-3.5-flash | Gemini 3.5 Flash | $1.50 | $9.00 | 1M |
| gemini-3-flash | Gemini 3 Flash | $0.50 | $3.00 | 1M |

Mistral

| Model ID | Label | Input | Output | Context |
| --- | --- | --- | --- | --- |
| mistralai/Mistral-Medium-3.5 | Mistral Medium 3.5 | $1.50 | $7.50 | 131K |
| mistralai/Mistral-Small-4 | Mistral Small 4 | $0.15 | $0.60 | 131K |
| mistralai/Mistral-Nemo | Mistral Nemo | $0.02 | $0.03 | 131K |

DeepSeek

| Model ID | Label | Input | Output | Context |
| --- | --- | --- | --- | --- |
| deepseek-ai/DeepSeek-V4-Pro | DeepSeek V4 Pro | $0.435 | $0.87 | 163K |
| deepseek-ai/DeepSeek-V4-Flash | DeepSeek V4 Flash | $0.10 | $0.20 | 163K |

Qwen

| Model ID | Label | Input | Output | Context |
| --- | --- | --- | --- | --- |
| Qwen/Qwen3-32B | Qwen3 32B | $0.08 | $0.28 | 131K |
| Qwen/Qwen3.6-Flash | Qwen3.6 Flash | $0.1875 | $1.125 | 131K |
| Qwen/Qwen3.6-35B-A3B | Qwen3.6 35B A3B | $0.14 | $1.00 | 131K |
| Qwen/Qwen3.6-Plus | Qwen3.6 Plus | $0.33 | $1.95 | 131K |
| Qwen/Qwen3.6-Max-Preview | Qwen3.6 Max Preview | $1.04 | $6.24 | 131K |
| Qwen/Qwen3.7-Max | Qwen3.7 Max | $1.25 | $3.75 | 131K |

Meta (Llama)

| Model ID | Label | Input | Output | Context |
| --- | --- | --- | --- | --- |
| meta-llama/Llama-3.2-1B-Instruct | Llama 3.2 1B Instruct | $0.027 | $0.201 | 131K |
| meta-llama/Llama-3.2-3B-Instruct | Llama 3.2 3B Instruct | $0.051 | $0.335 | 131K |
| meta-llama/Llama-3.1-8B-Instruct | Llama 3.1 8B Instruct | $0.02 | $0.05 | 131K |
| meta-llama/Llama-3.3-70B-Instruct | Llama 3.3 70B Instruct | $0.90 | $0.90 | 131K |

Others

| Model ID | Label | Input | Output | Context |
| --- | --- | --- | --- | --- |
| nvidia/nemotron-3-ultra | NVIDIA Nemotron 3 Ultra | $0.50 | $2.50 | 131K |
| zai-org/GLM-5.1 | GLM 5.1 | $0.98 | $3.08 | 202.8K |
| moonshotai/Kimi-K2.6 | Kimi K2.6 | $0.684 | $3.42 | 262K |
| MiMo-V2.5-Pro | MiMo V2.5 Pro | $0.435 | $0.87 | 131K |
| MiniMax/M2.7 | MiniMax M2.7 | $0.279 | $1.20 | 131K |
| MiniMax/M3 | MiniMax M3 | $0.30 | $1.20 | 131K |
| liquidai/LFM2-24B-A2B | LFM2 24B A2B | $0.03 | $0.12 | 131K |
| fastino/gliguard-llm-guardrails-300m | GLiGuard LLM Guardrails 300M | $0.15 | $0.15 | |
| fastino/gliner2-privacy-filter-pii | GLiNER2 Privacy Filter PII | $0.15 | $0.15 | |

On-demand vs. serverless inference

Pioneer offers two ways to serve predictions, and the right choice depends on your workflow.

Serverless inference uses pre-deployed base model endpoints. There is no startup delay and you are billed per token. This is the default for models in the serverless table above and is ideal when you want to call a frontier model without fine-tuning.

On-demand inference provisions a dedicated GPU after fine-tuning completes. Your LoRA adapter is loaded onto the GPU and served exclusively for your requests. Pioneer routes inference calls to an on-demand deployment automatically when you pass a training job ID as model_id.
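
The routing rule above comes down to which value you send as model_id. The sketch below shows one way to make that choice in a script. `pick_model_id` is a hypothetical helper, and `YOUR_TRAINING_JOB_ID` is a placeholder, since the format of training job IDs is not shown on this page.

```shell
# pick_model_id BASE_MODEL_ID [TRAINING_JOB_ID]
# Hypothetical helper: prints the value to send as model_id.
# With a training job ID, the call targets the fine-tuned on-demand deployment;
# without one, it falls back to the base model ID (serverless).
pick_model_id() {
  if [ -n "${2:-}" ]; then
    printf '%s\n' "$2"
  else
    printf '%s\n' "$1"
  fi
}

pick_model_id "Qwen/Qwen3-8B"                          # prints Qwen/Qwen3-8B
pick_model_id "Qwen/Qwen3-8B" "YOUR_TRAINING_JOB_ID"   # prints YOUR_TRAINING_JOB_ID
```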

Querying the live catalog

The tables above may lag behind newly added models. Use GET /base-models to get the current catalog at runtime.
# All models
curl https://api.pioneer.ai/base-models \
  -H "X-API-Key: YOUR_API_KEY"

# Only models that support inference
curl "https://api.pioneer.ai/base-models?supports_inference=true" \
  -H "X-API-Key: YOUR_API_KEY"

# Only models that support training
curl "https://api.pioneer.ai/base-models?supports_training=true" \
  -H "X-API-Key: YOUR_API_KEY"

# Filter by model family
curl "https://api.pioneer.ai/base-models?task_type=encoder" \
  -H "X-API-Key: YOUR_API_KEY"

curl "https://api.pioneer.ai/base-models?task_type=decoder" \
  -H "X-API-Key: YOUR_API_KEY"
Each entry in the response includes the model ID, its display label, context length, and boolean flags for supports_training and supports_inference. Use the model ID value directly in training job requests and inference calls.
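
The curl calls above differ only in their query string, so a script can wrap them in a small helper. This is a minimal sketch rather than an official client: `base_models_url` and `list_base_models` are hypothetical function names, and `PIONEER_API_KEY` is an assumed environment variable holding your key.

```shell
PIONEER_API="https://api.pioneer.ai"

# base_models_url [QUERY]
# Builds the catalog URL; QUERY is an optional filter such as "task_type=encoder".
base_models_url() {
  if [ -n "${1:-}" ]; then
    printf '%s/base-models?%s\n' "$PIONEER_API" "$1"
  else
    printf '%s/base-models\n' "$PIONEER_API"
  fi
}

# list_base_models [QUERY]
# Calls GET /base-models with the X-API-Key header used in the examples above.
list_base_models() {
  curl -sS "$(base_models_url "${1:-}")" -H "X-API-Key: ${PIONEER_API_KEY}"
}
```

For example, `list_base_models "supports_training=true"` issues the same request as the training filter shown earlier.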