Pioneer supports two model families: encoder models (GLiNER) for structured extraction tasks like named entity recognition, and decoder models (LLMs) for text generation, classification, and open-ended prompting. The tables below are a snapshot of the current catalog — use GET /base-models to query the live list, which always reflects current availability and capabilities.

Encoder models (GLiNER)

GLiNER models perform named entity recognition and structured extraction. All encoder models support both LoRA and full fine-tuning, and are served on-demand after training.
Model ID                         Label                 Training      Inference
fastino/gliner2-base-v1          GLiNER2 Base          LoRA, Full    On-demand
fastino/gliner2-large-v1         GLiNER2 Large         LoRA, Full    On-demand
fastino/gliner2-multi-v1         GLiNER2 Multi         LoRA, Full    On-demand
fastino/gliner2-multi-large-v1   GLiNER2 Multi Large   LoRA, Full    On-demand
fastino/gliner2-multi-v1 and fastino/gliner2-multi-large-v1 are multilingual variants suitable for non-English text.
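For illustration, an extraction call against a fine-tuned GLiNER deployment could look like the sketch below. The /inference path and the "text" and "labels" fields are assumptions made for this example; this page does not document the encoder inference schema, so check the API reference for the real shape.
# Hypothetical extraction request against an on-demand GLiNER deployment.
# The /inference path and the "text"/"labels" fields are assumptions,
# not documented on this page.
curl -X POST https://api.pioneer.ai/inference \
  -H "X-API-Key: YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model_id": "YOUR_TRAINING_JOB_ID",
    "text": "Ada Lovelace met Charles Babbage in London.",
    "labels": ["person", "location"]
  }'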

Decoder models — training

These LLMs are available for LoRA fine-tuning via POST /felix/training-jobs. When you submit a job, Pioneer automatically routes it to the best available provider. A request sketch follows the table below.
Model ID                                     Label                      Context
Qwen/Qwen3-32B                               Qwen3 32B                  131K
Qwen/Qwen3-30B-A3B-Instruct-2507             Qwen3 30B A3B Instruct     262K
Qwen/Qwen3-30B-A3B                           Qwen3 30B A3B              131K
Qwen/Qwen3-8B                                Qwen3 8B                   131K
Qwen/Qwen3-8B-Base                           Qwen3 8B Base              32K
Qwen/Qwen3-4B-Instruct-2507                  Qwen3 4B Instruct          262K
Qwen/Qwen2.5-Coder-0.5B                      Qwen2.5 Coder 0.5B         32K
Qwen/Qwen2.5-7B-Instruct                     Qwen2.5 7B Instruct        131K
Qwen/Qwen2.5-14B-Instruct                    Qwen2.5 14B Instruct       131K
google/gemma-4-31b-it                        Gemma 4 31B IT             128K
meta-llama/Llama-3.3-70B-Instruct            Llama 3.3 70B Instruct     131K
meta-llama/Llama-3.1-8B-Instruct             Llama 3.1 8B Instruct      131K
meta-llama/Llama-3.1-70B-Instruct            Llama 3.1 70B Instruct     131K
meta-llama/Llama-3.2-3B-Instruct             Llama 3.2 3B Instruct      131K
meta-llama/Llama-3.2-1B-Instruct             Llama 3.2 1B Instruct      131K
meta-llama/Llama-3.2-3B                      Llama 3.2 3B               131K
meta-llama/Llama-3.2-1B                      Llama 3.2 1B               32K
nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-BF16   Nemotron 3 Nano 30B        64K
openai/gpt-oss-120b                          GPT-OSS 120B               131K
openai/gpt-oss-20b                           GPT-OSS 20B                131K
deepseek-ai/DeepSeek-V3.1                    DeepSeek V3.1              163K
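As a concrete example, the request below sketches a LoRA fine-tuning job for one of the models above. The POST /felix/training-jobs endpoint is documented on this page; the body fields (base_model, dataset_id) are illustrative assumptions, so consult the endpoint reference for the exact schema.
# Sketch of a LoRA fine-tuning job. The endpoint is documented on this
# page; the body fields (base_model, dataset_id) are assumptions.
curl -X POST https://api.pioneer.ai/felix/training-jobs \
  -H "X-API-Key: YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "base_model": "meta-llama/Llama-3.1-8B-Instruct",
    "dataset_id": "YOUR_DATASET_ID"
  }'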

Decoder models — serverless inference

These models are pre-deployed and available for inference immediately: no fine-tuning required and no startup latency. You pay per token. An example request follows the table below.
Model ID                               Label                      Context
Qwen/Qwen3-235B-A22B-Instruct-2507     Qwen3 235B A22B Instruct   262K
Qwen/Qwen3-8B                          Qwen3 8B                   131K
deepseek-ai/DeepSeek-V3.1              DeepSeek V3.1              163K
openai/gpt-oss-120b                    GPT-OSS 120B               131K
openai/gpt-oss-20b                     GPT-OSS 20B                131K
meta-llama/Llama-3.3-70B-Instruct      Llama 3.3 70B Instruct     131K
moonshotai/Kimi-K2.6                   Kimi K2.6                  262K
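A serverless call could look like the sketch below. Only the model ID comes from the table above; the /inference path and "prompt" field are assumptions made for this example rather than a documented contract.
# Hypothetical serverless inference call. The /inference path and the
# "prompt" field are assumptions; the model ID is from the table above.
curl -X POST https://api.pioneer.ai/inference \
  -H "X-API-Key: YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model_id": "openai/gpt-oss-120b",
    "prompt": "Summarize the following support ticket: ..."
  }'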

On-demand vs. serverless inference

Pioneer offers two ways to serve predictions, and the right choice depends on your workflow.

Serverless inference uses pre-deployed base model endpoints. There is no startup delay and you are billed per token. This is the default for models in the serverless table above and is ideal when you want to call a frontier model without fine-tuning.

On-demand inference provisions a dedicated GPU after fine-tuning completes. Your LoRA adapter is loaded onto the GPU and served exclusively for your requests. Pioneer routes inference calls to an on-demand deployment automatically when you pass a training job ID as model_id.
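On the caller's side, the only difference is the model_id value, as the sketch below shows. The endpoint path and "prompt" field are the same assumptions as in the earlier examples; the routing behavior itself (training job ID selects your on-demand deployment) is what this page documents.
# Same assumed endpoint as above; only model_id changes.
# Serverless: pass a base model ID from the serverless table.
curl -X POST https://api.pioneer.ai/inference \
  -H "X-API-Key: YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model_id": "Qwen/Qwen3-8B", "prompt": "..."}'

# On-demand: pass a training job ID; Pioneer routes the call to your
# dedicated deployment with the LoRA adapter loaded.
curl -X POST https://api.pioneer.ai/inference \
  -H "X-API-Key: YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model_id": "YOUR_TRAINING_JOB_ID", "prompt": "..."}'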

Querying the live catalog

The tables above may lag behind newly added models. Use GET /base-models to get the current catalog at runtime.
# All models
curl https://api.pioneer.ai/base-models \
  -H "X-API-Key: YOUR_API_KEY"

# Only models that support inference
curl "https://api.pioneer.ai/base-models?supports_inference=true" \
  -H "X-API-Key: YOUR_API_KEY"

# Only models that support training
curl "https://api.pioneer.ai/base-models?supports_training=true" \
  -H "X-API-Key: YOUR_API_KEY"

# Filter by model family
curl "https://api.pioneer.ai/base-models?task_type=encoder" \
  -H "X-API-Key: YOUR_API_KEY"

curl "https://api.pioneer.ai/base-models?task_type=decoder" \
  -H "X-API-Key: YOUR_API_KEY"
Each entry in the response includes the model ID, its display label, context length, and boolean flags for supports_training and supports_inference. Use the model ID value directly in training job requests and inference calls.
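If you want to filter the catalog client-side, a small jq pipeline works. Note that the JSON key names below (id, supports_training) and the top-level array shape are assumptions based on the fields described above; verify them against a live response.
# List trainable model IDs. Assumes a top-level JSON array with
# "id" and "supports_training" keys; verify against a live response.
curl -s https://api.pioneer.ai/base-models \
  -H "X-API-Key: YOUR_API_KEY" \
  | jq -r '.[] | select(.supports_training) | .id'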