Use GET /base-models to query the live list, which always reflects current availability and capabilities.
## Encoder models (GLiNER)
GLiNER models perform named entity recognition and structured extraction. All encoder models support both LoRA and full fine-tuning, and are served on-demand after training.

| Model ID | Label | Training | Inference |
|---|---|---|---|
| fastino/gliner2-base-v1 | GLiNER2 Base | LoRA, Full | On-demand |
| fastino/gliner2-large-v1 | GLiNER2 Large | LoRA, Full | On-demand |
| fastino/gliner2-multi-v1 | GLiNER2 Multi | LoRA, Full | On-demand |
| fastino/gliner2-multi-large-v1 | GLiNER2 Multi Large | LoRA, Full | On-demand |
fastino/gliner2-multi-v1 and fastino/gliner2-multi-large-v1 are multilingual variants suitable for non-English text.
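If encoder fine-tuning is submitted through the same POST /felix/training-jobs endpoint as the decoder models in the next section (an assumption; this section does not name an endpoint), a GLiNER fine-tune might look like the sketch below. The base URL is hypothetical, and the method and dataset_id fields are illustrative rather than the documented schema.

```python
import os
import requests

BASE_URL = "https://api.pioneer.example"  # hypothetical; substitute your endpoint
HEADERS = {"Authorization": f"Bearer {os.environ['PIONEER_API_KEY']}"}

# Sketch: full fine-tune of the multilingual GLiNER2 variant.
# Field names below are illustrative, not the documented schema.
resp = requests.post(
    f"{BASE_URL}/felix/training-jobs",
    headers=HEADERS,
    json={
        "base_model": "fastino/gliner2-multi-v1",
        "method": "full",           # encoder models also support "lora"
        "dataset_id": "ds_ner_01",  # hypothetical dataset reference
    },
)
resp.raise_for_status()
print(resp.json())
```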
## Decoder models — training
These LLMs are available for LoRA fine-tuning via POST /felix/training-jobs. When you submit a job, Pioneer automatically routes it to the best available provider; see the request sketch after the table.
| Model ID | Label | Context |
|---|---|---|
| Qwen/Qwen3-32B | Qwen3 32B | 131K |
| Qwen/Qwen3-30B-A3B-Instruct-2507 | Qwen3 30B A3B Instruct | 262K |
| Qwen/Qwen3-30B-A3B | Qwen3 30B A3B | 131K |
| Qwen/Qwen3-8B | Qwen3 8B | 131K |
| Qwen/Qwen3-8B-Base | Qwen3 8B Base | 32K |
| Qwen/Qwen3-4B-Instruct-2507 | Qwen3 4B Instruct | 262K |
| Qwen/Qwen2.5-Coder-0.5B | Qwen2.5 Coder 0.5B | 32K |
| Qwen/Qwen2.5-7B-Instruct | Qwen2.5 7B Instruct | 131K |
| Qwen/Qwen2.5-14B-Instruct | Qwen2.5 14B Instruct | 131K |
| google/gemma-4-31b-it | Gemma 4 31B IT | 128K |
| meta-llama/Llama-3.3-70B-Instruct | Llama 3.3 70B Instruct | 131K |
| meta-llama/Llama-3.1-8B-Instruct | Llama 3.1 8B Instruct | 131K |
| meta-llama/Llama-3.1-70B-Instruct | Llama 3.1 70B Instruct | 131K |
| meta-llama/Llama-3.2-3B-Instruct | Llama 3.2 3B Instruct | 131K |
| meta-llama/Llama-3.2-1B-Instruct | Llama 3.2 1B Instruct | 131K |
| meta-llama/Llama-3.2-3B | Llama 3.2 3B | 131K |
| meta-llama/Llama-3.2-1B | Llama 3.2 1B | 32K |
| nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-BF16 | Nemotron 3 Nano 30B | 64K |
| openai/gpt-oss-120b | GPT-OSS 120B | 131K |
| openai/gpt-oss-20b | GPT-OSS 20B | 131K |
| deepseek-ai/DeepSeek-V3.1 | DeepSeek V3.1 | 163K |
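As a concrete illustration, here is a minimal job submission in Python. The endpoint path comes from this section; the base URL and the payload field names (base_model, dataset_id, hyperparameters) are assumptions for illustration, not the documented request schema.

```python
import os
import requests

BASE_URL = "https://api.pioneer.example"  # hypothetical; substitute your endpoint
HEADERS = {"Authorization": f"Bearer {os.environ['PIONEER_API_KEY']}"}

# Sketch: submit a LoRA fine-tuning job for a decoder model.
# Payload field names are illustrative, not the documented schema.
resp = requests.post(
    f"{BASE_URL}/felix/training-jobs",
    headers=HEADERS,
    json={
        "base_model": "meta-llama/Llama-3.1-8B-Instruct",
        "dataset_id": "ds_abc123",  # hypothetical dataset reference
        "hyperparameters": {"epochs": 3, "learning_rate": 1e-4},
    },
)
resp.raise_for_status()
job = resp.json()
print(job)  # the returned job ID is what you later pass as model_id for inference
```

Once the job finishes, pass its ID as model_id to reach the resulting on-demand deployment, as described further below.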
## Decoder models — serverless inference
These models are pre-deployed and available for inference immediately — no fine-tuning required and no startup latency. You pay per token. A call sketch follows the table.

| Model ID | Label | Context |
|---|---|---|
| Qwen/Qwen3-235B-A22B-Instruct-2507 | Qwen3 235B A22B Instruct | 262K |
| Qwen/Qwen3-8B | Qwen3 8B | 131K |
| deepseek-ai/DeepSeek-V3.1 | DeepSeek V3.1 | 163K |
| openai/gpt-oss-120b | GPT-OSS 120B | 131K |
| openai/gpt-oss-20b | GPT-OSS 20B | 131K |
| meta-llama/Llama-3.3-70B-Instruct | Llama 3.3 70B Instruct | 131K |
| moonshotai/Kimi-K2.6 | Kimi K2.6 | 262K |
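A serverless call might look like the sketch below. This page does not name the inference route, so the /chat/completions path, base URL, and OpenAI-style request shape are all assumptions; only the model ID comes from the table above.

```python
import os
import requests

BASE_URL = "https://api.pioneer.example"  # hypothetical; substitute your endpoint
HEADERS = {"Authorization": f"Bearer {os.environ['PIONEER_API_KEY']}"}

# Sketch: per-token call to a pre-deployed serverless model.
# The route and payload shape are assumed (OpenAI-style), not documented here.
resp = requests.post(
    f"{BASE_URL}/chat/completions",
    headers=HEADERS,
    json={
        "model": "openai/gpt-oss-20b",  # any model ID from the table above
        "messages": [{"role": "user", "content": "Summarize LoRA in one sentence."}],
    },
)
resp.raise_for_status()
print(resp.json())
```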
## On-demand vs. serverless inference
Pioneer offers two ways to serve predictions, and the right choice depends on your workflow.

Serverless inference uses pre-deployed base-model endpoints. There is no startup delay and you are billed per token. This is the default for models in the serverless table above and is ideal when you want to call a frontier model without fine-tuning.

On-demand inference provisions a dedicated GPU after fine-tuning completes. Your LoRA adapter is loaded onto the GPU and served exclusively for your requests. Pioneer routes inference calls to an on-demand deployment automatically when you pass a training job ID as model_id.
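In practice, the only change between the two modes is the identifier you send. A minimal sketch, assuming the request body carries a model_id field as described above (the input field and the job ID format are hypothetical):

```python
# Serverless: pass a base model ID from the serverless table.
serverless_request = {"model_id": "deepseek-ai/DeepSeek-V3.1", "input": "..."}

# On-demand: pass your training job ID instead. Pioneer detects it and routes
# the call to the dedicated GPU serving your LoRA adapter.
on_demand_request = {"model_id": "job_123abc", "input": "..."}  # hypothetical job ID
```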
## Querying the live catalog
The tables above may lag behind newly added models. Use GET /base-models to get the current catalog at runtime. Each entry in the response includes the capability flags supports_training and supports_inference. Use the model ID value directly in training job requests and inference calls.
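A minimal catalog query, assuming the same hypothetical base URL and auth as in the sketches above. The endpoint and the capability flags come from this page; the response envelope (a JSON array of objects with an id field) is an assumption.

```python
import os
import requests

BASE_URL = "https://api.pioneer.example"  # hypothetical; substitute your endpoint
HEADERS = {"Authorization": f"Bearer {os.environ['PIONEER_API_KEY']}"}

resp = requests.get(f"{BASE_URL}/base-models", headers=HEADERS)
resp.raise_for_status()

# Filter to models you can fine-tune, using the documented capability flags.
# Assumes the body is a JSON array of model objects with an "id" field.
trainable = [m["id"] for m in resp.json() if m.get("supports_training")]
print(trainable)
```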
