Choose a base model

Pioneer offers four GLiNER base models. For most tasks, fastino/gliner2-base-v1 is the right starting point: it’s fast, accurate, and supports LoRA and full fine-tuning. If your data includes non-English text, use a multi variant instead.

| Model ID | Use case | Training |
|---|---|---|
| fastino/gliner2-base-v1 | English, general purpose | LoRA, Full |
| fastino/gliner2-large-v1 | English, higher accuracy | LoRA, Full |
| fastino/gliner2-multi-v1 | Multilingual | LoRA, Full |
| fastino/gliner2-multi-large-v1 | Multilingual, higher accuracy | LoRA, Full |
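
You can always fetch the latest catalog from the API. A minimal sketch in Python; the GET /felix/models path, the https://api.pioneer.ai base URL, and the bearer-token auth scheme are assumptions, so confirm them against the API Reference:

```python
import os
import requests

API_BASE = "https://api.pioneer.ai"  # assumed base URL; check the API Reference
HEADERS = {"Authorization": f"Bearer {os.environ['PIONEER_API_KEY']}"}  # assumed auth scheme

# Hypothetical catalog endpoint; the exact path may differ.
resp = requests.get(f"{API_BASE}/felix/models", headers=HEADERS)
resp.raise_for_status()

for model in resp.json():  # assuming the response is a JSON list of model objects
    print(model["id"])
```
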
Prepare your training data

You have two options: generate synthetic labeled examples with Pioneer, or bring your own labeled data.

Option A: Generate synthetic data. If you don’t have labeled examples yet, use the /generate endpoint. See the Synthetic Data guide for full details.

Option B: Upload through the platform. If you already have labeled data, upload it directly via the Pioneer dashboard.

Once your dataset is ready, confirm its status before starting training.
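
A minimal status check; the GET /felix/datasets/:id path, the base URL, and the auth header are assumptions, so confirm them in the API Reference:

```python
import os
import requests

API_BASE = "https://api.pioneer.ai"  # assumed base URL
HEADERS = {"Authorization": f"Bearer {os.environ['PIONEER_API_KEY']}"}  # assumed auth scheme

dataset_id = "ds_123"  # placeholder: the ID of the dataset you generated or uploaded

# Hypothetical dataset endpoint; confirm the path in the API Reference.
resp = requests.get(f"{API_BASE}/felix/datasets/{dataset_id}", headers=HEADERS)
resp.raise_for_status()
print(resp.json()["status"])
```

Wait until the dataset status is ready before proceeding.
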

Start a training job

Submit your training job with POST /felix/training-jobs. Set base_model to the GLiNER model you chose in step 1 and training_type to "lora". The response includes your job ID and initial status; save the id, since you’ll use it to poll status, run evaluations, and call inference.
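
A sketch of the request with Python’s requests library; the endpoint, base_model, and training_type come from this guide, while the dataset_id field name, the base URL, and the auth header are assumptions:

```python
import os
import requests

API_BASE = "https://api.pioneer.ai"  # assumed base URL
HEADERS = {"Authorization": f"Bearer {os.environ['PIONEER_API_KEY']}"}  # assumed auth scheme

payload = {
    "base_model": "fastino/gliner2-base-v1",  # the model chosen in step 1
    "training_type": "lora",
    "dataset_id": "ds_123",  # assumed field name: point it at your ready dataset
}

resp = requests.post(f"{API_BASE}/felix/training-jobs", headers=HEADERS, json=payload)
resp.raise_for_status()

job = resp.json()
print(job["id"], job["status"])  # save the id for polling, evaluation, and inference
```
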

Poll job status and review metrics

Training typically takes a few minutes to a few hours depending on dataset size and epoch count. Poll the job endpoint until status is "complete". Job status values: requested → running → complete (or failed / stopped).

When the job reaches "complete", the response includes evaluation metrics. A high F1 score (above 0.85) generally indicates a model ready for production. If scores are lower, consider adding more training examples or adjusting your entity label definitions.

Run an evaluation

Evaluate your trained model against a held-out dataset to get a more rigorous view of performance before deploying. Retrieve evaluation results with GET /felix/evaluations/:id. Results include f1, precision, recall, and a per_entity breakdown so you can see which entity types need more training data.
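
Retrieving results might look like the sketch below; the evaluation ID comes from whichever call created the evaluation (not covered here), and the base URL, auth header, and per_entity shape are assumptions:

```python
import os
import requests

API_BASE = "https://api.pioneer.ai"  # assumed base URL
HEADERS = {"Authorization": f"Bearer {os.environ['PIONEER_API_KEY']}"}  # assumed auth scheme

evaluation_id = "eval_123"  # placeholder: returned when you create the evaluation

resp = requests.get(f"{API_BASE}/felix/evaluations/{evaluation_id}", headers=HEADERS)
resp.raise_for_status()

results = resp.json()
print(results["f1"], results["precision"], results["recall"])
for entity, scores in results["per_entity"].items():  # assuming a dict keyed by entity type
    print(entity, scores)  # spot entity types that need more training data
```
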

Run inference with your trained model

Use your job ID as the model_id to run predictions. The schema field controls what Pioneer extracts.

Schema options

The schema field accepts four optional keys — use any combination:

| Key | Type | Description |
|---|---|---|
| entities | string[] | Entity type labels to extract (NER) |
| classifications | {task, labels}[] | Text classification tasks with their label sets |
| structures | object | Structure definitions for JSON extraction |
| relations | object[] | Relation definitions between entities |
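
For example, extracting entities and running a classification in one request might look like this sketch; the POST /felix/inference path, the text field name, the base URL, and the auth header are all assumptions here (the guide only specifies model_id and schema), so confirm them in the API Reference:

```python
import os
import requests

API_BASE = "https://api.pioneer.ai"  # assumed base URL
HEADERS = {"Authorization": f"Bearer {os.environ['PIONEER_API_KEY']}"}  # assumed auth scheme

payload = {
    "model_id": "job_123",  # placeholder: the training job ID from the previous steps
    "text": "Acme Corp opened a new office in Berlin last quarter.",  # assumed field name
    "schema": {
        "entities": ["company", "location", "date"],
        "classifications": [
            {"task": "sentiment", "labels": ["positive", "neutral", "negative"]}
        ],
    },
}

# Hypothetical inference path; substitute the real endpoint from the API Reference.
resp = requests.post(f"{API_BASE}/felix/inference", headers=HEADERS, json=payload)
resp.raise_for_status()
print(resp.json())
```
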
You can also call inference using the OpenAI-compatible endpoint. Set base_url to https://api.pioneer.ai/v1 and pass Pioneer fields via extra_body:
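
A sketch using the OpenAI Python SDK; beyond the base_url and extra_body details above, the use of chat.completions.create and the exact field names inside extra_body are assumptions:

```python
import os
from openai import OpenAI

client = OpenAI(
    base_url="https://api.pioneer.ai/v1",
    api_key=os.environ["PIONEER_API_KEY"],
)

response = client.chat.completions.create(  # assumed endpoint shape for the compatible API
    model="job_123",  # placeholder: your training job ID as the model
    messages=[
        {"role": "user", "content": "Acme Corp opened a new office in Berlin last quarter."}
    ],
    extra_body={
        "schema": {"entities": ["company", "location", "date"]},  # assumed extra_body layout
    },
)
print(response)
```
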
Next steps

- Generate synthetic training data — create labeled examples without manual annotation
- Adaptive Inference — let Pioneer retrain your model automatically on live traffic
- API Reference — full endpoint documentation

