Choose a base model
Pioneer offers four GLiNER base models. For most tasks, fastino/gliner2-base-v1 is the right starting point: it’s fast, accurate, and supports LoRA and full fine-tuning. If your data includes non-English text, use a multi variant instead.

| Model ID | Use case | Training |
|---|---|---|
| fastino/gliner2-base-v1 | English, general purpose | LoRA, Full |
| fastino/gliner2-large-v1 | English, higher accuracy | LoRA, Full |
| fastino/gliner2-multi-v1 | Multilingual | LoRA, Full |
| fastino/gliner2-multi-large-v1 | Multilingual, higher accuracy | LoRA, Full |

You can always fetch the latest catalog from the API.
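The selection rule in the table can be written down as a small lookup. This is only a sketch: `BASE_MODELS` and `pick_base_model` are hypothetical names, not part of any Pioneer SDK, and the mapping is hard-coded from the table rather than fetched from the live catalog.

```python
# Encodes the four-row model table as a lookup keyed on two yes/no questions.
BASE_MODELS = {
    # (multilingual, higher_accuracy) -> model ID from the table
    (False, False): "fastino/gliner2-base-v1",
    (False, True): "fastino/gliner2-large-v1",
    (True, False): "fastino/gliner2-multi-v1",
    (True, True): "fastino/gliner2-multi-large-v1",
}


def pick_base_model(multilingual: bool = False, higher_accuracy: bool = False) -> str:
    """Return the model ID matching the table row for these two choices."""
    return BASE_MODELS[(multilingual, higher_accuracy)]


print(pick_base_model())                   # English, general purpose
print(pick_base_model(multilingual=True))  # data includes non-English text
```

Because all four models support both LoRA and full fine-tuning, the training type does not affect the choice.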
Prepare your training data
You have two options: generate synthetic labeled examples with Pioneer, or bring your own labeled data.
Option A: Generate synthetic data. If you don’t have labeled examples yet, use the /generate endpoint. See the Synthetic Data guide for full details.
Option B: Upload through the platform. If you already have labeled data, upload it directly via the Pioneer dashboard.
Once your dataset is ready, confirm its status before starting training. Wait until the dataset status is ready before proceeding.
Start a training job
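As a sketch of this step, the training request could be built with the Python standard library as follows. Several details are assumptions rather than documented facts: that the /felix endpoints live on the same host as https://api.pioneer.ai/v1, that authentication uses a bearer token, and the helper names themselves. Any further required fields (such as the dataset reference) are passed through `extra_fields` because their names are not given here.

```python
import json
import urllib.request

# Assumption: the /felix endpoints share a host with the documented
# OpenAI-compatible base URL (https://api.pioneer.ai/v1).
API_HOST = "https://api.pioneer.ai"


def build_training_job_request(api_key: str, base_model: str,
                               training_type: str = "lora",
                               **extra_fields) -> urllib.request.Request:
    """Build (but do not send) a POST /felix/training-jobs request.

    extra_fields carries anything else the API requires, such as the dataset
    reference; check the API reference for the exact field names.
    """
    payload = {"base_model": base_model, "training_type": training_type}
    payload.update(extra_fields)
    return urllib.request.Request(
        API_HOST + "/felix/training-jobs",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # Assumption: bearer-token auth; substitute your real auth scheme.
            "Authorization": "Bearer " + api_key,
        },
        method="POST",
    )


def submit(request: urllib.request.Request) -> dict:
    """Send the request and decode the JSON response, which includes the job id."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

Keeping request construction separate from sending makes the payload easy to inspect before anything goes over the network. With a real key, `submit(build_training_job_request(key, "fastino/gliner2-base-v1"))["id"]` would return the job ID to save.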
Submit your training job with POST /felix/training-jobs. Set base_model to the GLiNER model you chose in step 1 and training_type to "lora".
The response includes your job ID and initial status. Save the id: you’ll use it to poll status, run evaluations, and call inference.
Poll job status and review metrics
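Polling for this step can be sketched as a small loop that stops at any terminal status. The sketch assumes a hypothetical `fetch_job` callable that performs one GET against the job endpoint and returns the decoded JSON with a "status" field. The poll interval and timeout defaults are arbitrary choices, not Pioneer recommendations.

```python
import time
from typing import Callable

# Terminal states, from the documented lifecycle:
# requested -> running -> complete (or failed / stopped).
TERMINAL_STATUSES = {"complete", "failed", "stopped"}


def wait_for_job(fetch_job: Callable[[], dict],
                 poll_seconds: float = 30.0,
                 timeout_seconds: float = 6 * 60 * 60,
                 sleep: Callable[[float], None] = time.sleep) -> dict:
    """Call fetch_job() until the job reaches a terminal status.

    fetch_job is a hypothetical callable that issues one GET to the job
    endpoint and returns the decoded JSON, which must contain "status".
    """
    waited = 0.0
    while True:
        job = fetch_job()
        if job["status"] in TERMINAL_STATUSES:
            return job
        if waited >= timeout_seconds:
            raise TimeoutError(f"job still {job['status']!r} after {waited:.0f}s")
        sleep(poll_seconds)
        waited += poll_seconds


# Offline demonstration with canned responses instead of real HTTP calls.
_responses = iter([{"status": "requested"}, {"status": "running"},
                   {"status": "complete"}])
final = wait_for_job(lambda: next(_responses), sleep=lambda _seconds: None)
print(final["status"])  # complete
```

Treating failed and stopped as terminal matters: a loop that waits only for "complete" would never exit when a job fails.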
Training typically takes a few minutes to a few hours depending on dataset size and epoch count. Poll the job endpoint until status is "complete".
Job status values: requested → running → complete (or failed / stopped).
When the job reaches "complete", the response includes evaluation metrics. A high F1 score (above 0.85) generally indicates a model ready for production. If scores are lower, consider adding more training examples or adjusting your entity label definitions.
Run an evaluation
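Once evaluation results are available, the per-entity breakdown can be scanned for weak entity types. The sketch below makes several assumptions: that `per_entity` maps each entity type to an object with an "f1" key, that the /felix endpoints share the host of https://api.pioneer.ai/v1, and that authentication uses a bearer token. The helper names are hypothetical and the sample numbers are made up for illustration.

```python
import urllib.request
from typing import List

API_HOST = "https://api.pioneer.ai"  # assumption: same host as the /v1 base URL


def evaluation_request(api_key: str, evaluation_id: str) -> urllib.request.Request:
    """Build (but do not send) the GET /felix/evaluations/:id request."""
    return urllib.request.Request(
        f"{API_HOST}/felix/evaluations/{evaluation_id}",
        headers={"Authorization": "Bearer " + api_key},  # assumed auth scheme
        method="GET",
    )


def weak_entities(results: dict, threshold: float = 0.85) -> List[str]:
    """List entity types whose F1 falls below threshold, weakest first.

    Assumes per_entity maps each entity type to a dict with an "f1" key;
    the real response shape may differ.
    """
    per_entity = results.get("per_entity", {})
    below = [(scores["f1"], name) for name, scores in per_entity.items()
             if scores["f1"] < threshold]
    return [name for _f1, name in sorted(below)]


# Made-up numbers, for illustration only.
sample = {
    "f1": 0.84, "precision": 0.88, "recall": 0.80,
    "per_entity": {"medication": {"f1": 0.93}, "dosage": {"f1": 0.81},
                   "time": {"f1": 0.62}},
}
print(weak_entities(sample))  # ['time', 'dosage']
```

The 0.85 default mirrors the production-readiness rule of thumb for overall F1; a stricter or looser per-entity cutoff may suit your data better.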
Evaluate your trained model against a held-out dataset to get a more rigorous view of performance before deploying.
Retrieve evaluation results with GET /felix/evaluations/:id. Results include f1, precision, recall, and a per_entity breakdown so you can see which entity types need more training data.
Run inference with your trained model
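A request body for this step might be assembled as in the sketch below. `model_id` and `schema`, along with the four schema keys (entities, classifications, structures, relations), come from this guide. The "text" field name, the helper names, and the example labels are assumptions for illustration, and the inference endpoint path is deliberately left out.

```python
import json
from typing import Any, Optional


def build_schema(entities: Optional[Any] = None,
                 classifications: Optional[list] = None,
                 structures: Optional[dict] = None,
                 relations: Optional[list] = None) -> dict:
    """Assemble a schema dict containing only the keys that were provided."""
    candidates = {
        "entities": entities,
        "classifications": classifications,
        "structures": structures,
        "relations": relations,
    }
    return {key: value for key, value in candidates.items() if value is not None}


def build_inference_payload(job_id: str, text: str, schema: dict) -> dict:
    """Combine the trained model's job ID, the input text, and the schema.

    "model_id" and "schema" are the documented fields; "text" is an assumed
    name for the input field.
    """
    return {"model_id": job_id, "text": text, "schema": schema}


schema = build_schema(
    entities=["person", "organization"],
    classifications=[{"task": "sentiment", "labels": ["positive", "negative"]}],
)
payload = build_inference_payload("your-job-id", "Ada Lovelace joined Acme.", schema)
print(json.dumps(payload, indent=2))
```

Omitting unused keys, rather than sending them as null, keeps the request aligned with the "use any combination" rule for schema keys.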
Use your job ID as the model_id to run predictions. The schema field controls what Pioneer extracts.
Schema options
The schema field accepts four optional keys; use any combination:

| Key | Type | Description |
|---|---|---|
| entities | string[] | Entity type labels to extract (NER) |
| classifications | {task, labels}[] | Text classification tasks with their label sets |
| structures | object | Structure definitions for JSON extraction |
| relations | object[] | Relation definitions between entities |

You can also call inference using the OpenAI-compatible endpoint. Set base_url to https://api.pioneer.ai/v1 and pass Pioneer fields via extra_body.
Entity descriptions
Instead of passing a plain list of entity type names, you can pass a dictionary mapping each entity type to a natural-language description. Descriptions give the model more context about what to extract, improving accuracy, especially for ambiguous or domain-specific entities.
Basic (no descriptions)
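A minimal illustration of the basic form, with entity labels chosen purely as an example:

```python
# Plain list: the model sees only the label names, with no extra context.
basic_schema = {"entities": ["medication", "dosage", "time"]}
print(basic_schema)
```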
With descriptions (more accurate)
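A sketch of the dictionary form for the same illustrative labels. The description wording here is invented for the example and is not taken from Pioneer.

```python
# Dictionary form: each entity type maps to a one-sentence description.
described_schema = {
    "entities": {
        "medication": "Name of a drug or medicine, like 'ibuprofen'",
        "dosage": "Amount of medication to take, like '400mg' or '2 tablets'",
        "time": "When or how often the medication is taken, like 'twice daily'",
    }
}

for label, description in described_schema["entities"].items():
    print(f"{label}: {description}")
```

The "time" entry shows the point of descriptions: the bare label could match any time expression, while the description narrows it to dosing schedules.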
When to use descriptions
- When entity types are ambiguous (e.g. “time” could mean many things)
- In domain-specific contexts (medical, legal, financial)
- When you need higher precision and the model is making wrong extractions
Tips for writing descriptions
- Be specific about what counts and what doesn’t
- Include examples inline (e.g. “like ‘400mg’ or ‘2 tablets’”)
- Keep them to one sentence: concise beats verbose
Next steps
- Fine-tune a classification model — assign labels to text with the same GLiNER base model
- Fine-tune a structured extraction model — pull JSON-shaped records out of text
- Generate synthetic training data — create labeled examples without manual annotation
- Adaptive Inference — let Pioneer retrain your model automatically on live traffic
- API Reference — full endpoint documentation