Choose a base model
Pioneer offers four GLiNER base models. For most tasks, fastino/gliner2-base-v1 is the right starting point: it's fast, accurate, and supports LoRA and full fine-tuning. If your data includes non-English text, use a multi variant instead.

| Model ID | Use case | Training |
|---|---|---|
| fastino/gliner2-base-v1 | English, general purpose | LoRA, Full |
| fastino/gliner2-large-v1 | English, higher accuracy | LoRA, Full |
| fastino/gliner2-multi-v1 | Multilingual | LoRA, Full |
| fastino/gliner2-multi-large-v1 | Multilingual, higher accuracy | LoRA, Full |

You can always fetch the latest catalog from the API.
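
As a rough illustration of the selection rule in the table above, the choice can be written as a lookup on two questions: is the data multilingual, and is higher accuracy worth a larger model? The model IDs are copied from the table; choose_base_model is a hypothetical local helper, not part of any Pioneer SDK.

```python
# Model IDs copied from the table in this guide. The helper itself is
# hypothetical and exists only to make the selection rule explicit.
BASE_MODELS = {
    # (multilingual, higher_accuracy): model ID
    (False, False): "fastino/gliner2-base-v1",
    (False, True): "fastino/gliner2-large-v1",
    (True, False): "fastino/gliner2-multi-v1",
    (True, True): "fastino/gliner2-multi-large-v1",
}


def choose_base_model(multilingual: bool = False, higher_accuracy: bool = False) -> str:
    """Return the base model ID matching the language and accuracy needs."""
    return BASE_MODELS[(multilingual, higher_accuracy)]


# Default: English, general purpose.
default_model = choose_base_model()
```

Hard-coding the IDs keeps the sketch self-contained; if the catalog changes, the mapping would need to be refreshed.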
Decide single-label vs multi-label
Pick one mode and use it consistently across every row in your dataset — mixing the two in the same dataset is rejected at validation time.
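
Because mixed datasets are rejected, it can be worth checking rows locally before uploading. The sketch below assumes each row is a dict with a text field plus either a label string (single-label) or a labels list (multi-label), the two row shapes this guide uses. detect_mode and label_vocabulary are hypothetical helpers, and Pioneer's own validation may apply further rules.

```python
def detect_mode(rows):
    """Return "single" or "multi" if every row uses the same mode.

    Raises ValueError for rows missing "text", rows carrying both or neither
    of "label" / "labels", and datasets that mix the two modes.
    """
    modes = set()
    for i, row in enumerate(rows):
        if "text" not in row:
            raise ValueError(f"row {i}: missing 'text'")
        has_label = "label" in row
        has_labels = "labels" in row
        if has_label == has_labels:
            raise ValueError(f"row {i}: need exactly one of 'label' or 'labels'")
        modes.add("single" if has_label else "multi")
    if len(modes) != 1:
        raise ValueError("dataset is empty or mixes single-label and multi-label rows")
    return modes.pop()


def label_vocabulary(rows):
    """Collect every distinct label value across the rows, sorted."""
    vocab = set()
    for row in rows:
        if "label" in row:
            vocab.add(row["label"])
        else:
            vocab.update(row.get("labels", []))
    return sorted(vocab)
```

label_vocabulary only approximates the inferred candidate set locally, which is handy for spotting typos such as near-duplicate label names.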

| Mode | Row shape | Use when |
|---|---|---|
| Single-label | {"text": "...", "label": "positive"} | Each input gets exactly one label (sentiment, intent, language). |
| Multi-label | {"text": "...", "labels": ["positive", "fast-shipping"]} | Each input can carry multiple labels (topic tagging, content moderation, multi-aspect review). |

The label vocabulary itself is inferred from the dataset; you don't declare it up front. Pioneer collects every distinct label / labels value across your training rows and uses that as the candidate set.

Prepare your training data
You have three options: generate synthetic labeled examples with Pioneer, have Pioneer label your existing raw text, or bring your own labeled data.

Option A: Generate synthetic data. If you don't have labeled examples yet, use the /generate endpoint with task_type: "classification". See the Synthetic Data guide for full details.

Option B: Auto-label existing text. If you have raw text but no labels, send it to POST /generate/classification/label-existing and Pioneer will annotate it synchronously. The endpoint accepts 1–1,000 strings per call.

Option C: Upload through the platform. If you already have labeled data, upload it directly via the Pioneer dashboard. Each row needs a text column and either a label column (single-label) or a labels column (multi-label).

Once your dataset is ready, confirm its status before starting training, and wait until the dataset status is ready before proceeding.

Start a training job
Submit your training job with POST /felix/training-jobs. Set base_model to the GLiNER model you chose in step 1 and training_type to "lora". The training endpoint is shared with NER; Pioneer infers the task heads from the dataset columns.

The response includes your job ID and initial status. Save the id: you'll use it to poll status, run evaluations, and call inference.

Poll job status and review metrics
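
As a starting point for this step, a polling loop might look like the sketch below. It only encodes the status values this guide lists (requested, running, complete, failed, stopped) and the 0.85 F1 rule of thumb. The HTTP call is deliberately left to a caller-supplied fetch_job function, because the exact job endpoint path, authentication, and response layout are not shown here; wait_for_job and looks_production_ready are hypothetical names.

```python
import time

# Statuses after which the job will not change again, per this guide.
TERMINAL_STATUSES = {"complete", "failed", "stopped"}


def wait_for_job(fetch_job, job_id, interval_s=30.0, max_polls=240, sleep=time.sleep):
    """Poll until the job reaches a terminal status; return the final job dict.

    fetch_job is a caller-supplied function (an assumption of this sketch)
    that fetches one job and returns parsed JSON containing a "status" key.
    """
    for _ in range(max_polls):
        job = fetch_job(job_id)
        status = job["status"]
        if status == "complete":
            return job
        if status in TERMINAL_STATUSES:
            # "failed" or "stopped": waiting longer will not help.
            raise RuntimeError(f"training job {job_id} ended with status {status!r}")
        sleep(interval_s)
    raise TimeoutError(f"training job {job_id} not finished after {max_polls} polls")


def looks_production_ready(f1, threshold=0.85):
    """Apply this guide's rule of thumb: F1 above 0.85 is generally ready."""
    return f1 > threshold
```

Passing sleep as a parameter keeps the loop easy to test without real delays; the interval and poll cap are arbitrary defaults to tune against your own job durations.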
Training typically takes a few minutes to a few hours depending on dataset size and epoch count. Poll the job endpoint until status is "complete".

Job status values: requested → running → complete (or failed / stopped).

When the job reaches "complete", the response includes evaluation metrics. A high F1 score (above 0.85) generally indicates a model ready for production. If scores are lower, consider adding more training examples, especially for any minority classes, or making your label definitions more distinct.

Run an evaluation
Evaluate your trained model against a held-out dataset to get a more rigorous view of performance before deploying.

Retrieve evaluation results with GET /felix/evaluations/:id. Results include f1, precision, recall, and a per-label breakdown so you can see which classes need more training data.

Run inference with your trained model
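
The classification schema used in this step can be assembled with a couple of small helpers. The sketch below uses only the entry keys this guide documents (task, labels, multi_label, top_k). classification_head and build_schema are hypothetical names, the top_k check is a local convenience, and the surrounding request (endpoint, authentication, input text field) is omitted because it is not shown here.

```python
def classification_head(task, labels, multi_label=False, top_k=None):
    """Build one entry for the classifications list."""
    if not labels:
        raise ValueError("labels must contain at least one candidate label")
    if top_k is not None and multi_label:
        # This guide describes top_k as applying to single-label heads only.
        raise ValueError("top_k applies to single-label heads only")
    head = {"task": task, "labels": list(labels), "multi_label": multi_label}
    if top_k is not None:
        head["top_k"] = top_k
    return head


def build_schema(*heads):
    """Wrap one or more classification heads under the classifications key."""
    return {"classifications": list(heads)}


# Two independent heads evaluated on the same input.
schema = build_schema(
    classification_head("sentiment", ["positive", "negative", "neutral"]),
    classification_head("topic", ["billing", "shipping", "returns"], multi_label=True),
)
```

Building heads through one function keeps the key names consistent across requests and catches an empty label list before anything is sent.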
Use your job ID as the model_id to run predictions. Classification lives under the classifications key of the schema field; each entry defines one independent classification head.

Classification entry options

Each object inside classifications accepts these keys:

| Key | Type | Description |
|---|---|---|
| task | string | Name for this classification head (free-form; used in the response). |
| labels | string[] | Candidate labels for this task. |
| multi_label | boolean | false returns one winning label; true allows multiple labels above threshold. |
| top_k | integer | Optional cap on the number of labels returned (single-label only). |

You can attach multiple classification heads in one call (for example, sentiment and topic from the same input) by adding more entries to the list. Classification can also be combined with NER (entities), structured extraction (structures), or relations (relations) in the same request; the response carries each head independently.

You can also call inference using the OpenAI-compatible endpoint. Set base_url to https://api.pioneer.ai/v1 and pass Pioneer fields via extra_body.

Multi-label vs single-label at inference time
The multi_label flag on each classification entry is independent of how your training data was shaped: you can train on single-label data and still query a multi-label head, or vice versa, as long as the candidate labels you pass match labels the model has seen.
- Single-label (multi_label: false): Returns exactly one winning label (or up to top_k ranked labels). Use for mutually exclusive taxonomies like sentiment or intent.
- Multi-label (multi_label: true): Returns every label whose confidence exceeds threshold. Use for tagging-style tasks where multiple labels can be true at once.
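
The difference between the two modes can be illustrated with a small local function over a label-to-confidence mapping. This is a sketch of the selection rule as described above, not Pioneer's implementation: select_labels is a hypothetical name, and the 0.5 default threshold is a placeholder rather than a documented value.

```python
def select_labels(scores, multi_label=False, threshold=0.5, top_k=None):
    """Pick labels from a {label: confidence} mapping.

    Single-label: the top-ranked label, or the top_k highest if top_k is set.
    Multi-label: every label whose confidence exceeds threshold.
    """
    # Rank labels from most to least confident.
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    if multi_label:
        return [label for label, score in ranked if score > threshold]
    limit = 1 if top_k is None else top_k
    return [label for label, _ in ranked[:limit]]


scores = {"positive": 0.7, "fast-shipping": 0.6, "negative": 0.1}
single = select_labels(scores)                   # one winning label
tags = select_labels(scores, multi_label=True)   # all labels above threshold
```

Note that a multi-label head can return an empty list when nothing clears the threshold, whereas a single-label head always returns a winner for non-empty input.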
Next steps
- Fine-tune a NER model — extract entities with the same GLiNER base model
- Fine-tune a structured extraction model — pull JSON-shaped records out of text
- Generate synthetic training data — create labeled classification examples without manual annotation
- Adaptive Inference — let Pioneer retrain your classifier automatically on live traffic
- API Reference — full endpoint documentation