Decide on your task type
Pioneer generates training data for three task types:

| Task type | Use case |
|---|---|
| ner | Named entity recognition: extract spans of text with entity labels |
| classification | Text classification: assign one or more labels to each input |
| decoder | Generative LLM training: prompt-completion or conversation pairs |

Choose the task type that matches the model you plan to train. You’ll pass it as task_type in the request body.

Start a generation job
Send a POST /generate request with your task type, a dataset name, the labels you want annotated, a description of your domain, and the number of examples to generate.

Required fields:

| Field | Description |
|---|---|
| task_type | "ner", "classification", or "decoder" |
| dataset_name | Name for the generated dataset (used when starting training) |
| num_examples | Number of labeled examples to generate |

Optional fields:

| Field | Description |
|---|---|
| labels | List of label strings (required for NER and classification) |
| domain_description | Short description of your content domain; improves output relevance |
| classified_examples | Seed examples with existing labels (classification only) |
| prompt | Additional instructions for the generation model |

The response includes a job ID you’ll use to poll status.
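
As a rough illustration of the request fields above and of the polling step that the job ID is used for, here is a minimal Python sketch. The helper names build_generate_payload and wait_for_job are hypothetical, as are the positive-count check, the polling interval, and the timeout. The sketch makes no HTTP calls: get_status stands in for whatever client code issues GET /generate/jobs/:job_id and returns the status string, since the exact response shape is not specified here.

```python
import time
from typing import Any, Callable, Dict, List, Optional

VALID_TASK_TYPES = {"ner", "classification", "decoder"}


def build_generate_payload(
    task_type: str,
    dataset_name: str,
    num_examples: int,
    labels: Optional[List[str]] = None,
    domain_description: Optional[str] = None,
    classified_examples: Optional[List[Any]] = None,
    prompt: Optional[str] = None,
) -> Dict[str, Any]:
    """Assemble a JSON-serializable body for POST /generate (hypothetical helper)."""
    if task_type not in VALID_TASK_TYPES:
        raise ValueError(f"task_type must be one of {sorted(VALID_TASK_TYPES)}")
    if num_examples <= 0:
        # Assumption: a non-positive count is never meaningful.
        raise ValueError("num_examples must be positive")
    if task_type in ("ner", "classification") and not labels:
        # The field table marks labels as required for these two task types.
        raise ValueError("labels are required for ner and classification")
    if classified_examples is not None and task_type != "classification":
        # The field table marks classified_examples as classification only.
        raise ValueError("classified_examples applies to classification only")

    payload: Dict[str, Any] = {
        "task_type": task_type,
        "dataset_name": dataset_name,
        "num_examples": num_examples,
    }
    optional = {
        "labels": labels,
        "domain_description": domain_description,
        "classified_examples": classified_examples,
        "prompt": prompt,
    }
    # Send only the optional fields the caller actually set.
    payload.update({k: v for k, v in optional.items() if v is not None})
    return payload


def wait_for_job(
    get_status: Callable[[], str],
    interval_s: float = 5.0,
    timeout_s: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    """Call get_status repeatedly until it returns "complete" or the timeout passes.

    Failure statuses are not documented here, so this sketch only
    recognizes "complete" and otherwise relies on the timeout.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        if get_status() == "complete":
            return
        if time.monotonic() >= deadline:
            raise TimeoutError("generation job did not complete in time")
        sleep(interval_s)
```

Keeping the HTTP call behind get_status means the same loop works with any HTTP client, and the validation can be exercised without network access.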
Poll job status
Generation jobs run asynchronously. Poll GET /generate/jobs/:job_id until the status is "complete". Once complete, the dataset is available under the name you provided in dataset_name.

Use the dataset in a training job
Pass the dataset name directly to POST /felix/training-jobs.

See the NER fine-tuning guide or LLM fine-tuning guide for full training walkthroughs.

Auto-label existing text
If you already have raw text and want Pioneer to annotate it, rather than generating new examples from scratch, use the label-existing endpoints. This is useful when you have a corpus of real documents but haven’t labeled them yet.

Auto-label for NER:

labels and inputs.

Next steps
- Fine-tune a NER model — use your generated dataset to train a custom GLiNER model
- Fine-tune an LLM — train a decoder model on generated prompt-completion pairs
- Adaptive Inference — let Pioneer generate training data from live inference traffic automatically

