Generate endpoints are rate-limited to 120 requests per minute per user. For large datasets, consider batching your requests or using the job polling endpoint to monitor long-running generation jobs.
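One way to stay under the 120 requests per minute limit is a client-side sliding-window throttle that blocks until another request fits in the window. This is a minimal sketch, not part of any Pioneer SDK. The class name SlidingWindowLimiter and its parameters are illustrative. The clock and sleep hooks exist only so the logic can be exercised without real waiting.

```python
import time
from collections import deque
from typing import Callable, Deque


class SlidingWindowLimiter:
    """Illustrative client-side throttle (not a Pioneer SDK class).

    Allows at most `max_requests` calls in any `window`-second span. The
    defaults mirror the documented limit of 120 requests per minute per user;
    the server remains the authority on what it accepts.
    """

    def __init__(
        self,
        max_requests: int = 120,
        window: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self.max_requests = max_requests
        self.window = window
        self._clock = clock
        self._sleep = sleep
        self._sent: Deque[float] = deque()  # timestamps of recent requests

    def acquire(self) -> None:
        """Block until another request fits in the window, then record it."""
        while True:
            now = self._clock()
            # Forget requests that have aged out of the window.
            while self._sent and now - self._sent[0] >= self.window:
                self._sent.popleft()
            if len(self._sent) < self.max_requests:
                self._sent.append(now)
                return
            # Window is full: wait until the oldest request expires, then re-check.
            self._sleep(self.window - (now - self._sent[0]))
```

Call limiter.acquire() immediately before each request to a generate endpoint. Sharing one limiter across threads would need a lock, which this sketch omits.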
Start a generation job
POST /generate
Starts an asynchronous job that generates labeled training examples and stores them in a named dataset. Returns a job ID you can use to poll for completion.
Request body
The type of task to generate data for. Accepted values: ner, classification, decoder.
The name of the dataset to create or append to. If a dataset with this name already exists, new examples are added as a new version.
Number of labeled examples to generate.
List of label strings for NER or classification tasks. For NER, these are entity type names (for example, "person", "organization"). For classification, these are the class names.
A natural-language description of the domain or topic for the generated examples. A detailed description improves example quality and relevance.
Few-shot examples with labels to guide generation for classification tasks.
Custom instruction prompt to control generation style for decoder tasks.
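The sketch below assembles a POST /generate request with Python's standard urllib. It builds the request but does not send it. This reference does not show the JSON field names, so task_type, dataset_name, num_examples, domain_description, labels, few_shot_examples, and prompt are hypothetical placeholders. The base URL is also a placeholder, and authentication headers are omitted. Fields that only apply to other task types are dropped.

```python
import json
import urllib.request
from typing import Any, Dict, List, Optional

# Hypothetical base URL; substitute the real Pioneer API host.
BASE_URL = "https://api.example.com"

TASK_TYPES = {"ner", "classification", "decoder"}  # documented accepted values


def build_generate_request(
    task_type: str,
    dataset_name: str,
    num_examples: int,
    domain_description: Optional[str] = None,
    labels: Optional[List[str]] = None,
    few_shot_examples: Optional[List[Dict[str, Any]]] = None,
    prompt: Optional[str] = None,
) -> urllib.request.Request:
    """Build (but do not send) a POST /generate request.

    All JSON keys are hypothetical placeholders for the documented parameters.
    """
    if task_type not in TASK_TYPES:
        raise ValueError(f"task_type must be one of {sorted(TASK_TYPES)}")

    body: Dict[str, Any] = {
        "task_type": task_type,
        "dataset_name": dataset_name,
        "num_examples": num_examples,
    }
    if domain_description is not None:
        body["domain_description"] = domain_description
    # Labels apply to NER (entity types) and classification (class names).
    if labels is not None and task_type in {"ner", "classification"}:
        body["labels"] = labels
    # Few-shot examples are documented for classification tasks only.
    if few_shot_examples is not None and task_type == "classification":
        body["few_shot_examples"] = few_shot_examples
    # The custom instruction prompt is documented for decoder tasks only.
    if prompt is not None and task_type == "decoder":
        body["prompt"] = prompt

    return urllib.request.Request(
        url=f"{BASE_URL}/generate",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Sending the request with urllib.request.urlopen(req) should return a job ID and an initial status.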
Response
Unique identifier for the generation job. Use this with GET /generate/jobs/:job_id to poll for status.
Initial job status, typically queued.
Poll generation job status
GET /generate/jobs/:job_id
Returns the current status of a data generation job. Poll this endpoint until the status is complete or failed before starting a training job on the resulting dataset.
Path parameters
The job ID returned by POST /generate.
Response
The generation job ID.
Current job status. Possible values: queued, running, complete, failed.
The dataset name that examples are being written to.
Number of examples generated so far.
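Polling can be wrapped in a small loop that stops on either terminal status (complete or failed) and gives up after a timeout. This is a sketch under assumptions. fetch_status is a hypothetical stand-in for whatever code performs GET /generate/jobs/:job_id and decodes the JSON. The "status" key, like the "generated" key in the test data, is an assumed field name. The interval and timeout defaults are arbitrary choices, not documented values.

```python
import time
from typing import Any, Callable, Dict

TERMINAL_STATUSES = {"complete", "failed"}  # documented terminal states


def wait_for_job(
    job_id: str,
    fetch_status: Callable[[str], Dict[str, Any]],
    interval: float = 5.0,
    timeout: float = 3600.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Poll a generation job until it reaches a terminal status.

    `fetch_status` is a hypothetical callable wrapping
    GET /generate/jobs/:job_id; "status" is an assumed response key.
    """
    deadline = clock() + timeout
    while True:
        job = fetch_status(job_id)
        if job["status"] in TERMINAL_STATUSES:
            return job
        # Give up rather than sleep past the deadline.
        if clock() + interval > deadline:
            raise TimeoutError(f"job {job_id} still {job['status']!r} after {timeout}s")
        sleep(interval)
```

The loop also returns when the job fails, so check that the returned status is complete before starting a training job on the dataset. A 5-second interval amounts to 12 polls per minute, well under the 120 requests per minute limit.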
Auto-label text for NER
POST /generate/ner/label-existing
Sends your own unlabeled text to Pioneer and returns NER annotations. Use this when you have existing text that you want to annotate rather than generating new synthetic examples.
Request body
List of entity type names to detect. For example: ["person", "organization", "location"].
List of text strings to annotate. Accepts between 1 and 1,000 strings per request.
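Each request accepts at most 1,000 strings, so a larger corpus has to be split across several requests. This sketch only builds the request bodies. The keys entity_types and texts are hypothetical field names, because the reference does not show them. The helper names chunked and ner_label_payloads are also illustrative.

```python
from typing import Any, Dict, Iterator, List, Sequence

MAX_TEXTS_PER_REQUEST = 1000  # documented upper bound per request


def chunked(items: Sequence[str], size: int = MAX_TEXTS_PER_REQUEST) -> Iterator[List[str]]:
    """Yield consecutive slices of at most `size` items, preserving order."""
    if not 1 <= size <= MAX_TEXTS_PER_REQUEST:
        raise ValueError(f"size must be between 1 and {MAX_TEXTS_PER_REQUEST}")
    for start in range(0, len(items), size):
        yield list(items[start:start + size])


def ner_label_payloads(entity_types: List[str], texts: Sequence[str]) -> List[Dict[str, Any]]:
    """Split `texts` into request bodies for POST /generate/ner/label-existing.

    "entity_types" and "texts" are hypothetical JSON keys.
    """
    if not texts:
        raise ValueError("at least one text is required")
    return [{"entity_types": entity_types, "texts": batch} for batch in chunked(texts)]
```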
Auto-classify text
POST /generate/classification/label-existing
Sends your own unlabeled text to Pioneer and returns classification labels. Use this when you have existing text that you want to classify rather than generating new synthetic examples.
Request body
List of class names to classify text into. For example: ["positive", "negative", "neutral"].
List of text strings to classify. Accepts between 1 and 1,000 strings per request.
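When classifying more than 1,000 strings, the main risk is misaligning labels with their texts. The sketch below sends batches through a hypothetical send_batch callable and reassembles the results in input order. It assumes the response holds one label per input text, in the same order. This reference does not document the response shape, so verify that assumption before relying on it.

```python
from typing import Callable, List, Sequence

MAX_TEXTS_PER_REQUEST = 1000  # documented upper bound per request


def classify_all(
    texts: Sequence[str],
    class_names: List[str],
    send_batch: Callable[[List[str], List[str]], List[str]],
    batch_size: int = MAX_TEXTS_PER_REQUEST,
) -> List[str]:
    """Classify any number of texts by splitting them into compliant batches.

    `send_batch` is a hypothetical callable that POSTs one batch to
    /generate/classification/label-existing and returns one label per text,
    in input order (an assumed response shape).
    """
    if not 1 <= batch_size <= MAX_TEXTS_PER_REQUEST:
        raise ValueError(f"batch_size must be between 1 and {MAX_TEXTS_PER_REQUEST}")
    labels: List[str] = []
    for start in range(0, len(texts), batch_size):
        batch = list(texts[start:start + batch_size])
        result = send_batch(class_names, batch)
        # Fail loudly instead of silently shifting labels onto the wrong texts.
        if len(result) != len(batch):
            raise RuntimeError(f"expected {len(batch)} labels, got {len(result)}")
        labels.extend(result)
    return labels
```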

