Datasets in Pioneer are collections of labeled examples used to train and evaluate models. Each dataset has a name you define, and Pioneer versions it automatically as you add or regenerate data. You reference a dataset by name when starting a training job or running an evaluation — so the name you choose is the stable identifier you’ll use throughout your workflow.

How datasets are created

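You can create datasets in two ways:
Synthetic data generation: Use POST /generate to have Pioneer produce labeled examples from a description of your domain and the labels you care about. This is the fastest way to bootstrap a dataset when you have no labeled data.
Labeling existing data: Use POST /generate/ner/label-existing or POST /generate/classification/label-existing to have Pioneer auto-label raw text you already have.
For example, the following request starts a synthetic generation job for a named entity recognition (NER) dataset: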
curl -X POST https://api.pioneer.ai/generate \
  -H "X-API-Key: YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "task_type": "ner",
    "dataset_name": "my-ner-dataset",
    "labels": ["person", "company", "product"],
    "num_examples": 100,
    "domain_description": "Tech industry news articles"
  }'
Once generated or labeled, examples are stored in your dataset automatically. Poll GET /generate/jobs/:job_id to check when generation is complete before starting training.
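The polling step above can be sketched as a small shell loop. This is a minimal sketch, not a documented client: it assumes the job response is JSON with a top-level "status" string whose terminal values are "completed" and "failed", and it reads the key from a hypothetical PIONEER_API_KEY environment variable. None of these are confirmed by this page, so adjust them to the actual response.

```shell
# Sketch: wait for a generation job to finish before starting training.
# ASSUMPTIONS (not confirmed by this page): the job response is JSON with a
# "status" string, and "completed" / "failed" are its terminal values.
# PIONEER_API_KEY is a hypothetical environment variable holding your key.

API_BASE="https://api.pioneer.ai"

# Extract a "status" string value from a JSON body read from stdin.
# Naive pattern match for illustration; prefer a real JSON parser such as jq.
extract_status() {
  sed -n 's/.*"status"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' | head -n 1
}

# Succeed (exit 0) when the given status is one of the assumed terminal values.
is_terminal() {
  case "$1" in
    completed|failed) return 0 ;;
    *) return 1 ;;
  esac
}

# Poll GET /generate/jobs/:job_id every INTERVAL seconds, up to MAX_ATTEMPTS times.
# Usage: poll_generation_job JOB_ID [INTERVAL] [MAX_ATTEMPTS]
# Returns 0 only if the last observed status is "completed".
poll_generation_job() {
  local job_id="$1"
  local interval="${2:-5}"
  local max_attempts="${3:-60}"
  local attempt=1
  local status=""
  while [ "$attempt" -le "$max_attempts" ]; do
    status=$(curl -sS "$API_BASE/generate/jobs/$job_id" \
      -H "X-API-Key: $PIONEER_API_KEY" | extract_status)
    echo "attempt $attempt: status=${status:-unknown}"
    if is_terminal "$status"; then
      break
    fi
    sleep "$interval"
    attempt=$((attempt + 1))
  done
  [ "$status" = "completed" ]
}
```

A call such as `poll_generation_job "$JOB_ID" 10 && echo "ready to train"` would then gate the training step on the job finishing. The attempt cap keeps the loop from running forever if the job never reaches a terminal status.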

Listing your datasets

Retrieve all datasets in your account:
curl https://api.pioneer.ai/felix/datasets \
  -H "X-API-Key: YOUR_API_KEY"
The response lists each dataset by name along with metadata such as creation time and version count.

Inspecting a dataset

To see the versions and details of a specific dataset, pass its name:
curl https://api.pioneer.ai/felix/datasets/my-ner-dataset \
  -H "X-API-Key: YOUR_API_KEY"
This returns version history and example counts, which is useful for confirming the dataset is ready before training.
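As a rough pre-training check, a script can confirm that the dataset is retrievable at all. The sketch below assumes the API answers 200 for an existing dataset and some other code (for example 404) for a missing one, and it reads the key from a hypothetical PIONEER_API_KEY environment variable. Neither is stated on this page. It does not inspect example counts, because the response shape is not shown here.

```shell
# Sketch: check that a dataset can be fetched before starting a training job.
# ASSUMPTION (not confirmed by this page): the API answers 200 for an existing
# dataset and a non-200 code (for example 404) for a missing one.
# PIONEER_API_KEY is a hypothetical environment variable holding your key.

API_BASE="https://api.pioneer.ai"

# Print only the HTTP status code of GET /felix/datasets/:name.
# -o /dev/null discards the body; -w '%{http_code}' prints the status code.
dataset_http_code() {
  curl -sS -o /dev/null -w '%{http_code}' \
    "$API_BASE/felix/datasets/$1" \
    -H "X-API-Key: $PIONEER_API_KEY"
}

# Exit status 0 if the dataset responds with 200, non-zero otherwise.
dataset_exists() {
  [ "$(dataset_http_code "$1")" = "200" ]
}
```

For example, `dataset_exists my-ner-dataset || { echo "dataset not found" >&2; exit 1; }` stops a pipeline early instead of letting a training request fail later.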

Deleting a dataset

To delete a dataset, send a DELETE request with its name:
curl -X DELETE https://api.pioneer.ai/felix/datasets/my-ner-dataset \
  -H "X-API-Key: YOUR_API_KEY"
Deletion is permanent. Any training jobs or evaluations that already completed using this dataset are unaffected, but you will no longer be able to start new jobs referencing it.
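Because deletion cannot be undone, scripts may want a guard in front of the DELETE call. The following is one possible sketch, not part of Pioneer's tooling: the helper names confirm_name and delete_dataset and the PIONEER_API_KEY environment variable are hypothetical, and the guard simply requires the caller to retype the dataset name.

```shell
# Sketch: require the caller to retype the dataset name before deleting it.
# confirm_name, delete_dataset, and PIONEER_API_KEY are hypothetical names
# introduced for this example.

API_BASE="https://api.pioneer.ai"

# Read one line from stdin and succeed only if it matches the expected name.
confirm_name() {
  local expected="$1"
  local typed=""
  IFS= read -r typed || true
  [ "$typed" = "$expected" ]
}

# Usage: delete_dataset NAME
# Sends the DELETE request only after the name has been retyped correctly.
delete_dataset() {
  local name="$1"
  printf 'Type "%s" to permanently delete it: ' "$name" >&2
  if confirm_name "$name"; then
    curl -sS -X DELETE "$API_BASE/felix/datasets/$name" \
      -H "X-API-Key: $PIONEER_API_KEY"
  else
    echo "Name did not match; nothing deleted." >&2
    return 1
  fi
}
```

Running `delete_dataset my-ner-dataset` interactively prompts on stderr and sends the request only when the typed name matches exactly.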
Dataset storage is free. You are not charged for storing datasets in Pioneer, regardless of size or number of versions.

Dataset endpoints summary

| Method | Endpoint | Description |
| --- | --- | --- |
| GET | /felix/datasets | List all datasets |
| GET | /felix/datasets/:name | Get details and versions for a dataset |
| DELETE | /felix/datasets/:name | Permanently delete a dataset |
| POST | /generate | Start a synthetic data generation job |
| GET | /generate/jobs/:job_id | Poll generation job status |
| POST | /generate/ner/label-existing | Auto-label raw text for NER |
| POST | /generate/classification/label-existing | Auto-classify raw text |