Evaluations let you measure how well a trained model performs against a labeled dataset before you deploy it. You can evaluate your own fine-tuned models or compare them against Pioneer’s baseline LLMs to quantify the improvement your training has achieved. Results include overall F1, precision, and recall scores as well as per-entity breakdowns for NER tasks.
Unlike training jobs, which take a HuggingFace model ID or checkpoint UUID, the base_model field in evaluation requests accepts a training job ID. You can also pass a base model ID to evaluate an untuned model as a baseline.

Run an evaluation

POST /felix/evaluations

Starts an evaluation run that measures model performance against a labeled dataset. Returns an evaluation ID you can use to poll for results.

Request body
base_model
string
required
The model to evaluate. Accepts a training job ID (to evaluate your fine-tuned model) or a base model ID (to evaluate an untuned model as a baseline).
dataset_name
string
required
The name of the labeled dataset to evaluate against. The dataset must be in the ready state.
project_id
string
Associate this evaluation with a specific project for organizational purposes.
curl -X POST https://api.pioneer.ai/felix/evaluations \
  -H "X-API-Key: YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "base_model": "YOUR_TRAINING_JOB_ID",
    "dataset_name": "YOUR_DATASET_NAME"
  }'
Response
id
string
UUID of the evaluation. Use this with GET /felix/evaluations/:id to retrieve results.
status
string
Initial evaluation status.
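The request body can be assembled in a small client helper. A minimal Python sketch, assuming only the fields documented above (the helper name and validation are illustrative, not part of the API):

```python
import json

# Illustrative helper: builds the JSON body for POST /felix/evaluations.
# base_model and dataset_name are required; project_id is optional.
def build_evaluation_request(base_model, dataset_name, project_id=None):
    if not base_model or not dataset_name:
        raise ValueError("base_model and dataset_name are required")
    body = {"base_model": base_model, "dataset_name": dataset_name}
    if project_id is not None:
        body["project_id"] = project_id
    return json.dumps(body)

print(build_evaluation_request("YOUR_TRAINING_JOB_ID", "YOUR_DATASET_NAME"))
```

Send the returned string as the request body with a Content-Type: application/json header, as in the curl example above.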

List evaluations

GET /felix/evaluations

Returns all evaluations for your account. Supports filtering by project.

Query parameters
project_id
string
Filter results to evaluations associated with a specific project.
curl https://api.pioneer.ai/felix/evaluations \
  -H "X-API-Key: YOUR_API_KEY"
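To apply the project filter, append project_id as a query parameter to the URL. A short Python sketch of the URL construction (the helper name is illustrative; the endpoint and parameter name come from this reference):

```python
from urllib.parse import urlencode

# Illustrative: build the list URL, with or without the project_id filter.
def evaluations_url(project_id=None):
    base = "https://api.pioneer.ai/felix/evaluations"
    if project_id is None:
        return base
    return base + "?" + urlencode({"project_id": project_id})

print(evaluations_url("YOUR_PROJECT_ID"))
```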

Get evaluation results

GET /felix/evaluations/:id

Returns the status and, once complete, the full results of an evaluation run.

Path parameters
id
string
required
The evaluation UUID.
curl https://api.pioneer.ai/felix/evaluations/YOUR_EVALUATION_ID \
  -H "X-API-Key: YOUR_API_KEY"
Response
id
string
Evaluation UUID.
status
string
Current status of the evaluation. Values: queued, running, complete, failed.
metrics
object
Overall performance metrics. Only present when status is complete.
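Since metrics only appear once status reaches complete, clients typically poll this endpoint. A minimal polling sketch in Python, assuming only the status values documented above (fetch_evaluation is injectable so the loop can be exercised without a live API key):

```python
import json
import time
from urllib.request import Request, urlopen

# Fetch one evaluation via GET /felix/evaluations/:id.
def fetch_evaluation(evaluation_id, api_key):
    req = Request(
        f"https://api.pioneer.ai/felix/evaluations/{evaluation_id}",
        headers={"X-API-Key": api_key},
    )
    with urlopen(req) as resp:
        return json.load(resp)

# Poll until the run reaches a terminal status (complete or failed).
def wait_for_evaluation(evaluation_id, api_key, fetch=fetch_evaluation,
                        interval=10.0, sleep=time.sleep):
    while True:
        evaluation = fetch(evaluation_id, api_key)
        if evaluation["status"] in ("complete", "failed"):
            return evaluation
        sleep(interval)
```

Check the final status before reading metrics: a failed run never produces them.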

Delete an evaluation

DELETE /felix/evaluations/:id

Permanently deletes an evaluation and its results.

Path parameters
id
string
required
The evaluation UUID.
curl -X DELETE https://api.pioneer.ai/felix/evaluations/YOUR_EVALUATION_ID \
  -H "X-API-Key: YOUR_API_KEY"
Returns 204 No Content on success.

List baseline models

GET /felix/baseline-models

Returns the list of baseline LLMs available for evaluation. Use them to benchmark your fine-tuned model against general-purpose models and quantify the improvement from training.
curl https://api.pioneer.ai/felix/baseline-models \
  -H "X-API-Key: YOUR_API_KEY"
Response

Returns an array of baseline model objects, each with an id and a display name. Pass the id as base_model in POST /felix/evaluations.
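The comparison workflow described above amounts to running two evaluations on the same dataset (one with a baseline model ID, one with your training job ID) and diffing their metrics. A sketch of the final step; the metric names follow the overview (F1, precision, recall), but the numbers and exact response shape are illustrative only:

```python
# Illustrative: compute per-metric improvement of a tuned model over a baseline.
def metric_improvement(baseline_metrics, tuned_metrics):
    return {
        name: round(tuned_metrics[name] - baseline_metrics[name], 4)
        for name in ("f1", "precision", "recall")
    }

# Hypothetical metrics objects from two completed evaluations.
baseline = {"f1": 0.62, "precision": 0.65, "recall": 0.59}
tuned = {"f1": 0.88, "precision": 0.90, "recall": 0.86}
print(metric_improvement(baseline, tuned))
```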