Evaluations let you measure how well a trained model performs against a labeled dataset before you deploy it. You can evaluate your own fine-tuned models or compare them against Pioneer’s baseline LLMs to quantify the improvement your training has achieved. Results include overall F1, precision, and recall scores as well as per-entity breakdowns for NER tasks.
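The three headline scores are related by the standard definitions, written in terms of true positives (TP), false positives (FP), and false negatives (FN): precision = TP / (TP + FP), recall = TP / (TP + FN), and F1 is their harmonic mean. This sketch assumes Pioneer uses these standard formulas. The source does not say how matches are counted or averaged across entity types (for example, micro versus macro averaging), so treat it as an aid for interpreting results, not a reproduction of Pioneer's scoring.

```python
def precision_recall_f1(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    """Return (precision, recall, F1) from raw match counts.

    Uses the standard definitions and returns 0.0 wherever a denominator is zero.
    """
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    # Hypothetical counts: 80 correct entities, 20 spurious, 40 missed.
    p, r, f = precision_recall_f1(tp=80, fp=20, fn=40)
    print(f"precision={p:.3f} recall={r:.3f} f1={f:.3f}")
    # prints: precision=0.800 recall=0.667 f1=0.727
```

Because F1 is a harmonic mean, it sits closer to the lower of precision and recall, so a model that finds few entities cannot reach a high F1 through precision alone.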
Unlike training job requests, which require a Hugging Face model ID or checkpoint UUID, the base_model field in evaluation requests accepts a training job ID. You can also pass a base model ID to evaluate an untuned model as a baseline.

Run an evaluation

POST /felix/evaluations
Starts an evaluation run that measures model performance against a labeled dataset. Returns the created evaluation objects, each with an id you can use to poll for results.
Request body
base_model
string
required
The model to evaluate. Accepts a training job ID (to evaluate your fine-tuned model) or a base model ID (to evaluate an untuned model as a baseline).
dataset_name
string
The name of the labeled dataset to evaluate against. The dataset must be in the ready state.
project_id
string
Associate this evaluation with a specific project for organizational purposes.
curl -X POST https://api.pioneer.ai/felix/evaluations \
  -H "X-API-Key: YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "base_model": "YOUR_TRAINING_JOB_ID",
    "dataset_name": "YOUR_DATASET_NAME"
  }'
Response
success
boolean
true on success.
count
number
Number of evaluations created.
evaluations
object[]
Array of created evaluation objects. Each includes an id you can pass to GET /felix/evaluations/:id to poll for results.
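The same request can be sketched in Python using only the standard library. This is a minimal illustration, not an official Pioneer SDK: the helper names build_evaluation_request and start_evaluation are hypothetical, project_id is sent only when provided, and the code assumes the response body contains the documented evaluations array whose objects carry an id.

```python
import json
import urllib.request
from typing import List, Optional

API_BASE = "https://api.pioneer.ai"


def build_evaluation_request(
    api_key: str,
    base_model: str,
    dataset_name: str,
    project_id: Optional[str] = None,
) -> urllib.request.Request:
    """Build (but do not send) a POST /felix/evaluations request."""
    body = {"base_model": base_model, "dataset_name": dataset_name}
    if project_id is not None:
        # Only include the project association when the caller asks for it.
        body["project_id"] = project_id
    return urllib.request.Request(
        f"{API_BASE}/felix/evaluations",
        data=json.dumps(body).encode("utf-8"),
        headers={"X-API-Key": api_key, "Content-Type": "application/json"},
        method="POST",
    )


def start_evaluation(
    api_key: str,
    base_model: str,
    dataset_name: str,
    project_id: Optional[str] = None,
) -> List[str]:
    """Send the request and return the IDs of the created evaluations.

    urlopen raises urllib.error.HTTPError for non-2xx responses.
    """
    req = build_evaluation_request(api_key, base_model, dataset_name, project_id)
    with urllib.request.urlopen(req) as resp:
        payload = json.load(resp)
    return [evaluation["id"] for evaluation in payload["evaluations"]]
```

For example, start_evaluation("YOUR_API_KEY", "YOUR_TRAINING_JOB_ID", "YOUR_DATASET_NAME") returns a list of evaluation IDs. Keeping request construction separate from sending makes it easy to inspect the exact URL, headers, and JSON body before anything goes over the network.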

List evaluations

GET /felix/evaluations
Returns all evaluations for your account. Supports filtering by project.
Query parameters
project_id
string
Filter results to evaluations associated with a specific project.
curl https://api.pioneer.ai/felix/evaluations \
  -H "X-API-Key: YOUR_API_KEY"
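To filter by project, pass project_id as a query string parameter (for example, ?project_id=YOUR_PROJECT_ID). The Python sketch below builds that URL with urllib.parse.urlencode so the value is escaped correctly. The helper names are hypothetical, and because this section does not document the list response fields, list_evaluations returns the decoded JSON without assuming its shape.

```python
import json
import urllib.parse
import urllib.request
from typing import Any, Optional

API_BASE = "https://api.pioneer.ai"


def list_evaluations_url(project_id: Optional[str] = None) -> str:
    """Return the GET /felix/evaluations URL, adding ?project_id=... if given."""
    url = f"{API_BASE}/felix/evaluations"
    if project_id is not None:
        # urlencode escapes characters that are unsafe in a query string.
        url += "?" + urllib.parse.urlencode({"project_id": project_id})
    return url


def list_evaluations(api_key: str, project_id: Optional[str] = None) -> Any:
    """Fetch evaluations for the account, optionally filtered by project."""
    req = urllib.request.Request(
        list_evaluations_url(project_id), headers={"X-API-Key": api_key}
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```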

Get evaluation results

GET /felix/evaluations/:id
Returns the status and, once complete, the full results of an evaluation run.
Path parameters
id
string
required
The evaluation UUID.
curl https://api.pioneer.ai/felix/evaluations/YOUR_EVALUATION_ID \
  -H "X-API-Key: YOUR_API_KEY"
Response
id
string
Evaluation UUID.
status
string
Current status of the evaluation. Values: queued, running, complete, failed.
f1_score
number
Overall F1 score. Present once the evaluation is complete.
precision_score
number
Overall precision score.
recall_score
number
Overall recall score.
sample_count
number
Number of examples evaluated.
completed_at
string
ISO 8601 timestamp of when the evaluation finished.
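Because evaluations run asynchronously, a client typically polls this endpoint until status reaches complete or failed. The following is a minimal polling sketch under stated assumptions: the 10-second interval and one-hour timeout are arbitrary defaults, not documented limits, and the helper names are hypothetical. The fetch and sleep callables are injected so the loop can be exercised without network access.

```python
import json
import time
import urllib.request
from typing import Callable

API_BASE = "https://api.pioneer.ai"
TERMINAL_STATUSES = {"complete", "failed"}


def fetch_evaluation(api_key: str, evaluation_id: str) -> dict:
    """GET /felix/evaluations/:id and return the decoded JSON body."""
    req = urllib.request.Request(
        f"{API_BASE}/felix/evaluations/{evaluation_id}",
        headers={"X-API-Key": api_key},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def wait_for_evaluation(
    fetch: Callable[[], dict],
    interval_s: float = 10.0,   # assumed polling interval, not a documented value
    timeout_s: float = 3600.0,  # assumed upper bound, not a documented value
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Call fetch() until status is complete or failed, then return that result.

    Raises TimeoutError if the evaluation is still queued or running after timeout_s.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        evaluation = fetch()
        if evaluation["status"] in TERMINAL_STATUSES:
            return evaluation
        if time.monotonic() >= deadline:
            raise TimeoutError(
                f"evaluation still {evaluation['status']!r} after {timeout_s}s"
            )
        sleep(interval_s)
```

With valid credentials, result = wait_for_evaluation(lambda: fetch_evaluation("YOUR_API_KEY", "YOUR_EVALUATION_ID")) blocks until the run finishes. Check that result["status"] == "complete" before reading f1_score, precision_score, and recall_score, since f1_score is documented as present only once the evaluation is complete.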

Delete an evaluation

DELETE /felix/evaluations/:id
Permanently deletes an evaluation and its results.
Path parameters
id
string
required
The evaluation UUID.
curl -X DELETE https://api.pioneer.ai/felix/evaluations/YOUR_EVALUATION_ID \
  -H "X-API-Key: YOUR_API_KEY"
Returns 200 with {"success": true, "message": "..."} on success.
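A Python sketch of the delete call, again using only the standard library. The helper names are hypothetical; delete_evaluation assumes the documented {"success": true, "message": ...} body and relies on urllib raising urllib.error.HTTPError for non-2xx status codes. Since deletion is permanent, building the request separately lets you check the target URL before sending it.

```python
import json
import urllib.request

API_BASE = "https://api.pioneer.ai"


def build_delete_request(api_key: str, evaluation_id: str) -> urllib.request.Request:
    """Build (but do not send) a DELETE /felix/evaluations/:id request."""
    return urllib.request.Request(
        f"{API_BASE}/felix/evaluations/{evaluation_id}",
        headers={"X-API-Key": api_key},
        method="DELETE",
    )


def delete_evaluation(api_key: str, evaluation_id: str) -> bool:
    """Delete an evaluation and return the success flag from the JSON body."""
    with urllib.request.urlopen(build_delete_request(api_key, evaluation_id)) as resp:
        return bool(json.load(resp).get("success", False))
```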

List baseline models

GET /felix/baseline-models
Returns the list of baseline LLMs available for evaluation. Use these to benchmark your fine-tuned model’s performance against general-purpose models and quantify the improvement from training.
curl https://api.pioneer.ai/felix/baseline-models \
  -H "X-API-Key: YOUR_API_KEY"
Response
Returns an object with a models array and a count. Each model has an id, name, provider, and description. To evaluate against a baseline, pass its id as base_model in POST /felix/evaluations.
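One way to quantify the improvement from training is to evaluate the fine-tuned model and each baseline on the same dataset_name, then subtract F1 scores. The sketch below assumes that workflow. The helper names are hypothetical, and because the documented GET /felix/evaluations/:id fields do not include the evaluated model, it assumes you keep your own mapping from each baseline id to the f1_score of its evaluation.

```python
import json
import urllib.request
from typing import Dict, List

API_BASE = "https://api.pioneer.ai"


def list_baseline_model_ids(api_key: str) -> List[str]:
    """GET /felix/baseline-models and return each model's id."""
    req = urllib.request.Request(
        f"{API_BASE}/felix/baseline-models", headers={"X-API-Key": api_key}
    )
    with urllib.request.urlopen(req) as resp:
        return [model["id"] for model in json.load(resp)["models"]]


def f1_gains(fine_tuned_f1: float, baseline_f1: Dict[str, float]) -> Dict[str, float]:
    """Return fine-tuned F1 minus each baseline's F1, keyed by baseline model id.

    Positive values mean the fine-tuned model beat that baseline on the dataset.
    The comparison is only meaningful if every score comes from the same dataset.
    """
    return {model_id: fine_tuned_f1 - f1 for model_id, f1 in baseline_f1.items()}
```

For example, f1_gains(0.91, {"baseline-a": 0.74}) reports a gain of about 0.17 over "baseline-a"; both scores and the model id here are made up for illustration.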