Before you put a fine-tuned model into production, you want to know how it performs on held-out data. Pioneer’s evaluation API runs your model against a labeled dataset and returns F1, precision, and recall, both as overall scores and broken down per entity type. This gives you a clear picture of where the model is strong and where it may need more training data.
What evaluations measure
An evaluation compares your model’s predictions against the ground-truth labels in your dataset. Pioneer reports:
- F1 — the harmonic mean of precision and recall, the primary summary metric
- Precision — of all predictions made, how many were correct
- Recall — of all ground-truth labels, how many the model found
- Per-entity breakdown — the same three metrics for each individual entity type, so you can identify which labels are underperforming
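To make the definitions above concrete, here is a minimal sketch that derives the three metrics from raw counts of true positives, false positives, and false negatives. How Pioneer decides that a prediction matches a label, and how it aggregates the overall score, is not stated here; the sketch assumes the counts are already known and assumes a micro-average (pooled counts) for the overall figure. The names `Counts`, `prf`, and `report` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Counts:
    tp: int = 0  # predictions that match a ground-truth label
    fp: int = 0  # predictions with no matching ground-truth label
    fn: int = 0  # ground-truth labels the model did not find


def prf(c: Counts) -> dict:
    """Return precision, recall, and F1 for one set of counts (0.0 when undefined)."""
    precision = c.tp / (c.tp + c.fp) if (c.tp + c.fp) else 0.0
    recall = c.tp / (c.tp + c.fn) if (c.tp + c.fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


def report(per_entity: dict) -> dict:
    """Build an overall score plus a per-entity breakdown.

    Assumption: the overall score pools counts across entity types
    (micro-average). Pioneer's actual aggregation may differ.
    """
    total = Counts(
        tp=sum(c.tp for c in per_entity.values()),
        fp=sum(c.fp for c in per_entity.values()),
        fn=sum(c.fn for c in per_entity.values()),
    )
    return {
        "overall": prf(total),
        "per_entity": {name: prf(c) for name, c in per_entity.items()},
    }


# Illustrative counts only, not real evaluation output.
example = report({
    "PERSON": Counts(tp=8, fp=2, fn=2),
    "ORG": Counts(tp=3, fp=1, fn=5),
})
```

In this made-up example, `ORG` has reasonable precision but low recall, which is the kind of gap the per-entity breakdown is meant to surface.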
Running an evaluation
Pass your training job ID as base_model and the name of your evaluation dataset as dataset_name:
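A minimal sketch of that request in Python, using only the standard library. The `POST /felix/evaluations` path and the `base_model` and `dataset_name` fields come from this page; the host in `BASE_URL`, the bearer-token header, the JSON body encoding, and the shape of the response are assumptions to adjust for your account.

```python
import json
import urllib.request

BASE_URL = "https://api.example.com"  # hypothetical host; substitute the real API base URL


def build_run_request(base_model: str, dataset_name: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) the request that starts an evaluation."""
    body = json.dumps({
        "base_model": base_model,      # your training job ID
        "dataset_name": dataset_name,  # name of the evaluation dataset
    }).encode("utf-8")
    return urllib.request.Request(
        url=f"{BASE_URL}/felix/evaluations",
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",  # assumed auth scheme
        },
    )


def run_evaluation(base_model: str, dataset_name: str, api_key: str) -> dict:
    """Send the request and return the decoded JSON response (assumed to be JSON)."""
    request = build_run_request(base_model, dataset_name, api_key)
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

Keeping request construction separate from sending makes it easy to inspect the payload before anything goes over the network.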
Retrieving results
Poll the evaluation endpoint until results are ready:
Managing evaluations
List all evaluations in your account. The list request accepts a project_id query parameter.
Delete an evaluation you no longer need:
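The retrieval and management calls described above could be wrapped as in the following sketch. The paths and HTTP methods match the endpoint summary on this page; everything else is an assumption, including the host, the bearer-token header, the `status` field and its `completed` and `failed` values, and the idea that `project_id` narrows the list to one project. The helper names are hypothetical.

```python
import json
import time
import urllib.parse
import urllib.request

BASE_URL = "https://api.example.com"  # hypothetical host; substitute the real API base URL


def http_send(method, path, api_key, params=None):
    """Send one request and return the decoded JSON body ({} if the body is empty)."""
    url = f"{BASE_URL}{path}"
    if params:
        url += "?" + urllib.parse.urlencode(params)
    request = urllib.request.Request(
        url, method=method, headers={"Authorization": f"Bearer {api_key}"}  # assumed auth
    )
    with urllib.request.urlopen(request) as response:
        raw = response.read()
    return json.loads(raw) if raw else {}


def wait_for_results(evaluation_id, api_key, interval=5.0, timeout=600.0,
                     send=http_send, sleep=time.sleep):
    """Poll GET /felix/evaluations/:id until the evaluation reaches a final state."""
    deadline = time.monotonic() + timeout
    while True:
        result = send("GET", f"/felix/evaluations/{evaluation_id}", api_key)
        # Assumed response field and values; check the real payload.
        if result.get("status") in ("completed", "failed"):
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"evaluation {evaluation_id} not ready after {timeout}s")
        sleep(interval)


def list_evaluations(api_key, project_id=None, send=http_send):
    """GET /felix/evaluations, optionally passing the project_id query parameter."""
    params = {"project_id": project_id} if project_id else None
    return send("GET", "/felix/evaluations", api_key, params)


def delete_evaluation(evaluation_id, api_key, send=http_send):
    """DELETE /felix/evaluations/:id."""
    return send("DELETE", f"/felix/evaluations/{evaluation_id}", api_key)
```

The `send` and `sleep` parameters exist so the polling loop can be exercised with a stub instead of a live endpoint. A fixed interval with a timeout is the simplest choice; exponential backoff would be a reasonable alternative for long-running evaluations.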
Evaluations endpoint summary
| Method | Endpoint | Description |
|---|---|---|
| POST | /felix/evaluations | Run an evaluation |
| GET | /felix/evaluations | List all evaluations |
| GET | /felix/evaluations/:id | Get evaluation results |
| DELETE | /felix/evaluations/:id | Delete an evaluation |