What evaluations measure
An evaluation compares your model’s predictions against the ground-truth labels in your dataset. Pioneer reports:
- F1 — the harmonic mean of precision and recall, the primary summary metric
- Precision — of all predictions made, how many were correct
- Recall — of all ground-truth labels, how many the model found
- Per-entity breakdown — the same three metrics for each individual entity type, so you can identify which labels are underperforming
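To make the relationship between the three metrics concrete, here is a small illustrative sketch in Python. It is not Pioneer code, and the counts are invented:

```python
# Invented counts, for illustration only (not Pioneer code).
tp, fp, fn = 42, 8, 14  # true positives, false positives, false negatives

precision = tp / (tp + fp)  # of all predictions made, how many were correct
recall = tp / (tp + fn)     # of all ground-truth labels, how many were found
f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of the two

print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
# precision=0.840 recall=0.750 f1=0.792
```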
Running an evaluation
Pass your training job ID as `base_model` and the name of your evaluation dataset as `dataset_name`:
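A minimal sketch of that request follows. The endpoint, `base_model`, and `dataset_name` come from this page; the base URL, the bearer-token auth header, the example values, and the response's `id` field are assumptions:

```python
import requests

API_BASE = "https://api.pioneer.example"  # placeholder base URL, not documented here
HEADERS = {"Authorization": "Bearer YOUR_API_KEY"}  # assumed auth scheme

# Start an evaluation: base_model is your training job ID,
# dataset_name is your evaluation dataset (values below are hypothetical).
resp = requests.post(
    f"{API_BASE}/felix/evaluations",
    headers=HEADERS,
    json={
        "base_model": "ftjob-abc123",
        "dataset_name": "invoices-eval",
    },
)
resp.raise_for_status()
evaluation_id = resp.json()["id"]  # assumes the response includes an "id" field
```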
Retrieving results
Poll the evaluation endpoint until results are ready:
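One way to poll, continuing the sketch above (same assumed base URL and auth; the `status` field and its `"completed"` value are also assumptions about the response shape):

```python
import time

import requests

# API_BASE, HEADERS, and evaluation_id are defined in the previous sketch.
while True:
    resp = requests.get(f"{API_BASE}/felix/evaluations/{evaluation_id}", headers=HEADERS)
    resp.raise_for_status()
    evaluation = resp.json()
    if evaluation.get("status") == "completed":  # assumed status field and value
        break
    time.sleep(10)  # back off between polls

print(evaluation)  # F1, precision, recall, and the per-entity breakdown
```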
Managing evaluations

List all evaluations in your account. To scope the list to a single project, pass the `project_id` query parameter:
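Continuing the same sketch, with a hypothetical project ID for the `project_id` query parameter:

```python
# project_id is optional; omit params to list every evaluation in the account.
resp = requests.get(
    f"{API_BASE}/felix/evaluations",
    headers=HEADERS,
    params={"project_id": "proj-xyz"},  # hypothetical project ID
)
resp.raise_for_status()
for evaluation in resp.json():  # assumes the response is a JSON array
    print(evaluation["id"])
```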
Delete an evaluation you no longer need:
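Under the same assumptions as the sketches above:

```python
resp = requests.delete(f"{API_BASE}/felix/evaluations/{evaluation_id}", headers=HEADERS)
resp.raise_for_status()
```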
Evaluations endpoint summary
| Method | Endpoint | Description |
|---|---|---|
| POST | /felix/evaluations | Run an evaluation |
| GET | /felix/evaluations | List all evaluations |
| GET | /felix/evaluations/:id | Get evaluation results |
| DELETE | /felix/evaluations/:id | Delete an evaluation |

