The base_model field in evaluation requests accepts a training job ID, unlike training jobs, which require a HuggingFace model ID or checkpoint UUID. You can also pass a base model ID to evaluate an untuned model as a baseline.
Run an evaluation
POST /felix/evaluations
Starts an evaluation run that measures model performance against a labeled dataset. Returns an evaluation ID you can use to poll for results.
Request body
The model to evaluate. Accepts a training job ID (to evaluate your fine-tuned model) or a base model ID (to evaluate an untuned model as a baseline).
The name of the labeled dataset to evaluate against. The dataset must be in the ready state.
Associate this evaluation with a specific project for organizational purposes.
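As a rough illustration, the request body for POST /felix/evaluations could be assembled as in the sketch below. Only base_model is named in this reference; the keys `dataset` and `project_id` are placeholder names assumed for the example, and the project is treated as optional. Check the real schema before relying on them.

```python
import json
from typing import Optional


def build_evaluation_request(base_model: str, dataset: str,
                             project_id: Optional[str] = None) -> dict:
    """Assemble a JSON-serializable body for POST /felix/evaluations.

    "base_model" comes from this reference. "dataset" and "project_id"
    are assumed key names used only for illustration.
    """
    body = {"base_model": base_model, "dataset": dataset}
    # Only include the project key when the caller supplies one.
    if project_id is not None:
        body["project_id"] = project_id
    return body


if __name__ == "__main__":
    # Hypothetical identifiers; substitute a real training job ID and dataset name.
    example = build_evaluation_request("my-training-job-id", "my-labeled-dataset")
    print(json.dumps(example, indent=2))
```

Passing a base model ID instead of a training job ID in the first argument would produce the body for a baseline run.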
true on success.
Number of evaluations created.
Array of created evaluation objects. Each includes an id you can pass to GET /felix/evaluations/:id to poll for results.
List evaluations
GET /felix/evaluations
Returns all evaluations for your account. Supports filtering by project.
Query parameters
Filter results to evaluations associated with a specific project.
Get evaluation results
GET /felix/evaluations/:id
Returns the status and, once complete, the full results of an evaluation run.
Path parameters
The evaluation UUID.
Evaluation UUID.
Current status of the evaluation. Values: queued, running, complete, failed.
Overall F1 score. Present once the evaluation is complete.
Overall precision score.
Overall recall score.
Number of examples evaluated.
ISO 8601 timestamp of when the evaluation finished.
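Because results only appear once the status reaches complete, a client typically polls this endpoint. The sketch below shows one way to do that. It assumes the status is returned under a `status` key (the field name is not stated here) and takes a caller-supplied `fetch` function that performs the GET request and returns the parsed JSON, so no particular HTTP library or authentication scheme is implied.

```python
import time
from typing import Callable

# Terminal values taken from the status list in this reference.
TERMINAL_STATUSES = {"complete", "failed"}


def wait_for_evaluation(fetch: Callable[[], dict],
                        interval_s: float = 5.0,
                        max_attempts: int = 60,
                        sleep: Callable[[float], None] = time.sleep) -> dict:
    """Call fetch() until the evaluation is complete or failed.

    fetch should perform GET /felix/evaluations/:id and return the
    decoded response. The "status" key is an assumed field name.
    """
    for attempt in range(max_attempts):
        result = fetch()
        if result.get("status") in TERMINAL_STATUSES:
            return result
        # Do not sleep after the final attempt.
        if attempt < max_attempts - 1:
            sleep(interval_s)
    raise TimeoutError(f"evaluation not finished after {max_attempts} polls")
```

The caller still needs to check whether the returned status is complete or failed before reading the scores. The default interval and attempt limit are arbitrary choices for the sketch, not documented limits.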
Delete an evaluation
DELETE /felix/evaluations/:id
Permanently deletes an evaluation and its results.
Path parameters
The evaluation UUID.
Returns 200 with {"success": true, "message": "..."} on success.
List baseline models
GET /felix/baseline-models
Returns the list of baseline LLM models available for evaluation. Use these to benchmark your fine-tuned model’s performance against general-purpose models and quantify the improvement from training.
Returns a models array and a count. Each model has id, name, provider, and description. Pass the id as base_model in POST /felix/evaluations to evaluate against a baseline.
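To benchmark against every available baseline, one option is to turn the baseline list into a set of evaluation request bodies, one per model. The sketch below assumes the list is returned under a `models` key and reuses a placeholder `dataset` key for the dataset name. Neither key name is confirmed by this reference, and the sample model entry is invented for the example.

```python
from typing import Dict, List


def baseline_requests(baseline_response: dict, dataset: str) -> List[Dict[str, str]]:
    """Build one POST /felix/evaluations body per baseline model.

    Assumes the response has a "models" list whose items carry an "id".
    "dataset" is an assumed key name for the labeled dataset.
    """
    return [
        {"base_model": model["id"], "dataset": dataset}
        for model in baseline_response.get("models", [])
    ]


if __name__ == "__main__":
    # Made-up response shaped like the description above.
    sample = {
        "models": [
            {"id": "baseline-a", "name": "Baseline A",
             "provider": "example", "description": "Illustrative entry"},
        ],
        "count": 1,
    }
    for body in baseline_requests(sample, "my-labeled-dataset"):
        print(body)
```

Running the fine-tuned model and each baseline against the same dataset keeps the resulting scores comparable.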