The base_model field in evaluation requests accepts a training job ID, unlike training jobs, which require a HuggingFace model ID or checkpoint UUID. You can also pass a base model ID to evaluate an untuned model as a baseline.

Run an evaluation
POST /felix/evaluations
Starts an evaluation run that measures model performance against a labeled dataset. Returns an evaluation ID you can use to poll for results.
Request body
The model to evaluate. Accepts a training job ID (to evaluate your fine-tuned model) or a base model ID (to evaluate an untuned model as a baseline).
The name of the labeled dataset to evaluate against. The dataset must be in the ready state.

Associate this evaluation with a specific project for organizational purposes.
UUID of the evaluation. Use this with GET /felix/evaluations/:id to retrieve results.

Initial evaluation status.
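A minimal sketch of starting an evaluation run, using only the Python standard library. The API host is a placeholder, and the dataset and project field names are assumptions inferred from the request body descriptions above; only base_model is confirmed by this reference.

```python
import json
import urllib.request

API_BASE = "https://api.example.com"  # placeholder host; not specified in this reference

def build_evaluation_body(base_model, dataset, project=None):
    """Assemble the POST /felix/evaluations request body.

    "dataset" and "project" are assumed field names; base_model is documented.
    """
    body = {"base_model": base_model, "dataset": dataset}
    if project is not None:
        body["project"] = project
    return body

def create_evaluation(token, body):
    """POST the body; the response includes the evaluation UUID and initial status."""
    req = urllib.request.Request(
        f"{API_BASE}/felix/evaluations",
        data=json.dumps(body).encode(),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

A call might look like `create_evaluation(token, build_evaluation_body("job-abc123", "my-dataset"))`, where both arguments are hypothetical values for illustration.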
List evaluations
GET /felix/evaluations
Returns all evaluations for your account. Supports filtering by project.
Query parameters
Filter results to evaluations associated with a specific project.
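A sketch of listing evaluations with the project filter. The exact query parameter name (`project` here) is an assumption; the reference confirms only that filtering by project is supported.

```python
import json
import urllib.parse
import urllib.request

def list_evaluations_url(base, project=None):
    """Build the GET /felix/evaluations URL, optionally filtered by project.

    The "project" parameter name is assumed, not confirmed by the reference.
    """
    url = f"{base}/felix/evaluations"
    if project is not None:
        url += "?" + urllib.parse.urlencode({"project": project})
    return url

def list_evaluations(base, token, project=None):
    req = urllib.request.Request(
        list_evaluations_url(base, project),
        headers={"Authorization": f"Bearer {token}"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```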
Get evaluation results
GET /felix/evaluations/:id
Returns the status and, once complete, the full results of an evaluation run.
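Because metrics appear only once the run finishes, a typical caller polls this endpoint until the status reaches a terminal value. A sketch, assuming bearer-token auth and the queued/running/complete/failed status values documented in this reference:

```python
import json
import time
import urllib.request

TERMINAL_STATUSES = {"complete", "failed"}  # terminal statuses per this reference

def is_terminal(status):
    return status in TERMINAL_STATUSES

def wait_for_evaluation(base, token, eval_id, interval=5.0, timeout=3600.0):
    """Poll GET /felix/evaluations/:id until the run finishes or the timeout elapses."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        req = urllib.request.Request(
            f"{base}/felix/evaluations/{eval_id}",
            headers={"Authorization": f"Bearer {token}"},
        )
        with urllib.request.urlopen(req) as resp:
            body = json.load(resp)
        if is_terminal(body["status"]):
            return body  # metrics are present only when status is "complete"
        time.sleep(interval)
    raise TimeoutError(f"evaluation {eval_id} did not finish within {timeout}s")
```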
Path parameters
The evaluation UUID.
Evaluation UUID.
Current status of the evaluation. Values: queued, running, complete, failed.

Overall performance metrics. Only present when status is complete.

Delete an evaluation
DELETE /felix/evaluations/:id
Permanently deletes an evaluation and its results.
Path parameters
The evaluation UUID.
204 No Content on success.
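A sketch of calling the delete endpoint and checking for the documented 204 response, again with a placeholder base URL and assumed bearer auth:

```python
import urllib.request

def evaluation_url(base, eval_id):
    """Build the /felix/evaluations/:id URL for a given evaluation UUID."""
    return f"{base}/felix/evaluations/{eval_id}"

def delete_evaluation(base, token, eval_id):
    """DELETE the evaluation; a 204 No Content response indicates success."""
    req = urllib.request.Request(
        evaluation_url(base, eval_id),
        headers={"Authorization": f"Bearer {token}"},
        method="DELETE",
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status == 204
```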
List baseline models
GET /felix/baseline-models
Returns the list of baseline LLM models available for evaluation. Use these to benchmark your fine-tuned model’s performance against general-purpose models and quantify the improvement from training.
Each entry includes an id and a display name; pass the id as base_model in POST /felix/evaluations.
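A sketch of fetching the baseline list and pulling out the ids usable as base_model. The `id` key follows the response description above; other keys in each entry are not assumed.

```python
import json
import urllib.request

def baseline_model_ids(models):
    """Extract the ids you can pass as base_model in POST /felix/evaluations."""
    return [m["id"] for m in models]

def list_baseline_models(base, token):
    req = urllib.request.Request(
        f"{base}/felix/baseline-models",
        headers={"Authorization": f"Bearer {token}"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```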
