Documentation Index
Fetch the complete documentation index at: https://docs.pioneer.ai/llms.txt
Use this file to discover all available pages before exploring further.
title: “Pioneer inference: native, OpenAI, and Anthropic formats” sidebarTitle: “Inference” description: “Run predictions on any base or fine-tuned model using Pioneer’s native API, the OpenAI-compatible endpoint, or the Anthropic-compatible messages endpoint.”
Once you have a trained model — or want to use a base model directly — you run inference by sending a request to the Pioneer API. Themodel_id field accepts either a base model ID (like fastino/gliner2-base-v1) or the job ID returned from a completed training job (like job_abc123). Pioneer routes the request to the right deployment automatically.
Pioneer supports three request formats: its own native format, an OpenAI-compatible format, and an Anthropic-compatible format. All three reach the same underlying models.
Pioneer native format
UsePOST /inference with the Pioneer schema format. This is the most expressive option and gives you full control over extraction tasks.
Schema structure
Theschema field is a dictionary with optional keys. Include only the keys that apply to your task.
| Key | Type | Description |
|---|---|---|
entities | string[] | Entity type labels for named entity recognition (NER). |
classifications | object[] | Classification tasks, each with a task name and labels list. |
structures | object | Named structure definitions for JSON extraction. |
relations | object[] | Relation definitions linking extracted entities. |
Decoder models
For decoder models (LLMs), replaceschema with "task": "generate":
OpenAI-compatible format
Pioneer exposes an OpenAI-compatible endpoint athttps://api.pioneer.ai/v1. Point any existing OpenAI SDK or integration at this base URL and use your Pioneer API key — no other changes required.
| Method | Endpoint | Description |
|---|---|---|
POST | /v1/chat/completions | Chat completions |
POST | /v1/completions | Text completions |
POST | /v1/responses | Responses API |
GET | /v1/models | List available models |
Anthropic-compatible format
Pioneer also exposes an Anthropic-compatible endpoint. Set your SDK’sbase_url to https://api.pioneer.ai/v1 and use your Pioneer API key in place of an Anthropic key.
Opting out of inference persistence
By default, Pioneer stores every inference — the input, output, and metadata — so it can drive evaluation, use-case clustering, and adapter training. Passstore: false to skip persistence for a specific request.
store: false is supported on all three request formats — native /inference, /v1/chat/completions, and /v1/messages — and works identically for streaming and non-streaming requests.
What changes with store: false
Default (store: true) | store: false | |
|---|---|---|
| Inference executes | Yes | Yes |
| Input/output stored | Yes | No |
| Evaluation run | Yes | No |
| Use-case clustering | Yes | No |
| Adapter training feed | Yes | No |
| Token billing | Yes | Yes |
inference_id in response | Yes | Yes (for correlation) |
Billing still applies. Token usage, COGS, and metered billing are recorded even when
store: false is set — only the full request/response payload is not retained.When to use it
- Health checks — liveness and readiness probes that run continuously - Internal benchmarks — evaluations you run against your own ground truth that shouldn’t pollute user-facing inference history - Development and testing — exploratory calls during integration work where accumulating inference rows adds noise
Inference history
Pioneer records every inference call. You can retrieve past results and submit corrections to improve future training data.GET /inferences: limit, offset, model_id, task, project_id, training_job_id.