Named Entity Recognition (NER) lets you extract structured information — people, organizations, products, locations, and any custom entity type you define — from unstructured text. Pioneer’s GLiNER encoder models are purpose-built for this task and support LoRA fine-tuning so you can adapt them to your domain with a small labeled dataset and no GPU infrastructure of your own.
1. Choose a base model

Pioneer offers four GLiNER base models. For most tasks, fastino/gliner2-base-v1 is the right starting point: it’s fast, accurate, and supports LoRA and full fine-tuning. If your data includes non-English text, use a multi variant instead.
| Model ID | Use case | Training |
| --- | --- | --- |
| fastino/gliner2-base-v1 | English, general purpose | LoRA, Full |
| fastino/gliner2-large-v1 | English, higher accuracy | LoRA, Full |
| fastino/gliner2-multi-v1 | Multilingual | LoRA, Full |
| fastino/gliner2-multi-large-v1 | Multilingual, higher accuracy | LoRA, Full |
You can always fetch the latest catalog from the API:
curl "https://api.pioneer.ai/base-models?task_type=encoder&supports_training=true" \
  -H "X-API-Key: YOUR_API_KEY"
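If you would rather hard-code the choice, the table above reduces to a small lookup. The model IDs come from the catalog; `choose_base_model` and its flags are illustrative names, not part of the API:

```python
# Minimal helper mirroring the base-model table above.
GLINER_MODELS = {
    ("en", "fast"): "fastino/gliner2-base-v1",
    ("en", "accurate"): "fastino/gliner2-large-v1",
    ("multi", "fast"): "fastino/gliner2-multi-v1",
    ("multi", "accurate"): "fastino/gliner2-multi-large-v1",
}

def choose_base_model(multilingual: bool = False, prefer_accuracy: bool = False) -> str:
    """Pick a GLiNER base model ID from the two choices that matter:
    language coverage and the speed/accuracy trade-off."""
    lang = "multi" if multilingual else "en"
    size = "accurate" if prefer_accuracy else "fast"
    return GLINER_MODELS[(lang, size)]
```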
2. Prepare your training data

You have two options: generate synthetic labeled examples with Pioneer, or bring your own labeled data.

Option A: Generate synthetic data. If you don't have labeled examples yet, use the /generate endpoint. See the Synthetic Data guide for full details.
curl -X POST https://api.pioneer.ai/generate \
  -H "X-API-Key: YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "task_type": "ner",
    "dataset_name": "my-ner-dataset",
    "labels": ["person", "company", "product"],
    "num_examples": 100,
    "domain_description": "Tech industry news articles"
  }'
Option B: Upload through the platform. If you already have labeled data, upload it directly via the Pioneer dashboard.

Once your dataset is ready, confirm its status before starting training:
curl https://api.pioneer.ai/felix/datasets/my-ner-dataset \
  -H "X-API-Key: YOUR_API_KEY"
Wait until the dataset status is ready before proceeding.
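A simple polling loop for that check might look like the following. Here `fetch_status` is any zero-argument callable you supply (for example, a wrapper around the GET request above); nothing beyond the "ready" status value comes from the API:

```python
import time

def wait_until_ready(fetch_status, timeout_s=300, interval_s=5):
    """Poll fetch_status() until it returns "ready".

    fetch_status is a zero-argument callable; in practice it would GET
    /felix/datasets/<name> and return the response's status field.
    """
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        if fetch_status() == "ready":
            return True
        time.sleep(interval_s)
    raise TimeoutError("dataset did not reach 'ready' in time")
```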
3. Start a training job

Submit your training job with POST /felix/training-jobs. Set base_model to the GLiNER model you chose in step 1 and training_type to "lora".
curl -X POST https://api.pioneer.ai/felix/training-jobs \
  -H "X-API-Key: YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model_name": "my-ner-model",
    "base_model": "fastino/gliner2-base-v1",
    "datasets": [{"name": "my-ner-dataset"}],
    "training_type": "lora",
    "nr_epochs": 5,
    "learning_rate": 5e-5
  }'
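The same request body can be assembled in Python. The helper name is ours, but every field matches the curl payload above:

```python
def training_job_payload(model_name, base_model, dataset_name,
                         nr_epochs=5, learning_rate=5e-5):
    """Build the POST /felix/training-jobs body for a LoRA run."""
    return {
        "model_name": model_name,
        "base_model": base_model,
        "datasets": [{"name": dataset_name}],
        "training_type": "lora",
        "nr_epochs": nr_epochs,
        "learning_rate": learning_rate,
    }
```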
The response includes your job ID and initial status:
{ "id": "uuid-of-training-job", "status": "requested" }
Save the id — you’ll use it to poll status, run evaluations, and call inference.
4. Poll job status and review metrics

Training typically takes a few minutes to a few hours depending on dataset size and epoch count. Poll the job endpoint until status is "complete".
curl https://api.pioneer.ai/felix/training-jobs/YOUR_JOB_ID \
  -H "X-API-Key: YOUR_API_KEY"
Job status values: requested → running → complete (or failed / stopped).

When the job reaches "complete", the response includes evaluation metrics:
{
  "id": "YOUR_JOB_ID",
  "status": "complete",
  "metrics": {
    "f1": 0.94,
    "precision": 0.96,
    "recall": 0.92
  }
}
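Polling until a terminal status can be sketched the same way. The terminal set comes from the status values listed above; `fetch_job` is a stand-in for the GET request:

```python
import time

TERMINAL_STATUSES = {"complete", "failed", "stopped"}

def wait_for_job(fetch_job, timeout_s=4 * 3600, interval_s=30):
    """Poll fetch_job() (a wrapper around GET /felix/training-jobs/<id>)
    until the job reaches a terminal status, then return the job dict."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        job = fetch_job()
        if job["status"] in TERMINAL_STATUSES:
            return job
        time.sleep(interval_s)
    raise TimeoutError("training job still running after timeout")
```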
A high F1 score (above 0.85) generally indicates a model ready for production. If scores are lower, consider adding more training examples or adjusting your entity label definitions.
5. Run an evaluation

Evaluate your trained model against a held-out dataset to get a more rigorous view of performance before deploying.
curl -X POST https://api.pioneer.ai/felix/evaluations \
  -H "X-API-Key: YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "base_model": "YOUR_JOB_ID",
    "dataset_name": "my-eval-dataset"
  }'
Retrieve evaluation results with GET /felix/evaluations/:id. Results include f1, precision, recall, and a per_entity breakdown so you can see which entity types need more training data.
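To act on that breakdown, you can rank labels by F1 and target the weakest ones with more training data. The exact shape of per_entity isn't documented here, so this sketch assumes a mapping from label to a dict with an f1 field:

```python
def weakest_entities(per_entity, k=3):
    """Return the k entity labels with the lowest F1 scores.

    per_entity is assumed to look like {"person": {"f1": 0.91, ...}, ...};
    adjust the key access if the real response shape differs.
    """
    ranked = sorted(per_entity.items(), key=lambda item: item[1]["f1"])
    return [label for label, _ in ranked[:k]]
```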
6. Run inference with your trained model

Use your job ID as the model_id to run predictions. The schema field controls what Pioneer extracts.
curl -X POST https://api.pioneer.ai/inference \
  -H "X-API-Key: YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model_id": "YOUR_JOB_ID",
    "text": "Apple announced the MacBook Pro at WWDC in Cupertino.",
    "schema": {
      "entities": ["organization", "product", "event", "location"]
    },
    "threshold": 0.5
  }'
Schema options

The schema field accepts four optional keys; use any combination:
| Key | Type | Description |
| --- | --- | --- |
| entities | string[] | Entity type labels to extract (NER) |
| classifications | {task, labels}[] | Text classification tasks with their label sets |
| structures | object | Structure definitions for JSON extraction |
| relations | object[] | Relation definitions between entities |
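Since all four keys are optional, a small builder that drops empty ones keeps request bodies tidy. The function is illustrative, but the key names match the table above:

```python
def build_schema(entities=None, classifications=None,
                 structures=None, relations=None):
    """Assemble a schema dict from the four optional keys,
    omitting any that were not provided."""
    candidates = {
        "entities": entities,
        "classifications": classifications,
        "structures": structures,
        "relations": relations,
    }
    return {key: value for key, value in candidates.items() if value}
```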
You can also call inference using the OpenAI-compatible endpoint. Set base_url to https://api.pioneer.ai/v1 and pass Pioneer fields via extra_body:
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_API_KEY",
    base_url="https://api.pioneer.ai/v1"
)

response = client.chat.completions.create(
    model="YOUR_JOB_ID",
    messages=[{
        "role": "user",
        "content": "Apple announced the MacBook Pro at WWDC in Cupertino."
    }],
    extra_body={
        "schema": {
            "entities": ["organization", "product", "event", "location"]
        }
    }
)
The threshold parameter controls the confidence cutoff for returned entities. The default is 0.5. Lower it (e.g., 0.3) to surface more candidates at the cost of more false positives; raise it (e.g., 0.7) for higher-precision results with fewer extractions.
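One practical pattern while tuning is to request once with a low server-side threshold and re-filter client-side at different cutoffs. This sketch assumes each returned entity carries a "score" field, which may differ from the actual response shape:

```python
def filter_by_threshold(entities, threshold=0.5):
    """Keep only entities whose confidence meets the cutoff.

    Assumes each entity dict has a "score" field; request with a low
    server-side threshold once, then compare cutoffs locally.
    """
    return [e for e in entities if e.get("score", 0.0) >= threshold]
```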

Next steps