Pioneer API rate limits: per-endpoint request quotas

The Pioneer API enforces two layers of rate limits: a global default applied per client IP address, and per-endpoint limits applied per authenticated user. Per-endpoint limits stack on top of the global default, so a request must stay within both; exceeding either one results in a 429 Too Many Requests response.
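To make the stacking concrete, here is a minimal client-side sketch that admits a request only when every applicable window still has room. The `SlidingWindow` class and `try_acquire` helper are hypothetical names. The sliding-window algorithm is also an assumption: this page doesn't say whether Pioneer counts requests in fixed or sliding windows, so a client-side mirror like this can only approximate the server's behavior.

```python
import time
from collections import deque


class SlidingWindow:
    """Counts accepted requests in the trailing `window_s` seconds, capped at `limit`.

    Hypothetical helper; the server's real counting algorithm is not documented here.
    """

    def __init__(self, limit, window_s):
        self.limit = limit
        self.window_s = window_s
        self.events = deque()  # timestamps of accepted requests, oldest first

    def _evict(self, now):
        # Drop timestamps that have fallen out of the trailing window.
        while self.events and now - self.events[0] >= self.window_s:
            self.events.popleft()

    def has_capacity(self, now):
        self._evict(now)
        return len(self.events) < self.limit

    def record(self, now):
        self.events.append(now)


def try_acquire(windows, now=None):
    """Admit a request only if EVERY window has room, mirroring stacked limits."""
    now = time.monotonic() if now is None else now
    if not all(w.has_capacity(now) for w in windows):
        return False  # at least one layer is exhausted; the server would return 429
    for w in windows:
        w.record(now)  # count the request against every layer
    return True
```

For `POST /generate/*`, for example, you would pass a 1,000/min window and a 10,000/hour window for the per-IP layer plus a 120/min window for the per-user layer.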

Limits by endpoint

Endpoint	Scope	Limit
All endpoints (default)	Per client IP	1,000 / min · 10,000 / hour
`POST /inference`	Per user	1,200 / min
`POST /v1/chat/completions`, `/v1/completions`, `/v1/responses`, `/v1/messages`	Per user	200 / min
`POST /gliner-2/*`	Per user	15,000 / min
`POST /generate/*`	Per user	120 / min
`POST /felix/training-jobs`	Per user	20 / min
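If client code needs to look up which per-user limit applies to a given call, the table can be encoded as data. This is a sketch: `PER_USER_LIMITS_PER_MIN`, `PER_IP_LIMITS`, and `per_user_limit` are hypothetical names. The figures are the default values from the table. Treating `*` as a shell-style wildcard via `fnmatch` is an assumption about how the table's paths are matched.

```python
from fnmatch import fnmatchcase

# Per-user, per-minute limits from the table, keyed by (HTTP method, path pattern).
# "*" is matched shell-style (it can span "/"); the server's matching rules are assumed.
PER_USER_LIMITS_PER_MIN = {
    ("POST", "/inference"): 1_200,
    ("POST", "/v1/chat/completions"): 200,
    ("POST", "/v1/completions"): 200,
    ("POST", "/v1/responses"): 200,
    ("POST", "/v1/messages"): 200,
    ("POST", "/gliner-2/*"): 15_000,
    ("POST", "/generate/*"): 120,
    ("POST", "/felix/training-jobs"): 20,
}

# Default per-client-IP limits that apply to every endpoint.
PER_IP_LIMITS = {"per_min": 1_000, "per_hour": 10_000}


def per_user_limit(method, path):
    """Return the per-user requests/min limit, or None if only the per-IP default applies."""
    for (m, pattern), limit in PER_USER_LIMITS_PER_MIN.items():
        if m == method.upper() and fnmatchcase(path, pattern):
            return limit
    return None
```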

For the inference endpoints, Pro and Research plan subscribers get higher per-user limits than those shown above. Upgrade your plan at pioneer.ai/billing to unlock them.

Handling 429 responses

When you exceed a limit, the API returns 429 Too Many Requests and includes a Retry-After header that tells you how many seconds to wait before retrying.

Example 429 response

HTTP/2 429
retry-after: 3
content-type: application/json

{
  "error": "rate_limit_exceeded",
  "message": "Too many requests. Retry after 3 seconds."
}
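The example above expresses `retry-after` as a number of seconds. HTTP also permits the header to carry an absolute HTTP-date, so a defensive client can accept both forms. Below is a minimal sketch using only the Python standard library. `retry_after_seconds` is a hypothetical helper name, and the 1-second fallback is an arbitrary choice.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def retry_after_seconds(value, default=1.0, now=None):
    """Convert a Retry-After header value into a non-negative delay in seconds.

    Accepts delay-seconds ("3") or an HTTP-date ("Wed, 21 Oct 2015 07:28:00 GMT").
    Returns `default` when the header is missing or unparseable.
    """
    if value is None:
        return default
    value = value.strip()
    if value.isdigit():
        return float(value)  # delay-seconds form, as in the example response
    try:
        when = parsedate_to_datetime(value)
    except (TypeError, ValueError):
        return default  # neither an integer nor a valid HTTP-date
    if when.tzinfo is None:
        when = when.replace(tzinfo=timezone.utc)  # "-0000" dates parse as naive
    if now is None:
        now = datetime.now(timezone.utc)
    return max(0.0, (when - now).total_seconds())  # a date in the past means "retry now"
```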

The following pattern handles 429 responses with a simple sleep-and-retry loop:

Python

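import time
import requests

def call_with_retry(url, headers, payload, max_retries=5):
    """POST `payload` as JSON to `url`, retrying on 429 up to max_retries attempts."""
    for attempt in range(max_retries):
        # A client-side timeout (in seconds) keeps a stalled connection from hanging forever.
        response = requests.post(url, headers=headers, json=payload, timeout=30)

        if response.status_code == 429:
            if attempt == max_retries - 1:
                break  # Last attempt was rate limited; stop without sleeping.
            # Retry-After is in seconds; fall back to 1s if it is missing or not a number.
            try:
                retry_after = float(response.headers.get("Retry-After", 1))
            except ValueError:
                retry_after = 1.0
            print(f"Rate limited. Retrying in {retry_after}s...")
            time.sleep(retry_after)
            continue

        # Any other 4xx/5xx raises requests.HTTPError; success returns the parsed JSON body.
        response.raise_for_status()
        return response.json()

    raise RuntimeError(f"Max retries ({max_retries}) exceeded.")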
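Sleeping for exactly the `Retry-After` value works for a single client. When several workers share one per-user quota, however, they tend to wake and retry at the same moment. A common refinement is exponential backoff with random ("full") jitter that never waits less than the server's `Retry-After`. This page does not prescribe that approach. In this sketch, `backoff_delay` and its `base` and `cap` defaults are illustrative assumptions.

```python
import random


def backoff_delay(attempt, retry_after=None, base=0.5, cap=30.0, rng=random.random):
    """Seconds to wait before retry number `attempt` (0-based).

    Full-jitter exponential backoff: a random value in [0, min(cap, base * 2**attempt)).
    If the server supplied Retry-After, the delay is never shorter than that.
    """
    ceiling = min(cap, base * (2 ** attempt))  # exponential growth, bounded by `cap`
    delay = rng() * ceiling  # random jitter spreads out concurrent retries
    if retry_after is not None:
        delay = max(delay, retry_after)  # honor the server's minimum wait
    return delay
```

To use it in `call_with_retry`, you could replace `time.sleep(retry_after)` with `time.sleep(backoff_delay(attempt, retry_after))`.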

Requesting higher limits

If the default or Pro-tier limits don’t fit your workload, contact the Pioneer enterprise team to discuss a custom plan.
