The Pioneer API enforces two layers of rate limits: a global default applied per client IP address, and per-endpoint limits applied per authenticated user. Per-endpoint limits stack on top of the global default — exceeding either limit results in a 429 Too Many Requests response.

Limits by endpoint

Endpoint                                                                  | Scope         | Limit
All endpoints (default)                                                   | Per client IP | 1,000 / min · 10,000 / hour
POST /inference                                                           | Per user      | 1,200 / min
POST /v1/chat/completions, /v1/completions, /v1/responses, /v1/messages   | Per user      | 200 / min
POST /gliner-2/*                                                          | Per user      | 15,000 / min
POST /generate/*                                                          | Per user      | 120 / min
POST /felix/training-jobs                                                 | Per user      | 20 / min
Pro and Research plan subscribers have higher per-user rate limits than those shown above for the inference endpoints. Upgrade your plan at pioneer.ai/billing to unlock higher limits.
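Because the per-IP default and the per-user endpoint limits apply independently, the effective per-minute ceiling for a single user on a single IP is the smaller of the two. A minimal sketch using the per-minute numbers from the table above (the helper itself is illustrative, not part of the API):

```python
# Per-minute limits, taken from the table above.
GLOBAL_PER_IP_PER_MIN = 1_000  # default limit, applies to every request from one IP
PER_USER_PER_MIN = {
    "POST /inference": 1_200,
    "POST /v1/chat/completions": 200,
    "POST /gliner-2/*": 15_000,
    "POST /generate/*": 120,
    "POST /felix/training-jobs": 20,
}

def effective_limit(endpoint: str) -> int:
    """Requests/min one user on one IP can sustain against a single endpoint."""
    per_user = PER_USER_PER_MIN.get(endpoint, GLOBAL_PER_IP_PER_MIN)
    return min(GLOBAL_PER_IP_PER_MIN, per_user)

# /inference allows 1,200/min per user, but the per-IP default caps traffic
# from a single IP at 1,000/min.
print(effective_limit("POST /inference"))  # 1000
```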

Handling 429 responses

When you exceed a limit, the API returns 429 Too Many Requests and includes a Retry-After header that tells you how many seconds to wait before retrying.
HTTP
HTTP/2 429
retry-after: 3
content-type: application/json

{
  "error": "rate_limit_exceeded",
  "message": "Too many requests. Retry after 3 seconds."
}
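The Retry-After value in the example above is delta-seconds, but the HTTP specification also allows an HTTP-date. If you want to handle both forms defensively, a small parser like the following works (this helper is illustrative, not part of any Pioneer SDK):

```python
import email.utils
import time

def parse_retry_after(value, default=1.0):
    """Return seconds to wait, accepting delta-seconds ("3") or an HTTP-date."""
    if value is None:
        return default
    try:
        return max(0.0, float(value))  # delta-seconds form, e.g. "3"
    except ValueError:
        pass
    try:
        parsed = email.utils.parsedate_to_datetime(value)  # HTTP-date form
    except (TypeError, ValueError):
        return default
    return max(0.0, parsed.timestamp() - time.time())
```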
The following pattern handles 429 responses with a simple sleep-and-retry loop:
Python
import time
import requests

def call_with_retry(url, headers, payload, max_retries=5):
    """POST with a sleep-and-retry loop that honors the Retry-After header."""
    for attempt in range(max_retries):
        response = requests.post(url, headers=headers, json=payload)

        if response.status_code == 429:
            # Wait the number of seconds the server asked for (default 1s).
            retry_after = int(response.headers.get("Retry-After", 1))
            print(f"Rate limited. Retrying in {retry_after}s...")
            time.sleep(retry_after)
            continue

        response.raise_for_status()  # surface non-429 errors immediately
        return response.json()

    raise RuntimeError("Max retries exceeded.")
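If the server ever omits Retry-After, a common refinement is to fall back to exponential backoff with jitter rather than a fixed 1-second wait. A sketch of just the delay schedule (the base and cap values are illustrative choices, not documented Pioneer behavior):

```python
import random

def backoff_delay(attempt, base=1.0, cap=30.0):
    """Full-jitter backoff: a random delay in [0, min(cap, base * 2**attempt)]."""
    return random.uniform(0.0, min(cap, base * (2 ** attempt)))
```

In call_with_retry, you could read the header with response.headers.get("Retry-After") and, when it is missing, sleep for backoff_delay(attempt) instead.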

Requesting higher limits

If the default or Pro-tier limits don’t fit your workload, contact the Pioneer enterprise team to discuss a custom plan.