Skip to main content

Documentation Index

Fetch the complete documentation index at: https://docs.pioneer.ai/llms.txt

Use this file to discover all available pages before exploring further.

Pioneer reuses repeated prompt prefixes (long system prompts, replayed conversation history) so you pay less and get faster responses on the cached portion. You don’t have to opt in — caching is applied automatically per provider.

What you do

Nothing. Send requests normally. Pioneer handles cache setup for you:
  • GPT (OpenAI) — caches prompt prefixes upstream automatically. No request changes are needed.
  • Opus / Claude (Anthropic) — caches only the prefix before a cache_control breakpoint. Pioneer inserts those breakpoints for you (on the system prompt, and on the latest turn of a multi-turn conversation) once a prompt is large enough to be worth caching.
If you set your own cache_control breakpoints, Pioneer respects them and does not add its own.

How to read it back

Token usage on every response splits input tokens by cache status:
FieldMeaning
prompt_tokensNon-cached input tokens
cache_read_tokensInput tokens served from cache (discounted)
cache_write_tokensInput tokens written to cache this request
completion_tokensOutput tokens
total_tokensSum of the four above
Caching charges are visible in Settings → Credits and can be displayed by model or by individual request.

Billing

Cached input is cheaper than fresh input. Rates are relative to a model’s input price:
ProviderCache readCache write
Claude / Opus0.1× input1.25× input
GPT-4 family0.5× inputbilled at input rate
GPT-5 family0.1× inputbilled at input rate
The first request that populates the cache pays the write rate on those tokens; subsequent requests that hit the cache pay the lower read rate. Caches are short-lived, so the savings come from sending similar prompts close together.
For Anthropic models, the system prompt must be at least 1024 tokens for caching to activate. Cache entries expire after 5 minutes of inactivity.

Tips

  • Keep your system prompt static. Any change to the cached prefix invalidates the cache.
  • Do not inject dynamic content (timestamps, user IDs, session data) into the system prompt, move it to the user message instead.
  • The Inference UI shows input and output token counts per request. Cache token breakdowns and discounted billing are visible on the Settings → Credit page.