doteb
Features

Caching

Overview of the two types of caching available in doteb.

Caching

doteb supports two distinct kinds of caching, and they solve different problems. Pick the one that matches your workload — they can also be used together.

Provider / Model Caching

The provider performs the caching. When your request reuses a long prefix from a previous call (a system prompt, conversation history, tool definitions, a long document), the model serves that prefix from its prompt cache and bills it at a reduced rate. New input tokens and all output tokens are still billed at the normal rate — only the cached portion is discounted.

This is the type of caching that powers efficient chat-based and assistant-based interactions, including chat apps and coding tools (Cursor, Cline, Claude Code, etc.) where the same context is reused turn after turn.

You see it in your usage as prompt_tokens_details.cached_tokens. For most providers it works automatically; some (notably Anthropic) also let you mark blocks explicitly with cache_control and choose a longer TTL.

Read the Provider Cache Control docs

Gateway Caching

doteb performs the caching. When a request is byte-identical to a previous one (same model, same messages, same parameters), the response is served from the gateway's cache without any provider call. Repeated identical calls cost $0.

This is most useful for deterministic API workloads — classification, batch jobs, FAQ lookups, retries — rather than free-form chat, because chat prompts almost always differ on the latest turn.

Read the Gateway Caching docs

Which one do I want?

If you…Use
Build a chat app, assistant, or coding toolProvider Cache Control
Send long system prompts or growing conversation historyProvider Cache Control
Want longer cache lifetimes than the provider defaultProvider Cache Control (explicit cache_control)
Send the exact same request many times (batches, retries, FAQs)Gateway Caching
Want $0 on repeated calls instead of a discountGateway Caching

The two are not mutually exclusive. A coding tool can rely on provider caching for its long system prompt and enable gateway caching so that deterministic tool calls (e.g., file lookups) cost nothing on retry.

How is this guide?

Last updated on

On this page

Ready for production?

Ship to production with SSO, audit logs, spend controls, and guardrails your security team will approve.

Explore Enterprise