Overview
When you cache content:
- Cache Write: You pay to store content in the cache (first use)
- Cache Read: You pay a discounted rate when reusing cached content
- Storage (Google only): Additional hourly storage costs
Anthropic (Claude)
Anthropic uses a simple multiplier-based pricing model for prompt caching.
Pricing Structure
Operation | Multiplier | Example (Claude Sonnet @ $3/MTok) |
---|---|---|
Cache Read | 0.1× | $0.30/MTok |
Cache Write (5 min) | 1.25× | $3.75/MTok |
Cache Write (1 hour) | 2.0× | $6.00/MTok |
Key Points
- TTL Options: 5 minutes or 1 hour
- Providers: Available on Anthropic API, Vertex AI, and AWS Bedrock
- Limitation: Vertex AI and Bedrock only support 5-minute caching
- Minimum: 1024 tokens for most models
Calculation Example
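As a minimal sketch of how the multipliers combine, the snippet below prices one cache write followed by a number of cache reads, using the Claude Sonnet base price of $3/MTok from the table. The function names, the 100,000-token prompt, and the request count are illustrative assumptions. It also assumes every read lands within the TTL, and it ignores uncached input tokens and output tokens.

```python
# Base price and multipliers come from the pricing table above.
# Token counts and request counts below are hypothetical.
BASE_INPUT_PER_MTOK = 3.00  # Claude Sonnet input price, $/MTok
MULTIPLIERS = {"read": 0.10, "write_5m": 1.25, "write_1h": 2.00}


def anthropic_cache_cost(cached_tokens: int, reads: int, ttl: str = "5m") -> float:
    """Dollar cost of one cache write followed by `reads` cache reads."""
    mtok = cached_tokens / 1_000_000
    write = mtok * BASE_INPUT_PER_MTOK * MULTIPLIERS[f"write_{ttl}"]
    read = mtok * BASE_INPUT_PER_MTOK * MULTIPLIERS["read"] * reads
    return write + read


def uncached_cost(tokens: int, requests: int) -> float:
    """Dollar cost of sending the same tokens at full price every time."""
    return tokens / 1_000_000 * BASE_INPUT_PER_MTOK * requests


if __name__ == "__main__":
    tokens, reads = 100_000, 9  # hypothetical: 10 requests in total
    cached = anthropic_cache_cost(tokens, reads)
    plain = uncached_cost(tokens, reads + 1)
    print(f"cached: ${cached:.3f}  uncached: ${plain:.3f}")
```

Under these assumptions, ten requests over a 100,000-token prompt cost about $0.645 with a 5-minute cache versus $3.00 without caching. Passing `ttl="1h"` applies the 2.0× write multiplier instead.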
Google Gemini
Google uses a multiplier plus storage cost model for context caching.
Pricing Structure
Operation | Multiplier | Storage Cost |
---|---|---|
Cache Read | 0.25× | N/A |
Cache Write | 1.0× | + Storage fee |
Storage fee by model:
- Gemini 2.5 Pro: $4.50/MTok/hour
- Gemini 2.5 Flash: $1.00/MTok/hour
- Gemini 2.5 Flash-Lite: $1.00/MTok/hour
Key Points
- TTL: 5 minutes only
- Cache Types: Implicit (automatic) and Explicit (manual)
- Minimum: 1024 tokens (Flash), 2048 tokens (Pro)
- Discount: 75% off input costs for cache reads
Calculation Example
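For Gemini 2.5 Pro (≤200K tokens), at a base input price of $1.25/MTok:
- Cache Read: 0.25 × $1.25 = $0.3125/MTok (listed as $0.31/MTok)
- Cache Write (5 min): $1.25 + $4.50 × 5/60 = $1.625/MTok
Tiered Pricing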
Gemini 2.5 Pro has different rates for larger contexts:
Context Size | Input Price | Cache Read | Cache Write (5 min) |
---|---|---|---|
≤200K tokens | $1.25/MTok | $0.31/MTok | $1.625/MTok |
>200K tokens | $2.50/MTok | $0.625/MTok | $2.875/MTok |
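The tiered rates in this table can be reproduced from the 1.0× write multiplier, the 0.25× read multiplier, and the $4.50/MTok/hour storage fee listed earlier. The following is a sketch only: the function names are hypothetical, and it assumes the tier is chosen by the size of the context being priced.

```python
# Prices come from the Gemini 2.5 Pro tables above; names are illustrative.
STORAGE_PER_MTOK_HOUR = 4.50  # Gemini 2.5 Pro storage fee, $/MTok/hour
READ_MULTIPLIER = 0.25        # cache reads cost 25% of the input price
TIER_LIMIT_TOKENS = 200_000   # boundary between the two pricing tiers


def input_price_per_mtok(context_tokens: int) -> float:
    """Base input price in $/MTok for the tier this context falls into."""
    return 1.25 if context_tokens <= TIER_LIMIT_TOKENS else 2.50


def cache_write_per_mtok(context_tokens: int, ttl_minutes: float = 5) -> float:
    """Write at 1.0x the input price plus storage for the TTL duration."""
    storage = STORAGE_PER_MTOK_HOUR * ttl_minutes / 60
    return input_price_per_mtok(context_tokens) + storage


def cache_read_per_mtok(context_tokens: int) -> float:
    """Read at the discounted multiplier of the tier's input price."""
    return input_price_per_mtok(context_tokens) * READ_MULTIPLIER


if __name__ == "__main__":
    for size in (100_000, 300_000):  # one example context per tier
        print(size, cache_read_per_mtok(size), cache_write_per_mtok(size))
```

For a context in the lower tier this gives $0.3125/MTok for reads and $1.625/MTok for a 5-minute write; in the upper tier it gives $0.625/MTok and $2.875/MTok, matching the table after rounding.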