Per-token LLM pricing spans more than two orders of magnitude in 2026: input prices run from $0.05 per million tokens for Groq's Llama 3.1 8B to $15.00 for OpenAI's o1, a 300× difference. Most teams overpay not because frontier models are expensive, but because they use a frontier model for work a $0.15 model handles fine.
CloudMart tracks token pricing across 8 API providers, refreshed twice weekly. Here's the full board, the math for three realistic workloads, and the two pricing levers (caching and batching) most people ignore.
The full pricing board
All prices are USD per 1 million tokens, as tracked on June 12, 2026:
| Model | Input | Output | Cached input |
|---|---|---|---|
| Frontier tier | | | |
| OpenAI o1 | $15.00 | $60.00 | $7.50 |
| Claude Opus 4.6 | $5.00 | $25.00 | $0.50 |
| GPT-4.1 | $3.00 | $12.00 | $0.75 |
| Claude Sonnet 4.6 | $3.00 | $15.00 | $0.30 |
| Grok 4 | $3.00 | $15.00 | $0.75 |
| Mid tier | | | |
| GPT-4o | $2.50 | $10.00 | $1.25 |
| o3-mini | $1.10 | $4.40 | $0.55 |
| Claude Haiku 4.5 | $1.00 | $5.00 | $0.10 |
| GPT-4.1-mini | $0.80 | $3.20 | $0.20 |
| DeepSeek R1 (reasoner) | $0.55 | $2.19 | $0.14 |
| Budget tier | | | |
| Grok 4.1 Fast | $0.20 | $0.50 | — |
| GPT-4o-mini | $0.15 | $0.60 | $0.075 |
| DeepSeek V3 | $0.14 | $0.28 | $0.014 |
| Mistral Small 3.1 | $0.10 | $0.30 | — |
| Gemini 2.5 Flash | $0.075 | $0.30 | $0.0075 |
| Groq Llama 3.1 8B | $0.05 | $0.08 | — |
These move constantly — the live API marketplace always has current numbers, including Together AI's open-source catalog (Llama 3.3 70B at $0.88/M flat).
What three real workloads cost per month
1. Support chatbot — 10M input / 2M output tokens per month
| Model | Monthly cost |
|---|---|
| Gemini 2.5 Flash | $1.35 |
| DeepSeek V3 | $1.96 |
| GPT-4o-mini | $2.70 |
| Claude Haiku 4.5 | $20.00 |
| GPT-4o | $45.00 |
| Claude Sonnet 4.6 | $60.00 |
For classification, routing, and FAQ-style support, the budget tier is genuinely good now. Most teams running GPT-4o here are paying a 17–33× premium for quality they don't use.
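The monthly figures above are simple arithmetic: input volume times input price plus output volume times output price. A minimal sketch, assuming the prices from the board; the `PRICES` dictionary and the `monthly_cost` helper are illustrative names, not part of any provider SDK:

```python
# Prices in USD per 1M tokens as (input, output), copied from the pricing board.
PRICES = {
    "Gemini 2.5 Flash": (0.075, 0.30),
    "DeepSeek V3": (0.14, 0.28),
    "GPT-4o-mini": (0.15, 0.60),
    "Claude Haiku 4.5": (1.00, 5.00),
    "GPT-4o": (2.50, 10.00),
    "Claude Sonnet 4.6": (3.00, 15.00),
}


def monthly_cost(model: str, input_m: float, output_m: float) -> float:
    """Return the monthly cost in USD for volumes given in millions of tokens."""
    price_in, price_out = PRICES[model]
    return input_m * price_in + output_m * price_out


# Support chatbot: 10M input / 2M output tokens per month, cheapest first.
for model in sorted(PRICES, key=lambda m: monthly_cost(m, 10, 2)):
    print(f"{model:<20} ${monthly_cost(model, 10, 2):.2f}")
```

Swapping in your own volumes (for example `monthly_cost("GPT-4o", 50, 5)`) gives a first-pass estimate before caching or batch discounts.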
2. RAG over documents — 50M input / 5M output per month
Input-heavy workloads reward cheap input pricing and big context windows: Gemini 2.5 Flash ≈ $5.25/mo, DeepSeek V3 ≈ $8.40, Claude Haiku 4.5 ≈ $75, Claude Sonnet 4.6 ≈ $225. If your retrieval is good, a mid-tier model reading the right chunks beats a frontier model reading everything.
3. Coding assistant — 20M input / 10M output per month
Output-heavy work flips the math, because output tokens cost roughly 4–5× input on the frontier and mid-tier models above, and 1.6–4× in the budget tier: DeepSeek V3 ≈ $5.60/mo, GPT-4.1-mini ≈ $48, GPT-4.1 ≈ $180, Claude Sonnet 4.6 ≈ $210. Code is where quality differences are most visible, so this is the one workload where paying for Sonnet/GPT-4.1 most often makes sense.
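To see how much the input/output mix matters, it helps to compute what share of the bill comes from output tokens. The sketch below does that for the three workloads on Claude Sonnet 4.6 ($3.00 input, $15.00 output per million); `output_share` is a hypothetical helper name used only for illustration:

```python
def output_share(input_m: float, output_m: float,
                 price_in: float, price_out: float) -> float:
    """Return the fraction of total spend that goes to output tokens."""
    input_cost = input_m * price_in
    output_cost = output_m * price_out
    return output_cost / (input_cost + output_cost)


# Volumes in millions of tokens per month, priced at Sonnet 4.6 rates.
SONNET_IN, SONNET_OUT = 3.00, 15.00
chatbot = output_share(10, 2, SONNET_IN, SONNET_OUT)   # $30 of $60
rag = output_share(50, 5, SONNET_IN, SONNET_OUT)       # $75 of $225
coding = output_share(20, 10, SONNET_IN, SONNET_OUT)   # $150 of $210

for name, share in [("chatbot", chatbot), ("RAG", rag), ("coding", coding)]:
    print(f"{name:<8} output share of spend: {share:.0%}")
```

When output dominates the bill, the output price is the number to compare across models; when input dominates, cached-input pricing matters more.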
The two levers that cut bills 50–90%
Prompt caching
If every request re-sends the same system prompt or document context, cached-input pricing applies to the repeated part. The discounts are not subtle: Claude's cache reads are 90% off ($0.30/M vs $3.00/M on Sonnet), and Gemini 2.5 Flash and DeepSeek V3 are also 90% off. OpenAI's cached rates on the board are 50–75% off. A RAG app with a 5K-token system prompt called 100K times a month re-sends 500M prompt tokens: about $1,500 at Sonnet's full input rate versus about $150 at its cache-read rate.
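Real cache hit rates are rarely 100%, so a rough estimate should blend the two rates. The following is a sketch under stated assumptions: `hit_rate` is a hypothetical parameter you would measure yourself, and the calculation ignores any cache-write surcharge or cache-expiry behavior a provider may apply.

```python
def repeated_prompt_cost(prompt_tokens: int, calls: int,
                         price_in: float, price_cached: float,
                         hit_rate: float = 1.0) -> float:
    """Monthly USD cost of the repeated prompt prefix alone.

    Prices are USD per 1M tokens. hit_rate is the assumed fraction of
    calls served from cache (0.0 to 1.0); cache-write fees are ignored.
    """
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    total_m = prompt_tokens * calls / 1_000_000  # millions of tokens
    cached_m = total_m * hit_rate
    uncached_m = total_m - cached_m
    return uncached_m * price_in + cached_m * price_cached


# 5K-token system prompt, 100K calls/month, Claude Sonnet 4.6 rates.
no_cache = repeated_prompt_cost(5_000, 100_000, 3.00, 0.30, hit_rate=0.0)
partial = repeated_prompt_cost(5_000, 100_000, 3.00, 0.30, hit_rate=0.8)
full = repeated_prompt_cost(5_000, 100_000, 3.00, 0.30, hit_rate=1.0)
print(f"no cache: ${no_cache:,.0f}  80% hits: ${partial:,.0f}  100% hits: ${full:,.0f}")
```

Even at an 80% hit rate, most of the saving survives, which is why caching is worth enabling before considering a cheaper model.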
Batch processing
OpenAI's Batch API runs async jobs at 50% off. If your workload isn't interactive — nightly summarization, bulk classification, embedding pipelines — you should never pay the realtime rate.
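How much batching saves depends on what fraction of the workload can tolerate async processing. A minimal sketch of the blended bill, assuming the 50% discount quoted above; `blended_cost` and the $1,000 example bill are hypothetical and only illustrate the arithmetic:

```python
def blended_cost(realtime_cost: float, batchable_fraction: float,
                 batch_discount: float = 0.50) -> float:
    """Monthly USD cost after moving part of a workload to batch pricing.

    realtime_cost is the bill at realtime rates; batchable_fraction is the
    share of that spend (0.0 to 1.0) that does not need an immediate reply.
    """
    if not 0.0 <= batchable_fraction <= 1.0:
        raise ValueError("batchable_fraction must be between 0 and 1")
    batch_part = realtime_cost * batchable_fraction * (1 - batch_discount)
    realtime_part = realtime_cost * (1 - batchable_fraction)
    return batch_part + realtime_part


# Hypothetical $1,000/month bill where 60% of the spend is nightly jobs.
print(f"${blended_cost(1000, 0.6):,.2f}")
```

A fully non-interactive pipeline (`batchable_fraction=1.0`) gets the whole 50% reduction; an interactive product with a few offline jobs gets proportionally less.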
Quick picks
- High-volume, simple tasks: Gemini 2.5 Flash or DeepSeek V3
- Best quality-per-dollar all-rounder: Claude Sonnet 4.6 or GPT-4.1
- Long documents (200K context): Claude Sonnet/Haiku
- Lowest latency (voice, realtime): Groq
- Open-source, no lock-in: Together AI or Groq
- EU data residency: Mistral
What would your workload cost?
Describe your app and volume — the Planner estimates monthly token costs across providers and picks the right tier.
Related: H100 rental prices compared, for when you'd rather self-host the model than pay per token.