LLM API Pricing in 2026: GPT vs Claude vs Gemini vs DeepSeek, Cost Per Million Tokens

LLM APIs June 12, 2026 · 8 min read · by CloudMart

Per-token LLM pricing spans three orders of magnitude in 2026: from Groq's Llama 3.1 8B at $0.05 per million input tokens to OpenAI's o1 at $15.00 — a 300× difference. Most teams overpay not because frontier models are expensive, but because they use a frontier model for work a $0.15 model handles fine.

CloudMart tracks token pricing across 8 API providers, refreshed twice weekly. Here's the full board, the math for three realistic workloads, and the two pricing levers (caching and batching) most people ignore.

The full pricing board

All prices are USD per 1 million tokens, as tracked on June 12, 2026:

Model	Input	Output	Cached input
Frontier tier
OpenAI o1	$15.00	$60.00	$7.50
Claude Opus 4.6	$5.00	$25.00	$0.50
GPT-4.1	$3.00	$12.00	$0.75
Claude Sonnet 4.6	$3.00	$15.00	$0.30
Grok 4	$3.00	$15.00	$0.75
Mid tier
GPT-4o	$2.50	$10.00	$1.25
o3-mini	$1.10	$4.40	$0.55
Claude Haiku 4.5	$1.00	$5.00	$0.10
GPT-4.1-mini	$0.80	$3.20	$0.20
DeepSeek R1 (reasoner)	$0.55	$2.19	$0.14
Budget tier
Grok 4.1 Fast	$0.20	$0.50	—
GPT-4o-mini	$0.15	$0.60	$0.075
DeepSeek V3	$0.14	$0.28	$0.014
Mistral Small 3.1	$0.10	$0.30	—
Gemini 2.5 Flash	$0.075	$0.30	$0.0075
Groq Llama 3.1 8B	$0.05	$0.08	—

These move constantly — the live API marketplace always has current numbers, including Together AI's open-source catalog (Llama 3.3 70B at $0.88/M flat).

What three real workloads cost per month

1. Support chatbot — 10M input / 2M output tokens per month

Model	Monthly cost
Gemini 2.5 Flash	$1.35
DeepSeek V3	$1.96
GPT-4o-mini	$2.70
Claude Haiku 4.5	$20.00
Claude Sonnet 4.6	$60.00
GPT-4o	$45.00

For classification, routing, and FAQ-style support, the budget tier is genuinely good now. Most teams running GPT-4o here are paying a 17–33× premium for quality they don't use.

2. RAG over documents — 50M input / 5M output per month

Input-heavy workloads reward cheap input pricing and big context windows: Gemini 2.5 Flash ≈ $5.25/mo, DeepSeek V3 ≈ $8.40, Claude Haiku 4.5 ≈ $75, Claude Sonnet 4.6 ≈ $225. If your retrieval is good, a mid-tier model reading the right chunks beats a frontier model reading everything.

3. Coding assistant — 20M input / 10M output per month

Output-heavy work flips the math, because output tokens cost 3–5× input: DeepSeek V3 ≈ $5.60/mo, GPT-4.1-mini ≈ $48, Claude Sonnet 4.6 ≈ $210, GPT-4.1 ≈ $180. Code is where quality differences are most visible, so this is the one workload where paying for Sonnet/GPT-4.1 most often makes sense.

The two levers that cut bills 50–90%

Prompt caching

If every request re-sends the same system prompt or document context, cached-input pricing applies to the repeated part. The discounts are not subtle: Claude's cache reads are 90% off ($0.30/M vs $3.00/M on Sonnet), Gemini's are 90% off, DeepSeek's are 90% off. A RAG app with a 5K-token system prompt called 100K times a month saves real money here.

Batch processing

OpenAI's Batch API runs async jobs at 50% off. If your workload isn't interactive — nightly summarization, bulk classification, embedding pipelines — you should never pay the realtime rate.

Quick picks

High-volume, simple tasks: Gemini 2.5 Flash or DeepSeek V3
Best quality-per-dollar all-rounder: Claude Sonnet 4.6 or GPT-4.1
Long documents (200K context): Claude Sonnet/Haiku
Lowest latency (voice, realtime): Groq
Open-source, no lock-in: Together AI or Groq
EU data residency: Mistral

What would your workload cost?

Describe your app and volume — the Planner estimates monthly token costs across providers and picks the right tier.

Estimate my API costs →

Related: H100 rental prices compared — for when you'd rather self-host the model than pay per token.

← All posts

Share: X HN