Fine-tuning has a reputation problem: people assume it costs what pre-training costs. It doesn't. A LoRA fine-tune of a 7B model is a single-digit-dollar job if you rent the right GPU, and the right GPU is almost never the expensive one.
What a fine-tune actually needs
With LoRA you're not updating the whole model: the base weights stay frozen and only small adapter matrices are trained. QLoRA also loads that frozen base in 4-bit precision. That collapses the VRAM requirement:
- 7B model, QLoRA: fits on a single 24GB card
- 13B model, QLoRA: comfortable on a 48GB card
- 70B model, QLoRA: a single 80GB card, or 2x48GB
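A back-of-the-envelope check on why those sizes fit: 4-bit weights take about half a byte per parameter. The sketch below computes only the weight footprint and treats the rest of the card as headroom for adapters, optimizer state, and activations, which vary with batch size and sequence length. The function names are illustrative, and the card sizes are the ones from the list above.

```python
def weights_gb(params_billion, bits=4):
    """Approximate size of the quantized base weights in GB (1 GB = 1e9 bytes)."""
    return params_billion * bits / 8


def headroom_gb(params_billion, card_vram_gb, bits=4):
    """VRAM left over after loading the weights. Adapters, optimizer state,
    and activations must fit here; this sketch does not estimate them."""
    return card_vram_gb - weights_gb(params_billion, bits)


# Model size (billions of parameters) paired with the card size from the list.
for size, card in [(7, 24), (13, 48), (70, 80)]:
    print(f"{size}B: ~{weights_gb(size)}GB of weights, "
          f"~{headroom_gb(size, card)}GB headroom on a {card}GB card")
```

Treat this as a lower bound, not a guarantee: long sequences or large batches can eat the headroom quickly.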
Typical training time for a 10-50K example dataset on a 7B or 13B model: 2 to 6 hours. A 70B run takes longer; the estimate below assumes 8. So the cost formula is simply hourly rate x hours, and here are the hourly rates we track.
The price board
| Card | VRAM | Cheapest tracked | Also available |
|---|---|---|---|
| RTX 3090 | 24GB | Vast.ai - $0.30/hr | - |
| RTX 4090 | 24GB | Vast.ai - $0.55/hr | RunPod $0.74/hr |
| L40S | 48GB | RunPod - $0.79/hr | Vast.ai $1.20/hr |
| A100 | 80GB | Paperspace - $1.15/hr | RunPod $1.64/hr, Lambda $1.99/hr |
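If you'd rather query the board than read it, here is a minimal sketch that loads the table above as plain data and picks the cheapest tracked offer for a given VRAM requirement. The `OFFERS` list and `cheapest_offer` function are illustrative names, not part of any provider's API, and the prices are a snapshot that will drift.

```python
# Price board from the table above: (card, vram_gb, provider, usd_per_hour).
OFFERS = [
    ("RTX 3090", 24, "Vast.ai", 0.30),
    ("RTX 4090", 24, "Vast.ai", 0.55),
    ("RTX 4090", 24, "RunPod", 0.74),
    ("L40S", 48, "RunPod", 0.79),
    ("L40S", 48, "Vast.ai", 1.20),
    ("A100", 80, "Paperspace", 1.15),
    ("A100", 80, "RunPod", 1.64),
    ("A100", 80, "Lambda", 1.99),
]


def cheapest_offer(min_vram_gb, offers=OFFERS):
    """Return the lowest-priced offer with at least min_vram_gb of VRAM."""
    candidates = [o for o in offers if o[1] >= min_vram_gb]
    if not candidates:
        raise ValueError(f"no tracked card has {min_vram_gb}GB of VRAM")
    return min(candidates, key=lambda o: o[3])


for need in (24, 48, 80):
    card, vram, provider, rate = cheapest_offer(need)
    print(f"{need}GB needed: {card} on {provider} at ${rate:.2f}/hr")
```

Cheapest is not always the right pick: the sketch ignores speed and reliability, which matter once a deadline is involved.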
Per-job math
- 7B on a Vast RTX 3090: 4 hours x $0.30 = $1.20. Yes, really.
- 7B on a RunPod 4090 (faster, more reliable queue): 3 hours x $0.74 = $2.22
- 13B on a RunPod L40S: 5 hours x $0.79 = $3.95
- 70B QLoRA on a Paperspace A100 80GB: 8 hours x $1.15 = $9.20
Add a dollar or two for storage and failed runs and the honest range is $5-15 per serious fine-tuning job, including the experiments that don't work. Budget for 3-5 runs; nobody's first hyperparameters are right.
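The arithmetic above is simple enough to script. This sketch reproduces the four per-job figures and then budgets a multi-run project. The function names are illustrative, and the flat `overhead` of $2 is an assumption standing in for "a dollar or two" of storage and failed runs.

```python
def job_cost(rate_per_hour, hours):
    """Cost of one training run: hourly rate x hours, rounded to cents."""
    return round(rate_per_hour * hours, 2)


def project_budget(rate_per_hour, hours, runs, overhead=2.0):
    """Cost of several runs plus a flat allowance for storage and failed runs.

    overhead=2.0 is an assumed figure, not a tracked price.
    """
    return round(job_cost(rate_per_hour, hours) * runs + overhead, 2)


# The four examples from the per-job list: (label, $/hr, hours).
JOBS = [
    ("7B on a Vast RTX 3090", 0.30, 4),
    ("7B on a RunPod 4090", 0.74, 3),
    ("13B on a RunPod L40S", 0.79, 5),
    ("70B on a Paperspace A100 80GB", 1.15, 8),
]

for label, rate, hours in JOBS:
    print(f"{label}: ${job_cost(rate, hours):.2f}")

# Budgeting 3 to 5 runs of the RunPod 4090 job.
low = project_budget(0.74, 3, runs=3)
high = project_budget(0.74, 3, runs=5)
print(f"3-5 runs on a RunPod 4090: ${low:.2f} to ${high:.2f}")
```

Swap in your own rate, hours, and run count; the point is that the run count moves the total far more than the choice between the cheaper cards does.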
Marketplace vs dedicated: the reliability trade
Vast.ai is a marketplace of other people's hardware - usually the cheapest rates on consumer cards like the 3090 and 4090, but machines can be interrupted and quality varies. Fine for experiments with checkpointing. RunPod's Secure Cloud and Lambda's dedicated instances typically cost more per hour and behave predictably - the right call for a job you need finished by Friday.
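Checkpointing is what makes an interruptible machine usable, so here is a minimal sketch of the pattern: save progress every few steps, and on restart pick up from the last save. Everything in it is hypothetical scaffolding. The "training step" is a placeholder, and in a real run the saved state would be adapter weights and optimizer state written by your training framework, not a step counter in a JSON file.

```python
import json
import os
import tempfile


def save_checkpoint(path, state):
    """Write state to a temp file, then rename it into place, so an
    interruption mid-write cannot leave a corrupt checkpoint."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as f:
        json.dump(state, f)
    os.replace(tmp_path, path)


def load_checkpoint(path):
    """Return the last saved state, or a fresh one if no checkpoint exists."""
    if os.path.exists(path):
        with open(path) as f:
            return json.load(f)
    return {"step": 0}


def train(total_steps, ckpt_path, save_every=5, interrupt_at=None):
    """Resume from the last checkpoint and run to total_steps.

    interrupt_at simulates the host disappearing at that step.
    Returns (step resumed from, last step completed).
    """
    state = load_checkpoint(ckpt_path)
    start = state["step"]
    for step in range(start + 1, total_steps + 1):
        if interrupt_at is not None and step == interrupt_at:
            raise RuntimeError("simulated interruption")
        # Placeholder for one real optimizer step.
        state["step"] = step
        if step % save_every == 0 or step == total_steps:
            save_checkpoint(ckpt_path, state)
    return start, state["step"]
```

The trade-off is in `save_every`: a shorter interval loses less work per interruption but spends more time writing to disk. Keep the checkpoint on storage that survives the instance, or the resume has nothing to load.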
When you shouldn't fine-tune at all
The most expensive fine-tune is the unnecessary one. Skip training entirely when:
- You want the model to know your data. That's retrieval (RAG), not fine-tuning. Stuff the relevant documents into the prompt of a long-context model from the LLM API marketplace. Zero training, better factual accuracy.
- You want a specific tone or format. Try a detailed system prompt with examples first. It solves this more often than people expect, for free.
- You have fewer than ~1,000 quality examples. Below that, prompting a frontier model usually beats a fine-tuned small model.
Fine-tune when you need consistent behavior at scale on a cheap model: classification with your labels, your company's exact output format, a narrow skill repeated millions of times.
Prices as tracked on June 8, 2026. Spot deals move fast - the GPU marketplace board is re-checked twice a week.
Tell the AI stack planner what you're training and it will pick the card, the provider, and estimate the job cost for your dataset size.