## Use Case Comparison
| Use Case | Budget Pick | Performance Pick | Cost Estimate |
|---|---|---|---|
| Serve 8B model (quantized) | RTX 4060 Ti 16GB | L40S | $50-$500/mo |
| Serve 70B model (quantized) | 1x A100 80GB | 1x H200 | $1,000-$7,000/mo |
| QLoRA fine-tune 8B | RTX 4070 12GB | A100 40GB | $5-$50 per run |
| QLoRA fine-tune 70B | 1x A100 80GB | 1x H100 | $50-$300 per run |
| Pretrain 1B model | 8x A100 80GB | 8x H100 | $5,000-$20,000 per run |
| Pretrain 7B model | 32x H100 | 64x H100 | $100,000-$500,000 per run |
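The serving rows in the table follow a simple rule of thumb: a quantized model needs roughly `params × bits / 8` gigabytes for weights, plus headroom for the KV cache and runtime buffers. A minimal sketch, where the function name and the 20% overhead factor are assumptions for illustration rather than measured values:

```python
def estimate_serving_vram_gb(params_billion, bits=4, overhead=1.2):
    """Rough VRAM estimate for serving a quantized model.

    Assumptions (illustrative, not measured): quantized weights at
    `bits` per parameter, plus ~20% headroom for KV cache,
    activations, and runtime buffers.
    """
    weight_gb = params_billion * bits / 8  # 1B params at 8-bit ~= 1 GB
    return weight_gb * overhead

# An 8B model at 4-bit comes out around 4.8 GB, comfortably inside
# a 16 GB card; a 70B model at 4-bit is around 42 GB, which is why
# the table pairs it with an 80 GB A100.
print(estimate_serving_vram_gb(8))
print(estimate_serving_vram_gb(70))
```

These figures line up with the budget picks above; real deployments need extra margin for long contexts and batch serving, so treat the output as a floor, not a ceiling.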
## Consumer GPUs Are Viable
For inference of quantized models up to ~14B parameters, and for QLoRA fine-tuning of 7B-8B models, consumer GPUs (RTX 4070/4080/4090) are a cost-effective choice. The RTX 4090 with 24 GB of VRAM remains one of the best price-performance options for individual researchers and small teams.
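The reason QLoRA fits on consumer cards is that only the small LoRA adapters carry gradients and optimizer state; the 4-bit base model is frozen. A back-of-the-envelope sketch of that accounting, where the function name, the ~1% adapter fraction, and the flat activation allowance are all assumptions for illustration:

```python
def estimate_qlora_vram_gb(params_billion, lora_fraction=0.01,
                           activation_gb=2.0):
    """Back-of-the-envelope QLoRA training memory.

    Assumptions (illustrative, not measured): 4-bit frozen base
    weights, bf16 LoRA adapters sized at ~1% of the base model,
    Adam optimizer states and gradients only for the adapters,
    and a flat allowance for activations and buffers.
    """
    base = params_billion * 0.5                    # 4-bit frozen weights
    adapters = params_billion * lora_fraction * 2  # bf16 LoRA weights
    optimizer = adapters * 2                       # Adam m and v, adapters only
    gradients = adapters                           # gradients, adapters only
    return base + adapters + optimizer + gradients + activation_gb

# Under these assumptions an 8B QLoRA run lands well under 12 GB,
# and a 70B run under 80 GB -- consistent with the budget picks
# in the table above.
print(estimate_qlora_vram_gb(8))
print(estimate_qlora_vram_gb(70))
```

The dominant term is the frozen 4-bit base model; sequence length and batch size push the activation term up in practice, which is the usual reason a run that "should fit" spills over.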