Appendices
Appendix G: GPU Hardware and Cloud Compute

Quick Reference: Common Configurations

Use Case Comparison
Use Case                      Budget Pick        Performance Pick   Monthly Cost Estimate
Serve 8B model (quantized)    RTX 4060 Ti 16GB   L40S               $50-$500
Serve 70B model (quantized)   1x A100 80GB       1x H200            $1,000-$7,000
QLoRA fine-tune 8B            RTX 4070 12GB      A100 40GB          $5-$50 per run
QLoRA fine-tune 70B           1x A100 80GB       1x H100            $50-$300 per run
Pretrain 1B model             8x A100 80GB       8x H100            $5,000-$20,000
Pretrain 7B model             32x H100           64x H100           $100,000-$500,000
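
Cost ranges like those above ultimately come from an hourly rental rate multiplied by GPU count and hours used. The sketch below shows that arithmetic only. The function name, the 30-day month, the utilization parameter, and the $2.00/hour rate are illustrative assumptions, not figures from this appendix or quotes from any provider.

```python
HOURS_PER_MONTH = 24 * 30  # assumption: a 30-day month


def monthly_cost(hourly_rate_usd: float, num_gpus: int = 1,
                 utilization: float = 1.0) -> float:
    """Estimate monthly rental spend in USD.

    hourly_rate_usd: per-GPU hourly price (hypothetical; check your provider).
    utilization: fraction of the month the GPUs are actually rented (0.0 to 1.0).
    """
    if not 0.0 <= utilization <= 1.0:
        raise ValueError("utilization must be between 0.0 and 1.0")
    return hourly_rate_usd * num_gpus * HOURS_PER_MONTH * utilization


if __name__ == "__main__":
    # Hypothetical $2.00/hour rate, one GPU running all month.
    print(monthly_cost(2.00))                               # 1440.0
    # Same hypothetical rate, eight GPUs rented half the time.
    print(monthly_cost(2.00, num_gpus=8, utilization=0.5))  # 5760.0
```

Substituting a real quoted rate and an honest utilization figure gives a quick sanity check on whether a configuration lands inside the ranges listed in the table.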
Consumer GPUs Are Viable

For inference of quantized models up to ~14B parameters, and for QLoRA fine-tuning of 7B-8B models, consumer GPUs (RTX 4070/4080/4090) are a cost-effective choice. The RTX 4090 with 24 GB of VRAM remains one of the best price-performance options for individual researchers and small teams.
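
As a rough way to check whether a quantized model fits on a given card, the weight footprint can be approximated as parameter count times bits per weight, divided by eight. The sketch below applies that rule of thumb. The function names and the 1.2x overhead factor (a stand-in for KV cache, activations, and runtime buffers) are assumptions for illustration, and real usage varies with context length, batch size, and framework.

```python
def weight_memory_gb(params_billion: float, bits_per_weight: int) -> float:
    """Approximate weight storage in GB (1 GB = 1e9 bytes).

    1e9 parameters at 8 bits each occupy 1e9 bytes, so the billions cancel.
    """
    return params_billion * bits_per_weight / 8


def fits_in_vram(params_billion: float, bits_per_weight: int,
                 vram_gb: float, overhead_factor: float = 1.2) -> bool:
    """Return True if weights plus an assumed overhead fit in vram_gb.

    overhead_factor is a hypothetical allowance, not a measured value.
    """
    needed_gb = weight_memory_gb(params_billion, bits_per_weight) * overhead_factor
    return needed_gb <= vram_gb


if __name__ == "__main__":
    print(weight_memory_gb(14, 4))   # 7.0  (14B parameters at 4-bit)
    print(fits_in_vram(14, 4, 24))   # True  on a 24 GB card
    print(fits_in_vram(14, 16, 24))  # False: 16-bit weights alone are 28 GB
    print(fits_in_vram(70, 4, 24))   # False: about 35 GB of weights
```

Under these assumptions a 4-bit 14B model needs roughly 7 GB for weights, which is consistent with the ~14B ceiling for consumer cards given above, while a 4-bit 70B model exceeds 24 GB and calls for a data-center GPU.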