## Use Case Comparison
| Use Case | Budget Pick | Performance Pick | Cost Estimate |
|---|---|---|---|
| Serve 8B model (quantized) | RTX 4060 Ti 16GB | L40S | $50-$500/mo |
| Serve 70B model (quantized) | 1x A100 80GB | 1x H200 | $1,000-$7,000/mo |
| QLoRA fine-tune 8B | RTX 4070 12GB | A100 40GB | $5-$50 per run |
| QLoRA fine-tune 70B | 1x A100 80GB | 1x H100 | $50-$300 per run |
| Pretrain 1B model | 8x A100 80GB | 8x H100 | $5,000-$20,000 per run |
| Pretrain 7B model | 32x H100 | 64x H100 | $100,000-$500,000 per run |
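The serving rows in the table follow a simple rule of thumb: a quantized model needs roughly `params × bits / 8` gigabytes for weights, plus headroom for the KV cache and runtime buffers. A minimal sketch, where the function name and the 20% overhead factor are assumptions for illustration rather than measured values:

```python
def estimate_serving_vram_gb(params_billion, bits=4, overhead=1.2):
    """Rough VRAM estimate for serving a quantized model.

    Assumptions (illustrative, not measured): quantized weights at
    `bits` per parameter, plus ~20% headroom for KV cache,
    activations, and runtime buffers.
    """
    weight_gb = params_billion * bits / 8  # 1B params at 8-bit ~= 1 GB
    return weight_gb * overhead

# An 8B model at 4-bit comes out around 4.8 GB, comfortably inside
# a 16 GB card; a 70B model at 4-bit is around 42 GB, which is why
# the table pairs it with an 80 GB A100.
print(estimate_serving_vram_gb(8))
print(estimate_serving_vram_gb(70))
```

These figures line up with the budget picks above; real deployments need extra margin for long contexts and batch serving, so treat the output as a floor, not a ceiling.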
## Consumer GPUs Are Viable
For inference of quantized models up to ~14B parameters, and for QLoRA fine-tuning of 7B-8B models, consumer GPUs (RTX 4070/4080/4090) are a cost-effective choice. The RTX 4090 with 24 GB of VRAM remains one of the best price-performance options for individual researchers and small teams.
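The reason QLoRA fits on consumer cards is that only the small LoRA adapters carry gradients and optimizer state; the 4-bit base model is frozen. A back-of-the-envelope sketch of that accounting, where the function name, the ~1% adapter fraction, and the flat activation allowance are all assumptions for illustration:

```python
def estimate_qlora_vram_gb(params_billion, lora_fraction=0.01,
                           activation_gb=2.0):
    """Back-of-the-envelope QLoRA training memory.

    Assumptions (illustrative, not measured): 4-bit frozen base
    weights, bf16 LoRA adapters sized at ~1% of the base model,
    Adam optimizer states and gradients only for the adapters,
    and a flat allowance for activations and buffers.
    """
    base = params_billion * 0.5                    # 4-bit frozen weights
    adapters = params_billion * lora_fraction * 2  # bf16 LoRA weights
    optimizer = adapters * 2                       # Adam m and v, adapters only
    gradients = adapters                           # gradients, adapters only
    return base + adapters + optimizer + gradients + activation_gb

# Under these assumptions an 8B QLoRA run lands well under 12 GB,
# and a 70B run under 80 GB -- consistent with the budget picks
# in the table above.
print(estimate_qlora_vram_gb(8))
print(estimate_qlora_vram_gb(70))
```

The dominant term is the frozen 4-bit base model; sequence length and batch size push the activation term up in practice, which is the usual reason a run that "should fit" spills over.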