Cloud GPU pricing is volatile and varies significantly by provider, region, commitment level, and availability. The table below provides approximate on-demand hourly rates as of early 2026. Prices should be treated as rough guidelines; always check current pricing before making decisions.
## GPU Comparison
| GPU | AWS | GCP | Azure | Lambda | RunPod |
|---|---|---|---|---|---|
| A100 80GB (1x) | $4.10/hr (p4d) | $3.67/hr (a2) | $3.67/hr (NC A100) | $1.29/hr | $1.64/hr |
| H100 80GB (1x) | $8.50/hr (p5) | $8.34/hr (a3-mega) | $8.20/hr (NC H100) | $2.49/hr | $3.29/hr |
| H200 141GB (1x) | ~$10.50/hr (p5e) | ~$10.00/hr (a3-ultra) | ~$10.00/hr | $3.49/hr | $4.49/hr |
| L40S 48GB (1x) | $2.80/hr (g6e) | $2.50/hr (g2) | $2.40/hr | $0.99/hr | $0.74/hr |
| 8x H100 cluster | $65.00/hr (p5.48xlarge) | $64.00/hr (a3-mega) | $63.00/hr | $19.92/hr | $26.32/hr |
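Since total cost is just hourly rate × duration, the table translates directly into run budgets. The sketch below estimates the cost of a hypothetical 72-hour training run on an 8x H100 cluster using the on-demand rates above (illustrative figures only, subject to the caveats below):

```python
# On-demand 8x H100 rates from the table above ($/hr, approximate).
rates_per_hour = {
    "AWS p5.48xlarge": 65.00,
    "GCP a3-mega": 64.00,
    "Azure": 63.00,
    "Lambda": 19.92,
    "RunPod": 26.32,
}

hours = 72  # hypothetical training-run length

# Total cost per provider for the same fixed workload.
totals = {provider: rate * hours for provider, rate in rates_per_hour.items()}

for provider, cost in sorted(totals.items(), key=lambda kv: kv[1]):
    print(f"{provider}: ${cost:,.2f}")
```

At these rates the same job spans roughly $1,400 to $4,700, which is why the specialized providers are attractive for batch training despite offering fewer managed services.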
## Pricing Caveats
These figures are approximate on-demand rates. Reserved instances (1-3 year commitments) can reduce costs by 30-60%. Spot/preemptible instances offer 60-80% savings but can be interrupted. GPU-specialized providers like Lambda and RunPod typically offer lower per-GPU rates but fewer managed services. Prices change frequently, so verify before budgeting.
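Spot savings are only worthwhile if they survive the cost of interruptions. A quick back-of-envelope check, assuming a 70% spot discount (the middle of the range above) and an illustrative 10% of compute lost to redone work:

```python
# Does spot still win after accounting for work lost to interruptions?
on_demand = 8.50      # $/hr, single H100 on-demand (from the table above)
spot_discount = 0.70  # 70% off, middle of the 60-80% range
lost_work = 0.10      # assumption: 10% of compute is redone after preemptions

spot_rate = on_demand * (1 - spot_discount)
effective_rate = spot_rate / (1 - lost_work)  # $/hr of *useful* compute
savings = 1 - effective_rate / on_demand

print(f"spot rate: ${spot_rate:.2f}/hr")
print(f"effective rate: ${effective_rate:.2f}/hr")
print(f"net savings vs on-demand: {savings:.0%}")
```

Even with 10% of work redone, the effective savings stay near 67%, which is why spot plus checkpointing (discussed below) is usually the first cost lever to pull.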
## Cost Reduction Strategies
- Spot instances with checkpointing: For training jobs that can tolerate interruptions, spot pricing offers dramatic savings. Implement checkpoint saving every 15-30 minutes to minimize lost work.
- Right-size your GPU: An L40S is often sufficient for inference on quantized models up to ~30B parameters. Reserve H100/H200 for training or large-model serving.
- Time-of-day arbitrage: Some cloud regions have lower demand overnight, potentially improving spot availability.
- Multi-cloud strategy: Use the cheapest available GPU across providers for batch workloads. Tools like SkyPilot automate this.
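The checkpointing strategy above can be sketched without any ML framework: persist the training state atomically at regular intervals, and resume from the last checkpoint on restart. The file name, interval, and "training step" below are all stand-ins; in a real job you would checkpoint model and optimizer state every 15-30 minutes as suggested above.

```python
import os
import pickle

CKPT = "checkpoint.pkl"  # hypothetical checkpoint path

def save_checkpoint(state, path=CKPT):
    # Write to a temp file, then rename: an interruption mid-save
    # cannot leave a corrupt checkpoint behind.
    tmp = path + ".tmp"
    with open(tmp, "wb") as f:
        pickle.dump(state, f)
    os.replace(tmp, path)

def load_checkpoint(path=CKPT):
    if os.path.exists(path):
        with open(path, "rb") as f:
            return pickle.load(f)
    return {"step": 0, "total": 0}  # fresh run

state = load_checkpoint()
for step in range(state["step"], 100):
    state["total"] += step           # stand-in for one training step
    state["step"] = step + 1
    if state["step"] % 25 == 0:      # in practice: every 15-30 minutes
        save_checkpoint(state)
```

If the process is killed and relaunched, `load_checkpoint` picks up from the last saved step, so at most one interval of work is redone.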
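The right-sizing claim is easy to sanity-check with arithmetic: weight memory is parameters × bits-per-weight / 8, plus runtime overhead. The 20% overhead factor below is an assumption standing in for KV cache, activations, and runtime buffers:

```python
# Rough VRAM estimate for serving a quantized 30B-parameter model,
# to check the claim that an L40S (48 GB) is often sufficient.
params = 30e9
bits_per_weight = 4   # e.g. 4-bit quantization
overhead = 1.2        # assumed ~20% for KV cache, activations, runtime

weights_gb = params * bits_per_weight / 8 / 1e9
total_gb = weights_gb * overhead

print(f"weights: {weights_gb:.1f} GB, total: ~{total_gb:.1f} GB")
```

About 15 GB of weights and roughly 18 GB total fits comfortably in an L40S's 48 GB, whereas an fp16 copy of the same model (~60 GB of weights alone) would not.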