Appendices
Appendix G: GPU Hardware and Cloud Compute

GPU Comparison: The Accelerator Landscape

Choosing the right GPU is one of the most consequential decisions in any LLM project. The table below summarizes the key specifications of the most widely used accelerators for LLM workloads as of early 2026. Specifications are for the data center (SXM or OAM) variants unless otherwise noted.

| GPU | Vendor | Memory | Memory BW | BF16 TFLOPS | FP8 TFLOPS | Interconnect | TDP |
|---|---|---|---|---|---|---|---|
| A100 SXM | NVIDIA | 80 GB HBM2e | 2.0 TB/s | 312 | N/A | NVLink 3 (600 GB/s) | 400 W |
| H100 SXM | NVIDIA | 80 GB HBM3 | 3.35 TB/s | 990 | 1,979 | NVLink 4 (900 GB/s) | 700 W |
| H200 SXM | NVIDIA | 141 GB HBM3e | 4.8 TB/s | 990 | 1,979 | NVLink 4 (900 GB/s) | 700 W |
| B100 | NVIDIA | 192 GB HBM3e | 8.0 TB/s | 1,750 | 3,500 | NVLink 5 (1,800 GB/s) | 700 W |
| B200 | NVIDIA | 192 GB HBM3e | 8.0 TB/s | 2,250 | 4,500 | NVLink 5 (1,800 GB/s) | 1,000 W |
| MI300X | AMD | 192 GB HBM3 | 5.3 TB/s | 1,307 | 2,615 | Infinity Fabric (896 GB/s) | 750 W |
| L40S | NVIDIA | 48 GB GDDR6 | 864 GB/s | 362 | 733 | PCIe 4.0 (64 GB/s) | 350 W |
Reading the Table

- Memory: the total VRAM available for model weights, activations, and KV cache. The data center parts use HBM (High Bandwidth Memory); the L40S uses GDDR6.
- Memory BW: the bandwidth between memory and the compute cores; this is often the bottleneck during inference.
- BF16/FP8 TFLOPS: theoretical peak dense throughput in trillions of floating-point operations per second at each precision.
- TDP: thermal design power, relevant for cooling and electricity costs.
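
To see how these columns interact, the sketch below estimates weight memory, KV-cache size, and a bandwidth-bound decode ceiling against the table's H100 figures. It is a back-of-the-envelope calculation assuming a hypothetical dense 70B-parameter model with a Llama-2-70B-like layout; the architecture numbers and helper functions are illustrative assumptions, not vendor data.

```python
# Back-of-the-envelope sizing for a hypothetical dense 70B-parameter model.
# All architecture numbers are illustrative assumptions, not measurements.

GiB = 1024**3

def weight_bytes(n_params: float, bytes_per_param: float) -> float:
    """Memory needed just for the weights at a given precision."""
    return n_params * bytes_per_param

def kv_cache_bytes(n_layers, n_kv_heads, head_dim, seq_len, batch,
                   bytes_per_elem=2):
    """KV cache: two tensors (K and V) per layer, per token."""
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * batch * bytes_per_elem

# Assumed architecture (roughly Llama-2-70B-like, for illustration only).
n_params   = 70e9
n_layers   = 80
n_kv_heads = 8          # grouped-query attention
head_dim   = 128

weights_bf16 = weight_bytes(n_params, 2)   # BF16 = 2 bytes/param
weights_fp8  = weight_bytes(n_params, 1)   # FP8  = 1 byte/param
kv = kv_cache_bytes(n_layers, n_kv_heads, head_dim, seq_len=8192, batch=1)

print(f"weights BF16: {weights_bf16 / GiB:6.1f} GiB")  # ~130 GiB: exceeds one 80 GB H100
print(f"weights FP8 : {weights_fp8  / GiB:6.1f} GiB")  # ~65 GiB: fits a single H100
print(f"KV cache    : {kv / GiB:6.1f} GiB")            # ~2.5 GiB at 8K context, batch 1

# Decode is usually memory-bandwidth bound: each generated token streams all
# weights (plus KV cache) from memory, so bandwidth / bytes-per-token gives
# an upper bound on single-stream throughput.
hbm_bw = 3.35e12  # H100 SXM memory bandwidth, bytes/s (from the table)
tokens_per_s = hbm_bw / (weights_fp8 + kv)
print(f"bandwidth-bound decode ceiling: ~{tokens_per_s:.0f} tok/s (batch 1)")
```

In practice the usable capacity is lower than the spec-sheet number: serving runtimes also reserve memory for activations, workspace buffers, and fragmentation, so plan for headroom rather than an exact fit.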

Key Takeaways from the Spec Sheet