GPU Hardware Reference
This page lists the technical specifications of the GPUs supported by the Local LLM Resource Estimator. The values are loaded dynamically from our central JSON data store, so hardware entries can be added or corrected in one place and the change is visible here.
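As a rough illustration of what loading specifications from a JSON data store can look like, the sketch below parses a small JSON array into typed records. The field names (`vram_gb`, `bandwidth_gb_s`, `tdp_w`), the `GpuSpec` class, and the `load_gpu_specs` helper are hypothetical; the estimator's real schema is not shown on this page.

```python
import json
from dataclasses import dataclass

# Hypothetical record layout; the real data store's field names may differ.
SAMPLE_JSON = """
[
  {"name": "NVIDIA H100 SXM", "vram_gb": 80, "bandwidth_gb_s": 3350, "tdp_w": 700},
  {"name": "NVIDIA RTX 4090", "vram_gb": 24, "bandwidth_gb_s": 1008, "tdp_w": 450}
]
"""


@dataclass(frozen=True)
class GpuSpec:
    name: str
    vram_gb: float
    bandwidth_gb_s: float
    tdp_w: float


def load_gpu_specs(text: str) -> dict[str, GpuSpec]:
    """Parse a JSON array of GPU records into a name -> GpuSpec lookup."""
    return {rec["name"]: GpuSpec(**rec) for rec in json.loads(text)}


specs = load_gpu_specs(SAMPLE_JSON)
print(specs["NVIDIA RTX 4090"].vram_gb)
```

Keeping the records in data rather than in code means a new GPU is one more JSON object, with no change to the loader.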
NVIDIA H200 SXM
VRAM: 141 GB HBM3e
Bandwidth: 4800 GB/s
Compute: 1979 TFLOPS (FP16/BF16, with sparsity; about 989 TFLOPS dense)
TDP: 700 W
Bus Interface: 6144-bit
GPU Clock: 1590 MHz
Memory Clock: 1500 MHz (6.0 Gbps)
Cores/TMUs/ROPs: 16896 / 528 / 112
PCIe Gen: Gen 5 x16
NVLink: 900 GB/s
NVIDIA H100 SXM
VRAM: 80 GB HBM3
Bandwidth: 3350 GB/s
Compute: 1979 TFLOPS (FP16/BF16, with sparsity; about 989 TFLOPS dense)
TDP: 700 W
Bus Interface: 5120-bit
GPU Clock: 1590 MHz
Memory Clock: 1313 MHz (5.25 Gbps)
Cores/TMUs/ROPs: 16896 / 528 / 112
PCIe Gen: Gen 5 x16
NVLink: 900 GB/s
NVIDIA H100 PCIe
VRAM: 80 GB HBM2e
Bandwidth: 2000 GB/s
Compute: 756 TFLOPS (FP16/BF16)
TDP: 350 W
Bus Interface: 5120-bit
GPU Clock: 1095 MHz
Memory Clock: 1593 MHz (3.2 Gbps)
Cores/TMUs/ROPs: 14592 / 456 / 112
PCIe Gen: Gen 5 x16
NVLink: 600 GB/s (via NVLink bridge between paired cards)
NVIDIA A100 80GB
VRAM: 80 GB HBM2e
Bandwidth: 2000 GB/s
Compute: 312 TFLOPS (FP16/BF16)
TDP: 300 W
Bus Interface: 5120-bit
GPU Clock: 1065 MHz
Memory Clock: 1593 MHz (3.2 Gbps)
Cores/TMUs/ROPs: 6912 / 432 / 160
PCIe Gen: Gen 4 x16
NVLink: 600 GB/s
NVIDIA A100 40GB
VRAM: 40 GB HBM2
Bandwidth: 1555 GB/s
Compute: 312 TFLOPS (FP16/BF16)
TDP: 250 W
Bus Interface: 5120-bit
GPU Clock: 1065 MHz
Memory Clock: 1215 MHz (2.4 Gbps)
Cores/TMUs/ROPs: 6912 / 432 / 160
PCIe Gen: Gen 4 x16
NVLink: 600 GB/s
NVIDIA RTX 6000 Ada
VRAM: 48 GB GDDR6
Bandwidth: 960 GB/s
Compute: 182 TFLOPS (FP16/BF16)
TDP: 300 W
Bus Interface: 384-bit
GPU Clock: 2235 MHz
Memory Clock: 2500 MHz (20.0 Gbps)
Cores/TMUs/ROPs: 18176 / 568 / 192
PCIe Gen: Gen 4 x16
NVLink: 0 GB/s
NVIDIA RTX 4090
VRAM: 24 GB GDDR6X
Bandwidth: 1008 GB/s
Compute: 165 TFLOPS (FP16/BF16)
TDP: 450 W
Bus Interface: 384-bit
GPU Clock: 2235 MHz
Memory Clock: 1313 MHz (21.0 Gbps)
Cores/TMUs/ROPs: 16384 / 512 / 176
PCIe Gen: Gen 4 x16
NVLink: 0 GB/s
NVIDIA RTX 3090
VRAM: 24 GB GDDR6X
Bandwidth: 936 GB/s
Compute: 71 TFLOPS (FP16/BF16)
TDP: 350 W
Bus Interface: 384-bit
GPU Clock: 1395 MHz
Memory Clock: 1219 MHz (19.5 Gbps)
Cores/TMUs/ROPs: 10496 / 328 / 112
PCIe Gen: Gen 4 x16
NVLink: 112.5 GB/s (via NVLink bridge)
NVIDIA L40S
VRAM: 48 GB GDDR6
Bandwidth: 864 GB/s
Compute: 366 TFLOPS (FP16/BF16)
TDP: 350 W
Bus Interface: 384-bit
GPU Clock: 1110 MHz
Memory Clock: 2250 MHz (18.0 Gbps)
Cores/TMUs/ROPs: 18176 / 568 / 192
PCIe Gen: Gen 4 x16
NVLink: 0 GB/s
AMD MI300X
VRAM: 192 GB HBM3
Bandwidth: 5300 GB/s
Compute: 1307 TFLOPS (FP16/BF16)
TDP: 750 W
Bus Interface: 8192-bit
GPU Clock: 2100 MHz
Memory Clock: 1300 MHz (5.2 Gbps)
Cores/TMUs/ROPs: 19456 / 1216 / 512
PCIe Gen: Gen 5 x16
Infinity Fabric (NVLink equivalent): 400 GB/s
AMD MI250X
VRAM: 128 GB HBM2e
Bandwidth: 3276 GB/s
Compute: 383 TFLOPS (FP16/BF16)
TDP: 560 W
Bus Interface: 8192-bit
GPU Clock: 1700 MHz
Memory Clock: 1600 MHz (3.2 Gbps)
Cores/TMUs/ROPs: 14080 / 880 / 256
PCIe Gen: Gen 4 x16
Infinity Fabric (NVLink equivalent): 400 GB/s
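The Bandwidth figures above can be cross-checked against the Bus Interface and Memory Clock fields: theoretical peak bandwidth is the bus width in bits times the per-pin data rate in Gbps, divided by 8 bits per byte. The sketch below applies that relation to a few of the listed entries. The function name and the selection of GPUs are illustrative only.

```python
def peak_bandwidth_gb_s(bus_width_bits: int, data_rate_gbps: float) -> float:
    """Theoretical peak memory bandwidth in GB/s.

    bus width (bits) * per-pin data rate (Gbps) / 8 bits per byte.
    """
    return bus_width_bits * data_rate_gbps / 8


# (bus width in bits, data rate in Gbps, listed bandwidth in GB/s),
# copied from the entries above.
listed = {
    "NVIDIA H100 SXM": (5120, 5.25, 3350),
    "NVIDIA RTX 4090": (384, 21.0, 1008),
    "NVIDIA L40S": (384, 18.0, 864),
    "AMD MI250X": (8192, 3.2, 3276),
}

derived = {}
for name, (bus, rate, quoted) in listed.items():
    derived[name] = peak_bandwidth_gb_s(bus, rate)
    print(f"{name}: derived {derived[name]:.1f} GB/s, listed {quoted} GB/s")
```

Small gaps are expected because vendors round their published numbers. A large gap between the derived and listed values suggests that one of the three fields for that entry needs checking.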
System RAM Reference
The following system RAM configurations are used to calculate the performance impact of offloading layers to CPU memory, where data must cross the much slower PCIe bus (the "PCIe Bus Wall").
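To give a feel for why offloading is costly, here is a minimal sketch of a bandwidth-bound decode estimate. It is not the estimator's actual formula. It assumes every weight is read once per generated token, that weights resident in VRAM are read at GPU memory bandwidth, and that offloaded weights are streamed to the GPU each token at the slower of system RAM and PCIe bandwidth. The function name is hypothetical, and the PCIe Gen 4 x16 figure of roughly 32 GB/s per direction is an approximation.

```python
def decode_tokens_per_s(model_gb, offloaded_gb, gpu_bw_gb_s, ram_bw_gb_s, pcie_bw_gb_s):
    """Upper-bound decode rate when every weight is read once per token.

    Assumption (not taken from the estimator): offloaded weights are streamed
    to the GPU each token, so they move at the slower of RAM and PCIe bandwidth.
    """
    if model_gb <= 0 or not 0 <= offloaded_gb <= model_gb:
        raise ValueError("need model_gb > 0 and 0 <= offloaded_gb <= model_gb")
    resident_gb = model_gb - offloaded_gb
    host_bw = min(ram_bw_gb_s, pcie_bw_gb_s)
    seconds_per_token = resident_gb / gpu_bw_gb_s + offloaded_gb / host_bw
    return 1.0 / seconds_per_token


# Illustrative numbers: RTX 4090 (1008 GB/s), DDR5 2-channel host (89.6 GB/s),
# PCIe Gen 4 x16 taken as roughly 32 GB/s.
all_on_gpu = decode_tokens_per_s(20, 0, 1008, 89.6, 32)
partly_offloaded = decode_tokens_per_s(40, 20, 1008, 89.6, 32)
print(f"{all_on_gpu:.1f} tok/s fully resident")
print(f"{partly_offloaded:.1f} tok/s with 20 GB offloaded")
```

Under these assumptions the offloaded portion dominates the per-token time even when it is only half of the model, which is the effect the term "wall" describes. Runtimes that execute offloaded layers on the CPU instead of streaming them would be limited by system RAM bandwidth rather than PCIe.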
DDR5 8-channel
Bandwidth: 358.4 GB/s (theoretical peak at 5600 MT/s)
Transfer Speed: 4800-5600 MT/s
Typical Use Case: Enterprise Servers (e.g., Intel Xeon Sapphire Rapids / Emerald Rapids)
DDR5 4-channel
Bandwidth: 179.2 GB/s (theoretical peak at 5600 MT/s)
Transfer Speed: 4800-5600 MT/s
Typical Use Case: High-End Workstations (e.g., AMD Threadripper 7000, non-Pro)
DDR5 2-channel
Bandwidth: 89.6 GB/s (theoretical peak at 5600 MT/s)
Transfer Speed: 4800-6000 MT/s
Typical Use Case: Mainstream Desktops
DDR4 8-channel
Bandwidth: 204.8 GB/s
Transfer Speed: 3200 MT/s
Typical Use Case: Older Enterprise Servers (e.g., AMD EPYC Rome/Milan)
DDR4 4-channel
Bandwidth: 102.4 GB/s
Transfer Speed: 3200 MT/s
Typical Use Case: Older High-End Workstations
DDR4 2-channel
Bandwidth: 51.2 GB/s
Transfer Speed: 3200 MT/s
Typical Use Case: Older Mainstream Desktops
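The system RAM Bandwidth values above follow from the channel count and transfer speed: each 64-bit channel moves 8 bytes per transfer, so peak bandwidth is channels times 8 bytes times the transfer rate. The sketch below reproduces the listed figures. The function name is illustrative, and the DDR5 entries are evaluated at 5600 MT/s, the speed at which the listed values match.

```python
def ram_peak_bandwidth_gb_s(channels: int, transfer_rate_mt_s: int, bus_bytes: int = 8) -> float:
    """Theoretical peak system RAM bandwidth in GB/s.

    channels * bytes per transfer (64-bit channel = 8 bytes) * MT/s gives MB/s;
    dividing by 1000 converts to GB/s.
    """
    return channels * bus_bytes * transfer_rate_mt_s / 1000


configs = {
    "DDR5 8-channel": (8, 5600),
    "DDR5 4-channel": (4, 5600),
    "DDR5 2-channel": (2, 5600),
    "DDR4 8-channel": (8, 3200),
    "DDR4 4-channel": (4, 3200),
    "DDR4 2-channel": (2, 3200),
}

results = {}
for name, (channels, rate) in configs.items():
    results[name] = ram_peak_bandwidth_gb_s(channels, rate)
    print(f"{name} @ {rate} MT/s: {results[name]:.1f} GB/s")
```

These are theoretical peaks. Sustained bandwidth in practice is lower, and a DDR5 system running at 4800 MT/s would reach proportionally less than the listed figure.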