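Multi-GPU tensor parallelism is active. Its efficiency depends on the interconnect: tensor parallelism all-reduces activations in every layer, so GPUs linked by NVLink scale better than GPUs that communicate over PCIe, which offers far less bandwidth.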
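The following sketch shows why the interconnect matters. It applies the standard alpha-beta cost model for a ring all-reduce to Megatron-style tensor parallelism, which performs two all-reduces of the [tokens, hidden] activation per transformer layer. The model shape (hidden size 4096, 32 layers, FP16, 4-way TP), the link bandwidths, and the helper names `ring_allreduce_seconds` and `tp_allreduce_seconds` are hypothetical and chosen only for illustration. Substitute measured values for real hardware.

```python
def ring_allreduce_seconds(message_bytes: float, n_gpus: int, link_bw: float,
                           step_latency: float = 0.0) -> float:
    """Alpha-beta cost of a ring all-reduce.

    The ring runs 2*(N-1) steps (reduce-scatter, then all-gather), and each
    GPU sends 2*(N-1)/N of the message in total over its link.
    """
    if n_gpus < 2:
        return 0.0
    steps = 2 * (n_gpus - 1)
    bytes_sent = 2 * (n_gpus - 1) / n_gpus * message_bytes
    return steps * step_latency + bytes_sent / link_bw


def tp_allreduce_seconds(tokens: int, hidden: int, n_layers: int, bytes_per_elem: int,
                         n_gpus: int, link_bw: float, step_latency: float = 0.0) -> float:
    """All-reduce time for one forward pass under Megatron-style tensor parallelism.

    Two all-reduces per transformer layer (after attention and after the MLP),
    each on a [tokens, hidden] activation tensor.
    """
    message_bytes = tokens * hidden * bytes_per_elem
    return 2 * n_layers * ring_allreduce_seconds(message_bytes, n_gpus, link_bw, step_latency)


# Hypothetical per-GPU link bandwidths in bytes/s; replace with measured values.
LINKS = {"NVLink-class": 300e9, "PCIe-class": 25e9}

results = {}
for name, bw in LINKS.items():
    # Hypothetical model: hidden 4096, 32 layers, FP16 activations, 4-way TP, 2048-token prefill.
    results[name] = tp_allreduce_seconds(2048, 4096, 32, 2, n_gpus=4, link_bw=bw)
    print(f"{name}: {results[name] * 1e3:.2f} ms of all-reduce per prefill pass")
```

For a 2048-token prefill, the slower link spends 12x longer in all-reduce, which is exactly the bandwidth ratio. Single-token decode steps send tiny messages, so the per-step latency term (`step_latency`, left at 0 here) tends to dominate and should also be measured.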
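Roofline model: decode is memory-bandwidth-bound (low arithmetic intensity, because each step reads all active weights but does only about 2 FLOPs per parameter per sequence), while prefill is compute-bound. For MoE models, estimates use the active-parameter count rather than the total.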
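Here is a minimal roofline sketch. It assumes each forward pass reads the active weights from memory exactly once and ignores KV-cache and activation traffic. Execution time is the larger of compute time (FLOPs / peak FLOP/s) and memory time (bytes / bandwidth), at roughly 2 FLOPs per active parameter per token. The hardware figures (1 PFLOP/s, 2 TB/s), the 8B-active-parameter FP16 model, and the names `Hardware`, `roofline_time`, and `pass_time` are hypothetical. For an MoE model, pass the active-parameter count, as the estimator does.

```python
from dataclasses import dataclass


@dataclass
class Hardware:
    peak_flops: float      # peak compute, FLOP/s
    mem_bandwidth: float   # memory bandwidth, bytes/s


def roofline_time(flops: float, bytes_moved: float, hw: Hardware) -> tuple[float, str]:
    """Roofline estimate: time is bounded by the slower of compute and memory."""
    t_compute = flops / hw.peak_flops
    t_memory = bytes_moved / hw.mem_bandwidth
    if t_compute >= t_memory:
        return t_compute, "compute"
    return t_memory, "memory"


def pass_time(active_params: float, bytes_per_param: float, tokens: int,
              hw: Hardware) -> tuple[float, str]:
    """One forward pass over `tokens` tokens (decode batch size or prompt length).

    Assumes ~2 FLOPs per active parameter per token and that the active
    weights are read once per pass; KV-cache and activation traffic are ignored.
    """
    flops = 2 * active_params * tokens
    bytes_moved = active_params * bytes_per_param
    return roofline_time(flops, bytes_moved, hw)


# Hypothetical accelerator and model (8e9 active params, FP16 = 2 bytes/param).
hw = Hardware(peak_flops=1e15, mem_bandwidth=2e12)
t_decode, decode_bound = pass_time(8e9, 2, tokens=1, hw=hw)       # one decode step, batch 1
t_prefill, prefill_bound = pass_time(8e9, 2, tokens=2048, hw=hw)  # 2048-token prompt

print(f"decode: {1 / t_decode:.0f} tok/s ({decode_bound}-bound)")
print(f"prefill: {t_prefill * 1e3:.1f} ms ({prefill_bound}-bound)")
```

At batch 1, the decode step performs about 1 FLOP per byte read. That is far below this hypothetical GPU's ridge point of 500 FLOP/byte, so decode is memory-bound (about 125 tokens/s). The 2048-token prefill does 2048x the work on the same weights and lands on the compute side.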
Actual results depend on the inference engine, batching strategy, and kernel optimization.
Power & cost estimation
Reported outputs: power draw (W), energy per hour (kWh), cost per hour, per day, and per month ($), cost per 1M tokens ($), CO2 per hour (kg), and annual CO2 (tonnes).
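Power draw = TDP × utilization. Energy per hour (kWh) is power draw (W) / 1000, and CO2 per hour is that energy times the grid's carbon intensity. Electricity prices vary significantly by region (roughly $0.05–$0.40/kWh). Carbon intensity in kg CO2/kWh: world average ~0.417, EU ~0.255, US ~0.387, France ~0.056 (nuclear-heavy grid), Poland ~0.769 (coal-heavy grid).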
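The formulas above chain into every reported output. This sketch makes several assumptions. Total draw is TDP × utilization summed over `n_gpus` identical GPUs. Cost covers electricity only, with no hardware or hosting. A month is 30 days. The cost-per-token figure uses a hypothetical sustained throughput `tokens_per_s`. The function name `estimate` and the example deployment are illustrative. The carbon intensities are the ones quoted above.

```python
from dataclasses import dataclass

# Grid carbon intensity in kg CO2 per kWh (figures quoted above).
CARBON_INTENSITY = {"world": 0.417, "eu": 0.255, "us": 0.387, "france": 0.056, "poland": 0.769}

HOURS_PER_DAY = 24
DAYS_PER_MONTH = 30          # assumption: 30-day month
HOURS_PER_YEAR = 24 * 365


@dataclass
class PowerCost:
    power_w: float
    energy_kwh_per_hour: float
    cost_per_hour: float
    cost_per_day: float
    cost_per_month: float
    cost_per_1m_tokens: float
    co2_kg_per_hour: float
    co2_tonnes_per_year: float


def estimate(tdp_w: float, n_gpus: int, utilization_pct: float, price_per_kwh: float,
             tokens_per_s: float, region: str = "world") -> PowerCost:
    """Electricity-only cost and CO2 estimate for a continuously running deployment."""
    power_w = tdp_w * n_gpus * utilization_pct / 100   # TDP x utilization, summed over GPUs
    energy_kwh = power_w / 1000                         # energy used in one hour at power_w watts
    cost_h = energy_kwh * price_per_kwh
    co2_kg_h = energy_kwh * CARBON_INTENSITY[region]
    tokens_per_hour = tokens_per_s * 3600
    return PowerCost(
        power_w=power_w,
        energy_kwh_per_hour=energy_kwh,
        cost_per_hour=cost_h,
        cost_per_day=cost_h * HOURS_PER_DAY,
        cost_per_month=cost_h * HOURS_PER_DAY * DAYS_PER_MONTH,
        cost_per_1m_tokens=cost_h / tokens_per_hour * 1_000_000,
        co2_kg_per_hour=co2_kg_h,
        co2_tonnes_per_year=co2_kg_h * HOURS_PER_YEAR / 1000,
    )


# Hypothetical deployment: 2 GPUs at 700 W TDP, 80% utilization, $0.15/kWh, 1000 tok/s, EU grid.
r = estimate(700, 2, 80, 0.15, tokens_per_s=1000, region="eu")
print(f"{r.power_w:.0f} W, ${r.cost_per_hour:.3f}/h, ${r.cost_per_month:.2f}/month, "
      f"${r.cost_per_1m_tokens:.4f}/1M tokens, {r.co2_tonnes_per_year:.2f} t CO2/yr")
```

Cost per 1M tokens is the hourly cost divided by the tokens produced in that hour. It therefore falls as throughput rises, even when power draw stays constant.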