NVIDIA H100 GPU: Full Specs, Price, and Cloud Rates for 2026

Key Takeaways
- The NVIDIA H100 is a Hopper-architecture GPU (TSMC 4nm, 80 billion transistors) released in March 2023. The SXM5 variant delivers 989 TFLOPS of FP16 Tensor Core compute, 3,350 GB/s of HBM3 memory bandwidth, and a 700W TDP across 80GB of HBM3 memory.
- New H100 SXM5 units cost $25,000 to $40,000 per GPU as of Q1 2026. Cloud rental runs $1.38 to $11.06 per GPU-hour depending on provider and configuration, with a market median of approximately $2.29 per hour (Fluence Network, 2026). An 8-GPU DGX H100 server system costs $400,000 or more.
- The H100's Transformer Engine, which automatically switches between FP8 and FP16 precision per operation during training, is why the chip outperforms the A100 by 3 to 5x on transformer models specifically. Every major language model trained from 2023 to 2025, including GPT-5, Llama 4, and Claude 4, relied on H100 infrastructure.
The NVIDIA H100 is a data center GPU built on the Hopper architecture, released in March 2023. The GH100 chip contains 80 billion transistors on TSMC's 4nm process. The SXM5 variant delivers 989 TFLOPS in FP16 without sparsity, 3,350 GB/s of HBM3 memory bandwidth, and 700W thermal design power across 80GB of HBM3 memory.
The H100 arrived at an unusual moment in computing history. GPT-4 launched in March 2023, the same month the H100 began shipping. Demand from cloud providers, AI labs, and enterprises immediately exceeded supply, and H100 prices reached $40,000 per unit on secondary markets in late 2023 before supply constraints eased. By Q1 2026, prices have stabilized in the $25,000 to $40,000 range depending on variant.
What separates the H100 from the A100 is not simply faster processing. The defining addition is the Transformer Engine: dedicated hardware that automatically selects between FP8 and FP16 precision per operation during training. For transformer models specifically, this delivers 3 to 5x better throughput than the A100 at comparable utilization.
This article covers complete H100 specifications across all three variants, current pricing across buy and rental markets, a direct performance comparison against the A100, H200, and B200, and the export control context that shaped which version of the chip reached which markets.
What Is the NVIDIA H100 GPU?
The NVIDIA H100 Tensor Core GPU is the seventh generation of NVIDIA's data center accelerator line, following the A100 (Ampere, 2020). Its core chip, the GH100, contains 80 billion transistors on TSMC's 4nm process, compared to 54.2 billion transistors on the A100's GA100 at 7nm. The H100 was designed from the ground up for transformer neural networks, the architecture behind every major language model deployed since 2020.
The chip ships in three variants, each suited to different deployment scenarios:
| Variant | Memory | Memory Type | Memory BW | TDP | NVLink | Best For |
|---|---|---|---|---|---|---|
| H100 SXM5 | 80GB | HBM3 | 3,350 GB/s | 700W | 900 GB/s | Large training clusters |
| H100 PCIe | 80GB | HBM2e | 2,000 GB/s | 350W | None | Standard servers |
| H100 NVL | 94GB | HBM3 | 3,900 GB/s | 400W | 600 GB/s | Inference deployments |
The SXM5 is what nearly everyone means when they say "H100." It slots into NVIDIA DGX H100 and HGX H100 systems, where 8 SXM5 GPUs connect at 900 GB/s bidirectional via NVLink 4.0 through an NVSwitch fabric. This is the variant used in the large training clusters at Microsoft Azure, Meta, and CoreWeave.
The H100 PCIe fits standard server PCIe slots and draws only 350W. It uses HBM2e rather than HBM3, delivering 2,000 GB/s versus 3,350 GB/s on the SXM5. For buyers who do not need multi-GPU NVLink clusters, it is the lower-cost option, though for memory-bound training workloads the bandwidth gap can erase much of that price advantage on a performance-per-dollar basis.
According to NVIDIA's Hopper architecture technical overview, the H100 was designed around the observation that transformer models had become the central workload in AI, and that existing architectures were not optimized for the specific computation patterns transformers require.
NVIDIA H100 Full Technical Specifications
The following table covers all H100 variants against the A100 SXM4 80GB for direct comparison. All TFLOPS figures are without sparsity unless noted.
| Specification | H100 SXM5 | H100 PCIe | H100 NVL | A100 SXM4 80GB |
|---|---|---|---|---|
| Architecture | Hopper (GH100) | Hopper (GH100) | Hopper (GH100) | Ampere (GA100) |
| Process node | TSMC 4nm | TSMC 4nm | TSMC 4nm | TSMC 7nm |
| Transistors | 80 billion | 80 billion | 80 billion | 54.2 billion |
| CUDA cores | 16,896 | 14,592 | 16,896 | 6,912 |
| Tensor cores | 528 (4th gen) | 456 (4th gen) | 528 (4th gen) | 432 (3rd gen) |
| Memory | 80GB HBM3 | 80GB HBM2e | 94GB HBM3 | 80GB HBM2e |
| Memory bandwidth | 3,350 GB/s | 2,000 GB/s | 3,900 GB/s | 2,039 GB/s |
| FP16 Tensor Core | 989 TFLOPS | 756 TFLOPS | 989 TFLOPS | 312 TFLOPS |
| FP8 Tensor Core | 1,979 TFLOPS | 1,513 TFLOPS | 1,979 TFLOPS | N/A |
| FP16 (with sparsity) | 1,979 TFLOPS | 1,513 TFLOPS | 1,979 TFLOPS | 624 TFLOPS |
| FP8 (with sparsity) | 3,958 TFLOPS | 3,026 TFLOPS | 3,958 TFLOPS | N/A |
| TF32 Tensor Core | 494 TFLOPS | 378 TFLOPS | 494 TFLOPS | 156 TFLOPS |
| FP64 | 34 TFLOPS | 26 TFLOPS | 34 TFLOPS | 9.7 TFLOPS |
| TDP | 700W | 350W | 400W | 400W |
| NVLink bandwidth | 900 GB/s | None | 600 GB/s | 600 GB/s |
| PCIe generation | PCIe 5.0 | PCIe 5.0 | PCIe 5.0 | PCIe 4.0 |
| MIG instances | 7 | 7 | 7 | 7 |
| Transformer Engine | Yes | Yes | Yes | No |
On sparsity: the figures above without sparsity are the ones relevant to most production workloads. NVIDIA's sparsity acceleration doubles throughput only when weights are pruned to the hardware's 2:4 structured sparsity pattern (two of every four consecutive weights zeroed). Most production transformer models are not pruned this way, so the non-sparsity numbers are what you will achieve in practice.
The H100 PCIe's bandwidth gap versus the SXM5 is significant. At 2,000 GB/s versus 3,350 GB/s, the PCIe variant is memory-bandwidth-constrained on long-context inference and large-batch training jobs that saturate memory bandwidth. The compute gap (756 vs 989 TFLOPS FP16) is secondary to the bandwidth difference for these workloads.
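A rough way to see why bandwidth dominates is the ridge point of a simple roofline model: peak FLOPS divided by peak bytes per second gives the arithmetic intensity a kernel needs before compute, rather than memory, becomes the limit. The sketch below uses the dense FP16 figures from the table above; it is an illustrative calculation, not a benchmark.

```python
# Rough roofline ridge-point comparison: H100 SXM5 vs H100 PCIe.
# Spec values are the dense FP16 figures quoted in this article.
specs = {
    "H100 SXM5": {"fp16_tflops": 989, "bw_gbs": 3350},
    "H100 PCIe": {"fp16_tflops": 756, "bw_gbs": 2000},
}

for name, s in specs.items():
    flops = s["fp16_tflops"] * 1e12   # peak FLOP/s
    bandwidth = s["bw_gbs"] * 1e9     # peak bytes/s
    ridge = flops / bandwidth         # FLOP per byte needed to be compute-bound
    print(f"{name}: ridge point ~{ridge:.0f} FLOP/byte")

# Kernels with lower arithmetic intensity than the ridge point are
# memory-bandwidth-bound; long-context attention and large optimizer
# steps often fall in that regime.
```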
NVIDIA H100 Price: Buy vs Rent in 2026
H100 pricing has moved in three phases since launch. In 2023, supply constraints pushed SXM5 units to $35,000 to $45,000 per unit on secondary markets. Through 2024, supply normalized as TSMC expanded CoWoS packaging capacity, and prices fell toward the $27,000 to $40,000 range. By Q1 2026, with the Blackwell B200 beginning to ship in volume, H100 prices have settled further as buyers negotiate the transition to the next generation.
Purchase Prices (Q1 2026)
| Variant | New Unit Price | Used Unit Price |
|---|---|---|
| H100 SXM5 80GB | $25,000–$40,000 | $15,000–$22,000 |
| H100 PCIe 80GB | $25,000–$30,000 | $12,000–$18,000 |
| H100 NVL 94GB | $24,500+ | $14,000–$20,000 |
| DGX H100 (8x SXM5) | $400,000+ | $250,000–$320,000 |
Cloud Rental Rates (Q1 2026)
Per-GPU rental prices for H100 vary significantly by provider, contract length, and region:
| Provider | H100 Rate | Notes |
|---|---|---|
| Lambda Labs | ~$2.49/hr | On-demand, 80GB SXM |
| RunPod | ~$2.39/hr | On-demand, spot cheaper |
| CoreWeave | $2.00–$3.50/hr | Reserved discounts available |
| AWS (p5) | ~$4.50–$9.00/hr | On-demand, spot much cheaper |
| Market median | $2.29/hr | Across 49 tracked configurations (Fluence Network, 2026) |
| Market range | $1.38–$11.06/hr | Low = spot; high = on-demand premium providers |
Cloud spot pricing runs materially lower than on-demand, often $1.00 to $1.50/hr for H100 on spot markets with interruption risk. For interruptible batch training jobs, spot H100s are the standard choice at cost-sensitive AI labs.
Training a 70-billion-parameter model from scratch requires approximately 100,000 to 200,000 GPU-hours on H100 hardware. At $2.29/hr, that is $229,000 to $458,000 in compute cost for a single pre-training run, before storage, networking, or the cost of failed runs.
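The arithmetic behind that range is simple; the sketch below just multiplies the GPU-hour estimate quoted above by the market-median rate.

```python
# Back-of-envelope pre-training compute cost at the quoted market median rate.
gpu_hours_low, gpu_hours_high = 100_000, 200_000  # approx. range for a 70B model (as quoted above)
rate_per_hour = 2.29                              # market median, $/GPU-hour

low = gpu_hours_low * rate_per_hour
high = gpu_hours_high * rate_per_hour
print(f"Compute cost: ${low:,.0f} to ${high:,.0f}")  # $229,000 to $458,000
```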
H100 vs A100 vs H200 vs B200: Full Comparison
Four generations of NVIDIA data center GPU span the AI buildout from 2020 to 2026. Here is how they stack up on the metrics that matter for AI workloads, using the 80GB SXM variants where available.
| GPU | Architecture | FP16 TFLOPS | Memory | BW (GB/s) | TDP | Est. Price Q1 2026 |
|---|---|---|---|---|---|---|
| A100 SXM4 | Ampere (7nm) | 312 | 80GB HBM2e | 2,039 | 400W | $8,000–$15,000 |
| H100 SXM5 | Hopper (4nm) | 989 | 80GB HBM3 | 3,350 | 700W | $25,000–$40,000 |
| H200 SXM5 | Hopper+ (4nm) | 989 | 141GB HBM3e | 4,800 | 700W | $35,000–$45,000 |
| B200 SXM6 | Blackwell (4nm) | ~2,250 | 192GB HBM3e | 8,000+ | ~1,000W | $30,000–$40,000 |
The H200 is not a new architecture. It uses the same GH100 chip as the H100 SXM5 but with a larger HBM3e memory stack (141GB versus 80GB) and significantly higher memory bandwidth (4,800 GB/s versus 3,350 GB/s). For inference on large models that fit in memory, the H200 is 20 to 40% faster than the H100 purely because of the bandwidth increase, not because of more compute. For training, where memory bandwidth is also a limiting factor, the gain is similar.
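One way to see why bandwidth alone drives the inference gain: at batch size 1, each decoded token requires streaming every weight through the memory system once, so throughput is roughly bandwidth divided by model size in bytes. The 70B FP16 model below is a hypothetical example for illustration, and the estimate ignores KV cache and other overheads.

```python
# Rough decode-throughput ceiling for a memory-bandwidth-bound model
# (batch size 1, weights read once per token; ignores KV cache and overheads).
model_params = 70e9      # hypothetical 70B-parameter model
bytes_per_param = 2      # FP16 weights

model_bytes = model_params * bytes_per_param
for name, bw_gbs in [("H100 SXM5", 3350), ("H200", 4800)]:
    tokens_per_s = bw_gbs * 1e9 / model_bytes
    print(f"{name}: ~{tokens_per_s:.0f} tokens/s upper bound")

# The bandwidth ratio (4,800 / 3,350 ≈ 1.43) is the theoretical ceiling;
# the 20-40% real-world gain quoted above reflects overheads on top of it.
```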
The B200 (Blackwell) is a genuine architectural leap. It delivers roughly 2.3x the FP16 compute of the H100 SXM5 and uses a 2-chip package design where two Blackwell dies share memory, reaching 192GB HBM3e per module. For more on the full range of AI accelerator types, including how Google TPUs and AMD MI300X compare to NVIDIA's lineup, see our guide to AI accelerator cards.
The Number Most Guides Don't Show
Meta disclosed in 2023 that its initial H100 training cluster contained 24,000 H100 SXM5 GPUs. At a market price of $35,000 per unit (mid-range for Q1 2024 pricing), the GPU hardware cost alone came to $840 million. The facility to house and cool that cluster draws approximately 16.8 megawatts at full load (24,000 x 700W). At $10 to $12 million per MW of data center construction cost, the facility represents another $168 to $202 million in infrastructure. Total investment for that single training cluster: over $1 billion before networking, storage, or software.
That figure excludes the 49,152-GPU cluster Meta later announced for Llama training. GPU hardware alone for that cluster, at $35,000 per unit, totals $1.72 billion.
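The figures above follow directly from per-unit price, per-GPU power, and per-megawatt construction cost; the sketch below just reproduces that arithmetic.

```python
# Reproducing the cluster-cost arithmetic from this section.
gpus = 24_000
unit_price = 35_000            # mid-range Q1 2024 price per H100 SXM5 (as quoted)
tdp_watts = 700                # per-GPU TDP
dc_cost_per_mw = (10e6, 12e6)  # data center construction cost range, $/MW

gpu_capex = gpus * unit_price                           # $840,000,000
power_mw = gpus * tdp_watts / 1e6                       # 16.8 MW at full load
facility = tuple(power_mw * c for c in dc_cost_per_mw)  # ~$168M to ~$202M

print(f"GPU hardware: ${gpu_capex / 1e6:,.0f}M")
print(f"Cluster power draw: {power_mw:.1f} MW")
print(f"Facility build-out: ${facility[0] / 1e6:.0f}M to ${facility[1] / 1e6:.0f}M")
```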
The Transformer Engine: Why It Changed AI Training
The Transformer Engine is dedicated hardware inside every H100 that the A100 does not have. It solves a specific problem: during transformer model training, different layers of the network have different numerical ranges. Some tolerate low-precision arithmetic (FP8) without losing accuracy. Others require higher precision (FP16) to converge properly.
Before the H100, mixed-precision training applied a single low-precision format (typically FP16 or BF16) uniformly across the network. Using FP16 throughout was safe but slow, and pushing precision lower risked numerical instability. The Transformer Engine monitors activation and weight magnitudes layer by layer and automatically switches between FP8 and FP16 per operation mid-training, maintaining the accuracy of FP16 training while capturing most of the speed advantage of FP8.
The practical result: for large transformer models, the H100 with Transformer Engine delivers 3 to 5x higher effective training throughput than the A100 per GPU. This is not just a function of the H100 having more CUDA cores. It is the Transformer Engine specifically that makes the gap so wide for language model workloads. For non-transformer workloads (scientific computing, rendering, general HPC), the gap between H100 and A100 is closer to 3x, reflecting the raw chip improvement without the Transformer Engine multiplier.
FP8 precision was not available on the A100 at all. The A100's minimum training precision was BF16 or FP16. The H100 adding FP8 as a stable training format was a meaningful architectural change, not just an incremental speed improvement.
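In software, this behavior is exposed through NVIDIA's Transformer Engine library rather than configured by hand. The sketch below shows the general shape of FP8 training with the PyTorch bindings (`transformer_engine.pytorch`); module and recipe names follow the library's documented API but may vary across versions, so treat it as an illustrative outline rather than a verified snippet.

```python
# Illustrative outline of FP8 training with NVIDIA Transformer Engine
# (PyTorch bindings). Treat as a sketch; names may differ by library version.
import torch
import transformer_engine.pytorch as te
from transformer_engine.common import recipe

# HYBRID format: E4M3 for forward activations/weights, E5M2 for gradients.
fp8_recipe = recipe.DelayedScaling(fp8_format=recipe.Format.HYBRID)

model = te.Linear(4096, 4096, bias=True).cuda()  # TE layer with FP8-capable kernels
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
inp = torch.randn(8, 4096, device="cuda", dtype=torch.bfloat16)

# Inside fp8_autocast, TE layers run eligible GEMMs in FP8 and track
# per-tensor scaling factors so accuracy stays close to FP16 training.
with te.fp8_autocast(enabled=True, fp8_recipe=fp8_recipe):
    out = model(inp)

loss = out.float().pow(2).mean()  # placeholder loss for the sketch
loss.backward()
optimizer.step()
```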
"The H100 is the engine of the AI industrial revolution. It's not just faster than the A100. It's a fundamentally different chip for transformer workloads." (Jensen Huang, NVIDIA CEO, GTC 2022 keynote)
For context on how H100 clusters fit into the hyperscale data centers running large-scale AI training, see our overview of what hyperscalers are and how they operate.
H100 Export Controls, the H800, and What Comes Next
In October 2022, the US Bureau of Industry and Security (BIS) imposed export controls on NVIDIA's A100 and H100 GPUs, restricting sales to China and certain other countries. The controls targeted chips exceeding specific performance thresholds for AI training compute and chip-to-chip interconnect bandwidth.
NVIDIA responded by developing the H800, a China-specific variant of the H100. The H800 reduced NVLink bandwidth from 900 GB/s to 400 GB/s, falling below the regulatory threshold for interconnect performance. The H800 was sold in China through 2023. In November 2023, BIS updated the export control rules to restrict the H800 as well, closing the interconnect bandwidth loophole. After that point, NVIDIA had no H100-equivalent product available for the Chinese market.
For the NVIDIA A100 China situation, see our full NVIDIA A100 specs and pricing article, which covers how the A800 variant was similarly restricted.
China's AI hardware market has since fragmented. Huawei's Ascend 910C is the closest domestic alternative to the H100, with estimated performance at roughly 60 to 80% of the H100 for standard training workloads, though software tooling and ecosystem maturity remain significantly behind NVIDIA's CUDA platform.
NVIDIA's H100 successor, the H200, began shipping in Q4 2024 with the same GH100 chip but 141GB HBM3e memory. The B200 (Blackwell), shipping from mid-2025, represents the first full architectural change since the H100 launched. The next generation, Rubin, is planned for 2026 to 2027.
For organizations acquiring H100 hardware in 2026, the calculus is straightforward: H100s are now available without the supply constraints of 2023 to 2024, at lower prices, and with a large installed base of benchmarks, tuning guides, and cloud providers. The B200 offers more raw throughput but at higher cost and with less mature deployment tooling. For established training pipelines that run on H100, the migration cost to B200 is non-trivial.
Frequently Asked Questions
What is the NVIDIA H100 GPU used for?
The H100 is used primarily for training and running large AI models. Its main applications are: pre-training large language models (the H100 was the dominant chip for this from 2023 to 2025), fine-tuning smaller models for specific tasks, AI inference at commercial scale, and high-performance computing workloads that benefit from FP64 precision. Every major language model deployed from 2023 onward was trained on H100 clusters, including Meta's Llama family, Anthropic's Claude models, and OpenAI's GPT models after GPT-4.
How much does the NVIDIA H100 GPU cost?
As of Q1 2026, new H100 SXM5 units cost approximately $25,000 to $40,000 per GPU depending on vendor and configuration. The H100 PCIe variant runs $25,000 to $30,000 new. Used units are available from $12,000 to $22,000. An 8-GPU DGX H100 server system costs $400,000 or more. Cloud rental rates run $1.38 to $11.06 per GPU-hour, with a market median of $2.29/hr across tracked configurations (Fluence Network, 2026). Spot pricing is significantly cheaper, often $1.00 to $1.50/hr with interruption risk.
What is the difference between H100 SXM5 and H100 PCIe?
The H100 SXM5 uses HBM3 memory at 3,350 GB/s bandwidth and connects to 7 other GPUs via NVLink 4.0 at 900 GB/s in a DGX H100 system. It draws 700W. The H100 PCIe uses HBM2e at 2,000 GB/s bandwidth, has no NVLink connectivity, and draws only 350W. The SXM5 is faster by roughly 30% on compute (989 vs 756 TFLOPS FP16) and 67% faster on memory bandwidth. The PCIe is the option for standard server deployments where NVLink multi-GPU clusters are not needed or where power constraints apply.
How much faster is the H100 than the A100?
For transformer model training using FP16 precision, the H100 SXM5 is approximately 3.2x faster than the A100 SXM4 80GB (989 TFLOPS vs 312 TFLOPS). For FP8 training, which is only available on H100 and not on A100, the effective gap reaches 5 to 6x for eligible workloads. Memory bandwidth improved from 2,039 GB/s (A100) to 3,350 GB/s (H100 SXM5), a 1.6x improvement. The Transformer Engine on H100 adds an additional multiplier specifically for transformer architecture workloads that compounds the raw spec difference.
What is NVLink on the H100?
NVLink 4.0 on the H100 SXM5 provides 900 GB/s bidirectional bandwidth between GPUs in the same server. In a DGX H100 system with 8 SXM5 GPUs, NVLink connects all 8 GPUs via an NVSwitch fabric, allowing any GPU to read from or write to any other GPU at full NVLink speed. This is critical for training large models that cannot fit in a single GPU's memory: gradient synchronization, tensor parallelism, and pipeline parallelism all require fast GPU-to-GPU communication. The H100 PCIe variant has no NVLink and relies on PCIe 5.0 bandwidth instead, which is significantly slower for multi-GPU communication.
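To make the role of that interconnect concrete, here is a minimal gradient all-reduce using PyTorch's NCCL backend, which routes GPU-to-GPU traffic over NVLink (and the NVSwitch fabric on a DGX H100) when it is available. The launch setup (one process per GPU via `torchrun`) is an assumption about the deployment, not something specified in this article.

```python
# Minimal NCCL all-reduce sketch. NCCL uses NVLink between GPUs when present,
# falling back to PCIe otherwise. Launch with one process per GPU, e.g.:
#   torchrun --nproc_per_node=8 allreduce_demo.py
import torch
import torch.distributed as dist

dist.init_process_group(backend="nccl")
rank = dist.get_rank()
torch.cuda.set_device(rank)  # single-node assumption: rank == local GPU index

# Stand-in for a gradient shard produced by backprop on this GPU.
grad = torch.full((1024, 1024), float(rank), device="cuda")

# Sum-reduce across all GPUs; on a DGX H100 this traffic rides the NVSwitch fabric.
dist.all_reduce(grad, op=dist.ReduceOp.SUM)

if rank == 0:
    print("reduced value per element:", grad[0, 0].item())
dist.destroy_process_group()
```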
Is the H100 available in China?
No. The standard H100 SXM5 and H100 PCIe are restricted from sale to China under US Bureau of Industry and Security (BIS) export controls imposed in October 2022. NVIDIA developed the H800 as a China-specific variant with reduced NVLink bandwidth (400 GB/s instead of 900 GB/s) to comply with the original rules. The H800 was also restricted by updated rules in November 2023. As of 2026, NVIDIA has no H100-equivalent product approved for the Chinese market. Huawei's Ascend 910C is the primary domestic alternative in China.
What GPU comes after the H100?
Two chips succeed the H100. The H200 uses the same GH100 chip as the H100 SXM5 but upgrades memory to 141GB HBM3e with 4,800 GB/s bandwidth, making it significantly faster for inference on large models. The B200 (Blackwell architecture) is a full generational leap, delivering roughly 2.3x the FP16 compute of the H100 SXM5 with 192GB HBM3e. Beyond Blackwell, NVIDIA's Rubin architecture is planned for 2026 to 2027.