NVIDIA H100 GPU: Full Specs, Price, and Cloud Rates for 2026

Key Takeaways
- The NVIDIA H100 is a Hopper-architecture GPU (TSMC 4nm, 80 billion transistors) released in March 2023. The SXM5 variant delivers 989 TFLOPS of FP16 Tensor Core compute, 3,350 GB/s of HBM3 memory bandwidth, and a 700W TDP across 80GB of HBM3 memory.
- New H100 SXM5 units cost $25,000 to $40,000 per GPU as of Q1 2026. Cloud rental runs $1.38 to $11.06 per GPU-hour depending on provider and configuration, with a market median of approximately $2.29 per hour (Fluence Network, 2026). An 8-GPU DGX H100 server system costs $400,000 or more.
- The H100's Transformer Engine, which automatically switches between FP8 and FP16 precision per operation during training, is why the chip outperforms the A100 by 3 to 5x on transformer models specifically. Every major language model trained from 2023 to 2025, including GPT-5, Llama 4, and Claude 4, relied on H100 infrastructure.
The NVIDIA H100 is a data center GPU built on the Hopper architecture, released in March 2023. The GH100 chip contains 80 billion transistors on TSMC's 4nm process. The SXM5 variant delivers 989 TFLOPS in FP16 without sparsity, 3,350 GB/s of HBM3 memory bandwidth, and 700W thermal design power across 80GB of HBM3 memory.
The H100 arrived at an unusual moment in computing history. GPT-4 launched in March 2023, the same month the H100 began shipping. Demand from cloud providers, AI labs, and enterprises immediately exceeded supply, and H100 prices reached $40,000 per unit on secondary markets in late 2023 before supply constraints eased. By Q1 2026, prices have stabilized in the $25,000 to $40,000 range depending on variant.
What separates the H100 from the A100 is not simply faster processing. The defining addition is the Transformer Engine: dedicated hardware that automatically selects between FP8 and FP16 precision per operation during training. For transformer models specifically, this delivers 3 to 5x better throughput than the A100 at comparable utilization.
This article covers complete H100 specifications across all three variants, current pricing across buy and rental markets, a direct performance comparison against the A100, H200, and B200, and the export control context that shaped which version of the chip reached which markets.
What Is the NVIDIA H100 GPU?
The NVIDIA H100 Tensor Core GPU is the seventh generation of NVIDIA's data center accelerator line, following the A100 (Ampere, 2020). Its core chip, the GH100, contains 80 billion transistors on TSMC's 4nm process, compared to 54.2 billion transistors on the A100's GA100 at 7nm. The H100 was designed from the ground up for transformer neural networks, the architecture behind every major language model deployed since 2020.
The chip ships in three variants, each suited to different deployment scenarios:
| Variant | Memory | Memory Type | Memory BW | TDP | NVLink | Best For |
|---|---|---|---|---|---|---|
| H100 SXM5 | 80GB | HBM3 | 3,350 GB/s | 700W | 900 GB/s | Large training clusters |
| H100 PCIe | 80GB | HBM2e | 2,000 GB/s | 350W | None | Standard servers |
| H100 NVL | 94GB | HBM3 | 3,900 GB/s | 400W | 600 GB/s | Inference deployments |
The SXM5 is what nearly everyone means when they say "H100." It slots into NVIDIA DGX H100 and HGX H100 systems, where 8 SXM5 GPUs connect at 900 GB/s bidirectional via NVLink 4.0 through an NVSwitch fabric. This is the variant used in the large training clusters at Microsoft Azure, Meta, and CoreWeave.
The H100 PCIe fits standard server PCIe slots and draws only 350W. It uses HBM2e rather than HBM3, delivering 2,000 GB/s versus 3,350 GB/s on the SXM5. For buyers who do not need multi-GPU NVLink clusters, it is the lower-cost option, though for memory-bound training workloads the bandwidth gap can erase much of that price advantage on a performance-per-dollar basis.
According to NVIDIA's Hopper architecture technical overview, the H100 was designed around the observation that transformer models had become the central workload in AI, and that existing architectures were not optimized for the specific computation patterns transformers require.
NVIDIA H100 Full Technical Specifications
The following table covers all H100 variants against the A100 SXM4 80GB for direct comparison. All TFLOPS figures are without sparsity unless noted.
| Specification | H100 SXM5 | H100 PCIe | H100 NVL | A100 SXM4 80GB |
|---|---|---|---|---|
| Architecture | Hopper (GH100) | Hopper (GH100) | Hopper (GH100) | Ampere (GA100) |
| Process node | TSMC 4nm | TSMC 4nm | TSMC 4nm | TSMC 7nm |
| Transistors | 80 billion | 80 billion | 80 billion | 54.2 billion |
| CUDA cores | 16,896 | 14,592 | 16,896 | 6,912 |
| Tensor cores | 528 (4th gen) | 456 (4th gen) | 528 (4th gen) | 432 (3rd gen) |
| Memory | 80GB HBM3 | 80GB HBM2e | 94GB HBM3 | 80GB HBM2e |
| Memory bandwidth | 3,350 GB/s | 2,000 GB/s | 3,900 GB/s | 2,039 GB/s |
| FP16 Tensor Core | 989 TFLOPS | 756 TFLOPS | 989 TFLOPS | 312 TFLOPS |
| FP8 Tensor Core | 1,979 TFLOPS | 1,513 TFLOPS | 1,979 TFLOPS | N/A |
| FP16 (with sparsity) | 1,979 TFLOPS | 1,513 TFLOPS | 1,979 TFLOPS | 624 TFLOPS |
| FP8 (with sparsity) | 3,958 TFLOPS | 3,026 TFLOPS | 3,958 TFLOPS | N/A |
| TF32 Tensor Core | 494 TFLOPS | 378 TFLOPS | 494 TFLOPS | 156 TFLOPS |
| FP64 | 34 TFLOPS | 26 TFLOPS | 34 TFLOPS | 9.7 TFLOPS |
| TDP | 700W | 350W | 400W | 400W |
| NVLink bandwidth | 900 GB/s | None | 600 GB/s | 600 GB/s |
| PCIe generation | PCIe 5.0 | PCIe 5.0 | PCIe 5.0 | PCIe 4.0 |
| MIG instances | 7 | 7 | 7 | 7 |
| Transformer Engine | Yes | Yes | Yes | No |
On sparsity: the figures above without sparsity are the ones relevant to most production workloads. NVIDIA's sparsity acceleration doubles throughput only when weights are pruned to the hardware's 2:4 structured sparsity pattern (two of every four consecutive weights zeroed). Most production transformer models are not pruned this way, so the non-sparsity numbers are what you will achieve in practice.
The H100 PCIe's bandwidth gap versus the SXM5 is significant. At 2,000 GB/s versus 3,350 GB/s, the PCIe variant is memory-bandwidth-constrained on long-context inference and large-batch training jobs that saturate memory bandwidth. The compute gap (756 vs 989 TFLOPS FP16) is secondary to the bandwidth difference for these workloads.
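A rough way to see why bandwidth dominates is the ridge point of a simple roofline model: peak FLOPS divided by peak bytes per second gives the arithmetic intensity a kernel needs before compute, rather than memory, becomes the limit. The sketch below uses the dense FP16 figures from the table above; it is an illustrative calculation, not a benchmark.

```python
# Rough roofline ridge-point comparison: H100 SXM5 vs H100 PCIe.
# Spec values are the dense FP16 figures quoted in this article.
specs = {
    "H100 SXM5": {"fp16_tflops": 989, "bw_gbs": 3350},
    "H100 PCIe": {"fp16_tflops": 756, "bw_gbs": 2000},
}

for name, s in specs.items():
    flops = s["fp16_tflops"] * 1e12   # peak FLOP/s
    bandwidth = s["bw_gbs"] * 1e9     # peak bytes/s
    ridge = flops / bandwidth         # FLOP per byte needed to be compute-bound
    print(f"{name}: ridge point ~{ridge:.0f} FLOP/byte")

# Kernels with lower arithmetic intensity than the ridge point are
# memory-bandwidth-bound; long-context attention and large optimizer
# steps often fall in that regime.
```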
NVIDIA H100 Price: Buy vs Rent in 2026
H100 pricing has moved in three phases since launch. In 2023, supply constraints pushed SXM5 units to $35,000 to $45,000 per unit on secondary markets. Through 2024, supply normalized as TSMC expanded CoWoS packaging capacity, and prices fell toward the $27,000 to $40,000 range. By Q1 2026, with the Blackwell B200 beginning to ship in volume, H100 prices have settled further as buyers negotiate the transition to the next generation.
Purchase Prices (Q1 2026)
| Variant | New Unit Price | Used Unit Price |
|---|---|---|
| H100 SXM5 80GB | $25,000–$40,000 | $15,000–$22,000 |
| H100 PCIe 80GB | $25,000–$30,000 | $12,000–$18,000 |
| H100 NVL 94GB | $24,500+ | $14,000–$20,000 |
| DGX H100 (8x SXM5) | $400,000+ | $250,000–$320,000 |
Cloud Rental Rates (Q1 2026)
Per-GPU rental prices for H100 vary significantly by provider, contract length, and region:
| Provider | H100 Rate | Notes |
|---|---|---|
| Lambda Labs | ~$2.49/hr | On-demand, 80GB SXM |
| RunPod | ~$2.39/hr | On-demand, spot cheaper |
| CoreWeave | $2.00–$3.50/hr | Reserved discounts available |
| AWS (p5) | ~$4.50–$9.00/hr | On-demand, spot much cheaper |
| Market median | $2.29/hr | Across 49 tracked configurations (Fluence Network, 2026) |
| Market range | $1.38–$11.06/hr | Low = spot; high = on-demand premium providers |
Cloud spot pricing runs materially lower than on-demand, often $1.00 to $1.50/hr for H100 on spot markets with interruption risk. For interruptible batch training jobs, spot H100s are the standard choice at cost-sensitive AI labs.
Training a 70-billion-parameter model from scratch requires approximately 100,000 to 200,000 GPU-hours on H100 hardware. At $2.29/hr, that is $229,000 to $458,000 in compute cost for a single pre-training run, before storage, networking, or the cost of failed runs.
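The arithmetic behind that range is simple; the sketch below just multiplies the GPU-hour estimate quoted above by the market-median rate.

```python
# Back-of-envelope pre-training compute cost at the quoted market median rate.
gpu_hours_low, gpu_hours_high = 100_000, 200_000  # approx. range for a 70B model (as quoted above)
rate_per_hour = 2.29                              # market median, $/GPU-hour

low = gpu_hours_low * rate_per_hour
high = gpu_hours_high * rate_per_hour
print(f"Compute cost: ${low:,.0f} to ${high:,.0f}")  # $229,000 to $458,000
```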
H100 vs A100 vs H200 vs B200: Full Comparison
Four generations of NVIDIA data center GPU span the AI buildout from 2020 to 2026. Here is how they stack up on the metrics that matter for AI workloads, using the 80GB SXM variants where available.
| GPU | Architecture | FP16 TFLOPS | Memory | BW (GB/s) | TDP | Est. Price Q1 2026 |
|---|---|---|---|---|---|---|
| A100 SXM4 | Ampere (7nm) | 312 | 80GB HBM2e | 2,039 | 400W | $8,000–$15,000 |
| H100 SXM5 | Hopper (4nm) | 989 | 80GB HBM3 | 3,350 | 700W | $25,000–$40,000 |
| H200 SXM5 | Hopper+ (4nm) | 989 | 141GB HBM3e | 4,800 | 700W | $35,000–$45,000 |
| B200 SXM6 | Blackwell (4nm) | ~2,250 | 192GB HBM3e | 8,000+ | ~1,000W | $30,000–$40,000 |
The H200 is not a new architecture. It uses the same GH100 chip as the H100 SXM5 but with a larger HBM3e memory stack (141GB versus 80GB) and significantly higher memory bandwidth (4,800 GB/s versus 3,350 GB/s). For inference on large models that fit in memory, the H200 is 20 to 40% faster than the H100 purely because of the bandwidth increase, not because of more compute. For training, where memory bandwidth is also a limiting factor, the gain is similar.
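One way to see why bandwidth alone drives the inference gain: at batch size 1, each decoded token requires streaming every weight through the memory system once, so throughput is roughly bandwidth divided by model size in bytes. The 70B FP16 model below is a hypothetical example for illustration, and the estimate ignores KV cache and other overheads.

```python
# Rough decode-throughput ceiling for a memory-bandwidth-bound model
# (batch size 1, weights read once per token; ignores KV cache and overheads).
model_params = 70e9      # hypothetical 70B-parameter model
bytes_per_param = 2      # FP16 weights

model_bytes = model_params * bytes_per_param
for name, bw_gbs in [("H100 SXM5", 3350), ("H200", 4800)]:
    tokens_per_s = bw_gbs * 1e9 / model_bytes
    print(f"{name}: ~{tokens_per_s:.0f} tokens/s upper bound")

# The bandwidth ratio (4,800 / 3,350 ≈ 1.43) is the theoretical ceiling;
# the 20-40% real-world gain quoted above reflects overheads on top of it.
```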
The B200 (Blackwell) is a genuine architectural leap. It delivers roughly 2.3x the FP16 compute of the H100 SXM5 and uses a 2-chip package design where two Blackwell dies share memory, reaching 192GB HBM3e per module. For more on the full range of AI accelerator types, including how Google TPUs and AMD MI300X compare to NVIDIA's lineup, see our guide to AI accelerator cards.
The Number Most Guides Don't Show
Meta disclosed in 2023 that its initial H100 training cluster contained 24,000 H100 SXM5 GPUs. At a market price of $35,000 per unit (mid-range for Q1 2024 pricing), the GPU hardware cost alone came to $840 million. The facility to house and cool that cluster draws approximately 16.8 megawatts at full load (24,000 x 700W). At $10 to $12 million per MW of data center construction cost, the facility represents another $168 to $202 million in infrastructure. Total investment for that single training cluster: over $1 billion before networking, storage, or software.
That figure excludes the 49,152-GPU cluster Meta later announced for Llama training. GPU hardware alone for that cluster, at $35,000 per unit, totals $1.72 billion.
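The figures above follow directly from per-unit price, per-GPU power, and per-megawatt construction cost; the sketch below just reproduces that arithmetic.

```python
# Reproducing the cluster-cost arithmetic from this section.
gpus = 24_000
unit_price = 35_000            # mid-range Q1 2024 price per H100 SXM5 (as quoted)
tdp_watts = 700                # per-GPU TDP
dc_cost_per_mw = (10e6, 12e6)  # data center construction cost range, $/MW

gpu_capex = gpus * unit_price                           # $840,000,000
power_mw = gpus * tdp_watts / 1e6                       # 16.8 MW at full load
facility = tuple(power_mw * c for c in dc_cost_per_mw)  # ~$168M to ~$202M

print(f"GPU hardware: ${gpu_capex / 1e6:,.0f}M")
print(f"Cluster power draw: {power_mw:.1f} MW")
print(f"Facility build-out: ${facility[0] / 1e6:.0f}M to ${facility[1] / 1e6:.0f}M")
```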
The Transformer Engine: Why It Changed AI Training
The Transformer Engine is dedicated hardware inside every H100 that the A100 does not have. It solves a specific problem: during transformer model training, different layers of the network have different numerical ranges. Some tolerate low-precision arithmetic (FP8) without losing accuracy. Others require higher precision (FP16) to converge properly.
Before the H100, mixed-precision training applied a single low-precision format (typically FP16 or BF16) uniformly across the network. Using FP16 throughout was safe but slow, and pushing precision lower risked numerical instability. The Transformer Engine monitors activation and weight magnitudes layer by layer and automatically switches between FP8 and FP16 per operation mid-training, maintaining the accuracy of FP16 training while capturing most of the speed advantage of FP8.
The practical result: for large transformer models, the H100 with Transformer Engine delivers 3 to 5x higher effective training throughput than the A100 per GPU. This is not just a function of the H100 having more CUDA cores. It is the Transformer Engine specifically that makes the gap so wide for language model workloads. For non-transformer workloads (scientific computing, rendering, general HPC), the gap between H100 and A100 is closer to 3x, reflecting the raw chip improvement without the Transformer Engine multiplier.
FP8 precision was not available on the A100 at all. The A100's minimum training precision was BF16 or FP16. The H100 adding FP8 as a stable training format was a meaningful architectural change, not just an incremental speed improvement.
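In software, this behavior is exposed through NVIDIA's Transformer Engine library rather than configured by hand. The sketch below shows the general shape of FP8 training with the PyTorch bindings (`transformer_engine.pytorch`); module and recipe names follow the library's documented API but may vary across versions, so treat it as an illustrative outline rather than a verified snippet.

```python
# Illustrative outline of FP8 training with NVIDIA Transformer Engine
# (PyTorch bindings). Treat as a sketch; names may differ by library version.
import torch
import transformer_engine.pytorch as te
from transformer_engine.common import recipe

# HYBRID format: E4M3 for forward activations/weights, E5M2 for gradients.
fp8_recipe = recipe.DelayedScaling(fp8_format=recipe.Format.HYBRID)

model = te.Linear(4096, 4096, bias=True).cuda()  # TE layer with FP8-capable kernels
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
inp = torch.randn(8, 4096, device="cuda", dtype=torch.bfloat16)

# Inside fp8_autocast, TE layers run eligible GEMMs in FP8 and track
# per-tensor scaling factors so accuracy stays close to FP16 training.
with te.fp8_autocast(enabled=True, fp8_recipe=fp8_recipe):
    out = model(inp)

loss = out.float().pow(2).mean()  # placeholder loss for the sketch
loss.backward()
optimizer.step()
```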
"The H100 is the engine of the AI industrial revolution. It's not just faster than the A100. It's a fundamentally different chip for transformer workloads." (Jensen Huang, NVIDIA CEO, GTC 2022 keynote)
For context on how H100 clusters fit into the hyperscale data centers running large-scale AI training, see our overview of what hyperscalers are and how they operate.
H100 Export Controls, the H800, and What Comes Next
In October 2022, the US Bureau of Industry and Security (BIS) imposed export controls on NVIDIA's A100 and H100 GPUs, restricting sales to China and certain other countries. The controls targeted chips exceeding specific performance thresholds for AI training compute and chip-to-chip interconnect bandwidth.
NVIDIA responded by developing the H800, a China-specific variant of the H100. The H800 reduced NVLink bandwidth from 900 GB/s to 400 GB/s, falling below the regulatory threshold for interconnect performance. The H800 was sold in China through 2023. In November 2023, BIS updated the export control rules to restrict the H800 as well, closing the interconnect bandwidth loophole. After that point, NVIDIA had no H100-equivalent product available for the Chinese market.
For the NVIDIA A100 China situation, see our full NVIDIA A100 specs and pricing article, which covers how the A800 variant was similarly restricted.
China's AI hardware market has since fragmented. Huawei's Ascend 910C is the closest domestic alternative to the H100, with estimated performance at roughly 60 to 80% of the H100 for standard training workloads, though software tooling and ecosystem maturity remain significantly behind NVIDIA's CUDA platform.
NVIDIA's H100 successor, the H200, began shipping in Q4 2024 with the same GH100 chip but 141GB HBM3e memory. The B200 (Blackwell), shipping from mid-2025, represents the first full architectural change since the H100 launched. The next generation, Rubin, is planned for 2026 to 2027.
For organizations acquiring H100 hardware in 2026, the calculus is straightforward: H100s are now available without the supply constraints of 2023 to 2024, at lower prices, and with a large installed base of benchmarks, tuning guides, and cloud providers. The B200 offers more raw throughput but at higher cost and with less mature deployment tooling. For established training pipelines that run on H100, the migration cost to B200 is non-trivial.
Frequently Asked Questions
What is the NVIDIA H100 GPU used for?
The H100 is used primarily for training and running large AI models. Its main applications are: pre-training large language models (the H100 was the dominant chip for this from 2023 to 2025), fine-tuning smaller models for specific tasks, AI inference at commercial scale, and high-performance computing workloads that benefit from FP64 precision. Every major language model deployed from 2023 onward was trained on H100 clusters, including Meta's Llama family, Anthropic's Claude models, and OpenAI's GPT models after GPT-4.
How much does the NVIDIA H100 GPU cost?
As of Q1 2026, new H100 SXM5 units cost approximately $25,000 to $40,000 per GPU depending on vendor and configuration. The H100 PCIe variant runs $25,000 to $30,000 new. Used units are available from $12,000 to $22,000. An 8-GPU DGX H100 server system costs $400,000 or more. Cloud rental rates run $1.38 to $11.06 per GPU-hour, with a market median of $2.29/hr across tracked configurations (Fluence Network, 2026). Spot pricing is significantly cheaper, often $1.00 to $1.50/hr with interruption risk.
What is the difference between H100 SXM5 and H100 PCIe?
The H100 SXM5 uses HBM3 memory at 3,350 GB/s bandwidth and connects to 7 other GPUs via NVLink 4.0 at 900 GB/s in a DGX H100 system. It draws 700W. The H100 PCIe uses HBM2e at 2,000 GB/s bandwidth, has no NVLink connectivity, and draws only 350W. The SXM5 is faster by roughly 30% on compute (989 vs 756 TFLOPS FP16) and 67% faster on memory bandwidth. The PCIe is the option for standard server deployments where NVLink multi-GPU clusters are not needed or where power constraints apply.
How much faster is the H100 than the A100?
For transformer model training using FP16 precision, the H100 SXM5 is approximately 3.2x faster than the A100 SXM4 80GB (989 TFLOPS vs 312 TFLOPS). For FP8 training, which is only available on H100 and not on A100, the effective gap reaches 5 to 6x for eligible workloads. Memory bandwidth improved from 2,039 GB/s (A100) to 3,350 GB/s (H100 SXM5), a 1.6x improvement. The Transformer Engine on H100 adds an additional multiplier specifically for transformer architecture workloads that compounds the raw spec difference.
What is NVLink on the H100?
NVLink 4.0 on the H100 SXM5 provides 900 GB/s bidirectional bandwidth between GPUs in the same server. In a DGX H100 system with 8 SXM5 GPUs, NVLink connects all 8 GPUs via an NVSwitch fabric, allowing any GPU to read from or write to any other GPU at full NVLink speed. This is critical for training large models that cannot fit in a single GPU's memory: gradient synchronization, tensor parallelism, and pipeline parallelism all require fast GPU-to-GPU communication. The H100 PCIe variant has no NVLink and relies on PCIe 5.0 bandwidth instead, which is significantly slower for multi-GPU communication.
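To make the role of that interconnect concrete, here is a minimal gradient all-reduce using PyTorch's NCCL backend, which routes GPU-to-GPU traffic over NVLink (and the NVSwitch fabric on a DGX H100) when it is available. The launch setup (one process per GPU via `torchrun`) is an assumption about the deployment, not something specified in this article.

```python
# Minimal NCCL all-reduce sketch. NCCL uses NVLink between GPUs when present,
# falling back to PCIe otherwise. Launch with one process per GPU, e.g.:
#   torchrun --nproc_per_node=8 allreduce_demo.py
import torch
import torch.distributed as dist

dist.init_process_group(backend="nccl")
rank = dist.get_rank()
torch.cuda.set_device(rank)  # single-node assumption: rank == local GPU index

# Stand-in for a gradient shard produced by backprop on this GPU.
grad = torch.full((1024, 1024), float(rank), device="cuda")

# Sum-reduce across all GPUs; on a DGX H100 this traffic rides the NVSwitch fabric.
dist.all_reduce(grad, op=dist.ReduceOp.SUM)

if rank == 0:
    print("reduced value per element:", grad[0, 0].item())
dist.destroy_process_group()
```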
Is the H100 available in China?
No. The standard H100 SXM5 and H100 PCIe are restricted from sale to China under US Bureau of Industry and Security (BIS) export controls imposed in October 2022. NVIDIA developed the H800 as a China-specific variant with reduced NVLink bandwidth (400 GB/s instead of 900 GB/s) to comply with the original rules. The H800 was also restricted by updated rules in November 2023. As of 2026, NVIDIA has no H100-equivalent product approved for the Chinese market. Huawei's Ascend 910C is the primary domestic alternative in China.
What GPU comes after the H100?
Two chips succeed the H100. The H200 uses the same GH100 chip as the H100 SXM5 but upgrades memory to 141GB HBM3e with 4,800 GB/s bandwidth, making it significantly faster for inference on large models. The B200 (Blackwell architecture) is a full generational leap, delivering roughly 2.3x the FP16 compute of the H100 SXM5 with 192GB HBM3e. Beyond Blackwell, NVIDIA's Rubin architecture is planned for 2026 to 2027.