Tool Discovery
AI Hardware

NVIDIA H100 GPU: Full Specs, Price, and Cloud Rates for 2026

By Amara | Updated 24 March 2026

[Image: NVIDIA H100 SXM5 GPU module, showing the heat spreader frame, HBM3 memory stacks, and the GH100 die.]

Key Numbers

  • 989 TFLOPS: FP16 performance, H100 SXM5, no sparsity (NVIDIA Hopper whitepaper, 2023)
  • $25K–$40K: estimated H100 SXM5 price range per unit (market data, Q1 2026)
  • $2.29/hr: median cloud rental rate across 49 H100 configurations (Fluence Network analysis, 2026)
  • 80B: transistors in the GH100 chip on TSMC 4nm (NVIDIA, 2023)
  • 700W: thermal design power for H100 SXM5 (NVIDIA H100 datasheet)

Key Takeaways

  • The NVIDIA H100 is a Hopper-architecture GPU (TSMC 4nm, 80 billion transistors) announced in March 2022. The SXM5 variant delivers 989 TFLOPS in FP16, 3,350 GB/s of HBM3 memory bandwidth, and 700W TDP, with 80GB of on-package HBM3 memory.
  • New H100 SXM5 units cost $25,000 to $40,000 per GPU as of Q1 2026. Cloud rental runs $1.38 to $11.06 per GPU-hour depending on provider and configuration, with a market median of approximately $2.29 per hour (Fluence Network, 2026). An 8-GPU DGX H100 server system costs $400,000 or more.
  • The H100's Transformer Engine, which automatically switches between FP8 and FP16 precision per operation during training, is why the chip outperforms the A100 by 3 to 5x on transformer models specifically. Every major language model trained from 2023 to 2025, including GPT-5, Llama 4, and Claude 4, relied on H100 infrastructure.

The NVIDIA H100 is a data center GPU built on the Hopper architecture, announced in March 2022 and in full production by late 2022. The GH100 chip contains 80 billion transistors on TSMC's 4nm process. The SXM5 variant delivers 989 TFLOPS in FP16 without sparsity, 3,350 GB/s of HBM3 memory bandwidth, and a 700W thermal design power, with 80GB of on-package HBM3 memory.

The H100 arrived at an unusual moment in computing history. GPT-4 launched in March 2023, just as H100 volume shipments were ramping. Demand from cloud providers, AI labs, and enterprises immediately exceeded supply, and H100 prices reached $40,000 per unit on secondary markets in late 2023 before supply constraints eased. By Q1 2026, prices have stabilized in the $25,000 to $40,000 range depending on variant.

What separates the H100 from the A100 is not simply faster processing. The defining addition is the Transformer Engine: dedicated hardware that automatically selects between FP8 and FP16 precision per operation during training. For transformer models specifically, this delivers 3 to 5x better throughput than the A100 at comparable utilization.

This article covers complete H100 specifications across all three variants, current pricing across buy and rental markets, a direct performance comparison against the A100, H200, and B200, and the export control context that shaped which version of the chip reached which markets.

What Is the NVIDIA H100 GPU?

The NVIDIA H100 Tensor Core GPU is the ninth generation of NVIDIA's data center accelerator line, following the A100 (Ampere, 2020). Its core chip, the GH100, contains 80 billion transistors on TSMC's 4nm process, compared to 54.2 billion transistors on the A100's GA100 at 7nm. The H100 was designed from the ground up for transformer neural networks, the architecture behind every major language model deployed since 2020.

The chip ships in three variants, each suited to different deployment scenarios:

Variant    | Memory | Memory Type | Memory BW  | TDP  | NVLink   | Best For
H100 SXM5  | 80GB   | HBM3        | 3,350 GB/s | 700W | 900 GB/s | Large training clusters
H100 PCIe  | 80GB   | HBM2e       | 2,000 GB/s | 350W | None     | Standard servers
H100 NVL   | 94GB   | HBM3        | 3,900 GB/s | 400W | 600 GB/s | Inference deployments

The SXM5 is what nearly everyone means when they say "H100." It slots into NVIDIA DGX H100 and HGX H100 systems, where 8 SXM5 GPUs connect at 900 GB/s bidirectional via NVLink 4.0 through an NVSwitch fabric. This is the variant used in the large training clusters at Microsoft Azure, Meta, and CoreWeave.

The H100 PCIe fits standard server PCIe slots and draws only 350W. It uses HBM2e rather than HBM3, delivering 2,000 GB/s versus 3,350 GB/s on the SXM5. For buyers who do not need multi-GPU NVLink clusters, it is the lower-cost option, though the bandwidth gap makes it substantially slower per dollar for memory-bound training workloads.

According to NVIDIA's Hopper architecture technical overview, the H100 was designed around the observation that transformer models had become the central workload in AI, and that existing architectures were not optimized for the specific computation patterns transformers require.

NVIDIA H100 Full Technical Specifications

The following table covers all H100 variants against the A100 SXM4 80GB for direct comparison. All TFLOPS figures are without sparsity unless noted.

Specification        | H100 SXM5      | H100 PCIe      | H100 NVL       | A100 SXM4 80GB
Architecture         | Hopper (GH100) | Hopper (GH100) | Hopper (GH100) | Ampere (GA100)
Process node         | TSMC 4nm       | TSMC 4nm       | TSMC 4nm       | TSMC 7nm
Transistors          | 80 billion     | 80 billion     | 80 billion     | 54.2 billion
CUDA cores           | 16,896         | 14,592         | 16,896         | 6,912
Tensor cores         | 528 (4th gen)  | 456 (4th gen)  | 528 (4th gen)  | 432 (3rd gen)
Memory               | 80GB HBM3      | 80GB HBM2e     | 94GB HBM3      | 80GB HBM2e
Memory bandwidth     | 3,350 GB/s     | 2,000 GB/s     | 3,900 GB/s     | 2,039 GB/s
FP16 Tensor Core     | 989 TFLOPS     | 756 TFLOPS     | 989 TFLOPS     | 312 TFLOPS
FP8 Tensor Core      | 1,979 TFLOPS   | 1,513 TFLOPS   | 1,979 TFLOPS   | N/A
FP16 (with sparsity) | 1,979 TFLOPS   | 1,513 TFLOPS   | 1,979 TFLOPS   | 624 TFLOPS
FP8 (with sparsity)  | 3,958 TFLOPS   | 3,026 TFLOPS   | 3,958 TFLOPS   | N/A
TF32 Tensor Core     | 494 TFLOPS     | 378 TFLOPS     | 494 TFLOPS     | 156 TFLOPS
FP64                 | 34 TFLOPS      | 26 TFLOPS      | 34 TFLOPS      | 9.7 TFLOPS
TDP                  | 700W           | 350W           | 400W           | 400W
NVLink bandwidth     | 900 GB/s       | None           | 600 GB/s       | 600 GB/s
PCIe generation      | PCIe 5.0       | PCIe 5.0       | PCIe 5.0       | PCIe 4.0
MIG instances        | 7              | 7              | 7              | 7
Transformer Engine   | Yes            | Yes            | Yes            | No

On sparsity: the figures above without sparsity are the ones relevant to most production workloads. NVIDIA's sparsity acceleration doubles throughput only for models pruned to 2:4 structured sparsity, meaning two of every four consecutive weights are zero. Most production transformer models are not pruned this way, so the non-sparsity numbers are what you will achieve in practice.

The H100 PCIe's bandwidth gap versus the SXM5 is significant. At 2,000 GB/s versus 3,350 GB/s, the PCIe variant is memory-bandwidth-constrained on long-context inference and multi-batch training jobs that saturate memory. The compute gap (756 TFLOPS vs 989 TFLOPS FP16) is secondary to the bandwidth difference for these workloads.
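Whether a given workload actually hits the bandwidth wall can be estimated with a roofline-style calculation using the figures from the table above. A minimal sketch (the function names and the example workload are illustrative, not from NVIDIA's tooling):

```python
# Roofline-style estimate: a kernel is memory-bound when its arithmetic
# intensity (FLOPs per byte of HBM traffic) falls below the GPU's
# compute-to-bandwidth ratio.

def attainable_tflops(peak_tflops, bw_tbs, flops_per_byte):
    """Upper bound on achieved TFLOPS for a kernel on this GPU."""
    return min(peak_tflops, bw_tbs * flops_per_byte)

H100_SXM5 = {"peak_fp16_tflops": 989, "bw_tbs": 3.35}  # 3,350 GB/s
H100_PCIE = {"peak_fp16_tflops": 756, "bw_tbs": 2.0}   # 2,000 GB/s

# Break-even intensity: below this, the GPU is bandwidth-bound.
for name, gpu in [("SXM5", H100_SXM5), ("PCIe", H100_PCIE)]:
    breakeven = gpu["peak_fp16_tflops"] / gpu["bw_tbs"]
    print(f"{name}: compute-bound only above ~{breakeven:.0f} FLOPs/byte")

# Example: batch-1 LLM decode reads every weight once per token and does
# ~2 FLOPs per weight, i.e. ~1 FLOP/byte at FP16 -- deeply memory-bound,
# so the PCIe card's deficit tracks its 40% lower bandwidth, not its
# 24% lower compute.
```

The takeaway matches the paragraph above: low-intensity workloads such as long-context inference are priced by bandwidth, not TFLOPS.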

NVIDIA H100 Price: Buy vs Rent in 2026

H100 pricing has moved in three phases since launch. In 2023, supply constraints pushed SXM5 units to $35,000 to $45,000 per unit on secondary markets. Through 2024, supply normalized as TSMC expanded CoWoS packaging capacity, and prices fell toward the $27,000 to $40,000 range. By Q1 2026, with the Blackwell B200 beginning to ship in volume, H100 prices have settled further as buyers negotiate the transition to the next generation.

Purchase Prices (Q1 2026)

Variant            | New Unit Price  | Used Unit Price
H100 SXM5 80GB     | $25,000–$40,000 | $15,000–$22,000
H100 PCIe 80GB     | $25,000–$30,000 | $12,000–$18,000
H100 NVL 94GB      | $24,500+        | $14,000–$20,000
DGX H100 (8x SXM5) | $400,000+       | $250,000–$320,000

Cloud Rental Rates (Q1 2026)

Per-GPU rental prices for H100 vary significantly by provider, contract length, and region:

Provider      | H100 Rate       | Notes
Lambda Labs   | ~$2.49/hr       | On-demand, 80GB SXM
RunPod        | ~$2.39/hr       | On-demand, spot cheaper
CoreWeave     | $2.00–$3.50/hr  | Reserved discounts available
AWS (p5)      | ~$4.50–$9.00/hr | On-demand, spot much cheaper
Market median | $2.29/hr        | Across 49 tracked configurations (Fluence Network, 2026)
Market range  | $1.38–$11.06/hr | Low = spot; high = on-demand premium providers

Cloud spot pricing runs materially lower than on-demand, often $1.00 to $1.50/hr for H100 on spot markets with interruption risk. For interruptible batch training jobs, spot H100s are the standard choice at cost-sensitive AI labs.
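Whether spot is worth the interruption risk is itself simple arithmetic: each interruption burns paid time restoring the last checkpoint, inflating the effective rate. A toy model (the interruption frequency and restart overhead below are assumptions for illustration, not measured figures):

```python
def effective_spot_rate(spot_rate, interruptions_per_day,
                        restart_overhead_hr):
    """Effective $ per useful GPU-hour on spot: each interruption wastes
    restart_overhead_hr of paid time re-loading the last checkpoint."""
    lost_frac = interruptions_per_day * restart_overhead_hr / 24
    return spot_rate / (1 - lost_frac)

# $1.25/hr spot, ~2 interruptions/day, ~20 min to restore a checkpoint
rate = effective_spot_rate(1.25, 2, 1 / 3)
print(f"${rate:.2f} per useful GPU-hr vs $2.29 on-demand median")
```

Under these assumptions spot stays well below the on-demand median, which is why interruptible batch jobs default to it; the math flips only when checkpointing is slow or interruptions are frequent.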

Training a 70-billion-parameter model from scratch requires approximately 100,000 to 200,000 GPU-hours on H100 hardware. At $2.29/hr, that is $229,000 to $458,000 in compute cost for a single pre-training run, before storage, networking, or the cost of failed runs.
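The arithmetic behind that estimate is worth making explicit (the GPU-hour range is the article's estimate for a ~70B-parameter run, not a measured figure):

```python
def pretrain_cost(gpu_hours, rate_per_hr):
    """Raw compute cost of a single pre-training run, excluding
    storage, networking, and the cost of failed runs."""
    return gpu_hours * rate_per_hr

median_rate = 2.29  # $/GPU-hr, market median (Fluence Network, 2026)
low, high = 100_000, 200_000  # GPU-hours for a ~70B-parameter run

print(f"${pretrain_cost(low, median_rate):,.0f} to "
      f"${pretrain_cost(high, median_rate):,.0f}")
# -> $229,000 to $458,000
```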

H100 vs A100 vs H200 vs B200: Full Comparison

Four generations of NVIDIA data center GPU span the AI buildout from 2020 to 2026. Here is how they stack up on the metrics that matter for AI workloads, using the 80GB SXM variants where available.

GPU       | Architecture    | FP16 TFLOPS | Memory      | BW (GB/s) | TDP     | Est. Price Q1 2026
A100 SXM4 | Ampere (7nm)    | 312         | 80GB HBM2e  | 2,039     | 400W    | $8,000–$15,000
H100 SXM5 | Hopper (4nm)    | 989         | 80GB HBM3   | 3,350     | 700W    | $25,000–$40,000
H200 SXM5 | Hopper+ (4nm)   | 989         | 141GB HBM3e | 4,800     | 700W    | $35,000–$45,000
B200 SXM6 | Blackwell (4nm) | ~2,250      | 192GB HBM3e | 8,000+    | ~1,000W | $30,000–$40,000

The H200 is not a new architecture. It uses the same GH100 chip as the H100 SXM5 but with a larger HBM3e memory stack (141GB versus 80GB) and significantly higher memory bandwidth (4,800 GB/s versus 3,350 GB/s). For inference on large models that fit in memory, the H200 is 20 to 40% faster than the H100 purely because of the bandwidth increase, not because of more compute. For training, where memory bandwidth is also a limiting factor, the gain is similar.
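That bandwidth-driven gain can be sanity-checked with a back-of-envelope bound: batch-1 decode must stream every weight once per generated token, so tokens per second is capped at bandwidth divided by model size. A rough sketch (it ignores KV-cache traffic and multi-GPU overheads; the 70B model is illustrative):

```python
def decode_tokens_per_sec_bound(bw_gbs, params_b, bytes_per_param=2):
    """Upper bound on batch-1 decode throughput: every weight is read
    once per token, so tokens/s <= bandwidth / model bytes (FP16 = 2B)."""
    model_gb = params_b * bytes_per_param
    return bw_gbs / model_gb

# A 70B-parameter FP16 model (140 GB of weights, so >1 GPU on H100):
h100 = decode_tokens_per_sec_bound(3350, 70)
h200 = decode_tokens_per_sec_bound(4800, 70)
print(f"H100 bound: {h100:.1f} tok/s, H200 bound: {h200:.1f} tok/s, "
      f"gain: {h200 / h100 - 1:.0%}")
```

The bound puts the H200's ceiling about 43% above the H100's, consistent with the 20 to 40% gains observed once real-world overheads are included.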

The B200 (Blackwell) is a genuine architectural leap. It delivers roughly 2.3x the FP16 compute of the H100 SXM5 and uses a 2-chip package design where two Blackwell dies share memory, reaching 192GB HBM3e per module. For more on the full range of AI accelerator types, including how Google TPUs and AMD MI300X compare to NVIDIA's lineup, see our guide to AI accelerator cards.

The Number Most Guides Don't Show

Meta disclosed in 2023 that its initial H100 training cluster contained 24,000 H100 SXM5 GPUs. At a market price of $35,000 per unit (mid-range for Q1 2024 pricing), the GPU hardware cost alone came to $840 million. The facility to house and cool that cluster draws approximately 16.8 megawatts at full load (24,000 x 700W). At $10 to $12 million per MW of data center construction cost, the facility represents another $168 to $202 million in infrastructure. Total investment for that single training cluster: over $1 billion before networking, storage, or software.

That figure excludes the 49,152-GPU cluster Meta later announced for Llama training. GPU hardware alone for that cluster, at $35,000 per unit, totals $1.72 billion.
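The cluster arithmetic above generalizes to any GPU count. A minimal sketch using the same unit assumptions ($35,000 per GPU, 700W TDP, $10 to $12 million per MW of facility; all are the article's estimates):

```python
def cluster_cost(n_gpus, gpu_price=35_000, tdp_w=700,
                 facility_per_mw=(10e6, 12e6)):
    """GPU hardware cost, full-load power draw, and facility cost range
    for an H100 training cluster."""
    hw = n_gpus * gpu_price
    mw = n_gpus * tdp_w / 1e6
    facility = tuple(mw * c for c in facility_per_mw)
    return hw, mw, facility

hw, mw, (f_lo, f_hi) = cluster_cost(24_000)
print(f"GPUs: ${hw/1e6:.0f}M, power: {mw:.1f} MW, "
      f"facility: ${f_lo/1e6:.0f}M-${f_hi/1e6:.0f}M")
# -> GPUs: $840M, power: 16.8 MW, facility: $168M-$202M
```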

The Transformer Engine: Why It Changed AI Training

The Transformer Engine is dedicated hardware inside every H100 that the A100 does not have. It solves a specific problem: during transformer model training, different layers of the network have different numerical ranges. Some tolerate low-precision arithmetic (FP8) without losing accuracy. Others require higher precision (FP16) to converge properly.

Before the H100, training runs had to pick one precision and apply it uniformly. Using FP16 throughout was safe but slow. Using lower precision could cause numerical instability. The Transformer Engine monitors activation and weight magnitudes layer by layer and automatically switches between FP8 and FP16 per operation mid-training, maintaining the accuracy of FP16 training while capturing most of the speed advantage of FP8.

The practical result: for large transformer models, the H100 with Transformer Engine delivers 3 to 5x higher effective training throughput than the A100 per GPU. This is not just a function of the H100 having more CUDA cores. It is the Transformer Engine specifically that makes the gap so wide for language model workloads. For non-transformer workloads (scientific computing, rendering, general HPC), the gap between H100 and A100 is closer to 3x, reflecting the raw chip improvement without the Transformer Engine multiplier.

FP8 precision was not available on the A100 at all; its minimum training precision was BF16 or FP16. Adding FP8 as a stable training format on the H100 was a meaningful architectural change, not just an incremental speed improvement.
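The core precision decision can be illustrated with a toy sketch. This is not NVIDIA's actual heuristic (the real Transformer Engine tracks amax history through a delayed-scaling recipe); it only shows the idea that a tensor's dynamic range determines whether FP8 is safe:

```python
import math

# FP8 E4M3 spans roughly [2**-9, 448] in magnitude once a per-tensor
# scale is applied, i.e. about 18 octaves of dynamic range. A tensor
# whose max/min magnitude ratio exceeds that will flush its small
# values to zero in FP8 and is safer kept in FP16.
E4M3_OCTAVES = 18  # log2(448 / 2**-9) ~= 17.8

def choose_precision(abs_max, abs_min_nonzero):
    """Toy per-operation precision pick: FP8 if the tensor's dynamic
    range fits E4M3 after per-tensor scaling, otherwise FP16."""
    octaves = math.log2(abs_max / abs_min_nonzero)
    return "fp8" if octaves <= E4M3_OCTAVES else "fp16"

print(choose_precision(10.0, 0.01))    # ~10 octaves -> 'fp8'
print(choose_precision(1000.0, 1e-6))  # ~30 octaves -> 'fp16'
```

The hardware version of this decision runs per operation, every step, with no code changes in the training loop, which is what makes the speedup essentially free for the model developer.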

"The H100 is the engine of the AI industrial revolution. It's not just faster than the A100. It's a fundamentally different chip for transformer workloads." (Jensen Huang, NVIDIA CEO, GTC 2022 keynote)

For context on how H100 clusters fit into the hyperscale data centers running large-scale AI training, see our overview of what hyperscalers are and how they operate.

H100 Export Controls, the H800, and What Comes Next

In October 2022, the US Bureau of Industry and Security (BIS) imposed export controls on NVIDIA's A100 and H100 GPUs, restricting sales to China and certain other countries. The controls targeted chips exceeding specific performance thresholds for AI training compute and chip-to-chip interconnect bandwidth.

NVIDIA responded by developing the H800, a China-specific variant of the H100. The H800 reduced NVLink bandwidth from 900 GB/s to 400 GB/s, falling below the regulatory threshold for interconnect performance. The H800 was sold in China through 2023. In November 2023, BIS updated the export control rules to restrict the H800 as well, closing the interconnect bandwidth loophole. After that point, NVIDIA had no H100-equivalent product available for the Chinese market.

For the NVIDIA A100 China situation, see our full NVIDIA A100 specs and pricing article, which covers how the A800 variant was similarly restricted.

China's AI hardware market has since fragmented. Huawei's Ascend 910C is the closest domestic alternative to the H100, with estimated performance at roughly 60 to 80% of the H100 for standard training workloads, though software tooling and ecosystem maturity remain significantly behind NVIDIA's CUDA platform.

NVIDIA's H100 successor, the H200, began shipping in Q4 2024 with the same GH100 chip but 141GB HBM3e memory. The B200 (Blackwell), shipping from mid-2025, represents the first full architectural change since the H100 launched. The next generation, Rubin, is planned for 2026 to 2027.

For organizations acquiring H100 hardware in 2026, the calculus is straightforward: H100s are now available without the supply constraints of 2023 to 2024, at lower prices, and with a large installed base of benchmarks, tuning guides, and cloud providers. The B200 offers more raw throughput but at higher cost and with less mature deployment tooling. For established training pipelines that run on H100, the migration cost to B200 is non-trivial.

Frequently Asked Questions

What is the NVIDIA H100 GPU used for?

The H100 is used primarily for training and running large AI models. Its main applications are: pre-training large language models (the H100 was the dominant chip for this from 2023 to 2025), fine-tuning smaller models for specific tasks, AI inference at commercial scale, and high-performance computing workloads that benefit from FP64 precision. Every major language model deployed from 2023 onward was trained on H100 clusters, including Meta's Llama family, Anthropic's Claude models, and OpenAI's GPT models after GPT-4.

How much does the NVIDIA H100 GPU cost?

As of Q1 2026, new H100 SXM5 units cost approximately $25,000 to $40,000 per GPU depending on vendor and configuration. The H100 PCIe variant runs $25,000 to $30,000 new. Used units are available from $12,000 to $22,000. An 8-GPU DGX H100 server system costs $400,000 or more. Cloud rental rates run $1.38 to $11.06 per GPU-hour, with a market median of $2.29/hr across tracked configurations (Fluence Network, 2026). Spot pricing is significantly cheaper, often $1.00 to $1.50/hr with interruption risk.

What is the difference between H100 SXM5 and H100 PCIe?

The H100 SXM5 uses HBM3 memory at 3,350 GB/s bandwidth and connects to 7 other GPUs via NVLink 4.0 at 900 GB/s in a DGX H100 system. It draws 700W. The H100 PCIe uses HBM2e at 2,000 GB/s bandwidth, has no NVLink connectivity, and draws only 350W. The SXM5 is faster by roughly 30% on compute (989 vs 756 TFLOPS FP16) and 67% faster on memory bandwidth. The PCIe is the option for standard server deployments where NVLink multi-GPU clusters are not needed or where power constraints apply.

How much faster is the H100 than the A100?

For transformer model training using FP16 precision, the H100 SXM5 is approximately 3.2x faster than the A100 SXM4 80GB (989 TFLOPS vs 312 TFLOPS). For FP8 training, which is only available on H100 and not on A100, the effective gap reaches 5 to 6x for eligible workloads. Memory bandwidth improved from 2,039 GB/s (A100) to 3,350 GB/s (H100 SXM5), a 1.6x improvement. The Transformer Engine on H100 adds an additional multiplier specifically for transformer architecture workloads that compounds the raw spec difference.

What is NVLink on the H100?

NVLink 4.0 on the H100 SXM5 provides 900 GB/s bidirectional bandwidth between GPUs in the same server. In a DGX H100 system with 8 SXM5 GPUs, NVLink connects all 8 GPUs via an NVSwitch fabric, allowing any GPU to read from or write to any other GPU at full NVLink speed. This is critical for training large models that cannot fit in a single GPU's memory: gradient synchronization, tensor parallelism, and pipeline parallelism all require fast GPU-to-GPU communication. The H100 PCIe variant has no NVLink and relies on PCIe 5.0 bandwidth instead, which is significantly slower for multi-GPU communication.
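Why 900 GB/s matters can be put in numbers. In a ring all-reduce, each GPU moves roughly 2(N-1)/N times the gradient bytes over its link, so synchronization time scales inversely with link bandwidth. A rough model (it ignores latency and compute/communication overlap, and treating the bidirectional figure as effective throughput is optimistic; sizes are illustrative):

```python
def allreduce_seconds(params_b, n_gpus, link_gbs, bytes_per_grad=2):
    """Rough ring all-reduce time per step: each GPU sends/receives
    ~2*(N-1)/N of the FP16 gradient bytes over its link."""
    grad_gb = params_b * bytes_per_grad
    traffic_gb = 2 * (n_gpus - 1) / n_gpus * grad_gb
    return traffic_gb / link_gbs

# 70B params across 8 GPUs: NVLink 4.0 (900 GB/s) vs PCIe 5.0 x16 (~64 GB/s)
nvlink = allreduce_seconds(70, 8, 900)
pcie = allreduce_seconds(70, 8, 64)
print(f"NVLink: {nvlink:.2f}s per step, PCIe: {pcie:.2f}s per step")
```

Even this crude model shows an order-of-magnitude gap per synchronization step, which is why the PCIe variant is not used for multi-GPU training at scale.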

Is the H100 available in China?

No. The standard H100 SXM5 and H100 PCIe are restricted from sale to China under US Bureau of Industry and Security (BIS) export controls imposed in October 2022. NVIDIA developed the H800 as a China-specific variant with reduced NVLink bandwidth (400 GB/s instead of 900 GB/s) to comply with the original rules. The H800 was also restricted by updated rules in November 2023. As of 2026, NVIDIA has no H100-equivalent product approved for the Chinese market. Huawei's Ascend 910C is the primary domestic alternative in China.

What GPU comes after the H100?

Two chips succeed the H100. The H200 uses the same GH100 chip as the H100 SXM5 but upgrades memory to 141GB HBM3e with 4,800 GB/s bandwidth, making it significantly faster for inference on large models. The B200 (Blackwell architecture) is a full generational leap, delivering roughly 2.3x the FP16 compute of the H100 SXM5 with 192GB HBM3e. Beyond Blackwell, NVIDIA's Rubin architecture is planned for 2026 to 2027.
