
NVIDIA H200 GPU: Full Specs, Price, and Cloud Rates for 2026

By Amara | Updated 21 June 2026
[Image: Four NVIDIA H200 Tensor Core GPU modules with gold heatsinks in a server stack]

Key Numbers

  • 141GB: HBM3e memory capacity, nearly double the H100's 80GB (NVIDIA, 2024)
  • 4.8 TB/s: Memory bandwidth, a 1.4x increase over the H100 SXM5 (NVIDIA H200 datasheet)
  • $28K-$45K: Per-GPU price range for PCIe and SXM5 variants (Mercatus / Fluence Network, 2026)
  • $3.82/hr: Market median cloud rental price across 42 tracked H200 configurations (2026 market snapshot)
  • +15%: AWS price increase on 8-GPU H200 instances, January 2026 (Introl)

Key Takeaways

  • The NVIDIA H200 uses the same GH100 die and CUDA core count as the H100, but raises memory from 80GB to 141GB and bandwidth from 3.35 TB/s to 4.8 TB/s, a 1.4x increase. It launched November 18, 2024.
  • In 2026, an H200 GPU costs $28,000 to $45,000 depending on the variant (PCIe or SXM5). Cloud rental ranges from $2.43 to $13.78 per GPU-hour, with a market median near $3.82/hr. An 8-GPU DGX H200 system runs $400,000 to $500,000.
  • The extra memory matters for AI because it eases the out-of-memory bottleneck on large language model training and long-context inference. AWS, Google Cloud, Azure, and Oracle had all launched H200 cloud instances by early 2026.

The NVIDIA H200 is a Hopper architecture GPU that keeps the same GH100 compute die as the H100 but swaps its 80GB of HBM3 memory for 141GB of HBM3e running at 4.8 TB/s, a 1.4x jump in bandwidth. It launched November 18, 2024, according to TechPowerUp's GPU database, as a drop-in upgrade for existing H100-based HGX and DGX systems rather than a new architecture.

What's notable is what stayed the same. The H200 has the same 16,896 CUDA cores, the same peak FP8 and FP16 tensor throughput, and the same 700W thermal envelope as the H100 SXM5. According to NVIDIA's official H200 product page, the performance story is memory capacity and bandwidth, not added compute. That's also why AWS could raise H200 instance pricing 15% in January 2026: supply for the high-memory variant stayed tight even as NVIDIA's next-generation B200 began shipping.

This article covers the full H200 spec sheet across PCIe and SXM5 variants, current purchase and cloud rental pricing, a direct comparison against the H100, A100, and B200, and who actually needs the extra memory versus who is better off on cheaper H100 capacity.

What is the NVIDIA H200 GPU?

The NVIDIA H200 is a data center GPU built on the same Hopper GH100 die as the NVIDIA H100, with one change that matters: it replaces 80GB of HBM3 memory with 141GB of HBM3e running at 4.8 TB/s, up from the H100 SXM5's 3,350 GB/s. NVIDIA designed it for workloads that run out of memory before they run out of compute, which describes most large language model training and inference today. For a broader primer on this category of hardware, see our explainer on what an AI accelerator is.

It ships in two form factors, matching the H100 lineup:

| Variant | Memory | Bandwidth | TDP | Use case |
| --- | --- | --- | --- | --- |
| H200 SXM5 | 141GB HBM3e | 4.8 TB/s | 700W | DGX/HGX systems, multi-GPU training clusters |
| H200 NVL | 141GB HBM3e | 4.8 TB/s | up to 600W | PCIe servers, air-cooled deployments |

Both variants use the same GH100 silicon as the H100, with 80 billion transistors fabricated on TSMC's 4nm process. NVIDIA did not redesign the compute pipeline. It widened the memory pipe.

"The NVIDIA H200 Tensor Core GPU is the first GPU to offer HBM3e memory, delivering the largest capacity and fastest memory in the world." (NVIDIA, H200 product page, 2024)

NVIDIA H200 full specifications

The H200's spec sheet looks almost identical to the H100's outside of the memory subsystem. Here's the full comparison against the previous two generations:

| Spec | A100 80GB | H100 SXM5 | H200 SXM5 |
| --- | --- | --- | --- |
| Architecture | Ampere (7nm) | Hopper (4nm) | Hopper (4nm) |
| CUDA cores | 6,912 | 16,896 | 16,896 |
| Memory | 80GB HBM2e | 80GB HBM3 | 141GB HBM3e |
| Memory bandwidth | 2.0 TB/s | 3.35 TB/s | 4.8 TB/s |
| FP8 Tensor (dense) | n/a | ~1,979 TFLOPS | ~1,979 TFLOPS |
| TDP | 400W | 700W | 700W |
| Transistors | 54 billion | 80 billion | 80 billion |

Look at the FP8 throughput row. It's identical between H100 and H200, because both run on the same GH100 die at the same clock targets. The only specifications that changed between the two are memory capacity and bandwidth. For a workload that already fits in 80GB and doesn't bottleneck on memory bandwidth, an H200 will not run noticeably faster than an H100. For one that doesn't fit, the difference can be the gap between needing two GPUs and needing one.
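One way to make "bottlenecked on memory bandwidth" concrete is a roofline estimate. The sketch below is a simplification, not a benchmark: it uses only the peak FP8 and bandwidth figures from the table, and the two arithmetic-intensity values (50 and 1,000 FLOPs per byte) are hypothetical kernels chosen for illustration.

```python
# Peak figures taken from the comparison table; real kernels rarely hit peak.
PEAK_FP8_TFLOPS = 1979.0  # dense FP8, identical on H100 and H200
BANDWIDTH_TBPS = {"H100 SXM5": 3.35, "H200 SXM5": 4.8}


def ridge_point(peak_tflops: float, bandwidth_tbps: float) -> float:
    """FLOPs per byte a kernel needs before compute, not memory, is the limit."""
    return peak_tflops / bandwidth_tbps  # TFLOP/s divided by TB/s = FLOP/byte


def attainable_tflops(intensity: float, peak_tflops: float, bandwidth_tbps: float) -> float:
    """Roofline bound: the lower of peak compute and intensity * bandwidth."""
    return min(peak_tflops, intensity * bandwidth_tbps)


if __name__ == "__main__":
    for name, bw in BANDWIDTH_TBPS.items():
        print(f"{name}: ridge at {ridge_point(PEAK_FP8_TFLOPS, bw):.0f} FLOP/byte")

    # Hypothetical kernels: one memory-bound, one compute-bound.
    for intensity in (50.0, 1000.0):
        h100 = attainable_tflops(intensity, PEAK_FP8_TFLOPS, BANDWIDTH_TBPS["H100 SXM5"])
        h200 = attainable_tflops(intensity, PEAK_FP8_TFLOPS, BANDWIDTH_TBPS["H200 SXM5"])
        print(f"intensity {intensity:.0f} FLOP/byte: H200/H100 = {h200 / h100:.2f}x")
```

Under these assumptions the ridge point falls from roughly 591 to 412 FLOP/byte. A kernel below the ridge scales with the bandwidth ratio, while one above it sees no gain, which is the pattern the paragraph describes.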

How much does the NVIDIA H200 cost in 2026?

NVIDIA does not publish list pricing for the H200, so the numbers below come from OEM quotes, reseller listings, and cloud provider rate cards tracked through 2026.

Outright purchase, per GPU:

| Variant | Price range (2026) |
| --- | --- |
| H200 PCIe/NVL 141GB | $28,000 to $35,000 |
| H200 SXM5 141GB | $32,000 to $45,000 |
| 8-GPU HGX H200 board | $308,000 to $315,000 |
| DGX H200 system (8x H200) | $400,000 to $500,000 |

Cloud rental, per GPU-hour, varies more than purchase price because providers bundle networking, storage, and support differently:

| Provider type | Rate per GPU-hour (2026) |
| --- | --- |
| Decentralized GPU marketplaces | $2.43 to $3.80/hr |
| Mid-tier specialized clouds | $3.72 to $7.00/hr |
| Major hyperscalers (AWS, GCP, Azure) | $4.33 to $13.78/hr |
| Market median, 42 tracked configurations | $3.82/hr |
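To put the purchase and rental figures side by side, a rough break-even calculation helps. This is a minimal sketch that divides purchase price by hourly rate and nothing more. It deliberately ignores power, hosting, networking, staffing, and resale value, all of which would shift the real answer, and the function name is illustrative.

```python
def break_even_hours(purchase_price_usd: float, rental_rate_usd_per_hr: float) -> float:
    """GPU-hours of rental that cost as much as buying the card outright.

    Simplification: hardware price only, with no power, hosting, or resale value.
    """
    return purchase_price_usd / rental_rate_usd_per_hr


if __name__ == "__main__":
    median_rate = 3.82  # market median per GPU-hour from the table
    cases = (
        ("H200 PCIe/NVL, low end", 28_000),
        ("H200 SXM5, high end", 45_000),
    )
    for label, price in cases:
        hours = break_even_hours(price, median_rate)
        months = hours / (24 * 30)  # months of round-the-clock use
        print(f"{label}: {hours:,.0f} GPU-hours (~{months:.0f} months at full utilization)")
```

At the median rate the crossover sits somewhere between roughly 7,300 and 11,800 GPU-hours, so the rent-versus-buy answer depends heavily on how busy the GPU will actually be.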

The number most guides don't show

AWS raised pricing on its 8-GPU H200 instances in January 2026. The p5e.48xlarge instance went from $34.61/hr to $39.80/hr, a $5.19/hr (15%) increase, according to Introl's analysis of the change. Run that instance continuously for a year and the increase alone adds $45,464 in extra cost, for hardware that didn't change at all between December 2025 and January 2026. That's more than a brand new H200 PCIe card costs, and on par with the top of the SXM5 price range, spent on nothing but a pricing adjustment. Introl frames the hike as a sign of structural supply constraints in high-end AI accelerators, not ordinary demand growth, which suggests that H200 capacity, not B200 capacity, was the actual bottleneck heading into 2026.
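The arithmetic behind those figures is easy to reproduce. The sketch below uses only the two hourly rates quoted above and assumes a 365-day year of continuous use. The helper names are illustrative.

```python
HOURS_PER_YEAR = 24 * 365  # 8,760 hours; leap years ignored

OLD_RATE = 34.61  # p5e.48xlarge, USD per instance-hour before the change
NEW_RATE = 39.80  # USD per instance-hour after the change
GPUS_PER_INSTANCE = 8


def annual_increase(old_rate: float, new_rate: float, hours: int = HOURS_PER_YEAR) -> float:
    """Extra yearly cost of running one instance continuously at the new rate."""
    return (new_rate - old_rate) * hours


def percent_increase(old_rate: float, new_rate: float) -> float:
    """Relative price change, in percent."""
    return (new_rate - old_rate) / old_rate * 100


if __name__ == "__main__":
    print(f"Hourly increase:  ${NEW_RATE - OLD_RATE:.2f}")
    print(f"Percent increase: {percent_increase(OLD_RATE, NEW_RATE):.1f}%")
    print(f"Annual increase:  ${annual_increase(OLD_RATE, NEW_RATE):,.0f}")
    print(f"New per-GPU rate: ${NEW_RATE / GPUS_PER_INSTANCE:.2f}/hr")
```

Dividing the new instance rate by eight gives just under $5 per GPU-hour, which sits inside the hyperscaler range listed in the rental table.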

H200 vs H100: what actually changed

Memory is the only meaningful difference between these two GPUs, but it shows up differently depending on the workload.

| Metric | H100 SXM5 | H200 SXM5 | Change |
| --- | --- | --- | --- |
| Memory | 80GB HBM3 | 141GB HBM3e | +76% |
| Bandwidth | 3.35 TB/s | 4.8 TB/s | +43% |
| FP8 compute | ~1,979 TFLOPS | ~1,979 TFLOPS | No change |
| TDP | 700W | 700W | No change |
| Price (SXM5) | $27,000-$40,000 | $32,000-$45,000 | +15-20% |

For memory-bound workloads, Fluence's H200 deep dive reports training speedups around 1.4x and inference speedups up to 1.8x over the H100. The training figure tracks the 1.43x bandwidth ratio, which lets the GPU feed its tensor cores faster. Inference gains above that ratio cannot come from bandwidth alone, and most plausibly reflect the extra capacity allowing larger batches per GPU. For compute-bound workloads that already fit comfortably inside 80GB, the gain shrinks toward zero, since both GPUs run the same compute pipeline at the same clocks.
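How the gain shrinks between those two extremes can be sketched with an Amdahl-style model. This is an assumption-laden simplification, not Fluence's or NVIDIA's method. It treats a fraction of run time as purely bandwidth-bound and the rest as unaffected, and it does not model capacity effects such as fitting larger batches in 141GB.

```python
H100_BANDWIDTH_TBPS = 3.35
H200_BANDWIDTH_TBPS = 4.8
BANDWIDTH_RATIO = H200_BANDWIDTH_TBPS / H100_BANDWIDTH_TBPS  # about 1.43


def speedup(memory_bound_fraction: float, ratio: float = BANDWIDTH_RATIO) -> float:
    """Estimated H200-over-H100 speedup from bandwidth alone.

    memory_bound_fraction: share of H100 run time spent waiting on memory
    (0.0 = fully compute-bound, 1.0 = fully bandwidth-bound).
    """
    if not 0.0 <= memory_bound_fraction <= 1.0:
        raise ValueError("memory_bound_fraction must be between 0 and 1")
    compute_time = 1.0 - memory_bound_fraction    # unchanged on the H200
    memory_time = memory_bound_fraction / ratio   # shrinks with bandwidth
    return 1.0 / (compute_time + memory_time)


if __name__ == "__main__":
    for fraction in (0.0, 0.25, 0.5, 0.75, 1.0):
        print(f"{fraction:.0%} memory-bound: {speedup(fraction):.2f}x")
```

In this model the ceiling is the bandwidth ratio itself, so any measured speedup above about 1.43x has to come from something the model leaves out.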

The practical effect: a 70B-parameter model that needed two H100s to fit comfortably with room for a long context window can often run on a single H200. That's not a speed improvement so much as an elimination of a problem. Fewer GPUs needed per model means less cross-GPU communication overhead and simpler deployment math.
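A back-of-the-envelope weight calculation shows where the two-versus-one claim comes from. The sketch below counts model weights only. The 20% headroom reserved for KV cache, activations, and framework overhead is a hypothetical figure chosen for illustration, and "GB" means 10^9 bytes.

```python
import math

BYTES_PER_PARAM = {"FP16": 2.0, "FP8": 1.0, "INT4": 0.5}
GPU_MEMORY_GB = {"H100": 80.0, "H200": 141.0}


def weights_gb(num_params: float, bytes_per_param: float) -> float:
    """Memory taken by the weights alone, in GB (10^9 bytes)."""
    return num_params * bytes_per_param / 1e9


def min_gpus(weight_gb: float, gpu_gb: float, headroom: float = 0.2) -> int:
    """GPUs needed if `headroom` of each card is kept free for everything else.

    The 20% default is a hypothetical planning margin, not a measured value.
    """
    usable_gb = gpu_gb * (1.0 - headroom)
    return math.ceil(weight_gb / usable_gb)


if __name__ == "__main__":
    params = 70e9  # a 70B-parameter model
    for precision, nbytes in BYTES_PER_PARAM.items():
        size = weights_gb(params, nbytes)
        counts = ", ".join(
            f"{gpu}: {min_gpus(size, mem)}" for gpu, mem in GPU_MEMORY_GB.items()
        )
        print(f"{precision}: {size:.0f} GB of weights -> min GPUs ({counts})")
```

Under these assumptions, 8-bit weights need two H100s but fit on one H200, while 16-bit weights still spill onto a second H200. The claim therefore depends on the precision being served.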

Who actually needs an H200 over an H100?

The H200 is worth the premium for workloads that hit memory limits on an H100, not for workloads that are simply slow.

Good fit for H200:

  • Training or fine-tuning models above 70B parameters where weights, optimizer state, and activation memory press against the 80GB H100 ceiling
  • Long-context inference, 32K tokens and up, where the KV cache grows large enough to force batch size down on an 80GB card
  • Multi-tenant inference serving where higher memory per GPU means more concurrent users without adding GPUs
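The long-context and multi-tenant cases above both come down to how much KV cache fits beside the weights. The sketch below estimates that under loudly stated assumptions: a Llama-2-70B-like layout (80 layers, 8 key/value heads, head dimension 128), FP8 weights of about 70GB, an FP16 KV cache, and no allowance for activations or fragmentation. These values are illustrative and not taken from the article.

```python
def kv_bytes_per_token(layers: int, kv_heads: int, head_dim: int, bytes_per_elem: int) -> int:
    """KV cache bytes stored per token: one key and one value vector per layer."""
    return 2 * layers * kv_heads * head_dim * bytes_per_elem


def max_sequences(gpu_gb: float, weights_gb: float, context_tokens: int,
                  per_token_bytes: int) -> int:
    """Full-length sequences whose KV cache fits in the memory left after weights."""
    free_bytes = (gpu_gb - weights_gb) * 1e9
    if free_bytes <= 0:
        return 0
    return int(free_bytes // (context_tokens * per_token_bytes))


if __name__ == "__main__":
    # Assumed model shape (Llama-2-70B-like) and an FP16 cache at 2 bytes per element.
    per_token = kv_bytes_per_token(layers=80, kv_heads=8, head_dim=128, bytes_per_elem=2)
    context = 32 * 1024  # 32K-token context
    weights = 70.0       # GB, assuming FP8 weights

    print(f"KV cache per token: {per_token / 1024:.0f} KiB")
    for gpu, mem in (("H100", 80.0), ("H200", 141.0)):
        print(f"{gpu}: {max_sequences(mem, weights, context, per_token)} concurrent 32K sequences")
```

With these inputs a single 32K-token sequence needs about 10.7GB of cache, so the 80GB card has room for none after the weights, while the 141GB card holds six. Different model shapes or cache precisions would change the numbers but not the direction.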

Better off on H100 or A100:

  • Fine-tuning smaller models, under 13B parameters, that fit in 80GB with room to spare
  • Inference workloads with short context windows and small batch sizes
  • Budget-constrained teams where the 15-20% price premium doesn't pay back in fewer GPUs needed

If your workloads fit comfortably in less memory, the NVIDIA A100 remains the better value choice, especially on the secondary market. Teams training smaller models on a tighter budget should look at our guide to the best GPU for AI training, which covers consumer and prosumer cards like the RTX 4090 that cost a fraction of an H200.

Hyperscalers, model labs, and large enterprises running frontier-scale LLM training are the buyers actually driving H200 demand. By early 2026, AWS, Google Cloud, Microsoft Azure, and Oracle had all rolled out H200-backed instances, alongside specialized providers offering per-GPU rental for smaller teams that don't want to commit to a multi-year reserved instance.

Should you wait for the B200 instead?

NVIDIA's Blackwell-based B200 is the direct successor to the H200, and it's already shipping in volume to large customers as of 2026, which raises an obvious question for anyone pricing out H200 capacity now.

| GPU | Architecture | Memory | Price per GPU (2026) | 8-GPU system price |
| --- | --- | --- | --- | --- |
| H200 SXM5 | Hopper | 141GB HBM3e | $32,000-$45,000 | $400,000-$500,000 (DGX H200) |
| B200 | Blackwell | Up to 192GB HBM3e | $30,000-$50,000 | $300,000-$350,000 (DGX B300) |

B200 offers more memory, better performance per watt, and roughly double the total system throughput of H100-class hardware in NVIDIA's own positioning, at a similar or only slightly higher price per GPU. The catch is availability. H200 is a drop-in replacement for H100 in existing HGX and DGX infrastructure, so it ships faster and integrates into systems that are already racked and validated. B200 generally requires newer power and cooling infrastructure.

For teams that need capacity now and already run H100 fleets, H200 remains the path of least resistance. For new builds with no existing Hopper infrastructure to protect, waiting for B200 allocation is increasingly the better economic call, assuming the wait doesn't cost more in delayed training runs than the price difference saves.
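That trade-off can be written as a one-line break-even. The sketch below is illustrative only: the system prices are midpoints of the ranges in the table above, and the $10,000-per-week cost of delayed training is a hypothetical placeholder that each team would need to replace with its own estimate.

```python
def max_worthwhile_delay_weeks(h200_system_usd: float, b200_system_usd: float,
                               delay_cost_usd_per_week: float) -> float:
    """Longest wait, in weeks, before delay costs exceed the hardware saving."""
    saving = h200_system_usd - b200_system_usd
    if saving <= 0:
        return 0.0  # no price advantage, so waiting never pays off on cost alone
    return saving / delay_cost_usd_per_week


if __name__ == "__main__":
    h200_midpoint = 450_000   # midpoint of $400,000-$500,000
    b200_midpoint = 325_000   # midpoint of $300,000-$350,000
    weekly_delay_cost = 10_000  # hypothetical cost of postponed training runs

    weeks = max_worthwhile_delay_weeks(h200_midpoint, b200_midpoint, weekly_delay_cost)
    print(f"Waiting pays off if allocation arrives within {weeks:.1f} weeks")
```

The model looks at purchase price only. Any extra spending on power and cooling for the newer platform would shorten the window.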

Common misconceptions about the H200

A few claims about the H200 circulate widely and don't hold up against NVIDIA's own specifications.

It's not a new architecture. Some marketing material describes the H200 as a generational leap, but it uses the identical GH100 die, the same 16,896 CUDA cores, and the same Transformer Engine as the H100. The only architectural change is the memory subsystem.

It doesn't have more CUDA cores. A handful of vendor listings claim up to 30% more CUDA cores than the H100. Independent hardware databases, including TechPowerUp, list the same 16,896 shading units for both GPUs. The compute side is unchanged.

It's not a gaming or workstation card. The H200 has no display outputs and doesn't support DirectX, Vulkan, or OpenGL. It's built exclusively for data center racks, not desktops.

It doesn't make every workload faster. The performance gain is workload-dependent. Memory-bound jobs see a real uplift. Compute-bound jobs that already fit in 80GB see little to no difference, because the underlying tensor core throughput didn't change.

Frequently Asked Questions

What is the NVIDIA H200 GPU used for?

The H200 is used for training and serving large AI models, particularly large language models with long context windows or parameter counts above 70 billion, where the H100's 80GB of memory becomes a limiting factor. It is deployed in cloud data centers, DGX systems, and enterprise AI clusters, not in desktops or workstations.

How much does the NVIDIA H200 GPU cost?

As of 2026, a single H200 GPU costs $28,000 to $35,000 for the PCIe/NVL variant and $32,000 to $45,000 for the SXM5 variant. An 8-GPU DGX H200 system costs $400,000 to $500,000. Cloud rental runs $2.43 to $13.78 per GPU-hour depending on the provider, with a market median around $3.82/hr.

What is the difference between the H200 and H100?

The H200 and H100 share the same GH100 compute die, the same CUDA core count, and the same peak compute throughput. The difference is memory: the H200 has 141GB of HBM3e running at 4.8 TB/s, compared to the H100's 80GB of HBM3 at 3,350 GB/s. That is a 76% increase in capacity and a 43% increase in bandwidth, with no change to compute performance.

Is the H200 faster than the H100?

It depends on the workload. For memory-bound jobs like long-context inference or training models that strain an 80GB memory budget, the H200 runs roughly 1.4x faster in training and up to 1.8x faster in inference, according to Fluence's H200 analysis. For workloads that already fit comfortably in 80GB, the speed difference is minimal because both GPUs share the same compute pipeline.

What is HBM3e memory?

HBM3e, or High Bandwidth Memory 3e, is a stacked memory technology: DRAM dies are stacked vertically and placed on the GPU package right next to the compute die, instead of being soldered to the circuit board around the GPU the way GDDR chips are. The short, very wide connection allows much higher bandwidth than traditional GDDR memory. The H200 was the first GPU to ship with HBM3e, delivering 4.8 TB/s of bandwidth across 141GB of capacity.

Should I buy an H200 or wait for the B200?

If you already run H100-based infrastructure and need capacity soon, the H200 is the lower-friction choice since it is a drop-in replacement for existing HGX and DGX systems. If you are building new infrastructure with no Hopper fleet to protect, the B200 offers more memory and better performance per watt at a similar price, but requires newer power and cooling infrastructure and has tighter availability in 2026.

Who uses the NVIDIA H200?

Hyperscalers, AI model labs, and large enterprises running frontier-scale LLM training and inference are the primary buyers. By early 2026, AWS, Google Cloud, Microsoft Azure, and Oracle had all launched H200-backed cloud instances, and specialized providers offer per-GPU rental for smaller teams.

When did the NVIDIA H200 launch?

The H200 SXM5 141GB launched on November 18, 2024, according to TechPowerUp's GPU database. Broader OEM and cloud availability ramped through 2025, and by 2026 H200 capacity had become widely available across major cloud providers and specialized GPU rental platforms.
