
NVIDIA GB300 NVL72: Full Specs, Price, and Cloud Rates for 2026

By Amara | Updated 24 June 2026
[Image: NVIDIA GB300 NVL72 rack-scale AI server with Blackwell Ultra GPUs in a data center]

Key Numbers

  • 72 + 36: Blackwell Ultra GPUs and Grace CPUs per GB300 NVL72 rack (NVIDIA, 2026)
  • 20TB: GPU memory (HBM3e) in a single GB300 NVL72 rack (NVIDIA / Lambda, 2026)
  • $3.7M-$4.0M: Estimated price per GB300 NVL72 rack (Loop Capital analyst estimate, 2025)
  • 4,600+: GB300 NVL72 racks in Microsoft Azure's cluster for OpenAI (Microsoft Azure, 2026)
  • 50x: Claimed AI factory output increase vs Hopper-generation systems (NVIDIA, 2026)

Key Takeaways

  1. The GB300 NVL72 packs 72 Blackwell Ultra GPUs and 36 Grace CPUs into a single NVLink domain with about 20TB of shared GPU memory and 130 TB/s of bandwidth.
  2. A single rack costs an estimated $3.7 million to $4.0 million. Microsoft's GB300 cluster for OpenAI reportedly uses over 4,600 racks, putting that one deployment's hardware cost above $17 billion.
  3. GB300 NVL72 targets training and inference equally, with NVIDIA claiming up to 50x AI factory output versus Hopper-generation systems. Buyers in 2026 include Microsoft Azure, CoreWeave, Lambda, Verda, and reportedly Apple.

The NVIDIA GB300 NVL72 is a rack-scale AI system built around 72 Blackwell Ultra GPUs and 36 Grace CPUs, wired into a single NVLink domain with 20TB of shared GPU memory. It is the current flagship of NVIDIA's AI factory lineup, the direct successor to the GB200 NVL72, and the platform now shipping to the largest AI buyers in the world.

Here is the detail most coverage skips: a single rack costs an estimated $3.7 million to $4 million, according to Loop Capital analyst Ananda Baruah. Microsoft's GB300 cluster for OpenAI alone reportedly uses more than 4,600 of these racks, which works out to over 330,000 GPUs and an estimated $17 billion to $18.4 billion in hardware, before power, cooling, or networking costs are added.

This article breaks down what is actually inside a GB300 NVL72 rack, what it costs to buy or rent, how it stacks up against the H100 and H200 GPUs most teams run today, and who is actually buying it in 2026, including a buyer you might not expect.

What is the NVIDIA GB300 NVL72?

The NVIDIA GB300 NVL72 is a liquid-cooled rack that connects 72 Blackwell Ultra GPUs and 36 Grace CPUs into one NVLink domain, so all 72 GPUs can read and write to a shared 20TB pool of HBM3e memory at roughly 130 TB/s of total interconnect bandwidth, according to NVIDIA. NVIDIA calls it an AI factory: a single product built to run continuous training, fine-tuning, and inference at the scale of tens of thousands of GPUs, not a handful of servers in a closet.

"72 NVIDIA Blackwell Ultra GPUs and 36 Arm-based NVIDIA Grace CPUs into a single platform." (NVIDIA, on the GB300 NVL72, 2026)

The part that is easy to miss: this is not a new chip generation. GB300 uses Blackwell Ultra, a refined version of the same Blackwell family found in the GB200 NVL72 and B200 GPU. What turns a pile of GPUs into one coherent rack is the NVLink Switch fabric. Wire 72 ordinary GPUs into nine separate 8-GPU servers connected by Ethernet, and you get a cluster that has to shuttle data over the network for anything that does not fit on one server. Wire those same 72 GPUs into a GB300 NVL72 chassis, and software can treat the rack much like one enormous GPU with a shared memory pool.

System | GPUs + CPUs per rack | GPU generation | GPU memory per rack | Total NVLink bandwidth
GB200 NVL72 | 72 + 36 Grace | Blackwell (B200) | ~13.8TB HBM3e | ~130 TB/s (NVLink 5)
GB300 NVL72 | 72 + 36 Grace | Blackwell Ultra | ~20TB HBM3e | ~130 TB/s (NVLink 5)
8x H100 server | 8, no CPU pairing | Hopper (H100 SXM5) | 640GB HBM3 | 900 GB/s per GPU (NVLink 4)

That memory jump from GB200 to GB300, roughly 45 percent more HBM3e per rack, is the headline spec change. It comes from higher-capacity memory on each Blackwell Ultra GPU, not from adding more GPUs or a new architecture.
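The 45 percent figure follows from the two per-rack memory totals quoted in the table. A minimal Python sketch of that arithmetic, using this article's rounded figures (the function and constant names are illustrative, not taken from any NVIDIA tooling):

```python
def percent_increase(old: float, new: float) -> float:
    """Return the percentage increase going from old to new."""
    return (new - old) / old * 100.0

# Per-rack HBM3e totals as quoted in this article (rounded, in TB).
GB200_NVL72_TB = 13.8
GB300_NVL72_TB = 20.0

uplift = percent_increase(GB200_NVL72_TB, GB300_NVL72_TB)
extra_tb = GB300_NVL72_TB - GB200_NVL72_TB

print(f"Extra HBM3e per rack: {extra_tb:.1f} TB ({uplift:.0f}% more)")
```

Because both inputs are rounded, the result is only as precise as "roughly 45 percent" suggests.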

GB300 NVL72 full specifications

A full GB300 NVL72 rack delivers up to 720 PFLOPS of FP8/FP6 compute, 360 PFLOPS of FP16/BF16, 180 PFLOPS of TF32, and 6 PFLOPS of FP32, according to NVIDIA. The low-precision numbers matter more than they used to: Blackwell Ultra adds dedicated low-precision math, including FP4, built for inference, where dropping numerical precision barely affects answer quality but multiplies throughput.

Spec | GB300 NVL72
GPUs | 72x Blackwell Ultra
CPUs | 36x Grace (Arm-based)
GPU memory | ~20TB HBM3e
Total NVLink bandwidth | ~130 TB/s (NVLink 5)
FP8/FP6 compute | 720 PFLOPS
FP16/BF16 compute | 360 PFLOPS
TF32 compute | 180 PFLOPS
FP32 compute | 6 PFLOPS
Cooling | Fully liquid-cooled, 48U rack
Networking | Quantum-X800 InfiniBand or Spectrum-X Ethernet, ConnectX-8 SuperNICs
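To put the rack-level compute figures on a per-GPU footing, the sketch below divides each one evenly across the 72 GPUs and shows how throughput scales with precision. The even split is a simplifying assumption, and these are vendor peak numbers rather than measured results.

```python
GPUS_PER_RACK = 72

# Rack-level peak figures quoted in this article, in PFLOPS.
RACK_PFLOPS = {
    "FP8/FP6": 720.0,
    "FP16/BF16": 360.0,
    "TF32": 180.0,
    "FP32": 6.0,
}

def per_gpu_pflops(rack_pflops: float, gpus: int = GPUS_PER_RACK) -> float:
    """Evenly divide a rack-level figure across the GPUs in the rack."""
    return rack_pflops / gpus

for precision, rack_value in RACK_PFLOPS.items():
    # Express each figure relative to the FP16/BF16 rate.
    ratio_to_fp16 = rack_value / RACK_PFLOPS["FP16/BF16"]
    print(f"{precision:>9}: {per_gpu_pflops(rack_value):6.2f} PFLOPS per GPU, "
          f"{ratio_to_fp16:.2f}x the FP16/BF16 rate")
```

In the Tensor Core formats listed (TF32, FP16/BF16, FP8/FP6), each step down in precision doubles the quoted peak, which is the throughput multiplier the paragraph above refers to.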

Power delivery is the other change worth knowing about. NVIDIA built a new power supply unit for GB300 NVL72 with integrated energy storage that smooths out the rack's power draw, cutting peak grid demand by up to 30 percent compared to a rack without it, per NVIDIA's developer blog. That sounds like a footnote until you consider how this scales: a data center hosting hundreds of these racks is negotiating directly with the local utility for power capacity. Smoothing each rack's draw means the utility can provision for a lower peak, which is part of why this feature is also rolling out to GB200 NVL72 racks already in the field.
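The effect on utility provisioning can be sketched with simple arithmetic. The per-rack peak draw and site size below are hypothetical placeholders chosen for round numbers, not published GB300 NVL72 figures. Only the "up to 30 percent" reduction comes from the NVIDIA claim above, and it is a best case.

```python
def provisioned_peak_kw(racks: int, peak_kw_per_rack: float,
                        peak_reduction: float = 0.0) -> float:
    """Peak grid capacity to provision if every rack peaks at once."""
    if not 0.0 <= peak_reduction <= 1.0:
        raise ValueError("peak_reduction must be between 0 and 1")
    return racks * peak_kw_per_rack * (1.0 - peak_reduction)

# Hypothetical placeholder, NOT a published GB300 NVL72 specification.
HYPOTHETICAL_PEAK_KW_PER_RACK = 100.0
RACKS = 500  # illustrative site size

baseline = provisioned_peak_kw(RACKS, HYPOTHETICAL_PEAK_KW_PER_RACK)
smoothed = provisioned_peak_kw(RACKS, HYPOTHETICAL_PEAK_KW_PER_RACK, 0.30)

print(f"Without smoothing: {baseline / 1000:.1f} MW")
print(f"With a 30% peak reduction: {smoothed / 1000:.1f} MW")
```

Whatever the real per-rack figure is, the saving scales linearly with rack count, which is why it matters at the hundreds-of-racks level.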

How much does the GB300 NVL72 cost in 2026?

A single GB300 NVL72 rack costs an estimated $3.7 million to $4.0 million, based on a March 2025 analyst note from Loop Capital's Ananda Baruah that priced Apple's reported GB300 order at roughly $1 billion for about 250 racks, or 18,000 GPUs. NVIDIA does not publish a list price for the system, so every figure in circulation, including this one, is a third-party estimate rather than an official number.

Renting GB300 NVL72 capacity by the hour is the more accessible option for most teams. Compute Prices listed rates from $3.02 per hour at Verda up to $18 per hour at Oracle Cloud as of late June 2026, a spread wide enough that shopping around for GB300 capacity is worth the time it takes.
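A rough way to see what that spread means in practice is to price out a month of continuous use. The sketch below treats the quoted rates as per-GPU-hour prices, which is an assumption because the listing's unit is not stated here, and the 72-GPU, 30-day scenario is purely illustrative.

```python
def rental_cost(rate_per_gpu_hour: float, gpus: int, hours: float) -> float:
    """Total rental cost in dollars for `gpus` GPUs over `hours` hours."""
    return rate_per_gpu_hour * gpus * hours

# Rates quoted in this article; treating them as per-GPU-hour is an assumption.
LOW_RATE = 3.02    # Verda
HIGH_RATE = 18.00  # Oracle Cloud

GPUS = 72          # one rack's worth of GPUs
HOURS = 24 * 30    # a 30-day month of continuous use

low = rental_cost(LOW_RATE, GPUS, HOURS)
high = rental_cost(HIGH_RATE, GPUS, HOURS)

print(f"30 days, {GPUS} GPUs: ${low:,.0f} to ${high:,.0f}")
print(f"Spread between cheapest and priciest rate: {HIGH_RATE / LOW_RATE:.1f}x")
```

Real quotes vary with commitment length, region, and availability, so this is a comparison aid rather than a budget.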

The number most guides don't show

Microsoft's GB300 cluster for OpenAI reportedly runs more than 4,600 NVL72 racks. Multiply that by the Loop Capital per-rack estimate and you get somewhere between $17 billion and $18.4 billion in rack hardware for a single customer's deployment, before counting power infrastructure, networking, cooling, or the building itself. Nobody publishes that number directly: Microsoft reports rack counts, and analysts report per-rack pricing. Multiplying the two gives a sense of just how much capital one AI lab's compute order represents, and why GPU supply has become a boardroom topic rather than just an engineering one.
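The multiplication is simple enough to check directly. This sketch applies the Loop Capital per-rack range to the two reported rack counts mentioned in this article. Both inputs are third-party estimates, and the 4,600 figure is a reported lower bound, so the outputs are order-of-magnitude guides rather than confirmed totals.

```python
GPUS_PER_RACK = 72
RACK_PRICE_LOW = 3_700_000   # Loop Capital estimate, low end (USD)
RACK_PRICE_HIGH = 4_000_000  # Loop Capital estimate, high end (USD)

def deployment_estimate(racks: int) -> tuple[int, int, int]:
    """Return (gpu_count, low_cost_usd, high_cost_usd) for a rack count."""
    return (racks * GPUS_PER_RACK,
            racks * RACK_PRICE_LOW,
            racks * RACK_PRICE_HIGH)

for label, racks in [("Azure/OpenAI (reported)", 4_600),
                     ("Apple (reported)", 250)]:
    gpus, low, high = deployment_estimate(racks)
    print(f"{label}: {gpus:,} GPUs, ${low / 1e9:.3f}B to ${high / 1e9:.3f}B")
```

The same function reproduces the roughly $1 billion figure attached to the reported 250-rack Apple order.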

GB300 NVL72 vs H100 and H200: pricing and performance

Comparing GB300 NVL72 to the H100 and H200 on price alone is misleading, because you are not buying the same thing. An H100 or H200 purchase gets you a GPU. A GB300 NVL72 purchase gets you 72 GPUs, 36 CPUs, the NVLink switch fabric connecting all of them, liquid cooling, and the power delivery hardware, all pre-integrated into one rack.

Spec | H100 SXM5 | H200 SXM5 | GB300 NVL72 (per rack)
Price | $25,000-$40,000 per GPU | $28,000-$45,000 per GPU | $3.7M-$4.0M per rack (72 GPUs)
Approx. price per GPU | $25K-$40K | $28K-$45K | ~$51K-$56K (includes CPUs, fabric, cooling)
GPU memory | 80GB HBM3 | 141GB HBM3e | ~278GB HBM3e effective per GPU (20TB / 72)
Architecture | Hopper | Hopper | Blackwell Ultra
Interconnect | NVLink 4, 900 GB/s per GPU | NVLink 4, 900 GB/s per GPU | NVLink 5, full-rack shared domain

The per-GPU price for GB300 looks higher than a standalone H100 or H200, and it is, because that price includes the CPUs and the switch fabric that turn 72 separate chips into one coherent system. NVIDIA's own performance claim for that premium: up to 50x the AI factory output of an equivalent Hopper-generation deployment.

CoreWeave, the first cloud provider to put GB300 NVL72 into production, reported a 5x improvement in throughput per watt and a 10x boost in user-facing responsiveness compared to its H100 fleet.

If your workload genuinely needs the shared memory domain, for very large models or high-concurrency inference, that premium buys real headroom you cannot get by simply buying more H100s. If it does not, H200 still offers a substantially cheaper way to get more memory per GPU than H100 without paying for rack-scale integration you will not use.
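The per-GPU figures in the comparison table come from dividing the rack-level estimates by 72. The sketch below reproduces that division and, as an extra illustrative metric not used elsewhere in this article, computes dollars per GB of GPU memory. All prices are the third-party estimates quoted above, and the GB300 price also covers CPUs, fabric, and cooling, so this is not a like-for-like comparison.

```python
GPUS_PER_RACK = 72

# (price_low_usd, price_high_usd, memory_gb) per GPU, from the table above.
# GB300 values are rack totals divided evenly across 72 GPUs.
SYSTEMS = {
    "H100 SXM5": (25_000, 40_000, 80),
    "H200 SXM5": (28_000, 45_000, 141),
    "GB300 NVL72": (3_700_000 / GPUS_PER_RACK,
                    4_000_000 / GPUS_PER_RACK,
                    20_000 / GPUS_PER_RACK),  # ~20TB expressed in GB
}

def usd_per_gb(price_usd: float, memory_gb: float) -> float:
    """Price divided by GPU memory capacity."""
    return price_usd / memory_gb

for name, (low, high, mem) in SYSTEMS.items():
    print(f"{name:<12} ${low:>7,.0f}-${high:>7,.0f} per GPU, {mem:>4.0f} GB, "
          f"${usd_per_gb(low, mem):,.0f}-${usd_per_gb(high, mem):,.0f} per GB")
```

Memory capacity is only one axis. The rack's value proposition rests mainly on the shared NVLink domain, which a per-GB price does not capture.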

Who is actually buying GB300 NVL72 in 2026?

Microsoft Azure operates the largest known deployment, a cluster built specifically to run OpenAI's workloads that Microsoft's blog describes as the first large-scale production GB300 NVL72 cluster anywhere, reportedly exceeding 4,600 racks and 330,000 GPUs.

CoreWeave says it was first to deploy GB300 NVL72 in production overall, ahead of the hyperscalers, and has published its own performance numbers from running it. Lambda and Verda both offer GB300 NVL72 access through their cloud platforms, mostly aimed at AI labs that need the rack's shared memory domain without buying the hardware outright.

The buyer that surprised people: Apple. According to a March 2025 note from Loop Capital analyst Ananda Baruah, first reported by Investor's Business Daily and covered widely from there, Apple placed orders for roughly $1 billion in GB300 NVL72 systems, around 250 racks or 18,000 GPUs, with Dell and Super Micro named as the server integration partners. Apple has not confirmed the order or said publicly what it is for, but the timing lines up with Apple's well-documented struggles getting generative AI features into Siri, and analysts have connected the purchase to that effort rather than anything Apple has stated outright.

"an impressive 10x boost in user responsiveness (TPS per user) and a 5x improvement in throughput (TPS per megawatt)" (NVIDIA, on GB300 NVL72 vs Hopper-generation systems, 2026)

None of this is cheap. A 250-rack order at the Loop Capital estimate works out to roughly $925 million to $1 billion, which lines up with the reported figure and gives a sense of why a buyer needs hyperscaler-level budgets, or hyperscaler-level ambitions, to place an order this size.

GB300 vs GB200: what Blackwell Ultra actually changes

GB300 NVL72 replaces GB200 NVL72 as NVIDIA's flagship rack, and the upgrade comes from the GPU silicon, not the rack architecture. Both racks pack 72 GPUs and 36 Grace CPUs into the same NVLink 5 fabric at roughly the same 130 TB/s of bandwidth. What changed is the GPU itself: Blackwell Ultra over plain Blackwell.

That swap gets you roughly 45 percent more HBM3e memory per rack (about 20TB versus 13.8TB), up to 1.5x more dense FP4 compute, and about 2x the attention-layer performance that matters most for long-context and reasoning models, according to NVIDIA. None of that requires touching the rack's cooling, power, or networking design, which is exactly why NVIDIA can roll GB300 out to the same physical infrastructure that already supports GB200.

For a buyer already running GB200 NVL72, the calculation is straightforward: GB300 is worth it specifically for workloads bottlenecked on memory capacity or attention throughput, like very long context windows or test-time-scaling inference. For a buyer weighing GB300 against an H200 cluster instead, the relevant question is not which chip is newer. It is whether the workload actually benefits from a single 20TB shared-memory domain, or whether independently scaled H200 nodes get the job done for less.
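One rough way to frame that question is to ask how many NVLink domains a workload's memory footprint would have to span on each option. The sketch below is a back-of-the-envelope check only. The 3,000GB footprint is a hypothetical example, the 8-GPU H200 node size is an assumption, and real capacity planning also has to account for activations, parallelism overheads, and fragmentation.

```python
import math

# Memory inside one NVLink domain, in GB, using per-GPU figures from this article.
DOMAIN_MEMORY_GB = {
    "8x H100 server": 8 * 80,
    "8x H200 server (assumed 8-GPU node)": 8 * 141,
    "GB300 NVL72 rack": 20_000,
}

def domains_needed(footprint_gb: float, domain_gb: float) -> int:
    """Minimum number of NVLink domains whose combined memory covers the footprint."""
    return math.ceil(footprint_gb / domain_gb)

# Hypothetical workload footprint (weights plus KV cache), for illustration only.
footprint_gb = 3_000

for system, capacity in DOMAIN_MEMORY_GB.items():
    n = domains_needed(footprint_gb, capacity)
    crosses_network = n > 1  # more than one domain means traffic leaves NVLink
    print(f"{system}: {n} domain(s), crosses the network fabric: {crosses_network}")
```

If the answer is one domain on H200 nodes as well, the shared 20TB pool is unlikely to be the deciding factor.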

Common misconceptions about GB300 NVL72

GB300 is not a new architecture. It is Blackwell Ultra, a refined version of the same Blackwell family that powers the B200 GPU and GB200 NVL72. NVIDIA has already announced Rubin as the architecture that follows Blackwell, so GB300 is the ceiling within Blackwell, not the start of something new.

It does not replace the standalone B200 GPU for every use case. B200 is a GPU you can put in a smaller server for workloads that do not need rack-scale shared memory. GB300 NVL72 is the rack-scale system for workloads that do.

It is not training-only hardware. NVIDIA positions GB300 NVL72 explicitly for inference and reasoning at scale, what it calls test-time scaling, alongside training. The 720 PFLOPS of FP8/FP6 compute is aimed as much at serving large models to millions of users as it is at training them.

And you cannot replicate it by buying 72 ordinary GPU servers and networking them together. The single NVLink domain with a shared 20TB memory pool and 130 TB/s of bandwidth is a property of the rack-scale design itself. Standard Ethernet or InfiniBand between separate servers, even fast InfiniBand, does not give software the same shared-memory view that NVLink provides inside one GB300 NVL72 chassis.

Frequently Asked Questions

What is the NVIDIA GB300 NVL72?

The NVIDIA GB300 NVL72 is a rack-scale AI system that combines 72 Blackwell Ultra GPUs and 36 Grace CPUs into a single NVLink domain with about 20TB of shared GPU memory. It is NVIDIA's current flagship AI factory product, built for training and inference at the scale of large AI labs rather than individual servers.

How much does a GB300 NVL72 rack cost?

A GB300 NVL72 rack costs an estimated $3.7 million to $4.0 million, based on a March 2025 Loop Capital analyst estimate tied to Apple's reported order. NVIDIA does not publish an official list price, so this figure and others in circulation are third-party estimates rather than confirmed pricing.

What is the difference between GB300 and GB200 NVL72?

GB300 NVL72 uses Blackwell Ultra GPUs instead of the standard Blackwell (B200) GPUs in GB200 NVL72. Both racks have the same 72-GPU, 36-CPU layout and the same NVLink 5 fabric, but GB300 has about 45 percent more GPU memory per rack and up to 1.5x more FP4 compute.

Is GB300 the same thing as Blackwell Ultra?

Not exactly. Blackwell Ultra is the GPU inside GB300 NVL72, a refinement of the Blackwell architecture rather than a new one. GB300 NVL72 is the full rack-scale system built around that GPU: 72 Blackwell Ultra GPUs plus 36 Grace CPUs plus the NVLink switch fabric, cooling, and power hardware.

Who is buying NVIDIA GB300 NVL72 systems?

Microsoft Azure runs the largest known deployment, reportedly over 4,600 racks built for OpenAI's workloads. CoreWeave says it was first to put GB300 NVL72 into production, and Lambda and Verda both offer it through their cloud platforms. Apple also reportedly ordered about $1 billion worth, roughly 250 racks, according to a March 2025 analyst estimate.

Is the GB300 NVL72 better than H100 or H200 for AI workloads?

It depends on the workload. GB300 NVL72 wins for models or inference jobs that need a single shared-memory domain larger than any individual GPU can hold. For workloads that fit comfortably on independent GPUs, H200 offers more memory per GPU than H100 at a much lower entry cost than buying into a full GB300 rack.

Is GB300 NVL72 designed for training or inference?

Both. NVIDIA explicitly positions GB300 NVL72 for inference and reasoning workloads, what it calls test-time scaling, in addition to training. Its FP8 and FP4 compute is tuned for serving large models at scale, not just training them.

When did GB300 NVL72 become available?

The Blackwell architecture was announced at GTC in March 2024, with GB300 NVL72 following as the Blackwell Ultra flagship. Early production deployments from CoreWeave, Lambda, and Verda began in 2025, and Microsoft Azure announced its large-scale GB300 cluster for OpenAI in 2026.
