AI Hardware · 10 min read

NVIDIA Blackwell Architecture: What the B200 GPU Can Do

By Amara | Updated 5 April 2026
[Image: NVIDIA Blackwell B200 GPU module, showing two chiplet dies at the center with HBM3e memory stacks on all four sides]

Key Numbers

208B
Transistors per Blackwell GPU, across two chiplet dies on TSMC 4NP
NVIDIA GTC, March 2024
20 PFLOPS
FP4 AI performance per B200 GPU, versus ~4 PFLOPS (FP8) for the H100
NVIDIA, 2024
$30-40K
Price per B200 GPU module as of July 2025
Modal Labs, July 2025
30x
Faster LLM inference on GB200 NVL72 vs equivalent H100 cluster
NVIDIA, 2024
192 GB
HBM3e memory per B200, at 8 TB/s bandwidth
NVIDIA B200 specs, 2024

Key Takeaways

  1. NVIDIA Blackwell is the GPU architecture announced March 18, 2024, succeeding Hopper. Each Blackwell GPU packs 208 billion transistors across two chiplet dies connected by a 10 TB/s NVLink interface, built on TSMC's 4NP process.
  2. The B200 data center GPU costs $30,000-40,000 per module as of July 2025, against a bill-of-materials estimate of $5,700-7,300 (Epoch.ai). Cloud rental ran $3.79-18.53 per hour on-demand in July 2025, with newer listings at $8-15 per hour in April 2026.
  3. The GB200 NVL72 rack system runs trillion-parameter AI inference 30x faster than an equivalent H100 cluster while using 25x less energy per inference token, making it the primary infrastructure choice for frontier AI model serving (NVIDIA, 2024).

NVIDIA Blackwell is a GPU microarchitecture announced on March 18, 2024 at GTC. The lead data center product, the B200, packs 208 billion transistors into a dual-die chiplet design built on TSMC's custom 4NP process, delivering 20 petaFLOPS of FP4 AI performance per GPU. That is roughly five times the throughput of the H100 it replaces.

The architecture is named after David Blackwell, the first Black scholar inducted into the National Academy of Sciences. What is surprising about the Blackwell design is how NVIDIA solved the physical limit on chip size: instead of a single monolithic die (which Hopper used), Blackwell uses two separate dies, each at the maximum reticle size, connected by a 10 TB/s NVLink chip-to-chip interface. To software, it appears as a single GPU. This chiplet approach is why Blackwell can pack 208 billion transistors when no single die could hold that many at viable yields.

This article covers the full Blackwell product lineup from B100 to the GB200 NVL72 rack system, the real performance and cost numbers compared to H100, the power infrastructure requirements, and what the architecture means for the data centers running AI at scale.

What Is NVIDIA Blackwell Architecture?

NVIDIA Blackwell is the GPU microarchitecture generation that succeeded Hopper (H100) and Ada Lovelace (RTX 40 series). Announced at the GTC Spring 2024 keynote on March 18, 2024, Blackwell covers two product families: data center GPUs (B100, B200, GB200) and consumer/workstation GPUs (RTX 50 series). The data center line targets AI training and inference; the RTX line targets gaming, content creation, and local AI workloads.

The name honors David Blackwell (1919-2010), a statistician and mathematician who became the first Black scholar inducted into the National Academy of Sciences in 1965. NVIDIA began using mathematician and scientist names for its architectures starting with Volta (2017), following Fermi, Kepler, Maxwell, Pascal, and Turing.

The defining technical departure from Hopper is the chiplet design. Hopper used a single monolithic GH100 die with 80 billion transistors on TSMC N4. Blackwell splits the GPU across two reticle-limited dies, each at the physical maximum size that semiconductor lithography can expose in one step, connected by NVIDIA's NV-High Bandwidth Interface (NV-HBI). The two dies communicate at 10 TB/s, fast enough that from a software perspective they behave as a single GPU with a unified address space.

| Architecture | GPU Die | Transistors | Process | Memory | FP8 Performance |
| --- | --- | --- | --- | --- | --- |
| Ampere (A100) | GA100 | 54B | TSMC N7 | 80GB HBM2e | ~312 TFLOPS (FP16; no FP8 support) |
| Hopper (H100) | GH100 | 80B | TSMC N4 | 80GB HBM3 | ~4,000 TFLOPS |
| Blackwell (B200) | GB100 (dual die) | 208B | TSMC 4NP | 192GB HBM3e | ~9,000 TFLOPS |

Source: NVIDIA architecture whitepapers, 2024. FP8 figures are theoretical peak; the A100 value is FP16, since Ampere predates FP8 support.

Key Technical Innovations in Blackwell

Blackwell introduces five technical changes that matter for AI workloads. Each one addresses a specific bottleneck in how large language models are trained or served.

**Fifth-generation Tensor Cores with FP4 support.** Hopper's fourth-generation Tensor Cores supported precisions down to FP8 and INT8. Blackwell adds FP4 (4-bit floating point) with micro-tensor scaling, which halves the memory required to store model weights relative to FP8 and roughly doubles throughput for inference workloads where FP4 precision is acceptable. The second-generation Transformer Engine automatically chooses the optimal precision layer by layer without developer intervention.
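
To make the micro-tensor scaling idea concrete, here is a minimal NumPy sketch of block-wise 4-bit quantization: each small block of weights gets its own scale factor, so a single outlier cannot wreck precision across the whole tensor. The block size, the signed-integer grid, and the function names are illustrative assumptions; NVIDIA's actual hardware FP4 format and the Transformer Engine internals are not public at this level of detail.

```python
import numpy as np

def quantize_4bit_blockwise(weights: np.ndarray, block_size: int = 16):
    """Illustrative block-wise 4-bit quantization (one scale per block)."""
    # Pad to a multiple of the block size so we can reshape into blocks
    pad = (-len(weights)) % block_size
    padded = np.concatenate([weights, np.zeros(pad, dtype=weights.dtype)])
    blocks = padded.reshape(-1, block_size)

    # One scale per block maps that block's max magnitude onto [-7, 7]
    scales = np.abs(blocks).max(axis=1, keepdims=True) / 7.0
    scales[scales == 0] = 1.0                       # avoid divide-by-zero for all-zero blocks
    q = np.clip(np.round(blocks / scales), -7, 7).astype(np.int8)
    return q, scales, pad

def dequantize(q: np.ndarray, scales: np.ndarray, pad: int) -> np.ndarray:
    out = (q.astype(np.float32) * scales).reshape(-1)
    return out[:len(out) - pad] if pad else out

w = np.random.randn(1000).astype(np.float32)
q, s, pad = quantize_4bit_blockwise(w)
print("mean abs reconstruction error:", np.abs(w - dequantize(q, s, pad)).mean())
```

The per-block scale is the whole point: a single tensor-wide scale would have to accommodate the largest weight anywhere in the tensor, crushing most values into a handful of quantization levels.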

**NVLink 5.** Blackwell GPUs connect to each other at 1.8 TB/s bidirectional bandwidth via NVLink 5, up from 900 GB/s in Hopper's NVLink 4. Multi-GPU systems used for large model training depend on this interconnect speed to move activations and gradients between cards. The GB200 NVL72 system uses a 72-GPU NVLink domain, meaning all 72 B200 GPUs share a single memory address space at NVLink speeds rather than relying on slower Ethernet or InfiniBand between racks.
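
For a rough feel of why the link speed matters, the sketch below estimates how long a naive full-gradient exchange for a 70B-parameter model would take at NVLink 4 versus NVLink 5 speeds. The factor-of-two traffic assumption and the single-link model are simplifications; real collectives use ring or tree algorithms and overlap with compute, so treat these as order-of-magnitude figures only.

```python
# Back-of-envelope gradient-exchange time at different interconnect speeds.
def allreduce_seconds(param_count: float, bytes_per_param: int, link_tb_s: float) -> float:
    """Time to move roughly 2x the gradient volume (reduce + broadcast) over one link."""
    gradient_bytes = param_count * bytes_per_param
    return 2 * gradient_bytes / (link_tb_s * 1e12)

params = 70e9  # 70B-parameter model, FP16 gradients (2 bytes each)
for label, bw in [("NVLink 4 (H100), 0.9 TB/s", 0.9),
                  ("NVLink 5 (B200), 1.8 TB/s", 1.8)]:
    print(f"{label}: ~{allreduce_seconds(params, 2, bw):.2f} s per full gradient exchange")
```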

**192GB HBM3e memory at 8 TB/s.** The B200 carries 192GB of HBM3e, compared to 80GB in the H100 SXM5. On-GPU memory capacity directly determines which models can run without quantization: at 192GB, a single B200 can load a 70-billion-parameter model in FP16 (roughly 140GB of weights) and still have memory left over for KV cache. The H100 cannot do this on a single card.
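
A back-of-envelope check of that claim: the sketch below compares weights plus KV cache against 80GB and 192GB of HBM. The KV-cache layout (80 layers, 8 KV heads, head dimension 128, FP16 cache) is an illustrative assumption modeled loosely on a Llama-70B-style architecture, not an NVIDIA specification.

```python
GB = 1e9  # decimal gigabytes, matching the ~140GB figure used in the text

def weight_bytes(params: float, bytes_per_param: float) -> float:
    return params * bytes_per_param

def kv_cache_bytes_per_token(layers=80, kv_heads=8, head_dim=128, bytes_per_val=2) -> float:
    # 2 tensors (K and V) per layer, per token
    return layers * kv_heads * head_dim * 2 * bytes_per_val

weights = weight_bytes(70e9, 2)  # 70B params in FP16 ≈ 140 GB
for hbm_gb in (80, 192):
    free = hbm_gb * GB - weights
    if free <= 0:
        print(f"{hbm_gb} GB card: weights alone do not fit")
        continue
    max_tokens = int(free // kv_cache_bytes_per_token())
    print(f"{hbm_gb} GB card: ~{free / GB:.0f} GB left for KV cache ≈ {max_tokens:,} tokens of context")
```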

**Confidential Computing via TEE-I/O.** Blackwell is the first GPU with Trusted Execution Environment I/O capability, meaning model weights and data in use can remain encrypted with near-zero throughput penalty. This matters for regulated industries (healthcare, finance) that cannot run inference on unencrypted data.

**RAS Engine.** A dedicated Reliability, Availability, and Serviceability engine monitors GPU health and can migrate workloads off a card before it fails. At the scale of GB200 NVL72 deployments with 72 GPUs per rack, hardware failure is a regular occurrence and automated fault handling is a practical necessity.

FP4 Precision: Why It Matters for Inference

The shift from FP8 to FP4 is not just about speed. A 70-billion-parameter model stored in FP16 requires approximately 140GB. In FP8, that drops to 70GB. In FP4, it drops to 35GB. That means a single B200 with 192GB can run four simultaneous FP4 inference copies of a 70B model, where an H100 with 80GB could run none in FP16 and one in FP8. For cloud providers selling inference-as-a-service, the revenue-per-GPU-hour difference is substantial.
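
The arithmetic behind those numbers is simple enough to script. The sketch below counts how many whole copies of a 70B model's weights fit per GPU at each precision; it ignores KV cache and activations, which is why the pure-weight count (five FP4 copies on a B200) comes out higher than the four concurrent copies quoted above.

```python
# Weight-storage arithmetic for a 70B-parameter model at different precisions.
# Ignores KV cache and activation memory, so real serving capacity is lower.
PRECISION_BYTES = {"FP16": 2.0, "FP8": 1.0, "FP4": 0.5}

def copies_per_gpu(params: float, hbm_gb: float, precision: str) -> int:
    weight_gb = params * PRECISION_BYTES[precision] / 1e9
    return int(hbm_gb // weight_gb)

for gpu, hbm in [("H100 (80 GB)", 80), ("B200 (192 GB)", 192)]:
    for prec in PRECISION_BYTES:
        n = copies_per_gpu(70e9, hbm, prec)
        print(f"{gpu}: fits {n} full copy/copies of a 70B model in {prec}")
```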

The Full Blackwell Product Lineup

NVIDIA released five distinct Blackwell product lines. They share the same underlying architecture but differ significantly in thermal design power, form factor, memory capacity, and target workload.

| Product | Form Factor | GPUs | Memory | TDP | Target Use |
| --- | --- | --- | --- | --- | --- |
| B100 SXM | Data center module | 1x Blackwell | 192GB HBM3e | 700W | Training, lower-power racks |
| B200 SXM | Data center module | 1x Blackwell | 192GB HBM3e | Up to 1,000W | Flagship data center training |
| GB200 Superchip | CPU+GPU superchip | 2x B200 + 1x Grace CPU | 384GB HBM3e | ~2.7kW | AI-factory-scale inference |
| GB200 NVL72 | Full rack system | 72x B200 + 36x Grace | 13.5TB HBM3e | Liquid cooled (rack-scale) | Trillion-parameter inference |
| DGX B200 | 8-GPU server | 8x B200 | 1.44TB HBM3e | ~14.3kW | Enterprise on-premises AI |
| RTX 5090 | Consumer GPU | 1x Blackwell (desktop) | 32GB GDDR7 | 575W | Gaming + local AI |

Source: NVIDIA product specifications, 2024-2025.

The GB200 NVL72 is the system drawing the most attention from AI labs. It places 72 Blackwell GPUs and 36 Grace ARM CPUs into a single liquid-cooled rack connected by a single NVLink 5 fabric. The entire rack operates as one unified compute system with 13.5TB of aggregate HBM3e memory. This is the configuration that delivers the 30x inference speedup over H100 that NVIDIA published at GTC 2024.
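
The aggregate figures follow directly from the per-GPU specification, as the short sanity check below shows; the quoted 13.5TB lines up when read as binary tebibytes.

```python
# Sanity check on the NVL72 aggregate figures from the per-GPU specs quoted above.
gpus = 72
hbm_gb_per_gpu = 192          # HBM3e per B200
fp4_pflops_per_gpu = 20       # theoretical peak FP4 per B200

total_hbm_gb = gpus * hbm_gb_per_gpu
total_fp4_eflops = gpus * fp4_pflops_per_gpu / 1000

print(f"Aggregate HBM3e: {total_hbm_gb:,} GB (~{total_hbm_gb / 1024:.1f} TiB)")
print(f"Aggregate peak FP4 compute: ~{total_fp4_eflops:.2f} exaFLOPS")
```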

For teams that want Blackwell compute on a desk rather than in a rack, the NVIDIA DGX Spark uses the desktop-class GB10 Grace Blackwell superchip, delivering roughly 1 petaFLOP of FP4 AI compute at $4,699.

The RTX Blackwell line (RTX 5090, 5080, 5070, 5060) is built on separate consumer dies optimised for gaming; the largest, GB202, tops out at 24,576 CUDA cores, 192 ray tracing cores, and 768 Tensor Cores, and the RTX 5090 ships a slightly cut-down configuration of it. Neural rendering via DLSS 4 is a headline feature: the RTX 5090 generates multiple frames using AI prediction rather than rendering each frame from scratch.

Blackwell vs H100: Performance, Cost, and the Numbers Most Guides Skip

The published 30x inference speedup for GB200 NVL72 over H100 is real, but it applies to a specific workload: trillion-parameter LLM inference running across a fully loaded 72-GPU NVLink domain. For smaller models on individual cards, the per-GPU improvement is approximately 5x in FP4 throughput.

| Metric | H100 SXM5 | B200 SXM | Change |
| --- | --- | --- | --- |
| Transistors | 80B | 208B | +160% |
| Memory | 80GB HBM3 | 192GB HBM3e | +140% |
| Memory bandwidth | 3.35 TB/s | 8 TB/s | +139% |
| FP8 AI performance | ~4 PFLOPS | ~9 PFLOPS | +125% |
| FP4 AI performance | Not supported | 20 PFLOPS | New capability |
| TDP | 700W | Up to 1,000W | +43% |
| Price per GPU | $25,000-35,000 | $30,000-40,000 | +15-20% |
| Cloud on-demand | $2.06-8/hr | $3.79-18.53/hr | +40-130% |

Sources: NVIDIA specifications (2024), Modal Labs pricing data (July 2025).
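
For readers who want to trace the percentages in the Change column, the short script below recomputes them from the raw spec values; these are the same theoretical-peak numbers cited in the table, not benchmark results.

```python
# Recompute the "Change" column from the H100 and B200 spec values above.
specs = {
    "Transistors (B)":           (80,   208),
    "Memory (GB)":               (80,   192),
    "Memory bandwidth (TB/s)":   (3.35, 8.0),
    "FP8 performance (PFLOPS)":  (4.0,  9.0),
}
for name, (h100, b200) in specs.items():
    print(f"{name}: +{(b200 / h100 - 1) * 100:.0f}%")
```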

The Number Most Guides Don't Show

The most revealing figure in the Blackwell story is the production cost versus the sale price. Epoch.ai estimated the bill of materials for a B200 GPU at $5,700-$7,300 per unit, based on component costs for HBM3e, packaging, TSMC wafer costs at 4NP, and assembly. The street price is $30,000-$40,000.

At a midpoint of $35,000 sale price and $6,500 production cost, NVIDIA earns approximately $28,500 gross profit per B200 shipped. That is an 81% gross margin on the hardware itself. For context, NVIDIA's blended company-wide gross margin for Q4 FY2025 was 73%, which includes software, services, and consumer GPU lines with lower margins that pull the average down.

This margin structure explains why NVIDIA's data center revenue reached $47.5B in Q4 FY2025 alone (NVIDIA earnings, February 2025). Each Blackwell rack sale generates more gross profit than most companies earn in annual revenue.

It also means NVIDIA has room to reduce prices if competitive pressure from AMD MI350 or Google TPU v5 intensifies, without threatening profitability. At $10,000 per B200, NVIDIA would still be profitable at 35% gross margin.
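
The margin arithmetic in the two paragraphs above reduces to a few lines. The $35,000 price and $6,500 bill-of-materials figures are the same rounded midpoints used in the text, and the $10,000 scenario is the hypothetical floor price discussed above, not an announced NVIDIA price.

```python
# Gross-margin arithmetic using the midpoint figures from the text.
price, bom = 35_000, 6_500
gross_profit = price - bom                 # ≈ $28,500 per B200
margin = gross_profit / price              # ≈ 0.81 → roughly 81% hardware gross margin
print(f"${gross_profit:,} gross profit per unit, {margin:.0%} margin")

# Hypothetical floor price: margin if NVIDIA sold the B200 at $10,000
floor_price = 10_000
print(f"At ${floor_price:,}: {(floor_price - bom) / floor_price:.0%} margin")
```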

Power Consumption and Data Center Infrastructure

The B200 is rated at up to 1,000W per GPU in the HGX/DGX B200 configuration, up from the H100 SXM5's 700W TDP. Per-GPU power rose moderately with Blackwell, but the larger change is the rack configuration: Blackwell systems pack more GPUs and more power per rack, and the GB200 NVL72 creates a 72-GPU liquid-cooled rack that pulls significantly more power per square foot than any previous GPU system.

The DGX B200 server, which houses 8 B200 GPUs, draws approximately 14.3kW maximum (NVIDIA DGX B200 spec sheet, 2024). A full rack of DGX B200 servers draws 60-100kW depending on configuration, well above the 10-30kW standard for enterprise racks and requiring liquid cooling or rear-door heat exchangers rather than conventional CRAC unit air cooling.
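
A quick way to see how those rack totals arise from the 14.3kW server figure; facility overhead such as PUE, networking, and storage is ignored here, so actual facility draw is higher.

```python
# Rack-density arithmetic implied by the DGX B200 power figure above.
dgx_b200_kw = 14.3
for servers_per_rack in (4, 6, 7):
    rack_kw = servers_per_rack * dgx_b200_kw
    print(f"{servers_per_rack} x DGX B200 per rack ≈ {rack_kw:.0f} kW")
# 4 servers ≈ 57 kW, 6 ≈ 86 kW, 7 ≈ 100 kW: far beyond the 10-30 kW enterprise norm
```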

The GB200 NVL72 rack requires liquid cooling as a hardware prerequisite, not an option. This is a material infrastructure change for data centers: retrofit costs for liquid cooling in an existing air-cooled facility range from $2M-$5M per rack row, according to CBRE Group's 2025 data center infrastructure report.

"The infrastructure requirements for Blackwell are fundamentally different from Hopper. It is not a GPU swap; it is a facility upgrade." (Data center operator comment cited in Goldman Sachs AI Infrastructure report, 2025)

The Energy Efficiency Claim Unpacked

NVIDIA's figure of "25x less energy per inference token" for GB200 NVL72 versus H100 equivalents is frequently cited without context. The comparison is one GB200 NVL72 rack against the H100-based cluster, including its InfiniBand networking, that would be needed to match the rack's inference throughput on trillion-parameter models. Because the NVL72 delivers roughly 30x the token throughput of a single H100 system while drawing far less total power than the fleet of H100 servers and switches it replaces, energy per inference token works out roughly 25x lower.

This is system-level math, not a per-GPU claim. A single B200 draws modestly more power than a single H100 but delivers several times the compute, so per-GPU efficiency improves; the much larger rack-level advantage comes from NVLink 5 eliminating power-hungry InfiniBand switching between GPUs, plus architectural improvements in how Transformer operations are executed.

For AI infrastructure planning, the practical impact is that a data center can serve 30x more LLM inference requests per MW of power with Blackwell versus Hopper, which directly reduces the GPU rental price per 1,000 inference tokens over time. This is driving cloud provider GPU pricing competition through 2026.

For broader context on how AI infrastructure power consumption is tracked at the national level, see AI data center power consumption trends.

Who Is Deploying Blackwell and When

Cloud providers began deploying Blackwell GPU instances in late 2024 and expanded through 2025. AWS, Azure, Google Cloud, CoreWeave, Lambda Labs, and Oracle Cloud Infrastructure all announced Blackwell availability in this period.

| Provider | Blackwell Product | On-Demand Price | Availability |
| --- | --- | --- | --- |
| AWS | B200 (P6 instances) | Not publicly listed | Enterprise contract |
| Azure | ND B200 v5 | $3.79-18.53/hr | Limited, waitlist |
| Google Cloud | A4 (B200) | Variable, enterprise | Waitlist |
| CoreWeave | B200 SXM | $8-15/hr | April 2026, limited |
| Lambda Labs | B200 | $8-12/hr | Q1 2026 |
| Oracle Cloud | B200 | Contract pricing | Enterprise |

Source: Provider pricing pages and Modal Labs (July 2025, April 2026).

Supply constraints are significant. NVIDIA's TSMC 4NP wafer allocation and CoWoS advanced packaging capacity (used to integrate the HBM3e stacks) limited Blackwell production through most of 2025. Most enterprise buyers access Blackwell through multi-year cloud contracts rather than the spot GPU rental market.

"The demand for Blackwell is extraordinary. Every customer wants more than we can supply." (Jensen Huang, NVIDIA CEO, Q3 FY2025 earnings call, November 2024)

For AI labs training frontier models (GPT-scale, Gemini-scale), Blackwell is the only viable architecture. The 192GB per GPU memory capacity and NVLink 5 interconnect make it possible to train models that simply cannot fit on H100 hardware without excessive gradient checkpointing. OpenAI, Google DeepMind, Anthropic, Meta FAIR, and xAI are all reported to have Blackwell cluster allocations.

Whether Blackwell is worth the price for teams not training frontier models is a separate question. Thunder Compute's April 2026 analysis concluded that for inference-only workloads on models below 70B parameters, H100 clusters are still cost-competitive with B200 on a per-token basis, since the FP4 advantage only compounds at very high throughput and model sizes. For a detailed comparison of cloud GPU providers and pricing, the cloud GPU providers comparison covers current H100 and B200 rental rates across major platforms.

What Comes After Blackwell: NVIDIA's Architecture Roadmap

NVIDIA has publicly committed to an annual GPU architecture cadence, a departure from the previous two-to-three year cycles. Blackwell (2024) is followed by a Blackwell Ultra refresh in 2025 and then Rubin in 2026, and NVIDIA has referenced Feynman (named after physicist Richard Feynman) for the generation after that.

Rubin will use HBM4 memory, which delivers approximately 50% higher bandwidth than HBM3e. NVIDIA has not published full Rubin specifications as of early 2026, but the transition from HBM3e to HBM4 is expected to be the defining performance improvement, similar to how the jump from HBM2e (A100) to HBM3 (H100) was the key bandwidth advance in that generation.

The annual cadence creates a new challenge for data center operators: when to refresh GPU infrastructure. A facility that spends $500M deploying Blackwell in 2025 faces the question of whether to wait for Rubin in 2026, or whether Rubin supply will be constrained enough that Blackwell remains the practical choice for 18-24 months. Based on Blackwell's own availability timeline (announced March 2024, broadly available through 2025), Rubin availability for most buyers will likely extend well into 2026.

The key implication for buyers considering H100 vs Blackwell today: H100 prices have declined substantially as Blackwell arrived. H100 SXM5 spot prices fell from $25,000-35,000 in 2023 to $18,000-25,000 by early 2026, and cloud H100 on-demand rates are at multi-year lows. For workloads that fit in 80GB of VRAM, H100 remains a cost-effective option. Blackwell's premium is justified when 192GB memory capacity, FP4 inference throughput, or NVLink 5 interconnect bandwidth are specifically required.

For the current landscape of AI accelerator hardware including H100 and its alternatives, see the AI accelerator hardware overview.

Frequently Asked Questions

What is NVIDIA Blackwell?

NVIDIA Blackwell is a GPU microarchitecture announced March 18, 2024 at GTC, succeeding Hopper (H100) for data center AI workloads and Ada Lovelace for consumer GPUs. The data center B200 GPU uses a dual-die chiplet design with 208 billion transistors on TSMC 4NP, delivering 20 petaFLOPS of FP4 AI performance and 192GB of HBM3e memory per GPU. Named after mathematician David Blackwell, the architecture introduces FP4 precision, NVLink 5, and a 10 TB/s chip-to-chip interconnect between the two dies.

How does NVIDIA Blackwell B200 compare to H100?

The B200 outperforms the H100 in every measurable dimension: 208B transistors versus 80B, 192GB HBM3e versus 80GB HBM3, 8 TB/s memory bandwidth versus 3.35 TB/s, and 20 PFLOPS FP4 versus no FP4 support on H100. FP8 throughput roughly doubles. Price per GPU is 15-20% higher than H100 (B200 at $30,000-40,000 versus H100 at $25,000-35,000 as of 2025). The system-level advantage is larger: one GB200 NVL72 rack delivers equivalent LLM inference to approximately 30 H100 servers while consuming substantially less power.

How much does a Blackwell B200 GPU cost?

A single B200 SXM module costs $30,000-40,000 as of July 2025, according to Modal Labs pricing research. The GB200 Superchip (one Grace CPU plus two B200 GPUs) costs $60,000-70,000. A full DGX B200 server with eight B200 GPUs costs approximately $515,000. Cloud rental for B200 runs $3.79-18.53 per hour on-demand (July 2025), rising to $8-15 per hour on some platforms by April 2026 due to limited availability. Epoch.ai estimates the bill-of-materials production cost at $5,700-7,300 per B200, implying approximately 81% gross margin at the $35,000 midpoint sale price.

What is the GB200 NVL72?

The GB200 NVL72 is a rack-scale AI compute system that integrates 72 Blackwell B200 GPUs and 36 Grace ARM CPUs into a single liquid-cooled rack connected by a unified NVLink 5 fabric. The entire rack operates as one compute system with 13.5TB of aggregate HBM3e memory and delivers 30x faster LLM inference versus an equivalent H100 cluster, according to NVIDIA's GTC 2024 announcement. It requires liquid cooling as a hardware prerequisite and is the primary configuration purchased by hyperscalers and major AI labs for large-scale model serving.

What process node is NVIDIA Blackwell built on?

NVIDIA Blackwell is built on TSMC's 4NP process, a custom variant of TSMC's N4P node within the 4-nanometer family, tuned specifically for NVIDIA. Each Blackwell GPU consists of two separately manufactured dies, each at the maximum reticle size (the largest area a single lithography exposure can cover), connected by NVIDIA's NV-High Bandwidth Interface. Producing two smaller dies and connecting them is more cost-effective at this transistor count than attempting a single monolithic die, which would have unacceptably low yields.

Does Blackwell support FP4 precision?

Yes. Blackwell's fifth-generation Tensor Cores natively support FP4 (4-bit floating point) with micro-tensor scaling through the second-generation Transformer Engine. FP4 stores model weights in 4 bits instead of 16 or 8, halving the memory required compared to FP8 and enabling a single B200 to run four simultaneous copies of a 70B FP4 model within its 192GB memory. Prior NVIDIA architectures (Hopper, Ampere) did not support FP4 natively. The Transformer Engine automatically selects precision per layer, so developers do not need to manually quantize models to benefit from FP4 acceleration.

When did NVIDIA Blackwell GPUs start shipping?

NVIDIA announced Blackwell at GTC on March 18, 2024. Cloud providers began receiving B200 allocations in late 2024, with Azure, AWS, and Google Cloud deploying Blackwell instances from late 2024 through 2025. The DGX B200 server became available to enterprise customers in late 2024. Supply remained constrained through most of 2025 due to limited TSMC 4NP wafer and CoWoS packaging capacity, and most buyers access Blackwell through multi-year cloud contracts rather than the spot GPU rental market. Broad availability on cloud platforms with on-demand pricing was still limited as of April 2026.

What GPU architecture comes after Blackwell?

The architecture after Blackwell is called Rubin, named after astronomer Vera Rubin. NVIDIA has committed to an annual GPU architecture cadence, with a Blackwell Ultra refresh in 2025 and Rubin positioned for 2026. Rubin is expected to use HBM4 memory, which offers approximately 50% higher bandwidth than HBM3e. After Rubin, NVIDIA has referenced an architecture named Feynman (after physicist Richard Feynman). Full Rubin specifications had not been published as of early 2026. Based on Blackwell's production timeline, Rubin is expected to reach broad availability through 2026-2027.
