
NVIDIA A100 GPU: Specs, Price, and Performance in 2026

By Amara | Published 20 March 2026
[Image: NVIDIA A100 PCIe card close-up showing the Ampere GA100 chip with HBM2e memory stacks]

Key Numbers

  • 312 TFLOPS: FP16 Tensor Core performance, A100 80GB SXM4 (NVIDIA, 2020)
  • $8K-$15K: new A100 80GB unit price range, 2025-2026 (market data, 2026)
  • $1.49-$3.43: cloud rental per GPU hour, A100 80GB, 2026 (RunPod, JarvisLabs, 2026)
  • 6,912: CUDA cores in the GA100 Ampere chip (NVIDIA Ampere whitepaper)
  • 2,039 GB/s: memory bandwidth on A100 80GB SXM4, HBM2e (NVIDIA, 2020)

Key Takeaways

  1. The NVIDIA A100 (Ampere GA100, TSMC 7nm, released May 2020) delivers 312 TFLOPS FP16 and 2,039 GB/s memory bandwidth on the 80 GB HBM2e configuration. It supports up to 7 isolated MIG instances for concurrent multi-model inference on a single GPU.
  2. New A100 80GB units cost $8,000-$15,000 in 2026. Used units trade at $4,000-$9,000. Cloud rental costs $1.49-$3.43/hour. For a 7B parameter training run on 128 GPUs, A100 costs $43,008 versus $23,654 on H100; the H100 saves roughly $19,000 per run despite costing more per hour, because it finishes 3.2x faster.
  3. For inference workloads below 40% sustained GPU utilization, the A100 is 1.5 to 2x cheaper per request than the H100. Large-scale training above 30B parameters has migrated to H100 and H200, while moderate-traffic inference and fine-tuning of smaller models continues to run cost-efficiently on A100 infrastructure in 2026.

The NVIDIA A100 is a data center GPU built on the Ampere architecture (TSMC 7nm), released in May 2020. It delivers 312 TFLOPS in FP16 and 2,039 GB/s memory bandwidth on its 80 GB HBM2e configuration. At launch it was described as 20 times faster than the V100 for AI workloads, and it became the standard GPU for AI model training from 2020 through 2023.

New A100 80GB units trade at $8,000 to $15,000 in 2026, roughly one-third the cost of an H100. On the secondary market, used units are available from $4,000 to $9,000. Cloud rental costs $1.49 to $3.43 per GPU hour depending on provider, making the A100 the most accessible entry point for serious AI workloads.

This article covers the complete A100 specifications, pricing across buy and rental markets, a direct performance and cost comparison with the H100, and an honest assessment of which workloads still belong on the A100 in 2026 versus those that have genuinely outgrown it.

What Is the NVIDIA A100 GPU?

The NVIDIA A100 Tensor Core GPU is a data center accelerator built on NVIDIA's Ampere architecture, succeeding the V100 (Volta, 2017). Its core chip, the GA100, contains 54.2 billion transistors manufactured on TSMC's 7nm process node. The A100 was designed as a single platform for three workload categories: AI training, AI inference, and high-performance computing (HPC).

Three capabilities distinguish the A100 from its predecessor. First, third-generation Tensor Cores supporting TF32, BF16, FP16, INT8, and FP64 data types in a single chip, eliminating the need for separate training and inference GPUs. Second, Multi-Instance GPU (MIG) technology, which allows a single A100 to be partitioned into up to 7 fully isolated GPU instances, each with its own memory, compute, and bandwidth allocation. Third, structural sparsity acceleration, which doubles effective throughput for AI models with at least 50% sparse weights.

The A100 comes in two memory configurations and two physical form factors:

| Variant | Memory | Form Factor | NVLink |
|---|---|---|---|
| A100 SXM4 40GB | 40 GB HBM2 | SXM4 server module | 600 GB/s |
| A100 SXM4 80GB | 80 GB HBM2e | SXM4 server module | 600 GB/s |
| A100 PCIe 80GB | 80 GB HBM2e | Standard PCIe card | None |

The SXM4 form factor is designed for NVIDIA DGX A100 and HGX A100 systems, where 8 GPUs connect via NVLink at 600 GB/s total bidirectional bandwidth. The PCIe variant fits standard server slots and costs less, but lacks NVLink connectivity for multi-GPU training.

From 2020 to 2023, the A100 was the GPU running every major AI training cluster. GPT-3 (175B parameters), Stable Diffusion 1.x, and early Meta LLaMA models were all trained on A100 infrastructure. According to NVIDIA's official A100 product page, the chip was deployed across research institutions, cloud providers, and enterprise data centers on every continent.

NVIDIA A100 Full Technical Specifications

The following table covers all three A100 variants against the H100 SXM5 for direct comparison. All TFLOPS figures are without sparsity unless noted.

| Specification | A100 SXM4 40GB | A100 SXM4 80GB | A100 PCIe 80GB | H100 SXM5 80GB |
|---|---|---|---|---|
| Architecture | Ampere (GA100) | Ampere (GA100) | Ampere (GA100) | Hopper (GH100) |
| Process node | TSMC 7nm | TSMC 7nm | TSMC 7nm | TSMC 4nm |
| CUDA cores | 6,912 | 6,912 | 6,912 | 16,896 |
| Tensor cores | 432 (3rd gen) | 432 (3rd gen) | 432 (3rd gen) | 528 (4th gen) |
| Memory | 40 GB HBM2 | 80 GB HBM2e | 80 GB HBM2e | 80 GB HBM3 |
| Memory bandwidth | 1,555 GB/s | 2,039 GB/s | 1,935 GB/s | 3,350 GB/s |
| FP32 | 19.5 TFLOPS | 19.5 TFLOPS | 19.5 TFLOPS | 67.0 TFLOPS |
| TF32 Tensor Core | 156 TFLOPS | 156 TFLOPS | 156 TFLOPS | 494 TFLOPS |
| FP16 Tensor Core | 312 TFLOPS | 312 TFLOPS | 312 TFLOPS | 989 TFLOPS |
| BF16 Tensor Core | 312 TFLOPS | 312 TFLOPS | 312 TFLOPS | 989 TFLOPS |
| FP8 Tensor Core | N/A | N/A | N/A | 1,979 TFLOPS |
| INT8 Tensor Core | 624 TOPS | 624 TOPS | 624 TOPS | 1,979 TOPS |
| FP64 | 9.7 TFLOPS | 9.7 TFLOPS | 9.7 TFLOPS | 34.0 TFLOPS |
| TDP | 400W | 400W | 300W | 700W |
| NVLink bandwidth | 600 GB/s | 600 GB/s | None | 900 GB/s |
| MIG instances | 7 | 7 | 7 | 7 |
| FP8 training | No | No | No | Yes |

A note on sparsity figures: when AI model weights have at least 50% near-zero values (the sparsity threshold NVIDIA uses), all tensor core throughput figures double. FP16 on the A100 reaches 624 TFLOPS with sparsity, and INT8 reaches 1,248 TOPS. Most production transformer models do not achieve the 50% sparsity threshold required for these figures, so the non-sparsity numbers are the ones that apply in practice.

The most meaningful difference between the A100 40GB and 80GB is not just memory capacity. The 80GB variant uses HBM2e rather than HBM2, delivering 2,039 GB/s versus 1,555 GB/s, a 31% bandwidth improvement. For memory-bandwidth-bound workloads, including large transformer inference at long sequence lengths, the 80GB delivers measurably higher throughput, not just larger model support.
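A rough way to see why that bandwidth gap matters: single-stream transformer decoding is memory-bound, so an upper bound on tokens per second is memory bandwidth divided by the bytes streamed per token (roughly the full model size in FP16). The sketch below is a back-of-envelope roofline estimate with illustrative numbers, not a benchmark:

```python
def decode_tokens_per_sec_upper_bound(params_billion: float,
                                      bandwidth_gb_s: float,
                                      bytes_per_param: int = 2) -> float:
    """Roofline upper bound for memory-bound single-stream decoding:
    every generated token must stream all weights from HBM once."""
    model_gb = params_billion * bytes_per_param  # FP16 = 2 bytes per parameter
    return bandwidth_gb_s / model_gb

# A 13B model in FP16 on the two A100 memory configurations
print(round(decode_tokens_per_sec_upper_bound(13, 2039)))  # 80 GB HBM2e bound
print(round(decode_tokens_per_sec_upper_bound(13, 1555)))  # 40 GB HBM2 bound
```

Real throughput is lower once attention-cache reads and kernel overhead are included, but the ratio between the two variants tracks the 31% bandwidth difference.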

NVIDIA A100 Price: What You Pay to Buy or Rent in 2026

The A100 trades across three markets in 2026: new hardware from authorized resellers, used hardware on secondary markets, and cloud rental by the hour. Each has distinct economics.

New hardware (2025-2026 market prices):

| Configuration | New Price Range |
|---|---|
| A100 40GB PCIe | $8,000-$10,000 |
| A100 80GB PCIe | $10,000-$15,000 |
| A100 80GB SXM4 | $18,000-$20,000 |
| DGX A100 (8x SXM4 + server) | $150,000-$200,000 |

Used hardware (secondary market, 2025-2026):

  • A100 80GB: $4,000-$9,000 depending on condition, original warranty status, and seller
  • Used SXM4 modules require compatible HGX A100 server boards, adding significant integration cost

Cloud rental per GPU per hour (2026):

  • A100 80GB on RunPod, Lambda Labs: $1.49-$2.00/hour
  • A100 80GB SXM4 on CoreWeave: approximately $2.00-$2.50/hour
  • AWS p4d.24xlarge (8x A100 40GB, full instance): approximately $32.77/hour ($4.10 per A100)
  • H100 80GB SXM5 comparison: $2.50-$4.50/hour depending on provider

The Number Most Guides Don't Show

The per-hour cloud price comparison does not tell you which GPU is cheaper for a complete training run. Here is the full calculation.

The H100 SXM5 delivers 989 TFLOPS in FP16. The A100 SXM4 delivers 312 TFLOPS. The H100 provides 3.17x more raw training throughput.

Training a 7B parameter language model (comparable to Llama 2 7B) on a 128-GPU cluster:

  • On A100 at $2/hour per GPU: 128 GPUs x 168 hours (7 days) x $2 = $43,008 total
  • On H100 at $3.50/hour (3.17x faster, so 2.2 days): 128 GPUs x 52.8 hours x $3.50 = $23,654 total
  • H100 saves approximately $19,000 per training run at this scale despite the higher hourly rate
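The arithmetic in those bullets can be reproduced directly. The GPU counts, run times, and hourly rates below are the assumptions stated above, not measured values:

```python
def cluster_run_cost(num_gpus: int, hours: float, rate_per_gpu_hour: float) -> float:
    """Total cloud cost of a fixed-duration training run."""
    return num_gpus * hours * rate_per_gpu_hour

a100 = cluster_run_cost(128, 168.0, 2.00)  # 7 days at $2.00/GPU-hour
h100 = cluster_run_cost(128, 52.8, 3.50)   # ~2.2 days at $3.50/GPU-hour
print(f"A100: ${a100:,.0f}  H100: ${h100:,.0f}  saved: ${a100 - h100:,.0f}")
```

The saving scales linearly with cluster size and run length, which is why the gap only widens for larger models.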

For inference at low utilization, the math reverses. An A100 handling 15 inference requests per second uses roughly 30% of its compute capacity. The same load on an H100 uses about 9% of its capacity. Both GPUs sit mostly idle, but the H100 costs 75% more per hour for identical output. Below approximately 40% sustained GPU utilization, the A100 is the more cost-efficient inference server by a factor of 1.5 to 2.
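At a fixed request rate that neither GPU saturates, cost per request depends only on the hourly rate. A sketch using the example figures above ($2.00/hour A100, $3.50/hour H100, both serving the same 15 requests per second; the rates are assumptions, not quotes):

```python
def cost_per_million_requests(rate_per_hour: float, requests_per_sec: float) -> float:
    """Dollar cost per million requests when the GPU is the only cost and
    throughput is capped by demand, not by the hardware."""
    requests_per_hour = requests_per_sec * 3600
    return rate_per_hour / requests_per_hour * 1_000_000

a100 = cost_per_million_requests(2.00, 15)  # ~$37 per million requests
h100 = cost_per_million_requests(3.50, 15)  # ~$65 per million requests
print(round(h100 / a100, 2))  # 1.75: the H100 costs 75% more for identical output
```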

This divide explains the current market split: large-scale training has migrated to H100 and H200 clusters, while stable inference workloads at moderate traffic levels continue running cost-efficiently on A100 hardware.

What AI Workloads Run on the A100?

The A100 covers the full range of deep learning workloads, though its position in the compute hierarchy has shifted since the H100 arrived in 2022-2023.

AI training:

  • Large language model pre-training: GPT-3 (175B parameters), DALL-E, Stable Diffusion 1.x, and Meta's original LLaMA were trained on A100 clusters. These represented the frontier in 2020-2022.
  • Fine-tuning models up to 13B parameters: the A100 80GB handles full-parameter fine-tuning of 7B-13B models in FP16. LoRA fine-tuning of 70B models is feasible with gradient checkpointing.
  • Vision transformer training: standard image classification, object detection, and segmentation training workloads fit within the A100's memory and compute envelope.

AI inference:

  • API inference for models up to 13B parameters: providers including Together AI, Baseten, and smaller cloud GPU operators use A100 clusters for serving Llama 2 7B, Mistral 7B, and similar models.
  • MIG-based multi-tenant inference: a single A100 splits into 7 isolated GPU instances, each capable of serving a separate model. This makes a single A100 economically efficient for diverse inference API deployments.
  • Real-time NLP: NVIDIA's own benchmarks show the A100 delivering inference throughput 249 times faster than a CPU server on BERT-class models (NVIDIA, 2020), making sub-100ms API response times achievable.

High-performance computing (non-AI):

  • Molecular dynamics simulation and protein structure prediction
  • Climate and weather modeling
  • Financial risk modeling requiring FP64 double-precision support (9.7 TFLOPS FP64)

For context on the hyperscale facilities where A100 clusters are deployed, see our overview of what hyperscalers are. For organizations considering colocation of owned A100 hardware, our article on colocation data center costs covers the full pricing structure.

NVIDIA A100 vs H100: When Does the Upgrade Make Sense?

The H100 (Hopper architecture, TSMC 4nm) represents a full architectural generation over the A100. The key additions are: FP8 training precision (not available on A100), fourth-generation Tensor Cores, a dedicated Transformer Engine that automatically optimizes between FP8 and FP16 precision mid-operation, and NVLink 4.0 at 900 GB/s versus the A100's 600 GB/s.

Performance gap by metric:

| Metric | A100 80GB SXM4 | H100 80GB SXM5 | Ratio |
|---|---|---|---|
| FP16 Tensor Core | 312 TFLOPS | 989 TFLOPS | H100 = 3.2x |
| Memory bandwidth | 2,039 GB/s | 3,350 GB/s | H100 = 1.6x |
| NVLink bandwidth | 600 GB/s | 900 GB/s | H100 = 1.5x |
| FP8 training | No | Yes | H100 only |
| New unit price | $18,000-$20,000 | $27,000-$40,000 | H100 = 1.5-2x costlier |
| Cloud rental | $2.00-$2.50/hr | $2.50-$4.50/hr | H100 = 1.25-1.8x costlier |
| TDP | 400W | 700W | H100 = 1.75x more power |

"The H100 with its Transformer Engine is the first GPU designed specifically for transformer model architecture. The A100 is a general-purpose accelerator that happens to work well for transformers." (NVIDIA Hopper Architecture Technical Brief, 2022)

When upgrading to H100 makes sense:

  • Training runs exceeding 72 hours on A100 clusters, where the 3.2x throughput difference converts to direct cost savings (as shown in the pricing section above).
  • Models requiring FP8 precision for training efficiency, particularly models above 30B parameters where memory becomes the constraint.
  • Organizations where researcher and engineering time cost more than GPU time, and faster iterations justify the higher hardware spend.

When staying on A100 makes sense:

  • Inference-dominant workloads at less than 40% sustained GPU utilization.
  • Existing owned A100 hardware not yet fully depreciated, where the replacement cost does not recover through training speed savings within the depreciation window.
  • Budget-constrained research teams where $1.49-$2.00/hour A100 rental is the practical ceiling for compute spend.
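For training workloads specifically, the decision reduces to one rule: a fixed amount of work costs (hourly rate / throughput), so the H100 wins whenever its hourly premium is smaller than its speedup. A minimal sketch, using the speedup and rate figures assumed in this article:

```python
def cheaper_gpu_for_training(a100_rate: float, h100_rate: float,
                             h100_speedup: float) -> str:
    """Cost of a fixed training job scales as rate / throughput, so
    compare the H100's hourly premium against its speedup."""
    premium = h100_rate / a100_rate
    return "H100" if premium < h100_speedup else "A100"

print(cheaper_gpu_for_training(2.00, 3.50, 3.17))  # H100: 1.75x premium < 3.17x speedup
print(cheaper_gpu_for_training(2.00, 3.50, 1.0))   # A100: no speedup, premium loses
```

Inference at low utilization is the degenerate case where the effective speedup is 1 (both GPUs keep up with demand), which is why the rule flips there.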

Multi-Instance GPU and Ampere Architecture Details

The A100 introduced two architectural features not present in the V100 that remain significant for deployment in 2026: Multi-Instance GPU (MIG) and structural sparsity support.

Multi-Instance GPU (MIG)

MIG allows a single A100 to be partitioned into up to 7 fully isolated GPU instances, each with its own dedicated slice of the GPU's compute, memory, and bandwidth resources. Each instance appears to the operating system and running software as a separate physical GPU, with complete isolation: a crash or memory error in one instance does not affect others.

MIG partition sizes on an A100 80GB (selected options):

| Instance Profile | GPU Fraction | Memory | Use Case |
|---|---|---|---|
| 1g.10gb | 1/7 of compute | 10 GB | Single model inference |
| 2g.20gb | 2/7 of compute | 20 GB | Larger model inference |
| 3g.40gb | 3/7 of compute | 40 GB | Small fine-tuning jobs |
| 7g.80gb | Full GPU | 80 GB | Full training or large inference |

For inference API operators, MIG is a material cost saver. A single A100 rented at $2/hour and split into 7 MIG instances costs about $0.29/hour per served model. Running the same 7 models on 7 separate A100 cloud instances, even at the lowest market rate of $1.49/hour each, would cost $10.43/hour.
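That comparison is simple division, reproduced below; the $2.00 and $1.49 hourly rates are the figures quoted earlier in the pricing section:

```python
def mig_cost_per_instance(gpu_rate_per_hour: float, num_instances: int) -> float:
    """Effective hourly cost per MIG slice when one rented GPU is
    partitioned into num_instances isolated instances."""
    return gpu_rate_per_hour / num_instances

mig = mig_cost_per_instance(2.00, 7)  # one A100 split into seven 1g.10gb slices
separate = 7 * 1.49                   # seven whole A100s at the lowest quoted rate
print(f"${mig:.2f}/hr per slice vs ${separate:.2f}/hr for 7 separate GPUs")
```

The comparison only holds when each model actually fits in a 10 GB slice and its traffic fits in 1/7 of the compute; a model that needs more memory takes a larger profile and reduces the instance count accordingly.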

Third-Generation Tensor Cores and Sparsity

The A100's Tensor Cores added TF32 (TensorFloat-32) precision, a 19-bit format designed to match FP32 range with FP16-equivalent computation speed. TF32 delivers 156 TFLOPS on the A100 and requires no code changes from standard FP32 training, making it an automatic speed improvement for PyTorch and TensorFlow users.

Structural sparsity support accelerates computations where at least 50% of matrix values are zero. The A100 can prune and compress sparse matrices at the hardware level, theoretically doubling FP16 throughput to 624 TFLOPS. The limitation: generating the 2:4 sparsity pattern that the A100 hardware requires (exactly 2 non-zero values in every group of 4) requires specific training procedures that most production models do not currently use.
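The 2:4 pattern can be illustrated with a tiny magnitude-pruning sketch: in every group of four consecutive weights, keep the two largest magnitudes and zero the rest. This is a hand-rolled illustration of the constraint only, not NVIDIA's actual pruning workflow (which uses dedicated tooling and retraining to recover accuracy):

```python
def prune_2_4(weights: list[float]) -> list[float]:
    """Enforce 2:4 structured sparsity: in each group of 4 values,
    zero the two smallest-magnitude entries."""
    assert len(weights) % 4 == 0, "length must be a multiple of 4"
    out = []
    for i in range(0, len(weights), 4):
        group = weights[i:i + 4]
        # indices of the two largest-magnitude values in this group
        keep = sorted(range(4), key=lambda j: abs(group[j]), reverse=True)[:2]
        out.extend(v if j in keep else 0.0 for j, v in enumerate(group))
    return out

print(prune_2_4([0.9, -0.1, 0.05, -0.8]))  # [0.9, 0.0, 0.0, -0.8]
```

Naive one-shot pruning like this typically degrades model quality, which is exactly why the hardware feature sees limited production use: the 2:4 constraint must be imposed during or after training with a recovery step.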

For a full picture of how the A100 fits into the broader GPU data center hardware landscape, the NVIDIA A100 product specifications page remains the authoritative reference for official benchmark figures.

Is the A100 Still Worth Using in 2026?

The A100 is over five years old in 2026. In GPU hardware terms, that is a significant generational gap. The H100 and H200 are both commercially available, and NVIDIA's Blackwell architecture (B100, B200) is entering volume production. Yet the A100 continues to serve a defined and economically rational role.

Three conditions favor continued A100 use in 2026:

1. Inference at moderate scale: models up to 13B parameters in FP16 inference, at request volumes below roughly 40% GPU utilization, run more cost-efficiently on A100 than H100. The math was shown in the pricing section above. This covers a large fraction of production AI API deployments that are not serving millions of requests per hour.

2. Owned hardware not yet depreciated: organizations that purchased A100 hardware in 2022-2023 at peak prices have assets with remaining useful life. Replacing them with H100s before the A100s are fully depreciated incurs a double capital cost that cannot be recovered through training speed savings for most workload profiles.

3. Budget-constrained teams: at $1.49-$2.00/hour for cloud rental, A100 is the lowest-cost entry point for running 7B-13B parameter model training and fine-tuning without resorting to consumer GPU workarounds. University research labs, small AI startups, and individual developers operate in this range.

The conditions where A100 is no longer competitive:

  • Training runs for frontier models (above 30B parameters) where FP8 precision and H100 Transformer Engine give a 4-6x effective throughput advantage.
  • Multi-node training at scale (128+ GPUs), where the NVLink bandwidth gap (600 GB/s vs H100's 900 GB/s) adds communication overhead that lengthens training wall-clock time.
  • Applications requiring the latest CUDA features and toolkits that NVIDIA is targeting specifically at Hopper and Blackwell architectures.

The A100's market position in 2026 is similar to the V100's position in 2022: clearly not the leading chip, but far from obsolete, and more cost-effective than its successor for a defined set of workloads.

Frequently Asked Questions

What are the NVIDIA A100 GPU specs?

The NVIDIA A100 (Ampere GA100, TSMC 7nm) has 6,912 CUDA cores, 432 third-generation Tensor Cores, and is available in 40 GB HBM2 or 80 GB HBM2e memory configurations. The 80GB SXM4 delivers 312 TFLOPS in FP16, 2,039 GB/s memory bandwidth, 9.7 TFLOPS FP64, and 600 GB/s NVLink bandwidth. TDP is 400W for the SXM4 form factor and 300W for PCIe. It supports up to 7 isolated MIG instances and sparsity-accelerated throughput of 624 TFLOPS FP16 for sparse model weights. Released May 2020.

How much does an NVIDIA A100 GPU cost?

New A100 80GB PCIe units cost $10,000-$15,000. The SXM4 variant (for DGX/HGX server systems) costs $18,000-$20,000 new. Used A100 80GB units trade on the secondary market for $4,000-$9,000 depending on condition. Cloud rental costs $1.49-$3.43 per GPU hour depending on provider and configuration. The DGX A100 system (8x A100 SXM4 with full server) costs $150,000-$200,000 new. For comparison, the H100 80GB costs $27,000-$40,000 new and $2.50-$4.50/hour to rent.

What is the difference between the A100 40GB and 80GB?

The A100 80GB uses HBM2e memory instead of the 40GB's HBM2, delivering 2,039 GB/s bandwidth versus 1,555 GB/s, a 31% improvement. Beyond larger model support, the 80GB handles memory-bandwidth-bound workloads faster. The 40GB suits fine-tuning and inference for models up to approximately 13B parameters in FP16. The 80GB is needed for models above 20B parameters, multi-instance serving of larger models, and workloads where memory bandwidth is the bottleneck. New price premium for 80GB over 40GB is approximately $2,000-$5,000.

How does the A100 compare to the H100?

The H100 (Hopper, TSMC 4nm) delivers 989 TFLOPS FP16 versus the A100's 312 TFLOPS, a 3.17x advantage. Memory bandwidth is 3,350 GB/s versus 2,039 GB/s (1.6x). The H100 adds FP8 training precision (not on A100), a Transformer Engine, and NVLink 4.0 at 900 GB/s. H100 costs $27,000-$40,000 new versus $10,000-$20,000 for A100, and $2.50-$4.50/hour to rent versus $1.49-$2.50 for A100. For training runs over 72 hours, the H100's throughput advantage pays for its higher hourly cost. For inference below 40% utilization, the A100 is cheaper.

Can I rent an NVIDIA A100 GPU in the cloud?

Yes. A100 80GB GPU cloud rental costs $1.49-$3.43 per hour depending on provider, region, and whether the GPU is SXM4 or PCIe. RunPod and Lambda Labs offer A100 80GB from $1.49-$2.00/hour. CoreWeave charges approximately $2.00-$2.50/hour for SXM4 configurations. AWS p4d.24xlarge provides 8x A100 40GB at approximately $32.77/hour for the full instance. Google Cloud and Azure also offer A100 instances at comparable rates. Spot or preemptible instances are typically 40-60% cheaper but can be interrupted.

What is Multi-Instance GPU (MIG) on the A100?

MIG (Multi-Instance GPU) is a feature on the A100 that partitions one physical GPU into up to 7 fully isolated GPU instances. Each instance gets a dedicated slice of CUDA cores, memory, and bandwidth, and appears to the OS as a separate physical GPU. Isolation is complete: a crash in one instance does not affect others. The A100 80GB can be split into seven 1g.10gb instances (10 GB each), serving 7 concurrent models on a single GPU. This makes the A100 highly cost-efficient for inference API deployments with diverse or low-traffic model loads.

What architecture does the NVIDIA A100 use?

The A100 uses NVIDIA's Ampere architecture, built on TSMC's 7nm process node. The core chip is the GA100, which contains 54.2 billion transistors. Ampere introduced third-generation Tensor Cores supporting TF32, BF16, FP16, INT8, and FP64 in a single chip, structural sparsity acceleration (doubling throughput for 50% sparse models), and Multi-Instance GPU (MIG). The Ampere architecture succeeded Volta (V100, 2017) and was succeeded by Hopper (H100, 2022). All A100 variants, SXM4 and PCIe alike, use the same GA100 chip.

Is the A100 still good enough for training LLMs in 2026?

Yes, for models up to approximately 30B parameters. The A100 80GB can full-parameter fine-tune 7B-13B models in FP16 and run LoRA fine-tuning on 70B models with gradient checkpointing. For pre-training runs on models above 30B parameters, the H100's FP8 support and Transformer Engine give a 4-6x effective throughput advantage that makes A100 clusters materially slower per dollar. For inference on 7B-13B models, A100 remains cost-competitive at $1.49-$2.00/hour cloud rental, particularly for teams where training speed is less critical than cost per token served.
