
What Is a NeoCloud Provider? GPU Cloud vs Hyperscalers Explained

By Amara | Updated 2 June 2026
NeoCloud GPU-as-a-Service diagram showing AI companies connecting through NeoCloud at $2.50/hr versus AWS at $6.88/hr and Azure at $12.29/hr for AI training and inference workloads

Key Numbers

  • 60-85%: Lower GPU compute cost on neoclouds vs hyperscalers for equivalent H100 hardware (McKinsey, 2025)
  • $2.50/hr: On-demand H100 SXM5 per GPU-hour on leading neoclouds vs $6.88/hr on AWS p5 instances (Spheron Network, May 2026)
  • $11.9B: Value of CoreWeave's multi-year GPU compute contract with OpenAI, announced 2025 (Press reporting, 2025)
  • 30-40%: Annual growth rate projected for the GPU-as-a-Service market through 2030 (IDC / Gartner, 2025)

Key Takeaways

  1. A neocloud is a cloud provider built exclusively around GPU compute for AI workloads. CoreWeave, Lambda Labs, RunPod, and Vast.ai are the largest. They offer no general-purpose cloud services, no databases, no serverless, no managed application tiers: just GPUs, InfiniBand networking, and NVMe storage.
  2. H100 80GB on-demand pricing on neoclouds runs $2.50-$4.00 per GPU-hour in Q2 2026 versus $6.88/hr on AWS and $12.29/hr on Azure for the same GPU. Across a 1,000 GPU-hour training job, that difference is $3,890 to $9,900 in savings depending on the provider (Spheron Network / Thunder Compute, May-June 2026).
  3. Neoclouds suit GPU-intensive AI training and large-batch inference. They do not replace hyperscalers for general IT infrastructure, compliance-heavy enterprise workloads, or applications that need hundreds of managed cloud services alongside compute.

A neocloud is a cloud provider built around one thing: GPU compute for AI workloads. No databases. No serverless. No load balancers. CoreWeave, Lambda Labs, RunPod, and Vast.ai are the four largest. On the lower-priced providers, NVIDIA H100 and A100 access runs 60-85% below what AWS and Azure charge for identical silicon, according to McKinsey's 2025 analysis; CoreWeave's on-demand list price sits much closer to AWS.

The term came out of SemiAnalysis and McKinsey research separating GPU-first companies from hyperscalers that bolted GPU instances onto an existing general-purpose platform. The gap is visible in the numbers: AWS charges $6.88 per H100 GPU-hour on-demand as of May 2026. RunPod's Secure Cloud prices the same GPU at $2.39. Azure ND H100 v5 comes in at $12.29 (Spheron Network, May 2026).

What follows covers what neoclouds are, how they differ architecturally from AWS and Azure, which providers matter in 2026, how pricing works from spot marketplaces to reserved enterprise clusters, and when moving to a neocloud actually makes sense.

What Is a NeoCloud Provider?

A neocloud's primary product, often its only product, is GPU compute. Hyperscalers like AWS and Azure offer hundreds of managed services with GPUs listed somewhere in the catalog. Neoclouds don't bother with the rest. Dense GPU clusters, high-bandwidth interconnects, and NVMe storage tuned for AI: that's it.

McKinsey defines the category as "independent GPU-as-a-service providers" that emerged in response to global scarcity of high-end compute hardware and GPU manufacturers' push to move product outside of direct hyperscaler agreements. McKinsey, Equinix, ABI Research, and Cisco all use the term in their infrastructure research.

Where the term came from

"Neocloud" gained traction in 2023-2024 through SemiAnalysis GPU supply chain reporting and McKinsey's generative AI infrastructure work. Before that, these companies were called "GPU cloud" or "GPU-as-a-Service" providers. The category is older than the label. CoreWeave started in 2017 doing rendering and moved to AI compute around 2020. Lambda Labs was making ML hardware in 2012 and shifted to cloud rental as research demand grew.

What a neocloud does and does not offer

Feature | NeoCloud | Hyperscaler (AWS / Azure)
GPU compute (H100, A100, L40S) | Core product | One of hundreds of services
InfiniBand networking for training | Standard | Select instance families only
Bare-metal or near-bare-metal GPU access | Common | Rare (mostly virtualized)
Managed databases, serverless, ML platforms | Not offered | Full stack available
Global regions | 5-15 typically | 30+
Enterprise compliance (SOC 2, HIPAA, FedRAMP) | CoreWeave: yes; others: varies | Full
Transparent hourly GPU pricing | Yes | Complex multi-tier billing

The row that matters most is managed services. When an AI team needs only to run a training job or a batch inference pipeline, hyperscaler complexity and pricing overhead do not help them. Neoclouds remove both. When that same team needs a database, a queue service, and a GPU cluster on the same bill with unified identity and networking, a hyperscaler is the more practical choice.

NeoCloud vs Hyperscaler: What Actually Differs

The difference between a neocloud and a hyperscaler isn't mainly about price. Hyperscalers added NVIDIA hardware to platforms built for web services, databases, and virtual machines. Their networking, storage, and management layers came from that context and got adapted. Neoclouds started from the GPU cluster.

How the architectures diverge

Neoclouds run H100 and A100 GPUs on InfiniBand fabric at 400 Gbps per link. In distributed training, gradient synchronization between GPU nodes moves terabytes of intermediate model state, and that traffic is latency-sensitive. AWS EFA and Azure's selective InfiniBand instances handle many workloads fine but aren't uniformly as fast as the InfiniBand setups neoclouds run by default.

Neoclouds also lean toward bare-metal or thin-VM access. Less hypervisor overhead between the workload and the hardware. For training jobs that run days, that efficiency difference shows up in cost.

The managed services trade-off

"For most AI teams, neocloud providers deliver 40-85% lower GPU compute costs than hyperscalers with comparable or better GPU availability in 2026." (Thunder Compute, June 2026)

AWS has 200+ distinct services. A team already running their application on AWS can add a p5 instance and keep everything in one place: data, networking, identity, monitoring. Moving the training workload to a neocloud means data transfer overhead, a new auth layer, separate monitoring, and a second vendor relationship to manage.

For teams that only need GPU compute and handle their own orchestration anyway, that's workable. For teams deep in a hyperscaler ecosystem, the friction may outweigh the savings.

The Major NeoCloud Providers in 2026

Six companies dominate the neocloud market in 2026. CoreWeave and Lambda Labs operate centralized, curated GPU clusters with enterprise-grade SLAs. RunPod and Vast.ai run marketplace models where independent hardware operators list capacity on a shared platform. Together AI and Crusoe Energy occupy focused niches: inference APIs, and low-cost stranded-energy sourcing respectively.

Provider | Model | H100 On-Demand (Q2 2026) | Bare-metal | InfiniBand | Best for
CoreWeave | Centralized fleet | ~$6.16/GPU-hr | VMs + managed Kubernetes | Yes | Enterprise, SLA-required training
Lambda Labs | Centralized fleet | $2.49-$3.44/GPU-hr | VM + bare-metal | Yes | ML researchers, distributed training
RunPod | Marketplace | $1.99-$2.89/GPU-hr | Near-bare-metal | Host-dependent | Cost-sensitive training, inference
Vast.ai | Spot marketplace | $1.50-$2.58/GPU-hr | Bare-metal typical | Host-dependent | Cheapest batch compute
Together AI | Inference API | Per-token / API | No direct access | Yes (internal) | Inference APIs, managed fine-tuning
Crusoe Energy | Centralized fleet | Custom | Available | Yes | Sustainability-mandated organizations

Pricing from Spheron Network and Thunder Compute, May-June 2026. Marketplace providers show typical ranges; actual prices vary by host.

CoreWeave

CoreWeave is the largest neocloud by managed fleet size and the only one to have signed a verifiable multi-billion-dollar contract with a major AI lab. In 2025, the company announced an $11.9 billion, five-year compute agreement with OpenAI, according to press reporting from that period. CoreWeave began as a rendering service in 2017 and pivoted to AI compute around 2020. By 2025, its fleet spanned hundreds of thousands of GPUs across US data centers.

For the full CoreWeave pricing breakdown, GPU availability, and who the platform serves, see our CoreWeave review.

Lambda Labs

Lambda Labs started as a hardware company selling GPU workstations to machine learning researchers and launched its cloud service in 2019. It operates centralized clusters with dedicated InfiniBand fabric, making it the most consistent neocloud choice for multi-node jobs requiring guaranteed inter-GPU bandwidth. H100 on-demand in Q2 2026 runs $2.49-$3.44 per GPU-hour depending on configuration (Spheron Network, May 2026).

RunPod

RunPod operates a marketplace where independent GPU operators list their hardware under two tiers: Community Cloud (cheaper, lower guaranteed uptime) and Secure Cloud (stricter quality requirements, higher prices). H100 PCIe starts at $1.99/hr on Community Cloud and $2.39/hr on Secure Cloud as of Q2 2026. Networking quality and reliability vary by host rather than being uniformly guaranteed across the platform.

Vast.ai

Vast.ai runs the most aggressive pricing in the market, using a spot-style auction marketplace where hosts bid for workloads. H100 instances average $1.50-$2.58/GPU-hr depending on host and availability window. Preemption risk is real: spot instances can be reclaimed by the host with limited notice. For the full breakdown of Vast.ai's strengths and limitations for production workloads, see our Vast.ai review.

Crusoe Energy

Crusoe Energy differentiates on power source rather than price. The company builds GPU clusters at natural gas flare sites and data centers powered by stranded renewable energy, positioning itself as the sustainability choice for companies with carbon reporting requirements. Its enterprise H100 clusters use custom negotiated pricing.

H100 GPU On-Demand Price per Hour: NeoCloud vs Hyperscaler (Q2 2026)


On-demand list pricing only. Reserved contracts reduce rates 35-50% at most providers. CoreWeave competes on managed services and SLAs rather than on-demand list price. Lambda Labs shows midpoint of $2.49-$3.44 range. Sources: Spheron Network GPU pricing comparison (May 14, 2026) and Thunder Compute neocloud comparison guide (June 2026).

Provider | H100 on-demand price ($/GPU-hr)
Vast.ai | 2.58
Spheron | 2.50
RunPod | 2.89
Lambda Labs | 3.30
CoreWeave | 6.16
AWS p5 | 6.88
Azure ND H100 | 12.29

How NeoCloud Pricing Works

Neocloud pricing divides into three tiers: spot or interruptible instances at the lowest cost with preemption risk, on-demand hourly instances at list price, and reserved contracts at significant discounts for 3-month or 12-month commitments. Hyperscalers use the same three tiers, but their list rates for identical GPUs are materially higher.

The pricing spread across tiers (H100 80GB, Q2 2026)

Provider | Spot / Interruptible | On-Demand (hourly) | Source
Vast.ai | $0.80-$1.50/hr | $1.50-$2.58/hr | Thunder Compute, June 2026
RunPod | $1.03/hr (spot) | $1.99-$2.89/hr | RunPod / Spheron, 2026
Spheron | $1.03/hr (spot) | $2.50/hr | Spheron Network, May 2026
Lambda Labs | Not standard | $2.49-$3.44/hr | Spheron, May 2026
CoreWeave | Not standard | $6.16/hr | Thunder Compute, June 2026
AWS p5 (H100 SXM) | ~$2.74/hr (spot) | $6.88/hr | Spheron, May 2026
Azure ND H100 v5 | Limited | $12.29/hr | Spheron, May 2026
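One way to read the spot column is to ask how much work a job can lose to preemption before spot stops being cheaper than on-demand. The sketch below is illustrative only: `rework_fraction` is a hypothetical parameter (the share of GPU-hours that has to be rerun after interruptions), not a figure any provider publishes, and the two rates are RunPod's spot and lowest on-demand prices from the table above.

```python
def effective_spot_rate(spot_rate: float, rework_fraction: float) -> float:
    """Cost per useful GPU-hour when a fraction of spot hours must be rerun."""
    if rework_fraction < 0:
        raise ValueError("rework_fraction must be non-negative")
    return spot_rate * (1.0 + rework_fraction)


def break_even_rework(spot_rate: float, on_demand_rate: float) -> float:
    """Rework fraction at which spot costs the same as on-demand."""
    if spot_rate <= 0:
        raise ValueError("spot_rate must be positive")
    return on_demand_rate / spot_rate - 1.0


# Rates from the table above (USD per H100 GPU-hour, Q2 2026).
RUNPOD_SPOT = 1.03
RUNPOD_ON_DEMAND_LOW = 1.99

if __name__ == "__main__":
    # Hypothetical scenario: 20% of spot hours are lost to preemption.
    print(f"effective spot rate: ${effective_spot_rate(RUNPOD_SPOT, 0.20):.2f}/hr")
    print(f"break-even rework: {break_even_rework(RUNPOD_SPOT, RUNPOD_ON_DEMAND_LOW):.0%}")
```

Under these assumptions the spot tier stays cheaper until roughly 93% of the work has to be repeated, which is why frequent checkpointing usually makes interruptible capacity worthwhile for batch jobs.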

For a complete listing of GPU cloud providers including non-neocloud options, see our cloud GPU providers comparison.

The Number Most Guides Don't Show

Compare the total compute cost of an H100 training workload across providers. 1,000 GPU-hours is a reasonable proxy for a 7B LLM fine-tuning project: one 10-hour run on an 8-GPU node uses 80 GPU-hours, so about a dozen iterations add up to roughly 1,000.

Provider | Cost for 1,000 H100 GPU-hours | vs AWS On-Demand
Vast.ai (spot, typical) | $1,500 | $5,380 saved
RunPod Secure Cloud | $2,390 | $4,490 saved
Lambda Labs | $2,990 | $3,890 saved
CoreWeave (on-demand) | $6,160 | $720 saved
AWS p5 (on-demand) | $6,880 | Baseline
Azure ND H100 v5 | $12,290 | $5,410 more expensive

The CoreWeave on-demand rate sits close to AWS, which surprises users expecting all neoclouds to be dramatically cheaper. CoreWeave competes on managed services and SLAs rather than raw spot pricing. Its reserved contracts for enterprise customers, typically multi-month commitments, bring the effective rate into the $2.50-$3.50 range. That is where the substantial savings appear relative to AWS.

McKinsey's finding that neoclouds are "up to 85% cheaper" applies most directly to the marketplace tier (Vast.ai, RunPod spot) versus Azure on-demand pricing: the two extremes of the market. The median saving for equivalent on-demand GPU hours when comparing leading neoclouds to AWS is more typically 40-60%.
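The 1,000 GPU-hour comparison is simple multiplication, and it can be rerun for any job size. The sketch below uses the per-hour rates behind the table above; the $2.99 Lambda Labs rate is the one implied by the table's $2,990 figure, and the function and dictionary names are illustrative.

```python
# H100 rates in USD per GPU-hour, as used in the 1,000 GPU-hour table above.
RATES = {
    "Vast.ai (spot, typical)": 1.50,
    "RunPod Secure Cloud": 2.39,
    "Lambda Labs": 2.99,
    "CoreWeave (on-demand)": 6.16,
    "AWS p5 (on-demand)": 6.88,
    "Azure ND H100 v5": 12.29,
}
BASELINE = "AWS p5 (on-demand)"


def job_costs(gpu_hours: float, rates: dict = RATES, baseline: str = BASELINE) -> dict:
    """Return {provider: (total_cost, savings_vs_baseline)} in whole dollars.

    A negative savings value means the provider costs more than the baseline.
    """
    baseline_cost = round(rates[baseline] * gpu_hours)
    result = {}
    for provider, rate in rates.items():
        cost = round(rate * gpu_hours)
        result[provider] = (cost, baseline_cost - cost)
    return result


if __name__ == "__main__":
    for provider, (cost, saved) in job_costs(1000).items():
        print(f"{provider:26s} ${cost:>7,}  vs AWS: {saved:+,}")
```

Calling `job_costs(80)` gives the cost of a single 10-hour run on one 8-GPU node, which is a quicker sanity check before committing to a provider.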

When to Use a NeoCloud Instead of AWS or Azure

It comes down to whether the workload needs GPU compute only, or GPU compute wired into a broader cloud stack.

Use a neocloud when:

  • The workload is GPU training, fine-tuning, or large-batch inference and needs no managed cloud services alongside it
  • Your team already manages its own storage (S3-compatible or local NVMe), orchestration (Kubernetes or Slurm), and monitoring
  • The compute job runs for hours or days and the cost difference is material (at $3,890-$5,380 saved per 1,000 GPU-hours vs AWS, this threshold arrives quickly for teams running regular training jobs)
  • You need InfiniBand networking for multi-node distributed training and your hyperscaler cannot provide it consistently at your instance size
  • GPU availability is the constraint: neoclouds often have shorter provisioning queues for H100 hardware than hyperscalers during periods of high demand

Stay with a hyperscaler when:

  • The application uses GPU compute alongside managed databases, Lambda functions, S3, or other services with data staying in one VPC
  • Your organization has enterprise agreements with AWS, Azure, or GCP that include committed use discounts making the effective GPU rate competitive
  • Compliance requirements (SOC 2 Type II, HIPAA, FedRAMP, ISO 27001) rule out providers that have not completed those certifications
  • You need 30+ global regions for latency or data residency requirements
  • The team lacks capacity to manage a separate cloud vendor relationship alongside an existing hyperscaler stack

What most enterprise teams actually do

"Neoclouds complement the big clouds by filling a gap: ultra-high-performance computing for AI, rather than replacing them." (NEXTDC, 2025)

Most organizations with mature cloud setups keep their application infrastructure, databases, and user-facing services on a hyperscaler and send GPU-intensive training jobs to a neocloud. It requires planning for cross-cloud data movement, but on multi-week training runs the compute savings are usually large enough to cover that overhead with room to spare.
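Whether the savings cover the data movement can be estimated before committing. The sketch below is a rough model under stated assumptions: the dataset is copied out of the hyperscaler once, and the 10 TB dataset size and $0.09/GB egress fee in the example are hypothetical placeholders to be replaced with your own contract numbers. It ignores engineering time, duplicate storage, and repeated transfers.

```python
def net_savings(gpu_hours: float, hyperscaler_rate: float, neocloud_rate: float,
                dataset_gb: float, egress_per_gb: float) -> float:
    """GPU compute savings minus the one-time cost of moving the dataset."""
    compute_savings = gpu_hours * (hyperscaler_rate - neocloud_rate)
    transfer_cost = dataset_gb * egress_per_gb
    return compute_savings - transfer_cost


def break_even_gpu_hours(hyperscaler_rate: float, neocloud_rate: float,
                         dataset_gb: float, egress_per_gb: float) -> float:
    """GPU-hours needed before compute savings pay for the data transfer."""
    rate_gap = hyperscaler_rate - neocloud_rate
    if rate_gap <= 0:
        raise ValueError("neocloud rate must be lower than hyperscaler rate")
    return dataset_gb * egress_per_gb / rate_gap


if __name__ == "__main__":
    # Rates from this article: AWS p5 at $6.88, Lambda Labs at $2.99.
    # Hypothetical inputs: 10,000 GB dataset, $0.09 per GB egress.
    hours = break_even_gpu_hours(6.88, 2.99, 10_000, 0.09)
    print(f"break-even after {hours:.0f} GPU-hours")
    print(f"net savings at 1,000 GPU-hours: ${net_savings(1000, 6.88, 2.99, 10_000, 0.09):,.0f}")
```

With these placeholder inputs the transfer pays for itself after a few hundred GPU-hours, which is consistent with the point that multi-week runs absorb the overhead easily; a much larger dataset or a narrower rate gap shifts the answer.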

For context on when training a model makes sense versus calling an inference API, the AI training vs inference explained guide covers the cost and performance trade-offs in full.

The Technical Stack Behind a NeoCloud

Neoclouds build their performance advantage into three infrastructure layers: the networking fabric, the storage system, and the GPU density of each rack. All three are configured from the ground up for distributed AI workloads rather than adapted from a general-purpose design.

InfiniBand vs Ethernet

InfiniBand NDR delivers 400 Gbps per port with single-digit microsecond latency. NVLink handles GPU-to-GPU traffic inside a server; InfiniBand connects servers to each other, and it is what multi-node training clusters rely on for gradient synchronization. Hyperscalers offer comparable fabrics only on specific instance families (EFA on AWS p5, InfiniBand on Azure ND H100 v5), not uniformly across all GPU instance types. CoreWeave and Lambda Labs run InfiniBand as the default fabric for H100 deployments.
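A back-of-envelope estimate shows why link speed matters for gradient synchronization. The sketch below uses the standard ring all-reduce cost model, in which each GPU sends roughly 2(N-1)/N times the gradient size per synchronization. Everything else is an assumption for illustration: a 7B-parameter model, 2-byte (bf16) gradients, one network port per GPU, no overlap with compute, and no latency term. Real frameworks overlap communication with computation and route intra-node traffic over NVLink, so treat the result as an upper-bound intuition rather than a benchmark.

```python
def ring_allreduce_seconds(num_params: float, bytes_per_param: int,
                           num_gpus: int, link_gbps: float) -> float:
    """Bandwidth-only time for one ring all-reduce of the full gradient.

    Ignores latency, protocol overhead, and compute/communication overlap.
    """
    if num_gpus < 2:
        return 0.0  # nothing to synchronize on a single GPU
    payload_bytes = num_params * bytes_per_param
    # Ring all-reduce: each GPU sends 2 * (N - 1) / N of the payload.
    bytes_on_wire = 2 * (num_gpus - 1) / num_gpus * payload_bytes
    link_bytes_per_second = link_gbps * 1e9 / 8
    return bytes_on_wire / link_bytes_per_second


if __name__ == "__main__":
    # Hypothetical job: 7B parameters, bf16 gradients, 16 GPUs.
    for gbps in (400, 100):
        t = ring_allreduce_seconds(7e9, 2, 16, gbps)
        print(f"{gbps} Gbps link: {t:.3f} s per full gradient sync")
```

Under these assumptions a 400 Gbps link finishes each synchronization in about a quarter of the time a 100 Gbps link needs, and that gap repeats on every training step.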

For a full breakdown of NVIDIA H100 bandwidth specifications and NVLink architecture, see our H100 GPU specs and pricing guide.

NVMe storage for training throughput

Training large models requires reading and writing checkpoints and dataset batches at high throughput. NVMe drives, whether local or exposed over NVMe-over-Fabrics, deliver sequential read speeds of 5-7 GB/s per drive versus roughly 0.5 GB/s for the SATA SSDs common in general-purpose cloud VMs, which are capped by the 6 Gb/s SATA interface. The difference is invisible on light workloads but becomes the throughput ceiling on data-intensive training pipelines.

GPU rack density

Standard enterprise server racks handle 10-20 kW of power draw. A single server with eight NVIDIA H100 SXM5 GPUs draws approximately 5.6 kW for the GPUs alone at full load (eight GPUs at 700 W TDP each), before CPUs, networking, and cooling are counted, so a rack holding several of these servers exceeds that range quickly. Neoclouds design and build their facilities to handle this density from day one. Power, cooling, and physical rack structure are all specified for it. Hyperscalers host these racks as well, but dense GPU configurations are not the default infrastructure specification the way they are at a neocloud.
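The density problem can be checked with a few lines of arithmetic. The sketch below multiplies out the 700 W per-GPU TDP and asks how many eight-GPU servers fit in a conventional rack power budget. It counts GPU power only, which is a simplifying assumption: CPUs, NICs, fans, and power conversion losses add to the real draw, so actual server counts per rack would be lower.

```python
H100_SXM5_TDP_W = 700   # per-GPU TDP cited in this article
GPUS_PER_SERVER = 8


def gpu_power_kw(gpus: int, tdp_w: float) -> float:
    """Total GPU power in kilowatts at full load (GPUs only)."""
    return gpus * tdp_w / 1000


def servers_per_rack(rack_budget_kw: float, server_kw: float) -> int:
    """Whole servers that fit inside a rack's power budget."""
    return int(rack_budget_kw // server_kw)


if __name__ == "__main__":
    server_kw = gpu_power_kw(GPUS_PER_SERVER, H100_SXM5_TDP_W)
    print(f"GPU power per 8-GPU server: {server_kw} kW")
    for budget_kw in (10, 20):
        print(f"{budget_kw} kW rack fits {servers_per_rack(budget_kw, server_kw)} server(s)")
```

Even on this optimistic GPU-only basis, a 10 kW rack holds a single server and a 20 kW rack holds three, which is why facilities built for general-purpose servers struggle to host dense GPU clusters.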

GPU-as-a-Service Market Size 2023-2028, USD Billions (Projected)


2023-2025 figures represent analyst consensus estimates from IDC and Gartner research published in 2025. 2026-2028 projections apply 35% CAGR from the 2025 base, consistent with the 30-40% range cited by IDC and Grand View Research for the GPU cloud / AI IaaS segment. Individual analyst estimates vary.

Year | Market size (USD billions)
2023 | 2.1
2024 | 3.0
2025 (est) | 4.8
2026 (proj) | 6.5
2027 (proj) | 8.8
2028 (proj) | 12.0
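The projected rows follow from compounding the 2025 estimate at the stated 35% CAGR. A minimal sketch of that calculation is below; the function name is illustrative, and the small differences from the published figures come from rounding, whose exact method the source does not state.

```python
def project(base: float, cagr: float, years: int) -> list:
    """Compound `base` forward by `cagr` for each of the next `years` years."""
    return [base * (1 + cagr) ** n for n in range(1, years + 1)]


if __name__ == "__main__":
    # 2025 estimate of 4.8 (USD billions) compounded at 35% per year.
    for year, value in zip((2026, 2027, 2028), project(4.8, 0.35, 3)):
        print(f"{year}: {value:.2f} B USD")
```

Swapping in 0.30 or 0.40 for `cagr` shows how wide the 2028 outcome becomes across the 30-40% range that analysts cite.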

The NeoCloud Market in 2026

The GPU cloud market is growing at 30-40% annually through the 2020s, per IDC and Gartner projections from 2025. The neocloud segment hit an estimated $4-7 billion in 2025. At current growth rates, $10 billion by 2028 is achievable.

What drove the neocloud wave

When H100s became scarce in 2023-2024, AI labs took GPU capacity wherever they could find it. Neoclouds had supplier relationships with NVIDIA before the boom and could provision hardware that hyperscalers were quoting months out. The cost gap did the rest. A company paying $200,000 per month on AWS GPU instances could run the same workload for $80,000-$120,000 on Lambda Labs or RunPod. For a startup, that's not a rounding error; it's runway.

Enterprise teams started treating GPU purchasing as a separate line item, negotiating neocloud capacity outside their existing hyperscaler deals once training budgets crossed the threshold where managing two vendors was worth it.

Where this is heading

"The long-term viability of neoclouds depends on moving up the stack into AI-native services, putting them in more direct competition with hyperscalers." (McKinsey, 2025)

Renting bare-metal GPUs by the hour is a thin business. Hyperscalers are locking in better NVIDIA contracts, and as supply normalizes, the on-demand price advantage will compress. McKinsey's 2025 analysis is clear on where neoclouds need to go: AI-native managed services, training pipelines, inference endpoints, model serving, and MLOps tooling carry better margins than raw compute.

CoreWeave's $11.9 billion OpenAI contract points the direction. That deal isn't on-demand GPU hours. It's a long-term managed infrastructure relationship with pricing and capacity locked in. Lambda Labs has been building managed ML tooling alongside its GPU rental business for the same reason. The neoclouds with real staying power in 2026 and beyond are the ones adding services on top of hardware, not just renting racks.

Frequently Asked Questions

What is a neocloud?

A neocloud is a cloud provider built exclusively around GPU compute for AI workloads. Unlike hyperscalers such as AWS and Azure, which offer hundreds of managed services, neoclouds provide only GPU compute, high-bandwidth networking (typically InfiniBand), and NVMe storage configured for AI training and inference.

The term was popularized through McKinsey and SemiAnalysis research describing the new category of GPU-first providers that emerged during the AI infrastructure buildout of 2022-2024. The major neoclouds in 2026 are CoreWeave, Lambda Labs, RunPod, Vast.ai, Together AI, and Crusoe Energy.

What is the difference between a neocloud and a hyperscaler?

A hyperscaler (AWS, Azure, Google Cloud) offers general-purpose cloud computing with 200+ managed services, where GPUs are one product among hundreds. GPU instances are virtualized and come with the full cloud stack: managed databases, serverless, compliance certifications, 30+ global regions, and enterprise support.

A neocloud or specialized GPU cloud (CoreWeave, Lambda Labs, RunPod) exists solely to provide GPU compute, typically on bare-metal or near-bare-metal machines with InfiniBand networking. Neoclouds run 40-85% cheaper for raw GPU access but offer no managed services and fewer geographic regions. They suit GPU-intensive AI workloads for teams that manage their own orchestration stack.

Is CoreWeave a neocloud?

Yes. CoreWeave is the largest neocloud by managed fleet size and enterprise contract volume. It offers NVIDIA H100, A100, and L40S GPUs on InfiniBand-connected clusters with enterprise SLAs and Kubernetes-based orchestration, but no general-purpose cloud services such as databases, serverless, or CDN.

CoreWeave signed an $11.9 billion, five-year compute contract with OpenAI in 2025, making it the highest-profile neocloud customer relationship in the market. Its on-demand H100 pricing of approximately $6.16 per GPU-hour (Thunder Compute, June 2026) sits close to AWS p5 rates, but reserved enterprise contracts run considerably lower.

How much cheaper are neoclouds than AWS for GPU compute?

For on-demand H100 80GB pricing as of Q2 2026: AWS p5 charges approximately $6.88 per GPU-hour, while RunPod Secure Cloud charges $2.39, Lambda Labs charges $2.49-$3.44, and Vast.ai ranges from $1.50 to $2.58 (Spheron Network, May 2026). Across a 1,000 GPU-hour training job, the difference runs from $3,890 (Lambda Labs at $2.99) to $5,380 (Vast.ai at $1.50) in savings compared to AWS on-demand.

CoreWeave is the exception: its on-demand rate of $6.16 is close to AWS. CoreWeave competes on managed services and SLAs, with cost advantages appearing primarily on reserved multi-month contracts.

What is GPU-as-a-Service (GPUaaS)?

GPU-as-a-Service (GPUaaS) is the cloud delivery model where customers rent GPU compute by the hour, day, or month without purchasing hardware. Neoclouds are the primary GPUaaS providers, though hyperscalers also offer GPU instances under the same model.

The GPUaaS market is projected to grow at 30-40% annually through 2030, reaching an estimated $10+ billion globally by 2028, driven by demand from AI training and inference workloads that require GPU clusters too large for most organizations to own outright (IDC / Gartner, 2025). The term is often used interchangeably with "neocloud" in analyst reports, though GPUaaS describes the pricing model while neocloud describes the provider category.

Should I use a neocloud or AWS for AI training?

Use a neocloud if your workload is GPU training or large-batch inference that requires no managed cloud services alongside it. RunPod or Lambda Labs can reduce your GPU compute bill by 40-60% versus AWS on-demand rates.

Stay with AWS if your application infrastructure already runs on AWS and the cost of data transfer, separate vendor management, and additional orchestration overhead exceeds the GPU compute savings. A team running RDS, S3, Lambda, and EKS on AWS and adding a GPU training job may find staying on p5 instances operationally simpler than running a separate neocloud relationship.

The most common enterprise pattern is keeping application infrastructure on a hyperscaler and routing GPU-intensive training jobs to a neocloud for the compute cost advantage.

What networking infrastructure do neoclouds use?

Most enterprise neoclouds (CoreWeave, Lambda Labs) use InfiniBand NDR at 400 Gbps per port as the default networking fabric for H100 GPU clusters. InfiniBand delivers single-digit microsecond latency, which matters for gradient synchronization during distributed training across multiple GPU nodes.

Marketplace neoclouds (RunPod, Vast.ai) vary by host: some hosts run InfiniBand, others run high-speed Ethernet. Networking quality is host-dependent rather than uniformly guaranteed across the marketplace platform. If InfiniBand is required for a multi-node distributed training job, the specific host's specifications must be verified before provisioning.

Which companies are considered neoclouds?

The main neoclouds operating in 2026 are CoreWeave (largest by enterprise contract volume, H100 at ~$6.16/hr on-demand), Lambda Labs (ML researcher focus, dedicated InfiniBand clusters, $2.49-$3.44/hr), RunPod (marketplace, $1.99-$2.89/hr), Vast.ai (spot marketplace, lowest prices at $1.50-$2.58/hr), Together AI (inference API focus), and Crusoe Energy (sustainability angle, stranded-energy GPU clusters).

Paperspace, which operated as a neocloud through 2024, was acquired by DigitalOcean and has been partially integrated into that platform. Spheron and several smaller operators also provide neocloud services in specific regional markets.
