Nvidia H200 Price: 2025 Cost Breakdown & Cheapest Cloud Options

May 12, 2025 · 8 min read

Founder @JarvisLabs.ai

Remember when 32GB of GPU VRAM seemed like a luxury? I sure do. Back in my Kaggle competition days, I was desperately trying to get my hands on a V100 with its "massive" 32GB VRAM. AWS had them, but by the time I cleared their quota hurdles, the competition was long over. Fast forward to 2025, and those numbers feel almost laughable.

Here's the short answer if you're in a hurry...

The Nvidia H200 costs $30 k–$40 k to buy outright and $3.72–$10.60 per GPU hour to rent (May 2025). Jarvislabs offers on‑demand H200 at $3.80/hr—cheapest single‑GPU access.

The latest LLaMA 4 models from Meta are pushing the boundaries, requiring a minimum of 80GB VRAM just to get started. It's a stark reminder of how quickly AI model requirements are evolving - what was once cutting-edge is now barely enough to run the smallest of today's foundation models.

I'll admit it - I was skeptical when NVIDIA announced the H200 GPUs last year. The price tag was steep, and I wasn't convinced that the 76% VRAM increase (from 80GB to 141GB) would justify the investment. But here's where I was wrong: the timing couldn't have been better. As models like DeepSeek and LLaMA 4 hit the scene, that extra VRAM became a game-changer. Let me break it down with a real-world example: Running LLaMA 4's larger models (like Maverick 400B) on H100s requires two full nodes with 8 GPUs each for a higher context window. But with H200s? You can do it on a single 8-GPU node. That's not just a convenience - it's a massive cost and complexity reduction that makes advanced AI more accessible to everyone.

H200 Price Snapshot (May 12, 2025)

TL;DR – Across major clouds, H200 GPU prices range from $3.72 to $10.60 per GPU hour. Jarvislabs offers the most affordable on-demand option at $3.80/hr, and is one of the few provider with 1-GPU rentals, making it ideal for individual developers and experimentation.

Cloud GPU Pricing Table

Provider	Instance / Shape	GPUs	Region	On-Demand Price	Per-GPU Price	Notes
Jarvislabs	H200	8×H200	Europe	$30.4/hr	$3.80	Lowest-cost option. Also supports 8×H200 for $30.40/hr.
AWS	p5e.48xlarge	8×H200	Europe (STH)	$84.8/hr	$10.6	EC2 Capacity Blocks hourly price is close to $32/hr.
Azure	Standard_ND96isr_H200_v5	8×H200	West US 3	$84.80/hr	$10.60	Prices vary by region. Cited from Azure calculator.
Oracle	BM.GPU.H200.8	8×H200	US Ashburn	$80.00/hr	$10.00	Bare-metal server. Source: OCI pricing list.
Google Cloud	A3-H200 (Spot)	8×H200	US Central 1	TBA (Spot: $29.80/hr)	$3.72	Spot pricing only; on-demand not published yet. ¹

Note: Hyperscalers currently only offer H200s in 8-GPU bundles. The "Per-GPU Price" column divides total price by 8 for fair comparison.

Key Takeaways

Jarvislabs is the one of the affordable H200 provider, especially for single-GPU access.
Google Cloud Spot offers the lowest hourly rate at $3.72, but is preemptible.
AWS provides a good price-to-performance balance with better availability than GCP. ²
Azure remains the most expensive, suitable for high-availability needs.
Oracle (OCI) matches Azure performance at a lower flat rate. ³

Hardware MSRP Insight

Configuration	MSRP Range	Resale Range	Notes
Single H200 GPU	$40,000–$55,000	$35,000–$45,000	Base configuration
4-GPU HGX Board	$180,000–$220,000	$160,000–$190,000	Includes NVLink
8-GPU Server	$400,000–$500,000	$350,000–$420,000	Full HGX system

H200 vs H100: Spec Comparison

For a detailed breakdown of H100 pricing and availability, check out our NVIDIA H100 Price Guide.

Feature	H100 (SXM)	H200 (SXM)	🔍 What Changes
Launch architecture	Hopper (2023)	Hopper + HBM3e (2024)	Same core silicon; memory subsystem upgraded
GPU memory	80 GB HBM3	141 GB HBM3e	+76 % capacity lets you keep ≥ 70 B-parameter models on a single card (NVIDIA)
Memory bandwidth	3.0 TB/s	4.8 TB/s	+60 % throughput cuts batch-size bottlenecks on inference (NVIDIA)
FP8 peak (Tensor Core)	3.96 PFLOPS	3.96 PFLOPS	Compute parity—raw flops aren't the selling point
NVLink 4 speed	900 GB/s	900 GB/s	Multi-GPU scaling unchanged (NVIDIA)
PCIe interface	Gen 5 ×16	Gen 5 ×16	No change
Max TDP (SXM)	700 W (configurable)	700 W (configurable)	Same rack-power footprint (TRG Datacenters, Sunbird DCIM)

Key takeaways — why the extra 61 GB matters

Single-GPU fits for giant models. A lone H200 can load Llama 4 70B or Mixtral-8×22B in full precision, eliminating tensor-parallel gymnastics (splitting model tensors across GPUs) across two H100s.
Bandwidth boosts context length. When you crank up sequence length (≥ 32 k tokens) the 4.8 TB/s pipe keeps attention kernels fed instead of stalling on HBM (High Bandwidth Memory).
No free compute lunch. Peak TFLOPs are identical, so training throughput only rises if you're memory-bound.
Power and networking stay flat. If your rack can cool an H100, it can cool an H200; NVLink fabric configs carry over 1-for-1.

When to choose which

Pick this	If you…
H200	Need to run ≥ 70 B models, push long-context inference, or want the simplest 8-GPU DGX/HGX topology without sharding headaches.
H100	Care more about tokens/sec per dollar on models that already fit in 80 GB, e.g., Mistral 7B-MoE or SD-XL training runs.

Benchmark Snapshot (Coming Soon)

We'll be running benchmarks to help you compare H200 vs H100 performance across different model sizes.

Toolchain

Ollama – For single-request latency testing
SGLang – For throughput testing under batch loads

Model mix

Size band	Models we'll test	Why it matters
Medium (10B – 100B)	Gemma 3 27B, Llama 4 Scout 109B	Tests memory efficiency with moderate parameter counts
Large (≥ 400B)	Llama 4 Maverick 400B, DeepSeek V3 671B	Shows H200's advantage for large MoE models

Metrics

Tokens/sec (prefill + generate)
P95 latency
Cost per 1M tokens
Memory utilization

Conclusion & Next Steps

Want H200 performance without the $30k+ price tag? Here's the deal:

Rent for $3.80/hr instead of buying outright
Skip the hardware headaches (power, cooling, depreciation)
Stay flexible for Blackwell GPUs coming later this year

The math is simple: launch an H200 in 90 seconds, scale when needed, pay as you use.

Ready to try it? Spin up an H200 now → or drop us an email at hello@jarvislabs.ai for custom quotes on multi-server setups with monthly commitments.

We'll keep this guide updated with the latest prices and benchmarks as they drop.

FAQ Corner — H200 Price

Q1. What is the hourly price of an NVIDIA H200 GPU right now?
A: As of May 2025, on-demand rates span $3.72 – $10.60 per GPU-hour across the big clouds, with Jarvislabs at $3.80/hr for single-GPU access.

Q2. Why is the H200 more expensive than the H100?
A: You're paying for memory: the H200 jumps from 80 GB HBM3 to 141 GB HBM3e and bumps bandwidth to 4.8 TB/s (+60 %), letting it swallow 70-B-parameter models on one card. Compute silicon is the same Hopper die.

Q3. Can I rent a single H200 or do I have to take an eight-GPU server?
A: Hyperscalers still ship H200 only in 8-GPU HGX nodes. Jarvislabs is one of the few platforms offering 1× H200 on demand at $3.80/hr, so you can prototype without paying for a whole server.

Q4. How much does an H200 cost to buy outright?
A: Channel quotes put MSRP between $40 k and $55 k per GPU, and an 8-GPU HGX server retails north of $400 k.

Q5. Is it cheaper to rent or buy if I need 24/7 access?
A: Running a single H200 on Jarvislabs 24 × 7 for a year is about $33 k (3.80 × 24 × 365), still ~35 % below the lowest hardware MSRP, plus you skip power, cooling, and depreciation headaches.

Need multiple servers or longer commitments? We offer volume discounts that can reduce your hourly rate by up to 40% for monthly and quarterly commitments. Drop us a line at hello@jarvislabs.ai to get a custom quote tailored to your needs.

Q6. Does the published H200 price include NVLink or networking fees?
A: Yes—NVLink/NVSwitch fabric is included in the instance rate. Unlike other providers, we don't charge extra for bandwidth or data transfer. What you see is what you pay.

Q7. Will H200 prices drop once Blackwell (B100/B200) ships?
A: Historically, previous-gen flagship GPUs see ~15 % list-price cuts within six months of the next generation's launch. With Blackwell B100 samples expected in Q4 2025, expect H200 rates to soften in early 2026, with spot/pre-emptible prices sliding first.

H200 Price Snapshot (May 12, 2025)​

Cloud GPU Pricing Table​

Key Takeaways​

Hardware MSRP Insight​

H200 vs H100: Spec Comparison​

Key takeaways — why the extra 61 GB matters​

When to choose which​

Benchmark Snapshot (Coming Soon)​

Conclusion & Next Steps​

FAQ Corner — H200 Price​

Footnotes​