Nvidia H200 Price: 2025 Cost Breakdown & Cheapest Cloud Options
Remember when 32GB of GPU VRAM seemed like a luxury? I sure do. Back in my Kaggle competition days, I was desperately trying to get my hands on a V100 with its "massive" 32GB VRAM. AWS had them, but by the time I cleared their quota hurdles, the competition was long over. Fast forward to 2025, and those numbers feel almost laughable.
Here's the short answer if you're in a hurry...
The Nvidia H200 costs roughly $40 k–$55 k to buy outright and $3.72–$10.60 per GPU-hour to rent (May 2025). Jarvislabs offers on-demand H200s at $3.80/hr, the cheapest single-GPU access.
The latest LLaMA 4 models from Meta are pushing the boundaries, requiring a minimum of 80GB VRAM just to get started. It's a stark reminder of how quickly AI model requirements are evolving - what was once cutting-edge is now barely enough to run the smallest of today's foundation models.
I'll admit it - I was skeptical when NVIDIA announced the H200 GPUs last year. The price tag was steep, and I wasn't convinced that the 76% VRAM increase (from 80GB to 141GB) would justify the investment. But here's where I was wrong: the timing couldn't have been better. As models like DeepSeek and LLaMA 4 hit the scene, that extra VRAM became a game-changer. A real-world example: running LLaMA 4's larger models (like Maverick 400B) at longer context windows on H100s takes two full 8-GPU nodes. With H200s, a single 8-GPU node does the job. That's not just a convenience - it's a major reduction in cost and complexity that makes advanced AI more accessible to everyone.
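If you want to sanity-check that node math yourself, here's a rough back-of-envelope sketch. The 8-bit serving precision and the 1.2× memory overhead factor are assumptions for illustration, not measured figures; real deployments also budget KV cache that grows with context length, which is exactly what pushes long-context H100 setups past a single node.

```python
import math

def gpus_needed(params_billion: float, bytes_per_param: float,
                gpu_vram_gb: float, overhead: float = 1.2) -> int:
    """Minimum GPU count to hold the weights, with a rough headroom factor
    for KV cache, activations, and framework overhead (assumed, not measured)."""
    weight_gb = params_billion * bytes_per_param   # 1e9 params * bytes ≈ GB
    return math.ceil(weight_gb * overhead / gpu_vram_gb)

# Example: a ~400B-parameter model served at 8-bit (1 byte per parameter).
for name, vram_gb in [("H100 (80 GB)", 80), ("H200 (141 GB)", 141)]:
    print(f"{name}: ~{gpus_needed(400, 1.0, vram_gb)} GPUs just to hold weights")
# H100 (80 GB): ~6 GPUs just to hold weights
# H200 (141 GB): ~4 GPUs just to hold weights
```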
H200 Price Snapshot (May 12, 2025)
TL;DR – Across major clouds, H200 GPU prices range from $3.72 to $10.60 per GPU-hour. Jarvislabs offers the most affordable on-demand option at $3.80/hr and is one of the few providers with 1-GPU rentals, making it ideal for individual developers and experimentation.
Cloud GPU Pricing Table
Provider | Instance / Shape | GPUs | Region | On-Demand Price | Per-GPU Price | Notes |
---|---|---|---|---|---|---|
Jarvislabs | H200 | 8×H200 | Europe | $30.40/hr | $3.80 | Lowest-cost option; single-GPU rentals also available at $3.80/hr. |
AWS | p5e.48xlarge | 8×H200 | Europe (STH) | $84.80/hr | $10.60 | EC2 Capacity Blocks hourly price is close to $32/hr. |
Azure | Standard_ND96isr_H200_v5 | 8×H200 | West US 3 | $84.80/hr | $10.60 | Prices vary by region. Cited from Azure calculator. |
Oracle | BM.GPU.H200.8 | 8×H200 | US Ashburn | $80.00/hr | $10.00 | Bare-metal server. Source: OCI pricing list. |
Google Cloud | A3-H200 (Spot) | 8×H200 | US Central 1 | TBA (Spot: $29.80/hr) | $3.72 | Spot pricing only; on-demand not published yet. |
Note: Hyperscalers currently only offer H200s in 8-GPU bundles. The "Per-GPU Price" column divides total price by 8 for fair comparison.
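To reproduce that normalization yourself, the per-GPU column is simply the instance rate divided by the GPU count, using the on-demand (or spot) figures quoted in the table above:

```python
# Instance price -> per-GPU price, using the figures quoted in the table.
instances = {
    "Jarvislabs 8×H200":         (30.40, 8),
    "AWS p5e.48xlarge":          (84.80, 8),
    "Azure ND96isr_H200_v5":     (84.80, 8),
    "Oracle BM.GPU.H200.8":      (80.00, 8),
    "Google Cloud A3-H200 spot": (29.80, 8),
}
for name, (hourly_usd, gpu_count) in instances.items():
    print(f"{name}: ${hourly_usd / gpu_count:.2f} per GPU-hour")
# Note: 29.80 / 8 = 3.725, which the table rounds down to $3.72.
```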
Key Takeaways
- Jarvislabs is one of the most affordable H200 providers, especially for single-GPU access.
- Google Cloud Spot offers the lowest hourly rate at $3.72, but is preemptible.
- AWS provides a good price-to-performance balance with better availability than GCP.
- Azure remains the most expensive, suitable for high-availability needs.
- Oracle (OCI) matches Azure performance at a lower flat rate.
Hardware MSRP Insight
Configuration | MSRP Range | Resale Range | Notes |
---|---|---|---|
Single H200 GPU | $40,000–$55,000 | $35,000–$45,000 | Base configuration |
4-GPU HGX Board | $180,000–$220,000 | $160,000–$190,000 | Includes NVLink |
8-GPU Server | $400,000–$500,000 | $350,000–$420,000 | Full HGX system |
H200 vs H100: Spec Comparison
For a detailed breakdown of H100 pricing and availability, check out our NVIDIA H100 Price Guide.
Feature | H100 (SXM) | H200 (SXM) | 🔍 What Changes |
---|---|---|---|
Launch architecture | Hopper (2023) | Hopper + HBM3e (2024) | Same core silicon; memory subsystem upgraded |
GPU memory | 80 GB HBM3 | 141 GB HBM3e | +76 % capacity lets you keep ≥ 70 B-parameter models on a single card (NVIDIA) |
Memory bandwidth | 3.35 TB/s | 4.8 TB/s | ~1.4× throughput cuts batch-size bottlenecks on inference (NVIDIA) |
FP8 peak (Tensor Core) | 3.96 PFLOPS | 3.96 PFLOPS | Compute parity—raw flops aren't the selling point |
NVLink 4 speed | 900 GB/s | 900 GB/s | Multi-GPU scaling unchanged (NVIDIA) |
PCIe interface | Gen 5 ×16 | Gen 5 ×16 | No change |
Max TDP (SXM) | 700 W (configurable) | 700 W (configurable) | Same rack-power footprint (TRG Datacenters, Sunbird DCIM) |
Key takeaways — why the extra 61 GB matters
- Single-GPU fits for big models. A lone H200 can hold a 70 B-parameter model's weights on one card (~140 GB at 16-bit, ~70 GB at 8-bit), eliminating the tensor-parallel gymnastics (splitting model tensors across GPUs) needed to span two H100s; see the memory sketch after this list.
- Bandwidth boosts context length. When you crank up sequence length (≥ 32 k tokens) the 4.8 TB/s pipe keeps attention kernels fed instead of stalling on HBM (High Bandwidth Memory).
- No free compute lunch. Peak TFLOPs are identical, so training throughput only rises if you're memory-bound.
- Power and networking stay flat. If your rack can cool an H100, it can cool an H200; NVLink fabric configs carry over 1-for-1.
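Here's an illustrative memory sketch behind the first two bullets. The model shape (80 layers, 8 grouped-query KV heads, head dimension 128, roughly a Llama-3-70B-style config), the precisions, and the batch/context choices are assumptions for illustration, not measured serving footprints.

```python
def weights_gb(params_billion: float, bytes_per_param: float) -> float:
    return params_billion * bytes_per_param        # 1e9 params * bytes ≈ GB

def kv_cache_gb(layers: int = 80, kv_heads: int = 8, head_dim: int = 128,
                seq_len: int = 32_768, batch: int = 1,
                bytes_per_elem: int = 2) -> float:
    # 2x for keys and values, per layer, per cached token, per sequence
    return 2 * layers * kv_heads * head_dim * seq_len * batch * bytes_per_elem / 1e9

print(f"70B weights, 16-bit : {weights_gb(70, 2):.0f} GB")            # ~140 GB
print(f"70B weights,  8-bit : {weights_gb(70, 1):.0f} GB")            # ~70 GB
print(f"KV cache, 32k ctx, batch 4 : {kv_cache_gb(batch=4):.0f} GB")  # ~43 GB
# 8-bit weights + that KV cache ≈ 113 GB: fits one 141 GB H200 with room to
# spare, but overflows an 80 GB H100, which has to shard across two cards.
# 16-bit weights alone (~140 GB) already need two H100s.
```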
When to choose which
Pick this | If you… |
---|---|
H200 | Need to run ≥ 70 B models, push long-context inference, or want the simplest 8-GPU DGX/HGX topology without sharding headaches. |
H100 | Care more about tokens/sec per dollar on models that already fit in 80 GB, e.g., Mixtral 8×7B or SD-XL training runs. |
Benchmark Snapshot (Coming Soon)
We'll be running benchmarks to help you compare H200 vs H100 performance across different model sizes.
Toolchain
Model mix
Size band | Models we'll test | Why it matters |
---|---|---|
Medium (10B – 110B) | Gemma 3 27B, Llama 4 Scout 109B | Tests memory efficiency with moderate parameter counts |
Large (≥ 400B) | Llama 4 Maverick 400B, DeepSeek V3 671B | Shows H200's advantage for large MoE models |
Metrics
- Tokens/sec (prefill + generate)
- P95 latency
- Cost per 1M tokens (see the sketch below)
- Memory utilization
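For transparency, the cost-per-token metric will be derived directly from throughput and the hourly rate; the throughput number in this sketch is a made-up placeholder until the benchmark figures land.

```python
def cost_per_million_tokens(hourly_rate_usd: float, tokens_per_sec: float) -> float:
    """$ per 1M generated tokens at a given sustained throughput."""
    tokens_per_hour = tokens_per_sec * 3600
    return hourly_rate_usd / tokens_per_hour * 1_000_000

# Example: one H200 at $3.80/hr and a hypothetical 2,500 tokens/sec sustained.
print(f"${cost_per_million_tokens(3.80, 2_500):.2f} per 1M tokens")   # ≈ $0.42
```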
Conclusion & Next Steps
Want H200 performance without the $40k+ price tag? Here's the deal:
- Rent for $3.80/hr instead of buying outright
- Skip the hardware headaches (power, cooling, depreciation)
- Stay flexible for Blackwell GPUs coming later this year
The math is simple: launch an H200 in 90 seconds, scale when needed, pay as you use.
Ready to try it? Spin up an H200 now → or drop us an email at hello@jarvislabs.ai for custom quotes on multi-server setups with monthly commitments.
We'll keep this guide updated with the latest prices and benchmarks as they drop.
FAQ Corner — H200 Price
Q1. What is the hourly price of an NVIDIA H200 GPU right now?
A: As of May 2025, on-demand rates span $3.72 – $10.60 per GPU-hour across the big clouds, with Jarvislabs at $3.80/hr for single-GPU access.
Q2. Why is the H200 more expensive than the H100?
A: You're paying for memory: the H200 jumps from 80 GB HBM3 to 141 GB HBM3e and bumps bandwidth from 3.35 TB/s to 4.8 TB/s (about 1.4×), letting it hold 70-B-parameter models on one card. The compute silicon is the same Hopper die.
Q3. Can I rent a single H200 or do I have to take an eight-GPU server?
A: Hyperscalers still ship H200 only in 8-GPU HGX nodes. Jarvislabs is one of the few platforms offering 1× H200 on demand at $3.80/hr, so you can prototype without paying for a whole server.
Q4. How much does an H200 cost to buy outright?
A: Channel quotes put MSRP between $40 k and $55 k per GPU, and an 8-GPU HGX server retails north of $400 k.
Q5. Is it cheaper to rent or buy if I need 24/7 access?
A: Running a single H200 on Jarvislabs 24 × 7 for a year is about $33 k (3.80 × 24 × 365), still below even the lowest ~$40 k hardware MSRP, plus you skip power, cooling, and depreciation headaches.
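If you want to redo that math with your own numbers, here's the calculation as a tiny sketch; the MSRP figure and the 24/7 utilization assumption are the inputs to change for your situation.

```python
HOURLY_RATE = 3.80      # Jarvislabs single H200, on-demand ($/hr)
GPU_MSRP    = 40_000    # low end of the channel quotes above ($)

annual_rental    = HOURLY_RATE * 24 * 365              # ≈ $33,288 at 100% utilization
breakeven_months = GPU_MSRP / (HOURLY_RATE * 24 * 30)  # months of 24/7 use to match MSRP

print(f"annual rental: ${annual_rental:,.0f}")
print(f"break-even vs ${GPU_MSRP:,} MSRP: ~{breakeven_months:.0f} months of 24/7 use")
# The break-even ignores power, cooling, hosting, and depreciation,
# which all push the real crossover point further out.
```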
Need multiple servers or longer commitments? We offer volume discounts that can reduce your hourly rate by up to 40% for monthly and quarterly commitments. Drop us a line at hello@jarvislabs.ai to get a custom quote tailored to your needs.
Q6. Does the published H200 price include NVLink or networking fees?
A: Yes—NVLink/NVSwitch fabric is included in the instance rate. Unlike other providers, we don't charge extra for bandwidth or data transfer. What you see is what you pay.
Q7. Will H200 prices drop once Blackwell (B100/B200) ships?
A: Historically, previous-gen flagship GPUs see ~15 % list-price cuts within six months of the next generation's launch. With Blackwell B100 samples expected in Q4 2025, expect H200 rates to soften in early 2026, with spot/pre-emptible prices sliding first.