NVIDIA A100 GPU Price in 2026: Cost Per Hour, Cloud Pricing & Specs
The NVIDIA A100 GPU price has dropped significantly as everyone chases H100s and H200s — and that's great news if you want an A100. The GPU that powered the first wave of open-source LLMs is now available at $1.49/hr — and for most workloads, it's more than enough.
Here's the short answer if you're in a hurry...
The NVIDIA A100 80GB GPU costs $8,000-$15,000 to buy (new) or $4,000-$9,000 used. Cloud rental ranges from $1.49 to $3.43 per GPU hour (March 2026). Jarvislabs offers on-demand A100 80GB access at $1.49/hr with per-minute billing — no commitments, no minimum rental period.
At $1.49/hr, the A100 is roughly half the cost of an H100 ($2.99/hr) — and for inference, fine-tuning, and training medium-sized models, the performance difference rarely justifies the price gap.
How Much Does an NVIDIA A100 GPU Cost?
The NVIDIA A100 80GB GPU costs between $8,000 and $15,000 to buy new, or $4,000–$9,000 on the used market. For cloud GPU rental, A100 pricing ranges from $1.49 to $3.43 per GPU hour depending on provider and configuration. Jarvislabs offers the A100 80GB at $1.49/hr with per-minute billing and no commitments — making it the cheapest way to access A100 compute on demand.
NVIDIA A100 Price Snapshot (March 2026)
Cloud GPU Pricing Table
| Provider | GPU Config | On-Demand Price | Billing | Notes |
|---|---|---|---|---|
| Jarvislabs | A100 80GB SXM | $1.49/hr | Per-minute | Single GPU available. No commitments. |
| Lambda Labs | A100 80GB SXM | $2.06/hr | Per-hour | Academic-friendly. Limited availability. |
| RunPod | A100 80GB SXM | $1.49/hr | Per-second | Secure Cloud. Community Cloud from $1.39/hr. |
| AWS | p4de.24xlarge (8×A100 80GB) | $27.45/hr ($3.43/GPU) | Per-second | 8-GPU minimum. On-demand. |
| Azure | Standard_ND96asr_v4 | $27.20/hr ($3.40/GPU) | Per-hour | 8×A100 40GB. ND96amsr for 80GB. |
Note: Hyperscalers (AWS, Azure) typically require multi-GPU instances, which means you're paying for 8 GPUs even if you only need one. Jarvislabs, Lambda, and RunPod let you rent individual A100 GPUs — a massive cost advantage if you don't need 8 GPUs.
Key Takeaways
- A100 is now 40-60% cheaper than H100 across all providers, making it the clear value leader.
- RunPod matches at $1.49/hr (Secure Cloud), with Community Cloud from $1.39/hr.
- Lambda Labs offers A100 80GB at $2.06/hr with academic-friendly terms.
- Hyperscalers remain 2-3x more expensive per GPU due to bundled instance pricing.
- Jarvislabs offers per-minute billing so you never pay for idle time — a big deal when you're experimenting.
Hardware Purchase Pricing
| Configuration | New Price | Used / Refurbished | Notes |
|---|---|---|---|
| A100 40GB PCIe | $5,000-$8,000 | $2,000-$4,000 | Older config, good for inference |
| A100 80GB PCIe | $8,000-$12,000 | $4,000-$7,000 | Most common on secondary market |
| A100 80GB SXM | $10,000-$15,000 | $5,000-$9,000 | Best performance, requires SXM baseboard |
| DGX A100 (8×GPU) | $150,000-$200,000 | $80,000-$120,000 | Full system with NVLink and networking |
A100 purchase prices have dropped dramatically as enterprises upgrade to H100 and H200 systems. The secondary market is flooded with used units — this is genuinely the best time to buy if you're building on-premises infrastructure and want a reliable, proven GPU.
NVIDIA A100 Specs: Full GPU Specifications & VRAM Details
The A100 was NVIDIA's flagship data center GPU from the Ampere generation. Here are the full specs:
| Specification | A100 80GB SXM | A100 80GB PCIe | A100 40GB |
|---|---|---|---|
| Architecture | Ampere | Ampere | Ampere |
| GPU Memory | 80GB HBM2e | 80GB HBM2e | 40GB HBM2e |
| Memory Bandwidth | 2.0 TB/s | 2.0 TB/s | 1.6 TB/s |
| FP32 Performance | 19.5 TFLOPS | 19.5 TFLOPS | 19.5 TFLOPS |
| FP16 Tensor | 312 TFLOPS | 312 TFLOPS | 312 TFLOPS |
| TF32 Tensor | 156 TFLOPS | 156 TFLOPS | 156 TFLOPS |
| INT8 Tensor | 624 TOPS | 624 TOPS | 624 TOPS |
| TDP | 400W | 300W | 250W (PCIe) / 400W (SXM) |
| NVLink | 3rd Gen (600 GB/s) | Via NVLink Bridge (2-GPU, 600 GB/s) | 3rd Gen (600 GB/s) |
| PCIe | Gen4 ×16 | Gen4 ×16 | Gen4 ×16 |
| MIG Support | Up to 7 instances | Up to 7 instances | Up to 7 instances |
Why 80GB Matters
The jump from 40GB to 80GB isn't just "more VRAM." It fundamentally changes what you can do with a single GPU:
- LLaMA 3 8B fits comfortably in FP16 on a 40GB A100, but LLaMA 3 70B needs 80GB (quantized) or multiple GPUs
- Fine-tuning: 80GB lets you fine-tune 70B models on a single GPU with QLoRA — on 40GB, standard FP16 LoRA tops out around 13B models
- vLLM inference: More VRAM means larger KV-cache, which directly translates to higher throughput and longer context windows. Check our vLLM optimization guide for practical tips.
- Batch processing: Larger batches = better GPU utilization = lower cost per token
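As a rule of thumb, the weights alone set the floor: parameter count times bytes per parameter, with 1B parameters costing roughly 1GB per byte of precision. A minimal sketch (KV-cache, activations, and CUDA context come on top, so treat these as lower bounds):

```python
def weights_gb(params_billion, bytes_per_param):
    """Memory needed for model weights alone, in GB.

    1B params ~= 1 GB per byte of precision (FP16 = 2 bytes,
    INT8 = 1, INT4 = 0.5). KV-cache and activations add more
    on top, so this is a floor, not the full footprint.
    """
    return params_billion * bytes_per_param

print(weights_gb(8, 2))    # 16  -> LLaMA 3 8B in FP16: easy on 40GB
print(weights_gb(70, 2))   # 140 -> 70B in FP16: needs 2x 80GB GPUs
print(weights_gb(70, 1))   # 70  -> 70B in INT8: fits one 80GB, tight
```

This is why the 40GB/80GB split matters so much in practice: the 80GB card is the smallest single GPU where a quantized 70B model fits at all.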
NVIDIA A100 Performance Benchmarks (2026)
Here's what we see in practice:
LLM Inference Performance
| Model | A100 80GB (tokens/sec) | H100 80GB (tokens/sec) | A100 Cost/1M tokens |
|---|---|---|---|
| LLaMA 3 8B (FP16) | ~4,200 | ~8,500 | ~$0.10 |
| LLaMA 3 70B (INT8) | ~850 | ~2,100 | ~$0.52 |
| Mixtral 8x7B | ~1,800 | ~4,200 | ~$0.24 |
| Qwen 2.5 32B (FP16) | ~1,200 | ~3,000 | ~$0.37 |
These benchmarks use vLLM with default settings. Throughput can improve 20-40% with prefix caching and FP8 KV-cache optimization.
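The cost column above is just arithmetic on the hourly rate and sustained throughput. A quick sketch using the Jarvislabs $1.49/hr rate and the table's LLaMA 3 8B figure (your own numbers will vary with batch size and context length):

```python
def cost_per_million_tokens(hourly_rate_usd, tokens_per_sec):
    """Dollars per 1M generated tokens at a given sustained throughput."""
    tokens_per_hour = tokens_per_sec * 3600
    return hourly_rate_usd / tokens_per_hour * 1_000_000

# LLaMA 3 8B at ~4,200 tok/s on a $1.49/hr A100
print(round(cost_per_million_tokens(1.49, 4200), 2))  # ~0.10
```

The other rows in the table follow from the same formula with their respective throughputs.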
Training & Fine-Tuning Performance
| Task | A100 80GB | H100 80GB | A100 Savings |
|---|---|---|---|
| LoRA fine-tune LLaMA 3 8B | ~2 hours | ~0.8 hours | ~50% cheaper despite 2.5x slower |
| Full fine-tune LLaMA 3 8B (4×GPU) | ~18 hours | ~7 hours | ~35% cheaper |
| Train small model from scratch (1B params) | ~48 hours | ~20 hours | ~40% cheaper |
Bottom line: The A100 is slower than the H100, but it's roughly half the hourly price. Whether it's also cheaper per job depends on the H100 rate you compare against: the savings above assume typical market H100 rates, which often sit well above $2.99/hr. For workloads where time-to-completion isn't critical — fine-tuning overnight, running batch inference, experimenting with model architectures — the A100 keeps hourly spend low and delivers strong value per dollar.
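The slower-but-cheaper tradeoff reduces to a single comparison: since per-job cost is hours × rate, the faster GPU is cheaper per job exactly when its real-world speedup exceeds the price ratio. A sketch using the $1.49 and $2.99 rates quoted in this article (the speedup values are illustrative):

```python
def faster_gpu_wins(cheap_rate, fast_rate, speedup):
    """Per-job cost is hours * rate, so the faster GPU is cheaper
    per job exactly when its speedup exceeds the price ratio."""
    return speedup > fast_rate / cheap_rate

# H100 at $2.99/hr vs A100 at $1.49/hr: price ratio is ~2.0x
print(faster_gpu_wins(1.49, 2.99, 2.5))  # True  -> 2.5x speedup: H100 cheaper per job
print(faster_gpu_wins(1.49, 2.99, 1.8))  # False -> 1.8x speedup: A100 cheaper per job
```

When the speedup and price ratio are close, the A100's lower absolute hourly cost and flexibility usually tip the balance.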
A100 vs H100 Price and Performance Comparison
We have a detailed H100 vs A100 comparison, but here's the quick decision framework:
| Choose A100 if... | Choose H100 if... |
|---|---|
| Budget is the primary constraint | Training speed is critical |
| Running inference at moderate scale | Serving high-throughput production inference |
| Fine-tuning with LoRA/QLoRA | Training large models from scratch |
| Experimenting or prototyping | Need FP8 native support |
| Model fits in 80GB VRAM | Need the latest architecture optimizations |
| Cost per token matters more than latency | Time-to-first-token is your bottleneck |
For a deeper dive into the architectural differences — FP8, Transformer Engine, memory bandwidth — see our full H100 vs A100 guide.
A100 vs L4: Price and Performance Comparison
If you're choosing between an A100 and an L4, you're comparing a heavyweight against a lightweight. Both are excellent GPUs — for very different reasons:
| Spec | A100 80GB | L4 24GB |
|---|---|---|
| VRAM | 80GB HBM2e | 24GB GDDR6 |
| Memory Bandwidth | 2.0 TB/s | 300 GB/s |
| Architecture | Ampere | Ada Lovelace |
| FP16 Tensor | 312 TFLOPS | 121 TFLOPS |
| INT8 Tensor | 624 TOPS | 242 TOPS |
| TDP | 400W | 72W |
| Cloud Cost | ~$1.49/hr | ~$0.44/hr |
| Best For | Training + large inference | Cost-efficient small inference |
| Choose A100 if... | Choose L4 if... |
|---|---|
| Running models over 24GB VRAM | Running smaller models (under 24GB) |
| Need high memory bandwidth | Power efficiency is critical |
| Training or fine-tuning | Inference-only workloads |
| Serving 70B+ models | Serving 7B-13B models |
| Need maximum throughput | Need the lowest cost per token for small models |
The L4 is a power efficiency champion — 72W TDP means dramatically lower operating costs. For inference workloads that fit in 24GB (LLaMA 3 8B, Mistral 7B, Qwen 2.5 14B quantized), the L4 is hard to beat on cost. But once you need more than 24GB of VRAM or higher memory bandwidth, the A100 is really the next step up. For a detailed comparison with benchmarks, see our L4 vs A100 guide.
A100 vs RTX 4090: Data Center vs Consumer GPU Comparison
The RTX 4090 is NVIDIA's most powerful consumer GPU and a popular choice for local AI workloads. Here's how it stacks up against the A100:
| Spec | A100 80GB | RTX 4090 24GB |
|---|---|---|
| VRAM | 80GB HBM2e | 24GB GDDR6X |
| Memory Bandwidth | 2.0 TB/s | 1.0 TB/s |
| Architecture | Ampere | Ada Lovelace |
| FP16 Tensor | 312 TFLOPS | 165 TFLOPS |
| INT8 Tensor | 624 TOPS | 330 TOPS |
| TDP | 400W | 450W |
| Multi-GPU | NVLink (600 GB/s) | No NVLink |
| MIG Support | Yes (up to 7 instances) | No |
| Cloud Cost | ~$1.49/hr | Not widely available |
| Choose A100 if... | Choose RTX 4090 if... |
|---|---|
| Running models that need >24GB VRAM | Running models that fit in 24GB |
| Need multi-GPU scaling with NVLink | Working locally on a single GPU |
| Running production inference serving | Prototyping and personal projects |
| Need MIG for multi-tenant workloads | Cost-sensitive and own the hardware |
| Cloud-based workflow | Prefer local development |
The RTX 4090 offers strong single-GPU performance at a lower purchase price (~$1,600-$2,000), but it lacks the A100's VRAM capacity, multi-GPU interconnect, and data center features. For serious LLM work — especially models over 13B parameters — the A100's 80GB VRAM is the deciding factor.
Buy vs Rent A100 GPU: Cost Analysis
Monthly Cost Comparison
Scenario: 1×A100 80GB SXM
| Usage Pattern | Cloud Cost (Jarvislabs) | Ownership Cost | Winner |
|---|---|---|---|
| 24/7 (720 hrs/mo) | $1.49 × 720 = ~$1,073/mo | ~$500/mo (depreciation + power + cooling) | Ownership (after ~12 months) |
| 8 hrs/day (240 hrs/mo) | $1.49 × 240 = ~$358/mo | Same fixed: ~$500/mo | Cloud |
| Variable / on-demand | Pay only when used | Same fixed: ~$500/mo | Cloud (always) |
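The table's break-even logic is easy to reproduce. Taking the ~$500/mo ownership estimate above as given, cloud wins whenever your monthly hours fall below the break-even utilization:

```python
HOURLY_RATE = 1.49       # Jarvislabs A100 80GB on-demand
OWNERSHIP_MONTHLY = 500  # depreciation + power + cooling (estimate from the table)

def monthly_cloud_cost(hours_per_month):
    return HOURLY_RATE * hours_per_month

print(round(monthly_cloud_cost(720)))  # 1073 -> 24/7: ownership cheaper
print(round(monthly_cloud_cost(240)))  # 358  -> 8 hrs/day: cloud cheaper

# Break-even utilization: ~336 hrs/month, i.e. roughly 11 hours/day
break_even_hours = OWNERSHIP_MONTHLY / HOURLY_RATE
print(round(break_even_hours))  # 336
```

In other words: below roughly 11 GPU-hours per day of sustained use, renting is the cheaper option even before counting the hidden costs below.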
The Hidden Costs of Buying
Beyond the purchase price, ownership adds:
- Power: 400W × $0.12/kWh × 24hrs × 30days = ~$35/month per GPU
- Cooling: Industrial cooling for 400W adds $15-30/month per GPU
- Network: InfiniBand for multi-node clusters costs $2,000-5,000 upfront
- Depreciation: A100s are losing ~20-30% of value per year as newer GPUs ship
- Risk: Hardware failure means downtime and replacement costs — no SLA to fall back on
Our recommendation: Rent unless you have a guaranteed 24/7 workload with 18+ months of runway and in-house infrastructure expertise. The A100 is losing value fast, so buying only makes sense if you're sure you'll use it long enough to break even.
Frequently Asked Questions About NVIDIA A100 Price
How much does an NVIDIA A100 GPU cost?
The NVIDIA A100 80GB GPU costs $8,000-$15,000 new or $4,000-$9,000 used (March 2026). For cloud rentals, A100 prices range from $1.49 to $3.43 per GPU hour depending on the provider, with Jarvislabs offering competitive on-demand rates at $1.49/hour with per-minute billing.
Is the A100 still worth it in 2026?
For cloud rental — absolutely. The A100 offers the best price-to-performance ratio for inference, fine-tuning, and medium-scale training. For purchasing hardware, it depends: the secondary market offers great deals, but depreciation is accelerating as H100 and H200 become the new standard.
A100 40GB vs 80GB — which should I rent?
Always go for the 80GB if available. The price difference is minimal ($0.20-0.50/hr more), but the 80GB version lets you run models and batch sizes that simply won't fit in 40GB. The 40GB variant is increasingly hard to find on cloud providers anyway.
How does A100 pricing compare to H100?
The A100 is typically 40-60% cheaper than the H100 per hour. On Jarvislabs, A100 80GB costs $1.49/hr vs $2.99/hr for H100. The H100 is 2-3x faster for most workloads, so the cost per unit of work is often similar — but the A100 wins when you don't need maximum speed and want to keep hourly costs low.
Can I run LLaMA 3 70B on an A100?
Yes. LLaMA 3 70B fits on a single A100 80GB when quantized to INT8 or INT4 (using vLLM with AWQ or GPTQ quantization). For full FP16 precision, you'll need 2 A100 80GB GPUs with tensor parallelism. See our vLLM quantization guide for detailed benchmarks.
Is the A100 good for vLLM inference?
Excellent choice. The A100's 80GB VRAM and 2.0 TB/s memory bandwidth make it a strong platform for vLLM inference. We've published extensive benchmarks in our vLLM optimization guide and vLLM quantization guide — all tested on A100 and H200 GPUs.
How much does it cost to fine-tune a model on A100?
Fine-tuning costs on A100 80GB (estimated at $1.49/hr):
- 7B model (LoRA): ~2 hours = ~$3
- 70B model (QLoRA): ~8 hours = ~$12
- 8B model (Full fine-tune, 4×A100): ~18 hours × 4 GPUs = ~$107
These are significantly cheaper than equivalent H100 runs. See our LLM fine-tuning tutorial for step-by-step instructions.
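All of these estimates are hours × GPUs × hourly rate, so plugging in your own job lengths is straightforward:

```python
def finetune_cost_usd(hours, gpus=1, rate=1.49):
    """Total job cost at Jarvislabs' $1.49/hr per-A100 rate."""
    return hours * gpus * rate

print(round(finetune_cost_usd(2)))           # 3   -> 7B LoRA
print(round(finetune_cost_usd(8)))           # 12  -> 70B QLoRA
print(round(finetune_cost_usd(18, gpus=4)))  # 107 -> full fine-tune, 4xA100
```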
Will A100 prices continue to drop?
Cloud prices have largely stabilized. We don't expect significant further drops since providers have already adjusted to the H100/H200 market reality. Purchase prices for used hardware may decline another 10-15% through 2026 as more enterprises upgrade to Blackwell GPUs.
How much VRAM does the A100 have?
The NVIDIA A100 comes in two VRAM configurations: 40GB HBM2e (older, less common) and 80GB HBM2e (standard). The 80GB variant offers 2.0 TB/s memory bandwidth on the SXM version, making it well-suited for large language model inference and training.
Is it cheaper to rent or buy an A100 for 24/7 use?
At current cloud rates, running a single A100 80GB on Jarvislabs 24/7 for a year costs approximately $1.49 × 24 × 365 = ~$13,052. A used A100 80GB SXM costs $5,000-$9,000, plus ~$500-600/year in power and cooling. The break-even point is roughly 6-9 months — but factor in depreciation risk, maintenance, and the flexibility of cloud before committing to a purchase.
How much does a DGX A100 cost?
The NVIDIA DGX A100 — a complete 8×A100 server with NVLink, networking, and NVMe storage — costs $150,000-$200,000 new or $80,000-$120,000 used. For most teams, renting 8 individual A100 GPUs in the cloud is far more practical: 8 × $1.49/hr = $11.92/hr on Jarvislabs, with no upfront capital.
What is the A100 memory bandwidth?
The A100 80GB SXM delivers 2.0 TB/s memory bandwidth via HBM2e — roughly 6.7x the L4's 300 GB/s and critical for memory-bandwidth-bound workloads like LLM token generation. The PCIe version also delivers 2.0 TB/s. The older 40GB variant has 1.6 TB/s.
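Why bandwidth matters so much: during token generation, each new token streams the full set of weights from memory, so a rough single-sequence ceiling is bandwidth divided by model size. This is a roofline sketch that ignores KV-cache traffic and kernel overhead; batching raises aggregate throughput far beyond it:

```python
def decode_ceiling_tok_per_s(bandwidth_gb_s, model_size_gb):
    """Upper bound on single-sequence decode speed: each generated
    token reads all model weights from HBM once (roofline estimate)."""
    return bandwidth_gb_s / model_size_gb

# 70B model in INT8 (~70 GB of weights)
print(round(decode_ceiling_tok_per_s(2000, 70)))  # 29 -> A100 at 2.0 TB/s
print(round(decode_ceiling_tok_per_s(300, 70)))   # 4  -> L4 at 300 GB/s
```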
A100 SXM vs PCIe — what's the difference?
The SXM version supports NVLink (600 GB/s GPU-to-GPU bandwidth) and has a higher TDP (400W vs 300W for PCIe). In practice, the SXM version is faster for multi-GPU training due to NVLink, while the PCIe version fits in standard server chassis. Both have the same VRAM (80GB) and memory bandwidth (2.0 TB/s). Cloud providers mostly offer SXM.
Can the A100 run Stable Diffusion?
Yes, comfortably. The A100's 80GB VRAM can load SDXL, ControlNet, and an upscaler simultaneously — something smaller GPUs struggle with. It generates SDXL images at ~12-15 images/min at 1024×1024. For pure image generation on a budget, the L4 is cheaper per image, but the A100 excels when you need multiple models loaded at once.
What models fit on an A100 80GB?
In FP16: models up to ~40B parameters (LLaMA 3 8B, Mistral 7B, Qwen 2.5 32B). In INT8: models up to ~70B parameters (LLaMA 3 70B with vLLM). In INT4/GPTQ: models up to ~130B parameters (LLaMA 3.1 70B with extra KV-cache room). The 80GB VRAM is what makes the A100 versatile — most production LLMs fit on a single card.
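Those ceilings follow from dividing usable VRAM by bytes per parameter. The headroom factor below is an assumption, reserving some VRAM for KV-cache and activations; it lands in the same ballpark as the figures above:

```python
def max_params_billion(vram_gb, bytes_per_param, headroom=0.85):
    """Largest weight count that fits, reserving ~15% of VRAM
    (assumed) for KV-cache and activations."""
    return vram_gb * headroom / bytes_per_param

print(round(max_params_billion(80, 2)))    # 34  -> FP16
print(round(max_params_billion(80, 1)))    # 68  -> INT8
print(round(max_params_billion(80, 0.5)))  # 136 -> INT4
```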
Where can I rent an A100 GPU?
You can rent individual A100 80GB GPUs from Jarvislabs ($1.49/hr, per-minute billing), RunPod ($1.49/hr), and Lambda Labs ($2.06/hr). AWS and Azure offer A100s only in 8-GPU instances ($3.40-$3.43/GPU/hr). For single-GPU workloads, Jarvislabs and RunPod offer the best value.
NVIDIA A100 GPU: Is It Worth It in 2026?
In 2026, the A100 80GB is the best price-to-performance GPU for teams that don't need the latest Hopper or Blackwell silicon. Whether you're fine-tuning LLMs, running inference with vLLM, or training medium-sized models, the A100 gets the job done at 40-60% less per hour than the H100.
- For inference: Start with an A100 80GB. If throughput isn't enough, scale up to H100.
- For fine-tuning: A100 80GB handles LoRA/QLoRA on models up to 70B parameters.
- For large model serving: Use 2×A100 80GB with tensor parallelism for 70B+ models in full precision.
- For training from scratch: Consider H100 or H200 — the speed difference justifies the cost for long training runs.
Ready to try it? Launch an A100 on Jarvislabs — it takes 90 seconds, bills per minute, and requires no commitment. Or reach out at support@jarvislabs.ai for custom quotes on multi-GPU setups.
Once you're set up, check out our guides on optimizing inference with vLLM or running quantized models to get the most out of your A100.
Last updated: March 2026. Prices verified against provider websites.
Related Guides:
- NVIDIA L4 GPU: Price & Specs Guide — the budget alternative for inference under 24GB
- NVIDIA L4 vs A100 Comparison — detailed specs, benchmarks, and when to choose each
- NVIDIA H100 vs A100 Comparison — for when you're choosing between A100 and H100
- NVIDIA H200 Price Guide — the next-gen alternative with 141GB HBM3e
- vLLM Optimization Techniques — get 20-40% more throughput from your A100
- vLLM Quantization Guide — run larger models on A100 with quantization
Need custom A100 pricing? For multiple GPUs or monthly commitments, we offer volume discounts that can reduce your hourly rate significantly. Contact us at support@jarvislabs.ai for a custom quote.