Jarvislabs Blog

How We Made GPU Instance Launch 4x Faster
March 10, 2026

From 8 seconds to 1.8: how we tore apart every layer of our instance creation pipeline in three days to make GPU launches feel instant.

Vishnu Subramanian
14 min read
Deploying MiniMax M2.1 with vLLM: Complete Guide for Agentic Workloads
December 26, 2025

Learn how to deploy MiniMax M2.1 with vLLM for agentic workloads and coding assistants. Covers hardware requirements, tensor/expert parallelism, benchmarking on InstructCoder, tool calling with interleaved thinking, and integration with Claude Code, Cline, and Cursor.

Atharva Ingle
10 min read
Speculative Decoding in vLLM: Complete Guide to Faster LLM Inference
December 18, 2025

Learn how to speed up LLM inference by 1.4-1.6x using speculative decoding in vLLM. This guide covers Draft Models, N-Gram Matching, Suffix Decoding, MLP Speculators, and EAGLE-3 with real benchmarks on Llama-3.1-8B and Llama-3.3-70B.
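Of the techniques listed, n-gram matching (prompt lookup) is simple enough to sketch without vLLM at all: the draft tokens come from finding an earlier occurrence of the current suffix in the context and speculating that what followed it will repeat. A minimal, library-free illustration (the function name and parameters are hypothetical, not vLLM's API):

```python
def ngram_propose(tokens, ngram_size=3, num_draft=5):
    """Propose draft tokens via n-gram matching (prompt lookup).

    Search the context for an earlier occurrence of the last
    `ngram_size` tokens; if found, speculate that the tokens that
    followed it will repeat. Returns [] when no match exists, in
    which case the target model decodes normally.
    """
    if len(tokens) < ngram_size:
        return []
    suffix = tokens[-ngram_size:]
    # Scan right-to-left so the most recent match wins; the range
    # excludes the trailing suffix itself from matching.
    for start in range(len(tokens) - ngram_size - 1, -1, -1):
        if tokens[start:start + ngram_size] == suffix:
            follow = start + ngram_size
            return tokens[follow:follow + num_draft]
    return []
```

This is why the technique shines on repetitive workloads like code editing: the suffix frequently reappears earlier in the prompt, so the drafts are cheap and often correct, and the target model only verifies them.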

Jaydev Tonde
34 min read
CUDA Cores Explained
December 8, 2024

A deep dive into CUDA cores, Tensor Cores, precision modes, and other specialized GPU features that impact performance.

7 min read
NVIDIA H100 vs A100: Detailed GPU Comparison for 2026
December 4, 2024

H100 vs A100 GPU comparison: specs, benchmarks, pricing, and which to choose. The H100 is 2-3x faster; the A100 is 40-60% cheaper. Updated February 2026 with the latest cloud pricing.

Vishnu Subramanian
12 min read