Jarvislabs Blog

Deploying MiniMax M2.1 with vLLM: Complete Guide for Agentic Workloads
December 26, 2025

Learn how to deploy MiniMax M2.1 with vLLM for agentic workloads and coding assistants. Covers hardware requirements, tensor/expert parallelism, benchmarking on InstructCoder, tool calling with interleaved thinking, and integration with Claude Code, Cline, and Cursor.

Atharva Ingle
10 min read
Speculative Decoding in vLLM: Complete Guide to Faster LLM Inference
December 18, 2025

Learn how to speed up LLM inference by 1.4–1.6x using speculative decoding in vLLM. This guide covers draft models, n-gram matching, suffix decoding, MLP speculators, and EAGLE-3, with real benchmarks on Llama-3.1-8B and Llama-3.3-70B.

Jaydev Tonde
34 min read
CUDA Cores Explained
December 8, 2024

A deep dive into CUDA cores, Tensor Cores, precision modes, and other specialized GPU features that affect performance.

7 min read