How We Made GPU Instance Launch 4x Faster

Vishnu Subramanian · Founder @JarvisLabs.ai · 14 min read

When we launched our new GPU region in Noida, India, spinning up a GPU instance took about 8 seconds. For a researcher clicking "Launch" and waiting for a notebook, that's annoying. But that's not the real reason we decided to fix it.

The way people use GPU instances is changing. Andrej Karpathy recently open-sourced autoresearch — a project where an AI agent autonomously runs dozens of training experiments, each one tweaking hyperparameters, architecture choices, and optimizer settings. The human writes the prompt. The agent iterates on the code. Every experiment is a separate GPU run.

This is the future we're building for. When an agent needs to spin up hundreds of experiments — launching instances, running training, tearing them down, and launching the next batch — 8 seconds per instance isn't just slow. It's a bottleneck that defeats the purpose of automation. A hundred experiments means over 13 minutes spent just waiting for instances to boot.

In about three days of focused work, we tore apart every layer of the instance creation pipeline (networking, storage, container runtime, database, even logging) and brought that number down to 1.8 seconds, a more than 4x improvement. At that speed, the same hundred experiments lose about 3 minutes to startup instead of 13.
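The back-of-the-envelope math is simple enough to sketch. This snippet just reproduces the post's numbers (100 experiments, 8 s vs. 1.8 s launch times); it is an illustration, not part of our tooling:

```python
# Cumulative time a batch of agent-driven experiments spends waiting
# for instances to boot. Numbers are the ones quoted in this post.
def startup_overhead_minutes(n_experiments: int, launch_seconds: float) -> float:
    """Total instance-boot wait across the whole batch, in minutes."""
    return n_experiments * launch_seconds / 60

before = startup_overhead_minutes(100, 8.0)   # old launch time
after = startup_overhead_minutes(100, 1.8)    # optimized launch time

print(f"before: {before:.1f} min, after: {after:.1f} min")
# before: 13.3 min, after: 3.0 min
```

This assumes launches are sequential; an agent that launches instances concurrently would amortize some of this, but the per-instance latency still caps how fast each experiment loop can turn around.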

This post is the story of how we did it: what we measured, what surprised us, and the specific optimizations that got us from 8 seconds to under 2.

Region availability

These optimizations are live in our new Noida region today. Our older regions still run the previous architecture, but we'll be deprecating those over time. Every future region we launch will ship with these improvements from day one.

And we're not done — our goal is to push launch times under a second in the coming weeks.