dstack
dstack is an open-source, AI-native orchestrator. You describe what you need in a YAML file — a dev environment, a training task, or a model endpoint — run dstack apply, and dstack provisions a Jarvislabs.ai GPU instance, runs the workload in a container, and deprovisions the instance when it's no longer needed.
With the native jarvislabs backend, dstack manages the full lifecycle: provisioning, queueing, logs, SSH access, port forwarding, and automatic termination of idle instances.
Setup
1. Install the server and CLI
Install dstack and start the server (it can run on your laptop):
pip install "dstack[all]" -U
dstack server
Applying ~/.dstack/server/config.yml...
The admin token is "bbae0f28-d3dd-4820-bf61-8f4bb40815da"
The server is running at http://127.0.0.1:3000/
Point the CLI to the server using the admin token from the output:
dstack project add \
  --name main \
  --url http://127.0.0.1:3000 \
  --token bbae0f28-d3dd-4820-bf61-8f4bb40815da
If you work with AI coding agents (Claude Code, Cursor, Codex), install the dstack skill so your agent can write configs and manage dstack runs for you. To have your agent work with JarvisLabs directly — launching instances, running scripts, and monitoring experiments — use the jl CLI.
2. Configure the jarvislabs backend
Create an API key from jarvislabs.ai/settings/api-keys, add the backend to ~/.dstack/server/config.yml, and restart the server:
projects:
- name: main
  backends:
  - type: jarvislabs
    creds:
      type: api_key
      api_key: your-api-key
To list the GPUs available to you, with prices:
dstack offer -b jarvislabs
dstack requests hardware via a resources spec in the form <model>:<memory>:<count> (memory and count are optional):
| GPU | Example resources.gpu values |
|---|---|
| NVIDIA L4 24GB | L4, L4:24GB |
| NVIDIA H100 | H100, H100:2 |
| NVIDIA H200 | H200 |
| NVIDIA RTX PRO 6000 | RTXPRO6000, RTXPRO6000:8 |
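To make the `<model>:<memory>:<count>` form concrete, here is a minimal Python sketch that splits such a string into its parts. `GpuSpec` and `parse_gpu_spec` are hypothetical names used for illustration only; this is not dstack's own parser, and it handles only the shapes shown in the table (ranges such as `0..8` are not covered). The default count of 1 is an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class GpuSpec:
    model: str
    memory_gb: Optional[int] = None  # None means no memory constraint was given
    count: int = 1                   # assumed default when no count is given


def parse_gpu_spec(spec: str) -> GpuSpec:
    """Parse '<model>[:<memory>][:<count>]' (illustrative helper, not part of dstack)."""
    model, *options = spec.strip().split(":")
    if not model:
        raise ValueError(f"missing GPU model in {spec!r}")

    memory_gb: Optional[int] = None
    count = 1
    for option in options:
        if option.upper().endswith("GB") and option[:-2].isdigit():
            memory_gb = int(option[:-2])  # "24GB" -> 24
        elif option.isdigit():
            count = int(option)           # "2" -> 2
        else:
            raise ValueError(f"unrecognized option {option!r} in {spec!r}")
    return GpuSpec(model=model, memory_gb=memory_gb, count=count)
```

With this sketch, `parse_gpu_spec("H100:2")` yields a spec for two H100 GPUs with no memory constraint, and `parse_gpu_spec("L4:24GB")` yields one L4 with 24 GB.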
3. Create a fleet
Before submitting runs, you must create a fleet — a pool of instances that runs are scheduled onto. Create fleet.dstack.yml:
type: fleet
name: jarvislabs-fleet
# Allow provisioning up to 2 instances on demand
nodes: 0..2
# Deprovision instances if they stay idle longer than this
idle_duration: 1h
backends: [jarvislabs]
resources:
  # Allow provisioning instances with up to 8 GPUs
  gpu: 0..8
dstack apply -f fleet.dstack.yml
Because the nodes range starts at 0, applying the fleet only creates a template: instances are provisioned when you submit runs and deprovisioned after idle_duration. The fleet's resources must cover whatever your runs request, so the broad gpu: 0..8 spec above works for all the examples below.
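The "must cover" rule can be pictured as a range check on the GPU count. The sketch below is only an illustration under that assumption: `parse_range` and `fleet_covers` are hypothetical helpers, they handle only the `min..max` and single-integer forms used on this page, and dstack's actual matching also considers other resources such as the GPU model and memory.

```python
from typing import Tuple


def parse_range(text: str) -> Tuple[int, int]:
    """Parse 'min..max' or a single integer into an inclusive (min, max) pair."""
    if ".." in text:
        low, high = text.split("..", 1)
        return int(low), int(high)
    value = int(text)
    return value, value


def fleet_covers(fleet_gpu: str, requested_gpus: int) -> bool:
    """Return True if the requested GPU count lies inside the fleet's GPU range."""
    low, high = parse_range(fleet_gpu)
    return low <= requested_gpus <= high
```

Under this simplification, a run asking for 2 GPUs fits a fleet declared with `gpu: 0..8`, while a run asking for 16 does not.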
Dev environments
A dev environment gives you a GPU machine with SSH and desktop IDE access — ideal for interactive work. Create dev.dstack.yml:
type: dev-environment
name: my-dev
ide: vscode
resources:
  gpu: H100
dstack apply -f dev.dstack.yml
Once provisioned, the CLI prints an IDE URL you can click to open the machine directly in VS Code Desktop (Cursor is supported too):
To open in VS Code Desktop, use this link:
vscode://vscode-remote/ssh-remote+my-dev/workflow
Alternatively, connect with ssh my-dev — dstack configures the SSH alias automatically.
Add inactivity_duration: 2h to automatically stop the environment after two hours of inactivity, so an idle instance doesn't keep billing.
Tasks
A task runs commands to completion, which makes it ideal for training and fine-tuning. To make your code available inside the container, upload it via files. Create train.dstack.yml:
type: task
name: train
python: "3.12"
# Upload local files to the container
files:
  - train.py
  - requirements.txt
commands:
  - uv pip install -r requirements.txt
  - python train.py
resources:
  gpu: H100:2
  shm_size: 24GB
dstack apply -f train.dstack.yml
Logs stream to your terminal, and the instance is released when the task finishes.
A few options worth knowing:
- Repos: to work with an entire project instead of individual files, use repos. dstack clones your Git repo on the instance and applies local changes.
- Ports: if the task runs a web app (Streamlit, TensorBoard, etc.), list its ports and dstack apply forwards them to localhost.
- Docker image: if image is not specified, dstack uses its base image with uv, Python, and CUDA drivers pre-installed. Set image to use any custom Docker image (with registry_auth for private registries).
- Guardrails: max_duration caps runaway jobs; retry resubmits on errors; utilization_policy terminates runs with underutilized GPUs.
See the task reference for all options.
Services
A service deploys a model or web app as a persistent endpoint. For example, to serve Qwen3-8B with SGLang on an RTX PRO 6000, create service.dstack.yml:
type: service
name: qwen3-rtx-pro-6000
image: lmsysorg/sglang:latest
env:
  - HF_TOKEN
  - MODEL_ID=Qwen/Qwen3-8B
commands:
  - |
    sglang serve \
      --model-path $MODEL_ID \
      --host 0.0.0.0 \
      --port 8000 \
      --tp $DSTACK_GPUS_NUM \
      --reasoning-parser qwen3 \
      --context-length 8192
port: 8000
# Register the model on dstack's OpenAI-compatible endpoint
model: Qwen/Qwen3-8B
# Cache model weights on the instance to speed up restarts
volumes:
  - instance_path: /root/.cache
    path: /root/.cache
    optional: true
resources:
  gpu: RTXPRO6000:96GB
  disk: 200GB
dstack apply -f service.dstack.yml
Once up, the CLI prints the service endpoint:
Service is published at:
http://localhost:3000/proxy/services/main/qwen3-rtx-pro-6000/
Model Qwen/Qwen3-8B is published at:
http://localhost:3000/proxy/models/main/
The model property makes the service available via an OpenAI-compatible API, so any OpenAI client works. Test it with curl, using your dstack user token for authorization:
curl http://127.0.0.1:3000/proxy/services/main/qwen3-rtx-pro-6000/v1/chat/completions \
  -H 'Authorization: Bearer <user token>' \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "Qwen/Qwen3-8B",
    "messages": [{"role": "user", "content": "Reply with exactly: jarvislabs rtx pro 6000 ok"}],
    "chat_template_kwargs": {"enable_thinking": false},
    "max_tokens": 64
  }'
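The same request can be sent from Python. The following is a minimal sketch using only the standard library; `build_chat_request` and `chat` are hypothetical helper names, and the response parsing assumes the service returns the usual OpenAI chat-completion shape (`choices[0].message.content`).

```python
import json
import urllib.request


def build_chat_request(base_url: str, token: str, model: str, prompt: str,
                       max_tokens: int = 64) -> urllib.request.Request:
    """Build the same POST request as the curl command (illustrative helper)."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        # Mirrors the curl example: disable Qwen3 thinking output
        "chat_template_kwargs": {"enable_thinking": False},
        "max_tokens": max_tokens,
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",  # your dstack user token
            "Content-Type": "application/json",
        },
        method="POST",
    )


def chat(request: urllib.request.Request, timeout: float = 60.0) -> str:
    """Send the request and return the assistant's reply text."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    # Assumes an OpenAI-style chat completion response
    return body["choices"][0]["message"]["content"]
```

To use it, pass the service URL printed by the CLI and your user token to `build_chat_request`, then hand the result to `chat`. Keeping request construction separate from sending makes the payload easy to inspect without a running service.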
Services also support replicas, auto-scaling, and custom domains with HTTPS via gateways.
Managing runs
| Command | Description |
|---|---|
| dstack ps | List runs and their status |
| dstack logs <run> | View logs of a run |
| dstack stop <run> | Stop a run |
| dstack metrics <run> | View GPU utilization metrics |
| dstack offer -b jarvislabs | List available GPUs and prices |
| dstack fleet | List fleets and instances |
Learn more
- dstack x JarvisLabs tutorial — hands-on walkthrough with more examples: an elastic L4 fleet, a first task, and a nanochat training run on 2x H100
- dstack docs — concepts, CLI, and API reference
- Examples — training (TRL, Axolotl) and inference (SGLang, vLLM, TensorRT-LLM) recipes
- GitHub — source code and issues