dstack

dstack is an open-source, AI-native orchestrator. You describe what you need in a YAML file (a dev environment, a training task, or a model endpoint) and run dstack apply; dstack then provisions a JarvisLabs GPU instance, runs the workload in a container, and deprovisions the instance when it is no longer needed.

With the native jarvislabs backend, dstack manages the full lifecycle: provisioning, queueing, logs, SSH access, port forwarding, and automatic termination of idle instances.

Setup

1. Install the server and CLI

Install dstack and start the server (it can run on your laptop):

pip install "dstack[all]" -U
dstack server

Applying ~/.dstack/server/config.yml...

The admin token is "bbae0f28-d3dd-4820-bf61-8f4bb40815da"
The server is running at http://127.0.0.1:3000/

Point the CLI to the server using the admin token from the output:

dstack project add \
  --name main \
  --url http://127.0.0.1:3000 \
  --token bbae0f28-d3dd-4820-bf61-8f4bb40815da

Tip: If you work with AI coding agents (Claude Code, Cursor, Codex), install the dstack skill so your agent can write configs and manage dstack runs for you. To have your agent work with JarvisLabs directly (launching instances, running scripts, and monitoring experiments), use the jl CLI.

2. Configure the `jarvislabs` backend

Create an API key at jarvislabs.ai/settings/api-keys, add the backend to ~/.dstack/server/config.yml, and restart the server:

projects:
- name: main
  backends:
    - type: jarvislabs
      creds:
        type: api_key
        api_key: your-api-key

To list the GPUs available to you, with prices:

dstack offer -b jarvislabs

GPUs are requested through resources.gpu, using the form <model>:<memory>:<count> (memory and count are optional):

| GPU | Example `resources.gpu` values |
| --- | --- |
| NVIDIA L4 24GB | `L4`, `L4:24GB` |
| NVIDIA H100 | `H100`, `H100:2` |
| NVIDIA H200 | `H200` |
| NVIDIA RTX PRO 6000 | `RTXPRO6000`, `RTXPRO6000:8` |
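
To make the <model>:<memory>:<count> form concrete, here is a minimal Python sketch that splits such a value into its parts. It is an illustration only, not dstack's own parser: `GpuSpec` and `parse_gpu_spec` are hypothetical names, the default count of 1 is an assumption of this sketch, and only the simple values from the table are handled (no ranges or other syntax dstack may accept).

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class GpuSpec:
    model: str
    memory: Optional[str] = None  # e.g. "24GB"; None when not specified
    count: int = 1  # assumption for this sketch: one GPU when no count is given


def parse_gpu_spec(spec: str) -> GpuSpec:
    """Split "<model>[:<memory>][:<count>]" into its parts (simplified)."""
    model, *rest = spec.split(":")
    if not model:
        raise ValueError(f"missing GPU model in {spec!r}")
    parsed = GpuSpec(model=model)
    for token in rest:
        if re.fullmatch(r"\d+(MB|GB|TB)", token):
            # A number followed by a unit is treated as the memory size.
            parsed.memory = token
        elif re.fullmatch(r"\d+", token):
            # A bare number is treated as the GPU count.
            parsed.count = int(token)
        else:
            raise ValueError(f"unrecognized token {token!r} in {spec!r}")
    return parsed


print(parse_gpu_spec("L4:24GB"))
print(parse_gpu_spec("RTXPRO6000:8"))
```

The sketch tells memory from count by the unit suffix, which is why `H100:2` reads as two GPUs while `L4:24GB` reads as one GPU with 24GB of memory.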

3. Create a fleet

Before submitting runs, you must create a fleet — a pool of instances that runs are scheduled onto. Create fleet.dstack.yml:

type: fleet
name: jarvislabs-fleet

# Allow provisioning up to 2 instances on demand
nodes: 0..2

# Deprovision instances if they stay idle longer than this
idle_duration: 1h

backends: [jarvislabs]

resources:
  # Allow provisioning instances with up to 8 GPUs
  gpu: 0..8

dstack apply -f fleet.dstack.yml

Since the nodes range starts at 0, this only creates a template: instances are provisioned when you submit runs and deprovisioned once they have been idle for idle_duration. The fleet's resources must cover whatever your runs request, so the broad gpu: 0..8 spec above works for all the examples below.
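
One way to picture the coverage rule is as a range check: the GPU count a run asks for has to fall inside the fleet's gpu range. The Python sketch below is a deliberate simplification under that assumption; `parse_range` and `fleet_covers` are hypothetical helpers, only closed ranges such as 0..8 are handled, and dstack's actual matching also takes other properties (GPU model, memory, backend) into account.

```python
from typing import Tuple


def parse_range(value: str) -> Tuple[int, int]:
    """Parse "min..max" or a single number into an inclusive (min, max) pair."""
    if ".." in value:
        low, high = value.split("..", 1)
        return int(low), int(high)
    number = int(value)
    return number, number


def fleet_covers(fleet_gpu: str, run_gpu_count: int) -> bool:
    """Return True if the run's GPU count lies within the fleet's gpu range."""
    low, high = parse_range(fleet_gpu)
    return low <= run_gpu_count <= high


# GPU counts requested by the examples in this guide, checked against gpu: 0..8
for name, count in [("dev (H100)", 1), ("train (H100:2)", 2), ("service (RTXPRO6000:96GB)", 1)]:
    print(name, fleet_covers("0..8", count))
```

A narrower fleet spec, for example gpu: 1, would reject the H100:2 task under the same check.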

Dev environments

A dev environment gives you a GPU machine with SSH and desktop IDE access — ideal for interactive work. Create dev.dstack.yml:

type: dev-environment
name: my-dev

ide: vscode

resources:
  gpu: H100

dstack apply -f dev.dstack.yml

Once provisioned, the CLI prints an IDE URL you can click to open the machine directly in VS Code Desktop (Cursor is supported too):

To open in VS Code Desktop, use this link:
  vscode://vscode-remote/ssh-remote+my-dev/workflow

Alternatively, connect with ssh my-dev — dstack configures the SSH alias automatically.

Tip: Add inactivity_duration: 2h to automatically stop the environment after two hours of inactivity, so an idle instance doesn't keep billing.

Tasks

A task runs commands to completion, which makes it ideal for training and fine-tuning. To make your code available inside the container, upload it via files. Create train.dstack.yml:

type: task
name: train

python: "3.12"

# Upload local files to the container
files:
  - train.py
  - requirements.txt

commands:
  - uv pip install -r requirements.txt
  - python train.py

resources:
  gpu: H100:2
  shm_size: 24GB

dstack apply -f train.dstack.yml

Logs stream to your terminal, and the instance is released when the task finishes.

A few options worth knowing:

Repos — to work with an entire project instead of individual files, use repos: dstack clones your Git repo on the instance and applies local changes.
Ports — if the task runs a web app (Streamlit, TensorBoard, etc.), list its ports and dstack apply forwards them to localhost.
Docker image — if image is not specified, dstack uses its base image with uv, Python, and CUDA drivers pre-installed. Set image to use any custom Docker image (with registry_auth for private registries).
Guardrails — max_duration caps runaway jobs; retry resubmits on errors; utilization_policy terminates runs with underutilized GPUs.

See the task reference for all options.

Services

A service deploys a model or web app as a persistent endpoint. For example, to serve Qwen3-8B with SGLang on an RTX PRO 6000, create service.dstack.yml:

type: service
name: qwen3-rtx-pro-6000

image: lmsysorg/sglang:latest

env:
  - HF_TOKEN
  - MODEL_ID=Qwen/Qwen3-8B

commands:
  - |
    sglang serve \
      --model-path $MODEL_ID \
      --host 0.0.0.0 \
      --port 8000 \
      --tp $DSTACK_GPUS_NUM \
      --reasoning-parser qwen3 \
      --context-length 8192
port: 8000

# Register the model on dstack's OpenAI-compatible endpoint
model: Qwen/Qwen3-8B

# Cache model weights on the instance to speed up restarts
volumes:
  - instance_path: /root/.cache
    path: /root/.cache
    optional: true

resources:
  gpu: RTXPRO6000:96GB
  disk: 200GB

dstack apply -f service.dstack.yml

Once up, the CLI prints the service endpoint:

Service is published at:
  http://localhost:3000/proxy/services/main/qwen3-rtx-pro-6000/
Model Qwen/Qwen3-8B is published at:
  http://localhost:3000/proxy/models/main/

The model property makes the service available via an OpenAI-compatible API, so any OpenAI client works. Test it with curl, using your dstack user token for authorization:

curl http://127.0.0.1:3000/proxy/services/main/qwen3-rtx-pro-6000/v1/chat/completions \
  -H 'Authorization: Bearer <user token>' \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "Qwen/Qwen3-8B",
    "messages": [{"role": "user", "content": "Reply with exactly: jarvislabs rtx pro 6000 ok"}],
    "chat_template_kwargs": {"enable_thinking": false},
    "max_tokens": 64
  }'
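
The same request can be sent from Python. The sketch below mirrors the curl call using only the standard library; `build_chat_request` and `send` are hypothetical helper names, and the URL, service name, and token placeholder come from this example, so they would differ in your setup. The reply is read from the usual OpenAI chat completions response shape.

```python
import json
import urllib.request

# Service URL from the example above; adjust project and run name for your setup.
SERVICE_URL = "http://127.0.0.1:3000/proxy/services/main/qwen3-rtx-pro-6000"


def build_chat_request(
    token: str, prompt: str, model: str = "Qwen/Qwen3-8B"
) -> urllib.request.Request:
    """Build the same POST request as the curl example."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "chat_template_kwargs": {"enable_thinking": False},
        "max_tokens": 64,
    }
    return urllib.request.Request(
        f"{SERVICE_URL}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send(request: urllib.request.Request) -> str:
    """Send the request and return the assistant's reply text."""
    with urllib.request.urlopen(request, timeout=60) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]


# Usage (needs a running service and a real dstack user token):
# print(send(build_chat_request("<user token>", "Reply with exactly: ok")))
```

Keeping request construction separate from sending makes it easy to inspect the payload before anything goes over the network.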

Services also support replicas, auto-scaling, and custom domains with HTTPS via gateways.

Managing runs

| Command | Description |
| --- | --- |
| `dstack ps` | List runs and their status |
| `dstack logs <run>` | View logs of a run |
| `dstack stop <run>` | Stop a run |
| `dstack metrics <run>` | View GPU utilization metrics |
| `dstack offer -b jarvislabs` | List available GPUs and prices |
| `dstack fleet` | List fleets and instances |

Learn more

dstack x JarvisLabs tutorial — hands-on walkthrough with more examples: an elastic L4 fleet, a first task, and a nanochat training run on 2x H100
dstack docs — concepts, CLI, and API reference
Examples — training (TRL, Axolotl) and inference (SGLang, vLLM, TensorRT-LLM) recipes
GitHub — source code and issues
