JarvisLabs CLI
The jl CLI is part of the new jarvislabs package, replacing the deprecated jlclient. If you're still using jlclient, see the migration note.
The jl command-line tool lets you manage GPU instances, run training scripts, transfer files, and monitor experiments on JarvisLabs.ai — all from your terminal. It's built to work seamlessly with AI coding agents like Claude Code, Codex, Cursor, and OpenCode, so your agent can spin up GPUs, run experiments, and monitor results autonomously.
Package: jarvislabs | CLI command: jl | Version: 0.2.x | Python: 3.10+
Linux and macOS are fully supported. Windows is experimental and not fully tested — if you run into issues, please report them.
Jump to the Examples section for end-to-end workflows covering training runs, agent automation, filesystem management, and more.
Installation
Install with uv (recommended) or pip.
As a CLI tool (recommended)
uv tool install jarvislabs
To upgrade:
uv tool upgrade jarvislabs
With pip
pip install jarvislabs
After installation, the jl command is available in your terminal.
What does jl setup do? Run it once after installing. It walks you through:
- Authentication — prompts for your API token (get one from jarvislabs.ai/settings/api-keys) and saves it locally
- Account status — shows your current balance and active instances
- Agent skill installation — asks which AI coding agents you use (Claude Code, Codex, Cursor, OpenCode) and installs skill files for them with your approval, so your agent knows how to use jl out of the box
Exploring the CLI with --help: Every command supports --help. It's the quickest way to see what's available, what flags a command takes, and what they do.
jl --help # top-level commands
jl run --help # run options, targets, lifecycle flags
jl create --help # every flag for creating an instance
Quick Start
There are two main ways to use the CLI, depending on how much control you need.
Path 1: Run a script directly on a fresh GPU
The fastest way to get started. This creates a GPU instance, uploads your code, installs dependencies, runs the script, and pauses the instance when done — all in one command.
# One-time setup
jl setup
# Check your balance and make sure you're good to go
jl status
# See which GPUs are currently available and their pricing
jl gpus
# Run a single training script on a fresh L4
# Creates instance, uploads train.py, installs requirements, runs it, pauses when done
jl run train.py --gpu L4 --requirements requirements.txt -- --epochs 50
# Or if you have a project directory, sync the whole thing
# This uploads your directory, creates a venv, installs deps, and runs the entrypoint
jl run . --script train.py --gpu A100 --requirements requirements.txt
# You can also run a setup command before your main command
jl run . --script train.py --gpu A100 \
--requirements requirements.txt \
--setup "pip install flash-attn"
# The CLI streams logs by default. Once the run finishes, the instance is auto-paused.
# If you detached (Ctrl+C) or used --no-follow, you can check logs anytime:
jl run logs <run_id> --tail 50
# Check the final status of your run
jl run status <run_id>
Path 2: Manage instances yourself
If you want more control — SSH access, reusing machines across runs, attaching filesystems, or interactive debugging — create and manage instances directly.
# One-time setup
jl setup
# See available GPUs and pricing
jl gpus
# Create a GPU instance with 100 GB storage
jl create --gpu A100 --storage 100 --name "my-experiment"
# List your instances to get the machine ID
jl list
# SSH into your instance for interactive work
jl ssh <machine_id>
# Or upload and run a script on it
jl run train.py --on <machine_id>
# Check logs while the run is going
jl run logs <run_id> --tail 50
# Upload additional files to the instance
jl upload <machine_id> ./data /home/data
# Download results when you're done
jl download <machine_id> /home/results ./results -r
# Pause when you're done - stops compute billing, keeps your data
jl pause <machine_id>
# Later, resume with the same or a different GPU
jl resume <machine_id> --gpu L4
# When you're completely done, destroy to stop all billing (including storage)
jl destroy <machine_id>
Authentication
Get your API token from jarvislabs.ai/settings/api-keys.
Interactive setup
jl setup
This authenticates, optionally installs agent skills, shows your account status, and displays a getting-started guide.
Non-interactive setup
jl setup --token YOUR_TOKEN --yes
Without --yes, jl setup will still prompt for agent-skill installation even when --token is provided. Use --agents all or --yes to make setup fully non-interactive.
Environment variable
export JL_API_KEY="YOUR_TOKEN"
Token precedence
Both the CLI and SDK use the same resolution chain:
| Priority | Method | Used by |
|---|---|---|
| 1 | Client(api_key="...") argument | SDK only |
| 2 | JL_API_KEY environment variable | CLI + SDK |
| 3 | Config file (saved by jl setup) | CLI + SDK |
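The chain can be sketched as a small shell function. This is an illustration only, not the real implementation — the CLI parses the TOML config properly, and the Linux config path is assumed here (see Config file location for other platforms):

```shell
# Illustrative sketch of the token resolution chain (not the real implementation).
# Assumes the Linux config path; see "Config file location" for other platforms.
resolve_token() {
  if [ -n "$JL_API_KEY" ]; then
    echo "$JL_API_KEY"   # environment variable wins for the CLI
  else
    # fall back to the config file saved by `jl setup`
    # (naive read for illustration; assumes a line like: token = "...")
    sed -n 's/^token *= *"\(.*\)"/\1/p' "$HOME/.config/jl/config.toml" 2>/dev/null
  fi
}
```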
See Config file location below for config paths. See the SDK Authentication docs for more details.
Config file location
The config file is stored via platformdirs:
- Linux: ~/.config/jl/config.toml
- macOS: ~/Library/Application Support/jl/config.toml
Removing saved credentials
jl logout
Global Flags
These flags are available on most commands (exceptions noted below):
| Flag | Description |
|---|---|
--json | Output as machine-readable JSON (to stdout). Human-readable output goes to stderr. |
--yes / -y | Skip all confirmation prompts. |
--version | Print version and exit (root-level: jl --version). |
--json and --yes are command-level options, not root-level — so jl list --json works correctly. Most commands support --json. --yes is only available on commands that have confirmation prompts (create, pause, resume, destroy, rename, run start, etc.). jl setup supports --yes but not --json. Read-only commands like jl gpus and jl run logs do not accept --yes.
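The stdout/stderr split is what makes --json safe to pipe. A minimal mock illustrating the contract — mock_jl is a stand-in for any jl command, not part of the CLI:

```shell
# Mock of the --json contract: human-readable chatter goes to stderr,
# the machine-readable payload goes to stdout.
# `mock_jl` is a stand-in for illustration, not a real command.
mock_jl() {
  echo "Fetching instances..." >&2   # progress message (stderr)
  echo '{"instances": []}'           # JSON payload (stdout)
}

data=$(mock_jl 2>/dev/null)          # command substitution captures stdout only
echo "$data"
```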
Account Commands
jl setup
Set up the JarvisLabs CLI: authenticate and install agent skills.
| Option | Short | Description |
|---|---|---|
--token | -t | API token (skips interactive prompt) |
--agents | | Comma-separated agent list: claude-code, codex, cursor, opencode, or all |
--yes | -y | Skip confirmation prompts; auto-selects all agents |
# Interactive setup
jl setup
# Non-interactive with token and all agent skills
jl setup --token YOUR_TOKEN --agents all --yes
# Install skills for specific agents only
jl setup --agents claude-code,cursor
If already authenticated, jl setup will show your current login and ask to re-authenticate. The --agents flag controls which coding agent skill files are installed:
| Agent | Skill file path |
|---|---|
claude-code | ~/.claude/skills/jarvislabs/SKILL.md |
codex | ~/.agents/skills/jarvislabs/SKILL.md |
cursor | ~/.cursor/skills/jarvislabs/SKILL.md |
opencode | ~/.config/opencode/skills/jarvislabs/SKILL.md |
jl logout
Remove the saved API token from the config file. Supports --json for scripted usage.
jl logout
jl status
Show account info: name, user ID, balance, grants, and running/paused instance counts.
jl status
jl status --json
JSON output includes additional fields not shown in the human-readable table: running VMs, paused VMs, active deployments, filesystems, and billing currency.
jl gpus
Show GPU types with availability, region, VRAM, RAM, CPUs, and hourly pricing. Available GPUs are marked with a green dot, unavailable with a dim circle.
jl gpus
jl gpus --json
jl templates
List available framework templates (e.g. pytorch, tensorflow, jax) that can be used with --template when creating container instances. Templates are not used when creating VMs with --vm.
jl templates
jl templates --json
Regions & GPUs
JarvisLabs has three regions, each with different GPU types available. When creating an instance, the CLI auto-selects the best region based on your chosen GPU — or you can pin a specific region with --region.
| Region | Available GPUs |
|---|---|
IN1 | RTX5000, A5000Pro, A6000, RTX6000Ada, A100 |
IN2 | L4, A100, A100-80GB |
EU1 | H100, H200 |
Run jl gpus to see real-time availability and pricing for each GPU type.
- EU1 region: supports 1 or 8 GPUs per instance only; 100 GB minimum storage (auto-bumped if you specify less)
- VM instances (--vm): 100 GB minimum storage (auto-bumped if you specify less)
- VM instances are only available in the IN2 and EU1 regions, and require at least one SSH key registered
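The region table above can be sketched as a lookup function. This is a static snapshot for illustration — availability changes, so treat jl gpus as the source of truth:

```shell
# Region lookup matching the table above — a static snapshot for illustration;
# run `jl gpus` for live availability and pricing.
regions_for_gpu() {
  case "$1" in
    RTX5000|A5000Pro|A6000|RTX6000Ada) echo "IN1" ;;
    A100)                              echo "IN1 IN2" ;;
    L4|A100-80GB)                      echo "IN2" ;;
    H100|H200)                         echo "EU1" ;;
    *)                                 echo "unknown" ;;
  esac
}
```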
Instance Commands
Manage the full lifecycle of GPU instances — from creation to teardown. Instances come in two types: containers (pre-configured with PyTorch, Jupyter, IDE — the default) and VMs (bare-metal SSH access, created with --vm).
jl list
List all your instances with their ID, name, status, GPU type, GPU count, storage, region, cost, and template.
jl list
jl list --json
jl get <machine_id>
Show full details of a specific instance including SSH command, notebook URL, HTTP ports, and endpoint URLs.
jl get 12345
jl get 12345 --json
jl create
Create a new GPU instance. The command blocks until the instance reaches Running status, so when it returns, your instance is ready to use.
| Option | Short | Default | Description |
|---|---|---|---|
--gpu | -g | (required) | GPU type (run jl gpus to see options) |
--vm | | | Create a VM instance (SSH-only, no container) |
--template | -t | pytorch | Framework template for containers (not used with --vm) |
--storage | -s | 40 | Storage in GB |
--name | -n | "Name me" | Instance name (max 40 chars, letters/numbers/spaces/hyphens/underscores only) |
--num-gpus | | 1 | Number of GPUs |
--region | | | Region pin (e.g. IN1, IN2, EU1) |
--http-ports | | | Comma-separated HTTP ports to expose (e.g. 7860,8080) |
--script-id | | | Startup script ID to run on launch |
--script-args | | | Arguments passed to the startup script |
--fs-id | | | Filesystem ID to attach |
--yes | -y | | Skip confirmation |
--json | | | Output as JSON |
# Basic instance
jl create --gpu L4
# H100 with more storage and a name
jl create --gpu H100 --storage 200 --name "training-box"
# With a startup script and filesystem
jl create --gpu A100 --script-id 42 --fs-id 10
# Pin to a region
jl create --gpu A100 --region EU1
# Expose HTTP ports
jl create --gpu L4 --http-ports "7860,8080"
# VM instance (requires SSH key - add one first with jl ssh-key add)
jl create --gpu A100-80GB --vm --name "my-vm"
# Non-interactive
jl create --gpu L4 --yes --json
Prompts for confirmation unless --yes is passed. See Regions & GPUs for which GPUs are available in each region and storage constraints.
jl pause <machine_id>
Pause a running instance. Compute billing stops; a small storage cost continues.
| Option | Short | Description |
|---|---|---|
--yes | -y | Skip confirmation |
--json | | Output as JSON |
jl pause 12345
jl pause 12345 --yes --json
jl resume <machine_id>
Resume a paused instance. You can also use this opportunity to change the GPU type, expand storage, rename the instance, or attach a different startup script or filesystem. The command blocks until the instance is running again.
| Option | Short | Description |
|---|---|---|
--gpu | -g | Resume with a different GPU type |
--num-gpus | | Change number of GPUs |
--storage | -s | Expand storage in GB (can only increase, never shrink) |
--name | -n | Rename instance on resume |
--http-ports | | Change exposed HTTP ports (e.g. 7860,8080) |
--script-id | | Startup script ID to run on resume |
--script-args | | Arguments for the startup script |
--fs-id | | Filesystem ID to attach |
--yes | -y | Skip confirmation |
--json | | Output as JSON |
# Resume with defaults
jl resume 12345
# Resume with a bigger GPU
jl resume 12345 --gpu H100
# Resume with more storage and a new name
jl resume 12345 --storage 200 --name "upgraded"
Resume is region-locked — an instance always resumes in its original region. If you request a GPU type not available in that region, the API returns an error.
Resume may also assign a new machine ID. The CLI warns you when this happens. Always use the returned ID for subsequent operations.
jl destroy <machine_id>
Permanently delete an instance and all its data.
This action is irreversible. All data on the instance is lost. If you need to keep data across instances, use a filesystem.
| Option | Short | Description |
|---|---|---|
--yes | -y | Skip confirmation |
--json | | Output as JSON |
jl destroy 12345
jl destroy 12345 --yes --json
jl rename <machine_id>
Rename an instance.
| Option | Short | Description |
|---|---|---|
--name | -n | New instance name (required, max 40 characters) |
--yes | -y | Skip confirmation |
--json | | Output as JSON |
jl rename 12345 --name "experiment-v2"
SSH, Exec & File Transfer
These commands let you interact directly with running instances — open a shell, run commands remotely, or transfer files back and forth.
jl ssh <machine_id>
SSH into a running instance. This opens an interactive shell session.
| Option | Short | Description |
|---|---|---|
--print-command | -p | Print the raw SSH command to stdout instead of connecting |
--json | | Output the SSH command as JSON |
# Interactive session
jl ssh 12345
# Get the SSH command for use in scripts
jl ssh 12345 --print-command
The instance must be in Running status. If paused, you'll be told to resume it first.
--print-command and --json output the stored SSH command regardless of instance status — useful for scripting and automation.
jl exec <machine_id> -- <command>
Run a command on a running instance and stream the output back to your terminal. The -- separator is required so jl can distinguish your remote command from its own flags.
| Option | Short | Description |
|---|---|---|
--json | | Capture output as JSON with stdout, stderr, and exit_code fields |
# Check GPU
jl exec 12345 -- nvidia-smi
# Run Python
jl exec 12345 -- python -c "import torch; print(torch.cuda.device_count())"
# List files
jl exec 12345 -- ls -la /home
# Use shell features (pipes, redirection) - wrap in sh -lc
jl exec 12345 -- sh -lc 'grep "loss" /home/output.log | tail -5'
# Structured output for scripting
jl exec 12345 --json -- nvidia-smi
The exit code of the remote command is propagated as the exit code of jl exec.
If your remote command uses pipes, redirection, or other shell features, wrap it in sh -lc '...' as shown above. Without the wrapper, each argument is treated as a separate command argument rather than shell syntax.
jl upload <machine_id> <source> [dest]
Upload a local file or directory to a running instance. If no remote destination is given, it uploads to the instance's home directory (/home/ for containers, /home/<user>/ for VMs).
Directories are uploaded recursively automatically.
| Option | Short | Description |
|---|---|---|
--json | | Output upload result as JSON |
# Upload a file (lands at /home/data.csv)
jl upload 12345 ./data.csv
# Upload a directory (lands at /home/my-project/)
jl upload 12345 ./my-project
# Upload to a specific remote path
jl upload 12345 ./config.yaml /home/config.yaml
jl download <machine_id> <source> [dest] [-r]
Download a file or directory from a running instance. If no local destination is given, it saves to ./<filename> in the current directory.
| Option | Short | Description |
|---|---|---|
--recursive | -r | Download directories recursively |
--json | | Output download result as JSON |
# Download a file (saves to ./results.csv)
jl download 12345 /home/results.csv
# Download to a specific local path
jl download 12345 /home/results.csv ./my-results.csv
# Download a directory
jl download 12345 /home/outputs ./local-outputs -r
Managed Runs
Managed runs are the fastest way to run scripts on GPU instances. A single jl run command handles uploading your code, setting up a Python virtual environment (via uv), installing dependencies (auto-detected from your project or specified with --requirements), running your command in the background, and tracking logs.
Runs persist in the background even if you disconnect or close your terminal. Logs, status, and lifecycle are tracked locally in ~/.jl/runs/.
Run Targets
| Target | What happens |
|---|---|
train.py | Uploads the single file, runs python3 train.py |
run.sh | Uploads the single file, runs bash run.sh |
. or ./my-project | Syncs the directory via rsync (excludes .venv/, .git/, __pycache__/), runs --script inside it (requires rsync locally) |
| (no target) | Runs the command given after -- directly on the instance |
Only .py and .sh file targets are supported directly. For other file types, use a directory target or jl upload + jl exec.
Starting a Run on an Existing Instance
jl run <target> --on <machine_id> [options] [-- extra args]
# Run a Python file
jl run train.py --on 12345
# Upload a directory and run a script inside it
jl run . --script train.py --on 12345
# Pass arguments to your script
jl run train.py --on 12345 -- --epochs 50 --lr 0.001
# Run an arbitrary remote command (no upload)
jl run --on 12345 -- python -c "print('hello from GPU')"
Starting a Run on a Fresh Instance
jl run <target> --gpu <gpu_type> [options] [-- extra args]
This creates a new instance, uploads your code, runs the command, and handles instance lifecycle when done.
# Run on a fresh L4
jl run train.py --gpu L4
# With requirements
jl run . --script train.py --gpu A100 --requirements requirements.txt
# Destroy instance after run (no leftover costs)
jl run train.py --gpu L4 --destroy
# Keep instance running after run (for debugging)
jl run train.py --gpu L4 --keep
You must use either --on or --gpu, not both.
All Start Options
| Option | Short | Default | Description |
|---|---|---|---|
--on | | | Run on an existing instance (machine ID) |
--gpu | -g | | Create a fresh instance with this GPU type |
--vm | | | Create a VM instead of a container (fresh instances only) |
--script | | | Entrypoint script path inside a directory target |
--template | -t | pytorch | Framework template for containers (not used with --vm) |
--storage | -s | 40 | Storage in GB (fresh instances only) |
--name | -n | jl-run | Instance name (fresh instances only) |
--num-gpus | | 1 | Number of GPUs (fresh instances only) |
--region | | | Region pin, e.g. IN1, EU1 (fresh instances only) |
--http-ports | | | Comma-separated HTTP ports to expose (fresh instances only) |
--requirements | | | Override auto-detection: upload and install this file instead |
--setup | | | Shell command to run before the main command |
--follow / --no-follow | | --follow | Stream logs after starting the run |
--pause | | | Pause fresh instance after the run (default for fresh) |
--destroy | | | Destroy fresh instance after the run |
--keep | | | Leave fresh instance running after the run |
--yes | -y | | Skip confirmation prompts |
--json | | | Output as JSON |
Environment & Dependency Management
For file and directory targets, jl run automatically creates and manages a Python virtual environment on the remote instance using uv. The environment is designed to work seamlessly with JarvisLabs templates — you get both template packages (like PyTorch and CUDA) and your project's dependencies without extra configuration.
How it works
Every managed run creates a .venv inside the project's working directory on the remote machine. This venv:
- Inherits template packages. If you chose the pytorch template, import torch works immediately — no need to install it yourself. The same applies to any package pre-installed by the template (CUDA libraries, numpy, etc.).
- Has pip and uv available. Both pip install and uv pip install work inside the venv and install packages into the venv, not the system Python.
- Persists across runs. On the same instance with the same target, the venv is reused. Previously installed packages are still there, so re-runs are fast.
Auto-detection of dependencies
For directory targets, the CLI checks your local directory before uploading and automatically installs dependencies:
- If pyproject.toml exists with a [project] table → installs from [project].dependencies
- Otherwise, if requirements.txt exists → installs from it
- If neither exists → no packages installed; template packages are enough
This means for most projects, you don't need to pass any flags — just make sure your requirements.txt or pyproject.toml is in the directory, and jl run handles the rest. Other dependency formats (uv.lock, poetry.lock, Pipfile) are not auto-detected — use --requirements with a requirements.txt for those projects.
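The detection order above can be sketched as follows. This is an approximation for illustration — the real CLI inspects the [project] table in pyproject.toml properly; a grep stands in for that check here:

```shell
# Sketch of the auto-detection order for directory targets (approximation:
# the real CLI parses pyproject.toml; a grep for the [project] table stands in).
detect_deps() {
  dir="$1"
  if [ -f "$dir/pyproject.toml" ] && grep -q '^\[project\]' "$dir/pyproject.toml"; then
    echo "pyproject.toml"
  elif [ -f "$dir/requirements.txt" ]; then
    echo "requirements.txt"
  else
    echo "none"   # template packages only
  fi
}
```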
For single file targets (e.g., jl run train.py), there is no directory to scan, so auto-detection does not apply. Use --requirements to specify a requirements file if needed.
For command-mode runs (no target, raw command after --), there is no venv or dependency installation. The command runs directly on the instance's system Python with template packages available. --setup still works as a pre-command hook, but --requirements is not available.
The --requirements flag
Use --requirements to override auto-detection. When provided, the specified file is uploaded to the remote and installed instead of any auto-detected file. This is useful when:
- You want to use a different requirements file than the one in your project directory
- You're running a single file target and need extra packages
- Your pyproject.toml has a [project] table but you'd rather install from a separate requirements file
# Auto-detect (recommended for most projects)
jl run . --script train.py --gpu L4
# Override with a custom file
jl run . --script train.py --gpu L4 --requirements custom-reqs.txt
# Single file with requirements
jl run train.py --gpu L4 --requirements requirements.txt
The --setup flag
Use --setup to run a shell command after dependency installation but before your script. This is the escape hatch for anything that isn't a Python package — system libraries, compiled extensions, environment variables, or quick one-off installs.
# Install a system library (containers run as root; use sudo on VMs)
jl run . --script train.py --on 12345 --setup "apt-get update && apt-get install -y libsndfile1"
# Install a package that needs special flags
jl run . --script train.py --on 12345 --setup "pip install flash-attn --no-build-isolation"
# Set environment variables
jl run . --script train.py --on 12345 --setup "export CUDA_VISIBLE_DEVICES=0"
For recurring system-level setup (things you need on every instance boot), consider using startup scripts instead of --setup. Startup scripts run automatically when an instance is created or resumed, so you don't have to repeat the setup on every run.
The full setup chain
When you start a managed run with a file or directory target, the CLI executes these steps in order on the remote machine (chained with &&, so any failure stops the chain):
- uv installed if missing
- .venv created if it doesn't exist (with template package visibility and pip)
- .venv activated
- Dependencies installed — from auto-detected pyproject.toml or requirements.txt, or from --requirements if provided
- --setup command executed (if provided)
- Your script runs
The run logs show which dependency file was detected:
[jl] Installing from requirements.txt # auto-detected
[jl] Installing from pyproject.toml # auto-detected
[jl] Installing from custom-reqs.txt # --requirements override
[jl] No dependency file detected, using template packages # nothing found
Template packages (like PyTorch) are available in the venv without installing them. However, if your requirements.txt or pyproject.toml lists torch as a dependency, uv will re-download and install it into the venv — this is because uv does not check system packages during dependency resolution. This is harmless (the correct version is installed) but wastes bandwidth on the first run. To avoid this, omit template packages from your dependency files and let the template provide them.
- For most projects: Put your extra dependencies in requirements.txt or pyproject.toml. Don't include packages that the template already provides (torch, CUDA, numpy). Run jl run . --script train.py --gpu L4 and let auto-detection handle the rest.
- For quick experiments: A single Python file with no dependencies works out of the box on a pytorch template — import torch just works.
- For system-level setup: Use --setup for one-off commands, or startup scripts for recurring setup.
- For AI coding agents: Agents should use --json --yes and monitor via jl run logs. The auto-detection, echo logging, and --requirements override all work identically in agent workflows.
Lifecycle Flags (Fresh Instances Only)
When creating a fresh instance with --gpu, these flags control what happens after the run completes:
| Flag | Behavior |
|---|---|
--pause | Pause the instance after the run (default for fresh instances) |
--destroy | Destroy the instance — no leftover costs |
--keep | Leave the instance running (for debugging or follow-up work) |
Only one lifecycle flag can be used at a time. These flags cannot be used with --on (existing instances are not touched after the run).
--no-follow for fresh instances requires --keep. Since --pause and --destroy need the CLI to be connected when the run ends to perform the lifecycle action, they are incompatible with --no-follow. If you detach (Ctrl+C or --no-follow), the automatic lifecycle action will not happen — the instance stays running and billing continues. Manage it manually with jl pause or jl destroy.
Follow vs No-Follow
By default, jl run streams logs after starting (--follow). Press Ctrl+C to detach — the run keeps going in the background. Without --tail, --follow initially shows the last 20 lines before streaming new output.
# Default: stream logs, auto-pause when done
jl run train.py --gpu L4
# Detached: start and return immediately (requires --keep for fresh instances)
jl run train.py --gpu L4 --keep --no-follow
# Detached on existing instance (no lifecycle flag needed)
jl run train.py --on 12345 --no-follow
jl run logs <run_id>
View logs from a managed run.
| Option | Short | Description |
|---|---|---|
--follow | -f | Stream logs in real time (press Ctrl+C to stop) |
--tail | -n | Show only the last N lines (minimum: 1) |
--json | | Output as JSON with content and run_exit_code fields |
# Full log output
jl run logs r_abc123
# Last 50 lines
jl run logs r_abc123 --tail 50
# Stream logs live
jl run logs r_abc123 --follow
# Stream with initial context
jl run logs r_abc123 --follow --tail 100
# JSON output with exit code (for scripting/agents)
jl run logs r_abc123 --tail 50 --json
JSON output fields:
| Field | Description |
|---|---|
run_id | The run identifier |
machine_id | Instance the run is on |
remote_log | Path to the log file on the remote instance |
content | The log text (last N lines if --tail used, full log otherwise) |
run_exit_code | null = still running, 0 = succeeded, non-zero = failed |
--json is not supported with --follow. Without --tail, the entire log file is returned — this can be very large for long-running jobs.
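Interpreting run_exit_code in a script can be sketched as below. The value would come from the --json payload, e.g. via jq .run_exit_code:

```shell
# Classify a run from its run_exit_code, as extracted from the --json payload
# (e.g. with `jq .run_exit_code`): null = still running, 0 = succeeded,
# any other value = failed.
classify_run() {
  case "$1" in
    null) echo "running" ;;
    0)    echo "succeeded" ;;
    *)    echo "failed" ;;
  esac
}
```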
Non-JSON output shows raw logs with a header and footer indicating run state:
--- run r_abc123 | machine 12345 | running ---
step=100 loss=2.31
step=200 loss=2.11
--- still running | log: /home/jl-runs/r_abc123/output.log ---
jl run status <run_id>
Show the current state of a run.
| Option | Short | Description |
|---|---|---|
--json | | Output as JSON |
Possible states: running, succeeded, failed, instance-paused, instance-pausing, instance-missing, instance-creating, instance-resuming, instance-destroying, instance-failed, unknown.
jl run status r_abc123
jl run status r_abc123 --json
jl run stop <run_id>
Stop a managed run by sending TERM to its process group. The instance itself is not affected.
| Option | Short | Description |
|---|---|---|
--json | | Output as JSON |
jl run stop r_abc123
jl run stop r_abc123 --json
If the process doesn't exit after TERM, it escalates to SIGKILL. If the run has already finished, it reports the final state without error.
jl run list
List all locally tracked managed runs (most recent first).
| Option | Short | Description |
|---|---|---|
--refresh | | Check live status for each run by querying the instance (slower) |
--machine | -m | Filter by instance ID |
--limit | -l | Show only the N most recent runs |
--status | -s | Filter by state (e.g. running, succeeded, failed). Implies --refresh. |
--json | | Output as JSON |
# All runs (shows "saved" state without live check)
jl run list
# With live status refresh
jl run list --refresh
# Filter by instance
jl run list --machine 12345
# Most recent 5 runs
jl run list --limit 5
# Only running jobs
jl run list --status running
# For scripting
jl run list --refresh --json
Without --refresh, the state column shows saved (from the local record). Use --refresh or --status to query each instance for live state. Using --status automatically implies --refresh.
Implicit start Subcommand
jl run <target> is shorthand for jl run start <target>. The start subcommand is implied when the first argument isn't a known subcommand (list, status, logs, stop).
# These are equivalent:
jl run train.py --gpu L4
jl run start train.py --gpu L4
Run Tracking is Local
All run management commands (jl run logs, jl run status, jl run stop, jl run list) depend on local records stored under ~/.jl/runs/. You need to start and monitor runs from the same machine. If the local record is missing, the run_id alone is not enough to interact with the run.
Each run record is a JSON file at ~/.jl/runs/<run_id>.json containing the machine ID, remote log path, PID file path, exit code path, and launch command.
SSH Key Commands
SSH keys are required to create VM instances (--vm), which provide bare-metal SSH access without a pre-configured container. You can manage your keys with jl ssh-key.
jl ssh-key list
List all SSH keys (ID, name, and truncated key).
jl ssh-key list
jl ssh-key list --json
jl ssh-key add <pubkey_file>
Add an SSH public key.
| Option | Short | Description |
|---|---|---|
--name | -n | Name for this key (required) |
--json | | Output as JSON |
jl ssh-key add ~/.ssh/id_ed25519.pub --name "my-laptop"
jl ssh-key remove <key_id>
Remove an SSH key.
| Option | Short | Description |
|---|---|---|
--yes | -y | Skip confirmation |
--json | | Output as JSON |
jl ssh-key remove abc123
Startup Script Commands
Startup scripts are shell scripts that run automatically whenever an instance launches or resumes — useful for installing dependencies, pulling data, or setting up your environment. You can manage them with jl scripts.
jl scripts list
List startup scripts (ID and name).
jl scripts list
jl scripts list --json
jl scripts add <script_file>
Add a startup script.
| Option | Short | Description |
|---|---|---|
--name | -n | Script name (defaults to filename without extension) |
--json | | Output as JSON |
jl scripts add ./setup.sh --name "install-deps"
jl scripts update <script_id> <script_file>
Replace the contents of an existing startup script.
| Option | Short | Description |
|---|---|---|
--json | | Output as JSON |
jl scripts update 42 ./setup-v2.sh
jl scripts remove <script_id>
Remove a startup script.
| Option | Short | Description |
|---|---|---|
--yes | -y | Skip confirmation |
--json | | Output as JSON |
jl scripts remove 42
Filesystem Commands
Filesystems are persistent storage volumes that survive instance pause, resume, and even destroy cycles. They're ideal for datasets, model checkpoints, or any data you want to reuse across multiple instances. You can manage them with jl filesystem.
Each filesystem is tied to the region where it was created. A filesystem created in IN2 is only accessible from IN2 instances. Data saved on an IN2 filesystem will not appear on an IN1 instance, even if you attach the same fs_id. Use jl filesystem list to see each filesystem's region.
jl filesystem list
List filesystems (ID, name, storage, region).
jl filesystem list
jl filesystem list --json
jl filesystem create
Create a new filesystem.
| Option | Short | Description |
|---|---|---|
--name | -n | Filesystem name (required, max 30 characters) |
--storage | -s | Storage in GB (required, 50–2048) |
--yes | -y | Skip confirmation |
--json | | Output as JSON |
jl filesystem create --name "datasets" --storage 200
jl filesystem edit <fs_id>
Expand filesystem storage. Can only increase, never shrink.
| Option | Short | Description |
|---|---|---|
| --storage | -s | New storage size in GB (required) |
| --yes | -y | Skip confirmation |
| --json | | Output as JSON |
jl filesystem edit 10 --storage 500
edit may return a new filesystem ID. Always use the returned value for subsequent operations.
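In scripts, capture the returned ID rather than reusing the old one. A sketch: the payload below is a stand-in for what jl filesystem edit 10 --storage 500 --yes --json might print, and the field name fs_id is an assumption, so inspect your real output first:

```shell
# Stand-in for real `jl filesystem edit ... --json` output (fs_id is assumed)
RESPONSE='{"fs_id": 11, "storage": 500}'
NEW_ID=$(echo "$RESPONSE" | jq -r .fs_id)
echo "new filesystem id: $NEW_ID"   # use this ID for all later filesystem commands
```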
jl filesystem remove <fs_id>
Delete a filesystem.
| Option | Short | Description |
|---|---|---|
| --yes | -y | Skip confirmation |
| --json | | Output as JSON |
jl filesystem remove 10
JSON Mode for Scripting
Most commands support --json for machine-readable output. JSON goes to stdout; human-readable status messages go to stderr.
# Instance list as JSON
jl list --json
# Create and capture the machine ID
RESULT=$(jl create --gpu L4 --yes --json)
MACHINE_ID=$(echo "$RESULT" | jq .machine_id)
# GPU availability pipeline
jl gpus --json | jq '.[] | select(.num_free_devices > 0) | .gpu_type'
# Run status in scripts
jl run status r_abc123 --json | jq .state
# Check if a run is still going
EXIT_CODE=$(jl run logs r_abc123 --tail 1 --json | jq .run_exit_code)
When --json is active:
- Spinners and progress indicators are suppressed
- Errors from jl itself (bad arguments, auth failures, etc.) are emitted as {"error": "..."} to stdout. Commands like jl exec --json return their own structured payload (with exit_code, stdout, stderr) even on non-zero exit
- Exit codes are still set appropriately
- For jl run start, --json returns immediately after the run is started (before log streaming), so lifecycle flags (--pause, --destroy) will not execute — use --keep when combining --json with fresh instances
--json does not suppress confirmation prompts. Always use --yes alongside --json in scripts and agent workflows.
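Because errors arrive as a JSON object with an error key, scripts can branch on that shape before parsing further. A minimal sketch (the payload here is a stand-in for real jl output):

```shell
RESULT='{"error": "authentication failed"}'   # stand-in for a failed jl --json call
if echo "$RESULT" | jq -e 'has("error")' >/dev/null; then
  echo "jl error: $(echo "$RESULT" | jq -r .error)" >&2
else
  echo "$RESULT" | jq .
fi
```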
Shell Completion
Enable tab completion for your shell:
jl --install-completion
Supports bash, zsh, and fish.
Using with AI Coding Agents
One of the primary use cases for the jl CLI is letting AI coding agents manage GPU infrastructure on your behalf. Instead of manually creating instances, uploading code, and monitoring runs, you can let your agent handle the entire workflow — from provisioning a GPU to downloading results.
The CLI supports four major coding agents: Claude Code, Codex, Cursor, and OpenCode. During jl setup, you'll be asked which agents you use, and skill files are installed automatically to teach your agent how to use jl effectively.
Agent Setup
# Interactive: authenticates and asks which agents to install skills for
jl setup
# Non-interactive: installs skills for all supported agents
jl setup --token YOUR_TOKEN --agents all --yes
# Install skills for specific agents only
jl setup --agents claude-code,cursor
Once skills are installed, your coding agent already knows how to use jl. Try asking it: "Spin up an A100, run my training script, and download the results when it's done."
Mental Model
| Concept | CLI | Purpose |
|---|---|---|
| Instance | jl create/list/pause/... | A machine — create, pause, resume, destroy, SSH into |
| Run | jl run | A managed job with log file + PID tracking |
| Exec | jl exec | Quick one-off commands for system checks and debugging |
Core Rules for Agent Workflows
- Always use --yes on commands with confirmation prompts (create, pause, resume, destroy, run start) — agents can't answer interactive prompts
- Use --json for structured data — use it on commands where the agent needs to parse output (create, gpus, run start, instance list). For jl run logs, the default output is designed for agents — the header/footer shows run ID, machine ID, and state in a readable format
- Always use --json when starting runs — it returns immediately. Without --json, the CLI streams logs and blocks
- Always use --tail N when reading logs — full logs can be enormous
- Do an early failure check — wait 15s after starting a run and check logs once. This catches fast failures (import errors, missing files, pip issues) before committing to a long polling loop
- Then poll at steady intervals — 60–120s for short jobs, 180–600s for long training runs
The Agent Monitoring Loop
This is the primary pattern for running and monitoring GPU jobs:
# 1. Start a detached run
jl run train.py --on <machine_id> --yes --json
# returns {"run_id": "r_abc123", ...}
# 2. Early failure check - catches import errors, bad paths, pip failures fast
sleep 15 && jl run logs r_abc123 --tail 30
# 3. If still running, poll at steady intervals
sleep 120 && jl run logs r_abc123 --tail 50
# The log output shows a header and footer with run state:
# --- run r_abc123 | machine 12345 | running ---
# <log output>
# --- still running | log: /home/jl-runs/r_abc123/output.log ---
#
# When done:
# --- run r_abc123 | machine 12345 | succeeded (exit 0) ---
# <log output>
# --- succeeded | exit code: 0 | log: /home/jl-runs/r_abc123/output.log ---
#
# On failure:
# --- run r_abc123 | machine 12345 | failed (exit 1) ---
# <log output>
# --- failed | exit code: 1 | log: /home/jl-runs/r_abc123/output.log ---
The log output is the primary monitoring primitive — the header gives you the run ID and machine ID, and the footer tells you whether the run is still going or finished (with exit code).
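The steady-polling step can be wrapped in a small shell function that returns as soon as the footer reports an exit code. A sketch: fetch_logs is a hypothetical stand-in for jl run logs "$RUN_ID" --tail 5, and the "exit code:" match relies on the footer format shown above:

```shell
# Sketch of a polling loop. fetch_logs stands in for:
#   jl run logs "$RUN_ID" --tail 5
poll_until_done() {
  while true; do
    out=$(fetch_logs)
    case "$out" in
      *"exit code:"*) echo "$out"; return 0 ;;   # footer reports a final state
    esac
    sleep "${POLL_INTERVAL:-120}"                # steady interval between checks
  done
}
```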
Agent Workflow Example (End-to-End)
# 1. Check GPU availability
jl gpus --json
# 2. Create an instance
jl create --gpu L4 --storage 50 --yes --json
# returns {"machine_id": 12345, ...}
# 3. Start a detached run
jl run . --script train.py --on 12345 --requirements requirements.txt --yes --json
# returns {"run_id": "r_abc123", ...}
# 4. Early failure check - catches crashes fast
sleep 15 && jl run logs r_abc123 --tail 30
# 5. If still running, poll at steady intervals (repeat until footer shows exit code)
sleep 120 && jl run logs r_abc123 --tail 50
# 6. Download results
jl download 12345 /home/results ./results -r
# 7. Clean up
jl pause 12345 --yes --json
Starting Runs on Fresh Instances (Agent Mode)
When the agent needs to create a fresh instance inline:
jl run . --script train.py --gpu L4 --keep --json --yes
Key points:
- --keep is required with --no-follow for fresh instances (the CLI will error without it)
- The agent must manually pause or destroy the instance after the run completes
- Additional fresh-instance flags: --template, --storage, --num-gpus, --region, --http-ports
Use separate jl create when you need to inspect GPU availability first, reuse machines across runs, or attach filesystems/scripts beforehand.
Quick System Checks with Exec
jl exec <id> --json -- nvidia-smi
jl exec <id> --json -- ps -ef
jl exec <id> --json -- df -h
For pipes or shell syntax, wrap in sh -lc:
jl exec <id> --json -- sh -lc 'grep "loss" /path/to/log | tail -5'
All of the patterns above — the monitoring loop, early failure checks, polling intervals, --tail, and more — are included in the skill files that jl setup installs for your agent. Once skills are installed, your agent already knows how to use jl correctly. You don't need to teach it these patterns yourself.
File Persistence Rules
The remote home directory (typically /home/ on containers, /home/<user>/ on VMs) persists across pause/resume cycles. Everything else is ephemeral.
Persists across pause/resume:
- Files in the home directory (/home/ or /home/<user>/)
- Uploaded directories: <home>/<directory_name>/
- Uploaded files (via jl upload): <home>/<filename>
- Run metadata: <home>/jl-runs/<run_id>/
- .venv created inside the project directory
- Attached filesystems
Lost on pause:
- System-level installs (apt-get, global pip packages)
- Files outside the home directory (/tmp, /root, etc.)
Use --setup or --requirements to reinstall dependencies on each run, or use startup scripts for recurring setup.
Anti-Patterns
| Don't | Why |
|---|---|
| Start runs without --json | Without --json, the CLI streams logs and blocks the agent |
| Use jl run logs --follow | Blocks forever; --json is also incompatible with --follow |
| Read full logs (omit --tail N) | Can return megabytes of output, overwhelming context |
| Poll every few seconds | Wasteful and noisy; use 60–600s intervals |
| Use lifecycle flags with --on | --keep, --pause, --destroy only apply to fresh instances |
| Forget to pause/destroy instances | They cost money while running |
Examples
Train on a fresh GPU, auto-pause when done
The simplest workflow — run a training script on a fresh GPU with dependencies. The instance is automatically paused when the script finishes, so you only pay for compute time.
jl run train.py --gpu L4 --requirements requirements.txt -- --epochs 100
# Instance created > code uploaded > deps installed > training runs > instance paused
Run a project directory with setup
When your project has multiple files, sync the entire directory and specify the entrypoint with --script. The CLI uses rsync under the hood, so only changed files are transferred on subsequent runs — making re-runs on the same instance fast even with large projects. You can also run custom setup commands before training starts.
jl run . --script train.py --gpu A100 \
--requirements requirements.txt \
--setup "pip install flash-attn" \
-- --batch-size 32 --lr 1e-4
Multi-GPU training
For large-scale training, you can request multiple GPUs on a single instance. Check Regions & GPUs for available GPU counts per region.
# 8x H100 in EU1 for distributed training
jl create --gpu H100 --num-gpus 8 --region EU1 --storage 500 --name "distributed-training"
# Upload your project and run with torchrun for multi-GPU
jl run . --script train.py --on <machine_id> \
--requirements requirements.txt \
--setup "pip install flash-attn" \
-- --num_gpus 8
Long-running job with manual control
For jobs where you want full control — create an instance, start a detached run, monitor at your own pace, and clean up when done.
# Create an instance
jl create --gpu A100 --storage 200 --name "research"
# Sync project and start a background run (--no-follow detaches from logs)
jl run ./my-project --script train.py --on <machine_id> --no-follow
# Monitor later
jl run status <run_id>
jl run logs <run_id> --tail 100
jl run logs <run_id> --follow
# Pause when done
jl pause <machine_id>
Detached run on existing instance
Start a run and come back to check on it later — the run continues in the background even if you close your terminal.
# Start without following
jl run train.py --on <machine_id> --no-follow
# Check on it later
jl run logs <run_id> --tail 50
# Stop it if needed
jl run stop <run_id>
Persistent data with filesystems
Filesystems let you keep datasets and model checkpoints across instances. Create a filesystem once, attach it to any instance in the same region, and your data is always available — even after destroying the instance. Note that filesystems are region-bound — an IN2 filesystem is only accessible from IN2 instances.
# Create a filesystem for datasets
jl filesystem create --name "datasets" --storage 500
# Create an instance with the filesystem attached
jl create --gpu A100 --fs-id <fs_id> --name "training"
# Run your training - the filesystem is attached and accessible on the instance
jl run train.py --on <machine_id>
# Done with training? Destroy the instance - data is safe in the filesystem
jl destroy <machine_id>
# Spin up a cheaper GPU for inference, same data
jl create --gpu L4 --fs-id <fs_id> --name "inference"
VM workflow (bare metal SSH access)
VM instances give you a clean Linux machine with SSH access instead of a pre-configured container. You'll need to register an SSH key first.
# Add your SSH key
jl ssh-key add ~/.ssh/id_ed25519.pub --name "my-key"
# Create a VM instance (available in IN2 and EU1 only)
jl create --gpu A100-80GB --vm --name "my-vm"
# SSH in
jl ssh <machine_id>
Scripting with JSON and jq
Most commands support --json output (except jl setup), making it easy to build automation pipelines with jq.
# Get IDs of all running instances
jl list --json | jq '[.[] | select(.status == "Running") | .machine_id]'
# Find cheapest available GPU
jl gpus --json | jq '[.[] | select(.num_free_devices > 0)] | sort_by(.price_per_hour) | .[0].gpu_type'
# Pause all running instances
for id in $(jl list --json | jq -r '.[] | select(.status == "Running") | .machine_id'); do
jl pause "$id" --yes --json
done
# Check if a run is still going
jl run logs <run_id> --tail 1 --json | jq .run_exit_code
Autonomous research with coding agents
One of the most powerful patterns is letting a coding agent drive the entire research loop autonomously. Andrej Karpathy's autoresearch is a great example of this — an AI agent autonomously edits training code, runs experiments, checks metrics, and iterates, accumulating only improvements. In Karpathy's own run, the agent evaluated ~700 experimental changes over 2 days, found ~20 additive improvements, and achieved an 11% reduction in Time-to-GPT-2.
The core loop works like this:
- Agent modifies train.py with an experimental idea and commits the change
- Agent runs the experiment on a GPU (via jl run)
- Agent reads the results from logs (via jl run logs) and extracts the target metric
- Agent logs the result — appends the commit hash, metric value, and a description to a results.tsv file so every experiment (successes and failures) is tracked
- If metrics improved — keep the commit, the branch advances
- If metrics got worse or it crashed — git reset to revert, try a different idea
The key insight is that the git branch only contains improvements (each commit is guaranteed better than the last), while results.tsv records the full history of all experiments including dead ends. This gives you a clean chain of improvements you can review, plus a complete log for analysis.
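Keeping results.tsv machine-sortable makes it easy to pull out the best run afterward. A sketch, assuming the tab-separated columns described below and that a lower validation metric is better (the rows here are illustrative):

```shell
# Append experiment rows (columns: commit, metric, memory_gb, status, description)
printf 'a1b2c3d\t1.432\t12.5\tkeep\tincreased hidden dim to 512\n' >> results.tsv
printf 'e4f5a6b\t1.518\t11.9\trevert\ttried dropout 0.3\n'         >> results.tsv
# Best experiment so far: sort numerically on the metric column, take the top row
sort -t "$(printf '\t')" -k2,2n results.tsv | head -n 1
```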
This pattern works for any ML problem — not just GPT training. You can apply it to hyperparameter sweeps, architecture search, data augmentation experiments, or any iterative research workflow.
Here's how to replicate this with jl:
# 1. Create a dedicated instance for experiments
jl create --gpu A100 --storage 200 --name "auto-research" --yes
# 2. Create a branch for this research session
git checkout -b autoresearch/session-1
# 3. Run baseline to establish initial metric
jl run . --script train.py --on <machine_id> \
--requirements requirements.txt --json --yes
# 4. Wait for it, then check results
sleep 15 && jl run logs <run_id> --tail 50
# The agent then loops autonomously:
# 5. Edit train.py with an idea, commit, and run
jl run . --script train.py --on <machine_id> \
--requirements requirements.txt --json --yes
# 6. Check results
sleep 15 && jl run logs <run_id> --tail 30
# ... then steady polling
sleep 120 && jl run logs <run_id> --tail 50
# 7. Extract metric from logs and append to results.tsv
# Format: commit | val_metric | memory_gb | status | description
# e.g.: a1b2c3d | 1.432 | 12.5 | keep | increased hidden dim to 512
# 8. If improved: keep the commit, loop back to step 5
# If worse: git reset to revert, loop back to step 5
# If crashed: log as crash, fix or try something else
# 9. When done, pause the instance
jl pause <machine_id>
With 5-minute experiments, the agent can run ~12 experiments per hour — roughly 100 experiments in an overnight session. Check results.tsv and git log the next morning to see what your agent discovered.
To get started, install agent skills with jl setup --agents all, then ask your agent something like: "Run a hyperparameter sweep comparing learning rates 1e-3, 1e-4, and 1e-5 on an A100 using my training script." The agent will handle the rest.