JarvisLabs CLI

New CLI — Part of the jarvislabs package

The jl CLI is part of the new jarvislabs package, replacing the deprecated jlclient. If you're still using jlclient, see the migration note.

The jl command-line tool lets you manage GPU instances, run training scripts, transfer files, and monitor experiments on JarvisLabs.ai — all from your terminal. It's built to work seamlessly with AI coding agents like Claude Code, Codex, Cursor, and OpenCode, so your agent can spin up GPUs, run experiments, and monitor results autonomously.

Package: jarvislabs | CLI command: jl | Version: 0.2.x | Python: 3.10+

Platform Support

Linux and macOS are fully supported. Windows is experimental and not fully tested — if you run into issues, please report them.

Jump to the Examples section for end-to-end workflows covering training runs, agent automation, filesystem management, and more.

Installation

Install with uv (recommended) or pip.

With uv

uv tool install jarvislabs

To upgrade:

uv tool upgrade jarvislabs

With pip

pip install jarvislabs

After installation, the jl command is available in your terminal.

What does jl setup do? Run it once after installing. It walks you through:

  1. Authentication — prompts for your API token (get one from jarvislabs.ai/settings/api-keys) and saves it locally
  2. Account status — shows your current balance and active instances
  3. Agent skill installation — asks which AI coding agents you use (Claude Code, Codex, Cursor, OpenCode) and installs skill files for them with your approval, so your agent knows how to use jl out of the box

Exploring the CLI with --help: Every command supports --help. It's the quickest way to see what's available, what flags a command takes, and what they do.

jl --help          # top-level commands
jl run --help      # run options, targets, lifecycle flags
jl create --help   # every flag for creating an instance

Quick Start

There are two main ways to use the CLI, depending on how much control you need.

Path 1: Run a script directly on a fresh GPU

The fastest way to get started. This creates a GPU instance, uploads your code, installs dependencies, runs the script, and pauses the instance when done — all in one command.

# One-time setup
jl setup

# Check your balance and make sure you're good to go
jl status

# See which GPUs are currently available and their pricing
jl gpus

# Run a single training script on a fresh L4
# Creates instance, uploads train.py, installs requirements, runs it, pauses when done
jl run train.py --gpu L4 --requirements requirements.txt -- --epochs 50

# Or if you have a project directory, sync the whole thing
# This uploads your directory, creates a venv, installs deps, and runs the entrypoint
jl run . --script train.py --gpu A100 --requirements requirements.txt

# You can also run a setup command before your main command
jl run . --script train.py --gpu A100 \
  --requirements requirements.txt \
  --setup "pip install flash-attn"

# The CLI streams logs by default. Once the run finishes, the instance is auto-paused.
# If you detached (Ctrl+C) or used --no-follow, you can check logs anytime:
jl run logs <run_id> --tail 50

# Check the final status of your run
jl run status <run_id>

Path 2: Manage instances yourself

If you want more control — SSH access, reusing machines across runs, attaching filesystems, or interactive debugging — create and manage instances directly.

# One-time setup
jl setup

# See available GPUs and pricing
jl gpus

# Create a GPU instance with 100 GB storage
jl create --gpu A100 --storage 100 --name "my-experiment"

# List your instances to get the machine ID
jl list

# SSH into your instance for interactive work
jl ssh <machine_id>

# Or upload and run a script on it
jl run train.py --on <machine_id>

# Check logs while the run is going
jl run logs <run_id> --tail 50

# Upload additional files to the instance
jl upload <machine_id> ./data /home/data

# Download results when you're done
jl download <machine_id> /home/results ./results -r

# Pause when you're done - stops compute billing, keeps your data
jl pause <machine_id>

# Later, resume with the same or a different GPU
jl resume <machine_id> --gpu L4

# When you're completely done, destroy to stop all billing (including storage)
jl destroy <machine_id>

Authentication

Get your API token from jarvislabs.ai/settings/api-keys.

Interactive setup

jl setup

This authenticates, optionally installs agent skills, shows your account status, and displays a getting-started guide.

Non-interactive setup

jl setup --token YOUR_TOKEN --yes

Tip: Without --yes, jl setup will still prompt for agent-skill installation even when --token is provided. Use --agents all or --yes to make setup fully non-interactive.

Environment variable

export JL_API_KEY="YOUR_TOKEN"

Token precedence

Both the CLI and SDK use the same resolution chain:

| Priority | Method | Used by |
|---|---|---|
| 1 | Client(api_key="...") argument | SDK only |
| 2 | JL_API_KEY environment variable | CLI + SDK |
| 3 | Config file (saved by jl setup) | CLI + SDK |

See Config file location below for config paths. See the SDK Authentication docs for more details.

Config file location

The config file is stored via platformdirs:

  • Linux: ~/.config/jl/config.toml
  • macOS: ~/Library/Application Support/jl/config.toml
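If a script needs to mimic this resolution order, a rough sketch follows. It is illustrative only, not the CLI's actual code: the TOML key name api_key is an assumption (the config format isn't documented here), and it models only the Linux path.

```shell
# Hypothetical re-implementation of the token resolution chain:
# 1) JL_API_KEY environment variable, 2) config file saved by `jl setup`.
resolve_token() {
  config="${XDG_CONFIG_HOME:-$HOME/.config}/jl/config.toml"
  if [ -n "${JL_API_KEY:-}" ]; then
    printf '%s\n' "$JL_API_KEY"
  elif [ -f "$config" ]; then
    # Naive TOML read: first `api_key = "..."` line (key name assumed)
    sed -n 's/^api_key *= *"\(.*\)".*/\1/p' "$config" | head -n 1
  fi
}

JL_API_KEY=demo-token resolve_token   # prints demo-token
```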

Removing saved credentials

jl logout

Global Flags

These flags are available on most commands (exceptions noted below):

| Flag | Description |
|---|---|
| --json | Output as machine-readable JSON (to stdout). Human-readable output goes to stderr. |
| --yes / -y | Skip all confirmation prompts. |
| --version | Print version and exit (root-level: jl --version). |

Info: --json and --yes are command-level options, not root-level, so jl list --json works correctly. Most commands support --json. --yes is only available on commands that have confirmation prompts (create, pause, resume, destroy, rename, run start, etc.). jl setup supports --yes but not --json. Read-only commands like jl gpus and jl run logs do not accept --yes.


Account Commands

jl setup

Set up the JarvisLabs CLI: authenticate and install agent skills.

| Option | Short | Description |
|---|---|---|
| --token | -t | API token (skips interactive prompt) |
| --agents | | Comma-separated agent list: claude-code, codex, cursor, opencode, or all |
| --yes | -y | Skip confirmation prompts; auto-selects all agents |

# Interactive setup
jl setup

# Non-interactive with token and all agent skills
jl setup --token YOUR_TOKEN --agents all --yes

# Install skills for specific agents only
jl setup --agents claude-code,cursor

If already authenticated, jl setup will show your current login and ask to re-authenticate. The --agents flag controls which coding agent skill files are installed:

| Agent | Skill file path |
|---|---|
| claude-code | ~/.claude/skills/jarvislabs/SKILL.md |
| codex | ~/.agents/skills/jarvislabs/SKILL.md |
| cursor | ~/.cursor/skills/jarvislabs/SKILL.md |
| opencode | ~/.config/opencode/skills/jarvislabs/SKILL.md |

jl logout

Remove the saved API token from the config file. Supports --json for scripted usage.

jl logout

jl status

Show account info: name, user ID, balance, grants, and running/paused instance counts.

jl status
jl status --json

Info: JSON output includes additional fields not shown in the human-readable table: running VMs, paused VMs, active deployments, filesystems, and billing currency.

jl gpus

Show GPU types with availability, region, VRAM, RAM, CPUs, and hourly pricing. Available GPUs are marked with a green dot, unavailable with a dim circle.

jl gpus
jl gpus --json
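When scripting against jl gpus --json, the availability check can be wrapped in a small helper. This is a sketch, assuming jq is installed; the sample payload below is hypothetical, using only the gpu_type and num_free_devices field names that appear in the JSON-mode examples in this document.

```shell
# Pick the first GPU type with free capacity from a `jl gpus --json` payload.
first_free_gpu() {
  printf '%s' "$1" | jq -r '[.[] | select(.num_free_devices > 0) | .gpu_type][0] // empty'
}

sample='[{"gpu_type":"L4","num_free_devices":0},{"gpu_type":"A100","num_free_devices":3}]'
first_free_gpu "$sample"   # prints A100
```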

jl templates

List the framework templates available for --template when creating container instances (e.g. pytorch, tensorflow, jax). Templates are not used when creating VMs with --vm.

jl templates
jl templates --json

Regions & GPUs

JarvisLabs has three regions, each with different GPU types available. When creating an instance, the CLI auto-selects the best region based on your chosen GPU — or you can pin a specific region with --region.

| Region | Available GPUs |
|---|---|
| IN1 | RTX5000, A5000Pro, A6000, RTX6000Ada, A100 |
| IN2 | L4, A100, A100-80GB |
| EU1 | H100, H200 |

Run jl gpus to see real-time availability and pricing for each GPU type.

Storage & Template Constraints

  • EU1 region: supports 1 or 8 GPUs per instance only, 100 GB minimum storage (auto-bumped if you specify less)
  • VM instances (--vm): 100 GB minimum storage (auto-bumped if you specify less)
  • VM instances are only available in the IN2 and EU1 regions and require at least one registered SSH key

Instance Commands

Manage the full lifecycle of GPU instances — from creation to teardown. Instances come in two types: containers (pre-configured with PyTorch, Jupyter, IDE — the default) and VMs (bare-metal SSH access, created with --vm).

jl list

List all your instances with their ID, name, status, GPU type, GPU count, storage, region, cost, and template.

jl list
jl list --json

jl get <machine_id>

Show full details of a specific instance including SSH command, notebook URL, HTTP ports, and endpoint URLs.

jl get 12345
jl get 12345 --json

jl create

Create a new GPU instance. The command blocks until the instance reaches Running status, so when it returns, your instance is ready to use.

| Option | Short | Default | Description |
|---|---|---|---|
| --gpu | -g | (required) | GPU type (run jl gpus to see options) |
| --vm | | | Create a VM instance (SSH-only, no container) |
| --template | -t | pytorch | Framework template for containers (not used with --vm) |
| --storage | -s | 40 | Storage in GB |
| --name | -n | "Name me" | Instance name (max 40 chars, letters/numbers/spaces/hyphens/underscores only) |
| --num-gpus | | 1 | Number of GPUs |
| --region | | | Region pin (e.g. IN1, IN2, EU1) |
| --http-ports | | | Comma-separated HTTP ports to expose (e.g. 7860,8080) |
| --script-id | | | Startup script ID to run on launch |
| --script-args | | | Arguments passed to the startup script |
| --fs-id | | | Filesystem ID to attach |
| --yes | -y | | Skip confirmation |
| --json | | | Output as JSON |

# Basic instance
jl create --gpu L4

# H100 with more storage and a name
jl create --gpu H100 --storage 200 --name "training-box"

# With a startup script and filesystem
jl create --gpu A100 --script-id 42 --fs-id 10

# Pin to a region
jl create --gpu A100 --region EU1

# Expose HTTP ports
jl create --gpu L4 --http-ports "7860,8080"

# VM instance (requires SSH key - add one first with jl ssh-key add)
jl create --gpu A100-80GB --vm --name "my-vm"

# Non-interactive
jl create --gpu L4 --yes --json

Prompts for confirmation unless --yes is passed. See Regions & GPUs for which GPUs are available in each region and storage constraints.

jl pause <machine_id>

Pause a running instance. Compute billing stops; a small storage cost continues.

| Option | Short | Description |
|---|---|---|
| --yes | -y | Skip confirmation |
| --json | | Output as JSON |

jl pause 12345
jl pause 12345 --yes --json

jl resume <machine_id>

Resume a paused instance. You can also use this opportunity to change the GPU type, expand storage, rename the instance, or attach a different startup script or filesystem. The command blocks until the instance is running again.

| Option | Short | Description |
|---|---|---|
| --gpu | -g | Resume with a different GPU type |
| --num-gpus | | Change number of GPUs |
| --storage | -s | Expand storage in GB (can only increase, never shrink) |
| --name | -n | Rename instance on resume |
| --http-ports | | Change exposed HTTP ports (e.g. 7860,8080) |
| --script-id | | Startup script ID to run on resume |
| --script-args | | Arguments for the startup script |
| --fs-id | | Filesystem ID to attach |
| --yes | -y | Skip confirmation |
| --json | | Output as JSON |

# Resume with defaults
jl resume 12345

# Resume with a bigger GPU
jl resume 12345 --gpu H100

# Resume with more storage and a new name
jl resume 12345 --storage 200 --name "upgraded"

Region Lock & ID Changes

Resume is region-locked — an instance always resumes in its original region. If you request a GPU type not available in that region, the API returns an error.

Resume may also assign a new machine ID. The CLI warns you when this happens. Always use the returned ID for subsequent operations.

jl destroy <machine_id>

Permanently delete an instance and all its data.

Warning: This action is irreversible. All data on the instance is lost. If you need to keep data across instances, use a filesystem.

| Option | Short | Description |
|---|---|---|
| --yes | -y | Skip confirmation |
| --json | | Output as JSON |

jl destroy 12345
jl destroy 12345 --yes --json

jl rename <machine_id>

Rename an instance.

| Option | Short | Description |
|---|---|---|
| --name | -n | New instance name (required, max 40 characters) |
| --yes | -y | Skip confirmation |
| --json | | Output as JSON |

jl rename 12345 --name "experiment-v2"

SSH, Exec & File Transfer

These commands let you interact directly with running instances — open a shell, run commands remotely, or transfer files back and forth.

jl ssh <machine_id>

SSH into a running instance. This opens an interactive shell session.

| Option | Short | Description |
|---|---|---|
| --print-command | -p | Print the raw SSH command to stdout instead of connecting |
| --json | | Output the SSH command as JSON |

# Interactive session
jl ssh 12345

# Get the SSH command for use in scripts
jl ssh 12345 --print-command

The instance must be in Running status. If paused, you'll be told to resume it first.

Tip: --print-command and --json output the stored SSH command regardless of instance status, which is useful for scripting and automation.

jl exec <machine_id> -- <command>

Run a command on a running instance and stream the output back to your terminal. The -- separator is required so jl can distinguish your remote command from its own flags.

| Option | Short | Description |
|---|---|---|
| --json | | Capture output as JSON with stdout, stderr, and exit_code fields |

# Check GPU
jl exec 12345 -- nvidia-smi

# Run Python
jl exec 12345 -- python -c "import torch; print(torch.cuda.device_count())"

# List files
jl exec 12345 -- ls -la /home

# Use shell features (pipes, redirection) - wrap in sh -lc
jl exec 12345 -- sh -lc 'grep "loss" /home/output.log | tail -5'

# Structured output for scripting
jl exec 12345 --json -- nvidia-smi

The exit code of the remote command is propagated as the exit code of jl exec.

Tip: If your remote command uses pipes, redirection, or other shell features, wrap it in sh -lc '...' as shown above. Without the wrapper, each argument is treated as a separate command argument rather than shell syntax.

jl upload <machine_id> <source> [dest]

Upload a local file or directory to a running instance. If no remote destination is given, it uploads to the instance's home directory (/home/ for containers, /home/<user>/ for VMs).

Directories are uploaded recursively automatically.

| Option | Short | Description |
|---|---|---|
| --json | | Output upload result as JSON |

# Upload a file (lands at /home/data.csv)
jl upload 12345 ./data.csv

# Upload a directory (lands at /home/my-project/)
jl upload 12345 ./my-project

# Upload to a specific remote path
jl upload 12345 ./config.yaml /home/config.yaml

jl download <machine_id> <source> [dest] [-r]

Download a file or directory from a running instance. If no local destination is given, it saves to ./<filename> in the current directory.

| Option | Short | Description |
|---|---|---|
| --recursive | -r | Download directories recursively |
| --json | | Output download result as JSON |

# Download a file (saves to ./results.csv)
jl download 12345 /home/results.csv

# Download to a specific local path
jl download 12345 /home/results.csv ./my-results.csv

# Download a directory
jl download 12345 /home/outputs ./local-outputs -r

Managed Runs

Managed runs are the fastest way to run scripts on GPU instances. A single jl run command handles uploading your code, setting up a Python virtual environment (via uv), installing dependencies (auto-detected from your project or specified with --requirements), running your command in the background, and tracking logs.

Runs persist in the background even if you disconnect or close your terminal. Logs, status, and lifecycle are tracked locally in ~/.jl/runs/.

Run Targets

| Target | What happens |
|---|---|
| train.py | Uploads the single file, runs python3 train.py |
| run.sh | Uploads the single file, runs bash run.sh |
| . or ./my-project | Syncs the directory via rsync (excludes .venv/, .git/, __pycache__/), runs --script inside it (requires rsync locally) |
| (no target) | Runs the command given after -- directly on the instance |

Only .py and .sh file targets are supported directly. For other file types, use a directory target or jl upload + jl exec.

Starting a Run on an Existing Instance

jl run <target> --on <machine_id> [options] [-- extra args]
# Run a Python file
jl run train.py --on 12345

# Upload a directory and run a script inside it
jl run . --script train.py --on 12345

# Pass arguments to your script
jl run train.py --on 12345 -- --epochs 50 --lr 0.001

# Run an arbitrary remote command (no upload)
jl run --on 12345 -- python -c "print('hello from GPU')"

Starting a Run on a Fresh Instance

jl run <target> --gpu <gpu_type> [options] [-- extra args]

This creates a new instance, uploads your code, runs the command, and handles instance lifecycle when done.

# Run on a fresh L4
jl run train.py --gpu L4

# With requirements
jl run . --script train.py --gpu A100 --requirements requirements.txt

# Destroy instance after run (no leftover costs)
jl run train.py --gpu L4 --destroy

# Keep instance running after run (for debugging)
jl run train.py --gpu L4 --keep

You must use either --on or --gpu, not both.

All Start Options

| Option | Short | Default | Description |
|---|---|---|---|
| --on | | | Run on an existing instance (machine ID) |
| --gpu | -g | | Create a fresh instance with this GPU type |
| --vm | | | Create a VM instead of a container (fresh instances only) |
| --script | | | Entrypoint script path inside a directory target |
| --template | -t | pytorch | Framework template for containers (not used with --vm) |
| --storage | -s | 40 | Storage in GB (fresh instances only) |
| --name | -n | jl-run | Instance name (fresh instances only) |
| --num-gpus | | 1 | Number of GPUs (fresh instances only) |
| --region | | | Region pin, e.g. IN1, EU1 (fresh instances only) |
| --http-ports | | | Comma-separated HTTP ports to expose (fresh instances only) |
| --requirements | | | Override auto-detection: upload and install this file instead |
| --setup | | | Shell command to run before the main command |
| --follow / --no-follow | | --follow | Stream logs after starting the run |
| --pause | | | Pause fresh instance after the run (default for fresh) |
| --destroy | | | Destroy fresh instance after the run |
| --keep | | | Leave fresh instance running after the run |
| --yes | -y | | Skip confirmation prompts |
| --json | | | Output as JSON |

Environment & Dependency Management

For file and directory targets, jl run automatically creates and manages a Python virtual environment on the remote instance using uv. The environment is designed to work seamlessly with JarvisLabs templates — you get both template packages (like PyTorch and CUDA) and your project's dependencies without extra configuration.

How it works

Every managed run creates a .venv inside the project's working directory on the remote machine. This venv:

  • Inherits template packages. If you chose the pytorch template, import torch works immediately — no need to install it yourself. The same applies to any package pre-installed by the template (CUDA libraries, numpy, etc.).
  • Has pip and uv available. Both pip install and uv pip install work inside the venv and install packages into the venv, not the system Python.
  • Persists across runs. On the same instance with the same target, the venv is reused. Previously installed packages are still there, so re-runs are fast.

Auto-detection of dependencies

For directory targets, the CLI checks your local directory before uploading and automatically installs dependencies:

  1. If pyproject.toml exists with a [project] table → installs from [project].dependencies
  2. Otherwise, if requirements.txt exists → installs from it
  3. If neither exists → no packages installed; template packages are enough

This means for most projects, you don't need to pass any flags — just make sure your requirements.txt or pyproject.toml is in the directory, and jl run handles the rest. Other dependency formats (uv.lock, poetry.lock, Pipfile) are not auto-detected — use --requirements with a requirements.txt for those projects.
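The detection order above can be approximated locally if you want to predict what a run will install before uploading. This is an illustrative re-implementation, not the CLI's actual code (in particular, the real check for a [project] table is presumably a TOML parse, not a grep):

```shell
# Rough local mirror of jl run's dependency auto-detection for directory targets.
detect_deps() {
  dir="$1"
  if [ -f "$dir/pyproject.toml" ] && grep -q '^\[project\]' "$dir/pyproject.toml"; then
    echo "pyproject.toml"       # installs from [project].dependencies
  elif [ -f "$dir/requirements.txt" ]; then
    echo "requirements.txt"     # installs from requirements.txt
  else
    echo "none"                 # template packages only
  fi
}
```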

For single file targets (e.g., jl run train.py), there is no directory to scan, so auto-detection does not apply. Use --requirements to specify a requirements file if needed.

For command-mode runs (no target, raw command after --), there is no venv or dependency installation. The command runs directly on the instance's system Python with template packages available. --setup still works as a pre-command hook, but --requirements is not available.

The --requirements flag

Use --requirements to override auto-detection. When provided, the specified file is uploaded to the remote and installed instead of any auto-detected file. This is useful when:

  • You want to use a different requirements file than the one in your project directory
  • You're running a single file target and need extra packages
  • Your pyproject.toml has a [project] table but you'd rather install from a separate requirements file
# Auto-detect (recommended for most projects)
jl run . --script train.py --gpu L4

# Override with a custom file
jl run . --script train.py --gpu L4 --requirements custom-reqs.txt

# Single file with requirements
jl run train.py --gpu L4 --requirements requirements.txt

The --setup flag

Use --setup to run a shell command after dependency installation but before your script. This is the escape hatch for anything that isn't a Python package — system libraries, compiled extensions, environment variables, or quick one-off installs.

# Install a system library (containers run as root; use sudo on VMs)
jl run . --script train.py --on 12345 --setup "apt-get update && apt-get install -y libsndfile1"

# Install a package that needs special flags
jl run . --script train.py --on 12345 --setup "pip install flash-attn --no-build-isolation"

# Set environment variables
jl run . --script train.py --on 12345 --setup "export CUDA_VISIBLE_DEVICES=0"

For recurring system-level setup (things you need on every instance boot), consider using startup scripts instead of --setup. Startup scripts run automatically when an instance is created or resumed, so you don't have to repeat the setup on every run.

The full setup chain

When you start a managed run with a file or directory target, the CLI executes these steps in order on the remote machine (chained with &&, so any failure stops the chain):

  1. uv installed if missing
  2. .venv created if it doesn't exist (with template package visibility and pip)
  3. .venv activated
  4. Dependencies installed — from auto-detected pyproject.toml or requirements.txt, or from --requirements if provided
  5. --setup command executed (if provided)
  6. Your script runs

The run logs show which dependency file was detected:

[jl] Installing from requirements.txt                      # auto-detected
[jl] Installing from pyproject.toml                        # auto-detected
[jl] Installing from custom-reqs.txt                       # --requirements override
[jl] No dependency file detected, using template packages  # nothing found

Template packages and torch

Template packages (like PyTorch) are available in the venv without installing them. However, if your requirements.txt or pyproject.toml lists torch as a dependency, uv will re-download and install it into the venv — this is because uv does not check system packages during dependency resolution. This is harmless (the correct version is installed) but wastes bandwidth on the first run. To avoid this, omit template packages from your dependency files and let the template provide them.

Recommended workflow
  • For most projects: Put your extra dependencies in requirements.txt or pyproject.toml. Don't include packages that the template already provides (torch, CUDA, numpy). Run jl run . --script train.py --gpu L4 and let auto-detection handle the rest.
  • For quick experiments: A single Python file with no dependencies works out of the box on a pytorch template — import torch just works.
  • For system-level setup: Use --setup for one-off commands, or startup scripts for recurring setup.
  • For AI coding agents: Agents should use --json --yes and monitor via jl run logs. The auto-detection, echo logging, and --requirements override all work identically in agent workflows.

Lifecycle Flags (Fresh Instances Only)

When creating a fresh instance with --gpu, these flags control what happens after the run completes:

| Flag | Behavior |
|---|---|
| --pause | Pause the instance after the run (default for fresh instances) |
| --destroy | Destroy the instance; no leftover costs |
| --keep | Leave the instance running (for debugging or follow-up work) |

Only one lifecycle flag can be used at a time. These flags cannot be used with --on (existing instances are not touched after the run).

Detaching from fresh instances

--no-follow for fresh instances requires --keep. Since --pause and --destroy need the CLI to be connected when the run ends to perform the lifecycle action, they are incompatible with --no-follow. If you detach (Ctrl+C or --no-follow), the automatic lifecycle action will not happen — the instance stays running and billing continues. Manage it manually with jl pause or jl destroy.

Follow vs No-Follow

By default, jl run streams logs after starting (--follow). Press Ctrl+C to detach — the run keeps going in the background. Without --tail, --follow initially shows the last 20 lines before streaming new output.

# Default: stream logs, auto-pause when done
jl run train.py --gpu L4

# Detached: start and return immediately (requires --keep for fresh instances)
jl run train.py --gpu L4 --keep --no-follow

# Detached on existing instance (no lifecycle flag needed)
jl run train.py --on 12345 --no-follow

jl run logs <run_id>

View logs from a managed run.

| Option | Short | Description |
|---|---|---|
| --follow | -f | Stream logs in real time (press Ctrl+C to stop) |
| --tail | -n | Show only the last N lines (minimum: 1) |
| --json | | Output as JSON with content and run_exit_code fields |

# Full log output
jl run logs r_abc123

# Last 50 lines
jl run logs r_abc123 --tail 50

# Stream logs live
jl run logs r_abc123 --follow

# Stream with initial context
jl run logs r_abc123 --follow --tail 100

# JSON output with exit code (for scripting/agents)
jl run logs r_abc123 --tail 50 --json

JSON output fields:

| Field | Description |
|---|---|
| run_id | The run identifier |
| machine_id | Instance the run is on |
| remote_log | Path to the log file on the remote instance |
| content | The log text (last N lines if --tail used, full log otherwise) |
| run_exit_code | null = still running, 0 = succeeded, non-zero = failed |

Info: --json is not supported with --follow. Without --tail, the entire log file is returned, which can be very large for long-running jobs.

Non-JSON output shows raw logs with a header and footer indicating run state:

--- run r_abc123 | machine 12345 | running ---

step=100 loss=2.31
step=200 loss=2.11

--- still running | log: /home/jl-runs/r_abc123/output.log ---
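For scripted polling, run_exit_code is the field to branch on. A minimal helper sketch, assuming jq is available; the payload below is a hypothetical sample built only from the documented fields:

```shell
# Classify a `jl run logs --json` payload by its run_exit_code.
run_state() {
  code=$(printf '%s' "$1" | jq -r '.run_exit_code')
  case "$code" in
    null) echo "running" ;;
    0)    echo "succeeded" ;;
    *)    echo "failed:$code" ;;
  esac
}

# In a real script: resp=$(jl run logs r_abc123 --tail 1 --json)
sample='{"run_id":"r_abc123","machine_id":12345,"run_exit_code":null}'
run_state "$sample"   # prints running
```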

jl run status <run_id>

Show the current state of a run.

| Option | Short | Description |
|---|---|---|
| --json | | Output as JSON |

Possible states: running, succeeded, failed, instance-paused, instance-pausing, instance-missing, instance-creating, instance-resuming, instance-destroying, instance-failed, unknown.

jl run status r_abc123
jl run status r_abc123 --json

jl run stop <run_id>

Stop a managed run by sending TERM to its process group. The instance itself is not affected.

| Option | Short | Description |
|---|---|---|
| --json | | Output as JSON |

jl run stop r_abc123
jl run stop r_abc123 --json

If the process doesn't exit after TERM, it escalates to SIGKILL. If the run has already finished, it reports the final state without error.

jl run list

List all locally tracked managed runs (most recent first).

| Option | Short | Description |
|---|---|---|
| --refresh | | Check live status for each run by querying the instance (slower) |
| --machine | -m | Filter by instance ID |
| --limit | -l | Show only the N most recent runs |
| --status | -s | Filter by state (e.g. running, succeeded, failed). Implies --refresh. |
| --json | | Output as JSON |

# All runs (shows "saved" state without live check)
jl run list

# With live status refresh
jl run list --refresh

# Filter by instance
jl run list --machine 12345

# Most recent 5 runs
jl run list --limit 5

# Only running jobs
jl run list --status running

# For scripting
jl run list --refresh --json

Without --refresh, the state column shows saved (from the local record). Use --refresh or --status to query each instance for live state. Using --status automatically implies --refresh.

Implicit start Subcommand

jl run <target> is shorthand for jl run start <target>. The start subcommand is implied when the first argument isn't a known subcommand (list, status, logs, stop).

# These are equivalent:
jl run train.py --gpu L4
jl run start train.py --gpu L4

Run Tracking is Local

Info: All run management commands (jl run logs, jl run status, jl run stop, jl run list) depend on local records stored under ~/.jl/runs/. You need to start and monitor runs from the same machine. If the local record is missing, the run_id alone is not enough to interact with the run.

Each run record is a JSON file at ~/.jl/runs/<run_id>.json containing the machine ID, remote log path, PID file path, exit code path, and launch command.
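Because the records are plain files, you can enumerate tracked runs without calling jl at all. A small sketch (the per-record field names are not documented here, so this only lists run IDs from filenames):

```shell
# List locally tracked run IDs from the run-record directory.
list_local_runs() {
  dir="${1:-$HOME/.jl/runs}"
  [ -d "$dir" ] || return 0
  for f in "$dir"/*.json; do
    [ -e "$f" ] || continue   # skip the literal glob when the dir is empty
    basename "$f" .json
  done
}
```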


SSH Key Commands

SSH keys are required to create VM instances (--vm), which provide bare-metal SSH access without a pre-configured container. Manage your keys with jl ssh-key.

jl ssh-key list

List all SSH keys (ID, name, and truncated key).

jl ssh-key list
jl ssh-key list --json

jl ssh-key add <pubkey_file>

Add an SSH public key.

| Option | Short | Description |
|---|---|---|
| --name | -n | Name for this key (required) |
| --json | | Output as JSON |

jl ssh-key add ~/.ssh/id_ed25519.pub --name "my-laptop"

jl ssh-key remove <key_id>

Remove an SSH key.

| Option | Short | Description |
|---|---|---|
| --yes | -y | Skip confirmation |
| --json | | Output as JSON |

jl ssh-key remove abc123

Startup Script Commands

Startup scripts are shell scripts that run automatically whenever an instance launches or resumes — useful for installing dependencies, pulling data, or setting up your environment. You can manage them with jl scripts.

jl scripts list

List startup scripts (ID and name).

jl scripts list
jl scripts list --json

jl scripts add <script_file>

Add a startup script.

| Option | Short | Description |
|---|---|---|
| --name | -n | Script name (defaults to filename without extension) |
| --json | | Output as JSON |

jl scripts add ./setup.sh --name "install-deps"

jl scripts update <script_id> <script_file>

Replace the contents of an existing startup script.

| Option | Short | Description |
|---|---|---|
| --json | | Output as JSON |

jl scripts update 42 ./setup-v2.sh

jl scripts remove <script_id>

Remove a startup script.

| Option | Short | Description |
|---|---|---|
| --yes | -y | Skip confirmation |
| --json | | Output as JSON |

jl scripts remove 42

Filesystem Commands

Filesystems are persistent storage volumes that survive instance pause, resume, and even destroy cycles. They're ideal for datasets, model checkpoints, or any data you want to reuse across multiple instances. You can manage them with jl filesystem.

Filesystems are region-bound

Each filesystem is tied to the region where it was created. A filesystem created in IN2 is only accessible from IN2 instances. Data saved on an IN2 filesystem will not appear on an IN1 instance, even if you attach the same fs_id. Use jl filesystem list to see each filesystem's region.

jl filesystem list

List filesystems (ID, name, storage, region).

jl filesystem list
jl filesystem list --json

jl filesystem create

Create a new filesystem.

| Option | Short | Description |
|---|---|---|
| --name | -n | Filesystem name (required, max 30 characters) |
| --storage | -s | Storage in GB (required, 50–2048) |
| --yes | -y | Skip confirmation |
| --json | | Output as JSON |

jl filesystem create --name "datasets" --storage 200

jl filesystem edit <fs_id>

Expand filesystem storage. Can only increase, never shrink.

| Option | Short | Description |
|---|---|---|
| --storage | -s | New storage size in GB (required) |
| --yes | -y | Skip confirmation |
| --json | | Output as JSON |

jl filesystem edit 10 --storage 500

Info: edit may return a new filesystem ID. Always use the returned value for subsequent operations.

jl filesystem remove <fs_id>

Delete a filesystem.

| Option | Short | Description |
|---|---|---|
| --yes | -y | Skip confirmation |
| --json | | Output as JSON |

jl filesystem remove 10

JSON Mode for Scripting

Most commands support --json for machine-readable output. JSON goes to stdout; human-readable status messages go to stderr.

# Instance list as JSON
jl list --json

# Create and capture the machine ID
RESULT=$(jl create --gpu L4 --yes --json)
MACHINE_ID=$(echo "$RESULT" | jq .machine_id)

# GPU availability pipeline
jl gpus --json | jq '.[] | select(.num_free_devices > 0) | .gpu_type'

# Run status in scripts
jl run status r_abc123 --json | jq .state

# Check if a run is still going
EXIT_CODE=$(jl run logs r_abc123 --tail 1 --json | jq .run_exit_code)

When --json is active:

  • Spinners and progress indicators are suppressed
  • Errors from jl itself (bad arguments, auth failures, etc.) are emitted as {"error": "..."} to stdout. Commands like jl exec --json return their own structured payload (with exit_code, stdout, stderr) even on non-zero exit
  • Exit codes are still set appropriately
  • For jl run start, --json returns immediately after the run is started (before log streaming), so lifecycle flags (--pause, --destroy) will not execute — use --keep when combining --json with fresh instances
tip

--json does not suppress confirmation prompts. Always use --yes alongside --json in scripts and agent workflows.
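For example, a scripted pause that neither prompts nor prints spinners (the machine ID is illustrative):

```shell
jl pause 12345 --yes --json
```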


Shell Completion

Enable tab completion for your shell:

jl --install-completion

Supports bash, zsh, and fish.


Using with AI Coding Agents

One of the primary use cases for the jl CLI is letting AI coding agents manage GPU infrastructure on your behalf. Instead of manually creating instances, uploading code, and monitoring runs, you can let your agent handle the entire workflow — from provisioning a GPU to downloading results.

The CLI supports four major coding agents: Claude Code, Codex, Cursor, and OpenCode. During jl setup, you'll be asked which agents you use, and skill files are installed automatically to teach your agent how to use jl effectively.

Agent Setup

# Interactive: authenticates and asks which agents to install skills for
jl setup

# Non-interactive: installs skills for all supported agents
jl setup --token YOUR_TOKEN --agents all --yes

# Install skills for specific agents only
jl setup --agents claude-code,cursor
tip

Once skills are installed, your coding agent already knows how to use jl. Try asking it: "Spin up an A100, run my training script, and download the results when it's done."

Mental Model

Concept    CLI                         Purpose
Instance   jl create/list/pause/...    A machine — create, pause, resume, destroy, SSH into
Run        jl run                      A managed job with log file + PID tracking
Exec       jl exec                     Quick one-off commands for system checks and debugging

Core Rules for Agent Workflows

  1. Always use --yes on commands with confirmation prompts (create, pause, resume, destroy, run start) — agents can't answer interactive prompts
  2. Use --json for structured data — use it on commands where the agent needs to parse output (create, gpus, run start, instance list). For jl run logs, the default output is designed for agents — the header/footer shows run ID, machine ID, and state in a readable format
  3. Always use --json when starting runs — it returns immediately. Without --json, the CLI streams logs and blocks
  4. Always use --tail N when reading logs — full logs can be enormous
  5. Do an early failure check — wait 15s after starting a run and check logs once. This catches fast failures (import errors, missing files, pip issues) before committing to a long polling loop
  6. Then poll at steady intervals — 60-120s for short jobs, 180-600s for long training runs

The Agent Monitoring Loop

This is the primary pattern for running and monitoring GPU jobs:

# 1. Start a detached run
jl run train.py --on <machine_id> --yes --json
# returns {"run_id": "r_abc123", ...}

# 2. Early failure check - catches import errors, bad paths, pip failures fast
sleep 15 && jl run logs r_abc123 --tail 30

# 3. If still running, poll at steady intervals
sleep 120 && jl run logs r_abc123 --tail 50

# The log output shows a header and footer with run state:
# --- run r_abc123 | machine 12345 | running ---
# <log output>
# --- still running | log: /home/jl-runs/r_abc123/output.log ---
#
# When done:
# --- run r_abc123 | machine 12345 | succeeded (exit 0) ---
# <log output>
# --- succeeded | exit code: 0 | log: /home/jl-runs/r_abc123/output.log ---
#
# On failure:
# --- run r_abc123 | machine 12345 | failed (exit 1) ---
# <log output>
# --- failed | exit code: 1 | log: /home/jl-runs/r_abc123/output.log ---

The log output is the primary monitoring primitive — the header gives you the run ID and machine ID, and the footer tells you whether the run is still going or finished (with exit code).

Agent Workflow Example (End-to-End)

# 1. Check GPU availability
jl gpus --json

# 2. Create an instance
jl create --gpu L4 --storage 50 --yes --json
# returns {"machine_id": 12345, ...}

# 3. Start a detached run
jl run . --script train.py --on 12345 --requirements requirements.txt --yes --json
# returns {"run_id": "r_abc123", ...}

# 4. Early failure check - catches crashes fast
sleep 15 && jl run logs r_abc123 --tail 30

# 5. If still running, poll at steady intervals (repeat until footer shows exit code)
sleep 120 && jl run logs r_abc123 --tail 50

# 6. Download results
jl download 12345 /home/results ./results -r

# 7. Clean up
jl pause 12345 --yes --json

Starting Runs on Fresh Instances (Agent Mode)

When the agent needs to create a fresh instance inline:

jl run . --script train.py --gpu L4 --keep --json --yes

Key points:

  • --keep is required with --no-follow (and with --json, which also detaches) for fresh instances; the CLI will error without it
  • The agent must manually pause or destroy the instance after the run completes
  • Additional fresh-instance flags: --template, --storage, --num-gpus, --region, --http-ports

Use separate jl create when you need to inspect GPU availability first, reuse machines across runs, or attach filesystems/scripts beforehand.

Quick System Checks with Exec

jl exec <id> --json -- nvidia-smi
jl exec <id> --json -- ps -ef
jl exec <id> --json -- df -h

For pipes or shell syntax, wrap in sh -lc:

jl exec <id> --json -- sh -lc 'grep "loss" /path/to/log | tail -5'
Skill files handle this for you

All of the patterns above — the monitoring loop, early failure checks, polling intervals, --tail, and more — are included in the skill files that jl setup installs for your agent. Once skills are installed, your agent already knows how to use jl correctly. You don't need to teach it these patterns yourself.

File Persistence Rules

The remote home directory (typically /home/ on containers, /home/<user>/ on VMs) persists across pause/resume cycles. Everything else is ephemeral.

Persists across pause/resume:

  • Files in the home directory (/home/ or /home/<user>/)
  • Uploaded directories: <home>/<directory_name>/
  • Uploaded files (via jl upload): <home>/<filename>
  • Run metadata: <home>/jl-runs/<run_id>/
  • .venv created inside the project directory
  • Attached filesystems

Lost on pause:

  • System-level installs (apt-get, global pip packages)
  • Files outside the home directory (/tmp, /root, etc.)

Use --setup or --requirements to reinstall dependencies on each run, or use startup scripts for recurring setup.
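For example, system-level packages can be reinstalled on every run via --setup (the package names and machine ID are illustrative):

```shell
# Reinstall an apt package and Python deps on each run; both are lost on pause
jl run train.py --on 12345 \
  --setup "apt-get update && apt-get install -y ffmpeg" \
  --requirements requirements.txt
```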

Anti-Patterns

Don't                               Why
Start runs without --json           Without --json, the CLI streams logs and blocks the agent
Use jl run logs --follow            Blocks forever; --json is also incompatible with --follow
Read full logs (omit --tail N)      Can return megabytes of output, overwhelming context
Poll every few seconds              Wasteful and noisy; use 60–600s intervals
Use lifecycle flags with --on       --keep, --pause, --destroy only apply to fresh instances
Forget to pause/destroy instances   They cost money while running

Examples

Train on a fresh GPU, auto-pause when done

The simplest workflow — run a training script on a fresh GPU with dependencies. The instance is automatically paused when the script finishes, so you only pay for compute time.

jl run train.py --gpu L4 --requirements requirements.txt -- --epochs 100
# Instance created > code uploaded > deps installed > training runs > instance paused

Run a project directory with setup

When your project has multiple files, sync the entire directory and specify the entrypoint with --script. The CLI uses rsync under the hood, so only changed files are transferred on subsequent runs — making re-runs on the same instance fast even with large projects. You can also run custom setup commands before training starts.

jl run . --script train.py --gpu A100 \
--requirements requirements.txt \
--setup "pip install flash-attn" \
-- --batch-size 32 --lr 1e-4

Multi-GPU training

For large-scale training, you can request multiple GPUs on a single instance. Check Regions & GPUs for available GPU counts per region.

# 8x H100 in EU1 for distributed training
jl create --gpu H100 --num-gpus 8 --region EU1 --storage 500 --name "distributed-training"

# Upload your project and run with torchrun for multi-GPU
jl run . --script train.py --on <machine_id> \
--requirements requirements.txt \
--setup "pip install flash-attn" \
-- --num_gpus 8

Long-running job with manual control

For jobs where you want full control — create an instance, start a detached run, monitor at your own pace, and clean up when done.

# Create an instance
jl create --gpu A100 --storage 200 --name "research"

# Sync project and start a background run (--no-follow detaches from logs)
jl run ./my-project --script train.py --on <machine_id> --no-follow

# Monitor later
jl run status <run_id>
jl run logs <run_id> --tail 100
jl run logs <run_id> --follow

# Pause when done
jl pause <machine_id>

Detached run on existing instance

Start a run and come back to check on it later — the run continues in the background even if you close your terminal.

# Start without following
jl run train.py --on <machine_id> --no-follow

# Check on it later
jl run logs <run_id> --tail 50

# Stop it if needed
jl run stop <run_id>

Persistent data with filesystems

Filesystems let you keep datasets and model checkpoints across instances. Create a filesystem once, attach it to any instance in the same region, and your data is always available — even after destroying the instance. Note that filesystems are region-bound — an IN2 filesystem is only accessible from IN2 instances.

# Create a filesystem for datasets
jl filesystem create --name "datasets" --storage 500

# Create an instance with the filesystem attached
jl create --gpu A100 --fs-id <fs_id> --name "training"

# Run your training - the filesystem is attached and accessible on the instance
jl run train.py --on <machine_id>

# Done with training? Destroy the instance - data is safe in the filesystem
jl destroy <machine_id>

# Spin up a cheaper GPU for inference, same data
jl create --gpu L4 --fs-id <fs_id> --name "inference"

VM workflow (bare metal SSH access)

VM instances give you a clean Linux machine with SSH access instead of a pre-configured container. You'll need to register an SSH key first.

# Add your SSH key
jl ssh-key add ~/.ssh/id_ed25519.pub --name "my-key"

# Create a VM instance (available in IN2 and EU1 only)
jl create --gpu A100-80GB --vm --name "my-vm"

# SSH in
jl ssh <machine_id>

Scripting with JSON and jq

Most commands support --json output (except jl setup), making it easy to build automation pipelines with jq.

# Get IDs of all running instances
jl list --json | jq '[.[] | select(.status == "Running") | .machine_id]'

# Find cheapest available GPU
jl gpus --json | jq '[.[] | select(.num_free_devices > 0)] | sort_by(.price_per_hour) | .[0].gpu_type'

# Pause all running instances
for id in $(jl list --json | jq -r '.[] | select(.status == "Running") | .machine_id'); do
jl pause "$id" --yes --json
done

# Check if a run is still going
jl run logs <run_id> --tail 1 --json | jq .run_exit_code

Autonomous research with coding agents

One of the most powerful patterns is letting a coding agent drive the entire research loop autonomously. Andrej Karpathy's autoresearch is a great example of this — an AI agent autonomously edits training code, runs experiments, checks metrics, and iterates, accumulating only improvements. In Karpathy's own run, the agent evaluated ~700 experimental changes over 2 days, found ~20 additive improvements, and achieved an 11% reduction in Time-to-GPT-2.

The core loop works like this:

  1. Agent modifies train.py with an experimental idea and commits the change
  2. Agent runs the experiment on a GPU (via jl run)
  3. Agent reads the results from logs (via jl run logs) and extracts the target metric
  4. Agent logs the result — appends the commit hash, metric value, and a description to a results.tsv file so every experiment (successes and failures) is tracked
  5. If metrics improved — keep the commit, the branch advances
  6. If metrics got worse or it crashed — git reset to revert, try a different idea

The key insight is that the git branch only contains improvements (each commit is guaranteed better than the last), while results.tsv records the full history of all experiments including dead ends. This gives you a clean chain of improvements you can review, plus a complete log for analysis.
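The logging step can be sketched in shell. The file layout and column order below are illustrative, not prescribed:

```shell
# Append one experiment record to results.tsv
# Columns (illustrative): commit, metric, decision, description
COMMIT="a1b2c3d"    # in practice: COMMIT=$(git rev-parse --short HEAD)
printf '%s\t%s\t%s\t%s\n' "$COMMIT" "1.432" "keep" "increased hidden dim to 512" >> results.tsv
```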

This pattern works for any ML problem — not just GPT training. You can apply it to hyperparameter sweeps, architecture search, data augmentation experiments, or any iterative research workflow.

Here's how to replicate this with jl:

# 1. Create a dedicated instance for experiments
jl create --gpu A100 --storage 200 --name "auto-research" --yes

# 2. Create a branch for this research session
git checkout -b autoresearch/session-1

# 3. Run baseline to establish initial metric
jl run . --script train.py --on <machine_id> \
--requirements requirements.txt --json --yes

# 4. Wait for it, then check results
sleep 15 && jl run logs <run_id> --tail 50

# The agent then loops autonomously:

# 5. Edit train.py with an idea, commit, and run
jl run . --script train.py --on <machine_id> \
--requirements requirements.txt --json --yes

# 6. Check results
sleep 15 && jl run logs <run_id> --tail 30
# ... then steady polling
sleep 120 && jl run logs <run_id> --tail 50

# 7. Extract metric from logs and append to results.tsv
# Format: commit | val_metric | memory_gb | status | description
# e.g.: a1b2c3d | 1.432 | 12.5 | keep | increased hidden dim to 512

# 8. If improved: keep the commit, loop back to step 5
# If worse: git reset to revert, loop back to step 5
# If crashed: log as crash, fix or try something else

# 9. When done, pause the instance
jl pause <machine_id>

With 5-minute experiments, the agent can run ~12 experiments per hour — roughly 100 experiments in an overnight session. Check results.tsv and git log the next morning to see what your agent discovered.

tip

To get started, install agent skills with jl setup --agents all, then ask your agent something like: "Run a hyperparameter sweep comparing learning rates 1e-3, 1e-4, and 1e-5 on an A100 using my training script." The agent will handle the rest.