JarvisLabs CLI

New CLI — Part of the jarvislabs package

The jl CLI is part of the new jarvislabs package, replacing the deprecated jlclient. If you're still using jlclient, see the migration note.

The jl command-line tool lets you manage GPU instances, run training scripts, transfer files, and monitor experiments on JarvisLabs.ai — all from your terminal. It's built to work seamlessly with AI coding agents like Claude Code, Codex, Cursor, and OpenCode, so your agent can spin up GPUs, run experiments, and monitor results autonomously.

Package: jarvislabs | CLI command: jl | Version: 0.2.x (beta)

Beta Software

The jl CLI is in beta. Commands and options may change between releases. Pin your version in CI/automation scripts and check the changelog when upgrading.

Platform Support

Linux and macOS are fully supported. Windows is experimental and not fully tested — if you run into issues, please report them.

Want to see what the CLI can do?

Jump to the Examples section for end-to-end workflows covering training runs, agent automation, filesystem management, and more.

Installation

The package is currently in beta. Install with the --pre flag to get the latest prerelease.

uv tool install --pre jarvislabs

To upgrade:

uv tool upgrade --pre jarvislabs

With pip

pip install --pre jarvislabs

After installation, the jl command is available in your terminal.

What does jl setup do?

Run jl setup from your terminal once after installing. It walks you through:

  1. Authentication — prompts for your API token (get one from jarvislabs.ai/settings/api-keys) and saves it locally
  2. Account status — shows your current balance and active instances
  3. Agent skill installation — asks which AI coding agents you use (Claude Code, Codex, Cursor, OpenCode) and installs skill files for them with your approval, so your agent knows how to use jl out of the box

Exploring the CLI with --help

Every command and subcommand supports --help. It's the quickest way to see what's available, what flags a command takes, and what they do. You can pretty much learn the entire CLI from help alone.

jl --help                   # top-level commands
jl instance --help          # all instance subcommands
jl run --help               # run options, targets, lifecycle flags
jl instance create --help   # every flag for creating an instance

Quick Start

There are two main ways to use the CLI, depending on how much control you need.

Path 1: Run a script directly on a fresh GPU

The fastest way to get started. This creates a GPU instance, uploads your code, installs dependencies, runs the script, and pauses the instance when done — all in one command.

# One-time setup
jl setup

# Check your balance and make sure you're good to go
jl status

# See which GPUs are currently available and their pricing
jl gpus

# Run a single training script on a fresh RTX5000
# Creates instance, uploads train.py, installs requirements, runs it, pauses when done
jl run train.py --gpu RTX5000 --requirements requirements.txt -- --epochs 50

# Or if you have a project directory, sync the whole thing
# This uploads your directory, creates a venv, installs deps, and runs the entrypoint
jl run . --script train.py --gpu A100 --requirements requirements.txt

# You can also run setup commands or a setup script before your main command
jl run . --script train.py --gpu A100 \
  --requirements requirements.txt \
  --setup "pip install flash-attn" \
  --setup-file setup.sh

# The CLI streams logs by default. Once the run finishes, the instance is auto-paused.
# If you detached (Ctrl+C) or used --no-follow, you can check logs anytime:
jl run logs <run_id> --tail 50

# Check the final status of your run
jl run status <run_id>

Path 2: Manage instances yourself

If you want more control — SSH access, reusing machines across runs, attaching filesystems, or interactive debugging — create and manage instances directly.

# One-time setup
jl setup

# See available GPUs and pricing
jl gpus

# Create a GPU instance with 100 GB storage
jl instance create --gpu A100 --storage 100 --name "my-experiment"

# List your instances to get the machine ID
jl instance list

# SSH into your instance for interactive work
jl instance ssh <machine_id>

# Or upload and run a script on it
jl run train.py --on <machine_id>

# Check logs while the run is going
jl run logs <run_id> --tail 50

# Upload additional files to the instance
jl instance upload <machine_id> ./data /home/data

# Download results when you're done
jl instance download <machine_id> /home/results ./results -r

# Pause when you're done - stops compute billing, keeps your data
jl instance pause <machine_id>

# Later, resume with the same or a different GPU
jl instance resume <machine_id> --gpu RTX5000

# When you're completely done, destroy to stop all billing (including storage)
jl instance destroy <machine_id>

Authentication

Get your API token from jarvislabs.ai/settings/api-keys.

Interactive setup

jl setup

This authenticates, optionally installs agent skills, shows your account status, and displays a getting-started guide.

Non-interactive setup

jl setup --token YOUR_TOKEN --yes

tip

Without --yes, jl setup will still prompt for agent-skill installation even when --token is provided. Use --agents all or --yes to make setup fully non-interactive.

Environment variable

export JL_API_KEY="YOUR_TOKEN"

Token precedence

Both the CLI and SDK use the same resolution chain:

| Priority | Method | Used by |
|---|---|---|
| 1 | Client(api_key="...") argument | SDK only |
| 2 | JL_API_KEY environment variable | CLI + SDK |
| 3 | Config file (saved by jl setup) | CLI + SDK |

See Config file location below for config paths. See the SDK Authentication docs for more details.
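The env-over-config part of this chain can be sketched in plain shell — the snippet below is an illustration of the resolution order, not the CLI's actual implementation, and config_token stands in for the value jl setup saved:

```shell
# JL_API_KEY, when set, wins over the saved config-file value.
unset JL_API_KEY
config_token="tok-from-config"            # stand-in for the config-file value
resolved="${JL_API_KEY:-$config_token}"
first="$resolved"                          # no env var yet: config wins

JL_API_KEY="tok-from-env"
resolved="${JL_API_KEY:-$config_token}"    # env var set: it takes priority
echo "$first -> $resolved"
```

This is also why a one-off `JL_API_KEY=... jl status` invocation can override your saved login for a single command.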

Config file location

The config file is stored via platformdirs:

  • Linux: ~/.config/jl/config.toml
  • macOS: ~/Library/Application Support/jl/config.toml

Removing saved credentials

jl logout

Global Flags

These flags are available on most commands (exceptions noted below):

| Flag | Description |
|---|---|
| --json | Output as machine-readable JSON (to stdout). Human-readable output goes to stderr. |
| --yes / -y | Skip all confirmation prompts. |
| --version | Print version and exit (root-level: jl --version). |

info

--json and --yes are command-level options, not root-level — so jl instance list --json works correctly. Most commands support --json. --yes is only available on commands that have confirmation prompts (create, pause, resume, destroy, rename, run start, etc.). jl setup supports --yes but not --json. Read-only commands like jl gpus and jl run logs do not accept --yes.


Account Commands

jl setup

Set up the JarvisLabs CLI: authenticate and install agent skills.

| Option | Short | Description |
|---|---|---|
| --token | -t | API token (skips interactive prompt) |
| --agents | | Comma-separated agent list: claude-code, codex, cursor, opencode, or all |
| --yes | -y | Skip confirmation prompts; auto-selects all agents |

# Interactive setup
jl setup

# Non-interactive with token and all agent skills
jl setup --token YOUR_TOKEN --agents all --yes

# Install skills for specific agents only
jl setup --agents claude-code,cursor

If already authenticated, jl setup will show your current login and ask to re-authenticate. The --agents flag controls which coding agent skill files are installed:

| Agent | Skill file path |
|---|---|
| claude-code | ~/.claude/skills/jarvislabs/SKILL.md |
| codex | ~/.agents/skills/jarvislabs/SKILL.md |
| cursor | ~/.cursor/skills/jarvislabs/SKILL.md |
| opencode | ~/.config/opencode/skills/jarvislabs/SKILL.md |

jl logout

Remove the saved API token from the config file. Supports --json for scripted usage.

jl logout

jl status

Show account info: name, user ID, balance, grants, and running/paused instance counts.

jl status
jl status --json

info

JSON output includes additional fields not shown in the human-readable table: running VMs, paused VMs, active deployments, filesystems, and billing currency.

jl gpus

Show GPU types with availability, region, VRAM, RAM, CPUs, and hourly pricing. Available GPUs are marked with a green dot, unavailable with a dim circle.

jl gpus
jl gpus --json

jl templates

List available framework templates that can be used with --template when creating instances (e.g. pytorch, tensorflow, jax, vm).

jl templates
jl templates --json

Regions & GPUs

JarvisLabs has three regions, each with different GPU types available. When creating an instance, the CLI auto-selects the best region based on your chosen GPU — or you can pin a specific region with --region.

| Region | Available GPUs |
|---|---|
| IN1 | RTX5000, A5000Pro, A6000, RTX6000Ada, A100 |
| IN2 | L4, A100, A100-80GB |
| EU1 | H100, H200 |

Run jl gpus to see real-time availability and pricing for each GPU type.

Storage & Template Constraints

  • EU1 region: supports 1 or 8 GPUs per instance only, 100 GB minimum storage (auto-bumped if you specify less)
  • VM template: 100 GB minimum storage (auto-bumped if you specify less)
  • VM template is only available in IN2 and EU1 regions, and requires at least one SSH key registered

Instance Commands

All instance commands live under jl instance. Here's how you can manage the full lifecycle of GPU instances — from creation to teardown.

jl instance list

List all your instances with their ID, name, status, GPU type, GPU count, storage, region, cost, and template.

jl instance list
jl instance list --json

jl instance get <machine_id>

Show full details of a specific instance including SSH command, notebook URL, HTTP ports, and endpoint URLs.

jl instance get 12345
jl instance get 12345 --json

jl instance create

Create a new GPU instance. The command blocks until the instance reaches Running status, so when it returns, your instance is ready to use.

| Option | Short | Default | Description |
|---|---|---|---|
| --gpu | -g | (required) | GPU type (run jl gpus to see options) |
| --template | -t | pytorch | Framework template (run jl templates to see options) |
| --storage | -s | 40 | Storage in GB |
| --name | -n | "Name me" | Instance name (max 40 characters) |
| --num-gpus | | 1 | Number of GPUs |
| --region | | | Region pin (e.g. IN1, IN2, EU1) |
| --http-ports | | | Comma-separated HTTP ports to expose (e.g. 7860,8080) |
| --script-id | | | Startup script ID to run on launch |
| --script-args | | | Arguments passed to the startup script |
| --fs-id | | | Filesystem ID to attach |
| --yes | -y | | Skip confirmation |
| --json | | | Output as JSON |

# Basic instance
jl instance create --gpu RTX5000

# H100 with more storage and a name
jl instance create --gpu H100 --storage 200 --name "training-box"

# With a startup script and filesystem
jl instance create --gpu A100 --script-id 42 --fs-id 10

# Pin to a region
jl instance create --gpu A100 --region EU1

# Expose HTTP ports
jl instance create --gpu RTX5000 --http-ports "7860,8080"

# VM instance (requires SSH key - add one first with jl ssh-key add)
jl instance create --gpu H100 --template vm --name "my-vm"

# Non-interactive
jl instance create --gpu RTX5000 --yes --json

Prompts for confirmation unless --yes is passed. See Regions & GPUs for which GPUs are available in each region and storage constraints.

jl instance pause <machine_id>

Pause a running instance. Compute billing stops; a small storage cost continues.

| Option | Short | Description |
|---|---|---|
| --yes | -y | Skip confirmation |
| --json | | Output as JSON |

jl instance pause 12345
jl instance pause 12345 --yes --json

jl instance resume <machine_id>

Resume a paused instance. You can also use this opportunity to change the GPU type, expand storage, rename the instance, or attach a different startup script or filesystem. The command blocks until the instance is running again.

| Option | Short | Description |
|---|---|---|
| --gpu | -g | Resume with a different GPU type |
| --num-gpus | | Change number of GPUs |
| --storage | -s | Expand storage in GB (can only increase, never shrink) |
| --name | -n | Rename instance on resume |
| --script-id | | Startup script ID to run on resume |
| --script-args | | Arguments for the startup script |
| --fs-id | | Filesystem ID to attach |
| --yes | -y | Skip confirmation |
| --json | | Output as JSON |

# Resume with defaults
jl instance resume 12345

# Resume with a bigger GPU
jl instance resume 12345 --gpu H100

# Resume with more storage and a new name
jl instance resume 12345 --storage 200 --name "upgraded"

Region Lock & ID Changes

Resume is region-locked — an instance always resumes in its original region. If you request a GPU type not available in that region, the API returns an error.

Resume may also assign a new machine ID. The CLI warns you when this happens. Always use the returned ID for subsequent operations.
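Because the ID can change, scripts should always re-capture it from the resume output rather than reusing the old one. A minimal sketch — the sample payload stands in for real `jl instance resume <id> --yes --json` output, and the machine_id field name is assumed to mirror the create output parsed elsewhere on this page:

```shell
# Re-capture the machine ID after resume, since resume may assign a new one.
resume_out='{"machine_id": 67890}'   # stand-in for: jl instance resume 12345 --yes --json
MACHINE_ID=$(printf '%s' "$resume_out" | python3 -c 'import json,sys; print(json.load(sys.stdin)["machine_id"])')
echo "resumed as machine $MACHINE_ID"
```

Use $MACHINE_ID (not the pre-resume ID) for every subsequent jl instance command.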

jl instance destroy <machine_id>

Permanently delete an instance and all its data.

warning

This action is irreversible. All data on the instance is lost. If you need to keep data across instances, use a filesystem.

| Option | Short | Description |
|---|---|---|
| --yes | -y | Skip confirmation |
| --json | | Output as JSON |

jl instance destroy 12345
jl instance destroy 12345 --yes --json

jl instance rename <machine_id>

Rename an instance.

| Option | Short | Description |
|---|---|---|
| --name | -n | New instance name (required, max 40 characters) |
| --yes | -y | Skip confirmation |
| --json | | Output as JSON |

jl instance rename 12345 --name "experiment-v2"

SSH, Exec & File Transfer

These commands let you interact directly with running instances — open a shell, run commands remotely, or transfer files back and forth.

jl instance ssh <machine_id>

SSH into a running instance. This opens an interactive shell session.

| Option | Short | Description |
|---|---|---|
| --print-command | -p | Print the raw SSH command to stdout instead of connecting |
| --json | | Output the SSH command as JSON |

# Interactive session
jl instance ssh 12345

# Get the SSH command for use in scripts
jl instance ssh 12345 --print-command

The instance must be in Running status. If paused, you'll be told to resume it first.

tip

--print-command and --json output the stored SSH command regardless of instance status — useful for scripting and automation.

jl instance exec <machine_id> -- <command>

Run a command on a running instance and stream the output back to your terminal. The -- separator is required so jl can distinguish your remote command from its own flags.

| Option | Short | Description |
|---|---|---|
| --json | | Capture output as JSON with stdout, stderr, and exit_code fields |

# Check GPU
jl instance exec 12345 -- nvidia-smi

# Run Python
jl instance exec 12345 -- python -c "import torch; print(torch.cuda.device_count())"

# List files
jl instance exec 12345 -- ls -la /home

# Use shell features (pipes, redirection) - wrap in sh -lc
jl instance exec 12345 -- sh -lc 'grep "loss" /home/output.log | tail -5'

# Structured output for scripting
jl instance exec 12345 --json -- nvidia-smi

The exit code of the remote command is propagated as the exit code of jl instance exec.

tip

If your remote command uses pipes, redirection, or other shell features, wrap it in sh -lc '...' as shown above. Without the wrapper, each argument is treated as a separate command argument rather than shell syntax.
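Since the remote exit code propagates, you can branch on `jl instance exec` directly in shell. A sketch of the pattern — here remote_check() is a stand-in simulating a call like `jl instance exec <machine_id> -- test -f /home/results/model.pt`, so the snippet is self-contained:

```shell
# Branch on a remote command's exit status. Thanks to exit-code propagation,
# a real `jl instance exec` call behaves exactly like this stand-in.
remote_check() {
  return 1   # simulate: the remote file is not there yet
}

if remote_check; then
  result="results ready"
else
  result="results missing"
fi
echo "$result"
```

In a real workflow you would follow the "results ready" branch with `jl instance download`.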

jl instance upload <machine_id> <source> [dest]

Upload a local file or directory to a running instance. If no remote destination is given, it uploads to the instance's home directory (/home/ for containers, /home/<user>/ for VMs).

Directories are uploaded recursively automatically.

| Option | Short | Description |
|---|---|---|
| --json | | Output upload result as JSON |

# Upload a file (lands at /home/data.csv)
jl instance upload 12345 ./data.csv

# Upload a directory (lands at /home/my-project/)
jl instance upload 12345 ./my-project

# Upload to a specific remote path
jl instance upload 12345 ./config.yaml /home/config.yaml

jl instance download <machine_id> <source> [dest] [-r]

Download a file or directory from a running instance. If no local destination is given, it saves to ./<filename> in the current directory.

| Option | Short | Description |
|---|---|---|
| --recursive | -r | Download directories recursively |
| --json | | Output download result as JSON |

# Download a file (saves to ./results.csv)
jl instance download 12345 /home/results.csv

# Download to a specific local path
jl instance download 12345 /home/results.csv ./my-results.csv

# Download a directory
jl instance download 12345 /home/outputs ./local-outputs -r

Managed Runs

Managed runs are the fastest way to run scripts on GPU instances. A single jl run command handles uploading your code, setting up a Python virtual environment (via uv), installing requirements (when specified with --requirements), running your command in the background, and tracking logs.

Runs persist in the background even if you disconnect or close your terminal. Logs, status, and lifecycle are tracked locally in ~/.jl/runs/.

Run Targets

| Target | What happens |
|---|---|
| train.py | Uploads the single file, runs python3 train.py |
| run.sh | Uploads the single file, runs bash run.sh |
| . or ./my-project | Syncs the directory via rsync, runs --script inside it (requires rsync installed locally) |
| (no target) | Runs the command given after -- directly on the instance |

Only .py and .sh file targets are supported directly. For other file types, use a directory target or jl instance upload + jl instance exec.

Starting a Run on an Existing Instance

jl run <target> --on <machine_id> [options] [-- extra args]
# Run a Python file
jl run train.py --on 12345

# Upload a directory and run a script inside it
jl run . --script train.py --on 12345

# Pass arguments to your script
jl run train.py --on 12345 -- --epochs 50 --lr 0.001

# Run an arbitrary remote command (no upload)
jl run --on 12345 -- python -c "print('hello from GPU')"

Starting a Run on a Fresh Instance

jl run <target> --gpu <gpu_type> [options] [-- extra args]

This creates a new instance, uploads your code, runs the command, and handles instance lifecycle when done.

# Run on a fresh RTX5000
jl run train.py --gpu RTX5000

# With requirements
jl run . --script train.py --gpu A100 --requirements requirements.txt

# Destroy instance after run (no leftover costs)
jl run train.py --gpu RTX5000 --destroy

# Keep instance running after run (for debugging)
jl run train.py --gpu RTX5000 --keep

You must use either --on or --gpu, not both.

All Start Options

| Option | Short | Default | Description |
|---|---|---|---|
| --on | | | Run on an existing instance (machine ID) |
| --gpu | -g | | Create a fresh instance with this GPU type |
| --script | | | Entrypoint script path inside a directory target |
| --template | -t | pytorch | Framework template (fresh instances only) |
| --storage | -s | 40 | Storage in GB (fresh instances only) |
| --name | -n | jl-run | Instance name (fresh instances only) |
| --num-gpus | | 1 | Number of GPUs (fresh instances only) |
| --region | | | Region pin, e.g. IN1, EU1 (fresh instances only) |
| --http-ports | | | Comma-separated HTTP ports to expose (fresh instances only) |
| --requirements | | | Local requirements file to upload and install |
| --setup | | | Shell command to run before the main command |
| --setup-file | | | Local bash file to upload and run before the main command |
| --follow / --no-follow | | --follow | Stream logs after starting the run |
| --pause | | | Pause fresh instance after the run (default for fresh) |
| --destroy | | | Destroy fresh instance after the run |
| --keep | | | Leave fresh instance running after the run |
| --yes | -y | | Skip confirmation prompts |
| --json | | | Output as JSON |

Setup Chain

For file and directory targets, the following setup steps run before your main command (chained with &&):

  1. uv installed if missing (via curl)
  2. .venv created if missing (via uv venv)
  3. .venv activated (via . .venv/bin/activate)
  4. Requirements installed if --requirements is provided (via uv pip install -r <file>)
  5. Setup file run if --setup-file is provided (via bash <file>)
  6. Setup command run if --setup is provided (the raw shell command)
  7. Main script runs

All steps are chained with &&, so if any step fails, subsequent steps (including your main command) will not run.

For command-mode runs (no target), only the --setup command is prepended — --requirements and --setup-file are not available.

# Full setup chain example
jl run . --script train.py --on 12345 \
  --requirements requirements.txt \
  --setup-file setup.sh \
  --setup "pip install flash-attn"
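The && chaining means one failed step stops everything after it. A plain-shell illustration of that short-circuit behaviour, with stand-in functions in place of the real setup steps:

```shell
# Each function stands in for one setup step; && chaining means a non-zero
# exit prevents every later step (including the main script) from running.
create_venv()  { echo "venv created"; }
install_reqs() { echo "install failed"; return 1; }
run_script()   { echo "training started"; }

create_venv && install_reqs && run_script
chain_status=$?
echo "chain exited with $chain_status"
```

Here "training started" is never printed: the failing install short-circuits the chain, just as a failed `uv pip install` would prevent your script from launching.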

Lifecycle Flags (Fresh Instances Only)

When creating a fresh instance with --gpu, these flags control what happens after the run completes:

| Flag | Behavior |
|---|---|
| --pause | Pause the instance after the run (default for fresh instances) |
| --destroy | Destroy the instance — no leftover costs |
| --keep | Leave the instance running (for debugging or follow-up work) |

Only one lifecycle flag can be used at a time. These flags cannot be used with --on (existing instances are not touched after the run).

Detaching from fresh instances

--no-follow for fresh instances requires --keep. Since --pause and --destroy need the CLI to be connected when the run ends to perform the lifecycle action, they are incompatible with --no-follow. If you detach (Ctrl+C or --no-follow), the automatic lifecycle action will not happen — the instance stays running and billing continues. Manage it manually with jl instance pause or jl instance destroy.

Follow vs No-Follow

By default, jl run streams logs after starting (--follow). Press Ctrl+C to detach — the run keeps going in the background. Without --tail, --follow initially shows the last 20 lines before streaming new output.

# Default: stream logs, auto-pause when done
jl run train.py --gpu RTX5000

# Detached: start and return immediately (requires --keep for fresh instances)
jl run train.py --gpu RTX5000 --keep --no-follow

# Detached on existing instance (no lifecycle flag needed)
jl run train.py --on 12345 --no-follow

jl run logs <run_id>

View logs from a managed run.

| Option | Short | Description |
|---|---|---|
| --follow | -f | Stream logs in real time (press Ctrl+C to stop) |
| --tail | -n | Show only the last N lines (minimum: 1) |
| --json | | Output as JSON with content and run_exit_code fields |

# Full log output
jl run logs r_abc123

# Last 50 lines
jl run logs r_abc123 --tail 50

# Stream logs live
jl run logs r_abc123 --follow

# Stream with initial context
jl run logs r_abc123 --follow --tail 100

# JSON output with exit code (for scripting/agents)
jl run logs r_abc123 --tail 50 --json

JSON output fields:

| Field | Description |
|---|---|
| run_id | The run identifier |
| machine_id | Instance the run is on |
| remote_log | Path to the log file on the remote instance |
| content | The log text (last N lines if --tail used, full log otherwise) |
| run_exit_code | null = still running, 0 = succeeded, non-zero = failed |

info

--json is not supported with --follow. Without --tail, the entire log file is returned — this can be very large for long-running jobs.
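For scripting, the run_exit_code field maps cleanly onto a done/not-done check. A sketch of the pattern — the sample payload stands in for real `jl run logs <run_id> --tail 1 --json` output:

```shell
# null run_exit_code means the run is still going; anything else is final.
payload='{"run_id": "r_abc123", "run_exit_code": null}'
state=$(printf '%s' "$payload" | python3 -c '
import json, sys
code = json.load(sys.stdin)["run_exit_code"]
print("running" if code is None else "done:%d" % code)')
echo "run state: $state"
```

A polling loop would repeat this check at an interval and exit once the state is no longer "running".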

Non-JSON output shows raw logs with a header and footer indicating run state:

--- run r_abc123 | machine 12345 | running ---

step=100 loss=2.31
step=200 loss=2.11

--- still running | log: /home/jl-runs/r_abc123/output.log ---

jl run status <run_id>

Show the current state of a run.

| Option | Short | Description |
|---|---|---|
| --json | | Output as JSON |

Possible states: running, succeeded, failed, instance-paused, instance-pausing, instance-missing, instance-creating, instance-resuming, instance-destroying, instance-failed, unknown.

jl run status r_abc123
jl run status r_abc123 --json

jl run stop <run_id>

Stop a managed run by sending SIGTERM to its process group. The instance itself is not affected.

| Option | Short | Description |
|---|---|---|
| --json | | Output as JSON |

jl run stop r_abc123
jl run stop r_abc123 --json

If the process doesn't exit after SIGTERM, it escalates to SIGKILL. If the run has already finished, it reports the final state without error.

jl run list

List all locally tracked managed runs (most recent first).

| Option | Short | Description |
|---|---|---|
| --refresh | | Check live status for each run by querying the instance (slower) |
| --machine | -m | Filter by instance ID |
| --limit | -l | Show only the N most recent runs |
| --status | -s | Filter by state (e.g. running, succeeded, failed). Implies --refresh. |
| --json | | Output as JSON |

# All runs (shows "saved" state without live check)
jl run list

# With live status refresh
jl run list --refresh

# Filter by instance
jl run list --machine 12345

# Most recent 5 runs
jl run list --limit 5

# Only running jobs
jl run list --status running

# For scripting
jl run list --refresh --json

Without --refresh, the state column shows saved (from the local record). Use --refresh or --status to query each instance for live state. Using --status automatically implies --refresh.

Implicit start Subcommand

jl run <target> is shorthand for jl run start <target>. The start subcommand is implied when the first argument isn't a known subcommand (list, status, logs, stop).

# These are equivalent:
jl run train.py --gpu RTX5000
jl run start train.py --gpu RTX5000

Run Tracking is Local

info

All run management commands (jl run logs, jl run status, jl run stop, jl run list) depend on local records stored under ~/.jl/runs/. You need to start and monitor runs from the same machine. If the local record is missing, the run_id alone is not enough to interact with the run.

Each run record is a JSON file at ~/.jl/runs/<run_id>.json containing the machine ID, remote log path, PID file path, exit code path, and launch command.
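If you script against these records, parse them rather than regex the raw text. A sketch — the JSON here is a stand-in with illustrative field names (the source only lists what the record contains, not its exact keys), so inspect an actual ~/.jl/runs/<run_id>.json before relying on them:

```shell
# Read a local run record (hypothetical field names for illustration).
record='{"machine_id": 12345, "remote_log": "/home/jl-runs/r_abc123/output.log"}'
machine=$(printf '%s' "$record" | python3 -c 'import json,sys; print(json.load(sys.stdin)["machine_id"])')
echo "run lives on machine $machine"
```

In practice the stand-in string would be replaced with `cat ~/.jl/runs/<run_id>.json`.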


SSH Key Commands

SSH keys are required if you want to use the VM template (bare-metal SSH access without a pre-configured container). You can manage your keys with jl ssh-key.

jl ssh-key list

List all SSH keys (ID, name, and truncated key).

jl ssh-key list
jl ssh-key list --json

jl ssh-key add <pubkey_file>

Add an SSH public key.

| Option | Short | Description |
|---|---|---|
| --name | -n | Name for this key (required) |
| --json | | Output as JSON |

jl ssh-key add ~/.ssh/id_ed25519.pub --name "my-laptop"

jl ssh-key remove <key_id>

Remove an SSH key.

| Option | Short | Description |
|---|---|---|
| --yes | -y | Skip confirmation |
| --json | | Output as JSON |

jl ssh-key remove abc123

Startup Script Commands

Startup scripts are shell scripts that run automatically whenever an instance launches or resumes — useful for installing dependencies, pulling data, or setting up your environment. You can manage them with jl scripts.

jl scripts list

List startup scripts (ID and name).

jl scripts list
jl scripts list --json

jl scripts add <script_file>

Add a startup script.

| Option | Short | Description |
|---|---|---|
| --name | -n | Script name (defaults to filename without extension) |
| --json | | Output as JSON |

jl scripts add ./setup.sh --name "install-deps"

jl scripts update <script_id> <script_file>

Replace the contents of an existing startup script.

| Option | Short | Description |
|---|---|---|
| --json | | Output as JSON |

jl scripts update 42 ./setup-v2.sh

jl scripts remove <script_id>

Remove a startup script.

| Option | Short | Description |
|---|---|---|
| --yes | -y | Skip confirmation |
| --json | | Output as JSON |

jl scripts remove 42

Filesystem Commands

Filesystems are persistent storage volumes that survive instance pause, resume, and even destroy cycles. They're ideal for datasets, model checkpoints, or any data you want to reuse across multiple instances. You can manage them with jl filesystem.

jl filesystem list

List filesystems (ID, name, storage).

jl filesystem list
jl filesystem list --json

jl filesystem create

Create a new filesystem.

| Option | Short | Description |
|---|---|---|
| --name | -n | Filesystem name (required, max 30 characters) |
| --storage | -s | Storage in GB (required, 50–2048) |
| --yes | -y | Skip confirmation |
| --json | | Output as JSON |

jl filesystem create --name "datasets" --storage 200

jl filesystem edit <fs_id>

Expand filesystem storage. Can only increase, never shrink.

| Option | Short | Description |
|---|---|---|
| --storage | -s | New storage size in GB (required) |
| --yes | -y | Skip confirmation |
| --json | | Output as JSON |

jl filesystem edit 10 --storage 500

info

edit may return a new filesystem ID. Always use the returned value for subsequent operations.

jl filesystem remove <fs_id>

Delete a filesystem.

| Option | Short | Description |
|---|---|---|
| --yes | -y | Skip confirmation |
| --json | | Output as JSON |

jl filesystem remove 10

JSON Mode for Scripting

Most commands support --json for machine-readable output. JSON goes to stdout; human-readable status messages go to stderr.

# Instance list as JSON
jl instance list --json

# Create and capture the machine ID
RESULT=$(jl instance create --gpu RTX5000 --yes --json)
MACHINE_ID=$(echo "$RESULT" | jq .machine_id)

# GPU availability pipeline
jl gpus --json | jq '.[] | select(.num_free_devices > 0) | .gpu_type'

# Run status in scripts
jl run status r_abc123 --json | jq .state

# Check if a run is still going
EXIT_CODE=$(jl run logs r_abc123 --tail 1 --json | jq .run_exit_code)

When --json is active:

  • Spinners and progress indicators are suppressed
  • Errors from jl itself (bad arguments, auth failures, etc.) are emitted as {"error": "..."} to stdout. Commands like jl instance exec --json return their own structured payload (with exit_code, stdout, stderr) even on non-zero exit
  • Exit codes are still set appropriately
  • For jl run start, --json returns immediately after the run is started (before log streaming), so lifecycle flags (--pause, --destroy) will not execute — use --keep when combining --json with fresh instances

tip

--json does not suppress confirmation prompts. Always use --yes alongside --json in scripts and agent workflows.
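A robust script should also check for the {"error": ...} payload before parsing a normal result. A sketch of that check — the sample string stands in for captured `jl ... --json` output:

```shell
# Distinguish an {"error": ...} payload from a normal result before parsing
# further fields out of it.
out='{"error": "authentication failed"}'   # stand-in for: $(jl instance create ... --yes --json)
err=$(printf '%s' "$out" | python3 -c 'import json,sys; print(json.load(sys.stdin).get("error", ""))')
if [ -n "$err" ]; then
  status="failed: $err"
else
  status="ok"
fi
echo "$status"
```

Combined with the exit code, this lets a script fail fast with a useful message instead of choking on a missing field.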


Shell Completion

Enable tab completion for your shell:

jl --install-completion

Supports bash, zsh, and fish.


Using with AI Coding Agents

One of the primary use cases for the jl CLI is letting AI coding agents manage GPU infrastructure on your behalf. Instead of manually creating instances, uploading code, and monitoring runs, you can let your agent handle the entire workflow — from provisioning a GPU to downloading results.

The CLI supports four major coding agents: Claude Code, Codex, Cursor, and OpenCode. During jl setup, you'll be asked which agents you use, and skill files are installed automatically to teach your agent how to use jl effectively.

Agent Setup

# Interactive: authenticates and asks which agents to install skills for
jl setup

# Non-interactive: installs skills for all supported agents
jl setup --token YOUR_TOKEN --agents all --yes

# Install skills for specific agents only
jl setup --agents claude-code,cursor
tip

Once skills are installed, your coding agent already knows how to use jl. Try asking it: "Spin up an A100, run my training script, and download the results when it's done."

Mental Model

| Concept | CLI | Purpose |
|---|---|---|
| Instance | jl instance | A machine — create, pause, resume, destroy, SSH into |
| Run | jl run | A managed job with log file + PID tracking |
| Exec | jl instance exec | Quick one-off commands for system checks and debugging |

Core Rules for Agent Workflows

  1. Always use --yes on commands with confirmation prompts (create, pause, resume, destroy, run start) — agents can't answer interactive prompts
  2. Use --json on commands where the agent needs to parse output (create, gpus, run start, instance list). For jl run logs, the default output is designed for agents — the header/footer shows run ID, machine ID, and state in a readable format
  3. Always use --no-follow when starting runs — --follow blocks the agent indefinitely
  4. Always use --tail N when reading logs — full logs can be enormous
  5. Do an early failure check — wait 15s after starting a run and check logs once. This catches fast failures (import errors, missing files, pip issues) before committing to a long polling loop
  6. Then poll at steady intervals — 60-120s for short jobs, 180-600s for long training runs

The Agent Monitoring Loop

This is the primary pattern for running and monitoring GPU jobs:

# 1. Start a detached run
jl run train.py --on <machine_id> --no-follow --yes --json
# returns {"run_id": "r_abc123", ...}

# 2. Early failure check - catches import errors, bad paths, pip failures fast
sleep 15 && jl run logs r_abc123 --tail 30

# 3. If still running, poll at steady intervals
sleep 120 && jl run logs r_abc123 --tail 50

# The log output shows a header and footer with run state:
# --- run r_abc123 | machine 12345 | running ---
# <log output>
# --- still running | log: /home/jl-runs/r_abc123/output.log ---
#
# When done:
# --- run r_abc123 | machine 12345 | succeeded (exit 0) ---
# <log output>
# --- succeeded | exit code: 0 | log: /home/jl-runs/r_abc123/output.log ---
#
# On failure:
# --- run r_abc123 | machine 12345 | failed (exit 1) ---
# <log output>
# --- failed | exit code: 1 | log: /home/jl-runs/r_abc123/output.log ---

The log output is the primary monitoring primitive — the header gives you the run ID and machine ID, and the footer tells you whether the run is still going or finished (with exit code).
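An agent (or script) can extract the state mechanically from that footer. A sketch — the multi-line string stands in for real `jl run logs <run_id> --tail 50` output:

```shell
# Extract run state from the log footer line.
logs='--- run r_abc123 | machine 12345 | running ---
step=100 loss=2.31
--- still running | log: /home/jl-runs/r_abc123/output.log ---'

footer=$(printf '%s\n' "$logs" | tail -n 1)
case "$footer" in
  *"still running"*) state="running" ;;
  *succeeded*)       state="succeeded" ;;
  *failed*)          state="failed" ;;
  *)                 state="unknown" ;;
esac
echo "state: $state"
```

For structured workflows, `jl run logs --json` (with its run_exit_code field) is the less fragile choice; footer parsing is mainly useful when the human-readable stream is all you have.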

Agent Workflow Example (End-to-End)

# 1. Check GPU availability
jl gpus --json

# 2. Create an instance
jl instance create --gpu RTX5000 --storage 50 --yes --json
# returns {"machine_id": 12345, ...}

# 3. Start a detached run
jl run . --script train.py --on 12345 --requirements requirements.txt --no-follow --yes --json
# returns {"run_id": "r_abc123", ...}

# 4. Early failure check - catches crashes fast
sleep 15 && jl run logs r_abc123 --tail 30

# 5. If still running, poll at steady intervals (repeat until footer shows exit code)
sleep 120 && jl run logs r_abc123 --tail 50

# 6. Download results
jl instance download 12345 /home/results ./results -r

# 7. Clean up
jl instance pause 12345 --yes --json

Starting Runs on Fresh Instances (Agent Mode)

When the agent needs to create a fresh instance inline:

jl run . --script train.py --gpu RTX5000 --no-follow --keep --json --yes

Key points:

  • --keep is required with --no-follow for fresh instances (the CLI will error without it)
  • The agent must manually pause or destroy the instance after the run completes
  • Additional fresh-instance flags: --template, --storage, --num-gpus, --region, --http-ports

Use separate jl instance create when you need to inspect GPU availability first, reuse machines across runs, or attach filesystems/scripts beforehand.

Quick System Checks with Exec

jl instance exec <id> --json -- nvidia-smi
jl instance exec <id> --json -- ps -ef
jl instance exec <id> --json -- df -h

For pipes or shell syntax, wrap in sh -lc:

jl instance exec <id> --json -- sh -lc 'grep "loss" /path/to/log | tail -5'
Skill files handle this for you

All of the patterns above — the monitoring loop, early failure checks, polling intervals, --no-follow, --tail, and more — are included in the skill files that jl setup installs for your agent. Once skills are installed, your agent already knows how to use jl correctly. You don't need to teach it these patterns yourself.

File Persistence Rules

The remote home directory (typically /home/ on containers, /home/<user>/ on VMs) persists across pause/resume cycles. Everything else is ephemeral.

Persists across pause/resume:

  • Files in the home directory (/home/ or /home/<user>/)
  • Uploaded directories: <home>/<directory_name>/
  • Uploaded files (via jl instance upload): <home>/<filename>
  • Run metadata: <home>/jl-runs/<run_id>/
  • .venv created inside the project directory
  • Attached filesystems

Lost on pause:

  • System-level installs (apt-get, global pip packages)
  • Files outside the home directory (/tmp, /root, etc.)

Use --setup, --requirements, or --setup-file to reinstall dependencies on each run.

Anti-Patterns

| Don't | Why |
| --- | --- |
| Use --follow when starting runs | Blocks the agent indefinitely; will time out |
| Omit --no-follow when starting runs | Default is --follow, which blocks |
| Use jl run logs --follow | Blocks forever; --json is also incompatible with --follow |
| Read full logs (omit --tail N) | Can return megabytes of output, overwhelming context |
| Poll every few seconds | Wasteful and noisy; use 60–600s intervals |
| Use lifecycle flags with --on | --keep, --pause, --destroy only apply to fresh instances |
| Forget to pause/destroy instances | They cost money while running |

Examples

Train on a fresh GPU, auto-pause when done

The simplest workflow — run a training script on a fresh GPU with dependencies. The instance is automatically paused when the script finishes, so you only pay for compute time.

jl run train.py --gpu RTX5000 --requirements requirements.txt -- --epochs 100
# Instance created > code uploaded > deps installed > training runs > instance paused

Run a project directory with setup

When your project has multiple files, sync the entire directory and specify the entrypoint with --script. The CLI uses rsync under the hood, so only changed files are transferred on subsequent runs — making re-runs on the same instance fast even with large projects. You can also run custom setup commands before training starts.

jl run . --script train.py --gpu A100 \
--requirements requirements.txt \
--setup "pip install flash-attn" \
-- --batch-size 32 --lr 1e-4

Multi-GPU training

For large-scale training, you can request multiple GPUs on a single instance. Check Regions & GPUs for available GPU counts per region.

# 8x H100 in EU1 for distributed training
jl instance create --gpu H100 --num-gpus 8 --region EU1 --storage 500 --name "distributed-training"

# Upload your project and run with torchrun for multi-GPU
jl run . --script train.py --on <machine_id> \
--requirements requirements.txt \
--setup "pip install flash-attn" \
-- --num_gpus 8

Long-running job with manual control

For jobs where you want full control — create an instance, start a detached run, monitor at your own pace, and clean up when done.

# Create an instance
jl instance create --gpu A100 --storage 200 --name "research"

# Sync project and start a background run (--no-follow detaches from logs)
jl run ./my-project --script train.py --on <machine_id> --no-follow

# Monitor later
jl run status <run_id>
jl run logs <run_id> --tail 100
jl run logs <run_id> --follow

# Pause when done
jl instance pause <machine_id>

Detached run on existing instance

Start a run and come back to check on it later — the run continues in the background even if you close your terminal.

# Start without following
jl run train.py --on <machine_id> --no-follow

# Check on it later
jl run logs <run_id> --tail 50

# Stop it if needed
jl run stop <run_id>

Persistent data with filesystems

Filesystems let you keep datasets and model checkpoints across instances. Create a filesystem once, attach it to any instance, and your data is always available — even after destroying the instance.

# Create a filesystem for datasets
jl filesystem create --name "datasets" --storage 500

# Create an instance with the filesystem attached
jl instance create --gpu A100 --fs-id <fs_id> --name "training"

# Run your training - the filesystem is attached and accessible on the instance
jl run train.py --on <machine_id>

# Done with training? Destroy the instance - data is safe in the filesystem
jl instance destroy <machine_id>

# Spin up a cheaper GPU for inference, same data
jl instance create --gpu RTX5000 --fs-id <fs_id> --name "inference"

VM workflow (bare metal SSH access)

VM instances give you a clean Linux machine with SSH access instead of a pre-configured container. You'll need to register an SSH key first.

# Add your SSH key
jl ssh-key add ~/.ssh/id_ed25519.pub --name "my-key"

# Create a VM instance (available in IN2 and EU1 only)
jl instance create --gpu H100 --template vm --name "my-vm"

# SSH in
jl instance ssh <machine_id>

Scripting with JSON and jq

Most commands support --json output (except jl setup), making it easy to build automation pipelines with jq.

# Get IDs of all running instances
jl instance list --json | jq '[.[] | select(.status == "Running") | .machine_id]'

# Find cheapest available GPU
jl gpus --json | jq '[.[] | select(.num_free_devices > 0)] | sort_by(.price_per_hour) | .[0].gpu_type'

# Pause all running instances
for id in $(jl instance list --json | jq -r '.[] | select(.status == "Running") | .machine_id'); do
jl instance pause "$id" --yes --json
done

# Check if a run is still going
jl run logs <run_id> --tail 1 --json | jq .run_exit_code
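The gpus filter above can be tried against a hand-written sample before wiring it into automation. The field names match the filter; the payload itself (GPU types, counts, prices) is illustrative, not real API output:

```shell
# Illustrative sample of `jl gpus --json` output (not real prices or availability)
sample='[
  {"gpu_type": "A100",    "num_free_devices": 0, "price_per_hour": 1.29},
  {"gpu_type": "RTX5000", "num_free_devices": 4, "price_per_hour": 0.39},
  {"gpu_type": "H100",    "num_free_devices": 2, "price_per_hour": 2.49}
]'

# Same filter as above: cheapest GPU type that still has free devices
echo "$sample" | jq -r '[.[] | select(.num_free_devices > 0)] | sort_by(.price_per_hour) | .[0].gpu_type'
# prints RTX5000 — A100 has no free devices, and RTX5000 is cheaper than H100
```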

Autonomous research with coding agents

One of the most powerful patterns is letting a coding agent drive the entire research loop autonomously. Andrej Karpathy's autoresearch is a great example of this — an AI agent autonomously edits training code, runs experiments, checks metrics, and iterates, accumulating only improvements. In Karpathy's own run, the agent evaluated ~700 experimental changes over 2 days, found ~20 additive improvements, and achieved an 11% reduction in Time-to-GPT-2.

The core loop works like this:

  1. Agent modifies train.py with an experimental idea and commits the change
  2. Agent runs the experiment on a GPU (via jl run)
  3. Agent reads the results from logs (via jl run logs) and extracts the target metric
  4. Agent logs the result — appends the commit hash, metric value, and a description to a results.tsv file so every experiment (successes and failures) is tracked
  5. If metrics improved — keep the commit, the branch advances
  6. If metrics got worse or the run crashed — git reset to revert, then try a different idea

The key insight is that the git branch only contains improvements (each commit is guaranteed better than the last), while results.tsv records the full history of all experiments including dead ends. This gives you a clean chain of improvements you can review, plus a complete log for analysis.

This pattern works for any ML problem — not just GPT training. You can apply it to hyperparameter sweeps, architecture search, data augmentation experiments, or any iterative research workflow.

Here's how to replicate this with jl:

# 1. Create a dedicated instance for experiments
jl instance create --gpu A100 --storage 200 --name "auto-research" --yes

# 2. Create a branch for this research session
git checkout -b autoresearch/session-1

# 3. Run baseline to establish initial metric
jl run . --script train.py --on <machine_id> \
--requirements requirements.txt --no-follow --yes

# 4. Wait for it, then check results
sleep 15 && jl run logs <run_id> --tail 50

# The agent then loops autonomously:

# 5. Edit train.py with an idea, commit, and run
jl run . --script train.py --on <machine_id> \
--requirements requirements.txt --no-follow --yes

# 6. Check results
sleep 15 && jl run logs <run_id> --tail 30
# ... then steady polling
sleep 120 && jl run logs <run_id> --tail 50

# 7. Extract metric from logs and append to results.tsv
# Format: commit | val_metric | memory_gb | status | description
# e.g.: a1b2c3d | 1.432 | 12.5 | keep | increased hidden dim to 512

# 8. If improved: keep the commit, loop back to step 5
# If worse: git reset to revert, loop back to step 5
# If crashed: log as crash, fix or try something else

# 9. When done, pause the instance
jl instance pause <machine_id>

With 5-minute experiments, the agent can run ~12 experiments per hour — roughly 100 experiments in an overnight session. Check results.tsv and git log the next morning to see what your agent discovered.
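Step 7 of the loop (extracting a metric from the logs and appending a results.tsv row) can be sketched in shell. The log line format, metric name, and memory value here are assumptions for illustration; in a real loop the commit hash would come from git rev-parse --short HEAD and the log from /home/jl-runs/<run_id>/output.log:

```shell
# Fake training log standing in for the run's output.log
printf 'epoch 1 val_loss 2.104\nepoch 2 val_loss 1.432\n' > /tmp/output.log

# Pull the last reported metric (assumes "val_loss <value>" lines in the log)
metric=$(grep -o 'val_loss [0-9.]*' /tmp/output.log | tail -1 | awk '{print $2}')

commit=a1b2c3d   # placeholder; real loop: commit=$(git rev-parse --short HEAD)
mem=12.5         # placeholder; could be parsed from nvidia-smi via jl instance exec

# Append one row: commit | val_metric | memory_gb | status | description
printf '%s\t%s\t%s\tkeep\t%s\n' "$commit" "$metric" "$mem" \
  "increased hidden dim to 512" >> /tmp/results.tsv
cat /tmp/results.tsv
```

Because every experiment appends a row regardless of outcome, results.tsv ends up as the complete record while the git branch keeps only the improvements.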

tip

To get started, install agent skills with jl setup --agents all, then ask your agent something like: "Run a hyperparameter sweep comparing learning rates 1e-3, 1e-4, and 1e-5 on an A100 using my training script." The agent will handle the rest.