Running NemoClaw on JarvisLabs with Your Own Model (No NVIDIA API Key Required)
NemoClaw is NVIDIA's open-source framework for running AI agents in secure sandboxes. By default, it routes inference through NVIDIA's cloud API, but you can run it entirely self-hosted on a JarvisLabs GPU instance with any open-source model; no NVIDIA API key is needed.
This tutorial walks through setting up NemoClaw on a JarvisLabs VM with Qwen 2.5 7B served via vLLM.
What You'll Get
- A sandboxed AI agent running on your own GPU
- Local LLM inference via vLLM (no API costs, no data leaving your machine)
- Landlock + seccomp + network namespace isolation for security
- Full control over which model powers the agent
Prerequisites
- A JarvisLabs account with the `jl` CLI installed and authenticated
- An SSH key registered with JarvisLabs (`jl ssh-key list` to verify)
Architecture Overview
┌─────────────────────────────────────────────────────┐
│ JarvisLabs VM (A100 80GB) │
│ │
│ ┌──────────────┐ ┌────────────────────────────┐ │
│ │ vLLM │ │ OpenShell Gateway (k3s) │ │
│ │ Qwen 2.5 │◄───│ │ │
│ │ Port 8000 │ │ ┌──────────────────────┐ │ │
│ └──────────────┘ │ │ NemoClaw Sandbox │ │ │
│ │ │ (Landlock + seccomp) │ │ │
│ │ │ │ │ │
│ │ │ AI Agent (OpenClaw) │ │ │
│ │ └──────────────────────┘ │ │
│ └────────────────────────────┘ │
└─────────────────────────────────────────────────────┘
The AI agent runs inside a sandboxed container managed by OpenShell. When the agent needs to think, it calls out to vLLM running on the same machine — inference never leaves the VM.
Step 1: Launch a JarvisLabs VM
Check GPU availability and launch a VM with an A100 80GB:
# Check what's available
jl gpus --json
# Create a VM (not a container — NemoClaw needs Docker inside the instance)
jl create --gpu A100-80GB --vm --storage 100 --region IN2 --yes --json
Note the machine_id and ssh_command from the output. The VM template gives you full root access with Docker pre-installed.
Why a VM? NemoClaw runs Docker containers (the OpenShell gateway and the sandbox). JarvisLabs VMs let you run Docker inside the VM, while container instances don't handle the Docker-in-Docker setup this requires.
# Rename for easy identification
jl rename <machine_id> --name "nemoclaw-tutorial" --yes --json
Wait about 30 seconds for SSH to become available, then verify:
jl exec <machine_id> -- nvidia-smi
You should see your A100 80GB GPU.
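If you're scripting this step, the machine ID can be pulled straight from the CLI's JSON output. A minimal sketch, assuming `jl create --json` prints an object with `machine_id` and `ssh_command` keys (the string below is a stand-in for the real output):

```shell
# Stand-in for: CREATE_OUTPUT=$(jl create --gpu A100-80GB --vm --storage 100 --region IN2 --yes --json)
CREATE_OUTPUT='{"machine_id": "m-1234", "ssh_command": "ssh ubuntu@203.0.113.10"}'

# Extract the machine_id with python3 (pre-installed on the VM and most workstations)
MACHINE_ID=$(printf '%s' "$CREATE_OUTPUT" \
  | python3 -c 'import json, sys; print(json.load(sys.stdin)["machine_id"])')
echo "machine_id: $MACHINE_ID"
```

The extracted value can then be fed to every `jl exec <machine_id>` command that follows.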
Step 2: Check What's Pre-Installed
JarvisLabs VMs come with most of what we need:
jl exec <machine_id> -- sh -lc '
echo "=== Docker ===" && docker --version
echo "=== NVIDIA Container Toolkit ===" && dpkg -l | grep nvidia-container
echo "=== Python ===" && python3 --version
echo "=== OS ===" && cat /etc/os-release | head -2
'
What's already there:
- Docker v29+
- NVIDIA Container Toolkit v1.18+
- Python 3.10
- Ubuntu 22.04 LTS
What we need to add:
- Node.js 22 (NemoClaw requirement)
- Docker group access for the `ubuntu` user
- A cgroup configuration fix
Step 3: Install Node.js and Fix Permissions
# Install Node.js 22
jl exec <machine_id> -- sh -lc '
curl -fsSL https://deb.nodesource.com/setup_22.x | sudo -E bash - \
&& sudo apt-get install -y nodejs \
&& node --version \
&& npm --version
'
# Add ubuntu user to docker group (avoids needing sudo for docker)
jl exec <machine_id> -- sh -lc '
sudo usermod -aG docker ubuntu
'
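Before moving on, it's worth gating on the Node major version. A quick sketch of the parse (the version string below is a stand-in for real `node --version` output):

```shell
# Stand-in for: NODE_VERSION=$(node --version)
NODE_VERSION="v22.11.0"

# Strip the leading "v", then everything after the first dot
MAJOR=${NODE_VERSION#v}
MAJOR=${MAJOR%%.*}

if [ "$MAJOR" -ge 22 ]; then
  echo "Node $NODE_VERSION meets NemoClaw's requirement"
else
  echo "Node $NODE_VERSION is too old; NemoClaw needs 22+" >&2
fi
```

Note that the `docker` group change only applies to new login sessions, which is why later commands wrap Docker in `sg docker -c` instead of relying on the membership directly.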
Step 4: Fix cgroup Configuration
NemoClaw's OpenShell gateway runs k3s inside Docker, which requires cgroupns=host:
# Check current Docker daemon config
jl exec <machine_id> -- sh -lc 'sudo cat /etc/docker/daemon.json'
Add the cgroup setting and restart Docker:
jl exec <machine_id> -- sh -lc '
sudo python3 -c "
import json
with open(\"/etc/docker/daemon.json\") as f:
cfg = json.load(f)
cfg[\"default-cgroupns-mode\"] = \"host\"
with open(\"/etc/docker/daemon.json\", \"w\") as f:
json.dump(cfg, f, indent=4)
" && sudo systemctl restart docker
'
Your /etc/docker/daemon.json should now look like:
{
"runtimes": {
"nvidia": {
"args": [],
"path": "nvidia-container-runtime"
}
},
"default-cgroupns-mode": "host"
}
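One caveat: the Step 4 one-liner assumes /etc/docker/daemon.json already exists and crashes with FileNotFoundError on a box where it doesn't. A slightly more defensive sketch, demonstrated here against a scratch file (on the VM, point the script at /etc/docker/daemon.json and run it with sudo):

```shell
# Demo target; on the VM use /etc/docker/daemon.json instead (with sudo)
DAEMON_JSON=$(mktemp)
echo '{"runtimes": {"nvidia": {"args": [], "path": "nvidia-container-runtime"}}}' > "$DAEMON_JSON"

python3 - "$DAEMON_JSON" <<'EOF'
import json, sys

path = sys.argv[1]
try:
    with open(path) as f:
        cfg = json.load(f)
except FileNotFoundError:
    cfg = {}  # start from an empty config instead of crashing

cfg["default-cgroupns-mode"] = "host"

with open(path, "w") as f:
    json.dump(cfg, f, indent=4)
EOF

grep '"default-cgroupns-mode": "host"' "$DAEMON_JSON" && echo "cgroup mode set"
```

Either way, remember to `sudo systemctl restart docker` afterwards so the setting takes effect.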
Step 5: Start vLLM with an Open-Source Model
Launch vLLM serving Qwen 2.5 7B Instruct. This model doesn't require a HuggingFace token, uses only ~15 GB of VRAM, and performs well for agent tasks:
jl exec <machine_id> -- sh -lc '
sg docker -c "docker run -d \
--gpus all \
--name vllm \
-p 8000:8000 \
--shm-size 16g \
vllm/vllm-openai:latest \
--model Qwen/Qwen2.5-7B-Instruct"
'
Wait a minute or two for the model to download and load, then verify:
# Check vLLM logs (look for "Application startup complete")
jl exec <machine_id> -- sh -lc '
sg docker -c "docker logs vllm 2>&1 | tail -5"
'
# Test inference
jl exec <machine_id> -- sh -lc '
curl -s http://localhost:8000/v1/chat/completions \
-H "Content-Type: application/json" \
-d "{
\"model\": \"Qwen/Qwen2.5-7B-Instruct\",
\"messages\": [{\"role\": \"user\", \"content\": \"Hello, who are you?\"}],
\"max_tokens\": 50
}"
'
You should get a JSON response with the model's reply.
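Rather than sleeping for a fixed interval, you can poll until the server answers. A small retry helper, sketched below; the `/v1/models` probe is part of vLLM's OpenAI-compatible API, and the final line is a self-contained demo of the helper itself:

```shell
# wait_for <attempts> <delay-seconds> <command...>
# Retries the command until it succeeds, or gives up after <attempts> tries.
wait_for() {
  attempts=$1; delay=$2; shift 2
  i=0
  until "$@" >/dev/null 2>&1; do
    i=$((i + 1))
    if [ "$i" -ge "$attempts" ]; then
      return 1
    fi
    sleep "$delay"
  done
}

# On the VM:
#   wait_for 60 5 curl -sf http://localhost:8000/v1/models && echo "vLLM is up"
# Demo with a command that succeeds immediately:
wait_for 3 0 true && echo "ready"
```

This waits up to five minutes (60 tries, 5 seconds apart) for vLLM before giving up, instead of guessing how long the model load will take.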
Alternative Models
You can swap Qwen/Qwen2.5-7B-Instruct for any HuggingFace model that fits your GPU. Options for A100 80GB:
| Model | VRAM | Notes |
|---|---|---|
| `Qwen/Qwen2.5-7B-Instruct` | ~15 GB | Great balance of speed and quality |
| `Qwen/Qwen2.5-32B-Instruct` | ~40 GB | Stronger reasoning |
| `meta-llama/Llama-3.1-8B-Instruct` | ~16 GB | Requires HF token |
| `mistralai/Mistral-Small-24B-Instruct-2501` | ~30 GB | Strong for its size |
Step 6: Install NemoClaw
Clone from GitHub, build the CLI, and install globally:
jl exec <machine_id> -- sh -lc '
cd /home/ubuntu \
&& git clone https://github.com/NVIDIA/NemoClaw.git \
&& cd NemoClaw \
&& npm install typescript \
&& npx tsc -p tsconfig.src.json \
&& cd nemoclaw \
&& npm install --ignore-scripts \
&& ./node_modules/.bin/tsc \
&& cd .. \
&& sudo npm install -g .
'
Note: NemoClaw requires two TypeScript build steps: one at the repo root (`tsconfig.src.json`) and one in the `nemoclaw/` subdirectory. Both must complete before the CLI will work.
Verify the installation:
jl exec <machine_id> -- sh -lc 'nemoclaw help'
Step 7: Run NemoClaw Onboarding
The key trick: setting NEMOCLAW_EXPERIMENTAL=1 makes NemoClaw auto-detect your running vLLM instance and skip the NVIDIA API key requirement.
Since the onboarding wizard is interactive, SSH into the VM directly:
jl ssh <machine_id>
Then run:
NEMOCLAW_EXPERIMENTAL=1 nemoclaw onboard
The wizard will:
- Preflight checks — verifies Docker, OpenShell, GPU, cgroups
- Start gateway — deploys OpenShell (k3s cluster inside Docker)
- Create sandbox — builds and launches the agent container (takes a few minutes on first run)
- Configure inference — auto-detects vLLM on port 8000, selects it as the provider
- Set up inference route — configures OpenShell to route LLM calls to local vLLM
- OpenClaw setup — launches the agent framework inside the sandbox
- Policy presets — apply security policies (pypi, npm suggested by default)
When prompted:
- Sandbox name: press Enter for the default (`my-assistant`)
- Policy presets: press Enter to accept the suggestions
At the end you'll see a dashboard confirming the setup:
──────────────────────────────────────────────────
Sandbox my-assistant (Landlock + seccomp + netns)
Model vllm-local (Local vLLM)
NIM not running
──────────────────────────────────────────────────
Run: nemoclaw my-assistant connect
Status: nemoclaw my-assistant status
Logs: nemoclaw my-assistant logs --follow
──────────────────────────────────────────────────
Step 8: Verify Everything Works
# Check sandbox status
nemoclaw my-assistant status
# Check running containers
docker ps
You should see two containers:
- `openshell-cluster-nemoclaw` — the OpenShell gateway
- `vllm` — your local LLM server
Step 9: Connect to the Agent
nemoclaw my-assistant connect
This drops you into the sandboxed agent environment where OpenClaw is running with your local model.
Useful Commands
# View sandbox status
nemoclaw my-assistant status
# View logs
nemoclaw my-assistant logs --follow
# List all sandboxes
nemoclaw list
# Add policy presets (e.g., allow PyPI, npm, GitHub access)
nemoclaw my-assistant policy-add
# List available policy presets
nemoclaw my-assistant policy-list
# Stop everything
nemoclaw stop
# Destroy sandbox
nemoclaw my-assistant destroy
Cost Management
The JarvisLabs VM costs $1.49/hr for A100 80GB. To save money:
# Pause when not using (keeps storage, stops billing for GPU)
jl pause <machine_id> --yes --json
# Resume when needed
jl resume <machine_id> --yes --json
# Destroy when done (deletes everything)
jl destroy <machine_id> --yes --json
When you resume, you'll need to:
- Restart vLLM: `docker start vllm`
- Restart NemoClaw: `nemoclaw start`, or re-run the onboarding wizard
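These resume steps are easy to wrap in a small helper. A sketch (the wrapper function and its dry-run hook are hypothetical; the underlying `jl` and `docker` invocations are the same ones used above):

```shell
# JL is overridable so the function can be dry-run; on the VM,
# simply call: resume_nemoclaw <machine_id>
JL="${JL:-jl}"

resume_nemoclaw() {
  machine_id=$1
  "$JL" resume "$machine_id" --yes --json
  "$JL" exec "$machine_id" -- sh -lc 'sg docker -c "docker start vllm"'
}

# Dry run: print the jl commands instead of executing them
OUT=$(JL=echo resume_nemoclaw m-1234)
echo "$OUT"
```

After vLLM is back, run `nemoclaw start` (or re-run onboarding) as described above.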
Troubleshooting
"cgroup v2 detected but Docker is not configured"
Run the cgroup fix from Step 4. This is required for OpenShell's k3s to work.
vLLM container won't start after VM resume
Restarting the VM also restarts Docker, which stops running containers. Start vLLM again:
docker start vllm
# Wait 60 seconds for model to load
docker logs vllm 2>&1 | tail -5
NemoClaw onboarding asks for NVIDIA API key
Make sure you set NEMOCLAW_EXPERIMENTAL=1 before running nemoclaw onboard, and that vLLM is running and healthy on port 8000:
curl -s http://localhost:8000/v1/models
Sandbox creation fails with permission errors
The installer script may try to npm install -g NemoClaw again. If you see EACCES errors, they're non-fatal — the tool is already installed from Step 6.
Want a different model?
Stop vLLM, remove the container, and start a new one with a different model:
docker stop vllm && docker rm vllm
docker run -d --gpus all --name vllm -p 8000:8000 --shm-size 16g \
vllm/vllm-openai:latest --model <new-model-name>
Then re-run NEMOCLAW_EXPERIMENTAL=1 nemoclaw onboard (it will detect the new model).
What's Next
- Automate this — Create a JarvisLabs startup script that runs Steps 3-7 automatically
- Try larger models — Swap in Qwen 2.5 32B or 72B for better agent reasoning
- Add integrations — Configure Telegram or Slack bridges for remote agent access
- Multi-GPU — Use 2x A100 80GB for models like Llama 3.3 70B with tensor parallelism
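For the multi-GPU route, vLLM's `--tensor-parallel-size` flag splits the model across GPUs. A sketch of the adjusted launch command, shown here as a dry run (the model name follows the suggestion above and would additionally need an HF token):

```shell
DOCKER=echo   # dry run; change to DOCKER=docker on a 2x A100 80GB VM
CMD=$($DOCKER run -d --gpus all --name vllm -p 8000:8000 --shm-size 16g \
  vllm/vllm-openai:latest \
  --model meta-llama/Llama-3.3-70B-Instruct \
  --tensor-parallel-size 2)
echo "$CMD"
```

Everything else in this tutorial stays the same; NemoClaw just sees a different model behind the same port-8000 endpoint.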
Summary
| What | Details |
|---|---|
| Platform | JarvisLabs VM |
| GPU | A100 80GB ($1.49/hr) |
| Model | Qwen 2.5 7B Instruct (any HF model works) |
| Inference | vLLM (OpenAI-compatible API) |
| Agent Runtime | NemoClaw + OpenShell + OpenClaw |
| NVIDIA API Key | Not required |
| HuggingFace Token | Not required (model-dependent) |
| Security | Landlock + seccomp + network namespace |
| Total Setup Time | ~10 minutes |