How to Deploy NVIDIA NemoClaw on JarvisLabs

NVIDIA NemoClaw is an open-source security stack that sandboxes autonomous AI agents at the kernel level. It wraps OpenClaw (an always-on AI assistant) inside the NVIDIA OpenShell runtime, enforcing filesystem isolation, network policies, and process restrictions so your agent can't escape its sandbox.
This guide shows the setup we used to run NemoClaw with local Nemotron 3 Nano 30B inference on a JarvisLabs A100 VM. No cloud API keys needed. Total setup time was under 15 minutes, inference runs at ~1 second per query, and the whole thing cost $1.13.
NemoClaw is in early preview (released March 16, 2026). APIs and runtime behavior may have breaking changes. Do not use this in production.
Tested Environment
| Setting | Value |
|---|---|
| GPU | A100 40GB VRAM, 110GB system RAM |
| Instance type | VM (full disk persistence) |
| Model | Nemotron 3 Nano 30B via Ollama |
| OS | Ubuntu 22.04, CUDA 12.8, Docker 29.2 |
| NemoClaw | v0.1.0 (early preview) |
| Cold start to first query | ~12 minutes (includes model download) |
| Inference time | ~1.1 seconds (warm) |
| GPU memory usage | 25GB / 40GB |
| Total cost | $1.13 (52 minutes of A100 time) |
Run your ML workloads on Jarvislabs
A100s, H100s, and H200s with per-minute billing. Pre-configured environments, 90-second startup, and no long-term commitments.
Get Started

Before You Start
- JarvisLabs account + jl CLI installed (installation guide)
- The Nemotron 3 Nano 30B model download is ~24GB
- NemoClaw pulls ~2.4GB of container images
- Total disk usage: ~35GB (well within the 100GB VM disk)
To install the CLI (requires uv):
uv tool install jarvislabs
jl setup # interactive — enter your API token from jarvislabs.ai/settings
Why a VM Instead of a Container
NemoClaw needs Docker (it runs OpenShell inside Docker, which manages a K3s cluster internally). It also requires Node.js, the openshell CLI, and several system-level packages. On a JarvisLabs container, anything installed via apt is lost when you pause and resume. A VM gives you full disk persistence, so you set up once and everything survives pause/resume.
The tradeoff: VMs have a public IP, which means you need to set up a firewall. We cover that below.
NemoClaw Installation
Step 1: Create a VM
jl create --gpu A100 --region IN2 --vm --name nemoclaw-agent
Note the instance ID and SSH command from the output. SSH into the VM:
ssh -o StrictHostKeyChecking=no ubuntu@<instance-ip>
All remaining commands run on the VM unless stated otherwise.
Step 2: Lock Down the Firewall
VMs have a public IP. Before installing anything, set up UFW to block all incoming traffic except SSH:
sudo ufw default deny incoming
sudo ufw default allow outgoing
sudo ufw allow ssh
sudo ufw --force enable
Then allow Docker's internal networks to reach Ollama (this is critical and easy to miss):
sudo ufw allow from 172.17.0.0/16 to any port 11434 proto tcp
sudo ufw allow from 172.18.0.0/16 to any port 11434 proto tcp
sudo ufw allow from 10.42.0.0/16 to any port 11434 proto tcp
Without these rules, NemoClaw's sandboxed containers won't be able to reach the Ollama inference server running on the host. The 172.17.0.0/16 rule covers the default Docker bridge network, 172.18.0.0/16 covers the Docker network that the OpenShell gateway container uses, and 10.42.0.0/16 covers the K3s pod CIDR (K3s default). These ranges may differ on your system — check docker network ls and docker inspect if the defaults don't work.
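If you're unsure whether a given container can get through, a quick sanity check is to test its IP against the ranges you allowed. A minimal sketch (the CIDRs below are the defaults from the rules above; substitute whatever `docker network ls` shows on your system):

```python
import ipaddress

# Ranges we opened in UFW for port 11434 (defaults; yours may differ)
ALLOWED = [ipaddress.ip_network(c) for c in
           ("172.17.0.0/16", "172.18.0.0/16", "10.42.0.0/16")]

def can_reach_ollama(container_ip: str) -> bool:
    """True if the container IP is covered by one of the allowed UFW rules."""
    ip = ipaddress.ip_address(container_ip)
    return any(ip in net for net in ALLOWED)

print(can_reach_ollama("172.17.0.5"))    # default Docker bridge -> True
print(can_reach_ollama("192.168.1.10"))  # not covered -> False
```

Find a container's actual IP with `docker inspect -f '{{range .NetworkSettings.Networks}}{{.IPAddress}}{{end}}' <container>`.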
Step 3: Install Ollama
curl -fsSL https://ollama.com/install.sh | sh
Expected output: >>> NVIDIA GPU installed.
Step 4: Configure Ollama to Listen on All Interfaces
By default, Ollama only listens on 127.0.0.1. NemoClaw's sandbox runs inside a container, so it needs Ollama listening on 0.0.0.0:
sudo mkdir -p /etc/systemd/system/ollama.service.d
sudo tee /etc/systemd/system/ollama.service.d/override.conf > /dev/null <<EOF
[Service]
Environment="OLLAMA_HOST=0.0.0.0:11434"
EOF
sudo systemctl daemon-reload
sudo systemctl restart ollama
Verify it's listening correctly:
curl -s http://localhost:11434/api/tags
Expected output: {"models":[]} (empty because we haven't pulled a model yet).
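If you're scripting the setup, you can parse that response to decide whether the model still needs pulling. A small sketch, assuming the `models[].name` field of Ollama's `/api/tags` response:

```python
import json

def has_model(tags_json: str, name: str) -> bool:
    """Check whether a model whose name starts with `name` is already pulled."""
    models = json.loads(tags_json).get("models", [])
    return any(m.get("name", "").startswith(name) for m in models)

# Right after install the list is empty, so this is False:
print(has_model('{"models":[]}', "nemotron-3-nano"))
```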
Step 5: Pull the Nemotron Model
ollama pull nemotron-3-nano:30b
This downloads ~24GB. On the A100 instance it took about 1 minute 40 seconds.
Expected output: success after the download completes.
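As a back-of-envelope check on that timing, ~24GB in ~100 seconds works out to roughly 246 MB/s of sustained download throughput on the JarvisLabs network:

```python
# Rough throughput implied by the numbers above (approximate figures)
size_gb = 24
seconds = 100  # ~1 minute 40 seconds
mb_per_s = size_gb * 1024 / seconds
print(f"~{mb_per_s:.0f} MB/s")
```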
Step 6: Start Docker
The JarvisLabs VM has Docker pre-installed but it may not be running:
sudo systemctl start docker
sudo systemctl enable docker
sudo usermod -aG docker $USER
newgrp docker
newgrp docker applies the new group membership to your current shell without a re-login. Without it, the NemoClaw installer will fail with Docker permission errors.
Step 7: Install NemoClaw
curl -fsSL https://www.nvidia.com/nemoclaw.sh | \
NEMOCLAW_NON_INTERACTIVE=1 \
NEMOCLAW_PROVIDER=ollama \
NEMOCLAW_MODEL=nemotron-3-nano:30b \
bash
This does three things:
- Installs Node.js via nvm (if not present)
- Installs the NemoClaw CLI
- Runs the onboarding wizard in non-interactive mode
Expected output: The installer will progress through 7 steps:
[1/7] Preflight checks ✓
[2/7] Starting OpenShell gateway ✓
[3/7] Configuring inference ✓
[4/7] Setting up inference ✓
[5/7] Creating sandbox ✓ (takes a few minutes on first run)
[6/7] Setting up OpenClaw ✓
[7/7] Policy presets ✓
At the end you'll see:
Sandbox my-assistant (Landlock + seccomp + netns)
Model nemotron-3-nano:30b (Local Ollama)
If step 4 fails with "containers cannot reach host.openshell.internal:11434", the firewall rules from Step 2 are missing. Go back and add them, then rerun the installer.
Step 8: Verify Inference
Test that Ollama is serving the model correctly on the host:
curl -sf http://localhost:11434/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{"model":"nemotron-3-nano:30b","messages":[{"role":"user","content":"What is the capital of France? Reply in one sentence."}]}'
Expected output: A JSON response with "choices" containing the model's reply. The first query takes a few seconds as the model loads into GPU memory; subsequent queries are faster (~1 second).
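Since the endpoint is OpenAI-compatible, the reply text sits at `choices[0].message.content`. A minimal extraction sketch, run here against a hand-written sample response (the sample text is made up, not captured output):

```python
import json

# Hand-written sample in the OpenAI-compatible response shape
sample = json.dumps({
    "choices": [
        {"message": {"role": "assistant",
                     "content": "The capital of France is Paris."}}
    ]
})

def extract_reply(body: str) -> str:
    """Pull the assistant's text out of an OpenAI-style chat completion."""
    return json.loads(body)["choices"][0]["message"]["content"]

print(extract_reply(sample))
```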
This verifies Ollama on the host. The NemoClaw sandbox routes inference through inference.local internally — the onboarding wizard already validated that path in Step 7.
Step 9: Connect to the Sandbox
nemoclaw my-assistant connect
This drops you into a sandboxed shell. From here, launch the OpenClaw chat interface:
openclaw tui
You're now chatting with Nemotron 3 Nano 30B through a kernel-level sandboxed agent. Every network request, filesystem access, and process spawn is governed by NemoClaw's security policies.
Step 10: Clean Up
When you're done, exit the SSH session and pause from your local machine to stop billing. You can find your instance ID with jl list:
jl list
jl pause <instance-id>
Since this is a VM, everything persists. When you resume, Ollama and Docker restart automatically. Reconnect with:
ssh -o StrictHostKeyChecking=no ubuntu@<instance-ip>
nemoclaw my-assistant connect
If nemoclaw my-assistant connect fails after resume, restart the services first:
sudo systemctl start docker ollama
# Wait ~30 seconds for the OpenShell gateway to come back
nemoclaw my-assistant connect
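Instead of a fixed 30-second wait, you can poll until the port actually answers. A small sketch (the host and port here are the Ollama defaults from this guide; the same approach works for the gateway port):

```python
import socket
import time

def wait_for_port(host: str, port: int, timeout: float = 60.0) -> bool:
    """Return True once a TCP connect to host:port succeeds, False on timeout."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with socket.create_connection((host, port), timeout=2):
                return True
        except OSError:
            time.sleep(1)  # not up yet; retry
    return False

# wait_for_port("localhost", 11434)  # True once Ollama is back up
```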
How NemoClaw Security Works
NemoClaw creates a multi-layered sandbox using Linux kernel security features:
Landlock restricts filesystem access. The agent can only read/write to /sandbox and /tmp. System directories like /usr and /etc are read-only. The agent cannot access your home directory, SSH keys, or any files outside its sandbox.
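As an illustration only (the real enforcement happens in the kernel via Landlock, not in userspace), the write policy described above amounts to a prefix check on the path:

```python
from pathlib import PurePosixPath

# The writable roots stated in the policy above
WRITABLE = (PurePosixPath("/sandbox"), PurePosixPath("/tmp"))

def write_allowed(path: str) -> bool:
    """Would the stated policy permit a write to this path? (Illustrative.)"""
    p = PurePosixPath(path)
    return any(p == root or root in p.parents for root in WRITABLE)

print(write_allowed("/sandbox/work/out.txt"))    # True
print(write_allowed("/home/ubuntu/.ssh/id_rsa")) # False
```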
seccomp filters system calls. The agent can't call ptrace, mount, or other privileged syscalls that could let it escape the sandbox or escalate privileges.
Network namespaces isolate network access. All outbound traffic is blocked by default. The agent can only reach endpoints explicitly allowed in the policy (like inference.local for the LLM, github.com for code access, etc.). Every connection goes through the OpenShell gateway which enforces TLS termination and path-based rules.
Inference routing keeps API keys off the sandbox. When the agent calls the LLM, the request goes to inference.local (a virtual endpoint inside the sandbox). The OpenShell gateway intercepts it and routes it to the actual provider (Ollama in our case). Your API keys never enter the sandbox.
The default policy includes network rules for GitHub, npm registry, NVIDIA endpoints, Telegram, Discord, and the inference endpoint. Additional presets like PyPI and npm can be added during onboarding or later. Run nemoclaw my-assistant policy-list to see what's active, and nemoclaw my-assistant policy-add to add more.
NemoClaw Troubleshooting
| Issue | Cause | Fix |
|---|---|---|
| "containers cannot reach host.openshell.internal:11434" | UFW blocking Docker-to-host traffic | Add UFW rules for 172.17.0.0/16, 172.18.0.0/16, and 10.42.0.0/16 on port 11434 |
| "Docker is not running" | Docker service not started | sudo systemctl start docker |
| "Ollama listens on 127.0.0.1" | Default Ollama config | Create systemd override with OLLAMA_HOST=0.0.0.0:11434 |
| "nemoclaw: command not found" | nvm PATH not loaded | source ~/.bashrc or restart shell |
| "port 8080 in use" | Previous gateway still running | Reuse it (NemoClaw detects this automatically) |
| Sandbox image pull timeout | Slow network or large image | Retry; the openclaw image is ~2.2GB compressed |
Cost and GPU Requirements
| GPU | VRAM | RAM | $/hr | Result |
|---|---|---|---|---|
| A100 | 40GB | 110GB | $1.29 | Works. 25GB VRAM used, 1.1s inference |
Nemotron 3 Nano 30B needs ~25GB VRAM during inference on our test (short prompts, single turn). The A100's 40GB gives comfortable headroom. An L4 (24GB VRAM) fits the model at short context lengths according to Ollama's docs, but we didn't test it — longer conversations or larger contexts may push beyond 24GB.
Total compute cost for this tutorial: $1.13 (52 minutes including setup, model download, testing, and idle time between steps).
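The "under $0.001 per query" claim in the summary below follows directly from these numbers: at $1.29/hr with ~1.1 seconds per warm query,

```python
# Per-query cost at the observed rates (approximate)
hourly_usd = 1.29
query_seconds = 1.1
cost_per_query = hourly_usd / 3600 * query_seconds
print(f"${cost_per_query:.5f} per query")  # well under $0.001
```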
What is NemoClaw?
NemoClaw was announced at GTC on March 16, 2026. It's NVIDIA's answer to a real problem: as AI agents get more autonomous (browsing the web, writing code, managing files), how do you stop them from doing damage?
The stack has four layers:
- NemoClaw CLI — TypeScript tool that orchestrates everything
- NemoClaw Blueprint — Python orchestration for sandbox and policy management
- OpenShell — The runtime that creates and manages sandboxed containers with kernel-level isolation
- OpenClaw — The AI assistant framework that runs inside the sandbox
NemoClaw is Apache 2.0 licensed and currently in early preview. It supports NVIDIA Endpoints, OpenAI, Anthropic, Google Gemini, and local inference via Ollama.
- GitHub: NVIDIA/NemoClaw
- Docs: docs.nvidia.com/nemoclaw
- Ollama integration: docs.ollama.com/integrations/nemoclaw
What We Learned
- The firewall gotcha is the #1 blocker. NemoClaw runs containers inside Docker inside K3s. These containers need to reach Ollama on the host, but UFW blocks this by default. You need explicit rules for three CIDR ranges (172.17.0.0/16, 172.18.0.0/16, 10.42.0.0/16).
- Ollama's default bind address breaks containerized setups. Every guide that runs Ollama alongside Docker containers hits this. The systemd override to set OLLAMA_HOST=0.0.0.0 should be your first step after installing Ollama.
- VMs are the right call for NemoClaw. The stack has deep system dependencies (Docker, K3s, Node.js, multiple container images). On a JarvisLabs container, you'd need to reinstall most of this after every pause/resume. A VM persists everything.
- Local inference on A100 is fast and cheap. Nemotron 3 Nano 30B runs at ~1 second per query with 25GB VRAM usage. At $1.29/hr, that's under $0.001 per query with no API rate limits or token costs.
Try it on JarvisLabs. Get started at jarvislabs.ai.