
Host an AI Model on JarvisLabs

In this guide, you'll create a GPU virtual machine on JarvisLabs, deploy Alibaba's Qwen3.5-27B model, and access it through Open WebUI — a ChatGPT-like interface you can use from your browser.

Qwen 3.5 is a family of open-source multimodal models with vision, thinking, and tool-use capabilities. The 27B variant hits the sweet spot for an A100 80GB — it fits comfortably with plenty of room for the KV cache, delivers strong performance across reasoning, coding, and multimodal tasks, and responds fast enough for interactive use.
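To see why the fit is comfortable, here's a rough back-of-the-envelope estimate. It assumes Ollama's default ~4-bit quantization, i.e. about half a byte per parameter; the exact number depends on the quantization variant and context length:

```shell
# Rough VRAM estimate for a 27B-parameter model at ~4-bit quantization.
# Assumption: ~0.5 bytes per parameter (scaled by 10 for integer math).
PARAMS=27000000000
WEIGHTS_GB=$(( PARAMS * 5 / 10 / 1000000000 ))
echo "~${WEIGHTS_GB} GB for weights; ~$(( 80 - WEIGHTS_GB )) GB left for KV cache and overhead"
```

Under these assumptions the weights come out around 13 GB, leaving roughly 67 GB of headroom on an 80 GB card.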

By the end, you'll have a self-hosted LLM running on an A100 80GB GPU, accessible from anywhere, with automatic startup configured so it survives reboots.


Prerequisites

Before you begin, make sure you have:

  1. A JarvisLabs account — sign up at jarvislabs.ai if you don't have one
  2. The JarvisLabs CLI installed and configured:
# Install the CLI
uv tool install jarvislabs

# Authenticate and set up
jl setup
  3. An SSH key registered — VM instances require an SSH key:
# Generate a key if you don't have one
ssh-keygen -t ed25519

# Add it to your JarvisLabs account
jl ssh-key add ~/.ssh/id_ed25519.pub --name "my-key"
tip

Run jl status to check your account balance before creating an instance. Run jl gpus to verify A100-80GB availability.


Step 1: Create the VM

Create an A100 80GB virtual machine with 300 GB of storage. VMs give you full root access and the ability to run system services like systemd — which we'll use later to make Open WebUI start automatically.

jl create --gpu A100-80GB --vm --storage 300 --name "openwebui" --yes

The CLI blocks until the instance is running. Once it returns, your VM is ready.

info

VM instances are available in the IN2 region and require at least one SSH key registered. If you haven't added an SSH key yet, see the Prerequisites section above.

Note the machine ID from the output — you'll need it for the next steps. You can also find it anytime with:

jl list

Step 2: Connect to the VM

SSH into your new instance:

jl ssh <machine_id>

You should land in a shell on the VM. Verify the GPU is available:

nvidia-smi

You should see one A100 80GB GPU listed.
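If you prefer a scripted check (say, in a provisioning script), this sketch counts the visible GPUs. It assumes nvidia-smi is on the PATH, as it is on JarvisLabs GPU images, and does nothing on machines without the NVIDIA tools:

```shell
# Count GPUs via nvidia-smi's CSV query output; skip quietly if the
# NVIDIA tools aren't installed.
if command -v nvidia-smi >/dev/null 2>&1; then
  GPU_COUNT=$(nvidia-smi --query-gpu=name --format=csv,noheader | wc -l)
  echo "GPUs visible: $GPU_COUNT"
fi
```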


Step 3: Set Up Python with uv

We'll use uv to manage a clean Python environment for Open WebUI. It's fast, handles Python versions automatically, and is already the standard across JarvisLabs templates.

Install uv:

curl -LsSf https://astral.sh/uv/install.sh | sh
source $HOME/.local/bin/env

Create a virtual environment for Open WebUI:

uv venv ~/openwebui-env --python 3.12
source ~/openwebui-env/bin/activate

Step 4: Install Ollama and Open WebUI

Ollama handles model downloading and GPU inference. Open WebUI provides the browser-based chat interface.

Install Ollama:

curl -fsSL https://ollama.com/install.sh | sh

Install Open WebUI:

uv pip install open-webui

Step 5: Start the Open WebUI Server

Launch the server:

open-webui serve

Open WebUI starts on port 8080 by default. Since your VM has a public IP, you can access it directly from your browser. Get the IP with:

jl get <machine_id>

Then open your browser and go to:

http://<your-vm-public-ip>:8080

You'll see the Open WebUI registration page. Create an account — this is a local account stored on your VM, not shared with any external service.
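If the page doesn't load, a quick reachability probe from your laptop narrows things down. This sketch assumes curl is available; the placeholder must be replaced with the public IP from jl get before the request can succeed:

```shell
# Probe port 8080 before opening the browser. Replace the placeholder
# with your VM's public IP.
WEBUI_URL="http://<your-vm-public-ip>:8080"
if curl -sf -o /dev/null --max-time 5 "$WEBUI_URL"; then
  echo "Open WebUI is reachable at $WEBUI_URL"
else
  echo "No response yet: check that the instance is running and port 8080 is open"
fi
```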


Step 6: Set Up Hugging Face Token (Optional)

Adding a Hugging Face token speeds up model downloads and gives access to gated models. Create a token at huggingface.co/settings/tokens, then set it on the VM:

echo "HUGGING_FACE_HUB_TOKEN=hf_your_token_here" | sudo tee -a /etc/environment

Restart Ollama to pick up the token:

sudo systemctl restart ollama

Step 7: Download the Qwen 3.5 Model

With Open WebUI running and accessible in your browser:

  1. Click "Select a model" at the top of the chat interface
  2. Type qwen3.5:27b in the search box
  3. Click the download/pull option — Ollama will download the model

The download takes a few minutes depending on network speed.

Once the download completes, select qwen3.5:27b from the model dropdown and start chatting. Since Qwen 3.5 is multimodal, you can also upload images directly in the chat and ask questions about them. It also supports a thinking mode for step-by-step reasoning on harder problems.

note

Subsequent starts load the model from disk in seconds — only the first download is slow.
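The browser isn't the only way in. From a shell on the VM you can hit Ollama's local HTTP API directly. This sketch assumes Ollama's default port 11434 and the model tag pulled above, and skips the request if nothing is listening:

```shell
# Build one chat request for Ollama's /api/chat endpoint.
MODEL="qwen3.5:27b"
PAYLOAD=$(printf '{"model":"%s","messages":[{"role":"user","content":"Say hello in one sentence."}],"stream":false}' "$MODEL")

# Only send it if something is answering on Ollama's default port.
if curl -sf -o /dev/null --max-time 5 http://localhost:11434; then
  curl -s http://localhost:11434/api/chat -d "$PAYLOAD"
fi
```

Setting "stream": false returns the whole reply as a single JSON object instead of a token stream, which is easier to script against.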


Step 8: Enable Automatic Startup

To keep Open WebUI running across reboots, create a systemd service.

First, find your username on the VM:

whoami

Create the service file (replace <username> with your actual username):

sudo tee /etc/systemd/system/openwebui.service > /dev/null << 'EOF'
[Unit]
Description=Open WebUI Server
After=network.target

[Service]
User=<username>
WorkingDirectory=/home/<username>/
ExecStart=/home/<username>/openwebui-env/bin/open-webui serve
Restart=always

[Install]
WantedBy=multi-user.target
EOF

Enable and start the service:

sudo systemctl daemon-reload
sudo systemctl enable openwebui.service
sudo systemctl start openwebui.service

Verify it's running:

sudo systemctl status openwebui.service

You should see active (running) in the output. Open WebUI will now start automatically whenever the VM boots.


Managing Your Instance

Once you're done using the model, you can pause the instance to stop compute billing while keeping your data:

jl pause <machine_id>

When you want to use it again:

jl resume <machine_id>

Your Open WebUI installation, models, and chat history all persist across pause/resume cycles since they're stored in the home directory.

To permanently delete the instance and all data:

jl destroy <machine_id>
warning

jl destroy is irreversible. All data on the instance is lost. If you want to keep the instance for later use, use jl pause instead — it stops compute billing while preserving your data.


Next Steps

  • Try different models — Ollama supports hundreds of models. Try qwen3.5:35b for a larger option, gemma3:27b, or devstral for coding tasks — all from the Open WebUI interface.
  • Serve models via API — Open WebUI exposes an OpenAI-compatible API. See our Serving LLMs tutorial for more on using Ollama and vLLM APIs.
  • Learn more about the CLI — Check the full CLI reference for all available commands and workflows.
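As a starting point for the API route, here's a hedged sketch of one request against Ollama's OpenAI-compatible /v1 endpoint, run from a shell on the VM (Ollama listens on localhost:11434 by default; the model tag is the one pulled in Step 7):

```shell
# One chat completion via Ollama's OpenAI-compatible endpoint.
BODY='{"model":"qwen3.5:27b","messages":[{"role":"user","content":"Hello!"}]}'
if curl -sf -o /dev/null --max-time 5 http://localhost:11434; then
  curl -s http://localhost:11434/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d "$BODY"
fi
```

Because the endpoint mimics the OpenAI API shape, most OpenAI client libraries can point at it by overriding their base URL.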