Host an AI Model on JarvisLabs
In this guide, you'll create a GPU virtual machine on JarvisLabs, deploy Alibaba's Qwen3.5-27B model, and access it through Open WebUI — a ChatGPT-like interface you can use from your browser.
Qwen 3.5 is a family of open-source multimodal models with vision, thinking, and tool-use capabilities. The 27B variant hits the sweet spot for an A100 80GB — it fits comfortably with plenty of room for the KV cache, delivers strong performance across reasoning, coding, and multimodal tasks, and responds fast enough for interactive use.
By the end, you'll have a self-hosted LLM running on an A100 80GB GPU, accessible from anywhere, with automatic startup configured so it survives reboots.
Prerequisites
Before you begin, make sure you have:
- A JarvisLabs account — sign up at jarvislabs.ai if you don't have one
- The JarvisLabs CLI installed and configured:
# Install the CLI
uv tool install jarvislabs
# Authenticate and set up
jl setup
- An SSH key registered — VM instances require an SSH key:
# Generate a key if you don't have one
ssh-keygen -t ed25519
# Add it to your JarvisLabs account
jl ssh-key add ~/.ssh/id_ed25519.pub --name "my-key"
Run jl status to check your account balance before creating an instance. Run jl gpus to verify A100-80GB availability.
Step 1: Create the VM
Create an A100 80GB virtual machine with 300 GB of storage. VMs give you full root access and the ability to run system services like systemd — which we'll use later to make Open WebUI start automatically.
jl create --gpu A100-80GB --vm --storage 300 --name "openwebui" --yes
The CLI blocks until the instance is running. Once it returns, your VM is ready.
VM instances are available in the IN2 region and require at least one SSH key registered. If you haven't added an SSH key yet, see the Prerequisites section above.
Note the machine ID from the output — you'll need it for the next steps. You can also find it anytime with:
jl list
Step 2: Connect to the VM
SSH into your new instance:
jl ssh <machine_id>
You should land in a shell on the VM. Verify the GPU is available:
nvidia-smi
You should see one A100 80GB GPU listed.
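If you prefer a terser check, `nvidia-smi` can print just the fields that matter here; the query flags below are standard `nvidia-smi` options:

```shell
# Print only the GPU name and total memory to confirm the A100 80GB
nvidia-smi --query-gpu=name,memory.total --format=csv,noheader
```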
Step 3: Set Up Python with uv
We'll use uv to manage a clean Python environment for Open WebUI. It's fast, handles Python versions automatically, and is already the standard across JarvisLabs templates.
Install uv:
curl -LsSf https://astral.sh/uv/install.sh | sh
source $HOME/.local/bin/env
Create a virtual environment for Open WebUI:
uv venv ~/openwebui-env --python 3.12
source ~/openwebui-env/bin/activate
Step 4: Install Ollama and Open WebUI
Ollama handles model downloading and GPU inference. Open WebUI provides the browser-based chat interface.
Install Ollama:
curl -fsSL https://ollama.com/install.sh | sh
Install Open WebUI:
uv pip install open-webui
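Open WebUI releases frequently. If you later want to pick up a newer version, you can upgrade it in place inside the same environment:

```shell
# Upgrade Open WebUI inside the existing venv
uv pip install --upgrade open-webui
```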
Step 5: Start the Open WebUI Server
Launch the server:
open-webui serve
Open WebUI starts on port 8080 by default. Since your VM has a public IP, you can access it directly from your browser. Get the IP with:
jl get <machine_id>
Then open your browser and go to:
http://<your-vm-public-ip>:8080
You'll see the Open WebUI registration page. Create an account — this is a local account stored on your VM, not shared with any external service.
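If the page doesn't load, confirm the server is answering locally before suspecting the network. This check assumes you're still SSH'd into the VM:

```shell
# From the VM: prints the HTTP status code; 200 means the UI is serving
curl -s -o /dev/null -w "%{http_code}\n" http://localhost:8080
```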
Step 6: Set Up Hugging Face Token (Optional)
Adding a Hugging Face token speeds up model downloads and gives access to gated models. Create a token at huggingface.co/settings/tokens, then set it on the VM:
echo "HUGGING_FACE_HUB_TOKEN=hf_your_token_here" | sudo tee -a /etc/environment
Restart Ollama to pick up the token:
sudo systemctl restart ollama
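To confirm the restart went cleanly, you can check the service state and hit Ollama's version endpoint (Ollama listens on port 11434 by default):

```shell
# Service should report "active"; the version endpoint confirms it's serving
systemctl is-active ollama
curl -s http://localhost:11434/api/version
```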
Step 7: Download the Qwen 3.5 Model
With Open WebUI running and accessible in your browser:
- Click "Select a model" at the top of the chat interface
- Type qwen3.5:27b in the search box
- Click the download/pull option — Ollama will download the model
The download takes a few minutes depending on network speed.
Once the download completes, select qwen3.5:27b from the model dropdown and start chatting. Since Qwen 3.5 is multimodal, you can also upload images directly in the chat and ask questions about them. It also supports a thinking mode for step-by-step reasoning on harder problems.
Subsequent starts load the model from disk in seconds — only the first download is slow.
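Beyond the browser UI, you can also talk to the model directly over Ollama's local HTTP API, which is handy for scripting. This sketch assumes Ollama's default port, 11434:

```shell
# Ask the model a question via Ollama's generate endpoint
curl -s http://localhost:11434/api/generate -d '{
  "model": "qwen3.5:27b",
  "prompt": "Say hello in one sentence.",
  "stream": false
}'
```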
Step 8: Enable Automatic Startup
To keep Open WebUI running across reboots, create a systemd service.
First, find your username on the VM:
whoami
Create the service file (replace <username> with your actual username):
sudo tee /etc/systemd/system/openwebui.service > /dev/null << 'EOF'
[Unit]
Description=Open WebUI Server
After=network.target
[Service]
User=<username>
WorkingDirectory=/home/<username>/
ExecStart=/home/<username>/openwebui-env/bin/open-webui serve
Restart=always
[Install]
WantedBy=multi-user.target
EOF
Enable and start the service:
sudo systemctl daemon-reload
sudo systemctl enable openwebui.service
sudo systemctl start openwebui.service
Verify it's running:
sudo systemctl status openwebui.service
You should see active (running) in the output. Open WebUI will now start automatically whenever the VM boots.
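If the service fails to start, or stops later, the logs usually say why:

```shell
# Follow the Open WebUI service logs live (Ctrl+C to stop)
sudo journalctl -u openwebui.service -f

# After editing the unit file, reload systemd and restart the service
sudo systemctl daemon-reload
sudo systemctl restart openwebui.service
```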
Managing Your Instance
Once you're done using the model, you can pause the instance to stop compute billing while keeping your data:
jl pause <machine_id>
When you want to use it again:
jl resume <machine_id>
Your Open WebUI installation, downloaded models, and chat history all persist across pause/resume cycles since they're stored on the VM's persistent disk.
To permanently delete the instance and all data:
jl destroy <machine_id>
jl destroy is irreversible. All data on the instance is lost. If you want to keep the instance for later use, use jl pause instead — it stops compute billing while preserving your data.
Next Steps
- Try different models — Ollama supports hundreds of models. Try qwen3.5:35b for a larger option, gemma3:27b, or devstral for coding tasks — all from the Open WebUI interface.
- Serve models via API — Open WebUI exposes an OpenAI-compatible API. See our Serving LLMs tutorial for more on using Ollama and vLLM APIs.
- Learn more about the CLI — Check the full CLI reference for all available commands and workflows.
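As a sketch of the API workflow: Open WebUI's OpenAI-compatible endpoint lives under /api on the same port and authenticates with an API key you generate in Open WebUI's account settings. The exact path and settings location can vary by version, so treat this as an illustration and check the Open WebUI docs for your release:

```shell
# Chat completion against Open WebUI's OpenAI-compatible API
# (replace sk-your-api-key with a key generated in Open WebUI's settings)
curl -s http://<your-vm-public-ip>:8080/api/chat/completions \
  -H "Authorization: Bearer sk-your-api-key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen3.5:27b",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'
```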