1. Prepare Proxmox host (one‑time)
- Enable IOMMU in BIOS (VT‑d / AMD‑Vi).
- On the PVE host shell, add the IOMMU flags to the kernel command line (for Intel, `intel_iommu=on iommu=pt` in `GRUB_CMDLINE_LINUX_DEFAULT`, then `update-grub`) and reboot; see the sketch after this list.
- Verify IOMMU is active after reboot:

      dmesg | grep -e DMAR -e IOMMU

  You should see lines such as `Intel(R) Virtualization Technology for Directed I/O` and `Queued invalidation`, like your current output.
- Install the NVIDIA driver + CUDA on the host (you already have this, but for a rebuild):
  - Add the NVIDIA repo or use Debian non‑free firmware.
  - After install and reboot:

        nvidia-smi
        ls -l /dev/nvidia*

    Confirm both RTX 3060s (GPU 0 and 1) and the device nodes `/dev/nvidia0`, `/dev/nvidia1`, `/dev/nvidiactl`, `/dev/nvidia-uvm*`, `/dev/nvidia-modeset`, `/dev/nvidia-caps`.
Host is now GPU‑ready.
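A minimal sketch of the host-side steps above, assuming an Intel CPU (VT‑d) and the default GRUB bootloader; if your host uses AMD (`amd_iommu=on`) or systemd‑boot, adjust the first part accordingly:

```bash
#!/usr/bin/env bash
# Sketch: enable IOMMU on the PVE host, then verify GPU readiness after reboot.
# Assumes Intel VT-d and GRUB; the kernel parameters and paths are common
# Proxmox defaults, not taken verbatim from this guide. Run as root.
set -euo pipefail

# 1) Add IOMMU flags to the kernel command line if they are not there yet.
if ! grep -q 'intel_iommu=on' /etc/default/grub; then
  sed -i 's/^GRUB_CMDLINE_LINUX_DEFAULT="\(.*\)"/GRUB_CMDLINE_LINUX_DEFAULT="\1 intel_iommu=on iommu=pt"/' /etc/default/grub
  update-grub
  echo "GRUB updated - reboot the host, then re-run this script to verify."
  exit 0
fi

# 2) After reboot: is IOMMU active?
dmesg | grep -e DMAR -e IOMMU | grep -qi 'Directed I/O' \
  && echo "IOMMU: OK" || echo "IOMMU: NOT detected"

# 3) Driver loaded and both 3060s visible?
nvidia-smi --query-gpu=index,name --format=csv,noheader || echo "nvidia-smi failed"

# 4) Device nodes the LXC config will bind-mount.
ls -l /dev/nvidia0 /dev/nvidia1 /dev/nvidiactl /dev/nvidia-uvm* /dev/nvidia-modeset 2>/dev/null
ls -ld /dev/nvidia-caps 2>/dev/null
```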
2. Create the Ollama LXC (privileged Ubuntu)
- In Proxmox UI → Create CT:
  - CT ID: 122
  - Hostname: ollama
  - Template: Ubuntu 22.04/24.04.
  - Uncheck “Unprivileged container” (must be privileged).
  - Disk: 200 GB (as you have).
  - CPU: 8 cores.
  - RAM: 8 GB (or more if you like).
  - Network: vmbr0, static IP 172.16.0.6/24, VLAN tag 90, GW 172.16.0.1 (your current settings).
- Before starting it, edit the config on the host:

      nano /etc/pve/lxc/122.conf

  Make it look like this (adapting your net/rootfs values):

      arch: amd64
      cores: 8
      hostname: ollama
      memory: 8192
      net0: name=eth0,bridge=vmbr0,firewall=1,gw=172.16.0.1,hwaddr=BC:24:11:C9:1B:EE,ip=172.16.0.6/24,tag=90
      onboot: 0
      ostype: ubuntu
      rootfs: local-lvm:vm-122-disk-0,size=200G
      startup: order=2
      swap: 8192
      tags:
      unprivileged: 0
      # GPU / NVIDIA
      lxc.cgroup2.devices.allow: c 195:* rwm
      lxc.cgroup2.devices.allow: c 238:* rwm
      lxc.cgroup2.devices.allow: c 235:* rwm
      lxc.mount.entry: /dev/nvidia0 dev/nvidia0 none bind,optional,create=file
      lxc.mount.entry: /dev/nvidia1 dev/nvidia1 none bind,optional,create=file
      lxc.mount.entry: /dev/nvidiactl dev/nvidiactl none bind,optional,create=file
      lxc.mount.entry: /dev/nvidia-modeset dev/nvidia-modeset none bind,optional,create=file
      lxc.mount.entry: /dev/nvidia-uvm dev/nvidia-uvm none bind,optional,create=file
      lxc.mount.entry: /dev/nvidia-uvm-tools dev/nvidia-uvm-tools none bind,optional,create=file
      lxc.mount.entry: /dev/nvidia-caps dev/nvidia-caps none bind,optional,create=dir
      lxc.environment: NVIDIA_VISIBLE_DEVICES=all
      lxc.environment: NVIDIA_DRIVER_CAPABILITIES=compute,utility
      lxc.environment: CUDA_DEVICE_ORDER=PCI_BUS_ID

  This gives the CT full access to both GPUs and driver nodes, following current Proxmox + NVIDIA LXC recommendations. (A scripted version of the CT creation plus these GPU lines is sketched after these steps.)
- Start the CT:

      pct start 122
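If you prefer the CLI over the UI and nano, a sketch along these lines should recreate CT 122 and append the GPU block in one go; the template filename and storage names are assumptions (use whatever `pveam download` fetched for you), and the appended lines mirror the config shown above:

```bash
#!/usr/bin/env bash
# Sketch: create CT 122 from the CLI and append the GPU passthrough lines.
# TEMPLATE is an assumption - substitute your actual downloaded template.
set -euo pipefail

CTID=122
TEMPLATE="local:vztmpl/ubuntu-24.04-standard_24.04-2_amd64.tar.zst"  # assumed filename

pct create "$CTID" "$TEMPLATE" \
  --hostname ollama \
  --unprivileged 0 \
  --ostype ubuntu \
  --cores 8 --memory 8192 --swap 8192 \
  --rootfs local-lvm:200 \
  --net0 name=eth0,bridge=vmbr0,firewall=1,gw=172.16.0.1,ip=172.16.0.6/24,tag=90 \
  --startup order=2

# Append the GPU / NVIDIA section (same lines as the config above).
cat >> /etc/pve/lxc/${CTID}.conf <<'EOF'
# GPU / NVIDIA
lxc.cgroup2.devices.allow: c 195:* rwm
lxc.cgroup2.devices.allow: c 238:* rwm
lxc.cgroup2.devices.allow: c 235:* rwm
lxc.mount.entry: /dev/nvidia0 dev/nvidia0 none bind,optional,create=file
lxc.mount.entry: /dev/nvidia1 dev/nvidia1 none bind,optional,create=file
lxc.mount.entry: /dev/nvidiactl dev/nvidiactl none bind,optional,create=file
lxc.mount.entry: /dev/nvidia-modeset dev/nvidia-modeset none bind,optional,create=file
lxc.mount.entry: /dev/nvidia-uvm dev/nvidia-uvm none bind,optional,create=file
lxc.mount.entry: /dev/nvidia-uvm-tools dev/nvidia-uvm-tools none bind,optional,create=file
lxc.mount.entry: /dev/nvidia-caps dev/nvidia-caps none bind,optional,create=dir
lxc.environment: NVIDIA_VISIBLE_DEVICES=all
lxc.environment: NVIDIA_DRIVER_CAPABILITIES=compute,utility
lxc.environment: CUDA_DEVICE_ORDER=PCI_BUS_ID
EOF

pct start "$CTID"
```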
3. Verify GPUs inside the LXC
Inside CT 122 console:
ls -l /dev/nvidia*
nvidia-smi
You should see:
the device nodes `nvidia0`, `nvidia1`, `nvidiactl`, `nvidia-uvm`, `nvidia-modeset`, `nvidia-caps`, and `nvidia-smi` showing both RTX 3060s, identical to the host output.
If that’s true, passthrough is correct.
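A quick sanity check you could drop into the CT; a minimal sketch, where the expected count of 2 is specific to this dual‑3060 setup:

```bash
#!/usr/bin/env bash
# Sketch: verify GPU passthrough inside CT 122 (expects the two RTX 3060s).
set -euo pipefail

missing=0
for node in /dev/nvidia0 /dev/nvidia1 /dev/nvidiactl /dev/nvidia-uvm; do
  [ -e "$node" ] || { echo "missing device node: $node"; missing=1; }
done

gpus=$(nvidia-smi --query-gpu=name --format=csv,noheader | wc -l)
echo "GPUs visible: $gpus"

if [ "$missing" -eq 0 ] && [ "$gpus" -eq 2 ]; then
  echo "Passthrough looks correct."
else
  echo "Passthrough problem - recheck /etc/pve/lxc/122.conf on the host."
  exit 1
fi
```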
4. Install Ollama directly in the LXC
- Inside the CT (Ubuntu):

      curl -fsSL https://ollama.com/install.sh | sh

  Or follow the Ollama Linux docs if installing manually.
- Verify:

      ollama --version

  The installer also sets up a systemd service `ollama.service` that runs `ollama serve`.
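Before touching the systemd unit, you can confirm the stock install answers locally; a small sketch run inside CT 122 (the root endpoint normally replies "Ollama is running"):

```bash
# Quick local checks right after install (inside CT 122).
curl -s http://127.0.0.1:11434/            # normally replies: Ollama is running
curl -s http://127.0.0.1:11434/api/version # JSON with the installed version
systemctl is-active ollama                 # should print: active
```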
5. Configure systemd so Ollama uses both GPUs and listens on all IPs
- Create/modify the systemd override:

      sudo systemctl edit ollama.service

  Put:

      [Service]
      Environment=NVIDIA_VISIBLE_DEVICES=0,1
      Environment=OLLAMA_FLASH_ATTENTION=1
      Environment=OLLAMA_GPU_OVERHEAD=1024
      Environment=OLLAMA_HOST=0.0.0.0:11434
      Environment=OLLAMA_ORIGINS=*
      Restart=always
      RestartSec=10

  This is the officially recommended pattern (Environment in the override; the main unit just runs `ollama serve`). A non-interactive equivalent is sketched after these steps.
- Apply and restart:

      sudo systemctl daemon-reload
      sudo systemctl restart ollama
      sudo systemctl status ollama

  Ensure it’s `active (running)`.
- Confirm it’s listening:

      ss -lntp | grep 11434 || netstat -lntp | grep 11434

  You should see `0.0.0.0:11434` bound to `ollama`.
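If you are scripting the rebuild, the same override can be written without the interactive editor; a sketch, assuming the stock `ollama.service` unit from the installer (run as root inside the CT):

```bash
#!/usr/bin/env bash
# Sketch: write the systemd override non-interactively instead of `systemctl edit`.
set -euo pipefail

mkdir -p /etc/systemd/system/ollama.service.d
cat > /etc/systemd/system/ollama.service.d/override.conf <<'EOF'
[Service]
Environment=NVIDIA_VISIBLE_DEVICES=0,1
Environment=OLLAMA_FLASH_ATTENTION=1
Environment=OLLAMA_GPU_OVERHEAD=1024
Environment=OLLAMA_HOST=0.0.0.0:11434
Environment=OLLAMA_ORIGINS=*
Restart=always
RestartSec=10
EOF

systemctl daemon-reload
systemctl restart ollama
systemctl --no-pager status ollama
ss -lntp | grep 11434   # expect 0.0.0.0:11434 owned by ollama
```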
6. Confirm both GPUs are actually used
- In CT 122, terminal A:

      nvidia-smi -l 1
- In terminal B:

      ollama pull llama3.1:8b-q4
      ollama run llama3.1:8b-q4
- Watch `nvidia-smi` during generation: you should see processes on GPU 0 and GPU 1 when larger models or multiple requests are running. For big models (e.g. 70b-q4), utilization on both cards will be clearer.
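Instead of two terminals, you can also drive one generation through the HTTP API and sample both GPUs from a single script; a sketch, assuming the model tag above has already been pulled:

```bash
#!/usr/bin/env bash
# Sketch: fire one generation via the Ollama API and sample both GPUs while it runs.
set -euo pipefail

MODEL="llama3.1:8b-q4"   # same tag as above; swap for the model you actually pulled

# Start the request in the background.
curl -s http://127.0.0.1:11434/api/generate \
  -d "{\"model\": \"$MODEL\", \"prompt\": \"Explain IOMMU in two sentences.\", \"stream\": false}" \
  -o /tmp/ollama-answer.json &
req=$!

# Sample GPU memory/utilization once a second until the request finishes.
while kill -0 "$req" 2>/dev/null; do
  nvidia-smi --query-gpu=index,memory.used,utilization.gpu --format=csv,noheader
  sleep 1
done

echo "--- answer ---"
cat /tmp/ollama-answer.json
```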
7. Connect Open WebUI (in its own CT/VM)
In your existing openweb-ui container/VM:
- If using the Docker CUDA image (recommended):

      docker run -d \
        --name open-webui \
        -p 3000:8080 \
        --gpus all \
        --security-opt apparmor=unconfined \
        -e OLLAMA_BASE_URL=http://172.16.0.6:11434 \
        -v open-webui:/app/backend/data \
        --restart always \
        ghcr.io/open-webui/open-webui:cuda

  This uses the GPU where Open WebUI runs and calls Ollama over the network at `172.16.0.6:11434`.
Test by opening Open WebUI in the browser, selecting an Ollama model, and sending a message while watching nvidia-smi in CT 122.
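Before debugging in the browser, you can confirm the Open WebUI host can actually reach Ollama across VLAN 90; a minimal sketch run from the Open WebUI container/VM:

```bash
# Run from the Open WebUI host/CT: confirm network reachability to Ollama.
curl -s http://172.16.0.6:11434/            # normally replies: Ollama is running
curl -s http://172.16.0.6:11434/api/tags    # JSON list of models Open WebUI should see
```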
8. Quick checklist for rebuild
- Host:
  - IOMMU enabled (`dmesg | grep DMAR` shows VT‑d).
  - NVIDIA driver + CUDA OK (`nvidia-smi` shows both 3060s).
- LXC 122:
  - Privileged (`unprivileged: 0`).
  - `/etc/pve/lxc/122.conf` has `lxc.cgroup2.devices.allow` for 195, 238, 235 and mount entries for all `/dev/nvidia*` and `/dev/nvidia-caps`.
  - Inside the CT, `ls -l /dev/nvidia*` + `nvidia-smi` show both GPUs.
- Ollama:
  - Installed directly in the CT.
  - `systemctl edit ollama.service` with env vars for both GPUs + host/origins.
  - `systemctl status ollama` → running, `ollama serve` bound to `0.0.0.0:11434`.
- Open WebUI:
  - Points `OLLAMA_BASE_URL` to `http://172.16.0.6:11434`.
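If it helps, most of this checklist can be run from the PVE host in one pass; a sketch, assuming the IDs and IPs above (uses `pct exec` for the CT-side checks, and the IOMMU line only matches Intel hosts):

```bash
#!/usr/bin/env bash
# Sketch: run the rebuild checklist from the PVE host.
set -uo pipefail

echo "== Host =="
dmesg | grep -qi 'Directed I/O' && echo "IOMMU: OK" || echo "IOMMU: check BIOS / kernel cmdline"
nvidia-smi --query-gpu=name --format=csv,noheader

echo "== LXC 122 =="
grep -E 'unprivileged|cgroup2.devices.allow|nvidia' /etc/pve/lxc/122.conf
pct exec 122 -- nvidia-smi --query-gpu=name --format=csv,noheader
pct exec 122 -- ss -lntp | grep 11434 || echo "Ollama not listening on 11434"

echo "== Ollama API reachability (same check the Open WebUI CT should pass) =="
curl -s http://172.16.0.6:11434/api/tags >/dev/null && echo "Ollama API reachable" || echo "Ollama API NOT reachable"
```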
If you want, I can turn this into a bash script plus a couple of template files so next rebuild is mostly copy‑paste.