1. Prepare Proxmox host (one‑time)
- Enable IOMMU in BIOS (VT‑d / AMD‑Vi).
- On the PVE host shell, add the IOMMU flags to the kernel command line (for Intel, `intel_iommu=on iommu=pt` in `GRUB_CMDLINE_LINUX_DEFAULT`, then `update-grub`) and reboot; see the sketch after this list.
- Verify IOMMU is active after reboot:

      dmesg | grep -e DMAR -e IOMMU

  You should see lines such as `Intel(R) Virtualization Technology for Directed I/O` and `Queued invalidation`, like your current output.
- Install the NVIDIA driver + CUDA on the host (you already have this, but for a rebuild):
  - Add the NVIDIA repo or use Debian non‑free firmware.
  - After install and reboot:

        nvidia-smi
        ls -l /dev/nvidia*

    Confirm both RTX 3060s (GPU 0 and 1) and the device nodes `/dev/nvidia0`, `/dev/nvidia1`, `/dev/nvidiactl`, `/dev/nvidia-uvm*`, `/dev/nvidia-modeset`, `/dev/nvidia-caps`.
Host is now GPU‑ready.
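A minimal sketch of the host-side steps above, assuming an Intel CPU (VT‑d) and the default GRUB bootloader; if your host uses AMD (`amd_iommu=on`) or systemd‑boot, adjust the first part accordingly:

```bash
#!/usr/bin/env bash
# Sketch: enable IOMMU on the PVE host, then verify GPU readiness after reboot.
# Assumes Intel VT-d and GRUB; the kernel parameters and paths are common
# Proxmox defaults, not taken verbatim from this guide. Run as root.
set -euo pipefail

# 1) Add IOMMU flags to the kernel command line if they are not there yet.
if ! grep -q 'intel_iommu=on' /etc/default/grub; then
  sed -i 's/^GRUB_CMDLINE_LINUX_DEFAULT="\(.*\)"/GRUB_CMDLINE_LINUX_DEFAULT="\1 intel_iommu=on iommu=pt"/' /etc/default/grub
  update-grub
  echo "GRUB updated - reboot the host, then re-run this script to verify."
  exit 0
fi

# 2) After reboot: is IOMMU active?
dmesg | grep -e DMAR -e IOMMU | grep -qi 'Directed I/O' \
  && echo "IOMMU: OK" || echo "IOMMU: NOT detected"

# 3) Driver loaded and both 3060s visible?
nvidia-smi --query-gpu=index,name --format=csv,noheader || echo "nvidia-smi failed"

# 4) Device nodes the LXC config will bind-mount.
ls -l /dev/nvidia0 /dev/nvidia1 /dev/nvidiactl /dev/nvidia-uvm* /dev/nvidia-modeset 2>/dev/null
ls -ld /dev/nvidia-caps 2>/dev/null
```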
2. Create the Ollama LXC (privileged Ubuntu)
- In Proxmox UI → Create CT:
  - CT ID: 122
  - Hostname: ollama
  - Template: Ubuntu 22.04/24.04.
  - Uncheck “Unprivileged container” (must be privileged).
  - Disk: 200 GB (as you have).
  - CPU: 8 cores.
  - RAM: 8 GB (or more if you like).
  - Network: vmbr0, static IP 172.16.0.6/24, VLAN tag 90, GW 172.16.0.1 (your current settings).
- Before starting it, edit the config on the host:

      nano /etc/pve/lxc/122.conf

  Make it look like this (adapting your net/rootfs values):

      arch: amd64
      cores: 8
      hostname: ollama
      memory: 8192
      net0: name=eth0,bridge=vmbr0,firewall=1,gw=172.16.0.1,hwaddr=BC:24:11:C9:1B:EE,ip=172.16.0.6/24,tag=90
      onboot: 0
      ostype: ubuntu
      rootfs: local-lvm:vm-122-disk-0,size=200G
      startup: order=2
      swap: 8192
      tags:
      unprivileged: 0
      # GPU / NVIDIA
      lxc.cgroup2.devices.allow: c 195:* rwm
      lxc.cgroup2.devices.allow: c 238:* rwm
      lxc.cgroup2.devices.allow: c 235:* rwm
      lxc.mount.entry: /dev/nvidia0 dev/nvidia0 none bind,optional,create=file
      lxc.mount.entry: /dev/nvidia1 dev/nvidia1 none bind,optional,create=file
      lxc.mount.entry: /dev/nvidiactl dev/nvidiactl none bind,optional,create=file
      lxc.mount.entry: /dev/nvidia-modeset dev/nvidia-modeset none bind,optional,create=file
      lxc.mount.entry: /dev/nvidia-uvm dev/nvidia-uvm none bind,optional,create=file
      lxc.mount.entry: /dev/nvidia-uvm-tools dev/nvidia-uvm-tools none bind,optional,create=file
      lxc.mount.entry: /dev/nvidia-caps dev/nvidia-caps none bind,optional,create=dir
      lxc.environment: NVIDIA_VISIBLE_DEVICES=all
      lxc.environment: NVIDIA_DRIVER_CAPABILITIES=compute,utility
      lxc.environment: CUDA_DEVICE_ORDER=PCI_BUS_ID

  This gives the CT full access to both GPUs and driver nodes, following current Proxmox + NVIDIA LXC recommendations. (A scripted version of the CT creation plus these GPU lines is sketched after these steps.)
- Start the CT:

      pct start 122
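If you prefer the CLI over the UI and nano, a sketch along these lines should recreate CT 122 and append the GPU block in one go; the template filename and storage names are assumptions (use whatever `pveam download` fetched for you), and the appended lines mirror the config shown above:

```bash
#!/usr/bin/env bash
# Sketch: create CT 122 from the CLI and append the GPU passthrough lines.
# TEMPLATE is an assumption - substitute your actual downloaded template.
set -euo pipefail

CTID=122
TEMPLATE="local:vztmpl/ubuntu-24.04-standard_24.04-2_amd64.tar.zst"  # assumed filename

pct create "$CTID" "$TEMPLATE" \
  --hostname ollama \
  --unprivileged 0 \
  --ostype ubuntu \
  --cores 8 --memory 8192 --swap 8192 \
  --rootfs local-lvm:200 \
  --net0 name=eth0,bridge=vmbr0,firewall=1,gw=172.16.0.1,ip=172.16.0.6/24,tag=90 \
  --startup order=2

# Append the GPU / NVIDIA section (same lines as the config above).
cat >> /etc/pve/lxc/${CTID}.conf <<'EOF'
# GPU / NVIDIA
lxc.cgroup2.devices.allow: c 195:* rwm
lxc.cgroup2.devices.allow: c 238:* rwm
lxc.cgroup2.devices.allow: c 235:* rwm
lxc.mount.entry: /dev/nvidia0 dev/nvidia0 none bind,optional,create=file
lxc.mount.entry: /dev/nvidia1 dev/nvidia1 none bind,optional,create=file
lxc.mount.entry: /dev/nvidiactl dev/nvidiactl none bind,optional,create=file
lxc.mount.entry: /dev/nvidia-modeset dev/nvidia-modeset none bind,optional,create=file
lxc.mount.entry: /dev/nvidia-uvm dev/nvidia-uvm none bind,optional,create=file
lxc.mount.entry: /dev/nvidia-uvm-tools dev/nvidia-uvm-tools none bind,optional,create=file
lxc.mount.entry: /dev/nvidia-caps dev/nvidia-caps none bind,optional,create=dir
lxc.environment: NVIDIA_VISIBLE_DEVICES=all
lxc.environment: NVIDIA_DRIVER_CAPABILITIES=compute,utility
lxc.environment: CUDA_DEVICE_ORDER=PCI_BUS_ID
EOF

pct start "$CTID"
```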
3. Verify GPUs inside the LXC
Inside CT 122 console:
ls -l /dev/nvidia*
nvidia-smi
You should see:
the device nodes `nvidia0`, `nvidia1`, `nvidiactl`, `nvidia-uvm`, `nvidia-modeset`, `nvidia-caps`, and `nvidia-smi` showing both RTX 3060s, identical to the host output.
If that’s true, passthrough is correct.
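A quick sanity check you could drop into the CT; a minimal sketch, where the expected count of 2 is specific to this dual‑3060 setup:

```bash
#!/usr/bin/env bash
# Sketch: verify GPU passthrough inside CT 122 (expects the two RTX 3060s).
set -euo pipefail

missing=0
for node in /dev/nvidia0 /dev/nvidia1 /dev/nvidiactl /dev/nvidia-uvm; do
  [ -e "$node" ] || { echo "missing device node: $node"; missing=1; }
done

gpus=$(nvidia-smi --query-gpu=name --format=csv,noheader | wc -l)
echo "GPUs visible: $gpus"

if [ "$missing" -eq 0 ] && [ "$gpus" -eq 2 ]; then
  echo "Passthrough looks correct."
else
  echo "Passthrough problem - recheck /etc/pve/lxc/122.conf on the host."
  exit 1
fi
```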
4. Install Ollama directly in the LXC
- Inside the CT (Ubuntu):

      curl -fsSL https://ollama.com/install.sh | sh

  Or follow the Ollama Linux docs if installing manually.
- Verify:

      ollama --version

  The installer also sets up a systemd service `ollama.service` that runs `ollama serve`.
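Before touching the systemd unit, you can confirm the stock install answers locally; a small sketch run inside CT 122 (the root endpoint normally replies "Ollama is running"):

```bash
# Quick local checks right after install (inside CT 122).
curl -s http://127.0.0.1:11434/            # normally replies: Ollama is running
curl -s http://127.0.0.1:11434/api/version # JSON with the installed version
systemctl is-active ollama                 # should print: active
```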
5. Configure systemd so Ollama uses both GPUs and listens on all IPs
- Create/modify the systemd override:

      sudo systemctl edit ollama.service

  Put:

      [Service]
      Environment=NVIDIA_VISIBLE_DEVICES=0,1
      Environment=OLLAMA_FLASH_ATTENTION=1
      Environment=OLLAMA_GPU_OVERHEAD=1024
      Environment=OLLAMA_HOST=0.0.0.0:11434
      Environment=OLLAMA_ORIGINS=*
      Restart=always
      RestartSec=10

  This is the officially recommended pattern (Environment in the override; the main unit just runs `ollama serve`). A non-interactive equivalent is sketched after these steps.
- Apply and restart:

      sudo systemctl daemon-reload
      sudo systemctl restart ollama
      sudo systemctl status ollama

  Ensure it’s `active (running)`.
- Confirm it’s listening:

      ss -lntp | grep 11434 || netstat -lntp | grep 11434

  You should see `0.0.0.0:11434` bound to `ollama`.
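If you are scripting the rebuild, the same override can be written without the interactive editor; a sketch, assuming the stock `ollama.service` unit from the installer (run as root inside the CT):

```bash
#!/usr/bin/env bash
# Sketch: write the systemd override non-interactively instead of `systemctl edit`.
set -euo pipefail

mkdir -p /etc/systemd/system/ollama.service.d
cat > /etc/systemd/system/ollama.service.d/override.conf <<'EOF'
[Service]
Environment=NVIDIA_VISIBLE_DEVICES=0,1
Environment=OLLAMA_FLASH_ATTENTION=1
Environment=OLLAMA_GPU_OVERHEAD=1024
Environment=OLLAMA_HOST=0.0.0.0:11434
Environment=OLLAMA_ORIGINS=*
Restart=always
RestartSec=10
EOF

systemctl daemon-reload
systemctl restart ollama
systemctl --no-pager status ollama
ss -lntp | grep 11434   # expect 0.0.0.0:11434 owned by ollama
```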
6. Confirm both GPUs are actually used
- In CT 122, terminal A:

      nvidia-smi -l 1
- In terminal B:

      ollama pull llama3.1:8b-q4
      ollama run llama3.1:8b-q4
- Watch `nvidia-smi` during generation: you should see processes on GPU 0 and GPU 1 when larger models or multiple requests are running. For big models (e.g. 70b-q4), utilization on both cards will be clearer.
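Instead of two terminals, you can also drive one generation through the HTTP API and sample both GPUs from a single script; a sketch, assuming the model tag above has already been pulled:

```bash
#!/usr/bin/env bash
# Sketch: fire one generation via the Ollama API and sample both GPUs while it runs.
set -euo pipefail

MODEL="llama3.1:8b-q4"   # same tag as above; swap for the model you actually pulled

# Start the request in the background.
curl -s http://127.0.0.1:11434/api/generate \
  -d "{\"model\": \"$MODEL\", \"prompt\": \"Explain IOMMU in two sentences.\", \"stream\": false}" \
  -o /tmp/ollama-answer.json &
req=$!

# Sample GPU memory/utilization once a second until the request finishes.
while kill -0 "$req" 2>/dev/null; do
  nvidia-smi --query-gpu=index,memory.used,utilization.gpu --format=csv,noheader
  sleep 1
done

echo "--- answer ---"
cat /tmp/ollama-answer.json
```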
7. Connect Open WebUI (in its own CT/VM)
In your existing openweb-ui container/VM:
- If using the Docker CUDA image (recommended):

      docker run -d \
        --name open-webui \
        -p 3000:8080 \
        --gpus all \
        --security-opt apparmor=unconfined \
        -e OLLAMA_BASE_URL=http://172.16.0.6:11434 \
        -v open-webui:/app/backend/data \
        --restart always \
        ghcr.io/open-webui/open-webui:cuda

  This uses the GPU where Open WebUI runs and calls Ollama over the network at `172.16.0.6:11434`.
Test by opening Open WebUI in the browser, selecting an Ollama model, and sending a message while watching nvidia-smi in CT 122.
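Before debugging in the browser, you can confirm the Open WebUI host can actually reach Ollama across VLAN 90; a minimal sketch run from the Open WebUI container/VM:

```bash
# Run from the Open WebUI host/CT: confirm network reachability to Ollama.
curl -s http://172.16.0.6:11434/            # normally replies: Ollama is running
curl -s http://172.16.0.6:11434/api/tags    # JSON list of models Open WebUI should see
```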
8. Quick checklist for rebuild
- Host:
  - IOMMU enabled (`dmesg | grep DMAR` shows VT‑d).
  - NVIDIA driver + CUDA OK (`nvidia-smi` shows both 3060s).
- LXC 122:
  - Privileged (`unprivileged: 0`).
  - `/etc/pve/lxc/122.conf` has `lxc.cgroup2.devices.allow` for 195, 238, 235 and mount entries for all `/dev/nvidia*` and `/dev/nvidia-caps`.
  - Inside the CT, `ls -l /dev/nvidia*` + `nvidia-smi` show both GPUs.
- Ollama:
  - Installed directly in the CT.
  - `systemctl edit ollama.service` with env vars for both GPUs + host/origins.
  - `systemctl status ollama` → running, `ollama serve` bound to `0.0.0.0:11434`.
- Open WebUI:
  - Points `OLLAMA_BASE_URL` to `http://172.16.0.6:11434`.
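If it helps, most of this checklist can be run from the PVE host in one pass; a sketch, assuming the IDs and IPs above (uses `pct exec` for the CT-side checks, and the IOMMU line only matches Intel hosts):

```bash
#!/usr/bin/env bash
# Sketch: run the rebuild checklist from the PVE host.
set -uo pipefail

echo "== Host =="
dmesg | grep -qi 'Directed I/O' && echo "IOMMU: OK" || echo "IOMMU: check BIOS / kernel cmdline"
nvidia-smi --query-gpu=name --format=csv,noheader

echo "== LXC 122 =="
grep -E 'unprivileged|cgroup2.devices.allow|nvidia' /etc/pve/lxc/122.conf
pct exec 122 -- nvidia-smi --query-gpu=name --format=csv,noheader
pct exec 122 -- ss -lntp | grep 11434 || echo "Ollama not listening on 11434"

echo "== Ollama API reachability (same check the Open WebUI CT should pass) =="
curl -s http://172.16.0.6:11434/api/tags >/dev/null && echo "Ollama API reachable" || echo "Ollama API NOT reachable"
```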
If you want, I can turn this into a bash script plus a couple of template files so next rebuild is mostly copy‑paste.