
## 1. Prepare Proxmox host (one-time)
1. Enable IOMMU in BIOS (VT-d / AMD-Vi).
2. On PVE host shell:
- Verify IOMMU is active after reboot:
```bash
dmesg | grep -e DMAR -e IOMMU
```
You should see `Intel(R) Virtualization Technology for Directed I/O` and `Queued invalidation` lines like your current output. [forum.proxmox](https://forum.proxmox.com/threads/is-iommu-enabled-on-my-system.128191/)
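If those lines are missing, the kernel command line is the usual culprit. A minimal sketch for a GRUB-booted Intel host (adjust for AMD or systemd-boot/ZFS installs):
```bash
# Append the IOMMU flags to the default kernel command line, e.g.:
# GRUB_CMDLINE_LINUX_DEFAULT="quiet intel_iommu=on iommu=pt"
nano /etc/default/grub
update-grub
reboot
```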
3. Install NVIDIA driver + CUDA on host (you already have this, but for rebuild):
- Add NVIDIA repo or use Debian non-free firmware.
- After install and reboot:
```bash
nvidia-smi
ls -l /dev/nvidia*
```
Confirm both RTX 3060s (GPU 0 and 1) show up, along with the device nodes `/dev/nvidia0`, `/dev/nvidia1`, `/dev/nvidiactl`, `/dev/nvidia-uvm*`, `/dev/nvidia-modeset`, `/dev/nvidia-caps`. [reddit](https://www.reddit.com/r/Proxmox/comments/1geo379/nvidia_3060_on_proxmox_nvidiasmi_command_error/)
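For the rebuild case, a rough sketch of the host driver install (package names are assumptions that vary by PVE release, and they require non-free components in your APT sources):
```bash
apt update
# Kernel headers for the module build; on newer PVE this meta-package
# may be called proxmox-default-headers instead.
apt install -y pve-headers build-essential
# Debian's packaged driver route (needs non-free / non-free-firmware enabled):
apt install -y nvidia-driver firmware-misc-nonfree
reboot
```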
Host is now GPU-ready.
***
## 2. Create the Ollama LXC (privileged Ubuntu)
1. In Proxmox UI → `Create CT`:
- CT ID: `122`
- Hostname: `ollama`
- Template: Ubuntu 22.04/24.04.
- Uncheck “Unprivileged container” (must be **privileged**).
- Disk: 200 GB (as you have).
- CPU: 8 cores.
- RAM: 8 GB (or more if you like).
- Network: `vmbr0`, static IP `172.16.0.6/24`, VLAN tag 90, GW `172.16.0.1` (your current settings).
2. Before starting it, edit config on host:
```bash
nano /etc/pve/lxc/122.conf
```
Make it look like this (adapting your net/rootfs values):
```ini
arch: amd64
cores: 8
hostname: ollama
memory: 8192
net0: name=eth0,bridge=vmbr0,firewall=1,gw=172.16.0.1,hwaddr=BC:24:11:C9:1B:EE,ip=172.16.0.6/24,tag=90
onboot: 0
ostype: ubuntu
rootfs: local-lvm:vm-122-disk-0,size=200G
startup: order=2
swap: 8192
tags:
unprivileged: 0
# GPU / NVIDIA
lxc.cgroup2.devices.allow: c 195:* rwm
lxc.cgroup2.devices.allow: c 238:* rwm
lxc.cgroup2.devices.allow: c 235:* rwm
lxc.mount.entry: /dev/nvidia0 dev/nvidia0 none bind,optional,create=file
lxc.mount.entry: /dev/nvidia1 dev/nvidia1 none bind,optional,create=file
lxc.mount.entry: /dev/nvidiactl dev/nvidiactl none bind,optional,create=file
lxc.mount.entry: /dev/nvidia-modeset dev/nvidia-modeset none bind,optional,create=file
lxc.mount.entry: /dev/nvidia-uvm dev/nvidia-uvm none bind,optional,create=file
lxc.mount.entry: /dev/nvidia-uvm-tools dev/nvidia-uvm-tools none bind,optional,create=file
lxc.mount.entry: /dev/nvidia-caps dev/nvidia-caps none bind,optional,create=dir
lxc.environment: NVIDIA_VISIBLE_DEVICES=all
lxc.environment: NVIDIA_DRIVER_CAPABILITIES=compute,utility
lxc.environment: CUDA_DEVICE_ORDER=PCI_BUS_ID
```
This gives the CT full access to both GPUs and driver nodes following current Proxmox+NVIDIA LXC recommendations. [forum.proxmox](https://forum.proxmox.com/threads/nvidia-gpu-drivers-installed-on-proxmox-host-but-not-working-in-lxc.151492/)
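One caveat: the `238`/`235` majors are what this host currently assigns to `nvidia-uvm` and `nvidia-caps`; they can change between driver releases, so confirm on the host and adjust the `lxc.cgroup2.devices.allow` lines to match:
```bash
ls -l /dev/nvidia* /dev/nvidia-caps/
# crw-rw-rw- 1 root root 195, 0 ... /dev/nvidia0      <- major 195
# crw-rw-rw- 1 root root 238, 0 ... /dev/nvidia-uvm   <- majors here vary (values illustrative)
```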
3. Start the CT:
```bash
pct start 122
```
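Optionally confirm from the host that the bind mounts landed, before opening a console:
```bash
pct exec 122 -- ls -l /dev/nvidia0 /dev/nvidiactl
```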
***
## 3. Verify GPUs inside the LXC
Inside CT 122 console:
```bash
ls -l /dev/nvidia*
nvidia-smi
```
You should see:
- `nvidia0`, `nvidia1`, `nvidiactl`, `nvidia-uvm`, `nvidia-modeset`, `nvidia-caps`.
- `nvidia-smi` showing both RTX 3060s, identical to host output. [youtube](https://www.youtube.com/watch?v=lNGNRIJ708k)
If that's true, passthrough is correct.
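Note that `nvidia-smi` only exists inside the CT if the NVIDIA userland is installed there too. A common pattern (verify against your setup) is to run the same `.run` installer as on the host, skipping the kernel module; the version must match the host driver exactly and the filename below is illustrative:
```bash
# Inside CT 122; userland only, the host already loaded the kernel module.
./NVIDIA-Linux-x86_64-550.x.run --no-kernel-module
```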
***
## 4. Install Ollama directly in the LXC
1. Inside CT (Ubuntu):
```bash
curl -fsSL https://ollama.com/install.sh | sh
```
Or follow the Ollama Linux docs for a manual install. [docs.ollama](https://docs.ollama.com/faq)
2. Verify:
```bash
ollama --version
```
It should install a systemd service `ollama.service` that runs `ollama serve`. [docs.ollama](https://docs.ollama.com/linux)
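To see what the installer set up before layering overrides on top (the `ExecStart` path below is what the official install script typically uses):
```bash
systemctl cat ollama.service
# [Service]
# ExecStart=/usr/local/bin/ollama serve
```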
***
## 5. Configure systemd so Ollama uses both GPUs and listens on all IPs
1. Create/modify systemd override:
```bash
sudo systemctl edit ollama.service
```
Put:
```ini
[Service]
Environment=NVIDIA_VISIBLE_DEVICES=0,1
Environment=OLLAMA_FLASH_ATTENTION=1
Environment=OLLAMA_GPU_OVERHEAD=1024
Environment=OLLAMA_HOST=0.0.0.0:11434
Environment=OLLAMA_ORIGINS=*
Restart=always
RestartSec=10
```
This is the officially recommended pattern (Environment in override, main unit just runs `ollama serve`). [github](https://github.com/ollama/ollama/pull/5601/files)
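If you script the rebuild, writing the drop-in directly is equivalent to `systemctl edit`:
```bash
mkdir -p /etc/systemd/system/ollama.service.d
cat >/etc/systemd/system/ollama.service.d/override.conf <<'EOF'
[Service]
Environment=NVIDIA_VISIBLE_DEVICES=0,1
Environment=OLLAMA_FLASH_ATTENTION=1
Environment=OLLAMA_GPU_OVERHEAD=1024
Environment=OLLAMA_HOST=0.0.0.0:11434
Environment=OLLAMA_ORIGINS=*
Restart=always
RestartSec=10
EOF
```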
2. Apply and restart:
```bash
sudo systemctl daemon-reload
sudo systemctl restart ollama
sudo systemctl status ollama
```
Ensure it's `active (running)`.
3. Confirm it's listening:
```bash
ss -lntp | grep 11434 || netstat -lntp | grep 11434
```
You should see `0.0.0.0:11434` bound to `ollama`. [thinkinprompt](https://thinkinprompt.com/post/Ollama-environment-variable-setting-on-Linux)
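From another machine on the VLAN, the API itself is a stronger check than the socket:
```bash
# /api/tags lists installed models; any JSON reply proves reachability.
curl http://172.16.0.6:11434/api/tags
```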
***
## 6. Confirm both GPUs are actually used
1. In CT 122, terminal A:
```bash
nvidia-smi -l 1
```
2. In terminal B:
```bash
ollama pull llama3.1:8b
ollama run llama3.1:8b
```
3. Watch `nvidia-smi` during generation: you should see processes on GPU 0 and GPU 1 when larger models or multiple concurrent requests are running. For big models (e.g. `llama3.1:70b`), utilization on both cards will be clearer. [perplexity](https://www.perplexity.ai/search/d58f0044-521e-472f-8e3d-5cc576f18bdc)
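To drive a reproducible load while watching `nvidia-smi`, the HTTP API works as well as the CLI:
```bash
# Non-streaming one-shot generation against the model pulled above.
curl http://localhost:11434/api/generate \
  -d '{"model": "llama3.1:8b", "prompt": "Explain IOMMU in one paragraph.", "stream": false}'
```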
***
## 7. Connect Open WebUI (in its own CT/VM)
In your existing `openweb-ui` container/VM:
- If Docker CUDA image (recommended):
```bash
docker run -d \
--name open-webui \
-p 3000:8080 \
--gpus all \
--security-opt apparmor=unconfined \
-e OLLAMA_BASE_URL=http://172.16.0.6:11434 \
-v open-webui:/app/backend/data \
--restart always \
ghcr.io/open-webui/open-webui:cuda
```
This uses the GPU on the host where Open WebUI runs and calls Ollama over the network at `172.16.0.6:11434`. [perplexity](https://www.perplexity.ai/search/1957f039-5940-4128-aed0-43effc9a766a)
Test by opening Open WebUI in the browser, selecting an Ollama model, and sending a message while watching `nvidia-smi` in CT 122.
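If no models show up in the UI, check connectivity from the Open WebUI host first:
```bash
# Should return a small JSON blob like {"version":"0.5.1"} (version illustrative).
curl http://172.16.0.6:11434/api/version
```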
***
## 8. Quick checklist for rebuild
- Host:
- IOMMU enabled (`dmesg | grep DMAR` shows VT-d). [forum.proxmox](https://forum.proxmox.com/threads/is-iommu-enabled-on-my-system.128191/)
- NVIDIA driver + CUDA OK (`nvidia-smi` shows both 3060s). [reddit](https://www.reddit.com/r/Proxmox/comments/1geo379/nvidia_3060_on_proxmox_nvidiasmi_command_error/)
- LXC 122:
- Privileged (`unprivileged: 0`).
- `/etc/pve/lxc/122.conf` has `lxc.cgroup2.devices.allow` for 195, 238, 235 and mount entries for all `/dev/nvidia*` and `/dev/nvidia-caps`. [perplexity](https://www.perplexity.ai/search/45539c15-d92e-492c-a9a7-0c15d28dd331)
- Inside CT, `ls -l /dev/nvidia*` + `nvidia-smi` show both GPUs. [perplexity](https://www.perplexity.ai/search/d58f0044-521e-472f-8e3d-5cc576f18bdc)
- Ollama:
- Installed directly in CT.
- `systemctl edit ollama.service` with env vars for both GPUs + host/origins.
- `systemctl status ollama` → running, `ollama serve` bound to `0.0.0.0:11434`. [ollama.readthedocs](https://ollama.readthedocs.io/en/faq/)
- Open WebUI:
- Points `OLLAMA_BASE_URL` to `http://172.16.0.6:11434`. [perplexity](https://www.perplexity.ai/search/1957f039-5940-4128-aed0-43effc9a766a)
If you want, I can turn this into a bash script plus a couple of template files so the next rebuild is mostly copy-paste.