commit caba201b510bccec42171fd362f1b8bc757735df
Author: Ghassan Yusuf
Date:   Wed Mar 25 22:35:11 2026 +0300

    Add readme.md

diff --git a/readme.md b/readme.md
new file mode 100644
index 0000000..246c969
--- /dev/null
+++ b/readme.md
@@ -0,0 +1,230 @@

## 1. Prepare Proxmox host (one‑time)

1. Enable IOMMU in the BIOS (VT‑d / AMD‑Vi).
2. On the PVE host shell:
   - Verify IOMMU is active after reboot:

     ```bash
     dmesg | grep -e DMAR -e IOMMU
     ```

     You should see lines such as `Intel(R) Virtualization Technology for Directed I/O` and `Queued invalidation`, matching your current output. [forum.proxmox](https://forum.proxmox.com/threads/is-iommu-enabled-on-my-system.128191/)

3. Install the NVIDIA driver + CUDA on the host (you already have this, but you will need it again for a rebuild):
   - Add the NVIDIA repo or use Debian non‑free firmware.
   - After install and reboot:

     ```bash
     nvidia-smi
     ls -l /dev/nvidia*
     ```

     Confirm that both RTX 3060s (GPU 0 and 1) appear and that the device nodes `/dev/nvidia0`, `/dev/nvidia1`, `/dev/nvidiactl`, `/dev/nvidia-uvm*`, `/dev/nvidia-modeset`, and `/dev/nvidia-caps` exist. [reddit](https://www.reddit.com/r/Proxmox/comments/1geo379/nvidia_3060_on_proxmox_nvidiasmi_command_error/)

The host is now GPU‑ready.

***

## 2. Create the Ollama LXC (privileged Ubuntu)

1. In the Proxmox UI → `Create CT`:
   - CT ID: `122`
   - Hostname: `ollama`
   - Template: Ubuntu 22.04/24.04
   - Uncheck “Unprivileged container” (it must be **privileged**).
   - Disk: 200 GB (as you have).
   - CPU: 8 cores.
   - RAM: 8 GB (or more if you like).
   - Network: `vmbr0`, static IP `172.16.0.6/24`, VLAN tag 90, GW `172.16.0.1` (your current settings).

2. Before starting it, edit the config on the host:

   ```bash
   nano /etc/pve/lxc/122.conf
   ```

   Make it look like this (adapting your net/rootfs values):

   ```ini
   arch: amd64
   cores: 8
   hostname: ollama
   memory: 8192
   net0: name=eth0,bridge=vmbr0,firewall=1,gw=172.16.0.1,hwaddr=BC:24:11:C9:1B:EE,ip=172.16.0.6/24,tag=90
   onboot: 0
   ostype: ubuntu
   rootfs: local-lvm:vm-122-disk-0,size=200G
   startup: order=2
   swap: 8192
   tags:
   unprivileged: 0

   # GPU / NVIDIA
   lxc.cgroup2.devices.allow: c 195:* rwm
   lxc.cgroup2.devices.allow: c 238:* rwm
   lxc.cgroup2.devices.allow: c 235:* rwm

   lxc.mount.entry: /dev/nvidia0 dev/nvidia0 none bind,optional,create=file
   lxc.mount.entry: /dev/nvidia1 dev/nvidia1 none bind,optional,create=file
   lxc.mount.entry: /dev/nvidiactl dev/nvidiactl none bind,optional,create=file
   lxc.mount.entry: /dev/nvidia-modeset dev/nvidia-modeset none bind,optional,create=file
   lxc.mount.entry: /dev/nvidia-uvm dev/nvidia-uvm none bind,optional,create=file
   lxc.mount.entry: /dev/nvidia-uvm-tools dev/nvidia-uvm-tools none bind,optional,create=file
   lxc.mount.entry: /dev/nvidia-caps dev/nvidia-caps none bind,optional,create=dir

   lxc.environment: NVIDIA_VISIBLE_DEVICES=all
   lxc.environment: NVIDIA_DRIVER_CAPABILITIES=compute,utility
   lxc.environment: CUDA_DEVICE_ORDER=PCI_BUS_ID
   ```

   This gives the CT full access to both GPUs and the driver device nodes, following current Proxmox + NVIDIA LXC recommendations. [forum.proxmox](https://forum.proxmox.com/threads/nvidia-gpu-drivers-installed-on-proxmox-host-but-not-working-in-lxc.151492/)

3. Start the CT:

   ```bash
   pct start 122
   ```

***

## 3. Verify GPUs inside the LXC

Inside the CT 122 console:

```bash
ls -l /dev/nvidia*
nvidia-smi
```

You should see:

- `nvidia0`, `nvidia1`, `nvidiactl`, `nvidia-uvm`, `nvidia-modeset`, `nvidia-caps`.
- `nvidia-smi` showing both RTX 3060s, identical to the host output. [youtube](https://www.youtube.com/watch?v=lNGNRIJ708k)
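If you prefer a one‑shot check over eyeballing the output, a minimal sketch along these lines can be run inside the CT (the device list and the expected GPU count of 2 come from this setup; adjust if yours differs):

```bash
#!/usr/bin/env bash
# Sketch: verify the NVIDIA device nodes and GPU count inside CT 122.
set -u

for node in /dev/nvidia0 /dev/nvidia1 /dev/nvidiactl /dev/nvidia-uvm /dev/nvidia-uvm-tools /dev/nvidia-modeset; do
  if [ -e "$node" ]; then
    echo "OK      $node"
  else
    echo "MISSING $node"
  fi
done

# Both RTX 3060s should be listed, one line per GPU.
nvidia-smi -L
echo "GPUs visible: $(nvidia-smi -L | wc -l) (expected 2)"
```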
If that’s true, passthrough is correct.

***

## 4. Install Ollama directly in the LXC

1. Inside the CT (Ubuntu):

   ```bash
   curl -fsSL https://ollama.com/install.sh | sh
   ```

   Or follow the Ollama Linux docs for a manual install. [docs.ollama](https://docs.ollama.com/faq)

2. Verify:

   ```bash
   ollama --version
   ```

   The installer sets up a systemd service `ollama.service` that runs `ollama serve`. [docs.ollama](https://docs.ollama.com/linux)

***

## 5. Configure systemd so Ollama uses both GPUs and listens on all IPs

1. Create/modify a systemd override:

   ```bash
   sudo systemctl edit ollama.service
   ```

   Put:

   ```ini
   [Service]
   Environment=NVIDIA_VISIBLE_DEVICES=0,1
   Environment=OLLAMA_FLASH_ATTENTION=1
   Environment=OLLAMA_GPU_OVERHEAD=1024
   Environment=OLLAMA_HOST=0.0.0.0:11434
   Environment=OLLAMA_ORIGINS=*
   Restart=always
   RestartSec=10
   ```

   This is the recommended pattern: environment variables go in the override, while the main unit just runs `ollama serve`. [github](https://github.com/ollama/ollama/pull/5601/files)

2. Apply and restart:

   ```bash
   sudo systemctl daemon-reload
   sudo systemctl restart ollama
   sudo systemctl status ollama
   ```

   Ensure it’s `active (running)`.

3. Confirm it’s listening:

   ```bash
   ss -lntp | grep 11434 || netstat -lntp | grep 11434
   ```

   You should see `0.0.0.0:11434` bound to `ollama`. [thinkinprompt](https://thinkinprompt.com/post/Ollama-environment-variable-setting-on-Linux)

***

## 6. Confirm both GPUs are actually used

1. In CT 122, terminal A:

   ```bash
   nvidia-smi -l 1
   ```

2. In terminal B:

   ```bash
   ollama pull llama3.1:8b
   ollama run llama3.1:8b
   ```

3. Watch `nvidia-smi` during generation: you should see processes on GPU 0 and GPU 1 when larger models or multiple requests are running. For big models (e.g. `llama3.1:70b`), utilization on both cards is much clearer. [perplexity](https://www.perplexity.ai/search/d58f0044-521e-472f-8e3d-5cc576f18bdc)

***

## 7. Connect Open WebUI (in its own CT/VM)

In your existing `openweb-ui` container/VM:

- If you use the Docker CUDA image (recommended):

  ```bash
  docker run -d \
    --name open-webui \
    -p 3000:8080 \
    --gpus all \
    --security-opt apparmor=unconfined \
    -e OLLAMA_BASE_URL=http://172.16.0.6:11434 \
    -v open-webui:/app/backend/data \
    --restart always \
    ghcr.io/open-webui/open-webui:cuda
  ```

  This lets Open WebUI use the GPU on its own host and talk to Ollama over the network at `172.16.0.6:11434`. [perplexity](https://www.perplexity.ai/search/1957f039-5940-4128-aed0-43effc9a766a)

Test by opening Open WebUI in the browser, selecting an Ollama model, and sending a message while watching `nvidia-smi` in CT 122.
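If the model list in Open WebUI comes up empty, it helps to first confirm the Ollama API is reachable from the Open WebUI host. A minimal check, assuming the IP/port above and that `llama3.1:8b` has been pulled:

```bash
# From the Open WebUI CT/VM: list the models Ollama exposes over the network
curl -s http://172.16.0.6:11434/api/tags

# Optional: a quick non-streaming generation request against the API
curl -s http://172.16.0.6:11434/api/generate \
  -d '{"model": "llama3.1:8b", "prompt": "Say hello", "stream": false}'
```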
***

## 8. Quick checklist for rebuild

- Host:
  - IOMMU enabled (`dmesg | grep DMAR` shows VT‑d). [forum.proxmox](https://forum.proxmox.com/threads/is-iommu-enabled-on-my-system.128191/)
  - NVIDIA driver + CUDA OK (`nvidia-smi` shows both 3060s). [reddit](https://www.reddit.com/r/Proxmox/comments/1geo379/nvidia_3060_on_proxmox_nvidiasmi_command_error/)
- LXC 122:
  - Privileged (`unprivileged: 0`).
  - `/etc/pve/lxc/122.conf` has `lxc.cgroup2.devices.allow` entries for majors 195, 238, 235 and mount entries for all `/dev/nvidia*` nodes plus `/dev/nvidia-caps`. [perplexity](https://www.perplexity.ai/search/45539c15-d92e-492c-a9a7-0c15d28dd331)
  - Inside the CT, `ls -l /dev/nvidia*` and `nvidia-smi` show both GPUs. [perplexity](https://www.perplexity.ai/search/d58f0044-521e-472f-8e3d-5cc576f18bdc)
- Ollama:
  - Installed directly in the CT.
  - `systemctl edit ollama.service` with env vars for both GPUs plus host/origins.
  - `systemctl status ollama` → running, `ollama serve` bound to `0.0.0.0:11434`. [ollama.readthedocs](https://ollama.readthedocs.io/en/faq/)
- Open WebUI:
  - Points `OLLAMA_BASE_URL` to `http://172.16.0.6:11434`. [perplexity](https://www.perplexity.ai/search/1957f039-5940-4128-aed0-43effc9a766a)

If you want, I can turn this into a bash script plus a couple of template files so the next rebuild is mostly copy‑paste.
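As a starting point for that script, here is a minimal sketch of the host‑side part. It assumes CT ID 122 and simply re-appends the device majors and mount entries from section 2 to an already created CT; review and adapt it before running on a real host:

```bash
#!/usr/bin/env bash
# Sketch: add the GPU passthrough section to an existing CT config on the PVE host,
# then start the container. Assumes the CT was already created (section 2).
set -euo pipefail

CTID="${1:-122}"
CONF="/etc/pve/lxc/${CTID}.conf"

cat >> "$CONF" <<'EOF'

# GPU / NVIDIA
lxc.cgroup2.devices.allow: c 195:* rwm
lxc.cgroup2.devices.allow: c 238:* rwm
lxc.cgroup2.devices.allow: c 235:* rwm
lxc.mount.entry: /dev/nvidia0 dev/nvidia0 none bind,optional,create=file
lxc.mount.entry: /dev/nvidia1 dev/nvidia1 none bind,optional,create=file
lxc.mount.entry: /dev/nvidiactl dev/nvidiactl none bind,optional,create=file
lxc.mount.entry: /dev/nvidia-modeset dev/nvidia-modeset none bind,optional,create=file
lxc.mount.entry: /dev/nvidia-uvm dev/nvidia-uvm none bind,optional,create=file
lxc.mount.entry: /dev/nvidia-uvm-tools dev/nvidia-uvm-tools none bind,optional,create=file
lxc.mount.entry: /dev/nvidia-caps dev/nvidia-caps none bind,optional,create=dir
lxc.environment: NVIDIA_VISIBLE_DEVICES=all
lxc.environment: NVIDIA_DRIVER_CAPABILITIES=compute,utility
lxc.environment: CUDA_DEVICE_ORDER=PCI_BUS_ID
EOF

pct start "$CTID"
```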