## 1. Prepare Proxmox host (one‑time)
1. Enable IOMMU in BIOS (VT‑d / AMD‑Vi).
2. On the PVE host shell:

- Verify IOMMU is active after reboot:

```bash
dmesg | grep -e DMAR -e IOMMU
```

You should see lines such as `Intel(R) Virtualization Technology for Directed I/O` and `Queued invalidation`. [forum.proxmox](https://forum.proxmox.com/threads/is-iommu-enabled-on-my-system.128191/)
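If those lines are missing, IOMMU usually also has to be enabled on the kernel command line. A sketch for a GRUB‑booted Intel host (AMD hosts use `amd_iommu=on`; hosts booted via systemd-boot edit `/etc/kernel/cmdline` and run `proxmox-boot-tool refresh` instead — treat the exact flags as an assumption to check against the Proxmox PCI passthrough docs):

```bash
# In /etc/default/grub, extend the default kernel cmdline, e.g.:
#   GRUB_CMDLINE_LINUX_DEFAULT="quiet intel_iommu=on iommu=pt"
# then regenerate the GRUB config and reboot:
update-grub
reboot
```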
3. Install the NVIDIA driver + CUDA on the host (you already have this, but for a rebuild):

- Add the NVIDIA repo or use Debian non‑free firmware.
- After install and reboot:

```bash
nvidia-smi
ls -l /dev/nvidia*
```

Confirm that both RTX 3060s (GPU 0 and 1) appear and that the device nodes `/dev/nvidia0`, `/dev/nvidia1`, `/dev/nvidiactl`, `/dev/nvidia-uvm*`, `/dev/nvidia-modeset`, and `/dev/nvidia-caps` exist. [reddit](https://www.reddit.com/r/Proxmox/comments/1geo379/nvidia_3060_on_proxmox_nvidiasmi_command_error/)
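For the install itself, a minimal sketch using Debian's packaged driver (package names are an assumption and vary by release; the alternative is NVIDIA's own CUDA repo installer):

```bash
# Assumes the non-free / non-free-firmware components are enabled in APT sources.
apt update
apt install -y pve-headers nvidia-driver nvidia-smi   # kernel headers + driver + CLI tool
reboot
```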
Host is now GPU‑ready.

***

## 2. Create the Ollama LXC (privileged Ubuntu)
1. In the Proxmox UI → `Create CT`:

- CT ID: `122`
- Hostname: `ollama`
- Template: Ubuntu 22.04/24.04.
- Uncheck “Unprivileged container” (it must be **privileged**).
- Disk: 200 GB (as you have).
- CPU: 8 cores.
- RAM: 8 GB (or more if you like).
- Network: `vmbr0`, static IP `172.16.0.6/24`, VLAN tag 90, GW `172.16.0.1` (your current settings).

2. Before starting it, edit the config on the host:

```bash
nano /etc/pve/lxc/122.conf
```

Make it look like this (adapting your net/rootfs values):

```ini
arch: amd64
cores: 8
hostname: ollama
memory: 8192
net0: name=eth0,bridge=vmbr0,firewall=1,gw=172.16.0.1,hwaddr=BC:24:11:C9:1B:EE,ip=172.16.0.6/24,tag=90
onboot: 0
ostype: ubuntu
rootfs: local-lvm:vm-122-disk-0,size=200G
startup: order=2
swap: 8192
tags:
unprivileged: 0

# GPU / NVIDIA
lxc.cgroup2.devices.allow: c 195:* rwm
lxc.cgroup2.devices.allow: c 238:* rwm
lxc.cgroup2.devices.allow: c 235:* rwm

lxc.mount.entry: /dev/nvidia0 dev/nvidia0 none bind,optional,create=file
lxc.mount.entry: /dev/nvidia1 dev/nvidia1 none bind,optional,create=file
lxc.mount.entry: /dev/nvidiactl dev/nvidiactl none bind,optional,create=file
lxc.mount.entry: /dev/nvidia-modeset dev/nvidia-modeset none bind,optional,create=file
lxc.mount.entry: /dev/nvidia-uvm dev/nvidia-uvm none bind,optional,create=file
lxc.mount.entry: /dev/nvidia-uvm-tools dev/nvidia-uvm-tools none bind,optional,create=file
lxc.mount.entry: /dev/nvidia-caps dev/nvidia-caps none bind,optional,create=dir

lxc.environment: NVIDIA_VISIBLE_DEVICES=all
lxc.environment: NVIDIA_DRIVER_CAPABILITIES=compute,utility
lxc.environment: CUDA_DEVICE_ORDER=PCI_BUS_ID
```

This gives the CT full access to both GPUs and the driver device nodes, following current Proxmox + NVIDIA LXC recommendations. [forum.proxmox](https://forum.proxmox.com/threads/nvidia-gpu-drivers-installed-on-proxmox-host-but-not-working-in-lxc.151492/)
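Note that the character-device majors in the `devices.allow` lines (195, 238, 235) can differ between driver versions, so verify them against your host before copying them blindly. A sketch of the check, shown here against a captured sample of `/proc/devices` (on the host, run the command in the comment against the real file):

```shell
# Print the majors the NVIDIA driver registered; these are the numbers
# the lxc.cgroup2.devices.allow lines must match.
# On the host:  awk '/nvidia/ {print $1, $2}' /proc/devices
awk '/nvidia/ {print $1, $2}' <<'EOF'
195 nvidia-frontend
235 nvidia-caps
238 nvidia-uvm
EOF
```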
3. Start the CT:

```bash
pct start 122
```
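Before opening a console, you can sanity-check the passthrough from the host shell in one line:

```bash
# Runs nvidia-smi inside CT 122 from the Proxmox host.
pct exec 122 -- nvidia-smi
```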
***

## 3. Verify GPUs inside the LXC

Inside the CT 122 console:

```bash
ls -l /dev/nvidia*
nvidia-smi
```

You should see:

- `nvidia0`, `nvidia1`, `nvidiactl`, `nvidia-uvm`, `nvidia-modeset`, `nvidia-caps`.
- `nvidia-smi` showing both RTX 3060s, identical to the host output. [youtube](https://www.youtube.com/watch?v=lNGNRIJ708k)

If that’s true, passthrough is correct.

***
## 4. Install Ollama directly in the LXC

1. Inside the CT (Ubuntu):

```bash
curl -fsSL https://ollama.com/install.sh | sh
```

Or follow the Ollama Linux docs for a manual install. [docs.ollama](https://docs.ollama.com/faq)

2. Verify:

```bash
ollama --version
```

The installer also sets up a systemd service, `ollama.service`, that runs `ollama serve`. [docs.ollama](https://docs.ollama.com/linux)
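A quick way to confirm the service is actually answering (the `/api/version` endpoint is part of Ollama's REST API):

```bash
# Inside the CT; should return a small JSON blob with the version string.
curl -s http://localhost:11434/api/version
```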
***

## 5. Configure systemd so Ollama uses both GPUs and listens on all IPs

1. Create/modify a systemd override:

```bash
sudo systemctl edit ollama.service
```

Put:

```ini
[Service]
Environment=NVIDIA_VISIBLE_DEVICES=0,1
Environment=OLLAMA_FLASH_ATTENTION=1
Environment=OLLAMA_GPU_OVERHEAD=1024
Environment=OLLAMA_HOST=0.0.0.0:11434
Environment=OLLAMA_ORIGINS=*
Restart=always
RestartSec=10
```

This is the officially recommended pattern: environment variables go in the override, while the main unit just runs `ollama serve`. [github](https://github.com/ollama/ollama/pull/5601/files)
2. Apply and restart:

```bash
sudo systemctl daemon-reload
sudo systemctl restart ollama
sudo systemctl status ollama
```

Ensure it’s `active (running)`.

3. Confirm it’s listening:

```bash
ss -lntp | grep 11434 || netstat -lntp | grep 11434
```

You should see `0.0.0.0:11434` bound to `ollama`. [thinkinprompt](https://thinkinprompt.com/post/Ollama-environment-variable-setting-on-Linux)
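It's worth repeating the check from another machine on the VLAN, since a `0.0.0.0` bind inside the CT doesn't prove the network path (bridge, VLAN tag, firewall) works:

```bash
# From any host that can reach VLAN 90; lists the installed models as JSON.
curl -s http://172.16.0.6:11434/api/tags
```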
***

## 6. Confirm both GPUs are actually used

1. In CT 122, terminal A:

```bash
nvidia-smi -l 1
```

2. In terminal B:

```bash
ollama pull llama3.1:8b-q4
ollama run llama3.1:8b-q4
```

3. Watch `nvidia-smi` during generation: you should see processes on GPU 0 and GPU 1 when larger models or multiple requests are running. For big models (e.g. `70b-q4`), utilization on both cards will be clearer. [perplexity](https://www.perplexity.ai/search/d58f0044-521e-472f-8e3d-5cc576f18bdc)
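For a terser view than the full `nvidia-smi` table, query mode prints one CSV row per GPU per second, which makes it easy to eyeball whether both cards are loaded:

```bash
# One row per GPU per second: index, compute utilization %, VRAM in use.
nvidia-smi --query-gpu=index,utilization.gpu,memory.used --format=csv -l 1
```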
***

## 7. Connect Open WebUI (in its own CT/VM)

In your existing `openweb-ui` container/VM:

- If using the Docker CUDA image (recommended):

```bash
docker run -d \
  --name open-webui \
  -p 3000:8080 \
  --gpus all \
  --security-opt apparmor=unconfined \
  -e OLLAMA_BASE_URL=http://172.16.0.6:11434 \
  -v open-webui:/app/backend/data \
  --restart always \
  ghcr.io/open-webui/open-webui:cuda
```

This runs Open WebUI with GPU support on its own host and calls Ollama over the network at `172.16.0.6:11434`. [perplexity](https://www.perplexity.ai/search/1957f039-5940-4128-aed0-43effc9a766a)

Test by opening Open WebUI in the browser, selecting an Ollama model, and sending a message while watching `nvidia-smi` in CT 122.
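You can also exercise the same path without the browser by calling Ollama's `/api/generate` endpoint directly (model name taken from the example in section 6):

```bash
# Non-streaming one-shot generation; watch nvidia-smi in CT 122 while it runs.
curl -s http://172.16.0.6:11434/api/generate \
  -d '{"model": "llama3.1:8b-q4", "prompt": "Say hello.", "stream": false}'
```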
***

## 8. Quick checklist for rebuild

- Host:
    - IOMMU enabled (`dmesg | grep DMAR` shows VT‑d). [forum.proxmox](https://forum.proxmox.com/threads/is-iommu-enabled-on-my-system.128191/)
    - NVIDIA driver + CUDA OK (`nvidia-smi` shows both 3060s). [reddit](https://www.reddit.com/r/Proxmox/comments/1geo379/nvidia_3060_on_proxmox_nvidiasmi_command_error/)
- LXC 122:
    - Privileged (`unprivileged: 0`).
    - `/etc/pve/lxc/122.conf` has `lxc.cgroup2.devices.allow` entries for majors 195, 238, 235 and mount entries for all `/dev/nvidia*` nodes plus `/dev/nvidia-caps`. [perplexity](https://www.perplexity.ai/search/45539c15-d92e-492c-a9a7-0c15d28dd331)
    - Inside the CT, `ls -l /dev/nvidia*` + `nvidia-smi` show both GPUs. [perplexity](https://www.perplexity.ai/search/d58f0044-521e-472f-8e3d-5cc576f18bdc)
- Ollama:
    - Installed directly in the CT.
    - `systemctl edit ollama.service` with env vars for both GPUs + host/origins.
    - `systemctl status ollama` → running, `ollama serve` bound to `0.0.0.0:11434`. [ollama.readthedocs](https://ollama.readthedocs.io/en/faq/)
- Open WebUI:
    - `OLLAMA_BASE_URL` points to `http://172.16.0.6:11434`. [perplexity](https://www.perplexity.ai/search/1957f039-5940-4128-aed0-43effc9a766a)

If you want, I can turn this into a bash script plus a couple of template files so the next rebuild is mostly copy‑paste.