1. Prepare Proxmox host (one-time)

  1. Enable IOMMU in the BIOS/UEFI (VT-d / AMD-Vi); the kernel side is sketched below.
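
    • On the kernel side this typically means intel_iommu=on (Intel) or amd_iommu=on, plus iommu=pt. A sketch for a GRUB-booted PVE host (ZFS/systemd-boot installs edit /etc/kernel/cmdline and run proxmox-boot-tool refresh instead):

      # /etc/default/grub
      GRUB_CMDLINE_LINUX_DEFAULT="quiet intel_iommu=on iommu=pt"

      update-grub   # then reboot the host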

  2. On PVE host shell:

    • Verify IOMMU is active after reboot:

      dmesg | grep -e DMAR -e IOMMU
      

      You should see lines like "Intel(R) Virtualization Technology for Directed I/O" and "Queued invalidation enabled", matching your current output. forum.proxmox

  3. Install the NVIDIA driver + CUDA on the host (you already have this; noted here for rebuilds):

    • Add the NVIDIA repo or use the Debian non-free packages (see the sketch below).

    • After install and reboot:

      nvidia-smi
      ls -l /dev/nvidia*
      

      Confirm that both RTX 3060s show up (GPU 0 and 1) and that the device nodes /dev/nvidia0, /dev/nvidia1, /dev/nvidiactl, /dev/nvidia-uvm*, /dev/nvidia-modeset, and /dev/nvidia-caps exist. reddit
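
    • A minimal sketch of the Debian-packaged route (assumes the contrib and non-free components are enabled in APT; package names differ if you use NVIDIA's own repo):

      apt update
      apt install pve-headers-$(uname -r)   # kernel headers for the DKMS build
      apt install nvidia-driver nvidia-smi  # Debian non-free driver packages
      reboot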

Host is now GPU-ready.


2. Create the Ollama LXC (privileged Ubuntu)

  1. In Proxmox UI → Create CT:

    • CT ID: 122
    • Hostname: ollama
    • Template: Ubuntu 22.04/24.04.
    • Uncheck “Unprivileged container” (must be privileged).
    • Disk: 200 GB (as you have).
    • CPU: 8 cores.
    • RAM: 8 GB (or more if you like).
    • Network: vmbr0, static IP 172.16.0.6/24, VLAN tag 90, GW 172.16.0.1 (your current settings).
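
    • Or create it from the host shell instead (a sketch; the template filename is illustrative, use whatever is in your template storage):

      pct create 122 local:vztmpl/ubuntu-24.04-standard_24.04-2_amd64.tar.zst \
        --hostname ollama --ostype ubuntu \
        --cores 8 --memory 8192 --swap 8192 \
        --rootfs local-lvm:200 \
        --net0 name=eth0,bridge=vmbr0,ip=172.16.0.6/24,gw=172.16.0.1,tag=90,firewall=1 \
        --unprivileged 0
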
  2. Before starting it, edit config on host:

    nano /etc/pve/lxc/122.conf
    

    Make it look like (adapting your net/rootfs values):

    arch: amd64
    cores: 8
    hostname: ollama
    memory: 8192
    net0: name=eth0,bridge=vmbr0,firewall=1,gw=172.16.0.1,hwaddr=BC:24:11:C9:1B:EE,ip=172.16.0.6/24,tag=90
    onboot: 0
    ostype: ubuntu
    rootfs: local-lvm:vm-122-disk-0,size=200G
    startup: order=2
    swap: 8192
    tags:
    unprivileged: 0
    
    # GPU / NVIDIA
    lxc.cgroup2.devices.allow: c 195:* rwm
    lxc.cgroup2.devices.allow: c 238:* rwm
    lxc.cgroup2.devices.allow: c 235:* rwm
    
    lxc.mount.entry: /dev/nvidia0 dev/nvidia0 none bind,optional,create=file
    lxc.mount.entry: /dev/nvidia1 dev/nvidia1 none bind,optional,create=file
    lxc.mount.entry: /dev/nvidiactl dev/nvidiactl none bind,optional,create=file
    lxc.mount.entry: /dev/nvidia-modeset dev/nvidia-modeset none bind,optional,create=file
    lxc.mount.entry: /dev/nvidia-uvm dev/nvidia-uvm none bind,optional,create=file
    lxc.mount.entry: /dev/nvidia-uvm-tools dev/nvidia-uvm-tools none bind,optional,create=file
    lxc.mount.entry: /dev/nvidia-caps dev/nvidia-caps none bind,optional,create=dir
    
    lxc.environment: NVIDIA_VISIBLE_DEVICES=all
    lxc.environment: NVIDIA_DRIVER_CAPABILITIES=compute,utility
    lxc.environment: CUDA_DEVICE_ORDER=PCI_BUS_ID
    

    This gives the CT full access to both GPUs and the driver device nodes, following current Proxmox+NVIDIA LXC recommendations. Note that 195 is the fixed major number for the nvidia devices, but the nvidia-uvm majors (238/235 here) are assigned dynamically; verify yours with ls -l /dev/nvidia* on the host and adjust the allow lines to match. forum.proxmox

  3. Start the CT:

    pct start 122
    

3. Verify GPUs inside the LXC

Inside CT 122 console:

ls -l /dev/nvidia*
nvidia-smi

You should see:

  • nvidia0, nvidia1, nvidiactl, nvidia-uvm, nvidia-modeset, nvidia-caps.
  • nvidia-smi showing both RTX 3060s, identical to host output. youtube

If that's true, passthrough is correct.
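
A quick per-card sanity check (standard nvidia-smi query flags):

nvidia-smi --query-gpu=index,name,memory.total --format=csv

Both RTX 3060s should be listed, each with roughly 12 GiB.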


4. Install Ollama directly in the LXC

  1. Inside CT (Ubuntu):

    curl -fsSL https://ollama.com/install.sh | sh
    

    Or follow the Ollama Linux docs for a manual install (see the sketch below). docs.ollama
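
    A manual-install sketch (the tarball URL follows the pattern from Ollama's Linux docs; check there for the current one):

    curl -L https://ollama.com/download/ollama-linux-amd64.tgz | sudo tar -C /usr -xz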

  2. Verify:

    ollama --version
    

    The installer also sets up a systemd service, ollama.service, that runs ollama serve. docs.ollama


5. Configure systemd so Ollama uses both GPUs and listens on all IPs

  1. Create/modify systemd override:

    sudo systemctl edit ollama.service
    

    Put:

    [Service]
    Environment=NVIDIA_VISIBLE_DEVICES=0,1
    Environment=OLLAMA_FLASH_ATTENTION=1
    Environment=OLLAMA_GPU_OVERHEAD=1024
    Environment=OLLAMA_HOST=0.0.0.0:11434
    Environment=OLLAMA_ORIGINS=*
    Restart=always
    RestartSec=10
    

    This follows the upstream-recommended pattern (environment variables live in the override; the main unit just runs ollama serve). github

  2. Apply and restart:

    sudo systemctl daemon-reload
    sudo systemctl restart ollama
    sudo systemctl status ollama
    

    Ensure it's active (running).

  3. Confirm it's listening:

    ss -lntp | grep 11434 || netstat -lntp | grep 11434
    

    You should see 0.0.0.0:11434 bound by the ollama process. thinkinprompt
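
    As a further check, the HTTP API should answer from another host on the VLAN (both endpoints are part of Ollama's standard API):

    curl http://172.16.0.6:11434/api/version
    curl http://172.16.0.6:11434/api/tags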


6. Confirm both GPUs are actually used

  1. In CT 122, terminal A:

    nvidia-smi -l 1
    
  2. In terminal B:

    ollama pull llama3.1:8b
    ollama run llama3.1:8b
    
  3. Watch nvidia-smi during generation: you should see processes on GPU 0 and GPU 1 once a model spills past a single card or multiple requests run in parallel. An 8B Q4 model fits on one 12 GB card, so it may load onto a single GPU; with big models (e.g. a 70B Q4 quant), utilization on both cards is much clearer. perplexity
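
A more compact way to watch both cards (standard nvidia-smi query flags):

watch -n 1 'nvidia-smi --query-gpu=index,utilization.gpu,memory.used --format=csv,noheader'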


7. Connect Open WebUI (in its own CT/VM)

In your existing openweb-ui container/VM:

  • If using the Docker CUDA image (recommended):

    docker run -d \
      --name open-webui \
      -p 3000:8080 \
      --gpus all \
      --security-opt apparmor=unconfined \
      -e OLLAMA_BASE_URL=http://172.16.0.6:11434 \
      -v open-webui:/app/backend/data \
      --restart always \
      ghcr.io/open-webui/open-webui:cuda
    

    This uses the GPU on the host where Open WebUI runs and calls Ollama over the network at 172.16.0.6:11434. perplexity
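
  • Equivalent Compose sketch (same image and settings; the GPU reservation follows the Compose spec):

    services:
      open-webui:
        image: ghcr.io/open-webui/open-webui:cuda
        ports:
          - "3000:8080"
        environment:
          - OLLAMA_BASE_URL=http://172.16.0.6:11434
        volumes:
          - open-webui:/app/backend/data
        security_opt:
          - apparmor=unconfined
        restart: always
        deploy:
          resources:
            reservations:
              devices:
                - driver: nvidia
                  count: all
                  capabilities: [gpu]
    volumes:
      open-webui: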

Test by opening Open WebUI in the browser, selecting an Ollama model, and sending a message while watching nvidia-smi in CT 122.


8. Quick checklist for rebuild

  • Host:
    • IOMMU enabled (dmesg | grep -e DMAR -e IOMMU shows VT-d active). forum.proxmox
    • NVIDIA driver + CUDA OK (nvidia-smi shows both 3060s). reddit
  • LXC 122:
    • Privileged (unprivileged: 0).
    • /etc/pve/lxc/122.conf has lxc.cgroup2.devices.allow for 195, 238, 235 and mount entries for all /dev/nvidia* and /dev/nvidia-caps. perplexity
    • Inside CT, ls -l /dev/nvidia* + nvidia-smi show both GPUs. perplexity
  • Ollama:
    • Installed directly in CT.
    • systemctl edit ollama.service with env vars for both GPUs + host/origins.
    • systemctl status ollama → running, ollama serve bound to 0.0.0.0:11434. ollama.readthedocs
  • Open WebUI:
    • Points OLLAMA_BASE_URL to http://172.16.0.6:11434. perplexity

If you want, I can turn this into a bash script plus a couple of template files so the next rebuild is mostly copy-paste.
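
As a head start, a minimal host-side sanity check might look like this (a sketch; assumes CT ID 122 and the IP above):

#!/usr/bin/env bash
# Quick rebuild sanity check -- run on the PVE host.
set -euo pipefail

CTID=122

echo "== IOMMU =="
dmesg | grep -e DMAR -e IOMMU | head -n 5

echo "== Host GPUs =="
nvidia-smi -L
ls -l /dev/nvidia*

echo "== GPUs inside CT ${CTID} =="
pct exec "${CTID}" -- nvidia-smi -L

echo "== Ollama API =="
curl -fsS http://172.16.0.6:11434/api/version && echo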