
Ollama

https://github.com/ollama/ollama

Install and run Ollama

https://ollama.com/download/linux

https://ollama.com/library

# Install script
curl -fsSL https://ollama.com/install.sh | sh
# Check service is running
systemctl status ollama

Remember to add Environment="OLLAMA_HOST=0.0.0.0" to /etc/systemd/system/ollama.service to make it accessible on the network.

Also add Environment="OLLAMA_MODELS=/models" to /etc/systemd/system/ollama.service to store models on an external disk.

For Radeon 6000 cards you'll need to add Environment="HSA_OVERRIDE_GFX_VERSION=10.3.0" as well.
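
Taken together, the relevant part of the unit file ends up looking roughly like this (an excerpt only; keep the rest of the generated unit as-is, and only add the GFX override if you need it):

/etc/systemd/system/ollama.service

[Service]
Environment="OLLAMA_HOST=0.0.0.0"
Environment="OLLAMA_MODELS=/models"
Environment="HSA_OVERRIDE_GFX_VERSION=10.3.0"

After editing, reload systemd and restart the service:

systemctl daemon-reload
systemctl restart ollama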

I'd recommend the following models to get started:

  • Chat: llava-llama3:latest
  • Code: qwen2.5-coder:7b
  • Math: qwen2-math:latest
  • Uncensored: mannix/llama3.1-8b-abliterated:latest
  • Embedding: nomic-embed-text:latest
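
Pull whichever of these you want with ollama pull, for example:

ollama pull qwen2.5-coder:7b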

Note: your Ollama instance will be available to Podman containers at http://host.containers.internal:11434.
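
A quick way to verify that from inside a container (using the docker.io/curlimages/curl image here; any image with curl works):

# Should print the Ollama version as JSON if the host is reachable
podman run --rm docker.io/curlimages/curl -s http://host.containers.internal:11434/api/version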

Install and run Ollama with Podman

# AMD
# To generate a quadlet for /etc/containers/systemd/local-ai.container, prepend this to the command below:
# podman run --rm ghcr.io/containers/podlet --install --description "Local AI" \
podman run -d --device /dev/kfd --device /dev/dri -v ollama:/root/.ollama -p 11434:11434 --name ollama docker.io/ollama/ollama:rocm

# CPU
# To generate a quadlet for /etc/containers/systemd/local-ai.container, prepend this to the command below:
# podman run --rm ghcr.io/containers/podlet --install --description "Local AI" \
podman run -d -v ollama:/root/.ollama -p 11434:11434 --name ollama docker.io/ollama/ollama
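
For reference, the quadlet podlet generates for the CPU variant looks roughly like this (a sketch; the exact output may differ):

/etc/containers/systemd/local-ai.container

[Unit]
Description=Local AI

[Container]
ContainerName=ollama
Image=docker.io/ollama/ollama
Volume=ollama:/root/.ollama
PublishPort=11434:11434

[Install]
WantedBy=default.target

After a systemctl daemon-reload, the container starts with systemctl start local-ai and will come back at boot.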

Unsticking models stuck in "Stopping"

# Check for models stuck in the Stopping state
ollama ps | grep -i stopping
# Kill the stuck ollama processes
pgrep ollama | xargs -I '%' sh -c 'kill %'
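
If Ollama is running as the systemd service from the install script above, restarting the unit is usually the cleaner fix:

systemctl restart ollama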

Run the AnythingLLM Interface

podman run \
    -d \
    -p 3001:3001 \
    --name anything-llm \
    --cap-add SYS_ADMIN \
    -v anything-llm:/app/server \
    -e STORAGE_DIR="/app/server/storage" \
    docker.io/mintplexlabs/anythingllm

This should now be accessible on port 3001. Note: you'll need to allow traffic between the Podman network and the host:

Use podman network ls to see which networks Podman is using and podman network inspect to get the IP address range. Then allow traffic from that range to port 11434 (ollama):

ufw allow from 10.89.0.1/24 to any port 11434
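
To find the subnet for the rule above, something like this works (the default network is usually called podman; yours may differ):

podman network ls
podman network inspect podman | grep -i subnet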

Installing an External Service with Nginx and Certbot

We're going to need a certificate for our service since we'll want to talk to it over HTTPS. This will be handled by certbot. I'm using AWS Route 53 in this example, but certbot has tons of DNS plugins available with similar commands. The important part is getting that Let's Encrypt certificate generated and placed where nginx expects it.

Before we can use certbot we need AWS credentials. Note: this will be different if you use a different DNS provider.

See generating AWS credentials

curl "https://awscli.amazonaws.com/awscli-exe-linux-x86_64.zip" -o "awscliv2.zip"
unzip awscliv2.zip
./aws/install

# Configure default credentials
aws configure
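
To confirm the credentials work before running certbot:

# Should print your AWS account ID and ARN
aws sts get-caller-identity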

With AWS credentials configured you can now install and generate a certificate.

# Fedora
dnf install -y certbot python3-certbot-dns-route53

# Ubuntu
apt install -y certbot python3-certbot-dns-route53

# Both
certbot certonly --dns-route53 -d chatreesept.reeseapps.com

Now you have a cert!

Install and start nginx with the following commands:

# Fedora
dnf install -y nginx

# Ubuntu
apt install -y nginx

# Both
systemctl enable --now nginx

Now let's edit our nginx config. First, add these timeouts to the http block of nginx.conf (or make sure they're already there) so long-running model responses don't get cut off.

/etc/nginx/nginx.conf

keepalive_timeout 1h;
send_timeout 1h;
client_body_timeout 1h;
client_header_timeout 1h;
proxy_connect_timeout 1h;
proxy_read_timeout 1h;
proxy_send_timeout 1h;

Now write your nginx http config files. You'll need two:

  1. ollama.reeseapps.com.conf
  2. chatreesept.reeseapps.com.conf

/etc/nginx/conf.d/ollama.reeseapps.com.conf

server {
  listen 80;
  listen [::]:80;
  server_name ollama.reeseapps.com;

  location / {
    return 301 https://$host$request_uri;
  }
}

server {
  listen 443 ssl;
  listen [::]:443 ssl;
  server_name ollama.reeseapps.com;

  ssl_certificate /etc/letsencrypt/live/ollama.reeseapps.com/fullchain.pem;
  ssl_certificate_key /etc/letsencrypt/live/ollama.reeseapps.com/privkey.pem;

  location / {
    if ($http_authorization != "Bearer <token>") {
        return 401;
    }

    proxy_pass http://127.0.0.1:11434;
    proxy_set_header Host $host;
    proxy_buffering off;
  }
}
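
Once the config is live (see the nginx -t and reload step below), you can check both the proxy and the bearer-token gate with curl (substitute your real token):

# Without the token: expect a 401
curl -i https://ollama.reeseapps.com/api/tags
# With the token: expect a JSON list of installed models
curl -H "Authorization: Bearer <token>" https://ollama.reeseapps.com/api/tags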

/etc/nginx/conf.d/chatreesept.reeseapps.com.conf

server {
  listen 80;
  server_name chatreesept.reeseapps.com;

  location / {
    return 301 https://$host$request_uri;
  }
}

server {
  listen 443 ssl;
  server_name chatreesept.reeseapps.com;

  ssl_certificate /etc/letsencrypt/live/chatreesept.reeseapps.com/fullchain.pem;
  ssl_certificate_key /etc/letsencrypt/live/chatreesept.reeseapps.com/privkey.pem;

  location / {
    client_max_body_size 50m;

    proxy_pass http://localhost:3001;
    proxy_http_version 1.1;
    proxy_set_header Upgrade $http_upgrade;
    proxy_set_header Connection "upgrade";
    proxy_set_header Host $host;
    proxy_cache_bypass $http_upgrade;
  }
}

Run nginx -t to check for errors. If there are none, run systemctl reload nginx to pick up your changes. Your services should be available at chatreesept.reeseapps.com and ollama.reeseapps.com.

Set up automatic certificate renewal by adding the following line to your crontab to renew the certificate daily:

sudo crontab -e

Add the following line to the end of the file (the deploy hook reloads nginx so it picks up the renewed certificate):

0 0 * * * certbot renew --quiet --deploy-hook "systemctl reload nginx"
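
You can check that renewal will succeed without actually renewing anything:

certbot renew --dry-run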

At this point you might need to create some UFW rules to allow inter-container traffic.

# Try this first if you're having problems
ufw reload

# Debug with ufw logging
ufw logging on
tail -f /var/log/ufw.log

Also note that Podman will not restart your containers at boot on its own. You'll need to create quadlets from the podman run commands. Check out the comments above the podman run commands (and the sample quadlet) for more info. Also search the web for "podman quadlets" or ask your AI about it!

Ollama Models

https://ollama.com/library

Custom Models

https://www.gpu-mart.com/blog/import-models-from-huggingface-to-ollama

https://www.hostinger.com/tutorials/ollama-cli-tutorial#Setting_up_Ollama_in_the_CLI

From Existing Model

# Export the existing model's Modelfile
ollama show --modelfile opencoder > Modelfile
# Edit Modelfile and add the parameter you want to change, e.g.:
#   PARAMETER num_ctx 8192
# Then build a new model from the edited Modelfile
ollama create opencoder-fix -f Modelfile
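
To confirm the parameter took:

# The parameters section should now list num_ctx 8192
ollama show opencoder-fix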

From Scratch

Install git lfs and clone the model you're interested in

# Make sure you have git-lfs installed (https://git-lfs.com)
git lfs install

git clone https://huggingface.co/bartowski/Starling-LM-7B-beta-GGUF

Create a modelfile

# Modelfile
FROM "./path/to/gguf"

TEMPLATE """{{ if .Prompt }}<|im_start|>
{{ .Prompt }}<|im_end|>
{{ end }}
"""

SYSTEM You are OpenCoder, created by OpenCoder Team.

PARAMETER stop <|im_start|>
PARAMETER stop <|im_end|>
PARAMETER stop <|fim_prefix|>
PARAMETER stop <|fim_middle|>
PARAMETER stop <|fim_suffix|>
PARAMETER stop <|fim_end|>

Build the model

ollama create "Starling-LM-7B-beta-Q6_K" -f Modelfile

Run the model

ollama run Starling-LM-7B-beta-Q6_K:latest

Discovering models

Check out Hugging Face's leaderboard: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard

  1. Select the model type you're after
  2. Drag the number of parameters slider to a range you can run
  3. Click the top few and read about them.
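
Note that many GGUF repos on Hugging Face can also be pulled straight into Ollama without writing a Modelfile (this needs a reasonably recent Ollama; the repo below is the one cloned earlier and is just an example):

ollama run hf.co/bartowski/Starling-LM-7B-beta-GGUF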

Custom models from safetensors files

https://www.theregister.com/2024/07/14/quantization_llm_feature/

Setup the repo:

# Setup
git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build --config Release -j $(nproc)
python3 -m venv venv && source venv/bin/activate
pip install -r requirements.txt
huggingface-cli login #necessary to download gated models
python convert_hf_to_gguf_update.py $(cat ~/.cache/huggingface/token)

Convert models to gguf:

# Copy the model title from hugging face
export MODEL_NAME=

# Create a folder to clone the model into
mkdir -p models/$MODEL_NAME

# Download the current head for the model
huggingface-cli download $MODEL_NAME --local-dir models/$MODEL_NAME

# Or get the f16 quantized gguf
wget -P models/$MODEL_NAME https://huggingface.co/xtuner/llava-llama-3-8b-v1_1-gguf/resolve/main/llava-llama-3-8b-v1_1-f16.gguf

# Convert model from hugging face to gguf, quant 8
python3 convert_hf_to_gguf.py models/$MODEL_NAME --outfile models/$MODEL_NAME.gguf

# The cmake build above puts binaries in build/bin
# Run ./build/bin/llama-quantize with no arguments to see available quants
./build/bin/llama-quantize models/$MODEL_NAME.gguf models/$MODEL_NAME-Q4_K.gguf 15
./build/bin/llama-quantize models/$MODEL_NAME.gguf models/$MODEL_NAME-Q5_K.gguf 17
./build/bin/llama-quantize models/$MODEL_NAME.gguf models/$MODEL_NAME-Q6_K.gguf 18
./build/bin/llama-quantize models/$MODEL_NAME.gguf models/$MODEL_NAME-Q8_0.gguf 7

# Copy to your localai models folder and restart
scp models/$MODEL_NAME-Q5_K.gguf localai:/models/

# View output
tree -phugL 2 models