# Ollama
- [Ollama](#ollama)
  - [Run natively with GPU support](#run-natively-with-gpu-support)
  - [Unsticking models stuck in "Stopping"](#unsticking-models-stuck-in-stopping)
  - [Run Anything LLM Interface](#run-anything-llm-interface)
  - [Anything LLM Quadlet with Podlet](#anything-llm-quadlet-with-podlet)
  - [Now with Nginx and Certbot](#now-with-nginx-and-certbot)
  - [Custom Models](#custom-models)
    - [From Existing Model](#from-existing-model)
    - [From Scratch](#from-scratch)
  - [Converting to gguf](#converting-to-gguf)
<https://github.com/ollama/ollama>
## Run natively with GPU support
<https://ollama.com/download/linux>
<https://ollama.com/library>
```bash
# Install script
curl -fsSL https://ollama.com/install.sh | sh
# Check service is running
systemctl status ollama
```
Remember to add `Environment="OLLAMA_HOST=0.0.0.0"` to `/etc/systemd/system/ollama.service` to
make it accessible on the network.
For Radeon 6000 cards you'll need to add `Environment="HSA_OVERRIDE_GFX_VERSION=10.3.0"` as well.
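A drop-in override is one way to apply these without editing the packaged unit directly (a sketch; the variable values are the ones above):
```bash
# Opens an override file for the service
sudo systemctl edit ollama
# Add under [Service]:
#   Environment="OLLAMA_HOST=0.0.0.0"
#   Environment="HSA_OVERRIDE_GFX_VERSION=10.3.0"  # Radeon 6000 only
sudo systemctl daemon-reload
sudo systemctl restart ollama
```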
```bash
# Pull models
# Try to use higher parameter models. Grab the q5_K_M variant at minimum.
# For a 24GB VRAM Card I'd recommend:
# Anything-LLM Coding
ollama pull qwen2.5-coder:14b-instruct-q5_K_M
# Anything-LLM Math
ollama pull qwen2-math:7b-instruct-fp16
# Anything-LLM Chat
ollama pull llama3.2-vision:11b-instruct-q8_0
# VSCode Continue Autocomplete
ollama pull starcoder2:15b-q5_K_M
# VSCode Continue Chat
ollama pull llama3.1:8b-instruct-fp16
# VSCode Continue Embedder
ollama pull nomic-embed-text:137m-v1.5-fp16
```
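To confirm what landed on disk and what is currently loaded into VRAM:
```bash
# Models pulled to disk
ollama list
# Models currently loaded (and whether they fit on the GPU)
ollama ps
```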
Note: your Ollama instance will be available to Podman containers via `http://host.containers.internal:11434`.
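A quick way to verify that from a container (the curl image here is just an example):
```bash
# Lists the models the daemon knows about; any JSON back means the route works
podman run --rm docker.io/curlimages/curl \
  -s http://host.containers.internal:11434/api/tags
```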
## Unsticking models stuck in "Stopping"
```bash
# Confirm a model is stuck in the "Stopping" state
ollama ps | grep -i stopping
# Force-kill the ollama processes to release it
pgrep ollama | xargs -I '%' sh -c 'kill %'
```
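If Ollama is running as the systemd service installed above, restarting the unit is the gentler option before reaching for `kill`:
```bash
# Stuck models are unloaded along with the service
sudo systemctl restart ollama
```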
## Run Anything LLM Interface
```bash
podman run \
  -d \
  -p 3001:3001 \
  --name anything-llm \
  --cap-add SYS_ADMIN \
  -v anything-llm:/app/server \
  -e STORAGE_DIR="/app/server/storage" \
  docker.io/mintplexlabs/anythingllm
```
This should now be accessible on port 3001. Note: you'll need to allow traffic between Podman
and the host.
Use `podman network ls` to see which networks Podman is running on and `podman network inspect`
to get the IP address range, for example:
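```bash
podman network ls
# The subnet field holds the container address range ("podman" is the default network name)
podman network inspect podman | grep -i subnet
```
Then allow traffic from that range to port 11434 (Ollama):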
```bash
# Allow the Podman subnet to reach Ollama on the host
ufw allow from 10.89.0.0/24 to any port 11434
```
## Anything LLM Quadlet with Podlet
```bash
podman run --rm ghcr.io/containers/podlet --install --description "Anything LLM" \
  podman run \
    -d \
    -p 3001:3001 \
    --name anything-llm \
    --cap-add SYS_ADMIN \
    --restart always \
    -v anything-llm:/app/server \
    -e STORAGE_DIR="/app/server/storage" \
    docker.io/mintplexlabs/anythingllm
```
Put the generated quadlet files in `/usr/share/containers/systemd/` to have the service autostart.
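After that, a reload makes systemd generate the unit (the service name below assumes the quadlet file is `anything-llm.container`):
```bash
systemctl daemon-reload
systemctl start anything-llm.service
```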
## Now with Nginx and Certbot
See [generating AWS credentials](cloud/graduated/aws_iam/README.md)
```bash
curl "https://awscli.amazonaws.com/awscli-exe-linux-x86_64.zip" -o "awscliv2.zip"
unzip awscliv2.zip
./aws/install
# Configure default credentials
aws configure
```
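To verify the credentials are picked up:
```bash
# Returns your account and ARN if the credentials work
aws sts get-caller-identity
```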
Open http/s in firewalld:
```bash
# Remember to firewall-cmd --set-default-zone=public
firewall-cmd --permanent --zone=public --add-service=http
firewall-cmd --permanent --zone=public --add-service=https
firewall-cmd --reload
# or
ufw allow 80/tcp
ufw allow 443/tcp
```
Here are the detailed instructions for installing and setting up Nginx on Fedora Linux with Certbot
using the Route53 DNS challenge to put in front of a service called "Anything LLM" running on port
3001 with WebSockets. The domain will be chatreesept.reeseapps.com.
1. Install Nginx:
```
dnf install -y nginx
```
2. Start and enable Nginx service:
```
systemctl enable --now nginx
```
3. Install Certbot and the Route53 DNS plugin:
```
# Fedora
dnf install -y certbot python3-certbot-dns-route53
# Arch
pacman -S certbot certbot-dns-route53
```
4. Request a certificate for your domain using the Route53 DNS challenge:
```
certbot certonly --dns-route53 -d chatreesept.reeseapps.com
```
Certbot reads the AWS credentials configured above; follow the prompts for your email address and the terms of service.
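To confirm the certificate was issued:
```
# Lists issued certificates, their domains, and expiry dates
certbot certificates
```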
5. Configure Nginx for your domain. First, raise the global timeouts in `/etc/nginx/nginx.conf`, since LLM responses can stream for a long time:
```
vim /etc/nginx/nginx.conf
```
```
# Inside the existing http { } block
keepalive_timeout 1h;
send_timeout 1h;
client_body_timeout 1h;
client_header_timeout 1h;
proxy_connect_timeout 1h;
proxy_read_timeout 1h;
proxy_send_timeout 1h;
```
```
vim /etc/nginx/conf.d/ollama.reeselink.com.conf
```
```
server {
    listen 80;
    server_name ollama.reeselink.com;

    location / {
        return 301 https://$host$request_uri;
    }
}

server {
    listen 443 ssl;
    server_name ollama.reeselink.com;

    ssl_certificate /etc/letsencrypt/live/ollama.reeselink.com/fullchain.pem;
    ssl_certificate_key /etc/letsencrypt/live/ollama.reeselink.com/privkey.pem;

    location / {
        proxy_pass http://localhost:11434;
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";
        proxy_set_header Host $host;
        proxy_cache_bypass $http_upgrade;
        proxy_buffering off;
    }
}
```
```
vim /etc/nginx/conf.d/chatreesept.reeseapps.com.conf
```
Add the following configuration to the file:
```
server {
    listen 80;
    server_name chatreesept.reeseapps.com;

    location / {
        return 301 https://$host$request_uri;
    }
}

server {
    listen 443 ssl;
    server_name chatreesept.reeseapps.com;

    ssl_certificate /etc/letsencrypt/live/chatreesept.reeseapps.com/fullchain.pem;
    ssl_certificate_key /etc/letsencrypt/live/chatreesept.reeseapps.com/privkey.pem;

    location / {
        client_max_body_size 50m;
        proxy_pass http://localhost:3001;
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";
        proxy_set_header Host $host;
        proxy_cache_bypass $http_upgrade;
        proxy_buffering off;
    }
}
```
6. Test your Nginx configuration for syntax errors:
```
nginx -t
```
If there are no errors, reload Nginx to apply the changes:
```
systemctl reload nginx
```
7. Set up automatic certificate renewal. Install cron if needed and open the root crontab:
```
# Arch doesn't ship a cron daemon by default
pacman -S cronie
sudo crontab -e
```
Add the following line to the end of the file:
```
0 0 * * * certbot renew --quiet
```
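On Fedora, the certbot package also ships a systemd timer that can stand in for the cron entry (check with `systemctl list-timers`):
```
# Fedora alternative to cron for renewals
systemctl enable --now certbot-renew.timer
```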
Now, your "Anything LLM" service running on port 3001 with WebSockets is accessible through the
domain chatreesept.reeseapps.com with a valid SSL certificate from Let's Encrypt. The certificate
will be automatically renewed daily.
## Custom Models
<https://www.gpu-mart.com/blog/import-models-from-huggingface-to-ollama>
<https://www.hostinger.com/tutorials/ollama-cli-tutorial#Setting_up_Ollama_in_the_CLI>
### From Existing Model
```bash
# Export the existing model's Modelfile
ollama show --modelfile opencoder > Modelfile
# Edit Modelfile and add the desired context window:
#   PARAMETER num_ctx 8192
# Rebuild under a new name
ollama create opencoder-fix -f Modelfile
```
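To check that the parameter took:
```bash
ollama show --modelfile opencoder-fix | grep num_ctx
```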
### From Scratch
Install git-lfs and clone the model you're interested in:
```bash
# Make sure you have git-lfs installed (https://git-lfs.com)
git lfs install
git clone https://huggingface.co/bartowski/Starling-LM-7B-beta-GGUF
```
Create a Modelfile:
```
# Modelfile
FROM "./path/to/gguf"
TEMPLATE """{{ if .Prompt }}<|im_start|>
{{ .Prompt }}<|im_end|>
{{ end }}
"""
SYSTEM You are OpenCoder, created by OpenCoder Team.
PARAMETER stop <|im_start|>
PARAMETER stop <|im_end|>
PARAMETER stop <|fim_prefix|>
PARAMETER stop <|fim_middle|>
PARAMETER stop <|fim_suffix|>
PARAMETER stop <|fim_end|>
PARAMETER stop """
"""
```
Build the model
```bash
ollama create "Starling-LM-7B-beta-Q6_K" -f Modelfile
```
Run the model
```bash
ollama run Starling-LM-7B-beta-Q6_K:latest
```
## Converting to gguf
<https://www.theregister.com/2024/07/14/quantization_llm_feature/>
1. Clone the llama.cpp repository and install its dependencies:
```bash
git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
python3 -m venv venv && source venv/bin/activate
pip3 install -r requirements.txt
mkdir -p models/mistral
huggingface-cli login # necessary to download gated models
huggingface-cli download mistralai/Mistral-7B-Instruct-v0.3 --local-dir models/mistral/
# convert_hf_to_gguf.py takes any downloaded model directory, e.g. a snapshot in the HF cache:
python3 convert_hf_to_gguf.py ~/.cache/huggingface/hub/models--infly--OpenCoder-8B-Instruct/snapshots/01badbbf10c2dfd7e2a0b5f570065ef44548576c
```
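The converter prints the path of the `.gguf` it wrote. From there it can go straight into a Modelfile as in [From Scratch](#from-scratch); the filename below is illustrative:
```bash
# Output name is hypothetical; use the path convert_hf_to_gguf.py printed
echo 'FROM ./OpenCoder-8B-Instruct-F16.gguf' > Modelfile
ollama create opencoder-8b-gguf -f Modelfile
```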