Ollama
https://github.com/ollama/ollama
Run natively with GPU support
https://ollama.com/download/linux
# Install script
curl -fsSL https://ollama.com/install.sh | sh
# Check service is running
systemctl status ollama
Remember to add Environment="OLLAMA_HOST=0.0.0.0" to /etc/systemd/system/ollama.service to
make it accessible on the network.
For Radeon 6000 cards you'll need to add Environment="HSA_OVERRIDE_GFX_VERSION=10.3.0" as well.
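Whether you edit the unit directly or use a drop-in via systemctl edit ollama, the result is a [Service] section roughly like this (a sketch; the drop-in path below is the systemctl edit default):
# /etc/systemd/system/ollama.service.d/override.conf
[Service]
Environment="OLLAMA_HOST=0.0.0.0"
# Only for Radeon 6000-series (RDNA2) cards:
Environment="HSA_OVERRIDE_GFX_VERSION=10.3.0"
# Apply the change
systemctl daemon-reload && systemctl restart ollama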
# Pull models
# Try to use higher parameter models. Grab the q5_K_M variant at minimum.
# For a 24GB VRAM Card I'd recommend:
# Anything-LLM Coding
ollama pull qwen2.5-coder:14b-instruct-q5_K_M
# Anything-LLM Math
ollama pull qwen2-math:7b-instruct-fp16
# Anything-LLM Chat
ollama pull llama3.2-vision:11b-instruct-q8_0
# VSCode Continue Autocomplete
ollama pull starcoder2:15b-q5_K_M
# VSCode Continue Chat
ollama pull llama3.1:8b-instruct-fp16
# VSCode Continue Embedder
ollama pull nomic-embed-text:137m-v1.5-fp16
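To confirm what's downloaded and how much disk each variant takes:
ollama list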
Note your ollama instance will be available to podman containers via http://host.containers.internal:11434
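A quick sanity check from inside a container (assuming the curlimages/curl image; /api/version is Ollama's version endpoint):
podman run --rm docker.io/curlimages/curl -s http://host.containers.internal:11434/api/version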
Unsticking models stuck in "Stopping"
ollama ps | grep -i stopping
pgrep ollama | xargs -I '%' sh -c 'kill %'
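The installer's systemd unit sets Restart=always, so the service should come back on its own after the kill; verify with:
systemctl status ollama
ollama ps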
Run Anything LLM Interface
podman run \
-d \
-p 3001:3001 \
--name anything-llm \
--cap-add SYS_ADMIN \
-v anything-llm:/app/server \
-e STORAGE_DIR="/app/server/storage" \
docker.io/mintplexlabs/anythingllm
This should now be accessible on port 3001. Note, you'll need to allow traffic between podman and the host:
Use podman network ls to see which networks podman is running on and podman network inspect
to get the IP address range. Then allow traffic from that range to port 11434 (ollama):
ufw allow from 10.89.0.1/24 to any port 11434
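For example, to pull the subnet straight out of the inspect output (a sketch assuming jq is installed, the default network name podman, and podman 4's netavark JSON layout):
podman network inspect podman | jq -r '.[0].subnets[].subnet'
ufw allow from "$(podman network inspect podman | jq -r '.[0].subnets[0].subnet')" to any port 11434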
Anything LLM Quadlet with Podlet
podman run --rm ghcr.io/containers/podlet --install --description "Anything LLM" \
podman run \
-d \
-p 3001:3001 \
--name anything-llm \
--cap-add SYS_ADMIN \
--restart always \
-v anything-llm:/app/server \
-e STORAGE_DIR="/app/server/storage" \
docker.io/mintplexlabs/anythingllm
Put the generated quadlet files in /usr/share/containers/systemd/ and run systemctl daemon-reload so the container is registered as a systemd service and autostarts.
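Roughly what the generated quadlet looks like (podlet's exact output may differ; the keys below are standard [Container] quadlet options):
# /usr/share/containers/systemd/anything-llm.container
[Unit]
Description=Anything LLM

[Container]
Image=docker.io/mintplexlabs/anythingllm
ContainerName=anything-llm
PublishPort=3001:3001
AddCapability=SYS_ADMIN
Volume=anything-llm:/app/server
Environment=STORAGE_DIR=/app/server/storage

[Service]
Restart=always

[Install]
WantedBy=multi-user.target default.target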
Now with Nginx and Certbot
See generating AWS credentials
curl "https://awscli.amazonaws.com/awscli-exe-linux-x86_64.zip" -o "awscliv2.zip"
unzip awscliv2.zip
./aws/install
# Configure default credentials
aws configure
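The credentials for the Route53 DNS challenge only need a handful of Route53 permissions; a minimal policy sketch (replace YOUR_HOSTED_ZONE_ID with your zone ID, and scope tighter if you like):
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["route53:ListHostedZones", "route53:GetChange"],
      "Resource": ["*"]
    },
    {
      "Effect": "Allow",
      "Action": ["route53:ChangeResourceRecordSets"],
      "Resource": ["arn:aws:route53:::hostedzone/YOUR_HOSTED_ZONE_ID"]
    }
  ]
}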
Open http/s in firewalld:
# Remember to firewall-cmd --set-default-zone=public
firewall-cmd --permanent --zone=public --add-service=http
firewall-cmd --permanent --zone=public --add-service=https
firewall-cmd --reload
# or
ufw allow 80/tcp
ufw allow 443/tcp
The following installs Nginx on Fedora with Certbot (Route53 DNS challenge) as a reverse proxy in front of the "Anything LLM" service on port 3001, with WebSocket support. The domain will be chatreesept.reeseapps.com.
Install Nginx:
dnf install -y nginx
Start and enable the Nginx service:
systemctl enable --now nginx
Install Certbot and the Route53 DNS plugin:
# Fedora
dnf install -y certbot python3-certbot-dns-route53
# Arch
pacman -S certbot certbot-dns-route53
Request a certificate for your domain using the Route53 DNS challenge:
certbot certonly --dns-route53 -d chatreesept.reeseapps.com
Follow the prompts to provide your Route53 credentials and email address.
Configure Nginx for your domain: create a new configuration file per host.
Update your main nginx conf with the following timeouts:
vim /etc/nginx/nginx.conf
keepalive_timeout 1h;
send_timeout 1h;
client_body_timeout 1h;
client_header_timeout 1h;
proxy_connect_timeout 1h;
proxy_read_timeout 1h;
proxy_send_timeout 1h;
vim /etc/nginx/conf.d/ollama.reeselink.com.conf
server {
    listen 80;
    server_name ollama.reeselink.com;
    location / {
        return 301 https://$host$request_uri;
    }
}
server {
    listen 443 ssl;
    server_name ollama.reeselink.com;
    ssl_certificate /etc/letsencrypt/live/ollama.reeselink.com/fullchain.pem;
    ssl_certificate_key /etc/letsencrypt/live/ollama.reeselink.com/privkey.pem;
    location / {
        proxy_pass http://localhost:11434;
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";
        proxy_set_header Host $host;
        proxy_cache_bypass $http_upgrade;
        proxy_buffering off;
    }
}
vim /etc/nginx/conf.d/chatreesept.reeseapps.com.conf
Add the following configuration to the file:
server {
    listen 80;
    server_name chatreesept.reeseapps.com;
    location / {
        return 301 https://$host$request_uri;
    }
}
server {
    listen 443 ssl;
    server_name chatreesept.reeseapps.com;
    ssl_certificate /etc/letsencrypt/live/chatreesept.reeseapps.com/fullchain.pem;
    ssl_certificate_key /etc/letsencrypt/live/chatreesept.reeseapps.com/privkey.pem;
    location / {
        client_max_body_size 50m;
        proxy_pass http://localhost:3001;
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";
        proxy_set_header Host $host;
        proxy_cache_bypass $http_upgrade;
        proxy_buffering off;
    }
}
Test your Nginx configuration for syntax errors:
nginx -t
If there are no errors, reload Nginx to apply the changes:
systemctl reload nginx
Set up automatic certificate renewal: add a cron job so certbot renew runs daily.
# Arch: install cronie first
pacman -S cronie
sudo crontab -e
Add the following line to the end of the file:
0 0 * * * certbot renew --quiet
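You can test the renewal path without waiting; on Fedora, the packaged systemd timer is an alternative to cron (assuming the certbot package's certbot-renew.timer unit):
certbot renew --dry-run
systemctl enable --now certbot-renew.timer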
Now, your "Anything LLM" service running on port 3001 with WebSockets is accessible through the domain chatreesept.reeseapps.com with a valid SSL certificate from Let's Encrypt. The renew job runs daily, and certbot only renews the certificate when it is close to expiry.
Custom Models
https://www.gpu-mart.com/blog/import-models-from-huggingface-to-ollama
https://www.hostinger.com/tutorials/ollama-cli-tutorial#Setting_up_Ollama_in_the_CLI
From Existing Model
# Dump the existing model's Modelfile
ollama show --modelfile opencoder > Modelfile
# Edit Modelfile and add (or change) the parameter you want, e.g.:
PARAMETER num_ctx 8192
# Create a new model from the edited Modelfile
ollama create opencoder-fix -f Modelfile
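To check that the new model picked up the parameter (assuming ollama show's --parameters flag):
ollama show --parameters opencoder-fix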
From Scratch
Install git lfs and clone the model you're interested in
# Make sure you have git-lfs installed (https://git-lfs.com)
git lfs install
git clone https://huggingface.co/bartowski/Starling-LM-7B-beta-GGUF
Create a modelfile
# Modelfile
FROM "./path/to/gguf"
TEMPLATE """{{ if .Prompt }}<|im_start|>
{{ .Prompt }}<|im_end|>
{{ end }}
"""
SYSTEM You are OpenCoder, created by OpenCoder Team.
PARAMETER stop <|im_start|>
PARAMETER stop <|im_end|>
PARAMETER stop <|fim_prefix|>
PARAMETER stop <|fim_middle|>
PARAMETER stop <|fim_suffix|>
PARAMETER stop <|fim_end|>
PARAMETER stop """
"""
Build the model
ollama create "Starling-LM-7B-beta-Q6_K" -f Modelfile
Run the model
ollama run Starling-LM-7B-beta-Q6_K:latest
Converting to gguf
https://www.theregister.com/2024/07/14/quantization_llm_feature/
- Clone the llama.cpp repository and install its dependencies:
git clone https://github.com/ggerganov/llama.cpp.git ~/llama.cpp
cd ~/llama.cpp
python3 -m venv venv && source venv/bin/activate
pip3 install -r requirements.txt
mkdir -p ~/llama.cpp/models/mistral
huggingface-cli login # necessary to download gated models
huggingface-cli download mistralai/Mistral-7B-Instruct-v0.3 --local-dir ~/llama.cpp/models/mistral/
# convert_hf_to_gguf.py takes the directory containing the HF model, e.g. a
# snapshot already sitting in the huggingface cache:
python3 convert_hf_to_gguf.py ~/.cache/huggingface/hub/models--infly--OpenCoder-8B-Instruct/snapshots/01badbbf10c2dfd7e2a0b5f570065ef44548576c
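The converter also accepts an explicit output path and precision if you don't want the default name (assuming the script's --outfile/--outtype flags):
python3 convert_hf_to_gguf.py ~/llama.cpp/models/mistral/ \
  --outfile ~/llama.cpp/models/mistral-7b-instruct-v0.3-f16.gguf --outtype f16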