# Ollama

- [Ollama](#ollama)
  - [Install and run Ollama](#install-and-run-ollama)
  - [Install and run Ollama with Podman](#install-and-run-ollama-with-podman)
  - [Unsticking models stuck in "Stopping"](#unsticking-models-stuck-in-stopping)
  - [Run Anything LLM Interface](#run-anything-llm-interface)
  - [Installing External Service with Nginx and Certbot](#installing-external-service-with-nginx-and-certbot)
  - [Ollama Models](#ollama-models)
  - [Custom Models](#custom-models)
    - [From Existing Model](#from-existing-model)
    - [From Scratch](#from-scratch)
    - [Discovering models](#discovering-models)
    - [Custom models from safetensor files](#custom-models-from-safetensor-files)

<https://github.com/ollama/ollama>

## Install and run Ollama

<https://ollama.com/download/linux>

<https://ollama.com/library>

```bash
# Install script
curl -fsSL https://ollama.com/install.sh | sh

# Check service is running
systemctl status ollama
```

Remember to add `Environment="OLLAMA_HOST=0.0.0.0"` under the `[Service]` section of
`/etc/systemd/system/ollama.service` to make it accessible on the network.

Also add `Environment="OLLAMA_MODELS=/models"` to the same section to store models on an
external disk.

For Radeon 6000 cards you'll need to add `Environment="HSA_OVERRIDE_GFX_VERSION=10.3.0"` as well.
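
After editing the unit file, something like this applies the change and confirms Ollama is listening on all interfaces (a minimal sketch; `ss` output format may vary):

```bash
# Reload systemd and restart Ollama so the new Environment lines take effect
systemctl daemon-reload
systemctl restart ollama

# Confirm the API is listening on 0.0.0.0:11434 rather than just localhost
ss -tlnp | grep 11434
```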

I'd recommend the following models to get started:

- Chat: llava-llama3:latest
- Code: qwen2.5-coder:7b
- Math: qwen2-math:latest
- Uncensored: mannix/llama3.1-8b-abliterated:latest
- Embedding: nomic-embed-text:latest

Note your ollama instance will be available to podman containers via `http://host.containers.internal:11434`
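
As a quick start, something like this pulls the recommended models and sanity-checks the API (the `curl` call uses localhost; from a container you'd use the `host.containers.internal` address above):

```bash
# Pull the recommended models
ollama pull llava-llama3:latest
ollama pull qwen2.5-coder:7b
ollama pull qwen2-math:latest
ollama pull mannix/llama3.1-8b-abliterated:latest
ollama pull nomic-embed-text:latest

# List what's installed and check the API
ollama list
curl http://localhost:11434/api/tags
```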

## Install and run Ollama with Podman

```bash
# AMD
# Use the below to generate a quadlet for /etc/containers/systemd/local-ai.container
# podman run --rm ghcr.io/containers/podlet --install --description "Local AI" \
podman run -d --device /dev/kfd --device /dev/dri -v ollama:/root/.ollama -p 11434:11434 --name ollama docker.io/ollama/ollama:rocm

# CPU
# Use the below to generate a quadlet for /etc/containers/systemd/local-ai.container
# podman run --rm ghcr.io/containers/podlet --install --description "Local AI" \
podman run -d -v ollama:/root/.ollama -p 11434:11434 --name ollama docker.io/ollama/ollama
```

## Unsticking models stuck in "Stopping"

```bash
# Find models that are wedged in the "Stopping" state
ollama ps | grep -i stopping

# Kill all ollama processes (restart the service with systemctl afterwards if needed)
pgrep ollama | xargs -I '%' sh -c 'kill %'
```
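
Newer Ollama releases also ship an `ollama stop` command, which is worth trying before killing processes (a sketch; assumes a reasonably recent Ollama version and uses an example model name):

```bash
# Ask Ollama to unload the stuck model gracefully; use the name `ollama ps` reported
ollama stop llava-llama3:latest
```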

## Run Anything LLM Interface

```bash
podman run \
  -d \
  -p 3001:3001 \
  --name anything-llm \
  --cap-add SYS_ADMIN \
  -v anything-llm:/app/server \
  -e STORAGE_DIR="/app/server/storage" \
  docker.io/mintplexlabs/anythingllm
```

This should now be accessible on port 3001. Note: you'll need to allow traffic between podman
and the host.

Use `podman network ls` to see which networks podman is running on and `podman network inspect`
to get the IP address range. Then allow traffic from that range to port 11434 (ollama):

```bash
ufw allow from 10.89.0.1/24 to any port 11434
```
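
For reference, the `10.89.0.1/24` range above comes from inspecting the podman network the container is attached to; something like this shows yours (the default network is usually named `podman`):

```bash
# List podman networks, then pull the subnet out of the inspect output
podman network ls
podman network inspect podman | grep -i subnet
```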

## Installing External Service with Nginx and Certbot

We're going to need a certificate for our service since we'll want to talk to it over
HTTPS. This will be handled by certbot. I'm using AWS in this example, but certbot has
tons of DNS plugins available with similar commands. The important part is getting that
Let's Encrypt certificate generated and put in the place nginx expects it.

Before we can use certbot we need AWS credentials. Note this will be different if you
use a different DNS provider.

See [generating AWS credentials](active/cloud_aws_iam/README.md)

```bash
curl "https://awscli.amazonaws.com/awscli-exe-linux-x86_64.zip" -o "awscliv2.zip"
unzip awscliv2.zip
./aws/install

# Configure default credentials
aws configure
```
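
Before moving on, a quick way to confirm the credentials actually work (a sketch):

```bash
# Should print the account and ARN for the configured credentials
aws sts get-caller-identity
```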

With AWS credentials configured you can now install certbot and generate a certificate for
each hostname we'll serve below.

```bash
# Fedora
dnf install -y certbot python3-certbot-dns-route53

# Ubuntu
apt install -y certbot python3-certbot-dns-route53

# Both
certbot certonly --dns-route53 -d chatreesept.reeseapps.com
certbot certonly --dns-route53 -d ollama.reeseapps.com
```

Now you have your certs!

Install and start nginx with the following commands:

```bash
# Fedora
dnf install -y nginx

# Ubuntu
apt install -y nginx

# Both
systemctl enable --now nginx
```

Now let's edit our nginx config. First, add this to our nginx.conf, inside the `http` block (or make sure it's already there).

/etc/nginx/nginx.conf

```conf
keepalive_timeout 1h;
send_timeout 1h;
client_body_timeout 1h;
client_header_timeout 1h;
proxy_connect_timeout 1h;
proxy_read_timeout 1h;
proxy_send_timeout 1h;
```

Now write your nginx http config files. You'll need two:

1. ollama.reeseapps.com.conf
2. chatreesept.reeseapps.com.conf

/etc/nginx/conf.d/ollama.reeseapps.com.conf

```conf
server {
    listen 80;
    listen [::]:80;
    server_name ollama.reeseapps.com;

    location / {
        return 301 https://$host$request_uri;
    }
}

server {
    listen 443 ssl;
    listen [::]:443 ssl;
    server_name ollama.reeseapps.com;

    ssl_certificate /etc/letsencrypt/live/ollama.reeseapps.com/fullchain.pem;
    ssl_certificate_key /etc/letsencrypt/live/ollama.reeseapps.com/privkey.pem;

    location / {
        if ($http_authorization != "Bearer <token>") {
            return 401;
        }

        proxy_pass http://127.0.0.1:11434;
        proxy_set_header Host $host;
        proxy_buffering off;
    }
}
```

/etc/nginx/conf.d/chatreesept.reeseapps.com.conf

```conf
server {
    listen 80;
    server_name chatreesept.reeseapps.com;

    location / {
        return 301 https://$host$request_uri;
    }
}

server {
    listen 443 ssl;
    server_name chatreesept.reeseapps.com;

    ssl_certificate /etc/letsencrypt/live/chatreesept.reeseapps.com/fullchain.pem;
    ssl_certificate_key /etc/letsencrypt/live/chatreesept.reeseapps.com/privkey.pem;

    location / {
        client_max_body_size 50m;

        proxy_pass http://localhost:3001;
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";
        proxy_set_header Host $host;
        proxy_cache_bypass $http_upgrade;
    }
}
```

Run `nginx -t` to check for errors. If there are none, run `systemctl reload nginx` to pick up
your changes. Your sites should be available at chatreesept.reeseapps.com and ollama.reeseapps.com.
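
A quick way to confirm both vhosts are serving (a sketch; replace `<token>` with the value from the ollama config above):

```bash
# The Anything LLM frontend should answer over TLS
curl -I https://chatreesept.reeseapps.com

# The Ollama API behind nginx needs the bearer token
curl -H "Authorization: Bearer <token>" https://ollama.reeseapps.com/api/tags
```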

Set up automatic certificate renewal with a daily cron job. Open root's crontab:

```bash
sudo crontab -e
```

Add the following line to the end of the file:

```bash
0 0 * * * certbot renew --quiet
```
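
Certbot only renews certificates that are close to expiry, and nginx won't serve a renewed certificate until it reloads. If you want that handled automatically, a deploy hook can go on the same cron entry (a sketch using certbot's `--deploy-hook` flag):

```bash
# Reload nginx only when a certificate was actually renewed
0 0 * * * certbot renew --quiet --deploy-hook "systemctl reload nginx"
```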

At this point you might need to create some UFW rules to allow inter-container traffic.

```bash
# Try this first if you're having problems
ufw reload

# Debug with ufw logging
ufw logging on
tail -f /var/log/ufw.log
```

Also consider that podman will not restart your containers at boot. You'll need to create quadlets
from the podman run commands. Check out the comments above the podman run commands for more info.
Also search the web for "podman quadlets" or ask your AI about it!
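
As a rough sketch of what that looks like for the AMD command above (the unit name `ollama.container` is an example; pick one that doesn't clash with a natively installed ollama.service):

```bash
# Feed the podman run command into podlet and write the generated quadlet
podman run --rm ghcr.io/containers/podlet --install --description "Ollama" \
  podman run -d --device /dev/kfd --device /dev/dri \
  -v ollama:/root/.ollama -p 11434:11434 --name ollama docker.io/ollama/ollama:rocm \
  | tee /etc/containers/systemd/ollama.container

# Quadlets turn into systemd units after a daemon-reload
systemctl daemon-reload
systemctl start ollama.service
```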

## Ollama Models

<https://ollama.com/library>

## Custom Models

<https://www.gpu-mart.com/blog/import-models-from-huggingface-to-ollama>

<https://www.hostinger.com/tutorials/ollama-cli-tutorial#Setting_up_Ollama_in_the_CLI>

### From Existing Model

```bash
# Dump the existing model's Modelfile
ollama show --modelfile opencoder > Modelfile

# Edit Modelfile and add the line: PARAMETER num_ctx 8192

# Rebuild under a new name from the edited Modelfile
ollama create opencoder-fix -f Modelfile
```
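
To confirm the override took (a quick sketch):

```bash
# The new model's Modelfile should now include the larger context window
ollama show --modelfile opencoder-fix | grep num_ctx
```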

### From Scratch

Install git lfs and clone the model you're interested in:

```bash
# Make sure you have git-lfs installed (https://git-lfs.com)
git lfs install

git clone https://huggingface.co/bartowski/Starling-LM-7B-beta-GGUF
```

Create a Modelfile:

```Dockerfile
# Modelfile
FROM "./path/to/gguf"

TEMPLATE """{{ if .Prompt }}<|im_start|>
{{ .Prompt }}<|im_end|>
{{ end }}
"""

SYSTEM You are OpenCoder, created by OpenCoder Team.

PARAMETER stop <|im_start|>
PARAMETER stop <|im_end|>
PARAMETER stop <|fim_prefix|>
PARAMETER stop <|fim_middle|>
PARAMETER stop <|fim_suffix|>
PARAMETER stop <|fim_end|>
```

Build the model:

```bash
ollama create "Starling-LM-7B-beta-Q6_K" -f Modelfile
```

Run the model:

```bash
ollama run Starling-LM-7B-beta-Q6_K:latest
```

### Discovering models

Check out Hugging Face's leaderboard: <https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard>

1. Select the model type you're after
2. Drag the number of parameters slider to a range you can run
3. Click the top few and read about them.

### Custom models from safetensor files

<https://www.theregister.com/2024/07/14/quantization_llm_feature/>

Setup the repo:

```bash
# Setup
git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build --config Release -j $(nproc)
python3 -m venv venv && source venv/bin/activate
pip install -r requirements.txt
huggingface-cli login # necessary to download gated models
python convert_hf_to_gguf_update.py $(cat ~/.cache/huggingface/token)
```

Convert models to gguf:

```bash
# Copy the model title from hugging face
export MODEL_NAME=

# Create a folder to clone the model into
mkdir -p models/$MODEL_NAME

# Download the current head for the model
huggingface-cli download $MODEL_NAME --local-dir models/$MODEL_NAME

# Or get the f16 quantized gguf
wget -P models/$MODEL_NAME https://huggingface.co/xtuner/llava-llama-3-8b-v1_1-gguf/resolve/main/llava-llama-3-8b-v1_1-f16.gguf

# Convert the hugging face model to gguf
python3 convert_hf_to_gguf.py models/$MODEL_NAME --outfile models/$MODEL_NAME.gguf

# Run ./build/bin/llama-quantize with no arguments to see available quants
./build/bin/llama-quantize models/$MODEL_NAME.gguf models/$MODEL_NAME-Q4_K.gguf 15
./build/bin/llama-quantize models/$MODEL_NAME.gguf models/$MODEL_NAME-Q5_K.gguf 17
./build/bin/llama-quantize models/$MODEL_NAME.gguf models/$MODEL_NAME-Q6_K.gguf 18
./build/bin/llama-quantize models/$MODEL_NAME.gguf models/$MODEL_NAME-Q8_0.gguf 7

# Copy to your localai models folder and restart
scp models/$MODEL_NAME-Q5_K.gguf localai:/models/

# View output
tree -phugL 2 models
```
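
If you want to serve one of the quantized files with Ollama instead, the same Modelfile approach from the "From Scratch" section applies; a minimal sketch (model and file names follow the variables above):

```bash
# Point a minimal Modelfile at the quantized gguf and register it with Ollama
printf 'FROM "./models/%s-Q5_K.gguf"\n' "$MODEL_NAME" > Modelfile
ollama create "${MODEL_NAME}-Q5_K" -f Modelfile
ollama run "${MODEL_NAME}-Q5_K"
```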