refine ollama docs

2024-12-06 03:47:20 -05:00
parent 1b899d9062
commit 509b87c15c
2 changed files with 361 additions and 364 deletions

@@ -1,364 +0,0 @@
# Ollama
- [Ollama](#ollama)
- [Run natively with GPU support](#run-natively-with-gpu-support)
- [Unsticking models stuck in "Stopping"](#unsticking-models-stuck-in-stopping)
- [Run Anything LLM Interface](#run-anything-llm-interface)
- [Anything LLM Quadlet with Podlet](#anything-llm-quadlet-with-podlet)
- [Now with Nginx and Certbot](#now-with-nginx-and-certbot)
- [Custom Models](#custom-models)
- [From Existing Model](#from-existing-model)
- [From Scratch](#from-scratch)
- [Converting to gguf](#converting-to-gguf)
<https://github.com/ollama/ollama>
## Run natively with GPU support
<https://ollama.com/download/linux>
<https://ollama.com/library>
```bash
# Install script
curl -fsSL https://ollama.com/install.sh | sh
# Check service is running
systemctl status ollama
```
Remember to add `Environment="OLLAMA_HOST=0.0.0.0"` to `/etc/systemd/system/ollama.service` to
make it accessible on the network.
For Radeon 6000 cards you'll need to add `Environment="HSA_OVERRIDE_GFX_VERSION=10.3.0"` as well.
```bash
# Pull models
# Try to use higher parameter models. Grab the q5_K_M variant at minimum.
# For a 24GB VRAM Card I'd recommend:
# Anything-LLM Coding
ollama pull qwen2.5-coder:14b-instruct-q5_K_M
# Anything-LLM Math
ollama pull qwen2-math:7b-instruct-fp16
# Anything-LLM Chat
ollama pull llama3.2-vision:11b-instruct-q8_0
# VSCode Continue Autocomplete
ollama pull starcoder2:15b-q5_K_M
# VSCode Continue Chat
ollama pull llama3.1:8b-instruct-fp16
# VSCode Continue Embedder
ollama pull nomic-embed-text:137m-v1.5-fp16
```
Note your ollama instance will be available to podman containers via `http://host.containers.internal:11434`
## Unsticking models stuck in "Stopping"
```bash
ollama ps | grep -i stopping
pgrep ollama | xargs -I '%' sh -c 'kill %'
```
## Run Anything LLM Interface
```bash
podman run \
-d \
-p 3001:3001 \
--name anything-llm \
--cap-add SYS_ADMIN \
-v anything-llm:/app/server \
-e STORAGE_DIR="/app/server/storage" \
docker.io/mintplexlabs/anythingllm
```
This should now be accessible on port 3001. Note, you'll need to allow traffic between podman
and the host:
Use `podman network ls` to see which networks podman is running on and `podman network inspect`
to get the IP address range. Then allow traffic from that range to port 11434 (ollama):
```bash
ufw allow from 10.89.0.1/24 to any port 11434
```
## Anything LLM Quadlet with Podlet
```bash
podman run --rm ghcr.io/containers/podlet --install --description "Anything LLM" \
podman run \
-d \
-p 3001:3001 \
--name anything-llm \
--cap-add SYS_ADMIN \
--restart always \
-v anything-llm:/app/server \
-e STORAGE_DIR="/app/server/storage" \
docker.io/mintplexlabs/anythingllm
```
To the service to have them autostart.
Put the generated files in `/usr/share/containers/systemd/`.
## Now with Nginx and Certbot
See [generating AWS credentials](cloud/graduated/aws_iam/README.md)
```bash
curl "https://awscli.amazonaws.com/awscli-exe-linux-x86_64.zip" -o "awscliv2.zip"
unzip awscliv2.zip
./aws/install
# Configure default credentials
aws configure
```
Open http/s in firewalld:
```bash
# Remember to firewall-cmd --set-default-zone=public
firewall-cmd --permanent --zone=public --add-service=http
firewall-cmd --permanent --zone=public --add-service=https
firewall-cmd --reload
# or
ufw allow 80/tcp
ufw allow 443/tcp
```
Here are the detailed instructions for installing and setting up Nginx on Fedora Linux with Certbot
using the Route53 DNS challenge to put in front of a service called "Anything LLM" running on port
3001 with WebSockets. The domain will be chatreesept.reeseapps.com.
1. Install Nginx:
```
dnf install -y nginx
```
2. Start and enable Nginx service:
```
systemctl enable --now nginx
```
3. Install Certbot and the Route53 DNS plugin:
```
# Fedora
dnf install -y certbot python3-certbot-dns-route53
# Arch
pacman -S certbot certbot-dns-route53
```
4. Request a certificate for your domain using the Route53 DNS challenge:
```
certbot certonly --dns-route53 -d chatreesept.reeseapps.com
```
Follow the prompts to provide your Route53 credentials and email address.
5. Configure Nginx for your domain: Create a new Nginx configuration file for your domain:
Update your nginx conf with the following
```
vim /etc/nginx/nginx.conf
```
```
keepalive_timeout 1h;
send_timeout 1h;
client_body_timeout 1h;
client_header_timeout 1h;
proxy_connect_timeout 1h;
proxy_read_timeout 1h;
proxy_send_timeout 1h;
```
```
vim /etc/nginx/conf.d/ollama.reeselink.com.conf
```
```
server {
listen 80;
server_name ollama.reeselink.com;
location / {
return 301 https://$host$request_uri;
}
}
server {
listen 443 ssl;
server_name ollama.reeselink.com;
ssl_certificate /etc/letsencrypt/live/ollama.reeselink.com/fullchain.pem;
ssl_certificate_key /etc/letsencrypt/live/ollama.reeselink.com/privkey.pem;
location / {
proxy_pass http://localhost:11434;
proxy_http_version 1.1;
proxy_set_header Upgrade $http_upgrade;
proxy_set_header Connection "upgrade";
proxy_set_header Host $host;
proxy_cache_bypass $http_upgrade;
proxy_buffering off;
}
}
```
```
vim /etc/nginx/conf.d/chatreesept.reeseapps.com.conf
```
Add the following configuration to the file:
```
server {
listen 80;
server_name chatreesept.reeseapps.com;
location / {
return 301 https://$host$request_uri;
}
}
server {
listen 443 ssl;
server_name chatreesept.reeseapps.com;
ssl_certificate /etc/letsencrypt/live/chatreesept.reeseapps.com/fullchain.pem;
ssl_certificate_key /etc/letsencrypt/live/chatreesept.reeseapps.com/privkey.pem;
location / {
client_max_body_size 50m;
proxy_pass http://localhost:3001;
proxy_http_version 1.1;
proxy_set_header Upgrade $http_upgrade;
proxy_set_header Connection "upgrade";
proxy_set_header Host $host;
proxy_cache_bypass $http_upgrade;
proxy_buffering off;
}
}
```
6. Test your Nginx configuration for syntax errors:
```
nginx -t
```
If there are no errors, reload Nginx to apply the changes:
```
systemctl reload nginx
```
7. Set up automatic certificate renewal: Add the following line to your crontab to renew the
certificate daily:
```
pacman -S cronie
sudo crontab -e
```
Add the following line to the end of the file:
```
0 0 * * * certbot renew --quiet
```
Now, your "Anything LLM" service running on port 3001 with WebSockets is accessible through the
domain chatreesept.reeseapps.com with a valid SSL certificate from Let's Encrypt. The certificate
will be automatically renewed daily.
## Custom Models
<https://www.gpu-mart.com/blog/import-models-from-huggingface-to-ollama>
<https://www.hostinger.com/tutorials/ollama-cli-tutorial#Setting_up_Ollama_in_the_CLI>
### From Existing Model
```bash
ollama show --modelfile opencoder > Modelfile
PARAMETER num_ctx 8192
ollama create opencoder-fix -f Modelfile
```
### From Scratch
Install git lfs and clone the model you're interested in
```bash
# Make sure you have git-lfs installed (https://git-lfs.com)
git lfs install
git clone https://huggingface.co/bartowski/Starling-LM-7B-beta-GGUF
```
Create a modelfile
```
# Modelfile
FROM "./path/to/gguf"
TEMPLATE """{{ if .Prompt }}<|im_start|>
{{ .Prompt }}<|im_end|>
{{ end }}
"""
SYSTEM You are OpenCoder, created by OpenCoder Team.
PARAMETER stop <|im_start|>
PARAMETER stop <|im_end|>
PARAMETER stop <|fim_prefix|>
PARAMETER stop <|fim_middle|>
PARAMETER stop <|fim_suffix|>
PARAMETER stop <|fim_end|>
PARAMETER stop """
"""
```
Build the model
```bash
ollama create "Starling-LM-7B-beta-Q6_K" -f Modelfile
```
Run the model
```bash
ollama run Starling-LM-7B-beta-Q6_K:latest
```
## Converting to gguf
<https://www.theregister.com/2024/07/14/quantization_llm_feature/>
1. Clone the llama.cpp repository and install its dependencies:
```bash
git clone https://github.com/ggerganov/llama.cpp.git
cd ~/llama.cpp
python3 -m venv venv && source venv/bin/activate
pip3 install -r requirements.txt
mkdir ~/llama.cpp/models/mistral
huggingface-cli login #necessary to download gated models
huggingface-cli download mistralai/Mistral-7B-Instruct-v0.3 --local-dir ~/llama.cpp/models/mistral/
python3 convert_hf_to_gguf.py ~/.cache/huggingface/hub/models--infly--OpenCoder-8B-Instruct/snapshots/01badbbf10c2dfd7e2a0b5f570065ef44548576c
```

@@ -0,0 +1,361 @@
# Ollama
- [Ollama](#ollama)
- [Install and run Ollama](#install-and-run-ollama)
- [Install and run Ollama with Podman](#install-and-run-ollama-with-podman)
- [Unsticking models stuck in "Stopping"](#unsticking-models-stuck-in-stopping)
- [Run Anything LLM Interface](#run-anything-llm-interface)
- [Installing External Service with Nginx and Certbot](#installing-external-service-with-nginx-and-certbot)
- [Custom Models](#custom-models)
- [From Existing Model](#from-existing-model)
- [From Scratch](#from-scratch)
- [Discovering models](#discovering-models)
- [Custom models from safetensor files](#custom-models-from-safetensor-files)
<https://github.com/ollama/ollama>
## Install and run Ollama
<https://ollama.com/download/linux>
<https://ollama.com/library>
```bash
# Install script
curl -fsSL https://ollama.com/install.sh | sh
# Check service is running
systemctl status ollama
```
Remember to add `Environment="OLLAMA_HOST=0.0.0.0"` to `/etc/systemd/system/ollama.service` to
make it accessible on the network.
Also add `Environment="OLLAMA_MODELS=/models"` to `/etc/systemd/system/ollama.service` to
store models on an external disk.
For Radeon 6000 cards you'll need to add `Environment="HSA_OVERRIDE_GFX_VERSION=10.3.0"` as well.
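Editing the unit file in place works, but it may be overwritten if the install script is run again. A systemd drop-in is one way to keep these settings in a separate file. The following is a minimal sketch, not something the Ollama installer provides: the helper name `write_ollama_override` and the file name `override.conf` are made up for illustration, and the values are the ones mentioned above.

```bash
# Hypothetical helper: writes a systemd drop-in containing the Environment lines above.
write_ollama_override() {
    dir="$1"
    mkdir -p "$dir"
    cat > "$dir/override.conf" <<'EOF'
[Service]
Environment="OLLAMA_HOST=0.0.0.0"
Environment="OLLAMA_MODELS=/models"
EOF
}

# Usage (as root), then reload systemd and restart the service:
# write_ollama_override /etc/systemd/system/ollama.service.d
# systemctl daemon-reload
# systemctl restart ollama
```

Add the `HSA_OVERRIDE_GFX_VERSION` line to the same `[Service]` section if your card needs it.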
I'd recommend the following models to get started:
- Chat: llava-llama3:latest
- Code: qwen2.5-coder:7b
- Math: qwen2-math:latest
- Uncensored: mannix/llama3.1-8b-abliterated:latest
- Embedding: nomic-embed-text:latest
Note your ollama instance will be available to podman containers via `http://host.containers.internal:11434`
## Install and run Ollama with Podman
```bash
podman run -d --device /dev/kfd --device /dev/dri -v ollama:/root/.ollama -p 11434:11434 --name ollama docker.io/ollama/ollama:rocm
```
## Unsticking models stuck in "Stopping"
```bash
ollama ps | grep -i stopping
pgrep ollama | xargs -I '%' sh -c 'kill %'
```
## Run Anything LLM Interface
```bash
podman run \
-d \
-p 3001:3001 \
--name anything-llm \
--cap-add SYS_ADMIN \
-v anything-llm:/app/server \
-e STORAGE_DIR="/app/server/storage" \
docker.io/mintplexlabs/anythingllm
```
This should now be accessible on port 3001. Note, you'll need to allow traffic between podman
and the host:
Use `podman network ls` to see which networks podman is running on and `podman network inspect`
to get the IP address range. Then allow traffic from that range to port 11434 (ollama):
```bash
# Example range; use the subnet reported by `podman network inspect`
ufw allow from 10.89.0.0/24 to any port 11434
```
## Installing External Service with Nginx and Certbot
We're going to need a certificate for our service since we'll want to talk to it over
https. This will be handled by certbot. I'm using AWS in this example, but certbot has
tons of DNS plugins available with similar commands. The important part is getting that
letsencrypt certificate generated and in the place nginx expects it.
Before we can use certbot we need aws credentials. Note this will be different if you
use a different DNS provider.
See [generating AWS credentials](cloud/graduated/aws_iam/README.md)
```bash
curl "https://awscli.amazonaws.com/awscli-exe-linux-x86_64.zip" -o "awscliv2.zip"
unzip awscliv2.zip
./aws/install
# Configure default credentials
aws configure
```
With AWS credentials configured you can now install certbot and generate a certificate.
```bash
# Fedora
dnf install -y certbot python3-certbot-dns-route53
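# Ubuntu
apt install -y certbot python3-certbot-dns-route53
# Both (request one certificate per domain used in the nginx configs)
certbot certonly --dns-route53 -d chatreesept.reeseapps.com
certbot certonly --dns-route53 -d ollama.reeseapps.com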
```
Now you have a cert!
Install and start nginx with the following commands:
```bash
# Fedora
dnf install -y nginx
# Ubuntu
apt install -y nginx
# Both
systemctl enable --now nginx
```
Now let's edit our nginx config. First, add these timeouts to the `http` block of nginx.conf (or make sure they're already there).
/etc/nginx/nginx.conf
```conf
keepalive_timeout 1h;
send_timeout 1h;
client_body_timeout 1h;
client_header_timeout 1h;
proxy_connect_timeout 1h;
proxy_read_timeout 1h;
proxy_send_timeout 1h;
```
Now write your nginx http config files. You'll need two:
1. ollama.reeseapps.com.conf
2. chatreesept.reeseapps.com.conf
/etc/nginx/conf.d/ollama.reeseapps.com.conf
```conf
server {
listen 80;
listen [::]:80;
server_name ollama.reeseapps.com;
location / {
return 301 https://$host$request_uri;
}
}
server {
listen 443 ssl;
listen [::]:443 ssl;
server_name ollama.reeseapps.com;
ssl_certificate /etc/letsencrypt/live/ollama.reeseapps.com/fullchain.pem;
ssl_certificate_key /etc/letsencrypt/live/ollama.reeseapps.com/privkey.pem;
location / {
if ($http_authorization != "Bearer <token>") {
return 401;
}
proxy_pass http://127.0.0.1:11434;
proxy_set_header Host $host;
proxy_buffering off;
}
}
```
/etc/nginx/conf.d/chatreesept.reeseapps.com.conf
```conf
server {
listen 80;
server_name chatreesept.reeseapps.com;
location / {
return 301 https://$host$request_uri;
}
}
server {
listen 443 ssl;
server_name chatreesept.reeseapps.com;
ssl_certificate /etc/letsencrypt/live/chatreesept.reeseapps.com/fullchain.pem;
ssl_certificate_key /etc/letsencrypt/live/chatreesept.reeseapps.com/privkey.pem;
location / {
client_max_body_size 50m;
proxy_pass http://localhost:3001;
proxy_http_version 1.1;
proxy_set_header Upgrade $http_upgrade;
proxy_set_header Connection "upgrade";
proxy_set_header Host $host;
proxy_cache_bypass $http_upgrade;
}
}
```
Run `nginx -t` to check for errors. If there are none, run `systemctl reload nginx` to pick up
your changes. Your services should be available at chatreesept.reeseapps.com and ollama.reeseapps.com.
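Because the ollama server block returns 401 unless the `Authorization` header matches, clients have to send the token themselves. A minimal sketch, assuming the `<token>` placeholder in the config was replaced with a secret of your choosing; `ollama_auth_header` is a hypothetical helper, and `/api/tags` is Ollama's endpoint for listing local models.

```bash
# Hypothetical helper: builds the header that nginx compares against.
ollama_auth_header() {
    printf 'Authorization: Bearer %s' "$1"
}

# Usage: generate a token once (put the same value in the nginx config),
# then list the models through the proxy.
# TOKEN=$(openssl rand -hex 32)
# curl -H "$(ollama_auth_header "$TOKEN")" https://ollama.reeseapps.com/api/tags
```

A request without the header, or with a different token, should come back as a 401 from nginx before it ever reaches ollama.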
Set up automatic certificate renewal with a daily cron job. `certbot renew` only renews
certificates that are close to expiry, so running it every day is safe. Open root's crontab:
```bash
sudo crontab -e
```
Add the following line to the end of the file:
```bash
0 0 * * * certbot renew --quiet
```
At this point you might need to create some UFW rules to allow traffic between the containers and the host.
```bash
# Try this first if you're having problems
ufw reload
# Debug with ufw logging
ufw logging on
tail -f /var/log/ufw.log
```
Also keep in mind that podman will not restart your containers at boot. You'll need to create quadlets
from the podman run commands. Search the web for "podman quadlets" or ask your AI about it!
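As a starting point, the Anything LLM `podman run` command from earlier could translate to a quadlet roughly like the one below. This is a sketch rather than a tested unit: the helper name `write_anythingllm_quadlet` and the file name `anything-llm.container` are assumptions, and the keys simply mirror the flags used in the `podman run` command.

```bash
# Hypothetical helper: writes a quadlet .container file for Anything LLM.
write_anythingllm_quadlet() {
    dir="$1"
    mkdir -p "$dir"
    cat > "$dir/anything-llm.container" <<'EOF'
[Unit]
Description=Anything LLM

[Container]
ContainerName=anything-llm
Image=docker.io/mintplexlabs/anythingllm
PublishPort=3001:3001
AddCapability=SYS_ADMIN
Volume=anything-llm:/app/server
Environment=STORAGE_DIR=/app/server/storage

[Service]
Restart=always

[Install]
WantedBy=multi-user.target
EOF
}

# Usage (as root): quadlet generates anything-llm.service from the file.
# write_anythingllm_quadlet /etc/containers/systemd
# systemctl daemon-reload
# systemctl start anything-llm.service
```

The `[Install]` section is what makes the generated service start at boot; generated units are not enabled with `systemctl enable`.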
## Custom Models
<https://www.gpu-mart.com/blog/import-models-from-huggingface-to-ollama>
<https://www.hostinger.com/tutorials/ollama-cli-tutorial#Setting_up_Ollama_in_the_CLI>
### From Existing Model
```bash
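# Dump the existing model's Modelfile
ollama show --modelfile opencoder > Modelfile
# Append a larger context window to the Modelfile
echo "PARAMETER num_ctx 8192" >> Modelfile
# Build a new model from the edited Modelfile
ollama create opencoder-fix -f Modelfile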
```
### From Scratch
Install git lfs and clone the model you're interested in
```bash
# Make sure you have git-lfs installed (https://git-lfs.com)
git lfs install
git clone https://huggingface.co/bartowski/Starling-LM-7B-beta-GGUF
```
Create a modelfile
```Dockerfile
# Modelfile
FROM "./path/to/gguf"
TEMPLATE """{{ if .Prompt }}<|im_start|>
{{ .Prompt }}<|im_end|>
{{ end }}
"""
SYSTEM You are OpenCoder, created by OpenCoder Team.
PARAMETER stop <|im_start|>
PARAMETER stop <|im_end|>
PARAMETER stop <|fim_prefix|>
PARAMETER stop <|fim_middle|>
PARAMETER stop <|fim_suffix|>
PARAMETER stop <|fim_end|>
```
Build the model
```bash
ollama create "Starling-LM-7B-beta-Q6_K" -f Modelfile
```
Run the model
```bash
ollama run Starling-LM-7B-beta-Q6_K:latest
```
### Discovering models
Check out Hugging Face's leaderboard: <https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard>
1. Select the model type you're after
2. Drag the number of parameters slider to a range you can run
3. Click the top few and read about them.
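To judge what "a range you can run" means, a rough rule of thumb is that the weights alone take about parameters × bits per weight ÷ 8 bytes. The helper below is only a back-of-the-envelope sketch: `estimate_weights_gb` is a hypothetical name, it ignores the context (KV cache) and runtime overhead, and the bits-per-weight figure you pass in for a given quant is your own approximation.

```bash
# estimate_weights_gb <billions of parameters> <bits per weight>
# Prints the approximate size of the weights in GB.
estimate_weights_gb() {
    awk -v p="$1" -v b="$2" 'BEGIN { printf "%.1f\n", p * b / 8 }'
}

# Usage:
# estimate_weights_gb 7 16    # 7B model at fp16
# estimate_weights_gb 14 5    # 14B model at roughly 5 bits per weight
```

Leave some headroom below your VRAM size, since the model needs memory for more than just the weights.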
### Custom models from safetensor files
<https://www.theregister.com/2024/07/14/quantization_llm_feature/>
Set up the repo:
```bash
# Setup
git clone https://github.com/ggerganov/llama.cpp.git ~/llama.cpp
cd ~/llama.cpp
cmake -B build
cmake --build build --config Release -j $(nproc)
python3 -m venv venv && source venv/bin/activate
pip install -r requirements.txt
huggingface-cli login #necessary to download gated models
python convert_hf_to_gguf_update.py $(cat ~/.cache/huggingface/token)
```
Convert models to gguf:
```bash
# Copy the model title from hugging face
export MODEL_NAME=
# Create a folder to clone the model into
mkdir -p models/$MODEL_NAME
# Download the current head for the model
huggingface-cli download $MODEL_NAME --local-dir models/$MODEL_NAME
# Or get the f16 quantized gguf
wget -P models/$MODEL_NAME https://huggingface.co/xtuner/llava-llama-3-8b-v1_1-gguf/resolve/main/llava-llama-3-8b-v1_1-f16.gguf
# Convert model from hugging face to gguf (use --outtype to pick the output type, e.g. q8_0)
python3 convert_hf_to_gguf.py models/$MODEL_NAME --outfile models/$MODEL_NAME.gguf
# Run ./build/bin/llama-quantize with no arguments to see available quants
# (the cmake build above puts the binaries in build/bin)
./build/bin/llama-quantize models/$MODEL_NAME.gguf models/$MODEL_NAME-Q4_K.gguf 15
./build/bin/llama-quantize models/$MODEL_NAME.gguf models/$MODEL_NAME-Q5_K.gguf 17
./build/bin/llama-quantize models/$MODEL_NAME.gguf models/$MODEL_NAME-Q6_K.gguf 18
./build/bin/llama-quantize models/$MODEL_NAME.gguf models/$MODEL_NAME-Q8_0.gguf 7
# Copy to your localai models folder and restart
scp models/$MODEL_NAME-Q5_K.gguf localai:/models/
# View output
tree -phugL 2 models
```