productionize ollama and overhaul docs

2024-11-23 09:38:27 -05:00
parent 224d86f5af
commit df06b206af


<https://github.com/ollama/ollama>
## Run natively with GPU support
<https://ollama.com/download/linux>
<https://ollama.com/library>
```bash
# Install script
curl -fsSL https://ollama.com/install.sh | sh
# Check service is running
systemctl status ollama
```
Remember to add `Environment="OLLAMA_HOST=0.0.0.0"` under the `[Service]` section of
`/etc/systemd/system/ollama.service` to make it accessible on the network.
For Radeon 6000-series cards you'll also need `Environment="HSA_OVERRIDE_GFX_VERSION=10.3.0"`.
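If you'd rather not edit the installed unit file directly, a systemd drop-in override does the same
job. A minimal sketch; the drop-in path and file name are just a convention, and the GFX line only
applies to Radeon 6000-series cards:

```bash
# Create a drop-in override for the ollama unit
sudo mkdir -p /etc/systemd/system/ollama.service.d
sudo tee /etc/systemd/system/ollama.service.d/override.conf > /dev/null <<'EOF'
[Service]
Environment="OLLAMA_HOST=0.0.0.0"
Environment="HSA_OVERRIDE_GFX_VERSION=10.3.0"
EOF
# Reload units and restart the service so the overrides take effect
sudo systemctl daemon-reload
sudo systemctl restart ollama
```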
```bash
# Pull models
# Prefer higher-parameter models; grab at least the q5_K_M quant.
# For a 24GB VRAM card I'd recommend:
# Anything-LLM Coding
ollama pull qwen2.5-coder:14b-instruct-q5_K_M
# Anything-LLM Math
ollama pull qwen2-math:7b-instruct-fp16
# Anything-LLM Chat
ollama pull llama3.2-vision:11b-instruct-q8_0
# VSCode Continue Autocomplete
ollama pull starcoder2:15b-q5_K_M
# VSCode Continue Chat
ollama pull llama3.1:8b-instruct-fp16
# VSCode Continue Embedder
ollama pull nomic-embed-text:137m-v1.5-fp16
```
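To verify the pulls and see whether a loaded model actually fits in VRAM:

```bash
# List what's on disk
ollama list
# Show loaded models, their size, and how much is offloaded to the GPU
ollama ps
```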
Note that your Ollama instance will be available to podman containers via `http://host.containers.internal:11434`.
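A quick sanity check from inside a container (assuming the `docker.io/curlimages/curl` image is
acceptable for the test):

```bash
# Should return the JSON list of models pulled above
podman run --rm docker.io/curlimages/curl -s http://host.containers.internal:11434/api/tags
```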
## Run Anything LLM Interface
```bash
podman run \
  -d \
  -p 3001:3001 \
  --name anything-llm \
  --cap-add SYS_ADMIN \
  -v anything-llm:/app/server \
  -e STORAGE_DIR="/app/server/storage" \
  docker.io/mintplexlabs/anythingllm
```
This should now be accessible on port 3001. Note that you'll need to allow traffic between podman
and the host: use `podman network ls` to see which networks podman is running on and
`podman network inspect` to get the IP address range, then allow traffic from that range to
port 11434 (ollama):
```bash
ufw allow from 10.89.0.1/24 to any port 11434
```
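The `10.89.0.1/24` range above is just an example; look up the actual subnet before writing the
rule (the default network is usually named `podman`):

```bash
# List podman networks, then inspect the one your container uses
podman network ls
podman network inspect podman | grep -i subnet
```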
## Anything LLM Quadlet with Podlet
```bash
podman run --rm ghcr.io/containers/podlet --install --description "Anything LLM" \
  podman run \
  -d \
  -p 3001:3001 \
  --name anything-llm \
  --cap-add SYS_ADMIN \
  --restart always \
  -v anything-llm:/app/server \
  -e STORAGE_DIR="/app/server/storage" \
  docker.io/mintplexlabs/anythingllm
```
Make sure the generated unit includes
```conf
[Service]
Restart=always
```
so the service autostarts.
Put the generated files in `/usr/share/containers/systemd/`.
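For example, assuming podlet wrote the unit to `anything-llm.container` in the current directory
(the file name will match whatever podlet generated):

```bash
sudo cp anything-llm.container /usr/share/containers/systemd/
# Quadlet generates anything-llm.service on daemon-reload
sudo systemctl daemon-reload
sudo systemctl start anything-llm.service
```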
## Now with Nginx and Certbot
See [generating AWS credentials](cloud/graduated/aws_iam/README.md), then run `aws configure`.
Open http/s in the firewall (firewalld or ufw):
```bash
# Remember to firewall-cmd --set-default-zone=public
firewall-cmd --permanent --zone=public --add-service=http
firewall-cmd --permanent --zone=public --add-service=https
firewall-cmd --reload
# or
ufw allow 80/tcp
ufw allow 443/tcp
```
Here are the detailed instructions for installing and setting up Nginx on Fedora Linux with Certbot
using the Route53 DNS challenge to put in front of a service called "Anything LLM" running on port
3001 with WebSockets. The domain will be chatreesept.reeseapps.com.
1. Install Nginx:
```
dnf install -y nginx
```
2. Start and enable the Nginx service:
```
systemctl enable --now nginx
```
3. Install Certbot and the Route53 DNS plugin:
```
# Fedora
dnf install -y certbot python3-certbot-dns-route53
# Arch
pacman -S certbot certbot-dns-route53
```
4. Request a certificate for your domain using the Route53 DNS challenge:
```
certbot certonly --dns-route53 -d chatreesept.reeseapps.com
```
Certbot picks up your AWS credentials from the standard locations (`~/.aws/credentials` or
environment variables) and will prompt for an email address on first run.
5. Configure Nginx for your domains by creating the configuration files below.
First, update the `http` block of your main nginx conf with the following timeouts:
```
vim /etc/nginx/nginx.conf
```
```
keepalive_timeout 1h;
send_timeout 1h;
proxy_connect_timeout 1h;
proxy_read_timeout 1h;
proxy_send_timeout 1h;
```
```
vim /etc/nginx/conf.d/ollama.reeselink.com.conf
```
```
server {
    listen 80;
    server_name ollama.reeselink.com;
    location / {
        return 301 https://$host$request_uri;
    }
}
server {
    listen 443 ssl;
    server_name ollama.reeselink.com;
    ssl_certificate /etc/letsencrypt/live/ollama.reeselink.com/fullchain.pem;
    ssl_certificate_key /etc/letsencrypt/live/ollama.reeselink.com/privkey.pem;
    location / {
        proxy_pass http://localhost:11434;
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";
        proxy_set_header Host $host;
        proxy_cache_bypass $http_upgrade;
        proxy_buffering off;
    }
}
```
```
vim /etc/nginx/conf.d/chatreesept.reeseapps.com.conf
```
Add the following configuration to the file:
```
server {
    listen 80;
    server_name chatreesept.reeseapps.com;
    location / {
        return 301 https://$host$request_uri;
    }
}
server {
    listen 443 ssl;
    server_name chatreesept.reeseapps.com;
    ssl_certificate /etc/letsencrypt/live/chatreesept.reeseapps.com/fullchain.pem;
    ssl_certificate_key /etc/letsencrypt/live/chatreesept.reeseapps.com/privkey.pem;
    location / {
        proxy_pass http://localhost:3001;
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";
        proxy_set_header Host $host;
        proxy_cache_bypass $http_upgrade;
        proxy_buffering off;
    }
}
```
6. Test your Nginx configuration for syntax errors:
```
nginx -t
```
If there are no errors, reload Nginx to apply the changes:
```
systemctl reload nginx
```
7. Set up automatic certificate renewal: Add the following line to your crontab to run a renewal
check daily:
```
# On Arch, install and enable cron first
pacman -S cronie
sudo crontab -e
```
Add the following line to the end of the file:
```
0 0 * * * certbot renew --quiet
```
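To confirm renewal works end to end without touching the live certificate:

```bash
certbot renew --dry-run
```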
Now, your "Anything LLM" service running on port 3001 with WebSockets is accessible through the Now, your "Anything LLM" service running on port 3001 with WebSockets is accessible through the
domain chatreesept.reeseapps.com with a valid SSL certificate from Let's Encrypt. The certificate domain chatreesept.reeseapps.com with a valid SSL certificate from Let's Encrypt. The certificate
will be automatically renewed daily. will be automatically renewed daily.
## Custom Models
<https://www.gpu-mart.com/blog/import-models-from-huggingface-to-ollama>
### From an Existing Model
```bash
ollama show --modelfile opencoder > Modelfile
# Add a larger context window to the Modelfile
echo 'PARAMETER num_ctx 8192' >> Modelfile
ollama create opencoder-fix -f Modelfile
```
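To confirm the override stuck, `ollama show` should print just the parameter block with the
`--parameters` flag:

```bash
ollama show opencoder-fix --parameters
```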
### From Scratch
Install git-lfs and clone the model you're interested in:
```bash
# Make sure you have git-lfs installed (https://git-lfs.com)
git lfs install
git clone https://huggingface.co/bartowski/Starling-LM-7B-beta-GGUF
```
Create a modelfile
```
# Modelfile
FROM "./path/to/gguf"
TEMPLATE """{{ if .Prompt }}<|im_start|>
{{ .Prompt }}<|im_end|>
{{ end }}
"""
SYSTEM You are OpenCoder, created by OpenCoder Team.
PARAMETER stop <|im_start|>
PARAMETER stop <|im_end|>
PARAMETER stop <|fim_prefix|>
PARAMETER stop <|fim_middle|>
PARAMETER stop <|fim_suffix|>
PARAMETER stop <|fim_end|>
PARAMETER stop """
"""
```
Build the model
```bash
ollama create "Starling-LM-7B-beta-Q6_K" -f Modelfile
```
Run the model
```bash
ollama run Starling-LM-7B-beta-Q6_K:latest
```
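The custom model is also reachable over Ollama's HTTP API:

```bash
curl http://localhost:11434/api/generate \
  -d '{"model": "Starling-LM-7B-beta-Q6_K", "prompt": "Hello", "stream": false}'
```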
## Converting to GGUF
<https://www.theregister.com/2024/07/14/quantization_llm_feature/>
1. Clone the llama.cpp repository and install its dependencies:
```bash
git clone https://github.com/ggerganov/llama.cpp.git ~/llama.cpp
cd ~/llama.cpp
python3 -m venv venv && source venv/bin/activate
pip3 install -r requirements.txt
mkdir -p ~/llama.cpp/models/mistral
huggingface-cli login # necessary to download gated models
huggingface-cli download mistralai/Mistral-7B-Instruct-v0.3 --local-dir ~/llama.cpp/models/mistral/
# Point the converter at a downloaded model directory, e.g. a cached hub snapshot:
python3 convert_hf_to_gguf.py ~/.cache/huggingface/hub/models--infly--OpenCoder-8B-Instruct/snapshots/01badbbf10c2dfd7e2a0b5f570065ef44548576c
```
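If you also want a smaller quant (the topic of the linked article), llama.cpp's quantize tool can
produce one from the converted GGUF. A sketch, assuming you build the tools with cmake; the file
names are placeholders for whatever the converter actually wrote:

```bash
# Build llama.cpp's CLI tools (one-time)
cmake -B build && cmake --build build --config Release -j
# Quantize the converted F16 GGUF down to Q5_K_M
./build/bin/llama-quantize models/mistral/model-f16.gguf models/mistral/model-q5_k_m.gguf Q5_K_M
```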