refine ollama docs

2024-12-06 03:47:20 -05:00
parent 1b899d9062
commit 509b87c15c
2 changed files with 361 additions and 364 deletions
@@ -1,364 +0,0 @@
-# Ollama
-
- [Ollama](#ollama)
-  - [Run natively with GPU support](#run-natively-with-gpu-support)
-  - [Unsticking models stuck in "Stopping"](#unsticking-models-stuck-in-stopping)
-  - [Run Anything LLM Interface](#run-anything-llm-interface)
-  - [Anything LLM Quadlet with Podlet](#anything-llm-quadlet-with-podlet)
-  - [Now with Nginx and Certbot](#now-with-nginx-and-certbot)
-  - [Custom Models](#custom-models)
-    - [From Existing Model](#from-existing-model)
-    - [From Scratch](#from-scratch)
-  - [Converting to gguf](#converting-to-gguf)
-
-<https://github.com/ollama/ollama>
-
-## Run natively with GPU support
-
-<https://ollama.com/download/linux>
-
-<https://ollama.com/library>
-
-```bash
-# Install script
-curl -fsSL https://ollama.com/install.sh | sh
-# Check service is running
-systemctl status ollama
-```
-
-Remember to add `Environment="OLLAMA_HOST=0.0.0.0"` to `/etc/systemd/system/ollama.service` to 
-make it accessible on the network.
-
-For Radeon 6000 cards you'll need to add `Environment="HSA_OVERRIDE_GFX_VERSION=10.3.0"` as well.
-
-```bash
-# Pull models
-# Try to use higher parameter models. Grab the q5_K_M variant at minimum.
-
-# For a 24GB VRAM Card I'd recommend:
-
-# Anything-LLM Coding
-ollama pull qwen2.5-coder:14b-instruct-q5_K_M
-# Anything-LLM Math
-ollama pull qwen2-math:7b-instruct-fp16
-# Anything-LLM Chat
-ollama pull llama3.2-vision:11b-instruct-q8_0
-
-# VSCode Continue Autocomplete
-ollama pull starcoder2:15b-q5_K_M
-# VSCode Continue Chat
-ollama pull llama3.1:8b-instruct-fp16
-# VSCode Continue Embedder
-ollama pull nomic-embed-text:137m-v1.5-fp16
-```
-
-Note your ollama instance will be available to podman containers via `http://host.containers.internal:11434`
-
-## Unsticking models stuck in "Stopping"
-
-```bash
-ollama ps | grep -i stopping
-pgrep ollama | xargs -I '%' sh -c 'kill %'
-```
-
-## Run Anything LLM Interface
-
-```bash
-podman run \
-    -d \
-    -p 3001:3001 \
-    --name anything-llm \
-    --cap-add SYS_ADMIN \
-    -v anything-llm:/app/server \
-    -e STORAGE_DIR="/app/server/storage" \
-    docker.io/mintplexlabs/anythingllm
-```
-
-This should now be accessible on port 3001. Note, you'll need to allow traffic between podman
-and the host:
-
-Use `podman network ls` to see which networks podman is running on and `podman network inspect`
-to get the IP address range. Then allow traffic from that range to port 11434 (ollama):
-
-```bash
-ufw allow from 10.89.0.1/24 to any port 11434
-```
-
-## Anything LLM Quadlet with Podlet
-
-```bash
-podman run --rm ghcr.io/containers/podlet --install --description "Anything LLM" \
-    podman run \
-    -d \
-    -p 3001:3001 \
-    --name anything-llm \
-    --cap-add SYS_ADMIN \
-    --restart always \
-    -v anything-llm:/app/server \
-    -e STORAGE_DIR="/app/server/storage" \
-    docker.io/mintplexlabs/anythingllm
-```
-
-To the service to have them autostart.
-
-Put the generated files in `/usr/share/containers/systemd/`.
-
-## Now with Nginx and Certbot
-
-See [generating AWS credentials](cloud/graduated/aws_iam/README.md)
-
-```bash
-curl "https://awscli.amazonaws.com/awscli-exe-linux-x86_64.zip" -o "awscliv2.zip"
-unzip awscliv2.zip
-./aws/install
-
-# Configure default credentials
-aws configure
-```
-
-Open http/s in firewalld:
-
-```bash
-# Remember to firewall-cmd --set-default-zone=public
-firewall-cmd --permanent --zone=public --add-service=http
-firewall-cmd --permanent --zone=public --add-service=https
-firewall-cmd --reload
-
-# or
-ufw allow 80/tcp
-ufw allow 443/tcp
-```
-
-Here are the detailed instructions for installing and setting up Nginx on Fedora Linux with Certbot
-using the Route53 DNS challenge to put in front of a service called "Anything LLM" running on port
-3001 with WebSockets. The domain will be chatreesept.reeseapps.com.
-
-1. Install Nginx:
-    
-    ```
-    dnf install -y nginx
-    ```
-
-2. Start and enable Nginx service:
-    
-    ```
-    systemctl enable --now nginx
-    ```
-
-3. Install Certbot and the Route53 DNS plugin:
-    
-    ```
-    # Fedora
-    dnf install -y certbot python3-certbot-dns-route53
-
-    # Arch
-    pacman -S certbot certbot-dns-route53
-    ```
-
-4. Request a certificate for your domain using the Route53 DNS challenge:
-
-    ```
-    certbot certonly --dns-route53 -d chatreesept.reeseapps.com
-    ```
-    
-    Follow the prompts to provide your Route53 credentials and email address.
-
-5. Configure Nginx for your domain: Create a new Nginx configuration file for your domain:
-
-    Update your nginx conf with the following
-
-    ```
-    vim /etc/nginx/nginx.conf
-    ```
-
-    ```
-    keepalive_timeout 1h;
-    send_timeout 1h;
-    client_body_timeout 1h;
-    client_header_timeout 1h;
-    proxy_connect_timeout 1h;
-    proxy_read_timeout 1h;
-    proxy_send_timeout 1h;
-    ```
-
-    ```
-    vim /etc/nginx/conf.d/ollama.reeselink.com.conf
-    ```
-
-    ```
-    server {
-        listen 80;
-        server_name ollama.reeselink.com;
-
-        location / {
-            return 301 https://$host$request_uri;
-        }
-    }
-
-    server {
-        listen 443 ssl;
-        server_name ollama.reeselink.com;
-
-        ssl_certificate /etc/letsencrypt/live/ollama.reeselink.com/fullchain.pem;
-        ssl_certificate_key /etc/letsencrypt/live/ollama.reeselink.com/privkey.pem;
-
-        location / {
-            proxy_pass http://localhost:11434;
-            proxy_http_version 1.1;
-            proxy_set_header Upgrade $http_upgrade;
-            proxy_set_header Connection "upgrade";
-            proxy_set_header Host $host;
-            proxy_cache_bypass $http_upgrade;
-            proxy_buffering off;
-        }
-    }
-    ```
-
-    ```
-    vim /etc/nginx/conf.d/chatreesept.reeseapps.com.conf
-    ```
-
-    Add the following configuration to the file:
-    ```
-    server {
-        listen 80;
-        server_name chatreesept.reeseapps.com;
-
-        location / {
-            return 301 https://$host$request_uri;
-        }
-    }
-
-    server {
-        listen 443 ssl;
-        server_name chatreesept.reeseapps.com;
-
-        ssl_certificate /etc/letsencrypt/live/chatreesept.reeseapps.com/fullchain.pem;
-        ssl_certificate_key /etc/letsencrypt/live/chatreesept.reeseapps.com/privkey.pem;
-
-        location / {
-            client_max_body_size 50m;
-
-            proxy_pass http://localhost:3001;
-            proxy_http_version 1.1;
-            proxy_set_header Upgrade $http_upgrade;
-            proxy_set_header Connection "upgrade";
-            proxy_set_header Host $host;
-            proxy_cache_bypass $http_upgrade;
-            proxy_buffering off;
-        }
-    }
-    ``
-
-6. Test your Nginx configuration for syntax errors:
-
-    ```
-    nginx -t
-    ```
-    
-    If there are no errors, reload Nginx to apply the changes:
-    
-    ```
-    systemctl reload nginx
-    ```
-
-7. Set up automatic certificate renewal: Add the following line to your crontab to renew the
-    certificate daily:
-    
-    ```
-    
-    pacman -S cronie
-    sudo crontab -e
-    ```
-
-    Add the following line to the end of the file:
-    
-    ```
-    0 0 * * * certbot renew --quiet
-    ```
-
-Now, your "Anything LLM" service running on port 3001 with WebSockets is accessible through the
-domain chatreesept.reeseapps.com with a valid SSL certificate from Let's Encrypt. The certificate
-will be automatically renewed daily.
-
-## Custom Models
-
-<https://www.gpu-mart.com/blog/import-models-from-huggingface-to-ollama>
-
-<https://www.hostinger.com/tutorials/ollama-cli-tutorial#Setting_up_Ollama_in_the_CLI>
-
-### From Existing Model
-
-```bash
-ollama show --modelfile opencoder > Modelfile
-PARAMETER num_ctx 8192
-ollama create opencoder-fix -f Modelfile
-```
-
-### From Scratch
-
-Install git lfs and clone the model you're interested in
-
-```bash
-# Make sure you have git-lfs installed (https://git-lfs.com)
-git lfs install
-
-git clone https://huggingface.co/bartowski/Starling-LM-7B-beta-GGUF
-```
-
-Create a modelfile
-
-```
-# Modelfile
-FROM "./path/to/gguf"
-
-TEMPLATE """{{ if .Prompt }}<|im_start|>
-{{ .Prompt }}<|im_end|>
-{{ end }}
-"""
-
-SYSTEM You are OpenCoder, created by OpenCoder Team.
-
-PARAMETER stop <|im_start|>
-PARAMETER stop <|im_end|>
-PARAMETER stop <|fim_prefix|>
-PARAMETER stop <|fim_middle|>
-PARAMETER stop <|fim_suffix|>
-PARAMETER stop <|fim_end|>
-PARAMETER stop """
-
-
-"""
-
-```
-
-Build the model
-
-```bash
-ollama create "Starling-LM-7B-beta-Q6_K" -f Modelfile
-```
-
-Run the model
-
-```bash
-ollama run Starling-LM-7B-beta-Q6_K:latest
-```
-
-## Converting to gguf
-
-<https://www.theregister.com/2024/07/14/quantization_llm_feature/>
-
-1. Clone the llama.cpp repository and install its dependencies:
-
-```bash
-git clone https://github.com/ggerganov/llama.cpp.git
-cd ~/llama.cpp
-python3 -m venv venv && source venv/bin/activate
-pip3 install -r requirements.txt
-
-mkdir ~/llama.cpp/models/mistral
-huggingface-cli login #necessary to download gated models
-huggingface-cli download mistralai/Mistral-7B-Instruct-v0.3 --local-dir ~/llama.cpp/models/mistral/
-
-python3 convert_hf_to_gguf.py ~/.cache/huggingface/hub/models--infly--OpenCoder-8B-Instruct/snapshots/01badbbf10c2dfd7e2a0b5f570065ef44548576c
-```
@@ -0,0 +1,361 @@
+# Ollama
+
+- [Ollama](#ollama)
+  - [Install and run Ollama](#install-and-run-ollama)
+  - [Install and run Ollama with Podman](#install-and-run-ollama-with-podman)
+  - [Unsticking models stuck in "Stopping"](#unsticking-models-stuck-in-stopping)
+  - [Run Anything LLM Interface](#run-anything-llm-interface)
+  - [Installing External Service with Nginx and Certbot](#installing-external-service-with-nginx-and-certbot)
+  - [Custom Models](#custom-models)
+    - [From Existing Model](#from-existing-model)
+    - [From Scratch](#from-scratch)
+    - [Discovering models](#discovering-models)
+    - [Custom models from safetensor files](#custom-models-from-safetensor-files)
+
+<https://github.com/ollama/ollama>
+
+## Install and run Ollama
+
+<https://ollama.com/download/linux>
+
+<https://ollama.com/library>
+
+```bash
+# Install script
+curl -fsSL https://ollama.com/install.sh | sh
+# Check service is running
+systemctl status ollama
+```
+
+Remember to add `Environment="OLLAMA_HOST=0.0.0.0"` to `/etc/systemd/system/ollama.service` to
+make it accessible on the network.
+
+Also add `Environment="OLLAMA_MODELS=/models"` to `/etc/systemd/system/ollama.service` to
+store models on an external disk.
+
+For Radeon 6000 cards you'll need to add `Environment="HSA_OVERRIDE_GFX_VERSION=10.3.0"` as well.
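+
+As a sketch of where these lines go: `Environment=` entries belong under the `[Service]` section.
+Instead of editing the unit file directly, you can use a drop-in override (this is what
+`systemctl edit ollama.service` creates), which should survive the unit file being rewritten.
+The drop-in path and the `/models` mount point below are assumptions; keep only the lines you need.
+
+```ini
+# /etc/systemd/system/ollama.service.d/override.conf (assumed drop-in path)
+[Service]
+# Listen on all interfaces instead of only localhost
+Environment="OLLAMA_HOST=0.0.0.0"
+# Store models on an external disk (assumed mount point)
+Environment="OLLAMA_MODELS=/models"
+# Radeon 6000 series only
+Environment="HSA_OVERRIDE_GFX_VERSION=10.3.0"
+```
+
+Run `systemctl daemon-reload` and `systemctl restart ollama` afterwards so the new environment
+takes effect.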
+
+I'd recommend the following models to get started:
+
+- Chat: llava-llama3:latest
+- Code: qwen2.5-coder:7b
+- Math: qwen2-math:latest
+- Uncensored: mannix/llama3.1-8b-abliterated:latest
+- Embedding: nomic-embed-text:latest
+
+Note that your Ollama instance is available to podman containers at `http://host.containers.internal:11434`.
+
+## Install and run Ollama with Podman
+
+```bash
+podman run -d --device /dev/kfd --device /dev/dri -v ollama:/root/.ollama -p 11434:11434 --name ollama docker.io/ollama/ollama:rocm
+```
+
+## Unsticking models stuck in "Stopping"
+
+```bash
+ollama ps | grep -i stopping
+pgrep ollama | xargs -I '%' sh -c 'kill %'
+```
+
+## Run Anything LLM Interface
+
+```bash
+podman run \
+    -d \
+    -p 3001:3001 \
+    --name anything-llm \
+    --cap-add SYS_ADMIN \
+    -v anything-llm:/app/server \
+    -e STORAGE_DIR="/app/server/storage" \
+    docker.io/mintplexlabs/anythingllm
+```
+
+This should now be accessible on port 3001. Note, you'll need to allow traffic between podman
+and the host:
+
+Use `podman network ls` to see which networks podman is running on and `podman network inspect`
+to get the IP address range. Then allow traffic from that range to port 11434 (ollama):
+
+```bash
+ufw allow from 10.89.0.0/24 to any port 11434
+```
+
+## Installing External Service with Nginx and Certbot
+
+We're going to need a certificate for our service since we'll want to talk to it over
+https. This will be handled by certbot. I'm using AWS in this example, but certbot has
+tons of DNS plugins available with similar commands. The important part is getting that
+letsencrypt certificate generated and in the place nginx expects it.
+
+Before we can use certbot we need aws credentials. Note this will be different if you
+use a different DNS provider.
+
+See [generating AWS credentials](cloud/graduated/aws_iam/README.md)
+
+```bash
+curl "https://awscli.amazonaws.com/awscli-exe-linux-x86_64.zip" -o "awscliv2.zip"
+unzip awscliv2.zip
+./aws/install
+
+# Configure default credentials
+aws configure
+```
+
+With AWS credentials configured you can now install and generate a certificate.
+
+```bash
+# Fedora
+dnf install -y certbot python3-certbot-dns-route53
+
+# Ubuntu
+apt install -y certbot python3-certbot-dns-route53
+
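+# Both (one certificate per domain, matching the paths in the nginx configs below)
+certbot certonly --dns-route53 -d chatreesept.reeseapps.com
+certbot certonly --dns-route53 -d ollama.reeseapps.com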
+```
+
+Now you have a cert!
+
+Install and start nginx with the following commands:
+
+```bash
+# Fedora
+dnf install -y nginx
+
+# Ubuntu
+apt install -y nginx
+
+# Both
+systemctl enable --now nginx
+```
+
+Now let's edit our nginx config. First, add these directives inside the `http` block of nginx.conf (or make sure they're already there).
+
+/etc/nginx/nginx.conf
+
+```conf
+keepalive_timeout 1h;
+send_timeout 1h;
+client_body_timeout 1h;
+client_header_timeout 1h;
+proxy_connect_timeout 1h;
+proxy_read_timeout 1h;
+proxy_send_timeout 1h;
+```
+
+Now write your nginx http config files. You'll need two:
+
+1. ollama.reeseapps.com.conf
+2. chatreesept.reeseapps.com.conf
+
+/etc/nginx/conf.d/ollama.reeseapps.com.conf
+
+```conf
+server {
+  listen 80;
+  listen [::]:80;
+  server_name ollama.reeseapps.com;
+
+  location / {
+    return 301 https://$host$request_uri;
+  }
+}
+
+server {
+  listen 443 ssl;
+  listen [::]:443 ssl;
+  server_name ollama.reeseapps.com;
+
+  ssl_certificate /etc/letsencrypt/live/ollama.reeseapps.com/fullchain.pem;
+  ssl_certificate_key /etc/letsencrypt/live/ollama.reeseapps.com/privkey.pem;
+
+  location / {
+    if ($http_authorization != "Bearer <token>") {
+        return 401;
+    }
+
+    proxy_pass http://127.0.0.1:11434;
+    proxy_set_header Host $host;
+    proxy_buffering off;
+  }
+}
+```
+
+/etc/nginx/conf.d/chatreesept.reeseapps.com.conf
+
+```conf
+server {
+  listen 80;
+  server_name chatreesept.reeseapps.com;
+
+  location / {
+    return 301 https://$host$request_uri;
+  }
+}
+
+server {
+  listen 443 ssl;
+  server_name chatreesept.reeseapps.com;
+
+  ssl_certificate /etc/letsencrypt/live/chatreesept.reeseapps.com/fullchain.pem;
+  ssl_certificate_key /etc/letsencrypt/live/chatreesept.reeseapps.com/privkey.pem;
+
+  location / {
+    client_max_body_size 50m;
+
+    proxy_pass http://localhost:3001;
+    proxy_http_version 1.1;
+    proxy_set_header Upgrade $http_upgrade;
+    proxy_set_header Connection "upgrade";
+    proxy_set_header Host $host;
+    proxy_cache_bypass $http_upgrade;
+  }
+}
+```
+
+Run `nginx -t` to check for errors. If there are none, run `systemctl reload nginx` to pick up
+your changes. Your sites should be available at chatreesept.reeseapps.com and ollama.reeseapps.com.
+
+Set up automatic certificate renewal by adding a cron job that runs `certbot renew` daily. Certbot
+only renews certificates that are close to expiry, so running it every day is safe:
+
+```bash
+sudo crontab -e
+```
+
+Add the following line to the end of the file:
+
+```bash
+0 0 * * * certbot renew --quiet
+```
+
+At this point you might need to create some UFW rules to allow traffic between containers.
+
+```bash
+# Try this first if you're having problems
+ufw reload
+
+# Debug with ufw logging
+ufw logging on
+tail -f /var/log/ufw.log
+```
+
+Also consider that podman will not restart your containers at boot. You'll need to create quadlets
+from the podman run commands above. Search the web for "podman quadlets" or ask your AI about it!
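+
+A minimal quadlet sketch for the Anything LLM container, translated by hand from its `podman run`
+command. The file path and the `[Service]`/`[Install]` choices are assumptions; tools like
+podlet can generate an equivalent file for you.
+
+```ini
+# /etc/containers/systemd/anything-llm.container (assumed location for rootful quadlets)
+[Unit]
+Description=Anything LLM
+
+[Container]
+ContainerName=anything-llm
+Image=docker.io/mintplexlabs/anythingllm
+PublishPort=3001:3001
+AddCapability=SYS_ADMIN
+Volume=anything-llm:/app/server
+Environment=STORAGE_DIR=/app/server/storage
+
+[Service]
+# Restart the container if it exits
+Restart=always
+
+[Install]
+# Start at boot
+WantedBy=multi-user.target default.target
+```
+
+After `systemctl daemon-reload`, quadlet should generate an `anything-llm.service` unit that you
+can start with `systemctl start anything-llm`.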
+
+## Custom Models
+
+<https://www.gpu-mart.com/blog/import-models-from-huggingface-to-ollama>
+
+<https://www.hostinger.com/tutorials/ollama-cli-tutorial#Setting_up_Ollama_in_the_CLI>
+
+### From Existing Model
+
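+```bash
+# Dump the existing model's Modelfile
+ollama show --modelfile opencoder > Modelfile
+# Append the parameter you want to change (here: a larger context window)
+echo "PARAMETER num_ctx 8192" >> Modelfile
+# Build a new model from the edited Modelfile
+ollama create opencoder-fix -f Modelfile
+```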
+
+### From Scratch
+
+Install git lfs and clone the model you're interested in
+
+```bash
+# Make sure you have git-lfs installed (https://git-lfs.com)
+git lfs install
+
+git clone https://huggingface.co/bartowski/Starling-LM-7B-beta-GGUF
+```
+
+Create a modelfile
+
+```Dockerfile
+# Modelfile
+FROM "./path/to/gguf"
+
+TEMPLATE """{{ if .Prompt }}<|im_start|>
+{{ .Prompt }}<|im_end|>
+{{ end }}
+"""
+
+SYSTEM You are OpenCoder, created by OpenCoder Team.
+
+PARAMETER stop <|im_start|>
+PARAMETER stop <|im_end|>
+PARAMETER stop <|fim_prefix|>
+PARAMETER stop <|fim_middle|>
+PARAMETER stop <|fim_suffix|>
+PARAMETER stop <|fim_end|>
+```
+
+Build the model
+
+```bash
+ollama create "Starling-LM-7B-beta-Q6_K" -f Modelfile
+```
+
+Run the model
+
+```bash
+ollama run Starling-LM-7B-beta-Q6_K:latest
+```
+
+### Discovering models
+
+Check out Hugging Face's leaderboard: <https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard>
+
+1. Select the model type you're after
+2. Drag the number of parameters slider to a range you can run
+3. Click the top few and read about them.
+
+### Custom models from safetensor files
+
+<https://www.theregister.com/2024/07/14/quantization_llm_feature/>
+
+Set up the repo:
+
+```bash
+# Setup
+git clone https://github.com/ggerganov/llama.cpp.git ~/llama.cpp
+cd ~/llama.cpp
+cmake -B build
+cmake --build build --config Release -j $(nproc)
+python3 -m venv venv && source venv/bin/activate
+pip install -r requirements.txt
+huggingface-cli login #necessary to download gated models
+python convert_hf_to_gguf_update.py $(cat ~/.cache/huggingface/token)
+```
+
+Convert models to gguf:
+
+```bash
+# Copy the model title from hugging face
+export MODEL_NAME=
+
+# Create a folder to clone the model into
+mkdir -p models/$MODEL_NAME
+
+# Download the current head for the model
+huggingface-cli download $MODEL_NAME --local-dir models/$MODEL_NAME
+
+# Or get the f16 quantized gguf
+wget -P models/$MODEL_NAME https://huggingface.co/xtuner/llava-llama-3-8b-v1_1-gguf/resolve/main/llava-llama-3-8b-v1_1-f16.gguf
+
+# Convert model from hugging face to gguf (quantization happens in the next step)
+python3 convert_hf_to_gguf.py models/$MODEL_NAME --outfile models/$MODEL_NAME.gguf
+
+# Run ./build/bin/llama-quantize with no arguments to see available quants
+# (the cmake build above puts binaries in build/bin)
+./build/bin/llama-quantize models/$MODEL_NAME.gguf models/$MODEL_NAME-Q4_K.gguf 15
+./build/bin/llama-quantize models/$MODEL_NAME.gguf models/$MODEL_NAME-Q5_K.gguf 17
+./build/bin/llama-quantize models/$MODEL_NAME.gguf models/$MODEL_NAME-Q6_K.gguf 18
+./build/bin/llama-quantize models/$MODEL_NAME.gguf models/$MODEL_NAME-Q8_0.gguf 7
+
+# Copy to your localai models folder and restart
+scp models/$MODEL_NAME-Q5_K.gguf localai:/models/
+
+# View output
+tree -phugL 2 models
+```