productionize ollama and overhaul docs

<https://github.com/ollama/ollama>

## Run natively with GPU support

<https://ollama.com/download/linux>

<https://ollama.com/library>

```bash
# Install script
curl -fsSL https://ollama.com/install.sh | sh

# Check the service is running
systemctl status ollama
```

Remember to add `Environment="OLLAMA_HOST=0.0.0.0"` to `/etc/systemd/system/ollama.service` to
make it accessible on the network.

For Radeon 6000 cards you'll need to add `Environment="HSA_OVERRIDE_GFX_VERSION=10.3.0"` as well.
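
Equivalently, a minimal sketch of doing the same thing as a systemd drop-in instead of editing the unit in place (drop the ROCm line if you don't have a Radeon 6000 card):

```bash
# Sketch: drop-in override that systemd merges into ollama.service
sudo mkdir -p /etc/systemd/system/ollama.service.d
sudo tee /etc/systemd/system/ollama.service.d/override.conf <<'EOF'
[Service]
Environment="OLLAMA_HOST=0.0.0.0"
Environment="HSA_OVERRIDE_GFX_VERSION=10.3.0"
EOF
sudo systemctl daemon-reload
sudo systemctl restart ollama
```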

```bash
# Pull models
# Try to use higher parameter models. Grab the q5_K_M variant at minimum.

# For a 24GB VRAM card I'd recommend:

# Anything-LLM Coding
ollama pull qwen2.5-coder:14b-instruct-q5_K_M
# Anything-LLM Math
ollama pull qwen2-math:7b-instruct-fp16
# Anything-LLM Chat
ollama pull llama3.2-vision:11b-instruct-q8_0

# VSCode Continue Autocomplete
ollama pull starcoder2:15b-q5_K_M
# VSCode Continue Chat
ollama pull llama3.1:8b-instruct-fp16
# VSCode Continue Embedder
ollama pull nomic-embed-text:137m-v1.5-fp16
```
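
To see how much of a pulled model actually lands on the GPU, load one and check `ollama ps` (the model tag below is just one example from the list above):

```bash
# Answer one prompt so the model gets loaded, then show the size and CPU/GPU split
ollama run qwen2.5-coder:14b-instruct-q5_K_M "hello"
ollama ps
```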

Note your ollama instance will be available to podman containers via `http://host.containers.internal:11434`.
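
A quick way to confirm the API is reachable (`/api/tags` just lists the locally pulled models):

```bash
# From the host:
curl http://127.0.0.1:11434/api/tags

# From inside a podman container on the same machine:
# curl http://host.containers.internal:11434/api/tags
```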

## Run Anything LLM Interface

```bash
# Shared network for the local AI containers
podman network create localai

# Optional: run ollama in a container instead of natively
podman run \
  -d \
  -v ollama:/root/.ollama \
  -p 127.0.0.1:11434:11434 \
  --network localai \
  --name ollama \
  docker.io/ollama/ollama

# Pull new models
podman container exec ollama ollama pull llama3.2:3b
podman container exec ollama ollama pull llama3.2:1b
podman container exec ollama ollama pull llama3.2-vision:11b
podman container exec ollama ollama pull llava-llama3:8b
podman container exec ollama ollama pull deepseek-coder-v2:16b
podman container exec ollama ollama pull opencoder:8b
podman container exec ollama ollama pull codestral:22b

# Talk to an existing model via cli
podman container exec -it ollama ollama run llama3.2:3b

# Run the Anything LLM interface
podman run \
  -d \
  -p 127.0.0.1:3001:3001 \
  --name anything-llm \
  --network localai \
  --cap-add SYS_ADMIN \
  -v anything-llm:/app/server \
  -e STORAGE_DIR="/app/server/storage" \
  docker.io/mintplexlabs/anythingllm
```

This should now be accessible on port 3001. Note, you'll need to allow traffic between podman
and the host:

Use `podman network ls` to see which networks podman is running on and `podman network inspect`
to get the IP address range. Then allow traffic from that range to port 11434 (ollama):

```bash
ufw allow from 10.89.0.1/24 to any port 11434
```

### Quadlets with Podlet

Generate quadlet units for the network and both containers with podlet:

```bash
# Create volume for ollama
mkdir /ollama

# Local AI network quadlet
podman run --rm ghcr.io/containers/podlet --install --description "Local AI Network" \
  podman network create localai

# Ollama quadlet
podman run --rm ghcr.io/containers/podlet --install --description "Ollama" \
  podman run \
  -d \
  -v /ollama:/root/.ollama \
  -p 127.0.0.1:11434:11434 \
  --network localai \
  --name ollama \
  docker.io/ollama/ollama

# Anything LLM quadlet
podman run --rm ghcr.io/containers/podlet --install --description "Anything LLM" \
  podman run \
  -d \
  -p 127.0.0.1:3001:3001 \
  --name anything-llm \
  --network localai \
  --cap-add SYS_ADMIN \
  -v anything-llm:/app/server \
  -e STORAGE_DIR="/app/server/storage" \
  docker.io/mintplexlabs/anythingllm
```
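
For reference, the Ollama unit podlet emits should look roughly like the sketch below. The key names are real quadlet options, but trust podlet's actual output over this sketch:

```conf
# ollama.container (sketch, not verbatim podlet output)
[Unit]
Description=Ollama

[Container]
ContainerName=ollama
Image=docker.io/ollama/ollama
Network=localai
PublishPort=127.0.0.1:11434:11434
Volume=/ollama:/root/.ollama

[Install]
WantedBy=multi-user.target default.target
```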

Make sure to add

```conf
[Service]
Restart=always
```

to the generated services to have them restart automatically.

Put the generated files in `/usr/share/containers/systemd/`.
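
Then reload systemd and start the units. A sketch; the unit names here are assumptions derived from the generated file names (e.g. `ollama.container` becomes `ollama.service`):

```bash
sudo systemctl daemon-reload
sudo systemctl start ollama.service anything-llm.service
systemctl status ollama.service
```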

## Now with Nginx and Certbot

See [generating AWS credentials](cloud/graduated/aws_iam/README.md) and configure them with `aws configure`.

Open http/s in firewalld:

```bash
# Remember to firewall-cmd --set-default-zone=public
firewall-cmd --permanent --zone=public --add-service=http
firewall-cmd --permanent --zone=public --add-service=https
firewall-cmd --reload

# or
ufw allow 80/tcp
ufw allow 443/tcp
```

Here are the detailed instructions for installing and setting up Nginx on Fedora Linux with Certbot
using the Route53 DNS challenge to put in front of a service called "Anything LLM" running on port
3001 with WebSockets. The domain will be chatreesept.reeseapps.com.

1. Install Nginx:

```
dnf install -y nginx
```

2. Start and enable the Nginx service:

```
systemctl enable --now nginx
```

3. Install Certbot and the Route53 DNS plugin:

```
# Fedora
dnf install -y certbot python3-certbot-dns-route53

# Arch
pacman -S certbot certbot-dns-route53
```

4. Request a certificate for your domain using the Route53 DNS challenge:

```
certbot certonly --dns-route53 -d chatreesept.reeseapps.com
```

Follow the prompts to provide your Route53 credentials and email address.
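
The Route53 plugin reads AWS credentials the same way the AWS CLI does (environment variables or `~/.aws/credentials`). A sketch with placeholder values; note that the ollama.reeselink.com config below references its own certificate, so request one for it too:

```bash
# Placeholder credentials for the IAM user from the AWS doc above
export AWS_ACCESS_KEY_ID=AKIA...
export AWS_SECRET_ACCESS_KEY=...

certbot certonly --dns-route53 -d ollama.reeselink.com
```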

5. Configure Nginx for your domains. Update your nginx conf, adding the following timeout
configuration to the http block:

```
vim /etc/nginx/nginx.conf
```

```
keepalive_timeout 1h;
send_timeout 1h;
client_body_timeout 1h;
proxy_connect_timeout 1h;
proxy_read_timeout 1h;
proxy_send_timeout 1h;
```

```
vim /etc/nginx/conf.d/ollama.reeselink.com.conf
```

```
server {
    listen 80;
    server_name ollama.reeselink.com;

    location / {
        return 301 https://$host$request_uri;
    }
}

server {
    listen 443 ssl;
    server_name ollama.reeselink.com;

    ssl_certificate /etc/letsencrypt/live/ollama.reeselink.com/fullchain.pem;
    ssl_certificate_key /etc/letsencrypt/live/ollama.reeselink.com/privkey.pem;

    location / {
        proxy_pass http://localhost:11434;
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";
        proxy_set_header Host $host;
        proxy_cache_bypass $http_upgrade;
        proxy_buffering off;
    }
}
```

```
vim /etc/nginx/conf.d/chatreesept.reeseapps.com.conf
```

Add the following configuration to the file:

```
server {
    listen 80;
    server_name chatreesept.reeseapps.com;

    location / {
        return 301 https://$host$request_uri;
    }
}

server {
    listen 443 ssl;
    server_name chatreesept.reeseapps.com;

    ssl_certificate /etc/letsencrypt/live/chatreesept.reeseapps.com/fullchain.pem;
    ssl_certificate_key /etc/letsencrypt/live/chatreesept.reeseapps.com/privkey.pem;

    location / {
        proxy_pass http://localhost:3001;
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";
        proxy_set_header Host $host;
        proxy_cache_bypass $http_upgrade;
        proxy_buffering off;
    }
}
```

6. Test your Nginx configuration for syntax errors:

```
nginx -t
```

If there are no errors, reload Nginx to apply the changes:

```
systemctl reload nginx
```

7. Set up automatic certificate renewal: Add the following line to your crontab to renew the
certificate daily:

```
# Arch
pacman -S cronie

sudo crontab -e
```

Add the following line to the end of the file:

```
0 0 * * * certbot renew --quiet
```

Now, your "Anything LLM" service running on port 3001 with WebSockets is accessible through the
domain chatreesept.reeseapps.com with a valid SSL certificate from Let's Encrypt. The certificate
will be automatically renewed daily.
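
Once DNS for both names points at this host, a quick end-to-end check. `/api/tags` is Ollama's model-list endpoint; the plain HEAD request just confirms the Anything LLM vhost answers:

```bash
curl https://ollama.reeselink.com/api/tags
curl -I https://chatreesept.reeseapps.com
```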

## Custom Models

<https://www.gpu-mart.com/blog/import-models-from-huggingface-to-ollama>

### From Existing Model

```bash
ollama show --modelfile opencoder > Modelfile

# Edit the Modelfile and add:
#   PARAMETER num_ctx 8192

ollama create opencoder-fix -f Modelfile
```
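
To confirm the new parameter stuck, `ollama show` prints the model's details (model name as created above):

```bash
ollama show opencoder-fix
```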

### From Scratch

Install git lfs and clone the model you're interested in:

```bash
# Make sure you have git-lfs installed (https://git-lfs.com)
git lfs install

git clone https://huggingface.co/bartowski/Starling-LM-7B-beta-GGUF
```
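
The cloned repo holds one file per quantization; pick one and point the Modelfile's `FROM` at it (the glob below assumes the usual GGUF repo layout):

```bash
ls Starling-LM-7B-beta-GGUF/*.gguf
```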

Create a modelfile

```
# Modelfile
FROM "./path/to/gguf"

TEMPLATE """{{ if .Prompt }}<|im_start|>
{{ .Prompt }}<|im_end|>
{{ end }}
"""

SYSTEM You are OpenCoder, created by OpenCoder Team.

PARAMETER stop <|im_start|>
PARAMETER stop <|im_end|>
PARAMETER stop <|fim_prefix|>
PARAMETER stop <|fim_middle|>
PARAMETER stop <|fim_suffix|>
PARAMETER stop <|fim_end|>
```

Build the model

```bash
ollama create "Starling-LM-7B-beta-Q6_K" -f Modelfile
```

Run the model

```bash
ollama run Starling-LM-7B-beta-Q6_K:latest
```

## Converting to gguf

<https://www.theregister.com/2024/07/14/quantization_llm_feature/>

1. Clone the llama.cpp repository and install its dependencies:

```bash
git clone https://github.com/ggerganov/llama.cpp.git
cd ~/llama.cpp
python3 -m venv venv && source venv/bin/activate
pip3 install -r requirements.txt

mkdir ~/llama.cpp/models/mistral
huggingface-cli login # necessary to download gated models
huggingface-cli download mistralai/Mistral-7B-Instruct-v0.3 --local-dir ~/llama.cpp/models/mistral/

# convert_hf_to_gguf.py takes the model directory as its argument, e.g. the
# download above or a HuggingFace cache snapshot:
python3 convert_hf_to_gguf.py ~/.cache/huggingface/hub/models--infly--OpenCoder-8B-Instruct/snapshots/01badbbf10c2dfd7e2a0b5f570065ef44548576c
```
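
The converter prints the path of the GGUF it writes. From there you can import it into ollama exactly like the "From Scratch" section above; the file and model names below are assumptions:

```bash
# Find the generated GGUF
ls ~/llama.cpp/models/mistral/*.gguf

# Point a Modelfile's FROM at that .gguf, then:
ollama create mistral-7b-instruct -f Modelfile
```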