productionize ollama and overhaul docs

2024-11-23 09:38:27 -05:00
parent 224d86f5af
commit df06b206af


<https://github.com/ollama/ollama>
## Run natively with GPU support
<https://ollama.com/download/linux>
<https://ollama.com/library>
```bash
# Install script
curl -fsSL https://ollama.com/install.sh | sh
# Check service is running
systemctl status ollama
```
Remember to add `Environment="OLLAMA_HOST=0.0.0.0"` under the `[Service]` section of
`/etc/systemd/system/ollama.service` to make it accessible on the network.
For Radeon 6000-series cards you'll also need `Environment="HSA_OVERRIDE_GFX_VERSION=10.3.0"`.
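If you'd rather not edit the installed unit file directly, a systemd drop-in override does the same
job. A minimal sketch; the drop-in path and file name are just a convention, and the GFX line only
applies to Radeon 6000-series cards:

```bash
# Create a drop-in override for the ollama unit
sudo mkdir -p /etc/systemd/system/ollama.service.d
sudo tee /etc/systemd/system/ollama.service.d/override.conf > /dev/null <<'EOF'
[Service]
Environment="OLLAMA_HOST=0.0.0.0"
Environment="HSA_OVERRIDE_GFX_VERSION=10.3.0"
EOF
# Reload units and restart the service so the overrides take effect
sudo systemctl daemon-reload
sudo systemctl restart ollama
```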
```bash
# Pull models
# Prefer higher-parameter models; grab at least the q5_K_M quant.
# For a 24GB VRAM card I'd recommend:
# Anything-LLM Coding
ollama pull qwen2.5-coder:14b-instruct-q5_K_M
# Anything-LLM Math
ollama pull qwen2-math:7b-instruct-fp16
# Anything-LLM Chat
ollama pull llama3.2-vision:11b-instruct-q8_0
# VSCode Continue Autocomplete
ollama pull starcoder2:15b-q5_K_M
# VSCode Continue Chat
ollama pull llama3.1:8b-instruct-fp16
# VSCode Continue Embedder
ollama pull nomic-embed-text:137m-v1.5-fp16
```
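To verify the pulls and see whether a loaded model actually fits in VRAM:

```bash
# List what's on disk
ollama list
# Show loaded models, their size, and how much is offloaded to the GPU
ollama ps
```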
Note that your Ollama instance will be available to podman containers via `http://host.containers.internal:11434`.
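A quick sanity check from inside a container (assuming the `docker.io/curlimages/curl` image is
acceptable for the test):

```bash
# Should return the JSON list of models pulled above
podman run --rm docker.io/curlimages/curl -s http://host.containers.internal:11434/api/tags
```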
## Run Anything LLM Interface
```bash
podman run \
  -d \
  -p 3001:3001 \
  --name anything-llm \
  --cap-add SYS_ADMIN \
  -v anything-llm:/app/server \
  -e STORAGE_DIR="/app/server/storage" \
  docker.io/mintplexlabs/anythingllm
```
This should now be accessible on port 3001. Note that you'll need to allow traffic between podman
and the host: use `podman network ls` to see which networks podman is running on and
`podman network inspect` to get the IP address range, then allow traffic from that range to
port 11434 (ollama):
```bash
ufw allow from 10.89.0.1/24 to any port 11434
```
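The `10.89.0.1/24` range above is just an example; look up the actual subnet before writing the
rule (the default network is usually named `podman`):

```bash
# List podman networks, then inspect the one your container uses
podman network ls
podman network inspect podman | grep -i subnet
```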
## Anything LLM Quadlet with Podlet
```bash
podman run --rm ghcr.io/containers/podlet --install --description "Anything LLM" \
  podman run \
  -d \
  -p 3001:3001 \
  --name anything-llm \
  --cap-add SYS_ADMIN \
  --restart always \
  -v anything-llm:/app/server \
  -e STORAGE_DIR="/app/server/storage" \
  docker.io/mintplexlabs/anythingllm
```
Make sure the generated unit includes
```conf
[Service]
Restart=always
```
so the service autostarts.
Put the generated files in `/usr/share/containers/systemd/`.
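For example, assuming podlet wrote the unit to `anything-llm.container` in the current directory
(the file name will match whatever podlet generated):

```bash
sudo cp anything-llm.container /usr/share/containers/systemd/
# Quadlet generates anything-llm.service on daemon-reload
sudo systemctl daemon-reload
sudo systemctl start anything-llm.service
```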
## Now with Nginx and Certbot
See [generating AWS credentials](cloud/graduated/aws_iam/README.md), then run `aws configure`.
Open http/s in the firewall (firewalld or ufw):
```bash
# Remember to firewall-cmd --set-default-zone=public
firewall-cmd --permanent --zone=public --add-service=http
firewall-cmd --permanent --zone=public --add-service=https
firewall-cmd --reload
# or
ufw allow 80/tcp
ufw allow 443/tcp
```
Here are the detailed instructions for installing and setting up Nginx on Fedora Linux with Certbot
using the Route53 DNS challenge to put in front of a service called "Anything LLM" running on port
3001 with WebSockets. The domain will be chatreesept.reeseapps.com.
1. Install Nginx:
```
dnf install -y nginx
```
2. Start and enable the Nginx service:
```
systemctl enable --now nginx
```
3. Install Certbot and the Route53 DNS plugin:
```
# Fedora
dnf install -y certbot python3-certbot-dns-route53
# Arch
pacman -S certbot certbot-dns-route53
```
4. Request a certificate for your domain using the Route53 DNS challenge:
```
certbot certonly --dns-route53 -d chatreesept.reeseapps.com
```
Certbot picks up your AWS credentials from the standard locations (`~/.aws/credentials` or
environment variables) and will prompt for an email address on first run.
5. Configure Nginx for your domains by creating the configuration files below.
First, update the `http` block of your main nginx conf with the following timeouts:
```
vim /etc/nginx/nginx.conf
```
```
keepalive_timeout 1h;
send_timeout 1h;
proxy_connect_timeout 1h;
proxy_read_timeout 1h;
proxy_send_timeout 1h;
```
```
vim /etc/nginx/conf.d/ollama.reeselink.com.conf
```
```
server {
    listen 80;
    server_name ollama.reeselink.com;
    location / {
        return 301 https://$host$request_uri;
    }
}
server {
    listen 443 ssl;
    server_name ollama.reeselink.com;
    ssl_certificate /etc/letsencrypt/live/ollama.reeselink.com/fullchain.pem;
    ssl_certificate_key /etc/letsencrypt/live/ollama.reeselink.com/privkey.pem;
    location / {
        proxy_pass http://localhost:11434;
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";
        proxy_set_header Host $host;
        proxy_cache_bypass $http_upgrade;
        proxy_buffering off;
    }
}
```
```
vim /etc/nginx/conf.d/chatreesept.reeseapps.com.conf
```
Add the following configuration to the file:
```
server {
    listen 80;
    server_name chatreesept.reeseapps.com;
    location / {
        return 301 https://$host$request_uri;
    }
}
server {
    listen 443 ssl;
    server_name chatreesept.reeseapps.com;
    ssl_certificate /etc/letsencrypt/live/chatreesept.reeseapps.com/fullchain.pem;
    ssl_certificate_key /etc/letsencrypt/live/chatreesept.reeseapps.com/privkey.pem;
    location / {
        proxy_pass http://localhost:3001;
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";
        proxy_set_header Host $host;
        proxy_cache_bypass $http_upgrade;
        proxy_buffering off;
    }
}
```
6. Test your Nginx configuration for syntax errors:
```
nginx -t
```
If there are no errors, reload Nginx to apply the changes:
```
systemctl reload nginx
```
7. Set up automatic certificate renewal: Add the following line to your crontab to run a renewal
check daily:
```
# On Arch, install and enable cron first
pacman -S cronie
sudo crontab -e
```
Add the following line to the end of the file:
```
0 0 * * * certbot renew --quiet
```
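To confirm renewal works end to end without touching the live certificate:

```bash
certbot renew --dry-run
```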
Now, your "Anything LLM" service running on port 3001 with WebSockets is accessible through the Now, your "Anything LLM" service running on port 3001 with WebSockets is accessible through the
domain chatreesept.reeseapps.com with a valid SSL certificate from Let's Encrypt. The certificate domain chatreesept.reeseapps.com with a valid SSL certificate from Let's Encrypt. The certificate
will be automatically renewed daily. will be automatically renewed daily.
## Custom Models
<https://www.gpu-mart.com/blog/import-models-from-huggingface-to-ollama>
### From an Existing Model
```bash
ollama show --modelfile opencoder > Modelfile
# Add a larger context window to the Modelfile
echo 'PARAMETER num_ctx 8192' >> Modelfile
ollama create opencoder-fix -f Modelfile
```
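To confirm the override stuck, `ollama show` should print just the parameter block with the
`--parameters` flag:

```bash
ollama show opencoder-fix --parameters
```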
### From Scratch
Install git-lfs and clone the model you're interested in:
```bash
# Make sure you have git-lfs installed (https://git-lfs.com)
git lfs install
git clone https://huggingface.co/bartowski/Starling-LM-7B-beta-GGUF
```
Create a modelfile
```
# Modelfile
FROM "./path/to/gguf"
TEMPLATE """{{ if .Prompt }}<|im_start|>
{{ .Prompt }}<|im_end|>
{{ end }}
"""
SYSTEM You are OpenCoder, created by OpenCoder Team.
PARAMETER stop <|im_start|>
PARAMETER stop <|im_end|>
PARAMETER stop <|fim_prefix|>
PARAMETER stop <|fim_middle|>
PARAMETER stop <|fim_suffix|>
PARAMETER stop <|fim_end|>
PARAMETER stop """
"""
```
Build the model
```bash
ollama create "Starling-LM-7B-beta-Q6_K" -f Modelfile
```
Run the model
```bash
ollama run Starling-LM-7B-beta-Q6_K:latest
```
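The custom model is also reachable over Ollama's HTTP API:

```bash
curl http://localhost:11434/api/generate \
  -d '{"model": "Starling-LM-7B-beta-Q6_K", "prompt": "Hello", "stream": false}'
```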
## Converting to GGUF
<https://www.theregister.com/2024/07/14/quantization_llm_feature/>
1. Clone the llama.cpp repository and install its dependencies:
```bash
git clone https://github.com/ggerganov/llama.cpp.git ~/llama.cpp
cd ~/llama.cpp
python3 -m venv venv && source venv/bin/activate
pip3 install -r requirements.txt
mkdir -p ~/llama.cpp/models/mistral
huggingface-cli login # necessary to download gated models
huggingface-cli download mistralai/Mistral-7B-Instruct-v0.3 --local-dir ~/llama.cpp/models/mistral/
# Point the converter at a downloaded model directory, e.g. a cached hub snapshot:
python3 convert_hf_to_gguf.py ~/.cache/huggingface/hub/models--infly--OpenCoder-8B-Instruct/snapshots/01badbbf10c2dfd7e2a0b5f570065ef44548576c
```
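If you also want a smaller quant (the topic of the linked article), llama.cpp's quantize tool can
produce one from the converted GGUF. A sketch, assuming you build the tools with cmake; the file
names are placeholders for whatever the converter actually wrote:

```bash
# Build llama.cpp's CLI tools (one-time)
cmake -B build && cmake --build build --config Release -j
# Quantize the converted F16 GGUF down to Q5_K_M
./build/bin/llama-quantize models/mistral/model-f16.gguf models/mistral/model-q5_k_m.gguf Q5_K_M
```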