Ollama
https://github.com/ollama/ollama
Run natively with GPU support
https://ollama.com/download/linux
# Install script
curl -fsSL https://ollama.com/install.sh | sh
# Check service is running
systemctl status ollama
Remember to add Environment="OLLAMA_HOST=0.0.0.0" to /etc/systemd/system/ollama.service to
make it accessible on the network.
For Radeon 6000 cards you'll need to add Environment="HSA_OVERRIDE_GFX_VERSION=10.3.0" as well.
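Whether you edit the unit directly or use a drop-in via systemctl edit ollama, the result is a [Service] section roughly like this (a sketch; the drop-in path below is the systemctl edit default):
# /etc/systemd/system/ollama.service.d/override.conf
[Service]
Environment="OLLAMA_HOST=0.0.0.0"
# Only for Radeon 6000-series (RDNA2) cards:
Environment="HSA_OVERRIDE_GFX_VERSION=10.3.0"
# Apply the change
systemctl daemon-reload && systemctl restart ollama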
# Pull models
# Try to use higher parameter models. Grab the q5_K_M variant at minimum.
# For a 24GB VRAM Card I'd recommend:
# Anything-LLM Coding
ollama pull qwen2.5-coder:14b-instruct-q5_K_M
# Anything-LLM Math
ollama pull qwen2-math:7b-instruct-fp16
# Anything-LLM Chat
ollama pull llama3.2-vision:11b-instruct-q8_0
# VSCode Continue Autocomplete
ollama pull starcoder2:15b-q5_K_M
# VSCode Continue Chat
ollama pull llama3.1:8b-instruct-fp16
# VSCode Continue Embedder
ollama pull nomic-embed-text:137m-v1.5-fp16
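To confirm what's downloaded and how much disk each variant takes:
ollama list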
Note your ollama instance will be available to podman containers via http://host.containers.internal:11434
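A quick sanity check from inside a container (assuming the curlimages/curl image; /api/version is Ollama's version endpoint):
podman run --rm docker.io/curlimages/curl -s http://host.containers.internal:11434/api/version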
Unsticking models stuck in "Stopping"
ollama ps | grep -i stopping
pgrep ollama | xargs -I '%' sh -c 'kill %'
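The installer's systemd unit sets Restart=always, so the service should come back on its own after the kill; verify with:
systemctl status ollama
ollama ps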
Run Anything LLM Interface
podman run \
-d \
-p 3001:3001 \
--name anything-llm \
--cap-add SYS_ADMIN \
-v anything-llm:/app/server \
-e STORAGE_DIR="/app/server/storage" \
docker.io/mintplexlabs/anythingllm
This should now be accessible on port 3001. Note, you'll need to allow traffic between podman and the host:
Use podman network ls to see which networks podman is running on and podman network inspect
to get the IP address range. Then allow traffic from that range to port 11434 (ollama):
ufw allow from 10.89.0.1/24 to any port 11434
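For example, to pull the subnet straight out of the inspect output (a sketch assuming jq is installed, the default network name podman, and podman 4's netavark JSON layout):
podman network inspect podman | jq -r '.[0].subnets[].subnet'
ufw allow from "$(podman network inspect podman | jq -r '.[0].subnets[0].subnet')" to any port 11434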
Anything LLM Quadlet with Podlet
podman run --rm ghcr.io/containers/podlet --install --description "Anything LLM" \
podman run \
-d \
-p 3001:3001 \
--name anything-llm \
--cap-add SYS_ADMIN \
--restart always \
-v anything-llm:/app/server \
-e STORAGE_DIR="/app/server/storage" \
docker.io/mintplexlabs/anythingllm
Put the generated quadlet files in /usr/share/containers/systemd/ and run systemctl daemon-reload so the container is registered as a systemd service and autostarts.
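Roughly what the generated quadlet looks like (podlet's exact output may differ; the keys below are standard [Container] quadlet options):
# /usr/share/containers/systemd/anything-llm.container
[Unit]
Description=Anything LLM

[Container]
Image=docker.io/mintplexlabs/anythingllm
ContainerName=anything-llm
PublishPort=3001:3001
AddCapability=SYS_ADMIN
Volume=anything-llm:/app/server
Environment=STORAGE_DIR=/app/server/storage

[Service]
Restart=always

[Install]
WantedBy=multi-user.target default.target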
Now with Nginx and Certbot
See generating AWS credentials
curl "https://awscli.amazonaws.com/awscli-exe-linux-x86_64.zip" -o "awscliv2.zip"
unzip awscliv2.zip
./aws/install
# Configure default credentials
aws configure
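The credentials for the Route53 DNS challenge only need a handful of Route53 permissions; a minimal policy sketch (replace YOUR_HOSTED_ZONE_ID with your zone ID, and scope tighter if you like):
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["route53:ListHostedZones", "route53:GetChange"],
      "Resource": ["*"]
    },
    {
      "Effect": "Allow",
      "Action": ["route53:ChangeResourceRecordSets"],
      "Resource": ["arn:aws:route53:::hostedzone/YOUR_HOSTED_ZONE_ID"]
    }
  ]
}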
Open http/s in firewalld:
# Remember to firewall-cmd --set-default-zone=public
firewall-cmd --permanent --zone=public --add-service=http
firewall-cmd --permanent --zone=public --add-service=https
firewall-cmd --reload
# or
ufw allow 80/tcp
ufw allow 443/tcp
The following installs Nginx on Fedora with Certbot (Route53 DNS challenge) as a reverse proxy in front of the "Anything LLM" service on port 3001, with WebSocket support. The domain will be chatreesept.reeseapps.com.
Install Nginx:
dnf install -y nginx
Start and enable the Nginx service:
systemctl enable --now nginx
Install Certbot and the Route53 DNS plugin:
# Fedora
dnf install -y certbot python3-certbot-dns-route53
# Arch
pacman -S certbot certbot-dns-route53
Request a certificate for your domain using the Route53 DNS challenge:
certbot certonly --dns-route53 -d chatreesept.reeseapps.com
Follow the prompts to provide your Route53 credentials and email address.
Configure Nginx for your domain: create a new configuration file per host.
Update your main nginx conf with the following timeouts:
vim /etc/nginx/nginx.conf
keepalive_timeout 1h;
send_timeout 1h;
client_body_timeout 1h;
client_header_timeout 1h;
proxy_connect_timeout 1h;
proxy_read_timeout 1h;
proxy_send_timeout 1h;
vim /etc/nginx/conf.d/ollama.reeselink.com.conf
server {
    listen 80;
    server_name ollama.reeselink.com;
    location / {
        return 301 https://$host$request_uri;
    }
}
server {
    listen 443 ssl;
    server_name ollama.reeselink.com;
    ssl_certificate /etc/letsencrypt/live/ollama.reeselink.com/fullchain.pem;
    ssl_certificate_key /etc/letsencrypt/live/ollama.reeselink.com/privkey.pem;
    location / {
        proxy_pass http://localhost:11434;
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";
        proxy_set_header Host $host;
        proxy_cache_bypass $http_upgrade;
        proxy_buffering off;
    }
}
vim /etc/nginx/conf.d/chatreesept.reeseapps.com.conf
Add the following configuration to the file:
server {
    listen 80;
    server_name chatreesept.reeseapps.com;
    location / {
        return 301 https://$host$request_uri;
    }
}
server {
    listen 443 ssl;
    server_name chatreesept.reeseapps.com;
    ssl_certificate /etc/letsencrypt/live/chatreesept.reeseapps.com/fullchain.pem;
    ssl_certificate_key /etc/letsencrypt/live/chatreesept.reeseapps.com/privkey.pem;
    location / {
        client_max_body_size 50m;
        proxy_pass http://localhost:3001;
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";
        proxy_set_header Host $host;
        proxy_cache_bypass $http_upgrade;
        proxy_buffering off;
    }
}
Test your Nginx configuration for syntax errors:
nginx -t
If there are no errors, reload Nginx to apply the changes:
systemctl reload nginx
Set up automatic certificate renewal: add a cron job so certbot renew runs daily.
# Arch: install cronie first
pacman -S cronie
sudo crontab -e
Add the following line to the end of the file:
0 0 * * * certbot renew --quiet
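You can test the renewal path without waiting; on Fedora, the packaged systemd timer is an alternative to cron (assuming the certbot package's certbot-renew.timer unit):
certbot renew --dry-run
systemctl enable --now certbot-renew.timer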
Now, your "Anything LLM" service running on port 3001 with WebSockets is accessible through the domain chatreesept.reeseapps.com with a valid SSL certificate from Let's Encrypt. The renew job runs daily, and certbot only renews the certificate when it is close to expiry.
Custom Models
https://www.gpu-mart.com/blog/import-models-from-huggingface-to-ollama
https://www.hostinger.com/tutorials/ollama-cli-tutorial#Setting_up_Ollama_in_the_CLI
From Existing Model
# Dump the existing model's Modelfile
ollama show --modelfile opencoder > Modelfile
# Edit Modelfile and add (or change) the parameter you want, e.g.:
PARAMETER num_ctx 8192
# Create a new model from the edited Modelfile
ollama create opencoder-fix -f Modelfile
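To check that the new model picked up the parameter (assuming ollama show's --parameters flag):
ollama show --parameters opencoder-fix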
From Scratch
Install git lfs and clone the model you're interested in
# Make sure you have git-lfs installed (https://git-lfs.com)
git lfs install
git clone https://huggingface.co/bartowski/Starling-LM-7B-beta-GGUF
Create a modelfile
# Modelfile
FROM "./path/to/gguf"
TEMPLATE """{{ if .Prompt }}<|im_start|>
{{ .Prompt }}<|im_end|>
{{ end }}
"""
SYSTEM You are OpenCoder, created by OpenCoder Team.
PARAMETER stop <|im_start|>
PARAMETER stop <|im_end|>
PARAMETER stop <|fim_prefix|>
PARAMETER stop <|fim_middle|>
PARAMETER stop <|fim_suffix|>
PARAMETER stop <|fim_end|>
PARAMETER stop """
"""
Build the model
ollama create "Starling-LM-7B-beta-Q6_K" -f Modelfile
Run the model
ollama run Starling-LM-7B-beta-Q6_K:latest
Converting to gguf
https://www.theregister.com/2024/07/14/quantization_llm_feature/
- Clone the llama.cpp repository and install its dependencies:
git clone https://github.com/ggerganov/llama.cpp.git ~/llama.cpp
cd ~/llama.cpp
python3 -m venv venv && source venv/bin/activate
pip3 install -r requirements.txt
mkdir -p ~/llama.cpp/models/mistral
huggingface-cli login # necessary to download gated models
huggingface-cli download mistralai/Mistral-7B-Instruct-v0.3 --local-dir ~/llama.cpp/models/mistral/
# convert_hf_to_gguf.py takes the directory containing the HF model, e.g. a
# snapshot already sitting in the huggingface cache:
python3 convert_hf_to_gguf.py ~/.cache/huggingface/hub/models--infly--OpenCoder-8B-Instruct/snapshots/01badbbf10c2dfd7e2a0b5f570065ef44548576c
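The converter also accepts an explicit output path and precision if you don't want the default name (assuming the script's --outfile/--outtype flags):
python3 convert_hf_to_gguf.py ~/llama.cpp/models/mistral/ \
  --outfile ~/llama.cpp/models/mistral-7b-instruct-v0.3-f16.gguf --outtype f16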