From 31739320aa1ea8c1c30413c418ab873cb73f8aab Mon Sep 17 00:00:00 2001 From: ducoterra Date: Wed, 25 Feb 2026 16:01:08 -0500 Subject: [PATCH] add apple m4 max benchmark --- active/software_ai_stack/ai_stack.md | 50 ++++++++++++++++++++++------ 1 file changed, 39 insertions(+), 11 deletions(-) diff --git a/active/software_ai_stack/ai_stack.md b/active/software_ai_stack/ai_stack.md index 4a0d822..a1f66bc 100644 --- a/active/software_ai_stack/ai_stack.md +++ b/active/software_ai_stack/ai_stack.md @@ -21,13 +21,15 @@ - [Z-Image](#z-image) - [Flux](#flux) - [Embedding Models](#embedding-models) - - [Nomic](#nomic) + - [Qwen Embedding](#qwen-embedding) + - [Nomic Embedding](#nomic-embedding) - [llama.cpp](#llamacpp) - [stable-diffusion.cpp](#stable-diffusioncpp) - [open-webui](#open-webui) - [Install Services with Quadlets](#install-services-with-quadlets) - [Internal and External Pods](#internal-and-external-pods) - [Llama CPP Server](#llama-cpp-server) + - [Llama CPP Embedding Server](#llama-cpp-embedding-server) - [Stable Diffusion CPP](#stable-diffusion-cpp) - [Open Webui](#open-webui-1) - [Install the update script](#install-the-update-script) @@ -239,7 +241,14 @@ hf download --local-dir . unsloth/Qwen3-8B-GGUF Qwen3-8B-Q8_0.gguf #### Embedding Models -##### Nomic +##### Qwen Embedding + +```bash +mkdir /home/ai/models/embedding/qwen3-vl-embed && cd /home/ai/models/embedding/qwen3-vl-embed +hf download --local-dir . 
dam2452/Qwen3-VL-Embedding-8B-GGUF Qwen3-VL-Embedding-8B-Q8_0.gguf +``` + +##### Nomic Embedding ```bash # nomic-embed-text-v2 @@ -352,7 +361,7 @@ localhost/stable-diffusion-cpp:latest \ ```bash mkdir /home/ai/.env # Create a file called open-webui-env with `WEBUI_SECRET_KEY="some-random-key" -scp active/device_framework_desktop/secrets/open-webui-env deskwork-ai:.env/ +scp active/software_ai_stack/secrets/open-webui-env deskwork-ai:.env/ # Will be available on port 8080 podman run \ @@ -368,7 +377,8 @@ Use the following connections: | Service | Endpoint | | ------------------------- | ----------------------------------------- | -| llama.cpp | | +| llama.cpp server | | +| llama.cpp embed | | | stable-diffusion.cpp | | | stable-diffusion.cpp edit | | @@ -381,7 +391,7 @@ stable-diffusion.cpp services while allowing the frontend services to communicate with those containers. ```bash -scp -r active/device_framework_desktop/quadlets_pods/* deskwork-ai:.config/containers/systemd/ +scp -r active/software_ai_stack/quadlets_pods/* deskwork-ai:.config/containers/systemd/ ssh deskwork-ai systemctl --user daemon-reload systemctl --user start ai-internal-pod.service ai-external-pod.service @@ -392,7 +402,18 @@ systemctl --user start ai-internal-pod.service ai-external-pod.service Installs the llama.cpp server to run our text models. 
```bash
-scp -r active/device_framework_desktop/quadlets_llama_server/* deskwork-ai:.config/containers/systemd/
+scp -r active/software_ai_stack/quadlets_llama_server/* deskwork-ai:.config/containers/systemd/
+ssh deskwork-ai
+systemctl --user daemon-reload
+systemctl --user restart ai-internal-pod.service
+```
+
+### Llama CPP Embedding Server
+
+Installs the llama.cpp server to run our embedding models.
+
+```bash
+scp -r active/software_ai_stack/quadlets_llama_embed/* deskwork-ai:.config/containers/systemd/
ssh deskwork-ai
systemctl --user daemon-reload
systemctl --user restart ai-internal-pod.service
@@ -403,7 +424,7 @@ systemctl --user restart ai-internal-pod.service
Installs the stable-diffusion.cpp server to run our image models.
```bash
-scp -r active/device_framework_desktop/quadlets_stable_diffusion/* deskwork-ai:.config/containers/systemd/
+scp -r active/software_ai_stack/quadlets_stable_diffusion/* deskwork-ai:.config/containers/systemd/
ssh deskwork-ai
systemctl --user daemon-reload
systemctl --user restart ai-internal-pod.service
@@ -414,7 +435,7 @@ systemctl --user restart ai-internal-pod.service
Installs the open webui frontend.
```bash
-scp -r active/device_framework_desktop/quadlets_openwebui/* deskwork-ai:.config/containers/systemd/
+scp -r active/software_ai_stack/quadlets_openwebui/* deskwork-ai:.config/containers/systemd/
ssh deskwork-ai
systemctl --user daemon-reload
systemctl --user restart ai-external-pod.service
@@ -429,7 +450,7 @@ will be up at `http://host.containers.internal:8000`.
# 1. Builds the latest llama.cpp and stable-diffusion.cpp
# 2. Pulls the latest open-webui
# 3. Restarts all services
-scp active/device_framework_desktop/update-script.sh deskwork-ai:
+scp active/software_ai_stack/update-script.sh deskwork-ai:
ssh deskwork-ai
chmod +x update-script.sh
./update-script.sh
@@ -440,7 +461,7 @@ chmod +x update-script.sh
Optionally install a guest openwebui service.
```bash
-scp -r active/device_framework_desktop/systemd/. deskwork-ai:.config/systemd/user/
+scp -r active/software_ai_stack/systemd/. deskwork-ai:.config/systemd/user/
ssh deskwork-ai
systemctl --user daemon-reload
systemctl --user enable open-webui-guest-start.timer
@@ -496,4 +517,11 @@ NVIDIA GeForce RTX 3090
| model | size | params | backend | ngl | test | t/s |
| ---------------- | --------: | ------: | ------- | ---: | ----: | --------------: |
| gpt-oss 20B Q8_0 | 11.27 GiB | 20.91 B | CUDA | 99 | pp512 | 4297.72 ± 35.60 |
-| gpt-oss 20B Q8_0 | 11.27 GiB | 20.91 B | CUDA | 99 | tg128 | 197.73 ± 0.62 |
\ No newline at end of file
+| gpt-oss 20B Q8_0 | 11.27 GiB | 20.91 B | CUDA | 99 | tg128 | 197.73 ± 0.62 |
+
+Apple M4 Max
+
+| model | test | t/s |
+| :---------------------------- | -----: | -------------: |
+| unsloth/gpt-oss-20b-Q8_0-GGUF | pp2048 | 1579.12 ± 7.12 |
+| unsloth/gpt-oss-20b-Q8_0-GGUF | tg32 | 113.00 ± 2.81 |
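
The `pp`/`tg` rows added above are llama-bench-style results, reported as mean ± standard deviation over repeated runs. As a quick sketch of that aggregation (the sample numbers below are made up for illustration, not real measurements):

```shell
# Illustrative only: fabricated throughput samples (t/s) standing in for
# repeated benchmark runs of the same test.
samples="196.9 197.5 198.1 197.8 198.4"

# Collapse the samples into the "mean ± stddev" form used in the tables.
stats=$(printf '%s\n' $samples | awk '
  { sum += $1; sumsq += $1 * $1; n++ }
  END {
    mean = sum / n
    sd = sqrt(sumsq / n - mean * mean)   # population stddev
    printf "%.2f ± %.2f", mean, sd
  }')
echo "$stats"
```

For the real rows, running `llama-bench` from the llama.cpp build against the GGUF file (e.g. with `-p 2048 -n 32` for the `pp2048`/`tg32` tests) should produce comparable output, though exact flags can vary between llama.cpp versions.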