add apple m4 max benchmark
All checks were successful
Podman DDNS Image / build-and-push-ddns (push) Successful in 1m4s
@@ -21,13 +21,15 @@
 - [Z-Image](#z-image)
 - [Flux](#flux)
 - [Embedding Models](#embedding-models)
-- [Nomic](#nomic)
+- [Qwen Embedding](#qwen-embedding)
+- [Nomic Embedding](#nomic-embedding)
 - [llama.cpp](#llamacpp)
 - [stable-diffusion.cpp](#stable-diffusioncpp)
 - [open-webui](#open-webui)
 - [Install Services with Quadlets](#install-services-with-quadlets)
 - [Internal and External Pods](#internal-and-external-pods)
 - [Llama CPP Server](#llama-cpp-server)
+- [Llama CPP Embedding Server](#llama-cpp-embedding-server)
 - [Stable Diffusion CPP](#stable-diffusion-cpp)
 - [Open Webui](#open-webui-1)
 - [Install the update script](#install-the-update-script)
@@ -239,7 +241,14 @@ hf download --local-dir . unsloth/Qwen3-8B-GGUF Qwen3-8B-Q8_0.gguf
 
 #### Embedding Models
 
-##### Nomic
+##### Qwen Embedding
 
+```bash
+mkdir /home/ai/models/embedding/qwen3-vl-embed && cd /home/ai/models/embedding/qwen3-vl-embed
+hf download --local-dir . dam2452/Qwen3-VL-Embedding-8B-GGUF Qwen3-VL-Embedding-8B-Q8_0.gguf
+```
+
+##### Nomic Embedding
+
 ```bash
 # nomic-embed-text-v2
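These downloads are large; one quick sanity check on a finished download is the 4-byte `GGUF` magic that every GGUF model file starts with. A sketch (the `demo.gguf` stand-in below is an assumption so the snippet is self-contained; point `head` at the real `.gguf` path instead):

```shell
# Every valid GGUF model file begins with the ASCII magic "GGUF".
printf 'GGUF' > demo.gguf     # stand-in; substitute the downloaded .gguf file
head -c 4 demo.gguf; echo     # prints: GGUF
```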
@@ -352,7 +361,7 @@ localhost/stable-diffusion-cpp:latest \
 ```bash
 mkdir /home/ai/.env
 # Create a file called open-webui-env with `WEBUI_SECRET_KEY="some-random-key"`
-scp active/device_framework_desktop/secrets/open-webui-env deskwork-ai:.env/
+scp active/software_ai_stack/secrets/open-webui-env deskwork-ai:.env/
 
 # Will be available on port 8080
 podman run \
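One way to produce the random key the comment above calls for (a sketch; the `openssl` invocation is an assumption, not part of the commit):

```shell
# Generate a 32-byte hex key and write it in the format open-webui expects.
key="$(openssl rand -hex 32)"
printf 'WEBUI_SECRET_KEY="%s"\n' "$key" > open-webui-env
```

The resulting `open-webui-env` is what gets copied into `.env/` on the host.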
@@ -368,7 +377,8 @@ Use the following connections:
 
 | Service                   | Endpoint                                  |
 | ------------------------- | ----------------------------------------- |
-| llama.cpp                 | <http://host.containers.internal:8000>    |
+| llama.cpp server          | <http://host.containers.internal:8000>    |
+| llama.cpp embed           | <http://host.containers.internal:8001>    |
 | stable-diffusion.cpp      | <http://host.containers.internal:1234/v1> |
 | stable-diffusion.cpp edit | <http://host.containers.internal:1235/v1> |
 
@@ -381,7 +391,7 @@ stable-diffusion.cpp services while allowing the frontend services to
 communicate with those containers.
 
 ```bash
-scp -r active/device_framework_desktop/quadlets_pods/* deskwork-ai:.config/containers/systemd/
+scp -r active/software_ai_stack/quadlets_pods/* deskwork-ai:.config/containers/systemd/
 ssh deskwork-ai
 systemctl --user daemon-reload
 systemctl --user start ai-internal-pod.service ai-external-pod.service
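The pod quadlet files themselves are not shown in this diff. As a rough sketch (file name and contents are assumptions): Quadlet generates a `<name>-pod.service` unit from a `<name>.pod` file, so a minimal file that would yield the `ai-internal-pod.service` unit referenced above could look like:

```shell
# Hypothetical sketch of a pod quadlet; the real files are not in this commit.
cat > ai-internal.pod <<'EOF'
[Pod]
PodName=ai-internal

[Install]
WantedBy=default.target
EOF
```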
@@ -392,7 +402,18 @@ systemctl --user start ai-internal-pod.service ai-external-pod.service
 Installs the llama.cpp server to run our text models.
 
 ```bash
-scp -r active/device_framework_desktop/quadlets_llama_server/* deskwork-ai:.config/containers/systemd/
+scp -r active/software_ai_stack/quadlets_llama_server/* deskwork-ai:.config/containers/systemd/
+ssh deskwork-ai
+systemctl --user daemon-reload
+systemctl --user restart ai-internal-pod.service
+```
+
+### Llama CPP Embedding Server
+
+Installs the llama.cpp server to run our embedding models.
+
+```bash
+scp -r active/software_ai_stack/quadlets_llama_embed/* deskwork-ai:.config/containers/systemd/
 ssh deskwork-ai
 systemctl --user daemon-reload
 systemctl --user restart ai-internal-pod.service
@@ -403,7 +424,7 @@ systemctl --user restart ai-internal-pod.service
 Installs the stable-diffusion.cpp server to run our image models.
 
 ```bash
-scp -r active/device_framework_desktop/quadlets_stable_diffusion/* deskwork-ai:.config/containers/systemd/
+scp -r active/software_ai_stack/quadlets_stable_diffusion/* deskwork-ai:.config/containers/systemd/
 ssh deskwork-ai
 systemctl --user daemon-reload
 systemctl --user restart ai-internal-pod.service
@@ -414,7 +435,7 @@ systemctl --user restart ai-internal-pod.service
 Installs the open webui frontend.
 
 ```bash
-scp -r active/device_framework_desktop/quadlets_openwebui/* deskwork-ai:.config/containers/systemd/
+scp -r active/software_ai_stack/quadlets_openwebui/* deskwork-ai:.config/containers/systemd/
 ssh deskwork-ai
 systemctl --user daemon-reload
 systemctl --user restart ai-external-pod.service
@@ -429,7 +450,7 @@ will be up at `http://host.containers.internal:8000`.
 # 1. Builds the latest llama.cpp and stable-diffusion.cpp
 # 2. Pulls the latest open-webui
 # 3. Restarts all services
-scp active/device_framework_desktop/update-script.sh deskwork-ai:
+scp active/software_ai_stack/update-script.sh deskwork-ai:
 ssh deskwork-ai
 chmod +x update-script.sh
 ./update-script.sh
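The script's behavior is only described by the numbered comments above; a hypothetical reconstruction follows. The image tags, build paths, and the open-webui registry reference are assumptions (only `localhost/stable-diffusion-cpp:latest` appears earlier in this file), so this is a sketch, not the committed script:

```shell
# Hypothetical sketch of update-script.sh, assembled from the comments above.
cat > update-script.sh <<'EOF'
#!/usr/bin/env bash
set -euo pipefail
# 1. Build the latest llama.cpp and stable-diffusion.cpp images
podman build -t localhost/llama-cpp:latest llama-cpp/
podman build -t localhost/stable-diffusion-cpp:latest stable-diffusion-cpp/
# 2. Pull the latest open-webui
podman pull ghcr.io/open-webui/open-webui:main
# 3. Restart all services
systemctl --user restart ai-internal-pod.service ai-external-pod.service
EOF
chmod +x update-script.sh
bash -n update-script.sh   # syntax check only; running it needs podman and the quadlets
```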
@@ -440,7 +461,7 @@ chmod +x update-script.sh
 Optionally install a guest openwebui service.
 
 ```bash
-scp -r active/device_framework_desktop/systemd/. deskwork-ai:.config/systemd/user/
+scp -r active/software_ai_stack/systemd/. deskwork-ai:.config/systemd/user/
 ssh deskwork-ai
 systemctl --user daemon-reload
 systemctl --user enable open-webui-guest-start.timer
@@ -497,3 +518,10 @@ NVIDIA GeForce RTX 3090
 | ---------------- | --------: | ------: | ------- | ---: | ----: | --------------: |
 | gpt-oss 20B Q8_0 | 11.27 GiB | 20.91 B | CUDA    |   99 | pp512 | 4297.72 ± 35.60 |
 | gpt-oss 20B Q8_0 | 11.27 GiB | 20.91 B | CUDA    |   99 | tg128 |   197.73 ± 0.62 |
+
+Apple M4 Max
+
+| model                         |   test |            t/s |
+| :---------------------------- | -----: | -------------: |
+| unsloth/gpt-oss-20b-Q8_0-GGUF | pp2048 | 1579.12 ± 7.12 |
+| unsloth/gpt-oss-20b-Q8_0-GGUF |   tg32 |  113.00 ± 2.81 |
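The two tables use different test sizes (pp512/tg128 vs pp2048/tg32), so the numbers are only roughly comparable; with that caveat, the RTX 3090's lead over the M4 Max works out to about:

```shell
# Throughput ratios taken directly from the two benchmark tables above.
awk 'BEGIN {
  printf "prompt processing: %.2fx (4297.72 / 1579.12)\n", 4297.72 / 1579.12
  printf "token generation:  %.2fx (197.73 / 113.00)\n",   197.73 / 113.00
}'
```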