checkpoint commit
All checks were successful
Podman DDNS Image / build-and-push-ddns (push) Successful in 1m3s
---

`stories/README.md` (new file, 22 lines)

# Stories

Stories I want to tell. Unlike the `active` project, which stores notes in no
particular order, stories are meant to be read and enjoyed from top to bottom.
Hopefully they teach you something.

## This is a mkdocs project

`docs/` contains all the stories.

`mkdocs.yml` holds the project config.

```bash
# Run the mkdocs site with mkdocs serve
uv run mkdocs serve
```

## Errata

mkdocs has a [bug that breaks
autoreload](https://github.com/mkdocs/mkdocs/issues/4032). This project has
pinned click to `8.2.1` to fix it.
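If you're curious how that pin might look in `pyproject.toml`, here's a sketch (the project name and exact dependency table are assumptions; they depend on how uv was set up for this project):

```toml
[project]
name = "stories"
version = "0.1.0"
dependencies = [
    "mkdocs",
    # Pinned to work around the mkdocs autoreload bug linked above
    "click==8.2.1",
]
```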
---

`stories/docs/10-fedora_server_admin.md` (new file, 92 lines)

# I want to build the perfect homelab server

Date Written: 02/11/26

Fedora Version: 43

## Intro

And it will run Fedora. Backstory: I ran TrueNAS for a long time. I started with
FreeNAS, got confused by jails, switched to this new thing called TrueNAS,
learned about ZFS, watched "Apps" come into existence, watched Kubernetes
support rise and fall, watched the disaster that was Incus container/VM support
and the subsequent "rollback" to traditional VMs, and got really tired of the
lack of control.

App backup and restore was never, and still isn't, well supported. Taking
snapshots of your apps pool and then sending those snapshots to a backup drive
un-hid the ix-systems directory (which would frequently have thousands of
snapshots due to TrueNAS's liberal use of subvolumes and would slow down the UI
immensely). App data was intentionally hidden from the user for some reason.
Migrating between Docker, then Kubernetes, then Incus was never fully planned.
The TrueNAS Charts app market was awesome, but building a TrueNAS Chart was
complex and required duplicating all `values.yaml` configurable parameters into
a new YAML file that the UI could use for form-fill. They got rid of TrueNAS
Charts regardless, so you just have to hope that your favorite app maintainer is
comfortable rewriting their app in the new format and supporting some kind of
migration strategy. Every six months I would expect some kind of downtime
because TrueNAS would change something critical and it would inevitably impact
my workflow.

So, if I'm going to be subject to the whims of a changing platform anyway (given
TrueNAS is supposed to be based on Debian, aka the *stable* choice), and if I'm
going to suffer breaking changes every six months no matter what I choose, then
I may as well have the latest and greatest via a rolling kernel distro.

So why not Arch? Simply: SELinux. SELinux is currently not officially supported
on Arch Linux. Plus, Fedora Server comes with a lot built in that I like.
Cockpit, firewalld, Podman, SELinux, OSBuild, and RPM support all work out of
the box. These are, imo, the bare-bones requirements for a server exposed to
the internet that will run homelab services.

So let's get started configuring an awesome Fedora server to keep your data safe
and run your homelab services with minimal downtime.

## Installation

When installing Fedora from the ISO, take some time at the installation menu to
configure some basics.

Don't worry about RAID for now; we can convert a single disk into a RAID 1 array
later.
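For reference, that later conversion boils down to two btrfs commands. This is a sketch: it assumes the second disk shows up as `/dev/sdb` and the filesystem is mounted at `/`, so adjust both for your hardware.

```bash
# Add the second disk to the existing btrfs filesystem
sudo btrfs device add /dev/sdb /

# Rebalance data and metadata into a RAID 1 profile across both disks
sudo btrfs balance start -dconvert=raid1 -mconvert=raid1 /
```

The balance can take a while on a full disk; `sudo btrfs balance status /` shows progress.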

If you don't have an SSH key already, generate one for yourself so you can log
into the server. On your local machine:

```bash
# Generate the key
# Save it to the default location (~/.ssh/id_ed25519)
# Please, please, please encrypt it with a passphrase. Something memorable.
# Write it down. Friends don't let friends have naked SSH keys.
ssh-keygen -t ed25519
```

1. Configure the network
    1. Set a hostname
    2. Disable IPv6 privacy extensions
2. Configure software selection
    1. Choose anything you'd like preinstalled
3. Create a non-root user
    1. Set a simple password for easy login; we'll change it later
4. Configure your disk partitioning
    1. Select manual (blivet) partitioning
    2. Create a 1GB EFI system partition and mount it at `/boot/efi`
    3. Create a 1GB btrfs partition and mount it at `/boot`
    4. Create an encrypted btrfs volume with the remaining space and name it something unique; do not mount it
    5. Create a btrfs subvolume called "root" and mount it at `/`
    6. Create a btrfs subvolume called "home" and mount it at `/home`
    7. Create any other btrfs subvolumes you might need (`/var`, for example)
5. Take note of the IPv4 and IPv6 addresses. Update any DNS records at this time.
6. Install and reboot

## Configuration

Once your server boots up, we'll follow a basic playbook:

1. Change your password
2. Configure automatic decryption for your encrypted drives at boot with TPM2
3. Configure the package manager and apply updates
4. Secure SSH with Fail2Ban
5. Install Snapper for automatic snapshots to prevent accidental file deletion
6. Install BorgBackup for automatic backups
7. Install VM support
8. Build some images
9. Run some VMs
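As a taste of step 2, enrolling the TPM2 chip is a single `systemd-cryptenroll` call. A sketch, where `/dev/nvme0n1p3` is a placeholder for your actual LUKS partition:

```bash
# Bind a new LUKS key slot to the TPM2 chip, sealed against
# Secure Boot state (PCR 7)
sudo systemd-cryptenroll --tpm2-device=auto --tpm2-pcrs=7 /dev/nvme0n1p3
```

You'll likely also need `tpm2-device=auto` in the volume's options in `/etc/crypttab` so the TPM is actually tried at boot.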

---

`stories/docs/20-local_llms.md` (new file, 224 lines)

# I refuse to pay for LLMs

But I want them anyway. And I don't just want LLMs, I want:

1. Image Generation
2. Image Editing
3. Speech to Text
4. Text to Speech
5. Web Searching
6. RAG Retrieval
7. Guest accounts with time-based access
8. Probably other things

On rootless Podman with snapshots and backups and no compromises.

- [I refuse to pay for LLMs](#i-refuse-to-pay-for-llms)
  - [Create your environment](#create-your-environment)
  - [Local LLM First](#local-llm-first)
    - [Ollama](#ollama)
    - [LM Studio](#lm-studio)
    - [llama.cpp](#llamacpp)
  - [Ok, so you have a backend](#ok-so-you-have-a-backend)
    - [What about llama-server?](#what-about-llama-server)
    - [Anything LLM](#anything-llm)
    - [Open Webui](#open-webui)
  - [But we don't have image editing working](#but-we-dont-have-image-editing-working)
    - [Stable Diffusion CPP](#stable-diffusion-cpp)
  - [Making it Run with Quadlets](#making-it-run-with-quadlets)

## Create your environment

I created a user named `ai` to run all my AI services. Do that now:

```bash
# As root: create the user and let its services run without an active login
useradd -m ai
loginctl enable-linger ai

# Switch to the new user and create the config directories
su -l ai
mkdir -p /home/ai/.config/containers/systemd/
mkdir -p /home/ai/.ssh
```

## Local LLM First

On the Framework Desktop (or any AMD system) your options are ROCm or Vulkan
drivers. Both are fine, with Vulkan pulling slightly ahead as of February 2026.
Almost every backend you pick will support both, so pick a backend first.

### Ollama

Ollama is the natural place to start. Their "marketplace" is the best I've found
for browsing models. They include short descriptions about what the models are
good for and (almost) all of them work out of the box!

Bonus points: Ollama's API is well supported by interfaces like Anything LLM,
Open Webui, a litany of F-Droid apps, and many other services.

Honestly, Ollama is still where I'd recommend anyone start. The installer is
easy, performance is decent, the API is great, and the Ollama team curates
models that work well on their platform. What's not to like?

Performance, mostly. llama.cpp just performs 20-30% better in my testing on
models like gpt-oss-120b. Your mileage may vary; this is a great project.

### LM Studio

Everyone says to start with this. Ok, first of all, it's a GUI app. Yeah,
there's a toggle to run an API server, but ain't no way I'm installing Wayland
on my pure, uncompromising, headless Fedora server.

I do have to admit it's the fastest way to get started with LLMs on desktop. But
we're not here for desktops, we're here for servers. It runs llama.cpp as its
backend anyway, so skip past this and go for the good stuff.

### llama.cpp

We've landed on the best choice. You'll browse Hugging Face for models, be
confused, and like it. You'll struggle to read the logs and feel right at home.
You'll wonder why there isn't an intuitive CLI like Ollama's. And you'll be
rewarded with the fastest, most flexible way to run LLMs.

You'll need the Hugging Face CLI (`hf`). Install that.

First, download qwen3-vl-8b. This is a good jack-of-all-trades model that
supports vision, which is nice.

```bash
# Create a directory to hold your text models
# I put mine at /home/ai/models/text
mkdir -p /home/ai/models/text/qwen3-vl-8b-instruct

# Download the model from Hugging Face
hf download --local-dir /home/ai/models/text/qwen3-vl-8b-instruct Qwen/Qwen3-VL-8B-Instruct-GGUF Qwen3VL-8B-Instruct-Q4_K_M.gguf

# Also download the "mmproj" file for this model
# "mmproj" files allow a model to see images
hf download --local-dir /home/ai/models/text/qwen3-vl-8b-instruct Qwen/Qwen3-VL-8B-Instruct-GGUF mmproj-Qwen3VL-8B-Instruct-Q8_0.gguf
```

With our model locked and loaded, we can run the llama.cpp server. We do have to
build the llama.cpp server container first, though, because making this any
easier would be a crime.

```bash
# Build the llama.cpp container image
git clone https://github.com/ggml-org/llama.cpp.git
cd llama.cpp
export BUILD_TAG=$(date +"%Y-%m-%d-%H-%M-%S")

# Vulkan
podman build -f .devops/vulkan.Dockerfile -t llama-cpp-vulkan:${BUILD_TAG} -t llama-cpp-vulkan:latest .

# Run llama server (available on port 8000)
# Add `--n-cpu-moe 32` for gpt-oss-120b to keep a minimal number of experts in the GPU
podman run \
    --rm \
    --name llama-server-demo \
    --device=/dev/kfd \
    --device=/dev/dri \
    --pod systemd-ai-internal \
    -v /home/ai/models/text:/models:z \
    localhost/llama-cpp-vulkan:latest \
    --port 8000 \
    -c 16384 \
    --perf \
    --n-gpu-layers all \
    --jinja \
    --models-max 1 \
    --models-dir /models
```

You should be able to access the llama.cpp server at `http://{your-ip}:8000`.
From there you can select the only model you have downloaded (qwen3-vl-8b) and
have a conversation.
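You can also talk to the OpenAI-compatible API directly. A sketch, with the server running as above; the model name here is an assumption, and `GET /v1/models` lists what your server actually loaded:

```bash
# Ask the running llama-server for a chat completion
curl http://localhost:8000/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{
          "model": "qwen3-vl-8b-instruct",
          "messages": [{"role": "user", "content": "Say hello in five words."}]
        }'
```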

## Ok, so you have a backend

Now we need a frontend. In my experience there are only two choices, but this is
changing extremely fast.

### What about llama-server?

Good enough for testing. Honestly, if this meets your needs, more power to you.

### Anything LLM

I started here about a year ago. This is a fantastic frontend with RAG, speech
to text, text to speech, web search, plugins, and decent user management. It
supports Ollama, OpenAI, and a bunch of other backends.

Unfortunately, as of when I used it, there was no integrated image generation or
image editing.

### Open Webui

This is, in my opinion, the best frontend experience you can get. The killer
feature is side-by-side HTML rendering with your LLM response. If your LLM
writes HTML/JavaScript/CSS, it'll render in real time next to your chat. That's
ridiculously cool.

It also supports image generation as a tool that your LLM can call. Prompts like
"Generate an image of a dragon" will trigger a call to the image generation
tool. Generated images show up in the chat and can be edited with another
message.

```bash
mkdir /home/ai/.env
vim /home/ai/.env/open-webui-env

# Add this line to the file, then save and exit
WEBUI_SECRET_KEY="some-random-key"

# Will be available on port 8080
podman run \
    -d \
    -p 8080:8080 \
    -v open-webui:/app/backend/data \
    --env-file /home/ai/.env/open-webui-env \
    --name open-webui \
    --restart always \
    ghcr.io/open-webui/open-webui:main
```

Use the following connections when configuring models/image editing:

| Service              | Endpoint                                  |
| -------------------- | ----------------------------------------- |
| llama.cpp            | <http://host.containers.internal:8000>    |
| stable-diffusion.cpp | <http://host.containers.internal:1234/v1> |

## But we don't have image editing working

In the past I used stable-diffusion-webui-forge. This project relied on a very
specific set of ROCm torch versions installed via pip from the nightly ROCm pip
repository. I had Stable Diffusion XL and FLUX.1-dev working on an AMD GPU, but
I couldn't get this working at all on the Framework Desktop.

I found out later this might be due to a ROCm driver bug, but we have bigger and
better projects to work with.

### Stable Diffusion CPP

This project is the llama.cpp equivalent for image generation. OpenAI-compatible
API, tons of model support, excellent documentation; it's the best.

```bash
# Clone and build the stable diffusion cpp container
git clone https://github.com/leejet/stable-diffusion.cpp.git
cd stable-diffusion.cpp
git submodule update --init --recursive
export BUILD_TAG=$(date +"%Y-%m-%d-%H-%M-%S")
podman build -f Dockerfile.vulkan -t stable-diffusion-cpp:${BUILD_TAG} -t stable-diffusion-cpp:latest .
```

Stable Diffusion CPP supports a CLI and a web server. Let's download a model and
test out the CLI.

```bash
# z-turbo image model
# Fastest image generation in 8 steps. Great at text and prompt following.
# Lacks variety.
mkdir -p /home/ai/models/image/z-turbo
hf download --local-dir /home/ai/models/image/z-turbo QuantStack/FLUX.1-Kontext-dev-GGUF flux1-kontext-dev-Q4_K_M.gguf
hf download --local-dir /home/ai/models/image/z-turbo black-forest-labs/FLUX.1-schnell ae.safetensors
hf download --local-dir /home/ai/models/image/z-turbo unsloth/Qwen3-4B-Instruct-2507-GGUF Qwen3-4B-Instruct-2507-Q4_K_M.gguf

# Create our output directory
mkdir -p /home/ai/output

# Generate an image of a photorealistic dragon.
# NOTE: --diffusion-model must point at a file you actually have. The downloads
# above fetch flux1-kontext-dev-Q4_K_M.gguf, not z_image_turbo-Q4_K.gguf, so
# adjust one or the other to match before running.
podman run --rm \
    -v /home/ai/models:/models:z \
    -v /home/ai/output:/output:z \
    --device /dev/kfd \
    --device /dev/dri \
    localhost/stable-diffusion-cpp:latest \
    --diffusion-model /models/image/z-turbo/z_image_turbo-Q4_K.gguf \
    --vae /models/image/z-turbo/ae.safetensors \
    --llm /models/image/z-turbo/Qwen3-4B-Instruct-2507-Q4_K_M.gguf \
    --cfg-scale 1.0 \
    -v \
    --seed -1 \
    --steps 8 \
    --vae-conv-direct \
    -H 1024 \
    -W 1024 \
    -o /output/output.png \
    -p "A photorealistic dragon"
```

With any luck you should have a picture of a dragon in your output folder.

Since we know it works, we can tie everything together.

## Making it Run with Quadlets

Now that we know our setup works, we can glue it all together with systemd.

Take a look at [the framework desktop docs](https://gitea.reeseapps.com/services/homelab/src/branch/main/active/device_framework_desktop/framework_desktop.md#install-the-whole-thing-with-quadlets-tm) for the relevant commands.
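As a sketch of where this ends up, a quadlet for the llama.cpp server might look like the file below, saved as `/home/ai/.config/containers/systemd/llama-server.container`. The name, port, and flags are assumptions carried over from the demo `podman run` above; the demo joined a pod, while this publishes the port directly for simplicity.

```ini
[Unit]
Description=llama.cpp server (Vulkan)

[Container]
Image=localhost/llama-cpp-vulkan:latest
ContainerName=llama-server
AddDevice=/dev/kfd
AddDevice=/dev/dri
Volume=/home/ai/models/text:/models:z
PublishPort=8000:8000
Exec=--port 8000 -c 16384 --jinja --n-gpu-layers all --models-max 1 --models-dir /models

[Service]
Restart=always

[Install]
WantedBy=default.target
```

After `systemctl --user daemon-reload`, `systemctl --user start llama-server` brings it up, and lingering (enabled earlier) keeps it running without a login session.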

---

`stories/docs/30-gpg_signing.md` (new file, 1 line)

# Everyone uses this GPG thing, so should I

---

`stories/docs/40-podman_rootless_hosting.md` (new file, 1 line)

# I want to use Podman, not Docker

---

`stories/docs/index.md` (new file, 11 lines)

# Come, have a seat

Join me on a journey through homelab adventures. Follow along at home! These
stories will walk you through the trials of my self-hosting wins and losses.

The stories will be written in a way that allows you to skip past the text and
just copy/paste the code blocks (similar to a Medium article). Each story will
lay out its goal and the prerequisites.

Stories are ordered by time written, oldest to newest. They don't necessarily
read in order, but they may reference each other. No need to read each one.

---

`stories/mkdocs.yml` (new file, 3 lines)

```yaml
site_name: Reese's Homelab Stories
theme:
  name: readthedocs
```