# Homelab

A project to store homelab stuff.

## Table of Contents
- [Homelab](#homelab)
  - [Table of Contents](#table-of-contents)
  - [Platforms](#platforms)
    - [Reverse Proxy](#reverse-proxy)
    - [Service Mesh](#service-mesh)
    - [Data Storage](#data-storage)
  - [Adding a new host](#adding-a-new-host)
  - [Components](#components)
    - [CoreDNS](#coredns)
    - [Metal LB](#metal-lb)
    - [Nginx Ingress](#nginx-ingress)
    - [Storage](#storage)
  - [Apps](#apps)
    - [Dashboard](#dashboard)
    - [Nextcloud](#nextcloud)
    - [Iperf3](#iperf3)
    - [Wordpress](#wordpress)
    - [Grafana](#grafana)
  - [Upgrading](#upgrading)
    - [Nodes](#nodes)
    - [K3S](#k3s)
      - [Automated Upgrades](#automated-upgrades)
      - [Manual Upgrades](#manual-upgrades)
  - [Create a Userspace](#create-a-userspace)
    - [Quickstart](#quickstart)
    - [Userspace](#userspace)
      - [Namespace](#namespace)
      - [Roles](#roles)
      - [Rolebinding](#rolebinding)
    - [Manual Steps](#manual-steps)
      - [Create a kubernetes certsigner pod](#create-a-kubernetes-certsigner-pod)
      - [Create the certsigner secret](#create-the-certsigner-secret)
      - [Set up the certsigner pod](#set-up-the-certsigner-pod)
      - [Generate a cert](#generate-a-cert)
      - [Create a new Userspace](#create-a-new-userspace)
      - [Sign the cert](#sign-the-cert)
      - [Add to the config](#add-to-the-config)
      - [Delete](#delete)
    - [Signing a user cert - detailed notes](#signing-a-user-cert---detailed-notes)
  - [Help](#help)
    - [Troubleshooting](#troubleshooting)
## Platforms
### Reverse Proxy
We will use a reverse proxy / load balancer as the single point of entry for all services.
This controls inbound and outbound traffic and handles TLS certificate termination. It is
installed on bare-metal machine(s) via Ansible for maximum performance and IPv6 compatibility.
Each machine that acts as a reverse proxy adds its public IPv4 and IPv6 address(es) to
the public domains used for external and internal access (*.reeseapps.com).
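A quick way to confirm a proxy host has registered both address families (the hostname below is just an example app domain):
```bash
# Both record types should return the reverse proxy's public addresses
dig +short A nextcloud.reeseapps.com
dig +short AAAA nextcloud.reeseapps.com
```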
### Service Mesh
All devices are connected via WireGuard and talk to each other over the WireGuard tunnel. See
the wireguard folder for more details. It's advisable to create internal DNS records pointing
to the WireGuard-assigned IP addresses.
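A minimal connectivity check from any peer (the peer address below is only an example; use one of your mesh IPs):
```bash
# Show the local WireGuard interface, its peers, and last handshake times
sudo wg show
# Ping another peer over its WireGuard-assigned address
ping -c 3 172.20.0.2
```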
### Data Storage
All servers will use iSCSI.
## Adding a new host
1. Set static IP in Unifi
2. Add to .ssh/config
3. Add to ansible inventory (`ansible/`)
4. Establish DNS records (`dns/`)
1. Both `-wg` records and `reeselink` records
5. Create reverse proxy(s) (`nginx/`)
1. (If removing) Delete any unused certs with `certbot delete`
2. Run the ansible certbot and nginx role
6. Create service mesh (`mesh/`)
1. Make sure to edit both `peers` and `ip` in `vars.yaml`
2. If you need to delete unused peers, add them to the `peers.yaml` delete job
7. Install services
8. Set up port forwarding in Unifi if applicable (a quick verification sketch follows this list)
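A rough post-setup verification sketch (`newhost` and the exact record names are hypothetical; substitute your own):
```bash
# DNS: both the wireguard record and the LAN record should resolve
dig +short newhost-wg.reeselink.com
dig +short newhost.reeselink.com
# SSH using the new ~/.ssh/config entry
ssh newhost 'hostname && ip -br addr'
# Mesh: confirm the new peer shows up and has completed a handshake
ssh newhost 'sudo wg show'
```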
## Components
### CoreDNS
We'll run our own CoreDNS server so we can add custom hosts. This keeps the cluster from falling
over if the internet drops out (something that apparently happens quite frequently).
One key entry in the CoreDNS config is `driveripper.reeselink.com` pointing to the internal
IP `172.20.0.1`. This ensures democratic-csi can reach the TrueNAS server without internet
or external DNS.
```bash
helm repo add coredns https://coredns.github.io/helm
helm repo update
helm upgrade --install \
--namespace=coredns \
--create-namespace \
--values coredns/coredns-values.yaml \
coredns \
coredns/coredns
```
You can test your dns config with
```bash
kubectl run -it --rm --restart=Never --image=infoblox/dnstools:latest dnstools
```
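Inside the dnstools pod you can exercise both the custom hosts entry and normal resolution (assuming this CoreDNS instance is serving the pod's DNS):
```bash
# Run these at the dnstools prompt
nslookup driveripper.reeselink.com   # should return 172.20.0.1 from the custom hosts block
nslookup kubernetes.default.svc.cluster.local
nslookup google.com                  # confirms upstream forwarding still works
```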
### Metal LB
We'll be swapping K3S's default load balancer for MetalLB for more flexibility; ServiceLB was
struggling to allocate IP addresses for load-balanced services. MetalLB does make things a little
more complicated (you'll need special annotations, see below), but it's otherwise a well-tested,
stable load-balancing service with features to grow into.
MetalLB is pretty cool. It works via L2 advertisement or BGP. We won't be using BGP, so let's
focus on L2.
When we connect our nodes to a network we give them an address in a range, e.g. `192.168.122.20/24`,
where the range represents all the addresses a node could be assigned. Usually we assign
a single "static" IP address to each node and direct traffic to it by port forwarding from our
router. This is fine for single nodes, but what if we have a cluster of nodes and we don't want
our service to disappear just because one node is down for maintenance?
This is where L2 advertising comes in. MetalLB assigns a static IP address from a given
pool to an arbitrary node, then advertises that node's MAC address as the location for the
IP. When that node goes down, MetalLB simply advertises a new MAC address for the same IP
address, effectively moving the IP to another node. This isn't really "load balancing" so much
as "failover", but that's exactly what we're looking for.
```bash
helm repo add metallb https://metallb.github.io/metallb
helm repo update
helm upgrade --install metallb \
--namespace metallb \
--create-namespace \
metallb/metallb
```
MetalLB doesn't know what IP addresses are available for it to allocate so we'll have
to provide it with a list. The `metallb-addresspool.yaml` has one IP address (we'll get to
IP address sharing in a second) which is an unassigned IP address not allocated to any of our
nodes. Note if you have many public IPs which all point to the same router or virtual network
you can list them. We're only going to use one because we want to port forward from our router.
```bash
# create the metallb allocation pool
kubectl apply -f metallb-addresspool.yaml
```
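The file isn't reproduced here, but a minimal `IPAddressPool` for a single address looks roughly like this (the address below is a placeholder; the pool name `production` matches the annotations used later):
```yaml
apiVersion: metallb.io/v1beta1
kind: IPAddressPool
metadata:
  name: production
  namespace: metallb
spec:
  addresses:
    # a single unassigned address, expressed as a /32 range
    - 192.168.122.240/32
```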
Now we need to create the l2 advertisement. This is handled with a custom resource definition
which specifies that all nodes listed are eligible to be assigned, and advertise, our
"production" IP addresses.
```bash
kubectl apply -f metallb-l2advertisement.yaml
```
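Again as a sketch, the advertisement just references the pool (node selection is optional and omitted here):
```yaml
apiVersion: metallb.io/v1beta1
kind: L2Advertisement
metadata:
  name: production
  namespace: metallb
spec:
  ipAddressPools:
    - production
```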
We now have a problem: we only have a single production IP address and MetalLB
really doesn't want to share it. To allow services to allocate the
same IP address (on different ports) we'll need to annotate them as such.
MetalLB will allow services to allocate the same IP if:
- They both have the same sharing key.
- They request the use of different ports (e.g. tcp/80 for one and tcp/443 for the other).
- They both use the Cluster external traffic policy, or they both point to the exact same set of pods (i.e. the pod selectors are identical).
See <https://metallb.org/usage/#ip-address-sharing> for more info.
You'll need to annotate your service as follows if you want an external IP:
```yaml
apiVersion: v1
kind: Service
metadata:
  name: {{ .Release.Name }}
  annotations:
    metallb.universe.tf/address-pool: "production"
    metallb.universe.tf/allow-shared-ip: "production"
spec:
  externalTrafficPolicy: Cluster
  selector:
    app: {{ .Release.Name }}
  ports:
    - port: {{ .Values.ports.containerPort }}
      targetPort: {{ .Values.ports.targetPort }}
      name: {{ .Release.Name }}
  type: LoadBalancer
```
### Nginx Ingress
Now we need an ingress solution (preferably with certs for HTTPS). We'll be using nginx since
it's a little more configurable than Traefik (though don't sell Traefik short, it's really
good, just finicky when you have use cases they haven't explicitly coded for).
1. Install nginx
```bash
helm repo add ingress-nginx https://kubernetes.github.io/ingress-nginx
helm repo update
helm upgrade --install \
ingress-nginx \
ingress-nginx/ingress-nginx \
--values ingress-nginx-values.yaml \
--namespace ingress-nginx \
--create-namespace
```
2. Install cert-manager
```bash
helm repo add jetstack https://charts.jetstack.io
helm repo update
helm upgrade --install \
cert-manager jetstack/cert-manager \
--namespace cert-manager \
--create-namespace \
--version v1.12.4 \
--set installCRDs=true
```
3. Create the let's encrypt issuer
```bash
kubectl apply -f letsencrypt-issuer.yaml
```
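The issuer file isn't shown here, but a typical HTTP-01 `ClusterIssuer` for Let's Encrypt looks something like this (the issuer name and email are placeholders):
```yaml
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-prod
spec:
  acme:
    server: https://acme-v02.api.letsencrypt.org/directory
    email: admin@example.com
    privateKeySecretRef:
      name: letsencrypt-prod-account-key
    solvers:
      - http01:
          ingress:
            class: nginx
```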
You can test if your ingress is working with `kubectl apply -f ingress-nginx-test.yaml`,
then navigate to <https://ingress-nginx-test.reeseapps.com>.
### Storage
<https://github.com/democratic-csi/democratic-csi/blob/master/examples/freenas-nfs.yaml>
Use NFSv4. It works without rpcbind, which makes it lovely.
We'll be installing democratic-csi as our volume manager. Specifically, we'll be installing the
freenas-api-nfs driver. All configuration is stored in `truenas-nfs.yaml`.
The nfs driver will provision an nfs store owned by user 3000 (kube). You may have to make
that user on Truenas. The nfs share created will be world-read/write, so any user can write to
it. Users that write to the share will have their uid/gid mapped to Truenas, so if user 33 writes
a file to the nfs share it will show up as owned by user 33 on Truenas.
The iscsi driver will require a portal ID. This is NOT the ID shown in the UI. The most
reliable way (seriously) to get the real ID is to open the network monitor in the browser, reload
TrueNAS, find the websocket connection and click on it, create the portal, and then inspect the
server response. It'll look something like:
```json
{"msg": "added", "collection": "iscsi.portal.query", "id": 7, "fields": {"id": 7, "tag": 1, "comment": "democratic-csi", "listen": [{"ip": "172.20.0.1", "port": 3260}], "discovery_authmethod": "NONE", "discovery_authgroup": null}}
```
The initiator group IDs seem to line up.
It's good practice to have separate hostnames for your share export and your truenas server. This
way you can have a direct link without worrying about changing the user-facing hostname.
For example: your truenas server might be driveripper.reeselink.com and your kube server might be
containers.reeselink.com. You should also have a democratic-csi-server.reeselink.com and a
democratic-csi-client-1.reeselink.com which might be on 172.20.0.1 and 172.20.0.2.
<https://github.com/democratic-csi/democratic-csi>
ISCSI requires a bit of server config before proceeding. Run the following on the kubernetes node.
```bash
# Install the following system packages
sudo dnf install -y lsscsi iscsi-initiator-utils sg3_utils device-mapper-multipath
# Enable multipathing
sudo mpathconf --enable --with_multipathd y
# Ensure that iscsid and multipathd are running
sudo systemctl enable iscsid multipathd
sudo systemctl start iscsid multipathd
# Start and enable iscsi
sudo systemctl enable iscsi
sudo systemctl start iscsi
```
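You can sanity-check the host afterwards:
```bash
# The initiator name the node will present to TrueNAS
cat /etc/iscsi/initiatorname.iscsi
# Both daemons should report "active"
systemctl is-active iscsid multipathd
```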
Now you can install the drivers. Note we won't be using the API drivers for TrueNAS
SCALE. These have intermittent stability issues (especially when deleting
volumes... as in it won't delete volumes). As of 6/13/23 I don't recommend them.
Note: you can switch between driver types after install so there's no risk in using the
stable driver first and then experimenting with the API driver.
Before we begin you'll need to create a new "democratic" user on Truenas. First you should
create an SSH key for the user:
```bash
ssh-keygen -t rsa -N '' -f secrets/democratic_rsa.prod
chmod 600 secrets/democratic_rsa.prod
```
Now in the web console, use the following options:
| Field | Value |
|----------------------------------------|------------------------------------------------|
| Full Name | democratic |
| Username | democratic |
| Email | blank |
| Disable Password | True |
| Create New Primary Group | True |
| Auxiliary Groups | None |
| Create Home Directory | True |
| Authorized Keys | paste the generated ".pub" key here |
| Shell | bash |
| Allowed sudo commands | /usr/sbin/zfs /usr/sbin/zpool /usr/sbin/chroot |
| Allowed sudo commands with no password | /usr/sbin/zfs /usr/sbin/zpool /usr/sbin/chroot |
| Samba Authentication | False |
Save the user and verify SSH works with
```bash
ssh -i secrets/democratic_rsa.prod democratic@driveripper.reeselink.com
# test forbidden sudo command, should require a password
sudo ls
# test allowed sudo command
sudo zfs list
```
Next you'll need an API key. Save it to a file called `secrets/truenas-api-key`:
```bash
echo 'api-key-here' > secrets/truenas-api-key
```
Now we can proceed with the install
```bash
helm repo add democratic-csi https://democratic-csi.github.io/charts/
helm repo update
# enc0 storage (iscsi)
helm upgrade \
--install \
--values democratic-csi/truenas-iscsi-enc0.yaml \
--namespace democratic-csi \
--create-namespace \
--set driver.config.httpConnection.apiKey=$(cat secrets/truenas-api-key) \
zfs-iscsi-enc0 democratic-csi/democratic-csi
# enc1 storage (iscsi)
helm upgrade \
--install \
--values democratic-csi/truenas-iscsi-enc1.yaml \
--namespace democratic-csi \
--create-namespace \
--set driver.config.httpConnection.apiKey=$(cat secrets/truenas-api-key) \
zfs-iscsi-enc1 democratic-csi/democratic-csi
# enc1 storage (nfs)
helm upgrade \
--install \
--values democratic-csi/truenas-nfs-enc1.yaml \
--namespace democratic-csi \
--create-namespace \
--set driver.config.httpConnection.apiKey=$(cat secrets/truenas-api-key) \
zfs-nfs-enc1 democratic-csi/democratic-csi
```
You can test that things worked with:
```bash
kubectl apply -f tests/democratic-csi-pvc-test.yaml
kubectl delete -f tests/democratic-csi-pvc-test.yaml
```
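While the test manifest is applied, you can watch the claim bind and confirm the storage classes registered (class names depend on the values files):
```bash
kubectl get storageclass
# watch the test claim go from Pending to Bound
kubectl get pvc --all-namespaces -w
```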
And run some performance tests. You can use network and disk monitoring tools
to see performance during the tests.
```bash
# Big writes: count how many 100 MB files we can write in 10 seconds
count=0
start_time=$EPOCHREALTIME
while true; do
  dd if=/dev/zero of=test.dat bs=1M count=100 1> /dev/null 2> /dev/null
  elapsed=$(echo "$EPOCHREALTIME - $start_time" | bc)
  time_up=$(echo "$elapsed > 10" | bc)
  if [ "$time_up" -eq 0 ]; then
    count=$((count + 1))
    echo -e '\e[1A\e[K'$count
  else
    break
  fi
done
# Lots of little writes: count how many 1 KB files we can write in 1 second
count=0
start_time=$EPOCHREALTIME
while true; do
  dd if=/dev/zero of=test.dat bs=1K count=1 1> /dev/null 2> /dev/null
  elapsed=$(echo "$EPOCHREALTIME - $start_time" | bc)
  time_up=$(echo "$elapsed > 1" | bc)
  if [ "$time_up" -eq 0 ]; then
    count=$((count + 1))
    echo -e '\e[1A\e[K'$count
  else
    break
  fi
done
```
Because iSCSI mounts block devices, troubleshooting mount issues, data corruption,
and exploring PVC contents must happen on the client device. Here are a few cheat-sheet
commands to make things easier.
Note for iSCSI login: set `node.session.auth.username`, NOT `node.session.auth.username_in`.
```bash
# discover all targets on the server
iscsiadm --mode discovery \
--type sendtargets \
--portal democratic-csi-server.reeselink.com:3260
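# set this to one of the IQNs returned by the discovery above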
export ISCSI_TARGET=
# delete the discovered targets
iscsiadm --mode discovery \
--portal democratic-csi-server.reeselink.com:3260 \
--op delete
# view discovered targets
iscsiadm --mode node
# view current session
iscsiadm --mode session
# prevent automatic login
iscsiadm --mode node \
--portal democratic-csi-server.reeselink.com:3260 \
--op update \
--name node.startup \
--value manual
# connect a target
iscsiadm --mode node \
--login \
--portal democratic-csi-server.reeselink.com:3260 \
--targetname $ISCSI_TARGET
# disconnect a target
# you might have to do this if pods can't mount their volumes.
# manually connecting a target tends to make it unavailable for the pods since there
# will be two targets with the same name.
iscsiadm --mode node \
--logout \
--portal democratic-csi-server.reeselink.com:3260 \
--targetname $ISCSI_TARGET
# view all connected disks
ls /dev/zvol/
# mount a disk
mount -t xfs /dev/zvol/... /mnt/iscsi
# emergency - by-path isn't available
# (look for "Attached scsi disk")
iscsiadm --mode session -P 3 | grep Target -A 2 -B 2
```
## Apps
### Gitea
Note that Gitea tends to change how `values.yaml` is structured.
First we need to create the gitea admin secret
```bash
kubectl create namespace gitea
kubectl create secret generic gitea-admin-secret \
-n gitea \
--from-literal=username='gitea-admin' \
--from-literal=password="$(pwgen -c -s 64 | head -n 1)" \
--from-literal=email=''
```
### Grafana
Grafana has a Kubernetes YAML manifest they prefer you use. See `kubectl/grafana.yaml`.
```bash
kubectl apply -f kubectl/grafana.yaml
```
## Upgrading
### Nodes
```bash
kubectl drain node1 --ignore-daemonsets --delete-emptydir-data
watch -n 3 kubectl get pod --all-namespaces -w
```
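Once maintenance is done, the node presumably goes back into rotation with:
```bash
kubectl uncordon node1
```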
### K3S
#### Automated Upgrades
<https://docs.k3s.io/upgrades/automated>
```bash
kubectl apply -f https://github.com/rancher/system-upgrade-controller/releases/latest/download/system-upgrade-controller.yaml
kubectl apply -f upgrade-plan.yaml
kubectl get pod -w -n system-upgrade
```
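For reference, `upgrade-plan.yaml` isn't shown here, but a server plan along the lines of the k3s docs looks roughly like this (channel and node selector are assumptions; the docs also define an agent plan, which isn't needed when there are no agent-only nodes):
```yaml
apiVersion: upgrade.cattle.io/v1
kind: Plan
metadata:
  name: server-plan
  namespace: system-upgrade
spec:
  concurrency: 1
  cordon: true
  serviceAccountName: system-upgrade
  nodeSelector:
    matchExpressions:
      - {key: node-role.kubernetes.io/control-plane, operator: In, values: ["true"]}
  upgrade:
    image: rancher/k3s-upgrade
  channel: https://update.k3s.io/v1-release/channels/stable
```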
#### Manual Upgrades
<https://docs.k3s.io/upgrades/manual#manually-upgrade-k3s-using-the-binary>
```bash
sudo su -
wget https://github.com/k3s-io/k3s/releases/download/v1.28.3%2Bk3s1/k3s
systemctl stop k3s
chmod +x k3s
mv k3s /usr/local/bin/k3s
systemctl start k3s
```
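Then confirm the new version was picked up:
```bash
k3s --version
kubectl get nodes -o wide
```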
## Create a Userspace
This creates a user, namespace, and permissions with a simple script.
### Quickstart
```bash
# Create certsigner pod for all other operations
./setup.sh <server_fqdn>
# Create a user, use "admin" to create an admin user
./upsertuser.sh <ssh_address> <server_fqdn (for kubectl)> <user>
# Remove a user, their namespace, and their access
./removeuserspace <server_fqdn> <user>
```
### Userspace
#### Namespace
```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: {{ .Release.Name }}
```
#### Roles
```yaml
kind: Role
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: namespace-manager
  namespace: {{ .Release.Name }}
rules:
  - apiGroups:
      - ""
      - extensions
      - apps
      - batch
      - autoscaling
      - networking.k8s.io
      - traefik.containo.us
      - rbac.authorization.k8s.io
      - metrics.k8s.io
    resources:
      - deployments
      - replicasets
      - pods
      - pods/exec
      - pods/log
      - pods/attach
      - daemonsets
      - statefulsets
      - replicationcontrollers
      - horizontalpodautoscalers
      - services
      - ingresses
      - persistentvolumeclaims
      - jobs
      - cronjobs
      - secrets
      - configmaps
      - serviceaccounts
      - rolebindings
      - ingressroutes
      - middlewares
      - endpoints
    verbs:
      - "*"
  - apiGroups:
      - ""
      - metrics.k8s.io
      - rbac.authorization.k8s.io
    resources:
      - resourcequotas
      - roles
    verbs:
      - list
```
#### Rolebinding
```yaml
kind: RoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  namespace: {{ .Release.Name }}
  name: namespace-manager
subjects:
  - kind: User
    name: {{ .Release.Name }}
    apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: Role
  name: namespace-manager
  apiGroup: rbac.authorization.k8s.io
```
### Manual Steps
#### Create a kubernetes certsigner pod
This keeps the client-ca certificate and key in a Secret on the cluster and allows user certs to be signed and stored on the pod.
#### Create the certsigner secret
```bash
kubectl -n kube-system create secret generic certsigner --from-file /var/lib/rancher/k3s/server/tls/client-ca.crt --from-file /var/lib/rancher/k3s/server/tls/client-ca.key
```
#### Set up the certsigner pod
```bash
scp certsigner.yaml <server>:~/certsigner.yaml
kubectl apply -f certsigner.yaml
```
#### Generate a cert
```bash
export USER=<user>
docker run -it -v $(pwd)/users/$USER:/$USER python:latest openssl genrsa -out /$USER/$USER.key 2048
docker run -it -v $(pwd)/users/$USER:/$USER python:latest openssl req -new -key /$USER/$USER.key -out /$USER/$USER.csr -subj "/CN=$USER/O=user"
```
#### Create a new Userspace
```bash
helm template $USER ./namespace | kubectl --context admin apply -f -
```
#### Sign the cert
```bash
export USER=<user>
kubectl --context admin cp $(pwd)/users/$USER/$USER.csr certsigner:/certs/$USER.csr
kubectl --context admin exec -it certsigner -- openssl x509 -in /certs/$USER.csr -req -CA /keys/client-ca.crt -CAkey /keys/client-ca.key -CAcreateserial -out /certs/$USER.crt -days 5000
kubectl --context admin cp certsigner:/certs/$USER.crt $(pwd)/users/$USER/$USER.crt
```
#### Add to the config
```bash
kubectl config set-credentials $USER --client-certificate=$USER.crt --client-key=$USER.key
kubectl config set-context $USER --cluster=mainframe --namespace=$USER --user=$USER
```
#### Delete
```bash
kubectl config delete-context $USER
helm template $USER ./namespace | kubectl --context admin delete -f -
```
### Signing a user cert - detailed notes
NOTE: ca.crt and ca.key are in /var/lib/rancher/k3s/server/tls/client-ca.*
```bash
# First we create the credentials
# /CN=<username> - the user
# /O=<group> - the group
# Navigate to the user directory
export USER=<username>
cd $USER
# Generate a private key
openssl genrsa -out $USER.key 2048
# Check the key
# openssl pkey -in $USER.key -noout -text
# Generate and send me the CSR
# The "user" group is my default group
openssl req -new -key $USER.key -out $USER.csr -subj "/CN=$USER/O=user"
# Check the CSR
# openssl req -in $USER.csr -noout -text
# If satisfactory, sign the CSR
# Copy from /var/lib/rancher/k3s/server/tls/client-ca.crt and client-ca.key
openssl x509 -req -in $USER.csr -CA ../client-ca.crt -CAkey ../client-ca.key -CAcreateserial -out $USER.crt -days 5000
# Review the certificate
# openssl x509 -in $USER.crt -text -noout
# Send back the crt
# cp $USER.crt $USER.key ../server-ca.crt ~/.kube/
kubectl config set-credentials $USER --client-certificate=$USER.crt --client-key=$USER.key
kubectl config set-context $USER --cluster=mainframe --namespace=$USER --user=$USER
# Now we create the namespace, rolebindings, and resource quotas
# kubectl apply -f k8s/
# Add the cluster to the kubeconfig
# CA file can be found at https://3.14.3.100:6443/cacerts
kubectl config set-cluster mainframe \
--server=https://3.14.3.100:6443 \
--certificate-authority=server-ca.crt
# Test if everything worked
kubectl --context=$USER get pods
```
## Help
### Troubleshooting
Deleting a stuck namespace
```bash
NAMESPACE=nginx
kubectl proxy &
kubectl get namespace $NAMESPACE -o json |jq '.spec = {"finalizers":[]}' >temp.json
curl -k -H "Content-Type: application/json" -X PUT --data-binary @temp.json 127.0.0.1:8001/api/v1/namespaces/$NAMESPACE/finalize
```
Fixing a bad volume
```bash
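# -L zeroes the XFS log and can lose the most recent metadata changes;
# only use it when the log cannot be replayed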
xfs_repair -L /dev/sdg
```
Mounting an ix-application volume from truenas
```bash
# set the mountpoint
zfs set mountpoint=/ix_pvc enc1/ix-applications/releases/gitea/volumes/pvc-40e27277-71e3-4469-88a3-a39f53435a8b
#"unset" the mountpoint (back to legacy)
zfs set mountpoint=legacy enc1/ix-applications/releases/gitea/volumes/pvc-40e27277-71e3-4469-88a3-a39f53435a8b
```
Mounting a volume
```bash
# mount
mount -t xfs /dev/zvol/enc0/dcsi/apps/pvc-d5090258-cf20-4f2e-a5cf-330ac00d0049 /mnt/dcsi_pvc
# unmount
umount /mnt/dcsi_pvc
```