
Containers

A project to store container-based hosting stuff.


Platform

Before you begin, be sure to take a look at the Fedora Server Config readme, which explains how to set up a basic Fedora server hosting platform with certbot.

Components

CoreDNS

We'll use our own CoreDNS server so we can add custom hosts. This prevents the server from collapsing if the internet drops out (something that apparently happens quite frequently).

helm repo add coredns https://coredns.github.io/helm
helm repo update
helm upgrade --install \
    --namespace=kube-system \
    --values coredns-values.yaml \
    coredns \
    coredns/coredns

You can test your dns config with

kubectl run -it --rm --restart=Never --image=infoblox/dnstools:latest dnstools
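The values file itself isn't reproduced here. As a rough sketch, assuming the upstream chart's servers/plugins values layout, a hypothetical render_coredns_values function could generate one that answers custom records through CoreDNS's hosts plugin and forwards everything else upstream. The host entries you pass in are placeholders.

```shell
# Print a coredns-values.yaml that serves custom host records via the CoreDNS
# "hosts" plugin, then falls through to the normal plugins.
# Usage: render_coredns_values "<ip> <hostname>" ["<ip> <hostname>" ...]
render_coredns_values() {
    cat <<'EOF'
servers:
- zones:
  - zone: .
  port: 53
  plugins:
  - name: errors
  # health and ready back the chart's liveness/readiness probes
  - name: health
  - name: ready
  - name: kubernetes
    parameters: cluster.local in-addr.arpa ip6.arpa
    configBlock: |-
      pods insecure
      fallthrough in-addr.arpa ip6.arpa
  - name: hosts
    configBlock: |-
EOF
    # one "<ip> <hostname>" line per custom record
    local entry
    for entry in "$@"; do
        printf '      %s\n' "$entry"
    done
    # names not listed above continue on to forward/cache
    cat <<'EOF'
      fallthrough
  - name: forward
    parameters: . /etc/resolv.conf
  - name: cache
    parameters: 30
  - name: loop
  - name: reload
EOF
}
```

For example, render_coredns_values "192.168.122.20 containers.reeselink.com" > /tmp/coredns-values.yaml, then diff it against the repo's coredns-values.yaml before using it.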

Metal LB

We'll be swapping K3S's default load balancer (ServiceLB) for MetalLB for more flexibility. ServiceLB was struggling to allocate IP addresses for load-balanced services. MetalLB does make things a little more complicated (you'll need special annotations, see below), but it's otherwise a well-tested, stable load balancing service with features to grow into.

MetalLB is pretty cool. It works via L2 advertisement or BGP. We won't be using BGP, so let's focus on L2.

When we connect our nodes to a network we give them an IP address range, e.g. 192.168.122.20/24. This range represents all the addresses the node could be assigned. Usually we assign a single "static" IP address to our node and direct traffic to it by port forwarding from our router. This is fine for a single node, but what if we have a cluster of nodes and we don't want our service to disappear just because one node is down for maintenance?

This is where L2 advertising comes in. MetalLB assigns a static IP address from a given pool to an arbitrary node, then advertises that node's MAC address as the location of the IP. When that node goes down, MetalLB simply advertises a new MAC address for the same IP, effectively moving the IP to another node. This isn't really "load balancing" but "failover". Fortunately, that's exactly what we're looking for.

helm repo add metallb https://metallb.github.io/metallb
helm repo update
helm upgrade --install metallb \
    --namespace metallb \
    --create-namespace \
    metallb/metallb

MetalLB doesn't know which IP addresses are available for it to allocate, so we have to provide it with a list. metallb-addresspool.yaml lists a single IP address (we'll get to IP address sharing in a second): an unassigned address that isn't allocated to any of our nodes. Note: if you have many public IPs that all point to the same router or virtual network, you can list them all. We're only using one because we want to port forward to it from our router.

# create the metallb allocation pool
kubectl apply -f metallb-addresspool.yaml
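For reference, a minimal sketch of the address pool, assuming MetalLB's metallb.io/v1beta1 IPAddressPool resource and the metallb namespace used in the install above (MetalLB only reads these resources from its own namespace). render_addresspool is a hypothetical helper and the IP you pass is a placeholder for your own unassigned address.

```shell
# Print a MetalLB IPAddressPool named "production" holding exactly one address.
# Usage: render_addresspool <ip> [namespace]
render_addresspool() {
    local ip=$1 ns=${2:-metallb}
    cat <<EOF
apiVersion: metallb.io/v1beta1
kind: IPAddressPool
metadata:
  name: production
  namespace: ${ns}
spec:
  addresses:
  # /32 = a single address; it must not be assigned to any node
  - ${ip}/32
EOF
}
```

Usage: render_addresspool 192.168.122.240 | kubectl apply -f -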

Now we need to create the L2 advertisement. This is handled with a custom resource (L2Advertisement) which specifies that all listed nodes are eligible to be assigned, and to advertise, our "production" IP addresses.

kubectl apply -f metallb-l2advertisement.yaml
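A sketch of the matching advertisement, again assuming the metallb.io/v1beta1 API. The hypothetical render_l2advertisement helper announces the "production" pool from every node unless you pass node hostnames, in which case only those nodes are eligible.

```shell
# Print a MetalLB L2Advertisement for the "production" pool.
# No arguments: any node may announce the IP.
# Usage: render_l2advertisement [node-hostname ...]
render_l2advertisement() {
    cat <<'EOF'
apiVersion: metallb.io/v1beta1
kind: L2Advertisement
metadata:
  name: production
  namespace: metallb
spec:
  ipAddressPools:
  - production
EOF
    # optionally restrict announcements to the named nodes
    if [ $# -gt 0 ]; then
        echo "  nodeSelectors:"
        local node
        for node in "$@"; do
            printf '  - matchLabels:\n      kubernetes.io/hostname: %s\n' "$node"
        done
    fi
}
```

Usage: render_l2advertisement node1 node2 | kubectl apply -f -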

We now have a problem. We only have a single production IP address and MetalLB really doesn't want to share it. In order to allow services to allocate the same IP address (on different ports) we'll need to annotate them as such. MetalLB will allow services to allocate the same IP if:

  • They both have the same sharing key.
  • They request the use of different ports (e.g. tcp/80 for one and tcp/443 for the other).
  • They both use the Cluster external traffic policy, or they both point to the exact same set of pods (i.e. the pod selectors are identical).

See https://metallb.org/usage/#ip-address-sharing for more info.

You'll need to annotate your service as follows if you want an external IP:

apiVersion: v1
kind: Service
metadata:
  name: {{ .Release.Name }}
  annotations:
    metallb.universe.tf/address-pool: "production"
    metallb.universe.tf/allow-shared-ip: "production"
spec:
  externalTrafficPolicy: Cluster
  selector:
    app: {{ .Release.Name }}
  ports:
  - port: {{ .Values.ports.containerPort }}
    targetPort: {{ .Values.ports.targetPort }}
    name: {{ .Release.Name }}
  type: LoadBalancer

Nginx Ingress

Now we need an ingress solution (preferably with certs for HTTPS). We'll be using nginx since it's a little more configurable than Traefik. Don't sell Traefik short, though: it's really good, just finicky when you have use cases it hasn't explicitly been coded for.
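Step 1 below installs the chart with ingress-nginx-values.yaml, which isn't shown here. Because the controller's LoadBalancer service has to share the single MetalLB IP on ports 80/443, that file presumably carries the same annotations as the service example above. A minimal sketch, using the chart's controller.service settings; render_ingress_nginx_values is a hypothetical helper.

```shell
# Print ingress-nginx chart values that put the controller's LoadBalancer
# service on the shared MetalLB "production" IP.
render_ingress_nginx_values() {
    cat <<'EOF'
controller:
  service:
    type: LoadBalancer
    # must match the other services sharing the IP (see the rules above)
    externalTrafficPolicy: Cluster
    annotations:
      metallb.universe.tf/address-pool: production
      metallb.universe.tf/allow-shared-ip: production
EOF
}
```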

  1. Install nginx

    helm repo add ingress-nginx https://kubernetes.github.io/ingress-nginx
    helm repo update
    helm upgrade --install \
        ingress-nginx \
        ingress-nginx/ingress-nginx \
        --values ingress-nginx-values.yaml \
        --namespace ingress-nginx \
        --create-namespace
    
  2. Install cert-manager

    helm repo add jetstack https://charts.jetstack.io
    helm repo update
    helm upgrade --install \
        cert-manager jetstack/cert-manager \
        --namespace cert-manager \
        --create-namespace \
        --version v1.12.4 \
        --set installCRDs=true
    
  3. Create the let's encrypt issuer

    kubectl apply -f letsencrypt-issuer.yaml
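The issuer file isn't shown either. A minimal sketch, assuming a cluster-wide ClusterIssuer named "letsencrypt" that answers HTTP-01 challenges through the nginx ingress class; the helper name, issuer name, and email are placeholders.

```shell
# Print a cert-manager ClusterIssuer for Let's Encrypt (production endpoint)
# that solves HTTP-01 challenges via the nginx ingress class.
# Usage: render_letsencrypt_issuer <email> [issuer-name]
render_letsencrypt_issuer() {
    local email=$1 name=${2:-letsencrypt}
    cat <<EOF
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: ${name}
spec:
  acme:
    server: https://acme-v02.api.letsencrypt.org/directory
    email: ${email}
    # cert-manager stores the ACME account key in this secret
    privateKeySecretRef:
      name: ${name}-account-key
    solvers:
    - http01:
        ingress:
          class: nginx
EOF
}
```

Ingresses would then request a certificate with the annotation cert-manager.io/cluster-issuer: letsencrypt plus a tls section. While experimenting, pointing server at https://acme-staging-v02.api.letsencrypt.org/directory avoids Let's Encrypt's production rate limits.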
    

You can test whether your ingress is working with kubectl apply -f ingress-nginx-test.yaml

Then navigate to https://ingress-nginx-test.reeseapps.com

Storage

https://github.com/democratic-csi/democratic-csi/blob/master/examples/freenas-nfs.yaml

Use NFSv4. It works without rpcbind, which makes it lovely.

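We'll be installing democratic-csi as our volume manager, using the SSH-based freenas-nfs and freenas-iscsi drivers rather than the API drivers (see the note on API drivers below). Each storage class gets its own values file (truenas-nfs-enc1.yaml, truenas-iscsi-enc0.yaml, truenas-iscsi-enc1.yaml).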

The NFS driver will provision an NFS store owned by user 3000 (kube). You may have to create that user on TrueNAS. The NFS share it creates will be world read/write, so any user can write to it. Users that write to the share keep their uid/gid on TrueNAS, so if user 33 writes a file to the NFS share, it shows up as owned by user 33 on TrueNAS.

The iSCSI driver requires a portal ID. This is NOT the ID shown in the UI. The most reliable way (seriously) to get the real ID is to open the network monitor in your browser's dev tools, reload the TrueNAS web UI, find the websocket connection and click on it, then create the portal and click on the server response. It'll look something like:

{"msg": "added", "collection": "iscsi.portal.query", "id": 7, "fields": {"id": 7, "tag": 1, "comment": "democratic-csi", "listen": [{"ip": "172.20.0.1", "port": 3260}], "discovery_authmethod": "NONE", "discovery_authgroup": null}}

Initiator group IDs, on the other hand, seem to match what the UI shows.

It's good practice to have separate hostnames for your share export and your TrueNAS server. This way you can have a direct storage link without worrying about changing the user-facing hostname. For example, your TrueNAS server might be driveripper.reeselink.com and your kube server might be containers.reeselink.com. You should also have democratic-csi-server.reeselink.com and democratic-csi-client-1.reeselink.com, which might be on 172.20.0.1 and 172.20.0.2.

https://github.com/democratic-csi/democratic-csi

iSCSI requires a bit of node configuration before proceeding. Run the following on each Kubernetes node.

# Install the following system packages
sudo dnf install -y lsscsi iscsi-initiator-utils sg3_utils device-mapper-multipath

# Enable multipathing
sudo mpathconf --enable --with_multipathd y

# Ensure that iscsid and multipathd are running
sudo systemctl enable iscsid multipathd
sudo systemctl start iscsid multipathd

# Start and enable iscsi
sudo systemctl enable iscsi
sudo systemctl start iscsi

Now you can install the drivers. Note that we won't be using the API drivers for TrueNAS SCALE. They have intermittent stability issues, especially when deleting volumes (as in, they won't delete volumes). As of 6/13/23 I don't recommend them.

Note: you can switch between driver types after install so there's no risk in using the stable driver first and then experimenting with the API driver.

Before we begin, you'll need to create a new "democratic" user on TrueNAS. First, create an SSH key for the user:

ssh-keygen -t rsa -N '' -f secrets/democratic_rsa.prod
chmod 600 secrets/democratic_rsa.prod

Now in the web console, use the following options:

Field                                   Value
Full Name                               democratic
Username                                democratic
Email                                   (blank)
Disable Password                        True
Create New Primary Group                True
Auxiliary Groups                        None
Create Home Directory                   True
Authorized Keys                         paste the generated ".pub" key here
Shell                                   bash
Allowed sudo commands                   /usr/sbin/zfs /usr/sbin/zpool /usr/sbin/chroot
Allowed sudo commands with no password  /usr/sbin/zfs /usr/sbin/zpool /usr/sbin/chroot
Samba Authentication                    False

Save the user and verify SSH works with

ssh -i secrets/democratic_rsa.prod democratic@driveripper.reeselink.com
# test forbidden sudo command, should require a password
sudo ls
# test allowed sudo command
sudo zfs list

Next, you'll need a TrueNAS API key. Save it to a file called secrets/truenas-api-key:

echo 'api-key-here' > secrets/truenas-api-key

Now we can proceed with the install

helm repo add democratic-csi https://democratic-csi.github.io/charts/
helm repo update

# enc0 storage (iscsi)
helm upgrade \
--install \
--values truenas-iscsi-enc0.yaml \
--namespace democratic-csi \
--create-namespace \
--set-file driver.config.sshConnection.privateKey=secrets/democratic_rsa.prod \
--set driver.config.httpConnection.apiKey="$(cat secrets/truenas-api-key)" \
zfs-iscsi-enc0 democratic-csi/democratic-csi

# enc1 storage (iscsi)
helm upgrade \
--install \
--values truenas-iscsi-enc1.yaml \
--namespace democratic-csi \
--create-namespace \
--set-file driver.config.sshConnection.privateKey=secrets/democratic_rsa.prod \
--set driver.config.httpConnection.apiKey="$(cat secrets/truenas-api-key)" \
zfs-iscsi-enc1 democratic-csi/democratic-csi

# enc1 storage (nfs)
helm upgrade \
--install \
--values truenas-nfs-enc1.yaml \
--namespace democratic-csi \
--create-namespace \
--set-file driver.config.sshConnection.privateKey=secrets/democratic_rsa.prod \
--set driver.config.httpConnection.apiKey="$(cat secrets/truenas-api-key)" \
zfs-nfs-enc1 democratic-csi/democratic-csi

You can test that things worked with:

kubectl apply -f tests/democratic-csi-pvc-test.yaml
kubectl delete -f tests/democratic-csi-pvc-test.yaml
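If you want a test aimed at one particular storage class, here's a sketch of a PVC plus a pod that writes to it. The helper name is hypothetical, and it assumes the storage classes are named after the releases above (zfs-nfs-enc1 is the one used in the Nextcloud test deploy).

```shell
# Print a PVC and a pod that mounts it and writes a file, to check that a
# democratic-csi storage class can provision and attach a volume.
# Usage: render_pvc_test <storage-class> [size]
render_pvc_test() {
    local sc=$1 size=${2:-1Gi}
    cat <<EOF
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: pvc-test-${sc}
spec:
  storageClassName: ${sc}
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: ${size}
---
apiVersion: v1
kind: Pod
metadata:
  name: pvc-test-${sc}
spec:
  restartPolicy: Never
  containers:
  - name: writer
    image: busybox
    # write to the volume and read it back
    command: ["sh", "-c", "echo ok > /data/ok && cat /data/ok"]
    volumeMounts:
    - name: data
      mountPath: /data
  volumes:
  - name: data
    persistentVolumeClaim:
      claimName: pvc-test-${sc}
EOF
}
```

Apply it with render_pvc_test zfs-nfs-enc1 | kubectl apply -f -, check that kubectl logs pvc-test-zfs-nfs-enc1 prints ok once the pod completes, then clean up with the same pipeline into kubectl delete -f -.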

And run some performance tests. You can use network and disk monitoring tools to see performance during the tests.

# Big writes: count how many 100 MiB writes finish in 10 seconds
count=0
start_time=$EPOCHREALTIME
while true; do
    # conv=fdatasync flushes to storage so the page cache doesn't inflate the count
    dd if=/dev/zero of=test.dat bs=1M count=100 conv=fdatasync > /dev/null 2>&1
    elapsed=$(echo "$EPOCHREALTIME - $start_time" | bc)
    if [ "$(echo "$elapsed < 10" | bc)" -eq 1 ]; then
        count=$((count + 1))
        # overwrite the previous line with the running count
        echo -e '\e[1A\e[K'$count
    else
        break
    fi
done

# Lots of little writes: count how many 1 KiB writes finish in 1 second
count=0
start_time=$EPOCHREALTIME
while true; do
    dd if=/dev/zero of=test.dat bs=1K count=1 conv=fdatasync > /dev/null 2>&1
    elapsed=$(echo "$EPOCHREALTIME - $start_time" | bc)
    if [ "$(echo "$elapsed < 1" | bc)" -eq 1 ]; then
        count=$((count + 1))
        echo -e '\e[1A\e[K'$count
    else
        break
    fi
done
rm -f test.dat
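The same measurement can be wrapped in a reusable function that also reports approximate throughput. This is a sketch: bench_writes is a hypothetical name, it uses bash's SECONDS timer (1-second resolution) instead of $EPOCHREALTIME and bc, and it forces each write to storage with conv=fdatasync.

```shell
# Repeat a fixed-size dd write for a time window, then print the number of
# completed writes and the approximate throughput in KiB/s.
# Usage: bench_writes <block-KiB> <blocks-per-write> <seconds> [file]
bench_writes() {
    local bs_kib=$1 blocks=$2 window=$3 file=${4:-test.dat}
    local n=0
    # SECONDS is bash's built-in timer; assigning to it restarts the count
    SECONDS=0
    while (( SECONDS < window )); do
        # conv=fdatasync makes dd wait until the data reaches storage
        dd if=/dev/zero of="$file" bs="${bs_kib}K" count="$blocks" \
            conv=fdatasync 2>/dev/null || return 1
        n=$((n + 1))
    done
    rm -f "$file"
    echo "writes=$n KiB/s=$(( n * bs_kib * blocks / window ))"
}
```

Run it from inside a mounted volume, e.g. bench_writes 1024 100 10 for the big-write case and bench_writes 1 1 10 for small writes. Because the last write may start just before the window closes, treat the KiB/s figure as approximate.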

Because iSCSI mounts block devices, troubleshooting mount issues and data corruption, as well as exploring PVC contents, must happen on the client device. Here are a few cheat-sheet commands to make things easier:

Note for iSCSI logins with CHAP: set node.session.auth.username, NOT node.session.auth.username_in.

# discover all targets on the server
iscsiadm --mode discovery \
    --type sendtargets \
    --portal democratic-csi-server.reeselink.com:3260

# set this to one of the target names (IQNs) printed by discovery
export ISCSI_TARGET=

# delete the discovered targets
iscsiadm --mode discovery \
    --portal democratic-csi-server.reeselink.com:3260 \
    --op delete

# view discovered targets
iscsiadm --mode node

# view current session
iscsiadm --mode session

# prevent automatic login
iscsiadm --mode node \
    --portal democratic-csi-server.reeselink.com:3260 \
    --op update \
    --name node.startup \
    --value manual

# connect a target
iscsiadm --mode node \
    --login \
    --portal democratic-csi-server.reeselink.com:3260 \
    --targetname $ISCSI_TARGET

# disconnect a target
# you might have to do this if pods can't mount their volumes.
# manually connecting a target tends to make it unavailable for the pods since there
# will be two targets with the same name.
iscsiadm --mode node \
    --logout \
    --portal democratic-csi-server.reeselink.com:3260 \
    --targetname $ISCSI_TARGET

# view all connected disks
ls /dev/zvol/

# mount a disk
mount -t xfs /dev/zvol/... /mnt/iscsi

# emergency - by-path isn't available
# (look for "Attached scsi disk")
iscsiadm --mode session -P 3 | grep Target -A 2 -B 2
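Picking ISCSI_TARGET out of the discovery output by hand gets old. Sendtargets discovery prints one "<portal>,<tpgt> <target-iqn>" line per target, so a small filter can pull out the IQNs. iscsi_target_names is a hypothetical helper; the filter string is whatever identifies your volume, such as part of the PVC name.

```shell
# Read `iscsiadm --mode discovery --type sendtargets` output on stdin and
# print the target IQNs (second field), optionally keeping only those that
# contain a fixed string.
# Usage: iscsiadm --mode discovery ... | iscsi_target_names [filter]
iscsi_target_names() {
    local filter=${1:-}
    # an empty filter matches every line
    awk '{ print $2 }' | grep -F -- "$filter"
}
```

For example: export ISCSI_TARGET=$(iscsiadm --mode discovery --type sendtargets --portal democratic-csi-server.reeselink.com:3260 | iscsi_target_names pvc-d5090258 | head -n 1)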

Apps

Dashboard

The kubernetes dashboard isn't all that useful but it can sometimes give you a good visual breakdown when things are going wrong. It's sometimes faster than running kubectl get commands over and over.

Create the dashboard and an admin user with:

helm upgrade \
--install \
--namespace kubernetes-dashboard \
--create-namespace \
dashboard-user ./helm/dashboard-user

kubectl apply -f https://raw.githubusercontent.com/kubernetes/dashboard/v2.7.0/aio/deploy/recommended.yaml

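Then log in with the following. Once the proxy is running, open http://localhost:8001/api/v1/namespaces/kubernetes-dashboard/services/https:kubernetes-dashboard:/proxy/ and paste in the token: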

kubectl -n kubernetes-dashboard create token admin-user
kubectl proxy

Nextcloud

The first chart we'll deploy is nextcloud. This is a custom chart because Nextcloud doesn't support helm installation natively (yet). There is a native Docker image and really detailed installation instructions so we can pretty easily piece together what's required.

This image runs the nextcloud cron job automatically and creates random secrets for all infrastructure - very helpful for a secure deployment, not very helpful for migrating clusters. You'll want to export the secrets and save them in a secure location.

helm upgrade --install \
    nextcloud \
    ./helm/nextcloud \
    --namespace nextcloud \
    --create-namespace
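Since the chart generates its secrets, here's a sketch for exporting them as suggested above. backup_namespace_secrets is a hypothetical helper; it writes every secret in the namespace (including Helm's release secrets) to a file only you can read.

```shell
# Save every secret in a namespace to a YAML file with 0600 permissions and
# print the file name.
# Usage: backup_namespace_secrets <namespace> [outfile]
backup_namespace_secrets() {
    local ns=$1
    local out=${2:-"${ns}-secrets-$(date +%Y%m%d).yaml"}
    # umask 077 in a subshell: the file is created private before any
    # secret data is written to it
    ( umask 077 && kubectl get secrets --namespace "$ns" -o yaml > "$out" ) || return 1
    echo "$out"
}
```

Usage: backup_namespace_secrets nextcloud. The export contains cluster-specific metadata such as uid and resourceVersion, which you'll likely need to strip before applying it to a new cluster.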

Need to add lots of files? Copy them into the user's data directory, then run the following inside the container as www-data:

./occ files:scan --all

Set up Amazon SES for outgoing email with the following link:

https://docs.aws.amazon.com/general/latest/gr/ses.html

To upgrade you'll need to:

  1. Apply the new image in values.yaml

  2. Exec into the container and run the following:

    su -s /bin/bash www-data
    ./occ upgrade
    ./occ maintenance:mode --off
    

See https://docs.nextcloud.com/server/latest/admin_manual/maintenance/upgrade.html#maintenance-mode for more information.

Test Deploy

You can render a test deployment with the following (drop --dry-run to actually install it):

helm upgrade --install nextcloud ./helm/nextcloud \
    --namespace nextcloud-test \
    --create-namespace \
    --set nextcloud.domain=nextcloud-test.reeseapps.com \
    --set nextcloud.html.storageClassName=zfs-nfs-enc1 \
    --set nextcloud.html.storage=8Gi \
    --set nextcloud.data.storageClassName=zfs-nfs-enc1 \
    --set nextcloud.data.storage=8Gi \
    --set postgres.storageClassName=zfs-nfs-enc1 \
    --set postgres.storage=8Gi \
    --set redis.storageClassName=zfs-nfs-enc1 \
    --set redis.storage=8Gi \
    --set show_passwords=true \
    --dry-run

Gitea

Gitea provides an official helm chart (repo: https://dl.gitea.io/charts/). We're not going to modify much, but we are going to pin some of the default values in case they change upstream. This is the first chart (besides ingress-nginx) where we need to pay attention to the MetalLB annotations; they're set in gitea-values.yaml.

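First, create the gitea namespace and the gitea admin secret (fill in your own values). The secret has to live in the same namespace as the release:

kubectl create namespace gitea
kubectl create secret generic gitea-admin-secret \
    --namespace gitea \
    --from-literal=username='' \
    --from-literal=password='' \
    --from-literal=email=''
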
helm repo add gitea-charts https://dl.gitea.io/charts/
helm repo update
helm upgrade --install \
    gitea \
    gitea-charts/gitea \
    --values gitea-values.yaml \
    --namespace gitea \
    --create-namespace

If you need to back up (and restore) your database you can run:

# Backup (no -t: a TTY would mangle the dump's line endings)
kubectl exec -n gitea gitea-postgresql-0 -- \
    pg_dump \
    --no-owner \
    --dbname=postgresql://gitea:gitea@localhost:5432 > gitea_backup.db

# Take gitea down to zero pods
kubectl scale statefulset gitea --namespace gitea --replicas 0

# Drop and recreate the existing database
kubectl exec -it -n gitea gitea-postgresql-0 -- psql -U gitea

\c postgres;
drop database gitea;
CREATE DATABASE gitea WITH OWNER gitea TEMPLATE template0 ENCODING UTF8 LC_COLLATE 'en_US.UTF-8' LC_CTYPE 'en_US.UTF-8';
exit

# Restore from backup (-i passes the local file through stdin)
kubectl exec -i -n gitea gitea-postgresql-0 -- \
    psql \
    postgresql://gitea:gitea@localhost:5432 gitea < gitea_backup.db

# Restore gitea to 1 pod
kubectl scale statefulset gitea --namespace gitea --replicas 1

Minecraft

Minecraft is available through the custom helm chart (including a server downloader). The example below installs nimcraft. For each installation you'll want to create your own values.yaml with a new port. The server downloader is called "minecraft_get_server" and is available on GitHub.

Nimcraft

helm upgrade --install \
    nimcraft \
    ./helm/minecraft \
    --namespace nimcraft \
    --create-namespace

Testing

helm upgrade --install \
    testcraft \
    ./helm/minecraft \
    --namespace testcraft \
    --create-namespace \
    --set port=25566

Snapdrop

Snapdrop is a file sharing app that provides AirDrop-like functionality over the web.

helm upgrade --install \
    snapdrop \
    ./helm/snapdrop \
    --namespace snapdrop \
    --create-namespace

Jellyfin

This assumes you have a media NFS share.

helm upgrade --install \
    jellyfin \
    ./helm/jellyfin \
    --namespace jellyfin \
    --create-namespace

Iperf3

This creates a basic iperf3 server.

helm upgrade --install \
    iperf3 \
    ./helm/iperf3 \
    --namespace iperf3 \
    --create-namespace

Upgrading

Nodes

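# move workloads off the node before maintenance
kubectl drain node1 --ignore-daemonsets --delete-emptydir-data
# watch pods get rescheduled (watch re-runs the command, so no -w)
watch -n 3 kubectl get pod --all-namespaces
# once the node is back, allow pods to be scheduled on it again
kubectl uncordon node1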

K3S

Automated Upgrades

https://docs.k3s.io/upgrades/automated

kubectl apply -f https://github.com/rancher/system-upgrade-controller/releases/latest/download/system-upgrade-controller.yaml
kubectl apply -f upgrade-plan.yaml
kubectl get pod -w -n system-upgrade
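upgrade-plan.yaml isn't reproduced here. A sketch of a server plan, modeled on the example in the k3s automated upgrade docs linked above; render_k3s_server_plan is a hypothetical helper and the channel defaults to stable.

```shell
# Print a system-upgrade-controller Plan that upgrades k3s server nodes one
# at a time (cordoning each first) to the latest release on a channel.
# Usage: render_k3s_server_plan [channel]
render_k3s_server_plan() {
    local channel=${1:-stable}
    cat <<EOF
apiVersion: upgrade.cattle.io/v1
kind: Plan
metadata:
  name: server-plan
  namespace: system-upgrade
spec:
  concurrency: 1
  cordon: true
  nodeSelector:
    matchExpressions:
    - key: node-role.kubernetes.io/control-plane
      operator: In
      values:
      - "true"
  serviceAccountName: system-upgrade
  upgrade:
    image: rancher/k3s-upgrade
  channel: https://update.k3s.io/v1-release/channels/${channel}
EOF
}
```

Agent (worker) nodes need a separate plan; the k3s docs pair it with a prepare step that waits for the server plan to finish.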

Manual Upgrades

https://docs.k3s.io/upgrades/manual#manually-upgrade-k3s-using-the-binary

sudo su -
wget https://github.com/k3s-io/k3s/releases/download/v1.28.3%2Bk3s1/k3s
systemctl stop k3s
chmod +x k3s
mv k3s /usr/local/bin/k3s
systemctl start k3s
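The download URL changes with every release, and the "+" in k3s version tags has to be encoded as %2B, as in the URL above. A hypothetical helper to build it:

```shell
# Build the k3s binary download URL for a release tag such as v1.28.3+k3s1,
# encoding "+" as %2B.
k3s_binary_url() {
    local version=$1
    echo "https://github.com/k3s-io/k3s/releases/download/${version//+/%2B}/k3s"
}
```

Define it in the root shell opened by sudo su -, then run wget "$(k3s_binary_url v1.28.3+k3s1)".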

Create a Userspace

This creates a user, namespace, and permissions with a simple script.

Quickstart

# Create certsigner pod for all other operations
./setup.sh <server_fqdn>

# Create a user, use "admin" to create an admin user
./upsertuser.sh <ssh_address> <server_fqdn (for kubectl)> <user>

# Remove a user, their namespace, and their access
./removeuserspace <server_fqdn> <user>

Userspace

Namespace

apiVersion: v1
kind: Namespace
metadata:
  name: {{ .Release.Name }}

Roles

kind: Role
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: namespace-manager
  namespace: {{ .Release.Name }}
rules:
- apiGroups:
    - ""
    - extensions
    - apps
    - batch
    - autoscaling
    - networking.k8s.io
    - traefik.containo.us
    - rbac.authorization.k8s.io
    - metrics.k8s.io
  resources: 
    - deployments
    - replicasets
    - pods
    - pods/exec
    - pods/log
    - pods/attach
    - daemonsets
    - statefulsets
    - replicationcontrollers
    - horizontalpodautoscalers
    - services
    - ingresses
    - persistentvolumeclaims
    - jobs
    - cronjobs
    - secrets
    - configmaps
    - serviceaccounts
    - rolebindings
    - ingressroutes
    - middlewares
    - endpoints
  verbs: 
    - "*"
- apiGroups:
    - ""
    - metrics.k8s.io
    - rbac.authorization.k8s.io
  resources:
    - resourcequotas
    - roles
  verbs:
    - list

Rolebinding

kind: RoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  namespace: {{ .Release.Name }}
  name: namespace-manager
subjects:
- kind: User
  name: {{ .Release.Name }}
  apiGroup: rbac.authorization.k8s.io
roleRef:
  # binds the namespaced Role defined above
  kind: Role
  name: namespace-manager
  apiGroup: rbac.authorization.k8s.io

Manual Steps

Create a kubernetes certsigner pod

This keeps the client CA cert and key in a secret, and lets user certs be signed and stored on the pod.

Create the certsigner secret

kubectl -n kube-system create secret generic certsigner --from-file /var/lib/rancher/k3s/server/tls/client-ca.crt --from-file /var/lib/rancher/k3s/server/tls/client-ca.key

Set up the certsigner pod

scp certsigner.yaml <server>:~/certsigner.yaml
kubectl apply -f certsigner.yaml

Generate a cert

export USER=<user>
docker run -it -v $(pwd)/users/$USER:/$USER python:latest openssl genrsa -out /$USER/$USER.key 2048
docker run -it -v $(pwd)/users/$USER:/$USER python:latest openssl req -new -key /$USER/$USER.key -out /$USER/$USER.csr -subj "/CN=$USER/O=user"

Create a new Userspace

helm template $USER ./namespace | kubectl --context admin apply -f -

Sign the cert

export USER=<user>
kubectl --context admin cp $(pwd)/users/$USER/$USER.csr certsigner:/certs/$USER.csr
kubectl --context admin exec -it certsigner -- openssl x509 -in /certs/$USER.csr -req -CA /keys/client-ca.crt -CAkey /keys/client-ca.key -CAcreateserial -out /certs/$USER.crt -days 5000
kubectl --context admin cp certsigner:/certs/$USER.crt $(pwd)/users/$USER/$USER.crt

Add to the config

kubectl config set-credentials $USER --client-certificate=users/$USER/$USER.crt --client-key=users/$USER/$USER.key --embed-certs=true
kubectl config set-context $USER --cluster=mainframe --namespace=$USER --user=$USER
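The context above points at a cluster named mainframe, which has to exist in your kubeconfig. A sketch of adding it, assuming the k3s server exposes its CA bundle at /cacerts (as noted in the detailed notes below); add_mainframe_cluster is a hypothetical helper.

```shell
# Fetch the k3s server CA and register it as the "mainframe" cluster.
# -k is needed because the CA isn't trusted yet (trust on first use).
# Usage: add_mainframe_cluster <https://server:6443>
add_mainframe_cluster() {
    local server=$1
    curl -fsSk "${server}/cacerts" -o server-ca.crt || return 1
    kubectl config set-cluster mainframe \
        --server="$server" \
        --certificate-authority=server-ca.crt \
        --embed-certs=true
}
```

Because the CA is fetched without verification, compare server-ca.crt with /var/lib/rancher/k3s/server/tls/server-ca.crt on the server before relying on it.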

Delete

kubectl config delete-context $USER
helm template $USER ./namespace | kubectl --context admin delete -f -

Signing a user cert - detailed notes

NOTE: ca.crt and ca.key are in /var/lib/rancher/k3s/server/tls/client-ca.*

# First we create the credentials
# /CN=<username> - the user
# /O=<group> - the group

# Navigate to the user directory
export USER=<username>
cd $USER

# Generate a private key
openssl genrsa -out $USER.key 2048
# Check the key
# openssl pkey -in $USER.key -noout -text
# Generate and send me the CSR
# The "user" group is my default group
openssl req -new -key $USER.key -out $USER.csr -subj "/CN=$USER/O=user"

# Check the CSR
# openssl req -in $USER.csr -noout -text
# If satisfactory, sign the CSR
# Copy from /var/lib/rancher/k3s/server/tls/client-ca.crt and client-ca.key
openssl x509 -req -in $USER.csr -CA ../client-ca.crt -CAkey ../client-ca.key -CAcreateserial -out $USER.crt -days 5000
# Review the certificate
# openssl x509 -in $USER.crt -text -noout

# Send back the crt
# cp $USER.crt $USER.key ../server-ca.crt ~/.kube/
kubectl config set-credentials $USER --client-certificate=$USER.crt  --client-key=$USER.key
kubectl config set-context $USER --cluster=mainframe --namespace=$USER --user=$USER

# Now we create the namespace, rolebindings, and resource quotas
# kubectl apply -f k8s/

# Add the cluster
# CA file can be found at https://3.14.3.100:6443/cacerts
- cluster:
    certificate-authority: server-ca.crt
    server: https://3.14.3.100:6443
  name: mainframe

# Test if everything worked
kubectl --context=$USER get pods
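The key and CSR steps in these notes can be wrapped in a function so every user gets the same subject layout (CN = username, O = group). make_user_csr is a hypothetical helper; signing still happens separately with the client CA as shown above.

```shell
# Create <dir>/<user>.key (2048-bit RSA) and <dir>/<user>.csr with subject
# /CN=<user>/O=<group>, then print the CSR subject for review.
# Usage: make_user_csr <username> [group] [dir]
make_user_csr() {
    local user=$1 group=${2:-user} dir=${3:-.}
    mkdir -p "$dir" || return 1
    # keep the private key readable only by the owner
    ( umask 077 && openssl genrsa -out "$dir/$user.key" 2048 ) 2>/dev/null || return 1
    openssl req -new -key "$dir/$user.key" -out "$dir/$user.csr" \
        -subj "/CN=$user/O=$group" || return 1
    openssl req -in "$dir/$user.csr" -noout -subject
}
```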

Help

Troubleshooting

Deleting a stuck namespace

NAMESPACE=nginx
kubectl proxy &
kubectl get namespace $NAMESPACE -o json |jq '.spec = {"finalizers":[]}' >temp.json
curl -k -H "Content-Type: application/json" -X PUT --data-binary @temp.json 127.0.0.1:8001/api/v1/namespaces/$NAMESPACE/finalize

Fixing a bad volume

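# unmount the device first; try without -L, since -L zeroes the metadata
# log and can discard the most recent changes
xfs_repair /dev/sdg
# only if that fails because the log can't be replayed:
xfs_repair -L /dev/sdg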

Mounting an ix-application volume from truenas

# set the mountpoint
zfs set mountpoint=/ix_pvc enc1/ix-applications/releases/gitea/volumes/pvc-40e27277-71e3-4469-88a3-a39f53435a8b

#"unset" the mountpoint (back to legacy)
zfs set mountpoint=legacy enc1/ix-applications/releases/gitea/volumes/pvc-40e27277-71e3-4469-88a3-a39f53435a8b

Mounting a volume

# mount
mount -t xfs /dev/zvol/enc0/dcsi/apps/pvc-d5090258-cf20-4f2e-a5cf-330ac00d0049 /mnt/dcsi_pvc

# unmount
umount /mnt/dcsi_pvc