
TrueNAS

BIOS settings

You can check the BIOS version with dmidecode -t bios -q

  1. Turn off all C-states and other power-saving features. These reliably cause instability such as random freezes.
  2. Turn off boosting
  3. Enable XMP

Archiving

  1. Create a recursive snapshot named "archive-year-month-day_hour-minute" (it has to match the naming schema used in step 2)

  2. Create a replication task called "archive_pool_year_month_day"

    • select all datasets you want to back up
    • fill in enc0/archives/archive-year-month-day_hour-minute
    • full filesystem replication
    • select "Matching naming schema"
    • Use archive-%Y-%m-%d_%H-%M
    • Deselect run automatically
    • Save and run
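
Step 1 can also be done from the shell instead of the UI; a minimal sketch, assuming the pool is enc0 and matching the naming schema used by the replication task:

zfs snapshot -r enc0@archive-$(date +%Y-%m-%d_%H-%M)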

Deleting snapshots

Sometimes you need to delete many snapshots from a certain dataset. The UI is terrible for this, so we need to use zfs destroy. xargs is the best way to do this since it allows parallel processing.

# zfs list snapshots with:
# -o name: only print the name
# -S creation: sort by creation time
# -H: don't display headers
# -r: recurse through every child dataset
zfs list -t snapshot enc0/archives -o name -S creation -H -r

# pipe it through xargs with:
# -n 1: take only 1 argument from the pipe per command
# -P 8: eight parallel processes
# Also pass to zfs destroy:
# -v: verbose
# -n: dry run
zfs list -t snapshot enc0/archives -o name -S creation -H -r | xargs -n 1 -P 8 zfs destroy -v -n

# if that looks good you can remove the "-n"
zfs list -t snapshot enc0/archives -o name -S creation -H -r | xargs -n 1 -P 8 zfs destroy -v

But First, ZFS on RPi

An RPi running OpenZFS makes a really good backup server. See the OpenZFS docs for more info.

Pi Setup

Add the vault ssh CA key to your pi.

curl -o /etc/ssh/trusted-user-ca-keys.pem https://vault.ducoterra.net/v1/ssh-client-signer/public_key

echo "TrustedUserCAKeys /etc/ssh/trusted-user-ca-keys.pem" >> /etc/ssh/sshd_config

service ssh restart
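
To confirm sshd picked up the CA, dump its effective configuration (option names come back lowercased):

sshd -T | grep -i trustedusercakeys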

Create a pi user.

adduser pi
usermod -a -G sudo pi

SSH to the pi as the "pi" user. Delete the ubuntu user.

killall -u ubuntu
userdel -r ubuntu

Disable SSH password authentication

sed -i 's/PasswordAuthentication yes/PasswordAuthentication no/g' /etc/ssh/sshd_config
service ssh restart

Change the hostname.

echo pi-nas > /etc/hostname

Upgrade and restart the pi.

apt update && apt upgrade -y && apt autoremove -y
reboot

Install ZFS.

apt install -y pv zfs-initramfs

Find the disks you want to use for your pool.

fdisk -l

Create a pool. Set DISK to the device you found (a /dev/disk/by-id path is more stable than /dev/sdX).

mkdir -p /mnt/backup
zpool create \
    -o ashift=12 \
    -O acltype=posixacl -O canmount=off -O compression=lz4 \
    -O dnodesize=auto -O normalization=formD -O relatime=on \
    -O xattr=sa -O mountpoint=/mnt/backup \
    backup ${DISK}
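
A quick sanity check that the pool came up with the intended properties:

zpool status backup
zpool get ashift backup
zfs get compression,mountpoint backup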

Datasets, Snapshots, and Encryption

Migrating encrypted pools

Since you can't use -R to send encrypted datasets recursively, you'll need more creative tactics. Here's my recommendation:

  1. Save the datasets from a pool to a text file:

    export SNAPSHOT='@enc1-hourly-2025-03-05_09-00'
    export SEND_POOL=enc1
    export RECV_POOL=enc0
    export DATASETS_FILE=pool_datasets.txt
    
    zfs list -r -H -o name $SEND_POOL > $DATASETS_FILE
    
  2. Remove the source pool from the front of all the listed datasets (and drop the line for the pool itself). In vim, for example (a sed equivalent follows this list):

    :%s/^enc1\///
    
  3. Now you can run the following:

    # Dry run
    for DATASET in $(cat $DATASETS_FILE); do echo "zfs send -v $SEND_POOL/$DATASET$SNAPSHOT | zfs recv $RECV_POOL/$DATASET"; done
    
    # Real thing
    for DATASET in $(cat $DATASETS_FILE); do zfs send -v $SEND_POOL/$DATASET$SNAPSHOT | zfs recv $RECV_POOL/$DATASET; done
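
If you'd rather not do step 2 by hand in vim, a sed equivalent; a sketch, assuming the variables exported in step 1:

# strip the source pool prefix from every line
sed -i "s|^$SEND_POOL/||" $DATASETS_FILE
# drop the line for the pool root itself, which has no child path
sed -i "\|^$SEND_POOL\$|d" $DATASETS_FILE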
    

Migrating Properties

If you need to migrate your dataset comments you can use the following bash to automate the task.

for i in $(zfs list -H -d 1 -o name backup/nvme/k3os-private); do
    # grab the dataset name and its description (tab-separated)
    read -r name desc < <(zfs list -H -o name,org.freenas:description "$i")
    # the PVC name is the last path component
    pvc=$(echo "$name" | awk -F "/" '{print $NF}')
    # quote the description in case it contains spaces
    zfs set org.freenas:description="$desc" enc1/k3os-private/$pvc
done

Backup Task Settings

Key                                      Value
Destination Dataset Read-only Policy     SET
Recursive                                true
Snapshot Retention Policy                Same as Source
Include Dataset Properties               true
Periodic Snapshot Tasks

Create and Destroy zfs Datasets

# Create a pool
zpool create rpool /dev/disk/by-id/disk-id

# Add a cache disk
zpool add backup cache /dev/sda

# Enable encryption
zpool set feature@encryption=enabled rpool

# Create a dataset
zfs create rpool/d1

# Create an encrypted dataset
zfs create -o encryption=on -o keylocation=prompt -o keyformat=passphrase rpool/d1

# Delete a dataset
zfs destroy rpool/d1

Create and send snapshots

export SEND_DATASET=enc0/vms/gitea-docker-runner-data
export RECV_DATASET=enc0/vms/gitea-docker-runner-data-sparse

# snapshot pool and all children
zfs snapshot -r $SEND_DATASET@now

# send all child snapshots
zfs send -R $SEND_DATASET@now | pv | zfs recv $RECV_DATASET

# use the -w (raw) flag to send encrypted snapshots as-is, without decrypting them
zfs send -R -w $SEND_DATASET@now | pv | zfs recv $RECV_DATASET

Cleaning up old snapshots

If you want to delete every snapshot:

# Just in case, use tmux. This can take a while
tmux

# This pool you want to clean up
export POOL=enc0
# This can be anything, set it to something memorable
export SNAPSHOTS_FILE=enc0_mar2025_snapshots.txt

# Check the number of snapshots in the dataset
zfs list -t snap -r $POOL | wc -l

# Save the list of snapshots to the snapshots file
zfs list -t snap -r -H -o name $POOL > $SNAPSHOTS_FILE

# Check the file
cat $SNAPSHOTS_FILE | less

# Dry run
for SNAPSHOT in $(cat $SNAPSHOTS_FILE); do echo "zfs destroy -v $SNAPSHOT"; done | less

# Real thing
for SNAPSHOT in $(cat $SNAPSHOTS_FILE); do zfs destroy -v $SNAPSHOT; done

Creating and restoring snapshots

# Take a snapshot
zfs list -d 1 enc1/vms
export ZFS_VOL='enc1/vms/Gambox1-z4e0t'
zfs snapshot $ZFS_VOL@manual-$(date --iso-8601)

# Restore a snapshot
zfs list -t snapshot $ZFS_VOL
export ZFS_SNAPSHOT='enc1/vms/Gambox1-z4e0t@init-no-drivers-2025-03-03_05-35'
zfs rollback $ZFS_SNAPSHOT
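
# Note: rollback only works against the most recent snapshot by default;
# -r first destroys any snapshots newer than the target
zfs rollback -r $ZFS_SNAPSHOT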

Filesystem ACLs

If you see something like "nfs4xdr_winacl: Failed to set default ACL on...":

Dataset -> Dataset details (edit) -> Advanced Options -> ACL Type (inherit)

# Remove all ACLs
setfacl -b -R /mnt/enc0/smb/media

VMs

  1. Force UEFI installation
  2. cp /boot/efi/EFI/debian/grubx64.efi /boot/efi/EFI/BOOT/bootx64.efi so the firmware can find a bootloader at the default fallback path

Converting zvol to qcow2

dd if=/dev/zvol/enc1/vms/unifi-e373f of=unifi.raw
qemu-img convert -f raw -O qcow2 unifi.raw unifi.qcow2
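
qemu-img can also read the zvol block device directly, which skips the intermediate raw file:

qemu-img convert -f raw -O qcow2 /dev/zvol/enc1/vms/unifi-e373f unifi.qcow2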

Converting qcow2 to zvol

qemu-img convert -O raw -p /mnt/enc0/images/haos_ova-14.1.qcow2 /dev/zvol/enc1/vms/hass-Iph4DeeJ

Tunables

Core

sysctl kern.ipc.somaxconn=2048
sysctl kern.ipc.maxsockbuf=16777216
sysctl net.inet.tcp.recvspace=4194304
sysctl net.inet.tcp.sendspace=2097152
sysctl net.inet.tcp.sendbuf_max=16777216
sysctl net.inet.tcp.recvbuf_max=16777216
sysctl net.inet.tcp.sendbuf_auto=1
sysctl net.inet.tcp.recvbuf_auto=1
sysctl net.inet.tcp.sendbuf_inc=16384
sysctl net.inet.tcp.recvbuf_inc=524288
sysctl vfs.zfs.arc_max=34359738368 # set arc size to 32 GiB to prevent eating VMs
loader vm.kmem_size=34359738368 # set kmem_size to 32 GiB to force arc_max to apply
loader vm.kmem_size_max=34359738368  # set kmem_size_max to 32 GiB to sync with kmem_size

NIC options: "mtu 9000 rxcsum txcsum tso4 lro"

Scale

ARC Limit

Create an Init/Shutdown Script of type Command with the following:

# Limit to 8 GiB
echo 8589934592 >> /sys/module/zfs/parameters/zfs_arc_max

Set When to Post Init.
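
After the next boot, confirm the limit applied:

cat /sys/module/zfs/parameters/zfs_arc_max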

Certs

https://raymondc.net/2018/02/28/using-freenas-as-your-ca.html

  1. Create a new Root certificate (CAs -> ADD -> Internal CA)
    • Name: Something_Root
    • Key Length: 4096
    • Digest: SHA512
    • Lifetime: 825 (Apple's new requirement)
    • Extended Key Usage: Server Auth
    • Common Name: Something Root CA
    • Subject Alternate Names:
  2. Create a new intermediate certificate (CAs -> Add -> Intermediate CA)
    • Name: Something_Intermediate_CA
    • Key Length: 4096
    • Digest: SHA512
    • Lifetime: 825 (Apple's new requirement)
    • Extended Key Usage: Server Auth
  3. Create a new Certificate (Certificates -> Add -> Internal Certificate)
    • Name: Something_Certificate
    • Key Length: 4096
    • Digest: SHA512
    • Lifetime: 825 (Apple's new requirement)
    • Extended Key Usage: Server Auth
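
Once issued, the chain can be sanity-checked with openssl; a sketch using hypothetical file names for the exported PEMs:

# root.crt, intermediate.crt, and server.crt are placeholders for the exported certs
openssl verify -CAfile root.crt -untrusted intermediate.crt server.crt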

Testing

iperf

iperf3 -c mainframe -P 4
iperf3 -c mainframe -P 4 -R

iperf3 -c pc -P 4
iperf3 -c pc -P 4 -R

disk

# write 16GB to disk
dd if=/dev/zero of=/tmp/test bs=1024k count=16000
# divide result by 1000^3 to get GB/s
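# note: zeros compress to almost nothing, so write results on compressed datasets are inflated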

# read 16GB from disk
dd if=/tmp/test of=/dev/null bs=1024k
# divide result by 1000^3 to get GB/s

disk health

https://documents.westerndigital.com/content/dam/doc-library/en_us/assets/public/western-digital/product/internal-drives/wd-black-ssd/product-brief-wd-black-sn750-nvme-ssd.pdf

# HDD
smartctl -a /dev/ada1 | grep "SMART Attributes" -A 18

# NVME
smartctl -a /dev/nvme1 | grep "SMART/Health Information" -A 17

Dead Disks

=== START OF INFORMATION SECTION ===
Model Family:     Western Digital Black
Device Model:     WDC WD2003FZEX-00Z4SA0
Serial Number:    WD-WMC5C0D6PZYZ
LU WWN Device Id: 5 0014ee 65a5a19fc
Firmware Version: 01.01A01
User Capacity:    2,000,398,934,016 bytes [2.00 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    7200 rpm
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ACS-2 (minor revision not indicated)
SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Sat Feb 13 18:31:57 2021 EST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

Corrupted data

"One or more devices has experienced an error resulting in data corruption. Applications may be affected."

To get a list of affected files run:

zpool status -v

Stuck VMs

"[EFAULT] 'freeipa' VM is suspended and can only be resumed/powered off"

"virsh cannot acquire state change lock monitor=remoteDispatchDomainSuspend"

virsh -c "qemu+unix:///system?socket=/run/truenas_libvirt/libvirt-sock" list
export VM_NAME=

# Try this first
virsh -c "qemu+unix:///system?socket=/run/truenas_libvirt/libvirt-sock" resume $VM_NAME

# Or just destroy and start it again
virsh -c "qemu+unix:///system?socket=/run/truenas_libvirt/libvirt-sock" destroy $VM_NAME
virsh -c "qemu+unix:///system?socket=/run/truenas_libvirt/libvirt-sock" start $VM_NAME

Mounting ZVOLS

Sometimes you need to mount zvols onto the TrueNAS host. You can do this with the block device in /dev.

For simple operations:

export ZVOL_PATH=enc0/vms/gitea-docker-runner-data-sparse
mount --mkdir /dev/zvol/$ZVOL_PATH /tmp/$ZVOL_PATH

# If you need to create a filesystem
fdisk /dev/zvol/$ZVOL_PATH
mkfs.btrfs /dev/zvol/$ZVOL_PATH

For bulk operations:

for path in $(ls /dev/zvol/enc0/dcsi/apps/); do mount --mkdir /dev/zvol/enc0/dcsi/apps/$path /tmp/pvcs/$path; done
for path in $(ls /dev/zvol/enc1/dcsi/apps/); do mount --mkdir /dev/zvol/enc1/dcsi/apps/$path /tmp/pvcs/$path; done

# From driveripper
rsync --progress -av -e ssh \
    driveripper:/mnt/enc1/dcsi/nfs/pvc-ccaace81-bd69-4441-8de1-3b2b24baa7af/ \
    /tmp/transfer/ \
    --dry-run

# To Kube
rsync --progress -av --delete -e ssh \
    /tmp/transfer/ \
    kube:/opt/local-path-provisioner/ssd/pvc-4fca5cad-7640-45ea-946d-7a604a3ac875_minecraft_nimcraft/ \
    --dry-run
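
When you're finished with the mounted zvols, unmount them again; a minimal cleanup, assuming the bulk mounts above:

umount /tmp/pvcs/*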