# Truenas
- [Truenas](#truenas)
- [Bios settings](#bios-settings)
- [Datasets, Snapshots, and Encryption](#datasets-snapshots-and-encryption)
- [Periodic Snapshot Recommendations](#periodic-snapshot-recommendations)
- [Hourly Snapshots](#hourly-snapshots)
- [Daily Snapshots](#daily-snapshots)
- [Replication Tasks](#replication-tasks)
- [Source](#source)
- [Destination](#destination)
- [Manually Create Named Snapshots](#manually-create-named-snapshots)
- [Migrating encrypted pools](#migrating-encrypted-pools)
- [Migrating Properties](#migrating-properties)
- [Create and Destroy zfs Datasets](#create-and-destroy-zfs-datasets)
- [Create and send snapshots](#create-and-send-snapshots)
- [Cleaning up old snapshots](#cleaning-up-old-snapshots)
- [Creating and restoring snapshots](#creating-and-restoring-snapshots)
- [Filesystem ACLs](#filesystem-acls)
- [Decrypting Pools](#decrypting-pools)
- [ZPool Scrubbing](#zpool-scrubbing)
- [ISCSI](#iscsi)
- [Create ZVOL](#create-zvol)
- [Create ISCSI Target](#create-iscsi-target)
- [VMs](#vms)
- [Converting zvol to qcow2](#converting-zvol-to-qcow2)
- [Converting qcow2 to zvol](#converting-qcow2-to-zvol)
- [Tunables](#tunables)
- [Core](#core)
- [Scale](#scale)
- [ARC Limit](#arc-limit)
- [Certs](#certs)
- [Let's Encrypt](#lets-encrypt)
- [Self-signed CA](#self-signed-ca)
- [Testing](#testing)
- [iperf](#iperf)
- [disk](#disk)
- [disk health](#disk-health)
- [Dead Disks](#dead-disks)
- [Corrupted data](#corrupted-data)
- [Stuck VMs](#stuck-vms)
- [Mounting ZVOLS](#mounting-zvols)
- [UPS Monitoring](#ups-monitoring)
- [ZFS Size Data](#zfs-size-data)
- [ZFS Rename](#zfs-rename)
- [ISCSI](#iscsi-1)
- [ISCSI Base Name](#iscsi-base-name)
- [Archiving](#archiving)
- [Deleting snapshots](#deleting-snapshots)
- [But First, ZFS on RPi](#but-first-zfs-on-rpi)
- [Pi Setup](#pi-setup)
## Bios settings
You can check the bios version with `dmidecode -t bios -q`
1. Turn off all C-State or power saving features. These definitely cause instability
like random freezes.
2. Turn off boosting
3. Enable XMP
## Datasets, Snapshots, and Encryption
### Periodic Snapshot Recommendations
#### Hourly Snapshots
- Lifetime: `1 day`
- Naming Schema: `hourly-%Y-%m-%d_%H-%M`
- Schedule: `Hourly`
- Begin: `00:00:00`
- End: `23:59:00`
- Disallow taking empty snapshots
- Enabled
- Recursive
Assuming 100 datasets: 100 datasets x 24 hours = 2400 snapshots
Disallowing empty snapshots will help keep that number down.
#### Daily Snapshots
- Lifetime: `1 week`
- Naming Schema: `daily-%Y-%m-%d_%H-%M`
- Schedule: `Daily`
- Allow taking empty snapshots
- Enabled
- Recursive
Assuming 100 datasets: 100 datasets x 7 days = 700 snapshots
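The snapshot counts above are simple products of dataset count and retention. A minimal sketch for redoing the estimate with other numbers; `snapshot_count` is a hypothetical helper, and skipped empty snapshots are ignored, so the result is an upper bound:

```bash
# Upper-bound estimate: datasets x snapshots retained per dataset.
# snapshot_count is a hypothetical helper, not part of TrueNAS or ZFS.
snapshot_count() {
  local datasets="$1" retained="$2"
  echo $(( datasets * retained ))
}

hourly=$(snapshot_count 100 24)   # 1 day of hourly snapshots
daily=$(snapshot_count 100 7)     # 1 week of daily snapshots
echo "hourly=$hourly daily=$daily total=$(( hourly + daily ))"
```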
### Replication Tasks
Before configuring, create a dataset that you'll be replicating to.
Use advanced settings.
- Transport `LOCAL`
#### Source
- Recursive
- Include Dataset Properties
- Periodic Snapshot Tasks: Select your `daily` task
- Run automatically
#### Destination
- Read-only Policy: `SET`
- Snapshot Retention Policy: `Custom`
- Lifetime: `1 month`
- Naming Schema: `daily-%Y-%m-%d_%H-%M`
Assuming 100 datasets: 100 datasets x 30 days = 3000 snapshots
#### Manually Create Named Snapshots
1. Datasets -> Select dataset -> Create Snapshot -> Naming Schema (daily)
2. Start replication from Data Protection
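If you would rather build the snapshot name from a shell, the same `strftime`-style schema can be passed to `date`. A sketch only: `enc0/example` is a placeholder dataset, and the `zfs snapshot` command is printed rather than run:

```bash
# Build a snapshot name that follows the daily naming schema.
# enc0/example is a placeholder dataset name.
DATASET=enc0/example
SNAPSHOT_NAME=$(date +'daily-%Y-%m-%d_%H-%M')

# Dry run: print the command instead of executing it
echo "zfs snapshot ${DATASET}@${SNAPSHOT_NAME}"
```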
### Migrating encrypted pools
Since you can't use `-R` to send encrypted datasets recursively you'll need to use more creative tactics. Here's my recommendation:
1. Save the datasets from a pool to a text file:
```bash
export SNAPSHOT='@enc1-hourly-2025-03-05_09-00'
export SEND_POOL=enc1
export RECV_POOL=enc0
export DATASETS_FILE=pool_datasets.txt
zfs list -r -H -o name $SEND_POOL > $DATASETS_FILE
```
2. Remove the source pool from the front of all the listed datasets. In vim, for example:
```bash
:%s/^enc1\///
```
3. Now you can run the following
```bash
# Dry run
for DATASET in $(cat $DATASETS_FILE); do echo "zfs send -v $SEND_POOL/$DATASET$SNAPSHOT | zfs recv $RECV_POOL/$DATASET"; done
# Real thing
for DATASET in $(cat $DATASETS_FILE); do zfs send -v $SEND_POOL/$DATASET$SNAPSHOT | zfs recv $RECV_POOL/$DATASET; done
```
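The prefix stripping can also happen in the pipeline instead of in an editor. A sketch, assuming the same pool and snapshot naming as the exports above; `print_send_commands` is a hypothetical helper that reads dataset names on stdin and only prints the commands (a dry run). The sample list stands in for real `zfs list` output:

```bash
# Read dataset names on stdin, strip the leading "<pool>/", and print
# one send/recv command per child dataset. Nothing is executed.
print_send_commands() {
  local send_pool="$1" recv_pool="$2" snapshot="$3"
  # Keep only children of the pool; the bare pool name itself is skipped
  grep "^${send_pool}/" | sed "s|^${send_pool}/||" | while read -r dataset; do
    echo "zfs send -v ${send_pool}/${dataset}${snapshot} | zfs recv ${recv_pool}/${dataset}"
  done
}

# Hard-coded sample list standing in for `zfs list -r -H -o name enc1`
printf 'enc1\nenc1/vms\nenc1/vms/web\n' | \
  print_send_commands enc1 enc0 '@enc1-hourly-2025-03-05_09-00'
```

Once the printed commands look right, they can be run by hand or piped to a shell.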
### Migrating Properties
If you need to migrate your dataset comments you can use the following bash to automate the task.
```bash
for i in $(zfs list -H -d 1 -o name backup/nvme/k3os-private); do read -r name desc < <(zfs list -H -o name,org.freenas:description "$i") && pvc=$(echo "$name" | awk -F "/" '{print $NF}') && zfs set org.freenas:description="$desc" "enc1/k3os-private/$pvc"; done
```
### Create and Destroy zfs Datasets
```bash
# Create a pool
zpool create rpool /dev/disk/by-id/disk-id
# Add a cache disk
zpool add backup cache /dev/sda
# Enable encryption
zpool set feature@encryption=enabled rpool
# Create a dataset
zfs create rpool/d1
# Create an encrypted dataset
zfs create -o encryption=on -o keylocation=prompt -o keyformat=passphrase rpool/d1
# Delete a dataset
zfs destroy rpool/d1
```
### Create and send snapshots
```bash
export SEND_DATASET=enc0/vms/gitea-docker-runner-data
export RECV_DATASET=enc0/vms/gitea-docker-runner-data-sparse
# snapshot pool and all children
zfs snapshot -r $SEND_DATASET@now
# send all child snapshots
zfs send -R $SEND_DATASET@now | pv | zfs recv $RECV_DATASET
# use the -w raw flag to send encrypted snapshots
zfs send -R -w $SEND_DATASET@snapshot | pv | zfs recv $RECV_DATASET
```
### Cleaning up old snapshots
If you want to delete every snapshot:
```bash
# Just in case, use tmux. This can take a while
tmux
# The pool you want to clean up
export POOL=backup0
# This can be anything, set it to something memorable
export SNAPSHOTS_FILE=enc0_mar2025_snapshots.txt
# Check the number of snapshots in the pool (the count includes one header line)
zfs list -t snap -r $POOL | wc -l
# Save the list of snapshots to the snapshots file
zfs list -t snap -r -H -o name $POOL > $SNAPSHOTS_FILE
# Check the file
cat $SNAPSHOTS_FILE | less
# Dry run
for SNAPSHOT in $(cat $SNAPSHOTS_FILE); do echo "zfs destroy -v $SNAPSHOT"; done | less
# Real thing
for SNAPSHOT in $(cat $SNAPSHOTS_FILE); do zfs destroy -v $SNAPSHOT; done
```
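To delete only a subset, for example the `hourly-` snapshots from the naming schema above, the saved list can be filtered before the dry run. A sketch with a hypothetical `filter_snapshots` helper; the sample names stand in for the contents of `$SNAPSHOTS_FILE`:

```bash
# filter_snapshots is a hypothetical helper: reads snapshot names on stdin
# and keeps those whose snapshot part (after "@") starts with the prefix.
filter_snapshots() {
  local prefix="$1"
  grep "@${prefix}" || true
}

# Dry run against a sample list standing in for $SNAPSHOTS_FILE
printf 'backup0/a@hourly-2025-03-05_09-00\nbackup0/a@daily-2025-03-05_00-00\n' | \
  filter_snapshots hourly- | \
  while read -r SNAPSHOT; do echo "zfs destroy -v $SNAPSHOT"; done
```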
### Creating and restoring snapshots
```bash
# Take a snapshot
zfs list -d 1 enc1/vms
export ZFS_VOL='enc1/vms/Gambox1-z4e0t'
zfs snapshot $ZFS_VOL@manual-$(date --iso-8601)
# Restore a snapshot
zfs list -t snapshot $ZFS_VOL
export ZFS_SNAPSHOT='enc1/vms/Gambox1-z4e0t@init-no-drivers-2025-03-03_05-35'
zfs rollback $ZFS_SNAPSHOT
```
### Filesystem ACLs
If you see something like "nfs4xdr_winacl: Failed to set default ACL on...":
Dataset -> Dataset details (edit) -> Advanced Options -> ACL Type (inherit)
```bash
# Remove all ACLs
setfacl -b -R /mnt/enc0/smb/media
```
### Decrypting Pools
To unlock through the UI we'll need to recreate the key manifest JSON. This is a little tedious, but
your keys will be correct after this process.
```bash
# List all datasets and format them for json keys
export LIST_DATASET=pool0/dcsi
echo "{" && \
for DATASET_PATH in $(sudo zfs list -r $LIST_DATASET -H -o name); do echo " \"$DATASET_PATH\": \"key_here\","; done && \
echo "}"
# If the dataset's children have all the encryption keys
# Note this generates the cat EOF commands to create the json files needed to unlock.
export TL_DATASET=pool0
for TL_DATASET_PATH in $(zfs list -r $TL_DATASET -H -o name -d 1); do \
echo "cat <<EOF > dataset_${TL_DATASET_PATH}_key.json" && \
echo "{" && \
for DATASET_PATH in $(zfs list -r $TL_DATASET_PATH -H -o name); do echo " \"$DATASET_PATH\": \"key_here\","; done && \
echo "}" && \
echo "EOF";
done
```
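Both loops above leave a comma after the last entry, which has to be removed by hand before the file is valid JSON. A sketch that avoids the trailing comma, assuming dataset names contain no quotes or backslashes; `datasets_to_json` is a hypothetical helper that reads names on stdin:

```bash
# datasets_to_json is a hypothetical helper: one dataset name per line in,
# a JSON object with placeholder keys out.
datasets_to_json() {
  echo "{"
  # Emit a comma before every entry except the first, so none trails
  awk '{ printf "%s  \"%s\": \"key_here\"", (NR > 1 ? ",\n" : ""), $0 } END { if (NR) print "" }'
  echo "}"
}

# Sample names standing in for `zfs list -r -H -o name pool0/dcsi`
printf 'pool0/dcsi\npool0/dcsi/apps\n' | datasets_to_json
```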
### ZPool Scrubbing
```bash
# Start a scrub
zpool scrub pool0
# Check status
zpool status pool0
```
## ISCSI
### Create ZVOL
1. Create a new dataset called "iscsi" and then a dataset under that called "backups"
   1. Set sync to always
   2. Disable compression
   3. Enable Sparse
2. Create a new dataset under backups with the same name as your server hostname
3. Set the size to something reasonable (Note you may need to "force size")
### Create ISCSI Target
In Shared -> Block (iSCSI) Shares Targets
1. Global Target Configuration -> Base Name
   1. Set the Base Name following [these rules](#iscsi-base-name)
2. Authorized Access -> Add
   1. Group ID: arbitrary, just pick a number you haven't used
   2. Discovery Authentication: `CHAP`
   3. User: The connecting machine's ISCSI Base Name
   4. Secret: A 16 character password with no special characters
3. Extents -> Add
   1. Name: `some-name`
   2. Type: `Device`
   3. Device: The ZVOL you just created
   4. Sharing Platform: `Modern OS`
   5. Protocol Options Portal: Either create new (0.0.0.0 and ::) or select your existing portal
   6. Protocol Options Initiators: The base name of the connecting machine following [these rules](#iscsi-base-name)
4. Targets -> Select the `backup-<hostname>` target -> Edit
   1. Authentication Method: `CHAP`
   2. Authentication Group Number: The group number you created above
## VMs
1. Force UEFI installation
2. `cp /boot/efi/EFI/debian/grubx64.efi /boot/efi/EFI/BOOT/bootx64.efi`
### Converting zvol to qcow2
```bash
# Convert zvol to raw
dd status=progress bs=1M if=/dev/zvol/enc0/vms/nextcloud-fi7tkq of=/mnt/enc0/vms/qcow2/nextcloud-fi7tkq.raw
# Convert raw to qcow
qemu-img convert -f raw -O qcow2 unifi.raw unifi.qcow2
# Convert in batch
# Convert zvol to raw
for FILE in $(ls /dev/zvol/enc0/vms); do dd status=progress bs=1M if=/dev/zvol/enc0/vms/$FILE of=/mnt/enc0/vms/qcow2/$FILE.raw; done
# Convert raw to qcow (dry run: this only prints the commands, remove the echo to run them)
for FILE in $(ls /dev/zvol/enc0/vms); do echo "qemu-img convert -f raw -O qcow2 /mnt/enc0/vms/qcow2/$FILE.raw /mnt/enc0/vms/qcow2/$FILE.qcow2"; done
```
### Converting qcow2 to zvol
```bash
qemu-img convert -O raw -p /mnt/enc0/images/haos_ova-14.1.qcow2 /dev/zvol/enc1/vms/hass-Iph4DeeJ
```
## Tunables
### Core
```bash
sysctl kern.ipc.somaxconn=2048
sysctl kern.ipc.maxsockbuf=16777216
sysctl net.inet.tcp.recvspace=4194304
sysctl net.inet.tcp.sendspace=2097152
sysctl net.inet.tcp.sendbuf_max=16777216
sysctl net.inet.tcp.recvbuf_max=16777216
sysctl net.inet.tcp.sendbuf_auto=1
sysctl net.inet.tcp.recvbuf_auto=1
sysctl net.inet.tcp.sendbuf_inc=16384
sysctl net.inet.tcp.recvbuf_inc=524288
sysctl vfs.zfs.arc_max=34359738368 # set arc size to 32 GiB to prevent eating VMs
loader vm.kmem_size=34359738368 # set kmem_size to 32 GiB to force arc_max to apply
loader vm.kmem_size_max=34359738368 # set kmem_size_max to 32 GiB to sync with kmem_size
```
NIC options: "mtu 9000 rxcsum txcsum tso4 lro"
### Scale
#### ARC Limit
Create an Init/Shutdown Script of type `Command` with the following:
```bash
# Limit to 8 GiB
echo 8589934592 > /sys/module/zfs/parameters/zfs_arc_max
```
Set `When` to `Post Init`.
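The ARC limit is given in bytes. A small sketch for converting a size in GiB to the byte value, using a hypothetical `gib_to_bytes` helper (1 GiB = 1024^3 bytes):

```bash
# gib_to_bytes is a hypothetical helper: GiB -> bytes (1 GiB = 1024^3 bytes)
gib_to_bytes() {
  echo $(( $1 * 1024 * 1024 * 1024 ))
}

gib_to_bytes 8    # 8589934592
gib_to_bytes 32   # 34359738368
```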
## Certs
### Let's Encrypt
<https://www.truenas.com/docs/scale/22.12/scaletutorials/credentials/certificates/settingupletsencryptcertificates/>
Note, for all "Name" fields use your domain with all "." replaced with "-"
Example: `driveripper.reeselink.com` becomes `driveripper-reeselink-com`
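The same replacement can be scripted when there are several domains to name. A sketch with a hypothetical `cert_name` helper:

```bash
# cert_name is a hypothetical helper: replace every "." with "-"
cert_name() {
  echo "$1" | tr '.' '-'
}

cert_name driveripper.reeselink.com   # driveripper-reeselink-com
```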
1. Go to Credentials > Certificates and click ADD in the ACME DNS-Authenticators widget
2. Generate credentials for your domain via [AWS IAM](/active/aws_iam/aws_iam.md)
3. Click ADD in the Certificate Signing Requests widget
   1. Remember, only the SAN is required
4. Click the wrench icon next to the new CSR
   1. In "Identifier" use `domain-something-com-le-stage` for staging and `domain-something-com-le-prod` for prod
5. System -> General Settings -> GUI Settings -> GUI SSL Certificate -> `domain-something-com-le-prod`
### Self-signed CA
<https://raymondc.net/2018/02/28/using-freenas-as-your-ca.html>
1. Create a new Root certificate (CAs -> ADD -> Internal CA)
   - Name: Something_Root
   - Key Length: 4096
   - Digest: SHA512
   - Lifetime: 825 (Apple's new requirement)
   - Extended Key Usage: Server Auth
   - Common Name: Something Root CA
   - Subject Alternate Names:
2. Create a new intermediate certificate (CAs -> Add -> Intermediate CA)
   - Name: Something_Intermediate_CA
   - Key Length: 4096
   - Digest: SHA512
   - Lifetime: 825 (Apple's new requirement)
   - Extended Key Usage: Server Auth
3. Create a new Certificate (Certificates -> Add -> Internal Certificate)
   - Name: Something_Certificate
   - Key Length: 4096
   - Digest: SHA512
   - Lifetime: 825 (Apple's new requirement)
   - Extended Key Usage: Server Auth
## Testing
### iperf
```bash
iperf3 -c mainframe -P 4
iperf3 -c mainframe -P 4 -R
iperf3 -c pc -P 4
iperf3 -c pc -P 4 -R
```
### disk
```bash
# write 16GB to disk
dd if=/dev/zero of=/tmp/test bs=1024k count=16000
# divide result by 1000^3 to get GB/s
# read 16GB from disk
dd if=/tmp/test of=/dev/null bs=1024k
# divide result by 1000^3 to get GB/s
```
## disk health
<https://documents.westerndigital.com/content/dam/doc-library/en_us/assets/public/western-digital/product/internal-drives/wd-black-ssd/product-brief-wd-black-sn750-nvme-ssd.pdf>
```bash
# HDD
smartctl -a /dev/ada1 | grep "SMART Attributes" -A 18
# NVME
smartctl -a /dev/nvme1 | grep "SMART/Health Information" -A 17
```
## Dead Disks
```text
=== START OF INFORMATION SECTION ===
Model Family: Western Digital Black
Device Model: WDC WD2003FZEX-00Z4SA0
Serial Number: WD-WMC5C0D6PZYZ
LU WWN Device Id: 5 0014ee 65a5a19fc
Firmware Version: 01.01A01
User Capacity: 2,000,398,934,016 bytes [2.00 TB]
Sector Sizes: 512 bytes logical, 4096 bytes physical
Rotation Rate: 7200 rpm
Device is: In smartctl database [for details use: -P show]
ATA Version is: ACS-2 (minor revision not indicated)
SATA Version is: SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is: Sat Feb 13 18:31:57 2021 EST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
```
## Corrupted data
One or more devices has experienced an error resulting in data corruption. Applications may be affected.
To get a list of affected files run:
```bash
zpool status -v
```
## Stuck VMs
"[EFAULT] 'freeipa' VM is suspended and can only be resumed/powered off"
"virsh cannot acquire state change lock monitor=remoteDispatchDomainSuspend"
```bash
virsh -c "qemu+unix:///system?socket=/run/truenas_libvirt/libvirt-sock" list
export VM_NAME=
# Try this first
virsh -c "qemu+unix:///system?socket=/run/truenas_libvirt/libvirt-sock" resume $VM_NAME
# Or just destroy and start it again
virsh -c "qemu+unix:///system?socket=/run/truenas_libvirt/libvirt-sock" destroy $VM_NAME
virsh -c "qemu+unix:///system?socket=/run/truenas_libvirt/libvirt-sock" start $VM_NAME
```
## Mounting ZVOLS
Sometimes you need to mount zvols onto the truenas host. You can do this with the block device in /dev.
For simple operations:
```bash
export ZVOL_PATH=enc0/vms/gitea-docker-runner-data-sparse
mount --mkdir /dev/zvol/$ZVOL_PATH /tmp/$ZVOL_PATH
# If you need to create a filesystem
fdisk /dev/zvol/$ZVOL_PATH
mkfs.btrfs /dev/zvol/$ZVOL_PATH
```
For bulk operations:
```bash
for path in $(ls /dev/zvol/enc0/dcsi/apps/); do mount --mkdir /dev/zvol/enc0/dcsi/apps/$path /tmp/pvcs/$path; done
for path in $(ls /dev/zvol/enc1/dcsi/apps/); do mount --mkdir /dev/zvol/enc1/dcsi/apps/$path /tmp/pvcs/$path; done
# From driveripper
rsync --progress -av -e ssh \
driveripper:/mnt/enc1/dcsi/nfs/pvc-ccaace81-bd69-4441-8de1-3b2b24baa7af/ \
/tmp/transfer/ \
--dry-run
# To Kube
rsync --progress -av --delete -e ssh \
/tmp/transfer/ \
kube:/opt/local-path-provisioner/ssd/pvc-4fca5cad-7640-45ea-946d-7a604a3ac875_minecraft_nimcraft/ \
--dry-run
```
## UPS Monitoring
First, you'll need to create a user with access to the UPS in System -> Services -> UPS.
Under the Extra Users section, add a user like so:
```conf
[admin]
password = mypass
actions = set
actions = fsd
instcmds = all
```
Then you can run commands with `upscmd`:
```bash
export UPS_USER=admin
export UPS_PASS=mypass
# Quick battery test
upscmd -u $UPS_USER -p $UPS_PASS ups test.battery.start.quick
```
## ZFS Size Data
```bash
# jq -r is required otherwise the data will be invalid
zfs list -j enc0/vms -p -o available,used | \
jq -r --arg TIMESTAMP `date +%s` '"driveripper.vms.data.used " + .datasets[].properties.used.value + " " + $TIMESTAMP' | \
nc -N -4 yellow.reeselink.com 2003
```
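Each line sent to port 2003 above has the shape `<metric path> <value> <unix timestamp>`, which appears to be Graphite's plaintext format. A sketch of building such a line without `jq`, using a hypothetical `graphite_line` helper; the value and timestamp below are made-up sample numbers:

```bash
# graphite_line is a hypothetical helper.
# Args: metric path, value, optional unix timestamp (defaults to now)
graphite_line() {
  local metric="$1" value="$2" timestamp="${3:-$(date +%s)}"
  echo "${metric} ${value} ${timestamp}"
}

# Sample values only; in practice the value would come from `zfs list -p`
graphite_line driveripper.vms.data.used 123456789 1741165200
```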
## ZFS Rename
Make sure you unshare any connected shares, otherwise you'll get
"cannot unmount '/mnt/enc0/smb/reese_and_alica': pool or dataset is busy"
```bash
zfs rename enc0/something enc0/something_else
```
## ISCSI
### ISCSI Base Name
<https://datatracker.ietf.org/doc/html/rfc3721.html#section-1.1>
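
| iqn | . | year-month of domain registration | . | reversed domain | : | unique string |
| --- | --- | --- | --- | --- | --- | --- |
| iqn | . | 2022-01 | . | com.reeselink | : | driveripper |

Example: `iqn.2022-01.com.reeselink:driveripper`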
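
The naming pattern can be assembled mechanically. A sketch with a hypothetical `build_iqn` helper that takes the registration year-month, the domain, and the unique string:

```bash
# build_iqn is a hypothetical helper.
# Args: year-month (YYYY-MM), domain, unique string
build_iqn() {
  local date_part="$1" domain="$2" unique="$3"
  local reversed
  # Reverse the dot-separated labels: reeselink.com -> com.reeselink
  reversed=$(echo "$domain" | awk -F. '{ for (i = NF; i > 1; i--) printf "%s.", $i; print $1 }')
  echo "iqn.${date_part}.${reversed}:${unique}"
}

build_iqn 2022-01 reeselink.com driveripper   # iqn.2022-01.com.reeselink:driveripper
```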
## Archiving
1. Create a recursive snapshot called "archive_pool_year_month_day"
2. Create a replication task called "archive_pool_year_month_day"
   - Select all datasets you want to back up
   - Fill in enc0/archives/archive-year-month-day_hour-minute
   - Full filesystem replication
   - Select "Matching naming schema"
   - Use `archive-%Y-%m-%d_%H-%M`
   - Deselect run automatically
   - Save and run
## Deleting snapshots
Sometimes you need to delete many snapshots from a certain dataset. The UI is terrible for this, so
we need to use `zfs destroy`. xargs is the best way to do this since it allows parallel processing.
```bash
# zfs list snapshots with:
# -o name: only print the name
# -S creation: sort by creation time
# -H: don't display headers
# -r: recurse through every child dataset
zfs list -t snapshot enc0/archives -o name -S creation -H -r
# pipe it through xargs with:
# -n 1: take only 1 argument from the pipe per command
# -P 8: eight parallel processes
# Also pass to zfs destroy:
# -v: verbose
# -n: dryrun
zfs list -t snapshot enc0/archives -o name -S creation -H -r | xargs -n 1 -P 8 zfs destroy -v -n
# if that looks good you can remove the "-n"
zfs list -t snapshot enc0/archives -o name -S creation -H -r | xargs -n 1 -P 8 zfs destroy -v
```
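Because `-S creation` lists the newest snapshots first, the same pipeline can keep the most recent few and delete the rest. A sketch with a hypothetical `skip_newest` helper; it assumes the list covers a single dataset (with `-r` the "newest N" would be counted across all child datasets together). The sample names stand in for real `zfs list` output, and `echo` keeps it a dry run:

```bash
# skip_newest is a hypothetical helper: reads names sorted newest-first on
# stdin and prints everything except the first N lines.
skip_newest() {
  tail -n "+$(( $1 + 1 ))"
}

# Keep the 2 newest sample snapshots, print destroy commands for the rest
printf 'ds@snap-3\nds@snap-2\nds@snap-1\n' | skip_newest 2 | \
  xargs -n 1 echo zfs destroy -v -n
```

In real use, `skip_newest` would sit between the `zfs list ... -S creation -H` command and `xargs`.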
## But First, ZFS on RPi
A really good backup server is an RPi running openzfs. See [the openzfs docs](https://openzfs.github.io/openzfs-docs/Getting%20Started/Ubuntu/Ubuntu%2020.04%20Root%20on%20ZFS%20for%20Raspberry%20Pi.html#step-2-setup-zfs) for more info.
### Pi Setup
Add the vault ssh CA key to your pi.
```bash
curl -o /etc/ssh/trusted-user-ca-keys.pem https://vault.ducoterra.net/v1/ssh-client-signer/public_key
echo "TrustedUserCAKeys /etc/ssh/trusted-user-ca-keys.pem" >> /etc/ssh/sshd_config
service ssh restart
```
Create a pi user.
```bash
adduser pi
usermod -a -G sudo pi
```
SSH to the pi as the "pi" user. Delete the ubuntu user.
```bash
killall -u ubuntu
userdel -r ubuntu
```
Disable SSH password authentication
```bash
sed -i 's/PasswordAuthentication yes/PasswordAuthentication no/g' /etc/ssh/sshd_config
service ssh restart
```
Change the hostname.
```bash
echo pi-nas > /etc/hostname
```
Upgrade and restart the pi.
```bash
apt update && apt upgrade -y && apt autoremove -y
reboot
```
Install ZFS.
```bash
apt install -y pv zfs-initramfs
```
Find the disks you want to use to create your pool
```bash
fdisk -l
```
Create a pool. Set `DISK` to the device you picked from the `fdisk -l` output first.
```bash
mkdir -p /mnt/backup
zpool create \
-o ashift=12 \
-O acltype=posixacl -O canmount=off -O compression=lz4 \
-O dnodesize=auto -O normalization=formD -O relatime=on \
-O xattr=sa -O mountpoint=/mnt/backup \
backup ${DISK}
```