# TrueNAS

- [TrueNAS](#truenas)
  - [BIOS settings](#bios-settings)
  - [Archiving](#archiving)
  - [Deleting snapshots](#deleting-snapshots)
  - [But First, ZFS on RPi](#but-first-zfs-on-rpi)
    - [Pi Setup](#pi-setup)
  - [Datasets, Snapshots, and Encryption](#datasets-snapshots-and-encryption)
    - [Migrating encrypted pools](#migrating-encrypted-pools)
    - [Migrating Properties](#migrating-properties)
    - [Backup Task Settings](#backup-task-settings)
    - [Create and Destroy zfs Datasets](#create-and-destroy-zfs-datasets)
    - [Create and send snapshots](#create-and-send-snapshots)
    - [Cleaning up old snapshots](#cleaning-up-old-snapshots)
  - [VMs](#vms)
    - [Converting zvol to qcow2](#converting-zvol-to-qcow2)
  - [Tunables](#tunables)
    - [Core](#core)
    - [Scale](#scale)
      - [ARC Limit](#arc-limit)
  - [Certs](#certs)
  - [Testing](#testing)
    - [iperf](#iperf)
    - [disk](#disk)
  - [disk health](#disk-health)
  - [Dead Disks](#dead-disks)
  - [Corrupted data](#corrupted-data)
  - [Stuck VMs](#stuck-vms)
  - [Mounting ZVOLS](#mounting-zvols)

## BIOS settings

These are my recommended settings that seem stable and allow GPU passthrough:

1. Memory 3200 MHz, fabric 1600 MHz
2. AC Power - On
3. SVM - On
4. IOMMU - On (do not touch ReBAR or other PCIe encoding settings)
5. Fans 100%
6. Initial video output: PCIe slot 3
7. PCIe slot 1 bifurcation: 4x4x4x4
8. Disable CSM
9. Fast Boot Enabled

## Archiving

1. Create a recursive snapshot called "archive_pool_year_month_day"
2. Create a replication task called "archive_pool_year_month_day"

- select all datasets you want to back up
- fill in enc0/archives/archive-year-month-day_hour-minute
- full filesystem replication
- select "Matching naming schema"
- Use `archive-%Y-%m-%d_%H-%M` (a strftime pattern; see the example below)
- Deselect run automatically
- Save and run

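Since the naming schema is a strftime pattern, you can preview the snapshot name it will match with `date`:

```bash
# prints e.g. archive-2021-10-03_22-34
date +"archive-%Y-%m-%d_%H-%M"
```
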
## Deleting snapshots

Sometimes you need to delete many snapshots from a certain dataset. The UI is terrible for this, so
we need to use `zfs destroy`. xargs is the best way to do this since it allows parallel processing.

```bash
# zfs list snapshots with:
# -o name: only print the name
# -S creation: sort by creation time
# -H: don't display headers
# -r: recurse through every child dataset
zfs list -t snapshot enc0/archives -o name -S creation -H -r

# pipe it through xargs with:
# -n 1: take only 1 argument from the pipe per command
# -P 8: eight parallel processes
# Also pass to zfs destroy:
# -v: verbose
# -n: dry run
zfs list -t snapshot enc0/archives -o name -S creation -H -r | xargs -n 1 -P 8 zfs destroy -v -n

# if that looks good you can remove the "-n"
zfs list -t snapshot enc0/archives -o name -S creation -H -r | xargs -n 1 -P 8 zfs destroy -v
```

## But First, ZFS on RPi

A really good backup server is an RPi running OpenZFS. See [the openzfs docs](https://openzfs.github.io/openzfs-docs/Getting%20Started/Ubuntu/Ubuntu%2020.04%20Root%20on%20ZFS%20for%20Raspberry%20Pi.html#step-2-setup-zfs) for more info.

### Pi Setup

Add the Vault SSH CA key to your Pi.

```bash
curl -o /etc/ssh/trusted-user-ca-keys.pem https://vault.ducoterra.net/v1/ssh-client-signer/public_key

echo "TrustedUserCAKeys /etc/ssh/trusted-user-ca-keys.pem" >> /etc/ssh/sshd_config

service ssh restart
```

Create a pi user.

```bash
adduser pi
usermod -a -G sudo pi
```

SSH to the Pi as the "pi" user. Delete the ubuntu user.

```bash
killall -u ubuntu
userdel -r ubuntu
```

Disable SSH password authentication.

```bash
sed -i 's/PasswordAuthentication yes/PasswordAuthentication no/g' /etc/ssh/sshd_config
service ssh restart
```

Change the hostname.

```bash
echo pi-nas > /etc/hostname
```

Upgrade and restart the Pi.

```bash
apt update && apt upgrade -y && apt autoremove -y
reboot
```

Install ZFS.

```bash
apt install -y pv zfs-initramfs
```

Find the disks you want to use to create your pool.

```bash
fdisk -l
```

Create a pool.

```bash
# set DISK to your target device first, e.g. DISK=/dev/disk/by-id/<your-disk-id>
mkdir -p /mnt/backup
zpool create \
    -o ashift=12 \
    -O acltype=posixacl -O canmount=off -O compression=lz4 \
    -O dnodesize=auto -O normalization=formD -O relatime=on \
    -O xattr=sa -O mountpoint=/mnt/backup \
    backup ${DISK}
```

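A quick sanity check after creating the pool:

```bash
# confirm the pool is healthy and mounted where expected
zpool status backup
zfs list backup
```
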
## Datasets, Snapshots, and Encryption

### Migrating encrypted pools

Since you can't use `-R` to send encrypted datasets recursively, you'll need to use more creative tactics. Here's my recommendation:

1. Save the datasets from a pool to a text file:

   ```bash
   zfs list -r -o name <pool> > pool_datasets.txt
   ```

2. Next, remove the source pool's prefix from each line of the list. Also remove the source pool itself, as well as any datasets that already exist on the receiving pool (see the sed sketch below).
3. Now, run a command like the following:

   ```bash
   for i in $(cat pool_datasets.txt); do zfs send -v nvme/$i@manual-2021-10-03_22-34 | zfs recv -x encryption enc0/$i; done
   ```

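A minimal sketch of step 2, assuming GNU sed and a source pool named `nvme` (on FreeBSD/CORE use `sed -i ''`):

```bash
# strip the source pool prefix and drop the bare pool name from the list
sed -i -e 's,^nvme/,,' -e '/^nvme$/d' pool_datasets.txt
```
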
### Migrating Properties

If you need to migrate your dataset comments you can use the following bash to automate the task.

```bash
for i in $(zfs list -H -d 1 -o name backup/nvme/k3os-private); do
  read -r name desc < <(zfs list -H -o name,org.freenas:description "$i") &&
    pvc=$(echo "$name" | awk -F "/" '{print $NF}') &&
    zfs set org.freenas:description="$desc" enc1/k3os-private/"$pvc"
done
```

### Backup Task Settings

| Key                                  | Value                 |
| ------------------------------------ | --------------------- |
| Destination Dataset Read-only Policy | SET                   |
| Recursive                            | true                  |
| Snapshot Retention Policy            | Same as Source        |
| Include Dataset Properties           | true                  |
| Periodic Snapshot Tasks              | <daily-snapshot-task> |

### Create and Destroy zfs Datasets

```bash
# Create a pool
zpool create rpool /dev/disk/by-id/disk-id

# Add a cache disk
zpool add backup cache /dev/sda

# Enable the encryption feature on the pool
zpool set feature@encryption=enabled rpool

# Create a dataset
zfs create rpool/d1

# Create an encrypted dataset
zfs create -o encryption=on -o keylocation=prompt -o keyformat=passphrase rpool/d1

# Delete a dataset
zfs destroy rpool/d1
```

### Create and send snapshots

```bash
# snapshot a dataset and all its children
zfs snapshot -r dataset@now

# send all child snapshots
zfs send -R dataset@snapshot | zfs recv dataset

# use the -w raw flag to send encrypted snapshots without unlocking them
zfs send -R -w dataset@snapshot | zfs recv dataset
```

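For later runs, an incremental send between two snapshots avoids re-sending everything; a sketch with hypothetical snapshot names:

```bash
# send only the changes between @old and @new
zfs send -R -I dataset@old dataset@new | zfs recv dataset
```
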
### Cleaning up old snapshots

```bash
wget https://raw.githubusercontent.com/bahamas10/zfs-prune-snapshots/master/zfs-prune-snapshots
chmod +x zfs-prune-snapshots
```

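Usage follows the script's `<time> [dataset]` convention; for example, to prune snapshots older than 30 days under enc0/archives (dry run first with `-n`):

```bash
# preview what would be destroyed
./zfs-prune-snapshots -n 30d enc0/archives

# remove them for real
./zfs-prune-snapshots 30d enc0/archives
```
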
## VMs

1. Force UEFI installation
2. `cp /boot/efi/EFI/debian/grubx64.efi /boot/efi/EFI/BOOT/bootx64.efi` (copies GRUB to the UEFI fallback boot path, so the VM still boots when the firmware loses its NVRAM boot entries)

### Converting zvol to qcow2

```bash
dd if=/dev/zvol/enc1/vms/unifi-e373f of=unifi.raw
qemu-img convert -f raw -O qcow2 unifi.raw unifi.qcow2
```

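`qemu-img` can also read the zvol block device directly, skipping the intermediate raw file:

```bash
# same conversion in one step
qemu-img convert -f raw -O qcow2 /dev/zvol/enc1/vms/unifi-e373f unifi.qcow2
```
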
## Tunables

### Core

Add these in System -> Tunables, using the type ("sysctl" or "loader") shown before each value:

```bash
sysctl kern.ipc.somaxconn=2048
sysctl kern.ipc.maxsockbuf=16777216
sysctl net.inet.tcp.recvspace=4194304
sysctl net.inet.tcp.sendspace=2097152
sysctl net.inet.tcp.sendbuf_max=16777216
sysctl net.inet.tcp.recvbuf_max=16777216
sysctl net.inet.tcp.sendbuf_auto=1
sysctl net.inet.tcp.recvbuf_auto=1
sysctl net.inet.tcp.sendbuf_inc=16384
sysctl net.inet.tcp.recvbuf_inc=524288
sysctl vfs.zfs.arc_max=34359738368 # cap the ARC at 32 GiB so it doesn't eat memory needed by VMs
loader vm.kmem_size=34359738368 # set kmem_size to 32 GiB to force arc_max to apply
loader vm.kmem_size_max=34359738368 # set kmem_size_max to 32 GiB to sync with kmem_size
```

NIC options: "mtu 9000 rxcsum txcsum tso4 lro"

### Scale

#### ARC Limit

Create an Init/Shutdown Script of type `Command` with the following:

```bash
echo 34359738368 > /sys/module/zfs/parameters/zfs_arc_max
```

Set `When` to `Pre Init`.

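To confirm the limit took effect after boot:

```bash
# prints the limit in bytes; 0 means the OpenZFS default (half of RAM)
cat /sys/module/zfs/parameters/zfs_arc_max
```
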
## Certs

<https://raymondc.net/2018/02/28/using-freenas-as-your-ca.html>

1. Create a new Root certificate (CAs -> ADD -> Internal CA)
   - Name: Something_Root
   - Key Length: 4096
   - Digest: SHA512
   - Lifetime: 825 (Apple's new requirement)
   - Extended Key Usage: Server Auth
   - Common Name: Something Root CA
   - Subject Alternate Names:
2. Create a new intermediate certificate (CAs -> Add -> Intermediate CA)
   - Name: Something_Intermediate_CA
   - Key Length: 4096
   - Digest: SHA512
   - Lifetime: 825 (Apple's new requirement)
   - Extended Key Usage: Server Auth
3. Create a new Certificate (Certificates -> Add -> Internal Certificate)
   - Name: Something_Certificate
   - Key Length: 4096
   - Digest: SHA512
   - Lifetime: 825 (Apple's new requirement)
   - Extended Key Usage: Server Auth

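To sanity-check the chain after exporting the certificates (the PEM file names here are hypothetical):

```bash
# verify the server certificate against the root, using the intermediate as untrusted
openssl verify -CAfile Something_Root.crt -untrusted Something_Intermediate_CA.crt Something_Certificate.crt
```
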
## Testing

### iperf

```bash
# -P 4: four parallel streams; -R: reverse direction (server sends)
iperf3 -c mainframe -P 4
iperf3 -c mainframe -P 4 -R

iperf3 -c pc -P 4
iperf3 -c pc -P 4 -R
```

### disk

```bash
# write 16GB to disk
# (note: /dev/zero compresses extremely well, so results on lz4
# datasets will overstate real throughput)
dd if=/dev/zero of=/tmp/test bs=1024k count=16000
# dd reports bytes/s; divide by 1000^3 to get GB/s

# read 16GB from disk
dd if=/tmp/test of=/dev/null bs=1024k
# dd reports bytes/s; divide by 1000^3 to get GB/s
```

## disk health

<https://documents.westerndigital.com/content/dam/doc-library/en_us/assets/public/western-digital/product/internal-drives/wd-black-ssd/product-brief-wd-black-sn750-nvme-ssd.pdf>

```bash
# HDD
smartctl -a /dev/ada1 | grep "SMART Attributes" -A 18

# NVMe
smartctl -a /dev/nvme1 | grep "SMART/Health Information" -A 17
```

## Dead Disks

smartctl output from a disk that died, kept for reference:

```text
=== START OF INFORMATION SECTION ===
Model Family:     Western Digital Black
Device Model:     WDC WD2003FZEX-00Z4SA0
Serial Number:    WD-WMC5C0D6PZYZ
LU WWN Device Id: 5 0014ee 65a5a19fc
Firmware Version: 01.01A01
User Capacity:    2,000,398,934,016 bytes [2.00 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    7200 rpm
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ACS-2 (minor revision not indicated)
SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Sat Feb 13 18:31:57 2021 EST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
```

## Corrupted data

"One or more devices has experienced an error resulting in data corruption. Applications may be affected."

To get a list of affected files, run:

```bash
zpool status -v
```

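Once the affected files are restored or deleted, clear the error counters and scrub to verify (standard ZFS procedure; `<pool>` is your pool name):

```bash
zpool clear <pool>
zpool scrub <pool>
```
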
## Stuck VMs

"[EFAULT] 'freeipa' VM is suspended and can only be resumed/powered off"

"virsh cannot acquire state change lock monitor=remoteDispatchDomainSuspend"

```bash
virsh -c "qemu+unix:///system?socket=/run/truenas_libvirt/libvirt-sock" list
virsh -c "qemu+unix:///system?socket=/run/truenas_libvirt/libvirt-sock" resume <vm_name>
virsh -c "qemu+unix:///system?socket=/run/truenas_libvirt/libvirt-sock" start <vm_name>
```

## Mounting ZVOLS

Sometimes you need to mount zvols onto the TrueNAS host. You can do this with the block device in /dev.

```bash
# mount --mkdir creates the mountpoint if it doesn't exist (util-linux 2.38+)
for path in $(ls /dev/zvol/enc0/dcsi/apps/); do mount --mkdir /dev/zvol/enc0/dcsi/apps/$path /tmp/pvcs/$path; done
for path in $(ls /dev/zvol/enc1/dcsi/apps/); do mount --mkdir /dev/zvol/enc1/dcsi/apps/$path /tmp/pvcs/$path; done

# From driveripper
rsync --progress -av -e ssh \
    driveripper:/mnt/enc1/dcsi/nfs/pvc-ccaace81-bd69-4441-8de1-3b2b24baa7af/ \
    /tmp/transfer/ \
    --dry-run

# To Kube
rsync --progress -av --delete -e ssh \
    /tmp/transfer/ \
    kube:/opt/local-path-provisioner/ssd/pvc-4fca5cad-7640-45ea-946d-7a604a3ac875_minecraft_nimcraft/ \
    --dry-run
```