The Nth Time Talking About Data Backup
I’ve already written two dedicated blog posts about data backup: Full-Device Backup Solution and Disaster Recovery Test & Personal Backup Update. But since then, my HomeLab setup has changed—I’ve switched to PVE with ZFS storage pools.
BTW, after using it for a while, I’m extremely satisfied with the new PVE system. Next to PVE, unRAID feels more like a toy for hobbyists to tinker with (and somehow it’s still not cheap). Many features that require a pile of scripts to hack together in unRAID are natively supported in PVE. Storage, permission control, VM/container configuration, firewall—everything is much more polished and has way fewer bugs.
Subscribed to PVE to support great open-source software (though of course, you can use all features just fine without a subscription~)
There haven’t been major updates to my phone & laptop backup strategy, so this post will focus on HomeLab data backup.
ZFS & zrepl (Local Replication)
Previously, I used unRAID’s parity mechanism, which technically provides local data redundancy. But on SSDs, it causes severe performance issues, so I’ve abandoned it. That led me to explore local backup options under ZFS.
As a next-generation filesystem, ZFS offers a range of advanced features: Copy-on-Write, automatic detection and repair of data corruption, RAID-Z, snapshots, and more. Here, I’m mainly leveraging the snapshot functionality for backups.
You can create, view, and destroy snapshots like this:
```
# zfs snapshot pool/data@now
# zfs list -t snapshot
# zfs destroy pool/data@now
```
You can fully send a snapshot from one pool to another. In this example, a new dataset pool2/backup is created, and all data from pool/data@now is copied into it:
```
# zfs send pool/data@now | zfs recv pool2/backup
```
You can also send incremental snapshots. In this case, changes between snap1 and snap2 in pool/data are synced to pool2/backup. This requires that pool2/backup was already synchronized with pool/data@snap1 before:
```
# zfs send -i pool/data@snap1 pool/data@snap2 | zfs recv pool2/backup
```
For my backup needs, I originally wrote a shell script that creates a weekly snapshot on my SSD and sends it to my HDD storage pool. The first transfer is full, and subsequent ones are incremental.
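It amounted to roughly the following sketch (pool and dataset names are placeholders, and the real script had a bit more error handling):

```
#!/bin/sh
# Weekly snapshot + replication from the SSD pool to the HDD pool (sketch, not the original script).
set -eu

SRC="ssd/data"          # source dataset on the SSD pool
DST="hdd/backup/data"   # destination dataset on the HDD pool
SNAP="weekly-$(date +%Y%m%d)"

# Take this week's snapshot on the source.
zfs snapshot "${SRC}@${SNAP}"

# Find the newest snapshot already present on the destination, if any.
LAST=$(zfs list -H -d 1 -t snapshot -o name -s creation "${DST}" 2>/dev/null | tail -n 1 | cut -d@ -f2)

if [ -z "${LAST}" ]; then
    # First run: full send.
    zfs send "${SRC}@${SNAP}" | zfs recv "${DST}"
else
    # Later runs: incremental send from the last snapshot both sides have.
    zfs send -i "${SRC}@${LAST}" "${SRC}@${SNAP}" | zfs recv "${DST}"
fi
```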
The script worked fine at first, but eventually ran into issues: inflexible retention policies, and losing sync if a send fails for any reason. Later, I discovered zrepl, which calls itself a “One-stop ZFS backup & replication solution.” After trying it out, I found it quite solid and ditched my old script in favor of zrepl.
zrepl essentially automates the snapshot and replication process I described. It comes with a nice TUI, built-in old snapshot cleanup, and ensures reliable incremental sends by holding the last successfully sent snapshot (see its documentation).
Output from zrepl status
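To give a sense of the configuration, a local push/sink job pair looks roughly like this (dataset names, the snapshot interval, and the retention grid are placeholders of mine; the official docs are authoritative):

```
# Sketch of a zrepl config for local SSD -> HDD replication (names are placeholders).
jobs:
  - name: ssd_to_hdd
    type: push
    connect:
      type: local              # local transport: sender and receiver on the same machine
      listener_name: hdd_sink
      client_identity: local_backup
    filesystems:
      "ssd/data<": true        # ssd/data and every dataset below it
    snapshotting:
      type: periodic
      prefix: zrepl_
      interval: 24h
    pruning:
      keep_sender:
        - type: not_replicated # never prune snapshots that haven't been replicated yet
        - type: last_n
          count: 7
      keep_receiver:
        - type: grid
          grid: 7x1d | 4x7d | 12x30d
          regex: "^zrepl_"

  - name: hdd_sink
    type: sink
    serve:
      type: local
      listener_name: hdd_sink
    root_fs: hdd/zrepl         # replicated datasets land under hdd/zrepl/local_backup/...
```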
The documentation is also quite clear, so I won’t go into further usage details—check the official docs for that.
Restic (Cloud Backup)
Previously, I used Duplicati and Duplicacy, both of which performed poorly. For datasets with many small files, they consumed excessive CPU during scanning, wasted power, and were unstable, often failing. (Duplicati’s local database also grew huge and was prone to corruption.)
In contrast, zfs send operates at the ZFS filesystem level (walking the block Merkle tree), so it requires minimal computation and can run at close to the disk’s full sequential read speed.
Notice that in the earlier examples, zfs send and zfs recv are connected directly via a pipe. So in theory, we could do:
```
zfs send pool/data@now | ssh user@x.x.x.x zfs recv remote/backup
```
In fact, this is an officially supported method, and zrepl even supports remote replication to various endpoints. But there’s a catch: The remote system must also support ZFS. To meet the 3-2-1 backup rule, this means I’d need to maintain another machine offsite—no support for object storage or cloud drives. Cost goes up fast. The most convenient workaround I could think of was teaming up with another NAS owner to exchange backup space, acting as mutual offsite backups. But… I eventually gave up due to laziness.
What if we force the zfs send stream into a file and restore it later?
```
zfs send pool/data@now > backup.zfs
```
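Going a step further, the stream could be compressed, encrypted, chunked, and uploaded in a single pipeline, roughly like this sketch (tool choices, paths, and the rclone remote here are purely illustrative):

```
# Compress, encrypt, and split a full send stream into 1 GiB chunks in a staging directory,
# then upload the chunks. A restore would cat the parts, decrypt, decompress, and pipe into zfs recv.
mkdir -p /mnt/staging
zfs send pool/data@now \
  | zstd -3 -c \
  | openssl enc -aes-256-cbc -pbkdf2 -salt -pass file:/root/backup.pass \
  | split -b 1G -d - /mnt/staging/backup.zfs.part.

rclone copy /mnt/staging remote:zfs-backups --include "backup.zfs.part.*"
```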
Theoretically, no problem. I even wrote a script called zclone to handle compression, chunking, encryption, and upload of zfs send streams. But after some research, I found some serious issues—see this thread.
You are somewhat out of luck on this one. The zfs send stream gets validated on receive - until you receive, there is no way to make sure it is sound; and if it isn’t the entire receive will fail and you will lose the entire stream rather than just the affected files.
One of the things I have been pondering writing is a system based on zfs diff to identify files that changed between snapshots, and using that list to incrementally upload changed files between snapshots.
The reason I haven’t done it yet is because zfs diff was broken when I last had a need for such a thing. It has since been fixed, but I haven’t had a chance to go back to that project yet.
Obviously, this is nowhere nearly as efficient as incremental zfs send - if a single byte changes in a 1TB file, you would have to transfer the whole file.
You could potentially make this more storage and transfer efficient by generating a patch between files using bsdiff.
This is an interesting discussion and I have done some research on this topic a while back.
I came to the conclusion that while snapshots, combined with the send/receive, functionality in ZFS allows to efficiently backup datasets (i.e. filesystems) to local or remote storage, it requires ZFS on both ends in order to take full advantage of ZFS’ features. Part of that effectiveness stems from the fact that ZFS knows exactly which data is on either end and thus allows to consolidate the snapshots on the target/remote storage. Tools like ZnapZend [1], Sanoid/Syncoid [2], pyznap [3] or zrepl [4] take advantage of this, but require both the local and remote to be an actual ZFS filesystem and thus allow to consolidate snapshots and reclaim unused storage space.
On the other hand, tools like z3 [5] or ZFSBackup [6] essentially pipe the data stream from a zfs send command through other utilities and level it off to some kind of a ‘passive’, remote data storage (i.e. a non-ZFS filesystem), with all the disadvantages already discussed here. To reclaim storage space on the remote only occupied by snapshots that are no longer needed, the utility would need to keep track of the individual records (i.e. blocks) in the data structure sent to the remote storage, which is unfeasible since this is essentially what ZFS is doing in the first place.

The currently best option for ZFS dataset backups in the cloud is the offer by rsync.net [7], who allow for special zfs send capable accounts to access their underlying ZFS filesystem directly and take full advantage of the ZFS send/receive capabilities. For a great review of this service see Jim Salter’s article on arstechnica [8].

Personally, I rely on local ZFS backups to my NAS/external HDD using zrepl [4] and use restic for remote backups as it is specifically designed for this use case.
To summarize the key issues mentioned:
- The ZFS send stream is only validated during receive, which is an online process. Storing the stream temporarily doesn’t guarantee data integrity. If the stream gets corrupted, the entire receive will fail, and you lose everything.
- Multiple incremental sends keep increasing storage usage. To delete old snapshots, you must re-send everything from scratch (the remote can’t understand ZFS structure, so it can’t help clean up snapshots).
- ZFS itself doesn’t handle compression and deduplication well in such backup scenarios.
In contrast, the advantages of zfs send backups are:
- Supports raw sends (zfs send --raw) to back up encrypted datasets without needing the encryption key, preserving encryption on the target (see the example below)
- Minimal CPU overhead and high sequential read performance, great for backing up large numbers of small files
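For instance, a raw send of an encrypted dataset looks something like this (dataset names are placeholders):

```
# Raw send of an encrypted dataset: blocks are transferred exactly as stored on disk,
# still encrypted, so the receiving side never needs the key.
zfs send --raw pool/secure@now | zfs recv pool2/backup-secure
```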
In the end, I decided to play it safe and temporarily fall back to the time-tested traditional backup approach. I’ll reconsider ZFS-only solutions only if something more mature emerges.
But I didn’t go back to the unreliable Duplicati and Duplicacy. Instead, I switched to restic. I had evaluated restic before but skipped it due to the lack of a WebUI. Now, testing it again, that decision seems to have been a mistake—restic’s stability and performance far surpass the previous two.
I ran a simple test using --dry-run to back up a 2.2TiB dataset with 5.1 million files. It took about 3 hours just to scan, compress, and deduplicate—still acceptable. More importantly, progress reporting was smooth, with no freezing like the previous tools (might also be thanks to my new all-flash NAS).
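The test itself was just a normal restic backup run with the --dry-run flag; against an rclone-backed repository it looks something like this (the repository name and source path here are only for illustration):

```
# Dry run: read, chunk, and deduplicate everything, but write nothing to the repository.
restic -r rclone:o365:restic202407 backup --dry-run --verbose /tank/data
```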
Restic in the middle of a backup
Restic also allows tuning its pack size (the size of the data files it writes), which is useful for cloud storage with file-count limits. Right now, I’m backing up to the OneDrive storage included in my Microsoft 365 Family subscription. I use rclone union to combine several 1TB accounts, which can consistently saturate my 120Mbps upload bandwidth. (Though in practice, I manually capped it at 60Mbps.)
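Concretely, that boils down to an rclone union remote plus restic’s --pack-size and --limit-upload options. A sketch of how the pieces fit together (remote names, the pack size, and the rate cap are placeholders of mine):

```
# In rclone.conf, a union remote pools several OneDrive accounts (remote names are placeholders):
#
#   [o365]
#   type = union
#   upstreams = od1: od2: od3:
#
# Back up through the union remote with larger pack files and a capped upload rate.
# --pack-size is in MiB (fewer, larger files on the remote); --limit-upload is in KiB/s
# (7500 KiB/s is roughly 60 Mbps).
restic -r rclone:o365:restic202407 backup /tank/data --pack-size 64 --limit-upload 7500
```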
During my initial backup, I processed 5,170,993 files totaling 2.237 TiB. After deduplication: 1.834 TiB. After compression: 1.627 TiB actually uploaded. The whole process took 66 hours and 37 minutes. That’s an average upload speed of ~7 MiB/s—right at my limit.
```
Files: 5170993 new, 0 changed, 0 unmodified
```
Using rclone to check remote storage, the data is stored as 13,627 files.
```
# rclone size o365:restic202407
```
When testing recovery from a server in China, downloads were surprisingly slower than uploads—likely due to restic’s lack of parallel download optimization combined with domestic network issues. On an overseas 4C16G machine, restic check --read-data took about 3 hours and 40 minutes (reads and verifies every stored data block):
```
# restic check --read-data
```
No errors—great. Later, I tried an incremental backup, which was blazing fast. On such a large base, it only took 17 minutes:
```
repository 30907562 opened (version 2, compression level auto)
```
Summary
For backup solutions, reliability is paramount. I hope this setup lasts a while—and I won’t have to write another post about how I changed my backup strategy again.
This article is licensed under the CC BY-NC-SA 4.0 license.
Author: lyc8503, Article link: https://blog.lyc8503.net/en/post/17-zfs-repl-and-backup/
If this article was helpful or interesting to you, consider buying me a coffee ¬_¬
Feel free to comment in English below o/