ZFS replication vs rsync

First, a disclaimer. While I have decades of experience with rsync, I'm fairly new to ZFS, so take the following with that caveat in mind.

There are two main ways I back up a ZFS file system: zfs send/recv (aka 'replication') and rsync. ZFS replication is hands down the better choice when backing up one ZFS file system to another. NOTE: send/recv is only available if both systems use ZFS.

That is not to say that ZFS replication is the best solution for all cases, but I'd say it should be the default when you are planning your backup strategy.

rsync

rsync is a very old and stable technology, developed by Andrew Tridgell (of Samba fame) in 1996. It is still under continuous development by a team of developers, with the most recent (Preview) release, as of this writing, in 2023. It is available on most operating systems (Linux, *BSD, macOS and even Microsoft Windows).

rsync is file based, meaning it looks for files which differ between source and target, then updates, adds, and optionally deletes files on the target to keep the two systems synchronized. rsync is optimized for slow network links, and it can transfer over more than one transport (a remote shell such as ssh, or its own rsync daemon protocol).

The rsync executable must exist on both source and target systems.

Operation

rsync first builds a list of source files and requests a list of files on the target, along with metadata about those files. It then compares the metadata to determine whether a file has changed (by default, the file size and the time it was last modified). It can also compare checksums between the source and target to determine whether a file has been updated. Generating and comparing this metadata can require a large amount of memory and CPU on larger file systems.
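The default comparison can be sketched in plain shell. This is only an illustration of the idea, not rsync's actual code, and the function name quick_check_differs is made up for this example. It assumes a shell whose test command supports -nt and -ot (bash and dash both do).

```shell
# Illustration only: mimic rsync's default size + mtime comparison for one
# pair of files. quick_check_differs is a hypothetical helper, not part of rsync.
# Usage: quick_check_differs SOURCE_FILE TARGET_FILE
# Returns 0 (true) if the file would need to be transferred.
quick_check_differs() {
    # A file missing on the target always needs to be sent.
    [ -e "$2" ] || return 0
    # A different size means the file has changed.
    [ $(wc -c < "$1") -eq $(wc -c < "$2") ] || return 0
    # Same size: treat any difference in modification time as a change.
    [ "$1" -nt "$2" ] || [ "$1" -ot "$2" ]
}
```

With rsync itself, adding -n (--dry-run) to a command such as rsync -av lists what would be transferred without copying anything, and -c (--checksum) switches from the size and time comparison to comparing checksums.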

Once rsync has determined a file needs to be transferred, it sends the file to the target. The target writes the data to a temporary file until the transfer is complete and the result has been verified against a whole-file checksum. It then renames the temporary file to the original name, which replaces (unlinks) the original file. NOTE: this is very important if you have hard links. Only the target file name is replaced; any other hard links to the old file still point at the old contents (but see the hard link option, -H / --hard-links).
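The effect on hard links is easy to reproduce locally without rsync. The sketch below only imitates the "write a temporary file, then rename it" step with standard tools; the file names are made up for the demonstration.

```shell
# Illustration only: replace a file by writing a temporary file and renaming
# it over the original, then look at an existing hard link.
demo=$(mktemp -d)
echo "old contents" > "$demo/file"
ln "$demo/file" "$demo/hardlink"          # second name for the same inode

echo "new contents" > "$demo/.file.tmp"   # stand-in for the transferred data
mv "$demo/.file.tmp" "$demo/file"         # rename over the original name

cat "$demo/file"       # prints: new contents
cat "$demo/hardlink"   # prints: old contents (still the old inode)
```

This is also what makes hard-linked version directories (as in the cp -al example) safe: an older version keeps its contents when a later rsync run replaces the file. rsync's --inplace option writes directly into the existing file instead, so the new data would show up through every hard link.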

Details

Example

# Example of bash code to make daily backups, keeping a hard-linked copy of
# old ones in versions/date.
rsync -av --delete /my/important/files backup-server:/my/backup/recent/
# The single quotes make the date, mkdir and cp commands run on backup-server.
ssh backup-server 'd=/my/versions/$(date +%Y-%m-%d) && mkdir -p "$d" && cp -alv /my/backup/recent/. "$d"/'

ZFS Replication

ZFS replication depends on ZFS snapshots. A snapshot only takes up space for the blocks which have changed since it was created, and an incremental zfs send/recv copies only the blocks (not files) which have changed between two snapshots.

Because both systems are using ZFS, and we are using snapshots, there is very little overhead in the process. The first run copies everything. After that, ZFS already knows which blocks have changed since the previous snapshot, so it can start sending them immediately without scanning the file system. This is usually much faster than rsync, especially on file systems with many files.

Operation

ZFS replication sends a ZFS snapshot to the target machine, so the first step is to create a snapshot. Then execute the 'zfs send' command on the source machine and the 'zfs recv' command on the target machine. This is generally done in one statement, with the output of zfs send piped to zfs recv over ssh.

Details

Example

# create a snapshot (named snap1) of zpool tank, dataset root_filesystem, and all datasets under it
zfs snapshot -r tank/root_filesystem@snap1
# do a dry run (-n) to estimate how large the replication will be; -R includes the datasets under it
zfs send -Rnv tank/root_filesystem@snap1
# send tank/root_filesystem@snap1 and the datasets under it to backupserver (under backup/otherroot), with pv showing progress
zfs send -R tank/root_filesystem@snap1 | pv | ssh backupserver 'zfs recv backup/otherroot'
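Later runs only need to send the difference between two snapshots, which zfs send does with the -i option. The sketch below is a hedged illustration: it reuses the dataset and host names from the example above, snap2 is a made-up name for the next snapshot, and incremental_commands is a hypothetical helper that only prints the commands so they can be reviewed before being run by hand.

```shell
# Sketch of an incremental replication step, assuming snap1 has already been
# received on the target. Nothing is executed; the commands are only printed.
DATASET=tank/root_filesystem
TARGET_HOST=backupserver
TARGET_DATASET=backup/otherroot

# Usage: incremental_commands PREVIOUS_SNAPSHOT NEW_SNAPSHOT
incremental_commands() {
    # Take the new snapshot, then send only the changes since the previous one.
    echo "zfs snapshot -r $DATASET@$2"
    echo "zfs send -i $DATASET@$1 $DATASET@$2 | ssh $TARGET_HOST 'zfs recv $TARGET_DATASET'"
}

incremental_commands snap1 snap2
```

If the initial send used -R to include the datasets under the root, add -R to the incremental send as well. If the target dataset has been modified since the last receive, zfs recv will refuse the stream unless the target is rolled back first (zfs recv -F does this, discarding the changes on the target).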

NOTE

Since zfs replication requires adding new snapshots and removing old ones, several scripts have been developed to assist in managing this task.

* zrepl - https://zrepl.github.io/
* zrep - http://www.bolthole.com/solaris/zrep/
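To give an idea of the bookkeeping these tools take over, here is a minimal sketch of the "remove old snapshots" half. It is not how either tool works internally; snapshots_to_prune is a hypothetical helper, and the retention count is an arbitrary example value.

```shell
# Sketch: choose which snapshots to remove, keeping the newest ones.
# Reads snapshot names on stdin, oldest first, and prints the ones to destroy.
# Usage: snapshots_to_prune NUMBER_TO_KEEP
snapshots_to_prune() {
    awk -v keep="$1" '
        { name[NR] = $0 }
        END { for (i = 1; i <= NR - keep; i++) print name[i] }
    '
}

# Possible use on a real system (this destroys data, so check the list first):
#   zfs list -H -t snapshot -d 1 -o name -s creation tank/root_filesystem |
#       snapshots_to_prune 7 | xargs -n 1 zfs destroy
printf '%s\n' tank/fs@day1 tank/fs@day2 tank/fs@day3 | snapshots_to_prune 2
```

The last line feeds in three made-up snapshot names and keeps the newest two, so only tank/fs@day1 is printed. Keep in mind that the most recent snapshot common to both source and target must never be removed, since the next incremental send depends on it.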