adrift in the sea of experience

Tuesday, February 16, 2010

Building a NAS, part 7: ZFS snapshots, scrubbing and error reporting

Setting up a ZFS pool with redundancy only protects you against disk failures. To protect yourself against accidental deletions or modifications of files, you can use snapshots. You also need to explicitly start a ZFS data scrub at regular intervals to make sure that any checksum failures are repaired. These tasks are best automated, but you will still want to receive reports so that you can keep an eye on things.

Automated snapshots

Setting up automated snapshots for ZFS-FUSE on debian is surprisingly easy. Drop this script in /etc/cron.daily/:
#!/bin/bash
# Snapshot name encodes the current date and time, e.g. @2010.02.16:06.25.auto
zfs snapshot mypool/myfilesystem@`date +%Y.%m.%d:%H.%M`.auto
This will automatically create daily snapshots with names like 2010.02.02:06.25.auto. Note that this complicates things if you later need to delete files to make room: as long as any snapshot references a file, it continues to take up space in the pool. Daily snapshots work best for a grow-only archive where you rarely need to delete anything.
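If you ever do need to reclaim space, you have to destroy every snapshot that still references the deleted files. As a sketch (my addition, not from the original setup), a companion script could prune automatic snapshots older than 30 days, assuming the YYYY.MM.DD:HH.MM.auto naming scheme used above:

```shell
#!/bin/bash
# Hypothetical pruning script: destroy automatic snapshots of
# mypool/myfilesystem that are more than 30 days old.
cutoff=`date -d '30 days ago' +%Y.%m.%d`
zfs list -H -t snapshot -o name | grep '^mypool/myfilesystem@' | \
while read snap; do
    stamp=${snap#*@}        # e.g. 2010.02.02:06.25.auto
    day=${stamp%%:*}        # e.g. 2010.02.02
    # Zero-padded dates compare correctly as plain strings.
    if [ "$day" \< "$cutoff" ]; then
        zfs destroy "$snap"
    fi
done
```

The pool name, filesystem name and 30-day retention are placeholders; adjust them to your own setup.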

A word of warning: the scripts in /etc/cron.daily are only executed if they are executable and their names contain nothing but letters, digits, hyphens and underscores (so no dots, as in myscript.sh). See man run-parts for more details. Test with /etc/cron.hourly to verify that everything works, then move the script to /etc/cron.daily.
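You can also ask run-parts directly which scripts it would execute, without actually running them:

```shell
# Lists the scripts in /etc/cron.daily that run-parts would execute.
# A script missing from this output is misnamed or not executable.
run-parts --test /etc/cron.daily
```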

Automated scrubbing

A ZFS pool can repair checksum errors (if there is redundant storage) while remaining online. This is called a scrub. The recommended scrub interval for consumer-grade disks is one week. Drop this script in /etc/cron.weekly:
#!/bin/bash
zpool scrub mypool

Web feed reporting

The progress of a running scrub, or the results of the last one, can be shown with the zpool status command. A list of all file systems and snapshots (including some useful statistics) can be shown with zfs list -t all. To automate the reporting, I use this script in a cron job:

#!/bin/bash
reportfile=/root/poolreports/`date +%Y.%m.%d:%H.%M`.txt
date > ${reportfile}
# Redirect stdout first, then send stderr to the same place;
# "2>&1 >>" would leave error messages out of the report.
zpool status mypool >> ${reportfile} 2>&1
zfs list -t all >> ${reportfile} 2>&1
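Note that this produces one report file per run, so the folder grows without bound. As a sketch (a housekeeping detail I'm adding, not from the original setup), a line at the end of the same script could delete old reports; the path and the 90-day retention are assumptions:

```shell
# Remove report files older than 90 days so /root/poolreports/
# does not accumulate forever.
find /root/poolreports/ -name '*.txt' -mtime +90 -delete
```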
I then generate a web feed for the /root/poolreports/ folder, as I explained in my previous post, and follow the feed with Google Reader.

1 comment:

Anonymous said...

Since you've been using ZFS-fuse for a time now, can you report here or blog about its stability? Has the daemon crashed on you, while taking a snapshot, or while "scrub"-ing?

How much data have you passed in?