adrift in the sea of experience

Tuesday, January 26, 2010

Building a NAS, part 6: testing ZFS checksumming

Let's take a look at how ZFS protects data. I plugged in a spare external disk, created two small 1GB partitions on it with fdisk, and set up a ZFS pool for testing:

fdisk /dev/sdc # set up two 1GB partitions
zpool create testpool mirror /dev/sdc1 /dev/sdc2
zfs create testpool/testfs
Note that this is just a test set-up. Normally you should use two separate disks: mirroring two partitions of the same disk gives you checksum-based error detection, but no protection against the disk itself failing. For the same reason, it rarely makes sense to slice disks into partitions for ZFS; giving ZFS whole disks is the usual approach.

Smashing bits

Let's create a test file which fills the file system and make a note of the sha1 fingerprint:
cd /testpool/testfs
dd if=/dev/urandom of=testfile bs=1M count=920
# prints a sha1 fingerprint for the file
sha1sum /testpool/testfs/testfile
Now comes the fun part. With a small (and very dangerous) Python script, we can corrupt one of the devices by writing some junk data at regular intervals:
# DANGEROUS: this overwrites data directly on the raw device
openedDevice = open('/dev/sdc1', 'w+b')
interval = 10000000
while (True):
  openedDevice.seek(interval, 1)   # skip ahead ~10 MB from the current position
  openedDevice.write('junk')       # smash a few bytes with garbage
  print str(openedDevice.tell())
The script will crash with an IOError once it runs off the end of the device; by then the partition is thoroughly corrupted.
When we reread the file after the corruption, ZFS detects the damaged blocks via their checksums and transparently serves the data from the healthy copy on the mirror. Note that in this case the file cannot be served from the cache, because it is larger than the available system memory, so ZFS really has to read it back from disk.
# still prints the correct fingerprint!
sha1sum /testpool/testfs/testfile
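The self-healing behaviour can be illustrated with a toy model in Python. This is only a sketch of the idea, not how ZFS is actually implemented internally: a checksum is stored separately from the data (in ZFS, in the parent block pointer), and on read each mirror copy is verified against it until a good one is found.

```python
import hashlib

def fingerprint(data):
    return hashlib.sha256(data).hexdigest()

# Toy model of a checksummed, mirrored block (illustration only).
class MirroredBlock:
    def __init__(self, data):
        self.checksum = fingerprint(data)  # stored apart from the data
        self.copies = [data, data]         # one copy per mirror device

    def read(self):
        # Return the first copy whose checksum matches; silently
        # corrupted copies are detected and skipped.
        for copy in self.copies:
            if fingerprint(copy) == self.checksum:
                return copy
        raise IOError("all copies corrupt")

block = MirroredBlock(b"important data")
block.copies[0] = b"junk"                  # simulate on-disk corruption
assert block.read() == b"important data"   # the healthy copy is returned
```

A plain mirror without checksums (e.g. classic RAID-1) cannot do this: it has no way to tell which of two differing copies is the good one.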
Strangely enough, running zpool status testpool doesn't report any errors at this point. I have sent a mail to the zfs-fuse mailing list to ask whether this is normal.

To detect and fix the errors, we have to run this simple command:

zpool scrub testpool
# shows progress and results of the scrub
zpool status testpool
To protect against bit rot on consumer-grade disks, the common recommendation is to run a scrub once a week. In a future post I'll explore how to do that automatically, including some kind of reporting so that I know when a disk is in trouble.
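As a preview, a weekly scrub can be scheduled with a crontab entry along these lines (the schedule, pool name and path to zpool are examples; adjust them for your system):

```
# /etc/crontab: scrub the pool every Sunday at 03:00
0 3 * * 0  root  /sbin/zpool scrub testpool
```

The results can then be checked with zpool status, which is where the reporting part comes in.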
