Trying out deduplication

I moved to DragonFly 2.10 over the past few days and tried out deduplication to see what kind of results I would get.  The procedure is outlined below; I'm using /home as the example, just to cut down on the amount of pasted output.  Here's df for /home before starting:

/pfs/@@-1:00004     966000640 566434576 399566064    59%    /home

First, upgrade my various Hammer pseudo-file systems to version 5, which supports deduplication.

# hammer version-upgrade /home 5
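
If you're not sure what version a file system is currently on, hammer(8) will report it; checking before and after the upgrade is a quick sanity test:

# hammer version /home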

Next, run a dedup simulation to see what it estimates the savings will be:

# hammer dedup-simulate /home
Dedup-simulate /home: objspace 8000000000000000:0000 7fffffffffffffff:ffff pfs_id 4
Dedup-simulate /home succeeded
Simulated dedup ratio = 1.22

That ratio turned out to be pretty accurate for the actual deduplication.  Unfortunately, I didn't time it.  I don't know whether the time taken is proportional to the amount of duplicated data or to the total volume of data, though I suspect the latter.
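
If you're repeating this, wrapping the run in time(1) would answer that question:

# time hammer dedup /home

Anyway, the actual run: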

# hammer dedup /home
Dedup /home: objspace 8000000000000000:0000 7fffffffffffffff:ffff pfs_id 4
Dedup /home succeeded
Dedup ratio = 1.22
462 GB referenced
378 GB allocated
14 MB skipped
6869 CRC collisions
0 SHA collisions
0 bigblock underflows
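
Judging by the pfs_id in that output, each dedup pass is scoped to a single pseudo-file system, so the other PFSes each want their own run.  Something like this would cover the rest (the mount points here are placeholders, not my actual layout):

# for fs in /usr /var; do hammer dedup $fs; done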

The end result?

/pfs/@@-1:00004     966000640 505887504 460113136    52%    /home

That data space is shared across all the file systems, and it's a 1TB disk, so by the rounded df percentages the drop is about 7%, or 70GB. I was hoping for more, but I don't have any obviously duplicated data (no local mail store, no on-disk backups), so perhaps this is normal. 70GB that I didn't have before is no bad thing, though.
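
For the record, those percentages round coarsely; subtracting the used columns from the two df snapshots above puts the actual figure a little lower:

# echo $(( (566434576 - 505887504) / 1024 )) MB freed
59128 MB freed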

Incidentally, I was able to upgrade my installed software from pkgsrc-2009Q4 to pkgsrc-2011Q1 entirely with pkg_radd -u <pkgname>.  Remarkably quick and painless, though pkgin might have been even faster, since it pulls from the same place.
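
If you wanted to upgrade the whole set in one go, a loop over pkg_info's default listing would be the obvious sketch (untested, and the version-stripping sed is crude, so assume it needs adjusting):

# for pkg in $(pkg_info | awk '{print $1}' | sed 's/-[0-9].*//'); do pkg_radd -u $pkg; done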
