Tuesday, 18 January 2011

NexentaStor Community Edition: Compression and Deduplication Benchmarks

If you read my previous post on benchmarking the NexentaStor VSA and want even more benchmarking information, this post is for you!

ZFS filesystems can be configured to support compression and, in later releases, deduplication. Both features help maximise usable disk space, but what is the performance impact of running with them enabled?

The base configuration of the appliance is the same as test 4 from the previous post: a mirrored pair of SATA disks, with an SSD L2ARC. The filesystem is configured to use the ZFS Intent Log (ZIL) for synchronous operations (the default), but a separate log device is not configured.
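
For reference, a pool with this layout could be created with something like the following; the device names here are placeholders and will differ on your system:

```shell
# Mirrored pair of SATA disks (c1t0d0 and c1t1d0 are hypothetical device names)
zpool create testpool mirror c1t0d0 c1t1d0

# Add the SSD as an L2ARC read cache device (c2t0d0 is also hypothetical)
zpool add testpool cache c2t0d0

# Create the filesystem used in these tests
zfs create testpool/testing
```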

The performance data for the base configuration is:
  • Sequential Block Reads: 59934K/sec (58.5MB/sec)
  • Sequential Block Writes: 28793K/sec (28.1MB/sec)
  • Rewrite: 17127K/sec (16.7MB/sec)
  • Random Seeks: 517.45/sec
The benchmark uses bonnie++ running on a Solaris 11 Express VM, connecting to the NexentaStor appliance over an internal vSwitch. The bonnie++ command line is:


# /usr/local/sbin/bonnie++ -uroot -x 4 -f -s 4096 -d /mnt


See the previous blog post for an explanation of these options.

Test 1: Set compression=on

Enabling compression for a specific filesystem is very simple:

# zfs set compression=on testpool/testing
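
With compression enabled, you can confirm the setting and see how well the data actually compresses via the filesystem's compressratio property (on releases of this vintage, compression=on selects the lzjb algorithm by default):

```shell
# Check the compression setting and the achieved compression ratio
zfs get compression,compressratio testpool/testing
```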

The results of the test were:
  • Sequential Block Reads: 52830K/sec (51.5MB/sec)
  • Sequential Block Writes: 38811K/sec (37.9MB/sec)
  • Rewrite: 18659K/sec (18.2MB/sec)
  • Random Seeks: 1188.95/sec

Reads were slower with compression enabled, but writes and rewrites were faster. Random seeks were much faster too. I cannot fully explain that, but I suspect the bonnie++ test data is highly compressible, which may produce "odd" results such as this.

Test 2: Set dedup=on

For this test, compression was turned off and deduplication turned on:

# zfs set compression=off testpool/testing
# zfs set dedup=on testpool/testing
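
Once dedup is enabled, the pool-wide dedup ratio is reported by zpool list (the DEDUP column). You can also estimate how well existing data would deduplicate before committing, using zdb's dedup simulation:

```shell
# Show pool capacity, including the DEDUP ratio column
zpool list testpool

# Simulate deduplication across the pool and print a dedup-table histogram
# (read-only; useful for sizing before enabling dedup for real)
zdb -S testpool
```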

The results of the test were:
  • Sequential Block Reads: 45806K/sec (44.7MB/sec)
  • Sequential Block Writes: 27550K/sec (26.9MB/sec)
  • Rewrite: 15179K/sec (14.8MB/sec)
  • Random Seeks: 464/sec
This shows that there is a performance penalty, across the board, for enabling data deduplication. There is also a RAM overhead, as the operating system needs to hold the dedup table (DDT) in memory (not measured as part of this test).
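
A rough back-of-the-envelope sizing for that RAM overhead: each in-core DDT entry costs on the order of a few hundred bytes (about 320 bytes is a commonly quoted figure, though the exact size varies by release). Assuming that figure, the default 128K recordsize, and 1 TB of unique data, the arithmetic sketches out as:

```shell
# Assumptions: 320 bytes per DDT entry, 128K average block size, 1 TB unique data
BYTES_PER_ENTRY=320
BLOCK_SIZE=$((128 * 1024))
UNIQUE_DATA=$((1024 * 1024 * 1024 * 1024))

ENTRIES=$((UNIQUE_DATA / BLOCK_SIZE))
DDT_MB=$((ENTRIES * BYTES_PER_ENTRY / 1024 / 1024))
echo "~${DDT_MB} MB of RAM for the dedup table"   # ~2560 MB
```

Note how sensitive this is to block size: smaller records mean more entries, which inflates the dedup table accordingly.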


Test 3: Set compression=on, dedup=on

For this test, both compression and deduplication were turned on:

# zfs set compression=on testpool/testing
# zfs set dedup=on testpool/testing

The results of the test were:
  • Sequential Block Reads: 53844K/sec (52.5MB/sec)
  • Sequential Block Writes: 34315K/sec (33.5MB/sec)
  • Rewrite: 17654K/sec (17.2MB/sec)
  • Random Seeks: 1331/sec
These results suggest that if deduplication is required (to save space), then enabling compression as well improves both read and write performance, despite the additional overhead. As with compression alone, random seeks improve significantly.

Conclusion

In conclusion: for maximum read performance, leave both compression and deduplication off. For maximum write and rewrite performance, turn on compression. If deduplication is required, consider enabling compression as well, since it improved dedupe performance in these tests. There is a CPU and memory overhead to using these features, but as with most things, it's a case of balancing the cost against the benefit.

1 comment:

waytoobusytocode said...

How do you think these results are affected by your use of an SSD cache? What parallels should I draw for a simple Mirror setup without SSD caching?