Friday, 14 January 2011

NexentaStor Community Edition: Benchmarking

In a previous post I discussed how I implemented a Virtual Storage Appliance (VSA) on my VMware home server running the NexentaStor Community Edition operating system. While getting everything working was fairly straightforward, knowing how well it was running required some benchmarking.

I've used the bonnie++ benchmark program before and generally like the way it works. Although I suspect most of this testing could be done through the web interface (setting up the disks etc.), I found it easier and quicker to use the command line and the native Solaris/ZFS commands.

For the test, I created a new VMDK (20GB) on my primary SATA drive and published it to the VSA. I then created a new pool and added the disk:

# zpool create testpool c1t5d0

I then created a filesystem in the pool:

# zfs create testpool/testing

For this testing, I did not enable compression or deduplication (perhaps a topic for another day...).
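
For reference, both settings can be confirmed with zfs get (on a freshly created filesystem they should both show as "off"):

# zfs get compression,dedup testpool/testing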

I ran the bonnie++ benchmark with the following command line:

# /usr/local/sbin/bonnie++ -uroot -s 8192 -d /testpool/testing

The size (8192) tells bonnie++ to create 8GB of test data. This is twice the RAM allocated to the VM, so it prevents the results from being skewed by data cached in memory. I then ran each test 4 times and averaged the results. No other significant activity was taking place while the tests were running. To provide a consistent environment, I used CPU and memory reservations for the VM. I opted to focus on sequential block reads, sequential block writes (ZFS buffers random writes and writes them sequentially), rewrite and random seek performance. A good guide to understanding bonnie++ output in a ZFS context can be found here.
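
The repeated runs can be scripted, or bonnie++ can do the repetition itself with its -x option (the same option used later for the NFS tests), which runs the benchmark the given number of times and prints one CSV result line per run, ready for averaging. A sketch:

# /usr/local/sbin/bonnie++ -uroot -s 8192 -x 4 -d /testpool/testing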

Test 1: One SATA based disk

This is the basic starting point: One disk in the pool:
  • Sequential Block Reads: 75492K/sec (73.5MB/sec)
  • Sequential Block Writes: 61966K/sec (60.5MB/sec)
  • Rewrite: 26873K/sec (26.2MB/sec)
  • Random Seeks: 278.8/sec

Test 2: Mirrored SATA disks

I added a second VMDK to the NexentaStor VM, locating it on the second SATA disk, and attached the new disk to the test pool as a mirror of the existing device.
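
A sketch of the attach command, assuming the new disk appeared as c1t6d0 (that device name is an assumption; c1t5d0 is the original disk from the pool creation above):

# zpool attach testpool c1t5d0 c1t6d0

Once the resilvering was complete, the test was re-run: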
  • Sequential Block Reads: 76141K/sec (74.3MB/sec)
  • Sequential Block Writes: 52331K/sec (51.1MB/sec)
  • Rewrite: 31525K/sec (30.7MB/sec)
  • Random Seeks: 292.6/sec
So we can see that in a mirrored configuration, block reads are marginally faster, block writes are slower, rewrites of existing blocks are faster, and the number of random seeks has increased. I was a bit surprised that the block read figure was not much higher, given that running a "zpool iostat" on the test pool showed the read load balanced across both disks. The slower writes are no surprise, as the kernel has to write the same data to two separate devices.
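
For reference, the per-device view comes from zpool iostat's verbose mode; the 5-second sampling interval below is just an example:

# zpool iostat -v testpool 5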

Test 3: Mirrored SATA disks with SSD L2ARC

I added another VMDK to the NexentaStor VM, locating it on the SSD datastore, and added the new disk to the test pool as a cache device, implementing an L2ARC.
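
A sketch of the command, assuming the SSD-backed disk appeared as c1t3d0 (the device name is an assumption):

# zpool add testpool cache c1t3d0

With the cache device online, the results were: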
  • Sequential Block Reads: 116837K/sec (114MB/sec)
  • Sequential Block Writes: 65454K/sec (63.9MB/sec)
  • Rewrite: 37598K/sec (36.7MB/sec)
  • Random Seeks: 440/sec
The L2ARC has improved read performance significantly, and surprisingly write performance is faster too (I'm not sure why, as the L2ARC is a read-only cache). Rewrites are a bit faster and random seeks are much higher (to be expected with an SSD).

So at this point, we have a pretty good idea of the NexentaStor appliance's performance when writing to local disk. The next test is to see what the performance is like over NFS...

Test 4: NFS test from Solaris 11 Express VM

The Solaris 11 Express VM is running on the same host and is connected to the VSA by the same vSwitch.
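
Before the client can mount anything, the filesystem has to be exported from the VSA. NexentaStor can also manage shares from its web interface, so I'm not claiming this is exactly how it was done here, but at the ZFS level it amounts to something like:

# zfs set sharenfs=on testpool/testing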

The mount operation was performed by running:

# mount -F nfs nexenta01:/testpool/testing /mnt

The bonnie++ command was:

# /usr/local/sbin/bonnie++ -uroot -x 4 -f -s 4096 -d /mnt

The size of the test data was reduced from the 8192MB used for the local tests because the Solaris 11 Express VM only has 2GB of RAM, and 4096MB is enough to ensure the VM isn't caching the data in its own RAM. It's less than the RAM in the VSA, but at this point we're interested in the performance over the network to clients, not the speed of the appliance itself.

The default behaviour for Solaris NFS is to perform synchronous writes (see my last blog post for a quick primer on NFS/ZFS interactions). Using the zilstat script, I was able to confirm that the ZIL was written to during the benchmark run, proving that the write operations were indeed synchronous. As expected, performance was much worse:
  • Sequential Block Reads: 59934K/sec (58.5MB/sec)
  • Sequential Block Writes: 28793K/sec (28.1MB/sec)
  • Rewrite: 17127K/sec (16.7MB/sec)
  • Random Seeks: 517.45/sec
Of course, the network stack adds overhead, but it's worth seeing if we can improve on these times...

Test 5: NFS test from Solaris 11 Express VM, sync=disabled on NexentaStor NFS server

The ZIL is used by ZFS to log synchronous writes to stable storage before the data is written out to the pool. The NFS client waits until the server has confirmed the write to the ZIL before continuing. Very good for data integrity, but it does slow things down. The ZFS sync=disabled option bypasses the ZIL and buffers the request in the server's RAM until it is committed to disk. In real-world terms it's less reliable, but it behaves much like other non-ZFS NFS servers, such as those on Linux.

The command to disable synchronous writes (on a per-filesystem basis), is:

# zfs set sync=disabled testpool/testing

The tests were then re-run:

  • Sequential Block Reads: 69438K/sec (67.8MB/sec)
  • Sequential Block Writes: 49177K/sec (48MB/sec)
  • Rewrite: 20737K/sec (20.2MB/sec)
  • Random Seeks: 520.9/sec
As the test ran, I monitored the ZIL utilisation using zilstat and confirmed that the ZIL was not being used. The results show a significant improvement in writes of approximately 20MB/sec and a smaller improvement in rewrites.
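
For reference, a sketch of how zilstat can be run alongside a benchmark; the script location and the usual interval/count arguments are assumptions, not taken from the original setup:

# ./zilstat 1 60

This samples ZIL activity once a second for 60 samples; a run of all-zero lines during the benchmark is what confirms the ZIL is idle.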


Test 6: NFS test from Solaris 11 Express VM, sync=standard, separate slog on NexentaStor NFS server

Disabling the ZIL improved NFS performance, but what would happen if the ZIL was placed on a separate SSD-backed disk? To do this, I created a new disk from the SSD datastore and added it to the pool as a dedicated log device (slog):

# zpool add testpool log c1t4d0
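
The new log device can be verified with zpool status, which should list it under a separate "logs" section:

# zpool status testpool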

I changed the sync property back to standard (re-enabling synchronous writes) and re-ran the tests, using zilstat to confirm that the ZIL was being written to.
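
The property change mirrors the earlier command:

# zfs set sync=standard testpool/testing

The results with the SSD-based log device were: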
  • Sequential Block Reads: 55142K/sec (53.8MB/sec)
  • Sequential Block Writes: 23931K/sec (23.3MB/sec)
  • Rewrite: 16518.5K/sec (16.1MB/sec)
  • Random Seeks: 628.1/sec
Well, this was unexpected! The separate ZIL has produced worse results than leaving the ZIL in the pool, where the SATA disks have to write the data twice! The surprising drop in performance may be due to the type of SSD I'm using (OCZ Vertex 2). This is a Multi-Level Cell (MLC) device, which is optimised for read operations (most consumer SSDs are MLC). For high-performance writes, Single-Level Cell (SLC) SSDs are recommended, but they are far more expensive.

Conclusion

To wrap this up then, there are two options to consider when running NFS on ZFS:

  1. Leave the ZIL enabled (sync=standard): slower performance, but the data is secure
  2. Disable the ZIL (sync=disabled): faster performance, but understand the risks
The "best" option depends on the environment the VSA is serving. Fortunately the ZIL can be turned on or off on a per-filesystem basis. This means that non-critical test lab VMs can sit on a filesystem with no ZIL for maximum performance, while critical data (e.g., family photos/videos and the copy of your tax return) can be configured with end-to-end consistency.

If you are running a VMware home lab and are looking for a decent virtual storage appliance, NexentaStor CE is definitely worth a look, and as you can see, has plenty of features!
