Saturday, 22 October 2011

HP Microserver SAN in a box: Benchmarks

In my previous post, I detailed the build of my new "SAN" using an HP Microserver and the Nexentastor virtual storage appliance. Interested in knowing what the server was capable of, I installed the bonnie++ benchmarking software on the Nexenta VM and ran a number of tests to see how well it would perform as a datastore host for the VMware lab.

Each test was run four times and the average of each result taken. All testing was performed against a four disk RAIDZ parity stripe with an SSD read cache.

For the first test, the ZFS filesystem was configured sync=standard and compression=off. This resulted in the following averages:

  • Block Writes: 65MB/sec
  • Rewrites: 39MB/sec
  • Block Reads: 173MB/sec
  • Random Seeks: 451.55

For the second test, sync=standard and compression=on:

  • Block Writes: 148MB/sec
  • Rewrites: 107MB/sec
  • Block Reads: 218MB/sec
  • Random Seeks: 2036.1

As the results show, enabling compression results in a huge performance boost: block writes and rewrites more than doubled, and random seeks more than quadrupled.

Although not recommended in situations where data integrity is important, ZFS supports the option of disabling synchronous writes. The third test was run with sync=disabled and compression=on:

  • Block Writes: 164MB/sec
  • Rewrites: 113MB/sec
  • Block Reads: 216MB/sec
  • Random Seeks: 2522.76

As expected, disabling synchronous writes improved the write performance of the server and had a corresponding knock-on effect for the rewrites. Block reads, although marginally slower than the second test, were close enough to suggest the difference was environmental.
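To put numbers on the comparison, here is a minimal Python sketch that works out how each run compares with another. The figures are the averages quoted above; the dictionary keys and the function name are just labels picked for illustration.

```python
# Averages from the three bonnie++ runs above (MB/sec, except random seeks).
results = {
    "standard/off": {"block_writes": 65, "rewrites": 39, "block_reads": 173, "random_seeks": 451.55},
    "standard/on": {"block_writes": 148, "rewrites": 107, "block_reads": 218, "random_seeks": 2036.1},
    "disabled/on": {"block_writes": 164, "rewrites": 113, "block_reads": 216, "random_seeks": 2522.76},
}


def ratio(baseline, candidate, metric):
    """Return how many times the candidate run's figure is of the baseline's."""
    return results[candidate][metric] / results[baseline][metric]


# Compare each run against the one before it (keys are sync/compression).
for baseline, candidate in (("standard/off", "standard/on"), ("standard/on", "disabled/on")):
    for metric in ("block_writes", "rewrites", "block_reads", "random_seeks"):
        print(f"{baseline} -> {candidate} {metric}: x{ratio(baseline, candidate, metric):.2f}")
```

A ratio just under 1.0 for block reads in the last comparison is the "marginally slower" result mentioned above.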

Although running with synchronous writes disabled resulted in the highest performance, in order to get the best possible data integrity, I opted to run with sync=standard and compression=on.

While running the benchmark with compression=on, I noted in the vSphere client that both the CPU cores in the Microserver ran at nearly 100% (the Nexenta VM has 2 x vCPUs assigned). This suggests that the performance here was limited, not by the disks, but by the rather weak CPU in the Microserver.

Sunday, 18 September 2011

HP Microserver: Building a SAN in a box

[Update 05/2013: I've since migrated my Nexenta install to a dedicated server. See here for the details.]

Following a recent upgrade to my home lab, my storage now looks like this:
  • A unified SAN capable of providing both block (iSCSI) and file (NFS/CIFS) data.
  • Eight disks:
    • 2 x SSD
    • 6 x SATA
  • Two discrete RAID groups: 
    • a 400GB (usable capacity) two disk mirror
    • a 1.2TB (usable capacity) four disk parity stripe
  • Both RAID groups have dedicated 20GB flash read caches.
  • LUNs can be configured to support compression and/or deduplication
  • Copy on Write (COW) snapshots for all filesystems and LUNs
  • Support for replicating filesystems and LUNs to a second SAN

All sounds pretty funky. Must be expensive right?

Actually, the above is all achieved using a very cheap HP Microserver running VMware ESXi and the Nexenta virtual storage appliance. I've assigned the Nexenta VM 4GB RAM, but it would happily use more for its L1 read cache.

The HP Microserver has 4 x SATA disks (2 x 1TB and 2 x 500GB) with a single 60GB SSD disk.

The Nexenta virtual machine is then assigned VMDK files. The first RAID group is a mirror: one VMDK file on SATA disks 1 and 2. The second RAID group is a RAIDZ parity stripe: one VMDK file on SATA disks 1, 2, 3 and 4. The flash read caches are 20GB VMDK files on the SSD.
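As a rough check on the capacity figures, here is a small Python sketch. It assumes each VMDK is 400GB, which is inferred from the usable capacities quoted earlier rather than stated anywhere, and it ignores ZFS metadata overhead.

```python
VMDK_GB = 400  # assumed size of each VMDK; inferred, not stated in the post


def mirror_usable(member_gb):
    """A two-way mirror holds one member's worth of data."""
    return member_gb


def raidz1_usable(member_gb, members):
    """Single-parity RAIDZ gives roughly (members - 1) members of usable space."""
    return member_gb * (members - 1)


mirror_gb = mirror_usable(VMDK_GB)
raidz_gb = raidz1_usable(VMDK_GB, 4)

# Space consumed on each physical SATA disk: disks 1 and 2 hold a mirror VMDK
# and a RAIDZ VMDK, disks 3 and 4 hold a RAIDZ VMDK only.
consumed_gb = {1: 2 * VMDK_GB, 2: 2 * VMDK_GB, 3: VMDK_GB, 4: VMDK_GB}

print("mirror usable:", mirror_gb, "GB")
print("RAIDZ usable:", raidz_gb, "GB")
print("consumed per disk:", consumed_gb)
```

Under that assumption the two 1TB disks carry 800GB each and the two 500GB disks carry 400GB each, which fits the hardware listed.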

The compression, deduplication, snapshot and replication features are provided by the ZFS filesystem.

This is a pictorial representation of the configuration:


And this is what it looks like physically:




Oops. No, that's the NetApp at work. But functionality-wise they are quite similar (obviously the vastly more expensive NetApp is much faster!).

This is the real physical hardware (on the far right, next to the ML110 and ML115):



Pretty small for such a setup. I've currently only got the built-in NIC in the Microserver, but will look at adding another to create a dedicated storage network.

Saturday, 17 September 2011

Cisco SG200-26 review

Until recently I was using a Netgear GS108 switch for my home lab. This eight port, unmanaged switch performed well, but with the addition of a couple of HP Microservers, I ran out of free ports and needed something bigger.

Although not essential to the lab, I wanted a switch with a few more features. I initially looked at the Cisco SG200-18, the HP V1810-24G and a couple of other makes that I hadn't come across before (TP-Link and ZyXEL). The one requirement was that the new switch should be silent. The fans of a Cisco Catalyst switch would dominate the home office, which was unacceptable.

I discounted the switches from TP-Link and ZyXEL because I couldn't find any decent reviews of them online. The HP V1810 was then discounted because the price hiked up to over £230. This left the Cisco SG200-18. I then noticed that the SG200-26 was only £3 more expensive at £188 (from Ebuyer), so buying the smaller switch would not have made financial sense. You can't have too many ports, right?




The first thing to say about the Cisco SG200-26 is that it is not an IOS switch. I assume it's the result of the purchase of Linksys. Having said that, the build quality is good, and the switch is absolutely silent in operation and doesn't get hot (in contrast, the Netgear was hot to touch). The SG200-26 is a managed, layer 2 switch.

The SG200-26 has 24 standard 10/100/1000 ports, plus another two ports for uplinks. These can be RJ-45 10/100/1000 ports or SFP fibre ports (SFP modules not included). The form factor is standard rack-mount 1U (rack mount kit included) but also has attachable rubber feet for desktop use.


Configuration is through the web interface only (no SSH or serial interface), but the switch does support external logging to a syslog server.

Be sure to upgrade to the latest firmware. This enabled the Cisco Discovery Protocol (CDP) which is very useful in vSphere networking for identifying which physical ports a NIC is plugged into.

In the web interface, ports can be given a description and those of us with OCD can spend a happy evening mapping this information into the switch. The port settings can also be used to state the speed and duplex setting of each port.

The SG200-26 supports up to four Link Aggregation Groups (LAGs) and can load balance based on either MAC address or IP/MAC address. Both static and dynamic (LACP) LAGs can be configured. Up to eight ports can be assigned to a static LAG and sixteen ports to a dynamic LAG.

Multiple VLANs can be set up and managed as the switch supports 802.1q. Ports can be set up in trunk, general, access or Q-in-Q mode. VLAN pruning can be applied to trunk ports so that only specific VLANs are accessible to particular ports. The interface for this wasn't immediately obvious to me (and setting up the same in IOS initially seemed easier), but once I'd spent some time with it, the VLAN configuration was fairly straightforward. These VLAN options can be applied to either individual ports or a LAG.

In addition to these features, the SG200-26 can also be configured for QoS, and there are numerous security features including 802.1X, as well as Smartport macros to configure the port type (e.g., Printer, Desktop, Guest, Server etc.). Jumbo frames can be enabled, although this is a global setting that affects all ports (most switches, even expensive Cisco switches, work the same way). A "Green Ethernet" function reduces the power requirements of the switch by calculating the length of cable, and also by turning off unused ports to save energy.

As a lab switch, the SG200-26 is ideal. Personally, I would have liked to see a command line option for configuration as some tasks can be repetitive (e.g., setting up VLANs). Beyond that though, there is little to complain about. The SG200-26 is an excellent entry-level switch, with plenty of ports and a good range of options.



Some useful links:

The Cisco Small Business 200 Series Smart Switch Administration Guide

The Cisco Small Business Online Device Emulators page has a demo of the web interface for the SF300. The 300 series has additional layer 3 functionality, but you can get a good idea what the interface is like on the 200 series.

* Update 10/07/2012 * I experienced an issue where traffic between two ports (e.g., ports 1 and 2) would cause significant latency issues on other, unrelated ports. This was demonstrated by putting a ping on a host and watching the timings when there was significant network load (such as VM backups). This was resolved by upgrading the firmware to 1.1.2.0.

Friday, 9 September 2011

ISP router ARP cache problems when replacing servers

I experienced a problem today that took a while to understand so figured it was worth sharing...

Our external mail gateway was due for replacement and a new virtual machine was built, configured and tested alongside the old production server. Happy that everything was functioning as expected, the only remaining task was to disconnect the old server from the network and change the IP address of the new server from its test IP to that of the old server. This would require no changes to DNS and total downtime would be about a minute.

The change was made and... nothing. No traffic to the new server.

Huh? I tested it from another IP on the public network and it was fine. We tried from another network and... nothing.

I changed the IP back to the test address and the server sprang into life.

After a significant amount of time brainstorming with colleagues as to what was happening, we hit upon the possible problem being an ARP cache issue on the ISP provided router. Unfortunately, we don't have administrative access to this router.

Fortunately, the ISP hadn't locked down the console port of the Cisco router and I was able to connect in and run a "show ip arp" command. Sure enough, it showed the MAC address of the old server. This meant that when packets arrived from the Internet the router was trying to forward them to the old server that was no longer on the network. If I had administrative access to the router, I would have been able to flush the ARP cache and all would have been good. But because this was a "managed" router, I wasn't able to do this. I could see the problem, I knew the solution, but couldn't fix it.

I did some research online to see what the default ARP cache timeout was: typically 4 hours.

I logged a call with the ISP which was not a particularly useful experience. The ISP is a subsidiary of Cable & Wireless, and if you've ever had the misfortune of working with that company you'll understand what I'm talking about! I was told I'd get a call back in 8 hours. Brilliant! Not.

There were a couple of other options. Pulling the Ethernet cable from the router would down the interface, which I *think* will cause the ARP cache to flush. I didn't have the luxury of doing this during working hours.

The final option was to try and get the new server to send a gratuitous ARP request. This is an ARP request that a server broadcasts about itself. The idea is that other devices on the network will update their ARP caches with the information.
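For reference, this is roughly what a gratuitous ARP request looks like on the wire. The Python sketch below only builds the 42-byte Ethernet frame; it does not send anything (that would need a raw socket and root privileges). The MAC and IP addresses are placeholders, and the function name is made up for illustration.

```python
import socket
import struct


def build_gratuitous_arp(mac: str, ip: str) -> bytes:
    """Build an Ethernet frame carrying a gratuitous ARP request for mac/ip."""
    hw = bytes.fromhex(mac.replace(":", ""))
    addr = socket.inet_aton(ip)

    # Ethernet header: broadcast destination, our MAC as source, EtherType ARP.
    ethernet = b"\xff" * 6 + hw + struct.pack("!H", 0x0806)

    # ARP header: hardware type 1 (Ethernet), protocol 0x0800 (IPv4),
    # hardware length 6, protocol length 4, opcode 1 (request).
    arp = struct.pack("!HHBBH", 1, 0x0800, 6, 4, 1)
    arp += hw + addr            # sender MAC and sender IP
    arp += b"\x00" * 6 + addr   # target MAC left empty, target IP = our own IP

    return ethernet + arp


# Placeholder addresses, not taken from the real servers.
frame = build_gratuitous_arp("00:50:56:00:00:01", "192.0.2.10")
print(len(frame), frame.hex())
```

The detail that makes it "gratuitous" is that the sender and target IP addresses are the same, so the server is announcing its own IP-to-MAC mapping rather than asking about someone else's.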

My server however was hidden behind a Cisco ASA firewall.

As I was searching for ways to get this working, the ARP cache timed out (possibly because the router's timeout was configured lower than the default, although I can't see the config to confirm this) and the new server sprang into life.

At first I wasn't sure whether it was the gratuitous ARP that fixed it, but within the next hour, the ISP called and confirmed they cleared the cache. So fair play to them for getting on with it and sorting the problem.


It's been a learning experience in that even the simplest and quickest network change can have unforeseen side effects!

Saturday, 3 September 2011

Slow Windows 2008 install in an ESXi VM

This is just a quick note on a problem I've just experienced (and found a fix for!).

Having just built a new Windows 2008 VM and mounted the ISO from my NFS ISO datastore, I was surprised to see that the actual install was crawling along very slowly at the "Expanding Windows files" section:




As the above link indicates, the problem was the ESX host's NIC configuration. Set to "auto-negotiate" (which it should be on gigabit connections), the port had managed to negotiate down to 10Mb half-duplex(!). I changed this to 1000Mbit full-duplex and then back to auto-negotiate (where it stayed at 1000Mbit full-duplex).
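To give a feel for how much difference the negotiated speed makes, here is a back-of-the-envelope Python sketch. The 3GB ISO size is an assumption for illustration, and the calculation ignores protocol overhead and the extra penalty of half-duplex, so real transfers would be slower still.

```python
ISO_SIZE_GB = 3.0  # assumed ISO size, for illustration only


def transfer_seconds(size_gb: float, link_mbit: float) -> float:
    """Best-case time to move size_gb over a link running at link_mbit."""
    bits = size_gb * 1_000_000_000 * 8
    return bits / (link_mbit * 1_000_000)


slow = transfer_seconds(ISO_SIZE_GB, 10)    # negotiated down to 10Mbit
fast = transfer_seconds(ISO_SIZE_GB, 1000)  # gigabit

print(f"10Mbit:   {slow / 60:.0f} minutes")
print(f"1000Mbit: {fast:.0f} seconds")
```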



And the performance became speedy again!

Monday, 29 August 2011

HP Microserver: Remote Access Card

Remote access functionality, sometimes called "Lights Out" management, is a standard feature on mid- and high-end servers. It allows a system administrator to remotely access the console of the server as well as performing power on, off and reset operations. Most implementations also allow for remote media management, allowing the administrator to remotely connect CD-ROM or floppy images across the network to the server.

Low end servers, including the original ML110 and ML115 G5 servers, and the newer Microserver do not come with this functionality. However, it can be added as an extra.

I was out of spare slots on my KVM, so when I bought the Microserver, I included the Remote Access Card (RAC) in the purchase. The Microserver has a PCIe 16x and PCIe 1x slot. The RAC fits into the 1x slot, leaving the 16x slot free for upgrades.

The easiest way to configure the card is to initially use a keyboard and monitor.

The back of the card has a standard RJ45 Ethernet connector and a VGA port. The monitor needs to be connected to this port and not the onboard VGA port. Once connected, the machine can be powered on.

When prompted, press F10 to enter the ROM setup. From here, select the Advanced page and IPMI Configuration:


Select Set LAN Configuration:


Set the BMC LAN Configuration option to Static and then enter an IP address, subnet mask and default gateway:


While in here, it's also worth tuning the VGA configuration. Since this server isn't running anything graphical, I dropped the VGA RAM allocated down to the minimum. From the Advanced page, select PCI Express Configuration:



Under VGA Memory Size, select 32MB.

Exit the ROM setup and save settings. Reboot the server. If everything has been configured successfully, you can now disconnect the monitor and keyboard.

Once configured, open a browser to the card's IP address and you should get the login screen:



The default username is admin and the default password is password.

I've sometimes had problems getting past the login. My username/password is accepted, but I'm returned to the login page. To avoid this, I always go to the index.html and not the login.html, and I use Firefox's Private Browsing mode. I assume a cookie is getting set incorrectly sometimes and this process seems to work around it.

Once logged in, the RAC presents a menu down the left hand side, with the main content on the right. Most is pretty self-explanatory.

Email settings


Remote power control

SNMP trap configuration

 The most interesting are at the bottom and provide access to the virtual media and virtual KVM (Keyboard, Video, Mouse):

Virtual KVM and Media configuration

The Virtual Media is a Java application (loads through Java Webstart) and allows either the local CD/DVD drive, or an ISO image to be connected remotely to the server:



The Virtual KVM is also a Java Webstart application and provides access to the server console. Special keystrokes such as CTRL-ALT-DEL can be sent using the Macro menu. The following screenshot shows ESXi 5.0 running on the Microserver:


The only problem I had with the Java applications is when I attempted to access them with my Mac. For some reason it had problems opening the file. So I used Windows instead.

So how good is the RAC? While it's probably true to say that I won't be using it all that often, it's a very useful addition to the Microserver, especially if you want to put it somewhere out of the way like the garage or loft.

Unlike the more expensive ILO cards, the RAC does not have an onboard battery, so if the Microserver loses power completely, it's not possible to connect to it. However, if power is connected to the Microserver, you should be able to connect.

** Update 12-FEB-2013: My thanks to Tom Hall who commented that there is a 1.3 firmware for the RAC that fixes a problem where it becomes unresponsive to the network. I've seen this problem a couple of times and it's a pain as it basically makes the RAC useless. The new firmware should resolve this issue. **

Saturday, 27 August 2011

HP Microserver: BIOS upgrade

Despite my attempts to resist, the HP Microserver (with £100 cashback) was too tempting a deal, and I've recently taken ownership of a small server, Remote Access Card (RAC) for ILO functionality and 2 x 4GB memory sticks.

The Microserver comes with 4 internal SATA drive bays. Disks are mounted in the brackets and then slide into the server vertically. There is another drive bay on top for an optional optical (DVD) drive. A USB port on the motherboard can be used for installing a hypervisor like VMware ESXi.

My plan was to put 4 SATA disks into the internal bays and mount the SSD in the "ODD" (Optical Disk Drive) bay which would be used as a cache. The SSD is an OCZ Vertex 2 and is a 2.5" sized drive (as most SSDs are). To fit into a 3.5" bay, an adapter is provided. Another adapter was then required to fit the 3.5" bracket into the 5.25" bay.

The Microserver has six SATA ports. The four internal drives are connected to the Microserver's mainboard via a "MiniSAS" connector. The remaining two ports are configured as the internal optical port and an external eSATA port.

Unfortunately, the ODD SATA port and the external eSATA port are configured in "IDE Emulation" mode instead of the faster AHCI mode. This means that it will be limited to a maximum bus speed of 132MB/sec, significantly less than the 3Gbps that SATA can theoretically handle, and you lose some advanced features such as Native Command Queuing (NCQ). It's obviously not ideal to take your fastest disk and put it on the slowest port!
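To compare the two figures in the same units, here is a minimal Python sketch that converts the SATA line rate into a payload rate. It relies on the fact that SATA uses 8b/10b encoding (10 bits on the wire for every 8 bits of data); the function and constant names are made up for illustration.

```python
IDE_EMULATION_MB_PER_SEC = 132  # figure quoted above


def sata_payload_mb_per_sec(line_rate_gbps: float) -> float:
    """Convert a SATA line rate in Gbps to usable MB/sec after 8b/10b encoding."""
    bits_per_sec = line_rate_gbps * 1_000_000_000
    data_bits_per_sec = bits_per_sec * 8 / 10   # strip the encoding overhead
    return data_bits_per_sec / 8 / 1_000_000    # bits -> bytes -> megabytes


sata_3gbps = sata_payload_mb_per_sec(3.0)
print(f"3Gbps SATA: about {sata_3gbps:.0f}MB/sec")
print(f"IDE emulation is roughly {IDE_EMULATION_MB_PER_SEC / sata_3gbps:.0%} of that")
```

These are theoretical bus ceilings, not what any particular disk will actually deliver.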

As I'm running VMware ESXi Hypervisor, this can be seen in the vSphere client. The four SATA disks appear on the SATA controller but the CD drive appears under the separate IDE controller as can be seen here:


Image courtesy of the excellent Techhead Microserver review and used with permission.


A fix appears to exist, courtesy of a Russian hacker, who has patched the Microserver BIOS to enable an option that allows the user to turn off IDE Emulation mode and change the port mode to standard SATA.

I was initially reluctant to install this hack in case it caused problems (and set my BIOS language to Russian; it doesn't!), but there are plenty of people who have used the hack without problems. To install, do the following:

Important: This worked for me. Apply at your own risk. I'm not responsible if this bricks your server! You probably won't be covered under warranty if you have problems.

Download the latest HP Systems ROMPaq Firmware Upgrade (it doesn't matter what release you download as the modified BIOS will replace the version with its own).

To get started, open the start.htm file in the download and follow the instructions on writing the upgrade to a USB key.

Download the modified BIOS. You can get a copy of it here.

Once the USB key has been written, replace the *.ROM file with the modified BIOS, renaming it so that the original filename remains. The provided ROM on my system was called O41040211.ROM and the modified ROM was called O41_AHCI.ROM. I removed the O41040211.ROM and renamed O41_AHCI.ROM to O41040211.ROM.

Insert the USB drive in the Microserver and boot it. The firmware should apply. Once this returns the C:\ prompt, remove the USB drive and reboot.

Enter the BIOS when prompted to press F10.

Select the Chipset menu item and then SouthBridge Configuration (this is new functionality provided by the hack):


 Select SB Sata Configuration:


Set SATA IDE Combined Mode to Disabled:



Exit and save the BIOS changes. When ESXi boots, the Storage Adapters should now look like this:





All SATA ports are now running at the optimal AHCI mode allowing for up to six disks to be connected at full speed and with no legacy overhead.


Friday, 26 August 2011

Mac OS X Lion and CUPS printing

When I originally set up my Mac Mini a few years ago (with Leopard), I had some issues getting the printer set up on my network using CUPS. Having upgraded to Lion a couple of weeks ago, the printing problems returned.

The first problem I had was that my printer, an HP Deskjet 5150, was not on the supported list of Apple printer drivers. HP were similarly useless in not providing a driver.

The answer was found in the open source community. A quick download and install of Ghostscript, Foomatic-RIP and HPIJS made the correct driver available (along with many other printer drivers).

This allowed me to add my printer, but upon trying to print, the print queue window would report that it was "Unable to get printer status". Not helpful.

The remote printer is connected to my Netgear ReadyNAS Duo which runs an embedded Linux distribution and uses CUPS as the print server. Despite trying to dig into the debug options, I was not able to fix the printing error.

The default method of setting up a printer on a Mac is to use Bonjour for auto discovery. This uses the Internet Printing Protocol (IPP) under the hood but was failing. Attempts to set up the IPP queue manually also failed.

The fix that worked for me was to set up the printer as an SMB (Windows) printer. This uses the ReadyNAS's Samba install and printing now works! Not ideal, but does the job.

Wednesday, 13 July 2011

VMware announce vSphere 5 and why all we're talking about is licensing

Yesterday, VMware announced the latest version of their flagship product: vSphere 5. This new version further extends VMware's lead over its competitors in the virtualisation space giving users the ability to run more and bigger VMs. Compare the capabilities of vSphere vs XenServer or Hyper-V and the ongoing technical superiority of vSphere is apparent.

The new version offers new features such as the ability to automatically provision ESXi servers, storage enhancements (improvements to VMFS, Storage DRS and Profile-Driven Storage), a rewritten HA component, a Virtual Storage Appliance (which looks interesting for SMBs) and a new ESXi firewall. All of which are useful additions to the product.

Regular Enterprise customers have less to get excited about since the Auto Deploy, Storage DRS and Profile-Driven Storage features join the Distributed vSwitch, Storage- and Network-I/O control and Host Profiles as Plus-only features.

You would imagine that the conversations on Twitter and in blogs would be about these amazing new features. You would be wrong.

In releasing vSphere 5, VMware have made a significant licensing change. Whereas previous versions were licensed per CPU, with a limit on the number of cores per CPU, the new version is licensed per CPU with no restriction on cores. However, the amount of memory that vSphere can use is now licensed, which VMware are calling the "vRAM Entitlement". The vRAM entitlement is the amount of RAM allocated to VMs.

A single CPU licence for vSphere 5 Enterprise comes with a vRAM entitlement of 32GB RAM. In a dual CPU ESXi host, this means the sum of RAM allocated to VMs is capped at 64GB. In real terms, this would be equivalent to 32x2GB RAM VMs or 16x4GB RAM VMs.

The vRAM calculation is based on a pool of all resources, so in a cluster with 8 hosts, each with 2 CPUs licensed at 32GB vRAM per CPU, the total vRAM licensed is 512GB (8 x 2 x 32).

While the VMware Licensing, Pricing and Packaging PDF tries to make it sound like good news for end users because we will no longer be limited to a number of CPU cores, I wonder how many users are CPU-bound. Looking at the infrastructures that I support, CPU utilisation is never the bottleneck; available RAM is.

The way around this limitation is to purchase additional CPU licences. So, if you have a server with 2 CPUs and 128GB RAM running Enterprise, the 2 CPU licences you have will only support 64GB vRAM, so you'll need to purchase another 2 CPU licences. Ker-ching for VMware!
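The arithmetic above can be sketched in a few lines of Python. The 32GB per-CPU figure is the Enterprise entitlement quoted earlier; the function names are made up for illustration, and the rule that you need at least one licence per physical CPU is how I read the pricing document rather than anything I've tested.

```python
import math

VRAM_PER_CPU_LICENCE_GB = 32  # Enterprise entitlement quoted above


def pooled_vram_gb(hosts: int, cpus_per_host: int) -> int:
    """Total vRAM entitlement when every CPU in the cluster has one licence."""
    return hosts * cpus_per_host * VRAM_PER_CPU_LICENCE_GB


def licences_needed(cpus: int, vram_gb: int) -> int:
    """Licences required: one per CPU, plus extras if the vRAM demands it."""
    return max(cpus, math.ceil(vram_gb / VRAM_PER_CPU_LICENCE_GB))


print("8 hosts x 2 CPUs:", pooled_vram_gb(8, 2), "GB vRAM")
print("2 CPUs, 64GB of VMs:", licences_needed(2, 64), "licences")
print("2 CPUs, 128GB of VMs:", licences_needed(2, 128), "licences")
```

The last line reproduces the example above: a 2 CPU, 128GB host fully allocated to VMs needs four licences, two more than its CPU count.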

I have some sympathy for users who have deployed large scale servers such as Cisco UCS blades. With a comparatively "normal" CPU count, but huge memory capacity, the licensing requirement for these environments has just gone through the roof.

One of the advantages of VMware vs its competitors is the number of VMs that can be supported on a single host. With Transparent Page Sharing (TPS) and memory compression, vSphere can typically run more VMs than XenServer or Hyper-V and overcommit allows a 32GB server to run more than 32GB worth of VMs. Now, the savings made in terms of required hardware are offset by the need to purchase more licences.

For those of us with home labs that we use for testing, the vRAM entitlement may be a significant bottleneck. The HP ML110 G6 is a cheap, single-socket server capable of holding 16GB RAM. The free VMware vSphere Hypervisor (AKA ESXi) will only support 8GB per CPU. Whether a second free licence can be applied is unknown, but if not, many home labs will be limited to 8GB vRAM per host. This means more servers will be required which can cause significant spouse issues.

The VMware perspective is understandable. The ability for servers to run with huge amounts of RAM means that organisations require fewer servers, which means less money to VMware.

So is there any good news in this? The one thing that I can think of is it now makes charge back of VM resources easier to calculate. It appears that instead of managing "traditional" virtual infrastructures, where IT is a cost centre, VMware are shifting to a world where everything is provisioned through a cloud infrastructure and IT is a service. In this new world, the ability to charge back will be a core component.

In summary, some users will be unaffected by this change, some will need to pay a bit more to make full use of their environment, and others will need to pay a lot more. I'm sure that VMware's competitors will have a field day with this.

(All the above is based on the VMware pricing document (linked above). If anyone spots anything incorrect, please comment and I'll fix it. Thanks!).

Sunday, 10 July 2011

How Windows Live Mesh broke my ReadyNAS backup

The following has taken me a while to figure out, but here is the answer and it's hopefully useful to someone else with the same (or similar) problem.

I have a Netgear ReadyNAS appliance which I bought because there was a very good deal on at the time (buy a unit with a 1TB disk and get another 1TB disk for free). It's sat on my desk and not been doing much as I've had other projects to work on.

I decided to configure it as a backup NAS for some of my other machines, specifically T's HP Pavilion and my Aspire Revo, both running Windows 7. I was interested in backing up documents only, so configured the ReadyNAS to connect to each Windows PC fileshare and pull in the data.

This worked perfectly on T's PC but on mine, the ReadyNAS kept complaining that it could not connect to \\REVO\Users. I checked that T's PC could see the Revo share. It could. I tried it with my Mac and could browse the Revo using Finder. I upgraded the firmware in the ReadyNAS. That made no difference.

I then resorted to running smbtree on the command line to see what was happening. This gave me the following output:

\\REVO                  
cli_rpc_pipe_open: cli_nt_create failed on pipe \srvsvc to machine REVO.  Error was NT_STATUS_ACCESS_DENIED


Okay, this was a clue. I remembered the previous problems I had with Samba (which is also running on the ReadyNAS) and Windows 7. Microsoft had set the default authentication protocol to NTLMv2 which was not supported by older versions of Samba. The workaround was to set the Network Security policy on the Windows 7 box to accept NTLM (v1) instead. I checked this on the Revo, but it was set up correctly.

I then tried to run smbclient -L //REVO to list the shares on the Revo. The equivalent command worked fine on T's PC, but I got the following on the Revo:

mac-mini:~$ smbclient -L //REVO
Password:
session setup failed: SUCCESS - 0


Time to turn on debugging (appending -d10 to the above command) and compare the output for the two machines. This showed that there was a difference in the authentication protocols being negotiated.

It was at this point that I remembered a forum post that had seemed irrelevant when I first read it. The post stated that if the Windows Live Sign-On Assistant was running, this could cause problems as it introduces another authentication protocol, "mechToken". Really? An application can break file sharing?

I run Windows Live Mesh on the Revo, which uses the Windows Live Sign-On Assistant. I uninstalled this. The connection worked! I then tested the ReadyNAS connection and... it worked!


I'm not sure where the "fault" lies here but life would be much easier if these protocols were all properly documented and were designed to gracefully fail if the software sees something unfamiliar.

Thursday, 9 June 2011

CrashPlan on Mac OS X

I've been testing CrashPlan as a method of backing up my files to the cloud. After taking a couple of weeks to get all the data up, it's working pretty well and I'm planning on paying for it on a monthly basis. The one downside is that having CrashPlan loaded appears to slow my old Core 2 Duo Mac Mini down. This seems to be because the Java app consumes several hundred megabytes of memory over time.

CrashPlan comprises two parts: the front end GUI and a Java-based engine. The first job I did was to customise the backup window so that it only backs up overnight. As my Mac is on all the time, this is not a problem. The reasons for this are that I'm asleep and not using the computer, and that my ISP, Plusnet, have a monthly bandwidth allowance but don't count traffic between 12am and 8am.

In order to work around the memory leak issue, I've created two root cron jobs to start up the engine just before it's needed, and shut it down again afterwards. To do this requires the UNIX command prompt (open the Terminal app):

First, open a root prompt by running a BASH shell:

$ sudo bash

(enter your password here)

Then edit the crontab by running:

# crontab -e

Enter the following lines:

# Start CrashPlan engine at 5 minutes to midnight in time for overnight run
55 23 * * * /bin/launchctl load /Library/LaunchDaemons/com.crashplan.engine.plist
# Stop CrashPlan engine at 5 minutes past 8am to free memory after overnight run
05 08 * * * /bin/launchctl unload /Library/LaunchDaemons/com.crashplan.engine.plist


Exit the vi editor (hit escape, then type :wq! and hit return).

You can view your crontab by running:

# crontab -l

This gives the benefit of overnight CrashPlan backups, but without having any unnecessary services running in the background during the day.
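One thing worth noting about the two cron entries is that the window they define wraps around midnight. A small Python sketch of the same schedule makes that explicit; the function name is made up for illustration and the times are simply copied from the crontab lines above.

```python
from datetime import time

ENGINE_START = time(23, 55)  # matches the "55 23 * * *" load entry
ENGINE_STOP = time(8, 5)     # matches the "05 08 * * *" unload entry


def engine_should_be_loaded(now: time) -> bool:
    """True between the load and unload jobs; the window wraps past midnight."""
    return now >= ENGINE_START or now < ENGINE_STOP


for sample in (time(23, 54), time(23, 55), time(3, 0), time(8, 5), time(12, 0)):
    print(sample.strftime("%H:%M"), engine_should_be_loaded(sample))
```

This assumes the Mac is actually on at 23:55 and 08:05, since cron does not catch up on jobs missed while the machine was off or asleep.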

Wednesday, 30 March 2011

Passing the CCA: My experience

I've been quiet on the blogging front recently because I've been studying for my Citrix Certified Administrator for XenApp 5.0 for Windows 2008.

I took the exam yesterday and passed with 85% (the pass mark was 68%, so very pleased with that). As with all certs, the content of the exam is under NDA, but it might be useful to know my revision plan.

I attended the official Citrix course back in 2009, but didn't get around to doing the exam. The course is pretty thorough and details the architecture of XenApp, how to install it and configure the Web Interface, plugins, the Secure Gateway, along with application publishing and streaming, policies and the ever problematic subject of printing.

For revision, I downloaded the Exam Prep Guide and used that as the basis on which to read up from the course notes. I also subscribed to the excellent Citrixxperience site which had a very useful set of study notes as well as a large number of practice questions. Note: This is NOT a brain dump site!

I used the home lab to build a number of VMs based on Technet licences and a 60 day trial licence of XenApp 5.0. To simulate the various components of a XenApp build, I created one VM for Citrix licensing, one for the Web Interface server, two for the XenApp Server Farm and a VM for the streaming profiler. I would have liked to create an additional VM for the Secure Gateway, but this would have required a firewall and I didn't get around to it.

As for the exam itself, the Exam Prep Guide tells you all you need to know: Content, structure, times etc. I found that my experience with the practice questions gave a very similar result to my actual score.

I don't realistically expect to take my Citrix knowledge much deeper. I know enough to make some intelligent decisions when designing a XenApp solution, and it remains perhaps the best way to remotely deploy Windows applications over a web browser or in a thin client environment. Although Windows Server 2008 has improved the Terminal Services capabilities of Windows, Citrix still adds a significant number of useful enhancements with XenApp that are required in larger environments.

Sunday, 20 February 2011

Facebook: Protecting yourself from viral links

This entry is different from my normal posts. It's a response to the increasing number of viral links cropping up on Facebook. These are more than annoying and could in fact be ways in which unscrupulous people steal your personal data.

Okay, here’s how it works... One of your friends appears to post a comment on their wall urging you to click a link. For example:


Or this:



Or even:


(!)


The first thing to do is think "Why would my friend post this sort of link?". If it seems out of character, think carefully before clicking further.

Note the bit at the bottom. This was posted via “Who Visited You”, "9-9" and "Dad Caught Her Strippin". These are Facebook applications that have written the message. Sometimes these are okay (e.g., posted via iPhone/Android/Blackberry - apps you've installed on a mobile phone or tablet). But in these cases, it should cause alarm bells to ring.

So what happens when you click the link?

The link will try and get you to agree to install an application on your page. It's worth noting that Facebook applications have full access to your profile information, including your list of friends.

Here's the simple rule: Do not allow the application to install!

Applications like “Who Visited You” and others will pull in a list of your friends and write on their walls or update your status, pretending to be you and aiming to trick your friends into clicking the link.

Basically, it’s a computer virus.

Why do the application writers do this? Probably to try and harvest as much marketing information about you as possible, but it could be more insidious. If you’re publishing common “known facts” about you (e.g., your date of birth or what schools you went to), it could be used to steal your identity. Think of some of those security questions you get prompted for when you forget your email or online shopping account password. Are those answers in your profile?


Clearing up after it's happened...

If you’ve been caught out by one of these scamming apps, click the Account button at the top right of the Facebook window and select Privacy Settings:



Under “Apps and websites”, click the “Edit your settings” link:


Under “Apps you use”, click the link for “Remove unwanted or spammy apps”.



Delete (click the X on the right hand side next to the app) for all the apps you don’t want to have access to your personal information (the apps in the screenshot below are all valid apps, but you should look out for the dodgy ones).

In addition to the really obvious dodgy apps (such as those illustrated above), consider if you really want "How Blonde Are You?", "Which 80's song describes your life?", "Are you a potato?" (seriously?) or... "FarmVille" to have access to your personal data. Because when you sign up for one of these questionnaires or games, that's what you're doing.

When all the dodgy apps are removed, change your Facebook password!

Keeping safe on Facebook

Finally, if you don't believe me, at least watch this short YouTube video from anti-virus company Sophos that shows how these applications try and trick Facebook users into giving away personal data:



Your identity is important. Look after it.

Safe browsing!

Saturday, 22 January 2011

NexentaStor Community Edition: Troubleshooting the slow web interface

Although my experience with the NexentaStor Community Edition VSA has been largely positive, I found the web interface to be slow at times. I thought I'd do a bit of troubleshooting to see what was wrong...

The first step in troubleshooting is to get to a proper Unix prompt (remember that NexentaStor is built on the Solaris codebase). I opened an SSH session to the VSA and logged in as "admin". By default the admin shell is a bit special, and for real troubleshooting we need the root account. To get this, run the "su" command:

admin@nexenta01:~$ su
Password:
root@nexenta01:/export/home/admin#

Note that I ran "su" and not "su -".

VMware ESX admins may be familiar with "esxtop", and Linux admins with "top". The Solaris equivalent is "prstat":

   PID USERNAME  SIZE   RSS STATE  PRI NICE      TIME  CPU PROCESS/NLWP      
   877 root       36M   33M sleep   59    0   0:00:35 0.7% python2.5/15
  7312 root     4320K 3468K cpu0    59    0   0:00:00 0.4% prstat/1
   855 root       58M   26M sleep   59    0   0:00:33 0.3% nms/1
   196 root     7812K 4716K sleep   59    0   0:00:00 0.2% nscd/32
   576 root     6952K 4676K sleep   59    0   0:00:05 0.2% vmtoolsd/1
  3716 root       17M   15M sleep   44    5   0:00:04 0.1% volume-check/1
   596 root       39M 9092K sleep   59    0   0:00:00 0.1% nmdtrace/1
  7213 admin    7744K 5272K sleep   59    0   0:00:00 0.1% sshd/1
   953 root       58M   54M sleep   59    0   0:00:13 0.1% nms/1
  3560 root       16M   15M sleep   44    5   0:00:01 0.0% disk-check/1
  3564 root       18M   17M sleep   59    0   0:00:02 0.0% hosts-check/1
   434 root     3472K 2104K sleep   59    0   0:00:03 0.0% dbus-daemon/1
   324 root        0K    0K sleep   99  -20   0:00:02 0.0% zpool-testpool/136
     5 root        0K    0K sleep   99  -20   0:00:01 0.0% zpool-syspool/136
   509 root       18M   10M sleep   59    0   0:00:01 0.0% fmd/21
  1273 root       58M   54M sleep   59    0   0:00:05 0.0% nms/1
   392 root       13M 8400K sleep   59    0   0:00:00 0.0% smbd/18
   515 www-data   17M 6716K sleep   59    0   0:00:00 0.0% apache2/28
   234 root     2604K 1584K sleep  100    -   0:00:00 0.0% xntpd/1
  7231 root     4628K 2568K sleep   59    0   0:00:00 0.0% bash/1
   519 www-data   17M 6564K sleep   59    0   0:00:00 0.0% apache2/28
Total: 91 processes, 732 lwps, load averages: 0.55, 0.53, 0.55


When troubleshooting, I noticed that the process using the most CPU was "nms". This is a custom command provided by Nexenta. Curious as to what this command was doing, I ran the truss command against the process id:

root@nexenta01:/export/home/admin# truss -f -p 855
855:    pollsys(0x08047AA0, 1, 0x08047B58, 0x00000000) (sleeping...)
855:    pollsys(0x08047AA0, 1, 0x08047B58, 0x00000000)    = 1
855:    read(4, " l01\00118\0\0\08D\0\0\0".., 2048)    = 176
855:    read(4, 0x0AB8DB78, 2048)            Err#11 EAGAIN
855:    stat64("/tmp/.nza", 0x0813E078)            = 0
855:    stat64("/tmp/.nza", 0x0813E078)            = 0
855:    stat64("/tmp/.nza/.appliance", 0x0813E078)    = 0
855:    open64("/tmp/.nza/.appliance", O_RDWR)        = 9
855:    fstat64(9, 0x0813DFE8)                = 0
855:    fcntl(9, F_SETFD, 0x00000001)            = 0
855:    llseek(9, 0, SEEK_CUR)                = 0
855:    fcntl(9, F_SETLKW64, 0x08047410)        = 0
855:    llseek(9, 0, SEEK_CUR)                = 0


The truss command traces system calls, and although the output appears quite scary, you can learn a lot about what a process is doing without needing to understand every system call. Useful calls to look for are:
  • open() - opens a file for reading/writing. The number returned (on the right after the = sign) is the file descriptor.
  • close() - closes a file.
  • read() and write() - the first number after the "(" is the file descriptor being read from or written to. Cross reference it with the number returned by the open() call.
  • stat() and stat64() - fetches information about a file, and is often used to test whether a file exists. Don't worry if you get errors returned here as it might be the process looking for a file that may exist in multiple places (e.g., when scanning the PATH for an executable).
The -f option in truss means that child processes will be "followed". So if the process you are tracing forks another process, you will get the data on the child process as well. The -p option tells truss to trace the numbered process (obtained from ps or prstat output).

That was a really quick intro to truss for the purposes of explaining how I debugged the problem. Truss is capable of a lot more than I've just described. See the man page ("man truss") for more details.
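
When a trace runs to hundreds of lines, a quick tally of which calls dominate can help. The following is only a sketch: it assumes the trace has been saved to a file (truss can write to one with -o) and that each line starts with the process id followed by the call, as in the -f output above. The function name count_syscalls is my own invention.

```shell
# Count how often each system call appears in truss output read from stdin.
# Assumes lines of the form: 855:    stat64(...) = 0  (the "truss -f" layout).
count_syscalls() {
    awk '{
        call = $2                 # second field holds the call and its first argument
        sub(/\(.*/, "", call)     # keep only the name before the opening parenthesis
        if (call != "") count[call]++
    }
    END { for (c in count) print count[c], c }' | sort -k1,1nr -k2,2
}
```

For example, after capturing a trace with "truss -f -p 855 -o /tmp/nms.truss" (the output path is just an example), running "count_syscalls < /tmp/nms.truss" lists the busiest calls first.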

Back to the performance problem...

The truss output showed me that the nms process was scanning through a large number of ZFS snapshots. I obtained a list of snapshots on the system:

# zfs list -t snapshot

...and got hundreds back! Something was creating a large number of snapshots. On closer inspection, it appeared I was getting a new snapshot of some filesystems every 7 minutes:

filestore/Shared@snap-daily-1-2011-01-21-2122     1.23M      -  50.5G  -
filestore/Shared@snap-daily-1-2011-01-21-2129     1.22M      -  50.5G  -
filestore/Shared@snap-daily-1-2011-01-21-2136     1.39M      -  50.5G  -
filestore/Shared@snap-daily-1-2011-01-21-2143     1.22M      -  50.5G  -
filestore/Shared@snap-daily-1-2011-01-21-2150     1.12M      -  50.5G  -
filestore/Shared@snap-daily-1-2011-01-21-2157     1.06M      -  50.5G  -
filestore/Shared@snap-daily-1-2011-01-21-2201     1.04M      -  50.5G  -
filestore/Shared@snap-daily-1-2011-01-21-2208     1.04M      -  50.5G  -
filestore/Shared@snap-daily-1-2011-01-21-2215         0      -  50.5G  -
filestore/Shared@snap-daily-1-latest                  0      -  50.5G  -
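
With hundreds of snapshots, it is easier to see which filesystems are affected by counting snapshots per filesystem than by reading the full list. A minimal sketch, assuming one "filesystem@snapshot" name per line on standard input (the helper name count_snapshots is made up for this example):

```shell
# Count snapshots per filesystem from a list of snapshot names on stdin
# (one "filesystem@snapshot" name per line); busiest filesystem first.
count_snapshots() {
    awk -F'@' 'NF == 2 { n[$1]++ } END { for (fs in n) print n[fs], fs }' |
        sort -k1,1nr -k2,2
}
```

On the appliance this could be fed with "zfs list -H -t snapshot -o name | count_snapshots", where -H drops the header line and -o name prints only the snapshot names.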


It appeared that the filesystems with the large number of snapshots were also the filesystems that I had set to replicate to my second VSA using the auto-tier service. As a test, I listed all the auto-tier services and disabled the suspects:

root@nexenta01:/# svcs -a | grep auto-tier
online         20:33:40 svc:/system/filesystem/zfs/auto-tier:filestore-Software-000
online         20:33:41 svc:/system/filesystem/zfs/auto-tier:filestore-ISOs-000
online         20:33:42 svc:/system/filesystem/zfs/auto-tier:filestore-Shared-000
online         21:33:24 svc:/system/filesystem/zfs/auto-tier:filestore-Home-000
root@nexenta01:/# svcadm disable svc:/system/filesystem/zfs/auto-tier:filestore-Home-000
root@nexenta01:/# svcadm disable svc:/system/filesystem/zfs/auto-tier:filestore-Shared-000


The snapshots stopped.

To determine if the number of snapshots was the problem (I'd seen similar problems before), I destroyed all the snapshots for that filesystem:

root@nexenta01:/# for snapshot in $(zfs list -t snapshot | grep Home | grep snap-daily-1-2011-01 | awk '{ print $1 }'); do zfs destroy $snapshot; echo "Destroyed $snapshot"; done
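
Because "zfs destroy" cannot be undone, a more cautious variant of the loop above prints the commands first so they can be reviewed. This is a sketch rather than what I actually ran: the function name plan_destroy is hypothetical, and it matches on the exact filesystem name and snapshot prefix instead of using grep.

```shell
# Print (rather than run) a "zfs destroy" command for each matching snapshot.
# Reads snapshot names on stdin; $1 is the filesystem, $2 the snapshot name prefix.
plan_destroy() {
    fs=$1
    prefix=$2
    while IFS= read -r snapshot; do
        case $snapshot in
            "$fs@$prefix"*) printf 'zfs destroy %s\n' "$snapshot" ;;
        esac
    done
}
```

For example, "zfs list -H -t snapshot -o name | plan_destroy filestore/Home snap-daily-1-2011-01" prints one destroy command per dated snapshot and leaves snap-daily-1-latest alone. Once the list looks right, the same output can be piped to "sh" to run it.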


The web interface was fast again.

Okay, so that was the problem, but why was it happening? At first, I couldn't work it out and deleted and recreated the auto-tier jobs. Everything then worked fine... for a couple of days. Then the number of snapshots increased again.

This time I was able to identify a change in the configuration. I had had to reboot the second VSA because it had run out of memory (I assigned too little). This appears to have broken the link between the two, leaving the auto-tier jobs running out of control.

Knowing that a new snapshot would fire every 7 minutes, I waited and ran the "ptree" command (shows the list of processes in a tree view showing the parent/child relationships) until I spotted the auto-tier job:

  2260  sh -c /lib/svc/method/zfs-auto-tier svc:/system/filesystem/zfs/auto-tier:filest
    2266  /usr/bin/perl /lib/svc/method/zfs-auto-tier svc:/system/filesystem/zfs/auto-tie
      2315  rsync -e ssh --delete --exclude-from=/var/lib/nza/rsync_excl.txt --inplace --ig
        2316  ssh nexenta02.local.zone rsync --server -lHogDtpre.isf --delete --ignore-errors


The problem here was the zfs-auto-tier service (process 2260). Although the full command is truncated, I compared it with the output from svcs (see above) and guessed it to be:

sh -c /lib/svc/method/zfs-auto-tier svc:/system/filesystem/zfs/auto-tier:filestore-Home-000

To examine the properties of this service, I ran:

root@nexenta01:/# svccfg -s svc:/system/filesystem/zfs/auto-tier:filestore-Shared-000 listprop
zfs                                application
zfs/action                         astring 
zfs/day                            astring  1
zfs/depth                          astring  1
zfs/dircontent                     astring  0
zfs/direction                      astring  1
zfs/exclude                        astring 
zfs/from-fs                        astring  /volumes/filestore/Shared
zfs/from-host                      astring  localhost
zfs/from-snapshot                  astring 
zfs/fs-name                        astring  filestore/Shared
zfs/keep_days                      astring  7
zfs/method                         astring  tier
zfs/minute                         astring  0
zfs/options                        astring  "--delete --exclude-from=/var/lib/nza/rsync_excl.txt --inplace --ignore-errors -HlptgoD"
zfs/proto                          astring  rsync+ssh
zfs/rate_limit                     astring  0
zfs/to-fs                          astring  /volumes/backup
zfs/to-host                        astring  nexenta02.local.zone
zfs/trace_level                    astring  1
zfs/type                           astring  daily
zfs/retry-timestamp                astring  1295492474
zfs/period                         astring  1
zfs/hour                           astring  2
zfs/last_replic_time               astring  12
zfs/time_started                   astring  21:43:18,Jan21
zfs/retry                          astring  1
startd                             framework
startd/duration                    astring  transient
general                            framework
general/enabled                    boolean  true
start                              method
start/exec                         astring  "/lib/svc/method/zfs-auto-tier start"
start/timeout_seconds              count    0
start/type                         astring  method
stop                               method
stop/exec                          astring  "/lib/svc/method/zfs-auto-tier stop"
stop/timeout_seconds               count    0
stop/type                          astring  method
refresh                            method
refresh/exec                       astring  "/lib/svc/method/zfs-auto-tier refresh"
refresh/timeout_seconds            count    0
refresh/type                       astring  method
restarter                          framework    NONPERSISTENT
restarter/auxiliary_state          astring  none
restarter/logfile                  astring  /var/svc/log/system-filesystem-zfs-auto-tier:filestore-Shared-000.log
restarter/start_pid                count    867
restarter/start_method_timestamp   time     1295642022.247594000
restarter/start_method_waitstatus  integer  0
restarter/transient_contract       count  
restarter/next_state               astring  none
restarter/state                    astring  online
restarter/state_timestamp          time     1295642022.255217000


The property that stood out was zfs/retry-timestamp and I guessed the value was a timestamp counting in seconds since the epoch. Converting the value turned it into a human-readable date:

Thu Jan 20 2011 03:01:14 GMT+0000 (BST)
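
The conversion can also be done at the command line. On systems with GNU date, "date -u -d @1295492474" will do it, but that option is not available everywhere, so here is a sketch that uses only POSIX shell arithmetic. The function name epoch_to_utc is my own, and it assumes a non-negative timestamp.

```shell
# Convert seconds since the epoch (1970-01-01 00:00:00 UTC) to a UTC date string.
# Pure shell arithmetic; assumes a non-negative integer argument.
epoch_to_utc() {
    secs=$1
    days=$((secs / 86400))          # whole days since the epoch
    rem=$((secs % 86400))           # seconds into the current day
    # Standard days-to-civil-date arithmetic on a March-based year.
    z=$((days + 719468))
    era=$((z / 146097))
    doe=$((z - era * 146097))
    yoe=$(((doe - doe / 1460 + doe / 36524 - doe / 146096) / 365))
    y=$((yoe + era * 400))
    doy=$((doe - (365 * yoe + yoe / 4 - yoe / 100)))
    mp=$(((5 * doy + 2) / 153))
    d=$((doy - (153 * mp + 2) / 5 + 1))
    if [ "$mp" -lt 10 ]; then m=$((mp + 3)); else m=$((mp - 9)); fi
    if [ "$m" -le 2 ]; then y=$((y + 1)); fi
    printf '%04d-%02d-%02d %02d:%02d:%02d UTC\n' \
        "$y" "$m" "$d" $((rem / 3600)) $((rem % 3600 / 60)) $((rem % 60))
}
```

Running "epoch_to_utc 1295492474" prints 2011-01-20 03:01:14 UTC, which agrees with the date above.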

This date was in the past, so was the script running because of this?

I edited the value:

svccfg -s svc:/system/filesystem/zfs/auto-tier:filestore-Home-000 setprop zfs/retry-timestamp=0

And waited...

No new snapshot was created!

I assume this is a bug. The temporary failure of the second device should not cause the primary VSA to run amok! Fortunately, this fix appears to have worked and the auto-tier service is now working correctly. The web interface is also performing as expected!

Tuesday, 18 January 2011

NexentaStor Community Edition: Compression and Deduplication Benchmarks

If you read my previous post on benchmarking the NexentaStor VSA and want even more benchmarking information, this post is for you!

ZFS filesystems can be configured to support compression, and in the later releases, deduplication. While both these features are useful in maximising the use of disk space, what is the impact on performance of running with these options configured?

The base configuration of the appliance is the same as test 4 from the previous post: a mirrored pair of SATA disks with an SSD L2ARC. The filesystem is configured to use the ZFS Intent Log (ZIL) for synchronous operations (the default) but a separate log is not configured.

The performance data for the base configuration is:
  • Sequential Block Reads: 59934K/sec (58.5MB/sec)
  • Sequential Block Writes: 28793K/sec (28.1MB/sec)
  • Rewrite: 17127K/sec (16.7MB/sec)
  • Random Seeks: 517.45/sec
The benchmark will use bonnie++ running on a Solaris 11 Express VM connecting to the NexentaStor appliance over an internal vSwitch. The bonnie++ command line is:


# /usr/local/sbin/bonnie++ -uroot -x 4 -f -s 4096 -d /mnt


See the previous blog post for an explanation of these options.

Test 1: Set compression=on

Enabling compression for a specific filesystem is very simple:

# zfs set compression=on testpool/testing

The results of the test were:
  • Sequential Block Reads: 52830K/sec (51.5MB/sec)
  • Sequential Block Writes: 38811K/sec (37.9MB/sec)
  • Rewrite: 18659K/sec (18.2MB/sec)
  • Random Seeks: 1188.95/sec

Reads were slower with compression enabled, but writes and rewrites were faster. Random seeks were much faster. I cannot explain that, although I suspect that if the bonnie++ data is highly compressible, it may cause "odd" results such as this.
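
To put a number on "faster" and "slower", the change relative to the base configuration can be expressed as a percentage. A small sketch using awk for the floating point arithmetic (the helper name pct_change is made up for this example):

```shell
# Percentage change from a baseline value ($1) to a new value ($2),
# printed with a sign and one decimal place.
pct_change() {
    awk -v old="$1" -v cur="$2" 'BEGIN { printf "%+.1f%%\n", (cur - old) / old * 100 }'
}
```

Using the K/sec figures above, "pct_change 28793 38811" prints +34.8% for sequential block writes, and "pct_change 59934 52830" prints -11.9% for sequential block reads.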

Test 2: Set dedup=on

For this test, compression was turned off and deduplication turned on:

# zfs set compression=off testpool/testing
# zfs set dedup=on testpool/testing

The results of the test were:

  • Sequential Block Reads: 45806K/sec (44.7MB/sec)
  • Sequential Block Writes: 27550K/sec (26.9MB/sec)
  • Rewrite: 15179K/sec (14.8MB/sec)
  • Random Seeks: 464/sec
This shows that there is a performance penalty for enabling data deduplication. There is also a RAM overhead as the operating system needs to store the dedupe table in memory (not measured as part of this test).


Test 3: Set compression=on, dedup=on

For this test, both compression and deduplication were turned on:

# zfs set compression=on testpool/testing
# zfs set dedup=on testpool/testing

The results of the test were:
  • Sequential Block Reads: 53844K/sec (52.5MB/sec)
  • Sequential Block Writes: 34315K/sec (33.5MB/sec)
  • Rewrite: 17654K/sec (17.2MB/sec)
  • Random Seeks: 1331/sec
These results suggest that if deduplication is required (to save space), then enabling compression as well, despite its additional overhead, improves both read and write performance. As with compression alone, random seeks are improved significantly.

Conclusion

In conclusion, for maximum read performance, do not turn on compression or deduplication. For maximum write and rewrite performance, turn on compression. If deduplication is required, consider turning on compression as well as this improves dedupe performance. There is a CPU and memory overhead using these features, but as with most things, it's a case of balancing the cost vs the benefit.

Friday, 14 January 2011

NexentaStor Community Edition: Benchmarking

In a previous post I discussed how I implemented a Virtual Storage Appliance (VSA) on my VMware home server running the NexentaStor Community Edition operating system. While getting everything working was fairly straightforward, knowing how well it was running required some benchmarking.

I've used the bonnie++ benchmark program before and generally like the way it works. Although I suspect most of this testing could be done through the web interface (setting up the disks etc.), I found it easier and quicker to use the command line and the Solaris commands.

For the test, I created a new VMDK (20GB) on my primary SATA drive and published it to the VSA. I then created a new pool and added the disk:

# zpool create testpool c1t5d0

I then created a filesystem in the pool:

# zfs create testpool/testing

For this testing, I did not enable compression or deduplication (perhaps a topic for another day...).

I ran the bonnie++ benchmark with the following command line:

#  /usr/local/sbin/bonnie++ -uroot -s 8192 -d /testpool/testing

The size (8192) tells bonnie++ to create test data that is 8GB in size. This is twice the RAM allocated to the VM, which prevents the results being skewed by data cached in memory. I then ran each test 4 times and averaged the results. No other significant activity was taking place while the tests were running. To provide a consistent environment, I used CPU and memory reservations for the VM. I opted to focus on the sequential block reads, sequential block writes (ZFS buffers random writes and writes them sequentially), rewrite and random seek performance. A good guide to understanding bonnie++ output in a ZFS context can be found here.
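
The averaging and the K/sec to MB/sec conversion used for the results can be sketched with a couple of awk one-liners. This is illustrative only: the function names are my own, the input is assumed to be one figure per line, and pulling the figures out of bonnie++'s own output is left aside.

```shell
# Average the numbers given on stdin (one per line), e.g. four runs of one metric.
average() {
    awk '{ sum += $1; n++ } END { if (n > 0) printf "%.1f\n", sum / n }'
}

# Convert a K/sec figure to MB/sec (1MB = 1024K), to one decimal place.
kps_to_mbps() {
    awk -v k="$1" 'BEGIN { printf "%.1f\n", k / 1024 }'
}
```

For example, "printf '%s\n' 100 200 300 400 | average" (made-up numbers) prints 250.0, and "kps_to_mbps 61966" prints 60.5.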

Test 1: One SATA based disk

This is the basic starting point: One disk in the pool:
  • Sequential Block Reads: 75492K/sec (73.5MB/sec)
  • Sequential Block Writes: 61966K/sec (60.5MB/sec)
  • Rewrite: 26873K/sec (26.2MB/sec)
  • Random Seeks: 278.8/sec

Test 2: Mirrored SATA disks

I added a second VMDK to the NexentaStor VM locating it on the second SATA disk. The new disk was then added to the test pool as a mirror. Once the resilvering was complete, the test was re-run:
  • Sequential Block Reads: 76141K/sec (74.3MB/sec)
  • Sequential Block Writes: 52331K/sec (51.1MB/sec)
  • Rewrite: 31525K/sec (30.7MB/sec)
  • Random Seeks: 292.6/sec
So we can see that in a mirrored configuration, block reads are marginally faster, block writes are slower, rewrites of existing blocks are faster and the number of random seeks has increased. I was a bit surprised that block reads were not much faster, given that running "zpool iostat" on the test pool shows the read load is balanced across both disks. The slower writes are no surprise as the kernel has to write the same data to two separate devices.

Test 3: Mirrored SATA disks with SSD L2ARC

I added another VMDK to the NexentaStor VM, locating it on the SSD datastore. The new disk was added to the test pool as a cache device, implementing an L2ARC:
  • Sequential Block Reads: 116837K/sec (114MB/sec)
  • Sequential Block Writes: 65454K/sec (63.9MB/sec)
  • Rewrite: 37598K/sec (36.7MB/sec)
  • Random Seeks: 440/sec
The L2ARC has improved the read performance significantly, and surprisingly the write performance is faster too (I'm not sure why, because the L2ARC is a read cache). Rewrites are a bit faster and random seeks are much higher (to be expected with an SSD).
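
The commands used to build the mirror in Test 2 and add the cache device in Test 3 are not shown above. As a sketch, they would look something like the following. The wrapper functions and the dry-run switch are my own additions, and the device names c1t6d0 and c1t7d0 in the example are hypothetical; only c1t5d0 is the real disk from the start of the test.

```shell
# Dry-run helper: prints each command instead of running it, unless DRY_RUN=0 is set.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Turn a single-disk pool into a two-way mirror by attaching a second disk.
# Arguments: pool, existing disk, new disk.
attach_mirror() { run zpool attach "$1" "$2" "$3"; }

# Add a cache (L2ARC) device to a pool. Arguments: pool, cache disk.
add_cache() { run zpool add "$1" cache "$2"; }
```

By default "attach_mirror testpool c1t5d0 c1t6d0" and "add_cache testpool c1t7d0" only print the zpool commands they would run; setting DRY_RUN=0 executes them. After attaching the mirror, "zpool status testpool" shows the resilver progress.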

So at this point, we have a pretty good idea what the performance of the NexentaStor appliance is writing to local disk. The next test is to see what the performance is like over NFS...

Test 4: NFS test from Solaris 11 Express VM

The Solaris 11 Express VM is running on the same host and is connected to the VSA by the same vSwitch.

The mount operation was performed by running:

# mount -F nfs nexenta01:/testpool/testing /mnt

The bonnie++ command was:

# /usr/local/sbin/bonnie++ -uroot -x 4 -f -s 4096 -d /mnt

The size of the test dataset was reduced from the 8192MB used on the VSA because the Solaris 11 Express VM only has 2GB of RAM, and 4096MB is enough to ensure the VM isn't caching the data in its RAM. It's no larger than the RAM in the VSA, but at this point we're interested in the performance over the network to clients, not the speed of the appliance itself.

The default behaviour for Solaris NFS is to perform synchronous writes (see my last blog post for a quick primer on NFS/ZFS interactions). Using the zilstat script, I was able to confirm that the ZIL was written to during the benchmark run, proving that the write operations were indeed synchronous. As expected, performance was much worse:
  • Sequential Block Reads: 59934K/sec (58.5MB/sec)
  • Sequential Block Writes: 28793K/sec (28.1MB/sec)
  • Rewrite: 17127K/sec (16.7MB/sec)
  • Random Seeks: 517.45/sec
Of course, the network stack will be an overhead, but it's worth seeing if we can improve on these times...

Test 5: NFS test from Solaris 11 Express VM, sync=disabled on NexentaStor NFS server

The ZIL is used in ZFS to log synchronous writes to a secure place before writing the data to the pool. The NFS client will wait until the server has confirmed the write to the ZIL before continuing processing. This is very good for secure data, but it does slow things down. The ZFS sync=disabled option bypasses the ZIL and buffers the request in the server's RAM until it is committed to disk. In real-world terms, it's less reliable, but it's about the same as other non-ZFS based NFS servers such as Linux.

The command to disable synchronous writes (on a per-filesystem basis), is:

# zfs set sync=disabled testpool/testing

The tests were then re-run:

  • Sequential Block Reads: 69438K/sec (67.8MB/sec)
  • Sequential Block Writes: 49177K/sec (48MB/sec)
  • Rewrite: 20737K/sec (20.2MB/sec)
  • Random Seeks: 520.9/sec
As the test ran, I monitored the ZIL utilisation using zilstat and confirmed that the ZIL was not being used. The results show a significant improvement in writes of approximately 20MB/sec and a smaller improvement in rewrites.


Test 6: NFS test from Solaris 11 Express VM, sync=standard, separate slog on NexentaStor NFS server

Disabling the ZIL improved NFS performance, but what would happen if the ZIL was placed on a separate SSD disk? To do this, I created a new disk from the SSD datastore and attached it to the pool:

# zpool add testpool log c1t4d0

I changed the sync property back to standard (re-enabling synchronous writes) and ran the tests, using zilstat to confirm that the ZIL was being written to:
  • Sequential Block Reads: 55142K/sec (53.8MB/sec)
  • Sequential Block Writes: 23931K/sec (23.3MB/sec)
  • Rewrite: 16518.5K/sec (16.1MB/sec)
  • Random Seeks: 628.1/sec
Well, this was unexpected! The separate ZIL produced worse results than having the pool's SATA disks write the data twice! The surprising drop in performance may be due to the type of SSD I'm using (OCZ Vertex 2). This is a Multi Level Cell (MLC) type device, which is optimised for read operations (most consumer SSDs are MLC). For high performance writes, Single Level Cell (SLC) SSDs are recommended, but they are far more expensive.

Conclusion

To wrap this up then, there are two options to consider when running NFS on ZFS:

  1. Enable the ZIL, experience slower performance but know the data is secure
  2. Disable the ZIL, experience faster performance but understand the risks
The "best" option depends on the environment the VSA is serving. Fortunately the ZIL can be turned on or off on a per-filesystem basis. This means that non-critical test lab VMs can sit on a filesystem with no ZIL for maximum performance, while critical data (e.g., family photos/videos and the copy of your tax return) can be configured with end-to-end consistency.

If you are running a VMware home lab and are looking for a decent virtual storage appliance, NexentaStor CE is definitely worth a look, and as you can see, has plenty of features!