Sunday, 18 September 2011

HP Microserver: Building a SAN in a box

[Update 05/2013: I've since migrated my Nexenta install to a dedicated server. See here for the details.]

Following a recent upgrade to my home lab, my storage now looks like this:
  • A unified SAN capable of providing both block (iSCSI) and file (NFS/CIFS) access.
  • Eight disks:
    • 2 x SSD
    • 6 x SATA
  • Two discrete RAID groups: 
    • a 400GB (usable capacity) two disk mirror
    • a 1.2TB (usable capacity) four disk parity stripe
  • Both RAID groups have dedicated 20GB flash read caches.
  • LUNs can be configured to support compression and/or deduplication
  • Copy on Write (COW) snapshots for all filesystems and LUNs
  • Support for replicating filesystems and LUNs to a second SAN

All sounds pretty funky. Must be expensive, right?

Actually, the above is all achieved using a very cheap HP Microserver running VMware ESXi and the Nexenta virtual storage appliance. I've assigned the Nexenta VM 4GB RAM, but it would happily use more for its L1 read cache.

The HP Microserver has 4 x SATA disks (2 x 1TB and 2 x 500GB) with a single 60GB SSD disk.

The Nexenta virtual machine is then assigned VMDK files. The first RAID group is a mirror: one VMDK file on each of SATA disks 1 and 2. The second RAID group is a RAIDZ parity stripe: one VMDK file on each of SATA disks 1, 2, 3 and 4. The flash read caches are 20GB VMDK files on the SSD.
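
Inside the Nexenta VM, each RAID group is just a ZFS pool built from those VMDKs. Nexenta normally does this through its web GUI, but underneath it boils down to standard ZFS commands, roughly like this (the pool and device names here are made up for illustration):

    # two-disk mirror (the 400GB usable group) with its dedicated 20GB SSD read cache
    zpool create tank1 mirror c2t1d0 c2t2d0
    zpool add tank1 cache c2t7d0

    # four-disk RAIDZ parity stripe (the 1.2TB usable group) with its own 20GB read cache
    zpool create tank2 raidz c2t3d0 c2t4d0 c2t5d0 c2t6d0
    zpool add tank2 cache c2t8d0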

The compression, deduplication, snapshot and replication features are provided by the ZFS filesystem.
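
These all map onto standard ZFS properties and commands. A rough sketch of each (the dataset, volume and hostname values are made up; in practice Nexenta exposes all of this through its web GUI):

    # compression and deduplication are per-filesystem (or per-LUN) properties
    zfs set compression=on tank2/nfs-share
    zfs set dedup=on tank2/nfs-share

    # an iSCSI LUN is just a ZFS volume (zvol)
    zfs create -V 100G tank1/vm-lun01

    # copy-on-write snapshots, and replication to a second box via send/receive
    zfs snapshot tank2/nfs-share@nightly
    zfs send tank2/nfs-share@nightly | ssh second-san zfs receive backup/nfs-share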

This is a pictorial representation of the configuration:


And this is what it looks like physically:




Oops. No, that's the NetApp at work. But functionality-wise they are quite similar (obviously the vastly more expensive NetApp is much faster!).

This is the real physical hardware (on the far right, next to the ML110 and ML115):



Pretty small for such a setup. I've currently only got the built-in NIC in the Microserver, but will look at adding another to create a dedicated storage network.

Saturday, 17 September 2011

Cisco SG200-26 review

Until recently I was using a Netgear GS108 switch for my home lab. This eight port, unmanaged switch performed well, but with the addition of a couple of HP Microservers, I ran out of free ports and needed something bigger.

Although not essential to the lab, I wanted a switch with a few more features. I initially looked at the Cisco SG200-18, the HP V1810-24G and a couple of other makes that I hadn't come across before (TP-Link and ZyXEL). The one requirement was that the new switch should be silent. The fans of a Cisco Catalyst switch would dominate the home office, which was unacceptable.

I discounted the switches from TP-Link and ZyXEL because I couldn't find any decent reviews of them online. The HP V1810 was then discounted because its price had risen to over £230. This left the Cisco SG200-18. I then noticed that the SG200-26 was only £3 more expensive at £188 (from Ebuyer), so buying the smaller switch would not have made financial sense. You can't have too many ports, right?




The first thing to say about the Cisco SG200-26 is that it is not an IOS switch; I assume it's a product of Cisco's purchase of Linksys. Having said that, the build quality is good, and the switch is absolutely silent in operation and doesn't get hot (in contrast, the Netgear was hot to the touch). The SG200-26 is a managed, layer 2 switch.

The SG200-26 has 24 standard 10/100/1000 ports, plus another two ports for uplinks. These can be used as RJ-45 10/100/1000 ports or as SFP fibre ports (SFP modules not included). The form factor is a standard 1U rack mount (rack-mount kit included), and rubber feet are also supplied for desktop use.


Configuration is through the web interface only (no SSH or serial interface), but the switch does support external logging to a syslog server.

Be sure to upgrade to the latest firmware. This enables the Cisco Discovery Protocol (CDP), which is very useful in vSphere networking for identifying which physical switch port each NIC is plugged into.
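
Once CDP is enabled on the switch, the switch name and port show up against each vmnic in the vSphere Client. The same information can be pulled from the ESXi shell, something like this (output heavily trimmed, and the values shown are just examples):

    ~ # vim-cmd hostsvc/net/query_networkhint
    ...
    connectedSwitchPort = (vim.host.PhysicalNic.CdpInfo) {
       devId = "SG200-26",
       portId = "g12",
    ...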

In the web interface, ports can be given a description, and those of us with OCD can spend a happy evening mapping this information into the switch. The port settings can also be used to set the speed and duplex of each port.

The SG200-26 supports up to four Link Aggregation Groups (LAGs) and can load balance based on either MAC address or IP/MAC address. Both static and dynamic (LACP) LAG groups can be configured. Up to eight ports can be assigned to a static LAG and sixteen ports to a dynamic LAG.

Multiple VLANs can be set up and managed as the switch supports 802.1q. Ports can be configured in trunk, general, access or Q-in-Q mode. VLAN pruning can be applied to trunk ports so that only specific VLANs are accessible on particular ports. The interface for this wasn't immediately obvious to me (and setting up the same in IOS initially seemed easier), but once I'd spent some time with it, the VLAN configuration was fairly straightforward. These VLAN options can be applied to either individual ports or a LAG.
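
For comparison, the rough IOS equivalent of a pruned trunk port would be something like this (the VLAN numbers are just examples):

    interface GigabitEthernet0/1
     switchport mode trunk
     switchport trunk allowed vlan 10,20,30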

In addition to these features, the SG200-26 can also be configured for QoS, has numerous security features (including 802.1X), and provides Smartport macros to configure a port by type (e.g. Printer, Desktop, Guest, Server). Jumbo frames can be enabled, although this is a global setting that affects all ports (most switches, even expensive Cisco switches, work the same way). A "Green Ethernet" function reduces the power consumption of the switch by detecting the length of connected cable, and by turning off unused ports to save energy.

As a lab switch, the SG200-26 is ideal. Personally, I would have liked to see a command line option for configuration as some tasks can be repetitive (e.g., setting up VLANs). Beyond that though, there is little to complain about. The SG200-26 is an excellent entry-level switch, with plenty of ports and a good range of options.



Some useful links:

The Cisco Small Business 200 Series Smart Switch Administration Guide

The Cisco Small Business Online Device Emulators page has a demo of the web interface for the SF300. The 300 series has additional layer 3 functionality, but you can get a good idea of what the interface is like on the 200 series.

* Update 10/07/2012 * I experienced an issue where traffic between two ports (e.g., ports 1 and 2) would cause significant latency on other, unrelated ports. This could be demonstrated by running a continuous ping to a host and watching the response times climb under significant network load (such as VM backups). The problem was resolved by upgrading the firmware to 1.1.2.0.

Friday, 9 September 2011

ISP router ARP cache problems when replacing servers

I experienced a problem today that took a while to understand so figured it was worth sharing...

Our external mail gateway was due for replacement, and a new virtual machine was built, configured and tested alongside the old production server. Happy that everything was functioning as expected, the only remaining task was to disconnect the old server from the network and change the IP address of the new server from its test address to that of the old server. This would require no DNS changes, and total downtime would be about a minute.

The change was made and... nothing. No traffic to the new server.

Huh? I tested it from another IP on the public network and it was fine. We tried from another network and... nothing.

I changed the IP back to the test address and the server sprang into life.

After a significant amount of time brainstorming with colleagues as to what was happening, we hit upon a likely culprit: a stale ARP cache entry on the ISP-provided router. Unfortunately, we don't have administrative access to this router.

Fortunately, the ISP hadn't locked down the console port of the Cisco router and I was able to connect in and run a "show ip arp" command. Sure enough, it showed the MAC address of the old server. This meant that when packets arrived from the Internet the router was trying to forward them to the old server that was no longer on the network. If I had administrative access to the router, I would have been able to flush the ARP cache and all would have been good. But because this was a "managed" router, I wasn't able to do this. I could see the problem, I knew the solution, but couldn't fix it.
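
For reference, this is roughly what I was looking at (the addresses here are made up), along with the command I would have run if I'd been allowed to make changes:

    Router# show ip arp
    Protocol  Address          Age (min)  Hardware Addr   Type   Interface
    Internet  203.0.113.10           57   0011.2233.4455  ARPA   GigabitEthernet0/0

    Router# clear arp-cache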

I did some research online to see what the default ARP cache timeout was: typically 4 hours.
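
On a router you do control, the timeout is visible (and adjustable, in seconds) per interface, something like this:

    Router# show interfaces GigabitEthernet0/0 | include ARP
      ARP type: ARPA, ARP Timeout 04:00:00

    Router(config)# interface GigabitEthernet0/0
    Router(config-if)# arp timeout 300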

I logged a call with the ISP which was not a particularly useful experience. The ISP is a subsidiary of Cable & Wireless, and if you've ever had the misfortune of working with that company you'll understand what I'm talking about! I was told I'd get a call back in 8 hours. Brilliant! Not.

There were a couple of other options. Pulling the Ethernet cable from the router would bring the interface down, which I *think* would cause the ARP cache to be flushed, but I didn't have the luxury of doing this during working hours.

The final option was to try to get the new server to send a gratuitous ARP. This is an ARP packet that a server broadcasts for its own IP address; the idea is that other devices on the network will update their ARP caches with the new MAC address.
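
On a Linux box, the iputils arping tool will broadcast one of these for you; a rough example (the interface name and address are made up):

    # broadcast an unsolicited/gratuitous ARP for 203.0.113.10 out of eth0 (run as root)
    arping -U -I eth0 -c 3 203.0.113.10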

My server, however, was hidden behind a Cisco ASA firewall.

As I was searching for ways to get this working, the ARP cache entry timed out (possibly because the router is configured with a shorter timeout than the default, although I can't see the config to confirm this) and the new server sprang into life.

At first I wasn't sure whether it was the gratuitous ARP that fixed it, but within the next hour the ISP called and confirmed that they had cleared the cache. So fair play to them for getting on with it and sorting the problem.


It's been a learning experience in that even the simplest and quickest network change can have unforeseen side effects!

Saturday, 3 September 2011

Slow Windows 2008 install in an ESXi VM

This is just a quick note on a problem I've just experienced (and found a fix for!).

Having just built a new Windows 2008 VM and mounted the ISO from my NFS ISO datastore, I was surprised to see that the install was crawling along very slowly at the "Expanding Windows files" stage:




As the above link indicates, the problem was the ESX host's NIC configuration. Set to "auto-negotiate" (which it should be on gigabit connections), the port had managed to negotiate down to 10Mb half-duplex(!). I changed this to 1000Mbit full-duplex and then back to auto-negotiate (where it stayed at 1000Mbit full-duplex).
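
If you prefer the command line to the vSphere Client, the same check and fix can be done from the ESX(i) shell, something like this (the vmnic number is just an example):

    # list NICs with their current negotiated speed/duplex
    esxcfg-nics -l

    # force 1000/full, then hand the port back to auto-negotiation
    esxcfg-nics -s 1000 -d full vmnic0
    esxcfg-nics -a vmnic0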



And the performance became speedy again!