Saturday, 26 September 2009

ESX/ESXi networking best practices

VMware Virtual Infrastructure 3 had some fairly well defined best practices for networking. There are three types of network connection that can be set up on an ESX host:
  • Service Console
  • VMkernel for vMotion
  • Virtual Machines network
Although it is possible to have a single physical NIC connecting to a single vSwitch with the above port groups configured, this is not a good approach because the NIC becomes a single point of failure. Adding a second NIC and teaming the two physical NICs provides redundancy, but you could still hit performance problems when a bandwidth-intensive operation such as a vMotion occurs, as virtual machine traffic could be starved of bandwidth.

One solution is to add additional NICs to the vSwitch and hope that the increase in bandwidth is sufficient for vMotion traffic not to impact connected users, but there is another potential issue: vMotion traffic is unencrypted, and the content of a vMotion operation is the memory of a virtual machine. If a malicious user is able to eavesdrop on the connection, they might be able to access sensitive data. Using separate VLANs helps, but you're still effectively crossing your fingers and hoping everything will be okay.

The safer approach is to separate the Virtual Machine traffic from the vMotion traffic by using separate vSwitches, assigning each vSwitch two physical NICs for redundancy. This ensures that the vMotion traffic is physically isolated from the VMs.
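As a rough sketch from the ESX service console, creating that dedicated vMotion vSwitch might look something like the following (the uplink names and IP details are made up purely for illustration):

# esxcfg-vswitch -a vSwitch1
# esxcfg-vswitch -L vmnic2 vSwitch1
# esxcfg-vswitch -L vmnic3 vSwitch1
# esxcfg-vswitch -A VMotion vSwitch1
# esxcfg-vmknic -a -i 192.168.10.11 -n 255.255.255.0 VMotion

esxcfg-vswitch -l shows the resulting layout; ticking the "Enable VMotion" box on the VMkernel port is still done from the VI client.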

So where to put the Service Console? It is possible to assign it to either the Virtual Machine network vSwitch or the vMotion VMkernel vSwitch. It's worth pointing out that Virtual Center requires access to the Service Console, so depending on where you run Virtual Center, this may influence which vSwitch you assign the Service Console to. Placing the Service Console on a separate vSwitch from the Virtual Machines network helps to reduce the opportunity for malicious users to attack the SC. It's common (especially in environments where servers have 4 physical NICs) to find a configuration where one vSwitch is dedicated to VMs, and the second vSwitch shares the Service Console and vMotion VMkernel port groups.

A four NIC server configuration could look like:

  • 2 x NICs for Virtual Machine traffic
  • 2 x NICs for vMotion traffic and the Service Console

If six NICs are available, the configuration could look like:

  • 2 x NICs for Virtual Machine traffic
  • 2 x NICs for vMotion traffic
  • 2 x NICs for the Service Console
Although the Service Console doesn't require much bandwidth, some sites perform backups from within the SC which can have a significant network overhead.

What about non-Fibre Channel storage?

For users with iSCSI or NFS, things are slightly more complex. The VMware software iSCSI initiator lives inside the Service Console but also requires a VMkernel interface for the actual storage traffic, so it is logical to put these two port groups on the same vSwitch.
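As a rough sketch of that shared vSwitch from the ESX service console (again, the uplink names, port group names and addresses are made up):

# esxcfg-vswitch -a vSwitch2
# esxcfg-vswitch -L vmnic4 vSwitch2
# esxcfg-vswitch -L vmnic5 vSwitch2
# esxcfg-vswitch -A "Service Console 2" vSwitch2
# esxcfg-vswif -a vswif1 -p "Service Console 2" -i 192.168.50.10 -n 255.255.255.0
# esxcfg-vswitch -A iSCSI vSwitch2
# esxcfg-vmknic -a -i 192.168.50.11 -n 255.255.255.0 iSCSI
# esxcfg-swiscsi -e

The second Service Console interface (vswif1) gives the SC a leg on the storage network, the VMkernel port carries the actual iSCSI traffic, and esxcfg-swiscsi -e enables the software initiator; targets are then configured through the VI client, followed by a storage rescan.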

Both iSCSI and NFS traffic should be assigned to a separate network to increase security and to guarantee that bandwidth is available regardless of whether a vMotion or heavy VM traffic is occurring. Two additional NICs should be allocated, resulting in the following six NIC configuration:

  • 2 x NICs for Virtual Machine traffic
  • 2 x NICs for vMotion traffic
  • 2 x NICs for Service Console and iSCSI/NFS storage

Given an additional two NICs (8 total), the following could be configured:

  • 2 x NICs for Virtual Machine traffic
  • 2 x NICs for vMotion traffic
  • 2 x NICs for Service Console
  • 2 x NICs for iSCSI/NFS storage

So what's changed in the vSphere 4 world?

For our new deployment, I plan to use ESXi instead of ESX. VMware have stated their intention to move to the ESXi hypervisor for future releases, and we have no legacy requirement to run the Service Console in our environment. At first glance, this would appear to remove the need for two of the NICs in the above list.

But this raises a question: if the Service Console is not present in ESXi, what IP address and interface does the vSphere client connect to? The IP address is the one assigned to the host by the sysadmin using the ESXi text-mode console (the Direct Console User Interface, or DCUI), and the interface it lives on is actually a VMkernel interface.

Replicating the ESX environment configuration in ESXi would therefore look like:

  • VMkernel for Administration Network
  • VMkernel for vMotion
  • Virtual Machines Network

Assuming each of the above has redundant connections, we still need six NICs, although if you are limited to four NICs you could apply the same approach as with ESX 3.x and combine the vMotion and Administration networks onto a single vSwitch.

If you plan on using NFS storage or iSCSI, you will need another VMkernel interface for storage, so add another couple of ports.

One of the new features in vSphere 4 is Fault Tolerance (FT). This feature ideally needs a dedicated network between hosts, so that takes the total number of physical ports up to 10:

  • 2 x NICs for VMkernel Administration Network
  • 2 x NICs for VMkernel vMotion
  • 2 x NICs for Virtual Machine Network
  • 2 x NICs for VMkernel NFS/iSCSI
  • 2 x NICs for VMkernel FT
The above example only accounts for a single vSwitch for Virtual Machine traffic. If there is a reason for a second vSwitch carrying VM traffic (e.g., you want to segment a DMZ onto a physically separate network), additional NICs will be needed and the total climbs further.

Conversely, if your server doesn't support 10 NICs, some sharing of physical NICs / vSwitches will be required.

Our environment only supports 6 NICs per server and we don't use iSCSI. Our NFS usage is limited to ISO datastores, so it can share a vSwitch with the Administration and vMotion networks. The approach we'll probably take (sketched in the commands after this list) is:
  • 2 x NICs for VMkernel Administration Network, VMkernel vMotion and NFS network
  • 2 x NICs for Virtual Machine Network
  • 2 x NICs for VMkernel FT
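As a rough vSphere CLI sketch of that first, shared vSwitch, assuming the vicfg-* tools accept the same options as the classic esxcfg commands (and with made-up uplink names and addresses):

# vicfg-vswitch --server esxi01 -L vmnic1 vSwitch0
# vicfg-vswitch --server esxi01 -A vMotion vSwitch0
# vicfg-vmknic --server esxi01 -a -i 192.168.1.21 -n 255.255.255.0 vMotion
# vicfg-vswitch --server esxi01 -A NFS vSwitch0
# vicfg-vmknic --server esxi01 -a -i 192.168.1.31 -n 255.255.255.0 NFS

vSwitch0 already carries the Management Network VMkernel port created at install time, so this just adds a second uplink plus the vMotion and NFS port groups; the VM and FT vSwitches are built the same way, and enabling vMotion and FT logging on the relevant VMkernel ports is done from the vSphere client.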
This post assumes NICs with 1Gbit ports. As 10Gbit NICs become more common, the network design approach might change. Something to think about for a future post...

Saturday, 19 September 2009

Adding a SATA disk to ESXi

Having freed up two 500GB SATA disks from my storage server, I wanted to put them in the ESXi server. Although my original intention for this box was to have an essentially disk-less system (USB key boot of the hypervisor only), the reality is that I've not got enough bays in the storage server and don't want to waste two perfectly good disks.

I've also got a little project in the back of my mind that could make use of these disks...

I put the disks in the server and booted ESXi. Using the VI client, I could see that the disks were recognised, but when I went to add storage and selected a disk, I got the following error:

"Error during the configuration of the host: failed to get disk partition information"

I booted off a CentOS disc, selected "linux rescue", and destroyed the partition table using fdisk. I wrote the changes and confidently rebooted.

I got the same error.

From the ESXi menu, I viewed the configuration logs and messages and noticed it was reporting the following:

Warning: /dev/sda contains GPT signatures, indicating that it has a GPT table. However, it does not have a valid fake msdos partition table, as it should. Perhaps it was corrupted - possibly by a program that doesn't understand GPT partition tables. Or perhaps you deleted the GPT table, and are now using an msdos partition table. Is this a GPT partition table?

I didn't know what a GPT signature was, but it was obviously something that fdisk hadn't overwritten. Some googling suggested the problem could be solved by completely overwriting the start of the disk.

Back into the Linux rescue mode and some dd action (sledgehammer approach perhaps...):

# dd if=/dev/zero of=/dev/sda bs=1M count=1
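Before rebooting, a quick sanity check is to list the partition table again:

# fdisk -l /dev/sda

which should now report that the disk doesn't contain a valid partition table.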

Rebooted again and this time the disk is selectable and I could add the datastore. Repeated for the second disk and now I've got an additional 1TB of storage for VMs (albeit unmirrored, but that's fine in this non-production environment).

For those unfamiliar with dd, it's a fairly low-level command that copies raw data. The if= option specifies the input file and of= the output file. In the example above, /dev/zero is a special Unix "file" that returns zero bytes when read, and /dev/sda is the disk device I'm writing to. The bs= option sets the block size (1M = 1 megabyte) and count= specifies the number of blocks to copy. So the command reads one 1MB block of zeros from /dev/zero and writes it to the very beginning of the disk, overwriting everything there (including the partition table and the GPT signature at the start of the disk).

And this is why an understanding of Unix/Linux can be very useful, even if you don't do Unix stuff in your day job... :-)

Sunday, 13 September 2009

SAN/NAS upgrade

All of my important data is stored on my OpenSolaris storage server (an HP ML110 G5). A mirrored pair of 500GB disks in a ZFS zpool provided NFS, CIFS and iSCSI sharing. Unfortunately, I ran out of space to the point where ZFS was unable to take snapshots.

I needed to add more storage, but didn't have the drive bays available to do it. So I ordered two 1TB SATA disks with the intention of replacing the two existing disks.

I followed the instructions found at Blog O Matty (a blog I highly recommend). The process was extremely easy:
  1. Remove one of the 500GB disks and replace with a 1TB disk.
  2. Tell ZFS to "resilver" (aka resync the mirror) the new disk (one command: zpool replace datapool c3d0)
  3. Wait a number of hours for the disk to resilver (10 hours when the disks are being used)
  4. Tell ZFS to clear all error status messages (zpool clear datapool). This puts the pool into an "ONLINE" state for all devices in the pool.
  5. Remove the second 500GB and replace with the second 1TB disk
  6. Tell ZFS to resilver onto the second disk
  7. Wait for this second disk to resilver. I did this overnight and it was finished in the morning.
  8. Tell ZFS to clear the error status on the new disk
  9. Check the pool size (zpool list) and note the new capacity: now 928GB
The ML110 does not have hot-swap disks, so I needed to power off each time I swapped the disks, but if you have a hot swap capable server, the entire process can be done live with the filesystems mounted. Nice.
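Condensing the list above into the actual commands, each disk swap boiled down to (substituting the appropriate device name for the second disk):

# zpool replace datapool c3d0
# zpool status datapool
# zpool clear datapool
# zpool list datapool

zpool status is used to watch the resilver progress, zpool clear is run once the resilver has finished, and the extra capacity only shows up in zpool list once both halves of the mirror have been replaced.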

With approximately another 500GB of free space, I can now experiment with other hypervisors (XenServer and Hyper-V will probably be the first, if I can get them working off a bootable USB key). I also took the opportunity to add more memory (8GB total) to both the OpenSolaris server and the VM server (the HP ML115). While I was spending money, I also paid out for another 500GB external USB drive, which means I can now take a backup of the key filesystems (photos, documents etc.) and ship it to an off-site facility (aka my parents' house).

Coincidentally, T had filled up her C: drive with a huge number of photos and videos. Although the disk is backed up to an external drive, I wanted to move the data to the server, so I created a new filesystem in ZFS:

# zfs create datapool/Users/teresa

This filesystem can grow and consume all space in the pool, so I assigned a quota of 30GB:

# zfs set quota=30G datapool/Users/teresa

In order to make this visible to T's Vista PC, I had to share the filesystem over SMB:

# zfs set sharesmb=on datapool/Users/teresa

I made sure that T had a Solaris account set up with a password so she could authenticate, and then mapped the network drive. The UNC path replaces the slashes in the filesystem name with underscores: \\opensolaris\datapool_users_teresa

I copied T's documents to the server by changing the locations of the profile shell folders (right-click "Documents", "Pictures", "Videos" etc and select properties, then specify a new location and the contents are moved across - very easy).

It was then that I found even more pictures that needed to move across, and the 30GB I had allocated to the filesystem was going to be tight in the long term. This was trivial to fix:

# zfs set quota=40G datapool/Users/teresa

The change applied instantly and the network drive size increased to 40GB.
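A quick way to keep an eye on how close the filesystem is getting to its quota:

# zfs list -o name,used,available,quota datapool/Users/teresa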

It's good to have all our personal data now stored on the ZFS filesystem, with full mirroring, checksumming and regular backups.

The now-redundant 500GB disks will be assigned to another blog post...

Friday, 11 September 2009

Upgrading the EeePC 701 to Eeebuntu

I don't tend to use my EeePC 701 4G very much; there's not much point when you have a pretty well setup PC and network. But when it comes to going on holiday, the Eee is a must-pack luggage item.

T and I have just been away for a week in Corfu. Weather: hot. Hotel Wi-Fi: not bad and free to use (guess which was the more important criterion... :-))

It was when using the Eee on holiday that I realised how dated the default Xandros-derived distro is. Some websites even encouraged us to upgrade to a later release of Firefox. So upon returning, I purchased a 2GB RAM upgrade (from the default 512MB), an 8GB SD card to store my files on, and a 4GB USB stick with which I installed Eeebuntu.

I've never been a serious Ubuntu user (or any of its derivatives), being quite happy with OpenSUSE, so installing Eeebuntu has been interesting. Fortunately the website had some decent documentation on building an install USB key (since the Eee doesn't have a CD drive). Once that was set up, it was simply a matter of booting the Eee off the USB stick and following the prompts.

The result is a modern, GNOME-based distro that can take advantage of all the Eee functionality including the Wi-Fi and webcam. It's also a very smart-looking setup with Compiz working out of the box. I took the opportunity to add some extra software that might be useful in the future, including Wireshark and Nessus.

I'm not going to pretend that the Eee is going to be my new, main machine, or that it will be heavily used on a daily basis, but it's a very capable little computer that will be far more useful with the updated OS on it. My initial foray into Eeebuntu has also been very positive. If you're looking to get something better than the default, dated Xandros version, it's worth a look.