Saturday, 26 September 2009

ESX/ESXi networking best practices

VMware Virtual Infrastructure 3 had some fairly well defined best practices for networking. There are three types of network connection that can be set up on an ESX host:
  • Service Console
  • VMkernel for vMotion
  • Virtual Machines network
Although it is possible to have a single physical NIC connecting to a single vSwitch with the above port groups configured, this is not a good approach because the NIC becomes a single point of failure. Adding a second NIC helps and teaming the two physical NICs provides redundancy, but you could still experience performance problems when a bandwidth intensive operation occurs (such as a vMotion) as VM traffic could suffer from a lack of bandwidth.

One solution is to add additional NICs to the vSwitch and hope that the increase in bandwidth is sufficient so that vMotion traffic does not impact on connected users, but there is another potential issue: vMotion traffic is unencrypted, and the content of a vMotion operation is the memory of a virtual machine. The problem here is that if a malicious user is able to eavesdrop on the connection, they might be able to access sensitive data. Using separate VLANs helps, but you're still effectively crossing your fingers and hoping everything will be okay.

The safer approach is to separate the Virtual Machine traffic from the vMotion traffic using a separate vSwitch and assigning each vSwitch two physical NICs for redundancy. This ensures that the vMotion traffic is physically isolated from the VMs.

So where to put the Service Console? It is possible to assign it to either the Virtual Machine network vSwitch or the vMotion VMkernel network vSwitch. It's worth pointing out that Virtual Center requires access to the Service Console, so depending on where you run Virtual Center, this might influence which vSwitch you assign the Service Console to. Placing the Service Console on a vSwitch separate from the Virtual Machine network helps reduce the opportunity for a malicious user to attack the SC. It's common (especially in environments where servers have 4 physical NICs) to find a configuration where one vSwitch is dedicated to VMs, and the second vSwitch shares the Service Console and vMotion VMkernel port groups.

A four NIC server configuration could look like:

  • 2 x NICs for Virtual Machine traffic
  • 2 x NICs for vMotion traffic and the Service Console
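
The four-NIC layout above can be sketched with the ESX 3.x esxcfg commands. This is only a sketch: the vmnic numbers, port group names and IP addresses are placeholders I've made up for illustration, it assumes the default vSwitch0 (with its "Service Console" port group) was created at install time, and vMotion still has to be enabled on the VMkernel port from the VI Client.

```shell
# vSwitch0: Service Console + vMotion VMkernel, two uplinks for redundancy
esxcfg-vswitch -L vmnic0 vSwitch0
esxcfg-vswitch -L vmnic1 vSwitch0
esxcfg-vswitch -A "VMotion" vSwitch0
esxcfg-vmknic -a -i 192.168.10.11 -n 255.255.255.0 "VMotion"

# vSwitch1: Virtual Machine traffic only, on its own pair of uplinks
esxcfg-vswitch -a vSwitch1
esxcfg-vswitch -L vmnic2 vSwitch1
esxcfg-vswitch -L vmnic3 vSwitch1
esxcfg-vswitch -A "VM Network" vSwitch1
```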

If six NICs are available, the configuration could look like:

  • 2 x NICs for Virtual Machine traffic
  • 2 x NICs for vMotion traffic
  • 2 x NICs for the Service Console
Although the Service Console doesn't require much bandwidth, some sites perform backups from within the SC which can have a significant network overhead.

What about non-Fibre Channel storage?

For users with iSCSI or NFS, things are slightly more complex. The VMware software iSCSI initiator lives inside the Service Console but also requires a VMkernel port group for the actual storage traffic. It is logical, therefore, to put these two port groups on the same vSwitch.
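
A sketch of that shared vSwitch for software iSCSI on ESX 3.x could look like the following. Again, the uplink numbers, port group names and addresses are placeholders of my own, not a definitive recipe.

```shell
# vSwitch2: storage network, carrying both the second Service Console
# interface (which the software iSCSI initiator needs) and a VMkernel
# port for the actual iSCSI/NFS traffic.
esxcfg-vswitch -a vSwitch2
esxcfg-vswitch -L vmnic4 vSwitch2
esxcfg-vswitch -L vmnic5 vSwitch2

# Second Service Console interface on the storage network
esxcfg-vswitch -A "Service Console 2" vSwitch2
esxcfg-vswif -a vswif1 -p "Service Console 2" -i 192.168.20.11 -n 255.255.255.0

# VMkernel port for the storage traffic itself
esxcfg-vswitch -A "iSCSI" vSwitch2
esxcfg-vmknic -a -i 192.168.20.12 -n 255.255.255.0 "iSCSI"
```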

Both iSCSI and NFS traffic should be assigned to a separate network to increase security and guarantee that bandwidth is available regardless of whether a vMotion or heavy VM traffic is occurring. Two additional NICs should be allocated, resulting in the following six NIC configuration:

  • 2 x NICs for Virtual Machine traffic
  • 2 x NICs for vMotion traffic
  • 2 x NICs for Service Console and iSCSI/NFS storage

Given an additional two NICs (8 total), the following could be configured:

  • 2 x NICs for Virtual Machine traffic
  • 2 x NICs for vMotion traffic
  • 2 x NICs for Service Console
  • 2 x NICs for iSCSI/NFS storage

So what's changed in the vSphere 4 world?

For our new deployment, I plan to use ESXi instead of ESX. VMware have stated their intention to move to the ESXi hypervisor for future releases, and we have no legacy requirement to run the Service Console in our environment. At first glance, this would appear to remove the need for two of those ports in the above list.

But the question arises, if the Service Console is not present in ESXi, what IP address and interface does the vSphere client connect to? The IP address is the one the sysadmin assigns to the host using the ESXi text-mode menu (the Direct Console User Interface, or DCUI). The interface is actually a VMkernel interface.

Replicating the ESX environment configuration in ESXi would therefore look like:

  • VMkernel for Administration Network
  • VMkernel for vMotion
  • Virtual Machines Network

Assuming each of the above has redundant connections, we still need six NICs, although if you are limited to four NICs you could apply the same approach as with ESX 3.x and combine the vMotion and Administration networks into a single vSwitch.

If you plan on using NFS storage or iSCSI, you will need another VMkernel interface for storage, so add another couple of ports.

One of the new features in vSphere 4 is Fault Tolerance (FT). This feature ideally needs a dedicated network between hosts, so that takes the total number of physical ports up to 10:

  • 2 x NICs for VMkernel Administration Network
  • 2 x NICs for VMkernel vMotion
  • 2 x NICs for Virtual Machine Network
  • 2 x NICs for VMkernel NFS/iSCSI
  • 2 x NICs for VMkernel FT
The above example only accounts for a single vSwitch for Virtual Machine traffic. If there is a reason for a second vSwitch with VM traffic (e.g., you want to segment a DMZ onto a separate physical network), additional NICs will be needed.

Conversely, if your server doesn't support 10 NICs, some sharing of physical NICs / vSwitches will be required.

Our environment only supports 6 NICs per server and we don't use iSCSI. Our NFS usage is limited to ISO datastores, so that traffic can share a vSwitch with the Administration and vMotion networks. The approach we'll probably take is:
  • 2 x NICs for VMkernel Administration Network, VMkernel vMotion and NFS network
  • 2 x NICs for Virtual Machine Network
  • 2 x NICs for VMkernel FT
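
Since ESXi has no Service Console to run esxcfg commands in, the planned layout could be built remotely with the vSphere CLI (the vicfg-* commands) along these lines. The host name, credentials, uplink numbers, port group names and addresses are all placeholders of my own, it assumes the default vSwitch0 (with its Management Network port and vmnic0) already exists, and vMotion and FT logging still need to be enabled on the relevant VMkernel ports from the vSphere Client.

```shell
HOST="--server esxi01 --username root"   # placeholder connection details

# vSwitch0: management VMkernel, plus vMotion and NFS VMkernel ports
vicfg-vswitch $HOST -L vmnic1 vSwitch0   # second uplink for redundancy
vicfg-vswitch $HOST -A "VMotion" vSwitch0
vicfg-vmknic  $HOST -a -i 192.168.10.21 -n 255.255.255.0 "VMotion"
vicfg-vswitch $HOST -A "NFS" vSwitch0
vicfg-vmknic  $HOST -a -i 192.168.10.22 -n 255.255.255.0 "NFS"

# vSwitch1: Virtual Machine traffic
vicfg-vswitch $HOST -a vSwitch1
vicfg-vswitch $HOST -L vmnic2 vSwitch1
vicfg-vswitch $HOST -L vmnic3 vSwitch1
vicfg-vswitch $HOST -A "VM Network" vSwitch1

# vSwitch2: Fault Tolerance logging
vicfg-vswitch $HOST -a vSwitch2
vicfg-vswitch $HOST -L vmnic4 vSwitch2
vicfg-vswitch $HOST -L vmnic5 vSwitch2
vicfg-vswitch $HOST -A "FT" vSwitch2
vicfg-vmknic  $HOST -a -i 192.168.30.21 -n 255.255.255.0 "FT"
```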
This post assumes each NIC is a 1Gbit port. As 10Gbit NICs become more common, the network design approach might change. Something to think about for a future post...


Fabiano Silos said...

What about jumboframe?

Henrik said...

We agree completely on the need for many nics: