Friday 10 July 2009

Sun StorageTek 2540 and ESX troubleshooting

We recently experienced a few issues with the StorageTek 2540 array that forms the core of our SAN. The symptom was that the array flagged itself as being in a degraded state, reporting that one or more volumes were not assigned to their preferred controller.

The first step was to upgrade the SAN firmware and Common Array Manager (CAM) software to the latest release. Despite this, the problem recurred. Further digging showed that the failover was happening whenever we performed a LUN rescan under VMware ESX.

My previous understanding was that there were essentially two types of arrays: active/active and active/passive. In the active/active configuration, both controllers in an array can service I/O requests to a specific volume concurrently. In an active/passive configuration, one [active] controller handles the I/O with the second [passive] controller sitting idle, only servicing I/O if the active controller fails.

I understood the StorageTek 2540 to be an active/passive array; it is only possible to assign a volume to one controller at any time. However, in order to improve the throughput of the array, different volumes can be assigned to different controllers. For example, a volume “VOL1” might be assigned to controller A as its active controller and to controller B as its passive controller, while volume “VOL2” might be assigned to controller B as its active controller and to controller A as its passive controller.

It turns out that things are more subtle than this; there is a third type of array configuration: asymmetric.

The asymmetric configuration follows the active/passive model in that only one controller is servicing I/O for a specific volume at any time, but extends this by allowing I/O operations to be received by the second controller. If this happens, the array will automatically fail over the volume to the second controller to service the request. This process is called Automatic Volume Transfer (AVT). If the first controller then receives I/O operations, AVT moves the volume back.

Yes, this could cause some flapping between controllers. It can also cause I/O stalls as the controllers fail across.
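To make the flapping concrete, here is a minimal toy model of AVT as I understand it from the description above. It is only a sketch: the `Volume` class and its fields are made up for illustration and have nothing to do with the array's real firmware or the CAM interface.

```python
from dataclasses import dataclass


@dataclass
class Volume:
    """Toy model of one volume on an asymmetric array with AVT enabled.

    Hypothetical structure for illustration only; not a real CAM object.
    """
    name: str
    preferred: str       # controller the volume is meant to live on
    owner: str           # controller currently servicing I/O
    transfers: int = 0   # number of AVT ownership changes so far

    def io(self, controller: str) -> None:
        """Receive an I/O on `controller`; AVT moves ownership if needed."""
        if controller != self.owner:
            self.owner = controller
            self.transfers += 1

    @property
    def degraded(self) -> bool:
        """True when the volume is not on its preferred controller."""
        return self.owner != self.preferred


vol = Volume(name="VOL1", preferred="A", owner="A")

# Two hosts that disagree about the active path take turns sending I/O.
for controller in ["A", "B", "A", "B"]:
    vol.io(controller)

print(vol.owner, vol.transfers, vol.degraded)  # prints: B 3 True
```

Four I/Os produce three ownership changes, and the volume ends up off its preferred controller, which is exactly the state the array was complaining about.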

Some of the array initiator types, such as Solaris with Traffic Manager (aka MPxIO), disable AVT. Others, including the Linux initiator that we’ve used on our VMware hosts, have AVT enabled.

So the problem we’re having appears to be caused by the array failing over a volume to its second controller. But why is it doing this? The only configuration I had performed on the ESX side was to ensure the multi-pathing option was set to Most Recently Used (MRU), the correct setting for active/passive arrays. What appears to have happened is that, when booting, the ESX servers are not settling on a consistent path. Out of our five ESX servers, three were setting one controller as active, while the other two were setting the second controller as active. Presumably, when one of the hosts with the wrong active path performs a rescan, the request is sent to the non-owning controller, which invokes AVT and fails over the volume.

How to fix?

Sun have told me that the next version of CAM, due in a few weeks, will include a “VMware” initiator type which will disable AVT. This will negate the need to perform the NVSRAM hack in VMware’s Knowledge Base, but will require a firmware upgrade.

In the meantime, it might be a case of just ensuring that all the ESX hosts are using the same path to connect to each volume. This is all theory as I’m still working this out, but at least it’s all starting to make sense.
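As a sanity check on that theory, something like the following could flag volumes where the hosts disagree about the active controller. This is a rough sketch only: the host names are invented, the `active_paths` table is typed in by hand from what each host reports rather than gathered by any real tool, and `find_inconsistent_volumes` is my own hypothetical helper.

```python
from collections import defaultdict


def find_inconsistent_volumes(active_paths):
    """Return volumes whose hosts do not all use the same controller.

    active_paths maps (host, volume) -> controller, collected by hand.
    The result maps each inconsistent volume to {controller: [hosts]}.
    """
    by_volume = defaultdict(lambda: defaultdict(list))
    for (host, volume), controller in active_paths.items():
        by_volume[volume][controller].append(host)

    return {
        volume: {ctrl: sorted(hosts) for ctrl, hosts in controllers.items()}
        for volume, controllers in by_volume.items()
        if len(controllers) > 1
    }


# Hypothetical data mirroring our situation: five hosts, split 3/2 on VOL1.
active_paths = {
    ("esx1", "VOL1"): "A", ("esx2", "VOL1"): "A", ("esx3", "VOL1"): "A",
    ("esx4", "VOL1"): "B", ("esx5", "VOL1"): "B",
    ("esx1", "VOL2"): "B", ("esx2", "VOL2"): "B", ("esx3", "VOL2"): "B",
    ("esx4", "VOL2"): "B", ("esx5", "VOL2"): "B",
}

inconsistent = find_inconsistent_volumes(active_paths)
for volume, controllers in inconsistent.items():
    print(volume, controllers)
```

With this sample data only VOL1 is reported, since all five hosts agree on controller B for VOL2.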

Although not specifically VMware or 2540 related, the following links provide some interesting reading around the subject:

Sun discussion forum thread about preferred and owner controllers

Linux kernel mailing list post detailing a bug experienced with multipath and asymmetric arrays
