Tuesday 27 March 2012

EMC VNXe - diving under the hood (Part 1: Intro)



Recently we took delivery of an EMC VNXe 3100. The purpose of this unit is to provide a low-cost alternative to our NetApp FAS2040, primarily for backups but also as a VMware datastore for some smaller workloads. Here it is:



If you want a decent primer on the VNXe, check out Henriwithani's blog.

The VNXe is an interesting unit and this mini-series will look at what the VNXe 3100 provides and how it works. To understand some of the thinking behind the design, we need some background.

First, some history...

(It's worth stating here at the start that I'm not an expert on CLARiiON or Celerra and the below is my understanding based on reading the documentation. If I've made errors, please correct me!)

Go back a couple of years and EMC had two product lines for the midrange: CLARiiON and Celerra.

CLARiiON was their block storage array, supporting Fibre Channel and iSCSI. Each controller in a CLARiiON ran an operating environment called FLARE, which actually ran on top of a Windows XP Embedded kernel.

Celerra took a CLARiiON, added one or more “X-Blades” (aka “Datamovers”) and a 1U, Linux-based “Control Station”. The Datamovers were basically NAS head units and added support for NFS and CIFS/SMB. They could also do iSCSI, albeit in a somewhat more abstracted way than running native block on CLARiiON. The operating system running on the Datamovers was a UNIX-like derivative called DART.

More information on Celerra’s architecture here.

In January 2011, EMC announced the successor to both CLARiiON and Celerra: the VNX. At the same time, the VNXe was announced as an entry-level sibling to the VNX. The VNX appears to be a fairly straightforward upgrade to CLARiiON and Celerra: faster, bigger and with more features.

However, the VNXe sports a new software architecture running on an “execution environment” called CSX. From what I can tell by looking at the VNXe support logs and poking around the command line, CSX runs on a Linux kernel. Relevant parts of the FLARE and DART stacks have been ported to run on CSX and each Storage Processor (SP) (aka controller) in the VNXe runs an instance of the Linux/CSX/FLARE/DART stack.

(More information on VNXe here.)

So what you get in a VNXe is a fusion of bits of FLARE and DART, but mixed up in a new way.

The VNXe configuration

Because it is aimed at the non-storage administrator, a lot of the technical details in the VNXe are hidden. This is a bit frustrating for those of us who like to know how things work, so I’ve tried to dive under the hood a bit.

The base VNXe 3100 is a 2U unit and has 12 drive bays and up to two controllers. Each controller (Storage Processor, or “SP” in EMC language) has a dedicated management Ethernet port, and two ports for data called Eth2 and Eth3. There is an option to install a SLIC module in each controller that adds an additional 4 x 10/100/1000Mbit Ethernet ports (copper). We didn’t buy this expansion, so I can’t comment further on those.

With two controllers, the VNXe 3100 can support up to 96 drives through the addition of Disk Array Enclosures (DAEs), which connect to the base unit via 6Gbps SAS.

As an entry-level system, the VNXe has some limitations in terms of disk configuration options. Disks are sold in packs and are designed to be added in one of the following ways:

  • 300GB SAS: 5 pack in RAID5 (4+1)
  • 300GB SAS: 6 pack in RAID10 (3+3)
  • 600GB SAS: 5 pack in RAID5 (4+1)
  • 600GB SAS: 6 pack in RAID10 (3+3)
  • 1TB NL-SAS: 6 pack in RAID6 (4+2)
  • 2TB NL-SAS: 6 pack in RAID6 (4+2)

The version we purchased has two controllers and 12 disks: 6x300GB SAS and 6x2TB NL-SAS.

Once installed in the system, the 6x300GB disks can be configured either as “High Performance”, which is RAID10 (3+3), or as “Balanced Performance/Capacity”, which is RAID5 (4+1) with the remaining disk as a hot spare. We opted for the latter.

The 6x2TB NL-SAS disks can only be configured as RAID6 (4+2). That's fine for our purposes (backups), but worth knowing, as some use cases may require different configurations. For a rough feel of what these layouts yield, see the sketch below.
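
As a quick sanity check on those layouts, here's a back-of-the-envelope capacity calculation in Python. It's purely illustrative: the figures ignore base-10/base-2 conversion and the space the VNXe reserves for its own system use, so treat them as rough upper bounds rather than what Unisphere will actually report.

```python
# Rough usable capacity for the VNXe 3100 disk layouts discussed above.
# Only the data disks count; parity/mirror disks and hot spares do not.
def usable_tb(disk_tb, data_disks):
    return disk_tb * data_disks

layouts = [
    ("6x300GB SAS, RAID10 (3+3)",              0.3, 3),
    ("6x300GB SAS, RAID5 (4+1) + 1 hot spare", 0.3, 4),
    ("6x2TB NL-SAS, RAID6 (4+2)",              2.0, 4),
]

for name, disk_tb, data_disks in layouts:
    print(f"{name}: ~{usable_tb(disk_tb, data_disks):.1f} TB (before system reserve)")
```

Running that gives roughly 0.9 TB, 1.2 TB and 8 TB respectively.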

However, it's not as simple as popping in some disks and off you go (well, it is that simple, but there's a lot going on underneath!).

To understand what the VNXe does with the disks requires another background history lesson because there is a bunch of terminology carried over from both CLARiiON and Celerra.

CLARiiON terminology

The CLARiiON works on block storage in the following way:

Physical disks are added to a CLARiiON and a RAID set is created (such as a RAID5 4+1 configuration). The result is called a "RAID Group". Traditionally, up to 16 disks could belong to a single RAID Group.

From within the RAID Group, FLARE (the CLARiiON operating environment) would create LUNs. The most basic type of LUN is the cunningly named "FLARE LUN". Some FLARE LUNs have special purposes and are used internally (for snapshots, logging etc.). These are called Private LUNs.

The problem with a FLARE LUN is that you are limited to the maximum size of the RAID Group (16 disks' worth of capacity, minus overhead for parity or mirroring). To overcome this, EMC invented the MetaLUN, a construct that combines multiple FLARE LUNs through either striping or concatenation.

For the sake of completeness, later releases of FLARE introduced the concept of a Storage Pool, out of which Thick LUNs (with reserved space) and Thin LUNs (with no reserved space, known as Virtual or Thin Provisioning) could be created.

FLARE provides some additional features such as compression (which moves the LUN to a pool, converts it to a Thin LUN and compresses it) and Fully Automated Storage Tiering Virtual Provisioning (FAST VP), which combines multiple disk types (such as Flash, Fibre Channel and SATA) into a single pool and dynamically moves hot data to the fastest disks. Pretty clever stuff.

Regardless of type, each LUN is assigned a preferred owning Storage Processor (SP), allowing the storage administrator to manually balance LUNs across the SPs for maximum performance. In the event of a controller failure, the peer controller takes ownership. The LUNs are then mapped to make them visible to hosts. The toy model below sketches how these pieces fit together.
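
None of this plumbing is exposed on the VNXe, but to keep the relationships straight, here is a toy Python model of the CLARiiON-side hierarchy as I understand it (RAID Group, FLARE LUN, MetaLUN, each LUN with a preferred owning SP). The class and attribute names are mine, purely for illustration; this is not any real EMC API.

```python
# Toy model of the CLARiiON block hierarchy described above.
# Class and attribute names are mine, for illustration only.
from dataclasses import dataclass

@dataclass
class RaidGroup:
    raid_type: str         # e.g. "RAID5 (4+1)"
    disks: int             # traditionally capped at 16 per RAID Group
    usable_gb: float       # capacity left after parity/mirroring overhead

@dataclass
class FlareLun:
    name: str
    size_gb: float
    raid_group: RaidGroup  # a FLARE LUN lives entirely within one RAID Group
    owner_sp: str = "SPA"  # preferred owning Storage Processor
    private: bool = False  # Private LUNs are used internally (snapshots, logging etc.)

@dataclass
class MetaLun:
    name: str
    members: list          # FLARE LUNs combined by striping or concatenation
    layout: str = "stripe"

    @property
    def size_gb(self):
        return sum(lun.size_gb for lun in self.members)

# Two FLARE LUNs from separate RAID Groups combined into one MetaLUN,
# giving a LUN bigger than any single RAID Group allows.
rg1 = RaidGroup("RAID5 (4+1)", disks=5, usable_gb=1200)
rg2 = RaidGroup("RAID5 (4+1)", disks=5, usable_gb=1200)
meta = MetaLun("meta_0", [FlareLun("lun_0", 1200, rg1),
                          FlareLun("lun_1", 1200, rg2)])
print(meta.name, meta.size_gb)   # meta_0 2400
```

The point of the toy example is simply that a MetaLUN lets a LUN exceed the capacity ceiling of a single RAID Group.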

Okay, so that's CLARiiON, what about Celerra?

Celerra terminology

To the CLARiiON, a Celerra is a host to which LUNs are presented. These LUNs must be configured in a specific way to be understood by the Celerra operating system, DART.

The LUNs that the CLARiiON presents are seen as single disks by the Celerra (regardless of the number of underlying physical disks that make up each LUN). These disks are sometimes referred to as "dvols"; you should assume a 1:1 mapping between CLARiiON LUNs and Celerra dvols.

Beyond the disk (dvol), Celerra offers some additional I/O constructs: a "slice" is a part of a dvol, and a single disk can be partitioned into many slices; a "stripe" interleaves multiple dvols or multiple slices; and a "meta" (not to be confused with a CLARiiON MetaLUN) is a concatenation of multiple slices, stripes, disks or other metas.

Finally, a "volume" is created. This is a block I/O construct into which a filesystem is created. And it's these filesystems that are made visible to hosts. The default filesystem is uxfs and is used by both NFS and CIFS fileservers.

As the above illustrates, the path between physical disk and filesystem is pretty complicated. How the VNXe is derived from this messy lineage will be the subject of part 2 (coming soon).
