Friday, 8 January 2010

Hands-on Solaris IP Multipathing

Today I had a first look at IP multipathing on Sun Solaris. What is IP multipathing? It's a feature that provides additional network resilience for a server with multiple physical network interfaces. With at least two interfaces on the same subnet placed in a group, IP Multipathing (IPMP) keeps the server reachable if one of the links goes down by transparently migrating the IP address over to a surviving interface. IPMP operates at the IP layer (layer 3) and is not link aggregation (Solaris supports that too, but I haven't tried it yet).

To test it, I booted a Solaris 10 VM on VMware ESXi. I had assigned two NICs to the VM and made sure both were connected and plumbed.
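
In case the interfaces aren't already plumbed, that's a one-liner each:

bash-3.00# ifconfig e1000g0 plumb
bash-3.00# ifconfig e1000g1 plumb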

The first step is to create an IPMP "group" by assigning both interfaces, e1000g0 and e1000g1, to it. I unimaginatively called the group "mygroup".


bash-3.00# ifconfig e1000g0 group mygroup
bash-3.00# ifconfig e1000g1 group mygroup


I then assigned an IP address to each interface and brought the interfaces up:


bash-3.00# ifconfig e1000g0 192.168.192.105 up
bash-3.00# ifconfig e1000g1 192.168.192.106 up
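
Note that this gives link-based failure detection only. in.mpathd can also do probe-based detection if each interface gets a dedicated test address, marked deprecated (so applications never use it as a source address) and -failover (so it stays on the interface during a failover). A sketch, with .107 and .108 as made-up test addresses:

bash-3.00# ifconfig e1000g0 addif 192.168.192.107 netmask + broadcast + deprecated -failover up
bash-3.00# ifconfig e1000g1 addif 192.168.192.108 netmask + broadcast + deprecated -failover up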


The output of ifconfig -a looks much like normal, with the addition of a groupname line on each grouped interface:


bash-3.00# ifconfig -a
lo0: flags=2001000849<UP,LOOPBACK,RUNNING,MULTICAST,IPv4,VIRTUAL> mtu 8232 index 1
        inet 127.0.0.1 netmask ff000000
e1000g0: flags=1000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4> mtu 1500 index 2
        inet 192.168.192.105 netmask ffffff00 broadcast 192.168.192.255
        groupname mygroup
        ether 0:c:29:f9:5d:e4
e1000g1: flags=1000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4> mtu 1500 index 3
        inet 192.168.192.106 netmask ffffff00 broadcast 192.168.192.255
        groupname mygroup
        ether 0:c:29:f9:5d:ee


With IPMP set up, I started a continuous ping from another machine and then edited the VM in the vSphere client, disconnecting the second NIC from the network. Immediately, the following was reported in /var/adm/messages:


Jan 7 20:41:08 solaris10 in.mpathd[1208]: [ID 215189 daemon.error] The link has gone down on e1000g1
Jan 7 20:41:08 solaris10 in.mpathd[1208]: [ID 594170 daemon.error] NIC failure detected on e1000g1 of group mygroup
Jan 7 20:41:08 solaris10 in.mpathd[1208]: [ID 832587 daemon.error] Successfully failed over from NIC e1000g1 to NIC e1000g0
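
That was link-based failure detection, which is effectively instant. in.mpathd also has a handful of tunables in /etc/default/mpathd; if I'm reading the docs right, the stock Solaris 10 file boils down to this (FAILURE_DETECTION_TIME applies to probe-based detection and is in milliseconds, and FAILBACK=yes is what allows the automatic failback shown further down):

FAILURE_DETECTION_TIME=10000
FAILBACK=yes
TRACK_INTERFACES_ONLY_WITH_GROUPS=yes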


No packet loss so far. What does ifconfig -a now show?


bash-3.00# ifconfig -a
lo0: flags=2001000849<UP,LOOPBACK,RUNNING,MULTICAST,IPv4,VIRTUAL> mtu 8232 index 1
        inet 127.0.0.1 netmask ff000000
e1000g0: flags=1000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4> mtu 1500 index 2
        inet 192.168.192.105 netmask ffffff00 broadcast 192.168.192.255
        groupname mygroup
        ether 0:c:29:f9:5d:e4
e1000g0:1: flags=1000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4> mtu 1500 index 2
        inet 192.168.192.106 netmask ffffff00 broadcast 192.168.192.255
e1000g1: flags=19000802<BROADCAST,MULTICAST,IPv4,NOFAILOVER,FAILED> mtu 0 index 3
        inet 0.0.0.0 netmask 0
        groupname mygroup
        ether 0:c:29:f9:5d:ee


While e1000g1 was still present, it was now flagged FAILED and no longer UP. The in.mpathd daemon had created a new logical interface, e1000g0:1, on the surviving NIC and assigned it the IP address from the failed interface.

With the test complete, I then reattached the NIC in the vSphere client and noted the following in /var/adm/messages:


Jan 7 20:41:56 solaris10 in.mpathd[1208]: [ID 820239 daemon.error] The link has come up on e1000g1
Jan 7 20:41:56 solaris10 in.mpathd[1208]: [ID 299542 daemon.error] NIC repair detected on e1000g1 of group mygroup
Jan 7 20:41:56 solaris10 in.mpathd[1208]: [ID 620804 daemon.error] Successfully failed back to NIC e1000g1


This looked good, and a final check of ifconfig -a confirmed that the logical interface e1000g0:1 was gone and each address was back on its own NIC:


bash-3.00# ifconfig -a
lo0: flags=2001000849<UP,LOOPBACK,RUNNING,MULTICAST,IPv4,VIRTUAL> mtu 8232 index 1
        inet 127.0.0.1 netmask ff000000
e1000g0: flags=1000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4> mtu 1500 index 2
        inet 192.168.192.105 netmask ffffff00 broadcast 192.168.192.255
        groupname mygroup
        ether 0:c:29:f9:5d:e4
e1000g1: flags=1000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4> mtu 1500 index 3
        inet 192.168.192.106 netmask ffffff00 broadcast 192.168.192.255
        groupname mygroup
        ether 0:c:29:f9:5d:ee
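
As an aside, pulling virtual cables isn't the only way to test this. Solaris ships an if_mpadm utility that can detach an interface from its IPMP group and reattach it from the OS side, which should exercise the same failover and failback paths (I haven't tried it myself yet):

bash-3.00# if_mpadm -d e1000g1
bash-3.00# if_mpadm -r e1000g1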


This was so easy I don't know why I didn't do this years ago...
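
One caveat: everything above was done with bare ifconfig commands, so it won't survive a reboot. To make the setup persistent, the same parameters should go into the /etc/hostname.* files, along these lines:

bash-3.00# cat /etc/hostname.e1000g0
192.168.192.105 netmask + broadcast + group mygroup up
bash-3.00# cat /etc/hostname.e1000g1
192.168.192.106 netmask + broadcast + group mygroup up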
