Last week I had the opportunity to do some work with the Live Upgrade feature of Sun Solaris. I had been vaguely aware of its capabilities and we had been including a provision for it on our customer server builds, but it was only yesterday that I sat down and tried to do an upgrade to the latest Solaris 10 update 8.
Live Upgrade is a capability where the system administrator can upgrade to a new version of Solaris while the existing operating system is running. The only downtime experienced is a scheduled reboot at the end of the process to initialise the new version. If something goes wrong, the original version of the OS is still available for booting.
The way it works is based around the concept of a "boot environment". The default environment is the operating system you're running at the moment. On our servers, we have been creating a 20GB root (/) filesystem. A second 20GB slice is also created, but not used (nominally mounted as /lu so we remember it's there).
The first step in running the live upgrade is to create a new boot environment. To prepare for this, the /lu partition was unmounted and its entry commented out in /etc/vfstab. Once that was done, the new boot environment was set up:
# lucreate -n osupgrade -m /:/dev/md/dsk/d30:ufs
Okay, so here we are creating a new environment called "osupgrade" and saying that the root ("/") filesystem will be installed on the device /dev/md/dsk/d30 and that the filesystem type will be UFS (it can also do ZFS but we didn't have the correct setup on my test system). This bit takes a while, depending on how much you have on your root filesystem.
For those unfamiliar with the /dev/md/ part, this is a Solaris Volume Manager (SVM) metadevice. In reality, "d30" is a mirror that contains two submirrors (probably called d31 and d32). These submirrors are made up of one or more disk slices. In other words, in the above command, the new boot environment will be installed onto a new mirrored disk.
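The preparation step mentioned earlier (commenting the /lu entry out of /etc/vfstab) is easy to get wrong by hand, so here is a minimal sketch of how it could be scripted. The function name comment_out_mount and the AWK variable are my own, purely for illustration, and are not part of Live Upgrade. It assumes the standard vfstab layout, where the mount point is the third field, and it writes to stdout rather than editing the file in place.

```shell
#!/bin/sh
# Assumption: on Solaris 10 the default /usr/bin/awk lacks -v,
# so run this with AWK=nawk there. Elsewhere plain awk is fine.
AWK=${AWK:-awk}

# Comment out the vfstab entry whose mount point (field 3) equals $1.
# Reads the vfstab-format file named in $2 and prints the result on
# stdout; the original file is left untouched.
comment_out_mount() {
    mountpoint=$1
    vfstab=$2
    "$AWK" -v mp="$mountpoint" '
        # Pass existing comments and blank lines through unchanged.
        /^[ \t]*#/ || NF == 0 { print; next }
        # Prefix the matching entry with "#".
        $3 == mp { print "#" $0; next }
        # Everything else is copied as-is.
        { print }
    ' "$vfstab"
}
```

Something like "comment_out_mount /lu /etc/vfstab > /tmp/vfstab.new" lets you diff the result against the original before copying it into place.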
At the end of the lucreate command, you can actually look at the new, mounted boot environment and see that it's basically a copy of your existing root filesystem. The next step is to upgrade it. To do this, I mounted the install location of our Jumpstart server over NFS and initiated the live upgrade:
# luupgrade -u -n osupgrade -s /mnt/install_sol10_u8_sparc/
This bit takes a while (a bit like installing Solaris...) but basically upgrades the named boot environment using the media specified. At the end of this, all that needs to be done is for the boot environment to be activated:
# luactivate osupgrade
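A quick sanity check at this point is to confirm which boot environment is flagged to come up next. The sketch below assumes the usual lustatus column layout (name, Is Complete, Active Now, Active On Reboot, Can Delete, Copy Status); the helper name next_boot_env is hypothetical, and I have not checked it against every release's output.

```shell
#!/bin/sh
# Print the name of the boot environment marked "Active On Reboot".
# Reads lustatus-style text on stdin. Assumption: the fourth
# whitespace-separated column is the "Active On Reboot" flag.
# The header and separator rows never contain "yes" in that column,
# so they fall through without special handling.
next_boot_env() {
    awk '$4 == "yes" { print $1 }'
}

# Intended use on the server (not run here):
#   lustatus | next_boot_env
```

If this prints "osupgrade", the activation has been registered and the next boot should use the new environment.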
Before initialising the new environment, it's worth noting down your existing, working environment. For me, this was the root filesystem located on /dev/md/dsk/d10. Find out the underlying slices used by d10 (c2t0d0s0 and c2t1d0s0 in my case). Once done, reboot the server and the new boot environment should be loaded. Note that luactivate itself tells you to use init or shutdown for this reboot, not the reboot or halt commands.
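Finding the slices behind a mirror can be done by reading the output of "metastat -p", which lists each metadevice in md.tab form. Here is a rough sketch of pulling the slices out automatically. The function name slices_of_mirror is made up for this example, and it assumes the simple case where the mirror line looks like "d10 -m d11 d12 1" and each submirror line names its slices directly; soft partitions or more exotic layouts would need more work.

```shell
#!/bin/sh
# Assumption: on Solaris 10 use AWK=nawk, since /usr/bin/awk lacks -v.
AWK=${AWK:-awk}

# Read "metastat -p" style text on stdin and print, one per line and
# sorted, the disk slices that sit under the mirror named in $1.
slices_of_mirror() {
    mirror=$1
    "$AWK" -v m="$mirror" '
        # Mirror definition: "<mirror> -m <submirror> ... <pass>"
        $1 == m && $2 == "-m" {
            for (i = 3; i <= NF; i++) sub_of[$i] = 1
            next
        }
        # Remember every other definition line, keyed by device name.
        { line[$1] = $0 }
        END {
            for (name in sub_of) {
                n = split(line[name], f, " ")
                # Keep only fields that look like cNtNdNsN (or cNdNsN).
                for (i = 2; i <= n; i++)
                    if (f[i] ~ /^c[0-9]+(t[0-9]+)?d[0-9]+s[0-9]+$/)
                        print f[i]
            }
        }
    ' | sort
}

# Intended use on the server (not run here):
#   metastat -p | slices_of_mirror d10
```

Write the result down somewhere off the box; you will need one of those slice names if you ever have to roll back from single user mode.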
Now the coolness of this should be immediately apparent! Previously, operating system upgrades would require a backup of the system to tape (always a good idea!), followed by scheduled downtime as the system was upgraded "offline". This also meant a visit to a customer site, typically at a weekend.
Combined with the use of an ILOM interface (for network access to the console), it now becomes perfectly possible to upgrade a Solaris server during the day, while users are on the system. All that is required now is an out-of-hours reboot of the server to initialise the new release.
If there are problems with the upgrade, it's possible to roll back by setting the old boot environment to active. To do this, boot off CD-ROM or the network (I used the network) by typing the following at the PROM:
ok boot net -s
[wait for the OS to boot]
# mount -F ufs /dev/dsk/c2t0d0s0 /mnt
# /mnt/sbin/luactivate
Exit single user mode and reboot.
Obviously this is only scratching the surface of what Live Upgrade can do. It's possible to merge and split filesystems, detach and build disk mirrors, and much more. Live Upgrade is also useful for more than the occasional OS update; it's perfectly possible to use it to apply system patches, with a very easy rollback capability.
Definitely a technology that needs to be investigated more fully...