Saturday, 16 January 2010

OpenSolaris: Very slow boot times

Today I lost power to my servers due to a power outage. The UPS wasn't up to coping and almost instantly died (I was running five computers on the one, small UPS...).

Booting the OpenSolaris server back up reminded me of the painfully slow boot times that can occur. We're talking *hours* to get the server up.

The cause is the number of ZFS snapshots on the system. Here's the experiment:

I booted off the OpenSolaris 2009.06 CD and ran the format command. This displayed the disks on the system. I then imported the zpools into the running installation:

# zpool import -f rpool
# zpool import -f datapool

The -f is required because the pools were not exported before the power went, so the system thinks they are still in use by another server (a useful safeguard if the zpool is on a SAN LUN). The first command was relatively quick; the second was much, much slower.

Running prstat revealed that devfsadm was consuming an entire CPU. The purpose of devfsadm is to dynamically add and remove devices on the system. It was running stat on each of the snapshots in the datapool and creating entries in /dev. After running for a few hours, it had created over 4,000(!) devices in /dev/zvol/dsk/datapool and /dev/zvol/rdsk/datapool.

The number of snapshots is thanks to the automatic snapshot service, which takes frequent snapshots of the filesystems in the pool. The list is not cleared down automatically, so it can grow huge. That is not usually a problem because the uptime of OpenSolaris is fantastic, but it is a real pain when you need the server to boot.
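To see where the snapshots are piling up, something like the following can help. It is only a rough sketch: the function name count_snapshots_per_dataset is made up for this post, and it assumes snapshot names arrive on stdin one per line in the usual dataset@snapshot form.

```shell
# Hypothetical helper: count snapshots per dataset.
# Expects snapshot names on stdin, one per line, as "dataset@snapshot".
count_snapshots_per_dataset() {
    # Split each name at "@", tally by dataset, then list the biggest first.
    awk -F'@' '{ count[$1]++ } END { for (d in count) print count[d], d }' | sort -rn
}
```

On a live system the input would come from zfs itself, for example: zfs list -H -t snapshot -o name | count_snapshots_per_dataset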

So, in order to keep your OpenSolaris boot times down, keep an eye on the number of snapshots on your system.
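One way to keep the count in check is to work out which snapshots are surplus before deleting anything. The sketch below is a dry run only: the function name list_prune_candidates and the "keep the newest N per dataset" policy are my own assumptions for illustration, not something the snapshot service does for you. It expects snapshot names on stdin, oldest first.

```shell
# Hypothetical helper: print the snapshots that would go if only the newest
# KEEP snapshots of each dataset were retained. Nothing is destroyed here.
# Expects "dataset@snapshot" names on stdin, sorted oldest first.
list_prune_candidates() {
    keep="$1"
    awk -F'@' -v keep="$keep" '
        { names[NR] = $0; ds[NR] = $1; total[$1]++ }
        END {
            for (i = 1; i <= NR; i++) {
                seen[ds[i]]++
                # All but the last "keep" entries of a dataset are candidates.
                if (seen[ds[i]] <= total[ds[i]] - keep) print names[i]
            }
        }'
}
```

The oldest-first input can be produced with zfs list -H -t snapshot -o name -s creation, piped into list_prune_candidates 10 (the 10 is an arbitrary example). Check the resulting list by eye before passing any of the names to zfs destroy.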
