Wednesday, 15 August 2007

Managing Brightstor ARCserve Backup

At work, we use CA's Brightstor ARCserve Backup (BAB) to back up our network of Solaris, Linux, Windows and AIX servers. The backup is done on a Solaris 10 server running version 11.5 of the server software.

We connect to a 40-tape Plasmon library equipped with two LTO2 drives (although we only use one tape drive at the moment and allocate only 20 slots to the backup server). The library connects to our SAN using an ATTO FibreBridge 2300 (SCSI in one end, Fibre Channel out the other).

BAB is fairly good at what it does, but the FibreBridge is a bit dodgy and occasionally crashes, which is what it did last Friday night.

Like many organisations, we perform a full system backup on a Friday night, and then run differential (level 1) backups Monday through Thursday. Monthly tapes are then stored off site.

One issue I've been fighting with is that BAB stores its backup catalogs in an Ingres database. Each file that is backed up gets a row. We've been running the backup for about 4 months and now have 61 million rows in one of the tables.

In order to reduce the size of the database, BAB includes a "prune" option which removes the details of old jobs from the database. I configured this to run, but then noticed that a number of the text catalogs from previous jobs had not been imported into the database (BAB writes these during the backup and then loads them into the database after each session using the MergeCat utility).

So most of yesterday was spent completing the MergeCat (check /opt/CA/BrightstorARCserve/dbase/tmpcat to see what needs importing) and then running the backup last night. I have just kicked off the dbclean to purge the jobs.
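
As a rough way of checking that directory, here is a minimal Python sketch. It assumes that every regular file left in tmpcat is a catalog still waiting to be merged, which is my reading of the behaviour rather than anything documented; the function name is made up for illustration.

```python
from pathlib import Path

# Directory named in the post; adjust if BAB is installed elsewhere.
TMPCAT_DIR = Path("/opt/CA/BrightstorARCserve/dbase/tmpcat")


def pending_catalogs(directory=TMPCAT_DIR):
    """Return (name, size_in_bytes) for each regular file, oldest first.

    Assumption: any file still present here has not been merged yet.
    """
    if not directory.is_dir():
        return []
    files = [p for p in directory.iterdir() if p.is_file()]
    files.sort(key=lambda p: p.stat().st_mtime)
    return [(p.name, p.stat().st_size) for p in files]


if __name__ == "__main__":
    for name, size in pending_catalogs():
        print(f"{size:>12}  {name}")
```

An empty listing would suggest that there is nothing left to import before pruning.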

To make things a bit faster, I did the following:

By default, the Ingres configuration had only one write-behind thread. This caused a bottleneck, as the log buffers filled up faster than they could be written out. I've increased this to 20.

The dbclean does a bunch of "delete from astpdat where sesid = x" transactions and if the table is big, the transaction log file will fill. I'm currently running a 5GB transaction log file.

The logging system was still choking. Running logstat -statistics showed that I was getting a lot of log waits and log split waits. I increased the buffer count to 200 (from 20) and changed the block size from 4k to 8k (reformatting the transaction log file in the process).
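
To put the buffer change in perspective, here is a quick back-of-the-envelope calculation in Python. It assumes each log buffer is exactly one log block in size, which is how I understand the Ingres logging system but is worth verifying; the helper name is just for illustration.

```python
def log_buffer_bytes(buffer_count, block_size_kb):
    """Total log buffer memory, assuming one buffer == one log block."""
    return buffer_count * block_size_kb * 1024


before = log_buffer_bytes(20, 4)    # original settings: 20 buffers of 4k
after = log_buffer_bytes(200, 8)    # new settings: 200 buffers of 8k

print(f"{before // 1024} KB -> {after // 1024} KB ({after // before}x)")
```

Under that assumption the change takes the log buffer pool from 80 KB to 1600 KB, so there is 20 times as much room to absorb a burst of log writes before anything has to wait.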

The biggest performance gains came from dropping the indexes. There are three created by default, so every insert or delete also has to update all three. To speed things up, I dropped all the indexes during the MergeCat, and recreated only ix_astpdat_1 for the dbclean. ix_astpdat_1 is keyed on sesid, which is the column used by the delete. Without it, each delete would have to scan the whole table, which would be a nightmare as astpdat is a heap table.
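
The effect of that one index on the delete can be illustrated with a small Python sketch. It uses SQLite as a stand-in, not Ingres, so the planner output is SQLite's own, and the filename column is invented for the example since the real astpdat schema is not shown here.

```python
import sqlite3


def delete_plan(conn, sesid=1):
    """Return SQLite's query plan for the dbclean-style delete (nothing is deleted)."""
    rows = conn.execute(
        "EXPLAIN QUERY PLAN DELETE FROM astpdat WHERE sesid = ?", (sesid,)
    ).fetchall()
    # The human-readable plan text is the fourth column of each row.
    return " | ".join(row[3] for row in rows)


conn = sqlite3.connect(":memory:")
# Hypothetical two-column stand-in for the real table.
conn.execute("CREATE TABLE astpdat (sesid INTEGER, filename TEXT)")
conn.executemany(
    "INSERT INTO astpdat VALUES (?, ?)",
    [(i % 50, f"file{i}") for i in range(5000)],
)

plan_without_index = delete_plan(conn)
conn.execute("CREATE INDEX ix_astpdat_1 ON astpdat (sesid)")
plan_with_index = delete_plan(conn)

print("without index:", plan_without_index)
print("with index:   ", plan_with_index)
```

Without the index the plan is a scan of the whole table for every session deleted; with it, the plan becomes a search through ix_astpdat_1. The same reasoning is why I kept that one index and dropped the other two.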

This is causing things to motor along now, although I still want to do some cache analysis to see whether increasing the DMF cache will improve things.

Once the dbclean completes, the only thing remaining is to modify the database (and hopefully reclaim a load of space as heap tables don't compact when rows are deleted) and then recreate the indexes.
