MySQL Database Maintenance

Lately I have been working a lot with MySQL.  I'm not running anything on the scale of Facebook's 9000+ MySQL instances, but I am dealing with fairly steady database load that requires proper attention as the underlying website scales up.

The server itself runs on fairly decent hardware, with the main databases on SSDs for primary storage.  Of course, I don't fully trust flash storage for databases yet, so a backup plan is in order.  Since this particular database uses the MyISAM engine, my options for locking the database during a backup are drastically limited.

Performing a mysqldump on this database will effectively lock every table and cause live transactions to queue up while the dump is running.  On a database that is 74GB and growing fast, I can't just dump the data and freeze the system while everybody is using it.  For this reason I decided to do some automated table archiving before doing the dumps.

1. Archive old data using a script (MySQL w/ PHP)

2. Optimize tables if they need it

3. MySQL Hotcopy the database and flush logs (MySQL w/ bash)

4. Dump the copied database using a script every hour to tier 2 storage

5. Backup the dumps every 4 hours to tier 3 storage/backups

In this case there is a lot of historical transaction data, statistics and messaging history that is rarely referenced once its time has come and gone.  We still want to keep it, but we don't need it consuming resources in the main production database.  I decided we should create a logging database and send over any data older than some arbitrary cutoff.  I achieved this by creating a new database and running the following SQL statement for each table:

CREATE TABLE IF NOT EXISTS log_database.stats_table LIKE prod_database.stats_table;

The result is an identical schema copy of each source table. Sweet! Now on to the next step: data movement.  This was achieved with another SQL statement:

SELECT stat_id FROM prod_database.stats_table WHERE stat_added < (NOW() - INTERVAL 1 HOUR) ORDER BY stat_id DESC LIMIT 1

The above statement gives me the newest key in that table that is more than an hour old.

So I have my key and I have my target table… now to move the data across.  I will do the old heave-ho using the REPLACE command.  It's the best choice here because a duplicate key might already exist in the target database: I want to keep the same numbering and simply update the data in that case, and REPLACE handles both; definitely better than a plain INSERT ... SELECT, which would choke on the duplicates.

I will move my data like this:

REPLACE INTO log_database.stats_table SELECT * FROM prod_database.stats_table WHERE prod_database.stats_table.stat_id < MY_KEY_FROM_ABOVE

DELETE FROM prod_database.stats_table WHERE prod_database.stats_table.stat_id < MY_KEY_FROM_ABOVE

After deleting all that data I should run OPTIMIZE TABLE to reclaim the space and clean up the fragmentation.

OPTIMIZE TABLE prod_database.stats_table

I took the above statements and wrapped them all in a PHP script.  PHP can run as a webpage or standalone from the command line, so I used the web server for convenience: I could watch what the script was doing while capturing all of the output into a single stream that could be displayed in the browser (web mode) or dumped to a file (shell mode).
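For reference, the same archive sequence can also be driven from a plain shell script with the mysql client.  This is only a rough sketch of the idea, not my PHP script; the credentials file path and the database/table names are placeholders:

#!/bin/sh
# Rough sketch of the hourly archive sequence (my real version is PHP).
# The credentials file path is a placeholder.
MYSQL="mysql --defaults-extra-file=/root/.backup.cnf"

# Find the newest stat_id that is more than an hour old (the cutoff key).
CUTOFF=$($MYSQL -N -B -e "SELECT stat_id FROM prod_database.stats_table WHERE stat_added < (NOW() - INTERVAL 1 HOUR) ORDER BY stat_id DESC LIMIT 1")

if [ -n "$CUTOFF" ]; then
    # Copy the old rows, delete them from production, then defragment.
    $MYSQL -e "REPLACE INTO log_database.stats_table SELECT * FROM prod_database.stats_table WHERE stat_id < $CUTOFF"
    $MYSQL -e "DELETE FROM prod_database.stats_table WHERE stat_id < $CUTOFF"
    $MYSQL -e "OPTIMIZE TABLE prod_database.stats_table"
fi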

Using the method above I was able to take the database from 74GB down to about 7GB.  That's a substantial storage saving. Happy with my progress, I wrote the database rotation script.  The script locates all of the necessary tools, including the credentials, mysql, gzip, mysqldump, grep, cut, rm, date and cat, as well as a configuration file containing a list of the databases I want to dump.  The script also cleans up any backups older than a day.  I strongly recommend using absolute paths in any delete logic; in this case I am recursively deleting folders with date patterns… so I took my testing very seriously.

The script is a classic #!/bin/sh shell script and works great.  It gets rid of old copies and dumps each database while flushing the logs; it even has an option to detect and skip specified tables.  I tried it on my test system and watched it work nicely.  It takes about 20 seconds to dump the test system, but alas, the test system is far smaller than the production system, which holds about 5GB more data.  Quick math showed me that I was in for a 60 second production outage at the database's current size… and it is growing.  To be effective, my maintenance program needs to run frequently.
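A heavily trimmed sketch of that rotation script is below.  The paths, config file and retention window shown here are placeholders, not the exact production script:

#!/bin/sh
# Trimmed sketch of the rotation/dump script; paths and names are placeholders.
BACKUP_ROOT=/backup/tier2/mysql            # absolute path, used by the delete logic
CONF=/etc/mysql-backup/databases.conf      # one database name per line
TODAY=$(date +%Y-%m-%d)

# Remove dated backup folders older than one day.
find "$BACKUP_ROOT" -maxdepth 1 -type d -name '20??-??-??' -mtime +1 -exec rm -rf {} \;

mkdir -p "$BACKUP_ROOT/$TODAY"

# Dump each listed database, flushing the logs, and compress the result.
# Tables to skip can be excluded with --ignore-table=db.table.
while read -r DB; do
    mysqldump --defaults-extra-file=/root/.backup.cnf --flush-logs "$DB" | gzip > "$BACKUP_ROOT/$TODAY/$DB.sql.gz"
done < "$CONF"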

In steps mysqlhotcopy, a Perl utility that does a hot backup of a database with minimal downtime.  It doesn't rely on replication or anything fancy; it simply locks the tables, flushes them, and copies the raw table files.  It still locks the database, but only briefly: long enough to lock the tables and flush the logs, which gives a clean point-in-time copy.  From there I take the static hot copy and run my dump script against it.  It works great with no impact on the users.  I have enough space to run a hot copy until the database reaches about 90GB, at which point I will need to revisit my disks.
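The hot copy itself comes down to a single command; the user, password and destination directory below are placeholders:

mysqlhotcopy --user=backup --password='...' --flushlog --allowold prod_database /backup/hotcopy

# The static copy lands in /backup/hotcopy/prod_database and is what the
# dump script reads instead of the live database.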

Everything is in place and working.  I use cron to run the tasks and got the data center guys to back up the dump files to tier 3 storage on a frequent schedule.  I record each run and email myself when it is done.
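The schedule itself is just cron; the script names and times here are purely illustrative:

# m   h   dom mon dow   command
0     *   *   *   *     /usr/local/bin/mysql-archive.sh >> /var/log/mysql-archive.log 2>&1
30    *   *   *   *     /usr/local/bin/mysql-hotcopy-dump.sh >> /var/log/mysql-backup.log 2>&1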

I recommend that everybody do a disaster recovery test after implementing a system like this.  While you are doing the test, make note of the critical steps and keep them somewhere accessible. You never know when disaster will strike next or what shape it will take, but you can rest assured that you have set yourself up for minimal data loss and minimal downtime. In my spare time I intend to wrap these procedures into a service that constantly manages, backs up and reports on the database.

For all the critics out there who think I should just use a binary backup: you are correct. I do have a binary backup as well, but I don't trust it completely.  It seems to do the job fairly well until it doesn't… and then I have a proper backup less than an hour old to fall back on.

~JCF