Watch for Momentary Monitoring

One of the things I preach about a lot is good monitoring of your database servers; having tools in place to tell you both what good looks like and when things go bad is critical for large scale success. But sometimes you just need to monitor a momentary process, where setting up a check in your normal monitoring software is overkill. In these cases one tool that can help out is the watch command. Case in point, the other day I needed to back up a fairly large partitioned table (about 1.3TB on disk). The plan? A quick little script to pg_dump each of the partitions (about 325). Feed the script through xargs -P so I don’t swamp the box, but I get some concurrency out of things. And of course, I planned to run the whole thing in screen session. But dumping this much data will take some time, so how to check on the progress? When working on databases, one of the most natural things to me is to whip up some SQL to see what going on inside my database. Then you pipe that through watch, and you have some quick and simple monitoring. This example happens to be on postgres, but you could do it with any database’s command line program.
watch -n 5 'psql -U robert -h mk -d wario -c "select procpid, waiting, query_start, current_query from pg_stat_activity where current_query ~ '\''^COPY'\'' ORDER BY procpid"'
which gave me some output like:
Every 5.0s: psql -U robert -h mk -d wario -c "select procpid, waiting, query_start, curren... Mon Apr 05 15:22:03 2010 procpid | waiting | query_start | current_query ---------+---------+-------------------------------+------------------------------------------------------------------------------------------- 12706 | f | 2010-04-05 15:21:34.565754-05 | COPY wario.tblhits_p2005_1010 (partner, bowser, coins, hitdate, tld, affiliate) TO stdout; 12728 | f | 2010-04-05 15:21:39.563847-05 | COPY wario.tblhits_p2005_1017 (partner, bowser, coins, hitdate, tld, affiliate) TO stdout; 12732 | f | 2010-04-05 15:21:43.43266-05 | COPY wario.tblhits_p2005_1024 (partner, bowser, coins, hitdate, tld, affiliate) TO stdout; 12735 | f | 2010-04-05 15:21:48.385349-05 | COPY wario.tblhits_p2005_1031 (partner, bowser, coins, hitdate, tld, affiliate) TO stdout; (4 rows)
Here I can see all of my COPY processes running, and I’ve selected out what I think are the most relevant bits, like when each COPY started and if they are waiting on any locks. (Luckily I had named my partitions pretty obviously to know which ones were being worked on. Also, I threw in the order by clause so that as each piece finishes and the next one starts, they will cycle through the output. It’s a little hard to get the full effect without a screencast (sorry, too lazy for that), but the next bit of the cycle would look something like this:
Every 5.0s: psql -U robert -h mk -d wario -c "select procpid, waiting, query_start, curren... Mon Apr 05 15:22:19 2010 procpid | waiting | query_start | current_query ---------+---------+-------------------------------+------------------------------------------------------------------------------------------- 12728 | f | 2010-04-05 15:21:39.563847-05 | COPY wario.tblhits_p2005_1017 (partner, bowser, coins, hitdate, tld, affiliate) TO stdout; 12732 | f | 2010-04-05 15:21:43.43266-05 | COPY wario.tblhits_p2005_1024 (partner, bowser, coins, hitdate, tld, affiliate) TO stdout; 12735 | f | 2010-04-05 15:21:48.385349-05 | COPY wario.tblhits_p2005_1031 (partner, bowser, coins, hitdate, tld, affiliate) TO stdout; 12836 | f | 2010-04-05 15:22:10.746129-05 | COPY wario.tblhits_p2005_1107 (partner, bowser, coins, hitdate, tld, affiliate) TO stdout; (4 rows)
Notice the new line starts at the bottom and everything else shifts up. This worked well for this script, but watch has other options like highlighting changes and other goodies that come in handy. So the next time you need to keep an eye on something, pair up a little bit of SQL with watch for some quick and easy monitoring.