The IO of Sauron
Tuesday, March 22nd, 2011
I needed to know the disk utilization on a system essentially at all times, with a granularity of one second. Asking munin for the current iostat reading every five minutes wasn’t sufficient, so I wrote this munin plugin. Each time it’s polled, it tries to read a file and report the average and maximum disk utilization (iostat’s %util) logged in that file. Then it unlinks the file and starts iostat in the background, sampling every second for 250 seconds into a fresh copy of that file, setting it up for the next poll.
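How close does "essentially at all times" get? Assume munin polls every five minutes (300 s), as described above. Each `iostat -xd 1 250` run prints 250 reports about a second apart, and the plugin discards the first one, which covers the time since boot. That leaves roughly 249 one-second samples per poll:

```latex
\text{coverage} \approx \frac{(250 - 1)\,\text{s}}{300\,\text{s}} = \frac{249}{300} = 0.83
```

So roughly 83% of each five-minute window is sampled. Keeping the run shorter than the polling interval presumably ensures each iostat has exited before the next poll unlinks the file and starts another.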
It looks like this. Unlike the #devops people, I don’t think I have any reason to obfuscate the vertical axis of this picture. It’s the graph from an uninteresting system doing some periodic nightly batch work.

It has not escaped me that it reports average averages, maximum averages, average maximums, and maximum maximums.
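To make that concrete: the plugin emits one average and one maximum per poll, and the graph then summarizes each of those series over its time span. The minimal sketch below uses made-up samples, with three polls shortened to four samples each, to show how the four combinations differ. The numbers are invented, and this is an illustration of the arithmetic, not munin’s own code.

```perl
use strict;
use warnings;
use List::Util qw(sum max);

# Made-up per-second %util samples for three polls, shortened to four
# samples each; a real poll would hold about 250 samples.
my @polls = (
    [ 2.0, 4.0, 3.0, 3.0 ],      # quiet interval
    [ 10.0, 90.0, 20.0, 0.0 ],   # short burst
    [ 5.0, 5.0, 5.0, 5.0 ],      # steady load
);

# What the plugin would report per poll: one average and one maximum.
my (@avgs, @maxes);
for my $samples (@polls) {
    push @avgs,  sum(@$samples) / @$samples;
    push @maxes, max(@$samples);
}

# Summarizing each reported series over the whole graph period.
my %summary = (
    'average averages' => sum(@avgs) / @avgs,
    'maximum averages' => max(@avgs),
    'average maximums' => sum(@maxes) / @maxes,
    'maximum maximums' => max(@maxes),
);
printf "%-17s %6.2f\n", $_, $summary{$_} for sort keys %summary;
```

With these samples, the 90% burst in the second poll gives a maximum maximum of 90. The average average is only about 12.7, so the burst barely shows there.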
Here’s the script.
#!/usr/bin/perl
# iosmart by Shannon Prickett <shannon.prickett@gmail.com>
# get iostat data and provide max and average values since the last check.
use strict;
use warnings;
use File::stat;
use vars qw{ $argument $iostat_file $iostat_runs };
use subs qw{ print_header };

$iostat_file = '/var/tmp/iosmart';
$iostat_runs = 250;    # normally 250, reduce when testing
$argument    = $ARGV[0] || 'NONE';

if ($argument =~ /config/) {    # hint to munin
    # munin-run calls this to set up graphs, check limits
    print <<EOM;
graph_title Extended iostat coverage
graph_vlabel utilization
graph_category disk
graph_info Collect iostat data every second
sda_max.label sda max util
sda_avg.label sda avg util
sdb_max.label sdb max util
sdb_avg.label sdb avg util
EOM
}
else {    # do the work
    if (( -e $iostat_file) &&    # it exists
        ( -r $iostat_file) &&    # we can read it
        ( -s $iostat_file) ) {   # it has something in it
        my $st = stat( $iostat_file ) or
            die "can't stat $iostat_file: $!\n";
        my $mtime = $st->mtime;
        my $now   = time( );
        my $delta = $now - $mtime;
        if ( $delta > 300 ) {    # it's been >5 minutes
            # diagnostics go to STDERR so they don't mix with munin's values
            warn "${iostat_file} is stale, using it anyway\n";
        }
        open( my $fh, '<', $iostat_file ) or
            die "failed to open $iostat_file: $!\n";
        my %devices;
        my ($skip_until, $stop_skipping);
        READLOOP: while ( my $line = <$fh> ) {
            next READLOOP unless ( $line =~ /^(sd\w).*?(\d+\.\d+)$/ );    # we only care about the disk rows
            # we want to skip the first block of iostat output
            # that's output since boot. we only care about now.
            unless ( defined $skip_until ) {
                $skip_until = $1;    # remember the first device we see
                next READLOOP;
            }
            my $device = $1;    # per device values
            my $util   = $2;    # keep the number from the last column
            if (( ! defined $stop_skipping ) &&    # if we are still skipping
                ( $device ne $skip_until ) ) {     # this isn't what we're looking for
                next READLOOP;
            }
            else {
                $stop_skipping = 1;
            }
            chomp $util;    # we hates filthy newlines forever
            if (( ! exists $devices{$device}{'count'} ) or    # first time
                ( ! defined $devices{$device}{'count'} ) ) {
                $devices{$device}{'count'} = 1;
            }
            else {
                $devices{$device}{'count'} = $devices{$device}{'count'} + 1;
            }
            $devices{$device}{'sum'} += $util;
            if (( ! exists $devices{$device}{'max'} ) or     # we've never set it before
                ( ! defined $devices{$device}{'max'} ) or    # the whole world is crazy
                ( $devices{$device}{'max'} < $util ) ) {     # it's smaller than current
                $devices{$device}{'max'} = $util;
            }
        }
        for my $device (keys %devices) {
            $devices{$device}{'average'} = $devices{$device}{'sum'} / $devices{$device}{'count'};
            print "${device}_max.value $devices{$device}{'max'}\n";
            print "${device}_avg.value $devices{$device}{'average'}\n";
        }
        close( $fh ) or die "can't close ${iostat_file}? wtf? $!\n";
        unlink( $iostat_file ) or die "can't rm ${iostat_file}: $!\n";
        print_header( );
        # suppress an error about inappropriate ioctl by not testing exit. :(
        system( "iostat -xd 1 $iostat_runs >> $iostat_file &" );
    }
    else {
        warn "can't use ${iostat_file}; making new\n";
        open( my $fresh, '>', $iostat_file ) or
            die "can't make new ${iostat_file}: $!\n";
        close( $fresh ) or die "can't close new ${iostat_file}: $!\n";
        print_header( );
    }
}

sub print_header {
    open( my $header, '>>', $iostat_file ) or
        die "failed to start ${iostat_file}: $!\n";
    print $header "# this file is created by the iosmart munin-plugin\n";
    print $header "# if it's not updating, check the munin-node logs\n";
    close( $header ) or die "can't stop heading ${iostat_file}? HMPH $!\n";
}
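The least obvious part of the script is how it throws away iostat’s first report, which covers the time since boot rather than the last second. It remembers the first disk row it sees and skips everything until that device appears again. The sketch below pulls that logic into a standalone sub so it can be tried on canned text. The name `summarize_iostat` and the sample output are hypothetical. The sample has trimmed columns and made-up numbers; real `iostat -x` rows carry many more columns, which the regex skips over to reach the last one.

```perl
use strict;
use warnings;

# Hypothetical helper: parse iostat -x text, drop the first report
# (the since-boot averages), and return avg/max %util per sd? device.
sub summarize_iostat {
    my ($text) = @_;
    my (%dev, $first_device);
    my $skipping = 1;
    for my $line (split /\n/, $text) {
        # Same pattern as the plugin: a disk row ending in a decimal number.
        next unless $line =~ /^(sd\w).*?(\d+\.\d+)$/;
        my ($device, $util) = ($1, $2);
        if ($skipping) {
            if (!defined $first_device) {
                $first_device = $device;    # first row of the since-boot report
                next;
            }
            # Still in the since-boot report until that device shows up again.
            next if $device ne $first_device;
            $skipping = 0;
        }
        $dev{$device}{count}++;
        $dev{$device}{sum} += $util;
        $dev{$device}{max} = $util
            if !defined $dev{$device}{max} || $util > $dev{$device}{max};
    }
    # Turn each device's running sum into an average.
    $_->{avg} = $_->{sum} / $_->{count} for values %dev;
    return \%dev;
}

# Made-up sample: columns trimmed, only the trailing %util matters here.
my $sample = <<'EOT';
Device:  r/s   w/s   %util
sda      9.00  9.00  99.00
sdb      9.00  9.00  99.00

Device:  r/s   w/s   %util
sda      1.00  2.00  10.00
sdb      0.00  1.00   2.00

Device:  r/s   w/s   %util
sda      3.00  4.00  30.00
sdb      0.00  0.00   0.00
EOT

my $stats = summarize_iostat($sample);
for my $device (sort keys %$stats) {
    print "${device}_max.value $stats->{$device}{max}\n";
    print "${device}_avg.value $stats->{$device}{avg}\n";
}
```

On this sample, the 99.00 rows from the first report are ignored. It prints `sda_max.value 30.00` and `sda_avg.value 20`, then `sdb_max.value 2.00` and `sdb_avg.value 1`. Using the first device name as a sentinel assumes iostat lists devices in the same order in every report.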
ETA: using a new code formatting plugin
