The IO of Sauron

I needed to know what the disk utilization on a system was, essentially at all times, with a granularity of one second. Asking for the current iostat every five minutes via munin wasn’t sufficient, so I wrote this munin plugin. It tries to read a file of logged iostat output and reports the average and maximum disk utilization (iostat’s %util column) found in that file. Then it unlinks the file and starts iostat in the background, sampling once per second for 250 reports and appending to that file, which sets it up for the next time munin polls.
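At its core the fetch step is a reduction: one polling interval’s worth of per-second %util samples becomes an average and a maximum, which munin receives as field.value lines. A minimal Perl sketch of just that step, assuming the samples are already parsed into a list; summarize() and munin_lines() are hypothetical helpers, not functions from the plugin itself:

```perl
use strict;
use warnings;
use List::Util qw(sum max);

# Reduce one polling interval's per-second %util samples to (average, maximum).
# summarize() is a hypothetical helper; the plugin does this inline.
sub summarize {
    my @samples = @_;
    return unless @samples;    # nothing logged yet: return an empty list
    return ( sum(@samples) / @samples, max(@samples) );
}

# Format a summary as munin field lines, "<device>_max.value N" and
# "<device>_avg.value N". Also hypothetical; the field names mirror sda_max/sda_avg.
sub munin_lines {
    my ( $device, @samples ) = @_;
    my @summary = summarize(@samples);
    return () unless @summary;
    my ( $avg, $max ) = @summary;
    return ( "${device}_max.value $max", "${device}_avg.value $avg" );
}

# Four seconds of made-up samples: one busy second hides inside a low average.
print "$_\n" for munin_lines( 'sda', 2.0, 4.0, 99.0, 3.0 );
```

Run as-is, it prints sda_max.value 99 and sda_avg.value 27. The 27% average on its own would hide the saturated second, which is why both numbers get reported.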

The resulting munin graph looks like this. Unlike the #devops people, I don’t see any reason to obfuscate the vertical axis of this picture. It’s the graph from an uninteresting system doing some periodic nightly batch work.

munin graphs of the data from iosmart

It has not escaped me that it reports average averages, maximum averages, average maximums, and maximum maximums.
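To make the four combinations concrete: the plugin sends munin one average and one maximum per five-minute poll, and the graph legend then summarizes each plotted series again. Assuming the legend’s Avg and Max columns are taken over those plotted points, the sketch below uses made-up samples for three equal-length windows. With equal windows the average of averages equals the overall per-second mean and the maximum of maximums is the single worst second; the two mixed statistics answer different questions.

```perl
use strict;
use warnings;
use List::Util qw(sum max);

# Made-up per-second %util samples for three equal-length polling windows.
my @windows = ( [ 10, 10, 10, 90 ], [ 20, 20, 20, 20 ], [ 0, 0, 0, 40 ] );

sub avg { return sum(@_) / @_ }

# What the plugin would report for each window.
my @avgs = map { avg(@$_) } @windows;    # 30, 20, 10
my @maxs = map { max(@$_) } @windows;    # 90, 20, 40

# The second-level summaries a graph legend can show.
my %rollup = (
    average_of_averages => avg(@avgs),    # 20: equals the overall mean
    maximum_of_averages => max(@avgs),    # 30: the busiest window, smoothed
    average_of_maximums => avg(@maxs),    # 50: the typical peak per window
    maximum_of_maximums => max(@maxs),    # 90: the single worst second
);

printf "%-20s %g\n", $_, $rollup{$_} for sort keys %rollup;
```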

 

Here’s the script.

#!/usr/bin/perl

# iosmart by Shannon Prickett <shannon.prickett@gmail.com>
# get iostat data and provide max and average values since the last check.

use strict;
use warnings;

use File::stat;

use vars qw{ $argument $iostat_file $iostat_runs };
use subs qw{ print_header };

$iostat_file = '/var/tmp/iosmart';
$iostat_runs = 250; # normally 250, reduce when testing

$argument = $ARGV[0] || 'NONE';

if ($argument =~ /config/) { # hint to munin
  # munin-run calls this to set up graphs, check limits
  print <<EOM;
graph_title Extended iostat coverage
graph_vlabel utilization
graph_category disk
graph_info Collect iostat data every second
sda_max.label sda max util
sda_avg.label sda avg util
sdb_max.label sdb max util
sdb_avg.label sdb avg util
EOM
}
else { # do the work

  if (( -e $iostat_file) &&  # it exists
    ( -r $iostat_file) &&  # we can read it
    ( -s $iostat_file) ) {  # it has something in it

    my $st = stat( $iostat_file ) or
      die "can't stat $iostat_file: $!\n";

    my $mtime = $st->mtime;
    my $now = time( );
    my $delta = $now - $mtime;

    if ( $delta > 300 ) {  # it's been >5 minutes
      warn "${iostat_file} is stale, using it anyway\n";  # STDERR, so munin doesn't parse it as a value
    }

    open( my $fh, '<', $iostat_file ) or
      die "failed to open $iostat_file: $!\n";
  
    my %devices;
    my ($max, $skip_until, $stop_skipping);
    READLOOP: while ( my $line = <$fh> ) {
      next READLOOP unless ( $line =~ /^(sd\w).*?(\d+\.\d+)$/ ); # we only care about the disk rows

      # we want to skip the first block of iostat output
      # that's output since boot. we only care about now.
      unless ( defined $skip_until ) {
        $skip_until = $1;
        next READLOOP;
      }

      my $device = $1;  # per device values
      my $util = $2;  # keep the number from the last column

      if (( ! defined $stop_skipping ) && # if we are still skipping
        ( $device ne $skip_until ) ) {  # this isn't what we're looking for
        next READLOOP;
      }
      else {
        $stop_skipping = 1;
      }
    
      chomp $util;  # we hates filthy newlines forever
    
      if (( ! exists $devices{$device}{'count'} ) or  # first time
        ( ! defined $devices{$device}{'count'} ) ) {
        $devices{$device}{'count'} = 1;
      }
      else {
        $devices{$device}{'count'} = $devices{$device}{'count'} + 1;
      }

      $devices{$device}{'sum'} += $util;

      if (( ! exists $devices{$device}{'max'} ) or  # we've never set it before
        ( ! defined $devices{$device}{'max'} ) or  # the whole world is crazy
        ( $devices{$device}{'max'} < $util ) ) {  # it's smaller than current
        $devices{$device}{'max'} = $util;
      }

    }

    for my $device (keys %devices) {
      $devices{$device}{'average'} = $devices{$device}{'sum'} / $devices{$device}{'count'};

      print "${device}_max.value $devices{$device}{'max'}\n";
      print "${device}_avg.value $devices{$device}{'average'}\n";
    }

    close( $fh ) or die "can't close ${iostat_file}? wtf? $!\n";

    unlink( $iostat_file ) or die "can't rm ${iostat_file}: $!\n";

    print_header( );

    # suppress an error about inappropriate ioctl by not testing exit. :(
    system( "iostat -xd 1 $iostat_runs >> $iostat_file &" );
  }
  else {
    warn "can't use ${iostat_file}; making new\n";  # STDERR, not a munin value line
    open( my $fresh, '>', $iostat_file ) or
      die "can't make new ${iostat_file}: $!\n";
    print_header( );
  }
}

sub print_header {
  open( my $header, '>>', $iostat_file ) or
    die "failed to start ${iostat_file}: $!\n";

  print $header "# this file is created by the iosmart munin-plugin\n";
  print $header "# if it's not updating, check the munin-node logs\n";

  close( $header ) or die "can't stop heading ${iostat_file}? HMPH $!\n";
}
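The read loop is easiest to check when it is pulled out into a function that takes lines instead of a file. This is a sketch of that refactor, not the plugin as shipped: parse_iostat() is a hypothetical name, and the sample rows are abbreviated stand-ins for real iostat -xd output (the regex only looks at the sd* device name and the trailing %util column).

```perl
use strict;
use warnings;

# Hypothetical refactor of the plugin's read loop: same regex, same
# "skip the since-boot report" rule, but fed a list of lines.
sub parse_iostat {
    my @lines = @_;
    my ( %devices, $first_device );
    my $skipping = 1;
    for my $line (@lines) {
        next unless $line =~ /^(sd\w).*?(\d+\.\d+)$/;    # disk rows only
        my ( $device, $util ) = ( $1, $2 );

        # The report that starts with the first device seen covers time since boot.
        unless ( defined $first_device ) {
            $first_device = $device;
            next;
        }
        # Keep skipping until that device shows up again (start of report two).
        if ($skipping) {
            next if $device ne $first_device;
            $skipping = 0;
        }

        my $d = $devices{$device} //= { count => 0, sum => 0, max => $util };
        $d->{count}++;
        $d->{sum} += $util;
        $d->{max} = $util if $util > $d->{max};
    }
    $_->{average} = $_->{sum} / $_->{count} for values %devices;
    return \%devices;
}

my @sample = (
    "Device: ... %util\n",
    "sda 1.00 2.00 55.00\n",    # since-boot report: ignored
    "sdb 1.00 2.00 44.00\n",
    "sda 0.50 1.00 10.00\n",
    "sdb 0.50 1.00 0.00\n",
    "sda 0.50 1.00 30.00\n",
    "sdb 0.50 1.00 5.00\n",
);
my $stats = parse_iostat(@sample);
printf "%s max=%s avg=%s\n", $_, $stats->{$_}{max}, $stats->{$_}{average}
    for sort keys %$stats;
```

Like the plugin, it assumes the since-boot report starts with the same device as every later report, which holds as long as iostat lists devices in a stable order.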

ETA: using a new code formatting plugin


2 Responses to “The IO of Sauron”

  1. Brian Says:

    Can you give a newbie a lesson on how I can run this on a Windows box if I already have the iostat file captured in .csv format?

    Basically I would love to be able to graph the iostat data like your example above. I have the .csv for 24 hours of host data but I’m not sure what to do next :(

    Thanks!

  2. binder Says:

    @Brian: the script given here is completely unsuitable for what it sounds as if you want to do. If I were trapped in the land of Windows, I’d import the .csv file of iostat values into an Excel spreadsheet and then use the Chart Wizard to make a pretty picture from the numbers. Your time fields go on the horizontal (X) axis, and the values for each field are the points of the lines.

    If you want it to look similar, you’ll also need to apply some averaging function to your percent-utilized values to arrive at that graph line. Having it in a spreadsheet also opens up the possibility of all sorts of other analysis. Good luck.
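For the question above, if Perl is available on the machine, the averaging binder describes can also be scripted: group the CSV rows into five-minute buckets and emit an average and a maximum per device per bucket, which then charts like the munin graph. A minimal sketch, assuming a hypothetical three-column layout of epoch_seconds,device,util with no quoted fields; a real iostat CSV export will almost certainly need its columns mapped first.

```perl
use strict;
use warnings;
use List::Util qw(sum max);

my $bucket_seconds = 300;    # five minutes, matching munin's default poll interval

# bucket_csv() is hypothetical: it assumes each row is "epoch_seconds,device,util"
# with no quoted fields, and returns "bucket_start,device,avg,max" lines
# sorted by bucket start time, then by device name.
sub bucket_csv {
    my @rows = @_;
    my %buckets;
    for my $row (@rows) {
        ( my $clean = $row ) =~ s/\s+\z//;    # drop trailing \n or \r\n
        my ( $epoch, $device, $util ) = split /,/, $clean;
        next unless defined $util
            && $epoch =~ /^\d+$/
            && $util =~ /^\d+(?:\.\d+)?$/;    # skip header and malformed rows
        my $start = $epoch - $epoch % $bucket_seconds;
        push @{ $buckets{$start}{$device} }, 0 + $util;
    }
    my @out;
    for my $start ( sort { $a <=> $b } keys %buckets ) {
        for my $device ( sort keys %{ $buckets{$start} } ) {
            my @u = @{ $buckets{$start}{$device} };
            push @out, join( ',', $start, $device, sum(@u) / @u, max(@u) );
        }
    }
    return @out;
}

# Made-up rows: two land in the 900-1199 bucket, two in the 1200-1499 bucket.
my @rows = ( "1000,sda,10.0\n", "1001,sda,30.0\n", "1299,sda,50.0\r\n", "1300,sda,5.0\n" );
print "$_\n" for bucket_csv(@rows);
```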
