<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Obsolete Your Idols &#187; Nerdery</title>
	<atom:link href="http://blog.manjusri.org/category/nerdery/feed/" rel="self" type="application/rss+xml" />
	<link>http://blog.manjusri.org</link>
	<description>Book Reviews and Blather</description>
	<lastBuildDate>Wed, 06 Jul 2011 18:46:43 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.2.1</generator>
		<item>
		<title>NJC: Day 8</title>
		<link>http://blog.manjusri.org/2011/07/05/njc-day-8/</link>
		<comments>http://blog.manjusri.org/2011/07/05/njc-day-8/#comments</comments>
		<pubDate>Tue, 05 Jul 2011 17:54:22 +0000</pubDate>
		<dc:creator>binder</dc:creator>
				<category><![CDATA[Nerdery]]></category>
		<category><![CDATA[aws]]></category>
		<category><![CDATA[bacula]]></category>
		<category><![CDATA[changelog]]></category>
		<category><![CDATA[chef]]></category>
		<category><![CDATA[ebs]]></category>
		<category><![CDATA[knife]]></category>

		<guid isPermaLink="false">http://blog.manjusri.org/?p=415</guid>
		<description><![CDATA[This was the last day before a 3 day weekend and as is customary around these parts, not many people came in and the ones who did left early. I didn&#8217;t really achieve anything worth talking about, just researched some more ideas for my next few proposals. Specifically, I looked at AWS documentation about Elastic [...]]]></description>
			<content:encoded><![CDATA[<p>This was the last day before a 3 day weekend and as is customary around these parts, not many people came in and the ones who did left early. I didn&#8217;t really achieve anything worth talking about, just researched some more ideas for my next few proposals.</p>
<p>Specifically, I looked at AWS documentation about <a href="http://aws.amazon.com/ebs/">Elastic Block Store</a>, at new <a href="http://www.bacula.org/en/dev-manual/main/main/Current_State_Bacula.html">Bacula features</a>, and at <a href="http://wiki.opscode.com/display/chef/Home">Chef</a>, particularly <a href="http://wiki.opscode.com/display/chef/Knife">Knife</a>.</p>
<p>Then I went to drink with co-workers, former and present.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.manjusri.org/2011/07/05/njc-day-8/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>NJC: Day 7</title>
		<link>http://blog.manjusri.org/2011/07/05/njc-day-7/</link>
		<comments>http://blog.manjusri.org/2011/07/05/njc-day-7/#comments</comments>
		<pubDate>Tue, 05 Jul 2011 17:50:06 +0000</pubDate>
		<dc:creator>binder</dc:creator>
				<category><![CDATA[Nerdery]]></category>
		<category><![CDATA[changelog]]></category>

		<guid isPermaLink="false">http://blog.manjusri.org/?p=413</guid>
		<description><![CDATA[This day was my first day working from home at the new job, something I negotiated to get for myself. One day a week, I work from home. Unfortunately as today was the second attempt at releasing, and things took about as long as they do the first time you do something operational, this was [...]]]></description>
			<content:encoded><![CDATA[<p>This day was my first day working from home at the new job, something I negotiated to get for myself. One day a week, I work from home. Unfortunately as today was the second attempt at releasing, and things took about as long as they do the first time you do something operational, this was a 14-hour day of work for me. At least I didn&#8217;t need pants to do it.</p>
<p>Our proposed deploy process didn&#8217;t survive its encounter with the actual server environment but that doesn&#8217;t come as a surprise; it&#8217;s still very raw and will be refined a lot, soon.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.manjusri.org/2011/07/05/njc-day-7/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>NJC: Day 6</title>
		<link>http://blog.manjusri.org/2011/07/05/njc-day-6/</link>
		<comments>http://blog.manjusri.org/2011/07/05/njc-day-6/#comments</comments>
		<pubDate>Tue, 05 Jul 2011 17:46:47 +0000</pubDate>
		<dc:creator>binder</dc:creator>
				<category><![CDATA[Nerdery]]></category>
		<category><![CDATA[changelog]]></category>

		<guid isPermaLink="false">http://blog.manjusri.org/?p=411</guid>
		<description><![CDATA[I had hoped to WFH on this day but we were on track to release something important so I came in to the office. First order of the day was sharing the Windows virtualbox file with my co-workers so they could fire up IE and validate things work in that browser, too. That led in [...]]]></description>
			<content:encoded><![CDATA[<p>I had hoped to WFH on this day but we were on track to release something important so I came in to the office.</p>
<p>First order of the day was sharing the Windows virtualbox file with my co-workers so they could fire up IE and validate things work in that browser, too.</p>
<p>That led in to a playdate for something on the web, followed by a process meeting about deployment which turned into a pair programming / code review session using a laptop jacked into a projector. I recommend this if you have more than one developer in a meeting: use the projector to display their code as you talk about related topics.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.manjusri.org/2011/07/05/njc-day-6/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>NJC: Day 4</title>
		<link>http://blog.manjusri.org/2011/07/05/njc-day-4/</link>
		<comments>http://blog.manjusri.org/2011/07/05/njc-day-4/#comments</comments>
		<pubDate>Tue, 05 Jul 2011 16:45:23 +0000</pubDate>
		<dc:creator>binder</dc:creator>
				<category><![CDATA[Nerdery]]></category>
		<category><![CDATA[changelog]]></category>
		<category><![CDATA[ssl]]></category>

		<guid isPermaLink="false">http://blog.manjusri.org/?p=398</guid>
		<description><![CDATA[This was my first Monday in the new office and so the focus was on all the orientation activity they only do on Mondays. I got my picture taken for a security badge (taken with a smart phone) and spent the day following up on orientation information as well as researching SSL certificate options before [...]]]></description>
			<content:encoded><![CDATA[<p>This was my first Monday in the new office and so the focus was on all the orientation activity they only do on Mondays. I got my picture taken for a security badge (taken with a smart phone) and spent the day following up on orientation information as well as researching SSL certificate options before proposing that the company pay for a wildcard SSL certificate for use in production environments and that we self-sign a different wildcard SSL certificate for use in development and testing environments.</p>
<p>Then I dug in to the documentation for how to use an SSL certificate with <a href="http://docs.jboss.org/jbossweb/3.0.x/ssl-howto.html">jboss</a>. That was more opaque than it needed to be and I&#8217;ll probably explain what I did in the next post.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.manjusri.org/2011/07/05/njc-day-4/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>NJC: Day 3</title>
		<link>http://blog.manjusri.org/2011/06/25/njc-day-3/</link>
		<comments>http://blog.manjusri.org/2011/06/25/njc-day-3/#comments</comments>
		<pubDate>Sat, 25 Jun 2011 22:38:58 +0000</pubDate>
		<dc:creator>binder</dc:creator>
				<category><![CDATA[Nerdery]]></category>
		<category><![CDATA[changelog]]></category>
		<category><![CDATA[devops]]></category>
		<category><![CDATA[graph]]></category>
		<category><![CDATA[pingdom]]></category>

		<guid isPermaLink="false">http://blog.manjusri.org/?p=386</guid>
		<description><![CDATA[I got an early start this day because I woke early for no real reason. So I headed into the office, had some breakfast there (grilled cheese with fried egg and coffee, thanks for asking) and got down to work. That meant installing Evernote for OS X so I could attach a PDF to a [...]]]></description>
			<content:encoded><![CDATA[<p>I got an early start this day because I woke early for no real reason. So I headed into the office, had some breakfast there (grilled cheese with fried egg and coffee, thanks for asking) and got down to work.</p>
<p>That meant installing <a href="http://www.evernote.com/about/download/mac.php">Evernote for OS X</a> so I could attach a PDF to a note, then using the information I&#8217;d gathered there to write up a proposal concerning alerting. I am trying to be thorough in documenting what I do and why. I tried to capture my thinking, rationalize my decision, and foreshadow future developments. As part of the research for the writing I think I noticed something odd about <a href="http://www.pingdom.com/">Pingdom&#8217;s</a> pricing.</p>
<p>I&#8217;m probably misunderstanding something. But if the costs per check aren&#8217;t different at the Business plan level, and I don&#8217;t care about SMS notifies, what is my incentive to ever leave the Basic plan? My efficient frontier seems like it&#8217;s up and to the left and with a linear progression, it&#8217;s a Basic ballgame.</p>
<p><img src="https://spreadsheets.google.com/spreadsheet/oimg?key=0AmC_Diu6RQc9dEphcDE4bXhZSE9xZU5sc0ZwY3FEb0E&amp;oid=2&amp;zx=81ybniskmuc3" alt="" /></p>
<p>If the cost per check on the business plan is less, then the graph is wrong and there is a break-even point on the expense of checks. But <a href="http://www.pingdom.com/services/extraservices/">it sure doesn&#8217;t look like it.</a></p>
<p>I ran my proposal for additional monitoring past my boss and got his buy-in and then started deploying it. So that&#8217;s my first operational task which is not entirely reactive in nature; there had been an issue earlier with a system going away and no one noticing, but it wasn&#8217;t a production system and I was more interested in getting some kind of alerting going for those systems as they come online.</p>
<p>I fully expect to be iterating on the deployed monitoring solution, as it was a trade-off between results and costs (financial and my time/effort/brain) and there are arguably better solutions I didn&#8217;t feel like I could invest enough into at this point to get the most out of. It&#8217;s a starting point, an incremental improvement over what the company had in place before me.</p>
<p>This wasn&#8217;t quite a No Changes Friday (the only religious holiday I observe) but it seemed worth it to push through that sabbath to get additional awareness of the environment. Arguably it didn&#8217;t impact anything production-related, beyond the tiny additional impact of the monitoring checks being done, which are all tickling network listening daemons.</p>
<p>Then I spent the rest of my day researching options for my next proposal, which will be SSL related.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.manjusri.org/2011/06/25/njc-day-3/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>NJC: Day 2</title>
		<link>http://blog.manjusri.org/2011/06/23/njc-day-2/</link>
		<comments>http://blog.manjusri.org/2011/06/23/njc-day-2/#comments</comments>
		<pubDate>Fri, 24 Jun 2011 05:17:16 +0000</pubDate>
		<dc:creator>binder</dc:creator>
				<category><![CDATA[Nerdery]]></category>
		<category><![CDATA[changelog]]></category>
		<category><![CDATA[devops]]></category>

		<guid isPermaLink="false">http://blog.manjusri.org/?p=379</guid>
		<description><![CDATA[Today I gained a bucket of credentials and started poking around at what exists and what services are being used. The company has a pair of domain registrars they&#8217;ve used, a leased hosting provider, a cloud provider, a connectivity monitoring service, a code repository hosting service. They&#8217;ve got the beginnings of a series of development [...]]]></description>
			<content:encoded><![CDATA[<p>Today I gained a bucket of credentials and started poking around at what exists and what services are being used.</p>
<p>The company has a pair of domain registrars they&#8217;ve used, a leased hosting provider, a cloud provider, a connectivity monitoring service, a code repository hosting service. They&#8217;ve got the beginnings of a series of development environments, of a production environment, of a staging environment.</p>
<p>So now I feel like I have my hands around what they&#8217;ve got going already, in a general sense. At this point, I can start drilling down into specifics and start nudging the pieces to align the way I think they should.</p>
<p>So other than absorb, what did I do today? I discussed with some developers and the release engineer what pieces need to be in place for a go-live coming up, and how to validate their ideas BEFORE they encounter the enemy (live users, those adorable and unforgiving creatures) and we whiteboarded some ideas for them to explore.</p>
<p>I didn&#8217;t get to the point I wanted to with the alerting problem I had expected to tackle today, but I&#8217;m prone to a surprising degree of optimism, considering my line of work. I did gather some numbers to start estimating costs for different options.</p>
<p>The main options I see:</p>
<ol>
<li>extend the existing availability monitoring service to check more things
<ul>
<li>the cost scales linearly with the things checked</li>
<li>it only checks things which can be discerned remotely</li>
<li>it checks from multiple different places</li>
<li>it has some nice escalation options</li>
</ul>
</li>
<li>hand-roll some rudimentary alerting on the most critical things
<ul>
<li>this is a path of brittleness and pain</li>
<li>it&#8217;s cheap, the only cost is my time and my brain</li>
<li>it could do both internal and external checks</li>
<li>it won&#8217;t scale beyond a dozen checks</li>
</ul>
</li>
<li>deploy some paid alerting software package
<ul>
<li>this is paying someone else to have thought and worried about important things already</li>
<li>I don&#8217;t have a lot of experience with paid alerting software and the systems I did use were awkward, counter-intuitive, and had several false negatives when they should have been screaming bloody murder</li>
<li>I&#8217;m leery of spending money I&#8217;ll regret spending at this stage of things</li>
</ul>
</li>
<li>deploy some opensource alerting software package
<ul>
<li>I&#8217;ve got some familiarity with a couple different ones</li>
<li>the price point is pretty good</li>
<li>some of them offer paid support if things get too hairy or I need it to scale faster</li>
<li>the ones I&#8217;ve used accommodate writing extensions which seems necessary; I&#8217;ve never been happy with off the shelf software</li>
</ul>
</li>
</ol>
<p>Then it becomes a question, if I&#8217;m not choosing option 1, of where to run whatever I deploy. It could go on one or more leased hosts, it could go somewhere in the cloud, or I could go full-tilt and buy some servers and co-locate them.</p>
<p>I&#8217;m going to finish out my math on costs but I have a hunch I know where the sweet spot will be to start out: a low end leased host running an open source package in the Nagios-space (Nagios, Icinga, Opsview). The cloud would arguably be cheaper but I&#8217;m nervous putting something I need to depend on in the cloud at this stage of things.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.manjusri.org/2011/06/23/njc-day-2/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>NJC: Day 1</title>
		<link>http://blog.manjusri.org/2011/06/23/njc-day-1/</link>
		<comments>http://blog.manjusri.org/2011/06/23/njc-day-1/#comments</comments>
		<pubDate>Thu, 23 Jun 2011 13:25:33 +0000</pubDate>
		<dc:creator>binder</dc:creator>
				<category><![CDATA[Nerdery]]></category>
		<category><![CDATA[changelog]]></category>
		<category><![CDATA[devops]]></category>
		<category><![CDATA[os x]]></category>

		<guid isPermaLink="false">http://blog.manjusri.org/?p=377</guid>
		<description><![CDATA[This day (yesterday as I write this) was less packed than the previous day but somehow just as tiring. I started the day thinking I&#8217;d be reading documentation, both for technologies I wasn&#8217;t wholly familiar with already in use, and the existing corpus on what processes and systems the company had already put in place [...]]]></description>
			<content:encoded><![CDATA[<p>This day (yesterday as I write this) was less packed than the previous day but somehow just as tiring. I started the day thinking I&#8217;d be reading documentation, both for technologies I wasn&#8217;t wholly familiar with already in use, and the existing corpus on what processes and systems the company had already put in place before I arrived.</p>
<p>What I actually did was write documentation. Specifically, I started a spreadsheet for information about servers as they became pertinent. At this point, it&#8217;s got only four server names in it, each with an IP address, its location, and whether I have shell access on it yet. It was slow going, as I had to extract it from developers who are busily readying to launch a new project.</p>
<p>I got myself set up with access to a Continuous Integration server only to learn that this was the <a href="http://hudson-ci.org/">deprecated CI server</a> and I actually wanted access to an <a href="http://jenkins-ci.org/">entirely different CI tool</a> on a different server.</p>
<p>As I&#8217;d started accumulating a wild array of bookmarks in my browser already, on top of the motley array inherited from several previous browser migration imports, I took some time to actually go through and organize my bookmarks. It felt silly at the time but as my bookmarks sync between the different places I run browsers, I only had to do it the once.</p>
<p>At lunch I played a 5-person game of <a href="http://boardgamegeek.com/boardgame/36218/dominion">Dominion</a> with co-workers. My streak of not winning Dominion continued. I don&#8217;t expect it to last much longer, as they evidently play regularly, so now I have no excuse to not get serious about it.</p>
<p>Then my laptop halted for the first time that day, followed by the timely arrival of the replacement I was given.</p>
<p>Then I spent an hour installing software onto the replacement laptop. The same software I installed on the <a href="http://blog.manjusri.org/2011/06/21/new-job-changelog-day-0/">first day</a> onto the original laptop, with one addition, <a href="http://adium.im/">Adium</a>, for old time&#8217;s sake. At least it went faster this time as I had the list to work from in seeking software.</p>
<p>I got access to the company&#8217;s <a href="http://aws.amazon.com/">AWS</a> console and that proved useful when I performed what I&#8217;d consider my first operational task of the new job: I rebooted an instance which developers could no longer remotely interact with. Which leads to what I expect today&#8217;s tasks to involve: adding additional alerting so I know there&#8217;s a problem before the users of the systems have to tell me.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.manjusri.org/2011/06/23/njc-day-1/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>New Job Changelog: Day 0</title>
		<link>http://blog.manjusri.org/2011/06/21/new-job-changelog-day-0/</link>
		<comments>http://blog.manjusri.org/2011/06/21/new-job-changelog-day-0/#comments</comments>
		<pubDate>Wed, 22 Jun 2011 04:14:32 +0000</pubDate>
		<dc:creator>binder</dc:creator>
				<category><![CDATA[Nerdery]]></category>
		<category><![CDATA[changelog]]></category>
		<category><![CDATA[devops]]></category>
		<category><![CDATA[os x]]></category>

		<guid isPermaLink="false">http://blog.manjusri.org/?p=372</guid>
		<description><![CDATA[Today was my first day at a new job. I&#8217;m the only operations guy minding the deployed and deploying production environment for the place. Not even replacing a previous operations staff; this is green field territory, more or less. So here&#8217;s an abbreviated list of what I accomplished today. Set up the MacBookPro issued to [...]]]></description>
			<content:encoded><![CDATA[<p>Today was my first day at a new job. I&#8217;m the only operations guy minding the deployed and deploying production environment for the place. Not even replacing a previous operations staff; this is green field territory, more or less.</p>
<p>So here&#8217;s an abbreviated list of what I accomplished today.</p>
<ul>
<li>Set up the MacBookPro issued to me by installing
<ul>
<li><a href="http://www.google.com/chrome">Chrome</a></li>
<li><a href="http://www.truecrypt.org/">TrueCrypt</a></li>
<li><a href="http://www.fpx.de/fp/Software/Gorilla/">Password Gorilla</a></li>
<li><a href="http://www.dropbox.com/">Dropbox</a></li>
<li><a href="http://www.keepassx.org/">KeePassX</a></li>
<li><a href="http://code.google.com/p/git-osx-installer/">Git</a></li>
</ul>
</li>
<li>Met with my boss to start iterating on a list of things for me to take over from him and from engineers.</li>
<li>Had lunch with a co-worker, former and present.</li>
<li>Crashed a meeting on an existing and envisioned release process which wandered far afield as I was just trying to sponge up all the information I could.</li>
<li>Found out how to file a helpdesk ticket and filed one for the three inexplicable halts the MBP experienced under light use.</li>
<li>Filed a helpdesk ticket because email sent to me was being delivered to a co-worker.</li>
<li>Learned that the resolution was going to be swapping a whole new MBP for the one I had today, so I&#8217;ll get to re-install that software all over again.</li>
<li>Got set up with the bugtracking system (<a href="http://www.atlassian.com/software/jira/">Jira</a>) because I&#8217;m going to need somewhere to track work and it&#8217;s already in place.</li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://blog.manjusri.org/2011/06/21/new-job-changelog-day-0/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>The IO of Sauron</title>
		<link>http://blog.manjusri.org/2011/03/22/the-io-of-sauron/</link>
		<comments>http://blog.manjusri.org/2011/03/22/the-io-of-sauron/#comments</comments>
		<pubDate>Wed, 23 Mar 2011 05:30:37 +0000</pubDate>
		<dc:creator>binder</dc:creator>
				<category><![CDATA[Nerdery]]></category>
		<category><![CDATA[graph]]></category>
		<category><![CDATA[iostat]]></category>
		<category><![CDATA[metrics]]></category>
		<category><![CDATA[monitoring]]></category>
		<category><![CDATA[munin]]></category>

		<guid isPermaLink="false">http://blog.manjusri.org/?p=331</guid>
		<description><![CDATA[I needed to know what the disk utilization on a system was, essentially at all times, with a granularity of one second. Asking for the current iostat every five minutes via munin wasn&#8217;t sufficient. So I wrote this munin plugin. It tries to read a file and report the average and maximum disk utilization or [...]]]></description>
			<content:encoded><![CDATA[<p>I needed to know what the disk utilization on a system was, essentially at all times, with a granularity of one second. Asking for the current iostat every five minutes via munin wasn&#8217;t sufficient. So I wrote this munin plugin. It tries to read a file and report the average and maximum disk utilization or util% logged in that file. Then it unlinks the file and starts an iostat running every second for 250 seconds, recorded in that file, setting it up for the next time it&#8217;s polled.</p>
<p>It looks like this. I don&#8217;t think I have any reason to obfuscate the vertical axis of this picture, unlike the #devops people. It&#8217;s the graph from an uninteresting system doing some periodic nightly batch work.</p>
<div id="attachment_332" class="wp-caption aligncenter" style="width: 310px"><a href="http://blog.manjusri.org/wp-content/uploads/2011/03/Screen-shot-2011-03-22-at-10.25.01-PM.png"><img class="size-medium wp-image-332" title="iosmart graphs" src="http://blog.manjusri.org/wp-content/uploads/2011/03/Screen-shot-2011-03-22-at-10.25.01-PM-300x177.png" alt="munin graphs of the data from iosmart" width="300" height="177" /></a><p class="wp-caption-text">It has not escaped me that it reports average averages, maximum averages, average maximums, and maximum maximums.</p></div>
<p>Here&#8217;s the script.</p>
<p><pre><code><br />
#!/usr/bin/perl<br />
<br />
# iosmart by Shannon Prickett &lt;shannon.prickett@gmail.com&gt;<br />
# get iostat data and provide max and average values since the last check.<br />
<br />
use strict;<br />
use warnings;<br />
<br />
use File::stat;<br />
<br />
use vars qw{ $argument $iostat_file $iostat_runs };<br />
use subs qw{ print_header };<br />
<br />
$iostat_file = &#039;/var/tmp/iosmart&#039;;<br />
$iostat_runs = 250; # normally 250, reduce when testing<br />
<br />
$argument = $ARGV[0] || &#039;NONE&#039;;<br />
<br />
if ($argument =~ /config/) { # hint to munin<br />
&nbsp;&nbsp;# munin-run calls this to set up graphs, check limits<br />
&nbsp;&nbsp;print &lt;&lt;EOM;<br />
graph_title Extended iostat coverage<br />
graph_vlabel utilization<br />
graph_category disk<br />
graph_info Collect iostat data every second<br />
sda_max.label sda max util<br />
sda_avg.label sda avg util<br />
sdb_max.label sdb max util<br />
sdb_avg.label sdb avg util<br />
EOM<br />
}<br />
else { # do the work<br />
<br />
&nbsp;&nbsp;if (( -e $iostat_file) &amp;&amp;&nbsp;&nbsp;# it exists<br />
&nbsp;&nbsp;&nbsp;&nbsp;( -r $iostat_file) &amp;&amp;&nbsp;&nbsp;# we can read it<br />
&nbsp;&nbsp;&nbsp;&nbsp;( -s $iostat_file) ) {&nbsp;&nbsp;# it has something in it<br />
<br />
&nbsp;&nbsp;&nbsp;&nbsp;my $st = stat( $iostat_file ) or<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;die &quot;can&#039;t stat $iostat_file: $!\n&quot;;<br />
<br />
&nbsp;&nbsp;&nbsp;&nbsp;my $mtime = $st-&gt;mtime;<br />
&nbsp;&nbsp;&nbsp;&nbsp;my $now = time( );<br />
&nbsp;&nbsp;&nbsp;&nbsp;my $delta = $now - $mtime;<br />
<br />
&nbsp;&nbsp;&nbsp;&nbsp;if ( $delta &gt; 300 ) {&nbsp;&nbsp;# it&#039;s been &gt;5 minutes <br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;warn &quot;${iostat_file} is stale, using it anyway\n&quot;;&nbsp;&nbsp;# stderr keeps munin from parsing this as a value<br />
&nbsp;&nbsp;&nbsp;&nbsp;}<br />
<br />
&nbsp;&nbsp;&nbsp;&nbsp;open( my $fh, &#039;&lt;&#039;, $iostat_file ) or <br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;die &quot;failed to open $iostat_file: $!\n&quot;;<br />
&nbsp;&nbsp;<br />
&nbsp;&nbsp;&nbsp;&nbsp;my %devices;<br />
&nbsp;&nbsp;&nbsp;&nbsp;my ($max, $skip_until, $stop_skipping);<br />
&nbsp;&nbsp;&nbsp;&nbsp;READLOOP: while ( my $line = &lt;$fh&gt; ) {<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;next READLOOP unless ( $line =~ /^(sd\w).*?(\d+\.\d+)$/ ); # we only care about the disk rows<br />
<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;# we want to skip the first block of iostat output<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;# that&#039;s output since boot. we only care about now.<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;unless ( defined $skip_until ) {<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;$skip_until = $1;<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;next READLOOP;<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;}<br />
<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;my $device = $1;&nbsp;&nbsp;# per device values<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;my $util = $2;&nbsp;&nbsp;# keep the number from the last column<br />
<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;if (( ! defined $stop_skipping ) &amp;&amp; # if we are still skipping<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;( $device ne $skip_until ) ) {&nbsp;&nbsp;# this isn&#039;t what we&#039;re looking for<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;next READLOOP;<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;}<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;else {<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;$stop_skipping = 1;<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;}<br />
&nbsp;&nbsp;&nbsp;&nbsp;<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;chomp $util;&nbsp;&nbsp;# we hates filthy newlines forever<br />
&nbsp;&nbsp;&nbsp;&nbsp;<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;if (( ! exists $devices{$device}{&#039;count&#039;} ) or&nbsp;&nbsp;# first time<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;( ! defined $devices{$device}{&#039;count&#039;} ) ) {<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;$devices{$device}{&#039;count&#039;} = 1;<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;}<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;else {<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;$devices{$device}{&#039;count&#039;} = $devices{$device}{&#039;count&#039;} + 1;<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;}<br />
<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;$devices{$device}{&#039;sum&#039;} += $util;<br />
<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;if (( ! exists $devices{$device}{&#039;max&#039;} ) or&nbsp;&nbsp;# we&#039;ve never set it before<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;( ! defined $devices{$device}{&#039;max&#039;} ) or&nbsp;&nbsp;# the whole world is crazy<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;( $devices{$device}{&#039;max&#039;} &lt; $util ) ) {&nbsp;&nbsp;# it&#039;s smaller than current<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;$devices{$device}{&#039;max&#039;} = $util; <br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;}<br />
<br />
&nbsp;&nbsp;&nbsp;&nbsp;}<br />
<br />
&nbsp;&nbsp;&nbsp;&nbsp;for my $device (keys %devices) {<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;$devices{$device}{&#039;average&#039;} = $devices{$device}{&#039;sum&#039;} / $devices{$device}{&#039;count&#039;};<br />
<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;print &quot;${device}_max.value $devices{$device}{&#039;max&#039;}\n&quot;;<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;print &quot;${device}_avg.value $devices{$device}{&#039;average&#039;}\n&quot;;<br />
&nbsp;&nbsp;&nbsp;&nbsp;}<br />
<br />
&nbsp;&nbsp;&nbsp;&nbsp;close( $fh ) or die &quot;can&#039;t close ${iostat_file}? wtf? $!\n&quot;;<br />
<br />
&nbsp;&nbsp;&nbsp;&nbsp;unlink( $iostat_file ) or die &quot;can&#039;t rm ${iostat_file}: $!\n&quot;;<br />
<br />
&nbsp;&nbsp;&nbsp;&nbsp;print_header( );<br />
<br />
&nbsp;&nbsp;&nbsp;&nbsp;# suppress an error about inappropriate ioctl by not testing exit. :(<br />
&nbsp;&nbsp;&nbsp;&nbsp;system( &quot;iostat -xd 1 $iostat_runs &gt;&gt; $iostat_file &amp;&quot; );<br />
&nbsp;&nbsp;}<br />
&nbsp;&nbsp;else {<br />
&nbsp;&nbsp;&nbsp;&nbsp;warn &quot;can&#039;t use ${iostat_file}; making new\n&quot;;<br />
&nbsp;&nbsp;&nbsp;&nbsp;open( my $fresh, &#039;&gt;&#039;, $iostat_file ) or <br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;die &quot;can&#039;t make new ${iostat_file}: $!\n&quot;;<br />
&nbsp;&nbsp;&nbsp;&nbsp;print_header( );<br />
&nbsp;&nbsp;}<br />
}<br />
<br />
sub print_header {<br />
&nbsp;&nbsp;open( my $header, &#039;&gt;&gt;&#039;, $iostat_file ) or <br />
&nbsp;&nbsp;&nbsp;&nbsp;die &quot;failed to start ${iostat_file}: $!\n&quot;;<br />
<br />
&nbsp;&nbsp;print $header &quot;# this file is created by the iosmart munin-plugin\n&quot;;<br />
&nbsp;&nbsp;print $header &quot;# if it&#039;s not updating, check the munin-node logs\n&quot;;<br />
<br />
&nbsp;&nbsp;close( $header ) or die &quot;can&#039;t stop heading ${iostat_file}? HMPH $!\n&quot;;<br />
}<br />
<br />
</code></pre></p>
<p>ETA: using a new code formatting plugin</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.manjusri.org/2011/03/22/the-io-of-sauron/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Administrating MongoDB</title>
		<link>http://blog.manjusri.org/2011/03/17/administrating-mongodb/</link>
		<comments>http://blog.manjusri.org/2011/03/17/administrating-mongodb/#comments</comments>
		<pubDate>Thu, 17 Mar 2011 21:11:10 +0000</pubDate>
		<dc:creator>binder</dc:creator>
				<category><![CDATA[Nerdery]]></category>
		<category><![CDATA[mongodb]]></category>
		<category><![CDATA[nosql]]></category>
		<category><![CDATA[system administration]]></category>

		<guid isPermaLink="false">http://blog.manjusri.org/?p=322</guid>
		<description><![CDATA[During the last two months I have spent a lot of time in close proximity to mongodb. Enough so that I feel like I&#8217;ve learned some things I should pass on. These are rooted in mistakes I have made and survived. Think really hard about why you need mongodb. This is not because mongodb is [...]]]></description>
			<content:encoded><![CDATA[<p>During the last two months I have spent a <strong>lot</strong> of time in close proximity to mongodb. Enough so that I feel like I&#8217;ve learned some things I should pass on. These are rooted in mistakes I have made and survived.</p>
<ul>
<li>Think really hard about why you need <a title="NoSQL datastore" href="http://www.mongodb.org/">mongodb</a>. This is not because mongodb is bad, but it may be the wrong choice for what you&#8217;re trying to do.
<ul>
<li>Do you know what data will be the most valuable to you from your app? Then <strong>don&#8217;t</strong> use mongodb.</li>
<li>Can you afford to take downtime to modify the relationships between your data? Then <strong>don&#8217;t</strong> use mongodb.</li>
<li>Do you have existing, trustworthy models which project the expected growth for your data? Then <strong>don&#8217;t</strong> use mongodb.</li>
<li>Are there one or more glaring unknowns about the data from your app? Then <strong>do</strong> use mongodb.</li>
</ul>
</li>
<li>Get yourself a support contract with <a href="http://www.10gen.com/">10gen</a>. They are the best source of mongodb information, advice, and help.</li>
<li>Your minimum production environment is nine servers.
<ul>
<li>Your servers should be as congruent as you can get them, for the same reason the drives in a RAID should be.</li>
<li>You&#8217;re going to want a dedicated RAID-10 device for each mongod process&#8217;s datafiles.</li>
<li>You want drives optimized for speed. RPMs matter more than capacity when you are getting started.</li>
<li>RAM is the other thing which mongod is hungry for. The more the better.</li>
<li>The nine servers will be distributed like so:<br />
<table border="3">
<tbody>
<tr>
<th>shard 1</th>
<th>shard 2</th>
<th>shard 3</th>
</tr>
<tr>
<td>primary replSet member</td>
<td>primary replSet member</td>
<td>primary replSet member</td>
</tr>
<tr>
<td>secondary replSet member</td>
<td>secondary replSet member</td>
<td>secondary replSet member</td>
</tr>
<tr>
<td>delayed secondary replSet member + configdb</td>
<td>delayed secondary replSet member + configdb</td>
<td>delayed secondary replSet member + configdb</td>
</tr>
</tbody>
</table>
</li>
<li>This is a reasonably robust setup for distributing data redundantly. You can fail over to the non-delayed secondary in any given replSet. You can do stop+copy backups of your data at the delayed secondary member. You can run the three replSets as the three shards of one cluster and avoid some of the lopsided conditions that come from, for example, disabling the Balancer process. If you have to cut costs, you can skimp on the bottom row of servers. Try not to have to do that.</li>
</ul>
</li>
<li>Never delete anything. Have enough disk space that you can &#8216;remove&#8217; files by moving them to another place on the same system. This applies to configuration files, data files, log files, anything mongodb-related.</li>
<li>Find out about numactl. This is a <a href="http://jcole.us/blog/archives/2010/09/28/mysql-swap-insanity-and-the-numa-architecture/">good clearinghouse post about numactl</a> as it applies to mysql. On machines with many cores and a lot of memory, it applies to mongodb as well.</li>
<li>Graph everything you can. Iostat, memory usage, swap usage, mongodb operations, throughput at every layer in front of mongodb. You&#8217;ll need to know when a problem is with mongodb and when it&#8217;s higher or lower in the stack.</li>
<li>Run your mongod instances on alternate ports from the default.</li>
<li>Favor mv and scp over rsync when moving mongodb data files around.</li>
</ul>
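<p>The &#8220;never delete anything&#8221; advice above can be sketched as a small shell helper. This is only an illustration, not something mongodb ships: the <code>move_aside</code> function and the <code>GRAVEYARD</code> directory are hypothetical names, and the default path is an assumption you should change. It works best when the graveyard lives on the same filesystem as the file, so the <code>mv</code> is a rename rather than a copy.</p>

```shell
#!/bin/sh
# Sketch: "remove" a mongodb-related file by moving it aside instead of deleting it.
# GRAVEYARD and move_aside are hypothetical names chosen for this example.

# Assumed default location; override it in the environment to suit your layout.
GRAVEYARD="${GRAVEYARD:-/var/lib/mongodb-graveyard}"

move_aside() {
  src="$1"

  if [ ! -e "$src" ]; then
    echo "move_aside: no such file: $src" >&2
    return 1
  fi

  # One directory per day makes old "removals" easy to find later.
  dest_dir="$GRAVEYARD/$(date +%Y%m%d)"
  mkdir -p "$dest_dir" || return 1

  # A time + PID suffix makes it unlikely that moving the same
  # file name twice overwrites the earlier copy.
  dest="$dest_dir/$(basename "$src").$(date +%H%M%S).$$"

  # Print the new location on success so the caller can log it.
  mv -- "$src" "$dest" && echo "$dest"
}
```

<p>Calling <code>move_aside</code> with the path of a config, data, or log file prints where the file ended up and returns non-zero if the file does not exist. Nothing here ever cleans the graveyard out; that stays a deliberate, manual decision.</p>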
]]></content:encoded>
			<wfw:commentRss>http://blog.manjusri.org/2011/03/17/administrating-mongodb/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>

