Archive for June, 2011

NJC: Day 3

Saturday, June 25th, 2011

I got an early start this day because I woke early for no real reason. So I headed into the office, had some breakfast there (grilled cheese with fried egg and coffee, thanks for asking) and got down to work.

That meant installing Evernote for OS X so I could attach a PDF to a note, then using the information I’d gathered there to write up a proposal concerning alerting. I am trying to be thorough in documenting what I do and why, so I tried to capture my thinking, justify my decision, and foreshadow future developments. As part of the research for the write-up, I think I noticed something odd about Pingdom’s pricing.

I’m probably misunderstanding something. But if the costs per check aren’t different at the Business plan level, and I don’t care about SMS notifies, what is my incentive to ever leave the Basic plan? My efficient frontier seems like it’s up and to the left and with a linear progression, it’s a Basic ballgame.

If the cost per check on the business plan is less, then the graph is wrong and there is a break-even point on the expense of checks. But it sure doesn’t look like it.
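The break-even reasoning can be sketched in a few lines. The plan figures below are made-up placeholders, not Pingdom’s actual prices, and the `Plan` fields are my own simplification of how such plans tend to be structured. The point is only that a pricier plan pays off once its per-check rate is lower, and never when the rates match.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Plan:
    name: str
    base_fee: float         # flat monthly fee (hypothetical figure)
    included_checks: int    # checks covered by the flat fee (hypothetical)
    per_extra_check: float  # price of each check beyond the included ones


def monthly_cost(plan: Plan, checks: int) -> float:
    """Flat fee plus a linear charge for checks beyond the included ones."""
    extra = max(0, checks - plan.included_checks)
    return plan.base_fee + extra * plan.per_extra_check


def break_even(cheap: Plan, pricey: Plan, max_checks: int = 10_000) -> Optional[int]:
    """Smallest check count at which the pricier plan costs no more, else None."""
    for n in range(max_checks + 1):
        if monthly_cost(pricey, n) <= monthly_cost(cheap, n):
            return n
    return None


# Placeholder numbers only, chosen to illustrate the two cases.
basic = Plan("Basic", base_fee=10.0, included_checks=5, per_extra_check=1.0)
same_rate = Plan("Business", base_fee=40.0, included_checks=30, per_extra_check=1.0)
lower_rate = Plan("Business", base_fee=40.0, included_checks=30, per_extra_check=0.5)

print(break_even(basic, same_rate))   # None: with equal rates Basic always wins
print(break_even(basic, lower_rate))  # 40: a lower rate creates a crossover
```

With real prices plugged in, a `None` result would confirm the hunch that it’s a Basic ballgame.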

I ran my proposal for additional monitoring past my boss and got his buy-in and then started deploying it. So that’s my first operational task which is not entirely reactive in nature; there had been an issue earlier with a system going away and no one noticing, but it wasn’t a production system and I was more interested in getting some kind of alerting going for those systems as they come online.

I fully expect to iterate on the deployed monitoring solution. It was a trade-off between results and costs (financial and my time/effort/brain), and there are arguably better solutions that I didn’t feel I could invest enough in at this point to get the most out of them. It’s a starting point, an incremental improvement over what the company had in place before me.

This wasn’t quite a No Changes Friday (the only religious holiday I observe) but it seemed worth it to push through that sabbath to get additional awareness of the environment. Arguably it didn’t impact anything production-related, beyond the tiny additional impact of the monitoring checks being done, which are all tickling network listening daemons.
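For a sense of how light that kind of check is, here is a minimal sketch of “tickling” a listening daemon: open a TCP connection and close it again. This is not what the monitoring service actually runs, just an illustration; the function name and the three second timeout are my own choices.

```python
import socket


def port_is_listening(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the host, connects, and raises OSError
        # (refused, unreachable, timed out) if nothing answers.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def failing_checks(targets):
    """Given (host, port) pairs, return the ones that did not answer."""
    return [(host, port) for host, port in targets if not port_is_listening(host, port)]
```

Something like `failing_checks([("www.example.com", 443)])` would return an empty list when the daemon answers. It only proves that something is listening, not that the service behind it is healthy.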

Then I spent the rest of my day researching options for my next proposal, which will be SSL related.

NJC: Day 2

Thursday, June 23rd, 2011

Today I gained a bucket of credentials and started poking around at what exists and what services are being used.

The company has a pair of domain registrars they’ve used, a leased hosting provider, a cloud provider, a connectivity monitoring service, and a code repository hosting service. They’ve got the beginnings of a series of development environments, of a production environment, of a staging environment.

So now I feel like I have my hands around what they’ve got going already, in a general sense. At this point, I can start drilling down into specifics and start nudging the pieces to align the way I think they should.

So other than absorb, what did I do today? I discussed with some developers and the release engineer which pieces need to be in place for an upcoming go-live, and how to validate their ideas BEFORE they encounter the enemy (live users, those adorable and unforgiving creatures), and we whiteboarded some ideas for them to explore.

I didn’t get to the point I wanted to with the alerting problem I had expected to tackle today, but I’m prone to a surprising degree of optimism, considering my line of work. I did gather some numbers to start estimating costs for different options.

The main options I see:

  1. extend the existing availability monitoring service to check more things
    • the cost scales linearly with the things checked
    • it only checks things which can be discerned remotely
    • it checks from multiple different places
    • it has some nice escalation options
  2. hand-roll some rudimentary alerting on the most critical things
    • this is a path of brittleness and pain
    • it’s cheap, the only cost is my time and my brain
    • it could do both internal and external checks
    • it won’t scale beyond a dozen checks
  3. deploy some paid alerting software package
    • this is paying someone else to have thought and worried about important things already
    • I don’t have a lot of experience with paid alerting software and the systems I did use were awkward, counter-intuitive, and had several false negatives when they should have been screaming bloody murder
    • I’m leery of spending money I’ll regret spending at this stage of things
  4. deploy some open source alerting software package
    • I’ve got some familiarity with a couple different ones
    • the price point is pretty good
    • some of them offer paid support if things get too hairy or I need it to scale faster
    • the ones I’ve used accommodate writing extensions, which seems necessary; I’ve never been happy with off-the-shelf software

Then it becomes a question, if I’m not choosing option 1, of where to run whatever I deploy. It could go on one or more leased hosts, it could go somewhere in the cloud, or I could go full-tilt and buy some servers and co-locate them.

I’m going to finish out my math on costs but I have a hunch I know where the sweet spot will be to start out: a low end leased host running an open source package in the Nagios-space (Nagios, Icinga, Opsview). The cloud would arguably be cheaper but I’m nervous putting something I need to depend on in the cloud at this stage of things.
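Part of the appeal of the Nagios-space is how simple an extension can be: a check plugin is a program that prints one status line and exits with 0 (OK), 1 (WARNING), 2 (CRITICAL), or 3 (UNKNOWN). A rough sketch of that convention follows; the disk check itself, the function names, and the 80/90 percent thresholds are illustrative assumptions, not anything I have deployed.

```python
import shutil

# Exit codes defined by the Nagios plugin convention.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
LABELS = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def classify(percent_used: float, warn: float, crit: float) -> int:
    """Map a measurement onto a plugin status using two thresholds."""
    if percent_used >= crit:
        return CRITICAL
    if percent_used >= warn:
        return WARNING
    return OK


def check_disk(path: str = "/", warn: float = 80.0, crit: float = 90.0):
    """Return (status, message) for disk usage at path (thresholds are examples)."""
    try:
        usage = shutil.disk_usage(path)
    except OSError as exc:
        # Failing to measure is UNKNOWN, not OK.
        return UNKNOWN, f"DISK UNKNOWN - {exc}"
    percent = 100.0 * usage.used / usage.total
    status = classify(percent, warn, crit)
    return status, f"DISK {LABELS[status]} - {percent:.1f}% used on {path}"


def main() -> int:
    status, message = check_disk()
    print(message)
    return status

# A real plugin script would end with: sys.exit(main())
```

The same shape works for any check that can be reduced to a measurement and two thresholds.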

NJC: Day 1

Thursday, June 23rd, 2011

This day (yesterday as I write this) was less packed than the previous day but somehow just as tiring. I started the day thinking I’d be reading documentation, both for technologies already in use that I wasn’t wholly familiar with, and the existing corpus on what processes and systems the company had already put in place before I arrived.

What I actually did was write documentation. Specifically, I started a spreadsheet for information about servers as they became pertinent. At this point, it’s got only four server names in it, each with an IP address, where it’s located, and whether I have shell access on it yet. It was slow going, as I had to extract it from developers who are busily readying to launch a new project.
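If the spreadsheet ever outgrows a spreadsheet, the same four columns fit in a few lines of code. This is only a sketch: the field names are my own, and any rows fed to it would be invented examples rather than the company’s real servers.

```python
import csv
import io
from dataclasses import asdict, dataclass, fields


@dataclass
class Server:
    name: str
    ip_address: str
    location: str
    shell_access: bool  # do I have a login on it yet?


def to_csv(servers) -> str:
    """Serialize the inventory as CSV, one row per server."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=[f.name for f in fields(Server)])
    writer.writeheader()
    for server in servers:
        writer.writerow(asdict(server))
    return buf.getvalue()


def missing_shell_access(servers):
    """Names of servers I still need to ask a developer about."""
    return [s.name for s in servers if not s.shell_access]
```

For example, `missing_shell_access([Server("web-01", "192.0.2.10", "leased host", False)])` returns `["web-01"]`; that name and address are placeholders from the documentation IP range.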

I got myself set up with access to a Continuous Integration server only to learn that this was the deprecated CI server and I actually wanted access to an entirely different CI tool on a different server.

As I’d started accumulating a wild array of bookmarks in my browser already, on top of the motley array inherited from several previous browser migration imports, I took some time to actually go through and organize my bookmarks. It felt silly at the time but as my bookmarks sync between the different places I run browsers, I only had to do it the once.

At lunch I played a 5-person game of Dominion with co-workers. My streak of not winning Dominion continued. I don’t expect it to last much longer, as they evidently play regularly, so now I have no excuse to not get serious about it.

Then my laptop halted for the first time that day, followed by the timely arrival of the replacement I was given.

Then I spent an hour installing software onto the replacement laptop. The same software I installed on the first day onto the original laptop, with one addition, Adium, for old times’ sake. At least it went faster this time as I had the list to work from in seeking software.

I got access to the company’s AWS console and that proved useful when I performed what I’d consider my first operational task of the new job: I rebooted an instance which developers could no longer remotely interact with. Which leads to what I expect today’s tasks to involve: adding alerting so I know there’s a problem before the users of the systems have to tell me.

New Job Changelog: Day 0

Tuesday, June 21st, 2011

Today was my first day at a new job. I’m the only operations guy minding the deployed and deploying production environment for the place. Not even replacing a previous operations staff; this is green field territory, more or less.

So here’s an abbreviated list of what I accomplished today.

  • Set up the MacBookPro issued to me by installing
  • Met with my boss to start iterating on a list of things for me to take over from him and from engineers.
  • Had lunch with a co-worker, former and present.
  • Crashed a meeting on an existing and envisioned release process which wandered far afield as I was just trying to sponge up all the information I could.
  • Found out how to file a helpdesk ticket and filed one for the three inexplicable halts the MBP experienced under light use.
  • Filed a helpdesk ticket because email sent to me was being delivered to a co-worker.
  • Learned that the resolution was going to be swapping a whole new MBP for the one I had today, so I’ll get to re-install that software all over again.
  • Got set up with the bugtracking system (Jira) because I’m going to need somewhere to track work and it’s already in place.

 

Last week.

Sunday, June 12th, 2011

This is my last week at my current job. Maybe I’ll try to use the blog to record a changelog of what I’m up to in the future.
