The evolution of Incident Management part 1: in the beginning…


Jim Korchak

September 3, 2021

Back, deep in the history of humankind, there were darker times: living conditions were sub-optimal, hygiene was poor, and the knowledge of “health” was laughable by modern standards. Danger was all around the people who lived then (if you could actually call it living). Knowledge was hard to come by, and it was guarded closely and carefully by the few who clung to it. Of course, the astute historians amongst you will know immediately that I refer to that period known as “the 1990s”.

When speaking about technology and incident management history, I’ll begin this journey at the dawn of the adoption of distributed computing (aka the 1990s). For those of you unfamiliar with the near-vestigial term, distributed computing refers to the point where systems were split into backends running on one or more servers, which interacted with applications running on PCs/workstations that were computers in their own right. I start here because this is the point where complexity in computing took an almighty step up. This, in my humble opinion, is when different technology teams really needed to begin interacting to resolve the issues that arose, and hence “fixing a user issue” evolved into “incident management”. Sure, there will be those of you who will argue with my timeframe or point out with great aplomb that there were mainframes long before that, but you are forgetting a few critical factors: this is my blog series, you are welcome to write your own on your personal home mainframe, and you should probably go enjoy retirement to the fullest while you still can.

To understand some of the challenges of managing ‘tech issues’ back then, it’s important to set the scene a bit – so, for context, here’s what I remember as the backdrop: 

  • The internet wasn’t really much of a thing yet - it was there, but only the most technically curious would mess with it. Why? Well, there wasn’t the richness of content we all now enjoy, and if you have never experienced the joy of a dial-up modem… So it follows that “intranets” only really became used (and useful) toward the end of the period.
  • For anyone not in the IT team, technology was literally magic. It was a mystical thing, like a unicorn or Medusa, that nobody outside of IT wanted to even try to understand for fear of going mad or turning to stone. Because of this, the IT teams were pretty much left to their own devices, which meant that…
  • Everyone in technology had privileged access to just about everything: nothing was under lock and key. Every system administrator had root, every app support person had database admin access and nobody questioned it. Why would you? Would you lock a line chef’s knives away and dole them out every time they needed to cut a vegetable, but only if they got permission from the head chef? Of course not.
  • There wasn’t much packaged software available for business end-users. Most third-party products were building blocks like databases (Sybase, Oracle) and eventually web servers. This is vastly different from today, where if you want to find software to specifically manage your dog walking business you have about eight packages to choose from that do just dog walking (I recommend iPooch, MyK9, or UppityPuppity). This lack of choice meant that most early tech adopter companies predominantly wrote their own software.
  • Technology teams started to realize that the productivity of their developers was impacted by them having to deal with user issues, so stand-alone application support teams started to emerge as the ‘go to’ model.
  • And finally (and somewhat wistfully), I remember IT having a sense of humor - perhaps it was the fact that the insiders were a part of this ‘hands off’ mystical unicorn club, but for whatever reason, it wasn’t uncommon for something like a random fortune-cookie generator to be built into your FX Trading system if you knew where to look.

Without exaggeration, it was literally the Wild West and I was pretty much Doc Holliday. 

But to bring us back to the main topic, what was the equivalent back then of modern-day incident management? Given the backdrop I’ve painted above, you may expect that the diagnosis and resolution of IT issues at the time was chaotic. To be honest, it was far worse than you could ever imagine. In many cases, the issues were caused by the technology team themselves. With relatively unfettered access to all the different environments (dev, test, live), it wasn’t uncommon for someone to accidentally take an action in live when they were meant to be working on the test environment. Also, not that dissimilar to today, system changes almost always led to some form of unintended consequence.

If I think about the method of managing issues back then, in many ways my Wild West analogy holds - sorting out an issue meant gathering a posse, giving them a loose set of directions, and having them go at it. It was messy, it was eventually effective, and you could almost always expect some form of collateral damage.

But if you look at the lifecycle of an incident in a bit more detail, the stark contrast to today begins to take shape:

  • Issue identification - almost always a phone call or email from the impacted users. Why? As mentioned, there wasn’t a great deal of third-party software in the wild. Monitoring agents and alerting solutions weren’t as ubiquitous as they are today. The more switched-on technology teams may have written a few shell scripts to auto-check the odd thing and send an email, but this wasn’t the norm.
  • Mobilization – given the limitations of the technology of the time, it was rare to have globally connected solutions. Most systems were localized to the users, and a technology team would be on-site or nearby. It was fairly easy to pull a cross-functional team around a workstation (the posse) to try to figure out what was going on, but often this meant having people down tools on whatever they were working on to focus on solving the issue. This ‘chopping and changing’ of context was particularly impactful on people who wrote code.
  • Diagnosis/Recovery – unstructured and chaotic. Given most of the teams had privileged access by default, there was often a mad rush from those involved to be the one who solved the issue. Communication and coordination between people were often non-existent, leading to support people tripping over each other and duplicating effort.
  • Resolution/Post-Incident Review – getting the system back to operational was typically the only goal in mind. Understanding the true root cause was rarely an objective unless the same issue had occurred repeatedly. Not surprisingly, this often led to a “Just Reboot It” mentality to make the problem go away more quickly.
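For flavour, the home-grown checks mentioned in the issue-identification step above were often little more than a cron-driven shell script. Here’s a minimal sketch of the sort of thing a ‘switched-on’ team might have run - the threshold and recipient address are illustrative, not from any real system:

```shell
#!/bin/sh
# Hypothetical 1990s-era "monitoring": check disk usage and email
# the support team if it crosses a threshold. THRESHOLD and ADMIN
# are made-up values for illustration.

THRESHOLD=90
ADMIN="appsupport@example.com"

# df -P gives POSIX-stable output; line 2, column 5 is "Use%" (e.g. "42%")
USAGE=$(df -P / | awk 'NR==2 {gsub("%",""); print $5}')

if [ "$USAGE" -gt "$THRESHOLD" ]; then
    MSG="Disk usage on $(hostname) is at ${USAGE}% - someone take a look"
    # mail(1) was the typical delivery mechanism of the era;
    # fall back to stdout if it isn't installed
    if command -v mail >/dev/null 2>&1; then
        echo "$MSG" | mail -s "ALERT: disk space" "$ADMIN"
    else
        echo "$MSG"
    fi
fi
```

Dropped into crontab every few minutes, that was about as sophisticated as proactive alerting got - no dashboards, no paging rotation, just an email and hope that somebody was reading their inbox.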

To say that in the early days technology incident management was a chaotic dark art is not an overstatement. It’s not surprising, really, that the natural evolution from this point was toward a more structured and formulaic approach, which I’ll explore in the next part of this series - The evolution of Incident Management part 2: the advent of ITIL. Nonetheless, I look back on those early days fondly. Maybe it was a sense of being an insider in a special club, perhaps it was the jingling of my spurs as I walked through the halls, or maybe it’s the memory of the mainframe guy sitting in the dark corner of the room in his sandals and socks, yelling over to our posse, “this is why we should just stick with the mainframe”.

Jim


Jim Korchak is a twenty-five-year technology veteran whose career has centered on automation, application service management, and the software delivery lifecycle. He has extensive experience in the financial services sector both in the UK and the USA where he has held a number of senior positions helping to shape technology strategy and execution. 

He is currently a Principal Consultant for Resilient Technology Specialists where he advises companies on how to drive improvement in the application lifecycle, from requirements management to production operations by leveraging best practices and intelligent automation.

A recognized thought leader in Application Management and Technology, Jim has been quoted by Forrester Research, has held advisory positions for several technology start-ups, and has spoken publicly as a lecturer for Hult International Business School as well as at a number of industry events as keynote speaker. 
