In the age of DevOps, Agile and continuous integration, major outages should be a thing of the past. If everything is automated and continually delivering, how can anything go catastrophically wrong? The truth is that most major organizations are nowhere near this DevOps utopia and still carry out major upgrades and migrations that rely heavily on people, and those people don’t always have the best tools available to do their jobs.
The latest example of a major outage with far-reaching consequences occurred about six weeks ago, when TSB migrated its entire operations from Lloyds to Sabadell. Over a month on, the bank is still under scrutiny from the FCA and fraud investigators and has lost an estimated 12,500 customers. Repercussions of this magnitude will likely start to worry banks and other enterprises that feel they are “too big to fail”.
There has been a spate of other outages recently. The London Stock Exchange experienced a one-hour delay to its opening auction due to a software issue. Tesco Bank customers were locked out of their online and mobile accounts for four hours in the middle of the day. A technical issue at Visa left people unable to make payments, and some had to abandon their shopping trolleys at supermarkets because card machines were not working.
How is This Still Happening?
It’s hard to believe that banks can still be subject to such huge IT failures in 2018. It’s starting to feel as if, in this age of innovation, big banking IT failures are becoming more frequent, not less. With banking activities increasingly moving from bricks-and-mortar to online, these failures are having an even greater impact on customers. So why do big banks continue to fail at executing critical change events when the stakes are so high?
New Apps, Old Infrastructure
Architectures remain complex and diverse. Although most banks are developing new technology-enabled services such as mobile banking apps, they remain critically dependent on underlying legacy services that are often decades old. Millions of people need constant access to their bank, so striking the balance between agile delivery and legacy stability remains a big challenge. Similarly, dealing with a mix of proprietary and vendor-supplied software adds complexity, and the whole process can only move as fast as the slowest component critical to end-to-end service delivery.
Mergers and Acquisitions
The merging, splitting and acquiring of banks creates many back-office complications in handling customer records, and it reduces visibility and understanding of key systems. Mergers and acquisitions also often force migrations between different platforms, which makes the process more complex still.
Execution Complexity
Poor migration execution is another factor that often contributes to the failure of major events. Banks spend millions of pounds a year on implementations and migrations but are using methods as old as their infrastructure. No wonder things often go wrong! With little visibility of the entire system and the event itself, finding out what the issue is can be as much of a challenge as fixing it.
Supporting The People
The causes of major outages vary, from infrastructure issues to cyber attacks to simple human error. All major banks currently have to deal with challenges such as complex interconnected systems, shrinking budgets and strict regulations, which make big changes like these extremely difficult. Ultimately, these critical events are still about people, who need to be held accountable and properly supported with the right technology to make these big events safe and successful.
In 100% of the financial services organizations we have encountered, some teams still have to orchestrate changes via spreadsheets. There are plans to increase automation, but these have not yet been delivered and these organizations need help right now. Cutover is often compared to systems that don’t yet exist, but it can be used right now to facilitate the transition to increased automation.