Disaster recovery is the process of getting services back online after a failure or outage, which could be caused by anything from a natural disaster to a cyber attack. Recovery can be a largely manual process that requires a high level of human orchestration, and it has long been the way most banks deal with major system outages. Recently, however, there has been a significant shift towards focusing on resilience rather than recovery.
The resilience approach focuses on protecting core services and preventing issues before they occur. This can involve identifying the risks and vulnerabilities associated with the services that support critical business processes and performing detailed risk assessments of the impact of an outage. Measures are then taken to remove these risks, such as removing single points of failure or adding the ability to scale services automatically, for example by load-balancing across servers.
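As a rough illustration of what checking for single points of failure can look like, the Python sketch below flags any service backed by fewer than two distinct instances. The inventory, service names and instance names are hypothetical, and a real assessment would also consider shared dependencies such as networks, power and data centres.

```python
from typing import Dict, List


def single_points_of_failure(components: Dict[str, List[str]]) -> List[str]:
    """Return the names of services backed by fewer than two distinct instances."""
    return sorted(
        name
        for name, instances in components.items()
        if len(set(instances)) < 2  # duplicates of one instance add no redundancy
    )


# Hypothetical inventory: service name -> instances that can serve it.
inventory = {
    "payments-api": ["dc1-node-a", "dc2-node-b"],
    "auth-service": ["dc1-node-c"],
    "ledger-db": ["dc1-db-1", "dc1-db-1"],
}

print(single_points_of_failure(inventory))  # ['auth-service', 'ledger-db']
```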
The shift towards resilience rather than recovery has become increasingly important due to changes in customer demands and habits. The digital banking customer is ‘always on’ and expects uninterrupted access to their bank at any time, which means that outages are more likely to affect them. Digital customer demands have also increased the rate of change, so there is now a greater risk of change-related outages than before. This is a critical focus of the DevOps movement: ensuring that automated test and release processes remain robust, particularly around regression. The threshold for acceptable levels of service is higher, and most banks are updating their strategy to deal with this.
Know the System
Business risk assessments determine the acceptable levels of service that the business needs to maintain. These underpin the provision of robust resilience and recovery tools and processes. Knowing the system, its weaknesses, and its resilience and recovery requirements is essential for understanding how to make it truly resilient. This is also an opportunity to identify which processes are the most critical to the business and should take priority.
The resilience processes also need to be reviewed constantly so they can be improved and updated to increase resilience over time. There may be new threats to the system, new weaknesses as technology changes, or increasing demands due to the launch of new products and services. Monitoring the system and collecting data helps to continuously assess possible weaknesses and make the system as resilient as possible.
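One simple way to turn monitoring data into a review of weaknesses is to compare each service's measured availability with the acceptable level of service agreed in the risk assessment. The Python sketch below assumes health-check results are recorded as a list of pass/fail values per service; the service names and the 99% target are hypothetical.

```python
from typing import Dict, List


def availability(checks: List[bool]) -> float:
    """Fraction of health checks that passed."""
    if not checks:
        raise ValueError("no health checks recorded")
    return sum(checks) / len(checks)


def services_below_target(results: Dict[str, List[bool]], target: float) -> List[str]:
    """Return the services whose measured availability is below the target."""
    return sorted(
        name for name, checks in results.items() if availability(checks) < target
    )


# Hypothetical monitoring data: True means the health check passed.
results = {
    "online-banking": [True, True, True, True],
    "card-payments": [True, True, True, False],
}

print(services_below_target(results, target=0.99))  # ['card-payments']
```

Services that repeatedly fall below their target are natural candidates for the next resilience review.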
We still need Disaster Recovery
Resilience is the main focus for most organizations at the moment. It presents the most desirable option: avoiding outages rather than having to fix them in the event of an incident. While prevention is better than cure, no system can be 100% resilient. There will always be disaster recovery events that need high levels of human orchestration.
For the disaster recovery events that do have to be invoked, Cutover can be used to test specific pre-prepared service recovery plans for disasters such as a data centre going down. When a disaster like this does occur, the recovery can then run efficiently, just like a planned event. The tool can also be used to store more general disaster recovery plans that can be invoked in the event of a real disaster and updated to fit the specific scenario. Cutover facilitates the human orchestration involved in these events and provides real-time status visualization.
While focusing on resilience is the smart thing for banks to do at the moment, having a good backup recovery process is still essential for protecting the business and its customers.
Find out more about how Cutover can be used for resilience and disaster recovery activities by reading our resilience use case: