The move to IT resilience vs disaster recovery

This article explains what IT resilience and IT disaster recovery are, how each is used, how they differ, and why recovery remains essential for major enterprises despite the growing focus on resilience. It also describes how Cutover’s IT disaster recovery solution can help address common challenges.

What is IT resilience?

IT resilience is the ability of an organization to maintain acceptable levels of service during a disruption, whether it is caused by a network failure, a cyber attack, simple human error, or any number of other threats to your IT infrastructure and services.

Disaster recovery is the process of getting services back online after a failure or outage. Whereas resilience includes preventing outages and building systems that can withstand certain threats, recovery deals specifically with bringing systems back online once an outage has already occurred. It can be a largely manual process that requires a high degree of human orchestration, and it has long been the way most banks deal with major system outages.

Recently there has been a significant shift towards resilience over recovery (prevention is better than cure, after all), but neglecting disaster recovery is still a mistake. Read our white paper to find out why you still need disaster recovery.

The resilience approach focuses on protecting core services and preventing issues before they occur. This can involve identifying the risks and vulnerabilities associated with the services that support critical business processes and performing detailed assessments of the impact an outage would have. Measures are then taken to remove or mitigate these risks, for example by eliminating single points of failure, load-balancing traffic across redundant servers, or adding the ability to scale services automatically.
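To illustrate what removing a single point of failure can look like in practice, here is a minimal Python sketch of a round-robin load balancer that spreads requests across redundant servers and skips any server marked unhealthy. It assumes a simple health flag that some monitoring process would set; the `Server` and `RoundRobinBalancer` names are hypothetical and not part of any Cutover product or real load-balancing library.

```python
from dataclasses import dataclass
from itertools import cycle
from typing import Iterator, List


@dataclass
class Server:
    """A hypothetical backend server; `healthy` is assumed to be set by monitoring."""
    name: str
    healthy: bool = True


class RoundRobinBalancer:
    """Spread requests across redundant servers, routing around unhealthy ones."""

    def __init__(self, servers: List[Server]) -> None:
        if not servers:
            raise ValueError("at least one server is required")
        self._servers = servers
        self._ring: Iterator[Server] = cycle(servers)

    def pick(self) -> Server:
        # Check each server at most once per call so a fully failed pool
        # raises an error instead of looping forever.
        for _ in range(len(self._servers)):
            server = next(self._ring)
            if server.healthy:
                return server
        raise RuntimeError("no healthy servers available")


if __name__ == "__main__":
    pool = [Server("app-1"), Server("app-2"), Server("app-3")]
    lb = RoundRobinBalancer(pool)
    pool[1].healthy = False  # simulate app-2 failing
    print([lb.pick().name for _ in range(4)])  # ['app-1', 'app-3', 'app-1', 'app-3']
```

Because no single server is required to serve traffic, losing one of them degrades capacity rather than causing an outage. A real deployment would also need the load balancer itself to be redundant, or it simply becomes the new single point of failure.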

'Always-on' banking

The shift towards IT resilience over recovery has been driven by changes in customer demands and habits. The digital banking customer is ‘always on’ and expects uninterrupted access to their bank at any time, so any outage is more likely to affect them. Digital demand has also increased the rate of change, which brings a greater risk of change-related outages than before. Reducing that risk is a central focus of the DevOps movement, which aims to keep automated test and release processes robust, particularly against regressions. The threshold for acceptable levels of service is now higher, and most banks are updating their strategies to deal with this.

The importance of disaster recovery

Business risk assessments determine the levels of service the business needs to maintain, and these in turn underpin robust resilience and recovery tools and processes. Knowing the system, its weaknesses, and its resilience and recovery requirements is essential to making it truly resilient. Risk assessment is also an opportunity to identify which processes are most critical to the business and should take priority.

The continuous improvement of IT resilience processes

Resilience processes also need to be reviewed regularly so they can be improved and updated over time. New threats may emerge, changing technology may expose new weaknesses, and the launch of new products and services may increase demand. Monitoring the system and collecting data makes it possible to assess potential weaknesses continuously and keep the system as resilient as possible.

Complementary forces: IT resilience and disaster recovery

Resilience is currently the main focus for most organizations, because it offers the most desirable outcome: avoiding outages rather than having to fix them after an incident. But while prevention is better than cure, no system can be 100% resilient, so there will always be disaster recovery events that require a high degree of human orchestration.

Cutover can be used to test specific pre-prepared service recovery plans for disasters such as a data center going down, so that when a disaster like this does occur, the recovery can run as efficiently as a planned event. The platform can also store more general disaster recovery plans that can be invoked in a real disaster and updated to fit the specific scenario. When disaster recovery is invoked, Cutover facilitates the orchestration of the teams and technology involved and provides real-time status visualization.

IT resilience and disaster recovery challenges

IT resilience and disaster recovery are both essential to an organization’s overall resilience, but implementing each comes with its own challenges. So what are the key IT resilience and disaster recovery challenges?

IT resilience challenges

Knowing where to focus efforts: Threats to your IT infrastructure are constantly evolving. From changing architectures to new cyber threats, it can be difficult to know exactly where to focus resilience testing efforts to ensure you are prepared for the most common and potentially damaging scenarios.
Varying architectures: Most enterprises’ architectures are becoming more complex and interconnected. Cloud architectures are often perceived to be more resilient than on-premises ones but, like any other architecture, they are not foolproof, and a hybrid mix of on-premises and cloud infrastructure adds further complexity to resilience efforts.
Lack of awareness: Resilience is not just the responsibility of the IT resilience team. Everyone in the organization should be aware of common resilience threats and how to guard against them. For example, major organizations have suffered serious incidents because an individual did something as seemingly minor as clicking a link in a phishing email.

IT disaster recovery challenges

Disasters are inevitable: no matter how advanced your IT resilience, recovery is still essential, and you need robust IT disaster recovery procedures in place. This comes with several challenges, including:

Complex IT environments: A real challenge of disaster recovery is meeting the varying requirements of different architectures, software, and applications with the resources you have. To achieve the best outcomes, you’ll need an expert team that is well versed in your specific infrastructure, along with frequent recovery testing. Whether your architecture is hosted on-premises or in the cloud, disaster recovery solutions need to be tailored to your organization’s specific needs.
Keeping plans up to date: It’s important to revisit your DR strategy periodically to research and identify new risks, acquire the relevant resources to combat those threats, and address any areas that require strengthening. Regularly performing IT disaster recovery exercise scenarios will help you to pinpoint these areas for improvement.
Ensuring regulatory compliance: Executing a disaster recovery in the heat of an outage or attack is challenging enough without the added pressure of ensuring regulatory compliance. Having an automated, indelible audit trail helps to remove the burden of reporting on recovery procedures to regulators post-event.
Downtime and recovery costs: When a disaster occurs, recovery costs are inevitable, not just during the outage itself but further down the line as well. Implementing a failover plan as part of your disaster recovery strategy allows critical operations to continue and mitigates lost productivity and revenue. The difference between failover and disaster recovery is that failover handles smaller, everyday machine failures, whereas disaster recovery addresses large-scale infrastructural damage.
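The failover vs disaster recovery distinction can be sketched in a few lines of Python. This minimal example assumes each replica is a callable that raises `ConnectionError` when its machine is down; the function and exception names are hypothetical illustrations, not a real library API. A single failure is absorbed by moving to a standby, while the loss of every replica is escalated so that a wider disaster recovery plan can be invoked.

```python
from typing import Callable, List, Sequence


class AllReplicasFailed(Exception):
    """Hypothetical signal that failover alone cannot restore service."""


def call_with_failover(replicas: Sequence[Callable[[], str]]) -> str:
    """Try the primary first, then each standby in order.

    One replica going down is an everyday failure that failover absorbs.
    If every replica fails (for example, a whole data center is lost),
    the error is escalated so a disaster recovery plan can take over.
    """
    errors: List[Exception] = []
    for replica in replicas:
        try:
            return replica()
        except ConnectionError as exc:
            errors.append(exc)  # record the failure, then fail over to the next replica
    raise AllReplicasFailed(f"{len(errors)} replica(s) failed: {errors}")


def primary() -> str:
    # Simulated primary that is currently down.
    raise ConnectionError("primary is down")


def standby() -> str:
    # Simulated healthy standby.
    return "served by standby"


if __name__ == "__main__":
    print(call_with_failover([primary, standby]))  # served by standby
    try:
        call_with_failover([primary])
    except AllReplicasFailed as exc:
        print("escalate to disaster recovery:", exc)
```

In practice, failover is usually automated in infrastructure (database replicas, load balancers, DNS), while the escalation path is where human-orchestrated recovery begins.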

The future of IT resilience and disaster recovery

While many organizations have focused on measures such as cyber security and resilience, a good IT disaster recovery process is still essential for protecting the business and its customers. It’s not a case of choosing between disaster recovery and resilience, but of understanding the role each plays in keeping your organization and customers safe.

Why use Cutover for disaster recovery?

To find out more about how Cutover’s automated disaster recovery solution supports disaster recovery activities, read our IT disaster recovery use case.

Chloe Lovatt

