Cyber attacks can cause far-reaching business disruption, and recovering from them is becoming increasingly complex and unpredictable. It’s difficult to know whether your traditional failover site has also been affected by malware, or how large the overall blast radius of the attack is, and both uncertainties delay response and recovery.
There are numerous public disclosures of enterprises’ prolonged recovery timeframes following a cyber attack. These erode confidence, from the board to the business to IT staff, in an enterprise’s ability to recover from a cyber attack within tolerance. Cyber recoveries are widely expected to take much longer than IT disaster recovery (DR) and to exceed stated recovery time objectives (RTOs), because they require a complex, multi-level response to recover the control plane, applications, services, and data.
What makes cyber recovery so tough?
In a previous blog post I outlined the key pain points around IT disaster recovery and what you can put in place to combat them. For cyber recovery, all of those still stand - but there’s even more to think about when you’re recovering from a cyber attack:
- The nature of the outage: Typical IT disasters are passive: some change or external event occurs and causes problems. In a cyber attack, an active entity wants to cause harm. This makes the recovery approach more volatile, because related and parallel attacks are far more likely to occur at the same time.
- The recovery approach: During an IT disaster recovery, you normally fail over applications and data to a secondary virtual or physical location. In a cyber attack, you cannot safely fail over in the same way because your secondary site may also be affected, so you need to perform a bare metal recovery.
- The RTOs: In an IT DR, these are pre-defined and measured through recovery plan testing. In a cyber recovery, you may not know when the malware was introduced or how widespread the attack is across your operations. These unknowns make RTOs uncertain in most cases.
- The recovery point objective (RPO): After an IT disaster, you aim to recover data from the most recent backup. In a cyber recovery, the recovery point depends on the availability of the last known good backup and on forensics establishing when the attack occurred. Therefore, the longer the attack went undetected and the longer it takes you to recover, the more data you’re likely to lose.
- Return to business as usual: In an IT DR, how long this takes is well understood from the RTOs. After a cyber attack, it depends heavily on the nature of the attack, the subsequent blast radius, and the integrity of the data, so getting back to business as usual could take weeks or even months.
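To make the RPO point concrete, here is a minimal sketch in Python. It assumes nightly backups and a forensic estimate of when the malware was introduced. The dates, the backup schedule, and the function names `last_known_good` and `data_loss_window` are all illustrative assumptions, not part of any product.

```python
from datetime import datetime, timedelta
from typing import Optional


def last_known_good(backups: list[datetime], compromised_at: datetime) -> Optional[datetime]:
    """Return the newest backup taken strictly before the estimated compromise time."""
    clean = [b for b in backups if b < compromised_at]
    return max(clean) if clean else None


def data_loss_window(restore_point: datetime, outage_at: datetime) -> timedelta:
    """Span of data that cannot be restored from the chosen backup."""
    return outage_at - restore_point


# Hypothetical scenario: nightly backups at 02:00 from 1 to 10 March.
backups = [datetime(2024, 3, day, 2, 0) for day in range(1, 11)]
compromised_at = datetime(2024, 3, 6, 14, 30)  # forensic estimate (assumed)
detected_at = datetime(2024, 3, 10, 9, 0)      # when the attack was detected (assumed)

# Cyber recovery: only backups taken before the compromise can be trusted.
restore_point = last_known_good(backups, compromised_at)
cyber_loss = data_loss_window(restore_point, detected_at)

# Conventional IT DR: restore from the most recent backup.
latest_backup = max(b for b in backups if b <= detected_at)
dr_loss = data_loss_window(latest_backup, detected_at)

print(cyber_loss)  # 4 days, 7:00:00
print(dr_loss)     # 7:00:00
```

In this made-up scenario the same backup schedule gives a seven-hour loss window for a conventional recovery and more than four days for a cyber recovery, purely because the newer backups can no longer be trusted.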
How to manage cyber recovery complexity with Cutover
Once you have successfully detected and contained the cyber attack as outlined by NIST and other cyber agencies, here are the key steps to managing cyber recovery and strengthening your cyber recovery posture:
1. Cyber recovery orchestration and automation
Our customers use Cutover to orchestrate cyber recoveries once the immediate threat has been neutralized. Depending on the nature of the attack it might be necessary to first recover the control plane - ensuring network-based operations, authentication, and access management systems are all restored as quickly as possible when impacted by the attack.
The next step involves recovering all of the critical or important business services and data to clean, bare metal hardware, either in the cloud or on premises. Individual runbooks can be used to recover hundreds if not thousands of applications and services from the last known good sources to ensure there are no continuing malware threats. Using Cutover’s integrations with infrastructure as code tools such as Ansible, you can orchestrate the sequence of tasks that provisions the hardware, applications, and data. Integrity checks can also be built into runbooks to verify that everything is running as expected before systems are placed back into the production network.
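The ordering described above (control plane first, then clean hardware, applications and data, then an integrity check before rejoining production) can be sketched as a small dependency graph. This is a generic Python illustration using the standard library, not Cutover’s runbook format or API. The task names, `run_runbook`, and `passes_integrity_check` are hypothetical.

```python
import hashlib
from graphlib import TopologicalSorter

# Each task maps to the set of tasks that must finish before it can start.
# Task names are illustrative assumptions.
RUNBOOK = {
    "restore_network": set(),
    "restore_authentication": {"restore_network"},
    "restore_access_management": {"restore_authentication"},
    "provision_clean_hardware": {"restore_access_management"},
    "deploy_application": {"provision_clean_hardware"},
    "restore_data": {"provision_clean_hardware"},
    "integrity_check": {"deploy_application", "restore_data"},
    "rejoin_production_network": {"integrity_check"},
}


def execution_order(runbook):
    """Return one valid order in which every task follows its dependencies."""
    return list(TopologicalSorter(runbook).static_order())


def passes_integrity_check(restored: bytes, expected_sha256: str) -> bool:
    """Compare restored content against a digest recorded from a known good source."""
    return hashlib.sha256(restored).hexdigest() == expected_sha256


def run_runbook(runbook, actions):
    """Run tasks in dependency order and stop at the first task that reports failure.

    `actions` maps a task name to a callable returning True on success.
    Tasks without an entry are treated as succeeding (a simplification).
    Returns (completed_tasks, failed_task_or_None).
    """
    completed = []
    for task in execution_order(runbook):
        if not actions.get(task, lambda: True)():
            return completed, task
        completed.append(task)
    return completed, None


# Usage: a tampered restore fails the check, so production is never rejoined.
known_good = b"known good application data"
expected = hashlib.sha256(known_good).hexdigest()
tampered = b"known good application data + payload"

completed, failed = run_runbook(
    RUNBOOK, {"integrity_check": lambda: passes_integrity_check(tampered, expected)}
)
print(failed)  # integrity_check
```

Modelling the runbook as a graph rather than a flat list means independent tasks, such as deploying the application and restoring data, could run in parallel, while the integrity check still gates the return to production.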
Cutover provides a single platform to recover from outages and cyber incidents such as ransomware by bringing together all automated and manual tasks via dynamic automated runbooks and real-time dashboards. It enables our customers to modernize their cyber recovery strategies and restore applications and data faster.
Because the Cutover platform also integrates with the other tools required in the process, it provides a single source of recovery execution and visibility. These tools can include your IT service management platform, configuration management database, infrastructure as code tools such as Ansible to kick off scripts, and communications tools such as Microsoft Teams, Slack, and Zoom.
2. Cyber recovery post-event analysis and improvement
When your recovery is finished it’s essential to analyze its successes and failures so you can continue to improve your recovery process and overall execution times. Cutover’s immutable audit trail and analytics reporting automatically record the details of every action taken so you can clearly see when there were delays or issues and how the plan can be improved. These auto-generated audit logs can also help you meet cyber regulatory requirements by making reporting to the regulator simple.
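As an illustration of the kind of analysis an audit trail makes possible, the sketch below compares planned and actual task durations and lists the overruns, largest first. The `AuditRecord` structure, its field names, and the sample timings are assumptions made for this example. They do not represent Cutover’s audit log format.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class AuditRecord:
    """One completed runbook task (hypothetical record layout)."""
    task: str
    started: datetime
    finished: datetime
    planned: timedelta

    @property
    def actual(self) -> timedelta:
        return self.finished - self.started


def overruns(records):
    """Return (task, delay) pairs for tasks that exceeded their plan, largest delay first."""
    late = [(r.task, r.actual - r.planned) for r in records if r.actual > r.planned]
    return sorted(late, key=lambda item: item[1], reverse=True)


# Sample data, invented for illustration.
day = datetime(2024, 3, 10)
records = [
    AuditRecord("restore_authentication", day.replace(hour=9), day.replace(hour=9, minute=40), timedelta(minutes=30)),
    AuditRecord("provision_clean_hardware", day.replace(hour=9, minute=40), day.replace(hour=11, minute=10), timedelta(minutes=60)),
    AuditRecord("restore_data", day.replace(hour=11, minute=10), day.replace(hour=12), timedelta(minutes=60)),
]

for task, delay in overruns(records):
    print(f"{task}: {delay} over plan")
```

With these sample records, hardware provisioning shows the largest overrun, which is where the next round of runbook improvements would most likely focus.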
3. Cyber recovery planning, testing and preparation
The process of cyber recovery is a cycle. Post-event analysis should create a feedback loop to help you improve and exercise your recovery runbooks to ensure you’re constantly prepared for the next threat. Cutover can help you simplify your testing and create repeatable, templated runbooks from well-defined recovery plans so you can exercise, test and confidently recover from a real cyber attack.