March 22, 2024

What is an IT disaster recovery plan (DRP) and how can you protect your business?

In today’s fast-changing digital landscape, businesses face cyber threats and IT challenges that can disrupt operations and cause significant losses. An IT disaster recovery plan (DRP) is not just a precaution; it is a crucial part of any organization’s risk management strategy. This article explains what an IT DRP is, details the components it should contain, shows why an effective DRP is so important, and introduces automated runbooks as a way to build better DRPs.

What is an IT disaster recovery plan (DRP)?

An IT DRP documents how an organization needs to respond in the event of an IT disaster such as a network failure, cloud outage, or cyber attack. It outlines the steps that teams need to take to get systems back up and running and includes recovery time objectives (RTOs) defining the maximum amount of time this should take. DRPs can take various forms, such as documents, spreadsheets, or playbooks, but a codified and executable runbook format will yield the best results.

DRPs will vary with the organization and the systems or applications being restored, depending on factors such as company size and application criticality. Different scenarios may also require different responses. For example, you may need to build a specific cloud disaster recovery plan for your cloud-based applications, which will differ from the plan for restoring on-premises applications.

So what does having a DRP mean in practice and what does it contain?

What is included in an IT disaster recovery plan?

In IT, a DRP includes all the steps to recover your applications and services after an outage. Below are the things you need to create a disaster recovery plan:

Infrastructure and applications ranked by criticality

Your IT disaster recovery plan (DRP) should include information about the infrastructure or applications to be recovered and their level of criticality. There are typically four tiers of technology criticality:

  • Tier 1: Mission-critical services, such as an online banking system, that will have direct impacts on customers and potentially the wider economy if they go down, can cause immediate reputational and financial losses to the organization, and carry the highest regulatory penalties for non-compliance. These services require continuous availability with zero tolerance for downtime.
  • Tier 2: Business-critical services, such as accounting software, that will cause significant impact on customer services and operations and/or prevent the collection of revenue. These services also require continuous availability but downtime is not as catastrophic as in tier 1. In the case of a disaster, RTOs will be very short.
  • Tier 3: Business operational services, such as those associated with production and procurement, that are non-critical but if not available will reduce efficiency and increase the cost of operations. The RTOs for these services, though still short, will have more leeway than tier 2.
  • Tier 4: Administrative services used only by internal users, such as communications and calendar management. Downtime here will reduce individual performance and productivity within the organization but won’t directly or significantly impact customers or revenue. In the case of a large-scale disaster or outage, these will be the last priority for recovery after the other three tiers have been recovered.

Organizing your DRPs based on the criticality of the applications or services being recovered ensures that, if a large-scale disaster does occur, the most efficient path to recovery is taken and critical services are prioritized.
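
As a rough illustration of tier-based prioritization, the sketch below sorts a service inventory by criticality tier and then by RTO to produce a recovery order. The service names, tier assignments, and RTO values are invented for the example and are not taken from any real plan.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Service:
    name: str
    tier: int         # 1 = mission-critical ... 4 = administrative
    rto_minutes: int  # hypothetical recovery time objective


def recovery_order(services):
    """Return services sorted by tier first, then by shortest RTO."""
    return sorted(services, key=lambda s: (s.tier, s.rto_minutes))


# Hypothetical inventory, listed in no particular order.
inventory = [
    Service("calendar", 4, 1440),
    Service("online-banking", 1, 0),
    Service("procurement", 3, 240),
    Service("accounting", 2, 60),
]

ordered_names = [s.name for s in recovery_order(inventory)]
print(ordered_names)
```

Sorting on the pair (tier, RTO) keeps the tier as the primary priority, so a tier 3 service with a short RTO never jumps ahead of a tier 2 service.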

Service-oriented disaster recovery plans

Service-oriented recovery plans are individual plans detailing how to recover each application or service. These should describe how to recover each function and the steps required to bring them back online, including both the technical and business steps that will need to be taken. Technical steps will include the specifics of your organization’s backup procedures and how to recover to the last known good backup, while business steps may involve an action plan for internal and external communications and alerting the regulator. The recovery plan for each service and all the steps within it should each be assigned to the relevant person or role.
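
One way to picture a service-oriented plan is as structured data in which every step carries an owner. The sketch below is a minimal example under that assumption; the `Step` and `RecoveryPlan` classes, the field names, and the roles are hypothetical and do not represent any particular product’s format.

```python
from dataclasses import dataclass, field


@dataclass
class Step:
    description: str
    kind: str   # "technical" or "business"
    owner: str  # role responsible; an empty string means unassigned


@dataclass
class RecoveryPlan:
    service: str
    steps: list = field(default_factory=list)

    def unassigned_steps(self):
        """List the descriptions of steps that have no owner yet."""
        return [s.description for s in self.steps if not s.owner.strip()]


# Hypothetical plan for one service, mixing technical and business steps.
plan = RecoveryPlan("accounting", [
    Step("Restore from last known good backup", "technical", "database administrator"),
    Step("Send internal status update", "business", "communications lead"),
    Step("Alert the regulator", "business", ""),
])

missing = plan.unassigned_steps()
print(missing)
```

A check like `unassigned_steps` could run whenever a plan is edited, so gaps in ownership surface before a real outage rather than during one.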

IT disaster recovery plan templates can help you ensure consistency across your plans and create and modify new ones quickly, to reduce the manual effort of service-level DRP creation.

Automation and integrations

Your IT disaster recovery processes will likely need to use data from various IT service management or business continuity management tools such as ServiceNow, Remedy, or JIRA. Integrating these with your recovery plans will ensure that the correct data is being used and reduce human error and manual work for the people executing the recovery.

Integrations with the communications tools you’re already using, such as Slack, Microsoft Teams, and email, will also ensure better collaboration across the organization and make internal communications about the recovery process more efficient and transparent.

Recovery time objectives

As mentioned above, each service or application will have a set RTO depending on its level of criticality. The time that the recovery is supposed to take should be included in the plan itself. Recovery time actuals (RTAs) should also be measurable, so you can compare with RTOs and find areas in need of improvement.
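
The comparison between RTO and RTA is simple arithmetic on timestamps. The sketch below assumes you record when the outage started and when the service was restored; the function names, the one-hour RTO, and the timestamps are illustrative only.

```python
from datetime import datetime, timedelta


def recovery_time_actual(outage_start, service_restored):
    """RTA is the elapsed time between outage start and restoration."""
    return service_restored - outage_start


def rto_report(rto, rta):
    """Compare an RTA against its RTO and report any overrun."""
    overrun = max(rta - rto, timedelta(0))
    return {"rto_met": rta <= rto, "overrun": overrun}


# Hypothetical values: a one-hour RTO and a recovery that took 1 h 25 min.
rto = timedelta(hours=1)
rta = recovery_time_actual(
    datetime(2024, 3, 22, 9, 0),
    datetime(2024, 3, 22, 10, 25),
)

report = rto_report(rto, rta)
print(report)
```

Tracking the overrun per service, rather than only a pass/fail flag, shows where improvement effort is most needed.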

Tracking and audit

Many major organizations have regulatory reporting requirements for disaster recovery. The urgency of a disaster recovery execution can make it difficult to gather the right information after the fact to report to regulators. A recovery execution platform with built-in tracking, which creates an indelible audit trail of how your DRP was actually executed, makes this process much simpler. The same is true of testing events, as an accurate record of successful testing aids accurate and timely regulatory reporting. Cutover’s automated audit trail facilitates a 60% reduction in audit preparation time.
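
To give a feel for what makes an audit trail hard to alter after the fact, the sketch below chains log entries together with SHA-256 hashes so that any later edit is detectable. This is a generic tamper-evidence technique shown under assumed field names, not a description of how Cutover or any other platform implements its audit trail.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder hash for the first entry


def _digest(body):
    """Hash an entry's fields in a stable key order."""
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode("utf-8")).hexdigest()


def append_entry(log, actor, action, timestamp):
    """Append an entry that embeds the hash of the previous entry."""
    prev_hash = log[-1]["hash"] if log else GENESIS
    body = {"actor": actor, "action": action,
            "timestamp": timestamp, "prev_hash": prev_hash}
    log.append({**body, "hash": _digest(body)})


def verify(log):
    """Return True only if every entry is unmodified and correctly linked."""
    prev_hash = GENESIS
    for entry in log:
        body = {k: v for k, v in entry.items() if k != "hash"}
        if entry["prev_hash"] != prev_hash or entry["hash"] != _digest(body):
            return False
        prev_hash = entry["hash"]
    return True


# Hypothetical recovery actions recorded during an execution.
audit_log = []
append_entry(audit_log, "dba-on-call", "restore from last known good backup",
             "2024-03-22T09:05:00Z")
append_entry(audit_log, "comms-lead", "notify regulator", "2024-03-22T09:20:00Z")

intact = verify(audit_log)

# Simulate someone editing a record after the fact.
audit_log[0]["action"] = "edited after the fact"
tampered_detected = not verify(audit_log)
print(intact, tampered_detected)
```

A hash chain only makes tampering evident; a production system would also need to protect the log itself, for example by storing it where entries cannot be rewritten.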

Testing the IT disaster recovery plan

Testing your DRPs in a practical way is essential to ensure they succeed in a real-world scenario and to prove to your management team and industry regulators that you can recover. The best way to do this is to practice your disaster recoveries as close to a real scenario as you can, with the same tools and teams, so you know you’re fully prepared for a real event when minimal planning time is available.

The importance of an IT DRP for businesses

Without effective DRPs, your organization is open to a number of risks, including data loss, reduced productivity, increased costs, reputational damage, and loss of customers. The format in which DRPs are created is also crucial to their success. Our survey of 300 IT decision makers found that 24% of the surveyed organizations were using manual DRPs and 59% were still using spreadsheets. Executable and automated DRPs provide more functionality for regular testing and faster, more successful recovery execution.

Slow recovery is extremely costly. Pingdom estimated these average costs of downtime per industry:

  • Finance: $5 million per hour
  • Enterprise: $1 million-$5 million per hour
  • Auto: $3 million per hour
  • Energy: $2.48 million per hour
  • Telecommunications: $2 million per hour
  • IT: $145,000-$450,000 per hour
  • Manufacturing: $260,000 per hour

Having effective DRPs that are well tested, enable fast orchestration, and can be actioned immediately when an outage is detected can significantly reduce outage time and therefore cost.
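
To make the cost argument concrete, the sketch below multiplies the single-figure hourly estimates listed above by an outage duration and compares two durations. The outage lengths of 4 hours and 1.5 hours are hypothetical, and the industries quoted as ranges (enterprise and IT) are left out to avoid picking an arbitrary point within the range.

```python
# Hourly downtime estimates in USD, taken from the list above.
HOURLY_COST_USD = {
    "finance": 5_000_000,
    "auto": 3_000_000,
    "energy": 2_480_000,
    "telecommunications": 2_000_000,
    "manufacturing": 260_000,
}


def downtime_cost(industry, hours):
    """Estimated cost of an outage lasting the given number of hours."""
    return HOURLY_COST_USD[industry] * hours


def savings(industry, hours_before, hours_after):
    """Estimated cost avoided by shortening an outage."""
    return downtime_cost(industry, hours_before) - downtime_cost(industry, hours_after)


# Hypothetical scenario: a manufacturing outage cut from 4 hours to 1.5 hours.
saved = savings("manufacturing", 4, 1.5)
print(f"${saved:,.0f}")
```

These are averages, so the result is an order-of-magnitude estimate rather than a forecast for any specific organization.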

The correct IT disaster recovery plan strategy is essential

It’s not enough just to have static, spreadsheet- or document-based DRPs in place. Having the right software for your disaster recovery plan templates, testing, and execution is essential to success. Cutover’s automated runbooks can help you increase your level of recovery maturity and respond more quickly to outages.

Cutover’s Collaborative Automation SaaS platform enables enterprises to simplify complexity, streamline work, and increase visibility. Cutover’s automated runbooks connect teams, technology, and systems, increasing efficiency and reducing risk in IT disaster and cyber recovery, cloud migration, release management, and technology implementation. Cutover is trusted by world-leading institutions, including the three largest US banks and three of the world’s five largest investment banks.

Ready to build your automated DRPs?

This article has covered what a DRP is, what it should contain, and the advantages of having effective DRPs in place. Book a demo to see how Cutover can help you build an automated IT DRP.

Chloe Lovatt
IT Disaster Recovery