Reducing the time to test thousands of disaster recovery plans
An American investment bank needed to implement a comprehensive resilience solution.
Like other financial services firms, they had an obligation to demonstrate their ability to deal with a catastrophic scenario in a timely manner. They needed the capability to pull together thousands of standardized technical recovery plans (TRPs) into test scenarios in minutes rather than weeks. The operational resilience testing function involved thousands of people globally and up to 2,000 applications that needed to be tested every year.
The existing home-grown system was inadequate. It did not provide the planning, visibility, communication, orchestration, or observability needed to ensure success, nor the level of resilience assurance that regulators required. The bank needed a better way to store and execute the TRPs for its 2,000 services.
A single data center recovery test for improved disaster recovery
Cutover provides a comprehensive operational resilience platform that hosts thousands of TRPs that can be configured into various test scenarios in minutes.
Now that the bank is using Cutover, a single data center recovery (DCR) test can encompass more than 300 TRPs, each of which can be prepared and managed independently and then merged into Cutover for orchestration and enterprise observability. Having TRPs on Cutover also allows the bank to standardize them using a template, so minimal work is needed to finalize DCR test runbooks. During the test itself, Cutover gives users the status information and updates they need without manual effort, for example by highlighting the critical path so that it is easy to visualize.
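To illustrate what "critical path" means when many recovery tasks are merged into one runbook, here is a minimal sketch in Python. It is not Cutover's implementation or API. The task names, durations, and the `critical_path` function are hypothetical, and the sketch assumes each task has a fixed duration in minutes and a list of tasks it depends on.

```python
from graphlib import TopologicalSorter


def critical_path(durations, depends_on):
    """Return (total_minutes, task_names) for the longest dependency chain.

    durations:  hypothetical mapping of task name -> duration in minutes
    depends_on: hypothetical mapping of task name -> tasks that must finish first
    Every task named in depends_on is assumed to also appear in durations.
    """
    graph = {task: depends_on.get(task, ()) for task in durations}
    finish = {}  # earliest finish time of each task
    prev = {}    # predecessor that determines each task's start time

    # static_order() yields every task after all of its predecessors
    for task in TopologicalSorter(graph).static_order():
        latest = max(graph[task], key=lambda p: finish[p], default=None)
        start = finish[latest] if latest is not None else 0
        finish[task] = start + durations[task]
        prev[task] = latest

    # Walk back from the task that finishes last to recover the chain
    node = max(finish, key=finish.get)
    total = finish[node]
    path = []
    while node is not None:
        path.append(node)
        node = prev[node]
    return total, path[::-1]


# Illustrative data only, not taken from the bank's runbooks
durations = {"failover_db": 30, "restore_app": 20, "update_dns": 5, "validate": 15}
depends_on = {"restore_app": ["failover_db"], "validate": ["restore_app", "update_dns"]}
total, path = critical_path(durations, depends_on)
print(total, path)
```

Any delay to a task on the returned chain delays the whole test, which is why a runbook tool would highlight it. Tasks off the chain, such as `update_dns` in this example, have slack.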
To increase efficiency and resilience, the bank also created templated TRPs in Cutover. This made it quicker and easier for users to find, review, edit, and execute TRPs and build new ones based on templates, and to collect data on executed TRPs.
Other ways Cutover improved the process:
- Auto-calculation of recovery time objective through structured technical recovery plans
- Fewer issues and better decision making through standardized and observable event execution
- The ability to benchmark performance in data center tests against previous runs
- Integration with existing apps to provide observability across the entire process
- Better compliance through demonstrable evidence of testing and the associated timings
- Robust auditability and reporting that met all audit requirements
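As a rough illustration of the first and third points above, the following Python sketch shows how recording step timings in a structured plan makes it possible to total a recovery time, check it against a target, and compare one test run with a previous one. This is an assumption-laden sketch, not Cutover's data model: the `Step` class, the function names, and the sample timings are all hypothetical, and steps are assumed to run one after another.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    """One step of a hypothetical structured recovery plan."""
    name: str
    minutes: float


def total_minutes(steps):
    # Assumes steps run sequentially, so the recovery time is a plain sum
    return sum(step.minutes for step in steps)


def meets_target(steps, target_minutes):
    # True when the summed recovery time is within the stated target
    return total_minutes(steps) <= target_minutes


def benchmark(current, previous):
    """Per-step change in minutes versus a previous run (negative means faster).

    Steps that do not appear in the previous run are skipped.
    """
    before = {step.name: step.minutes for step in previous}
    return {
        step.name: step.minutes - before[step.name]
        for step in current
        if step.name in before
    }


# Illustrative timings only
previous_run = [Step("failover_db", 40), Step("restore_app", 25)]
current_run = [Step("failover_db", 30), Step("restore_app", 20)]

print(total_minutes(current_run))
print(meets_target(current_run, target_minutes=60))
print(benchmark(current_run, previous_run))
```

Because every run records the same named steps, the comparison is a simple lookup. Free-text plans offer no equivalent, since the timings would first have to be extracted by hand.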
Better informed, faster disaster recovery
Using Cutover, the bank reduced event planning time by 70% and easily facilitated and recorded all 143,000 completed tasks across its 10,000 users.
The bank is now in the process of decommissioning an existing system in favor of Cutover. The team running the event is now better informed during data center tests, which helps to improve decision making. The team is also better able to collaborate with the auditor and meet audit points, as Cutover automatically provides a record of all activity.
Due to the success of past and current resilience activities, Cutover is also being used to support building power downs and to orchestrate some of the infrastructure parts of resilience testing, including the application testing part of DCR tests.