The challenge: an American investment bank needed to implement a comprehensive resilience solution
Like other financial services firms, the bank had an obligation to demonstrate that it could deal with a catastrophic scenario in a timely manner. It needed the capability to pull together thousands of standardized technical recovery plans (TRPs) into test scenarios in minutes rather than weeks. The operational resilience testing function involved thousands of people globally and up to 2,000 applications that needed to be tested every year.
The existing home-grown system for this process was inadequate: it did not provide the level of planning, visibility, communication, orchestration or observability needed to ensure success, nor the level of resilience assurance needed to meet regulatory requirements. The bank needed a better way to store and execute the TRPs for its 2,000 services.
Cutover provides a comprehensive operational resilience platform that hosts thousands of TRPs that can be configured into various test scenarios in minutes.
Now that the bank is using Cutover, a single data center recovery (DCR) test, which can encompass more than 300 TRPs, can be prepared and managed independently and then merged into Cutover for orchestration and enterprise observability. Having TRPs on Cutover also allows the bank to standardize them using a template, so there is minimal work to finalize DCR test runbooks. During the test itself, Cutover gives users the status information and updates they need without manual effort; for example, the critical path is highlighted so that users can see it at a glance.
To increase efficiency and resilience, the bank also created templated TRPs in Cutover. This made it quicker and easier for users to find, review, edit, and execute TRPs and build new ones based on templates, and to collect data on executed TRPs.
Other ways Cutover improved the process:
- Improved compliance, with demonstrable evidence of testing and the associated timings
- Auto-calculated the recovery time objective through structured technical recovery plans
- Provided the ability to benchmark performance in data center tests against previous runs
- Integrated with existing apps to provide observability across the entire process
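To illustrate how structured plans make timing calculations and critical path highlighting possible, here is a minimal sketch in Python. It is not Cutover's implementation, which this case study does not describe. It assumes a TRP can be modeled as tasks with durations and predecessors, and that the estimated recovery time is the length of the longest dependency chain. The task names, durations, `rto_minutes`, and `previous_run_minutes` are all hypothetical.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical structured TRP: task name -> (duration in minutes, predecessors).
# Tasks and timings are invented for illustration only.
trp = {
    "declare_incident": (5, []),
    "fail_over_database": (30, ["declare_incident"]),
    "redirect_network": (10, ["declare_incident"]),
    "restart_app_servers": (20, ["fail_over_database", "redirect_network"]),
    "validate_service": (15, ["restart_app_servers"]),
}


def critical_path(tasks):
    """Return (total_minutes, task names) for the longest dependency chain."""
    graph = {name: set(preds) for name, (_, preds) in tasks.items()}
    finish = {}  # earliest finish time of each task, in minutes
    via = {}     # the predecessor that determines when each task can start
    # static_order() yields every task after all of its predecessors.
    for name in TopologicalSorter(graph).static_order():
        duration, preds = tasks[name]
        latest = max(preds, key=lambda p: finish[p], default=None)
        via[name] = latest
        finish[name] = duration + (finish[latest] if latest is not None else 0)
    # Walk back from the task that finishes last to recover the path.
    node = max(finish, key=finish.get)
    total = finish[node]
    path = []
    while node is not None:
        path.append(node)
        node = via[node]
    return total, path[::-1]


total, path = critical_path(trp)
rto_minutes = 90           # hypothetical recovery time objective
previous_run_minutes = 82  # hypothetical result from an earlier test run

print(f"Estimated recovery time: {total} min via {' -> '.join(path)}")
print(f"Within RTO: {total <= rto_minutes}")
print(f"Change vs previous run: {total - previous_run_minutes:+d} min")
```

Because each plan is stored as structured data rather than free text, the same calculation can be repeated for every test run, which is what makes benchmarking against previous runs straightforward.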
The bank is now in the process of decommissioning an existing system in favor of Cutover.
The team running the event is now better informed during data center tests, which helps improve decision making. The team is also better able to collaborate with the auditor and meet audit points, because Cutover automatically provides a record of all activity.
Due to the success of past and current resilience activities, Cutover is also being used to support building power downs and to orchestrate some of the infrastructure parts of resilience testing, including the application testing part of DCR tests.