The Cutover platform and automated runbooks orchestrate your teams and technologies to standardize and automate technology operations processes. Example processes include cyber recovery, IT disaster recovery, cloud migration, and release management, where we help our clients increase efficiency and reduce risk.
Our most common use case is cyber and IT disaster recovery. This post sets out a summary of how we support that use case and answers the top three questions our clients ask about it.
How Cutover supports cyber and IT disaster recovery
For our typical customers, recovery plans are stored in Cutover as runbook templates, associated with each service or application. Depending on how the organization operates, recovery steps for shared services and infrastructure can exist within those plans or as separate recovery plans.
There are six key areas of the Cutover platform:
- A managed repository of recovery runbooks, linked to configuration management database (CMDB) services, to enable rapid mobilization spanning hundreds or thousands of applications.
- Automated runbooks to orchestrate the sequence of tasks and communications across human and machine activities in real time.
- Real-time reporting and analytics during an event to enable efficient control, visibility and stakeholder engagement at scale.
- APIs and integrations to automate repetitive, manual tasks and connect to any application across your recovery technology stack.
- Post-execution analytics to drive continuous improvement, ensuring lessons learned are incorporated into updated recovery plans.
- Audit trail and auto-generated compliance logs and reports to support audits and regulatory reporting.
Cutover for cyber and IT disaster recovery compared with process, workflow and automation tooling
Many process or workflow tools allow a specialist group to design, build and publish a workflow that then executes repeatedly at volume with little deviation from the expected flow. The process may be complex, but it is typically static, or at least predictable, with defined decision or branching points.
Cutover runbooks are different because they enable:
- Creation at scale, for example, application owners creating thousands of their own specific recovery plans.
- Frequent changes, for example, a recovery plan may change with code/infrastructure changes.
- The ability to respond to external factors that drive volatility during execution, by quickly editing tasks or updating them automatically based on conditions at the time.
- Sequencing a mix of human and automated steps, providing a path to greater end-to-end automation.
- Fast response to time-critical scenarios such as recovering a tier one service within one hour and operating in units of minutes and seconds.
- Command and control oversight with needed telemetry to drive escalations and quick decision making.
Cutover and ITSM tools for cyber and IT disaster recovery
IT Service Management (ITSM) tools, such as ServiceNow, are important in the recovery process as they are the ‘system of record’ for CMDB and governance (problem, incident and change). Cutover integrates with ITSM tools to serve as the ‘system of execution’ for the entire recovery process. A Cutover runbook is associated with a record in the CMDB and that runbook’s execution will be linked to an incident or change record.
Cutover integrates with ITSM tools in the following ways:
- Templated recovery plan runbooks are associated with the corresponding CMDB record. This allows for real-time filtering of runbooks based on metadata held in the CMDB.
- The creation, progression and closure of change request tickets from within a Cutover runbook.
- The ability to populate ITSM fields from values updated in Cutover during the execution of, for example, a recovery process. Your ITSM platform remains the golden source, and there is no need to key the data twice.
- The ability to hold status data in Cutover during the execution of a recovery across your estate, when the ITSM tool itself may be unavailable.
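The last point, holding status data while the ITSM tool is unavailable, follows a general buffer-and-forward pattern. The sketch below illustrates that pattern only and does not show Cutover's implementation or API: `StatusBuffer` and the `send` callable are hypothetical names, and it assumes that an unreachable ITSM tool surfaces as a `ConnectionError`.

```python
from collections import deque
from typing import Callable, Deque, Dict

Update = Dict[str, str]


class StatusBuffer:
    """Hold status updates locally and forward them once the ITSM tool is reachable."""

    def __init__(self, send: Callable[[Update], None]) -> None:
        # `send` is a hypothetical function that writes one update to the ITSM tool.
        self._send = send
        self._pending: Deque[Update] = deque()

    def record(self, update: Update) -> None:
        """Store the update, then try to forward everything that is pending."""
        self._pending.append(update)
        self.flush()

    def flush(self) -> int:
        """Forward pending updates in order; return how many were sent."""
        sent = 0
        while self._pending:
            try:
                self._send(self._pending[0])
            except ConnectionError:
                break  # ITSM tool unavailable: keep the update for a later attempt
            self._pending.popleft()
            sent += 1
        return sent

    @property
    def pending(self) -> int:
        return len(self._pending)
```

Updates are only removed from the buffer after a successful send, so the order of status changes is preserved when the ITSM tool comes back.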
The quality of cyber and IT disaster recovery plans
Having a repository of recovery plans is only valuable if they are high quality and can be relied upon. Cutover supports this in the following ways:
- Adherence to a set of quality standards within the runbook in terms of content - e.g. common stages, milestones, tasks, and team assignments.
- Use of Cutover’s template workflow for approval of application recovery templates with an audit trail of sign-offs.
- Automated triggering of a review cycle based on the change request closing on the underlying CMDB service.
- Active quality assurance of the set of recovery plans (querying the data set through Cutover’s API) to understand the state of plans, such as the number of approved and draft plans, the number of rehearsals per period per plan, tasks whose duration in plan execution is statistically odd, plans with defined teams and accountability, and the number of times the CMDB data has been updated.
- Using the test schedule (planned and unplanned) to identify errors in the plan and its associated configuration, then updating the parent template as an output.
- An active live check within the plan, using an integration, to detect configuration drift.
- Automatically updating the CMDB record as part of the runbook to ensure data quality.
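As an illustration of the quality assurance reporting described above, here is a minimal sketch that works on plan data already exported into plain Python dictionaries. The field names (`name`, `status`, `rehearsals`, `team`) and both function names are assumptions made for this example, not the shape of Cutover's API responses. The "statistically odd" test shown here, a latest run that takes more than three times the median of earlier runs, is one simple choice among many.

```python
from collections import Counter
from statistics import median
from typing import Dict, List


def summarize_plans(plans: List[dict]) -> dict:
    """Count plans by status and list plans that need attention."""
    status_counts = Counter(p["status"] for p in plans)
    return {
        "by_status": dict(status_counts),
        "never_rehearsed": [p["name"] for p in plans if p["rehearsals"] == 0],
        "missing_team": [p["name"] for p in plans if not p["team"]],
    }


def odd_duration_tasks(history: Dict[str, List[float]], factor: float = 3.0) -> List[str]:
    """Flag tasks whose latest duration exceeds `factor` times the median of earlier runs."""
    flagged = []
    for task, durations in history.items():
        if len(durations) < 3:
            continue  # too little history to judge
        *earlier, latest = durations
        if latest > factor * median(earlier):
            flagged.append(task)
    return flagged


if __name__ == "__main__":
    plans = [
        {"name": "payments", "status": "approved", "rehearsals": 2, "team": "Payments SRE"},
        {"name": "ledger", "status": "draft", "rehearsals": 0, "team": ""},
    ]
    print(summarize_plans(plans))
    # Durations in minutes, oldest first (illustrative values only).
    print(odd_duration_tasks({"restore db": [10, 12, 11, 45], "flip dns": [2, 2, 3, 2]}))
```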
Sequencing IT and cyber disaster recovery plans in an event
Recovery in the event of an outage or cyber attack typically requires services to be recovered in a certain order. Examples include recovering core infrastructure services and the control plane before individual applications and then recovering services in an order based on their criticality.
Cutover supports this in a number of ways:
- The order and sequencing of recovery can be stored in Cutover’s templated master recovery plans that relate to a scenario. These store the overall flow and sequence for a given scenario and link out to child ‘linked runbooks’ for the application-level plans.
- Alternatively, the structure can be stored elsewhere and Cutover’s API used to construct the same parent-to-child structure and sequence. If auto-discovery tools exist, the master recovery plan can be automatically updated using the Cutover API.
- Each component plan can be dynamic and contain a live check to validate the status with automated tests before the recovery commences. This allows for a level of redundancy in plans without the risk of triggering a recovery in error.
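The ordering rule described above (dependencies first, then criticality) can be sketched as a topological sort. This is an illustration under stated assumptions, not how Cutover builds master recovery plans: the `depends_on` and `tier` mappings and the `recovery_waves` function are hypothetical, and a lower tier number is taken to mean a more critical service.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+
from typing import Dict, List, Set


def recovery_waves(depends_on: Dict[str, Set[str]], tier: Dict[str, int]) -> List[List[str]]:
    """Group services into waves: each wave depends only on earlier waves.

    Within a wave, services are ordered by tier (lower number first), then by name.
    """
    sorter = TopologicalSorter(depends_on)
    sorter.prepare()  # raises graphlib.CycleError if the dependencies are circular
    waves: List[List[str]] = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready(), key=lambda s: (tier.get(s, 99), s))
        waves.append(ready)
        sorter.done(*ready)  # mark the wave recovered so dependents become ready
    return waves


if __name__ == "__main__":
    # Each service maps to the set of services that must be recovered before it.
    depends_on = {
        "network": set(),
        "identity": {"network"},
        "payments": {"identity"},
        "intranet": {"identity"},
    }
    tier = {"network": 0, "identity": 0, "payments": 1, "intranet": 3}
    for number, wave in enumerate(recovery_waves(depends_on, tier), start=1):
        print(number, wave)
```

Grouping into waves rather than a single flat order makes it visible which application plans could run in parallel once their shared infrastructure is back.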
Customer success story: Top-tier US bank
As part of a large data center failover test, Cutover runbooks were used to fail over around 1,200 applications. The people involved in the event used Cutover to check status on demand and/or to complete tasks assigned to them alongside the integrated automated activities in the runbook.
- All recovery plans were synced with CMDB data to accurately sequence the correct applications.
- The CMDB data also gave stakeholders visibility into their particular responsibilities, including business line, tier criticality, etc.
- Runbooks all went through the QA process to ensure they were good quality and up to date using a mix of manual controls and automated reporting.
Using Cutover enabled the bank to carry out the data center failover more quickly and efficiently, with greater visibility and control.
Cutover is widely adopted by many of the world’s largest financial institutions to enable them to fail over thousands of applications and services and meet recovery time objectives, with governance, control, and visibility. In addition, the Cutover platform enables our customers to meet regulatory oversight requirements with immutable event audit trails. Cutover is used as the system of execution for both testing and live recovery events.