Disaster recovery testing is often limited to testing an organization’s ability to fail workloads over to the cloud or to a remote datacenter. And while these tests are undeniably important, on their own they may be inadequate. Disasters come in many different forms, and performing a basic failover test probably isn’t going to accurately convey the experience of recovering from a real disaster.
Disaster recovery testing must be about more than just failing over or recovering key systems. Testing should be holistic and oriented toward coping with every aspect of the disaster.
Ideally, all of the organization’s key players should collectively formulate a set of policies, procedures, and checklists that can be used in times of disaster. Disaster recovery tests should be designed to verify the accuracy, effectiveness, and completeness of these checklist items. In fact, disaster recovery testing should be something of a disaster simulation.
Imagine for a moment that the datacenter is destroyed by a fire. The first step in recovering from such a disaster should be to retrieve the disaster recovery checklist and begin working through the pre-established procedures. If this checklist is stored in someone’s desk, then the disaster recovery simulation is over before it has even begun, because the checklist would have presumably been destroyed along with the rest of the datacenter. At least one hardcopy of the checklist should be stored in a safe, offsite location. A digital copy should also be stored in the cloud.
One of the first steps in dealing with a large-scale disaster is alerting key stakeholders. Hence, a disaster recovery test should include contacting each of these people using the contact data listed in the checklist. This is the perfect opportunity to verify that the contact information is accurate and up to date.
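Between tests, one way to keep the contact list from going stale is to record when each entry was last confirmed and flag the old ones. The following is a minimal sketch in Python, not a prescribed tool: the function name `stale_contacts`, the 180-day threshold, and the sample names and dates are all assumptions made for illustration.

```python
from datetime import date, timedelta


def stale_contacts(contacts, today, max_age_days=180):
    """Return the names whose contact details need re-verification.

    `contacts` maps a person's name to the date their details were last
    confirmed, or to None if they have never been confirmed.
    The 180-day default is an arbitrary, illustrative threshold.
    """
    cutoff = today - timedelta(days=max_age_days)
    return [
        name
        for name, last_verified in contacts.items()
        if last_verified is None or last_verified < cutoff
    ]


# Hypothetical sample data.
contacts = {
    "Alice": date(2024, 1, 10),
    "Bob": date(2023, 1, 1),
    "Carol": None,
}
print(stale_contacts(contacts, today=date(2024, 3, 1)))
```

Anyone the function returns would be a natural candidate for a phone call or message during the next test.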
A good disaster recovery checklist should identify the individual tasks that need to be completed, along with the person (and an alternate) responsible for completing each one. It should also describe, in detail, every step required to perform each task. Remember that the disaster recovery plan needs to be comprehensive and legally defensible in case it is ever called into question by regulators or auditors.
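If the checklist is also kept in a structured, machine-readable form, some of these completeness requirements can be checked automatically. The sketch below assumes a hypothetical `RecoveryTask` record and a `find_gaps` helper; neither comes from any particular tool, and the field names and sample tasks are illustrative only.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class RecoveryTask:
    """One checklist item (hypothetical structure for illustration)."""
    name: str
    owner: str                 # person responsible for the task
    alternate: str = ""        # backup person if the owner is unavailable
    steps: List[str] = field(default_factory=list)


def find_gaps(tasks):
    """Return a list of human-readable problems found in the checklist."""
    problems = []
    for task in tasks:
        if not task.owner.strip():
            problems.append(f"{task.name}: no owner assigned")
        if not task.alternate.strip():
            problems.append(f"{task.name}: no alternate assigned")
        elif task.alternate.strip() == task.owner.strip():
            problems.append(f"{task.name}: alternate is the same person as the owner")
        if not task.steps:
            problems.append(f"{task.name}: no steps documented")
    return problems


# Hypothetical sample checklist.
checklist = [
    RecoveryTask("Restore database", "Alice", "Bob",
                 ["Retrieve latest backup", "Restore to standby server"]),
    RecoveryTask("Notify stakeholders", "Carol"),
]
for problem in find_gaps(checklist):
    print(problem)
```

A check like this only confirms that each task has an owner, a distinct alternate, and documented steps. Whether the steps are accurate still has to be established by working through them during a test.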
The degree to which disaster recovery testing can be performed will vary based on a number of different factors. Realistically, it may prove to be impossible to test everything due to costs or disruptions that would be caused by the tests. In those situations, the goal should be to test the plan to the greatest extent possible and to brainstorm and work through various “what if” scenarios for the parts of the plan that cannot be tested.
Ultimately, the testing process should help you verify the contact information for key employees and validate the individual disaster recovery procedures, while also taking into account the availability of key resources during times of disaster (power, internet connectivity, functional hardware, etc.). The testing procedures should also help you identify any deficiencies in the documentation, such as missing credentials, configuration data, or tech support contacts.