Preventing 3 common Disaster Recovery scenarios

The success of a disaster recovery strategy is often judged by whether it is actually implemented. Although no prevention method is 100% foolproof, avoiding risk and preparing proactively are essential elements of the disaster recovery process.

On the flip side, despite all the measures we take to avoid a disaster, we must assume that one will happen. Having this mindset will help shape our decisions when planning for IT disaster recovery scenarios.

Here we look at prevention and recovery methods for three such common scenarios.

Scenario 1: The OS becomes corrupt

This scenario considers a situation where the operating system of one of your servers becomes corrupt, but the underlying data is still there. This might be caused by a failed Windows update, malware, or a non-graceful shutdown.

How do I prevent this?

Always test Windows updates or new software prior to deployment. This should ideally be done on a test system containing the same image as the original server. Antivirus applications should also be kept up to date on every node in your network.

Be sure to restrict access to the server room to prevent malicious or accidental modifications to the physical aspects of the servers. Have a UPS in place to cater for power outages and handle graceful shutdowns.

How do I recover from this?

Use a standby image. The idea here is to restore from a snapshot of your system prior to when the OS became corrupt.
Perform a bare-metal restore. Once the OS is back up and running, use a backup application that supports delta recovery to quickly bring the underlying data back online.
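
To illustrate why delta recovery is quick, here is a minimal Python sketch that restores only the files that are missing or different on the target. It works at file level and assumes the backup is a plain directory tree; real backup applications typically work at block level against their own storage format. The names `file_digest` and `delta_restore` are made up for this example.

```python
import hashlib
import shutil
from pathlib import Path


def file_digest(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def delta_restore(backup_root: Path, target_root: Path) -> list[str]:
    """Copy only the files that are missing or different on the target.

    Returns the relative paths that were restored.
    """
    restored = []
    for source in sorted(backup_root.rglob("*")):
        if not source.is_file():
            continue
        relative = source.relative_to(backup_root)
        target = target_root / relative
        if target.is_file() and file_digest(target) == file_digest(source):
            continue  # unchanged, so nothing needs to be transferred
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(source, target)
        restored.append(relative.as_posix())
    return restored
```

Running `delta_restore` a second time against the same target should return an empty list, since every file already matches the backup.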

Scenario 2: The hard drive dies

This scenario considers a situation where the hard drive of one of your servers suddenly fails. This might be caused by a broken RAID set, overheating, or a mechanical fault.

How do I prevent this?

The occasional hardware fault is an accepted part of running modern IT equipment. However, there are things we can do to limit the impact of a failed drive. These include having the correct RAID configuration in place and/or a replication mechanism with auto-failover. Keeping IT equipment cool (e.g. using air conditioning) is also vital: equipment that operates in a sub-optimal physical environment is at greater risk of overheating.
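
What counts as the correct RAID configuration depends on how much capacity you are willing to trade for fault tolerance. As a rough illustration, the Python sketch below computes the usable capacity and the number of drive failures that each common RAID level is guaranteed to survive. The names `RaidSummary` and `raid_summary` are made up for this example, and the figures ignore formatting overhead and hot spares.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RaidSummary:
    usable_tb: float         # capacity left for data, in terabytes
    failures_tolerated: int  # drive failures survived in the worst case


# Minimum drive count per level (standard levels only).
MIN_DRIVES = {"0": 2, "1": 2, "5": 3, "6": 4, "10": 4}


def raid_summary(level: str, drives: int, drive_tb: float) -> RaidSummary:
    """Summarize capacity and worst-case fault tolerance for a RAID set."""
    if level not in MIN_DRIVES:
        raise ValueError(f"unsupported RAID level: {level}")
    if drives < MIN_DRIVES[level]:
        raise ValueError(f"RAID {level} needs at least {MIN_DRIVES[level]} drives")
    if level == "0":   # striping only, no redundancy
        return RaidSummary(drives * drive_tb, 0)
    if level == "1":   # every drive holds a full copy
        return RaidSummary(drive_tb, drives - 1)
    if level == "5":   # one drive's worth of parity
        return RaidSummary((drives - 1) * drive_tb, 1)
    if level == "6":   # two drives' worth of parity
        return RaidSummary((drives - 2) * drive_tb, 2)
    # RAID 10: striped mirrored pairs; losing both halves of a pair is fatal
    if drives % 2:
        raise ValueError("RAID 10 needs an even number of drives")
    return RaidSummary(drives / 2 * drive_tb, 1)
```

For example, `raid_summary("5", 4, 2.0)` describes four 2 TB drives in RAID 5: 6 TB usable, with one drive failure tolerated.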

How do I recover from this?

Use a backup solution that has Continuous Recovery and preconfigure a virtual machine (VM) to be on standby. When the production machine goes down, all you need to do is press play on the preconfigured VM.
Perform a bare-metal restore from local storage to new hardware.
Recover from the cloud using your MSP’s infrastructure, Azure, or AWS.
Restore from a mountable VHD. If you previously created a standby image of your computer, you can quickly open it in your virtual environment using Hyper-V or VirtualBox.

Scenario 3: The roof caves in

This scenario considers a situation where physical damage occurs to the building or room where your data resides. This is usually caused by a natural (environmental) disaster: hurricanes, tornadoes, floods, and excess snowfall are just some examples.

How do I prevent this?

This is probably the toughest scenario to prevent. ‘Acts of God’ (as defined by insurance companies) are unpredictable and can happen without any forewarning. The only sure-fire way to lessen the impact of a physical disaster is to have a replicated copy of the data and IT environment in a completely separate geographical location (e.g. in the cloud, at another regional office, or at your MSP).

How do I recover from this?

Recover from the cloud using your MSP’s infrastructure, Azure, or AWS.
Use a backup solution that has Continuous Recovery and preconfigure a virtual machine (VM) in a remote location to be on standby. When the production machines go down, all you need to do is press play on the preconfigured VM.

Note: To keep your business going, your Business Continuity Plan (BCP) should include your company’s strategy for how people can work from remote locations (e.g. their home).

Conclusion

I’ve always believed in the old adage of “prevention is better than cure”. This holds true for disaster recovery scenarios as well. Being proactive is a better approach than being reactive. However, when we are dealing with unpredictable situations beyond our control, we also need to ensure we have the people, processes and methods in place to react as quickly as possible and help bring things back to normal.

With this in mind, it would be wise to have a system in place that allows for speedy and reliable recovery of data. If you do this and run routine restore tests, you will be best placed to minimize the impact of any IT disaster scenario that you are faced with.
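
As one way to picture a routine restore test, the Python sketch below compares checksums of a reference copy against a test restore and reports anything missing, unexpected, or altered. It assumes both copies are ordinary directory trees that you can read side by side. The names `checksum_manifest` and `verify_restore` are made up for this example and do not belong to any particular backup product.

```python
import hashlib
from pathlib import Path


def checksum_manifest(root: Path) -> dict[str, str]:
    """Map each file's relative path to its SHA-256 hex digest.

    Reads each file fully into memory, which is fine for a sketch
    but not for very large files.
    """
    return {
        path.relative_to(root).as_posix(): hashlib.sha256(path.read_bytes()).hexdigest()
        for path in root.rglob("*")
        if path.is_file()
    }


def verify_restore(source_root: Path, restored_root: Path) -> dict[str, list[str]]:
    """Compare a restored tree against its source and list the differences."""
    expected = checksum_manifest(source_root)
    actual = checksum_manifest(restored_root)
    shared = expected.keys() & actual.keys()
    return {
        "missing": sorted(expected.keys() - actual.keys()),
        "unexpected": sorted(actual.keys() - expected.keys()),
        "mismatched": sorted(p for p in shared if expected[p] != actual[p]),
    }
```

A restore test passes when all three lists come back empty. Scheduling a check like this after each test restore turns "we have backups" into "we have backups that we know we can restore".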

By Andrew Tabona
