What are RTO and RPO in disaster recovery?
In disaster recovery, the recovery time objective (RTO) and recovery point objective (RPO) determine how long your organization can be down and how much data it can lose. So what is a “good” RPO or RTO? There is no single answer. A good RPO/RTO depends on the type of disaster as well as the maximum tolerable period of disruption.
First, it’s important to define the potential set of disasters against which you would like to protect your organization. Some disasters that require data recovery and backup include:
- Loss of data: This may be as simple as someone deleting a folder, or as complex as a case of ransomware or an infected database.
- Loss of an application: This happens when a security change, an update, or a system configuration change negatively impacts services.
- Loss of a system: This includes when hardware fails, or, if you have a virtual server, when the operating system crashes.
- Loss of business location: In this instance, a disaster might include an electrical outage, fire, flooding, or even a chemical spill outside the building. The business facilities require recovery to an alternate location.
- Loss of operations: This is a complete stoppage of business operations—i.e., the worst-case scenario.
Each of these potential scenarios illustrates how important it is to consider your data, systems, applications, and physical location in your disaster-recovery strategy. These factors play a role in the RTO and RPO values. Once you’ve defined the particular disaster scenarios you’re hoping to protect against, you can prioritize the scenarios your customer is most interested in preventing, then implement data-protection features that match their RTO and RPO requirements.
A third figure factors into your RTO/RPO strategy: the maximum tolerable period of disruption (MTPD). This represents how long your customer is able to crisis-manage a system outage, and varies for every application and service you manage. Factors that play into this figure include tangible costs like employee wages, lost sales, weakened stock prices, and recovery expenses, as well as intangibles like reputational risk. It’s important to discuss the MTPD with your customer, and then apply that number to your RTO/RPO reduction strategy.
For example, for a given application, your customer’s maximum tolerable period of disruption might be two hours. That means your recovery time objective must be less than two hours, and your data must be backed up more often than every two hours to meet the ideal RPO. This gives you the guideline you need to create a physical and virtual system that meets the needs of your customer in the event of a disaster.
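The two-hour example can be turned into a quick check. The sketch below is illustrative only: `RecoveryPlan` and `check_plan` are hypothetical names, and it assumes that worst-case data loss is roughly one full backup interval and that the RPO target equals the same two-hour window used in the example.

```python
from dataclasses import dataclass


@dataclass
class RecoveryPlan:
    """Hypothetical description of one application's recovery setup."""
    estimated_recovery_hours: float  # time needed to restore service
    backup_interval_hours: float     # time between consecutive backups


def check_plan(plan: RecoveryPlan, mtpd_hours: float) -> dict:
    """Compare a plan against the maximum tolerable period of disruption.

    Assumption: worst-case data loss is about one backup interval, and
    the RPO target equals the MTPD, as in the two-hour example.
    """
    return {
        "rto_ok": plan.estimated_recovery_hours < mtpd_hours,
        "rpo_ok": plan.backup_interval_hours < mtpd_hours,
    }


# Recovery takes 1.5 hours, but backups only run every 4 hours.
plan = RecoveryPlan(estimated_recovery_hours=1.5, backup_interval_hours=4.0)
print(check_plan(plan, mtpd_hours=2.0))  # {'rto_ok': True, 'rpo_ok': False}
```

In this case the recovery time fits inside the two-hour window, but the backup schedule would have to be tightened before the plan meets the RPO.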
If your customer isn’t sure what their maximum tolerable period of disruption is, a few key questions can help them set better expectations. Ask these questions to understand a customer’s RTO and RPO on a more granular level:
- How often does this type of data change?
- What does each minute of downtime for this service cost, either in lost revenue or lost productivity?
- Could you transact business with pencil and paper, if necessary, while this service is down?
- If you are experiencing downtime, how does it impact your customers?
Going through these questions with your customer can help you work backward to what you need to back up and how this data needs to be backed up to minimize risk in a disaster scenario.
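The cost-per-minute question lends itself to a rough calculation. The following is a minimal sketch, not a complete cost model: the function name, its parameters, and the sample figures are all hypothetical, and it counts only lost revenue and idle wages while ignoring intangibles such as reputational risk.

```python
def downtime_cost(
    minutes_down: float,
    revenue_per_minute: float,
    affected_employees: int,
    hourly_wage: float,
) -> float:
    """Estimate the tangible cost of an outage (illustrative assumptions).

    Lost revenue is charged per minute of downtime; lost productivity is
    the wages paid to employees who cannot work during the outage.
    """
    lost_revenue = minutes_down * revenue_per_minute
    lost_productivity = affected_employees * hourly_wage * (minutes_down / 60)
    return lost_revenue + lost_productivity


# A two-hour outage for a service earning 50 per minute,
# with 10 idle employees paid 30 per hour.
print(downtime_cost(120, 50.0, 10, 30.0))  # 6600.0
```

Running the numbers for several outage lengths gives the customer a concrete basis for deciding how much disruption they can actually tolerate.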
What are RTO and RPO in SQL Server?
SQL Server is Microsoft’s relational database management system, which stores and retrieves data as requested by other applications. It lets users set up log shipping, in which transaction log backups are taken automatically and restored on a standby server. With log shipping, users can recover a fairly recent copy of the database; how recent depends on the RTO and RPO of that process. Those RTO and RPO requirements are set by users, depending on their needs, budget, and any network limitations.
However, SQL Server RTO and RPO are not necessarily straightforward. In many cases, the process isn’t as fast as a client may imagine. They may have an ideal RPO in mind, but slow network speeds or an incorrectly configured backup can slow this process down. In addition, restoring a log backup in this way can involve transferring large amounts of data, and the transfer alone can easily exceed the agreed RTO.
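A back-of-the-envelope estimate shows how transfer time alone can exceed an RTO. This sketch does not use any SQL Server API. The function names and sample figures are hypothetical, and it assumes decimal units (1 GB = 8,000 megabits), a link that sustains its full rated bandwidth, and a restore step whose duration you have measured separately.

```python
def transfer_hours(backup_size_gb: float, bandwidth_mbps: float) -> float:
    """Hours needed to copy a backup over the network.

    Assumes decimal units (1 GB = 8,000 megabits) and that the link
    sustains its full rated bandwidth, which real networks rarely do.
    """
    megabits = backup_size_gb * 8000
    seconds = megabits / bandwidth_mbps
    return seconds / 3600


def meets_rto(backup_size_gb: float, bandwidth_mbps: float,
              restore_hours: float, rto_hours: float) -> bool:
    """True if copying plus restoring finishes within the RTO."""
    total = transfer_hours(backup_size_gb, bandwidth_mbps) + restore_hours
    return total <= rto_hours


# 90 GB of backups over a 100 Mbps link, plus a 30-minute restore,
# measured against a two-hour RTO.
print(transfer_hours(90, 100))       # 2.0
print(meets_rto(90, 100, 0.5, 2.0))  # False
```

Here the copy alone uses up the full two hours, so the plan misses the RTO before the restore even starts. Smaller or more frequent log backups, or a faster link, would be needed to close the gap.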