What's your RTO/RPO and how do you calculate it?
In one of my previous blog posts, I introduced you to the recovery time objective (RTO) and the recovery point objective (RPO). These two important concepts define how long you have to perform a recovery and how much data loss is acceptable. For critical tier 1 applications, the general rule of thumb is 15 minutes for each. With all the cool technology available today, like continuous replication to an offsite virtual environment, that’s no longer the stuff of dreams; it’s a reality.
But no two of your clients have the same expectations, nor the same service budget. So, while 15 minutes sounds great, it may not be realistic for every client.
So, how do you calculate what the right RTO and RPO should be?
I’d like to start by adding a newer acronym to the mix: MTPD, the maximum tolerable period of disruption. It basically represents how long until your customer is hopping mad! Like the RTO and RPO, the MTPD won’t be the same for every application and service you manage for them. In fact, from a services standpoint, it’s better for you if it isn’t.
(Why? Because having different SLAs for various applications at a customer allows you to charge different rates, yielding more services revenue!)
What is MTPD and where does it fit?
So, I’d start by defining the MTPD with your customer on a per-application basis. Let’s say that for a given application it’s 1 hour. OK, then you know the RTO and RPO can’t be more than an hour. By using the MTPD, we’ve established the upper limit, which gives you guidelines for where the values need to be to keep your customer’s business running.
Next, you’ve got to figure out the physical limitations of the specific data set(s) that will need to be recovered, given the backup method currently in use. Here’s what I mean: if you have a massive SQL database (say, a couple of TB in size) and you’re currently doing a file-level backup of it (because your customer hasn’t yet invested in you hosting an offsite virtual environment), you’ve got to do the math on how long it will take to recover that data (the RTO) and how far back in time the data will be once recovered (the RPO).
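To make "do the math" concrete, here is a minimal back-of-the-envelope sketch in Python. It assumes the restore time is dominated by copying the data back at a steady throughput, plus a fixed overhead for staging and checks, and that with periodic backups the worst-case RPO is roughly one full backup interval. The function names, the 200 MB/s restore rate, the 30-minute overhead and the daily backup schedule are all hypothetical example values, not measurements from any real environment.

```python
"""Back-of-the-envelope RTO/RPO estimate for a file-level restore.

All inputs below are hypothetical example figures, not measurements.
"""


def estimate_rto_hours(data_tb: float, restore_mb_per_s: float,
                       overhead_hours: float = 0.0) -> float:
    """Time to copy the data back at a steady rate, plus fixed overhead."""
    data_mb = data_tb * 1_000_000  # decimal units: 1 TB = 1,000,000 MB
    transfer_hours = data_mb / restore_mb_per_s / 3600
    return transfer_hours + overhead_hours


def worst_case_rpo_hours(backup_interval_hours: float) -> float:
    """With periodic backups, a failure just before the next run
    loses roughly one full interval of changes."""
    return backup_interval_hours


# Hypothetical: 2 TB database, 200 MB/s restore, 30 min overhead, daily backup.
rto = estimate_rto_hours(data_tb=2, restore_mb_per_s=200, overhead_hours=0.5)
rpo = worst_case_rpo_hours(backup_interval_hours=24)
print(f"Estimated RTO: {rto:.1f} h, worst-case RPO: {rpo:.1f} h")
```

With these example figures the sketch prints an estimated RTO of about 3.3 hours and a worst-case RPO of 24 hours. Real restores rarely run at a constant rate, so treat the result as a lower bound and plug in throughput you have actually observed in a test restore.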
Now, bring the MTPD back into play: are you within it? If you’re over, you need to have a conversation with the customer about changing their backup (and recovery) method so that you can arrive at an RTO/RPO that meets their needs.
This process should be repeated for every data set, application and system important to your customer.
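Repeating the check across a customer’s environment is easy to script. The Python sketch below follows this post’s approach of treating the MTPD as the ceiling for both the RTO and the RPO, and flags every data set whose current backup method misses it. The DataSet class, the review function and every number in the inventory are hypothetical illustrations; in practice the estimates would come from your own restore math and backup schedules.

```python
from dataclasses import dataclass


@dataclass
class DataSet:
    """One application or data set; all figures are hypothetical examples."""
    name: str
    mtpd_hours: float     # agreed with the customer
    est_rto_hours: float  # from your restore math
    est_rpo_hours: float  # from your backup schedule


def within_mtpd(ds: DataSet) -> bool:
    """Treat the MTPD as the upper limit for both RTO and RPO."""
    return ds.est_rto_hours <= ds.mtpd_hours and ds.est_rpo_hours <= ds.mtpd_hours


def review(datasets: list[DataSet]) -> list[str]:
    """Return the names of data sets whose current backup method misses the MTPD."""
    return [ds.name for ds in datasets if not within_mtpd(ds)]


# Hypothetical inventory for one customer.
inventory = [
    DataSet("SQL database (file-level backup)", mtpd_hours=1, est_rto_hours=3.3, est_rpo_hours=24),
    DataSet("File server", mtpd_hours=24, est_rto_hours=4, est_rpo_hours=12),
    DataSet("Email", mtpd_hours=4, est_rto_hours=0.25, est_rpo_hours=0.25),
]

for name in review(inventory):
    print(f"Needs a recovery-method conversation: {name}")
```

Here only the SQL database is flagged, which is exactly the kind of result that opens the conversation about moving that data set to a different backup and recovery method.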
In reality, there is no set RTO/RPO you should be meeting, other than the one in the customer’s mind. That’s why elevating the conversation to understanding their MTPD can help you either calculate an RTO and RPO that meets their needs or build a new recovery strategy (read: bring in more revenue for a better-quality service) that does.