Everyone suffers from downtime at some point. Even hyperscale vendors walk away red-faced occasionally. Sooner or later, your primary email service provider will suffer an outage, and you’ll feel the fallout. Office 365 has experienced multiple full or partial outages in the past few months, sometimes lasting for hours.
The first step is to set your tolerance for service disruption. Disaster recovery experts typically measure this using two metrics: the recovery time objective (RTO) and the recovery point objective (RPO). The first is the longest you are prepared to wait for service to be restored after an outage begins, while the second sets the bar for how much data you are prepared to lose. An RPO of two hours means that you could lose up to the last two hours of email in the event of an outage.
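To make the two metrics concrete, here is a minimal Python sketch that checks a single outage against a pair of targets. The class and function names, and the idea of tracking the time of the last recoverable message, are illustrative assumptions rather than part of any disaster recovery standard or product.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class RecoveryTargets:
    """Hypothetical container for the two tolerances described above."""
    rto: timedelta  # longest acceptable time to restore service
    rpo: timedelta  # longest acceptable window of lost email


def meets_targets(targets, outage_start, service_restored, last_recoverable_mail):
    """Return (rto_met, rpo_met) for one outage.

    downtime         = how long the service was unavailable
    data_loss_window = how far back the unrecoverable email stretches
    """
    downtime = service_restored - outage_start
    data_loss_window = outage_start - last_recoverable_mail
    return downtime <= targets.rto, data_loss_window <= targets.rpo
```

With an RTO of one hour and an RPO of two hours, an outage that is fixed in 45 minutes but loses the previous two and a half hours of mail meets the first target and misses the second.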
Given the mission-critical nature of an organization’s email, most companies will have an extremely low tolerance. Ideally, you want to avoid losing any mail or disrupting your users’ services at all. This is doable, with the help of a complementary email service provider.
By using email services from two different companies, you avoid relying on any one vendor’s infrastructure. This can be valuable if a physical disaster occurs at a single data centre, or if, as in the case of Office 365, a service provider frequently experiences outages due to human error and flawed processes.
When a primary email service fails, it isn’t as simple as switching it out for another like a faulty engine part. For one thing, the change isn’t instant. You have to change your MX records, the DNS entries that tell other mail servers where to deliver your email. These have a ‘time to live’ setting, which controls how long other systems may keep using a cached copy of the old record. That means it will take a while for the entire Internet to learn of your new email server’s address if you change it.
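The delay is easy to estimate. The sketch below, with a hypothetical function name, computes the latest moment at which a resolver that honours the time to live could still be handing out the old MX record: the time of the change plus the TTL that was in force when the old record was cached.

```python
from datetime import datetime, timedelta


def latest_stale_lookup(change_time, old_ttl_seconds):
    """Worst case: a resolver cached the old record just before the change,
    so it keeps serving that copy for the full TTL."""
    return change_time + timedelta(seconds=old_ttl_seconds)
```

With a TTL of 3,600 seconds and a change made at 09:00, some senders could still be directing mail to the failed service until 10:00. This assumes resolvers respect the TTL, which not every one does.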
A more sensible option is to use an upstream service designed to complement rather than replace your primary email service provider. Pointing your MX records to the upstream service gives you an alternative email service that sees all of your mail first. It can then act as a buffer and an additional level of protection for your primary email service, queueing your email if your primary provider suffers an outage or slows to a crawl.
A complementary service like this will ideally have an online interface so that employees can still read and send emails during the outage. When the primary email service comes back online, the upstream service will then flush its queue, delivering all of the held emails to the customer’s regular inbox.
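The queue-and-flush behaviour can be sketched in a few lines of Python. This is a toy, in-memory model under stated assumptions, not any vendor’s implementation: the class name UpstreamRelay and the deliver callback, which stands in for an attempt to hand a message to the primary provider and returns True on success, are both hypothetical.

```python
from collections import deque


class UpstreamRelay:
    """Toy model of an upstream service that buffers mail for a primary provider."""

    def __init__(self, deliver):
        self.deliver = deliver  # callable(message) -> True if the primary accepted it
        self.queue = deque()    # messages waiting for the primary, oldest first

    def receive(self, message):
        """Accept incoming mail, then try to pass everything on straight away."""
        self.queue.append(message)
        self.flush()

    def flush(self):
        """Deliver queued mail in order and stop at the first failure.

        Returns the number of messages still waiting."""
        while self.queue:
            if not self.deliver(self.queue[0]):
                break  # primary still unavailable; keep the rest queued
            self.queue.popleft()
        return len(self.queue)
```

While the primary is down, every call to receive leaves the message in the queue. Once it recovers, a single call to flush hands the backlog over in the order it arrived. A real service would also persist the queue and retry on a schedule.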
What if the upstream service goes offline? That’s typically less of a risk than it is with a large, complex cloud service like Azure, on which Office 365 is based. If customers are worried, though, they can take advantage of the ability to set a list of multiple MX records. Particularly risk-averse admins could keep their primary mail provider’s email servers as a lower-priority entry in that list. That would cause incoming emails to route to the primary email service provider should the upstream provider become unavailable.
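The fallback works because sending servers try MX hosts in order of their preference number, lowest first. A minimal sketch of that selection logic follows. The hostnames and the is_reachable callback are made up for illustration, and real mail servers also shuffle hosts that share the same preference, which this version ignores.

```python
def delivery_order(mx_records):
    """mx_records is a list of (preference, hostname) pairs.

    The lowest preference number is the highest priority, so sort ascending."""
    return [host for _, host in sorted(mx_records)]


def pick_mail_server(mx_records, is_reachable):
    """Return the first reachable host in priority order, or None if all fail."""
    for host in delivery_order(mx_records):
        if is_reachable(host):
            return host
    return None


# Hypothetical records: the upstream service first, the primary provider as fallback.
EXAMPLE_RECORDS = [
    (10, "mx.upstream.example"),
    (20, "mx.primary.example"),
]
```

With these records, mail goes to mx.upstream.example whenever it answers, and to mx.primary.example only when it does not.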
You’ll want to ensure that your complementary email service provider is rock solid, especially if your primary service provider is suffering from service slowdowns and outages. Check its service status pages to see how well it has performed and whether it has a history of slowdowns. Talking to other customers is an excellent way to determine the company’s reliability, too.
Any email service provider should also give you a clear service level agreement, outlining how much downtime it allows itself in each time period, and the financial compensation due in the event that it exceeds this target.
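It helps to translate an uptime percentage into minutes before comparing agreements. The short sketch below does that arithmetic. The function name and the 30-day default period are assumptions, so check how each provider actually defines its measurement period.

```python
def allowed_downtime_minutes(uptime_percent, period_days=30):
    """Minutes of downtime a provider may incur in one period without breaching
    an uptime commitment expressed as a percentage."""
    total_minutes = period_days * 24 * 60
    return total_minutes * (1 - uptime_percent / 100)
```

A 99.9 per cent commitment over 30 days leaves room for roughly 43 minutes of downtime, while 99.99 per cent leaves a little over four.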
By following a few sensible steps and investing a little more each month, customers can protect themselves while still taking advantage of their primary service provider’s productivity features. Having a backup service designed intelligently into your mail flow is a good way to get functionality and peace of mind.