Backup & Disaster Recovery
We all know patching is important, right? Patches fix bugs, close security holes and add new features. Keeping systems secure should be your number one concern, but you have to balance that against stability, which can be difficult to pull off. Generally speaking, patches install without any issues because vendors do a lot of work in the background to ensure their patches pass quality-control tests.
Sometimes, however, things can go horribly wrong.
Despite vendors' best efforts, patches can cause problems
There have been a number of problematic patches over the past few years. Back in 2014, several patches caused Windows 8.1 to reboot with the “Blue Screen of Death”. In late 2015, an update left Windows Server 2012 R2 unable to boot, displaying INACCESSIBLE_BOOT_DEVICE. This can be a nightmare situation for IT solution providers (whether you’re an IT admin or a Managed Service Provider, or MSP), especially those who set up a standard patch schedule across all of their machines. More recently, the Windows 10 Anniversary Update caused thousands of webcams to freeze up and stop working. This kind of problem is usually less urgent to deal with and the fix is relatively straightforward, but that’s not always the case.
What to do when a patch goes wrong?
When a patch goes wrong, it can be something as small as a feature that stops working correctly. Sometimes, however, a server will fail to boot, causing massive disruption to an organisation. If you’re a small business, you’re unlikely to have a full business continuity system in place. Even if you have backups, you won’t want to use them until you have performed some comprehensive troubleshooting, since restoring a system from backup can be more work than resolving the issue in the first place.
Your response to an issue after installing a patch will vary depending on its severity. If a server has installed a patch, rebooted and won’t start up correctly, it’s important not to panic! At first you may not realise that a patch has caused the issue. This is where experience comes into play. Here are a few tips to help in these situations:
- Don’t panic
Panicking gets you nowhere. Managing patches that go horribly wrong should not be new to you.
- Start keeping good notes
A good administrator/technician always keeps comprehensive notes – even under pressure. The more information you keep, the easier it will be to manage the situation and justify your time and actions to others later.
- Assess the situation
Are multiple devices experiencing the issue? Are the issues affecting lots of users or is it just a small handful of desktops?
- Prioritise issues to be worked on
For MSPs, not all clients pay for the same level of service, so concentrate on your premium clients first and work your way down.
- Keep users informed
This is a vital step. When employees are sitting at their desks unable to work due to a server failure, they will wonder what’s going on and assume nothing is happening. Assign someone to keep users updated and let them know engineers are on the case.
- Initiate business continuity plans
Some companies will have backup servers ready to be spun up at a moment’s notice. If this is the case and the issue has not been resolved after basic troubleshooting, then action these plans.
- Remote access
If you’ve set up iLO or DRAC access to your servers, you can begin remote troubleshooting straight away, either working on a single issue or spreading multiple cases across your team. This is where implementing a standard server build helps. Having remote access to devices even when the operating system is not working will save you time and money in the long run.
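The assessment and prioritisation tips above can be sketched as a simple sort. This is only an illustration: the `Incident` fields, the tier numbering and the ordering rule are assumptions made for the example, not part of any particular ticketing or RMM tool.

```python
from dataclasses import dataclass


@dataclass
class Incident:
    client: str
    service_tier: int    # hypothetical scale: 1 = premium, larger = lower tier
    is_server: bool      # a failed server usually affects more people
    users_affected: int


def triage(incidents):
    """Order incidents: premium tiers first, then servers, then most users."""
    return sorted(
        incidents,
        key=lambda i: (i.service_tier, not i.is_server, -i.users_affected),
    )


# Made-up incidents, purely for illustration.
incidents = [
    Incident("Client C", 3, True, 40),
    Incident("Client A", 1, False, 5),
    Incident("Client B", 1, True, 25),
]

for incident in triage(incidents):
    print(incident.client)
```

With this ordering a premium client's failed server is worked on before that tier's desktop issues, and lower tiers follow. Adjust the sort key to match your own service agreements.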
Fallback options
When a patch goes horribly wrong you have to act quickly, so the first troubleshooting steps will usually involve one or more of the following:
- Remove the updates
One of your first troubleshooting steps will usually be to remove the patches either manually or via your management tools.
- System restore
When a large update is installed, a system restore point is created, giving you the option to manually restore the system if something goes wrong.
- Restore from backup
If all else fails, it’s time to fire up your backup tools and restore the system or the complete operating system.
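For the "remove the updates" step, Windows ships a standalone installer, `wusa.exe`, that can uninstall an update by its KB number. The sketch below only builds the command line so it can be reviewed or passed to your management tool. The helper name `build_uninstall_command` and the KB number are placeholders, and quiet-mode behaviour can differ between Windows versions, so try it on a test machine first.

```python
import re


def build_uninstall_command(kb_id, quiet=True):
    """Return the wusa.exe argument list that uninstalls one update by KB number."""
    match = re.fullmatch(r"(?:KB)?(\d+)", kb_id.strip(), flags=re.IGNORECASE)
    if match is None:
        raise ValueError(f"not a KB identifier: {kb_id!r}")
    command = ["wusa.exe", "/uninstall", f"/kb:{match.group(1)}"]
    if quiet:
        # Suppress prompts and leave the reboot for you to schedule.
        command += ["/quiet", "/norestart"]
    return command


# "KB0000000" is a placeholder, not a real update.
print(" ".join(build_uninstall_command("KB0000000")))
```

On the affected machine the resulting list could be handed to `subprocess.run` from an elevated session. It is kept as a pure function here so nothing runs by accident.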
Your patching options
So what are your patching options to try and reduce the chances of a patch blowing up? Everyone has their own opinions and experiences, but generally a business will adopt one or a combination of the following policies:
- Never patch
Never patching is not really an option unless you have a reckless disregard for security, or the system is never going to be on the internet or have new software installed.
- Manually install at a convenient time
Manually installing patches when it’s quiet and you can afford some downtime is possible, but it is only viable if you manage a very small handful of devices and you’re the onsite admin for the network. Even then, you leave yourself open to security issues, and it can take up too much of your time. Many small businesses adopt the manual patching method, but these are usually very small, self-managed businesses where the patching is done by the business owner or one of the employees, when they remember to do it.
- Automatically install critical patches and manually install optional ones
Another patching method is to install critical patches every evening or once a week and install optional ones either manually or every other week. This strikes a good balance between stability and keeping your system up to date.
- Testing in a virtual environment
Applying patches in a test environment before rolling them out to your devices is possible. However, unless you are restoring a backup from the previous evening onto exactly the same hardware, you can’t guarantee that what happens in a virtual environment will happen in production. Doing this can take up a lot of time, and unless you are in a large organisation you may not feel you have the resources available. As an MSP, however, installing in a test environment is exactly what is required to identify harmful patches before deploying to a wider audience.
- Automated patching of all available updates weekly
One valid option is to delay your patching by a few days. Microsoft releases new patches at set times: this used to be the second Tuesday of every month, but is due to change for a range of platforms in October 2016. This means you can choose to play it safe, let others test the patches first and wait for the fallout in the forums if it goes wrong. This also gives Microsoft time to revoke patches or release further fixes to resolve known issues. Automating the installation of patches on a Friday night gives you the weekend to check if things have gone wrong. As long as you have server monitoring set up, you can be notified of any major outages caused by a bad patch.
- Use your tools to automate the process
Having the right tools in place helps a lot and can save you time. Centralised patch management is vital for managing any number of workstations and servers: it lets you deploy patches easily on a schedule and remove them if needed. Automation is key to successful patch management, and tools such as Windows Server Update Services (WSUS) or SolarWinds MSP Patch Management are great choices.
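The delayed-patching policy described above (release on the second Tuesday of the month, install on a Friday night a few days later) comes down to a small date calculation. Here is a minimal sketch, assuming the second-Tuesday release schedule still applies to your platforms. The function names and the three-day default delay are illustrative choices, not settings from WSUS or any other tool.

```python
from datetime import date, timedelta

TUESDAY, FRIDAY = 1, 4  # date.weekday() numbering: Monday is 0


def patch_tuesday(year, month):
    """Return the second Tuesday of the given month."""
    first = date(year, month, 1)
    days_to_first_tuesday = (TUESDAY - first.weekday()) % 7
    return first + timedelta(days=days_to_first_tuesday + 7)


def deferred_install_date(year, month, min_delay_days=3):
    """Return the first Friday at least min_delay_days after Patch Tuesday."""
    earliest = patch_tuesday(year, month) + timedelta(days=min_delay_days)
    return earliest + timedelta(days=(FRIDAY - earliest.weekday()) % 7)


print(patch_tuesday(2016, 9), deferred_install_date(2016, 9))
```

For September 2016 this gives a release date of 13 September and an install date of Friday 16 September. Raising `min_delay_days` to 7 moves the install to the following Friday, which leaves more time for problems to surface in the forums.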
Installing updates and patches is vital to ensure your networks are protected against security threats and as stable as possible, and that new features are available to users. Issues will always crop up, and it’s how you prepare for and respond to these issues that shows how professional you are.
Define your procedures and ensure everyone knows them… and remember: during a crisis, communication and keeping calm are vital.