Tales from the edge: The black screen of death

Ian Trump

“Why has my screen gone black?”

Recognize this question? Sadly, it’s one IT admins receive far too often. When it’s a workstation maybe it’s not so much of a big deal… but when it’s a server, chances are your day is about to get really interesting. And by interesting I mean long.

Unfortunately, unless flames or audible grinding or clicking noises can be heard from the drive array, it can be really difficult to understand how a server goes from working perfectly one minute to being a pile of junk the next.

There are, of course, times when the fix is miraculous, or at least easy. The ubiquitous "power it off/power it on" routine can work wonders. Or a USB stick or CD-ROM may have been left in a drive, “tricking” the server into trying to boot from media with no OS; removing the media puts you back in business in no time. Other times it’s the array controller or BIOS that, after OS patches and updates, needs a firmware update before you’re back in business.

As an example, after one shutdown followed by a move of the server across the room, the system hung on boot just after the RAID card BIOS loaded. Yes, we moved it carefully, in case you were wondering; very carefully. Both drives in the OS RAID 1 partition had flipped to “Foreign”. No big deal… only the customer’s Exchange and accounting databases were located on the server, as well as 10 years’ worth of files.

Fortunately, in this case a phone call to Dell tech support let us in on a secret: pressing the shift key twice yielded a ‘secret screen’. This allowed us to import the “Foreign” drive configuration and, “hey presto!”, we were back in business.

Other times it’s a problem somewhere between the hardware and the layers of software found in a modern server. When it’s looking like something more complex, the pressure really starts to mount.

So how do you reduce that pressure?

  1. Prepare for Armageddon
    Thinking about and preparing for server death in advance is far preferable to doing it on a customer site after it has happened. It’s a great idea to focus on monitoring regular backup success and knowing where the media and support DVDs are located. Knowing you have the customer data safely backed up reduces the anxiety considerably.
  2. Have faith
    Modern technology is, well, modern: you are not the first person on the planet to endure a painful time in what feels like the end of a server. Manufacturer tech support should be high on the list, along with Google searches. Having said that, here’s a cautionary tale: not everything on Google is going to be helpful, and some of it may even be harmful. You need to be a little more specific than Googling “Dead Dell Server”.
  3. Try to rule out the hardware early
    The phone call to the manufacturer leads to (hopefully) some system testing, which occasionally yields (in my experience) a “failed hardware module.” Overnight, air-express, next business day delivery gets everything back online.
  4. Software is the Devil
    More often than not it’s the software: usually the OS or a driver has had a conniption (not a technical word, but the best description I could think of) or has been corrupted. A Last Known Good Configuration boot, Windows Automated System Repair, a careful Windows re-install or copying the drivers from the support CD may bring you back to life, although these steps are not for the faint of heart.
  5. Never Again
    If all your steps fail and you’re faced with executing a bare metal restore or (ick) a hand re-build of the server, you need to seriously consider a Physical to Virtual (P2V) project to move the business services to be as hardware independent as possible. The relative ease of moving hypervisors or VMs between hardware platforms and snapshot technology for backups makes for a quick recovery of services.
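
As a rough illustration of the "monitoring regular backup success" advice in step 1, here is a minimal Python sketch that checks whether the newest file in a backup folder is recent enough. The folder layout, the "*.bak" file pattern and the 26-hour threshold are all assumptions made for the example, not details of any particular backup product; a real check should also confirm the backup can actually be restored.

```python
from datetime import datetime, timedelta
from pathlib import Path
from typing import Optional


def newest_backup_time(backup_dir: Path, pattern: str = "*.bak") -> Optional[datetime]:
    """Return the modification time of the newest matching file, or None if none exist.

    The "*.bak" pattern is a placeholder; use whatever your backup tool writes.
    """
    mtimes = [p.stat().st_mtime for p in backup_dir.glob(pattern) if p.is_file()]
    if not mtimes:
        return None
    return datetime.fromtimestamp(max(mtimes))


def backup_is_fresh(
    newest: Optional[datetime],
    now: datetime,
    max_age: timedelta = timedelta(hours=26),  # assumed: nightly job plus some slack
) -> bool:
    """A backup counts as fresh if one exists and is no older than max_age."""
    return newest is not None and now - newest <= max_age
```

Run on a schedule, something like `backup_is_fresh(newest_backup_time(Path("D:/Backups")), datetime.now())` (the path is hypothetical) returning False is the cue to investigate before the server dies, not after.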

Ultimately, anyone who has been in IT for a few years has faced the daunting task of a server-down scenario, and realistically, if you’re relatively new to IT, you can look forward to your baptism by fire.

How hot the fire burns depends on the quality of your backup and how much you have thought about what to do when you see the Black Screen of Death.

–––––––––––––––––––––––––––––––––––––––––––––––

Want to know more about security? Then check out the video series by our security lead, Ian Trump…