The lost art of reviewing event logs and other stories…

David Ianetta

When I teach troubleshooting to IT professionals, I always use this Sherlock Holmes quote to begin the session:

“It is a capital mistake to theorize before one has data. Insensibly one begins to twist facts to suit theories, instead of theories to suit facts.”

And no, we are not talking about Data from Star Trek: The Next Generation here, although I’m sure he could figure out any computer problem. And while we are on the subject… the reboot of the movie series is awesome! Nice to see we are getting away from the whole “Captain Kirk, there be whales here!” franchise. No disrespect, but did anyone else have a problem with that? Khan was cool; the rest I could do without.

Anyway, stay with me. Don’t go downloading the movies now and taking up company bandwidth (a subject we will cover in a future instalment). Let’s stay on track.

Data.

You need data to figure out what is wrong, or to prevent something from going wrong in the first place. Information is your best friend and you need to know where to get it. Collecting data and troubleshooting are the lifeblood of the System Admin; they are two sides of the same coin.

So what information sources do we have as a System Admin? When we encounter an issue, where do we begin our investigation?

Number one on my list would be monitoring services along with having instant access to control them.

Let’s get back to basics here: what are Windows Services?

A service is like an application that runs in the background. Obviously, this means it has no window that the end user can interact with. If it did, the screen would look like that old Windows 3.1 screen saver… the one with all the different colored windows flying at you in random patterns?

Anyone? Anyone? Bueller? Sigh…

So, why is it important for you to monitor these services?

Well, for one, security software such as antivirus and firewalls runs as services. A good sign that a system has been infected or compromised would be finding that one of these is shut off or disabled. Or perhaps you have mission-critical software running as a service; keeping a finger on the pulse of that service would be very useful.

Some services are not needed, depending on the role the machine plays. Since every running service consumes resources (see process monitoring), some can be disabled to speed up your system. Knowing your services and their functions can help you increase the efficiency and security of your system.

So, monitoring services and having access to them is a very good thing!
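To make that concrete, here is a minimal sketch in Python of the "is anything I care about switched off?" check. It assumes you have already collected a snapshot of service names and states by some other means (PowerShell's Get-Service is one option); the function name, the snapshot layout, and "MyLineOfBusinessApp" are made up for illustration. WinDefend and MpsSvc are used only as example names for an antivirus and a firewall service.

```python
from typing import Dict, Iterable, List


def find_stopped_services(states: Dict[str, str], watched: Iterable[str]) -> List[str]:
    """Return the watched services that are missing or not in the Running state.

    `states` is a hypothetical snapshot mapping service name -> state text,
    gathered beforehand by whatever tool you prefer.
    """
    problems = []
    for name in watched:
        state = states.get(name)
        # A service absent from the snapshot is treated as a problem too.
        if state is None or state.lower() != "running":
            problems.append(name)
    return sorted(problems)


# Example snapshot (invented data, not real output from any tool).
snapshot = {"WinDefend": "Stopped", "MpsSvc": "Running", "Spooler": "Running"}
watch_list = ["WinDefend", "MpsSvc", "MyLineOfBusinessApp"]
print(find_stopped_services(snapshot, watch_list))
```

Anything this returns is worth a look: a stopped security service, or a mission-critical service that is not installed where you expected it.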

Another great information source would be event logs…

When I used to work in application support, I was amazed at the number of times I would request event logs, and then have to explain to the system admin what I was talking about or how to gather them. I feel like reviewing these logs has become a lost art.

These logs contain a wealth of information that lets the System Admin both diagnose problems and head off system failures or security breaches before they happen.

There are several different kinds of logs, but if you are not familiar with reading them, the top three to get started with would be Application Logs, System Logs and Security Logs.

  • Application (program) events
    When an application locks up or just flat stops working, it will more than likely record the details to this event log. Events are classified as error, warning, or information, depending on the severity of the event. At the very least, you will have a time stamp. When dealing with application support, this will be a great source of information to provide them with.
  • Security-related events
    These events are called audits and are described as successful or failed depending on the event, such as whether a user trying to log on to Windows was successful. Many failed login audits would be a sign of a hacker or former employee attempting to gain access to your system.
  • System events
    System events are logged by the operating system and are also classified as error, warning, or information. Warnings especially are your system’s way of giving you a big heads-up that you have an issue coming, much like that noise in the back of your car you continue to ignore. Ignoring warnings can lead to bad things happening.
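The "many failed login audits" signal from the Security log is easy to sketch in Python. This assumes the events have already been exported into a list of dictionaries; the keys "event_id" and "account", the function name, and the threshold of 5 are all assumptions for illustration. Event ID 4625 is the failed-logon audit on current Windows versions (4624 is a successful logon).

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple

FAILED_LOGON_EVENT_ID = 4625  # Security log audit: an account failed to log on


def accounts_with_many_failures(
    events: Iterable[Dict], threshold: int = 5
) -> List[Tuple[str, int]]:
    """Count failed-logon audits per account and return those at or above threshold.

    Each event is a dict with the hypothetical keys "event_id" and "account".
    """
    counts = Counter(
        e["account"] for e in events if e.get("event_id") == FAILED_LOGON_EVENT_ID
    )
    # most_common() lists the worst offenders first.
    return [(acct, n) for acct, n in counts.most_common() if n >= threshold]


# Invented sample data: six failures for one account, one success for another.
sample = [{"event_id": 4625, "account": "jsmith"}] * 6
sample.append({"event_id": 4624, "account": "mjones"})
print(accounts_with_many_failures(sample))
```

The threshold is a judgment call; a handful of failures is usually someone mistyping a password, while dozens against one account deserve a closer look.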

Investigate these logs on a regular basis. Read them like you would the Sunday paper (do people still read the Sunday paper?). Check them especially when you have a recurring problem.

You will find them to contain a rich source of error numbers and messages that can be Googled, often leading to a quick resolution of the problem. You fix the problem, then you are the hero. Isn’t that our goal?
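When a log is long, it helps to find the errors and warnings that keep repeating, since those are the event IDs most worth searching for. Here is a rough Python sketch, again assuming the events have been exported to a list of dictionaries; the keys "source", "event_id", and "level", the function name, and the sample data are all hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple


def recurring_problems(
    events: Iterable[Dict], min_count: int = 2
) -> List[Tuple[Tuple[str, int], int]]:
    """Group Error and Warning events by (source, event ID) and keep the repeaters."""
    counts = Counter(
        (e["source"], e["event_id"])
        for e in events
        if e["level"] in ("Error", "Warning")
    )
    return [(key, n) for key, n in counts.most_common() if n >= min_count]


# Invented sample data for illustration only.
sample = [
    {"source": "ExampleApp", "event_id": 1000, "level": "Error"},
    {"source": "ExampleApp", "event_id": 1000, "level": "Error"},
    {"source": "ExampleApp", "event_id": 1000, "level": "Error"},
    {"source": "ExampleDisk", "event_id": 7, "level": "Warning"},
    {"source": "ExampleApp", "event_id": 1, "level": "Information"},
]
for (source, event_id), n in recurring_problems(sample):
    print(f"{source} event {event_id} occurred {n} times")
```

The source and event ID together make a good search term, which is exactly the Googling step described above.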

Monitoring your system services and event logs can seem a daunting task at first. However, the more you learn how to use these valuable resources, the sooner you will come to a place where you will wonder how you ever did your job without them.

Next up, I’m going to talk about Patch Management.