TechRepublic : A ZDNet Tech Community

IT Security

Host: Chad Perrin

Prevent recurring problems with root cause analysis

In this series, we’ll step through an easy root cause analysis process that requires no special training — just a little effort and a lot of common sense.

——————————————————————————————————————-

Documented, management-supported incident response processes, even those for which response teams are well trained, won’t necessarily achieve the ultimate objective of preventing recurrence unless root causes are identified.

In Part 1, we look at constructing a simple root cause diagram for later analysis.

Why worry about root cause?

Many organizations, and even well-trained response teams, fail to prevent recurrence of unwanted events because they treat the symptoms instead of underlying causes. For example, if payroll is late because a switch failed, many organizations would simply look at how to deal better with switch failure. But switch failure may be the proximate cause, not the root cause. For our purposes, I define proximate cause as that activity which occurred, spatially or temporally, immediately prior to the incident.

The root cause is often a failed control, a failed process, or a gap in staff skill sets that caused an earlier condition or event. This earlier event sets off a series of causes and effects leading to the proximate cause. In Figure 1, for example, the root cause condition or activity occurred at Event 2, well before the proximate cause. The best way to prevent recurrence is to change what happened at Event 2. In other words, making changes to processes or conditions early in the chain of events is usually better than managing proximate causes.

In our switch failure example, the organization might discover that the underlying problem is a missing or broken change management process. Fixing this root cause would not only prevent a recurrence of the switch failure; it would also help prevent other, unrelated failures.
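The distinction between proximate and root cause can be sketched in a few lines of Python. This is an illustration only, not part of any formal method: the `Event` class, the `is_failed_control` flag, and the two events labeled "hypothetical" are assumptions made for the sketch. Only the switch failure, the late payroll, and the missing change management process come from the example above.

```python
from dataclasses import dataclass


@dataclass
class Event:
    description: str
    # Hypothetical flag: set by the analysis team when an event
    # represents a failed control, process, or skills gap.
    is_failed_control: bool = False


def proximate_cause(causes):
    """Return the cause that occurred immediately before the incident."""
    return causes[-1]


def root_cause(causes):
    """Return the earliest cause flagged as a failed control, if any."""
    for event in causes:
        if event.is_failed_control:
            return event
    return None


incident = "Payroll is late"

# Causes in time order, earliest first. Entries marked "hypothetical"
# are invented placeholders to fill out the chain.
causes = [
    Event("Hypothetical: network change requested"),
    Event("Change management process missing or broken", is_failed_control=True),
    Event("Hypothetical: unreviewed change applied to the switch"),
    Event("Switch failed"),
]

print("Incident:       ", incident)
print("Proximate cause:", proximate_cause(causes).description)
print("Root cause:     ", root_cause(causes).description)
```

The proximate cause is simply the last link in the chain, while the root cause sits earlier and has to be identified by the team; the flag here stands in for that judgment.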

Figure 1: Root cause conceptual diagram


Building a simple root cause diagram

There are many ways to build a root cause diagram. The approach most root cause trainers favor is the fishbone, or Ishikawa, diagram. A simple fishbone is shown in Figure 2, with a more complicated analysis shown in Figure 3 (childrensmercy.com).

Figure 2: Simple fishbone diagram


Figure 3: Complex fishbone diagram


However, most of us technical types are neither prepared nor inclined to spend time building complex decision and analysis frameworks. We need something more straightforward, something that gets to root cause quickly so we can move on to the next user or system issue that arose while we worked this one. The “8D Five Whys” is my answer to this challenge.

8D problem solving consists of eight steps that lead from an incident to managed resolution, including root cause analysis and recurrence prevention. Step 4 (D4) is root cause analysis, and its approach is very simple: ask why five times and you should be able to identify the fundamental issue, or issues, leading to the primary event. Although many 8D practitioners don’t actually graph their answers, I prefer to do so. As we’ll discuss in Part 2, a picture often makes it easier to “see” the problem.

Let’s step through a real-world example, shown in Figure 4. In this incident, a vendor-supplied desktop computer that controlled a critical production system was replaced. The production system immediately failed, interrupting a critical process.

Figure 4: 8D five why root cause diagram


The root cause analysis team was formed by following the company’s after-action review (AAR) process, ensuring complete and objective recording of events. To begin, the analysis facilitator asked the first why: Why did the incident happen? Two proximate causes were identified. First, the replacement system was not configured properly. Second, the response to user problem reports was not effective. The team agreed these two causes should be treated separately because they appear to result from different cause-and-effect chains. For our example, we’ll focus only on what caused the bad system configuration.

The facilitator continued by asking the second why. Why was the system configured improperly? This continued for each answer through three more iterations. The assumption is that this is sufficient granularity to identify root cause. But root cause sometimes is not apparent after answering the fifth why. When that happens, the team must step through the process again, looking for activities or conditions which might have been omitted on the first pass.
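For teams that prefer to record their answers in a script rather than on a whiteboard, the questioning above can be sketched as a small tree. This is a minimal sketch under stated assumptions: the `Cause` class, the `because` method, and the helper functions are hypothetical names, and the answers to the second through fifth whys are placeholders because the actual answers appear only in Figure 4. The incident and the two proximate causes come from the example.

```python
from dataclasses import dataclass, field

MAX_WHYS = 5  # the five-why rule of thumb


@dataclass
class Cause:
    text: str
    answers: list = field(default_factory=list)  # answers to "why did this happen?"

    def because(self, text):
        """Record one answer to "why?" for this cause and return it."""
        child = Cause(text)
        self.answers.append(child)
        return child


def leaf_chains(node, depth=0, path=()):
    """Yield (number of whys asked, chain of causes) for each unanswered cause."""
    path = path + (node.text,)
    if not node.answers:
        yield depth, path
        return
    for child in node.answers:
        yield from leaf_chains(child, depth + 1, path)


def unfinished(root):
    """Return the chains that stop before the fifth why."""
    return [path for depth, path in leaf_chains(root) if depth < MAX_WHYS]


incident = Cause("Production system failed after desktop replacement")
config = incident.because("Replacement system was not configured properly")
response = incident.because("Response to user problem reports was not effective")

# Placeholder answers for whys 2 through 5 on the configuration branch.
node = config
for n in range(2, MAX_WHYS + 1):
    node = node.because(f"(placeholder) answer to why #{n}")

for depth, path in leaf_chains(incident):
    print(f"{depth} whys asked: " + " <- ".join(path))

print("Chains needing more questions:", unfinished(incident))
```

Here the configuration branch has been asked all five whys, while the response branch is flagged because the team set it aside after the first. The code only tracks how far each chain has been questioned; whether the last answer is really a root cause remains a judgment for the team.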

Like any AAR process, root cause analysis must be free from finger-pointing. Every participant must understand his or her participation will not result in disciplinary action or peer ridicule, and management must back up these assertions.

We’ll continue this process in Part 2 by identifying one or more root causes and how to decide what to do about them.

Tom Olzak is an IT professional with over 25 years of experience. He holds CISSP and MCSE certifications and an MBA. Currently, he is Director of Information Security for HCR Manor Care.
