TechRepublic : A ZDNet Tech Community

Tech of all Trades

Host: Tim Malone

The calls started coming in as I was on my way to the office. Incoming e-mail from the outside was no longer getting through. Internal e-mail and site-to-site e-mail were OK. Mobile users were receiving ActiveSync and BlackBerry messages and my Treo with Goodlink was fine. The problem was somewhere on our SMTP gateway. An e-mail outage is serious business. It is immediately escalated to a full-blown emergency.

Our e-mail processing configuration

We run our own Exchange Server. We are still on 2003 Enterprise edition, which has proven to be a reliable platform. We process our incoming e-mail through two filters before sending it to the Exchange server. It first goes through Symantec Mail Security for SMTP. There we check for viruses and filter out all the bogus bounce messages, or Non-Delivery Receipts, generated by spammers who have hijacked our e-mail addresses.

We next process the e-mail through our Commtouch anti-spam filter. Part of the engine and queues are on our gateway server. We send the e-mail out for filtering to the Commtouch regional processing centers. The spam is quarantined in case we have false positives. Only the good stuff gets through to the Exchange server. There it goes through yet another virus scan before it is ever delivered to the user mailboxes for retrieval.
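For readers who think better in code, here is a minimal sketch of that mail flow as a chain of stages. It is a toy model only: the stage names, the flag labels and the `route` function are all made up for illustration and have nothing to do with the actual Symantec, Commtouch or Exchange interfaces.

```python
from dataclasses import dataclass, field


@dataclass
class Message:
    subject: str
    # Hypothetical labels describing what is wrong with a message, if anything.
    flags: set = field(default_factory=set)


# Each stage: (name, flags it catches, what happens to a caught message).
# The order mirrors the flow described above: gateway filter, anti-spam
# filter, then a final virus scan on the mail server.
STAGES = [
    ("gateway virus/NDR filter", {"virus", "bogus_ndr"}, "stopped"),
    ("anti-spam filter", {"spam"}, "quarantined"),
    ("mail server virus scan", {"virus_missed_by_gateway"}, "stopped"),
]


def route(msg):
    """Return where a message ends up in this toy model."""
    for name, caught, outcome in STAGES:
        if msg.flags & caught:
            return f"{outcome} by {name}"
    return "delivered to mailbox"


print(route(Message("Quarterly report")))
print(route(Message("Cheap watches", {"spam"})))
```

Because the stages run in series, nothing reaches the later stages if the first one stops working, which is why a failure on the gateway can cut off all outside mail while internal mail carries on.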

The failure and resolution

E-mail processing is obviously a complicated process. There are a lot of components that can possibly fail. When I arrived at the office I immediately began looking for clues as to what part of the SMTP service was broken. It couldn’t have been more obvious. When I logged on to the gateway server, several messages popped up indicating that the SMS filter-hub service had terminated unexpectedly at least fifteen times.

A manual restart produced the same results. It would run for a minute and then fail. I suspected that a piece of spam had defeated the engine. A look at the queues showed some malformed e-mail in the queue. It did no good to clear the queue and restart. Something was very wrong with the engine. A check of the Symantec web site revealed that a new patch had been released. We quickly downloaded and installed it, then restarted the service.
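A quick way to see whether an SMTP gateway is answering at all, before or after a restart like this, is to connect to it and look for the standard `220` greeting. The sketch below is only an illustration and not what we ran that morning. The function name and the host name `mailgw.example.com` are assumptions, and port 25 is simply the usual SMTP port.

```python
import socket


def smtp_banner_ok(host, port=25, timeout=10.0):
    """Return True if something at host:port greets us with an SMTP 220 banner."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as conn:
            banner = conn.recv(512)
    except OSError:
        # Covers refused connections, timeouts and name-resolution failures.
        return False
    return banner.startswith(b"220")


if __name__ == "__main__":
    # Hypothetical gateway name; substitute your own.
    print(smtp_banner_ok("mailgw.example.com", timeout=5))
```

A greeting only shows that something is listening on the port. It does not prove that the filtering behind it is healthy, so it is a first check and not a diagnosis.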

The human side of the emergency

Success! The whole problem analysis and resolution process took about 45 minutes. The majority of the time was spent in downloading and installing the patch. It took forever to stop the filter-hub service. All the while I was trying to do my job, I kept my junior associate at the door fending off the anxious employees. He would also occasionally go out to the various departments and provide an update to keep them informed.

I don’t know how critical e-mail delivery is in your organization but in our business, it is the life-line of just about everything we do. So much depends on our e-mail system functioning properly. We could function without our accounting system for a day but it is possible that somebody could lose their job if the e-mail system were to be out for more than a few hours. People tend to get real nasty when they can’t get e-mail.

Communicate during the emergency

I’m a professional. Years of experience in technology problem solving have allowed me to handle the most stressful of circumstances like this with focus and action that gets results. That’s why they pay me the big bucks. OK, I’m bragging. The point of this post is to illustrate something that I hope you didn’t miss. I made sure that another member of my team was actively communicating with management every step of the way.

Most business owners and employees don’t understand technology. In fact, many of them fear it. I know, that’s hard to believe, but it’s true. When things don’t work they tend to panic. Perhaps you remember the feeling from the first time you had a system failure and didn’t know what to do. If you keep a running dialog going with those who are affected, you will find that the emergency is much less stressful for everyone.

Tim is currently employed at the Burbank airport as the IT Manager of a jet management company. Prior to joining his current employer, Tim worked in a variety of management and individual contributor positions at small to mid-size manufacturing and publishing companies. He began his career as a programmer but currently focuses on technology management in the enterprise and small business. Tim is a graduate of Mt SAC - Walnut CA, earning his Associate degree in Computer Programming. He is a Microsoft Certified Systems Engineer (MCSE) and maintains currency in his field through recent Server 2003 classes at Moorpark College. He specializes in supporting Microsoft Technology, especially Small Business Servers. Tim was born in Covina, CA and now resides in Camarillo, CA. He is married with 1 son. Tim is very active in his local community and spent two years in Central America. Besides reading, research and writing, in his spare time Tim enjoys Technology, Current Events and Health Research, blogging about each.

Comments on this blog

Communicating during a tech emergency tmalonemcse@... | 04/02/08
Sometimes the higher-ups exacerbate the problem no matter how well you do alex.a@... | 04/02/08
Do not get it? The Listed 'G MAN' | 04/02/08
We just started using the filter-hub last week tmalonemcse@... | 04/02/08
Email is everyones life WiseITOne | 04/03/08
Re: Email is everyones life HoagieBP | 04/04/08
Reducing email dependence billbohlen@... | 04/10/08
What IM platform are you using? tmalonemcse@... | 04/11/08
What everyone needs to remember mike@... | 04/03/08
So true... wdperry@... | 04/07/08
We Had A Mail Server Go Down for 3 Weeks--Always Have a Contingency Plan Arsynic | 04/07/08
Recruiting the right skills tim uk | 04/07/08
Skills Needed billbohlen@... | 04/10/08
Thanks! tim uk | 04/11/08
Excellent summary! tmalonemcse@... | 04/11/08
We provide high expectations zloeber@... | 04/07/08
RE: When the e-mail system fails jubernal@... | 04/07/08
If all it was was a malformed header; then delete it? Photogenic Memory | 04/07/08
Linux vs Microsoft email solutions tmalonemcse@... | 04/08/08
Your Exhange server sounds great! Photogenic Memory | 04/08/08
RE: When the e-mail system fails Meesha | 04/08/08
You Seem to Be An Open Source Zealot Arsynic | 04/08/08
No Zealot Meesha | 04/09/08
Exchange server is stable - problem was SMTP filter tmalonemcse@... | 04/08/08
Not in Sales Meesha | 04/09/08
very well said and ver true sir! vidyadhish_d@... | 04/26/08
RE: When the e-mail system fails ambroxiet@... | 04/30/08
