Today I christen a new post format: The Walkalong! Similar to a walkthrough except I’m writing as I go through the steps shown. I hope this lends some excitement because I have no idea what path we’ll travel as we work the problem. Let’s go!
Yikes… Not what anyone likes to see unexpectedly… My Exchange server in our satellite site (EXSat) had a bad time recently and I don’t know why!
Let’s go through the steps I took to investigate and troubleshoot:
Step One – Event Viewer
Event Viewer often holds the answer to both “What the hell happened?” and “When the hell did this happen?”. In this case I’m looking for the kernel power event – specifically when it happened. So I navigate to [Event Viewer -> Windows Logs -> System] in the left console tree of Event Viewer, and sort the events by Level.
Ignoring (for this post) the flood of Cluster Name errors (I’m writing that post as I write this one), we immediately see that the OS has had two unexpected power cycles recently. The one I’m interested in investigating is the top event, dated just last evening at 7:45:12 PM. Now I look for ‘interesting’ events preceding this time that could explain what caused the power cycle. With the Kernel-Power event highlighted, sort by ‘Date and Time’; one click sorts earlier to later.
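If you'd rather not eyeball the list, the same pick-out-and-sort can be done on an exported copy of the System log. This is only a rough sketch: the record keys (`provider`, `event_id`, `time`) and the sample rows are hypothetical, the date is a placeholder, and I'm assuming the event in question is the usual Kernel-Power Event ID 41 ("rebooted without cleanly shutting down first").

```python
from datetime import datetime

# Assumption: Kernel-Power Event ID 41 is the "unexpected power cycle" entry.
KERNEL_POWER = "Microsoft-Windows-Kernel-Power"
UNEXPECTED_REBOOT_ID = 41

def unexpected_reboots(events):
    """Return the Kernel-Power 41 records, newest first."""
    hits = [
        e for e in events
        if e["provider"] == KERNEL_POWER and e["event_id"] == UNEXPECTED_REBOOT_ID
    ]
    return sorted(hits, key=lambda e: e["time"], reverse=True)

# Hypothetical sample records; only the 7:45:12 PM time comes from the post.
sample = [
    {"provider": KERNEL_POWER, "event_id": 41,
     "time": datetime(2000, 1, 1, 9, 0, 0)},      # made-up earlier power cycle
    {"provider": "Service Control Manager", "event_id": 7036,
     "time": datetime(2000, 1, 2, 19, 31, 53)},   # made-up ID for illustration
    {"provider": KERNEL_POWER, "event_id": 41,
     "time": datetime(2000, 1, 2, 19, 45, 12)},   # the event under investigation
]

latest = unexpected_reboots(sample)[0]
print(latest["time"])
```

The first record returned is the most recent power cycle, which is the one to start working backwards from.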
Unfortunately, none of the prior events show anything out of the ordinary. Some are normal start-up events (the boot following the power cycle happened at 7:45:07) and the earlier ones don’t seem to have caused a crash. One useful piece of information can be extracted, though: the gap between the Service Control Manager event at 7:31:53 and the Kernel-Boot event at 7:45:07 should bound the time window in which the offending crash occurred. So let’s see if we can find it.
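To make that time boundary concrete, here is a small sketch of the arithmetic: take the last event logged before the boot event and treat the stretch between the two as the crash window. The function name `crash_window` is my own invention and the date is a placeholder; only the clock times come from the log.

```python
from datetime import datetime

def crash_window(event_times, boot_time):
    """Return (last event logged before boot, boot time), or None if nothing precedes boot."""
    before = [t for t in event_times if t < boot_time]
    if not before:
        return None
    return max(before), boot_time

day = (2000, 1, 1)  # placeholder date; the real date isn't given in the post
times = [
    datetime(*day, 19, 31, 53),  # Service Control Manager event
    datetime(*day, 19, 45, 7),   # Kernel-Boot event
    datetime(*day, 19, 45, 12),  # Kernel-Power event
]

start, end = crash_window(times, datetime(*day, 19, 45, 7))
print(end - start)  # 0:13:14
```

So whatever took the server down happened somewhere in a window of a little over 13 minutes, assuming nothing else was logged in between.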
Our System events container doesn’t seem to contain the crash cause, so let’s check some different event views. Filtering Application logs to show all events between 7:15 and 7:50 only shows that multiple Exchange services failed to reach the EXSat server. Not very helpful beyond confirming that EXSat was indeed offline.
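The same time filter can be written as an XPath query on `TimeCreated/@SystemTime`, which is what Event Viewer's custom filters use under the hood. Below is a hedged sketch of a helper that builds such a query; the helper name, the placeholder date, and the UTC-5 offset are all assumptions on my part (event timestamps are stored in UTC, so the local offset matters).

```python
from datetime import datetime, timedelta, timezone

def time_window_xpath(start, end):
    """Build an XPath filter matching events created between start and end (inclusive)."""
    def stamp(dt):
        if dt.tzinfo is None:
            raise ValueError("pass timezone-aware datetimes")
        # Event timestamps are compared in UTC.
        return dt.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%S.000Z")

    return (
        "*[System[TimeCreated[@SystemTime>='{}' and @SystemTime<='{}']]]"
        .format(stamp(start), stamp(end))
    )

local = timezone(timedelta(hours=-5))  # assumed local offset, not from the post
query = time_window_xpath(
    datetime(2000, 1, 1, 19, 15, tzinfo=local),  # 7:15 PM, placeholder date
    datetime(2000, 1, 1, 19, 50, tzinfo=local),  # 7:50 PM, placeholder date
)
print(query)
```

A string like this should be usable as the query for `wevtutil qe Application /q:"..."`, so the same window can be checked against each log without clicking through the filter dialog every time.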
The Security log holds zero events between 7:15 and 7:50 PM. Skunked.
Inside Applications and Services Logs there are multitudes of subfolders containing minute event information about OS and Exchange subsystems. I had a poke around but saw nothing indicative of a crash, halt, critical error, or the like. Event Viewer has come up empty.
Step Two – Google it!
Summarized as slim pickin’s. Most results suggested checking a crashdump file (none were generated on EXSat) and examining Event Viewer for powerup/down events and their neighbors.
Step Three – Accept the unknown and be thankful this isn’t production
You tell me, HAL.