Are you up, SharePoint? The 3-minute vitality check

Thursday 23 July 2009 - Filed under Tech Notes

Whenever a significant change is made to SharePoint content or configurations, it’s a jolly good idea to check to see if everything is still hunky-dory. In the life of a SharePoint guy/gal, any one of the following may constitute a “significant” change:

  • Applying a hotfix, Cumulative Update, or Service Pack
  • Creating a new SharePoint application, site collection, or content database
  • Deploying a third-party or custom-developed feature/solution/add-in
  • Dumping a ginormous chunk of new content into SharePoint
  • Extending the SharePoint farm by adding new servers
  • Making and recovering from a deadly booboo
  • Giving a fully-functioning server a reboot (which is scarier than it sounds)

Here’s a real-life example of what can happen: I apply the hot-off-the-press June 2009 Cumulative Update to a number of SharePoint farms under my care. After a couple of hours, things look fine and all the sites are up and functioning. OK, job well done, I think. The next morning, I decide to check back just because I’m paranoid. While the x86 farms are in good shape as expected, it turns out one of the x64 farms has been silently producing a long array of SharePoint errors with event IDs 7076, 6398, and 6482 literally every minute since 11.25pm the night before. I say “silently” because, well, the sites are still up. Still, something went terribly wrong under the hood and I fail to see a connection between the Cumulative Update and 11.25pm. After approximately 90 seconds of binging (by binging I mean the act of googling), I suspect it’s a situation that requires the Microsoft Hotfix 946517, which apparently has no known relationship with the June 2009 CU… yet. And, ta-da! Special Agent 946517 successfully disincentivises the machine gun of mysterious errors. While the mystery itself remains, I document my observations and actions, and move on.

The moral of the story is that it pays to go through a health check-up every now and then. But you can’t probe and interrogate everything all the time unless that’s the very job you are paid to do. What I often do is a simple 3-minute routine that should be carried out immediately after a significant change and perhaps once again 24 hours later. The routine is neither exhaustive nor bulletproof, but does cover the essentials most of the time. It consists of these steps:

  1. Check the services
  2. Check the event logs
  3. Examine the Search Administration page

It’s service that matters

Let me elaborate. First, go to Run > services.msc and ensure that all the vital services have started. They include:

  • Windows SharePoint Services Administration
  • Windows SharePoint Services Search
  • Windows SharePoint Services Timer
  • Windows SharePoint Services Tracing
  • Office SharePoint Server Search
  • SQL Server (if the server also runs databases; typical in a development setup)

If any of these are not running, give them a nudge and get them started. In rare cases, restarting the Search and Timer services also helps.
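If clicking through services.msc gets old, the same check can be roughly scripted. The sketch below shells out to the Windows `sc query` command and reads the STATE line. The short service names in `VITAL_SERVICES` are my assumption of how the display names above map on a typical install, so verify them on your own server before trusting the result. The function names are made up for illustration.

```python
import subprocess

# Assumed short service names for the display names listed above.
# Verify them in services.msc (Properties > Service name) on your server.
VITAL_SERVICES = {
    "SPAdmin": "Windows SharePoint Services Administration",
    "SPSearch": "Windows SharePoint Services Search",
    "SPTimerV3": "Windows SharePoint Services Timer",
    "SPTrace": "Windows SharePoint Services Tracing",
    "OSearch": "Office SharePoint Server Search",
}


def parse_state(sc_output):
    """Return the state word (e.g. 'RUNNING') from `sc query` output, or None."""
    for line in sc_output.splitlines():
        if line.strip().startswith("STATE"):
            # Typical line: "        STATE              : 4  RUNNING"
            parts = line.split(":", 1)[1].split()
            return parts[1] if len(parts) > 1 else None
    return None


def query_service(name):
    """Run `sc query <name>` (Windows only) and return the parsed state."""
    result = subprocess.run(["sc", "query", name], capture_output=True, text=True)
    return parse_state(result.stdout)


def stopped_services(states):
    """Given {service_name: state}, list the services that are not running."""
    return sorted(name for name, state in states.items() if state != "RUNNING")


def check_all():
    """Query every vital service and print the ones that need a nudge."""
    states = {name: query_service(name) for name in VITAL_SERVICES}
    for name in stopped_services(states):
        print(f"NOT RUNNING: {VITAL_SERVICES[name]} ({name})")
```

Calling `check_all()` on the SharePoint server prints one line per service that is not in the RUNNING state. A service that is not installed at all shows up too, because its state parses as None.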

Benign or malignant?

Next, go to Run > eventvwr and click on Application while praying that there aren’t too many red icons. Pay special attention to recurring errors. It takes some research and practice to be able to accurately distinguish between harmless one-off glitches and real problems. When done with Application events, move on to events under System. If you see a red alert and you have no idea what to do, start binging.
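Spotting the recurring errors is the part that benefits most from a bit of counting. Here is a minimal sketch, assuming you have already exported the events into a list of dictionaries with `event_id` and `level` keys. That record shape, the function name, and the default cut-off of three occurrences are all my own inventions for illustration, not anything Event Viewer gives you out of the box.

```python
from collections import Counter


def recurring_errors(events, min_count=3):
    """Return [(event_id, count)] for Error events seen at least min_count
    times, most frequent first. Warnings and information events are ignored."""
    counts = Counter(e["event_id"] for e in events if e["level"] == "Error")
    return [(event_id, n) for event_id, n in counts.most_common() if n >= min_count]


# Hypothetical sample: two noisy error IDs, a one-off, and some warnings.
sample_events = (
    [{"event_id": 6398, "level": "Error"}] * 4
    + [{"event_id": 7076, "level": "Error"}] * 3
    + [{"event_id": 6482, "level": "Error"}]
    + [{"event_id": 1000, "level": "Warning"}] * 5
)

for event_id, count in recurring_errors(sample_events):
    print(f"Event ID {event_id} occurred {count} times")
```

Anything this surfaces is only a candidate for a closer look. Whether it is benign or malignant still takes the research mentioned above.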

Next, open Shared Services Administration in the browser, and navigate to the Search Administration page. If you don’t see a page called Search Administration, somebody needs to install Service Pack 2 and all subsequent updates.

SharePoint crawl log

The Search Administration page gives a nice summary of what’s going on, including the log of all recent full and incremental crawls, automated or manually initiated. Check that the crawls ran on schedule, the numbers of indexed items, how long each crawl job took, and the numbers of successes and errors in each completed crawl job. There is no clear-cut metric here, but a row with 3980 successes and 21 errors (roughly 0.5% errors) is usually not a red flag. One with 17056 successes and 2593 errors (roughly 13%), on the other hand, is a worry.
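To put rough numbers on those two rows, here is a small sketch that works out the error rate of a crawl. The 5% cut-off is purely my assumption for illustration. As said, there is no official metric, so pick whatever threshold suits your farm.

```python
def crawl_error_rate(successes, errors):
    """Fraction of crawled items that ended in an error (0.0 for an empty crawl)."""
    total = successes + errors
    return errors / total if total else 0.0


def is_worrying(successes, errors, threshold=0.05):
    """True when the error rate exceeds the (assumed, adjustable) threshold."""
    return crawl_error_rate(successes, errors) > threshold


# The two rows mentioned above.
for successes, errors in [(3980, 21), (17056, 2593)]:
    rate = crawl_error_rate(successes, errors) * 100
    verdict = "worry" if is_worrying(successes, errors) else "probably fine"
    print(f"{successes} successes, {errors} errors: {rate:.1f}% errors, {verdict}")
```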

A quick glance at the page will also gather other pieces of vital information. If the SSP is configured to import Active Directory user profiles on a regular basis, check the profile import log as well.

Did you do it, Doug?

Lastly, do the obvious: Check that all the essential SharePoint sites on that server load without problems. While you’re at it, throw in a couple of test search queries. All this should take less than 3 minutes if everything is indeed hunky-dory. Now you can say it: Women, you all look awesome today.
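For the "do the sites load" part, a rough sketch along these lines can do the clicking for you. It is an assumption-laden shortcut: it uses plain `urllib` with no Windows authentication, so a site that demands a login answers 401, which I treat here as "the server is at least responding". The function names and the rule that only a 5xx status or no answer at all counts as a failure are mine.

```python
import urllib.error
import urllib.request


def fetch_status(url, timeout=10):
    """Return the HTTP status code for url, or None if the server is unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # The server answered, just not with a 2xx (e.g. 401 or 503).
        return err.code
    except (urllib.error.URLError, OSError):
        # DNS failure, connection refused, timeout and the like.
        return None


def failing_sites(urls, fetch=fetch_status):
    """Return {url: status} for sites that did not answer or returned a 5xx."""
    results = {url: fetch(url) for url in urls}
    return {
        url: status
        for url, status in results.items()
        if status is None or status >= 500
    }
```

Pass it the list of your essential site URLs and an empty result means every one of them answered. It does not replace actually looking at a page or running a test search query, since a page can return 200 and still be broken.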

2009-07-23  »  JK
