A couple of months ago I posted the first version of my SCOM 2007 R2 Daily Health Check Script - here is version 2. It’s more than a little motivated by some friendly competition with a Microsoft PFE for SCOM; hopefully you’ll agree it’s a big improvement on the last version.
Updated for this version
Formatting changed to make it more readable and more compatible
Added “Report generated on ” to the top of the report
Management Server states reported as one section
Default MP check moved to beneath the Management servers
Agents in pending states moved to be with the Agent health states
Clarified “Unresponsive Agents” and “Agents reporting errors”
Management server alerts streamlined
Added top 10 alerts for the last 7 days, and added top alerters for each
I’m planning to wrap in some SQL database size checks and some of the other recommendations later - I’ll post again here when that’s ready 🙂
An updated version of this script has been released: https://www.definit.co.uk/2012/05/scom-2007-r2-daily-health-check-script-v2/
I’ve been working with a Microsoft SCOM PFE (Premier Field Engineer) for the last few months, and part of the engagement is an environment health check for the SCOM setup. Based on this, Microsoft recommend a series of health checks for the environment that should be carried out every day. These are summarised as follows:
Check the health of all Management Servers and Gateways
Check the RMS is not in maintenance mode
Review Outstanding Alerts
Review Agent’s Health Status
Review Backup Status
Review any Management Group Alerts
Review the Pending Management status
Review Database Sizes (Operations, Data warehouse, ACS)
Review Volume of Alerts
Review Alert Latency
Document any changes
From this, there are certain aspects that can’t be automated so easily, or shouldn’t be - e.
The Test MAPI Connectivity monitor for the Exchange 2007 management pack will automatically generate a critical error for any Recovery Storage Groups you have on monitored Exchange Mailbox Roles. As these are generally temporary Storage Groups created for a recovery and then removed, you don’t want an alert - but manually adding an override every time is not a great use of your time either.
The State Change event details are as follows:
I learned something new today: SCOM 2007 R2 certificate-based communication not only checks the validity of the certificate you use, but also that of the CA that issued it… let me expand:
As in many organisations, there is a root CA (we’ll call it ROOTCA01), and then a subordinate CA (we’ll call that SUBCA01). OPSMGM01 has a certificate to identify itself and has certificates for ROOTCA01 and SUBCA01 in its Trusted Root Certification Authorities store.
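To make the chain idea concrete, here is a minimal Python sketch of that kind of check: a certificate is only accepted if it, and every CA above it up to a trusted root, is valid on the day in question. The `Cert` class, the `chain_is_valid` helper and all of the dates are made up for illustration; this is a toy model and not how SCOM or Windows actually implement validation. Only the names OPSMGM01, SUBCA01 and ROOTCA01 come from the scenario above.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, Set


@dataclass(frozen=True)
class Cert:
    subject: str
    issuer: str        # same as subject for a self-signed root
    not_before: date
    not_after: date

    def valid_on(self, day: date) -> bool:
        return self.not_before <= day <= self.not_after


def chain_is_valid(leaf: str, certs: Dict[str, Cert],
                   trusted_roots: Set[str], day: date) -> bool:
    """Walk from the leaf up to a root, checking every certificate on the way."""
    seen = set()
    name = leaf
    while True:
        cert = certs.get(name)
        if cert is None or name in seen or not cert.valid_on(day):
            return False  # missing certificate, issuer loop, or outside validity
        seen.add(name)
        if cert.issuer == cert.subject:
            return name in trusted_roots  # reached a self-signed root
        name = cert.issuer


# Hypothetical validity periods, chosen so that SUBCA01 expires first.
certs = {
    "ROOTCA01": Cert("ROOTCA01", "ROOTCA01", date(2005, 1, 1), date(2025, 1, 1)),
    "SUBCA01": Cert("SUBCA01", "ROOTCA01", date(2006, 1, 1), date(2011, 1, 1)),
    "OPSMGM01": Cert("OPSMGM01", "SUBCA01", date(2010, 6, 1), date(2012, 6, 1)),
}
ok_before = chain_is_valid("OPSMGM01", certs, {"ROOTCA01"}, date(2010, 12, 1))
ok_after = chain_is_valid("OPSMGM01", certs, {"ROOTCA01"}, date(2011, 6, 1))
```

In this toy data the server certificate itself is still in date on the second check, but the chain fails because the subordinate CA is not.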
The DFS monitoring tool in SCOM 2007 has some great features, which will replace many a custom VB script running in enterprises. As with a lot of Management Packs, to get the most out of it you need to have a dedicated RunAs account with local admin permissions on the servers you are monitoring (e.g. for the Backlogged Files reporting).
The easy (and wrong) approach here is to take the less secure option and distribute a RunAs account to ALL servers.
Getting a SCOM 2007 R2 agent onto TMG is a useful way of monitoring TMG, especially with the SCOM TMG Management Pack – it’s not exactly “out-of-the-box” functionality though, with many sources I’ve read simply stating that it can’t be done. There are some half-working solutions I’ve seen, but nothing that worked for me.
The process simply involves opening the correct ports and protocols between the TMG servers and the SCOM management servers, which I found after a few attempts watching the live logs.
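Once a rule is in place, a quick way to confirm that a port is actually reachable is a plain TCP connection test. The sketch below is a generic check in Python, not part of SCOM or TMG; the helper name `port_is_open` is my own, and the host name and port in the usage note are assumptions, since this excerpt doesn’t list the ports.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False
```

For example, `port_is_open("scom-ms01.example.local", 5723)` would test the default agent-to-management-server port (TCP 5723) against a hypothetical management server name. A `False` result only tells you the connection failed, not which device blocked it, so the TMG live logs are still the place to look for the reason.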
SCOM 2007 R2’s Audit Collection Services (ACS from now on) is very useful for meeting compliance (e.g. Sarbanes-Oxley) and security audit requirements – working with financial companies often requires such compliance. It’s pretty simple to install in a domain environment – you run the installer to create a collection server, then activate the forwarder on the client servers.
When it comes to the servers you really want to audit, those that are by definition more at risk of a security breach because they are publicly accessible, it’s not so straightforward.
This should be a simple update of some hotfixes, but there were a few tripping points along the way that I had to stumble past. For reference I used the CU2 update page and also a Kevin Holman TechNet article.
So, I’m going to assume that a) you’re installing the update for a reason, like one of the bugs it fixes and b) you have taken a backup of your OpsManager databases.
This is a pretty specific set of instructions for a specific environment:
you are using Microsoft System Center Operations Manager 2007
you have a Microsoft Certificate Services 2003 Certificate Authority on your domain
you have non-domain Windows Server 2008 servers you wish to monitor or set up as a gateway server.
Getting a certificate for either a Gateway Server or remotely monitored Server can be a touch vexing.
This was a bit of an odd one. I was adding a Gateway Server to a newly rebuilt SCOM 2007 R2 Root Management Server when I kept encountering this error:
The certificate specified in the registry at HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Microsoft Operations Manager\3.0\Machine Settings cannot be used for authentication. The error is The credentials supplied to the package were not recognized(0x8009030D).
I followed the Microsoft install and setup guides exactly, and it’s not my first time either – but I’d never seen that one before.
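As a small aside on reading that error code: the value 0x8009030D is a standard Windows HRESULT, and splitting it into its fields shows which layer raised it. The sketch below is plain Python arithmetic with a helper name of my own choosing; it doesn’t diagnose the cause, it only shows that the failure comes from the Windows security (SSPI) layer rather than from SCOM itself. In the Windows headers this value is defined as SEC_E_UNKNOWN_CREDENTIALS.

```python
def decode_hresult(code: int) -> dict:
    """Split a 32-bit HRESULT into its severity, facility and error-code fields."""
    code &= 0xFFFFFFFF
    return {
        "failure": bool(code >> 31),        # bit 31: set means failure
        "facility": (code >> 16) & 0x7FF,   # bits 16-26: which subsystem
        "code": code & 0xFFFF,              # bits 0-15: the error itself
    }


# Facility 9 is FACILITY_SSPI, the Windows security package layer.
FACILITY_NAMES = {9: "FACILITY_SSPI"}

fields = decode_hresult(0x8009030D)
facility_name = FACILITY_NAMES.get(fields["facility"], "unknown")
print(fields, facility_name)
```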