DefinIT Because if IT were easy, everyone would do it…

25 Jan 2012 (20 comments)

SCOM 2007 R2: Daily Health Check Script

Posted by Sam McGeown

An updated version of this script has been released: http://www.definit.co.uk/2012/05/scom-2007-r2-daily-health-check-script-v2/

I've been working with a Microsoft SCOM PFE (Premier Field Engineer) for the last few months, and part of the engagement is a health check of the SCOM environment. Based on this, Microsoft recommend a series of health checks for the environment that should be carried out every day. They are summarised as follows:

  1. Check the health of all Management Servers and Gateways
  2. Check the RMS is not in maintenance mode
  3. Review Outstanding Alerts
  4. Review Agents' Health Status
  5. Review Backup Status
  6. Review any Management Group Alerts
  7. Review the Pending Management status
  8. Review Database Sizes (Operations, Data warehouse, ACS)
  9. Review Volume of Alerts
  10. Review Alert Latency
  11. Document any changes

16 Jan 2012 (0 comments)

I’m running the Virgin London Marathon 2012 for The Lighthouse Group

Posted by Sam McGeown

This post is nothing more than a shameless request for sponsorship! As the title suggests, I am running the London marathon this year (in 96 days!) for the charity "The Lighthouse Group". Check out the TLG site for more detail on what they do, but in a nutshell they are a charity that works with young people who have been excluded from school, are at risk of exclusion, or are at crisis point in their education. It's a really worthwhile cause, and my father-in-law has just been involved in opening a TLG center based in Normanton, Yorkshire.

I'd appreciate any contribution, big or small! It's fair to say I'm not quite the right build to run a marathon, so a little bit of sponsorship would be very encouraging! I've been training since late August last year, and am currently managing two 7-mile runs a week, plus a game of football and a couple of swims! Keep up to date with my progress over on my Runkeeper profile.

JustGiving - Sponsor me now!

5 Jan 2012 (1 comment)

Overriding the OpsMgr Exchange 2007 Test MAPI Connectivity Monitor for Recovery Storage Groups

Posted by Sam McGeown

The Test MAPI Connectivity monitor in the Exchange 2007 management pack automatically generates a critical error for any Recovery Storage Groups you have on monitored Exchange Mailbox Roles. As these are generally temporary Storage Groups, created for a recovery and then removed, you don't want an alert, but manually adding an override every time is not a great use of your time either.

5 Jan 2012 (0 comments)

Trouble with SCOM 2007 R2 Certificates? Validate the entire PKI path!

Posted by Sam McGeown

I learned something new today: SCOM 2007 R2 certificate-based communication checks not only the validity of the certificate you use, but also that of the CA that issued it. Let me expand:

As in many organisations, there is a root CA (we'll call it ROOTCA01) and a subordinate CA (we'll call that SUBCA01). OPSMGM01 has a certificate to identify itself and has the certificates for ROOTCA01 and SUBCA01 in its Trusted Root Certification Authorities store.

The certificate that secures the connection between the OpsMgr Gateway (OPSGW01) and the OpsMgr Management Server (OPSMGM01) is issued by SUBCA01 and installed on OPSGW01. To validate the certificate chain, SUBCA01's certificate is also installed in the Trusted Root Certification Authorities store. Opening OPSGW01's certificate and examining the Certificate Path tab shows the certificate is valid all the way up to the issuing CA, SUBCA01.

Despite this, the connection does not work, and OPSGW01 logs the following events:

Log Name:      Operations Manager
Source:        OpsMgr Connector
Date:          05/01/2012 10:18:28
Event ID:      21016
Level:         Error
Computer:      opsgw01.definit.co.uk
Description:   OpsMgr was unable to set up a communications channel to opsmgm01.definit.co.uk and there are no failover hosts.  Communication will resume when opsmgm01.definit.co.uk is available and communication from this computer is allowed.

Log Name:      Operations Manager
Source:        OpsMgr Connector
Date:          05/01/2012 10:18:25
Event ID:      20070
Level:         Error
Computer:      opsgw01.definit.co.uk
Description:   The OpsMgr Connector connected to opsmgm01.definit.co.uk, but the connection was closed immediately after authentication occurred.  The most likely cause of this error is that the agent is not authorized to communicate with the server, or the server has not received configuration.  Check the event log on the server for the presence of 20000 events, indicating that agents which are not approved are attempting to connect.

Log Name:      Operations Manager
Source:        OpsMgr Connector
Date:          05/01/2012 10:18:24
Event ID:      21002
Level:         Warning
Computer:      opsgw01.definit.co.uk
Description:   The OpsMgr Connector could not accept a connection from xxx.xxx.xxx.xxx:5723 because mutual authentication failed.

Log Name:      Operations Manager
Source:        OpsMgr Connector
Date:          05/01/2012 10:18:24
Event ID:      20067
Level:         Warning
Computer:      opsgw01.definit.co.uk
Description:   A device at IP xxx.xxx.xxx.xxx:5723 attempted to connect but the certificate presented by the device was invalid.  The connection from the device has been rejected.  The failure code on the certificate was 0x800B0109 (A certificate chain processed, but terminated in a root certificate which is not trusted by the trust provider.).
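
The failure code in event 20067 is the useful part of these logs, so here is a minimal Python sketch of pulling it out of the description text and giving it a name. The function name `decode_failure_code` and the lookup table are my own illustration, not part of OpsMgr; the symbolic names are the standard Windows certificate error constants, and only the first of them actually appears in the events above.

```python
import re

# Small lookup of Windows certificate-trust error codes (HRESULTs).
# Only 0x800B0109 appears in the events above; the other two are
# included purely for comparison.
CERT_ERRORS = {
    0x800B0109: "CERT_E_UNTRUSTEDROOT",
    0x800B010A: "CERT_E_CHAINING",
    0x800B0101: "CERT_E_EXPIRED",
}

# Matches the wording used in the event 20067 description.
FAILURE_CODE = re.compile(
    r"failure code on the certificate was (0x[0-9A-Fa-f]{8})"
)


def decode_failure_code(description):
    """Return (code, symbolic name) from an event description, or None."""
    match = FAILURE_CODE.search(description)
    if match is None:
        return None
    code = int(match.group(1), 16)
    return code, CERT_ERRORS.get(code, "unknown")


# Description text copied from event 20067 above.
event_20067 = (
    "A device at IP xxx.xxx.xxx.xxx:5723 attempted to connect but the "
    "certificate presented by the device was invalid.  The connection from "
    "the device has been rejected.  The failure code on the certificate was "
    "0x800B0109 (A certificate chain processed, but terminated in a root "
    "certificate which is not trusted by the trust provider.)."
)

code, name = decode_failure_code(event_20067)
print(hex(code), name)
```

An untrusted-root code points at the top of the chain rather than at the certificate that was presented, which is where the other three events would otherwise send you looking.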

It was the last event (ID 20067) that led me to check the certificate chain for the SUBCA01 certificate itself, which was installed and trusted but did not validate up the chain to ROOTCA01. Installing the ROOTCA01 certificate resolved the issue.
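
To make the reasoning concrete, here is a toy Python model of the chain walk. It is a sketch under stated assumptions, not how Windows actually validates certificates: each certificate is reduced to a subject/issuer pair, the `ISSUED_BY` table and the function name `chain_is_trusted` are hypothetical, and I am assuming that OPSMGM01's own certificate was also issued by SUBCA01 and that the store being modelled is the one on OPSGW01.

```python
# Toy model: each certificate is just "subject -> issuer".
# A root CA certificate is self-signed, so its issuer is itself.
ISSUED_BY = {
    "opsgw01.definit.co.uk": "SUBCA01",
    "opsmgm01.definit.co.uk": "SUBCA01",  # assumption, not stated in the post
    "SUBCA01": "ROOTCA01",
    "ROOTCA01": "ROOTCA01",
}


def chain_is_trusted(presented, trusted_store):
    """Walk from the presented certificate's issuer up towards the root.

    In this simplified model the chain is accepted only if every CA on
    the way is in `trusted_store` and the walk ends at a self-signed root.
    """
    seen = set()
    current = ISSUED_BY.get(presented)
    while current is not None and current not in seen:
        seen.add(current)
        if current not in trusted_store:
            return False      # a CA in the chain is missing locally
        issuer = ISSUED_BY.get(current)
        if issuer == current:
            return True       # reached a trusted, self-signed root
        current = issuer
    return False              # unknown issuer, or a loop in the chain


# Before the fix: only SUBCA01 is installed.
print(chain_is_trusted("opsmgm01.definit.co.uk", {"SUBCA01"}))

# After the fix: ROOTCA01 is installed as well.
print(chain_is_trusted("opsmgm01.definit.co.uk", {"SUBCA01", "ROOTCA01"}))
```

With only SUBCA01 in the store, the walk stops at ROOTCA01 and the chain is rejected; with both CAs present it reaches the self-signed root and is accepted, which matches the behaviour described above.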