DefinIT

Trouble with SCOM 2007 R2 Certificates? Validate the entire PKI path!

I learned something new today: SCOM 2007 R2 certificate-based communication checks not only the validity of the certificate you use, but also the CA that issued it. Let me expand:

Like many organisations, we have a root CA (we'll call it ROOTCA01) and a subordinate CA (we'll call that SUBCA01). OPSMGM01 has a certificate to identify itself, and has the certificates for ROOTCA01 and SUBCA01 in its Trusted Root Certification Authorities store.

The certificate that secures the connection between the OpsMgr Gateway (OPSGW01) and the OpsMgr Management Server (OPSMGM01) is issued by SUBCA01 and installed on OPSGW01. To validate the certificate chain, SUBCA01's certificate is also installed in OPSGW01's Trusted Root Certification Authorities store. Opening OPSGW01's certificate and examining the Certification Path tab shows the certificate as valid all the way up to the issuing CA, SUBCA01.

The connection does not work, however, and OPSGW01 logs the following events:

Log Name:      Operations Manager
Source:        OpsMgr Connector
Date:          05/01/2012 10:18:28
Event ID:      21016
Level:         Error
Computer:      opsgw01.definit.co.uk
Description:   OpsMgr was unable to set up a communications channel to opsmgm01.definit.co.uk and there are no failover hosts.  Communication will resume when opsmgm01.definit.co.uk is available and communication from this computer is allowed.

Log Name:      Operations Manager
Source:        OpsMgr Connector
Date:          05/01/2012 10:18:25
Event ID:      20070
Level:         Error
Computer:      opsgw01.definit.co.uk
Description:   The OpsMgr Connector connected to opsmgm01.definit.co.uk, but the connection was closed immediately after authentication occurred.  The most likely cause of this error is that the agent is not authorized to communicate with the server, or the server has not received configuration.  Check the event log on the server for the presence of 20000 events, indicating that agents which are not approved are attempting to connect.

Log Name:      Operations Manager
Source:        OpsMgr Connector
Date:          05/01/2012 10:18:24
Event ID:      21002
Level:         Warning
Computer:      opsgw01.definit.co.uk
Description:   The OpsMgr Connector could not accept a connection from xxx.xxx.xxx.xxx:5723 because mutual authentication failed.

Log Name:      Operations Manager
Source:        OpsMgr Connector
Date:          05/01/2012 10:18:24
Event ID:      20067
Level:         Warning
Computer:      opsgw01.definit.co.uk
Description:   A device at IP xxx.xxx.xxx.xxx:5723 attempted to connect but the certificate presented by the device was invalid.  The connection from the device has been rejected.  The failure code on the certificate was 0x800B0109 (A certificate chain processed, but terminated in a root certificate which is not trusted by the trust provider.).
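The failure code in event 20067 is a standard Windows HRESULT, and splitting it into its fields points straight at the certificate subsystem. The short Python sketch below is purely illustrative: the function name decode_hresult and the small lookup tables are our own, not part of any OpsMgr tooling. It shows that 0x800B0109 has the severity (failure) bit set, facility 11 (FACILITY_CERT) and code 0x109; the Windows SDK names this value CERT_E_UNTRUSTEDROOT.

```python
# Illustrative sketch: decode a Windows HRESULT into its fields.
# HRESULT layout: bit 31 = severity (1 = failure),
# bits 16-26 = facility, bits 0-15 = code.

# Only the facility and codes relevant to this post are listed (hypothetical, partial tables).
FACILITY_NAMES = {11: "FACILITY_CERT"}
KNOWN_CODES = {
    0x800B0109: "CERT_E_UNTRUSTEDROOT",
    0x800B010A: "CERT_E_CHAINING",
}


def decode_hresult(hr: int) -> dict:
    """Split an HRESULT into severity, facility and code."""
    severity = (hr >> 31) & 0x1
    facility = (hr >> 16) & 0x7FF  # 11-bit facility field
    code = hr & 0xFFFF
    return {
        "failure": bool(severity),
        "facility": facility,
        "facility_name": FACILITY_NAMES.get(facility, "unknown"),
        "code": code,
        "symbol": KNOWN_CODES.get(hr, "unknown"),
    }


if __name__ == "__main__":
    print(decode_hresult(0x800B0109))
```

Running it on the value from the event log reports failure=True, facility 11 (FACILITY_CERT), code 0x109 (265) and the symbol CERT_E_UNTRUSTEDROOT, which is a chain-trust problem rather than a problem with the gateway certificate itself.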

It was the last event that led me to check the certificate chain for the SUBCA01 certificate: it was installed and trusted, but it did not validate up the chain to ROOTCA01. Installing the ROOTCA01 certificate on OPSGW01 resolved the issue.
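To make the lesson concrete, here is a deliberately simplified Python model of chain checking. It is an assumption-laden toy, not the Windows chain engine: there are no signature, expiry or revocation checks, certificates are matched purely by name, and how the builder discovers each certificate is out of scope. The class Cert, the function check_chain and the status strings are hypothetical. The rule it encodes is the one observed here: the chain is walked from the leaf through each issuer to a self-signed root, and it is that root, not an intermediate such as SUBCA01, which must be trusted.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cert:
    """Toy certificate: identified only by subject and issuer names."""
    subject: str
    issuer: str

    def is_self_signed(self) -> bool:
        return self.subject == self.issuer


def check_chain(leaf, known_certs, trusted_roots):
    """Walk issuer links from the leaf up to a self-signed root.

    Returns (chain, status), where status is "OK", "UNTRUSTED_ROOT"
    or "INCOMPLETE_CHAIN" (hypothetical status names).
    """
    by_subject = {c.subject: c for c in known_certs}
    chain = [leaf]
    current = leaf
    while not current.is_self_signed():
        issuer = by_subject.get(current.issuer)
        # Stop if the issuer is missing or we would loop forever.
        if issuer is None or issuer in chain:
            return chain, "INCOMPLETE_CHAIN"
        chain.append(issuer)
        current = issuer
    # Trusting an intermediate is not enough: the root itself must be trusted.
    status = "OK" if current.subject in trusted_roots else "UNTRUSTED_ROOT"
    return chain, status


if __name__ == "__main__":
    gateway = Cert("opsgw01.definit.co.uk", "SUBCA01")
    known = [Cert("SUBCA01", "ROOTCA01"), Cert("ROOTCA01", "ROOTCA01")]
    _, before = check_chain(gateway, known, trusted_roots={"SUBCA01"})
    _, after = check_chain(gateway, known, trusted_roots={"SUBCA01", "ROOTCA01"})
    print(before, after)
```

With only SUBCA01 trusted the model reports UNTRUSTED_ROOT, mirroring the 0x800B0109 failure; adding ROOTCA01 to the trusted set makes it report OK, just as installing the ROOTCA01 certificate did.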