DefinIT Because if IT were easy, everyone would do it…

25Jan/12Off

SCOM 2007 R2: Daily Health Check Script

An updated version of this script has been released: http://www.definit.co.uk/2012/05/scom-2007-r2-daily-health-check-script-v2/

MSFT-System-Center-logoI've been working with a Microsft SCOM PFE (Premier Field Engineer) for the last few months and part of the engagement is an environment health check for the SCOM setup. Based on this Microsoft recommend a series of health checks to for the environment that should be carried out every day. This is summarised as the following:

  1. Check the health of all Management Servers and Gateways
  2. Check the RMS is not in maintenance mode
  3. Review Outstanding Alerts
  4. Review Agent's Health Status
  5. Review Backup Status
  6. Review any Management Group Alerts
  7. Review the Pending Management status
  8. Review Database Sizes (Operations, Data warehouse, ACS)
  9. Review Volume of Alerts
  10. Review Alert Latency
  11. Document any changes 

From this, there are certain aspects that can't be automated so easily, or shouldn't be - e.g:

3. Review Outstanding Alerts: There is no point in scripting a capture of this as they need to be dealt with in the Console.

5. Review Backup Status: Backups are generally monitored elsewhere and by other software, you could query SQL to find the last backup of the DBs if required.

8. Review Database Sizes: This one I don't agree with being a daily check - if you have sized and tuned your environment correctly then these can be checked on a weekly or monthly schedule.

11. Document any changes: Clearly this isn't going to work as a scripted healthcheck!

Check the health of all Management Servers and Gateways

$ReportOutput = "<H2>Management Servers not in Healthy States</H2>"
$Count = Get-ManagementServer | where {$_.HealthState -ne "Success"} | Measure-Object
if($Count.Count -gt 0) {
 $ReportOutput += Get-ManagementServer | where {$_.HealthState -ne "Success"} | select Name,HealthState,IsRootManagementServer,IsGateway | ConvertTo-HTML -fragment
} else {
 $ReportOutput += "<p>All management servers are in healthy state.</p>"
}

Check the RMS is not in maintenance mode

$RMS = Get-ManagementServer | where {$_.IsRootManagementServer -eq $True}
$criteria = new-object Microsoft.EnterpriseManagement.Monitoring.MonitoringObjectGenericCriteria("InMaintenanceMode=1")
$objectsInMM = (Get-ManagementGroupConnection).ManagementGroup.GetPartialMonitoringObjects($criteria)
$is = "is not"
foreach ($MM in $objectsInMM){
 if($MM.Displayname -eq $RMS.Name){
   $is = "is"
 }
}
$ReportOutput += "<h2>RMS in Maintenance Mode</h2><p>"+ $RMS.Name +" "+$is+" in maintenance mode</p>"

Review Agent's Health Status

$ReportOutput += "<h2>Agents where Health State is not Green</h2>"
$ReportOutput += Get-Agent | where {$_.HealthState -ne "Success"} | select Name,HealthState | ConvertTo-HTML -fragment$ReportOutput += "<h2>Agents where the Monitoring Class is not available</h2>"
$AgentMonitoringClass = get-monitoringclass -name "Microsoft.SystemCenter.Agent"
$ReportOutput += Get-MonitoringObject -monitoringclass:$AgentMonitoringClass | where {$_.IsAvailable -eq $false} | select DisplayName | ConvertTo-HTML -fragment

Review any Management Group Alerts

$ManagementServers = Get-ManagementServer
foreach ($ManagementServer in $ManagementServers){
 $ReportOutput += "<h3>Alerts on " + $ManagementServer.ComputerName + "</h3>"
 $ReportOutput += get-alert -Criteria ("NetbiosComputerName = '" + $ManagementServer.ComputerName + "'") | where {$_.ResolutionState -ne '255' -and $_.MonitoringObjectFullName -Match 'Microsoft.SystemCenter'} | select TimeRaised,Name,Description,Severity | ConvertTo-HTML -fragment
}

Review the Pending Management status

$ReportOutput += "<h2>Agents in Pending State</h2>"
$ReportOutput += Get-AgentPendingAction | sort AgentPendingActionType | select AgentName,ManagementServerName,AgentPendingActionType | ConvertTo-HTML -fragment

Review Volume of Alerts

$ReportOutput += "<h2>Top 10 Repeating Alerts</h2>"
$ReportOutput += get-alert -Criteria 'ResolutionState < "255"' | Sort -desc RepeatCount | select-object –first 10 Name, RepeatCount, MonitoringObjectPath, Description | ConvertTo-HTML -fragment$ReportOutput += "<h2>Agents in Pending State</h2>"
$ReportOutput += Get-AgentPendingAction | sort AgentPendingActionType | select AgentName,ManagementServerName,AgentPendingActionType | ConvertTo-HTML -fragment

The Final Product

My full Daily Health Check Script, with some formatting, output for console and sending an email:

$Head = "<style>"
$Head +="BODY{background-color:#CCCCCC;font-family:Verdana,sans-serif; font-size: x-small;}"
$Head +="TABLE{border-width: 1px;border-style: solid;border-color: black;border-collapse: collapse; width: 100%;}"
$Head +="TH{border-width: 1px;padding: 0px;border-style: solid;border-color: black;background-color:green;color:white;padding: 5px; font-weight: bold;text-align:left;}"
$Head +="TD{border-width: 1px;padding: 0px;border-style: solid;border-color: black;background-color:#F0F0F0; padding: 2px;}"
$Head +="</style>"write-host "Getting Management Health Server States" -ForegroundColor Yellow
$ReportOutput = "<H2>Management Servers not in Healthy States</H2>"
$Count = Get-ManagementServer | where {$_.HealthState -ne "Success"} | Measure-Object
if($Count.Count -gt 0) {
 $ReportOutput += Get-ManagementServer | where {$_.HealthState -ne "Success"} | select Name,HealthState,IsRootManagementServer,IsGateway | ConvertTo-HTML -fragment
} else {
 $ReportOutput += "<p>All management servers are in healthy state.</p>"
}
write-host "Getting RMS Maintenance Mode" -ForegroundColor Yellow
$RMS = Get-ManagementServer | where {$_.IsRootManagementServer -eq $True}
$criteria = new-object Microsoft.EnterpriseManagement.Monitoring.MonitoringObjectGenericCriteria("InMaintenanceMode=1")
$objectsInMM = (Get-ManagementGroupConnection).ManagementGroup.GetPartialMonitoringObjects($criteria)
$is = "is not"
foreach ($MM in $objectsInMM){
 if($MM.Displayname -eq $RMS.Name){
   $is = "is"
 }
}
$ReportOutput += "<h2>RMS in Maintenance Mode</h2><p>"+ $RMS.Name +" "+$is+" in maintenance mode</p>"write-host "Getting Agent Health Status" -ForegroundColor Yellow
$ReportOutput += "<h2>Agents where Health State is not Green</h2>"
$ReportOutput += Get-Agent | where {$_.HealthState -ne "Success"} | select Name,HealthState | ConvertTo-HTML -fragment$ReportOutput += "<h2>Agents where the Monitoring Class is not available</h2>"
$AgentMonitoringClass = get-monitoringclass -name "Microsoft.SystemCenter.Agent"
$ReportOutput += Get-MonitoringObject -monitoringclass:$AgentMonitoringClass | where {$_.IsAvailable -eq $false} | select DisplayName | ConvertTo-HTML -fragmentwrite-host "Getting Management Server Alerts" -ForegroundColor Yellow
$ReportOutput += "<h2>Management Server Alerts</h2>"
$ManagementServers = Get-ManagementServer
foreach ($ManagementServer in $ManagementServers){
 $ReportOutput += "<h3>Alerts on " + $ManagementServer.ComputerName + "</h3>"
 $ReportOutput += get-alert -Criteria ("NetbiosComputerName = '" + $ManagementServer.ComputerName + "'") | where {$_.ResolutionState -ne '255' -and $_.MonitoringObjectFullName -Match 'Microsoft.SystemCenter'} | select TimeRaised,Name,Description,Severity | ConvertTo-HTML -fragment
}write-host "Getting Top 10 Unresolved Alerts" -ForegroundColor Yellow
$ReportOutput += "<h2>Top 10 Unresolved Alerts</h2>"
$ReportOutput += get-alert -Criteria 'ResolutionState < "255"'  | Group-Object Name | Sort-object Count -desc | select-Object -first 10 Count, Name | ConvertTo-HTML -fragmentwrite-host "Getting Top 10 Repeating Alerts" -ForegroundColor Yellow
$ReportOutput += "<h2>Top 10 Repeating Alerts</h2>"
$ReportOutput += get-alert -Criteria 'ResolutionState < "255"' | Sort -desc RepeatCount | select-object –first 10 Name, RepeatCount, MonitoringObjectPath, Description | ConvertTo-HTML -fragmentwrite-host "Getting Agents in Pending State" -ForegroundColor Yellow
$ReportOutput += "<h2>Agents in Pending State</h2>"
$ReportOutput += Get-AgentPendingAction | sort AgentPendingActionType | select AgentName,ManagementServerName,AgentPendingActionType | ConvertTo-HTML -fragmentwrite-host "Getting Overrides in Default Management Pack" -ForegroundColor Yellow
$ReportOutput += "<h2>Overrides in Default Management Pack</h2>"$OverrideCount = Get-ManagementPack | where {$_.DisplayName -match "Default Management Pack"} | get-override | measure-objectif($OverrideCount.Count -gt 2){
 foreach ($monitor in Get-ManagementPack | where {$_.DisplayName -match "Default Management Pack"} | get-override | where {$_.monitor}) {
  $ReportOutput += get-monitor | where {$_.Id -eq $monitor.monitor.id} | select-object DisplayName,Description | ConvertTo-HTML -fragment
  $ReportOutput += "<br />"
 }
 foreach ($rule in Get-ManagementPack | where {$_.DisplayName -match "Default Management Pack"} | get-override | where {$_.rule}) {
  $ReportOutput += get-rule | where {$_.Id -eq $rule.rule.id} | select-object DisplayName,Description | ConvertTo-HTML -fragment
  $ReportOutput += "<br />"
 }
} else {
 $ReportOutput += "<p>There are no unexpected overrides in the Default Management Pack</p>"
}$Body = ConvertTo-HTML -head $Head -body "$ReportOutput"$SmtpClient = New-Object system.net.mail.smtpClient
$MailMessage = New-Object system.net.mail.mailmessage
$SmtpClient.Host = "smtp.definit.co.uk"
$mailmessage.from = "[email protected]"
$mailmessage.To.add("<a href="mailto:[email protected]">[email protected]</a>")
#$mailmessage.To.add("<a href="mailto:[email protected]">[email protected]</a>")
$mailmessage.Subject = "SCOM Daily Healthcheck Report"
$MailMessage.IsBodyHtml = 1
$mailmessage.Body = $Body$smtpclient.Send($mailmessage)
 

You can download the file here (zip): Get-HealthCheck

Creative Commons License
SCOM 2007 R2: Daily Health Check Script by DefinIT, unless otherwise expressly stated, is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License.

Posted by Sam McGeown