Understanding #vROps High Availability

| 29/03/2017 |

While there is a reasonable amount of information about how HA works in vROps, I have found there is still some confusion as to how it actually works, or rather what the benefits are and, perhaps more importantly, what enabling the feature costs.

HA is a great feature and, in my opinion, should be considered seriously in any deployment (where possible).

Not only does HA protect your Master node (which essentially behaves as an index for your vROps cluster; if it is lost, your cluster is dead unless you have a working backup), it also allows your cluster to tolerate a data node failure. In short, what's not to like!

The important thing to remember here is the cost in terms of object/metric capacity for your vROps cluster, and in turn the additional nodes and/or hardware you may need to provide for an HA-enabled vROps cluster.

If you have deployed vROps before, I would hope you are already familiar with the VMware-provided vROps Sizing Spreadsheets. When you input your environment figures, you will be told the size and number of nodes required to monitor your environment. You can also select how many months of data you plan to retain and whether or not you will enable HA.
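The spreadsheet's arithmetic is easy to reason about once you know that enabling HA keeps two copies of every data shard, roughly halving the usable capacity of the cluster. Here is a minimal sketch of that trade-off (the per-node capacity figure is illustrative, not VMware's official sizing number; always confirm with the spreadsheet):

```python
import math

def data_nodes_required(objects, per_node_capacity, ha_enabled):
    """Rough estimate of data nodes for a vROps cluster.

    per_node_capacity is an illustrative figure, not VMware's official
    sizing number; always confirm with the sizing spreadsheet.
    """
    # With HA enabled every shard is stored twice, so the effective
    # load on the cluster roughly doubles.
    effective_objects = objects * 2 if ha_enabled else objects
    return math.ceil(effective_objects / per_node_capacity)

print(data_nodes_required(20000, 10000, ha_enabled=False))  # 2
print(data_nodes_required(20000, 10000, ha_enabled=True))   # 4
```

The point to take away is not the exact figures but the shape of the curve: turning HA on roughly doubles the node count for the same monitored estate.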

#vROps Webinar 2016 : Part 5 : Design and Deployment considerations

As promised, I am posting the recording of the 5th session of the vROps Webinar Series 2016. Sunny and I successfully delivered the session on design and deployment considerations.

Session details: In this instalment of the series, we discussed the steps and thought processes that should go into the design and deployment of vRealize Operations Manager. Among other things, the session covers planning, core components, correct sizing, HA, clustering, DR and future growth.

Once again I would like to thank my friend and partner in this project, Sunny; without him this would not have been possible.

So without further ado, here is the recording for this session:

Note: It is recommended that you watch the video in HD quality for the best experience.

The vROps HA conundrum

One of the great new features included in vROps is High Availability; however, when you look a little closer at how it works, careful thought needs to go into whether you want to use it or not.

I have had several discussions with my colleagues about whether you should or should not enable it in any given deployment of a vROps cluster.

So the following are my thoughts and bullet points to consider when you face the same dilemma.

Given its name, I wrongly assumed it could be used as a way to tackle BC/DR concerns; it turns out the HA feature cannot span multiple logical datacenters (see the KB article and forum discussion). I hope this gets resolved in future editions, as it would be very handy.

So what other things do I need to take into account?

  • The Master node behaves like an index for your cluster: lose it and you lose your cluster. HA can protect it, although it is no substitute for a proper backup solution. “… Global xDB is used for data that cannot be sharded; it is solely located on the master node (and master replica if high availability is enabled).”
  • HA takes several minutes to “kick in”, so one could argue for relying on vSphere HA instead (especially if your management cluster is tight on resources).
  • HA would protect you against a LUN/datastore failure, assuming you have sensibly separated your nodes.
  • HA adds an additional node, so if you are tight on resources it might not be an option for you.
  • Removing data nodes (if you need to downsize your cluster) will result in data loss unless you have HA enabled.
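The failure scenarios in the bullets above can be summarised as a toy model. This is conceptual reasoning only, not vROps internals, and the function name is mine:

```python
def survives_without_data_loss(failed_node, ha_enabled):
    """Toy model of the bullet points above, not vROps internals.

    The master holds the only copy of the non-sharded Global xDB, so
    without HA its loss kills the cluster. With HA, a master replica
    holds a second copy, and sharded data is duplicated so the loss
    (or removal) of one data node costs no data.
    """
    if failed_node == "master":
        return ha_enabled  # only the replica can take over
    if failed_node == "data":
        return ha_enabled  # a second shard copy exists elsewhere
    return True  # no failure

assert survives_without_data_loss("master", ha_enabled=True)
assert not survives_without_data_loss("master", ha_enabled=False)
assert not survives_without_data_loss("data", ha_enabled=False)
```

Note the model assumes a single failure; HA tolerates the loss of one node, not several at once.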

These bullet points are by no means exhaustive, but they are essential information as you weigh the design choices for your next vROps cluster.

If you are looking for good reference material I can recommend the book Mastering vRealize Operations Manager by Scott Norris.



vSphere 6 HA SSO (PSC) with NetScaler VPX Load Balancer for vRealize Automation

Providing a highly available single sign-on for vRealize Automation is a fundamental part of ensuring the availability of the platform. Traditionally, vRA (formerly vCAC) uses the Identity Appliance and relies on vSphere HA for the availability of the SSO platform, but in a fully distributed HA environment that's not really good enough. It is also possible to run the vSphere 5.5 SSO install in an HA configuration; however, many companies are making the move to the latest version of vSphere and don't necessarily want to maintain a 5.5 HA SSO instance.

The vSphere 6 Platform Services Controller can be deployed as an appliance or installed on a Windows host. Personally, I am a huge fan of the appliances and tend to use them in my designs because of their simplicity and ease of use. A pair of PSCs can be deployed as a highly available SSO solution for vRealize Automation 6.2, replacing the Identity Appliance or vSphere 5.5 SSO, with either a NetScaler or an F5 load balancer balancing connections and providing the availability.

Personally, I’d prefer to use an NSX Edge Services Gateway to load balance the PSCs, but at the time of writing the Edge does not support the “Ability to have session affinity to the same PSC node across all configured ports”. See KB2112736 for more details.

So, this guide will show you how to create a highly available pair of Platform Services Controllers, configure one as a subordinate Certificate Authority to a Microsoft Certificate Services CA, and then load balance them with a NetScaler VPX. Although I am using just two nodes, you can in fact use the same method to load balance up to four.

vSphere HA agent for host [Host’s Name] has an error in [Cluster’s Name] in [Datacenter’s Name]: vSphere HA agent cannot be correctly installed or configured

| 24/08/2012 |

Here’s a lesson in checking the basics! I added a new ESXi 5 host to a cluster today and spent a good couple of hours troubleshooting the error:

vSphere HA agent for host [Host’s Name] has an error in [Cluster’s Name] in [Datacenter’s Name]: vSphere HA agent cannot be correctly installed or configured

After a few basic checks, migrating the host in and out of the cluster, and rebooting, I headed off to Google and began troubleshooting.

Cannot install the vSphere HA (FDM) agent on an ESXi host – this article suggests that the host is in lockdown mode. That's unlikely since we don't use lockdown mode, but I checked anyway:

# List each host and whether lockdown mode is enabled
Get-VMHost | Select-Object Name,@{N="LockDown";E={$_.ExtensionData.Config.AdminDisabled}} | Format-Table -AutoSize

This returned false – no lockdown.

To exit lockdown mode, you can use:

# Note: this exits lockdown mode on every host returned by Get-VMHost
(Get-VMHost | Get-View).ExitLockdownMode()

I spent a good amount of time going through the list on Troubleshooting VMware High Availability (HA) in vSphere which isn’t entirely ESXi relevant but has some good pointers nonetheless.

I finally got to Reconfiguring HA (FDM) on a cluster fails with the error: Operation timed out, with the following gem of info:

 This issue occurs if the vSphere High Availability Agent service on the ESXi host is stopped.

*Facepalm* – I checked the services and set the service to start and stop automatically. HA is now happily configured.

No matter how much you know, you gotta check the basics!