DefinIT

vRealize Automation 7.3 Distributed Install – Prerequisites

Pre-requisites - Get your ducks in a row!

As a consultant I’ve had the opportunity to design, install and configure dozens of production vRealize Automation deployments, from reasonably small Proof of Concept environments to globally scaled, multi-datacenter, fully distributed behemoths. It’s fair to say that I’ve made mistakes along the way, and learned a lot of lessons about what makes a deployment a success.

In the end, pretty much everything comes down to getting the pre-requisites right. Everything I’ve written here is already covered in the official documentation, and the installation wizard does a huge amount of the work for you.

For the purposes of this post, I am working with the following components, which have been pre-deployed on a single flat network.

vRA Appliances

Server       CPU   RAM (GB)   Disk (GB)
vra-app-1    4     18         140
vra-app-2    4     18         140

vRA IaaS Windows Servers

Server       CPU   RAM (GB)   Disk (GB)
vra-web-1    2     8          60
vra-web-2    2     8          60
vra-man-1    2     8          60
vra-man-2    2     8          60
vra-dem-1    2     4          60
vra-dem-2    2     4          60
vra-sql      2     8          60
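As a quick sanity check before deploying, the sizing listed above can be totalled to see what the whole environment asks of the cluster. This is a minimal sketch: the figures are copied from the list above, and the units (vCPU, and GB for RAM and disk) are an assumption.

```python
# Per-server sizing copied from the list above: (CPU, RAM, disk).
# Units are assumed to be vCPU, GB and GB respectively.
SERVERS = {
    "vra-app-1": (4, 18, 140),
    "vra-app-2": (4, 18, 140),
    "vra-web-1": (2, 8, 60),
    "vra-web-2": (2, 8, 60),
    "vra-man-1": (2, 8, 60),
    "vra-man-2": (2, 8, 60),
    "vra-dem-1": (2, 4, 60),
    "vra-dem-2": (2, 4, 60),
    "vra-sql": (2, 8, 60),
}


def totals(servers):
    """Return the summed (CPU, RAM, disk) across all servers."""
    cpu = sum(spec[0] for spec in servers.values())
    ram = sum(spec[1] for spec in servers.values())
    disk = sum(spec[2] for spec in servers.values())
    return cpu, ram, disk


if __name__ == "__main__":
    cpu, ram, disk = totals(SERVERS)
    print(f"{len(SERVERS)} servers: {cpu} vCPU, {ram} GB RAM, {disk} GB disk")
```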

(more…)

vSphere Web Client – VSAN is Turned Off – Edit button disappears

16/09/2016

I ran into a strange one with my lab today where the previously working VSAN cluster couldn’t be enabled. Symptoms included:

  • The button to enable VSAN was missing from the vSphere Web Client (the cluster showed “VSAN is Turned OFF”)
  • vsphere_client_virgo.log had the following error:

[2016-09-16T14:49:03.473Z] [ERROR] http-bio-9090-exec-18 70001918 100023 200008 com.vmware.vise.data.query.impl.DataServiceImpl Error occurred while executing query:
QuerySpec
QueryName: dam-auto-generated: ConfigureVsanActionResolver:dr-57
ResourceSpec
Constraint: ObjectIdentityConstraint
TargetType: ClusterComputeResource
Target: ManagedObjectReference: type = ClusterComputeResource, value = domain-c481, serverGuid = a44e7d15-e63f-46c2-a1aa-b9b1cbf972be
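If the log is large, a few lines of Python can pull out just the ERROR entries. This is a rough sketch that assumes every entry starts with the bracketed timestamp and level seen in the excerpt above; the field names (timestamp, level, thread, message) are my own labels, not VMware's.

```python
import re

# Matches the entry header seen in the excerpt above:
# [timestamp] [LEVEL] thread-name rest-of-line
LOG_LINE = re.compile(
    r"^\[(?P<timestamp>[^\]]+)\]\s+"
    r"\[(?P<level>[A-Z]+)\s*\]\s+"
    r"(?P<thread>\S+)\s+"
    r"(?P<message>.*)$"
)


def error_entries(lines):
    """Return a dict per log line whose level is ERROR."""
    entries = []
    for line in lines:
        match = LOG_LINE.match(line.strip())
        if match and match.group("level") == "ERROR":
            entries.append(match.groupdict())
    return entries


# Sample input taken from the excerpt above.
SAMPLE = [
    "[2016-09-16T14:49:03.473Z] [ERROR] http-bio-9090-exec-18 70001918 100023 200008 "
    "com.vmware.vise.data.query.impl.DataServiceImpl Error occurred while executing query:",
    "QuerySpec",
]

if __name__ == "__main__":
    for entry in error_entries(SAMPLE):
        print(entry["timestamp"], entry["message"])
```

Continuation lines such as "QuerySpec" do not match the header pattern and are skipped, so only the first line of each error is reported.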

I was able to enable VSAN on the cluster using rvc commands:

  1. SSH to VCSA
  2. Enable bash shell
  3. rvc administrator@vsphere.local@localhost
  4. vsan.enable_vsan_on_cluster /localhost/<datacenter name>/computers/<cluster name>

Following the enabling of VSAN on the cluster, I was still getting errors:

  • “Unable to load VSAN configuration” when viewing the VSAN configuration for the cluster in the vSphere Web Client
  • “HTTP400 Error” when viewing the cluster summary tab, on the VSAN health widget

The HTTP400 error led me to the KB article “VMware Virtual SAN 6.x health plug-in fails to load with the error: Unexpected status code: 400” (2133384). Following the resolution in that KB fixed the issue.

It seems that, yet again, VMware’s certificate tooling does not replace a key certificate, and this is the root cause of the problem. When I deployed the VCSA, I configured the PSC as a subordinate Certificate Authority and followed the documented procedure to replace the certificates. Clearly this one was missed!

Unable to connect NSX to Lookup Service when using a vSphere 6 subordinate certificate authority (VMCA)

After deploying a new vSphere 6 vCenter Server Appliance (VCSA) and configuring the Platform Services Controller (PSC) to act as a subordinate Certificate Authority (CA), I was unable to register the NSX Manager to the Lookup Service. Try saying that fast after a pint or two!

Attempting to register NSX to the Lookup Service would result in the following error:

NSX Management Service operation failed.( Initialization of Admin Registration Service Provider failed. Root Cause: Error occurred while registration of lookup service, com.vmware.vim.vmomi.core.exception.CertificateValidationException: Server certificate chain not verified )


Initially I thought that the NSX Manager needed to somehow import the VMCA certificate to trust the Lookup Service certificate. However, after reaching out to the NSBU ambassadors list, I had a reply from Julienne Pham, a Technical Solutions Architect and CTO Ambassador with VMware Professional Services, who pointed me to the correct solution.

It seems that changing the PSC and vCenter certificates (even with the Certificate Manager tool) does not correctly update the service registration information. To quote VMware KB 2109074:

…the vCenter Server system uses a new certificate, but the service registration information on the Platform Services Controller is not updated

To resolve this issue, we need to use the ls_update_certs.py script to register the services correctly. (more…)
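When chasing a “Server certificate chain not verified” error, it can be useful to check from a client whether an endpoint's certificate chain verifies against a given CA bundle. The following is a minimal sketch using only Python's standard library; the port (443) and the idea of pointing it at the PSC are assumptions, and it only tests TLS trust from the client side. It does not inspect the service registration data that KB 2109074 describes.

```python
import socket
import ssl


def build_context(ca_file=None):
    """Create a verifying TLS context; ca_file is an optional PEM bundle
    (for example the exported root CA). With no ca_file the system
    trust store is used."""
    return ssl.create_default_context(cafile=ca_file)


def chain_verifies(host, port=443, ca_file=None, timeout=5.0):
    """Return (ok, detail) describing whether the TLS handshake verified."""
    context = build_context(ca_file)
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            with context.wrap_socket(sock, server_hostname=host):
                return True, "chain verified"
    except ssl.SSLCertVerificationError as exc:
        # Raised when the chain or hostname does not validate.
        return False, exc.verify_message
    except OSError as exc:
        # Covers refused connections, timeouts and other TLS errors.
        return False, f"connection failed: {exc}"
```

A call such as `chain_verifies("psc.example.local", ca_file="root-ca.pem")` (both names hypothetical) returns `(False, ...)` with the verifier's message when the chain does not validate.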

Generating and Installing CA Signed Certificates for VMware SRM 5.5

I’m fairly new to SRM, but even so this one seemed like a real head-scratcher! If you happen to be using CA signed certificates on your “protected site” and “recovery site” vCenter servers, you encounter SSLHandShake errors when you come to link the two SRM sites. Basically, SRM assumes you want to use certificates for authentication because you’re using signed certificates. If you use the default self-signed certificates, SRM will default to using password authentication (see SRM Authentication). The process fails during the “configure connection” stage in two cases: if one of your vCenter servers has a CA signed certificate and the other does not (SRM throws an error that they are using different authentication methods), or if either SRM installation is using a self-signed certificate (SRM throws an error that the certificate or CA could not be trusted).

SRM server ‘vc-02.definit.local’ cannot do a pair operation. The reason is: Local and remote servers are using different authentication methods.

(more…)

SSO Admin password reset with ssopass – SslHandshakeFailed – vSphere 5.1

Today I found out that in vSphere 5.1 the SSO administrator account (admin@system-domain) has a password that expires after 365 days. See KB2035864:

vCenter Single Sign-On account (SSO) passwords expire after 365 days, including the password for admin@system-domain.

Awesome.

In vSphere 5.5 it gets even better – the password expires every 90 days by default! (See the vSphere 5.5 SSO documentation)

By default, vCenter Single Sign-On passwords, including the password for administrator@vsphere.local, expire after 90 days.
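To avoid being caught out, the expiry date can be worked out from the date the password was last changed. A minimal sketch, assuming the default lifetimes quoted above (365 days for 5.1, 90 days for 5.5) have not been altered in the password policy:

```python
from datetime import date, timedelta

# Default SSO password lifetimes quoted above, keyed by vSphere version.
MAX_AGE_DAYS = {"5.1": 365, "5.5": 90}


def password_expiry(last_changed, version):
    """Return the date on which the SSO password expires."""
    return last_changed + timedelta(days=MAX_AGE_DAYS[version])


def days_remaining(last_changed, version, today):
    """Return days left before expiry (negative once expired)."""
    return (password_expiry(last_changed, version) - today).days


if __name__ == "__main__":
    changed = date(2013, 1, 1)  # example date only
    for version in MAX_AGE_DAYS:
        print(version, password_expiry(changed, version))
```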

Following KB2034608 to reset the admin@system-domain I came across an interesting error:

[image: screenshot of the error]

(more…)

vSphere Security: Advanced SSH Configurations

There are different schools of thought as to whether you should have SSH enabled on your hosts. VMware recommend it is disabled. With SSH disabled there is no possibility of attack, so that’s the “most secure” option. Of course in the real world there’s a balance between “most secure” and “usability” (e.g. the most secure host is powered off and physically isolated from the network, but you can’t run any workloads). My preferred route is to have it enabled but locked down.

Note: VMware use the term “ESXi Shell”, most of us would term it “SSH” – the two are used interchangeably in this article although there is a slight difference. You can have the ESXi Shell enabled but SSH disabled – this means you can access the shell via the DCUI. For the sake of this article assume ESXi Shell and SSH are the same. (more…)

Powershell – Generate Microsoft CA signed SSL certificates with vSphere 5.1

The process of requesting certificates for vSphere 5.1 is a fairly grim, manual process. It’s repetitive and easy to make a mistake on any step of the way. Since I’ve got to do this for quite a few VirtualCenter Servers, I thought I’d script the certificate generation if nothing else. I am following the excellent documentation provided in Implementing CA signed SSL certificates with vSphere 5.1 and more specifically in Creating certificate requests and certificates for vCenter Server 5.1 components.

The script assumes that:

  1. You have a working Certificate Authority
  2. You are in an Active Directory domain environment
  3. You have the relevant permissions to modify Certificate Templates, Request and Issue certificates.
  4. You have installed OpenSSL v1.0.1c or later.

You will need to modify the configuration section to suit your environment and the $WorkingDir folder should exist before you run the script. (more…)
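Two of those assumptions are easy to verify up front: that OpenSSL is installed and that the working folder exists. The sketch below is in Python rather than PowerShell and is not part of the script itself; the function name and the example path are hypothetical, and it checks only that an openssl executable is on the PATH, not its version.

```python
import os
import shutil


def check_prerequisites(working_dir, openssl_name="openssl"):
    """Return a list of problems found; an empty list means both checks passed."""
    problems = []
    # Assumption 4: OpenSSL must be installed (version is not checked here).
    if shutil.which(openssl_name) is None:
        problems.append(f"'{openssl_name}' was not found on the PATH")
    # The working folder must exist before the script runs.
    if not os.path.isdir(working_dir):
        problems.append(f"working folder does not exist: {working_dir}")
    return problems


if __name__ == "__main__":
    # Example path only; substitute your own working folder.
    for problem in check_prerequisites(r"C:\Certs"):
        print("Problem:", problem)
```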

Trouble with SCOM 2007 R2 Certificates? Validate the entire PKI path!

I learned something new today: SCOM 2007 R2 certificate-based communication checks not only the validity of the certificate you use, but also that of the CA that issued it…let me expand:

Like many organisations, we have a root CA (we’ll call it ROOTCA01) and a subordinate CA (we’ll call that SUBCA01). OPSMGM01 has a certificate to identify itself and has certificates for ROOTCA01 and SUBCA01 in its Trusted Root Certification Authorities.

The certificate to secure the connection between OpsMgr Gateway (OPSGW01) and the OpsMgr Management Server (OPSMGM01) is issued by SUBCA01 and is installed on OPSGW01, and to validate the certificate chain SUBCA01’s certificate is also installed in the Trusted Root Certification Authorities. Opening OPSGW01’s certificate and examining the Certificate Path tab shows the certificate is valid all the way up to the issuing CA – SUBCA01.

The connection will not work – OPSGW01 logs the following events:

Log Name:      Operations Manager
Source:        OpsMgr Connector
Date:          05/01/2012 10:18:28
Event ID:      21016
Level:         Error
Computer:      opsgw01.definit.co.uk
Description:   OpsMgr was unable to set up a communications channel to opsmgm01.definit.co.uk and there are no failover hosts.  Communication will resume when opsmgm01.definit.co.uk is available and communication from this computer is allowed.

Log Name:      Operations Manager
Source:        OpsMgr Connector
Date:          05/01/2012 10:18:25
Event ID:      20070
Level:         Error
Computer:      opsgw01.definit.co.uk
Description:   The OpsMgr Connector connected to opsmgm01.definit.co.uk, but the connection was closed immediately after authentication occurred.  The most likely cause of this error is that the agent is not authorized to communicate with the server, or the server has not received configuration.  Check the event log on the server for the presence of 20000 events, indicating that agents which are not approved are attempting to connect.

Log Name:      Operations Manager
Source:        OpsMgr Connector
Date:          05/01/2012 10:18:24
Event ID:      21002
Level:         Warning
Computer:      opsgw01.definit.co.uk
Description:   The OpsMgr Connector could not accept a connection from xxx.xxx.xxx.xxx:5723 because mutual authentication failed.

Log Name:      Operations Manager
Source:        OpsMgr Connector
Date:          05/01/2012 10:18:24
Event ID:      20067
Level:         Warning
Computer:      opsgw01.definit.co.uk
Description:   A device at IP xxx.xxx.xxx.xxx:5723 attempted to connect but the certificate presented by the device was invalid.  The connection from the device has been rejected.  The failure code on the certificate was 0x800B0109 (A certificate chain processed, but terminated in a root certificate which is not trusted by the trust provider.).

It’s the last event that led me to check the certificate chain for the SUBCA01 certificate, which was installed and trusted but did not validate up the chain to ROOTCA01. Installing the ROOTCA01 certificate resolved this issue.
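The behaviour can be illustrated with a toy model of chain validation: starting from the server certificate, follow each issuer upwards and only accept the chain if it ends at a self-signed root that is present in the store. This is a simplified sketch of the idea, not how Windows or SCOM implements it; the data structures are invented for illustration.

```python
def chain_is_trusted(cert, issuers, store):
    """Walk from cert up its issuers; succeed only on reaching a
    self-signed root that is present in the store.

    issuers maps each certificate name to its issuer's name
    (a root maps to itself); store is the set of installed CA names.
    """
    seen = set()
    current = cert
    while current not in seen:
        seen.add(current)
        issuer = issuers.get(current)
        if issuer is None:
            return False  # unknown certificate
        if issuer == current:
            return current in store  # self-signed root: must be installed
        if issuer not in store:
            return False  # chain cannot be built any further
        current = issuer
    return False  # loop in the issuer data


# Issuer relationships from the scenario above.
ISSUERS = {
    "OPSGW01": "SUBCA01",
    "SUBCA01": "ROOTCA01",
    "ROOTCA01": "ROOTCA01",
}

if __name__ == "__main__":
    print(chain_is_trusted("OPSGW01", ISSUERS, {"SUBCA01"}))
    print(chain_is_trusted("OPSGW01", ISSUERS, {"SUBCA01", "ROOTCA01"}))
```

With only SUBCA01 in the store the walk stops short of the root and the chain is rejected; adding ROOTCA01 lets it complete, which mirrors the fix described above.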

Using System Center Operations Manager 2007 R2 Audit Collection Services for remote, DMZ or workgroup servers

SCOM 2007 R2’s Audit Collection Services (ACS from now on) is very useful for meeting compliance (e.g. Sarbanes-Oxley) and security audit requirements – working with financial companies often requires such compliance. It’s pretty simple to install in a domain environment – you run the installer to create a collection server, then activate the forwarder on the client servers.

When it comes to servers you really want to audit, those that are by definition more at risk from security breach because they are publicly accessible, it’s not so straightforward. Take for example that web server, or FTP host in your DMZ, certainly not domain joined and probably bombarded by daily brute force password attacks. Select the SCOM agent in the console and enable Audit Collection Services?

(more…)