Lab Notes – vCloud Director 9.1 for Service Providers – Part 1: Pre-requisites

08/02/2019

This series was originally going to be a more polished endeavour, but unfortunately time got in the way. A prod from James Kilby (@jameskilbynet) has convinced me to publish as is, as a series of lab notes. Maybe one day I’ll loop back and finish them…



Because I'm backing my vCloud Director installation with NSX-T, I will be using my existing Tier-0 router, which peers with my physical router via BGP. The Tier-0 router will be connected to a Tier-1 router, the NSX-T logical switches will be connected to the Tier-1, and their IP networks will be advertised to the Tier-0 (using NSX-T's internal routing mechanism) and on to the physical router via eBGP.

The Tier-1 router will be created in Active-Standby mode because it will also provide the load balancing services later.

Tier1 vCloud Director

Tier 1 vCloud Director advertised routes

Logical Switches

I want to build vCloud Director as many Service Provider customers do, with different traffic types separated onto their own logical switches. I will be subnetting a larger block into some smaller /27 networks to avoid wasting IPs (a typical Service Provider requirement). To that end, I am deploying four NSX-T logical switches:

  • vCloud Director API/UI/Console
  • vCloud Director SQL
  • vCloud Director NFS
  • vCloud Director RabbitMQ/Orchestrator

The four logical switches have been connected to the Tier-1 router created for vCloud Director, and each has a router port configured in the correct subnet.

vCD Router Ports
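For reference, each /27 provides 32 addresses (30 usable hosts), which is plenty for these small management segments. A quick way to sanity-check the maths on CentOS is ipcalc (the 192.168.100.0/27 block below is purely a hypothetical example, not one of my lab subnets):

ipcalc -bnm 192.168.100.0/27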

Load Balancing

There are various load balancing requirements for the full vCloud Director installation, which will be fulfilled by the NSX-T Logical Load Balancer on the Tier-1 router:

  • vCloud Director API/UI
  • vCloud Director Console
  • vCloud Director RabbitMQ
  • vRealize Orchestrator

The actual load balancer configuration will be done later on, once I have the components deployed.


All the VMs that are part of the vCloud Director installation will require A and PTR (forward and reverse) lookup records

Required DNS Records

Notice that the VCD cells have two IPs per VM, one for the UI/API and one for the Console traffic. Two records are also created for the load balancer URLs for vRealize Orchestrator and RabbitMQ.
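Each record can be sanity-checked once it has been created (a quick sketch; nslookup comes from the bind-utils package, and the IP below is a placeholder for one of the real addresses). The first command should return the A record, the second the matching PTR record:

nslookup vcd-rmq-1.definit.local
nslookup <VM IP address>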

VM Sizing

The vCloud Director cells, PostgreSQL database and RabbitMQ will be deployed using a standard CentOS7 template. vRealize Orchestrator is deployed as an appliance. The open-vm-tools package is installed on the template.

vcd-sql-1: 2 vCPU, 4 GB RAM, 40 GB disk


All VMs have been updated using yum update -y


All VMs are configured to use a default NTP source:

yum install -y ntp

systemctl enable ntpd

systemctl start ntpd
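Once ntpd is running, it's worth confirming the VM is actually synchronising:

ntpq -p

A healthy node shows at least one peer marked with an asterisk (*), indicating the selected time source.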


Replace SELINUX=enforcing with SELINUX=disabled in /etc/selinux/config and reboot

sed -i 's/^SELINUX=.*/SELINUX=disabled/g' /etc/selinux/config && cat /etc/selinux/config && reboot
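After the reboot, confirm the change has taken effect:

getenforce
sestatus

Both commands should report that SELinux is disabled.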

Lab Notes – vCloud Director 9.1 for Service Providers – Part 4: RabbitMQ Cluster Installation

13/07/2018

This series was originally going to be a more polished endeavour, but unfortunately time got in the way. A prod from James Kilby (@jameskilbynet) has convinced me to publish as is, as a series of lab notes. Maybe one day I’ll loop back and finish them…

RabbitMQ for vCloud Director

RabbitMQ High Availability and Load Balancing

The vCloud Architecture Toolkit states

RabbitMQ scales up to thousands of messages per second, which is much more than vCloud Director is able to publish. Therefore, there is no need to load balance RabbitMQ nodes for performance reasons.

Therefore, I am deploying RabbitMQ in cluster mode for high availability rather than for performance scale-out. This means that I can use a RabbitMQ cluster with two nodes, configure replication for the vCloud Director queue, and then load balance the two nodes.

  • When you configure a highly available queue, one node is elected the Master, and the other(s) become Slave(s)
  • If you target a node with the Slave, RabbitMQ will route you to the queue on the Master node
  • If the queue’s Master node becomes unavailable, a Slave node will be elected as Master

In order to provide a highly available RabbitMQ queue for vCloud Director extensibility, the load balancer will target the queue’s Master node and send traffic there. In the event that the node with the Master queue becomes unavailable, the load balancer will redirect traffic to the second node, which will have been elected as Master.

Both vCloud Director and vRealize Orchestrator will access the queue via the load balancer.

  • vCloud Director will publish messages to the load balancer
  • vRealize Orchestrator will subscribe as a consumer to the load balancer

RabbitMQ HA Cluster


I’ve deployed two CentOS7 VMs from my standard template, and configured the pre-requisites as per my pre-requisites post. Updates, NTP, DNS and  SELinux have all been configured.

RabbitMQ needs the exact same Erlang version installed on each node; the easiest way to do this is to enable the EPEL repository:

yum install epel-release -y
yum install erlang -y
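Because the Erlang versions must match exactly, confirm the installed version on both nodes before going any further; the output should be identical on vcd-rmq-1 and vcd-rmq-2:

rpm -q erlang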

vCloud Director 9.1 supports RabbitMQ 3.6, so locate and download the correct RPM from the GitHub release page


To trust the downloaded package, I need to import the RabbitMQ public signing key:

rpm --import <RabbitMQ signing key URL>

Finally, let's open the host firewall ports required for RabbitMQ:

firewall-cmd --zone=public --permanent --add-port=4369/tcp
firewall-cmd --zone=public --permanent --add-port=25672/tcp
firewall-cmd --zone=public --permanent --add-port=5671-5672/tcp
firewall-cmd --zone=public --permanent --add-port=15672/tcp
firewall-cmd --zone=public --permanent --add-port=61613-61614/tcp
firewall-cmd --zone=public --permanent --add-port=1883/tcp
firewall-cmd --zone=public --permanent --add-port=8883/tcp
firewall-cmd --reload

RabbitMQ Installation

The following steps should be completed on BOTH RabbitMQ nodes

Install the RabbitMQ RPM

yum install rabbitmq-server-3.6.16-1.el7.noarch.rpm -y

Enable, and start the RabbitMQ server:

systemctl enable rabbitmq-server

systemctl start rabbitmq-server

Enable the management interface and restart the server for the change to take effect:

rabbitmq-plugins enable rabbitmq_management
chown -R rabbitmq:rabbitmq /var/lib/rabbitmq/

systemctl restart rabbitmq-server

Finally, add an administrative user for vCloud Director:

sudo rabbitmqctl add_user vcloud 'VMware1!'
sudo rabbitmqctl set_user_tags vcloud administrator
sudo rabbitmqctl set_permissions -p / vcloud ".*" ".*" ".*"

Validate that the RabbitMQ admin page is accessible on http://vcd-rmq-1.definit.local:15672

RabbitMQ Admin Interface
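The same check can be scripted against the management HTTP API (a sketch using the vcloud credentials created above; a JSON overview of the node is returned if the management plugin is up and the credentials are valid):

curl -s -u 'vcloud:VMware1!' http://vcd-rmq-1.definit.local:15672/api/overview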

Clustering RabbitMQ nodes

Now that I have two independent, stand-alone RabbitMQ nodes running, it's time to cluster them. Firstly, the Erlang cookie needs to be copied from the first node to the second, which allows them to join the same cluster.

IMPORTANT: Make sure both nodes can resolve each other using their short names (e.g. vcd-rmq-1 and vcd-rmq-2). If they cannot, create entries in the HOSTS file to ensure that they can.
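If DNS short-name resolution isn't available, a couple of HOSTS entries on each node will do (the addresses below are hypothetical placeholders; substitute the real RabbitMQ node IPs):

echo "192.168.0.11 vcd-rmq-1 vcd-rmq-1.definit.local" >> /etc/hosts
echo "192.168.0.12 vcd-rmq-2 vcd-rmq-2.definit.local" >> /etc/hosts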

On the first node only (vcd-rmq-1)

Read the Erlang cookie from the file:

cat /var/lib/rabbitmq/.erlang.cookie

Copy the cookie contents (e.g. “FAPNMJZLNOCUTWXTNJOG”) to the clipboard.

On the second node only (vcd-rmq-2)

Stop the RabbitMQ service:

systemctl stop rabbitmq-server

Then replace the existing cookie file with the cookie from the first node

echo "FAPNMJZLNOCUTWXTNJOG" > /var/lib/rabbitmq/.erlang.cookie

Start the RabbitMQ service

systemctl start rabbitmq-server

Stop the RabbitMQ app and reset the configuration:

rabbitmqctl stop_app

rabbitmqctl reset

Join the second node to the first node:

rabbitmqctl join_cluster rabbit@vcd-rmq-1

Then start the RabbitMQ app:

rabbitmqctl start_app

Validate the cluster status, using rabbitmqctl cluster_status, or by refreshing the management interface:

RabbitMQ Cluster Status

Configuring RabbitMQ for vCloud Director

Queue HA Policy

Now that the RabbitMQ nodes are clustered, we can configure the Queue mirroring with a HA policy. The below command creates a policy called "ha-all", which applies to all queues (matching ""), then sets the ha-mode to "all" (replicate to all nodes in the cluster) and the ha-sync-mode to "automatic" (if a new node joins, sync automatically). You can read more about RabbitMQ HA configuration here.

rabbitmqctl set_policy ha-all "" '{"ha-mode":"all","ha-sync-mode":"automatic"}'
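To confirm the policy has been created:

rabbitmqctl list_policies

The output should show the ha-all policy, with the ha-mode and ha-sync-mode definition, applied to the default vhost.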

Create a Topic Exchange

Using the RabbitMQ management interface, log on with the “vcloud” user created earlier and select the “Exchanges” tab. Expand the “Add a new exchange” box and enter a name for the exchange. The remaining settings can be left at default. Once the new Exchange has been created, you can see that the “ha-all” policy has applied to it.

Create Topic Exchange
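If you prefer the command line, the same exchange can be declared via the management HTTP API instead of the UI (a sketch; "vcd-extension" is just a hypothetical exchange name, and %2F is the URL-encoded default vhost):

curl -u 'vcloud:VMware1!' -X PUT -H 'content-type:application/json' -d '{"type":"topic","durable":true}' http://vcd-rmq-1.definit.local:15672/api/exchanges/%2F/vcd-extension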

Configuring the RabbitMQ Load Balancer

The final configuration step is to load balance the two RabbitMQ nodes in the cluster – as described in the opening of this post, this will steer the publisher (vCloud Director) and subscriber (vRealize Orchestrator) to the node with the active queue.

I will be configuring an NSX-T load balancer on the Tier-1 router that all the vCloud Director components are connected to. However, the basic configuration should apply across most load balancer vendors. The load balancer should direct all traffic to vcd-rmq-1, unless the health check API does not return the expected status.

  • Virtual Server
    • IP address: (vcd-rmq.definit.local)
    • Layer 4: TCP 5672
  • Server Pool
    • Algorithm: Round Robin (though in reality, it's active/standby)
    • (vcd-rmq-1): TCP 5672, Weight 1, Enabled
    • (vcd-rmq-2): TCP 5672, Weight 1, Enabled, Backup Member (used only if the other member goes down)
  • Health Check (tested manually in the example below)
    • Request URL: /api/healthchecks/node
    • HTTP 15672
    • Header: basic authorisation header
    • Response status: 200
    • Response body: {"status":"ok"}
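The health check the load balancer performs can be tested manually against each node (a sketch using the vcloud user created earlier; in production you would probably create a dedicated monitoring user):

curl -u 'vcloud:VMware1!' http://vcd-rmq-1.definit.local:15672/api/healthchecks/node

A healthy node returns HTTP 200 with the body {"status":"ok"}, which is exactly what the monitor above is configured to expect.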

Next Steps

Later, once the vCloud Director installation is complete, I will configure vCloud Director to send notifications to this RabbitMQ cluster.

Lab Notes – vCloud Director 9.1 for Service Providers – Part 3: NFS Server Installation

This series was originally going to be a more polished endeavour, but unfortunately time got in the way. A prod from James Kilby (@jameskilbynet) has convinced me to publish as is, as a series of lab notes. Maybe one day I’ll loop back and finish them…


I’ve deployed a CentOS7 VM from my standard template, and configured the prerequisites as per my prerequisites post. Updates, NTP, DNS and  SELinux have all been configured. I have added a 200GB disk to the base VM, which has then been partitioned, formatted and mounted to /nfs/data – this will be the share used for vCloud Director.

Install and enable the NFS server

Installing and configuring an NFS share is a pretty common admin task, so it doesn’t require a lot of explanation (I hope!)

Install the packages:

yum install nfs-utils rpcbind

Enable, and start the services:

systemctl enable nfs-server

systemctl enable rpcbind

systemctl enable nfs-lock

systemctl enable nfs-idmap

systemctl start rpcbind

systemctl start nfs-server
systemctl start nfs-lock
systemctl start nfs-idmap

Configure the NFS Export (Share)

Once the services have been configured, I add a configuration line to /etc/exports to export the mount (/nfs/data) and allow access from the NFS subnet, with the required settings for vCloud Director (substitute the NFS subnet CIDR in the command below):

echo "/nfs/data <NFS subnet CIDR>(rw,sync,no_root_squash,no_subtree_check)" >> /etc/exports

The following command will load the /etc/exports configuration:

exportfs -a
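Before opening the firewall it's worth confirming the export is active; both commands should list /nfs/data with the options configured above:

exportfs -v
showmount -e localhost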

Finally, open the firewall ports to allow NFS clients to connect:

firewall-cmd --permanent --zone=public --add-service=nfs
firewall-cmd --permanent --zone=public --add-service=mountd
firewall-cmd --permanent --zone=public --add-service=rpc-bind
firewall-cmd --reload

Next Steps

Now that the NFS share is in place, I can move on to the next supporting service for vCloud Director – RabbitMQ. The NFS share will be mounted to the vCloud Director cells when they are installed later.
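For reference, mounting the share from a cell will look something like this (a sketch; vcd-nfs-1.definit.local is a hypothetical hostname for this NFS server, and the target path is the standard vCloud Director transfer share location):

mount -t nfs vcd-nfs-1.definit.local:/nfs/data /opt/vmware/vcloud-director/data/transfer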

Lab Notes – vCloud Director 9.1 for Service Providers – Part 5: vRealize Orchestrator Cluster

This series was originally going to be a more polished endeavour, but unfortunately time got in the way. A prod from James Kilby (@jameskilbynet) has convinced me to publish as is, as a series of lab notes. Maybe one day I’ll loop back and finish them…


PostgreSQL server deployed and configured

Two vRO 7.4 appliances deployed

Before powering them on, add an additional network card on the vcd-sql network

Power on the VMs and wait until they boot, then log onto the VAMI interface (https://vcd-vro-[1-2]:5480) and configure the eth1 interface with an IP address on the vcd-sql subnet

vRO eth1 interface

Configure the NTP server

vRO NTP Server

Configuring the first vRO node

Log onto the Control Centre for the first node https://vcd-vro-1.definit.local:8283/vco-controlcenter

Select the deployment type as standalone, and configure the load balancer name.

vRO Install type

Select the vSphere authentication provider, and accept the certificate.

vRO vSphere Authentication

Enter credentials to register with vSphere

vRO Auth Credentials

Select an Administrators group to manage vRO

vRO Admin Group

Configure the remote database connection

vRO Remote PostgreSQL DB

After a couple of minutes, the vRO server will have restarted and I can progress to the second node – check this has happened by going to the Validate Configuration page and waiting for all the green ticks!

vRO Configuration Validation

Configuring the second vRO node

Select Clustered Orchestrator from the deployment page, and enter the details of the first vRO node

vRO Clustered Orchestrator

Wait for the second node to restart its services (~2 minutes again) to apply the configuration. Once the configuration has been applied, you should see both nodes on the Orchestrator Cluster Management page

vRO Cluster completed

Load Balancing the vRealize Orchestrator Cluster

I will be configuring an NSX-T load balancer on the Tier-1 router that all the vCloud Director components are connected to. However, the basic configuration should apply across most load balancer vendors.

Virtual Servers

vRO API:

  • IP address: (vcd-vro.definit.local)
  • Port: Layer 7 HTTPS 8281
  • SSL Offload

Control Center:

  • IP address: (vcd-vro.definit.local)
  • Port: Layer 7 HTTPS 8283
  • SSL Offload

Server Pool

  • Members: vcd-vro-1, vcd-vro-2
  • Algorithm: Round Robin

Health Checks (tested manually in the example below)

vRO API:

  • URL: /vco/api/healthstatus
  • Port: 8281
  • Response: HTTP 200

Control Center:

  • URL: /vco-controlcenter/docs/
  • Port: 8283
  • Response: HTTP 200
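Both health check URLs can be tested manually against the nodes (or the VIP, once the load balancer is up); -k is needed because the lab uses self-signed certificates. The first command returns the Orchestrator health status, and the second should print 200:

curl -k https://vcd-vro-1.definit.local:8281/vco/api/healthstatus
curl -k -o /dev/null -w '%{http_code}\n' https://vcd-vro-1.definit.local:8283/vco-controlcenter/docs/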

Next Steps

Later, once the vCloud Director installation is completed, vRealize Orchestrator will be configured for “XaaS” extensibility, as well as being hooked in as a subscriber to the vCloud notifications on the RabbitMQ cluster.

Lab Notes – vCloud Director 9.1 for Service Providers – Part 2: PostgreSQL Installation

10/07/2018

This series was originally going to be a more polished endeavour, but unfortunately time got in the way. A prod from James Kilby (@jameskilbynet) has convinced me to publish as is, as a series of lab notes. Maybe one day I’ll loop back and finish them…

Installing PostgreSQL 10 Server

The base OS for the PostgreSQL server is CentOS7, deployed from the same template and with the same preparation as detailed in the prerequisites post.

Install PostgreSQL and configure

Add the correct repository for the base VM's OS and processor architecture. For my CentOS7 64-bit installation, I used the repository RPM linked from the PostgreSQL web site:

rpm -Uvh <PostgreSQL 10 repository RPM URL>

Install PostgreSQL server and client tools:

yum install -y postgresql10-server postgresql10

Change the default postgres user password

passwd postgres

Then initialise PostgreSQL

/usr/pgsql-10/bin/postgresql-10-setup initdb

Finally, start, enable and validate the service:

systemctl start postgresql-10

systemctl enable postgresql-10

systemctl status postgresql-10

Create the vCloud Director and vRO Database

To create a database for vCloud Director to use, switch to the postgres user and open the psql command line:

sudo -u postgres -i

psql
Then create the databases and users required – one for vCloud Director, and one for the vRealize Orchestrator cluster:

create user vcloud;
alter user vcloud password 'VMware1!';
alter role vcloud with login;
create database vcloud;
grant all privileges on database vcloud to vcloud;

create user vro;
alter user vro password 'VMware1!';
alter role vro with login;
create database vro;
grant all privileges on database vro to vro;

Quit psql with \q, then exit back to the root prompt.
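As a quick check that both databases and roles exist before moving on (run from the root prompt):

sudo -u postgres psql -c '\l'
sudo -u postgres psql -c '\du'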

Configure remote PostgreSQL access

In order to allow remote access from the vCloud Director Cells, and vRealize Orchestrator, we need to add some configuration to the PostgreSQL configuration files.

These two commands add a line to the pg_hba.conf file, allowing the user vcloud access to the database vcloud, and the user vro to access the database vro from the vcd-sql subnet. You could specify individual hosts to increase security, but I’m going to be using the NSX distributed firewall to secure these connections too, so the subnet will suffice.

echo "host vcloud vcloud <vcd-sql subnet CIDR> md5" >> /var/lib/pgsql/10/data/pg_hba.conf

echo "host vro vro <vcd-sql subnet CIDR> md5" >> /var/lib/pgsql/10/data/pg_hba.conf

By default, PostgreSQL will only be listening on its internal loopback address. To configure PostgreSQL to listen on all addresses, the following lines need to be added to the postgresql.conf file:

echo "listen_addresses = '*'" >> /var/lib/pgsql/10/data/postgresql.conf
echo "port = 5432" >> /var/lib/pgsql/10/data/postgresql.conf

Finally, open the host-based firewall to allow in-bound connections from the same two IP subnets:

firewall-cmd --permanent --zone=trusted --add-source=<vcd-sql subnet CIDR>
firewall-cmd --permanent --zone=trusted --add-port=5432/tcp
firewall-cmd --reload

Restart PostgreSQL

systemctl restart postgresql-10

Configure PostgreSQL Performance Tuning

For production deployments, there are some recommended tuning settings specified in a VMware KB article. These settings are tuned to match the size of the PostgreSQL server deployed in my lab, so I have implemented them.

Testing Remote Access

In order to validate the PostgreSQL configuration, database setup, network, and firewall configuration, connect to the PostgreSQL database from one of the vCloud Director cell VMs to ensure access:
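For example, from a cell VM (a sketch; this assumes the PostgreSQL 10 client package has been installed on the cell from the same repository used above):

psql -h vcd-sql-1.definit.local -U vcloud -d vcloud

If the network, pg_hba.conf entry and firewall rules are correct, psql prompts for the vcloud password and opens a session; a connection refused or timeout points at the listener or firewall configuration instead.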

Upgrading PKS with NSX-T from 1.0.x to 1.1

Yesterday, Pivotal Container Service 1.1 dropped and, as it's something I've been actively learning in my lab, I wanted to jump on the upgrade straight away. PKS with NSX-T is a really hot topic right now and I think it's going to be a big part of the future CNA landscape.

My Lab PKS 1.0.4 deployment is configured as a “NO-NAT with Logical Switch (NSX-T) Topology” as depicted in the diagram below (from the PKS documentation). My setup has these network characteristics:

  • PKS control plane (Ops Manager, BOSH Director, and PKS VM) components are using routable IP addresses.
  • Kubernetes cluster master and worker nodes are using routable IP addresses.
  • The PKS control plane is deployed inside of the NSX-T network. Both the PKS control plane components (VMs) and the Kubernetes Nodes use routable IP addresses.

I used William Lam’s excellent series on PKS with NSX-T to configure a lot of the settings, so I am going to assume a familiarity with that series. If not, I suggest you start there to get a good understanding of how everything is laid out.

NO-NAT with Logical Switch (NSX-T) Topology

vRealize Lifecycle Manager 1.2 VC data collection fails when NSX-T hostswitches are in use

18/04/2018

When vRealize Lifecycle Manager 1.2 was released recently, I was keen to get it installed in my lab, since I maintain several vRealize Automation deployments for development and testing, as well as performing upgrades. With vRLCM I can reduce the administrative overhead of managing the environments, as well as easily migrate content between environments (I'll be blogging on some of these cool new features soon).

However, I hit a snag when I began to import my existing environment – I couldn’t get the vCenter data collection to run.

Data Collection Failed

NSX-T 2.0 Lab Build: Upgrading to NSX-T 2.1

22/12/2017

Yesterday saw the release of NSX-T 2.1, with some new features and also some usability enhancements. You can check out the release notes here

As I’m mid-way through this blog series, I thought I’d stick in the upgrade as a little bonus!

Download the upgrade bundle

Validate the version and status of NSX-T components

Check the Controller cluster status and Manager connections are up.

Validate the hosts are installed, and have a connection to the controller and manager.

Ensure the Edges are deployed and connected to the Manager

Finally, check that the transport nodes are all in a “Success” state

You can also validate the state of NSX-T via the command line

SSH to the controller and use “get control-cluster status verbose”

Uploading the NSX-T Upgrade bundle

Navigate to System > Utilities > Upgrade, then click the “PROCEED TO UPGRADE” button

Select the upgrade .mub file and click “UPLOAD”

Since the upgrade bundle is fairly hefty (3.7GB), the upload will take a while; once it's uploaded it is extracted and verified, which again takes some time.


Once the package has uploaded, click to begin the upgrade. The upgrade coordinator will then check the install for any potential issues. In my environment there are two warnings for the Edges that the connectivity is degraded – this is because of the disconnected 4th VMNIC on my Edge VMs and is safe to ignore.

Click Next to view the Hosts Upgrade page. Here you can define the order and method of upgrade for each host, and define host groups to control the order of upgrade. I've gone with the default of serial (one at a time) upgrades rather than parallel (up to 5 at once). All three hosts in this environment are in an automatic group for Pod200-Cluster-1. Click START to begin the upgrade; the hosts will be put in maintenance mode, then upgraded and rebooted if necessary (a reboot shouldn't be necessary!). Bear in mind you need to have DRS enabled, and the VMs on the hosts must be able to vMotion off of the host being put in maintenance mode. Once a host has upgraded, and its MPA (Management Plane Agent) has reported back to the Manager, the Upgrade Coordinator will move on to the next host in the group.

Once the hosts are upgraded, click NEXT to move to the Edge Upgrade page

Edge Clusters can be upgraded in serial or parallel, but the Edges within are grouped by the Edge Clusters and upgraded serially to ensure connectivity is maintained. I have a single Edge Cluster with two Edge VMs, so this will be upgraded one Edge at a time. Click START to begin the upgrade, and select the Edge Cluster to view the status of the Edge VMs within the Cluster.

Once the Edge Cluster upgrades are complete, click NEXT to move to the Controllers. You can't change the upgrade method for the controllers; they are upgraded in parallel by default. Click START to begin the upgrade – this step took by far the longest in my lab, so be patient!

Once the upgrade has completed, click NEXT to move to the NSX Manager upgrade page. The NSX Manager will become unavailable about 2-4 minutes after you click START, and may take 10-15 minutes to become available again afterwards – don’t worry about errors that come up in the meantime!

Once the Manager upgrade has completed you can re-validate the installation as I did at the start, checking that we have all the green lights, and the versions have increased.

Over the next few posts I will be exploring some of the new features introduced in 2.1.


NSX-T 2.0 Lab Build: Logical Router Configuration

19/12/2017

Disclaimer! I am learning NSX-T, part of my learning is to deploy in my lab – if I contradict the official docs then go with the docs!

Lab Environment

This NSX-T lab environment is built as a nested lab on my physical hosts. There are four physical ESXi hosts, onto which I will deploy three ESXi VMs, a vCenter Server Appliance, NSX Manager, an NSX Controller cluster, and two NSX Edge Nodes.

Physical, virtual and nested components of the NSX-T lab

Deployment Plan

I will follow the deployment plan from the NSX-T 2.0 documentation:

  • Install NSX Manager.
  • Install NSX Controllers.
    • Join NSX Controllers with the management plane.
    • Initialize the control cluster to create a master controller.
    • Join NSX Controllers into a control cluster.
  • Join hypervisor hosts with the management plane.
  • Install NSX Edges.
    • Join NSX Edges with the management plane.
  • Create transport zones and transport nodes.
  • Configure Logical Routing and BGP

When this post series is complete, the network topology should be something like this, with two hostswitches configured. The ESXi Hosts will have a Tunnel Endpoint IP address, as will the Edge. The Edge will also have an interface configured for a VLAN uplink.

The NSX-T Transport Node network configuration

In this post I will walk through configuring VLAN Logical Switch, Tier-0 Router, Tier-1 Router, Uplink Profiles and BGP dynamic routing to the physical router.