I've been playing about with a compact SRM install in my lab. Since I have limited resources and only one site, I wanted to create a run-through that anyone learning SRM can follow in their own lab too. I am creating two sites on the same IP subnet (pretend it's a stretched LAN across two sites) and will be protecting a single, tiny Linux web server using vSphere Replication. I'm aiming to cover SAN-based replication in a later post.
Below is the list of hosts and VMs running for this exercise:
- ESXi-01 - my "Protected Site" - this is running DC-01, VC-01, SRM-01 and VRA-01 (to be installed later)
- ESXi-02 - my "Recovery Site" - this is running VC-02, SRM-02 and VRA-02 (to be installed later)
- DC-01 – this is my domain controller. I'm only going to use one DC for both "sites" as I don't have the compute resources available to run a second. This is also my Certificate Authority.
- VC-01 – this is my primary vCenter Server; it's a Windows 2012 R2 server managing ESXi-01.
- VC-02 – this is my "recovery site" vCenter, a vCenter Server Appliance (VCSA) managing ESXi-02.
- SRM-01 - “protected site” SRM server, base install of Windows Server 2012 at this point
- SRM-02 - “recovery site” SRM server, base install of Windows Server 2012 at this point
- WEB-01 - this is a really, really basic Ubuntu web server I've deployed from a template to use for testing.
Right - without further ado, let's get stuck in!
There are many ways to tackle the problem of quickly redeploying or recovering ESXi hosts: Host Profiles, Auto Deploy and so on. However, such options are often out of reach for SME/SMB users whose licensing doesn't cover those features, or their clusters are so small that Auto Deploy would be overkill.
So how can we back up the config of our ESXi hosts? There is a great command in the vSphere CLI, "vicfg-cfgbackup.pl", which, depending on the switches used, can either back up or restore your ESXi host config.
Backing up a host
Quite simply, fire up your vSphere CLI client and run the command as shown below. Make sure you define a file name as well as the destination folder, or it will error.
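For reference, the save command takes roughly this shape. The host name and backup path here are placeholders from my lab, so substitute your own:

```shell
# Save the host's firmware configuration to a local file (-s = save).
# Host name and destination path below are example values - use your own.
vicfg-cfgbackup.pl --server esxi-01.lab.local -s C:\backups\esxi-01-config.tgz
```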
You will then be prompted for authentication to the host; assuming you input the correct credentials, the firmware configuration will be saved to the folder you specified.
You may notice in my example I saved the file with a .tgz extension. You can drill into the .tgz file and see all of the config this process saves, which is handy if you want to be doubly sure it did the job correctly.
Restoring a host
So now you want to restore a host from a backup you have taken. We can use the same command, but with the -l (load) switch.
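The restore command follows the same shape as the backup, swapping -s for -l. Again, the host name and path are placeholders:

```shell
# Load (restore) a previously saved configuration onto the host (-l = load).
# The process will place the host in maintenance mode and reboot it.
vicfg-cfgbackup.pl --server esxi-01.lab.local -l C:\backups\esxi-01-config.tgz
```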
Important things to note
- This action will reboot your host
- This command will want to place your host in maintenance mode, so you will need to evacuate any VMs on the host.
- Placing the host into maintenance mode yourself prior to running the command will not work; it will error, because the process needs to place the host into maintenance mode itself.
- If you are running a small cluster, you will likely need to disable HA while you perform this action to avoid errors caused by the lack of available failover resources.
Example error below
Successful restore below
I have found this to be really handy when I want to restore a host to a previous running config; for example, it saves you having to re-enter all of your network config.
I’m fairly new to SRM, but even so this one seemed like a real head-scratcher! If you happen to be using CA-signed certificates on your “protected site” and “recovery site” vCenter servers, you’ll encounter SSLHandshake errors when you come to link the two SRM sites. Basically, SRM assumes you want to use certificate-based authentication because you’re using signed certificates; if you use the default self-signed certificates, SRM defaults to password authentication (see SRM Authentication). The process fails during the “configure connection” stage if one of your vCenter servers has a CA-signed certificate and the other does not (it throws an error that they are using different authentication methods), or if you are using self-signed certificates for either SRM installation (it throws an error that the certificate or CA could not be trusted).
SRM server 'vc-02.definit.local' cannot do a pair operation. The reason is: Local and remote servers are using different authentication methods.
This had me scratching my head: what seemed to be a common problem wasn’t fixed by the common solution. It was actually my fault – too familiar with the product and setting things up too quickly to test.
I installed a VCSA 5.5 instance in my lab as a secondary site for some testing and during the process found I couldn’t log on to the web client – it failed with the error:
Failed to connect to VMware Lookup Service https://vCVA_IP_address:7444/lookupservice/sdk - SSL certificate verification failed.
I had a closer look at the certificate being generated and noticed that the Subject Name was malformed – “CN=vc-02.definit.loca” – which led me to the network config of the VCSA. I’d entered the FQDN into the “host name” field; it was in turn being passed to the certificate generation, truncated, and throwing the SSL error. Changing the FQDN back to the host name “VC-02” and regenerating the certificate resolved the issue.
If you do have to follow that process, remember to disable the SSL certificate regeneration after it’s fixed – otherwise you’ll suffer slow boot times!
I’ll put that one down to over-familiarity with the product!
After having a play with Virtual Flash and Host Caching on one of my lab hosts I wanted to re-use the SSD drive, but couldn’t seem to get vFlash to release the drive. I disabled flash usage on all VMs and disabled the Host Cache, then went to the Virtual Flash Resource Management page to click the “Remove All” button. That failed with errors:
“Host’s virtual flash resource is inaccessible.”
“The object or item referred to could not be found.”
In order to reclaim the SSD, you need to erase the proprietary vFlash File System partition using some command-line kung fu. SSH into your host and list the disks:
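The listing is just the contents of the host's disk devices directory:

```shell
# List all disk devices and their partitions as the ESXi host sees them;
# each partition shows up as the disk ID with ":<n>" appended.
ls /vmfs/devices/disks/
```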
You’ll see something similar to this:
You can see the disk ID “t10.ATA_____M42DCT032M4SSD3__________________________00000000121903600F1F” and, below it, the same ID appended with “:1”, which is partition 1 on the disk. This is the partition I need to delete. I then use partedUtil to delete the partition I just identified, using the format below:
partedUtil delete "/vmfs/devices/disks/<disk ID>" <partition number>
partedUtil delete "/vmfs/devices/disks/t10.ATA_____M42DCT032M4SSD3__________________________00000000121903600F1F" 1
There’s no output when the command succeeds.
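If you want to double-check the partition really is gone before re-using the disk, partedUtil can also print the partition table (substitute your own disk ID):

```shell
# Print the disk's partition table - after the delete,
# no partition entries should be listed below the geometry line.
partedUtil getptbl "/vmfs/devices/disks/<disk ID>"
```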
Now I can go and reclaim the SSD as a VMFS volume as required:
Hope that helps!