Extending a vCenter Orchestrator (vCO) Workflow with ForEach – Backing up all ESXi hosts in a Cluster
In my previous post Backing up ESXi 5.5 host configurations with vCenter Orchestrator (vCO) – Workflow design walkthrough I showed how to create a workflow to back up host configurations, but it was limited to one host at a time. For this post I’m going to show how to create a new workflow that calls the previous one on multiple hosts using a ForEach loop to run it against each one. This was actually easier than I had anticipated (having read posts on previous versions of vCO that involved created the loops manually).
Ready? Here we go…
Create a new workflow – I’ve called mine “DefinIT-Backup-ESXi-Config-Cluster” and open the Schema view. Drag a “ForEach” element onto the workflow and the Chooser window pops up. This is a bit deceptive at first because it’s blank! However, if you enter some search text it will bring up existing workflows, so I searched for my “DefinIT-Backup-ESXi-Config” workflow that was created in that previous post.
Backing up ESXi 5.5 host configurations with vCenter Orchestrator (vCO) – Workflow design walkthrough
As a little learning project, I thought I’d take on Simon’s previous post about backing up ESXi configurations and extend it to vCenter Orchestrator (vCO), and document how I go about building up a workflow. I'm learning more and more about vCO all the time, but I found it has a really steep entry point, and finding use cases is hard if you haven't explored capabilities.
The steps I want to create in this post are:
- Right click host to trigger
- Sync and then back up the configuration to file
- Copy the file to a datastore, creating a folder based on date and the host
Ideas for future extensions of this workflow include - trigger from cluster/datastore/host folder, email notifications, emailing the backup files and backup config before triggering a VMware Update Manager remediation. Hopefully all this will come in future posts - for now, lets get stuck into the creation of this workflow.
After having a play with Virtual Flash and Host Caching on one of my lab hosts I wanted to re-use the SSD drive, but couldn’t seem to get vFlash to release the drive. I disabled flash usage on all VMs and disabled the Host Cache, then went to the Virtual Flash Resource Management page to click the “Remove All” button. That failed with errors:
“Host’s virtual flash resource is inaccessible.”
“The object or item referred to could not be found.”
In order to reclaim the SSD you need to erase the proprietary vFlash File System partition using some command line kung fu. SSH into your host and list the disks:
You’ll see something similar to this:
You can see the disk ID “t10.ATA_____M42DCT032M4SSD3__________________________00000000121903600F1F” and below it appended with the “:1” which is partition 1 on the disk. This is the partition that I need to delete. I then use partedUtil to delete the partition I just identified using the format below:
partedutil delete “/vmfs/devices/disks/<disk ID>” <partition number>
partedutil delete “/vmfs/devices/disks/t10.ATA_____M42DCT032M4SSD3__________________________00000000121903600F1F” 1
There’s no output after the command:
Now I can go and reclaim the SSD as a VMFS volume as required:
Hope that helps!
Fairly recently I came across this error message on one of my hosts "esx.problem.visorfs.ramdisk.full"
While trying to address the issue I had the following problems when the ramdisk did indeed "fill up"
- PSOD (worst case happened only once in my experience)
- VM's struggling to vMotion from the affected host when putting into maintenance mode.
A reboot of the host would clear the problem (clear out the ramdisk) for a short while but the problem will return if not addressed properly.
- Clustered ESXi hosts (version 5.1)
- vCenter 5.1
First of all I had to see just how bad things were so I connected to the affected host via SSH ( you may need to start the SSH service on the host as by default it is usually stopped)
Using the following command I could determine what used state the ramdisk was in.
(using an example output)
It was clear the root was full so now to find out what was filling it up so I searched KB articles for answers below were more the more helpful ones.
Unlike many of the other articles I had read which all seemed to point to /var/log being the cause the culprit in this instance was the /scratch disk
After doing some quick reading it was clear I could set a separate persistent location for the /scratch disk. So I followed the VMware KB article on this process and rebooted the host to apply the changes.
Even though I had followed the article to the letter and rebooted the host the changes did not apply on this host as when I double checked the ramdisk and this was the output as you can see the used space was already growing and when I put the console window side by side with the other hosts it was growing rapidly whereas the other hosts were quite static in used space.
It was not until I rebooted the host again did the changes I made apply, I will point out that this was the only time this little issue occurred the other hosts only required one reboot after changing the /scratch location.
After the second reboot I could see the host was as it should be
There are different schools of thought as to whether you should have SSH enabled on your hosts. VMware recommend it is disabled. With SSH disabled there is no possibility of attack, so that’s the “most secure” option. Of course in the real world there’s a balance between “most secure” and “usability” (e.g. the most secure host is powered off and physically isolated from the network, but you can’t run any workloads ). My preferred route is to have it enabled but locked down.
Note: VMware use the term “ESXi Shell”, most of us would term it “SSH” – the two are used interchangeably in this article although there is a slight difference. You can have the ESXi Shell enabled but SSH disabled – this means you can access the shell via the DCUI. For the sake of this article assume ESXi Shell and SSH are the same.