Host disk write latency errors – troubleshooting
So recently we upgraded our cluster monitoring suite to its latest iteration (Veeam ONE). It was not long before I began to receive emails from the monitor informing me of Host disk write latency “errors” (Datastore write latency had exceeded the threshold defined in the monitor) on several of the Datastores on our SAN.
Naturally, I began cross-referencing the alerts against our backup routines and any other heavy I/O jobs that may have been running at the time the warning messages were generated. My conclusion was that these alerts were being generated even under average load, which was far from ideal, even though we had not noticed any performance problems with any of the busy VMs.
After consulting the web, reference material and a few very knowledgeable friends, it was clear the first port of call was the Host Datastore Multipath policy. Upon quick inspection, all of the offending Datastores were configured with the Path Selection “Most Recently Used (VMware)”. I had the option to set the Path Selection to “Round Robin (VMware)”, but before doing so I double-checked that our MSA2312i SAN could support such a policy, which in this case it did.
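If you have more than a handful of Datastores, clicking through each one to check its Path Selection gets tedious. Below is a rough Python sketch of how the same check could be scripted. It assumes you have captured the text output of `esxcli storage nmp device list` from a host (that namespace applies to ESXi 5.x and later; older releases used a different esxcli syntax). The sample output and device IDs are made up for illustration, and the helper function names are my own. It only prints the commands that would switch a device to Round Robin; it does not run anything against a host.

```python
# Hypothetical sample shaped like `esxcli storage nmp device list` output.
# The device IDs below are invented for illustration only.
SAMPLE_OUTPUT = """\
naa.example0001
   Device Display Name: HP iSCSI Disk (naa.example0001)
   Storage Array Type: VMW_SATP_ALUA
   Path Selection Policy: VMW_PSP_MRU
   Path Selection Policy Device Config: Current Path=vmhba33:C0:T0:L1
naa.example0002
   Device Display Name: HP iSCSI Disk (naa.example0002)
   Storage Array Type: VMW_SATP_ALUA
   Path Selection Policy: VMW_PSP_RR
"""


def parse_policies(text):
    """Return a dict mapping each device ID to its Path Selection Policy."""
    policies = {}
    device = None
    for line in text.splitlines():
        if not line.strip():
            continue
        if not line[0].isspace():
            # Unindented lines start a new device block.
            device = line.strip()
        elif device is not None:
            key, _, value = line.strip().partition(":")
            # Exact match, so "Path Selection Policy Device Config" is skipped.
            if key == "Path Selection Policy":
                policies[device] = value.strip()
    return policies


def devices_on_mru(policies):
    """Return the sorted device IDs still using Most Recently Used."""
    return sorted(d for d, p in policies.items() if p == "VMW_PSP_MRU")


def round_robin_commands(devices):
    """Build (but do not run) the esxcli commands to switch to Round Robin."""
    return [
        f"esxcli storage nmp device set --device {d} --psp VMW_PSP_RR"
        for d in devices
    ]


if __name__ == "__main__":
    for command in round_robin_commands(devices_on_mru(parse_policies(SAMPLE_OUTPUT))):
        print(command)
```

As with the manual change, confirm that your array supports Round Robin before running any of the generated commands.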