ESX 3.5 snapshots of disks on different storage are stored with the VM files

Written by Sam McGeown
Published on 2/10/2009

A.K.A Why not to use snapshots

I ran into a slightly confusing problem today - our SQL servers are all created with 4 disks on 4 separate LUNs (System, Swap, SQL Data and SQL Logs). When viewing the server through Virtual Center I couldn’t see all of the LUNs, just the System LUN. It’s not a major problem as the VM can see the storage, but it is a little annoying when you have to remember which LUN each disk is on.

Slightly more distressing was the fact that the System-LUN was running out of space - fast. A LUN that should have had about 150GB free was running dangerously low. On investigation I found various snapshot files being stored on the System-LUN, which is where the VM’s VMX, vswap etc. are situated. These were the snapshot delta files of the additional disks, which were on other storage! This isn’t apparent at first because the disk snapshots are named sequentially by ESX, so a VM with 4 disks on separate LUNs will in fact create 4 snapshot files on the SYSTEM-LUN named VM01-00001.vmdk, VM01-00002.vmdk, VM01-00003.vmdk and VM01-00004.vmdk. 00001 is for the System disk, 00002 is for the Swap disk, and so on. This means that the snapshot I/O for all four disks lands on that one LUN, and its free space shrinks very rapidly.
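To put a number on how much of the LUN the snapshots are eating, something like the following can be run from the service console. It is only a rough sketch: the function name delta_usage is made up for illustration, and it assumes the delta extents follow the usual *-delta.vmdk naming and that ls -l reports the size in its fifth column.

```shell
# Count the snapshot delta extents in a VM folder and total their size.
# Assumes delta extents are named *-delta.vmdk (hypothetical helper name).
delta_usage() {
  dir=$1
  # ls -l prints the size in bytes in column 5; awk adds the sizes up.
  ls -l "$dir"/*-delta.vmdk 2>/dev/null |
    awk '{ bytes += $5; n++ } END { printf "%d delta files, %.0f bytes\n", n, bytes }'
}
```

For example, delta_usage /vmfs/volumes/SYSTEM-LUN/VM01 would report on the folder holding the VMX, where SYSTEM-LUN and VM01 stand in for your own datastore and VM names.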

A little more digging and it seems that this is by design - snapshots are not meant to be kept for very long, and I think VMware made a deliberate decision to make it difficult to do so. Every virtual disk created for a VM, let’s call it VM01, was named VM01.vmdk. When additional virtual disks were created through vCenter on a different LUN, they were still named VM01.vmdk - there’s no conflict because they’re in different locations. However, when vCenter takes a snapshot it places all of the delta files with the original disk, and because each one would otherwise have the same name as the others it starts to enumerate them.
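One way to work out which base disk each numbered snapshot belongs to is to read the small text descriptor that sits alongside each delta extent. The sketch below assumes each snapshot descriptor carries a parentFileNameHint="..." line pointing at its parent disk; the function name show_delta_parents is hypothetical, so treat this as an illustration, not a supported tool.

```shell
# Print "descriptor -> parent disk" for each snapshot descriptor in a folder.
# Assumes snapshot descriptors contain a parentFileNameHint="..." line.
show_delta_parents() {
  dir=$1
  for f in "$dir"/*.vmdk; do
    [ -f "$f" ] || continue
    # Skip the large data extents; only the text descriptors are of interest.
    case "$f" in
      *-delta.vmdk|*-flat.vmdk) continue ;;
    esac
    parent=$(sed -n 's/^parentFileNameHint="\(.*\)"$/\1/p' "$f")
    # Base disks have no parent hint, so they are silently skipped.
    if [ -n "$parent" ]; then
      echo "$(basename "$f") -> $parent"
    fi
  done
}
```

Run against the folder that holds the VMX, this prints one line per snapshot descriptor, which makes it easier to see which delta belongs to the Swap disk and which to the SQL disks.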

This is bad for a number of reasons - most prominent of which is that if the snapshot file grows large, vCenter does not handle the commit well. In fact, neither does ESX, but I’ll get to that. vCenter will time out on any operation that takes more than 15 minutes, so a commit of a 10GB snapshot will look, to all intents and purposes, like it has failed in vCenter. On top of that, the enumeration of snapshot delta files can cause confusion as to which disk each one actually belongs to, and if that happens, committing

We all know snapshots are performance killers, but the functionality they provide is not insignificant, and as with most things a balance has to be struck between the functionality and the performance.

So, the headlines:

  • VMs created with disks on multiple LUNs in vCenter use the SAME DISK NAME (e.g. for VM01 the disks were created as /vmfs/volumes/SYSTEM-LUN/VM01.vmdk, /vmfs/volumes/SWAP-LUN/VM01.vmdk and so on).

    • Mitigate this by creating disks using vmkfstools and adding them to the VM, or by renaming the existing disks (see below).
  • Snapshots cause ALL disk delta files to be written onto the “system” LUN (i.e. where your VMX file is stored). This is bad because a) it multiplies your I/O on that LUN and b) you negate the benefits of storing disks on multiple LUNs.

    • Mitigate this by deleting your snapshots. There’s no other way*. Don’t try manually moving them or you will have problems.
  • Committing large snapshots takes time - LOTS of time - and can have a big performance hit on your server.

    • Mitigate this by shutting down your VM first and committing the disk using vmware-cmd outside business hours. You can also merge the old disk and snapshots into a “new” disk, then shut down the VM and boot with the “new” disk.
  • vCenter has a hard-coded 15-minute timeout.

    • If you are doing an operation that will take longer than that, do it via the console!

* When I say there’s no other way, I mean there’s no other practical way. There are methods to move the snapshot files to another LUN but they bring some serious problems with them.

Create a vmdk (virtual disk) using vmkfstools

  • Log in to your server console.
  • Type su - (to log in as root, enter root password, note the “-” to load the root user environment variables)
  • Navigate to the storage that you wish to use. E.g. cd /vmfs/volumes/System-LUN/
  • Create a new folder for the virtual disk: mkdir VM01
  • Navigate to the folder: cd VM01
  • Create the disk: vmkfstools -c <size> -a <buslogic|lsilogic> <disk name>.vmdk
  • For help just type vmkfstools
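The steps above can be wrapped up roughly as follows, so that each disk gets a name that is unique to its role. This is a sketch under assumptions: the function name create_disk, the VM01_data.vmdk naming scheme, the 50g size and the DRY_RUN switch are all illustrative choices, not something vmkfstools or vCenter requires.

```shell
# Create a uniquely named virtual disk in its own folder on a given LUN.
# Set DRY_RUN=1 to print the commands instead of running them.
create_disk() {
  lun=$1       # datastore name, e.g. SQLDATA-LUN (hypothetical)
  vm=$2        # VM name, e.g. VM01
  role=$3      # what the disk is for, e.g. data
  size=$4      # size with a unit suffix, e.g. 50g
  adapter=${5:-lsilogic}
  dir="/vmfs/volumes/$lun/$vm"
  disk="$dir/${vm}_${role}.vmdk"
  # ${DRY_RUN:+echo} expands to "echo" only when DRY_RUN is set and non-empty.
  ${DRY_RUN:+echo} mkdir -p "$dir"
  ${DRY_RUN:+echo} vmkfstools -c "$size" -a "$adapter" "$disk"
}
```

With DRY_RUN=1 set, create_disk SQLDATA-LUN VM01 data 50g only prints the mkdir and vmkfstools commands, which is a cheap way to check the paths before creating anything. Because the disk name includes its role, any later snapshot delta is easier to trace back to its disk.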

Rename vmdk files using vmkfstools

  • Shut down the VM in vCenter.
  • Edit the VM settings and remove the disk you wish to change. Do not delete the file!
  • Log in to your server console.
  • Type su - (to log in as root, enter root password, note the “-” to load the root user environment variables)
  • Use the command: vmkfstools -E /vmfs/volumes/<LUN>/<folder>/<old name>.vmdk /vmfs/volumes/<LUN>/<folder>/<new name>.vmdk
  • Go back to vCenter and re-add the disk, using the new name.
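As a worked illustration of the rename step, the sketch below adds two sanity checks before calling vmkfstools -E. The function name rename_disk, the DRY_RUN switch and the example paths are assumptions made for illustration. It should only be used with the VM shut down and the disk removed from the VM settings, as described above.

```shell
# Rename a virtual disk with vmkfstools -E after basic sanity checks.
# Set DRY_RUN=1 to print the command instead of running it.
rename_disk() {
  old=$1
  new=$2
  if [ ! -f "$old" ]; then
    echo "source descriptor not found: $old" >&2
    return 1
  fi
  if [ -e "$new" ]; then
    echo "target already exists: $new" >&2
    return 1
  fi
  # ${DRY_RUN:+echo} expands to "echo" only when DRY_RUN is set and non-empty.
  ${DRY_RUN:+echo} vmkfstools -E "$old" "$new"
}
```

For example, rename_disk /vmfs/volumes/SWAP-LUN/VM01/VM01.vmdk /vmfs/volumes/SWAP-LUN/VM01/VM01_swap.vmdk would give the Swap disk a name that no longer collides with the System disk (both paths are hypothetical).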

Commit your snapshots using vmware-cmd

  • Log in to your server console.
  • Type su - (to log in as root, enter root password, note the “-” to load the root user environment variables).
  • Use the vmware-cmd -l command to list your VMs. Note the path to the VM you want to deal with.
  • Remove all snapshots for a VM: vmware-cmd /path/to/vm/VM01.vmx removesnapshots
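Before committing anything it helps to know which VMs on the host actually have snapshots. The sketch below loops over the output of vmware-cmd -l and asks each VM with the hassnapshot operation. It assumes that hassnapshot prints a line ending in "= 1" when a snapshot exists, so check the output format on your own host first. The function name vms_with_snapshots and the VMWARE_CMD override are hypothetical.

```shell
# List the VMX paths of registered VMs that currently have a snapshot.
# VMWARE_CMD can be overridden, e.g. to point at a stub for testing.
VMWARE_CMD=${VMWARE_CMD:-vmware-cmd}

vms_with_snapshots() {
  "$VMWARE_CMD" -l | while read -r vmx; do
    [ -n "$vmx" ] || continue
    # Assumption: output looks like "hassnapshot() = 1" when a snapshot exists.
    # </dev/null stops the inner command from swallowing the list on stdin.
    if "$VMWARE_CMD" "$vmx" hassnapshot </dev/null | grep -q '= *1'; then
      echo "$vmx"
    fi
  done
}
```

Each path printed can then be passed to the removesnapshots command shown above, one VM at a time and preferably outside business hours.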