
vmware-disklock

If you have seen this error, you're in real trouble. Most likely a third-party appliance or piece of software placed a lock on your *.vmdk files in the datastore and created *-delta.vmdk files. There are a couple of ways to fix this. Actually, this is not a major error in a VMware environment, but it can be quite a pain in the ass if you don't have a DR (disaster recovery) site.

Let’s talk about diagnosis:

DIAGNOSIS

First of all, try to power on the VM. You will see one of these errors:

Failed to add disk scsi0:1. Failed to power on scsi0:1
Unable to open Swap File
Unable to access a file since it is locked
Unable to access Virtual machine configuration

In the /var/log/vmkernel.log file (/var/log/vmkernel on older ESX releases), you will see entries similar to:

WARNING: World: VM xxxx: xxx: Failed to open swap file : Lock was not free
WARNING: World: VM xxxx: xxx: Failed to initialize swap file 

Locking exists to prevent concurrent changes to VM files and to the file system (VMFS). Sometimes a lock cannot be released because another handler still wants to do something with the files, even when the machine is powered off. There are a couple of ways to release the lock yourself.

TROUBLESHOOTING

First, find out which files, on which VM and which host, are causing the problem. To do that, I suggest enabling SSH on the ESXi host. Log in to vCenter using the vSphere Client, select the host in the Inventory, then go to Configuration > Security Profile > Services > Properties and enable:

ESXi Shell
SSH
Direct Console UI

Good. Now connect to the host over SSH (with PuTTY, for example), find the full path to your VM, and navigate to its folder with these commands:

# vim-cmd vmsvc/getallvms
# cd /vmfs/volumes/vm-datastore/vm-dir/
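vim-cmd vmsvc/getallvms prints each VM's .vmx location in datastore notation, for example [datastore1] test/test.vmx. Below is a minimal sketch, assuming that notation, of turning it into the folder to cd into. ds_path_to_dir is a hypothetical helper name, and the script sticks to POSIX sh because the ESXi shell is BusyBox, not bash.

```shell
#!/bin/sh
# Hypothetical helper: turn the File column of `vim-cmd vmsvc/getallvms`,
# e.g. "[datastore1] test/test.vmx", into the VM's folder under /vmfs/volumes.
ds_path_to_dir() {
    printf '%s\n' "$1" |
        # "[datastore1] " -> "/vmfs/volumes/datastore1/"
        sed -e 's|^\[\([^]]*\)] *|/vmfs/volumes/\1/|' \
            -e 's|/[^/]*$||'       # drop the trailing file name
}
```

Usage: cd "$(ds_path_to_dir '[datastore1] test/test.vmx')"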

Check which files have been locked (look for *-delta.vmdk files), but remember that this can be misleading, because those files could be obsolete. A better method is to check the VM's vmware.log file (in the VM folder) for consolidation errors and the files they name.
From here on you need to use vmkfstools to check the *.vmdk files in use.
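To pull the file names out of vmware.log, a rough filter like the one below can help. It is only a sketch: log_locked_vmdks is a hypothetical name, the keywords it searches for (lock, consolidat) are a guess rather than an official list of VMware log messages, so expect some noise (lines mentioning "block", for instance), and it assumes your BusyBox grep supports -o.

```shell
#!/bin/sh
# Hypothetical helper: list the .vmdk names that appear on lock or
# consolidation lines of a VM's vmware.log (default: ./vmware.log).
log_locked_vmdks() {
    log=${1:-vmware.log}
    # Lines mentioning a lock or consolidation problem (rough keyword guess)
    grep -iE 'lock|consolidat' "$log" |
        # Bare file names ending in .vmdk, without directory or quotes
        grep -oE '[^ "/]+\.vmdk' |
        # Report each file once
        sort -u
}
```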

# vmkfstools -qv10 vm-disk-000009.vmdk
* Repeat this command for every disk file the VM actually uses as a snapshot (you can see it in Edit Settings, in the Disk File field of each Hard disk n).
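If you want to cross-check the chain without ESXi tools, each snapshot descriptor (the small text file such as vm-disk-000009.vmdk, not the -delta data file) names its parent in a parentFileNameHint="..." line, and the base disk has none. The sketch below, with the hypothetical name vmdk_chain, follows those hints from a given descriptor down to the base; the 64-step limit is an arbitrary guard against a looping chain. Run it only on descriptor files, never on the large -flat or -delta extents.

```shell
#!/bin/sh
# Hypothetical helper: print a snapshot descriptor and all of its parents,
# following the parentFileNameHint="..." line of each descriptor file.
vmdk_chain() {
    disk=$1
    depth=0
    while [ -n "$disk" ]; do
        printf '%s\n' "$disk"
        depth=$((depth + 1))
        if [ "$depth" -gt 64 ]; then           # arbitrary loop guard
            echo "chain too long, stopping" >&2
            return 1
        fi
        if [ ! -f "$disk" ]; then
            echo "missing descriptor: $disk" >&2
            return 1
        fi
        # Empty for the base disk, which has no parent.
        parent=$(sed -n 's/^parentFileNameHint="\(.*\)"$/\1/p' "$disk")
        case $parent in
            ''|/*) ;;                          # no parent, or absolute path
            *) case $disk in                   # relative: resolve next to child
                   */*) parent=${disk%/*}/$parent ;;
               esac ;;
        esac
        disk=$parent
    done
}
```

Usage, inside the VM folder: vmdk_chain vm-disk-000009.vmdk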

The vmkfstools -qv10 output shows the actual snapshot chain, from the selected snapshot down to the flat disk. Now that you have the list of *.vmdk files, run the command below on each of them to find the owner or RO (read-only) owner of its lock:

# vmkfstools -D vm-disk-flat.vmdk
# vmkfstools -D vm-disk-000001-delta.vmdk
...
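To avoid typing vmkfstools -D once per file, a small loop can run it over every data extent in a VM folder. This is a sketch assuming the usual *-flat.vmdk, *-delta.vmdk and (for SEsparse snapshots) *-sesparse.vmdk names; show_locks is a hypothetical helper.

```shell
#!/bin/sh
# Hypothetical helper: run `vmkfstools -D` on every data extent in a folder.
show_locks() {
    dir=${1:-.}
    for f in "$dir"/*-flat.vmdk "$dir"/*-delta.vmdk "$dir"/*-sesparse.vmdk; do
        [ -e "$f" ] || continue     # pattern matched no file
        echo "== $f"
        vmkfstools -D "$f"
    done
}
```

Usage, inside the VM folder: show_locks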

If you see output like this, that's it!

# vmkfstools -D test-000008-delta.vmdk
Lock [type 10c00001 offset 45842432 v 33232, hb offset 4116480
gen 2397, mode 2, owner 00000000-00000000-0000-000000000000 mtime 5436998]
RO Owner[0] HB offset 3293184 xxxxxxxx-xxxxxxxx-xxx-xxxxxxxxxxxx 
Addr <4, 80, 160>, gen 33179, links 1, type reg, flags 0, uid 0, gid 0, mode 100600
len 738242560, nb 353 tbz 0, cow 0, zla 3, bs 2097152

The last 12 hex digits of the RO Owner UUID are the MAC address of a network interface on the ESXi host that holds the lock! If there is no RO Owner but the owner field is not all zeros, its last 12 digits are likewise that MAC address. If you only see zeros, the problem is more serious. Now let's see how to repair this.
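The MAC can also be extracted mechanically. This sketch (hypothetical name lock_owner_macs) reads vmkfstools -D output on stdin, takes every UUID of the form 8-8-4-12 hex digits (the same shape as the owner field in the sample), drops all-zero owners, and prints the last 12 digits as a colon-separated, lower-case MAC. It assumes BusyBox grep supports -o and -E.

```shell
#!/bin/sh
# Hypothetical helper: print the MAC address embedded in every non-zero
# lock owner UUID found in `vmkfstools -D` output read from stdin.
lock_owner_macs() {
    # Every owner / RO Owner UUID: 8-8-4-12 hex digits.
    grep -oE '[0-9a-fA-F]{8}-[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}' |
        # Keep only the last group (12 hex digits).
        sed 's/.*-//' |
        # An all-zero owner means no host holds that lock.
        grep -v '^000000000000$' |
        # 0025b500001b -> 00:25:b5:00:00:1b
        sed 's/\(..\)\(..\)\(..\)\(..\)\(..\)\(..\)/\1:\2:\3:\4:\5:\6/' |
        tr 'ABCDEF' 'abcdef' |
        # Report each MAC once.
        sort -u
}
```

Usage: vmkfstools -D test-000008-delta.vmdk | lock_owner_macs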

FIX

Before you start fixing things, check whether there are other locks on, or other uses of, the VM files:

# egrep -i <locked-file-name> /vmfs/volumes/*/*/*.vmx
# vmkvsitools lsof | grep -i <locked-file-name>
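Because the search pattern is the file you are investigating, it can help to wrap the .vmx search in a function that takes the disk name as an argument. A sketch, with the hypothetical name vmx_using_disk, assuming VM folders sit one level below each datastore as in the /vmfs/volumes/*/*/*.vmx pattern; the optional second argument only exists so it can be tried outside ESXi.

```shell
#!/bin/sh
# Hypothetical helper: list the .vmx files that mention a given disk name.
vmx_using_disk() {
    disk=$1
    root=${2:-/vmfs/volumes}
    # -i: ignore case, -l: print file names only, -F: literal match
    grep -ilF "$disk" "$root"/*/*/*.vmx 2>/dev/null
}
```

Usage: vmx_using_disk test-000008.vmdk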

Now, if you can see the RO Owner or owner MAC address, run this command on your ESXi hosts to find the physical NIC with that address:

# esxcfg-nics -l
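To match the owner MAC against that list without eyeballing it, pipe the output into a filter such as the hypothetical nic_for_mac below. It accepts the MAC with or without colons, matches it anywhere on the line (so it does not depend on the exact column layout), and prints the NIC name from the first column. No output means the MAC is not on this host, so try the next one.

```shell
#!/bin/sh
# Hypothetical helper: read `esxcfg-nics -l` output on stdin and print the
# name of the NIC whose MAC address matches $1 (with or without colons).
nic_for_mac() {
    [ -n "$1" ] || return 2                  # an empty MAC would match everything
    # Normalise to lower-case, colon-separated form: 00:25:b5:00:00:1b
    mac=$(printf '%s\n' "$1" | tr -d ':' | tr 'ABCDEF' 'abcdef' |
        sed 's/\(..\)\(..\)\(..\)\(..\)\(..\)\(..\)/\1:\2:\3:\4:\5:\6/')
    # Match anywhere on the line, then print the first column (vmnicN).
    grep -iF "$mac" | awk '{ print $1 }'
}
```

Usage: esxcfg-nics -l | nic_for_mac 0025b500001b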

Simply restart that NIC, and disk consolidation or powering on the machine should then probably be possible. If not, the only non-invasive method is to clone or migrate the VM to another datastore or host. As a precaution, you can also restart the ESXi management agents with:

# services.sh restart

For cloning you can use these commands (or just use the vCenter GUI :-)):

# vmkload_mod multiextent
# vmkfstools -i /path/datastore1/source.vmdk /path/datastore2/new.vmdk -d thin -a lsilogic
# vmkfstools -U source.vmdk
# vmkfstools -E new.vmdk nsource.vmdk
# vmkload_mod -u multiextent

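In the clone command, -a sets the adapter type of the new disk; use the one that matches the source VM's controller (for example lsilogic or buslogic). When cloning or migrating is done, you can consolidate the disks and remove the old VM. Remember that cloning is faster than migration, but both need extra space.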
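The cloning commands above delete the source with vmkfstools -U, so it is worth making sure that step only runs after the clone succeeded. Below is a sketch of the same sequence as one function, clone_replace_disk (hypothetical), chaining the commands with && and always unloading multiextent at the end. All paths, the final name passed to -E and the default lsilogic adapter are placeholders you must adapt.

```shell
#!/bin/sh
# Hypothetical helper: the cloning steps as one function that stops at the
# first failing step, so `vmkfstools -U` never deletes the source when the
# clone failed.
#   $1 source disk, $2 clone target, $3 final name for the clone (-E target),
#   $4 adapter type (lsilogic assumed when omitted)
clone_replace_disk() {
    src=$1 dst=$2 final=$3 adapter=${4:-lsilogic}
    [ -n "$src" ] && [ -n "$dst" ] && [ -n "$final" ] || {
        echo "usage: clone_replace_disk SRC DST FINAL [ADAPTER]" >&2
        return 2
    }
    vmkload_mod multiextent || return 1
    if vmkfstools -i "$src" "$dst" -d thin -a "$adapter" &&
        vmkfstools -U "$src" &&
        vmkfstools -E "$dst" "$final"; then
        rc=0
    else
        rc=1
    fi
    # Unload the module again whether or not the steps succeeded.
    vmkload_mod -u multiextent
    return "$rc"
}
```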


One Comment

  1. Good article. Short note: if you use LACP or NSX, running services.sh restart is not recommended while any VM is running on the ESXi host.

