Troubleshooting Stuck Snapshots on Large Virtual Machines in VMware
A snapshot recovery checklist for large VMs, focused on storage pressure, snapshot chain health, consolidation, and safe validation.
Quick Read
- Symptom: Snapshots on large virtual machines are stuck and cannot be deleted or consolidated.
- Check first: Confirm the VM name, datastore free space, active backup jobs, replication jobs, and whether consolidation is already running before starting another snapshot task.
- Risk: Changes system state
Symptoms
Snapshots on large virtual machines are stuck and cannot be deleted or consolidated.
Environment
VMware vSphere 6.7 and later, large virtual machines with multiple snapshots.
Most Likely Causes
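Stuck snapshots usually trace to one of three causes: the datastore lacks the free space consolidation needs, heavy guest or datastore I/O slows the merge until the task appears hung, or another process (typically a backup or replication job) still holds a lock on a disk in the snapshot chain.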
What to Check First
- Confirm the VM name, datastore free space, active backup jobs, replication jobs, and whether consolidation is already running before starting another snapshot task.
- Identify every VMDK, delta disk, snapshot descriptor, and VMX file path in the VM folder before using command-line snapshot operations.
- Check whether the VM is latency-sensitive or production-critical, because snapshot consolidation can create heavy datastore and guest I/O impact.
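The file inventory in the checklist above can be sketched as a small classifier over a folder listing. This is a minimal illustration, not a VMware tool: the suffix patterns reflect common VMFS naming (for example `-000001-delta.vmdk` or `-000001-sesparse.vmdk` for delta data) and are assumptions to compare against your own listing, and `classify` and `group_vm_files` are hypothetical helper names.

```python
import re

# Assumed naming patterns; order matters because the more specific
# suffixes must be tested before the generic ".vmdk" descriptor.
PATTERNS = [
    ("delta data", re.compile(r"-\d{6}-(delta|sesparse)\.vmdk$")),
    ("snapshot descriptor", re.compile(r"-\d{6}\.vmdk$")),
    ("change tracking", re.compile(r"-ctk\.vmdk$")),
    ("base data", re.compile(r"-flat\.vmdk$")),
    ("base descriptor", re.compile(r"\.vmdk$")),
    ("snapshot metadata", re.compile(r"\.vmsd$")),
    ("snapshot state", re.compile(r"\.vmsn$")),
    ("vm config", re.compile(r"\.vmx$")),
]


def classify(filename):
    """Return the first matching label for a file name, or 'other'."""
    for label, pattern in PATTERNS:
        if pattern.search(filename):
            return label
    return "other"


def group_vm_files(filenames):
    """Group file names from a VM folder listing by their label."""
    groups = {}
    for name in filenames:
        groups.setdefault(classify(name), []).append(name)
    return groups
```

Paste the names from a folder listing into `group_vm_files` and review the "delta data" and "snapshot descriptor" groups before running any command-line snapshot operation. Anything labeled "other" deserves a manual look rather than an assumption.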
Fix Steps
- Check Storage Space
Verify that there is sufficient storage space available on the datastore where the VM resides.
Example pattern only. Adjust for your environment before running.
Log in to the vSphere Client. Navigate to 'Storage' and select the datastore. Check the 'Free Space' available on the datastore.
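To turn the free-space reading into a go or no-go decision, a rough headroom check can help. The sketch below assumes a deliberately conservative rule: free space should cover the combined size of all delta disks plus a fixed safety margin. That rule and the 50 GiB default are assumptions for illustration, not a VMware sizing formula, and `has_consolidation_headroom` is a hypothetical helper name.

```python
def has_consolidation_headroom(free_gib, delta_sizes_gib, safety_margin_gib=50.0):
    """Compare datastore free space with a conservative consolidation estimate.

    Assumption: the datastore should be able to absorb the combined size of
    every delta disk plus a safety margin. Returns (ok, required_gib).
    """
    required_gib = sum(delta_sizes_gib) + safety_margin_gib
    return free_gib >= required_gib, required_gib


# Example values are made up: 500 GiB free, two delta disks.
ok, required = has_consolidation_headroom(500.0, [120.0, 80.5])
print(f"required={required} GiB, enough headroom: {ok}")
```

If the check fails, free up space or treat the case as an escalation rather than starting consolidation and hoping it fits.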
- Identify Stuck Snapshots
Use the Snapshot Manager to identify any snapshots that are stuck.
Example pattern only. Adjust for your environment before running.
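Right-click the VM and select 'Snapshots' > 'Manage Snapshots'. Review the snapshot tree for entries that should no longer exist, and check the VM's Summary tab for a consolidation-needed warning, which means delta disks remain even when the tree looks empty.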
- Attempt to Consolidate Snapshots
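Try to consolidate the VM's disks from the vSphere Client.
Example pattern only. Adjust for your environment before running.
Right-click the VM and select 'Snapshots' > 'Consolidate'. Monitor the task progress in the Recent Tasks pane. On a large VM the task can run for a long time and may appear to sit at one percentage; do not start a second snapshot task while it is still running.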
- Power Off the VM
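If snapshots remain stuck, schedule downtime and power off the VM so that consolidation can run without competing guest I/O.
Example pattern only. Adjust for your environment before running.
Right-click the VM and select 'Power' > 'Shut Down Guest OS' for a clean shutdown. Use 'Power' > 'Power Off' only if the guest does not respond. Confirm the action when prompted.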
- Remove Snapshots via Command Line
Use the command line to remove stuck snapshots if the GUI method fails.
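Risk: Changes system state. Removing all snapshots cannot be undone.
SSH into the ESXi host where the VM is registered. Run 'vim-cmd vmsvc/getallvms' to find the VM ID, then 'vim-cmd vmsvc/snapshot.get <VM_ID>' to review the current snapshot tree. When you are ready, run 'vim-cmd vmsvc/snapshot.removeall <VM_ID>' to remove all snapshots.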
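On a host with many VMs, picking the right ID out of the 'vim-cmd vmsvc/getallvms' output by eye is error-prone. The sketch below parses saved output and returns the rows whose name contains a given fragment. The `SAMPLE` text is made up to illustrate the usual column layout (Vmid, Name, File, Guest OS, Version, Annotation), and `find_vm_ids` is a hypothetical helper, so check the regular expression against your own output.

```python
import re

# Assumed row layout: "<vmid>  <name>  [<datastore>] <path>.vmx  <guest>  <version>"
ROW = re.compile(
    r"^(?P<vmid>\d+)\s+(?P<name>.+?)\s+\[(?P<datastore>[^\]]+)\]\s+(?P<vmx>\S.*?\.vmx)\b"
)

# Illustrative output only; VM names and datastores are invented.
SAMPLE = """\
Vmid   Name        File                                 Guest OS                Version   Annotation
12     db-large    [datastore1] db-large/db-large.vmx   windows9Server64Guest   vmx-14
7      web 01      [datastore2] web 01/web 01.vmx       centos7_64Guest         vmx-13
"""


def find_vm_ids(getallvms_output, name_fragment):
    """Return (vmid, name, datastore, vmx_path) for rows matching the fragment."""
    matches = []
    for line in getallvms_output.splitlines():
        row = ROW.match(line)  # header and annotation lines do not match
        if row and name_fragment.lower() in row["name"].lower():
            matches.append(
                (int(row["vmid"]), row["name"], row["datastore"], row["vmx"])
            )
    return matches


for vmid, name, datastore, vmx in find_vm_ids(SAMPLE, "db"):
    print(vmid, name, datastore, vmx)
```

If the helper returns more than one row, stop and narrow the fragment. Running 'snapshot.removeall' against the wrong ID removes another VM's snapshots.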
- Check for VM Disk Issues
Inspect the VM's disk chain for consistency problems that may prevent snapshot removal.
Example pattern only. Adjust for your environment before running.
Run 'vmkfstools -e /vmfs/volumes/<datastore>/<vm_folder>/<vm_disk>.vmdk' against the disk descriptor that the VMX file currently references (not a -flat or -delta file) to check whether the snapshot chain is consistent.
- Reboot the ESXi Host
As a last resort, reboot the ESXi host if snapshots remain stuck after all previous steps.
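Risk: Changes system state. A reboot interrupts every VM still running on the host.
Migrate or shut down the other VMs on the host and place it in maintenance mode first. Then log in to the ESXi host via SSH and run 'reboot' to restart the host.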
Validation
- The VM no longer shows a consolidation-needed warning in vSphere after the consolidation or remove-all task completes.
- The active VMX configuration references base disks rather than delta disks, and the VM folder holds no unexpected leftover delta files.
- Guest application checks pass after consolidation, not just the vSphere task status.
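One way to check the disk-related validation point is to read the VMX file and flag any disk entry that still points at a snapshot descriptor. The sketch below assumes the common `<device>.fileName = "<disk>.vmdk"` line format and the six-digit snapshot suffix (for example `-000003.vmdk`). `disks_still_on_snapshots` is a hypothetical helper, and a base disk whose own name happens to end in six digits would be a false positive.

```python
import re

# Assumed VMX line format, e.g.: scsi0:0.fileName = "db-large-000003.vmdk"
DISK_LINE = re.compile(
    r'^\s*(?P<device>(?:scsi|sata|ide|nvme)\d+:\d+)\.fileName\s*=\s*"(?P<path>[^"]+\.vmdk)"',
    re.IGNORECASE,
)
# Assumed snapshot descriptor naming: "<name>-NNNNNN.vmdk"
SNAPSHOT_DESCRIPTOR = re.compile(r"-\d{6}\.vmdk$", re.IGNORECASE)


def disks_still_on_snapshots(vmx_text):
    """Map device -> disk path for disks that still reference a snapshot descriptor."""
    flagged = {}
    for line in vmx_text.splitlines():
        entry = DISK_LINE.match(line)
        if entry and SNAPSHOT_DESCRIPTOR.search(entry["path"]):
            flagged[entry["device"]] = entry["path"]
    return flagged


# Made-up VMX fragment: one disk still on a snapshot, one on its base disk.
EXAMPLE_VMX = '''
displayName = "db-large"
scsi0:0.fileName = "db-large-000003.vmdk"
scsi0:1.fileName = "db-large_1.vmdk"
'''

print(disks_still_on_snapshots(EXAMPLE_VMX))
```

An empty result after consolidation is consistent with the VM running on its base disks. It does not replace the guest application checks.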
Logs to Check
- vCenter Tasks and Events for snapshot create, delete, remove-all, and consolidate operations.
- VMware VMX log in the VM folder for disk-chain and snapshot-lock messages.
- ESXi host logs around the datastore and VM timestamp if consolidation stalls or fails.
Rollback and Escalation
- Do not delete VMDK, delta, VMSD, or VMSN files manually unless VMware support or a validated recovery plan confirms the active disk chain.
- Record the VM folder listing and snapshot tree before consolidation so you can prove which files existed before the change.
- If consolidation fails, stop additional snapshot changes and preserve the current datastore state for recovery or vendor support.
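Recording the VM folder listing before a change can be as simple as writing file names, sizes, and modification times to a JSON file. The sketch below assumes the VM folder is visible as an ordinary directory wherever the script runs, which may not hold in your environment. `record_folder_manifest` and the manifest layout are hypothetical, and the function only reads the folder.

```python
import json
import os
import time


def _utc(timestamp):
    """Format a POSIX timestamp as an ISO-8601 UTC string."""
    return time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime(timestamp))


def record_folder_manifest(folder, manifest_path):
    """Write a JSON manifest of the files in `folder` and return it as a dict."""
    files = []
    with os.scandir(folder) as entries:
        for entry in sorted(entries, key=lambda e: e.name):
            if entry.is_file():
                info = entry.stat()
                files.append(
                    {
                        "name": entry.name,
                        "size_bytes": info.st_size,
                        "modified_utc": _utc(info.st_mtime),
                    }
                )
    manifest = {
        "folder": os.path.abspath(folder),
        "recorded_utc": _utc(time.time()),
        "files": files,
    }
    # Store the manifest outside the VM folder so the evidence and the
    # datastore contents stay separate.
    with open(manifest_path, "w") as handle:
        json.dump(manifest, handle, indent=2)
    return manifest
```

Keep the manifest with the change record. Running it again after consolidation and comparing the two files gives a before-and-after view for support.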
Escalate When
- Escalate when the VM has no current backup, the snapshot chain is unclear, or datastore free space is below the size needed for consolidation growth.
- Escalate if VMX logs show disk-chain corruption, file locks that do not clear, or repeated consolidation failures on a production VM.
Edge Cases
- If the VM is part of a cluster, confirm that DRS or Storage DRS is not migrating the VM or its disks while snapshot operations are running.
- Verify that there are no active backups or replication tasks that might interfere with snapshot management.
Notes from the Field
- Large-VM snapshot work is as much a capacity problem as a VMware task problem. Verify datastore headroom before starting consolidation.
- Backup products often own snapshot lifecycles. Check the backup console before assuming vSphere is the only actor touching snapshots.