
Advisory: Windows Server Update KB5062557 Linked to Critical Cluster and VM Failures
System administrators managing Windows Server environments should be aware of significant issues reportedly linked to the recent cumulative update KB5062557. Following its installation, a growing number of IT professionals have observed critical failures within Failover Cluster environments, particularly those running Hyper-V.
These problems can introduce severe instability, impacting service availability and virtual machine operations. If you have recently applied this update, it is crucial to review your systems for signs of trouble.
What’s Going Wrong? The Key Symptoms
The issues reported after installing KB5062557 are not isolated glitches. They affect the core functionality of a high-availability infrastructure, and administrators have described a consistent pattern of problems, including:
- Complete Cluster Service Failures: The most severe symptom is the Cluster Service failing to start after the update is installed and the server is rebooted. An affected node cannot communicate or coordinate with the rest of the cluster, and if enough nodes are affected to lose quorum, the entire cluster goes offline.
- Inability to Live Migrate VMs: Virtual machines may fail to live migrate between cluster nodes. The process either errors out immediately or hangs indefinitely, disrupting load balancing and planned maintenance.
- Cluster Shared Volumes (CSVs) Going Offline: Access to Cluster Shared Volumes can be lost, with the volumes showing as “unmounted” or “inaccessible” within Failover Cluster Manager. VMs stored on these CSVs may fail to start or may crash.
- General Instability and Unresponsiveness: Even if the cluster remains partially online, nodes may become unstable, dropping in and out of the cluster or exhibiting extreme performance degradation.
These symptoms primarily impact Windows Server 2022 and Windows Server 2019, although other versions could potentially be affected. The root cause appears to be related to changes in how cluster communication or authentication protocols are handled after the patch is applied.
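To check a node for the symptoms listed above, a quick health pass along the following lines may help. This is a minimal sketch, not an official diagnostic: it assumes an elevated PowerShell session on a cluster node with the FailoverClusters module available, and the cluster cmdlets will themselves fail if the Cluster Service is down on that node.

```powershell
# Sketch: run in an elevated session on a cluster node.
# Assumes the FailoverClusters module (installed with the Failover Clustering feature).
Import-Module FailoverClusters

# Is the Cluster Service running on this node?
Get-Service -Name ClusSvc | Select-Object Name, Status, StartType

# Which nodes are Up, Down, or Paused?
Get-ClusterNode | Select-Object Name, State

# Are any Cluster Shared Volumes in a state other than Online?
Get-ClusterSharedVolume |
    Where-Object State -ne 'Online' |
    Select-Object Name, State, OwnerNode
```

If the last command returns nothing, all CSVs visible from this node are online.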
Your Action Plan: How to Mitigate and Fix the Issue
If your environment is experiencing these problems after installing KB5062557, act promptly to restore stability. The current consensus among administrators is that removing the problematic update is the most effective fix.
1. Verify the Update is Installed
First, confirm that KB5062557 is present on your affected cluster nodes. You can do this quickly using PowerShell:
Get-HotFix -Id KB5062557
If this command returns details about the update, it is installed on that machine. If the update is not present, Get-HotFix returns an error instead.
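To run the same check against several nodes at once, something like the following could be used. It is a sketch under assumptions: the node names are hypothetical placeholders, and the nodes must be reachable remotely from the machine running the script.

```powershell
# Hypothetical node names; replace with your own cluster nodes.
$nodes = 'HV-NODE1', 'HV-NODE2', 'HV-NODE3'

foreach ($node in $nodes) {
    # Get-HotFix writes an error when the ID is not found, so suppress it
    # and test the result instead.
    # Caveat: an unreachable node also yields no result and will show as False.
    $hotfix = Get-HotFix -Id 'KB5062557' -ComputerName $node -ErrorAction SilentlyContinue

    [pscustomobject]@{
        Node        = $node
        Installed   = [bool]$hotfix
        InstalledOn = if ($hotfix) { $hotfix.InstalledOn } else { $null }
    }
}
```

The output is one row per node, which makes it easy to see which nodes still need attention.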
2. Uninstall the Problematic Update
Removing the update is the most direct path to resolving the cluster failures. You have two primary methods for this:
Using the Command Line (Recommended): The fastest way to remove the update across multiple servers is with the Windows Update Standalone Installer (WUSA) command. Run the following in an elevated Command Prompt or PowerShell window:
wusa /uninstall /kb:5062557
You will likely be prompted to reboot the server after the uninstallation is complete. Where the cluster is still running, drain roles from each node before uninstalling and rebooting, and work through the nodes one at a time.
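The drain, uninstall, and resume sequence for a single node might look like the sketch below. It assumes the Cluster Service is still running on the node (if it is not, the drain step will fail and the node's roles will already have moved or stopped), and that the script runs locally in an elevated session.

```powershell
# Sketch: run locally on the node being remediated, elevated.
Import-Module FailoverClusters

$node = $env:COMPUTERNAME

# Move clustered roles off this node and pause it.
# -Wait blocks until the drain has finished.
Suspend-ClusterNode -Name $node -Drain -Wait

# Remove the update. wusa shows its own confirmation and restart prompts.
Start-Process -FilePath 'wusa.exe' -ArgumentList '/uninstall', '/kb:5062557' -Wait

# After the reboot, bring the node back and let roles return to it:
# Resume-ClusterNode -Name $env:COMPUTERNAME -Failback Immediate
```

Confirm the node reports as Up and its roles are healthy before moving on to the next node.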
Using the GUI: Navigate to Control Panel > Programs and Features > View installed updates. Find "Update for Microsoft Windows (KB5062557)" in the list, right-click it, and select Uninstall.
3. Pause Windows Updates Temporarily
After uninstalling the update, it is crucial to pause updates on your servers to prevent Windows from automatically reinstalling KB5062557.
You can do this by going to Settings > Update & Security > Windows Update > Advanced options and pausing updates for a set period. In a corporate environment, this should be managed via Group Policy (GPO) or your endpoint management solution to ensure consistency.
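Where no central policy is in place yet, one stopgap is to set the local policy registry value that corresponds to disabling the "Configure Automatic Updates" Group Policy setting. This is only a sketch: it turns automatic updating off entirely rather than pausing it for a fixed period, a domain GPO that configures the same setting will overwrite it, and it should be reverted once a fixed update is available.

```powershell
# Sketch only: sets the local policy value for automatic updates. Run elevated.
$auKey = 'HKLM:\SOFTWARE\Policies\Microsoft\Windows\WindowsUpdate\AU'

# Create the policy key if it does not exist yet.
if (-not (Test-Path $auKey)) {
    New-Item -Path $auKey -Force | Out-Null
}

# NoAutoUpdate = 1 turns off automatic updating on this server.
Set-ItemProperty -Path $auKey -Name 'NoAutoUpdate' -Value 1 -Type DWord

# To revert later:
# Remove-ItemProperty -Path $auKey -Name 'NoAutoUpdate'
```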
Best Practices for Future Update Management
While this situation is disruptive, it is a reminder of why a robust patching strategy matters. To reduce the risk of future updates causing production outages, consider these practices:
- Test in a Lab Environment: Always deploy new Windows updates to a non-production environment that mirrors your production setup. Test all critical functions, such as live migration, cluster failover, and application access, before approving a wider rollout.
- Stagger Your Rollout: Avoid deploying a new patch to all servers simultaneously. Create deployment rings, starting with less critical systems and gradually moving to your most important production servers over several days or weeks.
- Maintain a Rollback Plan: Always ensure you have a documented and tested plan for uninstalling a patch. This includes having recent, application-consistent backups of your systems and VMs.
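The staggered rollout idea can be made concrete with a small planning helper. The function below is a hypothetical illustration, not part of any Microsoft tooling: it takes a server list ordered from least to most critical and splits it into deployment rings of the sizes you choose, with the last ring taking whatever remains.

```powershell
# Hypothetical helper: split an ordered server list into deployment rings.
function Split-IntoRings {
    param(
        # Servers ordered from least critical to most critical.
        [Parameter(Mandatory)] [string[]] $Servers,
        # Number of servers per ring; the last ring takes the remainder.
        [Parameter(Mandatory)] [int[]] $RingSizes
    )

    $rings = @()
    $index = 0
    for ($r = 0; $r -lt $RingSizes.Count -and $index -lt $Servers.Count; $r++) {
        $remaining = $Servers.Count - $index
        $isLast    = ($r -eq $RingSizes.Count - 1)
        $count     = if ($isLast) { $remaining } else { [Math]::Min($RingSizes[$r], $remaining) }

        $rings += [pscustomobject]@{
            Ring    = $r
            Servers = @($Servers | Select-Object -Skip $index -First $count)
        }
        $index += $count
    }
    $rings
}

# Example with hypothetical names: one lab host, one low-priority node, then the rest.
$plan = Split-IntoRings -Servers 'LAB-HV1', 'HV-NODE3', 'HV-NODE2', 'HV-NODE1' -RingSizes 1, 1, 2
$plan | Format-Table Ring, Servers
```

Each ring would then be patched and observed for a few days before the next one is approved.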
The current recommendation is to keep KB5062557 uninstalled from production cluster nodes until an official acknowledgment and a revised, stable patch are released. Monitor official channels for updates on this issue.
Source: https://www.bleepingcomputer.com/news/microsoft/microsoft-windows-server-kb5062557-causes-cluster-vm-issues/