Tuesday, February 7, 2023

vSphere 7: Performing a Reconfigure for VMware HA operation on a primary node causes an unexpected virtual machine failover

After enabling Skyline Health, you may see an error similar to this:

When you perform a Reconfigure for VMware HA operation on the primary node in an HA cluster, an unexpected virtual machine failover occurs for the virtual machines running on that primary node.

When the primary HA host is manually reconfigured for HA, the remaining secondary hosts enter an election to choose a new primary host.

The newly elected primary host places the virtual machines that were running on the old primary host in an unknown power state, and waits for up to 10 seconds for notification that the virtual machines on the old primary host are powered on and running.

If the old primary host does not become a secondary host within that 10-second interval, the new primary host assumes that the virtual machines are down and attempts to restart them. This triggers a false failover event, and the failover task then fails because the virtual machines were never powered off. The virtual machines themselves remain unaffected in this scenario.
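
The timing rule described above can be sketched in a few lines of Python. This is a simplified model, not FDM's actual logic: the function name and the `rejoin_delay_s` parameter are hypothetical, and the only behavior taken from the text is that a restart is attempted when the old primary has not become a secondary host within the monitor period.

```python
def false_failover_triggered(rejoin_delay_s: float,
                             monitor_period_s: float = 10.0) -> bool:
    """Return True if the new primary would attempt to restart the VMs.

    rejoin_delay_s   -- hypothetical time (seconds) the old primary needs to
                        come back as a secondary host after the reconfigure
    monitor_period_s -- how long the new primary keeps the VMs in the
                        unknown power state before assuming they are down
    """
    # A restart is attempted only when the old primary did not rejoin
    # within the monitor period.
    return rejoin_delay_s > monitor_period_s


# Assumed example: the old primary needs 18 seconds to rejoin.
print(false_failover_triggered(18))        # True  (default period of 10 s)
print(false_failover_triggered(18, 30))    # False (period raised to 30 s)
```

Under this model, raising the period only helps if the old primary rejoins within the new value, which is why the fix below increases it from 10 to 30.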

To resolve this issue, increase the monitor period.

Notes
  • Starting with vCenter Server 7.0 Update 1, the property name fdm.policy.unknownStateMonitorPeriod has changed to fdm.unknownStateMonitorPeriod.
  • The das.config prefix can be added to these properties so that the setting applies to all the hosts in the cluster.

1. In vCenter, right-click the cluster and select Edit Settings.
2. Click vSphere HA and then Advanced Options.
3. Add a new option (if not already present). The default value is 10 seconds.
        For 7.0U1 or later:
            das.config.fdm.unknownStateMonitorPeriod = 10
        Pre-7.0U1:
            das.config.fdm.policy.unknownStateMonitorPeriod = 10

        For this issue, change the value from 10 to 30:

        For 7.0U1 or later:
            das.config.fdm.unknownStateMonitorPeriod = 30
        Pre-7.0U1:
            das.config.fdm.policy.unknownStateMonitorPeriod = 30
4. Disable and then re-enable vSphere HA on the cluster so that the new value takes effect.
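
If you manage several clusters on different vCenter versions, picking the right option name by hand is easy to get wrong. The following is a minimal sketch of that choice in Python. The helper name and the version tuple format (7.0 Update 1 written as `(7, 0, 1)`) are assumptions made for illustration; only the two option names and the value 30 come from the steps above.

```python
def monitor_period_option(vcenter_version: tuple, seconds: int = 30) -> tuple:
    """Return the (key, value) pair to enter under HA Advanced Options.

    vcenter_version -- hypothetical (major, minor, update) tuple,
                       e.g. (7, 0, 1) for vCenter Server 7.0 Update 1
    seconds         -- monitor period to set (30 for this issue)
    """
    if seconds <= 0:
        raise ValueError("monitor period must be a positive number of seconds")

    # The property was renamed starting with vCenter Server 7.0 Update 1.
    if vcenter_version >= (7, 0, 1):
        name = "fdm.unknownStateMonitorPeriod"
    else:
        name = "fdm.policy.unknownStateMonitorPeriod"

    # The das.config prefix applies the setting to all hosts in the cluster.
    return ("das.config." + name, str(seconds))


print(monitor_period_option((7, 0, 3)))
# ('das.config.fdm.unknownStateMonitorPeriod', '30')
print(monitor_period_option((6, 7, 0)))
# ('das.config.fdm.policy.unknownStateMonitorPeriod', '30')
```

The returned pair is what you would type into the Option and Value fields in step 3. The sketch does not connect to vCenter or change any setting.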


