VMware recommends the following best practices for configuring host NICs and network topology for VMware HA. They include recommendations for your ESX/ESXi hosts and for cabling, switches, routers, and firewalls.

The following network maintenance suggestions can help you avoid the accidental detection of failed hosts and network isolation because of dropped VMware HA heartbeats.

When making changes to the networks that your clustered ESX/ESXi hosts are on, VMware recommends that you suspend the Host Monitoring feature. Changing your network hardware or networking settings can interrupt the heartbeats that VMware HA uses to detect host failures, and this might result in unwanted attempts to fail over virtual machines.

When you change the networking configuration on the ESX/ESXi hosts themselves, for example, by adding port groups or removing vSwitches, VMware recommends that in addition to suspending Host Monitoring, you place the host in maintenance mode.

Note

Because networking is a vital component of VMware HA, inform the VMware HA administrator if network maintenance needs to be performed.

To identify which network operations might disrupt the functioning of VMware HA, you should be aware of which management networks are being used for heartbeating and other VMware HA communications.

On ESX hosts in the cluster, VMware HA communications travel over all networks that are designated as service console networks. VMkernel networks are not used by these hosts for VMware HA communications.

On ESXi hosts in the cluster, VMware HA communications, by default, travel over VMkernel networks, except those marked for use with vMotion. If there is only one VMkernel network, VMware HA shares it with vMotion, if necessary. With ESXi 4.0 and later, you must also explicitly enable the Management Network checkbox for VMware HA to use this network.

For VMware HA to function, all hosts in the cluster must have compatible networks. The first node added to the cluster dictates the networks that all subsequent hosts allowed into the cluster must also have. Networks are considered compatible if the combination of the IP address and subnet mask results in a network that matches another host's. If you attempt to add a host with too few or too many management networks, or if the host being added has incompatible networks, the configuration task fails, and the Task Details pane specifies this incompatibility.

For example, if the first host you add to the cluster has two networks being used for VMware HA communications, 10.10.135.0/255.255.255.0 and 10.17.142.0/255.255.255.0, all subsequent hosts must have the same two networks configured and used for VMware HA communications.
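The compatibility rule can be illustrated with a minimal Python sketch. It is not VMware's implementation. It only assumes that each HA network is described by an (IP address, subnet mask) pair and that a candidate host is compatible when its set of derived networks equals that of the first host. The function names and the host addresses (.21 and .22) are hypothetical.

```python
import ipaddress

def ha_network(ip: str, mask: str) -> ipaddress.IPv4Network:
    """Derive the network from an IP address and a dotted subnet mask."""
    # strict=False lets us pass a host address rather than the network address.
    return ipaddress.ip_network(f"{ip}/{mask}", strict=False)

def networks_compatible(first_host, candidate) -> bool:
    """Compare two lists of (ip, mask) pairs used for HA communications.

    Returns True only when both hosts resolve to exactly the same set of
    networks, so too few, too many, or mismatched networks all fail.
    """
    first = {ha_network(ip, mask) for ip, mask in first_host}
    other = {ha_network(ip, mask) for ip, mask in candidate}
    return first == other

# Hypothetical host addresses on the two networks from the example above.
first_host = [("10.10.135.21", "255.255.255.0"),
              ("10.17.142.21", "255.255.255.0")]
second_host = [("10.17.142.22", "255.255.255.0"),
               ("10.10.135.22", "255.255.255.0")]
one_network_only = [("10.10.135.23", "255.255.255.0")]

print(networks_compatible(first_host, second_host))
print(networks_compatible(first_host, one_network_only))
```

The order in which the networks are listed does not matter in this sketch, because the comparison is made on sets.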

A network isolation address is an IP address that is pinged to determine if a host is isolated from the network. This address is pinged only when a host has stopped receiving heartbeats from all other hosts in the cluster. If a host can ping its network isolation address, the host is not network isolated, and the other hosts in the cluster have failed. However, if the host cannot ping its isolation address, it is likely that the host has become isolated from the network and no failover action is taken.
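The decision described above can be sketched as a small Python function. This is an illustrative model only, not the VMware HA agent's code. The function name, the returned labels, and the `can_ping` callback are hypothetical, and the sketch assumes that reaching any one of several isolation addresses is enough for a host to consider itself not isolated.

```python
from typing import Callable, Iterable

def classify_host(heartbeats_received: bool,
                  isolation_addresses: Iterable[str],
                  can_ping: Callable[[str], bool]) -> str:
    """Model how a host interprets lost heartbeats.

    The isolation addresses are pinged only when heartbeats from all
    other hosts in the cluster have stopped.
    """
    if heartbeats_received:
        return "normal"
    # Assumption: one reachable isolation address means the host still has
    # network connectivity, so the other hosts are presumed to have failed.
    if any(can_ping(address) for address in isolation_addresses):
        return "not isolated"
    return "isolated"

# Usage with a stand-in for a real ping: only the gateway is reachable.
reachable = {"10.10.135.1"}
print(classify_host(False, ["10.10.135.1"], lambda a: a in reachable))
print(classify_host(False, ["10.17.142.1"], lambda a: a in reachable))
```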

By default, the network isolation address is the default gateway for the host. There is only one default gateway specified, regardless of how many management networks have been defined, so you should use the das.isolationaddress[...] advanced attribute to add isolation addresses for additional networks. See VMware HA Advanced Attributes.

When you specify additional isolation addresses, VMware recommends that you increase the setting for the das.failuredetectiontime advanced attribute to 20000 milliseconds (20 seconds) or greater. A node that is isolated from the network needs time to release its virtual machines' VMFS locks if the host isolation response is to fail over the virtual machines (rather than to leave them powered on). This must happen before the other nodes declare the node as failed, so that they can power on the virtual machines without getting an error that the virtual machines are still locked by the isolated node.

For more information on VMware HA advanced attributes, see Customizing VMware HA Behavior.

Configuring Switches. If the physical network switches that connect your servers support the PortFast (or an equivalent) setting, enable it. This setting prevents a host from incorrectly determining that a network is isolated during the execution of lengthy spanning tree algorithms.

Host Firewalls. On ESX/ESXi hosts, VMware HA needs and automatically opens the following firewall ports.

Incoming port: TCP/UDP 8042-8045

Outgoing port: TCP/UDP 2050-2250
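When reviewing firewall rules, it can help to check a port number against these ranges. The following Python sketch does only that. The names `HA_INCOMING`, `HA_OUTGOING`, and `is_ha_port` are hypothetical, and the ranges are taken directly from the list above.

```python
# Port ranges listed above; range() excludes its upper bound, hence the +1.
HA_INCOMING = range(8042, 8045 + 1)
HA_OUTGOING = range(2050, 2250 + 1)

def is_ha_port(port: int, direction: str) -> bool:
    """Return True if the port is in the VMware HA range for the direction.

    direction must be "incoming" or "outgoing".
    """
    if direction == "incoming":
        return port in HA_INCOMING
    if direction == "outgoing":
        return port in HA_OUTGOING
    raise ValueError("direction must be 'incoming' or 'outgoing'")

print(is_ha_port(8043, "incoming"))
print(is_ha_port(8043, "outgoing"))
```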

Port Group Names and Network Labels. Use consistent port group names and network labels on VLANs for public networks. Port group names are used to reconfigure access to the network by virtual machines. If you use inconsistent names between the original server and the failover server, virtual machines are disconnected from their networks after failover. Network labels are used by virtual machines to reestablish network connectivity upon restart.