
1.6.4 – Describe vSphere High Availability

vSphere High Availability (HA) provides a base level of protection for your VMs by restarting them if a host fails, and enables a collection of ESXi hosts to work together to provide workload availability.

When vSphere HA is activated, an HA agent is installed on each host in the cluster. These agents communicate with each other to determine which host becomes the primary host; all other hosts become secondary hosts.

There are three types of failures which can occur with a host: 

  • Failure: host stops functioning 
  • Isolation: host becomes isolated from the network 
  • Partition: host loses network connectivity with the primary host but not with the secondary host(s) 

vSphere HA provides rapid recovery from outages and cost-effective high availability for applications running in VMs. HA protects application availability in several ways:

  • ESXi host failure – Restarts the VMs on other hosts within the cluster
  • VM failure – Restarts the VM when a VMware Tools heartbeat is not received within a set time
  • App failure – Restarts the VM when an app heartbeat is not received within a set time
  • Datastore accessibility failure – Restarts the affected VMs on other hosts that can still access the datastore
  • Network isolation – Restarts VMs if their host becomes isolated on the management or vSAN network, even when the network is partitioned

HA can detect datastore accessibility failures if VM Component Protection (VMCP) is configured. 

There are two types of datastore accessibility failure, each with its own responses:

  • APD (All Paths Down)
    • Recoverable
    • Is a transient or unknown access loss
    • Response can be one of: issue events, power off and restart VMs (conservative restart policy), or power off and restart VMs (aggressive restart policy)
  • PDL (Permanent Device Loss)
    • Unrecoverable loss of access
    • Occurs when a storage device reports that the datastore is no longer available for the host
    • Response can be either issue events or power off and restart VMs

The conservative policy only restarts the VMs if they can be booted on another host. The aggressive policy restarts the VM even if it’s not yet determined that it can be booted on another host.

Host network isolation occurs when a host is still running but it can no longer observe traffic from HA agents on the management network. 

  • Host tries to ping the isolation addresses. This is an IP address or FQDN that can be manually specified. 
  • If pinging fails, the host declares that it is isolated from the network.
  • This can even work if the network becomes partitioned.
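
The three host states and the isolation-address check can be modelled in a few lines. This is an added illustration, not VMware code: the `Host` class, `classify` function, segment labels and host names are hypothetical, and the model assumes the primary host is running and that a host which sees no HA agents but still reaches the isolation address counts as partitioned rather than isolated.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Host:
    name: str
    running: bool = True            # False: the host has stopped functioning
    segment: Optional[str] = None   # management-network segment; None: no link

def classify(hosts, primary_name, ping_ok_segments):
    """Map each secondary host to failed / isolated / partitioned / connected.

    ping_ok_segments: segments from which the isolation address answers a ping.
    """
    primary = next(h for h in hosts if h.name == primary_name)
    states = {}
    for h in hosts:
        if h.name == primary_name:
            continue
        if not h.running:
            states[h.name] = "failed"
            continue
        # HA agents this host can still observe on its own segment.
        peers = [p for p in hosts if p is not h and p.running
                 and h.segment is not None and p.segment == h.segment]
        if primary in peers:
            states[h.name] = "connected"
        elif peers:
            # Lost the primary but still sees secondary host(s).
            states[h.name] = "partitioned"
        elif h.segment in ping_ok_segments:
            # Model assumption: no agents seen, isolation address still answers.
            states[h.name] = "partitioned"
        else:
            # No agent traffic and the isolation address does not answer.
            states[h.name] = "isolated"
    return states

# Constructed scenario: the management network has split into segments A and B.
CLUSTER = [
    Host("esx01", segment="A"),                 # primary
    Host("esx02", segment="A"),
    Host("esx03", segment="B"),
    Host("esx04", segment="B"),
    Host("esx05", segment=None),                # management uplink lost
    Host("esx06", running=False, segment="A"),  # host down
]

if __name__ == "__main__":
    for name, state in classify(CLUSTER, "esx01", {"A", "B"}).items():
        print(f"{name}: {state}")
```
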