High Availability Deep Dive
What’s New in vSphere 5
David Lane, Virtualization Engineer
High Point Solutions
What is High Availability
What’s New in vSphere 5
Core Components of High Availability in vSphere 5
How High Availability Works in vSphere 5
Scenarios for High Availability in vSphere 5
Exploiting High Availability with vSphere 5
What is High Availability?
• The Answer to Hardware Density Concerns
• Resilient Architecture
• Automated Recovery
• Simple Setup / Familiar Interface
High Availability Prerequisites
• Minimum of 2 Hosts
• Minimum of 3GB of Host Memory
• VMware vCenter Server
• Shared Storage
• Pingable Constant Address (Gateway)
• HA Communication Firewall Ports (TCP/UDP 8182)
• Essentials Plus and Up
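The prerequisites above can be read as a pre-flight checklist. The following is a minimal sketch of such a check, not a VMware API: the `ClusterPlan` class, its field names, and the `HA_EDITIONS` set are all hypothetical and introduced only for illustration.

```python
from dataclasses import dataclass


@dataclass
class ClusterPlan:
    """Hypothetical description of a planned HA cluster (not a VMware API)."""
    host_memory_gb: list          # one entry per host
    has_vcenter: bool
    has_shared_storage: bool
    isolation_address_pingable: bool
    port_8182_open: bool          # TCP/UDP 8182 between hosts
    license_edition: str


# Assumption: editions covered by "Essentials Plus and Up".
HA_EDITIONS = {"Essentials Plus", "Standard", "Enterprise", "Enterprise Plus"}


def unmet_prerequisites(plan):
    """Return a list of human-readable problems; empty means all checks pass."""
    problems = []
    if len(plan.host_memory_gb) < 2:
        problems.append("need at least 2 hosts")
    if any(mem < 3 for mem in plan.host_memory_gb):
        problems.append("every host needs at least 3GB of memory")
    if not plan.has_vcenter:
        problems.append("vCenter Server is required")
    if not plan.has_shared_storage:
        problems.append("shared storage is required")
    if not plan.isolation_address_pingable:
        problems.append("isolation address (gateway) must be pingable")
    if not plan.port_8182_open:
        problems.append("TCP/UDP 8182 must be open between hosts")
    if plan.license_edition not in HA_EDITIONS:
        problems.append("license must be Essentials Plus or higher")
    return problems
```

An empty result means the plan satisfies every item on the slide; each string in a non-empty result names one unmet prerequisite.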
Configuring High Availability
• 10 Steps - 10 Minutes
• Create a Cluster
• Drag and Drop Hosts
What’s New for vSphere 5
• FDM (Fault Domain Manager) – New HA Agent
• Master / Slave Nodes
• Datastore Heartbeating
• Enhanced Isolation Validation
• No DNS Dependency
• Supports Management Network Partitions
• Enhanced Admission Control Policies
Core HA Components of vSphere 5
• FDM (Fault Domain Manager)
• VMware vCenter
• hostd
FDM (Fault Domain Manager)
• Replaces Legato AAM (Automated Availability Manager)
• Single-Process Agent with Watchdog Failsafe
• No DNS Dependency, No DNS Limitations
• Consolidated Logging with Syslog Compatibility
• Talks Directly to hostd and vCenter; Not Dependent on vpxa
VMware vCenter
• Deploys FDM Agents in Parallel (AAM Agents Were Deployed Serially)
• Communicates Cluster Configuration Changes to the Master Node
• Retrieves Virtual Machine Status
• Displays Protection Status of VMs
hostd
• Required for FDM
• Runs on the Host
• Relays Information About VMs on the Host
• Responsible for Powering On VMs
How Does High Availability in vSphere 5 Work?
The Tools
• Master / Slave Nodes
• Heartbeating
• Isolated vs. Network Partitioned
• Virtual Machine Protection
Master / Slave Nodes
• One Master Node Per Cluster (Exception: Network Partition)
• Master Node Monitors VM Health and Directs Slaves
• Master Node Takes Ownership of Datastores Where VM Configuration Files Are Located
• Master Node Reports VM Status to vCenter Server
• Master Node Assigned by Election
• Slaves Monitor Their Running VMs, Send Status to the Master, and Perform Restarts at the Master Node's Request
• Slaves Also Monitor Master Node Health
Master Node Election
• Election Held When HA Is Enabled or Reconfigured, and When the Master Node:
– Fails
– Becomes Isolated or Partitioned
– Disconnects from vCenter
– Enters Maintenance Mode
– Enters Standby
• Utilizes UDP
• Takes 15 Seconds
• Host with Most Connected Datastores Wins
• If Multiple Hosts Share the Highest Number of Datastores, the Host with the Highest Managed Object ID (MOID) Wins
• New Master Node Will Attempt to Acquire Ownership of All Datastores by Locking the “protectedlist” File (Protected VM List Inventory File, on Datastores in Cluster)
• In the Case of Master Node Isolation, File Locks Will Be Released
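The two election rules above (most connected datastores, then highest MOID) can be sketched as a single sort key. This is an illustration only, not the FDM implementation: the `Host` type and its field names are hypothetical, and comparing MOIDs as plain strings is an assumption made for the example.

```python
from typing import NamedTuple


class Host(NamedTuple):
    """Hypothetical view of an election candidate."""
    moid: str                   # Managed Object ID, e.g. "host-22"
    connected_datastores: int


def elect_master(hosts):
    """Pick the winner: most connected datastores, ties broken by highest MOID.

    Assumption: MOIDs are compared as plain strings.
    """
    return max(hosts, key=lambda h: (h.connected_datastores, h.moid))


candidates = [Host("host-10", 4), Host("host-22", 6), Host("host-31", 6)]
print(elect_master(candidates).moid)  # prints "host-31"
```

Here host-22 and host-31 tie on datastore count, so the MOID comparison decides the election.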
Heartbeating
• Network Heartbeating
• Datastore Heartbeating
Network Heartbeating
• Heartbeats Sent from Slaves to Master and from Master to Slaves
• Heartbeats Sent Every Second
• Determines the State Of the Hosts
Datastore Heartbeating
• Prevents Unnecessary Restarts
• Extra Heartbeat Added to Determine Host State if the Management Network Is Lost
• Distinguishes a Real Host Failure from Isolation
• Uses PowerOn File to Determine Isolation
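The role of the second heartbeat can be sketched as a small decision function from the master's point of view. This is a simplified model under stated assumptions, not FDM code: the `HostState` names and the two boolean inputs are hypothetical.

```python
from enum import Enum


class HostState(Enum):
    LIVE = "live"
    FAILED = "failed"
    ISOLATED_OR_PARTITIONED = "isolated or partitioned"


def assess_slave(network_heartbeat, datastore_heartbeat):
    """Classify a slave from the master's side using both heartbeat channels."""
    if network_heartbeat:
        return HostState.LIVE
    # Management network heartbeat lost: consult the datastore heartbeat.
    if datastore_heartbeat:
        # Host is still alive on storage, so it has not actually failed.
        return HostState.ISOLATED_OR_PARTITIONED
    return HostState.FAILED
```

Only the `FAILED` result would justify restarting the host's VMs elsewhere, which is how datastore heartbeating prevents unnecessary restarts.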
Isolated vs. Network Partitioned
• Isolated (Host Separated from Master; VMs May Be Restarted)
– Not Receiving Heartbeats from Master
– Not Receiving Election Traffic
– Cannot Ping Isolation Address
• Partitioned (Multiple Hosts Cut Off from the Master but Able to Communicate with Each Other over the Management Network)
– Not Receiving Heartbeats from Master
– Does Receive Election Traffic
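The distinction above reduces to three observations a host makes about itself. The sketch below encodes only the conditions listed on the slide; the function name, parameter names, and the "undetermined" result for the combination the slide does not cover are assumptions added for illustration.

```python
def classify_host(master_heartbeat, election_traffic, can_ping_isolation_address):
    """Return the host's own view of its network state."""
    if master_heartbeat:
        return "connected"
    if election_traffic:
        # Other hosts are reachable, but the master is not.
        return "partitioned"
    if not can_ping_isolation_address:
        # No master, no peers, no gateway: the host is on its own.
        return "isolated"
    # Not covered by the slide: no HA traffic, but the gateway still answers.
    return "undetermined"
```

For example, `classify_host(False, True, True)` yields `"partitioned"`, while `classify_host(False, False, False)` yields `"isolated"`.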
Virtual Machine Protection
• vCenter Server Performs Protection on State
• Protection guaranteed when the master has
committed the change of state to disk
• Protectedlist File Contains VM State and
Scenarios for High Availability in vSphere 5
Using The Tools
• Failed Host
• Isolated Host
• Application Monitoring - Failed VM OS
Failed Host
• Failed Master Host
– Master Election Initiated
– New Master Elected
– New Master Restarts All VMs on the Protectedlist with a Not Running State
• Failed Slave Host
– Master Checks Network Heartbeat
– Master Checks Datastore Heartbeat
– Master Restarts VMs Affected
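The new master's restart step in the failed-master case amounts to filtering the protected VM list by state. A minimal sketch follows, assuming the list can be modeled as a name-to-state mapping; the real protectedlist file format is not described here, and the state strings are hypothetical.

```python
def vms_to_restart(protected_list):
    """Return the protected VMs recorded as not running, sorted by name."""
    return sorted(
        vm for vm, state in protected_list.items() if state == "not running"
    )


# Hypothetical contents of the protected VM list after a master failure.
protected = {"web01": "running", "db01": "not running", "app01": "not running"}
print(vms_to_restart(protected))  # prints ['app01', 'db01']
```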
Isolated Host
• Isolation Responses
– Power Off
– Leave Powered On
– Shut Down
• Isolation Detection
– Slaves will Hold Single Server Election and Check Ping Address
– Master will Check Ping Address
– Master Restarts VMs Affected
Application Monitoring - Failed VM OS
• Restarts Individual VM When Needed
• Configurable VMware Tools Heartbeat
• Monitors Network and Storage I/O Activity as Fail-Safe
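The fail-safe in the last bullet can be sketched as a two-step check: a missing VMware Tools heartbeat alone does not trigger a restart if the VM still shows I/O activity. The function and parameter names below are hypothetical, and the sketch leaves out the configurable intervals.

```python
def should_restart_vm(tools_heartbeat_received, io_activity_seen):
    """Decide whether to restart a single VM.

    A restart is warranted only when the tools heartbeat is missing AND
    no network or storage I/O has been observed.
    """
    if tools_heartbeat_received:
        return False
    # Fail-safe: recent I/O suggests the guest is still working.
    return not io_activity_seen
```

Under this model, a guest whose tools service has stopped but which is still writing to disk is left alone.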
Exploiting HA with vSphere 5
• Stretched Clusters
– Storage DRS
• Blade Chassis Failure
• Larger Clusters / Tenant-Based Cloud