VMware Metro Cluster Guide

What is a VMware Metro Cluster?

A VMware Metro Cluster (vMSC) is also sometimes called a stretched cluster. That name gives more of a clue to its function: it allows a single cluster to operate across geographically separate data centres. Operating two locations as a single cluster gives significant availability benefits for both planned and unplanned outages.

How does a Metro Cluster Work?

A Metro Cluster allows VMs spread across data centres to behave as if they were in a single local cluster. To make this possible, the VMs need access to the same storage at both sites. This is achieved with products such as NetApp MetroCluster and HPE 3PAR Peer Persistence, which present a single view of the storage even though it is located in a multi-site configuration, as depicted in the diagram below. Let’s dig into how this works.

VMware metro cluster

Each LUN is replicated between both storage systems using synchronous replication. However, only one copy of the LUN can be written to at a time, whilst the other remains read-only. The writable LUN is presented to the hosts via active paths. The read-only LUN is effectively hidden because the paths to it are marked as standby. This is based on ALUA (Asymmetric Logical Unit Access), which was used in traditional storage systems such as the EMC CLARiiON. There, ALUA marked the paths to the controller owning a LUN as optimized (preferred). The paths through the other controller, which reach the LUN only indirectly, were marked as non-optimized. The non-optimized paths would only become live if the primary paths failed.
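To make the path states concrete, here is a minimal Python sketch of how a host might choose paths from their ALUA states. It is a simplified model, not the ESXi path selection plugin. The `PathState` values mirror the ALUA access states, while the path names and the `usable_paths` helper are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class PathState(Enum):
    """Simplified ALUA access states reported for each path to a LUN."""
    ACTIVE_OPTIMIZED = "active/optimized"
    ACTIVE_NON_OPTIMIZED = "active/non-optimized"
    STANDBY = "standby"
    UNAVAILABLE = "unavailable"


@dataclass
class Path:
    name: str  # hypothetical path label (site and controller)
    state: PathState


def usable_paths(paths):
    """Return the paths a host would send I/O down in this simplified model.

    Active/optimized paths are preferred; active/non-optimized paths are only
    used when no optimized path exists. Standby and unavailable paths carry
    no I/O until the array changes their state.
    """
    for preferred in (PathState.ACTIVE_OPTIMIZED, PathState.ACTIVE_NON_OPTIMIZED):
        selected = [p for p in paths if p.state is preferred]
        if selected:
            return selected
    return []


# Hypothetical Metro Cluster LUN: the site A array holds the writable copy,
# so its paths are active; the read-only copy at site B sits behind standby paths.
paths = [
    Path("siteA-ctrl0", PathState.ACTIVE_OPTIMIZED),
    Path("siteA-ctrl1", PathState.ACTIVE_OPTIMIZED),
    Path("siteB-ctrl0", PathState.STANDBY),
    Path("siteB-ctrl1", PathState.STANDBY),
]
print([p.name for p in usable_paths(paths)])  # ['siteA-ctrl0', 'siteA-ctrl1']
```

In this model the host never promotes a standby path by itself. The change of state has to come from the array, which is why a switchover is driven from the storage side.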

The screenshot below shows an example of the paths on an ESXi host connected to a Metro Cluster. The live paths are shown as active. This can be switched over using the storage array management software so that the active and standby paths are reversed.

vmware metro cluster paths
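From the host's point of view, a planned switchover can be modelled as swapping the two path states. The Python sketch below is illustrative only. In practice the array management software performs the switchover, and the `switchover` and `active_sites` helpers are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Path:
    site: str   # site of the array the path leads to
    state: str  # "active" or "standby"


def switchover(paths):
    """Return the path list after a planned switchover.

    Active paths become standby and standby paths become active; this only
    models the resulting states that a host would observe.
    """
    flipped = {"active": "standby", "standby": "active"}
    return [Path(p.site, flipped[p.state]) for p in paths]


def active_sites(paths):
    """Return the set of sites whose paths are currently active."""
    return {p.site for p in paths if p.state == "active"}


before = [
    Path("A", "active"), Path("A", "active"),
    Path("B", "standby"), Path("B", "standby"),
]
after = switchover(before)
print(active_sites(before), active_sites(after))  # {'A'} {'B'}
```

Applying the switchover twice returns the original layout, which matches switching the storage back to its home site after maintenance.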

What are the requirements for a Stretched Cluster?

In order to set up a VMware Metro Cluster, the following is required:

  • VMware metro storage cluster licensing – There is no minimum licence edition of vSphere for the creation of a metro cluster. However, if automated workload balancing with DRS is required, the minimum licence would be Enterprise Plus edition
  • Supported storage connectivity – Fibre Channel, iSCSI, NFS and FCoE are supported
  • Maximum round-trip latency for vMotion, from vSphere 6 onwards, is 150ms
  • A stretched storage network across the sites
  • Maximum supported latency for storage replication is 10ms; this may be lower depending on the vendor
  • Suitable software options selected for the storage, e.g. the 3PAR Peer Persistence option
  • Maximum round-trip time (RTT) between sites for the VMware ESXi management networks is 10ms
  • The vSphere vMotion network has a redundant network link with a minimum of 250Mbps
  • A third site is required for deployment of a witness, which acts as an arbitrator
  • Storage I/O Control is not supported on a Metro Cluster enabled datastore
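Several of the requirements above are numeric limits. You can check them against measurements from your own sites before a design is signed off. The Python sketch below encodes the limits from the list. The measurement dictionary and its key names are hypothetical, and your storage vendor's replication limit may be stricter than the 10ms used here.

```python
# Limits taken from the requirements list; vendor limits may be stricter.
MAX_LATENCY_MS = {
    "vmotion_rtt_ms": 150,         # vMotion round-trip time, vSphere 6 onwards
    "replication_latency_ms": 10,  # synchronous storage replication
    "mgmt_rtt_ms": 10,             # ESXi management network RTT between sites
}
MIN_VMOTION_MBPS = 250


def check_stretched_cluster(measured):
    """Return a list of requirement violations (empty if none are found).

    `measured` is a hypothetical dict of values gathered from your sites.
    """
    problems = []
    for key, limit in MAX_LATENCY_MS.items():
        if measured[key] > limit:
            problems.append(f"{key} is {measured[key]}ms, limit is {limit}ms")
    if measured["vmotion_mbps"] < MIN_VMOTION_MBPS:
        problems.append(
            f"vMotion link is {measured['vmotion_mbps']}Mbps, "
            f"minimum is {MIN_VMOTION_MBPS}Mbps"
        )
    if not measured["vmotion_link_redundant"]:
        problems.append("vMotion network has no redundant link")
    if not measured["witness_at_third_site"]:
        problems.append("no witness deployed at a third site")
    return problems


# Hypothetical measurements for a pair of sites.
site_link = {
    "vmotion_rtt_ms": 4,
    "replication_latency_ms": 4,
    "mgmt_rtt_ms": 4,
    "vmotion_mbps": 1000,
    "vmotion_link_redundant": True,
    "witness_at_third_site": True,
}
print(check_stretched_cluster(site_link))  # []
```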

What are the benefits of a Stretched Cluster?

  • Mobility – Since storage and network configuration is shared across the sites, vMotion requirements are met, and VMs can be either manually migrated or dynamically balanced across the cluster and locations using DRS
  • Reduce physical boundaries – DRS can be used to automatically balance workloads across locations
  • Reduce downtime – A metro cluster allows the online movement of VMs and storage for planned events without downtime. These can be performed together or independently. For example, if maintenance was planned on the storage system, the active paths could be switched over to the other site. If the entire site was expected to be offline, both the storage and the VMs could be migrated to the opposite site
  • High availability – vMSC protects against both storage system and site failures. A storage system failure is detected by the witness VM and the active paths are switched over to the other system. For a site failure, VMware HA restarts the VMs at the surviving site
  • Reduced RTO – Automated recovery reduces the RTO for a storage or site failure

What are the drawbacks of a Stretched Cluster?

  • Complexity – Although setting up a vMSC is not too strenuous, it is certainly more complex than a single-site cluster
  • Testing – Although vMotion between sites and switchover of storage between sites can be tested, there is no simple way to test a full failover scenario, for example by following a run book

Considerations for VMware metro cluster design

  • HA admission control – The first consideration around HA is a logical one: use admission control and set it to reserve 50% of CPU and memory. This ensures that, should a failover between sites be required, the resources needed to restart the VMs are guaranteed to be available
  • HA datastore heartbeating – This is used to validate the state of a host. It is important that the datastores used for heartbeating are configured at both locations, so that false results are not received if a site is lost. VMware recommends setting 2 datastores for heartbeating at each site
  • HA APD – The response to an All Paths Down (APD) condition needs customising. You will find the setting in the HA settings: after selecting Protect against Storage Connectivity Loss, select Power off and restart VMs.

vsphere metro cluster HA settings

  • ESXi host names – Create a logical naming convention that allows you to quickly identify which site a host is in. You could include the site in the host name, or choose a numbering system that reflects the location, for example odd-numbered hosts are in site one. This will make designing your system and running it day to day easier
  • Data locality and host affinity rules – Ideally, hosts should access data from their local storage array to improve response time. To ensure this is the case, use DRS VM/Host affinity rules to define the preferred site for VMs, so that they run at the same site as their local LUN. Use "should" rules rather than "must" rules. With a must rule, the VMs will not be moved even in the event of a site failure, as this would violate the rule
  • Logically name the LUNs with their home sites – This is not a must, and some may argue they want the flexibility to move LUNs between data centres. However, it will make it easier for BAU staff to track which datastores are local
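The admission control and naming advice can be combined into a quick capacity check. First group the hosts by site using the naming convention, then confirm that each site alone could carry the load admitted under a 50% reservation. This is a minimal Python sketch, not a vSphere API call. The host names, the capacities and the odd/even site mapping are hypothetical example values.

```python
import re


def site_of(hostname):
    """Map a host to a site: odd host numbers are site 1, even are site 2.

    Assumes a hypothetical naming convention where names end in a number, e.g. 'esx03'.
    """
    match = re.search(r"(\d+)$", hostname)
    if match is None:
        raise ValueError(f"no trailing host number in {hostname!r}")
    return 1 if int(match.group(1)) % 2 else 2


def admitted_demand(hosts, reserved_fraction=0.5):
    """Maximum (CPU GHz, memory GB) that HA admission control lets VMs use."""
    total_cpu = sum(cpu for cpu, _ in hosts.values())
    total_mem = sum(mem for _, mem in hosts.values())
    return total_cpu * (1 - reserved_fraction), total_mem * (1 - reserved_fraction)


def each_site_can_run(hosts, demand):
    """True if each site on its own has enough capacity for `demand`."""
    per_site = {1: [0, 0], 2: [0, 0]}  # a site with no hosts keeps zero capacity
    for name, (cpu, mem) in hosts.items():
        per_site[site_of(name)][0] += cpu
        per_site[site_of(name)][1] += mem
    return all(cpu >= demand[0] and mem >= demand[1] for cpu, mem in per_site.values())


# Hypothetical 4-host cluster, two hosts per site, each with 40 GHz and 512 GB.
hosts = {f"esx0{n}": (40, 512) for n in range(1, 5)}
demand = admitted_demand(hosts)
print(demand, each_site_can_run(hosts, demand))  # (80.0, 1024.0) True
```

With an even split of hosts, a 50% reservation is exactly what keeps the admitted load within one site's capacity. With an uneven split, the admitted load would need to fit within the smaller site instead.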

What causes a failover?

For an automated failover of the storage to occur, a number of failure conditions must be met. The conditions for 3PAR are summarised in the following table from HPE.

3par peer persistence error handling

In essence, contact with the storage array must be lost and replication must have stopped.
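Reduced to the two conditions above, the automated failover decision is a logical AND. The Python sketch below is a deliberate simplification for illustration. HPE's table covers more cases, and the function and parameter names are hypothetical rather than part of any 3PAR interface.

```python
from itertools import product


def should_auto_failover(witness_can_reach_array, replication_up):
    """Simplified rule: fail over only if the witness has lost contact with
    the array AND replication between the arrays has stopped."""
    return not witness_can_reach_array and not replication_up


# Walk every combination of the two conditions.
for reachable, replicating in product([True, False], repeat=2):
    print(reachable, replicating, "->", should_auto_failover(reachable, replicating))
```

In this model, a replication link outage on its own, while the witness can still reach the array, does not trigger a failover.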

Testing a Metro Cluster

There is no automated testing method for a Metro Cluster. However, with a bit of thought it is possible to run some tests, although some are more invasive and risky than others. We will run through the tests starting with the least risky and moving towards the more invasive and risky ones.

1 vMotion – The simplest test is to move a VM between sites. Although it is a simple test, vMotion has more requirements than HA, so a successful vMotion starts to build confidence as we move through the tests

2 Storage switchover – Switching which site the storage is actively running on can again be completed online with little risk

3 Simulated storage failure – This test incurs risk, since IO could be lost when the storage system is taken offline. Verify the specifics of a failover scenario with your storage vendor. For example, with a 3PAR you will need to take the management network and the Remote Copy network offline simultaneously. Before you do this, disable auto failover on the LUNs you do not wish to be part of the test

4 Simulated site failover – For this test you simulate a site failure by combining a storage failure, as above, with a host failure so that HA kicks in. Choose a VM to test and move it to a host by itself. Power off the other VMs in the environment and put the hosts that are out of scope into maintenance mode. Perform an HA simulated failover as per https://kb.vmware.com/s/article/2056634. Again, there is risk in this test, so be selective about which VMs you choose to test

Remember that tests 3 and 4 do incur risk. Perform them at your own risk and only if the project requires it.

Further Reading

VMware vSphere Metro Storage Cluster Recommended Practices

The dark side of Stretched clusters

NetApp Metro Cluster tutorial video


Thoughts on VMware Cloud on AWS


Last month VMware announced the availability of VMware Cloud on AWS. Given the size and scale of VMware, any large-scale product launch like this is significant, because large players like this can create a trend as well as follow it. Whilst VMware has not yet been massively successful in the cloud space, its foothold on-premises is huge, and so therefore is the market potential.

Technical Specs

Components of VMware cloud on AWS

This service leverages vSphere, NSX and vSAN to allow you to run your VMware VMs in the AWS cloud. This is not a nested solution like Ravello. Instead, it runs on dedicated hosts housed in AWS data centres. Today the service is only available in one region, AWS US West, with a minimum of 4 dedicated hosts. The ESXi hosts are beefy, each having dual E5-2686 v4 CPUs @2.3GHz with 18 cores per CPU. That’s 36 cores in total, or 72 threads including hyper-threading. Each node has 512GB of RAM and 10TB of raw storage.
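The per-host figures multiply out quickly at the 4-host minimum. A short Python calculation using only the numbers quoted above:

```python
# Per-host figures quoted for VMware Cloud on AWS hosts.
SOCKETS_PER_HOST = 2
CORES_PER_SOCKET = 18
THREADS_PER_CORE = 2      # hyper-threading
RAM_GB_PER_HOST = 512
RAW_TB_PER_HOST = 10
MIN_HOSTS = 4

cores_per_host = SOCKETS_PER_HOST * CORES_PER_SOCKET
threads_per_host = cores_per_host * THREADS_PER_CORE

# Capacity of a minimum-size cluster.
cluster = {
    "cores": cores_per_host * MIN_HOSTS,
    "threads": threads_per_host * MIN_HOSTS,
    "ram_gb": RAM_GB_PER_HOST * MIN_HOSTS,
    "raw_tb": RAW_TB_PER_HOST * MIN_HOSTS,
}
print(cores_per_host, threads_per_host)  # 36 72
print(cluster)  # {'cores': 144, 'threads': 288, 'ram_gb': 2048, 'raw_tb': 40}
```

Bear in mind that the storage figure is raw capacity. Usable vSAN capacity will be lower once the redundancy required by the storage policy is applied.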


Cost has been one of the most eagerly anticipated aspects of this announcement. Initially there is only one option: on-demand, billed per hour. This is $8.37 per host per hour, so given the minimum of 4 hosts it works out at approximately $24,000 per month. Off the bat this sounds expensive, but Keith Townsend has done some analysis which shows it is comparable to running VMs in AWS EC2. In time, 1-year and 3-year pricing deals will be available, offering 30% and 50% reductions respectively compared to the on-demand pricing.
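The monthly figure follows directly from the hourly rate. The Python sketch below assumes a 30-day month (720 hours) and applies the announced 30% and 50% term reductions to the on-demand price. Actual term pricing may differ.

```python
HOURLY_RATE_PER_HOST = 8.37   # USD, on-demand
MIN_HOSTS = 4
HOURS_PER_MONTH = 24 * 30     # assumption: a 30-day month

on_demand_monthly = HOURLY_RATE_PER_HOST * MIN_HOSTS * HOURS_PER_MONTH
one_year_monthly = on_demand_monthly * (1 - 0.30)    # 30% reduction
three_year_monthly = on_demand_monthly * (1 - 0.50)  # 50% reduction

for label, cost in [("on demand", on_demand_monthly),
                    ("1 year", one_year_monthly),
                    ("3 year", three_year_monthly)]:
    print(f"{label:>9}: ${cost:,.2f} per month")
```

For the minimum cluster, this gives roughly $24,106 per month on demand, about $16,874 per month on a 1-year deal and about $12,053 per month on a 3-year deal.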

Adoption and Use cases

In terms of technical innovation, VMware on AWS does not currently offer significant additional benefits over hosting on-premises, although further integration with AWS services is expected in the future. However, it still offers a number of cloud-type benefits such as on-demand pricing and scaling. VMware are responsible for all the patching and hardware maintenance of the hosts, so this becomes like a SaaS offering of VMware, with only the management of the VMs remaining a concern.

The 4-host minimum may be prohibitive for many SMEs. If VMware were able to deliver a non-dedicated hardware model, this would increase the adoption rate by lowering the barriers to entry. It will be interesting to see whether they look to a Ravello-style nested system, or whether the performance hit of this approach is viewed as too great.

The on-demand pricing facilitates cloud bursting. Imagine a travel company whose demand doubles in the summer season. It could request and deploy additional capacity using a familiar tool. This would be powerful.
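To put rough numbers on the travel company example, the Python sketch below compares two options for the extra capacity. One adds on-demand hosts for the summer only. The other keeps the extra hosts all year. The baseline of 4 hosts, the doubling to 8 and the 92-day season are hypothetical assumptions. The rate is the on-demand price quoted earlier.

```python
HOURLY_RATE_PER_HOST = 8.37   # USD, on-demand rate quoted above
BASELINE_HOSTS = 4            # hypothetical year-round footprint
PEAK_HOSTS = 8                # hypothetical: summer demand doubles
SEASON_DAYS = 92              # hypothetical: June to August

extra_hosts = PEAK_HOSTS - BASELINE_HOSTS

# Option 1: burst, adding the extra hosts on demand for the season only.
burst_cost = extra_hosts * HOURLY_RATE_PER_HOST * 24 * SEASON_DAYS
# Option 2: provision for peak and keep the extra hosts all year.
always_on_cost = extra_hosts * HOURLY_RATE_PER_HOST * 24 * 365

print(f"burst: ${burst_cost:,.2f}, always on: ${always_on_cost:,.2f}")
print(f"bursting costs {burst_cost / always_on_cost:.0%} of always-on")
```

The comparison ignores the 1-year and 3-year reductions. Those reductions would narrow the gap for hosts kept all year.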

VMware is extremely familiar to most organisations; it is a known and trusted technology. Some organisations may choose to lift and shift their current VMware infrastructure to the cloud. Moving to another cloud technology, for example native AWS, would require a significant re-skilling process and a potentially costly redesign exercise. VMware on AWS would enable a far simpler transition and be compatible with current processes and skill sets.

From a pessimistic point of view, VMware on AWS also offers an easy on-ramp for CIOs under pressure to introduce a move to the cloud into their organisation.

This new lift and shift model offered by VMware and Ravello gives organisations a simpler path to the cloud. Re-architecting applications may be the optimal way to leverage the architectural differences of the cloud, but it is a significant undertaking. This is a 1.0 release, and it seems likely that additional integration with AWS will come over time. In the meantime, this product brings choice to the market with another method of moving to the cloud.

What are your thoughts? Let me know in the comments.


Ravello Cloud Matures

Last week Ravello announced the next generation of its nested cloud solution, the Oracle Ravello Cloud Service. When Oracle first announced the purchase of Ravello, I was unsure of their motives. At the time, Ravello was a cool solution, essentially for geeks, allowing them to host lab environments in the cloud. However, as VMware have proven with the recent buzz around VMware on AWS, lift and shift of production workloads to the cloud is gaining mainstream focus.

But of course production workloads need performance and scale and that’s what Oracle aim to deliver with this new release of Ravello.

Ravello Architecture

For those new to this technology, Ravello is a nested cloud solution that allows VMware virtual machines to run on AWS, Google or Oracle clouds. The technology that allows this is HVX, which creates an abstracted storage layer, exposes VMware virtual devices and provides a networking layer.

Oracle Ravello Cloud HVX hypervisor

The VMware VMs run on top of this HVX layer, which has traditionally itself resided on top of a KVM hypervisor. In summary, Ravello allows you to run your VMware VMs in the cloud like VMware on AWS does, but without the need for dedicated hardware hosts. If you want to see how the interface looks, you can check out my post on deploying vCenter on Ravello.

Increased Performance

Whilst Ravello can run across several cloud providers, this release focuses on bringing additional performance to Ravello in the Oracle Cloud. Hardware-assisted virtualisation is now implemented, allowing up to a 14-fold improvement in performance compared with running without hardware-assisted virtualisation.

HVX hypervisor now with Hardware assisted virtualisation

Further performance enhancements are possible by removing the KVM layer entirely and allowing VMs to run directly on the HVX layer in a bare-metal mode. You can read about the performance tests that fellow bloggers Ather Beg and Robert Verdam ran on Ravello in the Oracle Cloud.

Increased Scale

As well as the performance increases to the underlying layer, VMs themselves receive a boost by being able to scale to 32 vCPUs and 200GB RAM per VM. Oracle lists some of the potential use cases as test/dev, training, security testing and production, and these performance increases certainly seem to make production a possibility. With Ravello and VMware on AWS both tempting customers with lift and shift options, it will be interesting to see how this cloud migration strategy evolves and how many companies opt for this route.