I recently took on support for a customer with a Peer Persistence setup. I had lots of questions when I first started looking into it, and I see lots of questions on the forums in this area, so I wanted to cover them off in this post.
What is Peer Persistence?
Peer Persistence is a 3PAR feature that enables your chosen flavour of hypervisor, either Hyper-V or vSphere, to act as a metro cluster. A metro cluster is a geographically dispersed cluster that allows VMs to be migrated from one location to another with zero downtime. This transparent movement of VMs across data centres allows load balancing and planned maintenance, and can form part of an organisation's high availability strategy.
What are the building blocks for Peer Persistence?
The first thing you are going to need is two 3PAR systems with Remote Copy. Peer Persistence is effectively an add-on to Remote Copy and cannot exist without it. Remote Copy must be in synchronous mode, so there are requirements around latency: the maximum round-trip latency between the systems must be 2.6 ms or less, rising to 5 ms or less with 3PAR OS 3.2.2.
As this is effectively a cluster setup, a quorum is required, which HPE provides in the form of a witness VM deployed from an OVF template. This witness VM acts as the arbitrator for the cluster, verifying which systems are available and deciding whether an automatic failover to the second site should be initiated.
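The latency rule can be captured in a few lines. This is a minimal Python sketch, assuming 3PAR OS versions are compared as plain (major, minor, patch) tuples and that you have already measured the round-trip time between the arrays yourself; the function names are hypothetical and not part of any HPE tool.

```python
# Sketch of the synchronous Remote Copy latency limit described above.
# Assumption: versions like "3.2.2" are compared numerically; MU levels are ignored here.


def parse_version(version: str) -> tuple[int, ...]:
    """Turn a version string such as '3.2.2' into a comparable tuple (3, 2, 2)."""
    return tuple(int(part) for part in version.split("."))


def max_rtt_ms(os_version: str) -> float:
    """Maximum round-trip latency (ms) allowed between the two arrays."""
    # 3PAR OS 3.2.2 and later relax the limit from 2.6 ms to 5 ms.
    return 5.0 if parse_version(os_version) >= (3, 2, 2) else 2.6


def latency_ok(measured_rtt_ms: float, os_version: str) -> bool:
    """True if the measured round trip is within the limit for this OS version."""
    return measured_rtt_ms <= max_rtt_ms(os_version)


print(latency_ok(3.1, "3.2.1"))  # False: over the 2.6 ms limit
print(latency_ok(3.1, "3.2.2"))  # True: within the 5 ms limit
```

The same 3.1 ms link fails on 3.2.1 but passes on 3.2.2, which is one practical reason to upgrade before stretching a cluster over a longer link.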
The other requirements are:
- The 3PAR OS must be 3.2.1 or newer for Hyper-V. I would recommend at least 3.2.1 MU3, since it included a fix that removed the need to rescan disks on hosts after a failover. For VMware, the minimum is 3.1.2 MU2.
- The replicated volumes must have the same WWN on both 3PAR systems. If you create a new volume and add it to remote copy this will happen automatically.
- You will need a stretched fabric that will allow hosts access to both systems
- Hosts need to be zoned to the 3PAR systems at both sites
- When you create the remote copy groups, you must enable both the auto_failover and path_management policies to allow automated failover
- FC, iSCSI, or FCoE protocols are supported for host connectivity. RCFC (Remote Copy over Fibre Channel) is recommended for the remote copy link.
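Several of these requirements are easy to check mechanically once you have the facts in hand. The Python sketch below is a hypothetical pre-flight check, not an HPE tool: it assumes you have gathered the 3PAR OS version, the remote copy group's policies and the WWNs of each replicated volume on both arrays (for example from the CLI or SSMC), and simply compares them against the minimums listed above.

```python
from dataclasses import dataclass

# Minimum 3PAR OS levels from the requirements list (hypothetical lookup table).
MIN_OS = {"hyper-v": "3.2.1", "vmware": "3.1.2 MU2"}
REQUIRED_POLICIES = {"auto_failover", "path_management"}


def parse_os(version: str) -> tuple[int, int, int, int]:
    """Parse '3.2.1 MU3' into (3, 2, 1, 3); a missing MU counts as 0."""
    parts = version.split()
    major, minor, patch = (int(p) for p in parts[0].split("."))
    mu = int(parts[1][2:]) if len(parts) > 1 and parts[1].upper().startswith("MU") else 0
    return (major, minor, patch, mu)


@dataclass
class RcGroup:
    """Hypothetical summary of one remote copy group, filled in by hand."""
    name: str
    policies: set[str]
    source_wwns: dict[str, str]  # volume name -> WWN on the source array
    target_wwns: dict[str, str]  # volume name -> WWN on the target array


def check_requirements(hypervisor: str, os_version: str, group: RcGroup) -> list[str]:
    """Return a list of human-readable problems; an empty list means all checks pass."""
    problems = []
    if parse_os(os_version) < parse_os(MIN_OS[hypervisor]):
        problems.append(f"3PAR OS {os_version} is below {MIN_OS[hypervisor]} for {hypervisor}")
    missing = REQUIRED_POLICIES - group.policies
    if missing:
        problems.append(f"group {group.name} is missing policies: {', '.join(sorted(missing))}")
    for volume, wwn in group.source_wwns.items():
        if group.target_wwns.get(volume) != wwn:
            problems.append(f"volume {volume} does not have the same WWN on both arrays")
    return problems


group = RcGroup("rcg_hyperv", {"auto_failover"}, {"vol01": "WWN-1"}, {"vol01": "WWN-1"})
print(check_requirements("hyper-v", "3.2.1 MU3", group))
# ['group rcg_hyperv is missing policies: path_management']
```

The WWN values here are placeholders; real 3PAR WWNs are long hex strings, but the comparison logic is the same.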
Further requirements specific to each hypervisor are:
For Hyper-V:
- Windows hosts must be set to a persona of 15
- For non-disruptive failover, Hyper-V hosts must be 2008 R2 or 2012 R2
For VMware:
- ESXi hosts must be set to a persona of 11
- For non-disruptive failover, ESXi hosts must be ESXi 5.x or newer
- No Storage DRS in automatic mode
- It is recommended to configure four heartbeat datastores, two at each site
- Set the HA admission control policy to allow all the required workloads from the other site to run in the event of a failover
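The per-hypervisor settings boil down to a lookup table plus a couple of counts. The Python sketch below is illustrative only: it assumes you supply each host's name, hypervisor and current persona number, and a list recording which site each heartbeat datastore lives at. The function names and the site labels "A" and "B" are hypothetical.

```python
from collections import Counter

# Required host personas: 15 for Windows (Hyper-V) hosts, 11 for ESXi hosts.
REQUIRED_PERSONA = {"hyper-v": 15, "vmware": 11}


def persona_fixes(hosts: list[tuple[str, str, int]]) -> list[tuple[str, int, int]]:
    """Given (host, hypervisor, current_persona) tuples, return
    (host, current_persona, required_persona) for every host that needs changing."""
    return [
        (name, current, REQUIRED_PERSONA[hypervisor])
        for name, hypervisor, current in hosts
        if current != REQUIRED_PERSONA[hypervisor]
    ]


def heartbeat_layout_ok(heartbeat_sites: list[str], site_a: str = "A", site_b: str = "B") -> bool:
    """True if there are four heartbeat datastores, split two per site."""
    counts = Counter(heartbeat_sites)
    return len(heartbeat_sites) == 4 and counts[site_a] == 2 and counts[site_b] == 2


print(persona_fixes([("hv01", "hyper-v", 15), ("esx01", "vmware", 15)]))
# [('esx01', 15, 11)]
print(heartbeat_layout_ok(["A", "A", "B", "B"]))  # True
```

Two heartbeat datastores per site means a host can still exchange heartbeats through storage at its own site if the other site is lost.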
The picture bit
I have robbed the picture above from the HPE Peer Persistence documentation. It has more lines on it than the London Underground, but let me try to explain. There are two geographically dispersed data centres, site A and site B. Each site contains three hypervisor hosts, shown at the top of the picture, and a 3PAR system, shown at the bottom. The data centres are linked by a stretched fabric, so zoning information is shared across the sites, and synchronous Remote Copy also runs across this link. Each host is zoned to the 3PAR systems at both sites.
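The "every host zoned to both arrays" rule is the part of the picture most likely to go wrong in practice. As a rough sketch, assuming you have exported which arrays each host can see into a Python dictionary (the host and array names here are made up), the check is a set difference per host:

```python
def hosts_missing_an_array(zoning: dict[str, set[str]], arrays: set[str]) -> dict[str, set[str]]:
    """Return each host that cannot see every 3PAR array, mapped to the arrays it is missing."""
    return {host: arrays - seen for host, seen in zoning.items() if arrays - seen}


# Hypothetical zoning export: host name -> arrays visible to that host.
zoning = {
    "siteA-host1": {"3par-A", "3par-B"},
    "siteB-host1": {"3par-B"},
}
print(hosts_missing_an_array(zoning, {"3par-A", "3par-B"}))
# {'siteB-host1': {'3par-A'}}
```

A host missing one array would still run normally, but it would lose access to its volumes if the other site's array became the active one.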
At the top of the picture there is a blue cylinder at site A and a grey one at site B; this shows that each volume is presented twice, once from each site. Both copies have the same WWN, and by using ALUA one copy is marked as writeable (blue cylinder) while the other is visible to the host but marked as non-writeable (grey cylinder). In the event of a switchover, the paths to the volume at site A are marked as standby, while the paths to the volume at site B are marked as active.
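The switchover behaviour can be modelled as two path states that swap. This Python sketch is a conceptual model only, assuming just two ALUA states ("active" and "standby") per site; it does not talk to a 3PAR or a host multipath driver, and the class name is hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class PeerVolume:
    """One volume presented from both sites under the same WWN."""
    wwn: str
    # ALUA state of the paths to each site's copy: "active" (writeable) or "standby".
    path_state: dict[str, str] = field(default_factory=dict)

    def active_site(self) -> str:
        """Return the site whose copy is currently writeable."""
        return next(site for site, state in self.path_state.items() if state == "active")

    def switchover(self) -> None:
        """Swap roles: active paths become standby and standby paths become active."""
        self.path_state = {
            site: ("standby" if state == "active" else "active")
            for site, state in self.path_state.items()
        }


vol = PeerVolume("WWN-1", {"A": "active", "B": "standby"})
vol.switchover()
print(vol.active_site())  # B
```

Because the WWN never changes, the host sees the same disk throughout; only which paths it is allowed to write through changes.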
The quorum witness, shown at the bottom of the picture as QW, is a VM that must sit at a third site, separate from sites A and B, and must not rely on the storage it is monitoring. It is deployed from an OVF template and is available in Hyper-V and VMware versions; I will cover its deployment in another post. The job of the quorum witness is to act as an independent arbitrator and decide whether an automatic failover from one site to the other should occur. The witness VM essentially checks two things: the status of the Remote Copy link and the availability of the 3PAR systems. When it detects a change in one of these conditions, the action taken is shown in the following table, borrowed from the HPE Peer Persistence documentation. The key thing to take away from the table is that an automatic failover across sites will only ever occur if the witness detects both that Remote Copy has stopped and that one of the 3PAR systems cannot be contacted.
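The key rule from the table can be written as a small decision function. This is a deliberately simplified Python sketch: it models only the condition stated above (Remote Copy stopped and exactly one array unreachable) and does not reproduce the full HPE table. The function name and the site labels are hypothetical.

```python
from typing import Optional


def failover_target(remote_copy_up: bool, reachable: dict[str, bool]) -> Optional[str]:
    """Return the site that should take over, or None if no automatic failover applies.

    Simplified rule: fail over only when Remote Copy has stopped AND exactly one
    array can no longer be contacted; the surviving array's site takes over.
    """
    if remote_copy_up:
        return None  # link still healthy: no automatic failover
    up_sites = [site for site, ok in reachable.items() if ok]
    # Both arrays reachable (link-only failure) or both lost: do nothing automatically.
    return up_sites[0] if len(up_sites) == 1 else None


print(failover_target(False, {"A": False, "B": True}))  # B
print(failover_target(False, {"A": True, "B": True}))   # None
```

The second case is why the witness matters: with only the Remote Copy link down, both arrays are alive, and failing over automatically could leave both sites believing they own the volumes.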
Enough chat, let’s implement this thing
To summarise everything we have talked about so far, I am going to list the high-level steps to create a 3PAR Peer Persistence setup. I am going to use Hyper-V as an example, but the steps for VMware are very similar.
- I will assume you have synchronous replication up and running
- Verify you have the Peer Persistence licence
- Set up a stretched fabric across both sites
- Configure your zoning so all hosts can access both 3PAR systems
- If necessary, upgrade your 3PAR OS to the version listed in the requirements section
- Check and, if necessary, set the correct host persona
- Check the volumes have the same WWN on each array. This is unnecessary if you created the volume on the source array and allowed Remote Copy to create the destination volume.
- Deploy the witness VM. I will cover this in a future post
- Configure the quorum witness from inside SSMC (Main menu, Remote Copy)
- Create the remote copy group, which determines which volumes are replicated: from the SSMC main menu, select Remote Copy Group, then Create. Under advanced options, ensure path management and auto failover are selected.
- Once all this is set up, to manually move a workload from one site to the other, select Switchover
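If you are working through these steps across several change windows, a simple ordered tracker helps. The Python sketch below is a hypothetical checklist helper, not part of SSMC: the step wording is paraphrased from the list above, and it assumes you record completed steps yourself.

```python
from typing import Optional

# Steps paraphrased from the list above, in the order they should be completed.
STEPS = [
    "Synchronous Remote Copy running",
    "Peer Persistence licence present",
    "Stretched fabric across both sites",
    "All hosts zoned to both 3PAR systems",
    "3PAR OS at or above the required version",
    "Correct host persona set",
    "Replicated volumes share the same WWN",
    "Witness VM deployed at a third site",
    "Quorum witness configured in SSMC",
    "Remote copy group created with auto_failover and path_management",
]


def next_step(done: set[str]) -> Optional[str]:
    """Return the first step not yet completed, or None when everything is done."""
    for step in STEPS:
        if step not in done:
            return step
    return None


completed = {"Synchronous Remote Copy running", "Peer Persistence licence present"}
print(next_step(completed))  # Stretched fabric across both sites
```

Walking the list in order matters for the later steps in particular: the witness has to exist before it can be configured, and the remote copy group policies rely on the witness being in place.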