I recently took on the support of a customer with a Peer Persistence setup. I had lots of questions when I first started looking into it, and I see lots of questions on the forums around this area, so I wanted to cover them off in this post.
What is Peer Persistence?
Peer Persistence is a 3PAR feature that enables your chosen flavour of hypervisor, either Hyper-V or vSphere, to act as a metro cluster. A metro cluster is a geographically dispersed cluster that enables VMs to be migrated from one location to the next with zero downtime. This transparent movement of VMs across data centres allows load balancing and planned maintenance, and can form part of an organisation's high availability strategy.
Peer Persistence can also be used with Windows failover clustering to enable a metro cluster for services such as SQL Server on physical servers.
What are the building blocks for Peer Persistence?
The first thing you are going to need is two 3PAR systems with Remote Copy. Peer Persistence is effectively an add-on to Remote Copy and cannot exist without it. Remote Copy must be in synchronous mode, so there are some requirements around latency. The maximum round-trip latency between the systems must be 2.6ms or less; this rises to 5ms or less with 3PAR OS 3.2.2.
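To make the latency rule concrete, here is a minimal Python sketch of how you might check a measured round-trip time against the limits quoted above. The function names are my own, and I am assuming the 5ms limit applies to 3PAR OS 3.2.2 and anything newer; check the HPE documentation for your exact release.

```python
def parse_version(text):
    """Turn '3.2.2' or '3.2.1 MU3' into a comparable tuple such as (3, 2, 2)."""
    numeric = text.split()[0]  # drop any ' MU3' style suffix
    return tuple(int(part) for part in numeric.split("."))


def max_round_trip_ms(os_version):
    """Return the round-trip latency ceiling quoted in this post."""
    # Assumption: the 5 ms limit applies to 3.2.2 and every later release.
    if parse_version(os_version) >= (3, 2, 2):
        return 5.0
    return 2.6


def latency_ok(measured_rtt_ms, os_version):
    """True when the measured inter-site round trip is within the limit."""
    return measured_rtt_ms <= max_round_trip_ms(os_version)
```

For example, `latency_ok(3.0, "3.2.1 MU3")` is false because the older release is held to the 2.6ms limit, whereas `latency_ok(3.0, "3.2.2")` is true.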
As this is effectively a cluster setup, a quorum is required, which HPE provide in the form of a witness VM deployed from OVF. This witness VM acts as the arbitrator for the cluster, verifying which systems are available and whether automatic failover to the second site should be initiated.
The other requirements are:
- The 3PAR OS must be 3.2.1 or newer for Hyper-V. I would recommend at least 3.2.1 MU3 since this included a fix which removed the need to rescan disks on hosts after a failover. For VMware the 3PAR OS must be 3.1.2 MU2 or newer.
- The replicated volumes must have the same WWN on both 3PAR systems. If you create a new volume and add it to Remote Copy this will happen automatically.
- You will need a stretched fabric that will allow hosts access to both systems
- Hosts need to be zoned to the 3PAR systems at both sites
- When you create the remote copy groups you must enable both the auto_failover and path_management policies to allow for automated failover
- FC, iSCSI, or FCoE protocols are supported for host connectivity. RCFC is recommended for the remote copy link.
Further requirements specific to hypervisor are:
- Windows hosts must be set to a persona of 15
- For non-disruptive failover Hyper-V hosts must be 2008 R2 or 2012 R2
- ESXi hosts must be set to a persona of 11
- For non-disruptive failover ESXi hosts must be ESXi 5.x or newer
- No storage DRS in automatic mode
- It is recommended to configure datastore heartbeating with a total of 4 datastores, to allow 2 at each site
- Set the HA admission policy to allow all the required workloads from the other site to run in the event of a fail over
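The requirements above lend themselves to a simple checklist. The following is a rough Python sketch, not an HPE tool: the function and its parameters are hypothetical, and you would have to gather the inputs yourself from the arrays and hosts. I am taking persona 15 as the Windows value and persona 11 as the ESXi value.

```python
# Values taken from the requirements listed in this post.
REQUIRED_POLICIES = {"auto_failover", "path_management"}
EXPECTED_PERSONA = {"hyper-v": 15, "vmware": 11}


def check_setup(hypervisor, host_persona, group_policies,
                source_wwn, target_wwn, synchronous):
    """Return a list of problems; an empty list means every check passed."""
    problems = []
    if not synchronous:
        problems.append("Remote Copy group is not in synchronous mode")
    missing = REQUIRED_POLICIES - set(group_policies)
    if missing:
        problems.append("missing policies: " + ", ".join(sorted(missing)))
    # WWNs are hex strings, so compare them case-insensitively.
    if source_wwn.lower() != target_wwn.lower():
        problems.append("source and target volume WWNs differ")
    expected = EXPECTED_PERSONA.get(hypervisor.lower())
    if expected is None:
        problems.append("unknown hypervisor: " + hypervisor)
    elif host_persona != expected:
        problems.append(f"host persona should be {expected}, found {host_persona}")
    return problems
```

Returning a list of problems rather than a single true or false means you can fix everything in one pass instead of rerunning the check after each correction.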
The picture bit
I have robbed the picture above from the HPE Peer Persistence documentation. It has more lines on it than the London Underground, but let me try and explain. There are two geographically dispersed data centres, site A and site B. Each site contains 3 hypervisor hosts, shown at the top of the picture, and a 3PAR, shown at the bottom. The data centres are linked by a stretched fabric, so the zoning information is shared across the sites, and synchronous Remote Copy also runs across the link. Each host is zoned to the 3PAR systems at both sites.
At the top of the picture is a blue cylinder at site A and a grey one at site B. This represents that each volume is presented twice, once at each site. The volume has the same WWN at both sites and, by using ALUA, one copy of the volume is marked as writeable (blue cylinder), whilst the other is visible to the host but marked non-writeable (grey cylinder). In the event of a switchover, the volume's paths at site A are marked as standby, whilst its paths at site B are marked as active.
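If it helps to see the path behaviour as code, here is a toy Python model of a single stretched volume. It only illustrates the active/standby idea described above; the class, the site names and the example WWN are made up, and real ALUA state reporting is handled by the arrays and the host multipathing stack.

```python
from dataclasses import dataclass


@dataclass
class StretchedVolume:
    """One volume presented from both sites under the same WWN."""
    wwn: str
    active_site: str  # "A" or "B": the site whose paths are writeable

    def path_states(self):
        """Report the state of the paths at each site."""
        return {
            site: "active" if site == self.active_site else "standby"
            for site in ("A", "B")
        }

    def switchover(self):
        """Swap which site serves the volume."""
        self.active_site = "B" if self.active_site == "A" else "A"


# Hypothetical WWN, for illustration only.
volume = StretchedVolume(wwn="60002AC0000000000000000000001234", active_site="A")
before = volume.path_states()
volume.switchover()
after = volume.path_states()
```

Here `before` shows site A as active and site B as standby, and `after` shows the two reversed. The WWN never changes, which is why the host sees one disk whose preferred paths have moved rather than a new disk.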
The quorum witness, shown at the bottom of the picture as QW, is a VM which must sit at a third site, not site A or B, and must not rely on the storage it is monitoring. It is deployed using an OVF template and is available in Hyper-V and VMware versions. I will cover its deployment in another post. The job of the quorum witness is to act as an independent arbitrator and decide if an automatic failover from one site to another should occur. The witness VM essentially checks two things: the status of the remote copy link and the availability of the 3PAR systems. When it detects a change in one of these conditions, the action taken is displayed in the following table, borrowed from the HPE Peer Persistence documentation. The key thing to take away from the table is that an automatic failover across sites will only ever occur if the witness detects that Remote Copy has stopped and one of the 3PAR systems cannot be contacted.
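The failover rule can be boiled down to a few lines of Python. This is a simplified sketch of the decision as I understand it, not the witness's real logic. The function name and parameters are hypothetical, and I am assuming that the system that cannot be contacted has to be the one currently serving the data, with the other site still reachable.

```python
def should_auto_failover(remote_copy_running, active_reachable, standby_reachable):
    """Decide whether an automatic failover to the standby site is warranted."""
    # Replication still running means the sites can see each other: do nothing.
    if remote_copy_running:
        return False
    # Assumption: only fail over when the active array is lost and the standby
    # array is still there to take over.
    return (not active_reachable) and standby_reachable
```

So a broken remote copy link on its own, with both arrays still reachable, returns false: `should_auto_failover(False, True, True)` does not trigger a failover, which matches the key point above.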
Enough chat, let’s implement this thing
To summarise everything we have talked about so far, I am going to list the high-level steps to create a 3PAR Peer Persistence setup. I am going to use a Hyper-V setup as an example, but the steps for another type of failover cluster and for VMware are very similar.
- I will assume you have synchronous replication up and running and meet the latency requirements as described above
- Verify you have the Peer Persistence licence
- Set up a stretched fabric across both sites
- If necessary upgrade your 3PAR OS to the 3PAR OS version listed in the requirements section
- Deploy the witness VM. Check out my full deploying the 3PAR quorum witness VM guide for assistance on this
- Configure your zoning so all hosts are zoned to both 3PAR systems
- Check and if necessary set the correct host persona
- On the source system create the remote copy group which contains all the volumes requiring replication. In SSMC choose Main menu, Remote Copy Groups, Create.
- When creating the remote copy group the WWN of the source and target volumes needs to be identical. To ensure this is the case, make sure that the option to create the remote copy volumes automatically is selected when you create the group
- Also when creating the remote copy group ensure the two tick boxes in the Peer Persistence section are checked for path management and auto failover
- On each 3PAR export the volumes to the hosts in the cluster, i.e. the source and destination volumes should both be exported to the cluster. Before doing the export ensure the hosts are already added to a cluster to avoid corruption
- You may need to rescan the disks in Disk Management once they are exported
To change which 3PAR system is actively serving the data and which is standby, select Remote Copy Groups, highlight the group you wish to change, and choose Switchover.