3PAR Storage Remote Copy Failover, Recovery & Restore Using SSMC
This post covers the procedure using SSMC. If you would like more background on Remote Copy and on using the 3PAR management console, check out this previous post.
The Remote Copy failover, recover and restore operations each involve a defined set of steps, which must be followed in the order below.
- DR is normally invoked when the primary (source) data center has suffered a disaster and no servers are accessing the impacted storage that is to be failed over.
- This exercise uses the volumes below. It is assumed, and required, that the hosts associated with these LUNs are shut down so that there is no I/O from hosts in the primary DC:
- DR Group name: DR_Testing_RC_GRP
- Source Volume name: test_volume_for_dr_testing
- Destination Volume name: test_volume_for_dr_testing
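The prerequisite above (no host still presenting the source volume) can be turned into a simple pre-flight check before the group is stopped. This is a minimal Python sketch under assumed inputs: the host names and the `vlun_map` / `host_power` dictionaries are hypothetical and would have to be filled in from your own inventory; nothing here talks to the arrays.

```python
# Hypothetical pre-flight check; the inventory below is illustrative and is
# not read from the 3PAR arrays or from SSMC.
SOURCE_VOLUME = "test_volume_for_dr_testing"


def hosts_still_running(vlun_map: dict[str, set[str]],
                        host_power: dict[str, str],
                        volume: str) -> list[str]:
    """Return hosts that present `volume` and are not confirmed shut down.

    A host with no recorded power state counts as running, so an incomplete
    inventory blocks the failover instead of allowing it.
    """
    return sorted(
        host
        for host, volumes in vlun_map.items()
        if volume in volumes and host_power.get(host) != "off"
    )


vlun_map = {
    "app-host-01": {SOURCE_VOLUME},
    "app-host-02": {SOURCE_VOLUME},
    "db-host-01": {"unrelated_volume"},
}
host_power = {"app-host-01": "off", "db-host-01": "on"}

print(hosts_still_running(vlun_map, host_power, SOURCE_VOLUME))
# ['app-host-02']
```

Treating a missing power state as "running" is a deliberate choice: an incomplete inventory should stop the procedure rather than silently let it continue.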
a) Stop Remote Copy Group
The goal is to reverse the role of all volume groups on the system that is still in normal operation (the failover system). Here we assume that “System-A” has failed or is undergoing planned maintenance.
The screenshot below shows how to open Remote Copy Groups in SSMC:
The screenshot below shows the replication group “DR_Testing_RC_GRP”, which we want to fail over from the source system System-A to System-B.
The left pane shows the group name, and the right side shows the source and target volume names.
The screenshot below shows the state of 3PAR Remote Copy before the failover begins (initially Started and working normally):
The first step is to stop the Remote Copy replication group “DR_Testing_RC_GRP”.
Note: Before doing this, ensure that the hosts associated with the source volume are shut down and that there is no I/O on the source volume; otherwise, data corruption may occur.
In SSMC, click the Actions button and select Stop.
After the Remote Copy group is stopped, SSMC shows the status below. At this point the volume is still writable on System-A until we initiate the failover.
b) Failover Remote Copy Group
Once the Remote Copy group has been stopped, its Group state is shown as Stopped (as in the last screenshot). Now click Actions again on the stopped Remote Copy group (in our example, “DR_Testing_RC_GRP”) and select the “Failover” option, as shown below.
A pop-up window opens (as shown in the next picture); click “Failover”. A second pop-up then opens; click “YES, Failover”.
Note: The failover is now executed for the selected group. The failover operation changes the role of the secondary groups on the backup system (System-B) from “Secondary” to “Primary-Rev”. Any VLUNs associated with the volumes in the selected groups become writable by hosts connected to the destination system (System-B in our case). Because these VLUNs are now writable on both 3PAR systems, we must ensure that nothing writes to them from the source site, as stated in the prerequisites before the failover was started.
The Group state should now be “Stopped”, as in the picture below. The Replication status in the table is “Stopped”, and the destination system (System-B) shows its role as “Primary-Rev” with a DR state of “Failover”. The “Writable LUNs” column also shows that LUNs are now writable on both systems, “System-A” and “System-B”.
At this stage the volume can be presented from the destination storage system (System-B) to hosts in the destination data center, and the application can be started there (assuming the source DC is still down).
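The effect of the Stop and Failover steps on the group can be summarized as a small state model. This is an illustration only, assuming a hypothetical `RCGroup` record and hypothetical `stop()` / `failover()` helpers that mirror what SSMC displays; the "Normal" label for the pre-failover DR state is an assumed placeholder, not an SSMC value. The real operations are the SSMC Stop and Failover actions.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class RCGroup:
    """Hypothetical snapshot of DR_Testing_RC_GRP as seen from both arrays."""
    started: bool     # Remote Copy group started (replicating) or stopped
    role_a: str       # role on System-A (original source)
    role_b: str       # role on System-B (original destination)
    dr_state: str     # DR state column; "Normal" is an assumed placeholder label
    writable_a: bool  # VLUNs writable by hosts on System-A
    writable_b: bool  # VLUNs writable by hosts on System-B


def stop(g: RCGroup) -> RCGroup:
    # Stopping halts replication only; System-A's volume stays writable.
    return replace(g, started=False)


def failover(g: RCGroup) -> RCGroup:
    if g.started:
        raise ValueError("stop the Remote Copy group before failing over")
    if g.role_b != "Secondary":
        raise ValueError("failover expects the group on System-B to be Secondary")
    # System-B is promoted to Primary-Rev. System-A's VLUNs remain writable too,
    # which is why source-side hosts must already be shut down.
    return replace(g, role_b="Primary-Rev", dr_state="Failover", writable_b=True)


initial = RCGroup(started=True, role_a="Primary", role_b="Secondary",
                  dr_state="Normal", writable_a=True, writable_b=False)
after = failover(stop(initial))
print(after.role_b, after.dr_state, after.writable_a, after.writable_b)
# Primary-Rev Failover True True
```

The model makes the risky window explicit: after failover both `writable_a` and `writable_b` are true, which is exactly the situation the prerequisite about shutting down source hosts protects against.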
a) Description of Recovery Option
The Recover option recovers the failed system (System-A in our case). When both systems in the Remote Copy pair are ready to resume normal operation, it reverses the natural direction of data flow and resynchronizes the systems.
Select the replication group that was failed over earlier (“DR_Testing_RC_GRP” in our case), click Actions and then “Recover”. This starts reverse replication and synchronizes the delta changes from the reversed volume groups on the failover system (System-B) to the corresponding volume groups on the recovered system (System-A).
Once this runs, the role of the Remote Copy group on the source system (System-A) becomes “Secondary-Rev”, and any LUNs associated with the volumes in the selected groups become non-writable on System-A.
Note: The recovery operation can be executed only on groups that have successfully completed the failover operation. LUNs on System-A are read-only during this time, which ensures that no host writes to the source volume while the delta is copied from the target system to the source system. Before starting the recovery, also make sure that the hosts associated with the destination volume are shut down so that no new writes occur during this process.
b) Recover Remote Copy Group
The two pictures below show how to put a replication group into the recover state: click Actions on the replication group that was failed over, select “Recover”, and then click RECOVER in the pop-up window.
After you click RECOVER, one more pop-up appears; click “Yes, recover now”. Note that the VLUNs are now no longer writable on the source system (System-A) and remain writable on the backup system (System-B).
The role of “DR_Testing_RC_GRP” should now be “Secondary-Rev” on the source system (System-A) and “Primary-Rev” on the destination system (System-B), as shown in the picture below. The DR state is now “Recover”.
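The recovery rules above (allowed only after a successful failover, destination hosts stopped, System-A becomes Secondary-Rev and read-only) can be expressed as one more transition in a small state model. `RCGroup` and `recover()` are hypothetical names used only for illustration, and the starting snapshot assumes System-A still reports Primary after the failover; the real operation is the SSMC "Recover" action.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class RCGroup:
    """Hypothetical snapshot of DR_Testing_RC_GRP as seen from both arrays."""
    role_a: str       # role on System-A (original source)
    role_b: str       # role on System-B (original destination)
    dr_state: str     # DR state shown in SSMC
    writable_a: bool  # VLUNs writable by hosts on System-A
    writable_b: bool  # VLUNs writable by hosts on System-B


def recover(g: RCGroup, destination_hosts_down: bool) -> RCGroup:
    # Recovery is only valid on a group that completed failover successfully.
    if g.dr_state != "Failover" or g.role_b != "Primary-Rev":
        raise ValueError("recover is only allowed after a successful failover")
    # Destination-side hosts must be stopped so no new writes land mid-sync.
    if not destination_hosts_down:
        raise ValueError("shut down hosts using the destination volume first")
    # System-A becomes Secondary-Rev and read-only while the delta flows B -> A;
    # System-B stays Primary-Rev and writable.
    return replace(g, role_a="Secondary-Rev", dr_state="Recover", writable_a=False)


failed_over = RCGroup(role_a="Primary", role_b="Primary-Rev",
                      dr_state="Failover", writable_a=True, writable_b=True)
recovering = recover(failed_over, destination_hosts_down=True)
print(recovering.role_a, recovering.role_b, recovering.writable_a)
# Secondary-Rev Primary-Rev False
```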
The progress of the recovery can also be checked from the CLI:
1. Issue the showrcopy command from the CLI on the failover system (System-B) and verify the following:
- The Status of the target system (the recovered system, System-A) is ready
- The SyncStatus of all volumes in the Primary-Rev volume groups is Syncing
- The Status of all sending links is Up
2. Issue the showrcopy command from the CLI on the recovered system (System-A) and verify the following:
- The Status of the target system (System-B) is ready
- The Status of all sending links is Up
- The Role of the synchronizing volume groups is Secondary-Rev
- The SyncStatus of all volumes in the Secondary-Rev volume groups is Syncing
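The two showrcopy checklists can be encoded as one function that reports every failed check. This is a sketch that assumes the showrcopy output has already been parsed into a dictionary; the key names (`target_status`, `sending_links`, `group_role`, `volume_sync`) and the link names are hypothetical, and parsing the real CLI text is not shown.

```python
def showrcopy_problems(view: dict, expected_role: str) -> list[str]:
    """Return a message for every checklist item that `view` fails.

    `view` is a hypothetical, already-parsed form of the showrcopy output.
    """
    problems = []
    if view["target_status"] != "ready":
        problems.append(f"target status is {view['target_status']!r}, expected 'ready'")
    for link, status in view["sending_links"].items():
        if status != "Up":
            problems.append(f"sending link {link} is {status!r}, expected 'Up'")
    if view["group_role"] != expected_role:
        problems.append(f"group role is {view['group_role']!r}, expected {expected_role!r}")
    for volume, sync in view["volume_sync"].items():
        if sync != "Syncing":
            problems.append(f"{volume} SyncStatus is {sync!r}, expected 'Syncing'")
    return problems


# Hypothetical view from the recovered system (System-A) during recovery.
system_a = {
    "target_status": "ready",
    "sending_links": {"link-1": "Up", "link-2": "Down"},
    "group_role": "Secondary-Rev",
    "volume_sync": {"test_volume_for_dr_testing": "Syncing"},
}
for problem in showrcopy_problems(system_a, expected_role="Secondary-Rev"):
    print(problem)
# sending link link-2 is 'Down', expected 'Up'
```

For System-B, the same function would be called with `expected_role="Primary-Rev"`. Because both lists expect Syncing, the check is meant for the period while the recovery is still in progress.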
a) Description of Restore Option
Once recovery of the Remote Copy group has completed, we must restore the replication group to its original state, i.e. return the primary role for the group (“DR_Testing_RC_GRP” in our case) to System-A and resume replication from the source system (System-A) to the destination system (System-B).
b) Procedure to Implement
To do this, click Actions on the replication group and select “Restore”. The restore operation returns replication for the selected Remote Copy group to its pre-failover state once the recovery operation has completed. After the restore runs, the role of the Remote Copy group on the source system (System-A) is “Primary” and on the destination system (System-B) is “Secondary”. Any LUNs associated with the volumes in the selected groups become writable by hosts connected to System-A and non-writable by hosts connected to System-B.
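The restore step can be sketched in the same style: it is only valid from the roles left behind by a recovery, and it flips roles and writability back to the pre-failover layout. `RCGroup`, `restore()` and the `resync_complete` flag are hypothetical; in practice, completion of the resynchronization is judged from the SSMC or showrcopy status.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RCGroup:
    """Hypothetical snapshot of DR_Testing_RC_GRP as seen from both arrays."""
    role_a: str       # role on System-A (original source)
    role_b: str       # role on System-B (original destination)
    writable_a: bool  # VLUNs writable by hosts on System-A
    writable_b: bool  # VLUNs writable by hosts on System-B


def restore(g: RCGroup, resync_complete: bool) -> RCGroup:
    # Restore is only valid from the roles left behind by a recovery.
    if (g.role_a, g.role_b) != ("Secondary-Rev", "Primary-Rev"):
        raise ValueError("restore requires a group that has been recovered")
    # The delta copy from System-B to System-A must have finished.
    if not resync_complete:
        raise ValueError("wait for the recovery resynchronization to finish")
    # Back to the pre-failover layout: A is Primary and writable, B is Secondary.
    return RCGroup(role_a="Primary", role_b="Secondary",
                   writable_a=True, writable_b=False)


recovered = RCGroup("Secondary-Rev", "Primary-Rev", writable_a=False, writable_b=True)
print(restore(recovered, resync_complete=True))
# RCGroup(role_a='Primary', role_b='Secondary', writable_a=True, writable_b=False)
```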
The pictures below show the steps to restore the group to its original state:
After you click OK in the picture above, one more pop-up asks for Yes or No; click “Yes” to complete the process.
The restored replication group should now show the same status as before the failover. The picture below shows this original status, with System-A as the source in the normal state and replication directed from System-A to System-B.
At this point the servers at the source site can be started to bring the application back up.
Although the Remote Copy operation has been restored to its original configuration, we still need to start the Remote Copy group for regular synchronization, using the options below:
Continue running from DR
This scenario covers a DR recovery in which we do not want to bring the delta from the DR site back to the primary site.
For the failover, follow the procedure described above (options 1-3: Failover, Recovery, Restore) up to and including the step “Failover Remote Copy Group”.
At that point, where the previous procedure performed “Recovery”, we now perform “Revert Failover” instead, so the delta of data changes on the target system is not copied back to the source system.
Note: The “Revert Failover” operation can be executed only on groups that have successfully completed the failover operation. Before starting “Revert Failover”, make sure that the hosts associated with the destination volume are shut down so that no new writes occur during this process.
To initiate “Revert Failover” from the “Failover” state, click Actions and select the “Revert Failover” option.
After you click “Yes, revert failover”, the system returns to the same condition as its initial state.
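The practical difference between Recover and Revert Failover is what happens to data written on System-B while it was Primary-Rev. The sketch below models each array's volume as a hypothetical map of block numbers to contents: Recover copies System-B's delta to System-A, while Revert Failover leaves System-A as it was. The function name `leave_failover()` and the data layout are assumptions for illustration, not part of SSMC.

```python
def leave_failover(mode: str, dr_state: str, destination_hosts_down: bool,
                   data_a: dict[int, str], data_b: dict[int, str]):
    """Hypothetical model: return System-A's contents after leaving Failover,
    plus the delta that was written on System-B during the failover."""
    # Both Recover and Revert Failover require a completed failover...
    if dr_state != "Failover":
        raise ValueError("only allowed on a group in the Failover state")
    # ...and no writes arriving from destination-side hosts.
    if not destination_hosts_down:
        raise ValueError("shut down hosts using the destination volume first")

    # Blocks whose contents on System-B differ from System-A (or are new on B).
    delta = {blk: val for blk, val in data_b.items() if data_a.get(blk) != val}

    if mode == "recover":
        # Recover copies the delta from System-B back to System-A.
        return {**data_a, **delta}, delta
    if mode == "revert":
        # Revert Failover keeps System-A unchanged; the delta is not copied back.
        return dict(data_a), delta
    raise ValueError(f"unknown mode: {mode!r}")


data_a = {0: "a0", 1: "a1"}
data_b = {0: "a0", 1: "b1", 2: "b2"}  # writes made on System-B after failover

print(leave_failover("recover", "Failover", True, data_a, data_b)[0])
# {0: 'a0', 1: 'b1', 2: 'b2'}
print(leave_failover("revert", "Failover", True, data_a, data_b)[0])
# {0: 'a0', 1: 'a1'}
```

In the revert case the returned delta is what would be lost, which is worth confirming before choosing this path over Recover.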