HP 3PAR: Replacing a Failed Disk

Replacing a failed disk in a 3PAR is pretty simple; you just need to follow a few steps to make sure you do it safely. If you are new to 3PAR or would like to learn more, a good place to start is our 3PAR beginners' guide.

Let’s get started with the disk replacement procedure:

1. Check whether you have any failed or degraded disks in the system, and take a note of the disk ID and cage position. In this case disk ID = 46, cage position = 2:8:0:

3PARSAN01 cli% showpd -failed -degraded
                           -Size(MB)-- ----Ports----
Id CagePos Type RPM State   Total Free A     B     Cap(GB)
46 2:8:0?  FC    10 failed 417792    0 ----- -----     450
------------------------------------------------------------
 1 total                   417792    0
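
If you look after several arrays, it can help to script this check. Here is a minimal sketch in bash that runs the same command over SSH (on 3PAR the CLI acts as the login shell, so commands can be sent directly); the host and user names are placeholder assumptions:

#!/usr/bin/env bash
# Minimal sketch: report failed or degraded physical disks on a 3PAR array.
ARRAY="3parsan01"      # hypothetical array hostname
CLI_USER="3paradm"     # hypothetical CLI user

OUTPUT=$(ssh "${CLI_USER}@${ARRAY}" showpd -failed -degraded)

# Rows contain the word "failed" or "degraded" in the State column.
if echo "$OUTPUT" | grep -qE ' (failed|degraded) '; then
    echo "WARNING: unhealthy disks found on ${ARRAY}:"
    echo "$OUTPUT"
else
    echo "No failed or degraded disks reported on ${ARRAY}."
fi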

2. Check whether the servicemag command is running on the drive. The servicemag command tells the system to evacuate all the chunklets from a drive so that it is ready for service. Below we can see the servicemag has succeeded on the drive we identified in step 1.

3PARSAN01 cli% servicemag status
Cage 2, magazine 8:
The magazine was successfully brought offline by a servicemag start command.
The command completed Thu Jul 10 20:07:03 2014.
servicemag start -pdid 46 -- Succeeded
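
If servicemag status reports no operation for the drive, you can start the evacuation yourself before touching the hardware. A minimal sketch, reusing the PD id from step 1 (host and user names are placeholder assumptions; the command may ask for confirmation):

#!/usr/bin/env bash
# Minimal sketch: manually start a servicemag evacuation for one PD.
PDID=46   # the failed disk identified in step 1
ssh 3paradm@3parsan01 servicemag start -pdid "$PDID"
# Re-check until the start is reported as Succeeded before pulling the drive:
ssh 3paradm@3parsan01 servicemag status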

3. Next, double-check that there is no data left on the drive. You can do this by running showpd -space <driveID> as below. Check that all columns other than Size and Failed are zero.

3PARSAN01 cli% showpd -space 46
Id CagePos Type -State-   Size Volume Spare Free Unavail Failed
46 2:8:0?  FC   failed  417792      0     0    0       0 417792
---------------------------------------------------------------
 1 total                417792      0     0    0       0 417792
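
To make this check harder to get wrong, you can script it. A minimal sketch, assuming the column layout shown above (Id, CagePos, Type, State, Size, Volume, Spare, Free, Unavail, Failed) and placeholder host/user names:

#!/usr/bin/env bash
# Minimal sketch: verify a PD is fully evacuated before physical removal.
# Per step 3, every column other than Size and Failed should read zero.
PDID=46   # the failed disk from step 1
ssh 3paradm@3parsan01 showpd -space "$PDID" | awk -v id="$PDID" '
    $1 == id {
        # Fields: 1=Id 2=CagePos 3=Type 4=State 5=Size 6=Volume 7=Spare 8=Free 9=Unavail 10=Failed
        if ($6 == 0 && $7 == 0 && $8 == 0 && $9 == 0)
            print "PD " id " is empty - safe to replace."
        else
            print "PD " id " still holds chunklets - do NOT pull it yet."
    }'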

 

4. Next, replace the physical disk. Make sure you are happy with the above steps, then pop that bad boy out; you will have a note of the location of the failed drive from step 1. Slot the replacement disk into the same position.

 

5. Once the new disk is in, you can monitor the progress of the rebuild by running servicemag status, which will give you an ETA for completion.

3PARSAN01 cli% servicemag status
Cage 2, magazine 8:
The magazine is being brought online due to a servicemag resume.
The last status update was at Thu Jun 26 12:09:19 2014.
Chunklets relocated: 73 in 50 minutes and 34 seconds
Chunklets remaining: 400
Chunklets marked for moving: 400
Estimated time for relocation completion based on 41 seconds per chunklet is: 4 hours, 57 minutes and 39 seconds
servicemag resume 2 8 -- is in Progress
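
Rather than re-running the command by hand, you can poll it until the resume finishes. A minimal sketch that keys off the "in Progress" text shown above (host and user names are placeholder assumptions):

#!/usr/bin/env bash
# Minimal sketch: poll servicemag status every 5 minutes until the
# resume is no longer reported as in progress.
while true; do
    STATUS=$(ssh 3paradm@3parsan01 servicemag status)
    echo "$STATUS"
    echo "$STATUS" | grep -q "in Progress" || break
    sleep 300
done
echo "servicemag no longer in progress - verify the disks with showpd -state."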

6. Once this is complete, you can check that your disks are showing a normal state with showpd -state:

3PARSAN01 cli% showpd -state
Id CagePos Type -State- --------------------Detailed_State---------------------
43 2:5:0   FC   normal  normal
44 2:6:0   FC   normal  normal
45 2:7:0   FC   normal  normal
46 2:8:0   FC   normal  normal
47 2:9:0   FC   normal  normal
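
On a system with many spindles, eyeballing that list is error-prone. A minimal scripted check, assuming the State column is the fourth field as in the output above and placeholder host/user names:

#!/usr/bin/env bash
# Minimal sketch: print any PD whose state is not "normal".
# NR > 1 skips the header line; assumes the layout shown in step 6.
ssh 3paradm@3parsan01 showpd -state |
    awk 'NR > 1 && NF >= 4 && $4 != "normal"'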

That’s it, job done! This blog has over 150 3PAR articles; we have put together a selection of the best ones for you to learn more about 3PAR.

 

To stay in touch with more 3PAR news and tips, connect with me on LinkedIn and Twitter.


33 thoughts on “HP 3PAR: Replacing a Failed Disk”

  1. I believe servicemag resume <cageposition> is not necessary with 3.1.2 and newer. The system will automatically detect the new PD and start rebuilding.

  2. Hi Richard,

    Thanks for your article on replacing HDDs.
    I have a requirement to replace 4 HDDs on an 8-cage 7400 3PAR box.
    Is it possible to do so at the same time?

    1. Not sure why you would be replacing 4 disks at the same time. If they have already failed, I see no harm in replacing them with a brief pause in between. If you are removing disks for any other reason, I would do one at a time.

  3. Disk 3 failed on our 7200 node. The disk ID was 3, and when the “relocation” started on the newly replaced drive, the new disk was assigned a disk ID of 48. Is there a way to change IDs?

    1. Hi, you are quite right, the ID will change after a disk swap. I have never seen a way to change it back, but I will publish your comment here to see if anyone else has come across this. Thanks for reading!

  4. Hi,

    Have a V800 with a failed disk in a 4-disk cage. Its state is failed, and the disk is empty when running showpd -space <pdid> as you describe. When I run servicemag status I’m not seeing any servicemag operations like you have shown; it just says “no servicemag operations running”. Any thoughts? Do I need to perform something active before replacing the disk, i.e. pulling out the cage?

    1. Hi. The disks on the V800 are held in disk magazines of 4 disks as you describe. The servicemag command will work at the magazine level and log or relocate the data from all 4 disks in the magazine you wish to remove. When the servicemag command has completed you can remove the magazine.

      1. Thanks a bunch, it worked and the drive has been replaced now 🙂

        I have another question:

        I have 2 disks in a degraded state, 1 in a 7400 system and 1 in a V800 system. But they remain in a degraded state and do not change to failed, as I would expect them to after some time.

        Is there a way to force them into a failed state, or will I be able to run servicemag start on these disks even though they are only in a degraded state?

        1. It could remain in a degraded state if it has failed chunklets on it; you can check for this using showpd -c. The disk will only be marked failed automatically once it has 6 failed chunklets on it. As you say, you can force an evacuation by using servicemag start if you like. The only other thing I can think of is to check that the firmware versions on those degraded disks look correct with showpd -i. Good luck!
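
          For anyone wanting to script that check, here is a minimal sketch in bash over SSH (the PD id and the host/user names are placeholder assumptions; the servicemag line is commented out because it actually starts the evacuation):

          #!/usr/bin/env bash
          # Minimal sketch: inspect a degraded PD before deciding to force an evacuation.
          # PDID, host and user are placeholder examples.
          PDID=11
          ssh 3paradm@3parsan01 showpd -c "$PDID"   # check the Fail columns for bad chunklets
          ssh 3paradm@3parsan01 showpd -i "$PDID"   # confirm the firmware version looks correct
          # Uncomment to force the evacuation instead of waiting for the disk to auto-fail:
          # ssh 3paradm@3parsan01 servicemag start -pdid "$PDID"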

      2. Hi,
        in this case we ran the command servicemag start -log -pdid 137,
        and all 4 disks now show no data (3 disks are in a degraded state, 1 has failed).

        We didn’t replace the disks for 24 hours.

        Does this have any impact?

  5. Hello Richard,
    I’ve got a 3PAR InServ E200 system, and after replacing a bunch of failed disks, I still have this error:
    inserv-e200b cli% showpd -failed -degraded
                               --Size(MB)-- ----Ports----
    Id CagePos Type Speed(K) State  Total Free A     B
     1 0:12:0? FC   10       failed 380928    0 ----- -----
     3 0:3:0?  FC   10       failed 380928    0 ----- -----
     7 0:7:0?  FC   10       failed 380928    0 ----- -----
    10 0:10:0? FC   10       failed 380928    0 ----- -----
    12 0:12:0? FC   10       failed 380928    0 ----- -----
    15 0:15:0? FC   10       failed 380928    0 ----- -----
    16 1:0:0?  FC   10       failed 380928    0 ----- -----
    20 1:4:0?  FC   10       failed 380928    0 ----- -----
    21 1:5:0?  FC   10       failed 380928    0 ----- -----
    25 1:9:0?  FC   10       failed 380928    0 ----- -----
    30 1:14:0? FC   10       failed 380928    0 ----- -----
    33 0:3:0?  FC   10       failed 380928    0 ----- -----
    ----------------------------------------------------------
    12 total                       4571136    0

    Is there a command to clean all the settings and make the system like new? At this point I don’t care about any data.
    Thanks in advance,
    Bohdan

    1. In this scenario I would have first checked showpd -p -cg 0 -mg 12, then the same for the next drive. If there is more than one drive in a particular slot, I would have checked servicemag status; if it reflects succeeded, then we can go ahead and run dismisspd 1. We need to repeat this for all the drives listed above. To clean the system, the steps given by 3pardude are perfect.

  6. Hi
    I am using an HPE 3PAR StoreServ 7200 with 16 disks and 8 empty HDD bays, and I am going to add 4 new disks. I inserted the new disks and did every single step right, but somehow the status says “degraded” for a while and then “Failed”. What can I do?
    Thanks a lot, and sorry for my bad English.

      1. Yes, my 3PAR OS version was 3.2.1 MU2 and the hard disks were not listed in that version of the OS, so I found out that I had to update the OS to at least 3.2.1 MU21. I updated it to 3.2.2 MU2 and that worked perfectly.

  7. I have replaced a disk on an HP 3PAR 7400; the disk went online and the rebuild completed. But the old disk (degraded) is still showing in the system. I’ve tried dismisspd without any luck. Any suggestions?

  8. At step 3, you may use “showpd -c 46” to check the detailed chunklet information, rather than “showpd -space 46”.

    1. You could run the Out Of The Box (OOTB) setup again, but this would remove your previous config and volumes. Of course, this would not be the same as a secure wipe, i.e. the data could still be recovered by experts.

  9. Hi Richard, I have a 7200 with a degraded drive, for which servicemag failed:

    The output of the servicemag start was:
    servicemag start -log -pdid 11
    ... servicing disks in mag: 0 11
    ... normal disks:
    ... not normal disks: WWN [5000CCA016117623] Id [11] diskpos [0]
    ... not vacant disks: WWN [5000CCA016117623] Id [11] diskpos [0]
    disk Id(s) [11] must be vacant before servicemag start -log is issued.
    Issue the command “movepd -devtype 11” to begin or continue disk vacate.
    Chunklet movement may be monitored using the “showpd -c 11” command
    servicemag start -log -pdid 11 -- Failed

    movepd does not help with vacating the drive. Here is the showpd -c output:

    ------- Normal Chunklets -------- ---- Spare Chunklets ----
    - Used - -------- Unused -------- - Used - ---- Unused ----
    Id CagePos Type State    Total OK Fail Free Uninit Unavail Fail OK Fail Free Uninit Fail
    11 0:11:0  FC   degraded   408  0    1    0    321       0   52  0    0    0      0   34
    ------------------------------------------------------------------------------------------
     1 total                   408  0    1    0    321       0   52  0    0    0      0   34

    I understand that I cannot proceed with disk replacement unless OK and Fail are 0. Any tips on what to do now? OS is 3.1.3 MU1

    1. Hi Mario

      You are quite right, do not proceed until all chunklets have been moved from the disk you are trying to remove. I think the issue is that you are using the -log option, which is meant for 10K systems where a magazine contains multiple disks. You need to run servicemag start without the -log option; the format is servicemag start -pdid <pdid>, so in your case servicemag start -pdid 11.

      Hope that helps

      Richard

  10. Hi Richard

    I have a failed drive in a 7400 and I want to remove it and not replace it. The system automatically ran servicemag when the drive failed, so I'm not sure how I go about dismissing that pd?

    Thanks
    Nathan

    1. Hi Nathan

      You would need to follow the process below; you can skip step 1 as you have already moved the data. A scripted sketch of the same sequence follows this list.

      1. Empty the drive (assuming plenty more drives, above the CPG set size * row size, behind the same node) -> movepdtospare -vacate -nowait <pdid>

      2. Remove spares -> removespare <pdid>:a

      3. Dismiss the pd from the config -> dismisspd <pdid>

      4. Offloop the drive -> controlmag offloop <cage>:<mag> (the drive maintenance LED will now be lit)

      5. Physically remove the drive
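
      A minimal scripted version of that sequence for reference, in bash over SSH. All values are placeholder assumptions, and the controlmag arguments follow the reply above; check your model's CLI help before running it:

      #!/usr/bin/env bash
      # Minimal sketch of the remove-without-replace sequence described above.
      # PDID, CAGE, MAG, host and user are placeholders; run each step by hand if unsure.
      PDID=46; CAGE=2; MAG=8
      RUN="ssh 3paradm@3parsan01"                  # hypothetical host/user
      $RUN movepdtospare -vacate -nowait "$PDID"   # 1. empty the drive
      $RUN removespare "${PDID}:a"                 # 2. remove its spare chunklets
      $RUN dismisspd "$PDID"                       # 3. dismiss the pd from the config
      $RUN controlmag offloop "${CAGE}:${MAG}"     # 4. offloop; maintenance LED lights
      # 5. now physically remove the drive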

  11. We replaced a failed hard disk on our 3PAR V400.

    We initiated the command “servicemag start -pdid 15” and the command was successful.

    Next we inserted the new disk.

    Then we initiated the command “servicemag resume 0 9” very soon afterwards (we can’t be sure whether the plug LED had gone off), but this command failed and we got the following error:

    V400 cli% servicemag status -d
    Cage 0, magazine 9:
    A servicemag resume command failed on this magazine.
    The command completed at Thu Mar 30 14:02:56 2017.
    The output of the servicemag resume was:
    servicemag resume 0 9
    ... mag 0 9 already onlooped
    ... firmware is current on pd WWN [2000B45253996193] Id [12]
    ... firmware is current on pd WWN [2000B4525397CB59] Id [13]
    ... firmware is current on pd WWN [2000B4525397BB3B] Id [14]
    ... firmware is current on pd WWN [5000CCA04090DD04]
    ... firmware is current on pd WWN [2000B4525397B911] Id [15]
    ... checking for valid disks...
    ... checking for valid disks...
    ... disks not normal yet..trying admit/onloop again
    ... onlooping mag 0 9
    Failed --
    DC2/DC4 cmd 3: enclosure returned error: Hardware is busy (0x8)

    The replaced disk pdid 15 remains in the system.

    V400 cli% showpd
    ----Size(MB)----- ----Ports----
    Id CagePos Type RPM State Total Free A B Cap(GB)
    --- 0:9:3 FC 15 failed 0 0 0:6:1- 1:6:1- 0
    0 0:0:0 FC 15 normal 559104 365568 0:6:1* 1:6:1 600
    1 0:0:1 FC 15 normal 559104 364544 0:6:1 1:6:1* 600
    2 0:0:2 FC 15 normal 559104 366592 0:6:1* 1:6:1 600
    3 0:0:3 FC 15 normal 559104 371712 0:6:1 1:6:1* 600
    5 0:1:1 FC 15 normal 559104 367616 0:6:1 1:6:1* 600
    6 0:1:2 FC 15 normal 559104 365568 0:6:1* 1:6:1 600
    7 0:1:3 FC 15 normal 559104 371712 0:6:1 1:6:1* 600
    8 0:8:0 SSD 150 normal 94208 0 0:6:1* 1:6:1 100
    9 0:8:1 SSD 150 normal 94208 0 0:6:1 1:6:1* 100
    10 0:8:2 SSD 150 normal 94208 0 0:6:1* 1:6:1 100
    11 0:8:3 SSD 150 normal 94208 0 0:6:1 1:6:1* 100
    12 0:9:0 FC 15 normal 559104 0 0:6:1* 1:6:1 600
    13 0:9:1 FC 15 normal 559104 0 0:6:1 1:6:1* 600
    14 0:9:2 FC 15 normal 559104 0 0:6:1* 1:6:1 600
    15 0:9:3? FC 15 failed 559104 0 ----- ----- 600

    V400 cli% showpd -state 15
    Id CagePos Type -State- ------------------------------Detailed_State-------------------------------
    15 0:9:3? FC failed vacated,missing,invalid_media,multiple_chunklets_media_bad,spinup,servicing
    ---------------------------------------------------------------------------------------------------
    1 total

    V400 cli% showversion
    Release version 3.1.2 (MU1)
    Patches: P04

    Component Name Version
    CLI Server 3.1.2 (P04)
    CLI Client 3.1.2 (P04)
    System Manager 3.1.2 (MU1)
    Kernel 3.1.2 (MU1)
    TPD Kernel Code 3.1.2 (MU1)

    After a few days, the new disk’s state was “new” with pdid 4, so I issued the command “servicemag resume 0 9” again, but it still failed. Then the new disk’s state changed to failed.

    QZGA_V400 cli% servicemag status -d
    Cage 0, magazine 9:
    A servicemag resume command failed on this magazine.
    The command completed at Tue Oct 10 08:09:22 2017.
    The output of the servicemag resume was:
    servicemag resume 0 9
    ... mag 0 9 already onlooped
    ... firmware is current on pd WWN [2000B45253996193] Id [12]
    ... firmware is current on pd WWN [2000B4525397CB59] Id [13]
    ... firmware is current on pd WWN [2000B4525397BB3B] Id [14]
    ... firmware is current on pd WWN [5000CCA04090DD04] Id [ 4]
    ... firmware is current on pd WWN [2000B4525397B911] Id [15]
    ... checking for valid disks...
    ... checking for valid disks...
    ... disks not normal yet..trying admit/onloop again
    ... onlooping mag 0 9
    ... checking for valid disks...
    ... checking for valid disks...
    ... disks not normal yet..trying admit/onloop again
    ... onlooping mag 0 9
    ... checking for valid disks...
    ... checking for valid disks...
    ... disks not normal yet..trying admit/onloop again
    ... onlooping mag 0 9
    ... checking for valid disks...
    Failed --
    disk WWN [5000CCA04090DD04] Id [ 4] is not normal. Please use showpd -s to see details of disk state
    servicemag resume 0 9 -- Failed
    QZGA_V400 cli% servicemag status
    Cage 0, magazine 9:
    A servicemag resume command failed on this magazine.
    The command completed at Tue Oct 10 08:09:22 2017.
    servicemag resume 0 9 -- Failed. Please run servicemag status -d for more detail
    QZGA_V400 cli% showpd -s 4
    Id CagePos Type -State- ---------------------------------Detailed_State---------------------------------
    4 0:9:3 FC failed vacated,not_available_for_allocations,invalid_media,multiple_chunklets_media_bad
    --------------------------------------------------------------------------------------------------------
    1 total
    QZGA_V400 cli% showpd -s
    Id CagePos Type -State- ---------------------------------Detailed_State---------------------------------
    0 0:0:0 FC normal normal
    1 0:0:1 FC normal normal
    2 0:0:2 FC normal normal
    3 0:0:3 FC normal normal
    4 0:9:3 FC failed vacated,not_available_for_allocations,invalid_media,multiple_chunklets_media_bad
    5 0:1:1 FC normal normal
    6 0:1:2 FC normal normal
    7 0:1:3 FC normal normal
    8 0:8:0 SSD normal normal
    9 0:8:1 SSD normal normal
    10 0:8:2 SSD normal normal
    11 0:8:3 SSD normal normal
    12 0:9:0 FC normal servicing
    13 0:9:1 FC normal servicing
    14 0:9:2 FC normal servicing
    15 0:9:3? FC failed vacated,missing,not_available_for_allocations,invalid_media,no_valid_ports
    16 1:0:0 FC normal normal
    17 1:0:1 FC normal normal
    18 1:0:2 FC normal normal

    Thank you.

  12. Hello dears, thanks for your post.
    Can I replace a failed disk with a disk of a bigger capacity than the failed disk and the other disks in the magazine?

    1. Yes, you can use different size disks in a CPG. Personally I wouldn’t recommend it, as the smaller disks will fill up quicker than the larger ones and you will end up with an uneven load.
