HP 3PAR: Replacing a Failed Disk

Replacing a failed disk in a 3PAR system is pretty simple; you just need to follow a few steps to make sure you do it safely. If you are new to 3PAR, or would like to learn more, a good place to start is our 3PAR beginners' guide.

Let’s get started with the disk replacement procedure:

1. Check whether there are any failed or degraded disks in the system, and take a note of the disk ID and cage position. In this case disk ID = 46, cage position = 2:8:0.

3PARSAN01 cli% showpd -failed -degraded

                            ----Size(MB)---- ----Ports----
Id CagePos Type RPM State     Total Free A     B     Cap(GB)
46 2:8:0?  FC    10 failed   417792    0 ----- -----     450
------------------------------------------------------------
 1 total                     417792    0
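For scripted health checks, the output above can be parsed with a few lines of Python. This is a text-parsing sketch based on the listing above, not an official 3PAR API; real column layouts vary by 3PAR OS release, so treat the format as an assumption.

```python
# Sketch: pull the Id and CagePos of each failed/degraded disk out of
# "showpd -failed -degraded" output, as captured in step 1 above.
def failed_disks(showpd_output: str):
    disks = []
    for line in showpd_output.splitlines():
        fields = line.split()
        # Data rows start with a numeric disk Id followed by a cage position;
        # the header, separator and "total" rows do not match this shape.
        if len(fields) >= 2 and fields[0].isdigit() and ":" in fields[1]:
            disks.append((int(fields[0]), fields[1].rstrip("?")))
    return disks

sample = """\
Id CagePos Type RPM State   Total Free A     B     Cap(GB)
46 2:8:0?  FC    10 failed 417792    0 ----- -----     450
------------------------------------------------------------
 1 total                   417792    0
"""
print(failed_disks(sample))  # [(46, '2:8:0')]
```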

2. Check whether the servicemag command has run against the drive. servicemag tells the system to evacuate all the chunklets from a drive so that it is ready for service. Below we can see that servicemag has succeeded on the drive we identified in step 1.

3PARSAN01 cli% servicemag status

Cage 2, magazine 8:

The magazine was successfully brought offline by a servicemag start command.

The command completed Thu Jul 10 20:07:03 2014.

servicemag start -pdid 46 -- Succeeded

3. Next, double-check there is no data left on the drive. You can do this by running showpd -space <driveID> as below. Check that all columns other than Size and Failed are zero.

3PARSAN01 cli% showpd -space 46

Id CagePos Type -State-    Size Volume Spare Free Unavail Failed
46 2:8:0?  FC   failed   417792      0     0    0       0 417792
----------------------------------------------------------------
 1 total                 417792      0     0    0       0 417792
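Step 3's check (every column except Size and Failed must be zero, with Failed accounting for the whole drive) can be expressed as a small guard function. A minimal sketch, assuming the showpd -space row has already been parsed into a dict keyed by the column names above:

```python
# Sketch: sanity-check a "showpd -space <pdid>" row before pulling a drive.
# Column names follow the listing above; this is an illustrative helper,
# not an official 3PAR API.
def safe_to_pull(row: dict) -> bool:
    # Everything except Size and Failed must be zero, and the failed
    # space should account for the whole drive.
    return (row["Volume"] == row["Spare"] == row["Free"] == row["Unavail"] == 0
            and row["Failed"] == row["Size"])

row = {"Size": 417792, "Volume": 0, "Spare": 0, "Free": 0,
       "Unavail": 0, "Failed": 417792}
print(safe_to_pull(row))  # True
```

If any of the other columns is nonzero, the evacuation has not finished and the drive should not be removed yet.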

 

4. Next, replace the physical disk. Make sure you are happy with the above steps, then pop that bad boy out (you will have a note of the location of the failed drive from step 1) and insert the replacement in the same slot.

 

5. Once the new disk is in, you can monitor the progress of the rebuild by running servicemag status, which will give you an ETA for completion. On older 3PAR OS releases you may first need to kick off the rebuild with servicemag resume <cage> <mag>; newer releases detect the replacement disk and resume automatically.

3PARSAN01 cli% servicemag status

Cage 2, magazine 8:

The magazine is being brought online due to a servicemag resume.

The last status update was at Thu Jun 26 12:09:19 2014.

Chunklets relocated: 73 in 50 minutes and 34 seconds

Chunklets remaining: 400

Chunklets marked for moving: 400

Estimated time for relocation completion based on 41 seconds per chunklet is: 4 hours, 57 minutes and 39 seconds

servicemag resume 2 8 -- is in Progress
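The ETA the array prints is roughly the remaining chunklet count multiplied by the observed seconds per chunklet. A quick sketch of that arithmetic; note the array's own estimate uses its internal accounting, so it will not match this simple product exactly:

```python
# Sketch: rough rebuild ETA from "servicemag status" figures.
# The inputs (400 chunklets remaining, 41 s/chunklet) come from the
# output above; this is a ballpark, not the array's exact calculation.
def eta(remaining: int, secs_per_chunklet: int) -> str:
    total = remaining * secs_per_chunklet
    h, rem = divmod(total, 3600)
    m, s = divmod(rem, 60)
    return f"{h}h {m}m {s}s"

print(eta(400, 41))  # 400 * 41 = 16400 s -> "4h 33m 20s"
```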

6. Once this is complete, you can check that your disks are showing in a normal state with showpd -state.

3PARSAN01 cli% showpd -state

Id CagePos Type -State- --------------------Detailed_State---------------------

43 2:5:0   FC   normal normal

44 2:6:0   FC   normal normal

45 2:7:0   FC   normal normal

46 2:8:0   FC   normal normal

47 2:9:0   FC   normal normal
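Before calling the job done, the step-6 check can also be scripted: confirm every disk in the showpd -state output reports normal. Another text-parsing sketch based on the listing above; the column positions are an assumption and may differ between 3PAR OS releases:

```python
# Sketch: return True only if every data row in "showpd -state" output
# shows "normal" in the -State- column (the 4th field).
def all_normal(showpd_state_output: str) -> bool:
    for line in showpd_state_output.splitlines():
        fields = line.split()
        # Data rows start with a numeric disk Id.
        if len(fields) >= 4 and fields[0].isdigit():
            if fields[3] != "normal":
                return False
    return True

sample = """\
Id CagePos Type -State- Detailed_State
45 2:7:0   FC   normal  normal
46 2:8:0   FC   normal  normal
"""
print(all_normal(sample))  # True
```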

That’s it, job done! This blog has over 150 3PAR articles; we have put together a selection of the best of them to help you learn more about 3PAR.

 

To stay in touch with more 3PAR news and tips, connect with me on LinkedIn and Twitter.


46 thoughts on “HP 3PAR: Replacing a Failed Disk”

  1. I believe servicemag resume cageposition is not necessary with 3.1.2 and newer. System will automatically detect new PD and start rebuilding.

  2. Hi Richard,

    Thanks for your article on replacing HDDs.
    I have a requirement to replace 4 HDDs on an 8-cage 7400 3PAR box.
    Is it possible to do so at the same time?

    1. Not sure why you would be replacing 4 disks at the same time. If they have already failed I see no harm in replacing them with a brief pause in between. If you are removing disks for any other reason I would do one at a time.

  3. Disk 3 failed on our 7200 node. The disk ID was 3, and when the “relocation” started on the newly replaced drive, the new disk was assigned a disk ID of 48. Is there a way to change IDs?

    1. Hi you are quite right the ID will change after a disk swap. I have never seen a way to change back, but will publish your comment here to see if anyone else has come across this. Thanks for reading!

  4. Hi,

    Have a V800 with a failed disk within a 4-disk cage. Its state is failed, and the disk is empty when running showpd -space <pdid> as you describe. When I run servicemag status I'm not seeing any servicemag operations like you have shown; it just says “no servicemag operations running”. Any thoughts? Do I need to perform something active before replacing the disk, i.e. pulling out the cage?

    1. Hi. The disks on the V800 are held in disk magazines of 4 disks as you describe. The servicemag command will work at the magazine level and log or relocate the data from all 4 disks in the magazine you wish to remove. When the servicemag command has completed you can remove the magazine.

      1. Thanks a bunch, it worked and the drive has been replaced now 🙂

        I have another question:

        I have 2 disks in a degraded state, 1 in a 7400 system and 1 in a V800 system. But they remain in a degraded state and do not change to failed, as I would expect them to after some time.

        Is there a way to force it into failed mode or will I be able to run the servicemag start on these disks even though they are only in degraded state?

        1. It could remain in a degraded state if it has failed chunklets on it; you can check for this using showpd -c. The disk will only be marked failed automatically once it has 6 failed chunklets. As you say, you can force an evacuation using servicemag start if you like. The only other thing I can think of is to check that the firmware versions on those degraded disks look correct (showpd -i). Good luck!
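The auto-fail rule described in the reply above can be sketched as a tiny predicate. The 6-chunklet threshold is taken from the reply, not from HPE documentation, so treat it as an assumption:

```python
# Sketch: would the array mark this disk failed automatically?
# Per the reply above, a disk is auto-failed once it accumulates
# 6 failed chunklets (normal + spare). Threshold is an assumption.
AUTO_FAIL_THRESHOLD = 6

def would_auto_fail(normal_fail: int, spare_fail: int) -> bool:
    return normal_fail + spare_fail >= AUTO_FAIL_THRESHOLD

print(would_auto_fail(4, 2))  # True  (6 failed chunklets in total)
print(would_auto_fail(2, 1))  # False (still only degraded)
```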

      2. Hi, in this case we ran servicemag start -log -pdid 137 and all 4 disks now show no data (3 disks are in a degraded state, 1 is failed).

        We didn't replace the disks for 24 hours.

        Does this lead to any impact?

  5. Hello Richard,
    I’ve got 3Par InServ e200 system, and after replacing bunch of failed disks, I still have this error:
    inserv-e200b cli% showpd -failed -degraded
    ----Size(MB)---- ----Ports----
    Id CagePos Type Speed(K) State   Total Free A     B
     1 0:12:0? FC         10 failed 380928    0 ----- -----
     3 0:3:0?  FC         10 failed 380928    0 ----- -----
     7 0:7:0?  FC         10 failed 380928    0 ----- -----
    10 0:10:0? FC         10 failed 380928    0 ----- -----
    12 0:12:0? FC         10 failed 380928    0 ----- -----
    15 0:15:0? FC         10 failed 380928    0 ----- -----
    16 1:0:0?  FC         10 failed 380928    0 ----- -----
    20 1:4:0?  FC         10 failed 380928    0 ----- -----
    21 1:5:0?  FC         10 failed 380928    0 ----- -----
    25 1:9:0?  FC         10 failed 380928    0 ----- -----
    30 1:14:0? FC         10 failed 380928    0 ----- -----
    33 0:3:0?  FC         10 failed 380928    0 ----- -----
    ----------------------------------------------------------
    12 total                      4571136    0

    Is there a command to clean all the settings and make the system like new? At this point I don't care about any data.
    Thanks in advance
    Bohdan

    1. In this scenario I would have first checked showpd -p -cg 0 -mg 12, then the same for the next drive. If there is more than 1 drive in a particular slot, I would have checked servicemag status; if it reflects succeeded, we can go ahead and run dismisspd 1. We need to repeat this for all the drives listed above. To clean the system, the steps given by 3pardude are perfect.

  6. Hi
    I am using an HPE 3PAR StoreServ 7200 with 16 disks and 8 empty HDD bays, so I am going to add 4 new disks. I inserted the new disks and did every single step right, but somehow the status says “degraded” for a while and then “failed”. What can I do?
    Thanks a lot, and sorry for my bad English.

      1. Yes, my 3PAR OS version was 3.2.1 mu2 and the hard disks were not listed in that version of the OS. I found out that I had to update the OS to at least 3.2.1 mu21; I updated to 3.2.2 mu2 and that worked perfectly.

  7. I have replaced a disk on an HP 3PAR 7400; the disk went online and the rebuild completed. But the old disk (degraded) is still showing in the system. I've tried dismisspd without any luck. Any suggestions?

  8. At step 3, you may use “showpd -c 46” to check the detailed chunklet information instead of “showpd -space 46”.

    1. You could run the Out Of the Box setup again, but this would remove your previous config and volumes. Of course this would not be the same as a secure wipe, i.e. the data could still be recovered by experts.

  9. Hi Richard, I have a 7200 with a degraded drive, for which servicemag failed:

    The output of the servicemag start was:
    servicemag start -log -pdid 11
    … servicing disks in mag: 0 11
    … normal disks:
    … not normal disks: WWN [5000CCA016117623] Id [11] diskpos [0]
    … not vacant disks: WWN [5000CCA016117623] Id [11] diskpos [0]
    disk Id(s) [11] must be vacant before servicemag start -log is issued.
    Issue the command “movepd -devtype 11” to begin or continue disk vacate.
    Chunklet movement may be monitored using the “showpd -c 11” command
    servicemag start -log -pdid 11 -- Failed

    movepd does not help on vacating the drive, Here is showpd -c output:

    ------- Normal Chunklets -------- ---- Spare Chunklets ----
    - Used - -------- Unused -------- - Used - ---- Unused ----
    Id CagePos Type State    Total OK Fail Free Uninit Unavail Fail OK Fail Free Uninit Fail
    11 0:11:0  FC   degraded   408  0    1    0    321       0   52  0    0    0      0   34
    ------------------------------------------------------------------------------------------
     1 total                   408  0    1    0    321       0   52  0    0    0      0   34

    I understand that I cannot proceed with disk replacement unless OK and Fail are 0. Any tips on what to do now? OS is 3.1.3 MU1

    1. Hi Mario

      You are quite right, do not proceed until all chunklets have been moved from the disk you are trying to remove. I think the issue is that you are using the -log option; this is meant for 10K systems where a magazine contains multiple disks. You need to run servicemag start without the -log option; the format is servicemag start -pdid <pdid>, so in your case servicemag start -pdid 11.

      Hope that helps

      Richard

  10. Hi Richard

    I have a failed drive in a 7400 and I want to remove it and not replace it. The system automatically ran servicemag when the drive failed, so I'm not sure how to go about dismissing that pd?

    Thanks
    Nathan

    1. Hi Nathan

      You would need to follow the process below, you can skip 1 as you have already moved the data

      1. Empty the drive (assuming plenty more drives, above the CPG set size * row size, behind the same node) -> movepdtospare -vacate -nowait <pdid>

      2. Remove spares -> removespare <pdid>:a

      3. Dismiss the pd from the config -> dismisspd <pdid>

      4. Offloop the drive -> controlmag offloop <cage>:<mag> (the drive maintenance LED will now be lit)

      5. Physically remove drive
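The removal sequence above can be sketched as an ordered command plan. The exact placeholder syntax (<pdid>, <cage>:<mag>) is an assumption based on this thread, not verified against a CLI reference; check the command help on your 3PAR OS release before running anything:

```python
# Sketch: the drive-removal sequence from the reply above as an ordered
# list of CLI command strings. Placeholder syntax is an assumption.
def removal_plan(pdid: int, cage: int, mag: int):
    return [
        f"movepdtospare -vacate -nowait {pdid}",  # 1. empty the drive
        f"removespare {pdid}:a",                  # 2. remove its spares
        f"dismisspd {pdid}",                      # 3. dismiss pd from config
        f"controlmag offloop {cage}:{mag}",       # 4. offloop the drive
    ]

for cmd in removal_plan(46, 2, 8):
    print(cmd)
```

Running the commands in this order matters: the drive must be empty before it is dismissed, and dismissed before it is offlooped and pulled.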

  11. We replaced a failed hard disk on our 3PAR V400.

    Initiated the command “servicemag start -pdid 15” and the command was successful.

    Next we inserted the new disk.

    Then we initiated the command “servicemag resume 0 9” very soon afterwards (we can't be sure whether the plug LED had gone off), but this command failed and we got the following error:

    V400 cli% servicemag status -d
    Cage 0, magazine 9:
    A servicemag resume command failed on this magazine.
    The command completed at Thu Mar 30 14:02:56 2017.
    The output of the servicemag resume was:
    servicemag resume 0 9
    … mag 0 9 already onlooped
    … firmware is current on pd WWN [2000B45253996193] Id [12]
    … firmware is current on pd WWN [2000B4525397CB59] Id [13]
    … firmware is current on pd WWN [2000B4525397BB3B] Id [14]
    … firmware is current on pd WWN [5000CCA04090DD04]
    … firmware is current on pd WWN [2000B4525397B911] Id [15]
    … checking for valid disks…
    … checking for valid disks…
    … disks not normal yet..trying admit/onloop again
    … onlooping mag 0 9
    Failed —
    DC2/DC4 cmd 3: enclosure returned error: Hardware is busy (0x8)

    The replaced disk pdid 15 remains in system.

    V400 cli% showpd
    —-Size(MB)—– —-Ports—-
    Id CagePos Type RPM State Total Free A B Cap(GB)
    — 0:9:3 FC 15 failed 0 0 0:6:1- 1:6:1- 0
    0 0:0:0 FC 15 normal 559104 365568 0:6:1* 1:6:1 600
    1 0:0:1 FC 15 normal 559104 364544 0:6:1 1:6:1* 600
    2 0:0:2 FC 15 normal 559104 366592 0:6:1* 1:6:1 600
    3 0:0:3 FC 15 normal 559104 371712 0:6:1 1:6:1* 600
    5 0:1:1 FC 15 normal 559104 367616 0:6:1 1:6:1* 600
    6 0:1:2 FC 15 normal 559104 365568 0:6:1* 1:6:1 600
    7 0:1:3 FC 15 normal 559104 371712 0:6:1 1:6:1* 600
    8 0:8:0 SSD 150 normal 94208 0 0:6:1* 1:6:1 100
    9 0:8:1 SSD 150 normal 94208 0 0:6:1 1:6:1* 100
    10 0:8:2 SSD 150 normal 94208 0 0:6:1* 1:6:1 100
    11 0:8:3 SSD 150 normal 94208 0 0:6:1 1:6:1* 100
    12 0:9:0 FC 15 normal 559104 0 0:6:1* 1:6:1 600
    13 0:9:1 FC 15 normal 559104 0 0:6:1 1:6:1* 600
    14 0:9:2 FC 15 normal 559104 0 0:6:1* 1:6:1 600
    15 0:9:3? FC 15 failed 559104 0 —– —– 600

    V400 cli% showpd -state 15
    Id CagePos Type -State- ——————————Detailed_State——————————-
    15 0:9:3? FC failed vacated,missing,invalid_media,multiple_chunklets_media_bad,spinup,servicing
    —————————————————————————————————
    1 total

    V400 cli% showversion
    Release version 3.1.2 (MU1)
    Patches: P04

    Component Name Version
    CLI Server 3.1.2 (P04)
    CLI Client 3.1.2 (P04)
    System Manager 3.1.2 (MU1)
    Kernel 3.1.2 (MU1)
    TPD Kernel Code 3.1.2 (MU1)

    After a few days, the new disk's state was new with pdid 4, so I issued the command “servicemag resume 0 9” again, but it still failed. Then the new disk's state went to failed.

    QZGA_V400 cli% servicemag status -d
    Cage 0, magazine 9:
    A servicemag resume command failed on this magazine.
    The command completed at Tue Oct 10 08:09:22 2017.
    The output of the servicemag resume was:
    servicemag resume 0 9
    … mag 0 9 already onlooped
    … firmware is current on pd WWN [2000B45253996193] Id [12]
    … firmware is current on pd WWN [2000B4525397CB59] Id [13]
    … firmware is current on pd WWN [2000B4525397BB3B] Id [14]
    … firmware is current on pd WWN [5000CCA04090DD04] Id [ 4]
    … firmware is current on pd WWN [2000B4525397B911] Id [15]
    … checking for valid disks…
    … checking for valid disks…
    … disks not normal yet..trying admit/onloop again
    … onlooping mag 0 9
    … checking for valid disks…
    … checking for valid disks…
    … disks not normal yet..trying admit/onloop again
    … onlooping mag 0 9
    … checking for valid disks…
    … checking for valid disks…
    … disks not normal yet..trying admit/onloop again
    … onlooping mag 0 9
    … checking for valid disks…
    Failed —
    disk WWN [5000CCA04090DD04] Id [ 4] is not normal. Please use showpd -s to see details of disk state
    servicemag resume 0 9 — Failed
    QZGA_V400 cli% servicemag status
    Cage 0, magazine 9:
    A servicemag resume command failed on this magazine.
    The command completed at Tue Oct 10 08:09:22 2017.
    servicemag resume 0 9 — Failed. Please run servicemag status -d for more detail
    QZGA_V400 cli% showpd -s 4
    Id CagePos Type -State- ———————————Detailed_State———————————
    4 0:9:3 FC failed vacated,not_available_for_allocations,invalid_media,multiple_chunklets_media_bad
    ——————————————————————————————————–
    1 total
    QZGA_V400 cli% showpd -s
    Id CagePos Type -State- ———————————Detailed_State———————————
    0 0:0:0 FC normal normal
    1 0:0:1 FC normal normal
    2 0:0:2 FC normal normal
    3 0:0:3 FC normal normal
    4 0:9:3 FC failed vacated,not_available_for_allocations,invalid_media,multiple_chunklets_media_bad
    5 0:1:1 FC normal normal
    6 0:1:2 FC normal normal
    7 0:1:3 FC normal normal
    8 0:8:0 SSD normal normal
    9 0:8:1 SSD normal normal
    10 0:8:2 SSD normal normal
    11 0:8:3 SSD normal normal
    12 0:9:0 FC normal servicing
    13 0:9:1 FC normal servicing
    14 0:9:2 FC normal servicing
    15 0:9:3? FC failed vacated,missing,not_available_for_allocations,invalid_media,no_valid_ports
    16 1:0:0 FC normal normal
    17 1:0:1 FC normal normal
    18 1:0:2 FC normal normal

    thank you

  12. Hello, thanks for your post.
    Can I replace a failed disk with a disk of a bigger capacity than the failed disk and the other disks in the magazine?

    1. Yes, you can use different-size disks in a CPG. Personally I wouldn't recommend it, as the smaller disks will fill up quicker than the larger ones and you will end up with an uneven load.

  13. Is it possible to low-level reformat a disk with a different block/sector size? I came across an older 3PAR being scrapped that was full of ST3600057FC 600GB 15K FC drives, the same model number as the kind we use in our 10000 series. Anyway, I just tried using one as a replacement for a failed disk; it was recognized properly and the disk's firmware was updated, but it shows Invalid Sector Size in the management console. Checking the block size using showpd, I see it's formatted with 512-byte sectors and my 10000 expects 520.

    1. Hi David

      This is not something I have tried, but logically what you suggest makes sense. Give it a go and let us know how you get on.

  14. Hi Richard, great guide! I recently replaced a failed drive in my 7200 and now my removed drive shows a ? mark next to it. When I do showpd -space <pdid> it shows the following:

    Id CagePos Type -State– Size Volume Spare Free Unavail Failed
    6 0:6:0? FC degraded 838656 0 0 838656 0 0
    ——————————————————————
    1 total 838656 0 0 838656 0 0

    The new disk shows:

    Id CagePos Type -State- Size Volume Spare Free Unavail Failed
    4 0:6:0 FC normal 1142784 608256 35840 498688 0 0
    ——————————————————————
    1 total 1142784 608256 35840 498688 0 0

    Is it safe to dismiss the old drive, ID 6?

    When I run a showpd -c 6 it shows:
    Id CagePos Type State Total OK Fail Free Uninit Unavail Fail OK Fail Free Uninit Fail
    6 0:6:0? FC degraded 819 0 0 225 594 0 0 0 0 0 0 0
    ——————————————————————————————
    1 total 819 0 0 225 594 0 0 0 0 0

    1. Hi, yes, from these outputs it looks like 4 is the new working disk and disk 6 is the old one that has not cleared itself from the system. If disk 6 shows 0 space used, which it looks like here, I would dismiss it like you said.

  15. Just want to recheck if I am correct!

    In the description its written 26 with below line!

    In this case disk ID =26, cage position = 2:8:0

    However in the output id is 46.

    1. The disk ID is just an auto-generated number; I would focus on the cage position to make sure you are looking at the correct disk.

  16. Hello

    I have a problem with an HP 3PAR 7200 with 900GB FC HDDs.
    One of the HDDs failed about 1 month ago; the position of my HDD is 0 19. I replaced it with the servicemag procedure and everything was OK.
    After 1 day my new HDD was normal and the failed disk was gone, but the next day the new disk failed. I replaced the failed disk again and after 1 day everything was OK.
    After 1 month the HDD in 0 19 failed again and I replaced it, but after 2 days the new HDD failed again.

    cli% showpd

    —-Size(MB)—- —-Ports—-
    Id CagePos Type RPM State Total Free A B Cap(GB)
    0 0:0:0 FC 10 normal 838656 146432 1:0:1* 0:0:1 900
    1 0:1:0 FC 10 normal 838656 143360 1:0:1 0:0:1* 900
    2 0:2:0 FC 10 normal 838656 585728 1:0:1* 0:0:1 900
    3 0:3:0 FC 10 normal 838656 136192 1:0:1 0:0:1* 900
    4 0:4:0 FC 10 normal 838656 147456 1:0:1* 0:0:1 900
    5 0:5:0 FC 10 normal 838656 117760 1:0:1 0:0:1* 900
    6 0:6:0 FC 10 normal 838656 148480 1:0:1* 0:0:1 900
    7 0:7:0 FC 10 normal 838656 129024 1:0:1 0:0:1* 900
    8 0:8:0 FC 10 normal 838656 148480 1:0:1* 0:0:1 900
    9 0:9:0 FC 10 normal 838656 105472 1:0:1 0:0:1* 900
    10 0:10:0 FC 10 normal 838656 0 1:0:1* 0:0:1 900
    11 0:11:0 FC 10 normal 838656 0 1:0:1 0:0:1* 900
    12 0:12:0 FC 10 normal 838656 0 1:0:1* 0:0:1 900
    13 0:13:0 FC 10 normal 838656 0 1:0:1 0:0:1* 900
    14 0:14:0 FC 10 normal 838656 0 1:0:1* 0:0:1 900
    15 0:15:0 FC 10 normal 838656 1024 1:0:1 0:0:1* 900
    16 0:16:0 FC 10 normal 838656 0 1:0:1* 0:0:1 900
    17 0:17:0 FC 10 normal 838656 0 1:0:1 0:0:1* 900
    18 0:18:0 FC 10 normal 838656 0 1:0:1* 0:0:1 900
    19 0:19:0 FC 10 failed 838656 0 1:0:1 0:0:1* 900
    20 0:21:0 FC 10 normal 838656 0 1:0:1 0:0:1* 900
    21 0:22:0 FC 10 normal 838656 5120 1:0:1* 0:0:1 900
    22 0:23:0 FC 10 normal 838656 2048 1:0:1 0:0:1* 900
    23 0:20:0 FC 10 normal 838656 0 1:0:1* 0:0:1 900

    cli% checkhealth

    Checking alert
    Checking cabling
    Checking cage
    Checking dar
    Checking date
    Checking ld
    Checking license
    Checking network
    Checking node
    Checking pd
    Checking port
    Checking rc
    Checking snmp
    Checking task
    Checking vlun
    Checking vv
    Component —————Description————— Qty
    Network Too few working admin network connections 1
    PD PDs that are failed 1

    cli% showcage

    Id Name LoopA Pos.A LoopB Pos.B Drives Temp RevA RevB Model Side
    0 cage0 1:0:1 0 0:0:1 0 24 26-30 320e 320e DCN1 n/a

    cli% showversion

    Release version 3.1.2 (MU2)
    Patches: P10

    Component Name Version
    CLI Server 3.1.2 (MU2)
    CLI Client 3.1.2 (MU2)
    System Manager 3.1.2 (MU2)
    Kernel 3.1.2 (MU2)
    TPD Kernel Code 3.1.2 (MU2)

    cli% servicemag start -pdid 19 -seucceeded

    Expecting integer pdid, got: -succeeded

    SAN.SER cli% servicemag start -pdid 19 -succeeded

    Are you sure you want to run servicemag?
    select q=quit y=yes n=no: y
    servicemag start -pdid 19

    … servicing disks in mag: 0 19

    … normal disks:

    … not normal disks: WWN [XXXXXXXXXXXXXXXX] Id [19] diskpos [0]

    The servicemag start operation will continue in the background.

    cli% showpd -space 19

    —————–(MB)——————
    Id CagePos Type -State- Size Volume Spare Free Unavail Failed
    19 0:19:0 FC failed 838656 0 0 0 0 838656
    —————————————————————
    1 total 838656 0 0 0 0 838656
    SAN.SER cli% servicemag resume 0 19

    Are you sure you want to run servicemag?
    select q=quit y=yes n=no: y

    servicemag status 0 19

    The magazine is being brought online due to a servicemag resume.
    The last status update was at Tue May 1 10:27:04 2018.
    Chunklets relocated: 6 in 4 minutes and 45 seconds
    Chunklets remaining: 2232
    Chunklets marked for moving: 2232
    Estimated time for relocation completion based on 47 seconds per chunklet is: 1 days, 5 hours, 8 minutes and 24 seconds
    servicemag resume 0 19 — is in Progress
    cli% exit

    Maybe the OS version is my problem?

    Please help me with this problem.

    thank you

  17. Hi Richard. Thanks for your post!
    I had one failed disk (900GB FC HDD) in an HP 3PAR 10000. I replaced the failed disk with a new identical disk, and then the other disks in that magazine became degraded. Why did this occur?

    1. This could be because the increased load on the disks during the rebuild led to the failure, or because the servicemag command was not correctly issued.

  18. I want to format drives from a decommissioned array and use them on another array. All volumes and CPGs have been removed. Is controlpd format <wwn> only available on certain InForm versions? I did not see “controlpd format” as an option.
