Recovering from a failed drive with Apple’s Software RAID

Update: This blog posting was written against 10.8. I’m happy to report that you can correctly rebuild RAIDs in Disk Utility on 10.9 and 10.10. Just use the minus button to remove the failed drive from the list, “demote” it, drag the new drive into the list and hit the Rebuild button. Thanks to charlesism for the heads-up.


OK, given the panic-y nature of the topic, perhaps a Q&A format will be best:

My drive failed, but phew I set up RAID Mirroring, so I just need to replace the drive and the Mac will automatically recover?

First off, congratulations on having the forethought to enable RAID Mirroring. You’re mostly in good shape.

You’re forgiven if you thought enabling Disk Utility’s “Automatically rebuild RAID mirror sets” checkbox during initial RAID creation covered this scenario. Unfortunately, that’s for when a drive gets knocked offline and is then reconnected. It’s irrelevant in the face of a drive failure, where the replacement drive has no data on it to start with.

Unfortunately, OS X’s Software RAID doesn’t have a Time Machine-like UI that offers to rebuild your RAID for you when it notices a new drive has been connected while a degraded RAID is present.

Man, that would be sweet.

No, you have to do some manual work.

OK, so I need to use Disk Utility to fully recover?

Sorry. While Disk Utility offers some guidance (degraded RAIDs show up in orange in its UI, and missing RAID slices in red), as far as I can tell you can’t actually use it to rebuild your existing RAID.

Disk Utility is happy to help you create a new RAID, but if you try to do that with an existing slice, I can tell you from experience that it will make good on its threat to delete all existing data before recreating the RAID. Which kinda misses the point of rebuilding the RAID from the slice that’s still standing.

No way, Disk Utility will let me create a RAID Mirror, but can’t actually rebuild it?

Way.

Sigh, OK, so what app do I use to rebuild? This “RAID Utility.app” looks promising.

Sorry. RAID Utility.app, available on OS X Server only, is for Apple’s hardware RAID.

As a software RAID pauper, you don’t get an app.

You’re about to tell me I need to drop down to the Unix layer, aren’t you?

Sadly, yes.

But it’s not that bad! It really just boils down to:

sudo diskutil appleraid remove $FAILED_SLICE_UUID $RAID_UUID
sudo diskutil appleraid add member $REPLACEMENT_SLICE_DEV $RAID_UUID
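If you’d rather not fire off those two commands by hand, here’s a minimal sketch of a shell function that wraps them and refuses to run with a blank argument. The function name rebuild_mirror_member and the DRY_RUN switch are hypothetical additions, not part of diskutil; only the two diskutil appleraid invocations come from the steps described here.

```shell
# Hypothetical helper (not part of diskutil): swap a failed slice out of an
# AppleRAID mirror and add its replacement, in the same order as above.
#   $1  RAID set UUID              (RAID_UUID)
#   $2  failed slice UUID          (FAILED_SLICE_UUID)
#   $3  replacement slice device   (REPLACEMENT_SLICE_DEV, e.g. disk1s5)
# With DRY_RUN=1 the diskutil commands are printed instead of executed.
rebuild_mirror_member() {
    raid_uuid=$1
    failed_slice_uuid=$2
    replacement_slice_dev=$3

    # Bail out early on a missing argument rather than handing diskutil
    # a half-formed command.
    if [ -z "$raid_uuid" ] || [ -z "$failed_slice_uuid" ] || [ -z "$replacement_slice_dev" ]; then
        echo "usage: rebuild_mirror_member RAID_UUID FAILED_SLICE_UUID REPLACEMENT_SLICE_DEV" >&2
        return 2
    fi

    # An empty $run expands to nothing, so the commands run for real.
    run=
    if [ "${DRY_RUN:-0}" = 1 ]; then
        run=echo
    fi

    # 1. Drop the dead slice from the set; stop here if that fails.
    $run sudo diskutil appleraid remove "$failed_slice_uuid" "$raid_uuid" || return 1
    # 2. Add the freshly partitioned slice to the set.
    $run sudo diskutil appleraid add member "$replacement_slice_dev" "$raid_uuid"
}
```

Once the three variables are filled in, DRY_RUN=1 rebuild_mirror_member "$RAID_UUID" "$FAILED_SLICE_UUID" "$REPLACEMENT_SLICE_DEV" lets you eyeball the exact commands before running them for real without DRY_RUN.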

But you have to do some legwork first to correctly fill out those variables.

First, you need to partition the replacement drive to match the failed drive. If you don’t remember the exact numbers offhand, that’s fine; you can just inspect the still-functioning drive for them. You can use Disk Utility for this.
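To read the surviving drive’s layout from the command line instead, a small filter over diskutil list output can print each partition’s identifier, type and size. This is a sketch under assumptions: the function name partition_layout is hypothetical, and it relies on the column layout in the diskutil list output shown further down in this post (size and unit immediately before the identifier), which may differ on other OS X releases. The sizes diskutil list prints are rounded, so for exact byte counts run diskutil info on the slice, e.g. diskutil info disk0s6.

```shell
# Hypothetical helper: print "identifier type size unit" for every partition
# on one disk, reading `diskutil list` output on stdin.
#   $1  whole-disk identifier, e.g. disk0
partition_layout() {
    awk -v disk="/dev/$1" '
        # A "/dev/diskN" header starts a new disk section.
        /^\/dev\/disk[0-9]/ { in_disk = ($1 == disk); next }
        # Partition rows start with "1:", "2:", ...; row "0:" is the whole disk.
        in_disk && $1 ~ /^[1-9][0-9]*:$/ {
            # The NAME column may be blank or contain spaces, so count from
            # the right: identifier, unit, size.
            print $NF, $2, $(NF-2), $(NF-1)
        }
    '
}
```

For example, diskutil list | partition_layout disk0 gives you the surviving drive’s slices in order, which you can then reproduce on the replacement drive.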

With the replacement drive properly sliced up like the proverbial Thanksgiving turkey, you need to gather three pieces of information:

  1. The UUID of the RAID to repair. We’ll assign this to $RAID_UUID.

    If you partitioned your system like I previously suggested, you’ll have three RAID sets, so you’ll have to repeat these steps three times: once for each RAID.

  2. The UUID of the failed RAID slice. Let’s name this $FAILED_SLICE_UUID.

  3. The device name of the replacement slice. We’ll call this $REPLACEMENT_SLICE_DEV.

You’ll use a one-two punch of diskutil appleraid list and diskutil list to gather these three variables. Here I’ll rebuild the “mini4 Data” RAID set:

$ sudo diskutil appleraid list
AppleRAID sets (3 found)
===============================================================================
Name:                 mini4 Boot
Unique ID:            E73306EA-D188-4F2F-B7E9-05206B92FAC2
Type:                 Mirror
Status:               Online
Size:                 50.0 GB (49999970304 Bytes)
Rebuild:              manual
Device Node:          disk2
-------------------------------------------------------------------------------
#  DevNode   UUID                                  Status     Size
-------------------------------------------------------------------------------
0  disk0s2   53B0A36D-F8A2-4977-9ACC-CBB1BDADD65C  Online     49999970304
1  disk1s2   18189491-D70E-4970-85AD-22AB0AD75617  Online     49999970304
===============================================================================
===============================================================================
Name:                 mini4 Data
Unique ID:            EDFF099E-A897-4D6E-9DE1-544449898CBE
Type:                 Mirror
Status:               Degraded
Size:                 899.6 GB (899592454144 Bytes)
Rebuild:              manual
Device Node:          disk4
-------------------------------------------------------------------------------
#  DevNode   UUID                                  Status     Size
-------------------------------------------------------------------------------
0  disk0s6   2714167E-1A6A-4BAE-B60E-50F333B327EC  Online     899592454144
-  -none-    44632B50-D667-42E7-9837-A1BF53C3C1CC  Missing/Damaged
===============================================================================
===============================================================================
Name:                 mini4 Boot Backup
Unique ID:            4827405E-EEEB-4666-8783-B8CFCE664CCA
Type:                 Mirror
Status:               Online
Size:                 50.0 GB (49999970304 Bytes)
Rebuild:              manual
Device Node:          disk3
-------------------------------------------------------------------------------
#  DevNode   UUID                                  Status     Size
-------------------------------------------------------------------------------
0  disk0s4   1AAB8E00-8206-4985-881B-74346BB1EA62  Online     49999970304
1  disk1s3   63886672-631E-4A91-AF27-22C9EECD45A8  Online     49999970304
===============================================================================
$ sudo diskutil list
/dev/disk0
   #:                       TYPE NAME                    SIZE       IDENTIFIER
   0:      GUID_partition_scheme                        *1.0 TB     disk0
   1:                        EFI                         209.7 MB   disk0s1
   2:                 Apple_RAID                         50.0 GB    disk0s2
   3:                 Apple_Boot Boot OS X               134.2 MB   disk0s3
   4:                 Apple_RAID mini4_boot_backup       50.0 GB    disk0s4
   5:                 Apple_Boot Boot OS X               134.2 MB   disk0s5
   6:                 Apple_RAID                         899.6 GB   disk0s6
   7:                 Apple_Boot Boot OS X               134.2 MB   disk0s7
/dev/disk1
   #:                       TYPE NAME                    SIZE       IDENTIFIER
   0:      GUID_partition_scheme                        *1.0 TB     disk1
   1:                        EFI                         209.7 MB   disk1s1
   2:                 Apple_RAID 2 mini4 boot            50.0 GB    disk1s2
   3:                 Apple_Boot Boot OS X               134.2 MB   disk1s7
   4:                 Apple_RAID                         50.0 GB    disk1s3
   5:                 Apple_Boot Boot OS X               134.2 MB   disk1s4
   6:                  Apple_HFS 2 mini4 data            899.6 GB   disk1s5
/dev/disk2
   #:                       TYPE NAME                    SIZE       IDENTIFIER
   0:                  Apple_HFS mini4_boot             *50.0 GB    disk2
/dev/disk3
   #:                       TYPE NAME                    SIZE       IDENTIFIER
   0:                  Apple_HFS mini4_boot_backup      *50.0 GB    disk3
/dev/disk4
   #:                       TYPE NAME                    SIZE       IDENTIFIER
   0:                  Apple_HFS mini4_data             *899.6 GB   disk4
$ FAILED_SLICE_UUID=44632B50-D667-42E7-9837-A1BF53C3C1CC
$ RAID_UUID=EDFF099E-A897-4D6E-9DE1-544449898CBE
$ REPLACEMENT_SLICE_DEV=disk1s5
$ sudo diskutil appleraid remove $FAILED_SLICE_UUID $RAID_UUID
Started RAID operation on disk4 mini4_data
Removing disk from RAID
Finished RAID operation on disk4 mini4_data
$ sudo diskutil appleraid add member $REPLACEMENT_SLICE_DEV $RAID_UUID
Started RAID operation on disk4 mini4_data
Unmounting disk
Adding a booter for the RAID partition disk1s5
Adding disk1s5 to the RAID Set
Finished RAID operation on disk4 mini4_data

OK, that’s a lot of scary Terminal spew, but if you focus on the commands I issued and trace them back to the listings, it’s not so bad. Most of it is just diskutil trying to pretty-print things for you.
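Tracing by eye works, but the two UUIDs can also be scraped out of the diskutil appleraid list output. A minimal sketch, assuming the field layout in the listing above (Name: and Unique ID: at the start of a line, the failed member flagged Missing/Damaged); the function names are hypothetical, and a slice that fails in some other way may report a different status that this won’t catch. It deliberately leaves the replacement device alone: picking the right slice on the new drive is a judgment call best made by comparing sizes in diskutil list.

```shell
# Hypothetical helpers that read `diskutil appleraid list` output on stdin.

# Print the Unique ID of the RAID set whose Name is exactly $1.
raid_set_uuid() {
    awk -v want="$1" '
        /^Name:/ { name = $0; sub(/^Name:[ \t]+/, "", name); sub(/[ \t]+$/, "", name) }
        /^Unique ID:/ && name == want && !found { print $3; found = 1 }
    '
}

# Print the UUID of each member of set $1 whose status is Missing/Damaged.
failed_slice_uuids() {
    awk -v want="$1" '
        /^Name:/ { name = $0; sub(/^Name:[ \t]+/, "", name); sub(/[ \t]+$/, "", name) }
        # Member rows look like: "-  -none-  <UUID>  Missing/Damaged"
        name == want && /Missing\/Damaged/ { print $3 }
    '
}
```

Against the first listing above, sudo diskutil appleraid list | raid_set_uuid "mini4 Data" prints EDFF099E-A897-4D6E-9DE1-544449898CBE, and failed_slice_uuids "mini4 Data" prints 44632B50-D667-42E7-9837-A1BF53C3C1CC. The exact name match matters: “mini4 Boot” and “mini4 Boot Backup” would otherwise collide.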

Once you’ve kicked off the rebuild, you can monitor its progress by re-issuing the diskutil appleraid list command:

$ sudo diskutil appleraid list
AppleRAID sets (3 found)
===============================================================================
Name:                 mini4 Boot
Unique ID:            E73306EA-D188-4F2F-B7E9-05206B92FAC2
Type:                 Mirror
Status:               Online
Size:                 50.0 GB (49999970304 Bytes)
Rebuild:              manual
Device Node:          disk2
-------------------------------------------------------------------------------
#  DevNode   UUID                                  Status     Size
-------------------------------------------------------------------------------
0  disk0s2   53B0A36D-F8A2-4977-9ACC-CBB1BDADD65C  Online     49999970304
1  disk1s2   18189491-D70E-4970-85AD-22AB0AD75617  Online     49999970304
===============================================================================
===============================================================================
Name:                 mini4 Data
Unique ID:            EDFF099E-A897-4D6E-9DE1-544449898CBE
Type:                 Mirror
Status:               Degraded
Size:                 899.6 GB (899592454144 Bytes)
Rebuild:              manual
Device Node:          disk4
-------------------------------------------------------------------------------
#  DevNode   UUID                                  Status     Size
-------------------------------------------------------------------------------
0  disk0s6   2714167E-1A6A-4BAE-B60E-50F333B327EC  Online     899592454144
1  disk1s5   1B7B0CC8-D06E-4B54-B50C-B3BB260B71A8  9% (Rebuilding)899592454144
===============================================================================
===============================================================================
Name:                 mini4 Boot Backup
Unique ID:            4827405E-EEEB-4666-8783-B8CFCE664CCA
Type:                 Mirror
Status:               Online
Size:                 50.0 GB (49999970304 Bytes)
Rebuild:              manual
Device Node:          disk3
-------------------------------------------------------------------------------
#  DevNode   UUID                                  Status     Size
-------------------------------------------------------------------------------
0  disk0s4   1AAB8E00-8206-4985-881B-74346BB1EA62  Online     49999970304
1  disk1s3   63886672-631E-4A91-AF27-22C9EECD45A8  Online     49999970304
===============================================================================
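To keep an eye on the rebuild from Terminal without scrolling through the whole listing each time, a small filter can pull out just the members that report Rebuilding, plus each set’s Status line so you can confirm it returns to Online afterwards. This is a sketch that assumes the column layout above, where the percentage is the fourth column of the member row; the function names and the 60-second default polling interval are hypothetical choices.

```shell
# Hypothetical helpers that read `diskutil appleraid list` output on stdin.

# Print "set name: devnode percent" for each member that is rebuilding.
rebuild_progress() {
    awk '
        /^Name:/ { name = $0; sub(/^Name:[ \t]+/, "", name); sub(/[ \t]+$/, "", name) }
        # Member rows start with an index, e.g.
        # "1  disk1s5   <UUID>  9% (Rebuilding)899592454144"
        /Rebuilding/ && $1 ~ /^[0-9]+$/ { print name ": " $2 " " $4 }
    '
}

# Print "set name: status" for every RAID set.
raid_set_status() {
    awk '
        /^Name:/ { name = $0; sub(/^Name:[ \t]+/, "", name); sub(/[ \t]+$/, "", name) }
        /^Status:/ { print name ": " $2 }
    '
}

# Poll until no member reports Rebuilding, then show every set's status.
# This one calls diskutil, so only run it on the Mac itself.
watch_rebuild() {
    interval=${1:-60}
    # grep . succeeds only when rebuild_progress printed something.
    while sudo diskutil appleraid list | rebuild_progress | grep .; do
        sleep "$interval"
    done
    sudo diskutil appleraid list | raid_set_status
}
```

Given the listing above, rebuild_progress prints mini4 Data: disk1s5 9%. Checking the final status matters here, since a rebuild that silently doesn’t “take” would leave the set Degraded rather than Online.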

Weirdly, while Disk Utility can’t kick off a rebuild, it will show the rebuild’s progress in a nice window with a progress bar, complete with a time estimate. It’s worth a VNC session.

A final note: you can totally rebuild the RAID mirror, even of the boot volume, while the machine is live, but I found it slightly buggy: I had to do it twice (!). The first one didn’t “take”. Scary, yes. But uptime’s worth it. And uptime is what RAID’s all about.

That and restful nights.

sysadmin raid Oct 4 2014