Rebuild Software RAID - Connect IT Community | Kaseya
<main> <article class="userContent"> <h2 data-id="summary"><strong>SUMMARY</strong></h2> <p>Rebuild Software RAID</p> <h2 data-id="issue"><strong>ISSUE</strong></h2> <h3 data-id="when-any-of-the-software-raids-enter-a-degraded-state-an-alert-is-shown-in-the-user-interface-as-well-as-emailed-in-the-daily-status-email"><b>When any of the software RAIDs enters a degraded state, an alert is shown in the User Interface and is included in the daily status email.</b></h3> <h3 data-id="purpose">Purpose</h3> <p>Almost all Unitrends appliances use some form of software RAID, and it is possible for the software RAID to get into a degraded state. This article outlines common causes and methods for rebuilding the RAID volumes.</p> <h3 data-id="applies-to">Applies To</h3> <p>DPU appliances utilizing software RAID</p> <h2 data-id="resolution"><strong>RESOLUTION</strong></h2> <h3 data-id="caution-for-advanced-users-only-requires-command-line-usage"><b>*** Caution: For advanced users only; requires command-line usage ***</b></h3> <p>WARNING: Running sfdisk, parted, or mdadm commands can be dangerous. Use the utmost care, and use rebuild_disk instead whenever possible.</p> <p>Our 2U/3U sized rack mount units use software RAID on the two internal OS drives. 
The 1U rack mount units and desktop units use software RAID on a larger scale.</p> <p>To find specific information on the RAID statuses, use this command:</p> <p><span style="font-family: courier;">[root@Recovery-713 ~]# <span style="font-size: small;"><b>cat /proc/mdstat</b></span></span></p> <p><span style="font-family: courier;">Personalities : [raid1] [raid0]</span></p> <p><span style="font-family: courier;">md5 : active raid0 md3[0] md4[1]</span></p> <p><span style="font-family: courier;"> 5747030528 blocks super 1.2 64k chunks</span></p> <p><span style="font-family: courier;">md4 : active raid1 sdc2[0] sdd2[1]</span></p> <p><span style="font-family: courier;"> 2877837631 blocks super 1.2 [2/2] [UU]</span></p> <p><span style="font-family: courier;">md3 : active raid1 sda4[0](F) sdb4[1]</span></p> <p><span style="font-family: courier;"> 2869193023 blocks super 1.2 [2/1] [_U]</span></p> <p><span style="font-family: courier;">md1 : active raid1 sdb3[1] sda3[0]</span></p> <p><span style="font-family: courier;"> 12582848 blocks [2/2] [UU]</span></p> <p><span style="font-family: courier;">md2 : active raid1 sdd1[1] sdc1[0]</span></p> <p><span style="font-family: courier;"> 52428672 blocks [2/2] [UU]</span></p> <p><span style="font-family: courier;">md0 : active raid1 sdb2[1] sda2[2](F)</span></p> <p><span style="font-family: courier;"> 48234432 blocks [2/1] [_U]</span></p> <p><span style="font-family: courier;">unused devices: </span></p> <p>In this output, a device with (F) beside it is in a failed state, and [_U] shows that a device is missing from that RAID volume.</p> <p>You should use the <b>/usr/bp/bin/rebuild_disk</b> script whenever possible. 
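</p> <p>As a quick check, the (F) and [_U] markers can be scanned for automatically. Below is a minimal sketch (the <b>degraded_arrays</b> helper name is ours, not a Unitrends tool) that reads mdstat-formatted text and prints the name of each degraded array:</p>

```shell
# Minimal sketch: list md arrays whose status line shows a missing member.
# Usage on a live system: degraded_arrays < /proc/mdstat
degraded_arrays() {
  awk '
    /^md/                                 { array = $1 }   # remember the current array name
    /\[[0-9]+\/[0-9]+\] \[[U_]+\]/ && /_/ { print array }  # "_" in the [UU] field = missing member
  '
}
```

<p>On a healthy system this prints nothing; for the output above it would print md3 and md0.</p> <p>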
</p> <h3 data-id="options-for-rebuild-disk">Options for rebuild_disk</h3> <ul><li> <b>rebuild_disk --help</b> #show the help message</li> <li> <b>rebuild_disk --speed</b> <br> #set rebuild speed and wait to unset<br> Use the speed function if the rebuild was started while the services were still running and the services have since been stopped, so that the rebuild can take up more resources.</li> <li> <b>rebuild_disk /dev/sdN</b> <br> #rebuild the replaced disk after rebooting<br> Used after powering off the system, removing the old disk, inserting the new disk, and booting up again. </li> <li> <b>rebuild_disk --hotswap /dev/sdN</b> <br> #Hotswap warranty replace without rebooting<br> Used when you want to replace a failed or failing drive under warranty without rebooting. Run this after the replacement disk is on-site, but BEFORE swapping the disk. The hotswap function will remove the specified device from all associated RAID sets, then ask the user to remove the drive and replace it with the new drive. Afterwards, it will rebuild that drive back into the array.</li> <li> <b>rebuild_disk --readd /dev/sdN </b><br> #Remove/Re-add the disk for rebuild<br> Used when you want to initiate a rebuild on a good RAID array, for example when a drive has pending sectors and you would like to initiate a rebuild to fix them. In this case, no drives have been dropped from the RAID sets.<br> The readd function can now also rebuild a drive back into an array even if it has been removed from some of the RAID arrays. </li> <li> <b>rebuild_disk --locate /dev/sdN </b><br> #flash the disk LED to locate it<br> Used to physically locate and verify the drive that needs to be replaced. </li></ul><p>If the existing disk is marked as failed (e.g. 
sda), and you just need to rebuild it, do this:</p> <p><span style="font-size: small; font-family: courier;"><b>/usr/bp/bin/rebuild_disk --readd /dev/sda</b></span></p> <p>To replace a disk, the recommended method is to shut down the system, replace the failed disk drive, and then power up the system.</p> <p>Shut down the Unitrends processes first so that the rebuild goes faster (optional, but the rebuild then takes about one third of the time). When the rebuild completes, it will automatically restart the Unitrends processes. <i>Rebuild time on an R813 SW RAID5 with 3TB disks with services stopped should be about 5 ½ hours instead of 16+ hours. </i><br><span style="font-size: small; font-family: courier;"><b>/etc/init.d/bp_rcscript stop</b></span></p> <p>For the replaced disk device (e.g. sda), do this:</p> <p><span style="font-size: small; font-family: courier;"><b>/usr/bp/bin/rebuild_disk /dev/sda</b></span></p> <p>When the new drive has been initialized, do this:</p> <p><span style="font-family: courier;"><b><span style="font-size: small;">/usr/bp/bin/rebuild_disk</span></b></span></p> <p>The 1U rack mount units are not hot-swappable, but if you cannot shut down the system for the replacement, use one of these methods:</p> <p>1) Use the rebuild_disk script to hotswap a disk; it will prompt the user when to replace the disk:</p> <p><span style="font-size: small; font-family: courier;"><b>/usr/bp/bin/rebuild_disk --hotswap /dev/sda</b></span></p> <p>2) Or, run the rescan command shown in the NOTES section to rescan the SCSI bus so that you do not have to reboot after replacing one of the drives.</p> <p>If the 'dpu version' is less than 7.4.0, download the rebuild_disk script. 
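</p> <p>If you need to script this version check, versions can be compared with <b>sort -V</b>. A minimal sketch follows (the <b>version_lt</b> helper name is ours, and the parsing of the 'dpu version' output is an assumption, not a documented format):</p>

```shell
# Minimal sketch: succeed (exit 0) when version $1 is strictly older than $2.
# Relies on `sort -V` (GNU coreutils) for version-aware ordering.
version_lt() {
  [ "$1" != "$2" ] &&
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$1" ]
}

# Hypothetical usage -- `dpu version` is the appliance command quoted above,
# but its exact output format is an assumption; adjust the parsing as needed:
#   ver=$(dpu version | awk '{print $NF}')
#   if version_lt "$ver" 7.4.0; then echo "download rebuild_disk"; fi
```

<p>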
If version 7.4.0 or later, skip this step.<br><br> </p> <blockquote class="blockquote"> <p>wget <a href="/home/leaving?allowTrusted=1&target=ftp%3A%2F%2Fftp.unitrends.com%2Futilities%2Frebuild_disk">ftp://ftp.unitrends.com/utilities/rebuild_disk</a><br>chmod +x rebuild_disk<br>cp rebuild_disk /usr/bp/bin<br> </p> </blockquote> <h2 data-id="cause"><strong>CAUSE</strong></h2> <p>1) One of the disks has failed.</p> <p>2) An unclean shutdown caused the RAID members to fall out of sync, leaving the array in a degraded state.</p> <h2 data-id="notes"><strong>NOTES</strong></h2> <p><strong>Note</strong>: sfdisk does not work on 3TB drives, so parted or gdisk is used instead.</p> <p><b>*** Caution: For advanced users only. ***<br>*** The rescan may not align the disk order as expected on all systems ***</b><br>for HOST in `ls -l /sys/class/scsi_host/ | grep host |awk '{print $9}'` ; do echo '- - -' > /sys/class/scsi_host/${HOST}/scan ; done</p> <p>Even after doing this, you may be asked to reboot.</p> <p>In 1U units there are four devices: sda, sdb, sdc, and sdd. After running the above command, the replaced device comes back online as /dev/sdf.</p> <p>You can now use the <b>rebuild_disk</b> command to rebuild the disk back into the RAID. 
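</p> <p>If it is not obvious which device name the replacement disk received after the rescan, you can diff the block-device list captured before and after. A minimal sketch (the <b>new_devices</b> helper name is ours, not an appliance tool):</p>

```shell
# Minimal sketch: print block-device names present in the "after" list
# but not in the "before" list (i.e. the newly detected disk).
new_devices() {
  b=$(mktemp); a=$(mktemp)
  printf '%s\n' "$1" | sort > "$b"
  printf '%s\n' "$2" | sort > "$a"
  comm -13 "$b" "$a"      # lines unique to the "after" list
  rm -f "$b" "$a"
}

# Usage on a live system (writing to /sys requires root):
#   before=$(ls /sys/block)
#   ...run the SCSI rescan loop from the NOTES section...
#   new_devices "$before" "$(ls /sys/block)"
```

<p>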
</p> <p><span style="font-family: courier;"><b>/usr/bp/bin/rebuild_disk /dev/sdf</b></span></p> <p>You can also use sfdisk and mdadm to manually prepare the disk and rebuild the RAIDs, but use rebuild_disk if at all possible.</p> <h3 data-id="other-sources">Other Sources</h3> <p><a rel="nofollow" href="/home/leaving?allowTrusted=1&target=https%3A%2F%2Funitrends-support.zendesk.com%2Fhc%2Fen-us%2Farticles%2F360013251218"><span style="color: #0066cc;">Disk monitoring with disk_monitoring_logger.py</span></a></p> <p><a rel="nofollow" href="https://kaseya.vanillacommunities.com/kb/articles/aliases/unitrends/articles/Article/000002809"><span style="color: #0066cc;">How to check a hard drive to see if it needs to be replaced via warranty</span></a></p> <p><a rel="nofollow" href="/home/leaving?allowTrusted=1&target=http%3A%2F%2Flinux.about.com%2Fod%2Fcommands%2Fl%2Fblcmdl8_parted.htm"><span style="color: #0066cc;">http://linux.about.com/od/commands/l/blcmdl8_parted.htm</span></a></p> <p><a rel="nofollow" href="/home/leaving?allowTrusted=1&target=http%3A%2F%2Flinux.about.com%2Fod%2Fcommands%2Fl%2Fblcmdl8_sfdisk.htm"><span style="color: #0066cc;">http://linux.about.com/od/commands/l/blcmdl8_sfdisk.htm</span></a></p> </article> </main>