This doc is obsolete -- see http://doc.nethence.com/ instead
Setting up NetBSD RAIDframe
Introduction
This guide has two parts. The first one describes setting up RAID-1 so the system can boot from it. The second part describes creating RAID-1 and RAID-5 arrays that are not bootable -- just for storage (but still auto-configurable, with no flat configuration file needed afterwards). Since the arrays are auto-configurable, the configuration files are not put into /etc/raidX.conf but into /var/tmp/raidname.conf, so the drives can be moved to another system without losing the array configuration. The drawback is that when a component fails and the system reboots, you won't be able to see exactly which component failed, as the failed drive is no longer auto-configured at boot time and hence is not identified by the RAIDframe system.
Requirements
Make sure you've got RAIDframe enabled in the kernel (by default),
dmesg | grep -i raid
Make sure the disks you want to use for RAID are identical (it may also work with different disks if you adjust the disklabels, but identical disks are assumed here),
dmesg | grep ^wd
dmesg | grep ^sd
Make sure the SMART statuses are alright (also check with the BIOS messages at power on),
atactl wd0 smart status
atactl wd1 smart status
Optionally, disable write caching if you don't have an Uninterruptible Power Supply (UPS) (by the way, what happens if there is a kernel panic?),
dkctl wd0 getcache
dkctl wd1 getcache
dkctl wd0 setcache r
dkctl wd1 setcache r
dkctl wd0 setcache r save
dkctl wd1 setcache r save
dkctl wd0 getcache
dkctl wd1 getcache
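With several drives to check, a small shell loop saves typing; a minimal sketch, assuming the drives are wd0 and wd1,
for disk in wd0 wd1; do
        echo "== $disk =="
        atactl $disk smart status
        dkctl $disk getcache
done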
*** Part A -- Setting up RAID-1 for system boot ***
Note. we will make sure the RAID partition starts at block 63, not 2048 (???), and has the maximum size.
Note. for these boot disks, disklabel may fail when called with wd0 or wd1 as the argument,
disklabel: could not read existing label
you may have to do this instead,
disklabel wd0a
disklabel wd1a
Preparing the disks
Assuming you are currently running the system on wd0, we are going to set up RAID-1 on an additional (identical) disk, wd1, boot from it, and only then integrate wd0 into the array.
Erase the MBR and DOS partition table on wd1,
#dd if=/dev/zero of=/dev/rwd1d bs=8k count=1
dd if=/dev/zero of=/dev/rwd1d bs=1024k count=1
Configure a NetBSD MBR partition spanning the whole disk (look at wd0 first for reference),
fdisk wd0
fdisk -0ua /dev/rwd1d
answer the questions,
Do you want to change our idea of what BIOS thinks? [n]
sysid: [0..255 default: 169]
start: [0..14593cyl default: 63, 0cyl, 0MB]
#or start: [0..24321cyl default: 2048, 0cyl, 1MB] 63
size: [0..14593cyl default: 234441585, 14593cyl, 114473MB]
#or size: [0..24321cyl default: 1985, 0cyl, 1MB] 390721905 (based on what fdisk wd0 showed, equals total - 63, actually)
bootmenu: []
Do you want to change the active partition? [n] y
active partition: [0..4 default: 0]
Are you happy with this choice? [n] y
Update the bootcode from /usr/mdec/mbr? [n] y
Should we write new partition table? [n] y
Check that both disks have identical DOS partition tables,
cd ~/
fdisk wd0 > wd0
fdisk wd1 > wd1
diff -u wd0 wd1
Note. if you get the "PBR is not bootable: All bytes are identical (0x00)" message, it will be fixed at installboot time.
Edit the disklabel so that partition 'a' has fstype RAID,
disklabel -r -e -I wd1
like,
disk: sys0
label: SYS0
a: 234441585 63 RAID
d: 234441648 0 unused 0 0 # (Cyl. 0 - 232580)
or another example,
disk: sys1
label: SYS1
a: 390721905 63 RAID
d: 390721968 0 unused 0 0 # (Cyl. 0 - 387620)
Prepare the RAID-1 flat configuration file,
vi /var/tmp/raidsys.conf
like,
START array
1 2 0
START disks
absent
/dev/wd1a
START layout
128 1 1 1
START queue
fifo 100
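For reference, here is the same file again with the fields commented (a restatement -- RAIDframe configuration files accept '#' comments); the layout line gives the sectors per stripe unit, stripe units per parity unit, stripe units per reconstruction unit, and the RAID level,
START array
# numRow numCol numSpare
1 2 0
START disks
# wd0a is listed as absent for now, it joins the array later
absent
/dev/wd1a
START layout
# sectPerSU SUsPerParityUnit SUsPerReconUnit RAID_level
128 1 1 1
START queue
# queue type and number of outstanding requests
fifo 100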
and configure the raid device, then assign it a serial number,
raidctl -v -C /var/tmp/raidsys.conf raid0
#raidctl -v -I `date +%Y-%m-%d-%s` raid0
raidctl -v -I `date +%Y-%m-%d` raid0
then initialize the parity; this should be quite fast here, since there is only one disk,
raidctl -v -i raid0
CHECK THE SYSTEM LOGS WHILE DOING THE RAID INITIALIZATION!
Setting up the system on RAID
Edit the disklabel for the newly created raid device (partition "a" needs to be at offset 0 so the BIOS can see it and boot from it),
disklabel -r -e -I raid0
say you want 1024MB of swap, e.g. matching the amount of RAM (1024 x 1024 x 1024 / 512 = 2097152 sectors); 234441472 - 2097152 = 232344320 sectors are left for the root partition,
a: 232344320 0 4.2BSD 0 0
b: 2097152 232344320 swap
d: 234441472 0 unused 0 0 # (Cyl. 0 - 228946*)
or another example (390721792 - 2097152 = 388624640 sectors),
a: 388624640 0 4.2BSD 0 0
b: 2097152 388624640 swap
d: 390721792 0 unused 0 0 # (Cyl. 0 - 381564*)
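The arithmetic generalizes easily; a minimal sh sketch, assuming a 1024MB swap and the total raid0 size as reported by its own 'd' partition,
raidsize=234441472                            # total sectors of raid0 ('d' partition)
swapsect=$((1024 * 1024 * 1024 / 512))        # 1024MB of swap in 512-byte sectors = 2097152
echo "a: $((raidsize - swapsect)) 0"          # root partition: size and offset
echo "b: $swapsect $((raidsize - swapsect))"  # swap: size and offset, right after root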
Note. The 'c' partition doesn't seem to be mandatory (same in the official -- longer -- RAIDframe guide).
Initialize the filesystem as FFSv2, mount, and entirely copy the currently running system to the raid device,
newfs -O 2 /dev/rraid0a
#fsck -fy /dev/rraid0a
mount /dev/raid0a /mnt/
cd /; pax -v -X -rw -pe . /mnt/
note. The copy takes a while.
edit the copied fstab to fix the root device paths,
cd /mnt/etc/
mv fstab fstab.noraid
sed 's/wd0/raid0/g' fstab.noraid > fstab
ls -l fstab*
fstab should now have,
/dev/raid0a / ffs rw,log 1 1
/dev/raid0b none swap sw,dp 0 0
Make sure swapoff is already enabled on the copied system (it disables swap during shutdown, to avoid parity errors on the RAID device),
grep swapoff /mnt/etc/defaults/rc.conf # should be there already!
#grep swapoff /mnt/etc/rc.conf
Install the boot loader onto that raid disk (the first 63 sectors have been kept, remember?), so that it is bootable just as if it weren't a raid disk (the 'a' partition on raid0 starts at offset 0, remember?),
/usr/sbin/installboot -o timeout=10 -v /dev/rwd1a /usr/mdec/bootxx_ffsv2
mv /boot.cfg /boot.cfg.bkp
mv /mnt/boot.cfg /mnt/boot.cfg.bkp
Note. It is best to temporarily use a distinct timeout for each raid disk so you can quickly tell at boot time which disk you are booting from! Here 10 for the second and, for now, only raid disk.
Note. Yes, bootxx_ffsv2, since the filesystem on raid0a was initialized as FFSv2 with newfs.
Note. Also remove /boot.cfg otherwise the timeout in there takes precedence.
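Alternatively, instead of removing /boot.cfg you could keep it and simply make its timeout match; a hypothetical /mnt/boot.cfg sketch, assuming the standard boot.cfg(5) directives,
banner=Booting the raid disk (wd1)
timeout=10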
Enable RAID auto-configuration and reboot,
raidctl -v -A root raid0
tail -2 /var/log/messages
raidctl -s raid0
cd /
sync
shutdown -r now
"The first boot with RAID"
Go into your BIOS or BBS boot menu and explicitly choose the second disk (wd1) to boot from.
OK, make sure the system is actually running from raid0 (wd1) and not from wd0,
mount
swapctl -l
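Another quick check (assuming the kern.root_device sysctl node) is to ask the kernel which device it booted from,
sysctl kern.root_device # should report raid0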
Now copy the MBR and DOS partition table from wd1 to wd0,
dd if=/dev/rwd1d of=/dev/rwd0d bs=8k count=1
and verify that the DOS partitioning layouts are exactly the same on both RAID components,
fdisk /dev/rwd1d > fdisk.wd1
fdisk /dev/rwd0d > fdisk.wd0
diff -bu fdisk.wd1 fdisk.wd0
Do the same for the BSD disk labels and partitions,
disklabel -r wd1a > disklabel.wd1a
disklabel -R -r wd0a disklabel.wd1a
disklabel -r -e -I wd0a
and adjust the disk name in the editor, e.g.,
disk: sys0
and check,
disklabel -r wd0a > disklabel.wd0a
diff -bu disklabel.wd1a disklabel.wd0a
Finally add wd0 (at first as a hot spare) to the RAID-1 array,
raidctl -v -a /dev/wd0a raid0
note. The "truncating spare disk" warning in the system logs is fine.
See? You should now have a spare. Check with,
raidctl -s raid0
then fail the absent component so that it gets reconstructed onto the spare and the disk joins the array (this takes a while, come back a few days later!),
raidctl -v -F component0 raid0
note. This should say in the system logs that it is initiating a reconstruction on the available spare disk.
you can interrupt the display,
^C
and get back to it to check the reconstruction progress,
raidctl -S raid0
or see how fast the drives are working,
iostat 5
You can (continue to) use the system and the available space for storage, but performance won't be optimal until the reconstruction has finished.
A few days later -- Ready to go
Make sure every component shows as 'optimal',
raidctl -v -s raid0
Note. If wd0a is still referenced as used_spare, just reboot soon enough and it will then show as a regular component.
The bootloader on wd0 should already be fine thanks to the earlier 'dd', but it's best to differentiate the disks. Install the bootloader on wd0 again with a specific timeout so you can identify it at boot time,
/usr/sbin/installboot -o timeout=5 -v /dev/rwd0a /usr/mdec/bootxx_ffsv2
ls -l /boot*
note. Also remove /boot.cfg otherwise the timeout in there takes precedence.
and reboot,
cd /
sync
shutdown -r now
BIOS configuration
Tune the boot sequence to make sure the machine is able to boot from both disks, the second one as well as the first.
*** Part B -- Setting up a RAID-1 or RAID-5 array for storage ***
Assuming wd2 and wd3 for this array, and creating an array referenced "raid1" since we already have an array named "raid0" on the system.
Preparing the disks
Erase the first few sectors of the targeted RAID disks (the MBR and partition tables get lost),
#dd if=/dev/zero of=/dev/rwd2d bs=8k count=1
#dd if=/dev/zero of=/dev/rwd3d bs=8k count=1
dd if=/dev/zero of=/dev/rwd2d bs=1024k count=1
dd if=/dev/zero of=/dev/rwd3d bs=1024k count=1
I don't need a DOS partition table because I won't boot from this array; fdisk is only run here to check that none is left,
fdisk wd2
fdisk wd3
disklabel -r -e -I wd2
disklabel -r -e -I wd3
change the first BSD partition (a)'s fstype from '4.2BSD' to 'RAID'. Using the whole size of the disk is fine since we are not booting from it,
disk: data0
label: DATA0
a: 2930277168 0 RAID
d: 2930277168 0 unused 0 0 # (Cyl. 0 - 2907020)
disk: data1
label: DATA1
a: 2930277168 0 RAID
d: 2930277168 0 unused 0 0 # (Cyl. 0 - 2907020)
Initializing the RAID Device
Create the RAID configuration,
vi /var/tmp/raiddata.conf
For a RAID-1 array, e.g. with one row, two columns (two disks) and no spare disk,
START array
1 2 0
START disks
/dev/wd2a
/dev/wd3a
START layout
128 1 1 1
START queue
fifo 100
For a RAID-5 array, e.g. with one row, three columns (three disks) and no spare disk,
START array
1 3 0
START disks
/dev/wd2a
/dev/wd3a
/dev/wd4a
START layout
128 1 1 5
START queue
fifo 100
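As a quick sanity check on the layout line: 128 sectors per stripe unit is 64KB, and with three columns a RAID-5 full stripe carries two data units. A minimal sh sketch of that arithmetic, using the numbers from the example above,
sectpersu=128
cols=3
echo "stripe unit: $((sectpersu * 512 / 1024)) KB"                               # 64 KB
echo "data per full RAID-5 stripe: $(((cols - 1) * sectpersu * 512 / 1024)) KB"  # 128 KB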
Configure the RAID volume (uppercase -C forces the configuration to take place),
raidctl -v -C /var/tmp/raiddata.conf raid1
Assign a serial number (here UNIX time) to identify the RAID volume,
raidctl -v -I `date +%s` raid1
Initialize the RAID volume (takes a while... e.g. several hours for a 1.5TB RAID-1 array),
time raidctl -v -i raid1
Note. the process is already running in the background; you can get back to the prompt if you want,
^C
and bring the progress display back up later by typing,
raidctl -S raid1
or see how fast the drives are working,
iostat 5
CHECK THE SYSTEM LOGS WHILE DOING THE RAID INITIALIZATION!
You can continue while the raid array is being initialized.
Make sure the RAID volume is able to configure itself without needing the raiddata.conf configuration file,
raidctl -A yes raid1
Note. for a bootable root device we would use '-A root', but that is out of scope for this part of the guide.
Ready to go
Even though the parity check hasn't finished, you can already proceed and use your RAID array, and you can even reboot the system (as long as the array gets configured at boot, here thanks to auto-configuration). The RAID device will just be a lot slower while the parity is being written, so it's best to let it finish.
If your RAID volume is larger than 2TB you should get a kernel log warning similar to the following during RAID initialization and at boot time,
WARNING: raid1: total sector size in disklabel (1565586688) != the size of raid (5860553984)
For a <2TB volume just proceed with disklabel and newfs,
dd if=/dev/zero of=/dev/rraid1d bs=1024k count=1
disklabel raid1
newfs -O 2 -b 64k /dev/rraid1a
(or you can proceed with GPT and wedges just as if it were a >2TB disk)
For a >2TB volume proceed with GPT and wedges,
dd if=/dev/zero of=/dev/rraid1d bs=1024k count=1
gpt create raid1
gpt add raid1
gpt show raid1
dkctl raid1 addwedge raid1wedge 34 2930276925 ffs
dkctl raid1 listwedges
and proceed with newfs e.g.,
newfs -O 2 -b 64k /dev/rdk0
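To get the new filesystem mounted at boot, add it to /etc/fstab; hypothetical entries assuming a /data mount point (use the raid1a partition for the <2TB case, or the dk0 wedge created above for the >2TB case),
/dev/raid1a /data ffs rw,log 1 2
#/dev/dk0 /data ffs rw,log 1 2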
*** Part C -- Usage and maintenance ***
Monthly maintenance
Once a month, or through monitoring scripts, check,
atactl wd1 smart status
...
dkctl wd1 getcache (write caching should be disabled if there is no UPS)
...
raidctl -p raid1
raidctl -s raid1
raidctl -S raid1
fix the parity if it's not clean,
raidctl -P raid1
Note. that command is executed at boot every time (/etc/rc.d/raidframeparity).
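Those checks are easy to automate; a minimal monitoring sketch (hypothetical script -- the strings to match may need adjusting to whatever raidctl -s prints on your system), to be run e.g. from a cron job,
#!/bin/sh
# mail root if raid1 has a failed component or dirty parity
status=`raidctl -s raid1 2>&1`
if echo "$status" | grep -qi -e failed -e dirty; then
        echo "$status" | mail -s "RAIDframe warning on `hostname`" root
fi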
Optionally, use smartmontools instead of atactl,
echo $PKG_PATH
pkg_add smartmontools
cp /usr/pkg/share/examples/rc.d/smartd /etc/rc.d/
cd /etc/
echo smartd=yes >> rc.conf
rc.d/smartd start
smartctl -l selftest /dev/rwd0d
smartctl -a /dev/rwd1d
smartctl -A /dev/rwd1d
==> look for Current_Pending_Sector
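smartd needs a configuration file to know which drives to monitor; a hypothetical /usr/pkg/etc/smartd.conf sketch (paths and test schedule are assumptions, see smartd.conf(5)),
# monitor both drives, mail root on trouble, short self-test on Sundays between 2 and 3 am
/dev/rwd0d -a -m root -s S/../../7/02
/dev/rwd1d -a -m root -s S/../../7/02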
Determine the disks' identities before swapping them:
atactl wd0 identify | grep -i serial
atactl wd1 identify | grep -i serial
atactl wd2 identify | grep -i serial
atactl wd3 identify | grep -i serial
atactl wd4 identify | grep -i serial
For the record, here is a brief summary of the basic maintenance commands,
raidctl -a /dev/wdXx raidX   add a hot spare disk
raidctl -r /dev/wdXx raidX   remove a hot spare disk
raidctl -g /dev/wdXx raidX   print the component label
raidctl -G raidX             print the current RAID configuration
raidctl -f /dev/wdXx raidX   fail the component without reconstruction
raidctl -F /dev/wdXx raidX   fail the component and initiate a reconstruction onto the hot spare, if available
raidctl -R /dev/wdXx raidX   fail the component and reconstruct onto it in place (after it has been replaced)
raidctl -B raidX             copy the reconstructed data back from the spare disk to the original disk
Replacing a failing disk on a non-booting array
If you need to replace a drive (if raidctl -s reports a failed component), verify the component's identity,
#raidctl -G raid1
#raidctl -g /dev/wd3a raid1
dmesg | grep ^wd3
atactl wd3 identify | grep -i serial
and replace the drive, matching its serial number.
Now that the new drive is in place, prepare it for RAIDframe,
dd if=/dev/zero of=/dev/rwd3d bs=1024k count=1
disklabel -r -e -I wd3
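The new disklabel should end up matching the surviving component's; following the earlier DATA1 example that would be,
a: 2930277168 0 RAID
d: 2930277168 0 unused 0 0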
While the RAID array stays online
Note. the -R (rebuild-in-place) method produces this error,
stdout -- raidctl: ioctl (RAIDFRAME_REBUILD_IN_PLACE) failed: Invalid argument
console -- raid1: Device already configured!
==> Proceeding with the hot spare method
Then add it to the raid array as a spare drive,
raidctl -a /dev/wd3a raid1
then reconstruct the disk as part of the array,
raidctl -s raid1
raidctl -F component0 raid1
check with,
raidctl -s raid1
raidctl -S raid1
when it's finished, enable auto-configuration on the array,
raidctl -A yes raid1
and make sure there are no leftover /etc/raid*.conf files.
At the next reboot the array will show up with both components optimal and no spares. In the meantime it's also fine to run as it is.
While the RAID array is offline
Update the RAIDframe array configuration,
raidctl -G raid1
cat /var/tmp/raiddata.conf
START array
1 2 0
START disks
/dev/wd2a
/dev/wd3a
START layout
128 1 1 1
START queue
fifo 100
raidctl -c /var/tmp/raiddata.conf raid1
and reconstruct the disk as part of the array,
raidctl -R /dev/wd3a raid1
check with,
raidctl -s raid1
raidctl -S raid1
when it's finished, enable auto-configuration on the array,
raidctl -A yes raid1
Adding a hot spare to the array
Add the new drive as a hot spare,
raidctl -v -a /dev/wd4a raid1
raidctl -s raid1
fail the component and force the use of the spare disk,
raidctl -F component0 raid1
raidctl -s raid1
watch the reconstruction progress,
raidctl -S raid1
or see how fast the drives are working,
iostat 5
Turn a used spare disk into an array component
Make sure autoconfig is enabled,
raidctl -g /dev/wd2a raid1
raidctl -A yes raid1
unconfigure the raidframe device for a moment (it doesn't harm the array),
umount /mount/point/
raidctl -u raid1
reconfigure the array as you wish (the wd2 disk is already reconstructed since it was used as a spare),
#raidctl -G raid1
cat /var/tmp/raiddata.conf
START array
1 2 0
START disks
/dev/wd2a
/dev/wd3a
START layout
128 1 1 1
START queue
fifo 100
raidctl -c /var/tmp/raiddata.conf raid1
shutdown -r now
once rebooted, check that everything is fine (no spare left, just wd2a and wd3a are in optimal state),
raidctl -s raid1
Recover a failing RAID-1 booting array
Recover a damaged single RAID-1 component (move the data from wd1 to wd0 when raidctl -F/-R no longer works because of uncorrectable hardware data errors). In other words, the array was already non-optimal as it had only one disk, and on top of that, this single drive starts showing serious hardware errors, so you can't even reconstruct onto a spare drive.
In brief:
- create the raid1 array and partition 'a',
- copy from raid0a to raid1a with pax,
- restart on raid1a,
- erase the raid0 array and change the disk.
Remove the spare disk from the existing array, since you need it to build the new raid array on it,
raidctl -v -r /dev/wd0a raid0
and check,
raidctl -s raid0
Proceed,
dd if=/dev/zero of=/dev/rwd0d bs=8k count=1
fdisk -0ua /dev/rwd0d # ==> active and only partition; accept rewriting the MBR if asked during the process
disklabel -r -e -I wd0 # ==> partition a becomes RAID
vi /var/tmp/raid1.conf
like,
START array
1 2 0
START disks
/dev/wd0a
absent
START layout
128 1 1 1
START queue
fifo 100
then,
raidctl -v -C /var/tmp/raid1.conf raid1
raidctl -v -I `date +%Y-%m-%d-%s` raid1
raidctl -v -i raid1
note. the parity initialization (-i) is quite fast here, since there is only one disk.
note. the "Error re-writing parity!" error in the logs is just normal, since we got an absent device for RAID-1.
and check,
raidctl -s raid1
Now,
disklabel -r -e -I raid1
newfs -O 2 /dev/rraid1a
mount /dev/raid1a /mnt/
cd /; pax -v -X -rw -pe . /mnt/
vi /mnt/etc/fstab
:%s/raid0/raid1/g
/usr/sbin/installboot -o timeout=10 -v /dev/rwd0a /usr/mdec/bootxx_ffsv2
mv /boot.cfg /boot.cfg.bkp
raidctl -v -A root raid1
raidctl -v -s raid1
cd /
sync
shutdown -r now
note. Also remove /boot.cfg otherwise the timeout in there takes precedence.
on the next boot, disable the former, now broken, raid device,
raidctl -A no raid0
raidctl -v -u raid0
You can now replace the broken disk and proceed with the "The first boot with RAID" section earlier in this guide.
TODO
- what about "dkctl wd1 setcache r" after the next reboot -- is it still in effect?
Troubleshooting
If you ever need to erase only the component label (untested; note that dd's 'skip' applies to the input, so writing at an offset on the output device takes 'seek' instead),
dd if=/dev/zero of=/dev/rwdXa seek=16 bs=1k count=1
References
16.2. Setup RAIDframe Support: http://www.netbsd.org/docs/guide/en/chap-rf.html#chap-rf-initsetup
22.2. Deleting the disklabel: https://www.netbsd.org/docs/guide/en/chap-misc.html#chap-misc-delete-disklabel
Appendix B. Installing without sysinst: http://www.nibel.net/nbsdeng/ap-inst.html
Chapter 16. NetBSD RAIDframe: http://www.netbsd.org/docs/guide/en/chap-rf.html
Configuring RAID on NetBSD: http://www.thorburn.se/henrik/netbsd/raidhowto.txt
Hitachi 1TB HDD's, NetBSD 6.0.1 and RAID1 - soft errors and clicking noises!: https://mail-index.netbsd.org/netbsd-users/2013/02/07/msg012431.html
How To Fix / Repair Bad Blocks In Linux: http://linoxide.com/linux-how-to/how-to-fix-repair-bad-blocks-in-linux/
How to make backups using NetBSD's RAIDframe: http://www.schmonz.com/2004/07/23/how-to-make-backups-using-netbsds-raidframe/
NetBSD and RAIDframe: http://www.cs.usask.ca/staff/oster/raid.html
NetBSD and RAIDframe History: http://www.cs.usask.ca/staff/oster/raid_project_history.html
Setting up an 8TB NetBSD file server: http://abs0d.blogspot.fr/2011/08/setting-up-8tb-netbsd-file-server.html
Setting up raidframe(4) on NetBSD: http://wiki.netbsd.org/set-up_raidframe/
The adventure of building a 4TB raid 5 under NetBSD 5.1: http://mail-index.netbsd.org/netbsd-users/2011/09/02/msg008979.html