Setting up NetBSD RAIDframe
 
Introduction
This guide has two parts. The first describes RAID-1 for the system to boot from it. The second describes RAID-1 and RAID-5 array creation without the ability to boot from it -- just for storage (but still auto-configurable, with no flat configuration file needed at boot). Since the arrays are auto-configurable, the configuration files are not put into /etc/raidX.conf but into /var/tmp/raidname.conf, so that the drives can be moved to another system without losing the array configuration. The drawback is that when a component fails and the machine reboots, you won't be able to see exactly which component failed, as the failed drive is no longer auto-configured at boot time and hence is not identified by the RAIDframe system.
 
Requirements
Make sure you've got RAIDframe enabled in the kernel (by default),
dmesg | grep -i raid
 
Make sure the disks you want to use for RAID are identical (it may work with different disks too, provided you adjust the disklabels, but identical disks are assumed here),
dmesg | grep ^wd
dmesg | grep ^sd
 
Make sure the SMART statuses are alright (also check with the BIOS messages at power on),
atactl wd0 smart status
atactl wd1 smart status
 
Optionally, disable write caching if you don't have an Uninterruptible Power Supply (UPS) (by the way, what happens if there is a kernel panic?),
dkctl wd0 getcache
dkctl wd1 getcache
dkctl wd0 setcache r
dkctl wd1 setcache r
dkctl wd0 setcache r save
dkctl wd1 setcache r save
dkctl wd0 getcache
dkctl wd1 getcache
 
*** Part A -- Setting up RAID-1 for system boot ***
 
Note. We make sure the RAID partition starts at block 63, not at the newer default of 2048, and has the maximum size.
Note. For these boot disks, disklabel may fail when called with a bare wd0 or wd1 argument,
disklabel: could not read existing label
you may have to do this instead,
disklabel wd0a
disklabel wd1a
 
Preparing the disks
Assuming you are currently running the system on wd0, we are going to set up RAID-1 on an additional (identical) disk, wd1, boot from it, and only then integrate wd0 into the array.
 
Erase the MBR and DOS partition table on wd1,
#dd if=/dev/zero of=/dev/rwd1d bs=8k count=1
dd if=/dev/zero of=/dev/rwd1d bs=1024k count=1
 
Configure a BSD partition on the whole disk,
fdisk wd0 
fdisk -0ua /dev/rwd1d 
answer the questions,
Do you want to change our idea of what BIOS thinks? [n]
sysid: [0..255 default: 169]
start: [0..14593cyl default: 63, 0cyl, 0MB]
#or start: [0..24321cyl default: 2048, 0cyl, 1MB] 63 
size: [0..14593cyl default: 234441585, 14593cyl, 114473MB]
#or size: [0..24321cyl default: 1985, 0cyl, 1MB] 390721905 (based on what fdisk wd0 showed, equals total - 63, actually)
bootmenu: []
Do you want to change the active partition? [n]
active partition: [0..4 default: 0]
Are you happy with this choice? [n]
Update the bootcode from /usr/mdec/mbr? [n]
Should we write new partition table? [n]
Check that both disks have identical DOS partition tables,
cd ~/
fdisk wd0 > wd0
fdisk wd1 > wd1
diff -u wd0 wd1
Note. If the "PBR is not bootable: All bytes are identical (0x00)" message shows up, it will be fixed when installboot is run.
 
Change the disklabel so that partition 'a' has the RAID fstype,
disklabel -r -e -I wd1
like,
disk: sys0
label: SYS0
a: 234441585 63 RAID 
d: 234441648 0 unused 0 0 # (Cyl. 0 - 232580)
or another example,
disk: sys1
label: SYS1
a: 390721905 63 RAID 
d: 390721968 0 unused 0 0 # (Cyl. 0 - 387620)
 
Prepare the RAID-1 flat configuration file,
vi /var/tmp/raidsys.conf
like,
START array
1 2 0

START disks
absent
/dev/wd1a 

START layout
128 1 1 1

START queue
fifo 100
and initialize the raid device,
raidctl -v -C /var/tmp/raidsys.conf raid0 
#raidctl -v -I `date +%Y-%m-%d-%s` raid0 
raidctl -v -I `date +%Y-%m-%d` raid0 
this should be quite fast here, since there is only one disk,
raidctl -v -i raid0 
 
CHECK THE SYSTEM LOGS WHILE DOING THE RAID INITIALIZATION!
 
Setting up the system on RAID
Edit the disklabel for the newly created raid device (partition "a" needs to be at offset 0 to be seen by the bios and be able to boot)
disklabel -r -e -I raid0
say you want 1024MB of swap, e.g. matching your RAM (1024MB x 1024 x 1024 / 512 = 2097152 sectors); 234441472 - 2097152 = 232344320 sectors remain for the root partition,
a: 232344320 0 4.2BSD 0 0
b: 2097152 232344320 swap
d: 234441472 0 unused 0 0 # (Cyl. 0 - 228946*)
or another example (390721792 - 2097152 = 388624640 sectors),
a: 388624640 0 4.2BSD 0 0
b: 2097152 388624640 swap
d: 390721792 0 unused 0 0 # (Cyl. 0 - 381564*)
Note. The 'c' partition doesn't seem to be mandatory (same in the official -- longer -- RAIDframe guide).
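As a quick sanity check, the same figures can be recomputed in the shell (assuming the usual 512-byte sector size),
echo $((1024 * 1024 * 1024 / 512))    # 1024MB of swap in sectors = 2097152
echo $((234441472 - 2097152))         # partition 'a' in the first example = 232344320
echo $((390721792 - 2097152))         # partition 'a' in the second example = 388624640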
 
Initialize the filesystem as FFSv2, mount it, and copy the entire currently running system to the raid device,
newfs -O 2 /dev/rraid0a
#fsck -fy /dev/rraid0a
mount /dev/raid0a /mnt/
cd /; pax -v -X -rw -pe . /mnt/
Note. The copy takes a while.
Edit fstab to fix the device paths,
cd /mnt/etc/
mv fstab fstab.noraid
sed 's/wd0/raid0/g' fstab.noraid > fstab
ls -l fstab*
fstab should now have,
/dev/raid0a / ffs rw,log 1 1
/dev/raid0b none swap sw,dp 0 0
 
Make sure swapoff is already enabled on the copied system (it disables swap during shutdown, to avoid parity errors on the RAID device),
grep swapoff /mnt/etc/defaults/rc.conf  # should be there already!
#grep swapoff /mnt/etc/rc.conf
 
Install the boot loader onto that raid disk (the first 63 sectors were kept free, remember?), so that it boots just as if it weren't a raid disk (the raid partition 'a' on raid0 starts at offset 0, remember?),
/usr/sbin/installboot -o timeout=10 -v /dev/rwd1a /usr/mdec/bootxx_ffsv2
mv /boot.cfg /boot.cfg.bkp
mv /mnt/boot.cfg /mnt/boot.cfg.bkp
Note. It is best to use a distinct, temporary timeout for each raid disk so you can quickly tell at boot time which disk you are booting from. Here 10 for the second and, for now, only raid disk.
Note. Yes, the FFSv2 bootblock, since the filesystem on raid0a was initialized as FFSv2 with newfs.
Note. Also remove /boot.cfg otherwise the timeout in there takes precedence.
 
Enable RAID auto-configuration and reboot,
raidctl -v -A root raid0
tail -2 /var/log/messages
raidctl -s raid0
cd /
sync
shutdown -r now
 
"The first boot with RAID"
Go into your BIOS or BBS boot menu and precisely choose the second disk (wd1) to boot on.
 
Ok, make sure the system is actually running raid0/wd1 and not wd0,
mount
swapctl -l
 
Now copy the MBR and DOS partition table from wd1 to wd0,
dd if=/dev/rwd1d of=/dev/rwd0d bs=8k count=1
and verify that the DOS partitioning layouts are exactly the same on both RAID components,
fdisk /dev/rwd1d > fdisk.wd1
fdisk /dev/rwd0d > fdisk.wd0
diff -bu fdisk.wd1 fdisk.wd0
 
Do the same for the BSD disk labels and partitions,
disklabel -r wd1a > disklabel.wd1
disklabel -R -r wd0a disklabel.wd1
disklabel -r -e -I wd0a
and adjust the disk name in the label, e.g.,
disk: sys0
and check,
disklabel -r wd0a > disklabel.wd0
diff -bu disklabel.wd1 disklabel.wd0
 
Finally add wd0 (first, as a spare) to the RAID-1 array,
raidctl -v -a /dev/wd0a raid0
note. The "truncating spare disk" warning in the system logs is fine.
You should now see a hot spare; check with,
raidctl -s raid0
then fail the absent component so the spare gets reconstructed into the array (this takes a while, come back a few days later!),
raidctl -v -F component0 raid0
note. This should say in the system logs that it is initiating a reconstruction on the available spare disk.
you can interrupt the display,
^C
and get back to it to check the reconstruction progress,
raidctl -S raid0
or see how fast the drives are working,
iostat 5
 
You can (continue to) use the system and available space for storage but the performance won't be optimal until the reconstruction has finished.
 
A few days later -- Ready to go
Make sure every component shows as 'optimal',
raidctl -v -s raid0
Note. If wd0a is still referenced as used_spare, just reboot soon and it will show up as a regular component.
 
The bootloader on wd0 should already be fine thanks to the earlier dd copy, but it's best to differentiate the disks. Install the bootloader on wd0 again with a specific timeout so you can identify it at boot time,
/usr/sbin/installboot -o timeout=5 -v /dev/rwd0a /usr/mdec/bootxx_ffsv2
ls -l /boot*
note. Also remove /boot.cfg otherwise the timeout in there takes precedence.
and reboot,
cd /
sync
shutdown -r now
 
BIOS configuration
Tune the boot sequence to make sure the machine is able to boot on both disks, the second and the first one.
 
*** Part B -- Setting up a RAID-1 or RAID-5 array for storage ***
 
Assuming wd2 and wd3 for this array, we create an array named "raid1" since an array named "raid0" already exists on the system.
 
Preparing the disks
Erase the first few sectors of the targeted RAID disks (the MBR and partition tables are lost),
#dd if=/dev/zero of=/dev/rwd2d bs=8k count=1
#dd if=/dev/zero of=/dev/rwd3d bs=8k count=1
dd if=/dev/zero of=/dev/rwd2d bs=1024k count=1
dd if=/dev/zero of=/dev/rwd3d bs=1024k count=1
 
A DOS partition table is not needed here because this array is not used to boot the system, so fdisk is only run to inspect the disks,
fdisk wd2
fdisk wd3
disklabel -r -e -I wd2
disklabel -r -e -I wd3
change the fstype of the first BSD partition (a) from '4.2BSD' to 'RAID'. Using the whole disk is fine since we are not booting from it,
disk: data0
label: DATA0
a: 2930277168 0 RAID
d: 2930277168 0 unused 0 0 # (Cyl. 0 - 2907020)
disk: data1
label: DATA1
a: 2930277168 0 RAID
d: 2930277168 0 unused 0 0 # (Cyl. 0 - 2907020)
 
Initializing the RAID Device
Create the RAID configuration,
vi /var/tmp/raiddata.conf
 
For a RAID-1 array, i.e. one row, two columns (two disks) and no spare disk,
START array
1 2 0

START disks
/dev/wd2a
/dev/wd3a

START layout
128 1 1 1

START queue
fifo 100
 
For a RAID-5 array, i.e. one row, three columns (three disks) and no spare disk,
START array
1 3 0

START disks
/dev/wd2a
/dev/wd3a
/dev/wd4a

START layout
128 1 1 5

START queue
fifo 100
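For reference, per raidctl(8), the array line holds the number of rows, columns and spare disks, and the layout line holds four values: sectors per stripe unit, stripe units per parity unit, stripe units per reconstruction unit, and the RAID level. A commented version of the RAID-5 sections (comment lines starting with '#' are accepted in the configuration file),
START array
# numRow numCol numSpare
1 3 0

START layout
# sectPerSU SUsPerParityUnit SUsPerReconUnit RAID_level
128 1 1 5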
 
Configure the RAID volume (uppercase -C forces the configuration to take place),
raidctl -v -C /var/tmp/raiddata.conf raid1 
 
Assign a serial number (here UNIX time) to identify the RAID volume,
raidctl -v -I `date +%s` raid1 
 
Initialize the RAID volume (takes a while... several hours for a 1.5TB RAID-1 array, for example),
time raidctl -v -i raid1 
Note. the process is already running in the background, you can get back to the prompt if you want,
^C
and check the progress display again later by typing,
raidctl -S raid1 
or see how fast the drives are working,
iostat 5
 
CHECK THE SYSTEM LOGS WHILE DOING THE RAID INITIALIZATION!
 
You can continue while the raid array is being initialized.
 
Make sure the RAID volume is able to configure itself without needing the raiddata.conf configuration file,
raidctl -A yes raid1 
Note. For a bootable root device we would use '-A root', but that is out of scope for this part of the guide.
 
Ready to go
Even though the parity check hasn't finished, you can already proceed and use your RAID array, and you can even reboot the system (as long as it can find its configuration, here via the auto-configuration enabled above). The RAID device will just be a lot slower while the parity is being written, so it's best to let it finish.
 
In case your RAID volume is larger than 2TB, you will get a kernel log warning similar to this during RAID initialization and at boot time,
WARNING: raid1: total sector size in disklabel (1565586688) != the size of raid (5860553984)
 
For a <2TB volume just proceed with disklabel and newfs,
dd if=/dev/zero of=/dev/rraid1d bs=1024k count=1
disklabel raid1 
newfs -O 2 -b 64k /dev/rraid1a 
(or you can proceed with GPT and wedges just as if it were a >2TB disk)
 
For a >2TB volume proceed with GPT and wedges,
dd if=/dev/zero of=/dev/rraid1d bs=1024k count=1
gpt create raid1
gpt add raid1
gpt show raid1
dkctl raid1 addwedge raid1wedge 34 2930276925 ffs
dkctl raid1 listwedges
and proceed with newfs e.g.,
newfs -O 2 -b 64k /dev/rdk0 
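The offset and size passed to dkctl addwedge (34 and 2930276925 above) are simply the start and size of the GPT partition as reported by gpt show. As a sketch, assuming the usual gpt show column layout (start, size, index, contents; double-check against the actual output), they could be extracted with,
gpt show raid1 | awk '$3 == 1 { print "offset:", $1, "size:", $2 }'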
 
*** Part C -- Usage and maintenance ***
Monthly maintenance
Once a month, or from a monitoring script (a sketch is given below), check,
atactl wd1 smart status
...
dkctl wd1 getcache (should be disabled with no UPS)
...
raidctl -p raid1
raidctl -s raid1
raidctl -S raid1
fix the parity if it is not clean,
raidctl -P raid1 
Note. That command is executed at every boot anyway (/etc/rc.d/raidframeparity).
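As an illustration, a minimal monitoring script could look like the following sketch (the path, the mail recipient and the 'Parity status: clean' grep pattern are assumptions to adapt to your setup),
#!/bin/sh
# /usr/local/sbin/raidcheck -- hypothetical example, to be run from cron
# mails root when a RAIDframe set reports anything other than clean parity
for r in raid0 raid1; do
        if ! raidctl -s $r | grep -q 'Parity status: clean'; then
                raidctl -s $r | mail -s "RAIDframe warning for $r on `hostname`" root
        fi
done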
 
Optionally, use smartmontools instead of atactl,
echo $PKG_PATH
pkg_add smartmontools
cp /usr/pkg/share/examples/rc.d/smartd /etc/rc.d/
cd /etc/
echo smartd=yes >> rc.conf
rc.d/smartd start
smartctl -l selftest /dev/rwd0d
smartctl -a /dev/rwd1d
smartctl -A /dev/rwd1d
==> look for Current_Pending_Sector
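To automate those SMART checks, smartd can also be configured; here is a minimal sketch of a smartd.conf (the /usr/pkg/etc path and the schedule are assumptions, adjust to your installation),
# /usr/pkg/etc/smartd.conf -- minimal sketch
# -a: monitor all attributes, -m root: mail warnings to root,
# -s S/../../7/02: run a short self-test every Sunday at 02:00
/dev/rwd0d -a -m root -s S/../../7/02
/dev/rwd1d -a -m root -s S/../../7/02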
 
Determine the disks' identities before swapping them:
atactl wd0 identify | grep -i serial
atactl wd1 identify | grep -i serial
atactl wd2 identify | grep -i serial
atactl wd3 identify | grep -i serial
atactl wd4 identify | grep -i serial
 
For the record, here is a brief summary of the basic raidctl maintenance commands,
-a /dev/wdXx raidX  add a hot spare disk
-r /dev/wdXx raidX  remove a hot spare disk
-g /dev/wdXx raidX  print the component label
-G raidX      print the current RAID configuration
-f /dev/wdXx raidX  fail the component w/o reconstruction
-F /dev/wdXx raidX  fail the component and initiate a reconstruction on the hot spare if available
-R /dev/wdXx raidX  fail the component and reconstruct on it (after it has been replaced)
-B raidX      copy back the reconstructed data from spare disk to original disk
 
Replacing a failing disk on a non-booting array
If you need to replace a drive (if raidctl -s reports a failed component), verify the component's identity,
#raidctl -G raid1
#raidctl -g /dev/wd3a raid1
dmesg | grep ^wd3
atactl wd3 identify | grep -i serial
and replace the drive by checking its serial number.
 
Now that the new drive is in place, prepare it for RAIDframe,
  dd if=/dev/zero of=/dev/rwd3d bs=1024k count=1
  disklabel -r -e -I wd3 
 
While the RAID array stays online
Note. The -R method produces this error,
stdout -- raidctl: ioctl (RAIDFRAME_REBUILD_IN_PLACE) failed: Invalid argument
console -- raid1: Device already configured!
==> Proceeding with the hot spare method
 
Then add it to the raid array as a spare drive,
raidctl -a /dev/wd3a raid1
then fail the dead component (check its name with raidctl -s) so the spare gets reconstructed into the array,
raidctl -s raid1
raidctl -F component0 raid1
check with,
raidctl -s raid1
raidctl -S raid1
when it's finished, re-enable auto-configuration on the array,
raidctl -A yes raid1 
and make sure there are no lingering /etc/raid*.conf files.
 
At the next reboot the array will show up with both components optimal and no spares. In the meantime it is also fine to run as it is.
 
While the RAID array is offline
Update the RAIDframe array configuration,
raidctl -G raid1
cat /var/tmp/raiddata.conf
START array
1 2 0

START disks
/dev/wd2a
/dev/wd3a

START layout
128 1 1 1

START queue
fifo 100
raidctl -c /var/tmp/raiddata.conf raid1
and reconstruct the disk as part of the array,
raidctl -R /dev/wd3a raid1
check with,
raidctl -s raid1
raidctl -S raid1
when it's finished, re-enable auto-configuration on the array,
raidctl -A yes raid1 
 
Adding a hot spare to the array
Add the new drive as a hot spare,
raidctl -v -a /dev/wd4a raid1
raidctl -s raid1
fail the component and force the use of the spare disk,
raidctl -F component0 raid1
raidctl -s raid1
watch the reconstruction progress,
raidctl -S raid1 
or see how fast the drives are working,
iostat 5
 
Turn a used spare disk into an array component
Make sure autoconfig is enabled,
raidctl -g /dev/wd2a raid1 
raidctl -A yes raid1 
unconfigure the raidframe device for a moment (it doesn't harm the array),
umount /mount/point/
raidctl -u raid1
reconfigure the array as you wish (the wd2 disk is already reconstructed as it was used as a spare)
#raidctl -G raid1
cat /var/tmp/raiddata.conf
START array
1 2 0

START disks
/dev/wd2a
/dev/wd3a

START layout
128 1 1 1

START queue
fifo 100
raidctl -c /var/tmp/raiddata.conf raid1
shutdown -r now
once rebooted, check that everything is fine (no spare left, just wd2a and wd3a are in optimal state),
raidctl -s raid1
 
Recover a failing RAID-1 booting array
Recover a damaged single RAID-1 component (move the data from wd1 to wd0 when raidctl -F/-R can no longer work because of uncorrectable hardware errors). In other words, the array was already non-optimal as it had only one disk and, on top of that, this single drive starts to show serious hardware errors, so you can't even reconstruct it onto a spare drive.
 
In brief:
- create the raid1 array and partition 'a',
- copy from raid0a to raid1a with pax,
- restart on raid1a,
- erase the raid0 array and change the disk.
 
Remove the spare disk (wd0) from the existing array, as we need it to build the new array on it,
raidctl -v -r /dev/wd0a raid0 
and check,
raidctl -s raid0 
 
Proceed,
dd if=/dev/zero of=/dev/rwd0d bs=8k count=1
fdisk -0ua /dev/rwd0d  # ==> make it the active and only partition; accept rewriting the MBR if asked during the process
disklabel -r -e -I wd0  # ==> partition a becomes RAID
vi /var/tmp/raid1.conf
like,
START array
1 2 0

START disks
/dev/wd0a
absent

START layout
128 1 1 1

START queue
fifo 100
then,
raidctl -v -C /var/tmp/raid1.conf raid1 
raidctl -v -I `date +%Y-%m-%d-%s` raid1 
raidctl -v -i raid1 
note. The parity initialization (-i) is quite fast here, since there is only one disk.
note. The "Error re-writing parity!" error in the logs is normal, since one RAID-1 component is absent.
and check,
raidctl -s raid1 
 
Now,
disklabel -r -e -I raid1 
newfs -O 2 /dev/rraid1a 
mount /dev/raid1a /mnt/
cd /; pax -v -X -rw -pe . /mnt/
vi /mnt/etc/fstab
:%s/raid0/raid1/g
/usr/sbin/installboot -o timeout=10 -v /dev/rwd0a /usr/mdec/bootxx_ffsv2
mv /boot.cfg /boot.cfg.bkp
raidctl -v -A root raid1
raidctl -v -s raid1
cd /
sync
shutdown -r now
note. Also remove /boot.cfg otherwise the timeout in there takes precedence.
on the next boot, disable the former, broken raid device,
raidctl -A no raid0
raidctl -v -u raid0
You can now replace the broken disk and proceed with the "The first boot with RAID" section above in this guide.
 
TODO
- what about "dkctl wd1 setcache r" on next reboot, still there?
 
Troubleshooting
If you ever need to remove only the component label (untested),
dd if=/dev/zero of=/dev/rwdXa seek=16k bs=1k count=1
 
References
16.2. Setup RAIDframe Support
22.2. Deleting the disklabel
Appendix B. Installing without sysinst
Chapter 16. NetBSD RAIDframe
Configuring RAID on NetBSD
Hitachi 1TB HDD's, NetBSD 6.0.1 and RAID1 - soft errors and clicking noises!
How To Fix / Repair Bad Blocks In Linux
How to make backups using NetBSD's RAIDframe
NetBSD and RAIDframe
NetBSD and RAIDframe History
Setting up an 8TB NetBSD file server
Setting up raidframe(4) on NetBSD
The adventure of building a 4TB raid 5 under NetBSD 5.1
 

Last update: Jan 18, 2016