This document is obsolete -- see http://doc.nethence.com/ instead
Software RAID with MD devices on Linux
Introduction
We have the main disk on which the system is currently running: sda, and we want to build a RAID1 array by adding a second disk: sdb. Both disks have identical sizes, but the RAID1 device (/dev/mdX) will be slightly smaller than the originating partitions (/dev/sdXn) because of the RAID metadata, even in mirror mode. Therefore we first create a degraded array on the second disk and migrate the data onto it. We will then be able to complete the array, erasing the original disk.
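For instance, you can double-check beforehand that both disks really are the same size (assuming blockdev from util-linux is available),
blockdev --getsize64 /dev/sda
blockdev --getsize64 /dev/sdb
fdisk -l /dev/sda /dev/sdb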
Note. this way, if you ever miss a step or do something wrong, you can always get back to the original system by booting the installation media and typing "linux rescue" at the boot prompt, then finish up or fix the RAID configuration,
chroot /mnt/sysimage
#service network start
#service sshd start
mdadm --assemble /dev/md0
mdadm --assemble /dev/md1
mdadm --assemble /dev/md2
swapon /dev/md1
mkdir -p /raid
mount /dev/md2 /raid
mkdir -p /raid/boot
mount /dev/md0 /raid/boot
We're assuming a simple partition layout (no LVM),
sda1 /boot
sda2 swap
sda3 /
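You can confirm the current layout on the running system, e.g.,
fdisk -l /dev/sda
df -h / /boot
swapon -s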
Configuration (temporarily degraded mode)
Copy the partition layout onto the second disk,
cd ~/
sfdisk -d /dev/sda > sdab.layout
sfdisk /dev/sdb < sdab.layout
#dd if=/dev/sda of=/dev/sdb bs=512 count=1
then change partition types to FD (Linux raid autodetect),
fdisk /dev/sdb
l
t 1 fd
t 2 fd
t 3 fd
w
check,
fdisk -l /dev/sdb
Clean up the partitions just to make sure there are no filesystems left over from previous attempts,
dd if=/dev/zero of=/dev/sdb1 bs=1024K count=100
dd if=/dev/zero of=/dev/sdb2 bs=1024K count=100
dd if=/dev/zero of=/dev/sdb3 bs=1024K count=100
Create the empty arrays on the second disk,
mdadm --create /dev/md0 --level=1 --raid-devices=2 missing /dev/sdb1
mdadm --create /dev/md1 --level=1 --raid-devices=2 missing /dev/sdb2
mdadm --create /dev/md2 --level=1 --raid-devices=2 missing /dev/sdb3
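The three arrays should now be running in degraded mode with a single member each; you can verify, e.g.,
cat /proc/mdstat
mdadm --detail /dev/md0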
Switch to init 1 to safely copy the data to the RAID devices,
telinit 1
Format the new filesystems and mount them somewhere,
mkfs.ext3 /dev/md0
mkfs.ext3 /dev/md2
mkdir -p /raid
mount /dev/md2 /raid
mkdir -p /raid/boot
mount /dev/md0 /raid/boot
Copy the data onto the new filesystems,
cd /
find . -xdev | cpio -pm /raid
cd /dev
find . | cpio -pm /raid/dev
cd /boot
find . -xdev | cpio -pm /raid/boot
Note. shouldn't we also exclude "lost+found" ?
Note. maybe we could optimize those commands and do everything in one pass, including /dev, see the sketch below,
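Note. a possible one-pass alternative (an untested sketch, assuming rsync is installed),
rsync -aHx --exclude=lost+found --exclude=/raid / /raid/
rsync -aHx /dev/ /raid/dev/
rsync -aHx --exclude=lost+found /boot/ /raid/boot/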
Replace the swap with a new one on the RAID1 device,
swapon -s
swapoff /dev/sda2
mkswap /dev/md1
swapon /dev/md1
Write the RAID configuration into the RAID volume,
cd /raid/etc
cat > mdadm.conf <<EOF9
MAILADDR root
ARRAY /dev/md0 level=raid1 num-devices=2 devices=missing,/dev/sdb1
ARRAY /dev/md1 level=raid1 num-devices=2 devices=missing,/dev/sdb2
ARRAY /dev/md2 level=raid1 num-devices=2 devices=missing,/dev/sdb3
EOF9
Note. the DEVICE line isn't mandatory in that case
Note. careful, this step is absolutely mandatory so the system knows which RAID devices to start during bootup.
Note. otherwise,
#cat > mdadm.conf <<EOF9
#DEVICE partitions
#MAILADDR root
#EOF9
#mdadm -Es | grep md0 >> mdadm.conf
Note. or even,
#echo "DEVICE /dev/sd[ab]1" > mdadm.conf
#mdadm --detail --scan >> mdadm.conf
Fix the partition paths in the RAID volume,
cd /raid/etc
vi fstab
change,
/dev/md2 / ext3 defaults 1 1
/dev/md0 /boot ext3 defaults 1 2
/dev/md1 swap swap defaults 0 0
Update the initramfs,
mkinitrd --fstab=/raid/etc/fstab /raid/boot/initraid.img `uname -r`
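You can quickly check that the raid1 module made it into the new image (assuming a gzip'ed cpio initrd, as on RHEL5),
zcat /raid/boot/initraid.img | cpio -it | grep -i raid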
Update the bootloader configuration on the original filesystem and on the RAID volume,
vi /boot/grub/menu.lst
vi /raid/boot/grub/menu.lst
like,
default=0
timeout=5
splashimage=(hd1,0)/grub/splash.xpm.gz
hiddenmenu
title CentOS (2.6.18-128.4.1.el5) RAID
root (hd1,0)
kernel /vmlinuz-2.6.18-128.4.1.el5 ro root=/dev/md2
initrd /initraid.img
Note. we changed title ... RAID
Note. we changed root (hd1,0)
Note. we changed kernel ... root=/dev/md2
Note. alternatively, the md device can be declared directly on the kernel line, e.g.,
title CentOS (2.6.18-164.11.1.el5)
root (hd0,0)
kernel /vmlinuz-2.6.18-164.11.1.el5 ro root=/dev/md1 md=1,/dev/sda2,/dev/sdb2
initrd /initrd-2.6.18-164.11.1.el5.img
Make both disks bootable,
grub --no-floppy
root (hd0,0)
setup (hd0)
root (hd1,0)
setup (hd1)
quit
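Note. to roughly verify that stage1 got written to both MBRs, something like this should show the GRUB signature (GRUB legacy only),
dd if=/dev/sda bs=512 count=1 2>/dev/null | strings | grep GRUB
dd if=/dev/sdb bs=512 count=1 2>/dev/null | strings | grep GRUB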
Note. GRUB is still able to read the RAID1 members, presumably because each one looks like a plain filesystem (the 0.90 metadata sits at the end of the partition).
Ready to go, reboot,
sync
reboot
Note. we're rebooting from single user mode, so the "reboot" alias isn't activated, but that's what we want since no services are running anyway.
Note. since RHEL5.4 you may configure this too,
vi /etc/sysconfig/raid-check
like,
ENABLED=yes
CHECK=check
CHECK_DEVS="md0 md1"
REPAIR_DEVS="md0 md1"
SKIP_DEVS=""
Adding sda to the array
Change partition types to FD (Linux raid autodetect),
fdisk /dev/sda
l
t 1 fd
t 2 fd
t 3 fd
w
check,
fdisk -l /dev/sda
Add sda to the RAID1 array,
mdadm /dev/md0 -a /dev/sda1
mdadm /dev/md1 -a /dev/sda2
mdadm /dev/md2 -a /dev/sda3
check,
cat /proc/mdstat
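The mirrors now resynchronize in the background; you can follow the progress, e.g.,
watch -n 5 cat /proc/mdstat
mdadm --detail /dev/md2 | grep -i rebuild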
Make sure /etc/mdadm.conf is up to date.
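For instance, now that both members are present, the ARRAY lines can be regenerated (back up the old file first),
cp /etc/mdadm.conf /etc/mdadm.conf.bak
echo "MAILADDR root" > /etc/mdadm.conf
mdadm --detail --scan >> /etc/mdadm.conf
cat /etc/mdadm.conf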
Usage
Check the state of the array,
cat /proc/mdstat
#tail /var/log/messages
#cat /sys/block/md2/md/sync_action
mdadm --detail /dev/md0
#mdadm --examine /dev/md0
#mdadm --examine /dev/sda1
#mdadm --examine --brief --scan --config=partitions
#mdadm -Ebsc partitions
#cat /var/run/mdadm/map
Monitor the array,
mdadm --monitor --scan --daemonise > /var/run/mdadm
Mark detached devices as faulty and remove them,
mdadm /dev/md0 --fail detached --remove detached
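Note. to manually fail, remove, and re-add a specific member (sda1 in md0 being just an example),
mdadm /dev/md0 --fail /dev/sda1
mdadm /dev/md0 --remove /dev/sda1
mdadm /dev/md0 --add /dev/sda1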
Rebuild the array and start whatever can be started,
mdadm --incremental --rebuild --run --scan
Troubleshooting
Deployment issues
- Make sure /etc/mdadm.conf exists and is up to date
- Make sure the initrd is up to date and points to the correct root directory and swap
- Make sure the device files (/dev) exist in the RAID volume
- Make sure the partition types (sdb1, sdb2, sdb3, ...) are FD (Linux raid autodetect)
Maintenance issues
Since RHEL5.4, the weekly crontab includes a RAID check (/etc/cron.weekly/99-raid-check) and you may receive warnings like these by email,
WARNING: mismatch_cnt is not 0 on /dev/md0
WARNING: mismatch_cnt is not 0 on /dev/md1
Repair and then check the arrays (on a RAID1 MD array, a repair is different from a simple check),
echo repair > /sys/block/md0/md/sync_action
echo repair > /sys/block/md1/md/sync_action
cat /proc/mdstat
cat /sys/block/md0/md/sync_action
cat /sys/block/md1/md/sync_action
echo check > /sys/block/md0/md/sync_action
echo check > /sys/block/md1/md/sync_action
cat /proc/mdstat
cat /sys/block/md0/md/sync_action
cat /sys/block/md1/md/sync_action
check again,
cat /sys/block/md0/md/mismatch_cnt
cat /sys/block/md1/md/mismatch_cnt
Note. you should now see "0" for both
References
http://wiki.clug.org.za/wiki/RAID-1_in_a_hurry_with_grub_and_mdadm
http://support.uni-klu.ac.at/Raid1Howto
http://www.texsoft.it/index.php?c=hardware&m=hw.storage.grubraid1&l=it
http://wiki.xtronics.com/index.php/Raid