This document is obsolete -- see http://doc.nethence.com/ instead

Software RAID with MD devices on Linux 

 

 

Introduction 

The system is currently running on the main disk, sda, and we want to build a RAID1 array by adding a second disk, sdb. Both disks are identical in size, but the RAID1 device (/dev/mdX) will be slightly smaller than the underlying partitions (/dev/sdXn) because of the RAID metadata, even in mirror mode. Therefore we first create a degraded array on the second disk and migrate the data to it; we can then add the original disk back into the array, which overwrites its contents. 
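
Note. once the arrays exist, the size difference shows up in /proc/partitions (sizes are in 1 KiB blocks), e.g., 

grep -E 'sdb3|md2' /proc/partitions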

 

Note. this way, if you ever miss a step or make a mistake, you can always get back to the original system with the installation media by typing "linux rescue" at the boot prompt, then finish up or fix the RAID configuration, 

chroot /mnt/sysimage
#service network start
#service sshd start
mdadm --assemble /dev/md0
mdadm --assemble /dev/md1
mdadm --assemble /dev/md2
swapon /dev/md1
mkdir -p /raid
mount /dev/md2 /raid
mkdir -p /raid/boot
mount /dev/md0 /raid/boot

 

We're assuming a simple partition layout (no LVM), 

sda1  /boot
sda2  swap
sda3  /

 

 

Configuration (temporarily degraded mode) 

Copy the partition layout to the second disk, 

cd ~/
sfdisk -d /dev/sda > sdab.layout
sfdisk /dev/sdb < sdab.layout
#dd if=/dev/sda of=/dev/sdb bs=512 count=1

then change partition types to FD (Linux raid autodetect), 

fdisk /dev/sdb
l
t 1 fd
t 2 fd
t 3 fd
w

check, 

fdisk -l /dev/sdb
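
Note. alternatively, if your sfdisk still supports --change-id (older util-linux; newer versions use --part-type instead), the types can be set non-interactively -- a sketch to verify against your version, 

sfdisk --change-id /dev/sdb 1 fd
sfdisk --change-id /dev/sdb 2 fd
sfdisk --change-id /dev/sdb 3 fd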

 

Clean up the partitions to make sure there are no filesystem remnants from previous attempts, 

dd if=/dev/zero of=/dev/sdb1 bs=1024K count=100
dd if=/dev/zero of=/dev/sdb2 bs=1024K count=100
dd if=/dev/zero of=/dev/sdb3 bs=1024K count=100

 

Create the arrays on the second disk, in degraded mode (first member marked as missing), 

mdadm --create /dev/md0 --level=1 --raid-devices=2 missing /dev/sdb1
mdadm --create /dev/md1 --level=1 --raid-devices=2 missing /dev/sdb2
mdadm --create /dev/md2 --level=1 --raid-devices=2 missing /dev/sdb3
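
check that the three degraded arrays are running (each should show one active member and one missing), 

cat /proc/mdstat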

 

Switch to runlevel 1 (single-user mode) to safely copy the data to the RAID devices, 

telinit 1

 

Format the new filesystems and mount them somewhere, 

mkfs.ext3 /dev/md0
mkfs.ext3 /dev/md2
mkdir -p /raid
mount /dev/md2 /raid
mkdir -p /raid/boot
mount /dev/md0 /raid/boot

 

Copy the data into the new filesystems, 

cd /
find . -xdev | cpio -pm /raid
cd /dev
find . | cpio -pm /raid/dev
cd /boot
find . -xdev | cpio -pm /raid/boot

Note. shouldn't we also exclude "lost+found"? 

Note. maybe those commands could be combined into a single pass, including /dev; see the sketch below. 
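
Note. a possible variant of the root copy that skips lost+found (untested sketch, GNU find syntax; /dev still needs its own pass since -xdev won't cross into the udev tmpfs), 

cd /
find . -xdev ! -name lost+found | cpio -pdm /raid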

 

Replace the swap with a new one on the RAID1 device, 

swapon -s
swapoff /dev/sda2
mkswap /dev/md1
swapon /dev/md1

 

Write the RAID configuration into the RAID volume, 

cd /raid/etc
cat > mdadm.conf <<EOF9
MAILADDR root
ARRAY /dev/md0 level=raid1 num-devices=2 devices=missing,/dev/sdb1
ARRAY /dev/md1 level=raid1 num-devices=2 devices=missing,/dev/sdb2
ARRAY /dev/md2 level=raid1 num-devices=2 devices=missing,/dev/sdb3
EOF9

Note. the DEVICE line isn't mandatory in this case 

Note. careful, this step is absolutely mandatory so the system knows which RAID devices to start during boot. 

Note. otherwise, 

#cat > mdadm.conf <<EOF9
#DEVICE partitions
#MAILADDR root
#EOF9
#mdadm -Es | grep md0 >> mdadm.conf

Note. or even, 

#echo "DEVICE /dev/sd[ab]1" > mdadm.conf
#mdadm --detail --scan >> mdadm.conf

 

Fix the partition paths in the RAID volume, 

cd /raid/etc
vi fstab

change, 

/dev/md2                /                       ext3    defaults        1 1
/dev/md0                /boot                   ext3    defaults        1 2
/dev/md1                swap                    swap    defaults        0 0

 

Update the initramfs, 

mkinitrd --fstab=/raid/etc/fstab /raid/boot/initraid.img `uname -r`
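
Note. the image is a gzipped cpio archive, so you can quickly check that the raid bits made it in (a sanity check, assuming the stock RHEL5 initrd format), 

zcat /raid/boot/initraid.img | cpio -t | grep -i raid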

 

Update the bootloader configuration on the original filesystem and on the RAID volume, 

vi /boot/grub/menu.lst
vi /raid/boot/grub/menu.lst

like, 

default=0
timeout=5
splashimage=(hd1,0)/grub/splash.xpm.gz
hiddenmenu
title CentOS (2.6.18-128.4.1.el5) RAID
        root (hd1,0)
        kernel /vmlinuz-2.6.18-128.4.1.el5 ro root=/dev/md2
        initrd /initraid.img

Note. we changed title ... RAID 

Note. we changed root (hd1,0) 

Note. we changed kernel ... root=/dev/md2 

Note. optionally, you can also define the md array explicitly on the kernel line, e.g., 

title CentOS (2.6.18-164.11.1.el5)
        root (hd0,0)
        kernel /vmlinuz-2.6.18-164.11.1.el5 ro root=/dev/md1 md=1,/dev/sda2,/dev/sdb2
        initrd /initrd-2.6.18-164.11.1.el5.img

 

Make both disks bootable, 

grub --no-floppy
root (hd0,0)
setup (hd0)
root (hd1,0)
setup (hd1)
quit

Note. GRUB is still able to read the filesystem inside the RAID device: with the default 0.90 metadata the RAID superblock sits at the end of the partition, so each member still looks like a plain ext3 filesystem to GRUB legacy. 

 

Ready to go, reboot, 

sync
reboot

Note. we're rebooting from single-user mode, so the usual "reboot" alias isn't active, but that's fine since no services are running anyway. 

 

Note. since RHEL 5.4 you may also configure the periodic RAID check, 

vi /etc/sysconfig/raid-check

like, 

ENABLED=yes
CHECK=check
CHECK_DEVS="md0 md1"
REPAIR_DEVS="md0 md1"
SKIP_DEVS=""
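
Note. to trigger a run immediately instead of waiting for the weekly cron job, you can call the cron script by hand, e.g., 

sh /etc/cron.weekly/99-raid-check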

 

 

Adding sda to the array 

Change partition types to FD (Linux raid autodetect), 

fdisk /dev/sda
l
t 1 fd
t 2 fd
t 3 fd
w

check, 

fdisk -l /dev/sda

 

Add the sda partitions to the RAID1 arrays, 

mdadm /dev/md0 -a /dev/sda1
mdadm /dev/md1 -a /dev/sda2
mdadm /dev/md2 -a /dev/sda3

check, 

cat /proc/mdstat
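
or follow the resynchronization progress continuously, 

watch -n 5 cat /proc/mdstat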

 

Make sure /etc/mdadm.conf is up to date. 
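
For instance, now that both members are active, the ARRAY lines can be regenerated like this (a sketch; it overwrites the hand-written devices= entries with UUID-based ones, adapt to taste), 

echo "MAILADDR root" > /etc/mdadm.conf
mdadm --detail --scan >> /etc/mdadm.conf
cat /etc/mdadm.conf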

 

 

Usage 

Check the state of the array, 

cat /proc/mdstat
#tail /var/log/messages
#cat /sys/block/md2/md/sync_action
mdadm --detail /dev/md0
#mdadm --examine /dev/md0
#mdadm --examine /dev/sda1
#mdadm --examine --brief --scan --config=partitions
#mdadm -Ebsc partitions
#cat /var/run/mdadm/map

 

Monitor the array, 

mdadm --monitor --scan --daemonise > /var/run/mdadm
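
Note. to verify that the mail alerts actually go out (MAILADDR above), mdadm can send a test message for each array in a one-shot run, 

mdadm --monitor --scan --oneshot --test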

 

Mark detached devices as faulty and remove them, 

mdadm /dev/md0 --fail detached --remove detached

rebuild the array and start what can be started, 

mdadm --incremental --rebuild --run --scan
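
Note. for a complete disk replacement (say sdb dies), the same building blocks apply: fail and remove its members, clone the partition layout onto the new disk, then re-add, for instance, 

mdadm /dev/md0 --fail /dev/sdb1 --remove /dev/sdb1
mdadm /dev/md1 --fail /dev/sdb2 --remove /dev/sdb2
mdadm /dev/md2 --fail /dev/sdb3 --remove /dev/sdb3
sfdisk -d /dev/sda | sfdisk /dev/sdb
mdadm /dev/md0 -a /dev/sdb1
mdadm /dev/md1 -a /dev/sdb2
mdadm /dev/md2 -a /dev/sdb3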

 

 

Troubleshooting 

Deployment issues 

- Make sure /etc/mdadm.conf exists and is up to date 

- Make sure the initrd is up to date and points to the correct root directory and swap 

- Make sure the device files (/dev) exist in the RAID volume (see the quick check after this list) 

- Make sure the partition types (sdb1, sdb2, sdb3, ...) are set to FD (Linux raid autodetect) 
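
For the /dev point, a quick sanity check from the rescue environment might be (basic nodes only), 

ls -l /raid/dev/console /raid/dev/null /raid/dev/zero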

 

Maintenance issues 

Since RHEL 5.4, the weekly crontab includes a RAID check (/etc/cron.weekly/99-raid-check) and you may receive warnings like these by email, 

WARNING: mismatch_cnt is not 0 on /dev/md0
WARNING: mismatch_cnt is not 0 on /dev/md1

 

Repair and then check the arrays (on a RAID1 MD array, a repair is different from just a check), 

echo repair > /sys/block/md0/md/sync_action
echo repair > /sys/block/md1/md/sync_action
cat /proc/mdstat
cat /sys/block/md0/md/sync_action
cat /sys/block/md1/md/sync_action
echo check > /sys/block/md0/md/sync_action
echo check > /sys/block/md1/md/sync_action
cat /proc/mdstat
cat /sys/block/md0/md/sync_action
cat /sys/block/md1/md/sync_action

check again, 

cat /sys/block/md0/md/mismatch_cnt
cat /sys/block/md1/md/mismatch_cnt

Note. you should now see "0" for both 

 

 

References 

http://wiki.clug.org.za/wiki/RAID-1_in_a_hurry_with_grub_and_mdadm 

http://support.uni-klu.ac.at/Raid1Howto 

http://www.texsoft.it/index.php?c=hardware&m=hw.storage.grubraid1&l=it 

http://wiki.xtronics.com/index.php/Raid