This doc is obsolete -- see http://doc.nethence.com/ instead
MC/ServiceGuard on RHEL5
http://pbraun.nethence.com/doc/sysutils/mcsg.html
http://pbraun.nethence.com/doc/sysutils/mcsg_package.html
Introduction
We're using "MC/Serviceguard for Linux v11.18" on RHEL5. In a 2-node cluster, a cluster lock is required, and it's strongly recommended for 3 and 4-node clusters anyway. Choose one of the following lock methods:
- Lock LUN
- Quorum server
Note. on HP-UX only, there used to be a third alternative: the Cluster Lock Disk, which must be part of an LVM volume group.
If you're using an evaluation version of MC/SG, refer to "Read_Me_First.txt" (at the CD's root dir) to request an evaluation licence from HP by mail.
Dependencies
Make sure those packages are installed,
rpm -q \
xinetd \
sg3_utils \
net-snmp \
lm_sensors \
tog-pegasus \
| grep ^package
Note. "xinetd" is mandatory.
Note. "net-snmp" if installing cmsnmpd
Note. "tog-pegasus" if installing "sgproviders" for WBEM
As of RHEL5u3 (kernel >= 2.6.18-128), you also need this package,
yum install libnl
Ref. ftp://ftp.hp.com/pub/c-products/servers/ha/linux/SGLX_Certification_Matrix.pdf
Make sure this package is NOT installed (it conflicts with the "pidentd" shipped with Serviceguard),
rpm -e authd
Also make sure the correct time is set,
yum install ntp
ntpdate ntp.obspm.fr
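You may also keep the clock in sync permanently with the stock "ntpd" service,
service ntpd start
chkconfig ntpd on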
Configure the nodes
Make sure you have a proper network configuration with at least two separate subnets for the heartbeat, or one separate subnet with bonding for the heartbeat (a bonding sketch follows the notes below). "/etc/hosts" should look like this,
::1 localhost6.localdomain6 localhost6
127.0.0.1 localhost.localdomain localhost
#10.1.1.10 qs.example.net qs
10.1.1.11 sg1.example.net sg1
10.1.2.11 sg1.example.net sg1
10.1.3.11 sg1.example.net sg1
10.1.1.12 sg2.example.net sg2
10.1.2.12 sg2.example.net sg2
10.1.3.12 sg2.example.net sg2
Note. here subnet 10.1.1 is for general network use.
Note. here subnets 10.1.2 and 10.1.3 are for the heartbeat.
Note. you may need additional subnets, for example for iSCSI.
Note. use the same hostname for all subnets.
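Here's a minimal sketch of an active-backup bond for the heartbeat on RHEL5, assuming eth1 and eth2 are sg1's heartbeat NICs and reusing the 10.1.2.11 address from the example above (adapt devices and IPs to your setup),
cat > /etc/sysconfig/network-scripts/ifcfg-bond0 <<EOF9
DEVICE=bond0
IPADDR=10.1.2.11
NETMASK=255.255.255.0
BOOTPROTO=none
ONBOOT=yes
EOF9
cat > /etc/sysconfig/network-scripts/ifcfg-eth1 <<EOF9
DEVICE=eth1
MASTER=bond0
SLAVE=yes
BOOTPROTO=none
ONBOOT=yes
EOF9
do the same for "ifcfg-eth2", add those lines to "/etc/modprobe.conf",
alias bond0 bonding
options bond0 mode=1 miimon=100
then apply,
service network restart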
Mount the SGLX CD and install the RPMs,
cd x86_x86-64/RedHat5/Serviceguard/IA32
rpm -ivh pidentd-3.0.19-0.i386.rpm
rpm -ivh sgcmom-B.05.00.00-0.rhel5.i386.rpm
rpm -ivh serviceguard-A.11.18.02-0.demo.rhel5.i386.rpm
Note. "cmsnmpd-A.01.00-0.rhel5.i386.rpm" for HP SIM.
Note. "sgproviders-A.02.00.00-0.rhel5.i386.rpm" for Web-Based Enterprise Management
Start "identd" and force at boot,
service identd start
chkconfig identd on
Start "xinetd" and force at boot,
service xinetd restart
chkconfig xinetd on
Add those lines to "/root/.bashrc",
PATH=$PATH:/usr/local/cmcluster/bin
. /etc/cmcluster.conf
apply,
source .bashrc
add this line to "/etc/man.config",
MANPATH /usr/local/cmcluster/doc/man
facilitate access to SG configurations,
cd ~/
ln -s /usr/local/cmcluster/conf
Create the node list,
cd conf
vi cmclnodelist
like,
sg1 root
sg2 root
If using VMware, install the Serviceguard for Linux Virtual Machine Toolkit (http://h20392.www2.hp.com/portal/swdepot/index.do > High Availability > Serviceguard for Linux Contributed Toolkit Suite).
Configure a lock LUN
Make sure you have access to a SAN disk (http://pbraun.nethence.com/doc/sysutils_linux/iscsi.html),
fdisk -l
#sfdisk -s
Use fdisk to create a 1-cylinder primary partition of at least 100 KB at the start of the disk,
fdisk /dev/sdb
Note. partition type ID "83" as for Linux.
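Here's a sketch of the interactive fdisk dialog, assuming the disk is empty and keeping the default type 83,
n (new partition)
p (primary)
1 (partition number)
1 (first cylinder)
1 (last cylinder, hence a single-cylinder partition)
w (write the table and quit)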
Reread the partition table on the other nodes,
sfdisk -R /dev/sdb
Refs.
Serviceguard Lock LUN within VMware ESX : http://forums11.itrc.hp.com/service/forums/questionanswer.do?admit=109447626+1249221737647+28353475&threadId=1126734
Configure a Quorum server
Mount the SGLX CD and install the RPM,
cd x86_x86-64/RedHat5/Serviceguard/IA32
rpm -ivh qs-A.02.00.04-0.rhel5.i386.rpm
Authorize hosts to connect,
vi /usr/local/qs/conf/qs_authfile
like,
sg1.example.net
sg2.example.net
Add to init,
mkdir -p /var/log/qs
vi /etc/inittab
like,
qs:345:respawn:/usr/local/qs/bin/qs >/var/log/qs/qs 2>/var/log/qs/qs_error
apply,
telinit q
Check it's up,
ps aux | grep qs
note there should be two listening ports,
netstat -an --inet
e.g.,
...
tcp 0 0 0.0.0.0:60277 0.0.0.0:* LISTEN
tcp 0 0 0.0.0.0:1238 0.0.0.0:* LISTEN
...
Note. 1238 corresponds to the registered "hacl-qs" service.
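To double-check, the entry should show up in "/etc/services",
grep hacl-qs /etc/services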
In case it says "permission denied", just kill the processes,
pkill qsc
they will restart automatically and reread the "qs_authfile".
Configure the cluster
Create a cluster configuration,
cd conf
cmquerycl -n sg1 -n sg2 -C cluster1.conf.dist >/dev/null
Note. "-L /dev/sdb1" to specify a lock LUN
Note. "-L" may be used either once before all nodes or after every node.
Note. "-q quorum_host_or_ip" to specify a quorum server
Wipe out the comments,
sed -e '
/^#/d;
/^$/d;
/^[[:space:]]*#/d;
' cluster1.conf.dist > cluster1.conf
Edit the cluster configuration,
vi cluster1.conf
and define STATIONARY_IP, change MAX_CONFIGURED_PACKAGES and NODE_TIMEOUT.
Example with a lock LUN,
CLUSTER_NAME cluster1
NODE_NAME sg1
NETWORK_INTERFACE eth0
STATIONARY_IP 10.1.1.11
NETWORK_INTERFACE eth1
HEARTBEAT_IP 10.1.2.11
NETWORK_INTERFACE eth2
HEARTBEAT_IP 10.1.3.11
CLUSTER_LOCK_LUN /dev/sdb1
NODE_NAME sg2
NETWORK_INTERFACE eth0
STATIONARY_IP 10.1.1.12
NETWORK_INTERFACE eth1
HEARTBEAT_IP 10.1.2.12
NETWORK_INTERFACE eth2
HEARTBEAT_IP 10.1.3.12
CLUSTER_LOCK_LUN /dev/sdb1
HEARTBEAT_INTERVAL 1000000
NODE_TIMEOUT 8000000
AUTO_START_TIMEOUT 600000000
NETWORK_POLLING_INTERVAL 2000000
MAX_CONFIGURED_PACKAGES 5
Example with a Quorum server,
CLUSTER_NAME cluster1
QS_HOST 10.1.1.10
QS_POLLING_INTERVAL 300000000
NODE_NAME sg1
NETWORK_INTERFACE eth0
STATIONARY_IP 10.1.1.11
NETWORK_INTERFACE eth1
HEARTBEAT_IP 10.1.2.11
NETWORK_INTERFACE eth2
HEARTBEAT_IP 10.1.3.11
NODE_NAME sg2
NETWORK_INTERFACE eth0
STATIONARY_IP 10.1.1.12
NETWORK_INTERFACE eth1
HEARTBEAT_IP 10.1.2.12
NETWORK_INTERFACE eth2
HEARTBEAT_IP 10.1.3.12
HEARTBEAT_INTERVAL 1000000
NODE_TIMEOUT 8000000
AUTO_START_TIMEOUT 600000000
NETWORK_POLLING_INTERVAL 2000000
MAX_CONFIGURED_PACKAGES 5
Synchronize files among nodes
Make sure those files are synchronized among the nodes,
cat > files.list <<EOF9
/etc/hosts
/etc/resolv.conf
/usr/local/cmcluster/conf/cmclnodelist
/usr/local/cmcluster/conf/cluster1.conf
/usr/local/cmcluster/conf/license.txt
/usr/local/cmcluster/conf/cmcluster.rc
EOF9
Here's a one-liner to push them to the other node,
scp -p `cat files.list` sg2:/root/conf
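Note. the scp above drops all the files into "sg2:/root/conf"; to mirror them to the same paths on the other node instead, here's a sketch with rsync (assuming rsync is installed on both nodes),
rsync -aR `cat files.list` sg2:/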
Run the cluster
Verify and apply the cluster configuration,
cmcheckconf -C cluster1.conf
cmapplyconf -k -C cluster1.conf
Enable automatic cluster join,
vi /usr/local/cmcluster/conf/cmcluster.rc
like,
AUTOSTART_CMCLD=1
On each node, start the daemon at the same time,
/etc/init.d/cmcluster.init start
#cmrunnode
Note. "/etc/init.d/SGSafetyTimer" is called at boot time.
Check the cluster port is listening,
lsof -i:5302
#netstat -apne --inet | grep 5302
Check the cluster status,
tail /var/log/messages
cmquerycl (-v)
cmviewcl (-v)
SMH Serviceguard Manager installation
The good old SG Manager GUI used to work up to SG 11.17. For SG >= 11.18 you need to install the SMH SG Manager on at least one of the nodes.
Make sure you have libXp installed,
rpm -q libXp
Install Java 1.6 (http://java.sun.com/javase/downloads/index.jsp),
chmod +x jdk-6u14-linux-i586-rpm.bin
./jdk-6u14-linux-i586-rpm.bin
Install SMH Serviceguard Manager
mv /etc/redhat-release /etc/redhat-release.dist
echo "Red Hat Enterprise Linux Server release 5.3" > /etc/redhat-release
cd /mnt/cdrom/x86_x86-64/RedHat5/SGManager/IA32
rpm -ivh hpsmh-2.1.8-177.linux.i386.rpm
rpm -ivh hpsmh-tomcat-1.0-11.linux.i386.rpm
/opt/hp/hpsmh/tomcat/bin/tomcat_cfg
provide the real path to the java executable,
/usr/java/default/bin/java
Note. if you provide the "/usr/bin/java" link instead of the real path, you will get this error from the interface,
The proxy server received an invalid response from an upstream server.
Note. never mind the chown/chgrp/chmod errors against "tomcat/keystore/".
Install SG Manager,
cd /mnt/cdrom/x86_x86-64/RedHat5/SGManager/IA32
rpm -ivh sgmgrpi-B.01.01.01-1.rhel5.i386.rpm
restart SMH,
/etc/init.d/hpsmhd restart
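Check that SMH is listening on its port (2381, as used below),
lsof -i:2381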
Get to the node's web interface,
https://10.1.1.11:2381/
login,
root/xxxxxxxx
System update
No service interruption. You can join a cluster as long as the MCSG revisions match; RHEL versions may temporarily differ. The "pidentd" and "deadman" kernel modules need to be recompiled. Ref. "/usr/local/cmcluster/bin/drivers/README"
Stop the node, disable cluster auto start,
cmhaltnode -f
vi /usr/local/cmcluster/conf/cmcluster.rc
change,
AUTOSTART_CMCLD=0
Note. if you're using a lock LUN, make sure its device doesn't change after the Red Hat upgrade (otherwise you'll have to change the cluster's config),
and reboot with the Red Hat installation CD,
shutdown -r now
Note. if you have a SAN connected, stop at the boot loader prompt and disconnect the fibre channel links (at this point the HBA modules aren't loaded yet).
Proceed with the Red Hat update and,
Create a new boot loader configuration (GRUB)
Note. if you have a SAN connected, take the chance to enter init 1 at server boot to update the HBA modules.
Start the server and update the deadman and pidentd modules (no MCSG update here, just MCSG's modules update),
rpm -qa | grep authd
rpm -Uvh --force pidentd*.rpm
ll /dev/pidentd
rpm -Uvh --replacefiles --replacepkgs serviceguard*.rpm
ll /dev/deadman
Note. sgcmom has no modules
Note. "authd" may appear after an RHEL update, hence the "--force"
Enable auto start,
vi /usr/local/cmcluster/conf/cmcluster.rc
change,
AUTOSTART_CMCLD=1
start the node,
cmrunnode
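then check that the node rejoined the cluster,
cmviewcl -v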
MCSG update
Note. service interruption. MCSG revisions need to be the same to join or form a cluster.
Update MCSG,
rpm -Uvh pidentd*.rpm
/etc/init.d/identd restart
ll /dev/pidentd
rpm -Uvh sgcmom*.rpm
rpm -Uvh serviceguard*.rpm
ll /dev/deadman
Also get the latest patches,
http://itrc.hp.com/ > Patch database > ...
and install them,
tar xvf SGLX*.tar
cd SGLX*/tools
./sgupdate
ll /dev/deadman
cd ../pidentd
rpm -Uvh pidentd*.rpm
/etc/init.d/identd restart
ll /dev/pidentd
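and check the installed revisions,
rpm -q serviceguard sgcmom pidentd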
Make sure the short hostnames are defined for every node IP,
vi /etc/hosts
like
IPbond0 hostname shorthost
IPethX hostname shorthost
Reapply the cluster configuration,
cmcheckconf -C cluster.conf
cmapplyconf -C cluster.conf
reapply the package(s) configuration(s),
cmcheckconf -P package/package.conf
cmapplyconf -P package/package.conf
Refs.
Eval CDrom's "Read_Me_First.txt"
Release Notes / Steps for Rolling Upgrade
Managing Serviceguard / Procedure - Performing the Rolling Upgrade
Add a node to an existing cluster
- make the same network config with redundant heartbeat interfaces or bonding
- update /etc/hosts on all nodes and quorum if there is one
- edit and copy the cluster ascii configuration file
- edit and copy cmclnodelist
- if quorum, add the new node to the quorum's authfile
- if lock lun, make sure it's the same device, otherwise update the cluster conf
- check and apply the cluster conf (see the sketch after this list)
- reconfigure package's failover for the new node (http://pbraun.nethence.com/doc/sysutils/mcsg_package.html)
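For instance, a sketch regenerating and reapplying the configuration with a hypothetical third node "sg3",
cmquerycl -n sg1 -n sg2 -n sg3 -C cluster1.conf.dist >/dev/null
cmcheckconf -C cluster1.conf
cmapplyconf -C cluster1.conf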
References
Serviceguard for Linux : http://docs.hp.com/en/ha.html#Serviceguard%20for%20Linux
Software Depot home : http://h20392.www2.hp.com/portal/swdepot/index.do
PDF references
Managing HP Serviceguard for Linux, Eighth Edition : B9903-90060.pdf
Managing HP Serviceguard for Linux, Ninth Edition : B9903-90068.pdf
HP Serviceguard for Linux Version A.11.18 Release Notes : B9903-90071.pdf
HP Serviceguard for Linux Version A.11.19 Release Notes : B9903-90067.pdf
ITRC Forum references
Invalid data for cluster lock LUN configuration : http://forums11.itrc.hp.com/service/forums/questionanswer.do?threadId=1343526&admit=109447626+1243612148626+28353475
SG/LX and iSCSI : http://forums11.itrc.hp.com/service/forums/bizsupport/questionanswer.do?threadId=1061907
why initiator sends LUN reset command to target? : http://www.nabble.com/why-initiator-sends-LUN-reset-command-to-target--td20122708.html
http://thomasvogt.wordpress.com/2008/08/26/mcserviceguard-cluster-installation-on-hp-ux-1131/