This doc is obsolete -- see http://doc.nethence.com/ instead

MC/ServiceGuard on RHEL5 

 

http://pbraun.nethence.com/doc/sysutils/mcsg.html 

http://pbraun.nethence.com/doc/sysutils/mcsg_package.html 

 

 

Introduction 

We're using "MC/Serviceguard for Linux v11.18" on RHEL5. In a 2-node cluster, a cluster lock is required, and it is strongly recommended for 3- and 4-node clusters as well. Choose one of the following lock methods: 

- Lock LUN 

- Quorum server 

Note. on HP-UX only, there used to be a third alternative: the Cluster Lock Disk, which must be part of an LVM volume group. 

 

If you're using an evaluation version of MC/SG, refer to "Read_Me_First.txt" (at the CD's root directory) to request an evaluation license from HP by mail. 

 

 

Dependencies 

Make sure these packages are installed; with the query below, any output line starting with "package" denotes a missing one, 

rpm -q \
xinetd \
sg3_utils \
net-snmp \
lm_sensors \
tog-pegasus \
| grep ^package

Note. "xinetd" is mandatory. 

Note. "net-snmp" if installing cmsnmpd 

Note. "tog-pegasus" if installing "sgproviders" for WBEM 

 

As of RHEL5u3 with kernel >= 2.6.18-128, you also need this package, 

yum install libnl

Ref. ftp://ftp.hp.com/pub/c-products/servers/ha/linux/SGLX_Certification_Matrix.pdf 

 

Make sure the "authd" package is NOT installed (remove it if present), 

rpm -e authd

 

Also make sure the correct time is set, 

yum install ntp
ntpdate ntp.obspm.fr
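
To keep the clock in sync afterwards, you may also enable the ntpd service (ntp.obspm.fr above is just an example server), 

chkconfig ntpd on
service ntpd start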

 

 

Configure the nodes 

Make sure you have a proper network configuration with at least two separate subnets for heartbeat, or one separate subnet with bonding for heartbeat. "/etc/hosts" should look like this, 

::1             localhost6.localdomain6 localhost6
127.0.0.1       localhost.localdomain   localhost
#10.1.1.10       qs.example.net         qs
10.1.1.11       sg1.example.net        sg1
10.1.2.11       sg1.example.net        sg1
10.1.3.11       sg1.example.net        sg1
10.1.1.12       sg2.example.net        sg2
10.1.2.12       sg2.example.net        sg2
10.1.3.12       sg2.example.net        sg2

Note. here subnet 10.1.1 is for general network use 

Note. here subnets 10.1.2 and 10.1.3 are for heartbeat 

Note. you may need additional subnets, for example for iSCSI. 

Note. use the same hostname for all subnets. 
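
As a quick connectivity check of the heartbeat subnets from sg1, a minimal sketch using the example addresses above, 

for ip in 10.1.2.12 10.1.3.12; do
    ping -c 1 $ip >/dev/null && echo "$ip ok"
done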

 

Mount the SGLX CD and install the RPMs, 

cd x86_x86-64/RedHat5/Serviceguard/IA32
rpm -ivh pidentd-3.0.19-0.i386.rpm
rpm -ivh sgcmom-B.05.00.00-0.rhel5.i386.rpm
rpm -ivh serviceguard-A.11.18.02-0.demo.rhel5.i386.rpm

Note. "cmsnmpd-A.01.00-0.rhel5.i386.rpm" for HP SIM. 

Note. "sgproviders-A.02.00.00-0.rhel5.i386.rpm" for Web-Based Enterprise Management 

 

Start "identd" and force at boot, 

service identd start
chkconfig identd on

 

Start "xinetd" and force at boot, 

service xinetd restart
chkconfig xinetd on

 

Add these lines to "/root/.bashrc", 

PATH=$PATH:/usr/local/cmcluster/bin
. /etc/cmcluster.conf

apply, 

source ~/.bashrc

add this line to "/etc/man.config", 

MANPATH /usr/local/cmcluster/doc/man

facilitate access to SG configurations, 

cd ~/
ln -s /usr/local/cmcluster/conf
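
A quick sanity check that the PATH and MANPATH changes took effect (assuming the RPMs shipped the binaries and man pages to those locations), 

which cmviewcl
man -w cmviewcl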

 

Create the node list, 

cd conf
vi cmclnodelist

like, 

sg1 root
sg2 root

 

If using VMware, install the Serviceguard for Linux Virtual Machine Toolkit (http://h20392.www2.hp.com/portal/swdepot/index.do > High Availability > Serviceguard for Linux Contributed Toolkit Suite). 

 

 

Configure a lock LUN 

Make sure you have access to a SAN disk (http://pbraun.nethence.com/doc/sysutils_linux/iscsi.html), 

fdisk -l
#sfdisk -s

 

Use fdisk to create a 1-cylinder primary partition (at least 100 KB) at the start of the disk, 

fdisk /dev/sdb

Note. use partition type ID "83" (Linux). 

 

Reread the partition table on the other nodes, 

sfdisk -R /dev/sdb
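
Optionally, verify on every node that the partition is visible and meets the 100 KB minimum (blockdev prints the size in bytes), 

fdisk -l /dev/sdb
blockdev --getsize64 /dev/sdb1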

 

Refs. 

Serviceguard Lock LUN within VMware ESX : http://forums11.itrc.hp.com/service/forums/questionanswer.do?admit=109447626+1249221737647+28353475&threadId=1126734 

 

 

Configure a Quorum server 

Mount the SGLX CD and install the RPM, 

cd x86_x86-64/RedHat5/Serviceguard/IA32
rpm -ivh qs-A.02.00.04-0.rhel5.i386.rpm

 

Authorize hosts to connect, 

vi /usr/local/qs/conf/qs_authfile

like, 

sg1.example.net
sg2.example.net

 

Add to init, 

mkdir -p /var/log/qs
vi /etc/inittab

like, 

qs:345:respawn:/usr/local/qs/bin/qs >/var/log/qs/qs 2>/var/log/qs/qs_error

apply, 

telinit q

 

Check it's up, 

ps aux | grep qs

note there should be two listening ports, 

netstat -an --inet

e.g., 

...
tcp        0      0 0.0.0.0:60277               0.0.0.0:*                   LISTEN
tcp        0      0 0.0.0.0:1238                0.0.0.0:*                   LISTEN
...

Note. port 1238 corresponds to the registered "hacl-qs" service. 
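
You can double-check the registered port, assuming your "/etc/services" carries the IANA entry for it, 

grep hacl-qs /etc/services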

 

In case it says "permission denied", just kill the processes, 

pkill qsc

they will restart automatically and reread the "qs_authfile". 

 

 

Configure the cluster 

Create a cluster configuration, 

cd conf
cmquerycl -n sg1 -n sg2 -C cluster1.conf.dist >/dev/null

Note. "-L /dev/sdb1" to specify a lock LUN 

Note. "-L" may be used either once before all nodes or after every node. 

Note. "-q quorum_host_or_ip" to specify a quorum server 

 

Wipe out the comments, 

sed -e '
    /^#/d;
    /^$/d;
    /^[[:space:]]*#/d;
    ' cluster1.conf.dist > cluster1.conf

 

Edit the cluster configuration, 

vi cluster1.conf

and define the STATIONARY_IP entries, then adjust MAX_CONFIGURED_PACKAGES and NODE_TIMEOUT as needed. 

 

Example with a lock LUN, 

CLUSTER_NAME            cluster1

 

NODE_NAME               sg1
  NETWORK_INTERFACE     eth0
  STATIONARY_IP         10.1.1.11
  NETWORK_INTERFACE     eth1
  HEARTBEAT_IP          10.1.2.11
  NETWORK_INTERFACE     eth2
  HEARTBEAT_IP          10.1.3.11
  CLUSTER_LOCK_LUN      /dev/sdb1

 

NODE_NAME               sg2
  NETWORK_INTERFACE     eth0
  STATIONARY_IP         10.1.1.12
  NETWORK_INTERFACE     eth1
  HEARTBEAT_IP          10.1.2.12
  NETWORK_INTERFACE     eth2
  HEARTBEAT_IP          10.1.3.12
  CLUSTER_LOCK_LUN      /dev/sdb1

 

HEARTBEAT_INTERVAL              1000000
NODE_TIMEOUT                    8000000
AUTO_START_TIMEOUT              600000000
NETWORK_POLLING_INTERVAL        2000000
MAX_CONFIGURED_PACKAGES         5

 

Example with a Quorum server, 

CLUSTER_NAME            cluster1
QS_HOST                 10.1.1.10
QS_POLLING_INTERVAL     300000000

 

NODE_NAME               sg1
  NETWORK_INTERFACE     eth0
  STATIONARY_IP         10.1.1.11
  NETWORK_INTERFACE     eth1
  HEARTBEAT_IP          10.1.2.11
  NETWORK_INTERFACE     eth2
  HEARTBEAT_IP          10.1.3.11

 

NODE_NAME               sg2
  NETWORK_INTERFACE     eth0
  STATIONARY_IP         10.1.1.12
  NETWORK_INTERFACE     eth1
  HEARTBEAT_IP          10.1.2.12
  NETWORK_INTERFACE     eth2
  HEARTBEAT_IP          10.1.3.12

 

HEARTBEAT_INTERVAL              1000000
NODE_TIMEOUT                    8000000
AUTO_START_TIMEOUT              600000000
NETWORK_POLLING_INTERVAL        2000000
MAX_CONFIGURED_PACKAGES         5

 

 

Synchronize files among nodes 

Make sure those files are synchronized among the nodes, 

cat > files.list <<EOF9
/etc/hosts
/etc/resolv.conf
/usr/local/cmcluster/conf/cmclnodelist
/usr/local/cmcluster/conf/cluster1.conf
/usr/local/cmcluster/conf/license.txt
/usr/local/cmcluster/conf/cmcluster.rc
EOF9

 

Here's a one-liner for that, 

scp -p `cat files.list` sg2:/root/conf
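
Alternatively, a minimal sketch that copies each file to the same path on the peer node, so /etc/hosts and /etc/resolv.conf land in /etc instead of the conf directory, 

for f in $(cat files.list); do
    scp -p $f sg2:$f
done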

 

 

Run the cluster 

Verify and apply the cluster configuration, 

cmcheckconf -C cluster1.conf
cmapplyconf -k -C cluster1.conf

 

Enable automatic cluster join, 

vi /usr/local/cmcluster/conf/cmcluster.rc

like, 

AUTOSTART_CMCLD=1
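
You can verify the setting on both nodes at once, e.g., 

for n in sg1 sg2; do
    ssh $n grep AUTOSTART_CMCLD /usr/local/cmcluster/conf/cmcluster.rc
done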

 

On each node, start the daemon at the same time, 

/etc/init.d/cmcluster.init start
#cmrunnode

Note. "/etc/init.d/SGSafetyTimer" is called at boot time. 

 

Check the cluster port is listening, 

lsof -i:5302
#netstat -apne --inet | grep 5302

 

Check the cluster status, 

tail /var/log/messages
cmquerycl (-v)
cmviewcl (-v)

 

 

SMH Serviceguard Manager installation 

The good old SG Manager GUI used to work for SG <= 11.17. For SG >= 11.18 you need to install the SMH SG Manager on at least one of the nodes. 

 

Make sure you have libXp installed, 

rpm -q libXp

 

Install Java 1.6 (http://java.sun.com/javase/downloads/index.jsp), 

chmod +x jdk-6u14-linux-i586-rpm.bin
./jdk-6u14-linux-i586-rpm.bin

 

Install HP System Management Homepage (SMH) and its Tomcat component, 

mv /etc/redhat-release /etc/redhat-release.dist
echo "Red Hat Enterprise Linux Server release 5.3" > /etc/redhat-release
cd /mnt/cdrom/x86_x86-64/RedHat5/SGManager/IA32
rpm -ivh hpsmh-2.1.8-177.linux.i386.rpm
rpm -ivh hpsmh-tomcat-1.0-11.linux.i386.rpm
/opt/hp/hpsmh/tomcat/bin/tomcat_cfg

when prompted, provide the real path to the java executable, e.g., 

/usr/java/default/bin/java

Note. make sure you provide the real path, not the "/usr/bin/java" link. Otherwise you will get this error from the interface, 

The proxy server received an invalid response from an upstream server.

Note. never mind the chown/chgrp/chmod errors against "tomcat/keystore/". 

 

Install SG Manager, 

cd /mnt/cdrom/x86_x86-64/RedHat5/SGManager/IA32
rpm -ivh sgmgrpi-B.01.01.01-1.rhel5.i386.rpm

restart SMH, 

/etc/init.d/hpsmhd restart
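
and check that SMH is listening on its HTTPS port (2381, as used below), 

netstat -an --inet | grep 2381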

 

Get to the node's web interface, 

https://10.1.1.11:2381/

login, 

root/xxxxxxxx

 

 

System update 

No service interruption. A node can rejoin the cluster as long as the MCSG revisions match; RHEL versions may temporarily differ. The "pidentd" and "deadman" kernel modules need to be recompiled. Ref. "/usr/local/cmcluster/bin/drivers/README" 

 

Stop the node, disable cluster auto start, 

cmhaltnode -f
vi /usr/local/cmcluster/conf/cmcluster.rc

change, 

AUTOSTART_CMCLD=0

Note. if you're using a lock LUN, make sure its device name doesn't change after the Red Hat upgrade (otherwise you'll have to update the cluster configuration) 

and reboot with the Red Hat installation CD, 

shutdown -r now

Note. if you have a SAN connected, stop at the boot loader prompt and disconnect the Fibre Channel links (at this point the HBA modules aren't loaded yet). 

 

Proceed with the Red Hat update and, 

Create a new boot loader configuration (GRUB)

Note. if you have a SAN connected, take the opportunity to boot into runlevel 1 (init 1) to update the HBA modules. 

 

Start the server and update the deadman and pidentd modules (this is not an MCSG update, just a rebuild of MCSG's kernel modules), 

rpm -qa | grep authd
rpm -Uvh --force pidentd*.rpm
ll /dev/pidentd
rpm -Uvh --replacefiles --replacepkgs serviceguard*.rpm
ll /dev/deadman
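
To confirm the rebuilt module is actually loaded (assuming it is simply named "deadman"), 

lsmod | grep deadman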

Note. sgcmom has no modules 

Note. "authd" may appear after an RHEL update, hence the "--force" 

 

Enable auto start, 

vi /usr/local/cmcluster/conf/cmcluster.rc

change, 

AUTOSTART_CMCLD=1

start the node, 

cmrunnode

 

 

MCSG update 

Note. this implies a service interruption: MCSG revisions need to be the same to join or form a cluster. 

 

Update MCSG, 

rpm -Uvh pidentd*.rpm
/etc/init.d/identd restart
ll /dev/pidentd
rpm -Uvh sgcmom*.rpm
rpm -Uvh serviceguard*.rpm
ll /dev/deadman

 

Also get the latest patches, 

http://itrc.hp.com/ > Patch database > ...

and install them, 

tar xvf SGLX*.tar
cd SGLX*/tools
./sgupdate
ll /dev/deadman
cd ../pidentd
rpm -Uvh pidentd*.rpm
/etc/init.d/identd restart
ll /dev/pidentd

 

Make sure the short hostnames are added for every node IP, 

vi /etc/hosts

like 

IPbond0    hostname  shorthost
IPethX    hostname  shorthost

 

Reapply the cluster configuration, 

cmcheckconf -C cluster.conf
cmapplyconf -C cluster.conf

reapply the package(s) configuration(s), 

cmcheckconf -P package/package.conf
cmapplyconf -P package/package.conf

 

Refs. 

Evaluation CD-ROM's "Read_Me_First.txt" 

Release Notes / Steps for Rolling Upgrade 

Managing Serviceguard / Procedure - Performing the Rolling Upgrade 

 

 

Add a node to an existing cluster 

- make the same network config with redundant heartbeat interfaces or bonding 

- update /etc/hosts on all nodes, and on the quorum server if there is one 

- edit and copy the cluster ASCII configuration file 

- edit and copy cmclnodelist 

- if using a quorum server, add the new node to its authfile 

- if using a lock LUN, make sure it's the same device, otherwise update the cluster conf 

- check and apply the cluster conf (see the sketch below) 

- reconfigure the packages' failover for the new node (http://pbraun.nethence.com/doc/sysutils/mcsg_package.html) 
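
A minimal sketch of the re-query and re-apply steps, assuming the new node is named sg3, 

#re-generate a template including the new node
cmquerycl -n sg1 -n sg2 -n sg3 -C cluster1.conf.dist >/dev/null
#edit cluster1.conf accordingly, then
cmcheckconf -C cluster1.conf
cmapplyconf -C cluster1.conf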

 

 

References 

Serviceguard for Linux : http://docs.hp.com/en/ha.html#Serviceguard%20for%20Linux 

Software Depot home : http://h20392.www2.hp.com/portal/swdepot/index.do 

 

PDF references 

Managing HP Serviceguard for Linux, Eighth Edition : B9903-90060.pdf 

Managing HP Serviceguard for Linux, Ninth Edition : B9903-90068.pdf 

HP Serviceguard for Linux Version A.11.18 Release Notes : B9903-90071.pdf 

HP Serviceguard for Linux Version A.11.19 Release Notes : B9903-90067.pdf 

 

ITRC Forum references 

Invalid data for cluster lock LUN configuration : http://forums11.itrc.hp.com/service/forums/questionanswer.do?threadId=1343526&admit=109447626+1243612148626+28353475 

SG/LX and iSCSI : http://forums11.itrc.hp.com/service/forums/bizsupport/questionanswer.do?threadId=1061907 

why initiator sends LUN reset command to target? : http://www.nabble.com/why-initiator-sends-LUN-reset-command-to-target--td20122708.html 

http://thomasvogt.wordpress.com/2008/08/26/mcserviceguard-cluster-installation-on-hp-ux-1131/