This doc is obsolete -- see http://doc.nethence.com/ instead
MC/ServiceGuard on RHEL5
http://pbraun.nethence.com/doc/sysutils/mcsg.html
http://pbraun.nethence.com/doc/sysutils/mcsg_package.html
Introduction
We're using "MC/Serviceguard for Linux v11.18" on RHEL5. In a 2-node cluster, a cluster lock is required, and it's strongly recommended for 3 and 4-node clusters anyway. Choose one of the following lock methods:
- Lock LUN
- Quorum server
Note. on HP-UX only, there used to be a third alternative: the Cluster Lock Disk, which must be part of an LVM volume group.
If you're using an evaluation version of MC/SG, refer to "Read_Me_First.txt" (at the CD's root dir) to request an evaluation licence from HP by mail.
Dependencies
Make sure those packages are installed,
rpm -q \
xinetd \
sg3_utils \
net-snmp \
lm_sensors \
tog-pegasus \
| grep ^package
Note. "xinetd" is mandatory.
Note. "net-snmp" if installing cmsnmpd
Note. "tog-pegasus" if installing "sgproviders" for WBEM
As of RHEL5u3 (kernel >= 2.6.18-128), you also need this package,
yum install libnl
Ref. ftp://ftp.hp.com/pub/c-products/servers/ha/linux/SGLX_Certification_Matrix.pdf
Make sure this package is NOT installed (it conflicts with the "pidentd" shipped with Serviceguard),
rpm -e authd
Also make sure the correct time is set,
yum install ntp
ntpdate ntp.obspm.fr
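You may also keep the clock in sync permanently with the stock "ntpd" service,
service ntpd start
chkconfig ntpd on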
Configure the nodes
Make sure you have a proper network configuration with at least two separate subnets for the heartbeat, or one separate subnet with bonding for the heartbeat (a bonding sketch follows the notes below). "/etc/hosts" should look like this,
::1 localhost6.localdomain6 localhost6
127.0.0.1 localhost.localdomain localhost
#10.1.1.10 qs.example.net qs
10.1.1.11 sg1.example.net sg1
10.1.2.11 sg1.example.net sg1
10.1.3.11 sg1.example.net sg1
10.1.1.12 sg2.example.net sg2
10.1.2.12 sg2.example.net sg2
10.1.3.12 sg2.example.net sg2
Note. here subnet 10.1.1 is for general network use.
Note. here subnets 10.1.2 and 10.1.3 are for the heartbeat.
Note. you may need additional subnets, for example for iSCSI.
Note. use the same hostname for all subnets.
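Here's a minimal sketch of an active-backup bond for the heartbeat on RHEL5, assuming eth1 and eth2 are sg1's heartbeat NICs and reusing the 10.1.2.11 address from the example above (adapt devices and IPs to your setup),
cat > /etc/sysconfig/network-scripts/ifcfg-bond0 <<EOF9
DEVICE=bond0
IPADDR=10.1.2.11
NETMASK=255.255.255.0
BOOTPROTO=none
ONBOOT=yes
EOF9
cat > /etc/sysconfig/network-scripts/ifcfg-eth1 <<EOF9
DEVICE=eth1
MASTER=bond0
SLAVE=yes
BOOTPROTO=none
ONBOOT=yes
EOF9
do the same for "ifcfg-eth2", add those lines to "/etc/modprobe.conf",
alias bond0 bonding
options bond0 mode=1 miimon=100
then apply,
service network restart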
Mount the SGLX CD and install the RPMs,
cd x86_x86-64/RedHat5/Serviceguard/IA32
rpm -ivh pidentd-3.0.19-0.i386.rpm
rpm -ivh sgcmom-B.05.00.00-0.rhel5.i386.rpm
rpm -ivh serviceguard-A.11.18.02-0.demo.rhel5.i386.rpm
Note. "cmsnmpd-A.01.00-0.rhel5.i386.rpm" for HP SIM.
Note. "sgproviders-A.02.00.00-0.rhel5.i386.rpm" for Web-Based Enterprise Management
Start "identd" and force at boot,
service identd start
chkconfig identd on
Start "xinetd" and force at boot,
service xinetd restart
chkconfig xinetd on
Add those lines to "/root/.bashrc",
PATH=$PATH:/usr/local/cmcluster/bin
. /etc/cmcluster.conf
apply,
source .bashrc
add this line to "/etc/man.config",
MANPATH /usr/local/cmcluster/doc/man
facilitate access to SG configurations,
cd ~/
ln -s /usr/local/cmcluster/conf
Create the node list,
cd conf
vi cmclnodelist
like,
sg1 root
sg2 root
If using VMware, install the Serviceguard for Linux Virtual Machine Toolkit (http://h20392.www2.hp.com/portal/swdepot/index.do > High Availability > Serviceguard for Linux Contributed Toolkit Suite).
Configure a lock LUN
Make sure you have access to a SAN disk (http://pbraun.nethence.com/doc/sysutils_linux/iscsi.html),
fdisk -l
#sfdisk -s
Use fdisk to create a 1-cylinder primary partition of at least 100 KB at the start of the disk,
fdisk /dev/sdb
Note. partition type ID "83" as for Linux.
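Here's a sketch of the interactive fdisk dialog, assuming the disk is empty and keeping the default type 83,
n (new partition)
p (primary)
1 (partition number)
1 (first cylinder)
1 (last cylinder, hence a single-cylinder partition)
w (write the table and quit)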
Reread the partition table on the other nodes,
sfdisk -R /dev/sdb
Refs.
Serviceguard Lock LUN within VMware ESX : http://forums11.itrc.hp.com/service/forums/questionanswer.do?admit=109447626+1249221737647+28353475&threadId=1126734
Configure a Quorum server
Mount the SGLX CD and install the RPM,
cd x86_x86-64/RedHat5/Serviceguard/IA32
rpm -ivh qs-A.02.00.04-0.rhel5.i386.rpm
Authorize hosts to connect,
vi /usr/local/qs/conf/qs_authfile
like,
sg1.example.net
sg2.example.net
Add to init,
mkdir -p /var/log/qs
vi /etc/inittab
like,
qs:345:respawn:/usr/local/qs/bin/qs >/var/log/qs/qs 2>/var/log/qs/qs_error
apply,
telinit q
Check it's up,
ps aux | grep qs
note there should be two listening ports,
netstat -an --inet
e.g.,
...
tcp 0 0 0.0.0.0:60277 0.0.0.0:* LISTEN
tcp 0 0 0.0.0.0:1238 0.0.0.0:* LISTEN
...
Note. 1238 corresponds to the registered "hacl-qs" service.
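To double-check, the entry should show up in "/etc/services",
grep hacl-qs /etc/services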
In case it says "permission denied", just kill the processes,
pkill qsc
they will restart automatically and reread the "qs_authfile".
Configure the cluster
Create a cluster configuration,
cd conf
cmquerycl -n sg1 -n sg2 -C cluster1.conf.dist >/dev/null
Note. "-L /dev/sdb1" to specify a lock LUN
Note. "-L" may be used either once before all nodes or after every node.
Note. "-q quorum_host_or_ip" to specify a quorum server
Wipe out the comments,
sed -e '
/^#/d;
/^$/d;
/^[[:space:]]*#/d;
' cluster1.conf.dist > cluster1.conf
Edit the cluster configuration,
vi cluster1.conf
and define STATIONARY_IP, change MAX_CONFIGURED_PACKAGES and NODE_TIMEOUT.
Example with a lock LUN,
CLUSTER_NAME cluster1
NODE_NAME sg1
NETWORK_INTERFACE eth0
STATIONARY_IP 10.1.1.11
NETWORK_INTERFACE eth1
HEARTBEAT_IP 10.1.2.11
NETWORK_INTERFACE eth2
HEARTBEAT_IP 10.1.3.11
CLUSTER_LOCK_LUN /dev/sdb1
NODE_NAME sg2
NETWORK_INTERFACE eth0
STATIONARY_IP 10.1.1.12
NETWORK_INTERFACE eth1
HEARTBEAT_IP 10.1.2.12
NETWORK_INTERFACE eth2
HEARTBEAT_IP 10.1.3.12
CLUSTER_LOCK_LUN /dev/sdb1
HEARTBEAT_INTERVAL 1000000
NODE_TIMEOUT 8000000
AUTO_START_TIMEOUT 600000000
NETWORK_POLLING_INTERVAL 2000000
MAX_CONFIGURED_PACKAGES 5
Example with a Quorum server,
CLUSTER_NAME cluster1
QS_HOST 10.1.1.10
QS_POLLING_INTERVAL 300000000
NODE_NAME sg1
NETWORK_INTERFACE eth0
STATIONARY_IP 10.1.1.11
NETWORK_INTERFACE eth1
HEARTBEAT_IP 10.1.2.11
NETWORK_INTERFACE eth2
HEARTBEAT_IP 10.1.3.11
NODE_NAME sg2
NETWORK_INTERFACE eth0
STATIONARY_IP 10.1.1.12
NETWORK_INTERFACE eth1
HEARTBEAT_IP 10.1.2.12
NETWORK_INTERFACE eth2
HEARTBEAT_IP 10.1.3.12
HEARTBEAT_INTERVAL 1000000
NODE_TIMEOUT 8000000
AUTO_START_TIMEOUT 600000000
NETWORK_POLLING_INTERVAL 2000000
MAX_CONFIGURED_PACKAGES 5
Synchronize files among nodes
Make sure those files are synchronized among the nodes,
cat > files.list <<EOF9
/etc/hosts
/etc/resolv.conf
/usr/local/cmcluster/conf/cmclnodelist
/usr/local/cmcluster/conf/cluster1.conf
/usr/local/cmcluster/conf/license.txt
/usr/local/cmcluster/conf/cmcluster.rc
EOF9
Here's a one-liner to push them to the other node,
scp -p `cat files.list` sg2:/root/conf
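Note. the scp above drops all the files into "sg2:/root/conf"; to mirror them to the same paths on the other node instead, here's a sketch with rsync (assuming rsync is installed on both nodes),
rsync -aR `cat files.list` sg2:/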
Run the cluster
Verify and apply the cluster configuration,
cmcheckconf -C cluster1.conf
cmapplyconf -k -C cluster1.conf
Enable automatic cluster join,
vi /usr/local/cmcluster/conf/cmcluster.rc
like,
AUTOSTART_CMCLD=1
On each node, start the daemon at the same time,
/etc/init.d/cmcluster.init start
#cmrunnode
Note. "/etc/init.d/SGSafetyTimer" is called at boot time.
Check the cluster port is listening,
lsof -i:5302
#netstat -apne --inet | grep 5302
Check the cluster status,
tail /var/log/messages
cmquerycl (-v)
cmviewcl (-v)
SMH Serviceguard Manager installation
The good old SG Manager GUI used to work up to SG 11.17. For SG >= 11.18 you need to install the SMH SG Manager on at least one of the nodes.
Make sure you have libXp installed,
rpm -q libXp
Install Java 1.6 (http://java.sun.com/javase/downloads/index.jsp),
chmod +x jdk-6u14-linux-i586-rpm.bin
./jdk-6u14-linux-i586-rpm.bin
Install SMH Serviceguard Manager
mv /etc/redhat-release /etc/redhat-release.dist
echo "Red Hat Enterprise Linux Server release 5.3" > /etc/redhat-release
cd /mnt/cdrom/x86_x86-64/RedHat5/SGManager/IA32
rpm -ivh hpsmh-2.1.8-177.linux.i386.rpm
rpm -ivh hpsmh-tomcat-1.0-11.linux.i386.rpm
/opt/hp/hpsmh/tomcat/bin/tomcat_cfg
provide the real path to the java executable,
/usr/java/default/bin/java
Note. if you provide the "/usr/bin/java" link instead of the real path, you will get this error from the interface,
The proxy server received an invalid response from an upstream server.
Note. never mind the chown/chgrp/chmod errors against "tomcat/keystore/".
Install SG Manager,
cd /mnt/cdrom/x86_x86-64/RedHat5/SGManager/IA32
rpm -ivh sgmgrpi-B.01.01.01-1.rhel5.i386.rpm
restart SMH,
/etc/init.d/hpsmhd restart
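Check that SMH is listening on its port (2381, as used below),
lsof -i:2381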
Get to the node's web interface,
https://10.1.1.11:2381/
login,
root/xxxxxxxx
System update
No service interruption. You can join a cluster as long as the MCSG revisions match; RHEL versions may temporarily differ. The "pidentd" and "deadman" kernel modules need to be recompiled. Ref. "/usr/local/cmcluster/bin/drivers/README"
Stop the node, disable cluster auto start,
cmhaltnode -f
vi /usr/local/cmcluster/conf/cmcluster.rc
change,
AUTOSTART_CMCLD=0
Note. if you're using a lock LUN, make sure its device doesn't change after the Red Hat upgrade (otherwise you'll have to change the cluster's config),
and reboot with the Red Hat installation CD,
shutdown -r now
Note. if you have a SAN connected, stop at the boot loader prompt and disconnect the fibre channel links (at this point the HBA modules aren't loaded yet).
Proceed with the Red Hat update and,
Create a new boot loader configuration (GRUB)
Note. if you have a SAN connected, take the chance to enter init 1 at server boot to update the HBA modules.
Start the server and update the deadman and pidentd modules (no MCSG update here, just MCSG's modules update),
rpm -qa | grep authd
rpm -Uvh --force pidentd*.rpm
ll /dev/pidentd
rpm -Uvh --replacefiles --replacepkgs serviceguard*.rpm
ll /dev/deadman
Note. sgcmom has no modules
Note. "authd" may appear after an RHEL update, hence the "--force"
Enable auto start,
vi /usr/local/cmcluster/conf/cmcluster.rc
change,
AUTOSTART_CMCLD=1
start the node,
cmrunnode
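then check that the node rejoined the cluster,
cmviewcl -v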
MCSG update
Note. service interruption. MCSG revisions need to be the same to join or form a cluster.
Update MCSG,
rpm -Uvh pidentd*.rpm
/etc/init.d/identd restart
ll /dev/pidentd
rpm -Uvh sgcmom*.rpm
rpm -Uvh serviceguard*.rpm
ll /dev/deadman
Also get the latest patches,
http://itrc.hp.com/ > Patch database > ...
and install them,
tar xvf SGLX*.tar
cd SGLX*/tools
./sgupdate
ll /dev/deadman
cd ../pidentd
rpm -Uvh pidentd*.rpm
/etc/init.d/identd restart
ll /dev/pidentd
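and check the installed revisions,
rpm -q serviceguard sgcmom pidentd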
Make sure the short hostnames are defined for every node IP,
vi /etc/hosts
like
IPbond0 hostname shorthost
IPethX hostname shorthost
Reapply the cluster configuration,
cmcheckconf -C cluster.conf
cmapplyconf -C cluster.conf
reapply the package(s) configuration(s),
cmcheckconf -P package/package.conf
cmapplyconf -P package/package.conf
Refs.
Eval CDrom's "Read_Me_First.txt"
Release Notes / Steps for Rolling Upgrade
Managing Serviceguard / Procedure - Performing the Rolling Upgrade
Add a node to an existing cluster
- make the same network config with redundant heartbeat interfaces or bonding
- update /etc/hosts on all nodes and quorum if there is one
- edit and copy the cluster ascii configuration file
- edit and copy cmclnodelist
- if quorum, add the new node to the quorum's authfile
- if lock lun, make sure it's the same device, otherwise update the cluster conf
- check and apply the cluster conf (see the sketch after this list)
- reconfigure package's failover for the new node (http://pbraun.nethence.com/doc/sysutils/mcsg_package.html)
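For instance, a sketch regenerating and reapplying the configuration with a hypothetical third node "sg3",
cmquerycl -n sg1 -n sg2 -n sg3 -C cluster1.conf.dist >/dev/null
cmcheckconf -C cluster1.conf
cmapplyconf -C cluster1.conf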
References
Serviceguard for Linux : http://docs.hp.com/en/ha.html#Serviceguard%20for%20Linux
Software Depot home : http://h20392.www2.hp.com/portal/swdepot/index.do
PDF references
Managing HP Serviceguard for Linux, Eighth Edition : B9903-90060.pdf
Managing HP Serviceguard for Linux, Ninth Edition : B9903-90068.pdf
HP Serviceguard for Linux Version A.11.18 Release Notes : B9903-90071.pdf
HP Serviceguard for Linux Version A.11.19 Release Notes : B9903-90067.pdf
ITRC Forum references
Invalid data for cluster lock LUN configuration : http://forums11.itrc.hp.com/service/forums/questionanswer.do?threadId=1343526&admit=109447626+1243612148626+28353475
SG/LX and iSCSI : http://forums11.itrc.hp.com/service/forums/bizsupport/questionanswer.do?threadId=1061907
why initiator sends LUN reset command to target? : http://www.nabble.com/why-initiator-sends-LUN-reset-command-to-target--td20122708.html
http://thomasvogt.wordpress.com/2008/08/26/mcserviceguard-cluster-installation-on-hp-ux-1131/