This is an obsolete doc -- see http://doc.nethence.com/ instead

Setting up a cluster base with the Red Hat/CentOS Cluster Suite (RHCS) 

on CentOS 6.7 

 

http://pbraun.nethence.com/unix/sysutils_linux/redhat-rhcs.html 

http://pbraun.nethence.com/unix/sysutils_linux/redhat-rhcs-services.html 

 

System Preparation 

(Optional) Proceed as usual with system post-installation and network configuration, but add one extra interface for the heartbeat, 

mv /etc/sysconfig/network-scripts/ifcfg-eth1 /etc/sysconfig/network-scripts/DIST.ifcfg-eth1
vi /etc/sysconfig/network-scripts/ifcfg-eth1
DEVICE=eth1
BOOTPROTO=none
ONBOOT=yes
IPADDR=10.9.9.1
NETMASK=255.255.255.0
#VLAN=yes

apply, 

service network restart
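
Optionally, verify that the heartbeat interface is up and that the peer nodes answer on it (a quick check, assuming the eth1 interface and the 10.9.9.0/24 example addressing used above), 

ip addr show eth1
ping -c 3 10.9.9.2
ping -c 3 10.9.9.3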

 

Then add the nodes and the manager to the hosts file (a separate heartbeat network is optional), 

vi /etc/hosts
10.0.0.X       cnode1 rhcs-cnode1 rhcs-cnode1.example.local
10.0.0.X       cnode2 rhcs-cnode2 rhcs-cnode2.example.local
10.0.0.X       cnode3 rhcs-cnode3 rhcs-cnode3.example.local
#10.0.0.X       manage rhcs-manage rhcs-manage.example.local
10.9.9.1       hbnode1
10.9.9.2       hbnode2
10.9.9.3       hbnode3
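
You can check that those names resolve from /etc/hosts on every node, for instance, 

getent hosts hbnode1 hbnode2 hbnode3
ping -c 1 hbnode2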

 

RHCS installation 

On the cluster nodes, 

yum -y groupinstall "High Availability"
#yum -y install ricci
#yum -y install ccs
chkconfig iptables off
chkconfig ip6tables off
service iptables stop
service ip6tables stop
passwd ricci
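
As a quick sanity check, the main packages pulled in by the group install can be verified, for instance, 

rpm -q cman rgmanager ricci ccs fence-agents
service ricci status # not started yet at this point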

 

RHCS Configuration 

On one node, configure the cluster, either from the command line, 

ls -l /etc/cluster/cluster.conf # should not exist
ccs -h localhost --createcluster centosha
cat /etc/cluster/cluster.conf
ccs -h localhost --addnode hbnode1
ccs -h localhost --addnode hbnode2
ccs -h localhost --addnode hbnode3
cat /etc/cluster/cluster.conf

or copy/paste the following configuration directly on all the nodes, 

ls -l /etc/cluster/cluster.conf # should not exist
cat > /etc/cluster/cluster.conf <<EOF9
<?xml version="1.0"?>
<cluster config_version="4" name="centosha">
        <fence_daemon/>
        <clusternodes>
                <clusternode name="hbnode1" nodeid="1"/>
                <clusternode name="hbnode2" nodeid="2"/>
                <clusternode name="hbnode3" nodeid="3"/>
        </clusternodes>
        <cman/>
        <fencedevices/>
        <rm>
                <failoverdomains/>
                <resources/>
        </rm>
</cluster>
EOF9
chmod 640 /etc/cluster/cluster.conf 
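
Before starting any cluster service, you may validate the file against the schema on each node, 

ccs_config_validate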

 

Ready to go 

Start the services on all the nodes, 

chkconfig ricci on
chkconfig cman on
chkconfig rgmanager on
chkconfig modclusterd on
service ricci start
service cman start
service rgmanager start
service modclusterd start
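
Once the services are running on all three nodes, membership and quorum can be verified from any node, for instance, 

clustat
cman_tool nodes
cman_tool status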

 

Creating a fencing device (fence_soap) 

List the available fence agents, 

ccs -h localhost --lsfenceopts

more details on fence_vmware_soap, 

ccs -h localhost --lsfenceopts fence_vmware_soap

 

On a cluster node, before configuring fencing, try to fence a node manually, 

fence_vmware_soap -a vcenter_address -l ADDOMAIN\\short -p 'AD_PASSWORD' -o list -z | grep -i rhcs
fence_vmware_soap -a vcenter_address -l shortlogin@addomain.tld -p 'AD_PASSWORD' -o status -n VM-rhcs-cnode2 -z
fence_vmware_soap -a vcenter_address -l shortlogin@addomain.tld -p 'AD_PASSWORD' -o off -n VM-rhcs-cnode2 -z
fence_vmware_soap -a vcenter_address -l shortlogin@addomain.tld -p 'AD_PASSWORD' -o on -n VM-rhcs-cnode2 -z
(and rejoin the cluster)

Note: both the ADDOMAIN\\login and login@addomain.tld formats work 

 

If it works for you, you are ready to configure the fencing device, 

ccs -h localhost --addfencedev fence_soap agent=fence_vmware_soap ipaddr="vcenter_address" login="short@addomain.tld" passwd="AD_PASSWORD"

or edit the file manually (remember to increment config_version), 

vi /etc/cluster/cluster.conf
<cluster config_version="5" name="centosha">
        <fencedevices>
                <fencedevice agent="fence_vmware_soap" ipaddr="vcenter_address" login="short@addomain.tld" name="fence_soap" passwd="AD_PASSWORD" ssl="1"/>
        </fencedevices>

check and propagate the changes, 

ccs -h localhost --lsfencedev
cat /etc/cluster/cluster.conf
ccs_config_validate
cman_tool version -r
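
To make sure the new configuration reached every node, the running version can be compared, for instance, 

cman_tool version # run on each node, the config version should match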

 

Setting up fencing for every node 

Now that a fence device (fence_soap) is configured, define a fence method for every node. The node names given to ccs must match the clusternode names from cluster.conf (hbnode1-3 here). 

 

Get the VM name and UUID of each cluster node from the vCenter, 

fence_vmware_soap -a vcenter_address -l ADDOMAIN\\short -p 'AD_PASSWORD' -o list -z | grep -i rhcs

 

Then on e.g. cnode1, 

grep name= /etc/cluster/cluster.conf | grep fence
ccs -h localhost --addmethod soap hbnode1
ccs -h localhost --addmethod soap hbnode2
ccs -h localhost --addmethod soap hbnode3

link the methods to the device agent, 

ccs -h localhost --addfenceinst fence_soap hbnode1 soap port="VM_NAME" uuid="UUID"
ccs -h localhost --addfenceinst fence_soap hbnode2 soap port="VM_NAME" uuid="UUID"
ccs -h localhost --addfenceinst fence_soap hbnode3 soap port="VM_NAME" uuid="UUID"
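
For reference, the resulting per-node entries in cluster.conf should look roughly like this (VM_NAME and UUID being the values reported by the vCenter listing above), 

<clusternode name="hbnode1" nodeid="1">
        <fence>
                <method name="soap">
                        <device name="fence_soap" port="VM_NAME" uuid="UUID"/>
                </method>
        </fence>
</clusternode>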

check and propagate the changes, 

ccs_config_validate
cman_tool version -r

 

Now check that fencing responds correctly for every node, 

fence_check

 

Troubleshooting 

Enable debugging, 

ccs -h localhost --setlogging debug=on
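
On RHEL/CentOS 6 the cluster daemons typically log under /var/log/cluster/ as well as to syslog, so watching those files usually shows what is going wrong, for instance, 

tail -f /var/log/cluster/fenced.log /var/log/cluster/rgmanager.log /var/log/messages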

 

If you get this error trying to add fence_vmware_soap using ccs, 

"Validation Failure, unable to modify configuration file"

then proceed manually as shown above, step by step. 

Ref. (unrelated) Bug 725722 - cluster.rng from ccs needs to match cluster.rng from cman: https://bugzilla.redhat.com/show_bug.cgi?id=725722 

 

Troubleshooting -- Start from scratch 

If you ever need to start from scratch, on every cluster node, 

chkconfig ricci off
chkconfig cman off
chkconfig rgmanager off
chkconfig modclusterd off
rm -f /etc/cluster/cluster.conf
reboot

 

Optional -- RHCS web interface 

On the manager, 

yum -y groupinstall "High Availability Management"
yum -y install ricci
service ip6tables stop
service luci start
service ricci start
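
To keep the web interface available across reboots, those services can also be enabled at boot time, 

chkconfig luci on
chkconfig ricci on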

you can now access the management node web interface, 

https://manage:8084/

 

References 

RedHat Cluster Suite And Conga - Linux Clustering: https://www.howtoforge.com/redhat-cluster-suite-and-conga-linux-clustering 

Red Hat Enterprise Linux 6 Cluster Administration: https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/6/html-single/Cluster_Administration/ 

 

More refs general 

http://www.golinuxhub.com/2014/02/configure-red-hat-cluster-using-vmware.html 

https://wiki.deimos.fr/Installation_et_Configuration_de_Red_Hat_Cluster_Suite#Luci_2 

http://bigthinkingapplied.com/creating-a-ha-cluster-with-red-hat-cluster-suite-part-2/ 

http://schlutech.com/2011/07/demystifying-high-availability-linux-clustering-technologies/ 

https://alteeve.ca/w/AN!Cluster_Tutorial_2#Node_Host_Names 

 

References about Fencing 

How to configure fence_vmware_soap using the Red Hat Enterprise Linux 6 tool ccs: https://access.redhat.com/solutions/454303 

Appendix A. Fence Device Parameters: https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/6/html/Cluster_Administration/ap-fence-device-param-CA.html 

 

4.26. VMWare over SOAP API: https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/6/html/Fence_Configuration_Guide/s1-software-fence-vmware-soap-CA.html 

How to test fence devices and fencing configuration in a RHEL 5, 6, or 7 High Availability cluster?: https://access.redhat.com/solutions/18803 

fence_vmware_soap agent fails with error 'Unable to connect/login to fencing device': https://access.redhat.com/solutions/1327053 

9.4. Updating a Configuration: https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/6/html/Cluster_Administration/s1-admin-updating-config-CA.html 

https://communities.vmware.com/thread/391841?start=0&tstart=0 

https://www.centos.org/forums/viewtopic.php?f=47&t=52403 

http://www.linuxtopia.org/online_books/rhel6/rhel_6_cluster_admin/rhel_6_cluster_s1-config-fence-devices-conga-CA.html