Saturday, October 27, 2012

Test HA 2: Veritas cluster/Oracle setup



Following our test iSCSI setup without hardware, here is an example of a typical VCS/Oracle fail-over setup. This is on RHEL5.

setup heartbeat network

You need 2 more virtual network cards on node1 and node2, preferably on separate logical networks:
If needed, re-run the VMware configuration (/usr/bin/vmware-config.pl) to create 2 local 'host-only' subnets, 192.168.130.0 and 192.168.131.0, because I suspect LLT may not work properly on the same bridged network.
Then add 2 more network cards to each VM.

Assign the addresses (system-config-network). In our example we will use:

node1:
 192.168.130.160/24  (eth1)
 192.168.131.160/24  (eth2)

node2:
 192.168.130.161/24  (eth1)
 192.168.131.161/24  (eth2)

Run system-config-network and configure the interfaces accordingly, then run '/etc/init.d/network restart'.
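
For reference, the resulting interface configuration files on node1 should look roughly like this (system-config-network writes them for you; shown only as a sketch):

 # cat /etc/sysconfig/network-scripts/ifcfg-eth1
 DEVICE=eth1
 BOOTPROTO=none
 IPADDR=192.168.130.160
 NETMASK=255.255.255.0
 ONBOOT=yes
 (same idea for eth2 with 192.168.131.160, and for node2 with the .161 addresses)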

from node1 perform basic tests:
 ping 192.168.130.161

 ping 192.168.131.161

VCS prerequisites


Note: this is for VCS 5.1 on RHEL 5.5; check the install manual for your exact versions.

# yum install compat-libgcc compat-libstdc++ glibc-2.5 libgcc glibc libgcc libstdc++ java-1.4.2

append to /etc/hosts, for easier admin, on each node:
 192.168.0.201 node1
 192.168.0.202 node2


ssh keys:


 ssh-keygen -t dsa  (on each node)

 node1# scp /root/.ssh/id_dsa.pub node2:/root/.ssh/authorized_keys2
 node2# scp /root/.ssh/id_dsa.pub node1:/root/.ssh/authorized_keys2

Verify you can connect without a password from node1 to node2, and the other way around:
node1# ssh node2

Update .bash_profile

PATH=/opt/VRTS/bin:$PATH; export PATH
MANPATH=/usr/share/man:/opt/VRTS/man; export MANPATH

Kernel panic: make the node reboot automatically 10 seconds after a panic:
 sysctl -w kernel.panic=10
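
Optionally, to have this survive a reboot, the same setting can also be appended to /etc/sysctl.conf:

 # cat >> /etc/sysctl.conf
 kernel.panic = 10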

Precheck (run from the VCS CD, or from the extracted tar.gz):

 ./installvcs -precheck node1 node2


VCS install

 ./installvcs

choices:

 I : install

 1)  Veritas Cluster Server (VCS)

 3)  Install all Veritas Cluster Server rpms - 322 MB required

 Enter the 64 bit RHEL5 system names separated by spaces: [q,?] node1 node2

(enter a license key, or run without one for the 60-day evaluation period)

 Would you like to configure VCS on node1 node2 [y,n,q] (n) y

 Enter the unique cluster name: [q,?] vmclu160
 Enter a unique Cluster ID number between 0-65535: [b,q,?] (0) 160

 Enter the NIC for the first private heartbeat link on node1: [b,q,?] eth1
 eth1 has an IP address configured on it. It could be a public NIC on node1.
 Are you sure you want to use eth1 for the first private heartbeat link?
 [y,n,q,b,?] (n) y
 Is eth1 a bonded NIC? [y,n,q] (n)
 Would you like to configure a second private heartbeat link? [y,n,q,b,?] (y)
 Enter the NIC for the second private heartbeat link on node1: [b,q,?] eth2
 eth2 has an IP address configured on it. It could be a public NIC on node1.
 Are you sure you want to use eth2 for the second private heartbeat link?
 [y,n,q,b,?] (n) y
 Is eth2 a bonded NIC? [y,n,q] (n)
 Would you like to configure a third private heartbeat link? [y,n,q,b,?] (n) n

 Do you want to configure an additional low priority heartbeat link?
 [y,n,q,b,?] (n) y
 Enter the NIC for the low priority heartbeat link on node1: [b,q,?] (eth0)
 Is eth0 a bonded NIC? [y,n,q] (n)
 Are you using the same NICs for private heartbeat links on all systems?
 [y,n,q,b,?] (y) y


 Cluster information verification:
        Cluster Name:      vmclu160
        Cluster ID Number: 160
        Private Heartbeat NICs for node1:
                link1=eth1
                link2=eth2
        Low Priority Heartbeat NIC for node1: link-lowpri=eth0
        Private Heartbeat NICs for node2:
                link1=eth1
                link2=eth2
        Low Priority Heartbeat NIC for node2: link-lowpri=eth0
 Is this information correct? [y,n,q,b,?] (y) y


 Virtual IP can be specified in RemoteGroup resource, and can be used to
 connect to the cluster using Java GUI
 The following data is required to configure the Virtual IP of the Cluster:
        A public NIC used by each system in the cluster
        A Virtual IP address and netmask
 Do you want to configure the Virtual IP? [y,n,q,?] (n) y
 Active NIC devices discovered on node1: eth0 eth1 eth2
 Enter the NIC for Virtual IP of the Cluster to use on node1: [b,q,?] (eth0)
 Is eth0 to be the public NIC used by all systems? [y,n,q,b,?] (y)
 Enter the Virtual IP address for the Cluster: [b,q,?] 192.168.0.203
 Enter the NetMask for IP 192.168.0.203: [b,q,?] (255.255.255.0)
 Would you like to configure VCS to use Symantec Security Services? [y,n,q] (n)

 Do you want to set the username and/or password for the Admin user
 (default username = 'admin', password='password')? [y,n,q] (n) y
 Enter the user name: [b,q,?] (admin)
 Enter the password:
 Enter again:
 Do you want to add another user to the cluster? [y,n,q] (n) n

For this test setup, answer 'n' to the SMTP notification and Global Cluster questions, and let the installer start the cluster.


node1# hastatus -sum
 -- SYSTEM STATE
 -- System               State                Frozen
 A  node1                RUNNING              0
 A  node2                RUNNING              0
 -- GROUP STATE
 -- Group           System               Probed     AutoDisabled    State       
 B  ClusterService  node1                Y          N               ONLINE      
 B  ClusterService  node2                Y          N               OFFLINE
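
To double-check that the private heartbeat links came up as expected, the standard LLT and GAB status commands can be used (output omitted here):

 node1# lltstat -nvv | more     (lists the LLT links and peer MAC addresses)
 node1# gabconfig -a            (port a and port h membership should show both nodes)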


Oracle binary install


Reference: Database Installation Guide for Linux

On each node:

# yum install gcc elfutils-libelf-devel glibc-devel libaio-devel libstdc++-devel unixODBC unixODBC-devel gcc-c++

# groupadd dba
# groupadd oinstall
# groupadd asmdba
# useradd -m oracle -g oinstall -G dba,asmdba
# passwd oracle

 # cat >>  /etc/security/limits.conf
 oracle          hard    nofile          65536
 oracle          soft    nofile          65536
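
For completeness, the 11gR2 installation guide also lists process and stack limits for the oracle user; roughly the following (double-check the exact values against the guide for your version):

 oracle          soft    nproc           2047
 oracle          hard    nproc           16384
 oracle          soft    stack           10240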

For RHEL5.5:
 # cat >>  /etc/sysctl.conf
 kernel.sem = 250        32000   100      128
 fs.file-max = 6815744
 net.ipv4.ip_local_port_range = 9000    65500
 net.core.rmem_default = 262144
 net.core.rmem_max = 4194304
 net.core.wmem_default = 262144
 net.core.wmem_max = 4194304
 fs.aio-max-nr = 1048576

 # /sbin/sysctl -p

 # mkdir /opt/oracle
 # chown oracle.oinstall /opt/oracle/

As the oracle user:
In this example setup, the Oracle binaries are installed locally on both nodes, under /opt/oracle.
Extract the Oracle 11gR2 distribution, make sure you have a working X connection, and run:
 $ ./runInstaller
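
If you are running the installer from a remote workstation, X11 forwarding over ssh is one simple way to get that X connection (just a sketch; it assumes X11Forwarding is enabled in sshd_config on node1):

 workstation$ ssh -X oracle@node1
 node1$ echo $DISPLAY           (should show something like localhost:10.0)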

install database software only
single instance database installation
enterprise
(select options -> only partitioning)
oracle base=/opt/oracle/app/oracle
sw loc=/opt/oracle/app/oracle/product/11.2.0/dbhome_1
leave other defaults


 $ cat >> ~/.bash_profile
 export ORACLE_HOME=/opt/oracle/app/oracle/product/11.2.0/dbhome_1
 export PATH=$ORACLE_HOME/bin:$PATH

Oracle instance install


mount the shared disk on /database (as root)
# mkdir /database
# chown oracle.dba /database      (do these on both nodes)

# mount /dev/sdd1 /database/      (do this only on node1, to create the instance; do not add it to /etc/fstab, VCS will manage this mount later)


create the instance with dbca (as oracle)

$ dbca

create a test database, especially set:

 Use common location for all database files: /database

 $ export ORACLE_SID=DTST    (and add this to oracle .bash_profile on both nodes)


 $ sqlplus "/ as sysdba"
 SQL> select * from dual;
 SQL> shutdown immediate

copy spfile to the other node (as oracle):
$ scp /opt/oracle/app/oracle/product/11.2.0/dbhome_1/dbs/spfileDTST.ora node2:/opt/oracle/app/oracle/product/11.2.0/dbhome_1/dbs
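
Note that the VCS Oracle resource configured further down points to a Pfile (initDTST.ora). If dbca only created an spfile, a minimal pfile that simply points to it can be created in the dbs directory on both nodes, for example:

 $ cat > /opt/oracle/app/oracle/product/11.2.0/dbhome_1/dbs/initDTST.ora
 SPFILE='/opt/oracle/app/oracle/product/11.2.0/dbhome_1/dbs/spfileDTST.ora'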

Copy also the directory structure created for the audit logs:
$ scp -r /opt/oracle/app/oracle/admin/DTST node2:/opt/oracle/app/oracle/admin/DTST
(it seems the /opt/oracle/app/oracle/diag/rdbms/dtst structure for traces etc. is created automatically)


Set $ORACLE_HOME/network/admin/listener.ora with the virtual IP we will use, here 192.168.0.204:

 LISTENER =
  (DESCRIPTION_LIST =
    (DESCRIPTION =
      (ADDRESS = (PROTOCOL = TCP)(HOST = 192.168.0.204)(PORT = 1521))
    )
  )
 ADR_BASE_LISTENER = /opt/oracle/app/oracle
 SID_LIST_LISTENER =
  (SID_LIST =
    (SID_DESC =
      (GLOBAL_DBNAME = DTST)
      (ORACLE_HOME =/opt/oracle/app/oracle/product/11.2.0/dbhome_1)
      (SID_NAME = DTST)
    )
  )

and tnsnames.ora:

 DTST =
  (DESCRIPTION =
    (ADDRESS_LIST =
      (ADDRESS = (PROTOCOL = TCP)(HOST = 192.168.0.204)(PORT = 1521))
    )
    (CONNECT_DATA =
      (SERVICE_NAME = DTST)
    )
  )

copy both files to the other node (as oracle):
$ scp /opt/oracle/app/oracle/product/11.2.0/dbhome_1/network/admin/*.ora node2:/opt/oracle/app/oracle/product/11.2.0/dbhome_1/network/admin

VCS service group config for Oracle

umount shared disk

# umount /database


update /etc/VRTSvcs/conf/config/main.cf and add the following service group:


 group OraGroup (
        SystemList = { node1 = 0, node2 = 1 }
        AutoStartList = { node1, node2 }
        )
        DiskReservation DR_ora (
                Disks @node1 = { "/dev/sdd" }
                Disks @node2 = { "/dev/sdd" }
                FailFast = 1
                )
        Mount Mount_oraprod_dfiles (
                MountPoint = "/database"
                BlockDevice = "/dev/sdd1"
                FSType = ext3
                FsckOpt = "-n"
                )
        IP IP_oraprod (
                Device = eth0
                Address = "192.168.0.204"
                NetMask = "255.255.255.0"
                )
        NIC NIC_oraprod (
                Device = eth0
                )
        Netlsnr LSNR_oraprod_lsnr (
                Owner = oracle
                Home = "/opt/oracle/app/oracle/product/11.2.0/dbhome_1"
                TnsAdmin = "/opt/oracle/app/oracle/product/11.2.0/dbhome_1/network/admin"
                Listener = LISTENER
                )
        Oracle ORA_oraprod (
                Sid = DTST
                Owner = oracle
                Home = "/opt/oracle/app/oracle/product/11.2.0/dbhome_1"
                Pfile = "/opt/oracle/app/oracle/product/11.2.0/dbhome_1/dbs/initDTST.ora"
                StartUpOpt = STARTUP
                )
        IP_oraprod requires NIC_oraprod
        LSNR_oraprod_lsnr requires IP_oraprod
        LSNR_oraprod_lsnr requires ORA_oraprod
        Mount_oraprod_dfiles requires DR_ora
        ORA_oraprod requires Mount_oraprod_dfiles


Note: this is a simple setup: no Veritas Volume Manager, and no DetailMonitoring of the DB (i.e. a hung instance would not be detected as a failure).
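
For reference, detail monitoring could be added later on the Oracle resource with attributes along the following lines (a sketch only: the monitoring user and table are hypothetical, the Pword value has to be generated with 'vcsencrypt -agent', and the attribute names should be verified against the VCS Oracle agent guide for your version):

        Oracle ORA_oraprod (
                ...
                User = vcsuser
                Pword = <encrypted password>
                Table = vcstable
                MonScript = "./bin/Oracle/SqlTest.pl"
                DetailMonitor = 1
                )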

check:
# hacf -verify /etc/VRTSvcs/conf/config/

Stop/start the cluster to re-read the main.cf (in production we could instead use 'hacf -cftocmd .../config/' and run the generated main.cmd, etc.):
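
For reference, the same service group could also be built online without stopping the cluster, using the standard ha commands (roughly what the generated main.cmd would contain); a partial sketch:

 # haconf -makerw
 # hagrp -add OraGroup
 # hagrp -modify OraGroup SystemList node1 0 node2 1
 # hagrp -modify OraGroup AutoStartList node1 node2
 # hares -add NIC_oraprod NIC OraGroup
 # hares -modify NIC_oraprod Device eth0
 # hares -add IP_oraprod IP OraGroup
 # hares -modify IP_oraprod Device eth0
 # hares -modify IP_oraprod Address 192.168.0.204
 # hares -modify IP_oraprod NetMask 255.255.255.0
 # hares -link IP_oraprod NIC_oraprod
 (same pattern for the DiskReservation, Mount, Netlsnr and Oracle resources,
  remembering to set Enabled = 1 on each resource)
 # haconf -dump -makero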

# hastop -all
# hastart

# hastart         (on node2)

verify log while starting:

# tail -f /var/VRTSvcs/log/engine_A.log

# hastatus -sum

Failover quick test


Simulate a problem on oracle

$ su - oracle
$ sqlplus "/ as sysdba"
 SQL> shutdown immediate

and verify that the fail-over works.
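
Independently of crashing the instance, a controlled switch-over can also be tested with the standard hagrp commands:

 node1# hagrp -switch OraGroup -to node2
 node1# hagrp -state OraGroup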

Connect for real to the database on node2 and check that it is OPEN:

 SQL> select * from dual;
 D
 -
 X

 SQL> select STATUS from V$instance;
 STATUS
 ------------
 OPEN

Of course, we also have to test with an external client, using the VIP.
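
For instance, from any machine with an Oracle client installed, an EZConnect string against the VIP looks like this (the user and password are just placeholders):

 $ sqlplus system/<password>@//192.168.0.204:1521/DTST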


Conclusion: 

We have a working test cluster without special hardware, using the iSCSI target from the previous post.
At this point we can back up the 2 virtual machines (rac1, rac2), as well as the file used for the iSCSI disk, and experiment at will...
