Saturday, October 27, 2012

Test HA 2: Veritas cluster/Oracle setup



Following our test iSCSI setup without hardware, here is an example of a typical VCS/Oracle fail-over setup. This is on RHEL5.

setup heartbeat network

You need 2 more virtual network cards on node1 and node2, preferably on separate logical networks:
If needed, re-run the VMware configuration (/usr/bin/vmware-config.pl) to create 2 local 'host-only' subnets, 192.168.130.0 and 192.168.131.0, because I suspect LLT may not work properly on the same bridged network.
Then add 2 more network cards to each VM.

Assign the addresses (system-config-network). In our example we will use:

node1:
 192.168.130.160/24  (eth1)
 192.168.131.160/24  (eth2)

node2:
 192.168.130.161/24  (eth1)
 192.168.131.161/24  (eth2)

Run system-config-network and configure the interfaces accordingly, then run '/etc/init.d/network restart'.
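
For reference, the resulting interface configuration files on node1 should look roughly like this (system-config-network writes them for you; shown only as a sketch):

 # cat /etc/sysconfig/network-scripts/ifcfg-eth1
 DEVICE=eth1
 BOOTPROTO=none
 IPADDR=192.168.130.160
 NETMASK=255.255.255.0
 ONBOOT=yes
 (same idea for eth2 with 192.168.131.160, and for node2 with the .161 addresses)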

from node1 perform basic tests:
 ping 192.168.130.161

 ping 192.168.131.161

VCS prerequisites


Note: this is for VCS 5.1 on RHEL 5.5; check the install manual for your exact versions.

# yum install compat-libgcc compat-libstdc++ glibc-2.5 libgcc glibc libgcc libstdc++ java-1.4.2

append to /etc/hosts, for easier admin, on each node:
 192.168.0.201 node1
 192.168.0.202 node2


ssh keys:


 ssh-keygen -t dsa  (on each node)

 node1# scp /root/.ssh/id_dsa.pub node2:/root/.ssh/authorized_keys2
 node2# scp /root/.ssh/id_dsa.pub node1:/root/.ssh/authorized_keys2

Verify you can connect without a password from node1 to node2, and the other way around:
node1# ssh node2

Update .bash_profile

PATH=/opt/VRTS/bin:$PATH; export PATH
MANPATH=/usr/share/man:/opt/VRTS/man; export MANPATH

Kernel panic: make the node reboot automatically 10 seconds after a panic:
 sysctl -w kernel.panic=10
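
Optionally, to have this survive a reboot, the same setting can also be appended to /etc/sysctl.conf:

 # cat >> /etc/sysctl.conf
 kernel.panic = 10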

Precheck (run from the VCS CD, or from the extracted tar.gz):

 ./installvcs -precheck node1 node2


VCS install

 ./installvcs

choices:

 I : install

 1)  Veritas Cluster Server (VCS)

 3)  Install all Veritas Cluster Server rpms - 322 MB required

 Enter the 64 bit RHEL5 system names separated by spaces: [q,?] node1 node2

(enter a license key, or run without one for the 60-day evaluation period)

 Would you like to configure VCS on node1 node2 [y,n,q] (n) y

 Enter the unique cluster name: [q,?] vmclu160
 Enter a unique Cluster ID number between 0-65535: [b,q,?] (0) 160

 Enter the NIC for the first private heartbeat link on node1: [b,q,?] eth1
 eth1 has an IP address configured on it. It could be a public NIC on node1.
 Are you sure you want to use eth1 for the first private heartbeat link?
 [y,n,q,b,?] (n) y
 Is eth1 a bonded NIC? [y,n,q] (n)
 Would you like to configure a second private heartbeat link? [y,n,q,b,?] (y)
 Enter the NIC for the second private heartbeat link on node1: [b,q,?] eth2
 eth2 has an IP address configured on it. It could be a public NIC on node1.
 Are you sure you want to use eth2 for the second private heartbeat link?
 [y,n,q,b,?] (n) y
 Is eth2 a bonded NIC? [y,n,q] (n)
 Would you like to configure a third private heartbeat link? [y,n,q,b,?] (n) n

 Do you want to configure an additional low priority heartbeat link?
 [y,n,q,b,?] (n) y
 Enter the NIC for the low priority heartbeat link on node1: [b,q,?] (eth0)
 Is eth0 a bonded NIC? [y,n,q] (n)
 Are you using the same NICs for private heartbeat links on all systems?
 [y,n,q,b,?] (y) y


 Cluster information verification:
        Cluster Name:      vmclu160
        Cluster ID Number: 160
        Private Heartbeat NICs for node1:
                link1=eth1
                link2=eth2
        Low Priority Heartbeat NIC for node1: link-lowpri=eth0
        Private Heartbeat NICs for node2:
                link1=eth1
                link2=eth2
        Low Priority Heartbeat NIC for node2: link-lowpri=eth0
 Is this information correct? [y,n,q,b,?] (y) y


 Virtual IP can be specified in RemoteGroup resource, and can be used to
 connect to the cluster using Java GUI
 The following data is required to configure the Virtual IP of the Cluster:
        A public NIC used by each system in the cluster
        A Virtual IP address and netmask
 Do you want to configure the Virtual IP? [y,n,q,?] (n) y
 Active NIC devices discovered on node1: eth0 eth1 eth2
 Enter the NIC for Virtual IP of the Cluster to use on node1: [b,q,?] (eth0)
 Is eth0 to be the public NIC used by all systems? [y,n,q,b,?] (y)
 Enter the Virtual IP address for the Cluster: [b,q,?] 192.168.0.203
 Enter the NetMask for IP 192.168.0.203: [b,q,?] (255.255.255.0)
 Would you like to configure VCS to use Symantec Security Services? [y,n,q] (n)

 Do you want to set the username and/or password for the Admin user
 (default username = 'admin', password='password')? [y,n,q] (n) y
 Enter the user name: [b,q,?] (admin)
 Enter the password:
 Enter again:
 Do you want to add another user to the cluster? [y,n,q] (n) n

For this test setup, answer 'n' to the SMTP notification and Global Cluster questions, and let the installer start the cluster.


node1# hastatus -sum
 -- SYSTEM STATE
 -- System               State                Frozen
 A  node1                RUNNING              0
 A  node2                RUNNING              0
 -- GROUP STATE
 -- Group           System               Probed     AutoDisabled    State       
 B  ClusterService  node1                Y          N               ONLINE      
 B  ClusterService  node2                Y          N               OFFLINE
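
To double-check that the private heartbeat links came up as expected, the standard LLT and GAB status commands can be used (output omitted here):

 node1# lltstat -nvv | more     (lists the LLT links and peer MAC addresses)
 node1# gabconfig -a            (port a and port h membership should show both nodes)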


Oracle binary install


Reference: Database Installation Guide for Linux

On each node:

# yum install gcc elfutils-libelf-devel glibc-devel libaio-devel libstdc++-devel unixODBC unixODBC-devel gcc-c++

# groupadd dba
# groupadd oinstall
# groupadd asmdba
# useradd -m oracle -g oinstall -G dba,asmdba
# passwd oracle

 # cat >>  /etc/security/limits.conf
 oracle          hard    nofile          65536
 oracle          soft    nofile          65536
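
For completeness, the 11gR2 installation guide also lists process and stack limits for the oracle user; roughly the following (double-check the exact values against the guide for your version):

 oracle          soft    nproc           2047
 oracle          hard    nproc           16384
 oracle          soft    stack           10240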

For RHEL5.5:
 # cat >>  /etc/sysctl.conf
 kernel.sem = 250        32000   100      128
 fs.file-max = 6815744
 net.ipv4.ip_local_port_range = 9000    65500
 net.core.rmem_default = 262144
 net.core.rmem_max = 4194304
 net.core.wmem_default = 262144
 net.core.wmem_max = 4194304
 fs.aio-max-nr = 1048576

 # /sbin/sysctl -p

 # mkdir /opt/oracle
 # chown oracle.oinstall /opt/oracle/

As the oracle user:
In this example setup, the Oracle binaries are installed locally on both nodes, under /opt/oracle.
Extract the Oracle 11gR2 distribution, make sure you have a working X connection, and run:
 $ ./runInstaller
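
If you are running the installer from a remote workstation, X11 forwarding over ssh is one simple way to get that X connection (just a sketch; it assumes X11Forwarding is enabled in sshd_config on node1):

 workstation$ ssh -X oracle@node1
 node1$ echo $DISPLAY           (should show something like localhost:10.0)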

install database software only
single instance database installation
enterprise
(select options -> only partitioning)
oracle base=/opt/oracle/app/oracle
sw loc=/opt/oracle/app/oracle/product/11.2.0/dbhome_1
leave other defaults


 $ cat >> ~/.bash_profile
 export ORACLE_HOME=/opt/oracle/app/oracle/product/11.2.0/dbhome_1
 export PATH=$ORACLE_HOME/bin:$PATH

Oracle instance install


mount the shared disk on /database (as root)
# mkdir /database
# chown oracle.dba /database      (do these on both nodes)

# mount /dev/sdd1 /database/      (do this only on node1, to create the instance; do not add it to /etc/fstab, VCS will manage this mount later)


create the instance with dbca (as oracle)

$ dbca

create a test database, especially set:

 Use common location for all database files: /database

 $ export ORACLE_SID=DTST    (and add this to oracle .bash_profile on both nodes)


 $ sqlplus "/ as sysdba"
 SQL> select * from dual;
 SQL> shutdown immediate

copy spfile to the other node (as oracle):
$ scp /opt/oracle/app/oracle/product/11.2.0/dbhome_1/dbs/spfileDTST.ora node2:/opt/oracle/app/oracle/product/11.2.0/dbhome_1/dbs
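
Note that the VCS Oracle resource configured further down points to a Pfile (initDTST.ora). If dbca only created an spfile, a minimal pfile that simply points to it can be created in the dbs directory on both nodes, for example:

 $ cat > /opt/oracle/app/oracle/product/11.2.0/dbhome_1/dbs/initDTST.ora
 SPFILE='/opt/oracle/app/oracle/product/11.2.0/dbhome_1/dbs/spfileDTST.ora'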

Copy also the directory structure created for the audit logs:
$ scp -r /opt/oracle/app/oracle/admin/DTST node2:/opt/oracle/app/oracle/admin/DTST
(it seems the /opt/oracle/app/oracle/diag/rdbms/dtst structure for traces etc. is created automatically)


Set $ORACLE_HOME/network/admin/listener.ora with the virtual IP we will use, here 192.168.0.204:

 LISTENER =
  (DESCRIPTION_LIST =
    (DESCRIPTION =
      (ADDRESS = (PROTOCOL = TCP)(HOST = 192.168.0.204)(PORT = 1521))
    )
  )
 ADR_BASE_LISTENER = /opt/oracle/app/oracle
 SID_LIST_LISTENER =
  (SID_LIST =
    (SID_DESC =
      (GLOBAL_DBNAME = DTST)
      (ORACLE_HOME =/opt/oracle/app/oracle/product/11.2.0/dbhome_1)
      (SID_NAME = DTST)
    )
  )

and tnsnames.ora:

 DTST =
  (DESCRIPTION =
    (ADDRESS_LIST =
      (ADDRESS = (PROTOCOL = TCP)(HOST = 192.168.0.204)(PORT = 1521))
    )
    (CONNECT_DATA =
      (SERVICE_NAME = DTST)
    )
  )

copy both files to the other node (as oracle):
$ scp /opt/oracle/app/oracle/product/11.2.0/dbhome_1/network/admin/*.ora node2:/opt/oracle/app/oracle/product/11.2.0/dbhome_1/network/admin

VCS service group config for Oracle

umount shared disk

# umount /database


update /etc/VRTSvcs/conf/config/main.cf and add the following service group:


 group OraGroup (
        SystemList = { node1 = 0, node2 = 1 }
        AutoStartList = { node1, node2 }
        )
        DiskReservation DR_ora (
                Disks @node1 = { "/dev/sdd" }
                Disks @node2 = { "/dev/sdd" }
                FailFast = 1
                )
        Mount Mount_oraprod_dfiles (
                MountPoint = "/database"
                BlockDevice = "/dev/sdd1"
                FSType = ext3
                FsckOpt = "-n"
                )
        IP IP_oraprod (
                Device = eth0
                Address = "192.168.0.204"
                NetMask = "255.255.255.0"
                )
        NIC NIC_oraprod (
                Device = eth0
                )
        Netlsnr LSNR_oraprod_lsnr (
                Owner = oracle
                Home = "/opt/oracle/app/oracle/product/11.2.0/dbhome_1"
                TnsAdmin = "/opt/oracle/app/oracle/product/11.2.0/dbhome_1/network/admin"
                Listener = LISTENER
                )
        Oracle ORA_oraprod (
                Sid = DTST
                Owner = oracle
                Home = "/opt/oracle/app/oracle/product/11.2.0/dbhome_1"
                Pfile = "/opt/oracle/app/oracle/product/11.2.0/dbhome_1/dbs/initDTST.ora"
                StartUpOpt = STARTUP
                )
        IP_oraprod requires NIC_oraprod
        LSNR_oraprod_lsnr requires IP_oraprod
        LSNR_oraprod_lsnr requires ORA_oraprod
        Mount_oraprod_dfiles requires DR_ora
        ORA_oraprod requires Mount_oraprod_dfiles


Note: this is a simple setup: no Veritas Volume Manager, and no DetailMonitoring of the DB (i.e. a hung instance would not be detected as a failure).
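
For reference, detail monitoring could be added later on the Oracle resource with attributes along the following lines (a sketch only: the monitoring user and table are hypothetical, the Pword value has to be generated with 'vcsencrypt -agent', and the attribute names should be verified against the VCS Oracle agent guide for your version):

        Oracle ORA_oraprod (
                ...
                User = vcsuser
                Pword = <encrypted password>
                Table = vcstable
                MonScript = "./bin/Oracle/SqlTest.pl"
                DetailMonitor = 1
                )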

check:
# hacf -verify /etc/VRTSvcs/conf/config/

Stop/start the cluster to re-read the main.cf (in production we could instead use 'hacf -cftocmd .../config/' and run the generated main.cmd, etc.):
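
For reference, the same service group could also be built online without stopping the cluster, using the standard ha commands (roughly what the generated main.cmd would contain); a partial sketch:

 # haconf -makerw
 # hagrp -add OraGroup
 # hagrp -modify OraGroup SystemList node1 0 node2 1
 # hagrp -modify OraGroup AutoStartList node1 node2
 # hares -add NIC_oraprod NIC OraGroup
 # hares -modify NIC_oraprod Device eth0
 # hares -add IP_oraprod IP OraGroup
 # hares -modify IP_oraprod Device eth0
 # hares -modify IP_oraprod Address 192.168.0.204
 # hares -modify IP_oraprod NetMask 255.255.255.0
 # hares -link IP_oraprod NIC_oraprod
 (same pattern for the DiskReservation, Mount, Netlsnr and Oracle resources,
  remembering to set Enabled = 1 on each resource)
 # haconf -dump -makero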

# hastop -all
# hastart

# hastart         (on node2)

verify log while starting:

# tail -f /var/VRTSvcs/log/engine_A.log

# hastatus -sum

Failover quick test


Simulate a problem on oracle

$ su - oracle
$ sqlplus "/ as sysdba"
 SQL> shutdown immediate

and verify that the fail-over works.
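
Independently of crashing the instance, a controlled switch-over can also be tested with the standard hagrp commands:

 node1# hagrp -switch OraGroup -to node2
 node1# hagrp -state OraGroup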

Connect for real to the database on node2 and check that it is OPEN:

 SQL> select * from dual;
 D
 -
 X

 SQL> select STATUS from V$instance;
 STATUS
 ------------
 OPEN

Of course, we also have to test with an external client, using the VIP.
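
For instance, from any machine with an Oracle client installed, an EZConnect string against the VIP looks like this (the user and password are just placeholders):

 $ sqlplus system/<password>@//192.168.0.204:1521/DTST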


Conclusion: 

We have a working test cluster without special hardware, using the iSCSI target from the previous post.
At this point we can back up the 2 virtual machines (rac1, rac2), as well as the file used for the iSCSI disk, and experiment at will...
