Oracle CRS is not starting: "has a disk HB, but no network HB, DHB has rcfg..." in ocssd.log
Node2 was terminated and its cluster services did not come up automatically. When we tried to start the cluster services on node2, we found the following error in the ocssd.log file:
2018-04-01 00:03:27.519: [    CSSD][1025612096]clssnmLocalJoinEvent: takeover aborted due to cluster member node found on disk
2018-04-01 00:03:28.017: [    CSSD][1020881216]clssgmWaitOnEventValue: after CmInfo State val 3, eval 1 waited 0
2018-04-01 00:03:28.050: [    CSSD][1016133952]clssnmvDHBValidateNCopy: node 1, act-racnode01, has a disk HB, but no network HB, DHB has rcfg 299789247, wrtcnt, 353554863, LATS 492944, lastSeqNo 353554860, uniqueness 1520021542, timestamp 1522521207/2570305164
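For reference, these messages can be pulled straight from the CSS daemon log on the failing node. This is only a sketch: it assumes the default 11.2 Grid home layout used throughout this post, so adjust the path for your environment.
[root@<node2> <node2>]# grep -i "no network HB" /u01/app/11.2.0/grid/log/`hostname -s`/cssd/ocssd.log | tail -5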
If you run a CRS check, you will see the following:
[root@<node2> <node2>]# /u01/app/11.2.0/grid/bin/crsctl check crs
CRS-4638: Oracle High Availability Services is online
CRS-4535: Cannot communicate with Cluster Ready Services
CRS-4530: Communications failure contacting Cluster Synchronization Services daemon
CRS-4534: Cannot communicate with Event Manager
So we came to know that this is due to an interconnect failure between node2 and node1: node1 is alive and still writing its disk heartbeat to the voting disk, but its network heartbeat is not reaching node2, so node2 aborts its join.
You can check ping and SSH between the nodes over the interconnect interface, as in the sketch below. If they are not working, then there is a problem in the network connection between the cluster nodes. Fix the problem and CRS will start correctly.
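A minimal sketch of that check, assuming the private interconnect is eth1 and <node1-priv> is a placeholder for node1's private hostname or IP:
[root@<node2> <node2>]# ping -c 3 -I eth1 <node1-priv>
[root@<node2> <node2>]# ssh <node1-priv> hostname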
But if ping and SSH do work between the nodes over the interconnect interface and ocssd.log still complains about the interconnect heartbeat (no network HB), then the interconnect interface is jammed. You can try to restart it to get it fixed. NOTE! It is usually the interconnect interface of the working node that needs the restart, as the ocssd.log message suggests (it is complaining about node1). For example, if CRS is not starting on node2, then restart the node1 interconnect interface.
In my case, ssh and ping over the private IPs were working fine, so we came to know the interconnect was jammed and went with a restart of the private interconnect. The commands below need to be run on the node that is running successfully; in my case that is node1, which is running without issues, so I restarted the eth1 network on node1.
[root@<node1> <node1>]# ifdown eth1
[root@<node1> <node1>]# ifup eth1
And check that eth1 looks OK:
[root@<node1> <node1>]# ifconfig
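Once the interface is back up, it is worth re-checking connectivity from the failing node before looking at clusterware again (same eth1 and <node1-priv> placeholder assumptions as above):
[root@<node1> <node1>]# ip addr show eth1
[root@<node2> <node2>]# ping -c 3 -I eth1 <node1-priv>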
After the interface restart, the node2 clusterware starts again. You can verify this as shown below.
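A hedged verification sequence on node2: CSS should rejoin on its own once the network heartbeat is back, but if it does not, a manual start can be attempted (commands and Grid home path as used earlier in this post):
[root@<node2> <node2>]# /u01/app/11.2.0/grid/bin/crsctl check crs
[root@<node2> <node2>]# /u01/app/11.2.0/grid/bin/crsctl start crs
[root@<node2> <node2>]# /u01/app/11.2.0/grid/bin/crsctl stat res -t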
