Oracle RAC Internals

Oracle Clusterware is the foundation of Oracle Real Application Clusters (Oracle RAC). It is cluster management software that runs on top of the operating system, maintains cluster coherence between the various nodes, and manages components such as shared disks, node membership and instance membership. In earlier versions of RAC, Oracle only supported nodes with direct access to shared storage and required an ASM instance to be up and running on every node all the time. In Oracle Clusterware 12c release 2 (12.2), all clusters are configured as Oracle Flex Clusters, meaning that a cluster is configured with one or more Hub Nodes, which in turn can support a large number of Leaf Nodes. Oracle Flex ASM is a requirement for an Oracle Flex Cluster.
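
Whether a 12.2 cluster runs as a Flex Cluster, and which role each node plays, can be checked with commands along the following lines. This is only a sketch: the exact output varies by version, and the commands are typically run as root or as the Grid Infrastructure owner.

# crsctl get cluster mode status 
# crsctl get node role config -all 
# asmcmd showclustermode 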

In the new architecture, Hub Nodes are tightly connected and are the only nodes that require direct access to shared storage. They behave much like the nodes of older RAC clusters, with database and ASM instances spread across them. An Oracle Flex Cluster may operate with one or many Hub Nodes; other node types are optional and can only exist as members of a cluster that includes at least one Hub Node. Oracle Clusterware relies on two components that serve as the main pillars of the cluster: the voting files, which record node membership information, and the Oracle Cluster Registry (OCR), which records cluster configuration information. Voting files and the OCR must reside on shared storage available to all cluster member nodes.
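
The configured voting files and OCR locations can be verified with standard tooling; a minimal sketch, run as root:

# crsctl query css votedisk 
# ocrcheck -config 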

Oracle Clusterware consists of two separate technology stacks: an upper technology stack anchored by the Cluster Ready Services (CRS) daemon (CRSD) and a lower technology stack anchored by the Oracle High Availability Services daemon (OHASD).
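
One way to see the two stacks side by side is to compare the resources each of them manages; a sketch, assuming the environment points at the Grid home:

# crsctl stat res -t -init 
# crsctl stat res -t 

The first command lists the lower-stack resources managed by OHASD (ora.cssd, ora.crsd, ora.asm, ora.ctssd and so on), while the second lists the resources managed by CRSD, such as databases, listeners, VIPs and services.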

From 11.2 onwards, Oracle introduced an agent framework that forms the backbone of Oracle Clusterware resource management. These agents are multi-threaded daemons that implement entry points for multiple resource types and spawn new processes for different users. The main agents are oraagent, orarootagent and cssdagent/cssdmonitor; in addition there can be an application agent and a script agent. The two most important agents are oraagent and orarootagent, and both ohasd and crsd employ one oraagent and one orarootagent each.
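
The agent processes themselves are visible at the operating system level; as a rough illustration (process names and paths vary by platform and release):

# ps -ef | grep -E 'oraagent|orarootagent|cssdagent|cssdmonitor' | grep -v grep 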

The Cluster Ready Services Technology Stack

Cluster Ready Services (CRS): The CRSD manages cluster resources based on the configuration information that is stored in the OCR for each resource. Process: crsd.bin
Cluster Synchronization Services (CSS): Manages the cluster configuration by controlling which nodes are members of the cluster and by notifying members when a node joins or leaves the cluster. Processes: ocssd.bin, cssdmonitor, cssdagent
Oracle ASM: Disk management for Oracle Clusterware and Oracle Database. Resource: ora.asm
Cluster Time Synchronization Service (CTSS): Time management in a cluster for Oracle Clusterware, introduced in 11.2. The CTSS synchronizes the time on all of the nodes in a cluster to match the time setting on the CTSS master node. When Oracle Clusterware is installed, CTSS is installed as part of the software package. During installation, the Cluster Verification Utility (CVU) determines whether the network time protocol (NTP) is in use on any node in the cluster. If Oracle Clusterware finds that NTP is running or has been configured, then NTP is not affected by the CTSS installation; instead, CTSS starts in observer mode (see the example after this list). Process: octssd.bin
Event Management (EVM): A background process that publishes the events that Oracle Clusterware creates. Processes: evmd.bin, evmlogger.bin
Grid Naming Service (GNS): From 11.2 onwards, DHCP is supported both for the private interconnect and for almost all virtual IP addresses on the public network. For clients outside the cluster to find the virtual hosts in the cluster, Oracle provides the Grid Naming Service (GNS), which works with any higher-level DNS to provide resolvable names to external clients. Process: gnsd
Oracle Agent (oraagent): Performs start/stop/check/clean actions for ora.asm, ora.eons, ora.LISTENER.lsnr, SCAN listeners and ora.ons; performs start/stop/check/clean actions for service, database and disk group resources; receives CRS state change events; dequeues RLB events and enqueues HA events for OCI and ODP.NET clients. Process: oraagent.bin
Oracle Notification Service (ONS): A publish-and-subscribe service for communicating Fast Application Notification (FAN) events. Process: ons
Oracle Root Agent (orarootagent): Performs start/stop/check/clean actions for GNS, VIP, SCAN VIP and network resources. Process: orarootagent.bin
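
To illustrate the CTSS behaviour described above, the service mode (active versus observer) and cluster-wide clock synchronization can be checked roughly as follows, with the Grid Infrastructure environment set:

# crsctl check ctss 
# cluvfy comp clocksync -n all 

When NTP is configured on the nodes, crsctl check ctss typically reports that CTSS is running in observer mode.
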
The Oracle High Availability Services Technology Stack
Oracle High Availability Services (OHASD): OHASD anchors the lower stack and is the daemon that starts every other daemon in the Oracle Clusterware stack on a node. The entry point for OHASD is /etc/inittab, which executes the /etc/init.d/ohasd and /etc/init.d/init.ohasd control scripts. The /etc/init.d/ohasd script is an RC script that implements the start and stop actions. The /etc/init.d/init.ohasd script is the OHASD framework control script, which spawns the Grid_home/bin/ohasd.bin executable. Process: ohasd.bin
appagent: Protects resources of the application resource type used in previous versions of Oracle Clusterware.
Cluster Logger Service (ologgerd): Receives the system metrics gathered by the System Monitor Service (osysmond) on all nodes and persists them in the Cluster Health Monitor repository. Process: ologgerd.bin
Grid Interprocess Communication (GIPC): The CSS layer uses Grid IPC. GIPC supports the use of multiple NICs for a single communications link, for example CSS/NM internode communications. Process: gipcd.bin
Grid Plug and Play (GPNPD): GPnPD was introduced in 11.2. It provides access to the GPnP profile and coordinates updates to the profile among the nodes of the cluster to ensure that all nodes have the most recent profile. The GPnP daemon is spawned by the OHASD oraagent, and it must be running for the CRS stack to start. On startup it detects a running gpnpd, connects back to oraagent, opens the wallet/profile, opens local and remote endpoints, advertises the remote endpoint with mdnsd, starts the OCR availability check, discovers remote gpnpds, equalizes the profile, and starts to service clients (the profile itself can be dumped as shown in the example after this list). Process: gpnpd.bin
Multicast Domain Name Service (mDNS): Used by Grid Plug and Play to locate profiles in the cluster, and by GNS to perform name resolution. Process: mdnsd.bin
Oracle Agent (oraagent): Performs start/stop/check/clean actions for ora.asm, ora.evmd, ora.gipcd, ora.gpnpd and ora.mdnsd. Process: oraagent.bin
Oracle Root Agent (orarootagent): Performs start/stop/check/clean actions for ora.crsd, ora.ctssd, ora.diskmon and ora.drivers.acfs. Process: orarootagent.bin
scriptagent: Protects resources that are managed through script-based (shell or batch) action scripts.
System Monitor Service (osysmond): Monitors and gathers system metrics periodically, runs as a real-time process, runs validation rules against the system metrics, marks color-coded alerts based on thresholds, sends the data to the master logger daemon, and logs data to local disk if sending fails (see the example after this list). Process: osysmond.bin
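
Two of the components above can be inspected directly: the GPnP profile served by gpnpd can be dumped with gpnptool, and the metrics collected by osysmond/ologgerd (Cluster Health Monitor) can be queried with oclumon. A rough sketch, with the Grid Infrastructure environment set; the five-minute window is only an example:

# gpnptool get 
# oclumon dumpnodeview -allnodes -last "00:05:00" 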

Oracle Cluster Registry (OCR)

As of 11.2, the OCR can also be stored in ASM. The ASM partnership and status table (PST) is replicated on multiple disks and is extended to store the OCR. Consequently, the OCR can tolerate the loss of the same number of disks as the underlying disk group, and it can be relocated/rebalanced in response to disk failures.

In order to store an OCR in a disk group, the disk group uses a special file type called 'ocr'. The default configuration location is /etc/oracle/ocr.loc:

# cat /etc/oracle/ocr.loc 
ocrconfig_loc=+DATA 
local_only=FALSE 

From a user and maintenance perspective, the rest remains the same. The OCR can only be configured in ASM once the cluster has been completely migrated to 11.2 (crsctl query crs activeversion >= 11.2.0.1.0). Oracle still supports mixed configurations, so one OCR could be stored in ASM and another on a supported NAS device; up to five OCR locations are supported as of 11.2.0.1. Raw and block devices are no longer supported for either the OCR or the voting files.
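
Additional OCR locations and automatic backups are managed with ocrconfig; a minimal sketch, run as root, where +OCR2 is only a placeholder disk group name:

# ocrconfig -add +OCR2 
# ocrconfig -showbackup 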

The OCR disk group is automatically mounted by the ASM instance during startup; the dependency between CRSD and ASM is maintained by OHASD.

Oracle Local Registry (OLR)

The OLR, similar in structure to the OCR, is a node-local repository managed by OHASD. The configuration data in the OLR pertains to the local node only and is not shared with the other nodes.

The configuration is stored in '/etc/oracle/olr.loc' (on Linux) or the equivalent on other operating systems. The default location after installing Oracle Clusterware is:

RAC: Grid_home/cdata/<hostname>.olr 
Oracle Restart: Grid_home/cdata/localhost/<hostname>.olr 

The information stored in the OLR is needed by OHASD to start or join a cluster; this includes data about GPnP wallets, clusterware configuration and version information.

OLR keys have the same properties as OCR keys and the same tools are used to either check or dump them. 

To see the OLR location, run the command:

# ocrcheck -local -config 
Oracle Local Registry configuration is : 
Device/File Name : Grid_home/cdata/node1.olr
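
Along the same lines, the OLR can be checked and dumped with the familiar OCR tools; a sketch, run as root (without a file name, ocrdump writes to a file called OCRDUMPFILE in the current directory):

# ocrcheck -local 
# ocrdump -local 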

Heartbeat

CSS uses two main heartbeat mechanisms for cluster membership: the network heartbeat (NHB) and the disk heartbeat (DHB). The NHB is used to detect loss of cluster connectivity; the DHB is mainly used for network split-brain resolution.

Network Heartbeat (NHB)

The NHB is sent over the private network interface that was configured as the private interconnect during Clusterware installation. CSS sends an NHB every second from each node to all the other nodes in the cluster and, every second, receives an NHB from each remote node. The NHB is also sent to the cssdmonitor and the cssdagent. The NHB contains timestamp information from the local node, which the remote node uses to work out when the NHB was sent.
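
The tolerance for missed network heartbeats is the CSS misscount parameter, which defaults to 30 seconds on Linux; it can be inspected as follows (run as root):

# crsctl get css misscount 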

Disk Heartbeat (DHB)

The DHB is required for split-brain resolution. It contains a timestamp of the local time in UNIX epoch seconds, as well as a millisecond timer. The DHB is a mechanism for deciding whether a node is still alive: when the DHB is missing for too long, the node is assumed to be dead.
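
The corresponding tolerance for missed disk heartbeats is the CSS disktimeout parameter, which defaults to 200 seconds; a quick check:

# crsctl get css disktimeout 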

–to be continued
