RAC Hang Manager – A New Feature

The purpose of Oracle Hang Manager is to reliably detect and, if hang resolution is enabled, resolve hangs in a timely manner. Over various releases, Hang Manager has been enhanced along with the wait event infrastructure on which it relies. However, it is only in 11.2.0.2 that Hang Manager actually resolves hangs, which it does by terminating sessions and/or processes. This is the default behavior in 11.2.0.2.

Hang Manager will not terminate an instance unless the resolution scope, which is controlled by the initialization parameter _HANG_RESOLUTION_SCOPE, is set to INSTANCE. By default, this parameter is set to PROCESS.

Hang Manager (HM) automatically detects and resolves session hangs clusterwide in RAC. However, HM does not detect hangs that are caused by user application enqueues (TM, TX, UL).
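The two rules above (the resolution scope and the excluded enqueue types) can be summarized in a small Python model. This is only an illustrative sketch: the function names are hypothetical and nothing here calls into Oracle; it simply encodes the behavior described in the text.

```python
# Enqueue types that the text says Hang Manager does not treat as hangs.
APPLICATION_ENQUEUES = {"TM", "TX", "UL"}


def is_detectable(blocking_enqueue=None):
    """Return True if a wait would be considered by Hang Manager.

    `blocking_enqueue` is the two-letter enqueue type the session waits on,
    or None if the wait is not on an enqueue (hypothetical representation).
    """
    return blocking_enqueue not in APPLICATION_ENQUEUES


def may_terminate_instance(resolution_scope="PROCESS"):
    """Return True only when the scope is INSTANCE (the default is PROCESS)."""
    return resolution_scope.upper() == "INSTANCE"
```

With the default scope, may_terminate_instance() returns False, which matches the statement that Hang Manager will not terminate an instance unless the scope is set to INSTANCE.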

How Hang Manager Works

HM runs within the DIA0 process on each instance. Currently, HM operates in five phases: DETECT, HA (global Hang Analyze), ANALYZE, VERIFY and VICTIM (victim selection and hang resolution). In addition, there is an HAONLY phase that runs periodically but is not directly related to HM. All DIA0 processes run each phase at the same time. The master DIA0 process, which resides on the lowest-numbered instance in the cluster, drives the phase transitions.

[TABLE=6]
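To make the phase description concrete, here is a minimal Python sketch of the five phases and of how the master is chosen. The names Phase, master_instance and next_phase are hypothetical, and the strict wrap-around from VICTIM back to DETECT is a simplifying assumption: the real transitions depend on whether a hang is actually found and verified.

```python
from enum import Enum


class Phase(Enum):
    """The five HM phases, in the order listed in the text."""
    DETECT = 1
    HA = 2        # global Hang Analyze
    ANALYZE = 3
    VERIFY = 4
    VICTIM = 5    # victim selection and hang resolution


def master_instance(instance_numbers):
    """Return the instance whose DIA0 drives phase transitions.

    Per the text, this is the lowest-numbered instance in the cluster.
    """
    if not instance_numbers:
        raise ValueError("cluster has no instances")
    return min(instance_numbers)


def next_phase(current):
    """Advance to the next phase (assumed linear order with wrap-around)."""
    phases = list(Phase)
    return phases[(phases.index(current) + 1) % len(phases)]
```

For a three-node cluster with instances 1, 2 and 3, master_instance([3, 1, 2]) picks instance 1, and every DIA0 process would follow the phase chosen by that master.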

Hang Manager Trace Files

Hang Manager creates a number of trace files in the trace directory.

  • DIA0 trace files – These are named in the format _dia0_.trc and are located in the bdump location. The amount of information written to this file depends on whether tracing is enabled or not. Even if tracing is not enabled, some information such as ignored hangs, self-resolved hangs and ORA-32701 incident information is written to this trace file.
  • Verified Hang DIA0 trace files – _dia0__.trc – These are located in the bdump location. Each trace file has a preamble that briefly describes how to read the information in the file, followed by the hang and chain information of one or more verified hangs. It also includes a dump of all of the chains that were active when the hang was first detected during the HA phase. This trace file will not exist if the hang is resolved as soon as it is verified.
  • ORA-32701 incident trace file – _dia0__i.trc – This is the trace file that is created when an ORA-32701 incident occurs. This incident is triggered when a hang is about to be resolved. This file has more details about the incident.
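Since ORA-32701 information ends up in the DIA0 trace files, a quick way to check whether Hang Manager has acted is to search those files for the error code. The following Python sketch does this. The helper names and the glob pattern "*dia0*.trc" are assumptions made for illustration; adjust the pattern to the actual file names in your trace directory.

```python
import re
from pathlib import Path

# Match the error code as a whole token (e.g. not "ORA-327011").
ORA_32701 = re.compile(r"\bORA-32701\b")


def find_ora_32701_lines(trace_text):
    """Return (line_number, line) pairs that mention ORA-32701."""
    return [
        (number, line.rstrip())
        for number, line in enumerate(trace_text.splitlines(), start=1)
        if ORA_32701.search(line)
    ]


def scan_trace_dir(trace_dir):
    """Map each DIA0 trace file name to its ORA-32701 hits.

    Files without any hit are omitted. The glob pattern is an assumption.
    """
    results = {}
    for path in sorted(Path(trace_dir).glob("*dia0*.trc")):
        hits = find_ora_32701_lines(path.read_text(errors="replace"))
        if hits:
            results[path.name] = hits
    return results
```

Calling scan_trace_dir() with the path of the bdump location returns a dictionary keyed by trace file name, which makes it easy to see which DIA0 files to open first.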

X$ Views Related to Hang Manager

There are two X$ tables related to Hang Manager: X$KJZNHANGS and X$KJZNHANGSES. If the DIA0 process restarts but the instance survives, all of the hang information in these tables is kept. This information can be used for historical analysis.