Problem Statement
The application team informed us that the regular EOD (end-of-day) jobs were not progressing as expected.
Diagnosis
The application team was executing EOD operations and, based on their observation, job progress had stagnated. We were roped in to troubleshoot the issue and identify what had caused it.
When we initially looked at the database, the EOD sessions were waiting on “library cache pin”.
The excerpt below from the session waits shows this (the query we used to monitor session waits is shared in the comments at the end of this post).
   SID EVENT                     P1                       P1RAW            STATE                SQL_ID        BLOCKING_SESSION
------ ------------------------- ------------------------ ---------------- -------------------- ------------- ----------------
  3847 Disk file operations I/O  8                        0000000000000008 WAITED SHORT TIME    c6quqbusxxxxx
       ------------------------- 1

  5311 db file sequential read   2373                     0000000000000945 WAITING              5nv8stdvxxxxx
  5958                           3451                     0000000000000D7B WAITING              1dckrmkgxxxxx
  8812                           1757                     00000000000006DD WAITED SHORT TIME    a4a4r7cpxxxxx
  2660                           3771                     0000000000000EBB WAITING              5m3t39ppxxxxx
  6761                           3215                     0000000000000C8F WAITED SHORT TIME    2t5nfwhqxxxxx
       ------------------------- 5

   987 library cache pin         8070450556365212160      700000059D808A00 WAITING              2by0bj2fxxxxx
  3245                           12682136575630453440     B0000005CF712AC0 WAITING              g0bggfqrxxxxx
  4270                           12682136575630453440     B0000005CF712AC0 WAITING              85gnympxxxxxx             7238
       ------------------------- 3
As evident from the above output, the blocking session was 7238.
The application team confirmed that this was their queueing query and that it could be killed.
Sample Query
SQL_ID        SQL_TEXT                                 PLAN_HASH_VALUE EXECUTIONS AVG_TIME_SEC    AVG_LIO    AVG_ROW
------------- ---------------------------------------- --------------- ---------- ------------ ---------- ----------
85gnympxxxxxx Begin :fResult := XXXXXX.XXXXXXXXXXXXXXX               0          1   23205.3102  727922023          2
              XX.XXXXXXXXXXXXXXXXXX( 1234,1,'q_xxxxxxx
              xxxxx'); End;
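For context, per-execution averages like those above can be pulled from V$SQLAREA with a query along these lines (a sketch, not the exact script we used; column names are from V$SQLAREA):

-- Sketch: per-execution elapsed time, logical I/O and rows for one SQL_ID.
select sql_id,
       substr(sql_text,1,40) sql_text,
       plan_hash_value,
       executions,
       round(elapsed_time/1000000/nullif(executions,0),4) avg_time_sec,
       round(buffer_gets/nullif(executions,0))            avg_lio,
       round(rows_processed/nullif(executions,0))         avg_row
from   v$sqlarea
where  sql_id = '85gnympxxxxxx';   -- SQL_ID of the suspect statement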
After killing the session, the expected relief did not come our way: we were baffled to see another HOLDER appear. It turned into a vicious cycle of killing one such session only to get trapped by another HOLDER session.
Later, the application team confirmed that it was an application job which would keep respawning whenever it was killed.
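For reference, a blocking session is usually terminated with ALTER SYSTEM KILL SESSION, using the SID and SERIAL# looked up from V$SESSION (the values below are purely illustrative):

-- Illustrative SID and SERIAL#; fetch the real pair from V$SESSION first.
alter system kill session '7238,40343' immediate;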
At this stage we did not yet know why this query was blocking the EOD process and holding it over “library cache pin”.
Witnessing a hang-like situation, we decided to take a HANGANALYZE dump to diagnose it further.
oradebug setmypid
oradebug unlimit
oradebug hanganalyze 4
Hang Analysis in /oradumps/dbname/diag/diag/rdbms/dbname/INSTNAME/trace/INSTNAME_ora_29001.trc
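As a side note, if the trace file path is not captured from the ORADEBUG output above, the same SYSDBA session can print it with:

oradebug tracefile_name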
The trace file listed the chains most likely to have caused the hang:
[a] Chain 1 Signature: 'wait for unread message on broadcast channel'<='library cache pin'
    Chain 1 Signature Hash: 0xcd5dcbea
[b] Chain 2 Signature: 'control file sequential read'
    Chain 2 Signature Hash: 0x258411b1
[c] Chain 3 Signature:
    Chain 3 Signature Hash: 0x673a0128
We then looked at Chain 1 in more detail.
-------------------------------------------------------------------------------
Chain 1:
-------------------------------------------------------------------------------
    Oracle session identified by:
    {
                instance: 1 (dbname.instname)
                   os id: 23105
              process id: 264, oracle@hostname
              session id: 5188
        session serial #: 46497
    }
    is waiting for 'library cache pin' with wait info:
    {
                      p1: 'handle address'=0xb0000005cf712ac0
                      p2: 'pin address'=0x40000004541b9180
                      p3: '100*mode+namespace'=0x46eb00020003
            time in wait: 10 min 40 sec
           timeout after: 4 min 19 sec
                 wait id: 21
                blocking: 0 sessions
             current sql: BEGIN dbms_lock.sleep(60); END;
             short stack: ksedsts()+528<-ksdxcb()+960<-sspuser()+216<-__sighndlr()+12<-call_user_handler()+1028<-sigacthandler()+224<-_portfs()+8<-port_get()+28<-sskgpwwait()+1076<-skgpwwait()+180<-ksliwat()+1984<-kslwaitctx()+148<-kjusuc()+9160<-ksipgetctxia()+9396<-ksixpgetia()+180<-kqlmPin()+2996<-kglpnal()+5016<-kglpin()+1272<-kqlvld()+1308<-kglgob()+2812<-kgiinb()+1172<-pfri7_inst_body_common()+248<-pfri3_inst_body()+28<-pfrrun()+2520<-plsql_run()+560<-peicnt()+256<-kkxexe()+744<-opiexe()+22648<-kpoal8()+2384<-opiodr()+1100<-ttcpip()
            wait history:
              * time between current wait and wait #1: 0.000110 sec
              1. event: 'library cache revalidation'
                 time waited: 1.000049 sec
                 wait id: 20   p1: ' '=0x0   p2: ' '=0x0   p3: ' '=0x0
              * time between wait #1 and #2: 0.000056 sec
              2. event: 'library cache pin'
                 time waited: 15 min 0 sec
                 wait id: 19   p1: 'handle address'=0xb0000005cf712ac0
                               p2: 'pin address'=0x30000004541b9300
                               p3: '100*mode+namespace'=0x46eb00020002
              * time between wait #2 and #3: 0.000062 sec
              3. event: 'library cache revalidation'
                 time waited: 1.000032 sec
                 wait id: 18   p1: ' '=0x0   p2: ' '=0x0   p3: ' '=0x0
    }
    and is blocked by
 => Oracle session identified by:
    {
                instance: 1 (dbname.instname)
                   os id: 21431
              process id: 1478, oracle@hostname (J007)
              session id: 7238
        session serial #: 40343
    }
    which is waiting for 'wait for unread message on broadcast channel' with wait info:
    {
                      p1: 'channel context'=0xb12fb4d08
                      p2: 'channel handle'=0xb335bbce0
            time in wait: 0.484074 sec
      heur. time in wait: 6.701969 sec
           timeout after: 9.515926 sec
                 wait id: 5792240
                blocking: 1 session
             current sql: Begin :fResult := XXXXXX.XXX_XXXXXXXXXXXXX.XXXXXXXXXXXXXXXXXX( 1234,1,'q_xxxxxxxxxxxx'); End;
             short stack: ksedsts()+528<-ksdxcb()+960<-sspuser()+216<-__sighndlr()+12<-call_user_handler()+1028<-sigacthandler()+224<-_portfs()+8<-port_get()+28<-sskgpwwait()+1076<-skgpwwait()+180<-ksliwat()+1984<-kslwaitctx()+148<-ksrbwait()+464<-kwqidlisten()+416<-kwqilintl()+340<-kwqilisten()+388<-spefcmpa()+768<-spefmccallstd()+180<-peftrusted()+128<-psdexsp()+244<-rpiswu2()+688<-kxe_push_env_internal_pp_()+300<-kkx_push_env_for_ICD_for_new_session()+64<-psdextp()+316<-pefccal()+476<-pefcal()+180<-pevm_FCAL()+144<-pfrinstr_FCAL()+156<-
            wait history:
              * time between current wait and wait #1: 0.000032 sec
              1. event: 'wait for unread message on broadcast channel'
                 time waited: 0.000078 sec
                 wait id: 5792239   p1: 'channel context'=0xb12fb4d08
                                    p2: 'channel handle'=0xb335bbce0
              * time between wait #1 and #2: 0.000075 sec
              2. event: 'wait for unread message on broadcast channel'
                 time waited: 0.117271 sec
                 wait id: 5792238   p1: 'channel context'=0xb12fb4d08
                                    p2: 'channel handle'=0xb335bbce0
              * time between wait #2 and #3: 0.000026 sec
              3. event: 'wait for unread message on broadcast channel'
                 time waited: 0.000755 sec
                 wait id: 5792237   p1: 'channel context'=0xb12fb4d08
                                    p2: 'channel handle'=0xb335bbce0
    }

Chain 1 Signature: 'wait for unread message on broadcast channel'<='library cache pin'
Chain 1 Signature Hash: 0xcd5dcbea
We could see some interesting facts as highlighted above.
The waiting session (5188) was executing “BEGIN dbms_lock.sleep(60); END;” and waiting on “library cache pin” and “library cache revalidation”.
The blocking session (7238) was the application job, itself waiting on “wait for unread message on broadcast channel” (a wait consistent with an AQ listen call, as suggested by kwqidlisten()/kwqilisten() in its short stack).
This information gave us an inkling of what might be happening in the background.
We looked at a few more facts by executing the following queries to understand this locking pattern.
Sessions waiting for a Library Cache Pin
select sid Waiter,
substr(rawtohex(p1),1,30) Handle,
substr(rawtohex(p2),1,30) Pin_addr
from v$session_wait
where wait_time=0
and event like 'library cache pin%';
WAITER HANDLE PIN_ADDR
---------- ------------------------------ -----------------------
4270 CA0D4516253A391F2E2329 CA031F3B2C03412F14161D
5188 CA0D4516253A391F2E2329 CA053E1157044713236435
8270 CA0D4516253A391F2E2329 CA0E54333B085D273A4561
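The HANDLE value (P1RAW of the wait) can also be resolved to the object it protects directly against X$KGLOB; the following is a commonly used sketch (requires SYS access, column names as in X$KGLOB):

-- Map the library cache handle(s) being pinned to their owner and object name.
select kglnaown owner, kglnaobj object_name
from   x$kglob
where  kglhdadr in (select p1raw
                    from   v$session_wait
                    where  wait_time=0
                    and    event like 'library cache pin%');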
Objects involved in the Library Cache Pin we are waiting for
select to_char(SESSION_ID,'999') sid ,
substr(LOCK_TYPE,1,30) Type,
substr(lock_id1,1,23) Object_Name,
substr(mode_held,1,4) HELD, substr(mode_requested,1,4) REQ,
lock_id2 Lock_addr
from dba_lock_internal
where mode_requested<>'None'
and mode_requested<>mode_held
and session_id in ( select sid
from v$session_wait
where wait_time=0
and event like 'library cache pin%');
SID TYPE OBJECT_NAME HELD REQ LOCK_ADDR
---- ------------------------------ ------------------- ---- ---- -----------------
4270 Body Definition Pin SYS.DBMS_LOCK None Shar B0000005CF712AC0
5188 Body Definition Pin SYS.DBMS_LOCK None Shar B0000005CF712AC0
8270 Body Definition Pin SYS.DBMS_LOCK None Excl B0000005CF712AC0
Who are the holders?

select sid Holder, KGLPNUSE Sessaddr, KGLPNMOD Held, KGLPNREQ Req
from x$kglpn, v$session
where KGLPNHDL in (select p1raw
                   from v$session_wait
                   where wait_time=0
                   and event like 'library cache pin%')
and KGLPNMOD <> 0
and v$session.saddr=x$kglpn.kglpnuse;

    HOLDER SESSADDR               HELD        REQ
---------- ---------------- ---------- ----------
      2166 0000000B42286EE8          2          0
      3028 0000000B623B7CB0          2          0
      3837 0000000B724AC408          2          0
      4594 0000000B125E74C8          2          0
      5298 0000000B126D30A8          2          0
      7238 0000000B62942F60          2          0

Holder wait event

select sid, substr(event,1,30), wait_time
from v$session_wait
where sid in (select sid
              from x$kglpn, v$session
              where KGLPNHDL in (select p1raw
                                 from v$session_wait
                                 where wait_time=0
                                 and event like 'library cache pin%')
              and KGLPNMOD <> 0
              and v$session.saddr=x$kglpn.kglpnuse);
Again, similar facts stood out in the output above.
So, what led to this?
All waiting sessions were executing “SYS.DBMS_LOCK”, and the lock TYPE was “Body Definition Pin”.
We correlated the HANGANALYZE analysis with the output of the above queries. Putting the two together, we were sure that our earlier inkling was correct.
Something seemed to have gone wrong with the definition of the “DBMS_LOCK” package body, which could be inferred from the wait event “library cache revalidation” and the lock TYPE “Body Definition Pin”.
We verified the object status from DBA_OBJECTS
select owner, object_name, object_type, status, last_ddl_time
from dba_objects
where object_name='DBMS_LOCK';

OWNER           OBJECT_NAME     OBJECT_TYPE             STATUS  LAST_DDL_TIME
--------------- --------------- ----------------------- ------- --------------------
SYS             DBMS_LOCK       PACKAGE                 VALID   13-jul-2022 18:18:14
SYS             DBMS_LOCK       PACKAGE BODY            INVALID 22-jun-2021 01:28:07
PUBLIC          DBMS_LOCK       SYNONYM                 VALID   28-sep-2018 06:58:49
As highlighted, the package specification had been compiled that same evening (13-Jul-2022 18:18), leaving the “DBMS_LOCK” package body INVALID.
This check confirmed that, with the package body INVALID, all the sessions executing it were left waiting on “library cache pin”.
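As a quick sanity check (a generic query, not part of the original transcript), any other invalid SYS objects can be listed as follows:

-- List all invalid objects owned by SYS.
select object_name, object_type, last_ddl_time
from   dba_objects
where  owner='SYS'
and    status='INVALID';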
Solution Implemented
We tried compiling the package body manually.
alter package SYS.DBMS_LOCK compile body;
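As an aside, had a larger number of objects been invalid, Oracle's bundled recompilation script could have been used instead of compiling units one by one (general practice; not something we ran here):

-- Recompiles all invalid objects in the database.
@?/rdbms/admin/utlrp.sql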
Our marathon did not end here. Yet another dilemma was waiting for us!
Even the session from which we initiated the COMPILE was stuck waiting on the “library cache lock” event.
We kept the command running and set out to identify the blocking sessions.
select sid Holder, KGLPNUSE Sessaddr, KGLPNMOD Held, KGLPNREQ Req
from x$kglpn, v$session
where KGLPNHDL in (select p1raw
                   from v$session_wait
                   where wait_time=0
                   and event like 'library cache pin%')
and KGLPNMOD <> 0
and v$session.saddr=x$kglpn.kglpnuse;

    HOLDER SESSADDR               HELD        REQ
---------- ---------------- ---------- ----------
      2166 0000000B42286EE8          2          0
      3028 0000000B623B7CB0          2          0
      3837 0000000B724AC408          2          0
      4594 0000000B125E74C8          2          0
      5298 0000000B126D30A8          2          0
      7238 0000000B62942F60          2          0
It was a similar job holding up our operation, as it called the DBMS_LOCK package from inside its own package body. The application team insisted on restarting the database, but we preferred to kill the blocking sessions rather than restart the entire database.
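To avoid hunting down each SID/SERIAL# by hand, kill statements for the holders can be generated from the same holder query used earlier (a hypothetical helper; review the generated commands before running them):

-- Generate ALTER SYSTEM KILL SESSION commands for the current pin holders.
select 'alter system kill session '''||s.sid||','||s.serial#||''' immediate;' kill_cmd
from   v$session s, x$kglpn p
where  p.kglpnhdl in (select p1raw
                      from   v$session_wait
                      where  wait_time=0
                      and    event like 'library cache pin%')
and    p.kglpnmod <> 0
and    s.saddr = p.kglpnuse;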
We killed all blocking sessions (6 holders) and finally our COMPILE command completed successfully.
alter session set nls_date_format='dd-mon-yyyy hh24:mi:ss';

select owner, object_name, object_type, status, last_ddl_time
from dba_objects
where object_name='DBMS_LOCK';

OWNER           OBJECT_NAME                         OBJECT_TYPE             STATUS  LAST_DDL_TIME
--------------- ----------------------------------- ----------------------- ------- --------------------
SYS             DBMS_LOCK                           PACKAGE                 VALID   13-jul-2022 18:18:14
SYS             DBMS_LOCK                           PACKAGE BODY            VALID   13-jul-2022 23:05:38
PUBLIC          DBMS_LOCK                           SYNONYM                 VALID   28-sep-2018 06:58:49
The application team initiated the EOD again; it progressed as expected and completed successfully.
Well, well, well... wait! It's not the end yet!
What exactly triggered all this?
We started investigating why the DBMS_LOCK package body had been invalidated.
We looked at the database alert log around the LAST_DDL_TIME window. The following messages were seen.
2022-07-13T18:20:55.173026+05:30
Errors in file /oradumps/dbname/diag/diag/rdbms/dbname/INSTNAME/trace/INSTNAME_j071_3462.trc:
ORA-00604: error occurred at recursive SQL level 2
ORA-04061: existing state of has been invalidated
ORA-04061: existing state of package body "SYS.DBMS_LOCK" has been invalidated
ORA-04065: not executed, altered or dropped package body "SYS.DBMS_LOCK"
ORA-06508: PL/SQL: could not find program unit being called: "SYS.DBMS_LOCK"
ORA-06512: at "SYS.DBMS_PRVTAQIP", line 3025
ORA-06512: at line 1
ORA-04061: existing state of has been invalidated
ORA-04061: existing state of package body "SYS.DBMS_LOCK" has been invalidated
ORA-04065: not executed, altered or dropped package body "SYS.DBMS_LOCK"
ORA-06508: PL/SQL: could not find program unit being called: "SYS.DBMS_LOCK"
ORA-06512: at "SYS.DBMS_PRVTAQIP", line 3025
ORA-06512: at line 1
We looked at all objects compiled during that day.
select owner, object_name, object_type, status, last_ddl_time
from dba_objects
where trunc(last_ddl_time)=trunc(sysdate)
order by last_ddl_time;

OWNER                OBJECT_NAME                         OBJECT_TYPE             STATUS  LAST_DDL_TIME
-------------------- ----------------------------------- ----------------------- ------- --------------------
SYS                  DBA_REGISTRY_SQLPATCH               VIEW                    VALID   13-jul-2022 18:16:52
SYS                  DBMS_SQLPATCH                       PACKAGE                 VALID   13-jul-2022 18:16:52
SYS                  DBMS_QOPATCH                        PACKAGE                 VALID   13-jul-2022 18:16:52
SYS                  REGISTRY$HISTORY                    TABLE                   VALID   13-jul-2022 18:16:52
SYS                  DBMS_REGISTRY                       PACKAGE                 VALID   13-jul-2022 18:16:52
.
.
SYS                  DBMS_RCVMAN                         PACKAGE                 VALID   13-jul-2022 18:18:11
SYS                  DBMS_SERVER_ALERT                   PACKAGE                 VALID   13-jul-2022 18:18:11
SYS                  DBMS_LOCK                           PACKAGE                 VALID   13-jul-2022 18:18:14
Why was “DBA_REGISTRY_SQLPATCH” also compiled?
Taking a cue from that view name, our line of investigation revealed that “datapatch -verbose” had been executed on this setup; it had failed because of a similar locking issue and left the DBMS_LOCK package body uncompiled and in an INVALID state.
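The patch apply history itself can be reviewed from DBA_REGISTRY_SQLPATCH, and datapatch can be re-run once the locking issue is cleared (a sketch; column names can vary slightly between versions):

-- SQL*Plus: review SQL patch apply/rollback history and status.
select patch_id, action, status, action_time, description
from   dba_registry_sqlpatch
order  by action_time;

-- OS shell: re-run datapatch after the blocker is resolved.
$ORACLE_HOME/OPatch/datapatch -verbose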
Comments

Superb Analysis Dinesh.
If possible, can you share the query behind the output in the Diagnosis section?
Hi Dipak,
Many of the essential queries are already included in the content.
The following query can be utilized to monitor session wait events.
col sid format 99999
col event format a30
col state format a25
col p1 format 99999999999999
set pages 1500
set lines 150
break on event
compute count of event on event
select sid,event,p1,p1raw, state, sql_id, final_blocking_session blocking_session
from v$session
where wait_class<>'Idle'
order by event;
Please let me know if you need the SQL behind any particular output.
Thank you Dinesh. Your troubleshooting skills are very good. Continue to share the learnings in community.
Thanks for sharing!!!
Finding root cause of DBMS_LOCK package getting invalid was superb.
Nicely explained. Thanks for sharing
Is it in cdb environment?
No. It’s NON-CDB.
Well and detailed explanations on analysis done … Thanks for sharing ..!!
Thanks for sharing
As always, very nicely explained with relevant details
Excellent Article! Thanks for sharing.