Problem statement
The database was recently upgraded to version 12.1.0.2.
After a few days, it started terminating consistently with ORA-04031 errors.
Excerpts from database alert log
ORA-04031: unable to allocate 384 bytes of shared memory ("shared pool","unknown object","sga heap(1,0)","KGLHD")
ORA-04031: unable to allocate 768 bytes of shared memory ("shared pool","unknown object","KKSSP^1960","kglss")
ORA-04031: unable to allocate 352 bytes of shared memory ("shared pool","unknown object","sga heap(1,0)","krsdicle")
ORA-04031: unable to allocate 384 bytes of shared memory ("shared pool","select o.owner#,o.name,o.nam...","sga heap(1,0)","KGLHD")
ORA-04031: unable to allocate 1824 bytes of shared memory ("shared pool","update user$ set exptime=DEC...","sga heap(1,0)","KQR X SO")
ORA-04031: unable to allocate 2504 bytes of shared memory ("shared pool","WITH MONITOR_DATA AS (SELECT...","SQLA^2cef633c","kafco : qkacol")
Diagnosis
We first observed very high shared pool utilization.
Initially, the shared pool size was increased from 1 GB to 4 GB.
After a couple of days, the database crashed again with similar errors. This time the shared pool size was increased to 7 GB. Despite this change, the database crashed once more.
The shared pool size was then increased further, to 9 GB.
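For reference, a resize of this kind is done with ALTER SYSTEM; a minimal sketch (SCOPE=BOTH assumes the instance is spfile-managed, and under automatic SGA management sga_target may need adjusting as well):

-- Sketch of the final resize; value taken from the text above
ALTER SYSTEM SET shared_pool_size = 9G SCOPE = BOTH;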
Since increasing the shared pool size multiple times did not help, we started investigating AWR reports.
The AWR reports showed a very high rate of hard parses per second.
Hard Parse Count Per Sec
Load Profile                  Per Second    Per Transaction
-------------------          -----------   ----------------
DB Time(s):                          4.3                0.7
DB CPU(s):                           4.6                0.7
User calls:                        636.1               96.8
Parses (SQL):                    1,377.7              209.7
Hard parses (SQL):                 906.1              137.9
Logons:                             28.7                4.4
Executes (SQL):                  1,503.7              228.9
select BEGIN_TIME, METRIC_NAME, AVERAGE
  from dba_hist_sysmetric_summary
 where METRIC_NAME = 'Hard Parse Count Per Sec'
   and trunc(BEGIN_TIME) >= trunc(sysdate-60)
   and AVERAGE > 600
 order by 1;

BEGIN_TIME           METRIC_NAME                             AVERAGE
-------------------- ----------------------------------- -----------
26-MAR-2018 11:29:50 Hard Parse Count Per Sec             627.073261
26-MAR-2018 12:29:51 Hard Parse Count Per Sec             601.087453
28-MAR-2018 11:29:52 Hard Parse Count Per Sec             620.686635
02-APR-2018 09:29:54 Hard Parse Count Per Sec             622.436011
02-APR-2018 10:29:54 Hard Parse Count Per Sec             861.500578
02-APR-2018 11:29:53 Hard Parse Count Per Sec             905.789608
02-APR-2018 12:29:54 Hard Parse Count Per Sec             835.835878
02-APR-2018 13:29:54 Hard Parse Count Per Sec             646.823878
02-APR-2018 14:29:54 Hard Parse Count Per Sec             654.385633
02-APR-2018 15:29:53 Hard Parse Count Per Sec             645.756677
03-APR-2018 10:30:14 Hard Parse Count Per Sec             697.683740
03-APR-2018 11:30:15 Hard Parse Count Per Sec             726.630095
03-APR-2018 12:29:14 Hard Parse Count Per Sec             640.659254
04-APR-2018 11:29:15 Hard Parse Count Per Sec             635.936752
The top application queries responsible for the heavy hard parsing were identified.
Sample Queries performing Hard parsing
col sample_sql_Text for a40

with d1 as (
  select /*+ MATERIALIZE */
         force_matching_signature fms,
         count(distinct(sql_id)) count,
         sum(executions) executions
    from v$sqlarea
   where force_matching_signature > 0
   group by force_matching_signature
  having count(distinct(sql_id)) >= 5
)
select (select sql_fulltext
          from v$sqlarea
         where force_matching_signature = d1.fms
           and rownum = 1) sample_sql_Text,
       d1.*
  from d1
 order by count desc;

SAMPLE_SQL_TEXT                                           FMS                COUNT           EXECUTIONS
---------------------------------------- -------------------- -------------------- --------------------
select count(*) as rowcount from TAB       595475347031080823                 4937                28552
where b_filename_type='DP2' and run_da
te >=trunc(sysdate-3) and b_filename lik
e trim('% XXXXXXXXXXXXXXXXXXXXX %')

update TABNAME set T4S=229483 where       7308982053163128587                 2811                 2811
DOC_TYPE='DP3' and STMTDT='31-Mar-20
18'

update TABNAME set T3S=154728 where        110354220139214304                 2798                 2798
DOC_TYPE='DP3' and STMTDT='31-Mar-20
18'
We recommended the use of bind variables, and the application team agreed to make this change for such queries.
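For illustration, the change amounts to replacing literals with binds; a hypothetical before/after based on the update seen above (the bind names :t4s, :doc_type and :stmtdt are made up for the example):

-- Literal version: every execution with new values hard parses a new cursor
update TABNAME set T4S = 229483 where DOC_TYPE = 'DP3' and STMTDT = '31-Mar-2018';

-- Bind version: one shared cursor, soft parsed on re-execution
update TABNAME set T4S = :t4s where DOC_TYPE = :doc_type and STMTDT = :stmtdt;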
However, there were more than 1,500 distinct queries, and converting all of them to bind variables within the application code was practically impossible.
We eventually decided to set cursor_sharing=FORCE so that all application queries would use system-generated bind variables.
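A minimal sketch of the change, assuming an spfile-managed instance; once it takes effect, literals in cached SQL appear as system-generated binds of the form :"SYS_B_n":

ALTER SYSTEM SET cursor_sharing = 'FORCE' SCOPE = BOTH;

-- Spot-check that literals are now replaced by system-generated binds
SELECT sql_text FROM v$sqlarea WHERE sql_text LIKE '%SYS_B_%' AND rownum <= 5;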
After this change the system ran fine for almost 10 days, compared with the earlier pattern of a crash every 2-3 days.
We assumed the issue was resolved, but the database crashed again after 10 days.
By this point, the hard parse count per second was within an acceptable range.
BEGIN_TIME           METRIC_NAME                             AVERAGE
-------------------- ----------------------------------- -----------
15-apr-2018 07:30:19 Hard Parse Count Per Sec             5.45156795
15-apr-2018 08:30:20 Hard Parse Count Per Sec             8.56563527
15-apr-2018 09:30:20 Hard Parse Count Per Sec             11.0442935
15-apr-2018 10:30:20 Hard Parse Count Per Sec             12.9795929
15-apr-2018 11:29:20 Hard Parse Count Per Sec             14.4070235
15-apr-2018 12:29:20 Hard Parse Count Per Sec             10.8581472
15-apr-2018 13:30:20 Hard Parse Count Per Sec             9.08589363
15-apr-2018 14:30:19 Hard Parse Count Per Sec             8.00785081
15-apr-2018 15:30:19 Hard Parse Count Per Sec             7.63119506
15-apr-2018 16:29:20 Hard Parse Count Per Sec             8.80645671
15-apr-2018 17:29:20 Hard Parse Count Per Sec             8.35716632
We then started analyzing the ORA-04031 trace files to examine shared pool usage statistics.
The following details were identified.
Shared Pool Usage statistics from trace file
TOP 20 MAXIMUM MEMORY USES ACROSS SGA HEAP 1 - 4
-------------------------------------------------
"krsdicle               "        12 GB 35%
"free memory            "        10 GB 31%
"SQLA                   "      6629 MB 19%
"KGLH0                  "      2658 MB  8%
"KGLHD                  "       473 MB  1%
"PX msg pool            "       269 MB  1%
"dbktb: trace buffer    "       184 MB  1%
"KGLDA                  "       141 MB  0%
"KGLS                   "       111 MB  0%

Memory Utilization of Subpool 1 (SGA HEAP)

Allocation Name                      Size        Max Size      Chunks
------------------------------------------------------------------------
"free memory            "       559510696      1418773464         511
"db_block_hash_buckets  "         4194304         4194304         128
"parameter value memory "           37376           37376        2317
"db_files               "          808144          808144        1002
"krsdicle               "      6296844896       629684489   619189684
"State object subpools  "            4608            4608          23
"ksdhng: blkrs cache    "           18176           18176           1
Of all the shared pool components, a single one stood out, having consumed by far the largest share of shared pool memory:
“krsdicle” 12 GB 35%
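Live shared pool allocations can also be checked from v$sgastat without waiting for a trace file; a minimal sketch (the row-limiting clause requires 12c or later):

-- Top shared pool consumers by size
SELECT pool, name, round(bytes/1024/1024) mb
  FROM v$sgastat
 WHERE pool = 'shared pool'
 ORDER BY bytes DESC
 FETCH FIRST 10 ROWS ONLY;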
Another important observation was that an ALTER SYSTEM command appeared repeatedly in the database alert log.
Excerpts from Alert log
ALTER SYSTEM SET log_archive_dest_state_2='ENABLE' SCOPE=MEMORY SID='*';
Wed Apr 25 12:34:22 2018
ALTER SYSTEM SET log_archive_dest_state_2='ENABLE' SCOPE=MEMORY SID='*';
Wed Apr 25 12:35:22 2018
ALTER SYSTEM SET log_archive_dest_state_2='ENABLE' SCOPE=MEMORY SID='*';
Wed Apr 25 12:36:22 2018
ALTER SYSTEM SET log_archive_dest_state_2='ENABLE' SCOPE=MEMORY SID='*';
Under normal conditions, the database should not run such a command continuously.
This pointed to abnormal behaviour at the database end.
So, what led to this?
The ‘KRSDICLE’ memory chunk is Data Guard specific and related to redo transport.
This issue was investigated under Bug 23722678 - ORA-04031 ON DATAGUARD ENVIRONMENT WITH HIGH ALLOCATION IN 'KRSDICLE', which was closed as not a bug.
The cause of this behavior is an incorrectly configured log_archive_dest_n parameter.
In this particular case, log_archive_dest_2 on the primary database had the same service defined as log_archive_dest_2 on the standby instance.
This is a configuration issue.
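The overlap can be confirmed by comparing the destination settings on both sides; a minimal sketch to run on each instance (DEST_ID 2 follows from the parameter name):

SHOW PARAMETER log_archive_dest_2
SHOW PARAMETER log_archive_dest_state_2

-- Destination status and any last error recorded for it
SELECT dest_id, destination, status, error
  FROM v$archive_dest
 WHERE dest_id = 2;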
Error from Standby DB alert log
ORA-16040: Destination archive log file is locked.
Wed Apr 25 14:37:39 2018
TT00: Archive log rejected (T-1.S-56423) at host 'SERVICE_DR'
Solution Implemented
On the standby database, log_archive_dest_2, which pointed to the same service used by the production database, was deferred.
ALTER SYSTEM SET log_archive_dest_state_2='DEFER';
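To persist the change across restarts when an spfile is in use, SCOPE=BOTH (and SID='*' on RAC) can be added; the new state can then be verified from v$archive_dest, where STATUS should show DEFERRED:

ALTER SYSTEM SET log_archive_dest_state_2 = 'DEFER' SCOPE = BOTH SID = '*';

-- Verify the destination is now deferred
SELECT dest_id, status FROM v$archive_dest WHERE dest_id = 2;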
Reference Document
ORA-04031 On Dataguard Environment With High Allocation In ‘KRSDICLE’ (Doc ID 2233468.1)