Problem Statement
The application team reported performance slowness and intermittent transaction declines, and confirmed that the slowness was due to a database performance issue.
Diagnosis
We began our investigation by validating database performance.
We noted a significant increase in DB Time, driven by extended wait times on the “Disk file operations I/O” event.
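As a quick first check, the dominant wait can be confirmed from the live wait interface; the query below is only a hedged sketch of that kind of check (the idle-wait filter is our own illustrative choice):

-- Hedged sketch: count active sessions by wait event to confirm the dominant wait.
select event, count(*) as sessions_waiting
from   v$session
where  wait_class <> 'Idle'
group  by event
order  by sessions_waiting desc;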
To better understand this, we extracted historical data to compare a good period against a bad one.
Our analysis revealed a sharp rise in both the “Total Time Waited” and the “Average Wait Time.”
Please see the statistics below for the comparison of these good and bad time periods.
Good Time

Weekday    Start Time  End Time  Event Name                  Total Waits  Total Time Waited (ms)  Avg Wait Time (ms)
---------  ----------  --------  --------------------------  -----------  ----------------------  ------------------
THURSDAY   13:30       14:00     Disk file operations I/O      4,908,129                 177,207                 .04
THURSDAY   14:00       14:30     Disk file operations I/O      3,965,712                 143,839                 .04
THURSDAY   14:30       15:00     Disk file operations I/O      4,400,710                 160,951                 .04
THURSDAY   15:00       15:30     Disk file operations I/O      3,998,433                 153,973                 .04
THURSDAY   15:30       16:00     Disk file operations I/O      4,261,380                 170,380                 .04
THURSDAY   16:00       16:30     Disk file operations I/O      4,119,251                 164,071                 .04
Bad Time

Weekday    Start Time  End Time  Event Name                  Total Waits  Total Time Waited (ms)  Avg Wait Time (ms)
---------  ----------  --------  --------------------------  -----------  ----------------------  ------------------
SUNDAY     13:30       14:00     Disk file operations I/O     45,015,244             128,003,736                2.84
SUNDAY     14:00       14:30     Disk file operations I/O     23,649,575           2,005,336,249               84.79
SUNDAY     14:30       15:00     Disk file operations I/O     20,624,498           2,117,508,428              102.67
SUNDAY     15:00       15:30     Disk file operations I/O     22,168,095           1,890,198,052               85.27
SUNDAY     15:30       16:00     Disk file operations I/O     37,948,014             739,353,601               19.48
SUNDAY     16:00       16:30     Disk file operations I/O     33,978,499           1,066,412,249               31.38
The event details above were extracted from AWR reports covering the time of the issue.
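The same good-versus-bad comparison can also be pulled directly from the AWR repository rather than from the formatted reports. The query below is a hedged sketch against DBA_HIST_SYSTEM_EVENT; the snapshot range is a placeholder and the delta arithmetic is our own illustration (timed columns in this view are cumulative microseconds):

-- Hedged sketch: per-snapshot deltas for the wait event from the AWR repository.
-- The snapshot range is a placeholder; TIME_WAITED_MICRO is cumulative, so LAG gives deltas.
select s.begin_interval_time,
       e.event_name,
       e.total_waits
         - lag(e.total_waits) over (order by e.snap_id)                  as waits_delta,
       round((e.time_waited_micro
         - lag(e.time_waited_micro) over (order by e.snap_id)) / 1000)   as time_waited_ms
from   dba_hist_system_event e
       join dba_hist_snapshot s
         on  s.snap_id         = e.snap_id
         and s.dbid            = e.dbid
         and s.instance_number = e.instance_number
where  e.event_name = 'Disk file operations I/O'
and    e.snap_id between 1000 and 1012   -- placeholder snapshot range
order  by e.snap_id;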
To identify the FileOperation/filetype causing this wait, we analyzed ASH reports to determine the P1, P2, and P3 values.
The following was extracted from the ASH report: all “Disk file operations I/O” waits had P1 (FileOperation) = 8 and P3 (filetype) = 8.
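The same P1/P2/P3 breakdown can also be taken straight from the ASH data instead of the formatted report; the query below is a hedged sketch (the one-hour window is a placeholder):

-- Hedged sketch: group recent ASH samples for the event by its parameters.
select p1 as file_operation, p3 as filetype, count(*) as samples
from   v$active_session_history
where  event = 'Disk file operations I/O'
and    sample_time > sysdate - 1/24    -- placeholder window
group  by p1, p3
order  by samples desc;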
A relevant MOS Doc ID lists the FileOperation codes; value 8 corresponds to “wait for miscellaneous io”:
1  file creation
2  file open
3  file resize
4  file deletion
5  file close
6  wait for all aio requests to finish
7  write verification
8  wait for miscellaneous io (ftp, block dump, passwd file)
9  read from snapshot files
The challenge was to pinpoint the exact filetype responsible for the heavy wait time.
To address this, we decided to take a HANGANALYZE trace using the following command.
connect / as sysdba
oradebug setmypid
oradebug unlimit
oradebug hanganalyze 3
-- Wait 30 seconds to give time to identify process state changes.
oradebug hanganalyze 3
exit
Below is the top chain from the HANGANALYZE trace:
-------------------------------------------------------------------------------
Chain 1:
-------------------------------------------------------------------------------
    Oracle session identified by:
    {
                instance: 1 (DEMO.DEMO1)
                   os id: 62063978
              process id: 3360, oracle@HOSTNAME
              session id: 6
        session serial #: 34839
             module name: 0 (MODULE@HOSTNAME (TNS V1-V3))
    }
    is waiting for 'Disk file operations I/O' with wait info:
    {
                      p1: 'FileOperation'=0x8
                      p2: 'fileno'=0x0
                      p3: 'filetype'=0x8
            time in wait: 0.172936 sec
      heur. time in wait: 0.496215 sec
           timeout after: never
                 wait id: 5058
                blocking: 0 sessions
          current sql_id: 3977853392
             current sql: DEMOSQL
             short stack: ksedsts<-ksdxfstk<-ksdxcb<-sspuser<-__sighandler()<-send<-__syslog_r<-syslog<-IPRA.$szaud<-szaud_mkrec<-IPRA.$audios<-IPRA.$audins<-IPRA.$audfro<-audsel<-IPRA.$auddft_internal<-auddft<-audStatement<-opiexe<-opiefn0<-opiefn<-opiodr<-ttcpip<-opitsk<-opiino<-opiodr<-opidrv<-sou2o<-opimai_real<-ssthrdmain<-main<-__start
            wait history:
              * time between current wait and wait #1: 0.000026 sec
              1.       event: 'Disk file operations I/O'
                 time waited: 0.085858 sec
                     wait id: 5057             p1: 'FileOperation'=0x8
                                               p2: 'fileno'=0x0
                                               p3: 'filetype'=0x8
              * time between wait #1 and #2: 0.000021 sec
              2.       event: 'Disk file operations I/O'
                 time waited: 0.237012 sec
                     wait id: 5056             p1: 'FileOperation'=0x8
                                               p2: 'fileno'=0x0
                                               p3: 'filetype'=0x8
              * time between wait #2 and #3: 0.000009 sec
              3.       event: 'Disk file operations I/O'
                 time waited: 0.000353 sec
                     wait id: 5055             p1: 'FileOperation'=0x8
                                               p2: 'fileno'=0x0
                                               p3: 'filetype'=0x8
    }
While reviewing the short stack, we realized that these waits came from audit records being written to the OS syslog file.
<-__syslog_r<-syslog<-IPRA.$szaud<-szaud_mkrec<-IPRA.$audios<-IPRA.$audins<-IPRA.$audfro<-audsel
We promptly reviewed the audit parameter configuration within the database.
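The settings below were most likely pulled with a simple SQL*Plus command like the hedged sketch here (the column headings in the output suggest show parameter was used):

-- Hedged sketch: list audit-related initialization parameters.
show parameter audit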
NAME_COL_PLUS_SHOW_PARAM     VALUE_COL_PLUS_SHOW_PARAM
---------------------------  ---------------------------
audit_file_dest              /DBHOME/oracle/app/orabase
audit_sys_operations         TRUE
audit_syslog_level           LOCAL0.INFO
audit_trail                  OS
The database was configured to write its audit records to the operating system messages file via syslog, as part of SIEM-based auditing.
We then proceeded to validate the /var/adm/messages file.
ls -lrt /var/adm/messages
-rw-r--r-- 1 root system 42217221888 09:11 messages
The messages file had grown to over 40GB. We promptly requested the sysadmin team to rotate this file.
After this change, the overall number of active sessions dropped noticeably, but over 100 sessions were still intermittently waiting on the “Disk file operations I/O” event.
Now…… What next????
We then set out to understand what kinds of data were being audited within the database.
To achieve this, we examined several database views and reviewed the content of the /var/adm/messages file.
/var/adm/messages

HOSTNAME local0:info Oracle Audit[43191340]: LENGTH: "310" SESSIONID:[9] "918096402" ENTRYID:[2] "62" STATEMENT:[3] "395" USERID:[6] "DEMOUSER" USERHOST:[18] "APPHOST1" ACTION:[1] "3" RETURNCODE:[1] "0" OBJ$CREATOR:[6] "APPUSER" OBJ$NAME:[23] "DEMO_TABLE_NAME" OS$USERID:[6] "demo_os" DBID:[10] "1234567890" PRIV$USED:[3] "351" CURRENT_USER:[6] "DEMOUSER"

HOSTNAME local0:info Oracle Audit[2362298]: LENGTH: "313" SESSIONID:[9] "918090457" ENTRYID:[4] "6466" STATEMENT:[5] "26402" USERID:[6] "DEMOUSER" USERHOST:[17] "APPHOST2" ACTION:[1] "3" RETURNCODE:[1] "0" OBJ$CREATOR:[6] "APPUSER" OBJ$NAME:[23] "DEMO_TABLE_NAME" OS$USERID:[6] "demo_os" DBID:[10] "1234567890" PRIV$USED:[3] "351" CURRENT_USER:[6] "DEMOUSER"
A few critical application tables were continuously logging audit data to the /var/adm/messages file.
To determine the type of auditing enabled, we examined the object-level audit data from the database.
select count(*) from DBA_AUDIT_OBJECT;

  COUNT(*)
----------
         0

select count(*) from DBA_OBJ_AUDIT_OPTS;

  COUNT(*)
----------
         0
This shows that no object-level auditing was enabled for any of the critical objects seen in the /var/adm/messages file.
This left us perplexed about the source and mechanism behind the audit data being written to the OS file.
We decided to approach this with reverse engineering, and the results we obtained were truly intriguing.
We reviewed the audit data being written to /var/adm/messages once more.
/var/adm/messages

HOSTNAME local0:info Oracle Audit[43191340]: LENGTH: "310" SESSIONID:[9] "918096402" ENTRYID:[2] "62" STATEMENT:[3] "395" USERID:[6] "DEMOUSER" USERHOST:[18] "APPHOST1" ACTION:[1] "3" RETURNCODE:[1] "0" OBJ$CREATOR:[6] "APPUSER" OBJ$NAME:[23] "DEMO_TABLE_NAME" OS$USERID:[6] "demo_os" DBID:[10] "1234567890" PRIV$USED:[3] "351" CURRENT_USER:[6] "DEMOUSER"
The ‘PRIV$USED’ field indicates the system privilege under which the audit record was logged; in our case the value was ‘351’. We identified the meaning of this privilege number from the SYSTEM_PRIVILEGE_MAP dictionary view.
select * From system_privilege_map where PRIVILEGE like '%351%' order by 1;

 PRIVILEGE NAME                                     PROPERTY
---------- ---------------------------------------- ----------
      -351 EXEMPT REDACTION POLICY                           0
So, what led to this????
The result above indicates that these audit records were logged because the ‘EXEMPT REDACTION POLICY’ privilege was being exercised against objects protected by an enabled redaction policy.
We examined the redaction policy defined using the command below.
col OBJECT_OWNER for a15
col OBJECT_NAME for a30
col POLICY_NAME for a30
col EXPRESSION for a10
col POLICY_DESCRIPTION for a10
select * From REDACTION_POLICIES;

OBJECT_OWNER    OBJECT_NAME                    POLICY_NAME                    EXPRESSION ENA POLICY_DES
--------------- ------------------------------ ------------------------------ ---------- --- ----------
DEMOUSER        DEMO_TABLE_NAME                DEMO_TABLE_FULL_COLUMN         1=1        YES
Data Redaction was enabled on the table APPUSER.DEMO_TABLE_NAME, the same table for which audit records were being written to the messages file.
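For context, a redaction policy like the one listed above is typically created with DBMS_REDACT.ADD_POLICY. The block below is only a hedged sketch of what that definition might have looked like; the column name and redaction function type are assumptions, not values taken from this system:

begin
  -- Hedged sketch: full-column redaction on the application table.
  -- Column name and function_type are assumed for illustration only.
  dbms_redact.add_policy(
    object_schema => 'APPUSER',
    object_name   => 'DEMO_TABLE_NAME',
    policy_name   => 'DEMO_TABLE_FULL_COLUMN',
    column_name   => 'SENSITIVE_COLUMN',        -- assumed column name
    function_type => dbms_redact.full,
    expression    => '1=1');
end;
/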
Additionally, the following data clearly shows that the system privilege ‘EXEMPT REDACTION POLICY’ had been granted to a critical application user so that it could access redacted data.
select * From DBA_SYS_PRIVS where PRIVILEGE like '%EXEMPT%';

GRANTEE     PRIVILEGE                 ADM COM INH
----------- ------------------------- --- --- ---
DEMOUSER    EXEMPT REDACTION POLICY   NO  NO  NO
Furthermore, we checked another dictionary view, DBA_PRIV_AUDIT_OPTS, to determine whether the ‘EXEMPT REDACTION POLICY’ privilege itself was being audited. It was: the exemption privilege was explicitly under audit.
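The exact query we ran is not shown above, so the statement below is only a hedged sketch of that check against DBA_PRIV_AUDIT_OPTS:

-- Hedged sketch: confirm whether the exemption privilege is audited at privilege level.
select user_name, privilege, success, failure
from   dba_priv_audit_opts
where  privilege = 'EXEMPT REDACTION POLICY';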
Solution Implemented
To address the issue immediately, we opted to temporarily disable auditing for this policy:
NOAUDIT EXEMPT REDACTION POLICY;
For a sustainable solution, syslog should be forwarded to remote SIEM servers to minimize local storage usage. If local writes are necessary, log rotation should be implemented to keep the syslog file size under control.
Moreover, it is important to minimize or control selection of redacted columns by exempt users, both to preserve the intent of data redaction and to reduce unnecessary audit log entries.
With these changes, all issues were resolved and the application is now running flawlessly!