Disk file operations I/O

Problem Statement

The application team reported performance slowness and intermittent transaction declines, and confirmed that the slowness was due to a database performance issue.

Diagnosis

We began our investigation by validating database performance.

We noted a significant increase in DB time (DBTIME), driven by extended waits on the “Disk file operations I/O” event.
To better understand this, we extracted historical data and compared a good period against a bad period.
Our analysis revealed a sharp rise in both the “Total Time Waited” and the “Average Wait Time.”

Please see the statistics below comparing these good and bad time periods.


 Good Time
                                                                                          Total      Average
           Start Time    End Time                                                   Time Waited    Wait Time
 Weekday   (hh:mi Day)   (hh:mi Day)   Event Name                  Total Waits     (milli secs) (milli secs)
 --------- ------------- ------------- -------------------------- ------------ ---------------- ------------
 THURSDAY  13:30         14:00         Disk file operations I/O      4,908,129          177,207          .04
 THURSDAY  14:00         14:30         Disk file operations I/O      3,965,712          143,839          .04
 THURSDAY  14:30         15:00         Disk file operations I/O      4,400,710          160,951          .04
 THURSDAY  15:00         15:30         Disk file operations I/O      3,998,433          153,973          .04
 THURSDAY  15:30         16:00         Disk file operations I/O      4,261,380          170,380          .04
 THURSDAY  16:00         16:30         Disk file operations I/O      4,119,251          164,071          .04


 Bad Time
                                                                                          Total      Average
           Start Time    End Time                                                   Time Waited    Wait Time
 Weekday   (hh:mi Day)   (hh:mi Day)   Event Name                  Total Waits     (milli secs) (milli secs)
 --------- ------------- ------------- -------------------------- ------------ ---------------- ------------
 SUNDAY    13:30         14:00         Disk file operations I/O     45,015,244      128,003,736         2.84
 SUNDAY    14:00         14:30         Disk file operations I/O     23,649,575    2,005,336,249        84.79
 SUNDAY    14:30         15:00         Disk file operations I/O     20,624,498    2,117,508,428       102.67 
 SUNDAY    15:00         15:30         Disk file operations I/O     22,168,095    1,890,198,052        85.27
 SUNDAY    15:30         16:00         Disk file operations I/O     37,948,014      739,353,601        19.48
 SUNDAY    16:00         16:30         Disk file operations I/O     33,978,499    1,066,412,249        31.38 

The event details above were extracted from AWR reports covering the time of the issue.
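
This kind of interval comparison can be pulled from AWR history with a query along the following lines (a minimal sketch, assuming the Diagnostics Pack license and access to DBA_HIST_SYSTEM_EVENT and DBA_HIST_SNAPSHOT; these counters are cumulative, hence the LAG):

 -- Per-snapshot deltas for a single wait event from AWR history (sketch)
 select to_char(s.begin_interval_time, 'DY HH24:MI') as begin_time,
        to_char(s.end_interval_time, 'HH24:MI')      as end_time,
        e.event_name,
        e.total_waits
          - lag(e.total_waits)
              over (partition by e.instance_number order by e.snap_id) as total_waits,
        round((e.time_waited_micro
          - lag(e.time_waited_micro)
              over (partition by e.instance_number order by e.snap_id)) / 1000) as time_waited_ms
 from   dba_hist_system_event e
        join dba_hist_snapshot s
          on  s.snap_id         = e.snap_id
          and s.dbid            = e.dbid
          and s.instance_number = e.instance_number
 where  e.event_name = 'Disk file operations I/O'
 order  by e.instance_number, e.snap_id;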

To identify the FileOperation/filetype causing this wait, we analyzed ASH reports to determine the P1, P2, and P3 values.
The findings below were drawn from the ASH data.

All “Disk file operations I/O” waits showed P1 (FileOperation) = 8 and P3 (filetype) = 8.
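
A similar breakdown can also be pulled directly from the ASH samples with a query along these lines (a minimal sketch, assuming V$ACTIVE_SESSION_HISTORY is licensed and available):

 -- Group samples of the event by its parameters
 -- (for this event, P1 = FileOperation, P2 = fileno, P3 = filetype)
 select p1 as fileoperation,
        p3 as filetype,
        count(*) as samples,
        round(100 * ratio_to_report(count(*)) over (), 1) as pct
 from   v$active_session_history
 where  event = 'Disk file operations I/O'
 group  by p1, p3
 order  by samples desc;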
A relevant MOS note (DOC-ID) maps the FileOperation values; value 8 corresponds to waits for miscellaneous I/O:

 
 1  file creation
 2  file open
 3  file resize
 4  file deletion
 5  file close
 6  wait for all aio requests to finish
 7  write verification
 8  wait for miscellaneous io (ftp, block dump, passwd file) 
 9  read from snapshot files 

The challenge was to pinpoint the exact filetype responsible for the heavy wait time.
To address this, we decided to capture a HANGANALYZE trace using the following commands.

 
 connect / as sysdba
 oradebug setmypid
 oradebug unlimit
 oradebug hanganalyze 3
 --Wait 30 seconds to give time to identify process state changes. 
 oradebug hanganalyze 3
 exit 
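
To locate the resulting trace file, its path can be printed from the same session before exiting (a small addition, using the standard oradebug tracefile_name command):

 rem Print the path of the trace file just written by this session
 oradebug tracefile_name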

Below is the top chain from the HANGANALYZE trace:

 
 -------------------------------------------------------------------------------
 Chain 1:
 -------------------------------------------------------------------------------
    Oracle session identified by:
    {
                instance: 1 (DEMO.DEMO1)
                   os id: 62063978
              process id: 3360, oracle@HOSTNAME
              session id: 6
        session serial #: 34839
             module name: 0 (MODULE@HOSTNAME (TNS V1-V3))
    }
    is waiting for 'Disk file operations I/O' with wait info:
    {
                      p1: 'FileOperation'=0x8
                      p2: 'fileno'=0x0
                      p3: 'filetype'=0x8
            time in wait: 0.172936 sec
      heur. time in wait: 0.496215 sec
           timeout after: never
                 wait id: 5058
                blocking: 0 sessions
          current sql_id: 3977853392
             current sql: DEMOSQL
             short stack: ksedsts<-ksdxfstk<-ksdxcb<-sspuser<-__sighandler()<-send<-__syslog_r<-syslog<-IPRA.$szaud
			  <-szaud_mkrec<-IPRA.$audios<-IPRA.$audins<-IPRA.$audfro<-audsel<-IPRA.$auddft_internal<-auddft 
			  <-audStatement<-opiexe<-opiefn0<-opiefn<-opiodr<-ttcpip<-opitsk<-opiino<-opiodr<-opidrv<-sou2o
			  <-opimai_real<-ssthrdmain<-main<-__start
            wait history:
              * time between current wait and wait #1: 0.000026 sec
              1.       event: 'Disk file operations I/O'
                 time waited: 0.085858 sec
                     wait id: 5057             p1: 'FileOperation'=0x8
                                               p2: 'fileno'=0x0
                                               p3: 'filetype'=0x8
              * time between wait #1 and #2: 0.000021 sec
              2.       event: 'Disk file operations I/O'
                 time waited: 0.237012 sec
                     wait id: 5056             p1: 'FileOperation'=0x8
                                               p2: 'fileno'=0x0
                                               p3: 'filetype'=0x8
              * time between wait #2 and #3: 0.000009 sec
              3.       event: 'Disk file operations I/O'
                 time waited: 0.000353 sec
                     wait id: 5055             p1: 'FileOperation'=0x8
                                               p2: 'fileno'=0x0
                                               p3: 'filetype'=0x8
    } 

While reviewing the short stack, we realized that audit records were being written to the OS syslog file.

<-__syslog_r<-syslog<-IPRA.$szaud<-szaud_mkrec<-IPRA.$audios<-IPRA.$audins<-IPRA.$audfro<-audsel

We promptly reviewed the audit parameter configuration within the database.
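
The column headings in the output below indicate a SQL*Plus SHOW PARAMETER command, for example:

 -- List all audit-related initialization parameters in SQL*Plus
 show parameter audit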

 
 NAME_COL_PLUS_SHOW_PARAM   VALUE_COL_PLUS_SHOW_PARAM
 -------------------------- --------------------------  
 audit_file_dest            /DBHOME/oracle/app/orabase
 audit_sys_operations       TRUE
 audit_syslog_level         LOCAL0.INFO
 audit_trail                OS 

The database was configured to write audit records to the operating system syslog (the messages file), which is collected for SIEM auditing.
We then proceeded to validate the /var/adm/messages file.

 
 ls -lrt /var/adm/messages
 -rw-r--r--    1 root     system   42217221888 09:11 messages 

The messages file had grown to over 40 GB. We promptly asked the sysadmin team to rotate it.

After this change, the overall number of active sessions dropped noticeably, but over 100 sessions were still intermittently waiting on the “Disk file operations I/O” event.
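
To keep an eye on how many sessions remained affected, a quick check along these lines can be run periodically (a sketch against GV$SESSION):

 -- Count sessions currently waiting on the event, per instance
 select inst_id, count(*) as waiting_sessions
 from   gv$session
 where  event = 'Disk file operations I/O'
 and    state = 'WAITING'
 group  by inst_id;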

Now… what next?

We set out to understand what types of data were being audited within the database.
To achieve this, we examined several data dictionary views and reviewed the content of the /var/adm/messages file.

 
 /var/adm/messages
 HOSTNAME local0:info Oracle Audit[43191340]: 
	LENGTH: "310" 
	SESSIONID:[9] "918096402" 
	ENTRYID:[2] "62" 
	STATEMENT:[3] "395" 
	USERID:[6] "DEMOUSER" 
	USERHOST:[18] "APPHOST1" 
	ACTION:[1] "3" 
	RETURNCODE:[1] "0" 
	OBJ$CREATOR:[6] "APPUSER" 
	OBJ$NAME:[23] "DEMO_TABLE_NAME" 
	OS$USERID:[6] "demo_os" 
	DBID:[10] "1234567890" 
	PRIV$USED:[3] "351" 
	CURRENT_USER:[6] "DEMOUSER"

 HOSTNAME local0:info Oracle Audit[2362298]: 
	LENGTH: "313" 
	SESSIONID:[9] "918090457" 
	ENTRYID:[4] "6466" 
	STATEMENT:[5] "26402" 
	USERID:[6] "DEMOUSER" 
	USERHOST:[17] "APPHOST2" 
	ACTION:[1] "3" 
	RETURNCODE:[1] "0" 
	OBJ$CREATOR:[6] "APPUSER" 
	OBJ$NAME:[23] "DEMO_TABLE_NAME" 
	OS$USERID:[6] "demo_os" 
	DBID:[10] "1234567890" 
	PRIV$USED:[3] "351" 
	CURRENT_USER:[6] "DEMOUSER" 

Audit data for a few critical application tables was being continuously written to the /var/adm/messages file.
To determine what type of auditing was enabled, we examined the object-level audit settings in the database.

 
 select count(*) 
 from DBA_AUDIT_OBJECT;

   COUNT(*)
 ----------
          0

 select count(*) 
 from DBA_OBJ_AUDIT_OPTS; 

   COUNT(*)
 ----------
          0

The output indicates that no object-level audit options were enabled for any of the critical objects observed in the /var/adm/messages file.
This left us perplexed: where was the audit data being written to the OS file coming from, and how?
We decided to approach this with reverse engineering, and the results we obtained were truly intriguing.

We once more reviewed the audit data being logged into /var/adm/messages.

 
 /var/adm/messages
 HOSTNAME local0:info Oracle Audit[43191340]: 
	LENGTH: "310" 
	SESSIONID:[9] "918096402" 
	ENTRYID:[2] "62" 
	STATEMENT:[3] "395" 
	USERID:[6] "DEMOUSER" 
	USERHOST:[18] "APPHOST1" 
	ACTION:[1] "3" 
	RETURNCODE:[1] "0" 
	OBJ$CREATOR:[6] "APPUSER" 
	OBJ$NAME:[23] "DEMO_TABLE_NAME" 
	OS$USERID:[6] "demo_os" 
	DBID:[10] "1234567890" 
	PRIV$USED:[3] "351" 
	CURRENT_USER:[6] "DEMOUSER"

The ‘PRIV$USED’ field indicates the specific privilege for which the audit record was logged; in our case the value was ‘351’.
We resolved this privilege number through the SYSTEM_PRIVILEGE_MAP dictionary view (system privileges are stored there as negative numbers, so 351 in the audit record corresponds to -351 in the view).

 
 select * 
 From system_privilege_map 
 where PRIVILEGE like '%351%'
 order by 1;

  PRIVILEGE NAME                                       PROPERTY
 ---------- ---------------------------------------- ---------- 
       -351 EXEMPT REDACTION POLICY                           0

So, what led to this?

The result above indicates that these audit records were being logged because the EXEMPT REDACTION POLICY privilege was being exercised against objects protected by an enabled redaction policy.
We examined the redaction policies defined in the database using the query below.

 
 col OBJECT_OWNER for a15
 col OBJECT_NAME for a30
 col POLICY_NAME for a30
 col EXPRESSION for a10
 col POLICY_DESCRIPTION for a10

 select * 
 From REDACTION_POLICIES;

 OBJECT_OWNER    OBJECT_NAME                    POLICY_NAME                    EXPRESSION ENA POLICY_DES
 --------------- ------------------------------ ------------------------------ ---------- --- ---------- 
 DEMOUSER        DEMO_TABLE_NAME                DEMO_TABLE_FULL_COLUMN         1=1        YES
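
For context, a full-column redaction policy with an always-true expression like the one above is typically created along these lines (a sketch only; the column name is hypothetical, not the application's actual definition):

 -- Sketch of how such a policy is typically defined via DBMS_REDACT
 begin
   dbms_redact.add_policy(
     object_schema => 'DEMOUSER',
     object_name   => 'DEMO_TABLE_NAME',
     column_name   => 'SENSITIVE_COL',   -- hypothetical column
     policy_name   => 'DEMO_TABLE_FULL_COLUMN',
     function_type => dbms_redact.full,
     expression    => '1=1');
 end;
 /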

Data Redaction was enabled on the table APPUSER.DEMO_TABLE_NAME, the same table for which audit records were being written to the messages file.
Additionally, the following output clearly shows that the system privilege ‘EXEMPT REDACTION POLICY’ had been granted to a critical application user so that it could access redacted data.

 
 select * 
 From DBA_SYS_PRIVS 
 where PRIVILEGE like '%EXEMPT%';

 GRANTEE     PRIVILEGE                 ADM COM INH
 ----------- ------------------------- --- --- --- 
 DEMOUSER    EXEMPT REDACTION POLICY   NO  NO  NO

Finally, we checked another dictionary view, DBA_PRIV_AUDIT_OPTS, to ascertain whether the ‘EXEMPT REDACTION POLICY’ privilege itself was being audited.
It was: this privilege was explicitly under audit.
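
A check along these lines confirms the audit option (a sketch against DBA_PRIV_AUDIT_OPTS):

 -- Show privilege-audit options for the EXEMPT REDACTION POLICY privilege
 select user_name, privilege, success, failure
 from   dba_priv_audit_opts
 where  privilege = 'EXEMPT REDACTION POLICY';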

Solution Implemented

To address the issue immediately, we opted to temporarily disable auditing of this privilege:

NOAUDIT EXEMPT REDACTION POLICY;

For a sustainable solution, syslog should be directed to remote SIEM servers to minimize local storage usage; if writing locally is unavoidable, log rotation should be implemented to keep the syslog file size under control.
Moreover, it is important to minimize or control SELECTs of redacted columns by exempt users, both to preserve the intent of data redaction and to reduce unnecessary audit log entries.

With that, all issues were successfully resolved, and the application is now running flawlessly!

