Table of Contents
Problem Statement
The Core Banking Database has undergone a 19c upgrade. Following the upgrade, the application has experienced severe intermittent timeout issues, primarily attributed to elevated CPU utilization. This contrasts with the previous version 11.2.0.4, where such high CPU utilization was never observed. If this problem was not resolved within a couple of days, they would have needed to rollback to 11g version.
Diagnosis
We initiated our investigation by first validating the operating system statistics to ensure if high CPU utilization is responsible for the application timeouts.
As database instance was running on an AIX server, we retrieved NMON data and created graphs using NMONVisualizer.
The graph above clearly illustrates prominent CPU spikes that could potentially be causing application timeouts.
To validate this point further, we generated AWR compare report between 11g(good time) and 19c(Bad time).
Upon examining AWR compare reports, a notable increase was detected in the time model statistic “connection management call elapsed time”. This particular statistic suggests a potential surge in the frequency of connections and disconnections within the application sessions. When there is a substantial influx of connections, the creation of new connections might experience prolonged wait times. Such delays can be costly and may lead to performance issues specifically related to larger wait time in connection management.
Similar findings were seen from “Top ADDM Findings by Average Active Sessions” as well where “Session Connect and Disconnect” spending 23% of database time.
To assess the elevated connection rate, we gathered both current and historical data pertaining to the metric “Logons Per Sec”. The findings from this analysis turned out to be rather puzzling.
The rate of “logons per second” was significantly higher in the 11g version as well. However, in the 19c version, the database exhibited a notable increase in the time spent on connection management, a scenario not observed in the 11g version. This delay in the allocation of processes led to elevated CPU utilization, consequently leading to application timeouts.
So, what led to this????
The time span between the 11g and 19c versions exceeded 10 years, during which Oracle implemented significant alterations in both functionality and security of login module.
- Addressing security concerns involves focusing on algorithm enhancements and strengthening protocols to mitigate the risk of potential attacks.
- Comply with established norms like FIPS, which may necessitate the adoption of protocols such as TLS 1.2 or the utilization of larger cryptographic key sizes.
- The fulfilment of these requirements often relies heavily on the use of cryptographic libraries. While this heightened security can be advantageous, it may also incur a performance trade-off.
- Entropy in conjunction with cryptography may pose challenges for systems experiencing frequent connection activities, including login and logout processes.
The indicated modifications become apparent when the application’s connection management initiates multiple log-offs and log-ons. No such behavior was observed in cases where connection management is more efficient.
Now…… What next????
Considering all the factors mentioned above, we began exploring various options to promptly address these issues.
- Increase the CPU count: This machine was already a powerhouse, and obtaining extra CPUs was not an option.
- Enhance the management of connections at the application layer: While the application vendor acknowledges the importance of this improvement and has prioritised it, the implementation is slated for the next version, a process that may extend over several months.
- Implementing connection pooling: can greatly enhance the situation.
We made the decision to set up and configure Oracle’s innovative feature, “Database Resident Connection Pooling”.
Solution Implemented
Database Resident Connection Pool (DRCP) provides a connection pool in the database server for typical Web application usage scenarios where the application acquires a database connection, works on it for a relatively short duration, and then releases it. DRCP pools “dedicated” servers. A pooled server is the equivalent of a server foreground process and a database session combined.
DRCP increases Database server scalability and resolves the resource wastage issue that is associated with concurrent dynamic process creation.
The following steps have been undertaken to set up DRCP.
Configure POOL BEGIN dbms_connection_pool.Configure_pool( pool_name => 'SYS_DEFAULT_CONNECTION_POOL', minsize => 2000, maxsize => 2500, incrsize => 50, session_cached_cursors => 800, inactivity_timeout => 300, max_think_time => 3600, max_use_session => 500000, max_lifetime_session => 86400); END; / exec dbms_connection_pool.alter_param(pool_name => 'SYS_DEFAULT_CONNECTION_POOL',param_name => 'NUM_CBROK',param_value => '40');
Verify POOL select * from dba_cpool_info;
Start POOL exec dbms_connection_pool.START_POOL; select * from dba_cpool_info;
Validate POOL Processes eg: L001 is worker process name select count(*), inst_id from gv$process where pname like 'L%' group by inst_id;
Monitor POOL select count(*), inst_id, module, server from gv$session where server='POOLED' group by inst_id , module ,server col service for a15 col machine for a25 col CM for a15 select USERNAME, SERVICE, MACHINE, CONNECTION_MODE, CONNECTION_STATUS, ACTIVE_TIME, IDLE_TIME, WAIT_TIME, THINK_TIME, NUMGETS, NUMHITS from gv$cpool_conn_info; select * from gv$cpool_stats; select * from gv$cpool_cc_stats;
How to STOP POOL exec dbms_connection_pool.STOP_POOL;
Reference MOS document:
Primary Note: Overview of Database Resident Connection Pooling (DRCP) (Doc ID 1501987.1)
After successfully launching DRCP, you should furnish the application team with the following example TNS entry. The application team can then adjust their connection properties using the newly provided TNSENTRY.
service_name= (DESCRIPTION = (ADDRESS_LIST = (ADDRESS = (PROTOCOL = TCP)(HOST = hostname)(PORT = 1900)) ) (CONNECT_DATA = (SERVER = POOLED) (SERVICE_NAME = DBNAME) ) )
Ultimate result
The application team modified connection properties. DRCP has been used for the modules creating maximum concurrent dedicated connection.
The results were captivating.
CPU usage never exceeded 20%
The wait time for connection management has been significantly minimized:
All issues have been successfully resolved, and the application is currently running flawlessly!!!!
Thanks for sharing.
This one gives a clear and crisp understanding of the issue at hand and how the workaround ,as in DRCP, helped overcome it.
Cheers!