latch free waits on “FOB s.o list latch”

Problem Statement

The database had been migrated to a new setup. Post migration, as business load increased, “latch free” wait events started surfacing inside the database and channel timeouts were seen at the application layer. As an immediate resolution, the database was moved back to the existing setup, which gave the necessary and expected relief, with application timeouts normalising. We then began investigating the root cause (RCA) of this issue.

Diagnosis

New hardware had been set up for the existing application, with the database kept in sync using Active Data Guard (ADG). On D-Day the database was migrated to the new setup by performing an ADG switchover at midnight. The database ran well for 3-4 hours.

Then the application team suddenly observed channel timeouts and quickly decided to roll back. The database was moved back to the existing setup and the expected relief came their way. We commenced exploring the database AWR/ASH reports to analyse this historical issue.


After the switchover to the new setup, as the day progressed and application load hit the database, we saw a huge spike in the “latch free” wait event from 7:40 AM. This had not been the case earlier, so we tried to figure out why all of these waits started cropping up so suddenly.

Taking a look at “Transactions per second”, we noted a gradual increase as the day progressed until 07:40 AM. At around 1500 TPS, “latch free” waits started snowballing.
We extracted an AWR report covering 7:30 AM to 8:00 AM.
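For readers who want to reproduce this, transactions per second can be approximated from AWR history. A rough sketch, assuming AWR retention still covers the window; it counts only “user commits” (ignoring rollbacks) and uses the standard DBA_HIST views:

 -- Approximate TPS per AWR snapshot interval, per instance
 SELECT sn.instance_number,
        sn.snap_id,
        sn.end_interval_time,
        ROUND((st.value
               - LAG(st.value) OVER (PARTITION BY st.instance_number
                                     ORDER BY sn.snap_id))
              / (EXTRACT(DAY    FROM (sn.end_interval_time - sn.begin_interval_time)) * 86400
               + EXTRACT(HOUR   FROM (sn.end_interval_time - sn.begin_interval_time)) * 3600
               + EXTRACT(MINUTE FROM (sn.end_interval_time - sn.begin_interval_time)) * 60
               + EXTRACT(SECOND FROM (sn.end_interval_time - sn.begin_interval_time)))) AS approx_tps
   FROM dba_hist_sysstat st
   JOIN dba_hist_snapshot sn
     ON sn.snap_id = st.snap_id
    AND sn.dbid = st.dbid
    AND sn.instance_number = st.instance_number
  WHERE st.stat_name = 'user commits'
  ORDER BY sn.instance_number, sn.snap_id;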

Looking at the above data, “latch free” waits were contributing 82% of DB time.

While tuning a database for latch waits, it is paramount to identify which latch the “latch free” waits are associated with. In Oracle 10g and later this is usually easy, because specific wait events have been introduced for most latches (e.g., “latch: shared pool”). However, some latch waits are still rolled up into the old “latch free” wait event.

In such cases, the P2 parameter of the “latch free” wait event is the latch number, and the specific latch can be identified using the query below.

 
 SELECT * FROM v$latchname
 WHERE latch# = 40;

     LATCH# NAME
 ---------- -------------------  
         40 FOB s.o list latch
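For completeness, the P2 values behind recent “latch free” waits can also be pulled straight from ASH. A minimal sketch, assuming the issue window is still present in v$active_session_history (the 30-minute lookback is only illustrative):

 -- Which latch numbers sit behind the recent "latch free" samples?
 SELECT p2 AS latch#, COUNT(*) AS samples
   FROM v$active_session_history
  WHERE event = 'latch free'
    AND sample_time > SYSTIMESTAMP - INTERVAL '30' MINUTE
  GROUP BY p2
  ORDER BY samples DESC;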

Similarly, we also looked at the “Latch Statistics” section of the AWR report to understand the type of latch.

In both cases it was the “FOB s.o list latch”, for which Misses and Sleeps were extremely high.
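The same statistics can be confirmed live on the instance from v$latch; a small sketch (the counters are cumulative since instance startup):

 SELECT name, gets, misses, sleeps
   FROM v$latch
  WHERE name = 'FOB s.o list latch';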

What is “FOB s.o list latch”?

Based on the MOS document “How To Handle FOB s.o list latch Contention (Doc ID 2491068.1)”, contention on the “FOB s.o list latch” is observed when a system has many processes and many database files, and lots of processes access those files concurrently.

FOB (FileOpenBlock) is a state object type, and normally its allocations are done from the shared pool. The pool size is based on the number of processes and database files configured for the database.

Current configured values from AWR:

processes=40000
db_files=5000
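These values can also be cross-checked directly on the instance; a simple sketch against v$parameter:

 SELECT name, value
   FROM v$parameter
  WHERE name IN ('processes', 'db_files');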

So, what led to this????

To understand the FOB memory allocation, we checked the “SGA breakdown difference” section of the AWR report.

As per the above data, the “FileOpenBlock” allocation in the shared pool had grown from 3094 MB to 5683 MB. For every growth operation, the “FOB s.o list latch” has to be acquired, so frequent growth can result in high waits on this latch.
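The current size of this allocation can also be watched live; the AWR SGA breakdown is derived from v$sgastat, so a simple sketch is:

 SELECT pool, name, ROUND(bytes/1024/1024) AS size_mb
   FROM v$sgastat
  WHERE name = 'FileOpenBlock';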

To avoid excessive dynamic growth, the FOB entries should be preallocated at instance startup. This is controlled by the hidden parameter “_ksfd_fob_pct”, whose default value is 10% when NUMA is enabled and 25% for non-NUMA.

But the question in our inquisitive minds was why such issues had never been seen in the existing production setup. To answer it, we extracted an AWR compare report and jumped to the “SGA breakdown difference” section again.

What we observed here was fascinating!!!!!

 

As per the above data, in the existing setup “FileOpenBlock” was part of the “numa pool”, preallocated at 11130 MB with no growth at all. In the migrated setup, FOB was part of the “shared pool”, where dynamic growth from 3094 MB to 5683 MB had been seen.

Taking a cue from the fact that this memory was being allocated under the “numa pool”, we checked the value of the “_enable_NUMA_support” parameter in both setups. In the migrated setup “_enable_NUMA_support” was set to FALSE and “_ksfd_fob_pct” was 0, whereas in the existing setup “_enable_NUMA_support” was TRUE and “_ksfd_fob_pct” was 40.
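Both are hidden (underscore) parameters and are not exposed in v$parameter, so they are read the usual way from the X$ views as SYS; a sketch:

 SELECT i.ksppinm AS parameter, v.ksppstvl AS value
   FROM x$ksppi i, x$ksppcv v
  WHERE i.indx = v.indx
    AND i.ksppinm IN ('_enable_NUMA_support', '_ksfd_fob_pct');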

Solution Implemented

Both parameters on the migrated setup were set to match the existing production values:

“_enable_NUMA_support”=TRUE
“_ksfd_fob_pct”=40
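For reference, a sketch of how such a change could be applied. Both are underscore parameters, so this should only be done in consultation with Oracle Support, and the settings take effect only after an instance restart:

 ALTER SYSTEM SET "_enable_NUMA_support" = TRUE SCOPE = SPFILE;
 ALTER SYSTEM SET "_ksfd_fob_pct" = 40 SCOPE = SPFILE;
 -- Restart the instance so the NUMA and FOB preallocation settings take effect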
