SCAN Registration : Database services not registered in SCAN Listeners of Node1

Problem Statement

In Real application Cluster setup, SCAN LISTENER running on 1st node could only register local instances and not the remote instances, however the scan listener running on Node 2 could register instances from both nodes, as expected.

Diagnosis

One fine day, we observed load balancing issues inside the database which compelled us to check if there were connections from application which hit a specific node. Post ruling out any node bound connections, we surmised some hitch at the LISTENER level.
Local listener registration list seemed fine, however, SCAN LISTENER registration list was a source of amazement.
For Scan listeners running on Node1, only instances from Node1 were listed in the registration list, as against on Node2 where all the instances (from both nodes) were visible.

Node1

 lsnrctl stat LISTENER_SCAN1

 LSNRCTL for IBM/AIX RISC System/6000: Version 19.0.0.0.0 - Production on 15-NOV-2021 20:36:34

 Copyright (c) 1991, 2021, Oracle.  All rights reserved.

 Connecting to (DESCRIPTION=(ADDRESS=(PROTOCOL=IPC)(KEY=LISTENER_SCAN1)))
 STATUS of the LISTENER
 ------------------------
 Alias                     LISTENER_SCAN1
 Version                   TNSLSNR for IBM/AIX RISC System/6000: Version 19.0.0.0.0 - Production
 Start Date                15-NOV-2021 20:29:58
 Uptime                    0 days 0 hr. 6 min. 35 sec
 Trace Level               off
 Security                  ON: Local OS Authentication
 SNMP                      OFF
 Listener Parameter File   /GRIDHOME/oracle/app/product/grid/19.3.0/network/admin/listener.ora
 Listener Log File         /GRIDHOME/oracle/orabase/diag/tnslsnr/Node1/listener_scan1/alert/log.xml 
 Listening Endpoints Summary...
   (DESCRIPTION=(ADDRESS=(PROTOCOL=ipc)(KEY=LISTENER_SCAN1)))
   (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=10.xx.xx.xx)(PORT=1900)))
 Services Summary...
 Service "-MGMTDBXDB" has 1 instance(s).
   Instance "-MGMTDB", status READY, has 1 handler(s) for this service...
 Service "8939c3a49bbd01c2e0530af11a196e44" has 1 instance(s).
   Instance "-MGMTDB", status READY, has 1 handler(s) for this service...
 Service "DEMODB" has 1 instance(s).
   Instance "DEMODB1", status READY, has 40 handler(s) for this service...
 Service "_mgmtdb" has 1 instance(s).
   Instance "-MGMTDB", status READY, has 1 handler(s) for this service...
 Service "b58859d4571d06a0e0530a323030d700" has 1 instance(s).
   Instance "-MGMTDB", status READY, has 1 handler(s) for this service...
 Service "gimr_dscrep_10" has 1 instance(s).
   Instance "-MGMTDB", status READY, has 1 handler(s) for this service...
 The command completed successfully

Node2

 lsnrctl stat LISTENER_SCAN2

 LSNRCTL for IBM/AIX RISC System/6000: Version 19.0.0.0.0 - Production on 15-NOV-2021 20:37:16

 Copyright (c) 1991, 2021, Oracle.  All rights reserved.

 Connecting to (DESCRIPTION=(ADDRESS=(PROTOCOL=IPC)(KEY=LISTENER_SCAN2)))
 STATUS of the LISTENER
 ------------------------
 Alias                     LISTENER_SCAN2
 Version                   TNSLSNR for IBM/AIX RISC System/6000: Version 19.0.0.0.0 - Production
 Start Date                15-NOV-2021 20:36:16
 Uptime                    0 days 0 hr. 1 min. 0 sec
 Trace Level               off
 Security                  ON: Local OS Authentication
 SNMP                      OFF
 Listener Parameter File   /GRIDHOME/oracle/app/product/grid/19.3.0/network/admin/listener.ora
 Listener Log File         /GRIDHOME/oracle/orabase/diag/tnslsnr/Node2/listener_scan2/alert/log.xml 
 Listening Endpoints Summary...
   (DESCRIPTION=(ADDRESS=(PROTOCOL=ipc)(KEY=LISTENER_SCAN2)))
   (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=10.xx.xx.xx)(PORT=1900)))
 Services Summary...
 Service "-MGMTDBXDB" has 1 instance(s).
   Instance "-MGMTDB", status READY, has 1 handler(s) for this service...
 Service "8939c3a49bbd01c2e0530af11a196e44" has 1 instance(s).
   Instance "-MGMTDB", status READY, has 1 handler(s) for this service...
 Service "DEMODB" has 2 instance(s).
   Instance "DEMODB1", status READY, has 40 handler(s) for this service...
   Instance "DEMODB2", status READY, has 40 handler(s) for this service...
 Service "_mgmtdb" has 1 instance(s).
   Instance "-MGMTDB", status READY, has 1 handler(s) for this service...
 Service "b58859d4571d06a0e0530a323030d700" has 1 instance(s).
   Instance "-MGMTDB", status READY, has 1 handler(s) for this service...
 Service "gimr_dscrep_10" has 1 instance(s).
   Instance "-MGMTDB", status READY, has 1 handler(s) for this service...
 The command completed successfully

Preliminary exercises of restarting and relocating the SCAN listeners did not help.
Even Restarting CRS along with a Node reboot didn’t come to our rescue.

To circumvent this, we relocated all scan listeners to Node2 to avoid load balancing issues within the database.

To further resolve the issue, we configured separate TNS entry using all SCAN listeners and set remote listener parameter referring below support document.
SCAN Registration Issue: Services Fail to Register to SCAN Listeners (Doc ID 1448717.1)

 
 REMOTE_LISTENERS_SCAN=
     (DESCRIPTION =
       (ADDRESS = (PROTOCOL = TCP)(HOST = 10.xx.xx.x1)(PORT = 1900)) 
       (ADDRESS = (PROTOCOL = TCP)(HOST = 10.xx.xx.x2)(PORT = 1900))
       (ADDRESS = (PROTOCOL = TCP)(HOST = 10.xx.xx.x3)(PORT = 1900))
     )

 alter system set REMOTE_LISTENER='REMOTE_LISTENERS_SCAN' scope=both sid='*'; 
 alter system register;

Issue still continues to haunt us.

Now…… What next????

In this setup, logging has been disabled proactively for all the listeners including SCAN.
Without Listener logging , it was onerous to understand the nature of errors and therefore isolate the problem.

As per regulatory compliance DR drill has been planned for this database.
This gave us privilege to enable listener logging at problematic cluster setup.

Enabling Scan Listener logging too didn’t help as there were no anomalies recorded.

Upon further studying and probing the issue , we came across the following document to perform additional tracing of the listener registration.
How to Trace Dynamic Registration from PMON or LREG (12c) (Doc ID 787055.1)

What is LREG????

In 12c and higher, there is a new background process called ora_lreg_DEMODB1
In previous releases, PMON always handled instance registration logic. In 12c, LREG (Listener REGistration) now takes over with the instance registration logic.

LREG registers instance information with the listener.
It is a critical background process for each database instance (if it is killed, Oracle goes down).
It does everything PMON used to do and report service_update, service_register, service_died in the listener.log.

How to Enable LREG trace
alter system set events = ‘immediate trace name LREG_STATE level 3’;

How to Disable?
alter system set events = ‘immediate trace name LREG_STATE off’;

We have seen some interesting errors in LREG trace file for SCAN listener running on Node1.

 
  Remote listeners:
  0 - (DESCRIPTION=(CONNECT_DATA=(SERVICE_NAME=))(ADDRESS=(PROTOCOL=tcp)(HOST=10.xx.xx.x1)(PORT=1900))) pdb 0 dci 0 flg 0x22 
       firewall=0, publish_service_acl=0
       state=4, err=3
       nse[0]=12535, nse[1]=12579, nte[0]=0, nte[1]=0, nte[2]=0
       ncre=0
       endp=(DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=10.xx.xx.x1)(PORT=1900)))
         flg=0x80000000 nse=0 pri=0x122358628
  1 - (DESCRIPTION=(CONNECT_DATA=(SERVICE_NAME=))(ADDRESS=(PROTOCOL=tcp)(HOST=10.xx.xx.x2)(PORT=1900))) pdb 0 dci 0 flg 0x22
       firewall=0, publish_service_acl=0
       state=1, err=0
       nse[0]=0, nse[1]=0, nte[0]=0, nte[1]=0, nte[2]=0
       ncre=0
       endp=(DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=10.xx.xx.x2)(PORT=1900)))
         flg=0x80000000 nse=0 pri=0x122358078
  2 - (DESCRIPTION=(CONNECT_DATA=(SERVICE_NAME=))(ADDRESS=(PROTOCOL=tcp)(HOST=10.xx.xx.x3)(PORT=1900))) pdb 0 dci 0 flg 0x22
       firewall=0, publish_service_acl=0
       state=1, err=0
       nse[0]=0, nse[1]=0, nte[0]=0, nte[1]=0, nte[2]=0
       ncre=0
       endp=(DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=10.xx.xx.x3)(PORT=1900)))
         flg=0x80000000 nse=0 pri=0x122357b20

Error Details
 
 oerr tns 12535
 12535, 00000, "TNS:operation timed out"
 // *Cause: The requested operation could not be completed within the time out
 // period.
 // *Action: Look at the documentation on the secondary errors for possible
 // remedy. See SQLNET.LOG to find secondary error if not provided explicitly.
 // Turn on tracing to gather more information.

 oerr tns 12579
 12579, 00000, "TNS:transport connection failed"
 // *Cause: An error was encountered while trying to create transport connection. 
 // *Action: Likely reasons include host may be down, outbound transport
 //  connection may be blocked by network or any other transport connectivity
 //  issue.
 // If error persists, contact Oracle Customer Support.

So, what led to this????

The errors led us to yet another document.
ORA-12514 & Services Fail to Register to SCAN Listeners After HAIP Offline (Doc ID 2500067.1)

Oracle 12C introduces a new parameter: REMOTE_REGISTRATION_ADDRESS_<listener_name> which is configured internally in the SCAN listener to restrict registrations to the private network.
By default, it uses HAIP as remote registration address.

On problematic Node1, it was using HAIP address where as on Node2 it was private IP

Node1
 
 lsnrctl show remote_registration_address listener_scan1

 LSNRCTL for IBM/AIX RISC System/6000: Version 19.0.0.0.0 - Production on 15-NOV-2021 20:22:45

 Copyright (c) 1991, 2021, Oracle.  All rights reserved.

 Connecting to (DESCRIPTION=(ADDRESS=(PROTOCOL=IPC)(KEY=LISTENER_SCAN1)))
 LISTENER parameter "remote_registration_address" set to (DESCRIPTION=(ASYNC_TIMER=yes)(EXPIRE_TIME=1)(TRANSPORT_CONNECT_TIMEOUT=15)(ADDRESS_LIST=(ADDRESS=(PROTOCOL=tcp)(HOST=169.254.46.147)(PORT=49925)))) 
 The command completed successfully

Node2
 
 lsnrctl show remote_registration_address listener_scan2

 LSNRCTL for IBM/AIX RISC System/6000: Version 19.0.0.0.0 - Production on 15-NOV-2021 20:23:55

 Copyright (c) 1991, 2021, Oracle.  All rights reserved.

 Connecting to (DESCRIPTION=(ADDRESS=(PROTOCOL=IPC)(KEY=LISTENER_SCAN2)))
 LISTENER parameter "remote_registration_address" set to (DESCRIPTION=(ASYNC_TIMER=yes)(EXPIRE_TIME=1)(TRANSPORT_CONNECT_TIMEOUT=15)(ADDRESS_LIST=(ADDRESS=(PROTOCOL=tcp)(HOST=10.10.10.30)(PORT=52350)))) 
 The command completed successfully

 lsnrctl show remote_registration_address listener_scan3

 LSNRCTL for IBM/AIX RISC System/6000: Version 19.0.0.0.0 - Production on 15-NOV-2021 20:24:03

 Copyright (c) 1991, 2021, Oracle.  All rights reserved.

 Connecting to (DESCRIPTION=(ADDRESS=(PROTOCOL=IPC)(KEY=LISTENER_SCAN3)))
 LISTENER parameter "remote_registration_address" set to (DESCRIPTION=(ASYNC_TIMER=yes)(EXPIRE_TIME=1)(TRANSPORT_CONNECT_TIMEOUT=15)(ADDRESS_LIST=(ADDRESS=(PROTOCOL=tcp)(HOST=10.10.10.30)(PORT=52384))))
 The command completed successfully

When we look at HAIP status, it was in OFFLINE state on Node2 (Good Node but actual evil)

Node1
 
 crsctl status res -t -init

 -------------------------------------------------------------------------------- 
 Name           Target  State        Server                   State details
 --------------------------------------------------------------------------------
 Cluster Resources
 --------------------------------------------------------------------------------
 ora.cluster_interconnect.haip
       1        ONLINE  ONLINE       Node1                    STABLE
 --------------------------------------------------------------------------------

Node2
 
 crsctl status res -t -init

 -------------------------------------------------------------------------------- 
 Name           Target  State        Server                   State details
 --------------------------------------------------------------------------------
 Cluster Resources
 --------------------------------------------------------------------------------
 ora.cluster_interconnect.haip
       1        ONLINE  OFFLINE                               STABLE
 --------------------------------------------------------------------------------

Solution Implemented

Since the HAIP was not functioning on Node2, instance registration request from Node2 could not reach Node1. This is why Node2 instances were not listed in registration list on Node1.
Starting the HAIP on Node2 resolved the issue.
As a long-term and permanent resolution, we also disabled REMOTE_REGISTRATION_ADDRESS in listener.ora file for all SCAN listeners.

 
 REMOTE_REGISTRATION_ADDRESS_LISTENER_SCAN1=OFF 
 REMOTE_REGISTRATION_ADDRESS_LISTENER_SCAN2=OFF
 REMOTE_REGISTRATION_ADDRESS_LISTENER_SCAN3=OFF

SCAN listeners were restarted and services from all instances were visible in all SCAN listeners.

2 thoughts on “SCAN Registration : Database services not registered in SCAN Listeners of Node1”

Leave a Reply

Your email address will not be published. Required fields are marked *