Let me describe how I discovered that the optimizer_adaptive_features may have a huge negative impact (specially on RAC databases).
I upgraded a 11gR2 database to 12.1.0.2 and then I converted this database to a PDB. To do so I launched “@$ORACLE_HOME/rdbms/admin/noncdb_to_pdb.sql” (See MOS Doc ID 1564657.1 for more details).
This sql script took about 75 minutes to complete on my 2 nodes RAC database. This is quite long and then I decided to try to reduce this duration.
To diagnose: I dropped the PDB, plugged it back, re-launched noncdb_to_pdb.sql and enabled a 10046 trace on the session.
The top SQL (sort by elapsed time) for this session is the following:
SQL ID: frjd8zfy2jfdq Plan Hash: 510421217
SELECT executions, end_of_fetch_count, elapsed_time/px_servers
elapsed_time, cpu_time/px_servers cpu_time,
buffer_gets/executions buffer_gets
FROM
(SELECT sum(executions) as executions, sum(case
when px_servers_executions > 0 then
px_servers_executions else executions end)
as px_servers, sum(end_of_fetch_count) as end_of_fetch_count,
sum(elapsed_time) as elapsed_time,
sum(cpu_time) as cpu_time, sum(buffer_gets) as
buffer_gets FROM gv$sql
WHERE executions > 0 AND sql_id = :1
AND parsing_schema_name = :2)
call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 70502 4.02 4.25 0 0 0 0
Execute 70502 264.90 759.33 0 0 0 0
Fetch 70502 215.77 1848.46 0 0 0 70502
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 211506 484.70 2612.05 0 0 0 70502
Misses in library cache during parse: 17
Misses in library cache during execute: 17
Optimizer mode: CHOOSE
Parsing user id: SYS (recursive depth: 2)
Number of plan statistics captured: 70502
Rows (1st) Rows (avg) Rows (max) Row Source Operation
---------- ---------- ---------- ---------------------------------------------------
1 1 1 VIEW (cr=0 pr=0 pw=0 time=76 us)
1 1 1 SORT AGGREGATE (cr=0 pr=0 pw=0 time=76 us)
0 0 1 PX COORDINATOR (cr=0 pr=0 pw=0 time=76 us)
0 0 0 PX SEND QC (RANDOM) :TQ10000 (cr=0 pr=0 pw=0 time=0 us)
0 0 0 VIEW GV$SQL (cr=0 pr=0 pw=0 time=0 us)
0 0 0 FIXED TABLE FIXED INDEX X$KGLCURSOR_CHILD (ind:2) (cr=0 pr=0 pw=0 time=0 us)
Elapsed times include waiting on following events:
Event waited on Times Max. Wait Total Waited
---------------------------------------- Waited ---------- ------------
PX Deq: Join ACK 281741 0.03 368.85
PX Deq: reap credit 1649736 0.00 25.22
IPC send completion sync 141000 0.02 135.07
PX Deq: Parse Reply 141004 0.04 170.64
PX Deq: Execute Reply 141004 0.08 1366.42
reliable message 70502 0.00 125.51
PX Deq: Signal ACK EXT 140996 0.03 13.99
PX Deq: Slave Session Stats 140996 0.02 25.54
enq: PS - contention 70605 0.11 145.80
KJC: Wait for msg sends to complete 2584 0.00 0.04
latch free 16 0.00 0.00
PX qref latch 14 0.00 0.00
latch: shared pool 4 0.00 0.00
latch: active service list 1 0.00 0.00
row cache lock 127 0.00 0.08
library cache lock 36 0.00 0.04
library cache pin 36 0.00 0.04
gc cr grant 2-way 1 0.00 0.00
db file sequential read 1 0.00 0.00
oracle thread bootstrap 1 0.03 0.03
As you can see this SQL elapsed time is about 2600 seconds in total (for about 70 000 executions), most of the wait time comes from “PX %” events and this SQL queries the GV$SQL view.
But wait:
Why the hell this SQL (which looks like an SQL that collects metrics for a particular sql_id) is somehow part of the session that launched the noncdb_to_pdb.sql script? Does it make sense?
I really don’t think so, this sql (sql_id “frjd8zfy2jfdq”) should have been triggered by something else (parsing or whatever) than noncdb_to_pdb.sql.
Let’s prove it with a simple test case (I am alone on the database):
alter system flush shared_pool;
select count(*) from v$sql where sql_id='frjd8zfy2jfdq';
connect / as sysdba
begin
execute immediate 'select object_name from dba_objects';
end;
/
select count(*) from v$sql where sql_id='frjd8zfy2jfdq';
With the following result:
SQL> @test_case
System altered.
COUNT(*)
----------
0
Connected.
PL/SQL procedure successfully completed.
COUNT(*)
----------
2
Did you see that the sql_id “frjd8zfy2jfdq” has been produced by this simple test case? (If you trace the session you would see this query into the trace file).
I reproduced this behavior on:
- 12.1.0.2 database with CDB/PDB.
- 12.1.0.2 database (Non CDB).
and was not able to reproduce it on a 11gR2 database.
So, it looks like that this SQL is introduced by a 12.1.0.2 (12cR1) new feature. As it is somehow linked to “SQL metrics collection”, I tried to disable some 12cR1 features (linked to this area) until I found the “culprit”.
It did not produce any change until I set the optimizer_adaptive_features parameter to false (true is the default).
Here is the result:
SQL> !cat test_case.sql
alter system flush shared_pool;
select count(*) from v$sql where sql_id='frjd8zfy2jfdq';
connect / as sysdba
alter session set optimizer_adaptive_features=false;
begin
execute immediate 'select object_name from dba_objects';
end;
/
select count(*) from v$sql where sql_id='frjd8zfy2jfdq';
SQL> @test_case
System altered.
COUNT(*)
----------
0
Connected.
Session altered.
PL/SQL procedure successfully completed.
COUNT(*)
----------
0
BINGO!!! The sql_id “frjd8zfy2jfdq” has not been executed!
Well, now what is the impact on noncdb_to_pdb.sql on my 2 nodes RAC database?
I edited the script and added:
alter session set optimizer_adaptive_features=false;
just above this line:
exec dbms_pdb.noncdb_to_pdb(1);
Guess what?
- noncdb_to_pdb.sql took about 20 minutes to execute (compare to about 75 minutes without setting optimizer_adaptive_features to false).
- The sql_id “frjd8zfy2jfdq” is not part of the trace file anymore.
Remarks:
- The impact of this SQL is much more “visible” with a RAC database, as it queries a GLOBAL V$ view (GV$SQL) and then needs parallel query to run (If more than one instance is up). With only one instance up, the 10046 trace file produced:
SQL ID: frjd8zfy2jfdq Plan Hash: 510421217
SELECT executions, end_of_fetch_count, elapsed_time/px_servers
elapsed_time, cpu_time/px_servers cpu_time,
buffer_gets/executions buffer_gets
FROM
(SELECT sum(executions) as executions, sum(case
when px_servers_executions > 0 then
px_servers_executions else executions end)
as px_servers, sum(end_of_fetch_count) as end_of_fetch_count,
sum(elapsed_time) as elapsed_time,
sum(cpu_time) as cpu_time, sum(buffer_gets) as
buffer_gets FROM gv$sql
WHERE executions > 0 AND sql_id = :1
AND parsing_schema_name = :2)
call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 69204 2.58 2.78 0 0 0 0
Execute 69204 23.93 25.45 0 0 0 0
Fetch 69204 2.68 2.64 0 0 0 69204
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 207612 29.20 30.88 0 0 0 69204
Misses in library cache during parse: 18
Misses in library cache during execute: 18
Optimizer mode: CHOOSE
Parsing user id: SYS (recursive depth: 2)
Number of plan statistics captured: 62
Rows (1st) Rows (avg) Rows (max) Row Source Operation
---------- ---------- ---------- ---------------------------------------------------
1 1 1 VIEW (cr=0 pr=0 pw=0 time=120 us)
1 1 1 SORT AGGREGATE (cr=0 pr=0 pw=0 time=113 us)
0 0 0 PX COORDINATOR (cr=0 pr=0 pw=0 time=100 us)
0 0 0 PX SEND QC (RANDOM) :TQ10000 (cr=0 pr=0 pw=0 time=50 us)
0 0 0 VIEW GV$SQL (cr=0 pr=0 pw=0 time=43 us)
0 0 0 FIXED TABLE FIXED INDEX X$KGLCURSOR_CHILD (ind:2) (cr=0 pr=0 pw=0 time=39 us)
Elapsed times include waiting on following events:
Event waited on Times Max. Wait Total Waited
---------------------------------------- Waited ---------- ------------
row cache lock 1 0.00 0.00
As you can see: elapsed time of about 30 seconds (for about 70 000 executions) and no “PX%” wait events.
- I checked which parameters (hidden or not) changed when setting optimizer_adaptive_features to false. Then I tried one by one those parameters with my test case and discovered that setting “_optimizer_dsdir_usage_control” to 0 is enough to get rid of the sql_id “frjd8zfy2jfdq”.
- During the run of noncdb_to_pdb.sql, this is the large number of executions (about 70 000) of the sql_id “frjd8zfy2jfdq” that produced this long “overall” duration and highlighted the fact that this query has to be somehow removed.
- You could meet this SQL on 12cR1 databases (CDB/PDB or non CDB).
Conclusion:
- The optimizer_adaptive_features (set to true) may produce a huge negative impact on the performance (specially with RAC database). I provided an example of such an impact when running the noncdb_to_pdb.sql script.
- Should you meet the sql_id “frjd8zfy2jfdq” in your database and would like to get rid of it (because you observe a negative impact): Then simply set optimizer_adaptive_features to false (or “_optimizer_dsdir_usage_control” to 0) at the session or system level.