In-Memory Instance distribution with RAC databases: I want at least one instance that contains all the data

As you may know, with a RAC database, by default all objects populated into
In-memory will be distributed across all of the IM column stores in the cluster. It is also possible to have all the data appear in the IM column store on every node (Engineered Systems only). See this white paper for more details.

There is 2 very interesting blog post around this subject:

Kerry Osborne show us how we can distribute the data (using the distribute INMEMORY attribute) across the nodes into this blog post.
Christian Antognini show us that having the data not fully populated on each instance could lead to bad performance into this blog post.

But wait: If my RAC service is an active/passive one (I mean the service is started on only one instance) then I would like to have all the data fully populated into the In-Memory column store of the “active” instance (and not distributed across the instances), right?

Let’s try to achieve this (all data fully populated into the In-Memory column store of the active instance):

So, by default, the IMCS distribution is the following:

SQL> alter table bdt inmemory ;

Table altered.

SQL> select count(*) from bdt;

  COUNT(*)
----------
  38220000

SQL> SELECT inst_id, populate_status, inmemory_size, bytes, bytes_not_populated
FROM gv$im_user_segments
WHERE segment_name = 'BDT';

   INST_ID POPULATE_ INMEMORY_SIZE      BYTES BYTES_NOT_POPULATED
---------- --------- ------------- ---------- -------------------
         1 COMPLETED     313720832 1610612736           732684288
         2 COMPLETED     274530304 1610612736           826236928

As you can see no instance contains all data as the BYTES_NOT_POPULATED column is greater than 0.

Now let’s set the hidden parameter “_inmemory_auto_distribute” to false and re-trigger the population on instance 1:

SQL> alter system set "_inmemory_auto_distribute"=false;

System altered.

SQL> select count(*) from bdt;

  COUNT(*)
----------
  38220000

SQL> SELECT inst_id, populate_status, inmemory_size, bytes, bytes_not_populated
FROM gv$im_user_segments
WHERE segment_name = 'BDT';

   INST_ID POPULATE_ INMEMORY_SIZE      BYTES BYTES_NOT_POPULATED
---------- --------- ------------- ---------- -------------------
         1 COMPLETED     587137024 1610612736                   0
         2 COMPLETED     274530304 1610612736           826236928

BINGO! Look at the Instance 1, it now contains all the data in its IMCS (as BYTES_NOT_POPULATED=0).

If I connect to the Instance 2 and launch the query to trigger the population I’ll get:

SQL> SELECT inst_id, populate_status, inmemory_size, bytes, bytes_not_populated
FROM gv$im_user_segments
WHERE segment_name = 'BDT';
  2    3  
   INST_ID POPULATE_ INMEMORY_SIZE      BYTES BYTES_NOT_POPULATED
---------- --------- ------------- ---------- -------------------
         1 COMPLETED     587137024 1610612736                   0
         2 COMPLETED     589234176 1610612736                   0

So, both instances now contain all the data (like the duplicate attribute would have done).

Remarks:

In case of active/passive service (means preferred/available service) then after a service failover the second instance will also contain all the data (once populated).
In case of active/active service then each database instance will contain all the data but not necessary at the same time (so the behaviour is a little bit different of the duplicate attribute).
I used an hidden parameter, so you should get oracle support approval to use it.
I’ll check the behaviour with PDB into another post.

Conclusion:

Thanks to the hidden parameter “_inmemory_auto_distribute” we are able to change the way the In-Memory Column Store are populated by default on an oracle RAC database.
By default all objects populated into memory will be distributed across all of the IM column stores in the cluster.
With “_inmemory_auto_distribute” set to false, then we are able to populate one instance (or all the instances) with all the data.

Update 2015/03/27:

This “workaround” proposed to get rid of the default behaviour on non Engineered Systems (“Every row of the test table is stored in the IMCS of either one instance or the other”) works for any type of service.
In case of preferred/available service, there is no need to set the hidden parameter to get the IM fully populated on one node. The secret sauce is linked to the parallel_instance_group parameter. See this blog post for more details.