Wednesday, August 25, 2010

Cache Purging – Session Variables

There is another way to purge cache for selected tables that is based on a pull mechanism – not push – which is to use session variables to detect when data on the underlying database has changed.

Here is the sequence of things one has to do:

1) Create a ETL_TIMESTAMP table on your warehouse as follows :

create table ETL_TIMESTAMP
(
  ETL_TIMESTAMP TIMESTAMP(6)
);

2) At the end of every ETL run, run the following script to refresh this timestamp:

truncate table ETL_TIMESTAMP;

insert into ETL_TIMESTAMP select sysdate from dual;

3) Now create a session variable so that every new session gets the value of the ETL refresh timestamp when it starts – by using the following SQL:

select etl_timestamp from etl_timestamp

image

 

4) Finally you want to make sure that cache entries are discarded when the value of this variable changes. We now have to ensure that this variable is referenced in the query. This can be done in a number of ways – one of which is to reference it in the logical table source (LTS).

Modify every LTS (or at least the one for which cache purges are required) by modifying the ‘WHERE clause’:

image

Now cache entries will be automatically purged everytime you run ETL.

Pros and Cons

Pros

  1. The most important difference between this method and the others we discussed here and here, is that being a pull API, it is fairly resilient to changes in physical configuration. If you add or remove servers from a clustered environment, or have Dev. / UAT / QA / Production BI servers pointed to the same warehouse database, each of the caches will automatically refresh when they should.
  2. Since this is session level, there is no latency to a cache refresh. As soon as the new data appears on the database, all new sessions will see the new data.

Cons

  1. Configuration maybe hard, if every LTS is to be modified.
  2. Existing sessions might see some old cache hits. If you have long running sessions, then you may need to enhance this to use both a session variable and a repository variable to provide staleness guarantees. The latter is left as an exercise for the reader.

1 comment: