Wednesday, July 7, 2010

Tuning Data Warehouse Models on Oracle - 2

In the last post we talked about how, in a fact table, even high selectivity – 10 % - does not lead to reduction of the blocks read.

This means that if you keep 10 years of data in the fact table – and select just 1 year – the database is on an average going to have to scan the full table. This can – will – be bad for performance. What one would ideally like to do – is just scan a 10th of the data. To accomplish that one has to ensure that each year is stored on the minimum number of blocks. This co-location of related records can be accomplished in three distinct ways.

There are three tools available for us to accomplish that:

  1. Index Organized table
  2. Partitioning 
  3. Physical Reordering via CTAS

Index Organized Table

The easiest way to keep the rows ordered is to tell Oracle to keep them that way – i.e. via an index organized table. The syntax is simple – you just have to create the table with an organization index clause:

create table sales (

) organization index;

The table data is stored with the primary key index instead of in its own blocks. If the primary key is prefixed with the columns you want to sort on – then records with a given value will appear together. For example,

create table sales (

…,

constraint PK_SALES primary key (year, sales_id)

) organization index;

will keep all records for a given year together.

However, we do not recommend actually doing this for fact tables because in our experience this does not work well with bitmap indexes.

Partitioning

If the number of values in the field that you want to co-locate by is small, you can typically partition by that field – for example, you may do something resembling the following:

create table sales (…)

     partition by range (year)

     partition p1 values less than (2005) tablespace u1,

     partition p2 values less than (2006) tablespace u2,

     partition p3 values less than (2007) tablespace u3,

     partition p4 values less than (2008) tablespace u4,

   partition p5 values less than (MAXVALUE) tablespace u5;

You would of course, have to maintain it as years were added. This is made much easier by the use of interval partitioning introduced in 11g.

 

create table sales (…)

     partition by range (year)

interval  (1)

     partition p1 values less than (2005) tablespace u1;

This is great when the number of partitions is variable – i.e. as data for future comes in.

The thing to watch out for is when the cardinality of the columns you are sorting on is high. For example, if you were sorting on date instead of years, then instead of 10 partitions could end up with 3650 partitions.

CTAS

The last way to keep rows of a certain value together is to to simply recreate the table using CTAS. For example,

create table sales_ordered as select * from sales order by date;

Note that this option, unlike the other two, does not provide any knowledge of the physical ordering to the optimizer. However, the benefits of co-locating the rows (as discussed in http://navi-obiee.blogspot.com/2010/04/tuning-data-warehouse-models-on-oracle.html) will continue to be available.

 

In a warehouse environment, we recommend that you always be aware of the physical order of data in your fact tables. If the cardinality of your most common filtering attribute is low, partitioning may be the best way to order the rows. If high, then CTAS with order by may be your only solution.

1 comment: