Thursday, April 1, 2010

Tuning Data Warehouse Models on Oracle

While not strictly related to OBIEE – I am currently working on a number of database optimization tasks, and will discuss a couple of ideas to be aware of when designing a warehouse schema for the oracle database.

Let me start with a simple question:

Assume you have a table of  1000 blocks – with a column ‘Digit’ (that goes from 0-9 obviously). If the table is properly indexed on ‘Digit’ and you query all records where Digit=’0’ – how may data blocks will you have to read ?

Simple applying the unitary method gives you :

(1/10) * 1000 = 100 blocks.

Unfortunately, it turns out that even though this might be a reasonable estimate for an OLTP application – this can be rather dramatically wrong for a fact table in a DSS environment.

Why? Well, if you have designed your star schema correctly, you only have some surrogate keys, and some measures in the fact table (no large varchars). Assuming 5 (4 byte) surrogate keys, and 10 (8 byte) measures – your record length is

5x4+8x10=100 bytes long

Now – your block size may be 8K – and so the number of records that fit in a single block is 8K/100=80 records.

By now you are wondering what this has to do with the original question. A lot as it turns out. Remember, all reads and write are for the block as a whole – and a block has to be read if *any* of its rows qualify the criteria. What is the probability that out of 80 records on a given block - none of them are a digit 0:

P(No 0’s)= P(1st record Not 0) x P(2nd Record not 0)… P(80th record not 0)

=(0.9) ^ 80

=0.00022

Which means in such an environment, the number of blocks that you would expect to read is 1000 x (1-0.00022) = 999.8 blocks. A very different answer from the simplistic 100 we first got we first got.

And this is not all - if you use table compression (which gives you – say 50% gain) – and a large blocksize (recommended with compression) you may have as many as:

32K/ (100 bytes *50% compression ratio)=640 records per block.

In the next post – I will talk about what this means in terms of optimizer behavior.