Sunday, December 12, 2010

Connection Pooling and Sizing of Connection Pools

 

A question that does not get asked as often as it should is what is the best size for a connection pool to a database. However, the answer to that question can have a major impact the response times your users experience as well as the throughput of the system.

Luckily for us, a number of people on the internet have provided a simple answer to that question for different technologies – “It Depends!”.

http://dev.mysql.com/tech-resources/articles/connection_pooling_with_connectorj.html

http://searchdomino.techtarget.com/expert/KnowledgebaseAnswer/0,289625,sid4_gci851755,00.html

http://stackoverflow.com/questions/405352/mysql-connection-pooling-question-is-it-worth-it

So, let us talk about what are the factors to take into consideration when planning the optimal size of the connection pool.

  1. A connection pool that is too small means that queries have to wait to get a connection before they execute. This delay can be quite large – in seconds – for a loaded system
  2. A very large connection pool, on the other hand has its own problems
    • First of all, connection pools are fairly heavy in terms of memory usage. This will impact the memory OBIEE server uses, which may or may not be a concern depending on whether you are on a 64-bit platform, and the available RAM.
    • Secondly, if you have a large connection pool, all queries will be sent on to the database. Your database box and/or disk will see a performance degradation beyond their capacity.

That means you need two pieces of information to correctly size the connection pool – the database capacity for parallelism and the expected concurrency. The former typically depends on the database you run, the machine specs, and the operating system. Your DBA will be able to help determine this based on his past experience. Think of this as the “supply” of connections.

Here is a little spreadsheet to help determine the “demand” for connections. The numbers here are, of course, made up for illustration only:

Name of Factor Number Cumulative Effect
No of Named Users 5000 5000
% logged on at peak times 10% 500
No. of reports on a dashboard 10 5000
% time running a dash board – For example, if a dash board runs in 5 seconds, and think time between runs is 30s – this is (5)/(5+30) 14% 714
Cache Miss Ratio – This will be high for highly customized reports 80% 571
Number of Logical Queries/logical SQL 3 1713

This number above – approximately 1700 – gives you a number of sizing guidelines. For example, if your database will gracefully handle only about 600 connections/queries at a time – then you know you need approximately 3 (1700/600) databases to handle this load correctly.

Also, if you find – for example due to memory issues – that an OBIEE instance will only scale up to a connection pool of 200 – you know you will need a cluster of about 8 (1700/200) instances of OBIEE to handle this load.

Of course, you can also play with the the numbers above to try and get the required connection pool size down. For example, if by creating better aggregates, you are able to get the dashboard running time from 5s to 1s, this can be 5x reduction in the number of required connections.