Connection pooling of cloud applications for improved throughput

Connection pooling

Connection pooling is a way to share and/or distribute data exchanges across a pool of connections to a database or cloud database. Connection pooling can be used to reduce the number of construction and tear-down operations on connections, which saves time and resources when a session does not continuously need a dedicated connection to a database.

A connection pool can also be used to distribute the load and increase throughput to the cloud database when an individual session is able to saturate a dedicated connection. When each connection has a limited throughput per second, minute or day, pooling multiple independent connections yields a higher throughput than a single connection, assuming no other limitations apply on scarce resources such as CPU cycles, memory capacity, disk throughput or the limited throughput of other data containers.

Connection pooling across data containers

On the statement execution level, the select, bulk insert, insert, update, delete and synchronize statements automatically use a data container chosen from a fixed list of data containers. On each execution, another data container from the list can be chosen. For insert, update and delete statements, the data container can vary per row being exchanged, whereas for bulk insert the data container can vary per page of rows.

The list of data containers to choose from is specified per occurrence of a table name. In a select statement, each table instance used in the statement can have its own list of data containers, whether the table appears in the from clause, in a join or in a sub-query.
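
As a sketch of per-table pools (the table ExactOnlineREST..TransactionLines and the column names used below are assumed purely for illustration), each table instance in a join can draw from its own pool of data containers:

select jnl.Code
,      trn.EntryNumber
from   ExactOnlineREST..Journals@{eol1,eol2} jnl
join   ExactOnlineREST..TransactionLines@{eol3,eol4,eol5} trn
on     trn.JournalCode = jnl.Code

On each execution, the Journals instance uses one of two data containers, while the TransactionLines instance independently uses one of three.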

For example, when inserting data into a table with a list of four data containers, the load is distributed in a round-robin fashion across the four data containers. The initial position of the round-robin algorithm is chosen at random, so that even repeated execution of an insert statement inserting just a single row is distributed more or less evenly across the available connections.
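
As a sketch of this behavior (the data container aliases and the generated description values are illustrative only), repeatedly executing a single-row insert against a pool of four data containers distributes the rows evenly:

begin
  for i in 1..8
  loop
    insert into ExactOnlineREST..Tasks@{eol1,eol2,eol3,eol4}
    ( description
    )
    values
    ( 'sample ' || to_char(i)
    );
  end loop;
end;

Since the number of executions is a multiple of the pool size, each data container receives two rows regardless of the randomly chosen starting position.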

Sample Pooling Cloud Connections

The specification of additional data containers for an operation is straightforward: replace the data container alias after the '@' by a comma-separated list of the data containers to choose from, enclosed in curly braces, such as:

declare
  l_cnt number;
begin
  for i in 1..10
  loop
    select count(*)
    into   l_cnt
    from   ExactOnlineREST..Journals@{eol1,eol2,eol3,eol4,eol5}
    ;
  end loop;
end;

On each statement execution, a different data container is chosen to run the count statement on. After execution ends, each of the five data containers will have executed the count statement twice.

Non-even distribution

The list of data containers is cycled through in a round-robin fashion. When each data container is listed once, the load is distributed evenly. However, the same data container may be included multiple times. This enables a non-even distribution of the load, for example when one data container has significantly higher throughput thanks to higher API-call rate limits or a connection to a faster node.

For example, the following insert statement will execute one out of every three insert actions on the data container eol1, whereas the data container eol2 executes the other two:

insert into ExactOnlineREST..Tasks@{eol1,eol2,eol2}
( description
)
values
( 'sample'
)

Data container differences across pool

By design, there are no checks on the data containers listed in a pool to ensure that their behavior is consistent. For instance, it is possible to mix data containers running on different Invantive SQL drivers, such as:

@{eol1,eol2,oracle3,oracle4}

A developer must ensure that the actions have the desired result, independent of which data container is chosen. This allows load to be distributed across different drivers and/or physical network connections, such as direct and bridged network connections.

There is also no check on the selected partitions. Each data container can have a different list of partitions available and selected.
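
As a sketch (the Oracle data container aliases are taken from the pool above, while the table name orders and an identical table structure in both data containers are assumptions), a count can be spread across the two Oracle data containers:

select count(*)
from   orders@{oracle3,oracle4}

Whether the two counts refer to the same logical data set is the developer's responsibility; the SQL engine only alternates between the listed data containers.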

Performance Gains

Assuming the data to be exchanged across the pooled connections is readily available and local resources impose no limits, the performance gains can be linear, depending on the target platform's limitations. For instance, loading data into a cloud platform with a throughput of 1 row per second per connection, but an aggregated maximum throughput of 2,000 rows per second, will typically reduce runtime by a factor of 4 when four connections are used.

Broken connections in a pool

Connection pooling for cloud applications requires all connections to be healthy. There are currently no features to automatically skip connections known to be malfunctioning; for this purpose, the fail-over mechanism of Invantive databases can be used. In a future release, connections known to be unhealthy may be excluded from receiving action requests for as long as they remain unhealthy.

ANSI SQL connection pooling

On all SQL-based connections, such as PostgreSQL and Oracle RDBMS, Invantive SQL has offered connection pooling since the introduction of the SQL engine. In settings*.xml, connection pooling can be disabled for a specific data container by setting AllowConnectionPooling. Through the ConnectionString, connection pooling for SQL-based connections can be configured further using attributes such as maximum-number-of-pooled-connections.

Some individual SQL-based drivers additionally provide connection pooling in the ADO.NET layer. This can also be configured through the ConnectionString; Invantive SQL automatically forwards all connection attributes not recognized by the Invantive driver to the ADO.NET driver layer.
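
As a minimal sketch (the driver-specific part of the connection string is shown as a placeholder and the exact set of supported attributes varies per driver and release), such a pooling attribute is appended to the ConnectionString of the data container:

data source=...;user id=...;maximum-number-of-pooled-connections=8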

Availability

Connection pooling for cloud applications is available starting release 22.0.513 on all Invantive products.