Native Performance High-Volume Updates with Invantive SQL

Summary

Invantive SQL includes native performance high-volume updates on traditional SQL databases such as PostgreSQL. A performance speed-up by a factor 2.000 of high-volume updates has been measured compared to the one-by-one updates available only on previous releases.

Invantive SQL Performance

Invantive SQL has now been around for a few years and connects to over 50 platforms, consisting of cloud applications such as Freshdesk, file formats such as Dutch audit files and on-premise databases such as Oracle. On the front-end there are approximately 10 products that ease integration into existing user environments, typically on Microsoft Windows.

With the advent of the number of connected platforms, we have seen that the volume of data exchanged has grown considerably. Not only on aggregated level, but also per user group. That required us two years ago to spend more time on optimizing typical execution patterns with the goal of meeting or exceeding native performance of traditional databases.

Hundreds of optimizations have been released which typically require no or little changes to existing Invantive SQL code: existing code just runs faster with a new release. For instance, just prefix an insert statement with ‘bulk’ to switch to using a bulk loader on supported platforms such as Exact Online or SQL Server.

Existing Invantive SQL code just runs faster with a new release

Update Performance

A continuing issue was the performance of the Invantive SQL update statement on traditional platforms which are capable of updating thousands of rows per second. In the previous implementation on traditional database platforms, Invantive SQL would:

  • fetch the data meeting the filter criteria,
  • update the next values and
  • then update the rows one-by-one.

This is functionally correct and it enables use of all Invantive SQL features and value semantics.

However, in an assignment on my first employment this was exactly the behavior used by emulation software capable of running native Bull mainframe software on UNIX with Oracle when I was called in for help. The results were correct, but large scheduled jobs ran forever, consuming all available resources by executing millions of individual update statements. To make things worse, the emulation software failed to use bind variables. This caused on Oracle 7 without automatic binding enabled a huge CPU usage due to continuously parsing similar SQL statements.

Although Invantive SQL always nicely uses bind variables to reduce CPU load, Invantive SQL is still processing or transporting millions of rows one-by-one.

Update Improvements

The update performance is in general an issue when the backing platform is capable of updating thousands of rows per second. This currently only applies to traditional database platforms such as PostgreSQL, MySQL and Oracle.

This also includes Teradata, but massive updates on Teradata will consume many vAMPs and resources. On Teradata there are better options available based on it’s architecture.

Invantive SQL will translate update statements into a single update statements when a number of preconditions have been filled. Such a single native update statement runs on the traditional database with native performance. The following preconditions have to be met:

  • Update uses no data retrieved from other connected data containers during execution time.
  • Update uses no Invantive SQL functions like ‘metaphone3’ which are unavailable or can have different semantics.
  • The filter is list of simple conditions using fields and constants only, which are combined using ‘and’ and ‘or’ in any execution order and grouping.

Note that the first precondition can sometimes be met by using a low-cost hint to move evaluation of the value to be used to parse time.

Measurements indicate that the native update statement runs a factor 2.000 faster than the previous one-by-one approach.

Measurement Script

The following Invantive SQL script creates a large table on Amazon PostgreSQL database of type t2.medium with 100.000 rows. Then the script updates a single column using the one-by-one approach triggered by the ‘reverse’ function and a native update.

Performance of the native update on 100.000 rows is approximately 2.000 times faster than running one-by-one. Your actual speed-up depends on factors such as network latency.

--
-- Create table with 100.000K rows and some columns.
--
-- Takes 11 seconds on PostgreSQL, including 
-- auto-created indexes and bulk insert.
--
create or replace table testmassupdate@betalingsscore
as
select *
from   range(100000)@datadictionary
join   dual@datadictionary

--
-- Update using one-by-one update.
-- Takes 1.983.734 ms.
--
update testmassupdate@betalingsscore set dummy_string=reverse(dummy_string)

--
-- Update using native performance high-volume update.
-- Takes 1015 ms.
--
update testmassupdate@betalingsscore set dummy_string='Z'