Invantive SQL is used by many companies for exchanging data between (cloud) platforms and for data maintenance, such as synchronizing back office systems with mid office or front office systems. Release 20.1.393 includes features to improve the scalability of the DML operations update and delete in the non-time-critical maintenance of high-volume environments by restricting their run time using the limit clause.
These improvements apply to all available platforms that support update and delete, such as Visma.net Financials, ActiveCampaign and Exact Online, or ANSI SQL-based platforms such as SQL Server.
The insert operation has no new features, but thanks to bulk insert support on platforms such as SQL Server, PostgreSQL and Oracle it already provides great performance. Note that MySQL databases do not offer native bulk insert support.
The limit clause has been available for years on all select statements, allowing you to restrict the number of rows retrieved from one or a range of data containers in a query. The limit clause introduces a step in the query's execution plan that cuts off data retrieval after the indicated number of rows has been handled.
For example, the following query statement will only retrieve a low volume of rows before ending even when there are millions of rows present:
select * from devices@sqlserver limit 1
The query can also be written using top 1 to improve interoperability with queries originally written for platforms such as SQL Server.
Batches of data
A limit can be placed before or after an order by clause. The order by clause requires collecting the full result set and sorting it completely before rows can be returned to the next step in the query's execution plan. Moving the limit statement from after to before the order by step improves performance in terms of speed and data processed, thanks to the streaming nature of Invantive SQL row retrieval.
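The effect can be illustrated with a small simulation in plain Python (a sketch of the general principle, not of Invantive SQL internals): a limit applied to a stream consumes only a handful of rows, while a sort must first materialize the entire stream.

```python
import itertools

def row_stream(n=100_000):
    """Simulates streaming row retrieval from a data container."""
    for i in range(n):
        yield {"id": i, "amount": (i * 7919) % 1000}

# limit before order by: streaming, only 5 rows are ever pulled
# from the source before the cut-off ends retrieval.
limited_first = list(itertools.islice(row_stream(), 5))
print(len(limited_first))  # 5

# limit after order by: the full result set must be materialized
# and sorted before the cut-off can be applied.
fully_sorted = sorted(row_stream(), key=lambda r: r["amount"])
limited_after = fully_sorted[:5]
print(len(fully_sorted), len(limited_after))  # 100000 5
```

Note that the two forms can return different rows: limiting before sorting only sorts the rows that survive the cut, so moving the limit is a performance trade-off rather than an equivalent rewrite.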
Performance limitation of limit on PostgreSQL
Specifically and solely on PostgreSQL, the ADO.NET provider Npgsql used by Invantive SQL has a strong limitation in streaming data. By default, the Npgsql provider needs to complete the full retrieval of a result set before being able to continue. Even when just a few rows are needed, the Npgsql provider will retrieve all rows from PostgreSQL before continuing.
Since Invantive SQL currently does not forward limit or group operations such as max to the data container, as described in "Query Tool MAX(), Order By ... statements speed question with SQL Server", this introduces significant overhead on PostgreSQL. This overhead is especially measurable on tables that store very large amounts of data.
When a query statement is executed in parallel across multiple partitions, such as across Exact Online companies ("divisions"), a number of individual partitions start streaming data into the output stream in parallel. Once the number of rows requested by the limit clause has been reached, evaluation of all streams ends, as in:
--
-- Select for instance 250 companies.
--
use all@eol

--
-- Retrieve the first 5.000 transaction lines
-- across 250 companies.
--
select *
from   TransactionLinesBulk@eol
limit  5000
Both single-threaded (single partition) and parallel execution use a look-ahead buffer of typically "one large screen filled with rows" per partition. The number of rows in this look-ahead buffer varies with the properties of the data container, but a value of 100 rows is an easy-to-remember approximation.
In the sample given, there might be 16 parallel executions, each fetching 1.000 rows given the paging size of TransactionLinesBulk, before the 5.000 rows have been handled, leading to 11.000 rows being discarded. Even worse, when some partitions run significantly faster, some of them might fetch 2.000 rows while others might still be waiting for data.
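A back-of-the-envelope model of this sample in plain Python, using the hypothetical numbers from the text rather than measured values:

```python
# 16 parallel partitions each fetch one page of rows before the
# overall limit of 5.000 rows has been handled.
parallel_partitions = 16
page_size = 1_000        # paging size of TransactionLinesBulk
overall_limit = 5_000    # limit clause of the query

rows_fetched = parallel_partitions * page_size
rows_discarded = rows_fetched - overall_limit
print(rows_fetched, rows_discarded)  # 16000 11000
```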
Although it may seem a waste, the algorithm has been optimized over the years to improve the performance of first-row retrieval and query execution time, with only a limited percentage of rows going unused.
In many scenarios, the synchronization of data across systems or the maintenance of data is not time critical. Business runs fine as long as the data state converges within a reasonable time frame. However, the data volumes to be handled can fluctuate considerably. The most evident scenario is Black Friday: an order of magnitude more transaction data arriving in a short time slot to be handled.
On both the update and delete statements a limit clause can be added. The limit restricts the maximum number of rows being updated or deleted. For example, the following statement will update a limited number of rows, even when more rows fall within the restrictions:
update exactonlinerest..tasks@exactonline
set    description = description || ' [DONE]'
where  handled = true
and    description not like '% [DONE]'
limit  100
When the same statement is run over and over again over a number of days, all handled tasks will ultimately have their description changed. However, in a single run no more than 100 tasks will be updated, and in general not many more than 100 rows will be retrieved to determine their primary keys for the update. In fact, the internal working of Invantive SQL is:
- Retrieve the rows matching the criteria of the where-clause, limited to the specified maximum number of rows.
- For each row, determine the primary key value and for update also the new row values.
- Update or delete the row using an API call.
As a welcome side effect of running much faster, this approach also makes it easier to stay within the limits on the number of allowed API calls.
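The three steps above can be sketched in Python against a toy in-memory store; fetch_matching and update_row are hypothetical stand-ins for platform API calls, not Invantive functions:

```python
def limited_update(fetch_matching, update_row, new_values, limit=100):
    """Sketch of a limit-bounded update: fetch at most `limit`
    matching rows, then update each one by primary key."""
    # Step 1: retrieve matching rows, bounded by the limit.
    rows = fetch_matching(max_rows=limit)
    updated = 0
    for row in rows:
        # Step 2: determine the primary key value.
        pk = row["id"]
        # Step 3: update the row using one API call.
        update_row(pk, new_values)
        updated += 1
    return updated

# Toy backing store standing in for a cloud platform.
store = {i: {"id": i, "handled": True, "description": f"task {i}"}
         for i in range(250)}

def fetch_matching(max_rows):
    matching = [r for r in store.values()
                if r["handled"]
                and not r["description"].endswith(" [DONE]")]
    return matching[:max_rows]

def update_row(pk, new_values):
    store[pk]["description"] += new_values["suffix"]

print(limited_update(fetch_matching, update_row, {"suffix": " [DONE]"}))
# 100 on the first run; repeated runs converge until all rows are done
```

With 250 matching rows and a limit of 100, successive runs update 100, 100, 50 and finally 0 rows, mirroring how repeated runs of the update statement above converge.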
Filter Forwarding on update and delete
However, sometimes a limit does not enhance scalability or performance. Invantive SQL can forward many simple filters contained in the where-clause of a delete or update to the data container. For instance, the following simple delete removes all rows matching the where-clause:
delete logging@sqlserver
where  date_created < trunc(sysdate) - 7
This applies especially to ANSI SQL-based platforms such as SQL Server. Few cloud platforms have an API that allows multiple rows to be deleted or updated with a single API call.
In general, performance is better without a limit when a platform can forward simple filters and your update or delete statement uses only a combination of simple filters that can all be forwarded.
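A rough call-count model makes the trade-off concrete (illustrative Python with assumed page sizes, not benchmarks): a forwarded delete or update is a single call regardless of volume, while the limit-based approach costs roughly one call per row plus retrieval.

```python
def api_calls_forwarded(rows_matching):
    # The whole where-clause is forwarded to the data container:
    # one statement, one call, regardless of row count.
    return 1

def api_calls_row_by_row(rows_matching, limit, page_size=100):
    # limit-based approach: fetch pages of candidate rows, then
    # one API call per row, capped at the limit per run.
    rows = min(rows_matching, limit)
    fetch_calls = -(-rows // page_size)  # ceiling division
    return fetch_calls + rows

print(api_calls_forwarded(10_000))              # 1
print(api_calls_row_by_row(10_000, limit=100))  # 101
```

With 10.000 matching rows, the forwarded form needs a single call where the row-by-row form needs about 101 calls per run, repeated over many runs until all rows are handled.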
The easiest way to verify the recommended approach is to measure actual system behavior. For many purposes, the contents of the data dictionary view SessionIOs@DataDictionary provide a great starting point for further tuning and optimization. The view enables you to make sure that actual statement execution behaviour matches your expectations. You can also read more on various performance-related topics on this community.