Improve Watershed logic with Query tool

sda · 23 maart 2021 om 09:18

Hi there,
I want to one-way sync some data FROM PostGre TO Sql.
I have un bunch of tables some of them are very small in # records so a CREATE OR REPLACE is fine.
One table is an accounting table now with 600K records, and growing.
Using a CREATE OR REPLACE works with the amount of records growing it will make unnecessary computing tasks (like deleting and re-inserting 99% of unchanged records), so it’s not nice.

For that I followed on the forums :

I tried the SYNCHRONIZE that achieve a nice job like this:

CREATE OR REPLACE TABLE account_move@inmemorystorage
as select * from account_move@Postgre WHERE company_id IN (4)

SYNCHHONIZE account_move@inmemorystorage to account_move@sql
with insert identified by id

But It will take 23s to load 27K records (in this case filtered to company_id :4)
then the synchronize statement will need 78 seconds to compare the source account_move@inmemorystorage with 27K records to the destination account_move@sql with 600K records.

In this specific case, I know that my source data has a unique id, only ascending, and that when a record is created, no need to look if it has been deleted or updated. It is not possible in the workflows.

To achieve the sync job, I only need to look the last (or max) id inserted, and select in the source all ids that are above that one.

like :

INSERT INTO table account_move@sql as select * from account_move@postgre
WHERE id > select MAX(id) from account_move@sql

that should be way more faster that the nice synchronize in this case.

but the select MAX(id) from account_move@sql is a long statement, is seems that Query Tool is not passing the MAX operator to sql, instead downloading the all set of data, and doing the MAX on Query Tool side. for 600K records, it takes 80s to output the result.

doing a

select id from account_move@sql ORDER BY id DESC LIMIT 1

will produce the same behaviour, the results takes 84s to sort.

Am I on a wrong path ?

thanks !

sda · 30 maart 2021 om 07:24

closing this , I wrote here a more elaborated statement .
I will create a separate case for the long execution time of select MAX(id) from account_move@sql

Topic		Antwoorden	Weergaven
How to optimize synchronize tables SQL statement to achieve one way watershed incremental logic Ideas (en) invantive-query-tool , performance , invantive-sql	6	632	9 februari 2023
Synchronize after Create or Replace raises itgenclr039 error due to 'rowid$' column Questions invantive-query-tool , invantive-sql	3	640	20 september 2023
Synchronize statement Questions invantive-sql	4	343	20 september 2023
Synchronize to selecteert (veel) data uit de doel-tabel Vragen invantive-query-tool , sql-server , synchronize	3	367	11 april 2024
Synchronize inside a loop seems not working Questions exact-online , invantive-sql	4	149	21 maart 2024

Improve Watershed logic with Query tool

Gerelateerde topics