I am working on a new watershed incremental loading script in stored procedure that would achieve fast and low traffic consumption, on tables containing 500K (and growing…) records of data.
I have a
source_table@container1 that contains a ID (
PRIMARY KEY with
AUTO_INCREMENT) and write_date columns.
Simply doing a
synchronize source_table@container1 into dest_table@container2 with insert or update identified by id
to synchronize the table data just works, very nicely and quite fast (approximately 140 seconds for 500K records).
I’ve found out that Invantive
- downloads all records from both tables
- then does a compare of the data,
- then pushes the necessary insert or update to the
In my case, from over 500K records, only very few rows have changed value. Only doing incremental changes would take (if well-coded) a few ms. In my case I have 40+ data tables to synchronize, if possible at small and regular interval.
In my script logic I have done that for now:
I leverage a new table
sync_table@container2, that contains script run information, like
Select top 1 job_start from sync_data order by id DESC ->> that I store into a last_job_start variable.
then I get only created or updated records from container1:
create or replace temp@inmemorystorage as select * from source_table@container1 where write_date > last_job_start
This table holds a very small amount of updated or created records since the last sync, enabling incremental loading.
But them I stuck, because using the nice
synchronize that enables me (quite nicely) to skip writing a long
insert + update logic:
synchronize temp@inmemorystorage into dest_table@container2 with insert or update identified by id
synchronize still loads the full data from the container2… so I saved half of the script execution time.
Is there a turnaround (or new lnvantive logics to come
synchronize?) to overcome the full load of the destination container?