Synchronize after Create or Replace raises itgenclr039 error due to 'rowid$' column

Hi,
I’m trying some experimental code to fasten the synchronize statement into Query tool to sync data one way from PostGreSQL to SQL Server.

I’m running the last 20.1.388 Beta.

I first do a full sync of my data with a create or Replace table. I have a sync_table where I store my last update time, so next time I do a sync, I can only update or insert record based on write_date that exists in my PostgreSQL source table t1.

My goal is to prevent the synchronize statement to download 500K records from PostgreSQL and from SQL Server and compare them one by one.

-- first full sync statement
select sysdateutc;

local define JOB_START "${outcome:0,0}";

create or replace table t1@sql 
as 
select * 
from   t1@Postgre;
        
insert into sync_data@sql
( [job_table_name]
, [job_start]
, [job_end]
, [sync_type]
) 
Values 
( 't1'
, '${JOB_START}'
, sysdateutc
, 'full'
);

then I do an iterative sync by leveraging my last job_start date and the synchronize statement:

-- Incremental statement
select sysdateutc;

local define JOB_START "${outcome:0,0}";

select MAX(job_start) from sync_data@sql where job_table_name LIKE 't1';

local define LAST_JOB_START "${outcome:0,0}";

create or replace table t1@inmemorystorage 
as 
select * 
from   t1@Postgre 
where  write_date > '${LAST_JOB_START}')
;

insert into sync_data@sql
( [job_table_name]
, [job_start]
, [job_end]
, [sync_type]
) 
VALUES 
( 't1'
, '${JOB_START}'
, sysdateutc
, 'inc'
);

synchronize t1@inmemorystorage 
to          t1@sql 
with        insert 
            or update
identified 
by          id

But the last line finds a rowid$ in the t1@sql table created by the create or replace that synchronize is not happy with. I am not able to do a except rowid$ here.

To be added, similar scripting is described here with Incremental replicate Freshdesk to SQL Server
however :

(...)
    execute immediate 'create or replace table oldtickets@inmemorystorage as select * except rowid$ from tickets@sql';
    execute immediate 'create or replace table tickets@sql as select * from recent_tickets@freshdesk union distinct on id select * except rowid$ from oldtickets@inmemorystorage';
  end if;
end;

is not what i would like to achieve because create or replace table oldtickets@inmemorystorage as select * except rowid$ from tickets@sql will load uncessary lot of data that will be erased and re-inserted.

Help appreciated.

itgenclr039: One or more errors occurred.Impossible de mettre à jour la colonne identité ‘rowid$’.Message ID: 12317fc2-16ab-4cbc-993c-bb9c63ba4438Occurred (UTC): 29/03/2021 22:12:57SYNCHRONIZE account_invoice@inmemorystorage to account_invoice@sage_sql2019_Vega_replication with insert or update identified by idupdate Vega_replication.dbo.account_invoice set create_date = @w0create_date, write_date = @w1write_date, rowid$ = @w2rowid$ where ( rowid$ = @w1rowid$ )update Vega_replication.dbo.account_invoice set create_date = @w0create_date, write_date = @w1write_date, rowid$ = @w2rowid$ where ( rowid$ = @w1rowid$ )update Vega_replication.dbo.account_invoice set create_date = @w0create_date, write_date = @w1write_date, rowid$ = @w2rowid$ where ( rowid$ = @w1rowid$ )update Vega_replication.dbo.account_invoice set create_date = @w0create_date, write_date = @w1write_date, rowid$ = @w2rowid$ where ( rowid$ = @w1rowid$ )update Vega_replication.dbo.account_invoice set create_date = @w0create_date, write_date = @w1write_date, rowid$ = @w2rowid$ where ( rowid$ = @w1rowid$ )update Vega_replication.dbo.account_invoice set create_date = @w0create_date, write_date = @w1write_date, rowid$ = @w2rowid$ where ( rowid$ = @w1rowid$ )update Vega_replication.dbo.account_invoice set create_date = @w0create_date, write_date = @w1write_date, rowid$ = @w2rowid$ where ( rowid$ = @w1rowid$ )update Vega_replication.dbo.account_invoice set create_date = @w0create_date, write_date = @w1write_date, rowid$ = @w2rowid$ where ( rowid$ = @w1rowid$ )

Extend synchronize

Probably, the extended notation of the synchronize will help you solve this issue. For example, you can change the behaviour as with:

synchronize Contacts@ac
from        Contacts_soll@inmemorystorage
with        insert or update all except cdate
            ignore changes to cdate, udate
identified
by          id ignore nulls
continue on first 500 errors

Date filtering

Aside: the date filter is now applied using text representation. That might work depending on the text representation having the same ordering as the date representation. However, it might fail depending on your local configuration. For instance, the date January 1, 2000 is after December 31, 1999, but the text representation ‘12-31-1999’ is after ‘01-01-2000’ using the normal ordering.

As a workaround you could either use of block of PSQL like:

declare
  l_last_date date;
begin
  select max(...)
  into   l_last_date
  from   ...
  ;
  --
  -- Use it.
  --
end;

or add logic to make sure the text representation follows the date ordering using a format mask YYYYMMDDHH24MISS, such as:

select to_char(date_max, 'YYYYYMMDDHH24MISS')
from   ( select max(...) from ... )

and cast this text value to a date again using the same format mask on use:

where datecolumn >= to_date(date_as_text, 'YYYYMMDDHH24MISS')

Please do not cast the datecolumn to a text since that probably disables any server-side forwarding of the where-clause as well as index use on the database.

SQL Grammar of Synchronize

The documentation on synchronize is:

The compare and synchronize statement provides one-directional and bi-directional analysis and alignment of data in two tables. It is an extended version of SQL statements like ‘upsert’ and ‘merge’ found in other SQL implementations. The two tables can be located in any data container, enabling cross-data container synchronization of data. In most scenarios, data from both tables is downloaded and compared by Invantive SQL to establish the necessary actions. Some specific edge cases may have an optimized algorithm to reduce data downloads.

In an one-directional approach as specified by ‘FROM’ or ‘TO’, the compare and synchronize statement typically analyzes the differences between two tables with identically named columns, separating all differences in one out of four groups:

  • insert: a row exists in the source table and not in the target table;
  • delete: a row exists not in the source table, but exists in the target table;
  • update: the row exists in both tables, but the column contents are different.
  • none: the row exists in both tables and the column contents are equal.

Column values are matched on name. By default, all columns having identical names will be included in the matching process. The IGNORE CHANGES TO can be used to exclude columns by name from matching. IGNORE CHANGES TO is typically used to exclude changes in column values that should not trigger a DML-operation, such as for columns whose value is managed by the application, including technical keys.

Unique rows in both tables are matched using the list of columns as specified by ‘IDENTIFIED BY’. Identifying column NULL contents are considered a specific value for matching; when both the source and target table have a null value, the rows will be compared. Rows are completely ignored for synchronisation when all identifying columns have a NULL value and the clause “IGNORE NULLS” is present.

After analysis, the differences are be filtered to only contain a combination of DML-operations specified using ‘WITH’. Applying all DML-operation types INSERT, UPDATE and DELETE would result in the two tables having identical contents. The INSERT and UPDATE operations by default apply to all columns, but columns can be excluded using ‘ALL EXCEPT’. Excluding columns is typically used to leave out audit columns such as a generated creation date upon update and still have it included upon insert.

For better performance, SYNCHRONIZE uses bulk operations where supported, similar to CREATE TABLE and BULK INSERT. Bulk operations typically offer an order of magnitude better performance than solitary operations. However, error handling can be inconvenient since it is often unclear what specific row triggered an error. Neither, it is not easily established upon an error whether other rows in the same bulk operation have been processed. The BATCHSIZE clause allows specification of the number of rows per bulk operation with a minimum of 1. When BATCHSIZE is not specified, a platform-specific default value is used which can fluctuate due to dynamical management by Invantive SQL.

In case of bi-directional synchronization, the RESOLVE BY-clause enables specification of the column names whose value determines what table contains the preferred row with most current values. Using left-to-right column preference, both values of all RESOLVE BY columns are inspected. The first column value to have a higher ranking value than the other table selects that table as source. The logic is reversed for a column from higher ranking to lower ranking when DESC is specified. The RESOLVE BY-clause is typically used in combination with columns containing a (synchronized) timestamp value such as UTC time.

The ‘APPLY TO’ syntax is reserved for future use to allow routing DML-operations to other tables.

By default, the SYNCHRONIZE statement will fail upon the first error returned. Using ‘CONTINUE ON FIRST … ERRORS’ the threshold can be increased. However, the statement will fail whenever any error occurred during execution.

The differences can then either be:

  • applied on the target table to make the tables equal, using bulk operations where supported.
  • returned as a list of content differences (not yet available);
  • returned as Invantive SQL DML-statements to achieve synchronization (not yet available).

After checking the output of the last two categories of analysis, the differences can be applied on the target table.

In a bi-directional approach, as specified by using the keyword ‘AND’ between the two table identifiers, deletes will not be returned from the analysis phase and replaced by inserts on the table which does not have the other table’s row. The determination of the target for an update is based upon the resolution preference as defined by the ‘RESOLVE’ clause. The ‘RESOLVE’ clause either defines to always apply updates to the left or right table as listed in their order using, respectively, ‘PREFER LEFT’ and ‘PREFER RIGHT’. Otherwise, the combined ordinal value of the columns listed in the ‘RESOLVE BY’ clause will be evaluated and the highest-ranking value is considered to be leading.

well thanks for your help, I will pursue that path!
Indeed the statement would become

synchronize t1@inmemorystorage 
to          t1@sql 
with        insert 
            or update all except rowid$
identified 
by          id

it works!

I’ll share later after review my script logic, I’ve built a stored procedure to do an incremental sync.