Summary
Data replication using trickle loading on Invantive Data Replicator (either OEM or regular) can considerably reduce load on the Exact Online eco system and reduce replication time for Exact Online tables. This note provides an approach and practical queries to easily set up and maintain data replication for large amounts of companies and customers. The approach scales beyond 5.000 companies provided with sufficient resources.
Data Replication Performance and OEM
Invantive Data Replicator is increasingly deployed on large environments, with data volumes exceeding 1 terabyte. It typically concerns thousands of partitions of platforms such as Exact Online companies, Loket payroll companies or Visma.net companies. Data replication should achieve one or both of the following:
- Increase performance of queries, while reducing strain on API ecosystems.
- Enable use of tools that do not support Invantive SQL drivers or ADO.NET.
The replication process itself can be a real resource drain for your cloud partners given the amount of partitions (companie) and data. Therefore Invantive provides a number of optimization strategies to reduce replication time while also reducing the load on your supplier’s API ecosystem.
Most popular is the use of “trickle loading”. Trickle loading is an incremental strategy: after initially loading a full data set only changes are applied, consisting of deletes, inserts and updates.
The information flow from cloud platforms with trickle loading is displayed below as the deviating flow through “Receivers”:
Especially with large data sets only a few percent is “hot”, while all other data is at rest and stored for the future.
On cloud platforms these changes (deltas) are generally taken from webhook events. On database platforms these changes are generally taken from database triggers and/or transaction logs such as Oracle redo logs.
Specifically for OEM deployments, there are generally dozens of environments sharing one OEM license contract code. Also, often some hosting is provided to customers bringing their own Invantive license (“BYOL” or “Bring Your Own License”). Configuring and automatic maintenance of webhooks in such environments requires some additional care to avoid repeating labor efforts when companies are added.
The following code helps you optimize the activation and maintenance of trickle loading for large volumes of different customers served using OEM. We’ve used webhooks on Exact Online to illustrate the code.
Standard Use of Trickle Loading (Webhooks)
Standard use of trickle loading is explained in another topic, but these are the general steps:
Setup
For regular internal-use deployments with little changes in cloud platforms connected and 1 or 1.000 partitions, the activation of trickle loading is typically done once using the following statement per table:
--
-- Select all companies/partitions eligible.
--
use all
--
-- Enable trickle loading for a specific table.
--
alter persistent cache set table NAME approach trickle
followed once by setting the Invantive Data Replicator token used for authentication on downloading events using:
alter persistent cache set token = 'value'
Download Feed of Events
After those changes have been made, the regularly running Invantive SQL replication script typically includes somewhere at the top a statement to download new events:
alter persistent cache download feed
Automatic Activation of Trickle Loading (Webhooks) for OEM
The following Invantive PSQL block enables trickle loading for all selected partitions (such as Exact Online companies) limited to the replicated tables that support webhooks.
Sample code in Invantive PSQL:
begin
for r_trickle in
( --
-- Activate trickle loading for all tables for which
-- trickle loading is available but not yet activated.
--
-- Selects all replicated tables that support webhooks (and
-- therefore trickle loading). Restrict
-- to those tables that have one or more table partitions
-- (such as Exact Online companies) for which trickle loading
-- is not yet activated.
--
-- Expected runtime on initial run in seconds: 5 times number of partitions (companies).
-- Expected runtime after initial runs in seconds: 1.
--
select distinct
'alter persistent cache set table '
|| tbe.tbe_full_qualified_name
|| ' approach trickle'
stmt
from dc_tables@datacache tbe
join dc_table_partitions@DataCache tpn
on tpn.tbe_id = tbe.tbe_id
and tpn.tpt_seeding_approach_used_code = 'C' /* Complete. */
join SYSTEMTABLES@DataDictionary ste
on ste.full_qualified_name = tbe.tbe_full_qualified_name
and ste.webhook_topic is not null
--
-- Skip all XML tables, since none of them
-- has webhooks anyway.
--
where tbe.tbe_full_qualified_name like '%REST%'
)
loop
execute immediate r_trickle.stmt;
end loop;
end;
When webhooks are not available, you can consider the use of the smart sampling approach. Alternatively, when tables have more than 50.000 rows and server-side filtering is available, you can use queries of the following type to improve replication performance:
create or replace table REPLICA@DATABASE
as
select ... from OLD
union
distinct on PK
select ... from NEW where MODIFIED > date
Another alternative is the use of the synchronize
statement which aligns the contents of the two tables. Please refer to the SQL grammar for more instructions.
Download Messages for OEM
Webhook event leads to messages being registered by Invantive Cloud in a queue. The contents of this queue can be downloaded and processed by Invantive Data Replicator.
For OEMs, to ensure the right messages are downloaded, the data container ID must be provided to the download. Otherwise, the first environment will download all messages. These messages will not be processed, but downloading them makes them unavailable for the remaining environments.
Sample code in Invantive PSQL:
declare
l_license_contract_code varchar2 := 'L9999999999';
l_secret_token varchar2 := 'secret';
begin
--
-- Expected runtime: 1 second per 1.000 messages.
--
for r_dcr
in
( select 'alter persistent cache download feed LICENSE CONTRACT CODE '''
|| l_license_contract_code
|| ''''
|| ' token '''
|| l_secret_token
|| ''''
|| ' datacontainer '''
|| dcr.dcr_name
|| ''''
stmt
from ( select distinct
dcr.dcr_name
from dc_table_partitions@DataCache tpn
join dc_tables@DataCache tbe
on tbe.tbe_id = tpn.tbe_id
join dc_data_containers@DataCache dcr
on dcr.dcr_id = tbe.dcr_id
where tpn.tpt_seeding_approach_used_code = 'T'
) dcr
order
by dcr.dcr_name
)
loop
execute immediate r_dcr.stmt;
end loop;
end;
Webhook Storms
Exact Online has “webhook storms”. Events such as restoring a backup, upgrades or closing periods on large companies can trigger millions of messages within an hour. There is no meaningful rate limiting applied, so when improperly designed your webhook receiver might experience a kind of “Distributed Denial of Service” attack.
Invantive Cloud can handle 10 million events per hour. When exceeded, events are throttled and handled over time. For instance, when 100 million events occur across 20.000 Exact Online companies within 60 minutes, it will take Invantive 10 hours to catch up and make the events available for you.
However, the averaged number of webhook events delivered is 50,000 per hour, only 0,5% compared to the maximum handling capacity. Webhook storms do occur, but will seldom significantly delay the receipt of your events by Data Replicator.
De-duplication of Multiple Events
Exact Online is known to trigger multiple events for one change. For instance, a webhook event will be sent for the transaction header for every line contained in a transaction. This has improved over time and currently we see little to no duplicates. Any excess duplicates are automatically de-duplicated by Invantive Cloud when received within a configurable time interval. The time interval is by default set to 15 minutes.
Token Security for OEM
The token to retrieve the feed with events is typically stored persistently in the Data Replicator repository to enable retrieval from batch jobs, using:
alter persistent cache set token = 'value'
However, some OEMs allow their end-users to access the Data Replicator repository database directly. Access to the token might offer an additional element for an attack vector.
Therefor the token is not persistently stored in the database in this sample. Instead, the token is stored in the script. You can add the standard device key encryption on the token string for better security. On Exact Online, this is not necessary in general since Exact Online webhooks contain little to no actual business information.
Check Activation of Trickle Loading
On Exact Online you can check the current activated webhooks using WebhookSubscriptions:
select /*+ ods(false) http_disk_cache(false) http_memory_cache(false) */
*
from WebhookSubscriptions
For example:
Check Routing of Webhook Events
After initial activation of trickle loading on Exact Online, it will take some time before the actual replication is done using incremental trickle loading instead of a complete copy. Invantive Data Replicator requires a heartbeat signal from Exact Online to close the loop. As long as no messages have been received for webhook events on a combination of Exact Online company (partition) and replicated table, the replication will keep using a complete copy to avoid data loss.
The following code helps you test the loop for receiving events by changing an element in Exact Online accounts and checking that the related message arrives:
--
-- Test retrieval of messages, for instance using accounts.
--
update exactonlinerest..accounts
set name = name || '!'
where name like 'Belastingdienst%'
--
-- Revert change.
--
update exactonlinerest..accounts
set name = replace(name, '!', '')
where name like 'Belastingdienst%!'
--
-- Then:
-- * Wait 3 minutes.
-- * Run download feed above.
-- * Check contents of incoming messages.
--
select *
from dc_incoming_messages@DataCache
For example:
In case you want to manually force close the loop, please contact a consultant.