Time-travel Your ERP data using Invantive SQL

This article illustrates how you can store data for time-travel queries in a compressed format using Invantive SQL. The algorithm works for any type of data of any source such as Salesforce, Exact Online or Visma.net. It is based upon the Oracle-based data vault technology of Invantive Producer and suitable for use with medium-sized data sets. For data sets exceeding 100.000 rows more advanced technologies such as Invantive Producer or Invantive Data Vault are recommended.

The instructions apply for storage of the data in memory or in SQL Server, Oracle, MySQL or PostgreSQL. To use the instructions below for another storage than in-memory, please define an additional data container in your database like ‘sqlserver’ and replace history@inmemorystorage by history@sqlserver.

Instructions

Follow the steps below to store data for time-travel using Invantive SQL:

  1. test data;
  2. create history table;
  3. register new version;
  4. use.

Test Data

In this step we will create a number of versions of the same data set, each reflecting a different points-in-time. For this example, we are using a very simple invoice table.

The version at time t=0 of invoices has no contents. The version at time t=1 has the following contents:

image

Whereas version at t=2 has the following contents:

image

Between t=0 and t=1, three invoices have been created. Between t=1 and t=2, the user has removed the invoice with number 0, created a new invoice 3 and updated the description of invoice 1 from “First invoice” to “New first invoice; fixed typo.”

The following code creates the test data set:

--
-- Set up data.
--
begin
  create or replace table t1@inmemorystorage
  as
  select cast(0 as int32) invoicenumber
  ,      'To delete invoice' description
  ,      to_date('20190101', 'YYYYMMDD') created
  ,      to_date('20190101', 'YYYYMMDD') modified
  union all
  select cast(1 as int32) invoicenumber
  ,      'First invoice' description
  ,      to_date('20190101', 'YYYYMMDD') created
  ,      to_date('20190101', 'YYYYMMDD') modified
  union all
  select cast(2 as int32) invoicenumber
  ,      'Second invoice' description
  ,      to_date('20190104', 'YYYYMMDD') created
  ,      to_date('20190104', 'YYYYMMDD') modified
  ;
  create or replace table t0@inmemorystorage
  as
  select *
  from   t1@InMemoryStorage
  where  false
  ;
  create or replace table t2@InMemoryStorage
  as
  --
  -- New situation contains all old invoices except the
  -- first invoice which was changed, and invoice 0 which
  -- was deleted, plus one new invoice.
  --
  select * except rowid$
  from   t1@inmemorystorage
  where  invoicenumber not in (0, 1)
  union all
  --
  -- Changed invoice 1.
  --
  select invoicenumber
  ,      'New first invoice; fixed typo.' description
  ,      created
  ,      to_date('20190212', 'YYYYMMDD') modified
  from   t1@InMemoryStorage
  where  invoicenumber = 1
  union all
  select cast(3 as int32) invoicenumber
  ,      'Third invoice' description
  ,      to_date('20190312', 'YYYYMMDD') created
  ,      to_date('20190312', 'YYYYMMDD') modified
  ;
end;

Create History Table

All historical and current records are stored in a table hist@inmemorystorage. This table contains all columns contained with the original data set. Additionally, it has columns recognizable by the prefix ‘h_’ to enable time-travel:

  • h_id: an ever-increasing number over time.
  • h_date_start_utc: the effective start date of the row (UTC).
  • h_date_end_utc: the effective end date of the row (exclusive) (UTC). The date December 31, 9999 is used to a record without end to enable index use.
  • h_is_last_known: a signal that indicates whether the row is relevant for the last known situation.
  • h_event: an audit column that registers the event, being either ‘I’ for ‘Insert’, ‘U’ for ‘Update’ and ‘D’ for ‘Delete’.
  • h_session_id: an audit column that registers the Invantive SQL session.
  • h_created_on: an audit column that registers the device used.
  • h_created_by: an audit column that registers the name of the user.
  • h_date_measured_utc: an audit column that registers the moment in time (UTC) at which history was updated.

The following code creates the history table and should be executed once:

---- Initial setup of history table "hist".--begin  if not exists (select /*+ ods(false) */ 1 from systemtables@datadictionary where name = 'HIST')  then    create or replace table hist@inmemorystorage    as    select cast(0 as int64)            h_id            label 'ID'    ,      sys_context('USERENV', 'SESSIONID')            h_session_id            label 'Measurement Session ID'    ,      sys_context('USERENV', 'CLIENT_MACHINE_NAME')            h_created_on            label 'Machine Created on'    ,      sys_context('USERENV', 'USER_FULL_NAME')            h_created_by            label 'User Created by'    ,      cast(sysdateutc as datetime)            h_date_measured_utc            label 'Measured (UTC)'    ,      cast(sysdateutc as datetime)            h_date_start_utc            label 'Start Date (UTC)'    ,      cast(sysdateutc as datetime)            h_date_end_utc            label 'End Date (UTC)'    ,      cast('X' as char)            h_event            label 'Event'    ,      cast(true as boolean)            h_is_last_known            label 'Last Known State?'    ,      t.* except rowid$    from   t0@inmemorystorage t    where false    ;  end if;end;

Register new Version in History

The registration of a new (differential) version in the history table consists of three steps:

  1. determine last ID used to number the history records;
  2. load changes between versions;
  3. complete previously current records.

These three steps should be executed once after running

For the test, you can create the old and new situation using the following statements to make old equal t=0 and new the situation at t=1:

create or replace table OLD@InMemoryStorageasselect *from   t0@inmemorystoragecreate or replace table NEW@InMemoryStorageasselect *from   t1@inmemorystorage

And then one more time after moving old and new to t=1 and t=2 respectively:

create or replace table OLD@InMemoryStorageasselect *from   t1@inmemorystoragecreate or replace table NEW@InMemoryStorageasselect *from   t2@inmemorystorage

Determine last ID Used

The following query for use with Invantive Data Hub or Invantive Query Tool stores the last used ID in the local variable H_ID_LAST:

select cast(coalesce(h_id_max, 0) as int64) h_id_max_nnfrom   ( select /*+ ods(false) */                max(h_id) h_id_max         from   hist@InMemoryStorage       )local define H_ID_LAST "${outcome:0,0}"

The hint of ods(false) can be removed when not running in combination with Invantive Data Replicator.

On initial use, the last ID used will be 0. On continuing use, higher numbers will be returned.

Load Changes between Versions

Then, a range of changes between the previous and current version is loaded into the history table using:

---- Take list of result sets (with partition column) and change them into time travel list.--insertinto hist@inmemorystorageselect cast(to_number('${H_ID_LAST}') + row_number() as int64) h_id,      sys_context('USERENV', 'SESSIONID') h_session_id,      sys_context('USERENV', 'CLIENT_MACHINE_NAME') h_created_on,      sys_context('USERENV', 'USER_FULL_NAME') h_created_by,      sysdateutc h_date_measured_utc,      t.*from   ( select created + (sysdateutc - sysdate) h_date_start_utc         ,      to_date('99991231', 'YYYYMMDD') h_date_end         ,      'I' h_event         ,      true h_is_last_known         ,      new.* except rowid$         from   new@InMemoryStorage new         where  cast(new.invoicenumber as varchar2)                not in                 ( select cast(invoicenumber as varchar2)                  from   old@InMemoryStorage                )         union all         select sysdateutc h_date_start_utc         ,      sysdateutc h_date_end_utc         ,      'D' h_event         ,      false h_is_last_known         ,      old.* except rowid$         from   old@InMemoryStorage old         where  cast(old.invoicenumber as varchar2)                not in                 ( select cast(invoicenumber as varchar2)                  from   new@InMemoryStorage                )         union all          select new.modified + (sysdateutc - sysdate) h_date_start_utc         ,      to_date('99991231', 'YYYYMMDD') h_date_end_utc         ,      'U' h_event         ,      true h_is_last_known         ,      new.* except rowid$         from   new@InMemoryStorage new         join   old@InMemoryStorage old         on     old.invoicenumber = new.invoicenumber         --         -- Rewrite this clause to a compare on all fields when no modified is available or not covering all scenarios.         --         and    old.modified != new.modified       ) t

Complete Previously Current Records

After loading the new data, the previously current history records must be marked as ended. This is done using a piece of Invantive PSQL:

begin  for r   in  ( select /*+ ods(false) */            h_id    ,      h_date_start_utc    ,      invoicenumber    from   hist@inmemorystorage    where  h_id in ( select /*+ ods(false) */ max(h_id) from hist@InMemoryStorage group by invoicenumber )  )  loop    --    -- Update old records to be no longer last known.    --    update hist@inmemorystorage    set    h_is_last_known = false    where  h_is_last_known = true    and    invoicenumber = r.invoicenumber    and    h_id != r.h_id    ;    --    -- Put an end date on records now no longer current.    --    update hist@inmemorystorage    --    -- No 1 second offset as on Oracle with Invantive Producer, since Invantive SQL optimizes this better.    --    -- Use MOMENT >= start and MOMENT < end    --    set    h_date_end_utc = r.h_date_start_utc    where  h_date_end_utc = to_date('99991231', 'YYYYMMDD')    and    invoicenumber = r.invoicenumber    and    h_id != r.h_id    ;  end loop;end;

Using the History

Full History

The relevant contents of the history table after executing the steps above are:

For example, you can see for invoice number 1 that starting December 31, 2018 10 PM (UTC) the description was “First invoice”. This description lasted till February 11, 2019 10 PM (UTC) when it was changed to “New first invoice; fixed typo.” This description is still current.

Current Records

You can retrieve the last known or current records by querying on the last known signal:

Contents at Moment in Time

You can also easily reconstruct the registered history for a specific moment in time, using a query such as:

Enable Power BI Slider with a Where Clause

A popular use of this historical is to add a slider to a Power BI report. The slider generates a where clause to an SQL statement on a table or view. However, compared to “Contents at Moment in Time” this requires that a single date can be filtered upon. Using a Carthesian product of a calendar view and the historical data, you can easily generate such functionality using a view or materialized view with a query as shown below. Please remember to replace the Calendar@datadictionary table by the calendar view of your platform or environment.

In the sample below you can see that there are no differences when moving from March 10 to March 11, 2019. However, going to March 12, 2019, an additional records is added for the invoice with number 3.

Use on a Database

The code presented can be used without essential changes by replacing hist@inmemorystorage by hist@sqlserver or another alias. However, remember that old@inmemorystorage table is not available across sessions. You can either use Ossus or your database to memorize the current situation to load old@inmemorystorage on a later run for.

For example, after the PSQL block, you could add:

create or replace table old@PERSISTENTSTORAGEasselect /*+ ods(false) */        * from   old@inmemorystorage

And then replace the create table of old@inmemorystorage by:

create or replace table old@inmemorystorageasselect *from old@PERSISTENTSTORAGE

And replace the create table of new@inmemorystorage by:

create or replace table new@inmemorystorageasselect * from   exactonlinerest..salesinvoices

On the last query, you can choose whether to take an up-to-date replica using the hint:

/*+ ods(true, interval '1 seconds') */

which possibly might use incremental loading such as with webhooks, or to use a normal download of the data using the hint:

/*+ ods(false) */

Remove Update

Some database platforms are less well-suited for handling updates. For instance PostgreSQL and Teradata have been designed for insert and delete performance.

You can remove the updates from the algorithm by removing the last active signal and end date at the expense of slightly less query performance.