Limit columns or rows replicated to SQL Server

Invantive Data Replicator enables replication of high volumes of relational data of cloud applications and traditional databases for terabytes and thousands of companies. The data can come from dozens of cloud platforms such as AFAS Profit, Twinfield, ActiveCampaign, Chargebee or Exact Online.

The replication process includes all selected partitions (such as companies), all columns and all rows from the replicated table. The replication process maintains a replica in a traditional database.

A majority of the Invantive Data Replicator runs Microsoft SQL Server on Azure thanks to the scalability, easy set up and a market with skilled resources. Therefore, we will use SQL Server to illustrate the concepts.

The number of selected partitions (such as companies) is in general never an issue, since a column is typically used to identify a partition and even a 32-bit integer can already be used to register data on billions of companies.

The number of columns and rows can however be an issue and may need to be limited. This topic provides information how to solve such issues.

Limit Number of Columns below 1024

SQL Server limits the number of columns in a table (as do other supported platforms such as MySQL, Oracle and PostgreSQL). The current maximum number of columns on SQL Server is 1.024. Please note that Invantive SQL and Data Replicator both do not support column sets with sparse columns.

The SQL Server limits are below the limit on the 4 billion columns supported by Invantive SQL. It is not uncommon, especially for metadata-based data containers, to include more than 1.024 columns in a table. Many of these columns are typically empty, but it can depend on the environment what columns receive a value.

Invantive Data Replicator mimics the data model of the source table by automatically managing the metadata. Data Replicator can try to execute a create table with over 1,024 columns which will fail on SQL Server.

An ongoing optimization effort is made to exclude columns which never will receive a value from the metadata. So the first step on exceeding the maximum number of columns is to make sure the last production release of Invantive Data Replicator is in active use.

When that doesn’t solve the issue, the replication process can be altered. Instead of directly replicating data from a table, an interim (temporary) table can be used to store the data and then replicated. This effectively disable all trickle loading features. The popular Exact Online sync APIs as available through *Incremental tables are still available. The following SQL code illustrates the concept:

use all@datacontainer

create or replace table temp@inmemorystorage
as
select /*+ ods(false) */
       ...at most 1024 columns...
from   sourcetable

select /*+ ods(true, interval '0 seconds') */
       count(*)
from   temp@inmemorystorage

The create or replace table statement will create an in-memory table containing solely the selected columns. Although superfluous since in-memory storage is by default never replicated, the ods(false) hint makes it explicit that the data should not be replicated. This may be helpful when there is a need to switch to a storage platform different from in-memory due to sheer large size. A server with 32 GB of memory can not easily handle a 256 GB in-memory table. In these cases, a data can be stored directly on your target platform, combining the second and third statement using a SQL statement with an inline view like:

select /*+ ods(true, interval '0 seconds') */
       *
from   ( select /*+ ods(false) */
                ...at most 1024 columns..
         from   sourcetable
       )

Limit Number of Rows

In some scenarios, large amounts of data managed by Invantive Data Replicator are no longer relevant for use downstream. For instance, bank transactions of over 25 years ago are often irrelevant. In such a scenario the historical data may be archived once using Data Replicator and the most current data might be made available through another Data Replicator instance.

Similar to the limitation of columns describe above, the number of rows can be vertically filtered using a where-clause. Both techniques for limiting number of columns and rows can be combined:

use all@datacontainer

create or replace table temp@inmemorystorage
as
select /*+ ods(false) */
       *
from   sourcetable
where  FILTER

select /*+ ods(true, interval '0 seconds') */
       count(*)
from   temp@inmemorystorage