Page compression for Data Replication on SQL Server

This note presents the advantages and disadvantages of using page compression on SQL Server with Invantive Data Replicator.

Page compression on Data Replicator databases has been seen to reduce actual required storage volume by a factor 5 and often up to a factor 10. The performance of queries typically improves or stays the same.

High-Volume Replication

Invantive Data Replicator is a solution which replicates other data from cloud platforms, database platforms and files into one consolidated operational data store for reporting and analysis purposes. Originally crafted as a solution to allow access to data when the original data container is offline or no longer available, it has grown into a highly scalable solution for replicating large volumes of data.

It is not uncommon for sites to replicate thousands of data sources into one consolidated store, typically hosted on-premise or in the cloud using SQL Server or PostgreSQL. Recently, support for Oracle RDBMS Standard Edition 11 and newer has been added. For Oracle RDBMS, we recommend the use of the native drivers, since the managed drivers provided by Oracle lack support for bulk loading.

Data Replicator automatically manages the metadata and database structure, while maintaining 24x7 availability to the released version of the replicate. In such an environment, it is not uncommon to have several hundreds of thousands of versions across tables (such as “Customers” or “Accounts”) and partitions (such as Exact Online companies or Salesforce orgs).

The data volume in high-volume scenarios easily exceeds 1 TB of storage in a single database. Depending on available resources, loading a new version of the replicated data can takes hours or longer, although the actual switch is done as an atomic action.

Compression

Especially with cloud-based solutions, the amount of memory and I/O resources is limited and increases are costly. However, SQL Server out-of-the-box provides various proven compression techniques.

SQL Server can use so-called “page compression” to remove redundancy on the data in a page (a block of storage in the database). For instance, suppose each row has a Division Label (like on Exact Online) which often has the same value. With page compression, the division labels are stored only once in the page and references are made to the division label. Even better, when there are texts which start with the same text like ‘John Doe’ and ‘John Acme’, the string 'John ’ is shared.

Page compression has been seen to reduce actual required storage volume by a factor 5 and often up to a factor 10.

In real-life examples with databases in 1 TB range, the page compression has been seen to reduce actual required storage volume by a factor 5 and often up to a factor 10, depending on the actual data. Please refer to the algorithm to more accurately predict your gains and improve your data distribution. The bulk loading approach by Invantive Data Replicator fits the algorithm neatly since it allows pages to fill up quickly and the data is not updated between loading and deletion.

Of course, the actual compression and decompression of data consumes CPU power. On the contrary, the amount of disk I/O is reduced, and the efficiency of the caches with disk-based data is increased. In general, the performance of queries improves or stays the same, despite extra CPU resources needed.

SQL Script to enable SQL Server page compression

To activate page compression, please consult your local SQL Server administrator. The following SQL script for use directly on SQL Server helps to provide insight in the gains by enabling page compression on the currently replicated tables:

select 'alter table '
      + t.table_name
      + ' rebuild partition = all with (data_compression = page /*, online=on : Optional for Enterprise Edition. */)'
from   ( select tbe.facts_table_name table_name
        from   dc_tables tbe
        union all
        select tbe.history_table_name table_name
        from   dc_tables tbe
      ) t
--
-- No partition compressed yet.
--      
where  t.table_name
       not in
       ( select st.name
         from   sys.partitions sp
         join   sys.tables st
         on     st.object_id = sp.object_id
         where  sp.data_compression_desc = 'page'
       )