Faster OData downloads with smaller size with small adjustment in Power BI

Go to Dutch version.

Invantive Cloud offers access through OData to dozens of (cloud) platforms. We use OData version 4.01 with extensions from the next 4.02 version.

By default, Power BI requests the value of all fields. However, often a value is missing. The column name plus a value “null” is then returned.

With a small modification to the OData.Feed statement in the Power BI Advanced Editor, the download can be accelerated and also significantly reduced in volume by removing the overhead of missing values and their column names.

Customizing OData.Feed in Power BI Advanced Editor.

An OData.Feed statement looks like this in the Advanced Editor:

let
    Source = OData.Feed("URL", null, [Implementation="2.0"]),

By adding , ODataVersion=4, OmitValues=ODataOmitValues.Nulls and immediately also setting a higher timeout as described in Avoid timeout error on Power BI OData download and applying Link downloads to Power BI data sets you then get:

let
    Source = OData.Feed("URL", null, [Implementation="2.0", ODataVersion=4, OmitValues=ODataOmitValues.Nulls, Headers=[Referer = "change-to-your-unique-id-for-the-source" ], Timeout=#duration(0,4,0,0)]),

Faster and Smaller OData Downloads Outside of Power BI

It is also possible to make use of the faster and smaller downloads outside of Power BI by omitting missing values and the corresponding column names.

For this purpose, the following HTTP headers should be set:

  • OData-MaxVersion to the value 4.0.
  • Preferto the valueomit-values=nulls`.

Invantive Bridge Online will then set the value omit-values=nulls in the HTTP header Preference-Applied in the data set to confirm the request.

Measurements

A number of measurements were performed to determine the impact of omitting the missing values.

Table Size before (KB) Compressed size before (KB) Size after (KB) Compressed size after (KB )
Dummy.Demo.Persons 829 101 428 71
REST: TransactionLinesIncremental 6,738 710 4,531 652
REST: TransactionLinesBulk 15,104 1,857 9,971 1618
XML: GLTransactionLines 371,791 33,537 6,758 388
Twinfield 1200 80 75 6

The first table is a simple table with personal data and limited denormalization.

The other three tables each contain the same approximately 6,000 general ledger lines from the Exact Online accounting package.

The measurements show that tables based on a well-normalized data model have relatively little gain in download size (15-30% lower for Persons and TransactionLinesIncremental).

Similarly, the table TransactionLinesBulk with relatively large amounts of denormalized data will be only 15% smaller. This is consistent with the expectation that the compression algorithm will merge the duplicate data by denormalization to a certain level.

However, for the table GLTransactionLines the gain is enormous. This is a “sparse” table with many fields without values. Due to lacking definitions in the Exact Online metadata definitions (XSD), it is not known which fields are actually present with a value and very many fields will be returned while only a relatively small number are filled. The compression does ensure that the duplicate data is merged as much as possible, but a generic compression algorithm runs into its limits here.

The download by omitting redundant empty field values leads to an almost 100x smaller download for the XML format. It is quite measurable that after this optimization, the table TransactionLinesBulk suffers relatively from the denormalized nature. It is and remains over 2x as large as the incremental download table and about 4x as large as the (somewhat more limited in terms of meaning elements) XML version of the general ledger transactions.

In general, all tables would still benefit from support for the Brotli compression protocol by Power BI and honoring of ETag and Cache-Control HTTP response headers by Power BI. This could reduce the amount of network traffic by about 10% for production use and a factor of 10 for development work by data specialists.

Availability

The new optimization capabilities can be added now without risk.

The release supporting the new optimization capabilities is expected to be in production by January 21, 2022.