Invantive Cloud offers access through OData to dozens of (cloud) platforms. We use OData version 4.01 with extensions from the next 4.02 version.
By default, Power BI requests the value of all fields. However, often a value is missing. The column name plus a value “null” is then returned.
With a small modification to the OData.Feed
statement in the Power BI Advanced Editor, the download can be accelerated and also significantly reduced in volume by removing the overhead of missing values and their column names.
Customizing OData.Feed
in Power BI Advanced Editor.
An OData.Feed
statement looks like this in the Advanced Editor:
let
Source = OData.Feed("URL", null, [Implementation="2.0"]),
By adding , ODataVersion=4, OmitValues=ODataOmitValues.Nulls
and immediately also setting a higher timeout as described in Avoid timeout error on Power BI OData download and applying Link downloads to Power BI data sets you then get:
let
Source = OData.Feed("URL", null, [Implementation="2.0", ODataVersion=4, OmitValues=ODataOmitValues.Nulls, Headers=[Referer = "change-to-your-unique-id-for-the-source" ], Timeout=#duration(0,4,0,0)]),
Faster and Smaller OData Downloads Outside of Power BI
It is also possible to make use of the faster and smaller downloads outside of Power BI by omitting missing values and the corresponding column names.
For this purpose, the following HTTP headers should be set:
-
OData-MaxVersion
to the value4.0
. - Prefer
to the value
omit-values=nulls`.
Invantive Bridge Online will then set the value omit-values=nulls
in the HTTP header Preference-Applied
in the data set to confirm the request.
Measurements
A number of measurements were performed to determine the impact of omitting the missing values.
Table | Size before (KB) | Compressed size before (KB) | Size after (KB) | Compressed size after (KB ) |
---|---|---|---|---|
Dummy.Demo.Persons | 829 | 101 | 428 | 71 |
REST: TransactionLinesIncremental | 6,738 | 710 | 4,531 | 652 |
REST: TransactionLinesBulk | 15,104 | 1,857 | 9,971 | 1618 |
XML: GLTransactionLines | 371,791 | 33,537 | 6,758 | 388 |
Twinfield | 1200 | 80 | 75 | 6 |
The first table is a simple table with personal data and limited denormalization.
The other three tables each contain the same approximately 6,000 general ledger lines from the Exact Online accounting package.
The measurements show that tables based on a well-normalized data model have relatively little gain in download size (15-30% lower for Persons
and TransactionLinesIncremental
).
Similarly, the table TransactionLinesBulk
with relatively large amounts of denormalized data will be only 15% smaller. This is consistent with the expectation that the compression algorithm will merge the duplicate data by denormalization to a certain level.
However, for the table GLTransactionLines
the gain is enormous. This is a “sparse” table with many fields without values. Due to lacking definitions in the Exact Online metadata definitions (XSD), it is not known which fields are actually present with a value and very many fields will be returned while only a relatively small number are filled. The compression does ensure that the duplicate data is merged as much as possible, but a generic compression algorithm runs into its limits here.
The download by omitting redundant empty field values leads to an almost 100x smaller download for the XML format. It is quite measurable that after this optimization, the table TransactionLinesBulk
suffers relatively from the denormalized nature. It is and remains over 2x as large as the incremental download table and about 4x as large as the (somewhat more limited in terms of meaning elements) XML version of the general ledger transactions.
In general, all tables would still benefit from support for the Brotli compression protocol by Power BI and honoring of ETag and Cache-Control HTTP response headers by Power BI. This could reduce the amount of network traffic by about 10% for production use and a factor of 10 for development work by data specialists.
Availability
The new optimization capabilities can be added now without risk.
The release supporting the new optimization capabilities is expected to be in production by January 21, 2022.