Invantive Cloud offers access through OData to dozens of (cloud) platforms. We use OData version 4.01 with extensions from the next 4.02 version.
By default, Power BI requests the value of all fields. However, often a value is missing. The column name plus a value “null” is then returned.
With a small modification to the
OData.Feed statement in the Power BI Advanced Editor, the download can be accelerated and also significantly reduced in volume by removing the overhead of missing values and their column names.
OData.Feed statement looks like this in the Advanced Editor:
let Source = OData.Feed("URL", null, [Implementation="2.0"]),
, ODataVersion=4, OmitValues=ODataOmitValues.Nulls and immediately also setting a higher timeout as described in Avoid timeout error on Power BI OData download and applying Link downloads to Power BI data sets you then get:
let Source = OData.Feed("URL", null, [Implementation="2.0", ODataVersion=4, OmitValues=ODataOmitValues.Nulls, Headers=[Referer = "change-to-your-unique-id-for-the-source" ], Timeout=#duration(0,4,0,0)]),
It is also possible to make use of the faster and smaller downloads outside of Power BI by omitting missing values and the corresponding column names.
For this purpose, the following HTTP headers should be set:
OData-MaxVersionto the value
to the valueomit-values=nulls`.
Invantive Bridge Online will then set the value
omit-values=nulls in the HTTP header
Preference-Applied in the data set to confirm the request.
A number of measurements were performed to determine the impact of omitting the missing values.
|Table||Size before (KB)||Compressed size before (KB)||Size after (KB)||Compressed size after (KB )|
The first table is a simple table with personal data and limited denormalization.
The other three tables each contain the same approximately 6,000 general ledger lines from the Exact Online accounting package.
The measurements show that tables based on a well-normalized data model have relatively little gain in download size (15-30% lower for
Similarly, the table
TransactionLinesBulk with relatively large amounts of denormalized data will be only 15% smaller. This is consistent with the expectation that the compression algorithm will merge the duplicate data by denormalization to a certain level.
However, for the table
GLTransactionLines the gain is enormous. This is a “sparse” table with many fields without values. Due to lacking definitions in the Exact Online metadata definitions (XSD), it is not known which fields are actually present with a value and very many fields will be returned while only a relatively small number are filled. The compression does ensure that the duplicate data is merged as much as possible, but a generic compression algorithm runs into its limits here.
The download by omitting redundant empty field values leads to an almost 100x smaller download for the XML format. It is quite measurable that after this optimization, the table
TransactionLinesBulk suffers relatively from the denormalized nature. It is and remains over 2x as large as the incremental download table and about 4x as large as the (somewhat more limited in terms of meaning elements) XML version of the general ledger transactions.
In general, all tables would still benefit from support for the Brotli compression protocol by Power BI and honoring of ETag and Cache-Control HTTP response headers by Power BI. This could reduce the amount of network traffic by about 10% for production use and a factor of 10 for development work by data specialists.
The new optimization capabilities can be added now without risk.
The release supporting the new optimization capabilities is expected to be in production by January 21, 2022.