Faster data processing from Invantive Cloud on selected OData clients

Invantive Cloud provides data from dozens of popular cloud platforms in the widely used OData feed format. This allows programs such as Power BI to read data from thousands of tables and thousands of companies easily and quickly, in near real-time.

Invantive Cloud acts as a so-called “OData producer”, while Power BI or https://access-odata.com, for example, act as an “OData client”.

In this topic you can read how data processing is accelerated and how the following new features further increase the speed:

  • HTTP 304 support for repeated downloads reduces the amount of data sent over the Internet to almost zero.
  • Information for local cache control creates more opportunities to exchange no data at all.
  • Processing metadata requests in parallel allows more parallelism in downloading.

Repeated Downloads when Developing Reports

Developing reports often involves retrieving the same data over and over again in a short period of time. This is especially noticeable with Power BI; the smallest change can already lead to a (usually unnecessary) data retrieval.

It is therefore wise, when building Power BI reports, to work with a small dataset, for example only 1 company or data from a certain timespan. This improves the developer experience.

Repeated Downloads for Non-Shared Datasets

Repeated retrieval of the same data in a short period of time also often occurs when reports are automatically refreshed but not all data is shared.

Speed Gains with Response Cache

Through Invantive Bridge Online, Invantive Cloud already offered the option to answer requests that arrive repeatedly within a short, configurable time frame (for example a few hours) with the same response each time. See also Differentiate OData4 for Power BI Cache Behavior.

This noticeably increases performance, especially when building reports, and lowers the load on the (cloud) platform providing the data. For example, an OData query for AccountsIncremental on Exact Online with over 4,000 relations takes 4 seconds without the cache, while the same OData query takes about 6 milliseconds when answered from the response cache.
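The principle of such a response cache can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not Invantive's implementation: entries are keyed on the request URL, a single configurable lifetime applies, and the class `ResponseCache` and the `fetch_from_source` callback are hypothetical names.

```python
import time
from typing import Callable, Dict, Tuple


class ResponseCache:
    """Time-based response cache keyed on the request URL (hypothetical sketch)."""

    def __init__(self, lifetime_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.lifetime_seconds = lifetime_seconds
        self._clock = clock
        # URL -> (moment the body was retrieved from the source, body)
        self._entries: Dict[str, Tuple[float, bytes]] = {}

    def get(self, url: str, fetch_from_source: Callable[[str], bytes]) -> bytes:
        now = self._clock()
        entry = self._entries.get(url)
        if entry is not None and now - entry[0] < self.lifetime_seconds:
            # Cache hit: answered quickly, the source platform is not called.
            return entry[1]
        # Cache miss or expired entry: perform the slow retrieval and remember it.
        body = fetch_from_source(url)
        self._entries[url] = (now, body)
        return body

    def reset(self) -> None:
        """Forget all entries, so the next request retrieves fresh data."""
        self._entries.clear()
```

Passing a custom `clock` makes the expiry behavior easy to try out without actually waiting for the lifetime to pass.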

Note that even responses of hundreds of megabytes are served from the response cache within a second. In practice, however, loading them into Power BI still takes a long time, because local Internet connections are often not gigabit links and Power BI also needs time to extract and process the data.

In practice, about 75% of queries are answered from this OData Response Cache. In the case of intensive report building, this is often even higher, up to 99%.

Response Cache Update May 2022

It is now also possible to empty the cache manually, so that users can decide for themselves when really fresh data is needed. To do so, use the “Reset cache” option in the top right menu of Invantive Bridge Online:

[Screenshot: “Reset cache” option in the top right menu of Invantive Bridge Online]

New Performance Enhancements

As the platform grows, the volume of data processed increases rapidly, which made further improvements desirable.

The HTTP protocol offers many opportunities to reduce network load. Two enhancements take advantage of this.

More Parallelism

The first improvement involves further increasing parallelism. Depending on the subscription type, 1 to 16 parallel downloads are possible. Extensive reports will sometimes require dozens of datasets to be refreshed. Each dataset normally consists of three downloads:

  • Retrieve /odata4.
  • Retrieve /odata4/$metadata.
  • Retrieve the data for all partitions in one go.

All three downloads must pass through the same pool of slots with limited capacity. The first two can almost always be answered from the response cache in a few milliseconds. But when dozens of datasets are downloaded, eventually every available slot is occupied by relatively slow data retrieval.

Starting Nov. 4, 2021, the first two requests (/odata4 and /odata4/$metadata) are answered separately and without further delay. As a result, the OData Consumer receives the metadata sooner, and each processing stream on the consumer can already start preparing to retrieve the data.
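The effect of letting the two metadata requests bypass the slot pool can be illustrated with a small asyncio simulation. It is a hypothetical sketch: the class `SlotPool`, the slot count and the dataset paths `/odata4/DatasetA` and `/odata4/DatasetB` are made up for illustration and do not describe how Invantive Bridge Online schedules requests internally.

```python
import asyncio
from typing import List

# Requests that are answered outside the pool of download slots.
METADATA_PATHS = {"/odata4", "/odata4/$metadata"}


class SlotPool:
    """Hypothetical producer-side scheduler with a limited number of download slots."""

    def __init__(self, slots: int) -> None:
        # Construct the pool inside a running event loop (see demo below).
        self._slots = asyncio.Semaphore(slots)
        self.started: List[str] = []  # paths in the order processing started

    async def handle(self, path: str, work_seconds: float) -> str:
        if path in METADATA_PATHS:
            # Metadata: answered at once, typically from the response cache.
            self.started.append(path)
            return path
        async with self._slots:
            # Data download: occupies one of the limited slots while it runs.
            self.started.append(path)
            await asyncio.sleep(work_seconds)  # stands in for slow data retrieval
            return path


async def demo() -> List[str]:
    pool = SlotPool(slots=1)
    await asyncio.gather(
        pool.handle("/odata4/DatasetA", 0.05),
        pool.handle("/odata4/DatasetB", 0.05),
        pool.handle("/odata4/$metadata", 0.0),  # submitted last
    )
    return pool.started


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

With a single slot, the `$metadata` request starts before the second data download even though it was submitted last; without the bypass it would have had to wait for a free slot behind the data downloads.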

Especially with large numbers of datasets, the OData Producer and the OData Consumer can divide the work between them better and thus achieve shorter processing times. The savings are modest but measurable, on the order of a few seconds for a complex refresh.

Limiting Network Traffic

Some OData Consumers repeatedly send the same request to Invantive Bridge Online. In doing so, they can indicate that they already have a recent answer to the same question. The Producer gave this recent answer a unique identifier in the ETag HTTP header.

The OData Producer can then notify the Consumer through an HTTP 304 (Not Modified) response that the data has not changed, provided the OData Consumer sends this identifier back in the If-None-Match HTTP header.
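On the producer side, the check can be sketched as follows. In HTTP (RFC 9110), a client echoes an ETag back in the `If-None-Match` request header; the producer compares it with the ETag of the current response and answers with 304 and an empty payload when they match. The hash-based ETag scheme and the function names are assumptions for illustration only.

```python
import hashlib
from typing import Dict, Optional, Tuple


def make_etag(body: bytes) -> str:
    """Derive a quoted ETag from the response body (hypothetical scheme)."""
    return '"' + hashlib.sha256(body).hexdigest()[:16] + '"'


def _opaque_tag(tag: str) -> str:
    """Strip a weak-validator prefix; If-None-Match uses weak comparison."""
    tag = tag.strip()
    return tag[2:] if tag.startswith("W/") else tag


def conditional_response(
    body: bytes, if_none_match: Optional[str]
) -> Tuple[int, Dict[str, str], bytes]:
    """Return (status, headers, payload) for a GET request."""
    etag = make_etag(body)
    headers = {"ETag": etag}
    if if_none_match is not None:
        candidates = [_opaque_tag(t) for t in if_none_match.split(",")]
        if "*" in candidates or _opaque_tag(etag) in candidates:
            # The consumer already holds this exact version: 304 without payload.
            return 304, headers, b""
    # No (matching) validator: send the full payload along with its ETag.
    return 200, headers, body
```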

Currently, only browsers send such an ETag back.

Invantive Bridge Online Monitoring shows when HTTP 304 is used; in this example, an OData4 URL was retrieved using a browser. In the bottom row, the data (in this case from ActiveCampaign) is freshly retrieved, which took almost 3 seconds. In the middle row, the browser performed the request again and received an HTTP 304 in response; this took 3 ms and resulted in an empty payload. The top row was also served from the cache, but the payload was resent in full. Unfortunately, due to system load, this request also took relatively long and is not entirely representative of regular behavior.

![Example 304 with OData|690x241](upload://4KG2v5sCKXnBoB9146GqtIcgX76.png)

Does HTTP 304 support with ETag currently offer any benefit with Microsoft Power BI Desktop?

No, currently Microsoft Power BI Desktop does not take advantage of this.

We hope that Microsoft will at some point add this functionality to Power BI Desktop as well. This could improve processing speed and download time when developing complex reports by an order of magnitude.

Making Network Traffic Redundant

Besides adding the ETag HTTP header and support for the HTTP 304 status, Invantive Bridge Online now also tells the OData Consumer how long it may reuse the response locally without asking the OData Producer at all whether the data has changed. The round trip over the Internet can then be skipped entirely.

The “Cache-Control” HTTP header tells the Consumer that the data may be reused locally for the remaining time during which Invantive Bridge Online itself also reuses it.

Such a header looks like this:

[Screenshot: Cache-Control HTTP header for OData4]

This Cache-Control header marks the cache as “private”. This indicates that only the Consumer itself may store the response; intermediate layers such as shared proxies must not temporarily store and re-serve it. This is desirable from a security point of view.

Also shown is a “max-age” of 53930. This is the remaining time in seconds until Invantive Bridge Online itself would refresh the data. This time is calculated by taking the OData Response Cache lifetime as set on the database definition in Invantive Cloud and subtracting from it the elapsed time since the dataset was last retrieved from the source.
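The max-age calculation is simple arithmetic, sketched below. The function name is hypothetical, and the sketch assumes, purely as an illustration, a response cache lifetime of 24 hours (86,400 seconds); the actual lifetime behind the example value is not stated. Under that assumption, a max-age of 53930 corresponds to 86,400 minus 53,930 = 32,470 seconds (about 9 hours) since the last retrieval from the source.

```python
def cache_control_value(lifetime_seconds: int, elapsed_seconds: int) -> str:
    """Cache-Control value for a response served from the response cache.

    max-age = response cache lifetime - seconds since last retrieval from the
    source, clamped at 0 so an expired entry is never reported as fresh.
    """
    remaining = max(0, lifetime_seconds - elapsed_seconds)
    # "private": only the consumer itself may store the response.
    return f"private, max-age={remaining}"


# Illustrative assumption: a 24-hour lifetime and 32,470 seconds elapsed.
print(cache_control_value(86_400, 32_470))  # private, max-age=53930
```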

Currently, only browsers use Cache-Control.
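A consumer that does honor these headers could decide per request roughly as sketched below: reuse the local copy while `max-age` has not expired, otherwise send a conditional request with `If-None-Match` in the hope of a 304, and only fall back to a full download when no ETag is known. The names `CachedResponse`, `parse_max_age` and `plan_request` are hypothetical, and real browsers apply more caching rules (RFC 9111) than this sketch covers.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass
class CachedResponse:
    body: bytes
    etag: Optional[str]   # from the ETag response header, if any
    max_age: int          # seconds, from the Cache-Control response header
    received_at: float    # consumer clock when the response arrived


def parse_max_age(cache_control: str) -> int:
    """Extract max-age from a Cache-Control value; 0 if absent or invalid."""
    for directive in cache_control.split(","):
        name, _, value = directive.strip().partition("=")
        if name.strip().lower() == "max-age" and value.strip().isdigit():
            return int(value.strip())
    return 0


def plan_request(cached: Optional[CachedResponse],
                 now: float) -> Tuple[str, Dict[str, str]]:
    """Return the next action and the extra request headers to send."""
    if cached is None:
        return "full", {}                 # nothing known yet: normal download
    if now - cached.received_at < cached.max_age:
        return "reuse", {}                # still fresh: no round trip at all
    if cached.etag:
        # Stale: ask whether it changed; a 304 means the local copy can be kept.
        return "conditional", {"If-None-Match": cached.etag}
    return "full", {}
```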

Does Cache-Control currently benefit Microsoft Power BI Desktop?

No, currently Microsoft Power BI Desktop does not make use of this.

We hope that Microsoft will at some point add this functionality to Power BI Desktop as well. This could improve processing speed and download time when developing complex reports by an order of magnitude.

Help Wanted on Speed Gains on Power BI Desktop

We’d like to ask for your help to see these performance improvements reflected in Power BI Desktop as well.

If you would like to help with this:

An overview of all open Power BI ideas for Invantive Cloud can be found at Power BI suggestions for improvement.