Faster data processing from Invantive Cloud on selected OData clients

Invantive Cloud offers data from dozens of popular cloud platforms in the widely used OData feed format. This allows programs such as Power BI to read thousands of tables across thousands of companies easily, quickly, and in near real time.

Invantive Cloud acts as a so-called “OData producer”, while for example Power BI or https://access-odata.com acts as an “OData client”, also called an “OData consumer” in this topic.

In this topic you can read how the data processing is accelerated and how the following new features further increase the speed:

  • HTTP 304 support for repetitive downloads reduces the amount of data sent over the Internet to almost zero.
  • Information for local cache control increases the chance that no data needs to be exchanged at all.
  • Parallel processing of metadata provides more parallelism in downloading.

Repeating Downloads when Developing Reports

Developing reports often involves retrieving the same data over and over again in a short period of time. This is especially noticeable with Power BI Desktop; even the smallest change can lead to the data being retrieved again (usually unnecessarily).

It is therefore wise to work with a small data set when building Power BI reports, for example only one company or data from a limited period. This makes the process run more smoothly and with fewer delays.

Repeated Downloads for Non-Shared Datasets

Repeatedly retrieving the same data in a short time also often occurs when reports are refreshed automatically but do not share all of their datasets.

Speed Gain with Response Cache

Through Invantive Bridge Online, Invantive Cloud could already answer a repeated question within a short period of time (for example a few hours) with the same answer. See also Differentiate OData4 for Power BI Cache Behavior.

This noticeably improves performance, especially when building reports, and reduces the load on the (cloud) platform that delivers the data. For example, an OData query without cache takes 4 seconds for AccountsIncremental on Exact Online with over 4,000 customers, while the same OData query takes about 6 milliseconds from the response cache.

Note that even responses of hundreds of megabytes are answered from the response cache within a second, but in practice the transfer takes longer because the local Internet connection is often not a gigabit link. Power BI also needs time to unpack and process the data.

In practice, about 75% of the queries are answered from this OData Response Cache. In the case of intensive report building this is often even higher, up to 99%.
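The idea of a response cache can be sketched in a few lines of Python. This is a minimal illustration under assumptions, not the implementation used by Invantive Bridge Online: the class `ResponseCache`, its `get` method, and the `fetch` callback are hypothetical names, and the cache key is simply the requested URL.

```python
import time
from typing import Callable, Dict, Tuple


class ResponseCache:
    """Answers a repeated question with the stored answer for a limited time."""

    def __init__(self, lifetime_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.lifetime_seconds = lifetime_seconds
        self.clock = clock  # injectable so the behavior can be tested
        self._entries: Dict[str, Tuple[float, bytes]] = {}
        self.hits = 0
        self.misses = 0

    def get(self, url: str, fetch: Callable[[str], bytes]) -> bytes:
        now = self.clock()
        entry = self._entries.get(url)
        if entry is not None and now - entry[0] < self.lifetime_seconds:
            # Fast path: same question asked again within the lifetime.
            self.hits += 1
            return entry[1]
        # Slow path: ask the backing platform and remember the answer.
        self.misses += 1
        payload = fetch(url)
        self._entries[url] = (now, payload)
        return payload
```

The `hits` and `misses` counters correspond to the kind of hit ratio mentioned above; only the slow path touches the platform that delivers the data.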

New Performance Improvements

With the growth of the platform, the processed data volume is increasing rapidly, which made further improvements desirable.

The HTTP protocol offers many opportunities to reduce the network load. Two improvements take advantage of this.

More Parallelism

The first improvement concerns further increasing parallelism. Depending on the subscription type, 1 to 16 parallel downloads are possible. Extensive reports will sometimes require dozens of datasets to be refreshed. Each dataset normally consists of three downloads:

  • Retrieve /odata4.
  • Retrieve /odata4/$metadata.
  • Retrieve data for all partitions in a single request.

All three downloads share the same limited capacity. The first two can almost always be answered from the response cache in a few milliseconds. But if dozens of datasets are being downloaded, then eventually every available slot is occupied by the relatively slow data retrieval in step 3.

Effective November 4, 2021, the first two requests (/odata4 and /odata4/$metadata) will be answered separately without further delay. As a result, the OData Consumer receives the metadata earlier and each processing flow on the Consumer can already proceed with preparing the retrieval of the data.

Especially with large numbers of datasets, the OData Producer and the OData Consumer can better divide the tasks between them and thus achieve shorter data processing times. The saving is not enormous, but it is measurable: on the order of a few seconds for a complex refresh.
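The change described above can be pictured as a slot limit that applies only to data requests. The following Python sketch illustrates that idea under assumptions and is not Invantive's implementation: the class `DownloadSlots`, its methods, and the table path `/odata4/AccountsIncremental` are hypothetical; only `/odata4` and `/odata4/$metadata` come from the text.

```python
import threading

# Requests that are answered separately, outside the limited slots.
METADATA_PATHS = {"/odata4", "/odata4/$metadata"}


class DownloadSlots:
    """Limits parallel data downloads; metadata requests bypass the limit."""

    def __init__(self, max_parallel: int) -> None:
        self._slots = threading.BoundedSemaphore(max_parallel)

    def needs_slot(self, path: str) -> bool:
        return path.rstrip("/") not in METADATA_PATHS

    def try_start(self, path: str) -> bool:
        """Return True when the request may start right away."""
        if not self.needs_slot(path):
            return True  # metadata never waits for a free slot
        return self._slots.acquire(blocking=False)

    def finish(self, path: str) -> None:
        """Give the slot back after a data download completes."""
        if self.needs_slot(path):
            self._slots.release()


slots = DownloadSlots(max_parallel=2)
data_path = "/odata4/AccountsIncremental"  # hypothetical data URL
slots.try_start(data_path)
slots.try_start(data_path)
# Both slots are now busy, yet metadata is still answered immediately.
print(slots.try_start(data_path), slots.try_start("/odata4/$metadata"))
```

With both slots occupied, a third data request has to wait while the metadata request proceeds, which is the effect the text describes.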

Limiting Network Traffic

Some OData Consumers often ask Invantive Bridge Online the same question. They can indicate that they already know a recent answer to that question. With that earlier answer, the Producer sent a unique identifier for the returned data in the so-called “ETag” HTTP header.

When the OData Consumer sends the known ETag value back in the “If-None-Match” HTTP request header, the OData Producer can report through an HTTP 304 response that the data has not been modified.

At this moment, only browsers send the ETag back in this way.
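The exchange can be sketched from the Producer's side in Python. This is an illustration only, assuming the standard HTTP pairing in which the ETag value travels back in the `If-None-Match` request header; the functions `make_etag` and `respond` are hypothetical, and deriving the ETag from a hash of the payload is an assumption, not a description of how Invantive Bridge Online computes it.

```python
import hashlib
from typing import Dict, Tuple


def make_etag(payload: bytes) -> str:
    # Assumption: a quoted hash of the payload. Any value that changes
    # whenever the data changes would serve the same purpose.
    return '"' + hashlib.sha256(payload).hexdigest() + '"'


def respond(request_headers: Dict[str, str],
            payload: bytes) -> Tuple[int, Dict[str, str], bytes]:
    """Return (status, response headers, body) for one request."""
    etag = make_etag(payload)
    if request_headers.get("If-None-Match") == etag:
        # The consumer already holds this exact answer: empty body.
        return 304, {"ETag": etag}, b""
    return 200, {"ETag": etag}, payload


data = b'{"value": []}'
status, headers, body = respond({}, data)
# Second request: the consumer repeats the ETag it received earlier.
status2, _, body2 = respond({"If-None-Match": headers["ETag"]}, data)
print(status, len(body), status2, len(body2))
```

The second response carries no payload at all, which is where the saving in network traffic comes from.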

In the Invantive Bridge Online Monitoring the usage becomes visible; in this case by retrieving an OData4 URL through a browser. In the bottom line, the data (in this case from ActiveCampaign) is freshly retrieved. This took almost 3 seconds. In the middle line, the browser asked the question again and got an HTTP 304 as response. This took 3 ms and resulted in an empty payload. The top line also came from response cache, but the payload was completely resent. Unfortunately, due to system load, this request also took a relatively long time and is not entirely representative of regular behavior.

Does HTTP 304 support with ETag currently offer any benefits with Microsoft Power BI Desktop?

No, at this time Microsoft Power BI Desktop does not take advantage of this.

We hope that at some point Microsoft will add this functionality to Power BI Desktop as well. When developing complex reports, this could improve processing speed and download time by roughly a factor of 10.

Making Network Traffic Redundant

Besides the addition of the ETag HTTP header and support for the HTTP 304 status, Invantive Bridge Online now also provides more information on how long the OData Consumer may re-use the answer locally without even asking the OData Producer whether the data has changed. A round-trip via the Internet can thus be completely eliminated.

Through the HTTP header “Cache-Control” the Consumer is informed that the data may be re-used locally during the remaining time interval that Invantive Bridge Online also uses.

Such a header looks like this:

[Figure: Cache-Control HTTP header for OData4]

This Cache-Control shows that it is a “private” cache. This indicates that only the Consumer itself may keep the data; an intermediate layer such as a proxy should not temporarily store it and serve it again. This is desirable from a security point of view.

In addition, it shows a “max-age” of 53930. This is the remaining time in seconds until Invantive Bridge Online itself would refresh the data from the backing cloud platform. This time is calculated by taking the OData Response Cache lifetime as set on the database definition in Invantive Cloud and subtracting the elapsed time since the dataset was last retrieved from the source.
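The calculation of “max-age” and its use on the Consumer side can be sketched as follows. The function names are hypothetical, and the cache lifetime of 57,600 seconds (16 hours) with 3,670 seconds elapsed is an assumed example chosen only because it yields the 53930 mentioned above; the actual settings behind that value are not given in this topic.

```python
import re
from typing import Optional


def remaining_max_age(cache_lifetime_seconds: int,
                      seconds_since_retrieval: int) -> int:
    # Response cache lifetime minus the time elapsed since the data was
    # last retrieved from the source, never below zero.
    return max(0, cache_lifetime_seconds - seconds_since_retrieval)


def cache_control_header(cache_lifetime_seconds: int,
                         seconds_since_retrieval: int) -> str:
    age = remaining_max_age(cache_lifetime_seconds, seconds_since_retrieval)
    return f"private, max-age={age}"


def parse_max_age(header_value: str) -> Optional[int]:
    match = re.search(r"max-age=(\d+)", header_value)
    return int(match.group(1)) if match else None


def may_reuse_locally(header_value: str, seconds_since_received: int) -> bool:
    """Consumer side: may the local copy be used without asking again?"""
    max_age = parse_max_age(header_value)
    return max_age is not None and seconds_since_received < max_age


# Assumed example values: 16-hour lifetime, 3,670 seconds elapsed.
header = cache_control_header(57600, 3670)
print(header)
print(may_reuse_locally(header, 600))
```

As long as `may_reuse_locally` holds, the Consumer does not need to contact the Producer at all, not even for an HTTP 304.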

At this moment, only browsers use Cache-Control.

Does Cache-Control currently offer any advantages with Microsoft Power BI Desktop?

No, at this time Microsoft Power BI Desktop does not make use of this.

We hope that at some point Microsoft will add this functionality to Power BI Desktop as well. When developing complex reports, this could improve processing speed and download time by roughly a factor of 10.

Help Requested with Speed Gains on Power BI Desktop

We would like to ask for your help to see these performance improvements reflected in Power BI Desktop.

If you want to help with this:

An overview of all open Power BI ideas for Invantive Cloud can be found at Power BI suggestions for improvement.