Long-term return on your documents thanks to process referential metadata and time travel

Metadata, what is that?

All Document Management Systems (DMS) save documents as unstructured data. That is sufficient to reproduce the document. But when you have thousands or millions of documents, it can be tricky to find something again. Even if you could search the way Google does, it would still be tricky. That is why most DMS solutions, like Microsoft SharePoint or FileNet, let you link extra data to the document as so-called ‘metadata’. This metadata acts like a kind of ‘sticker’ pasted onto the document. An example of such a sticker is ‘Name Customer’.

This makes it possible to quickly search through all documents by the value of ‘Name Customer’. The contents of the stickers can also be used to make a document visible in multiple folders. The document appears with the customer, but also with the representative if their name is on the sticker ‘Representative’. This makes the tight limitations of the traditional file cabinet, where an (original) document can only be located in one folder at a time, a thing of the past.

Disadvantages in the modern world

The use of metadata like this helps you to retrieve documents quickly and efficiently. However, for long-term use this way of working has a number of disadvantages. So what is going on?

The world around us is changing faster than it did fifty years ago. What is a well-known brand of cookies from an independent manufacturer today might be part of a global group tomorrow, and be split into one division for biscuits and another for cake the day after that. Yet the period over which legal documents remain valid only seems to grow: while consumers used to be happy with a warranty of just one year, nowadays they can expect a product to last as long as can reasonably be expected of that product, with or without a written guarantee.

This is even noticeable in the execution of projects. An urban redevelopment has a horizon of thirty years or sometimes longer. Yet along the way the plans are changed ten times, and the parties carrying out the project are no longer the same as those that were there at the start of the redevelopment years ago.

However, a document remains fully or partly relevant during that whole time. For example, because it is a design on which a calculation is based or because it is a legal agreement. But how do you retrieve a document quickly if in the meantime the names of both parties have changed four times? Let alone answer the question: what were the contents of the contracts that both parties brought in through a merger three years ago, and which both parties knew existed at that time?

For questions like this, simple metadata does not suffice. Metadata like ‘Name Customer’ does not usually have a ‘reference date’ that lets you ask: what was the situation three years ago? And metadata does not change along with the world around it. We see this as well in a library whose literature dates back a hundred years: archivists have special systems to still be able to locate items. That is great, but it is expensive and relatively slow for digital retrieval.

What is going wrong?

The first steps toward a solution have been taken with, for example, ‘managed metadata’, but for a structural solution we need to go back to relational theory. The first problem is that important data is stored in multiple places. If the name of the same customer is on every document and the name changes because of a takeover or restructuring, then it is necessary to check all stickers. A portion of the documents will be transferred to the new organization, but another portion might remain behind, in a project for example. A laborious yet correct approach is to reclassify all documents and put the proper stickers on them. A quick technical update of all names is not enough: a document might be connected to a project, and what if that project is left behind in a dead end? Correct updating is a tedious job that demands a lot of time and knowledge.

The second problem now arises: we are talking about the name of the customer, but the name of the customer is a derived piece of information. The document might be a letter about a project and therefore part of the project file, but it could just as easily be a change of address from the head office. Or an offer for a repeat delivery of pens. If you want to judge properly whether and when the name of the customer on the document needs to be changed, you need to trace its origin. Which business process are we talking about: a project, a sale or a business relocation?

The third problem is that the situation changes as time passes: the customer may go by one name now, but went by another name back then. If you want to justify why certain choices were made, you need to hold on to what the situation was at a given moment in time.

Solution: process referential metadata and time travel

Relational theory helps us with the solution. You could still put a copy of the data on a document as well, but the important part is to secure a long-lasting relationship with the underlying business process and the business object. For example with a sales process and a sales opportunity. Or an area development and a project. You can then always automatically derive the historical and current situation. Within Invantive we call that ‘process referential metadata’.

With ‘process referential metadata’, you would link an offer that belongs to a sales opportunity to that sales opportunity. The document (offer) will then automatically inherit the name of the customer and the size of the sales opportunity in its metadata. Searching for all offers that are currently worth more than ten thousand euros will always yield the right result, including the mentioned offer, even if the sales opportunity was only raised to a larger amount later. Searching for all offers that have ever been worth less than ten thousand euros will also yield the right result, again including the mentioned offer.
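The offer example can be sketched in a few lines of Python. This is only an illustration under assumptions, not how Invantive implements it: the class names `SalesOpportunity` and `Document`, the two search functions, the customer name, the dates and the amounts are all hypothetical. The point it shows is that the document stores a reference instead of a copy, so its metadata follows the sales opportunity.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class SalesOpportunity:
    """Hypothetical business object that keeps every amount it ever had."""
    customer_name: str
    # Entries are (valid_from, amount_in_euros), oldest first.
    amount_history: list = field(default_factory=list)

    def set_amount(self, valid_from: date, amount: int) -> None:
        self.amount_history.append((valid_from, amount))

    @property
    def current_amount(self) -> int:
        return self.amount_history[-1][1]


@dataclass
class Document:
    """A document holds a reference to the opportunity, not a copy of its data."""
    title: str
    opportunity: SalesOpportunity

    @property
    def metadata(self) -> dict:
        # Derived on every access, so it always reflects the opportunity.
        return {
            "Name Customer": self.opportunity.customer_name,
            "Amount": self.opportunity.current_amount,
        }


def offers_currently_above(docs, limit):
    return [d for d in docs if d.opportunity.current_amount > limit]


def offers_ever_below(docs, limit):
    return [
        d for d in docs
        if any(amount < limit for _, amount in d.opportunity.amount_history)
    ]


# Assumed example data: the amount is raised after the offer was filed.
opportunity = SalesOpportunity("Example Customer")
opportunity.set_amount(date(2012, 1, 10), 8_000)
offer = Document("Offer", opportunity)
opportunity.set_amount(date(2012, 3, 1), 12_000)
```

With this data, the same offer is returned both by `offers_currently_above([offer], 10_000)` and by `offers_ever_below([offer], 10_000)`, without anyone having touched the document itself.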

Data Vault

The example shows that it is essential to be able to travel in time: are we talking about a size that is currently more than ten thousand euros, or one that has ever been more than ten thousand euros? The relation between the document, the business process and the business object needs to be permanent. The Data Vault principle by Dan Linstedt enables this. In a nutshell, Data Vault ensures that every business object (and therefore also every business process) receives a unique attribute that will not change during its life span. The car with the license plate 1-PNP-23 might get a different license plate, but the unique attribute will not change. This unique attribute is recorded in a so-called ‘hub’. According to the theory, all historical data then ends up in so-called ‘satellites’. Sometimes a so-called ‘fat hub’ is also used, in which satellites and hub are combined.
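As a rough sketch of the hub and satellite idea, the license plate example could look as follows in Python. Everything here is an assumption made for illustration: the key `CAR-0001`, the second plate, the dates, and the in-memory dictionaries that stand in for database tables. A real Data Vault model has more to it (load dates, record sources, links).

```python
from bisect import bisect_right
from datetime import date

# Hub: one unchanging key per business object (the key value is made up).
hub_car = {"CAR-0001"}

# Satellite: dated attribute rows per hub key, sorted by valid_from.
sat_car_plate = {
    "CAR-0001": [
        (date(2010, 5, 1), "1-PNP-23"),
        (date(2014, 9, 15), "9-XYZ-87"),  # hypothetical later plate
    ],
}


def plate_as_of(hub_key, when):
    """Return the license plate that was valid on `when`, or None."""
    if hub_key not in hub_car:
        raise KeyError(hub_key)
    rows = sat_car_plate.get(hub_key, [])
    valid_from_dates = [valid_from for valid_from, _ in rows]
    # Index of the first row that starts after `when`; the row before it applies.
    index = bisect_right(valid_from_dates, when)
    return rows[index - 1][1] if index else None
```

Calling `plate_as_of("CAR-0001", date(2012, 1, 1))` answers the time travel question for 2012, while the hub key itself never changes.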

If you record the unique attribute from a hub in the metadata of a document, then the long-term return can finally be obtained from documents: searching for and finding relevant documents from twenty years ago, even if the world around you has changed.
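A minimal sketch of such a search, assuming the documents carry only the hub key in their metadata. The keys, customer names and document titles below are invented for the illustration, and plain Python structures stand in for the DMS and the data warehouse.

```python
# Satellite data: every name a customer hub key has carried (all values invented).
customer_names = {
    "CUST-42": ["Cookie Factory", "Global Foods", "Global Foods Biscuits"],
}

# Each document records only the stable hub key in its metadata.
documents = [
    {"title": "Supply contract", "customer_key": "CUST-42"},
    {"title": "Warranty statement", "customer_key": "CUST-42"},
    {"title": "Lease", "customer_key": "CUST-77"},
]


def find_documents_by_name(name):
    """Find documents of any customer that has ever carried `name`."""
    keys = {key for key, names in customer_names.items() if name in names}
    return [doc["title"] for doc in documents if doc["customer_key"] in keys]
```

Searching on the old name or on the newest name leads to the same documents, because both names resolve to the same hub key.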

Practical experience with Invantive Estate

Invantive Estate is a combination of real-time data warehouses specifically aimed at process-driven and mostly project-driven organizations. The data warehouse is based on Data Vault.

However, Invantive Estate is a hybrid: besides being a data warehouse, it is also a system in which users can change the data. So not all changes come from a source system with which data is exchanged; data is also recorded directly into the data warehouse through web screens and, for example, Microsoft Outlook.

In Invantive Estate the concept of ‘process referential metadata’ has been applied for five years now. Practical experience has shown that the automatic derivation of document data works very intuitively. Users do not even notice that multiple metadata characteristics are being derived. They do experience the ease of being able to locate documents quickly, whether they are from the distant past or just recent additions. Only users who are experienced with other Document Management Systems start to wonder how they ever worked comfortably with those.

On the basis of theory and practical experience, it seems that the combination of ‘process referential metadata’ and time travel finally lives up to the promise of a DMS: retrieving data over a prolonged time!