Metadata Enrichment is Essential to Realize the Value of Open Datasets

By Assaf Katan, Apertio CEO & Co-Founder

According to McKinsey “Data is now a critical corporate asset—and its value is tied to its ultimate use … Value is likely to accrue to the owners of scarce data, [and] to players that aggregate data in unique ways.”

Addressing the need for reliable and diverse Big Data sources

Open Datasets from governmental and other data sources are increasingly available. The 2018 Linked Data Cloud below shows the growing interconnected nature of open data for public use.

However, it’s rarely as simple as showing connections. To make these datasets usable and actionable, they first need to be discoverable. Discovery starts with the metadata, traditionally the Achilles heel of Open Data. Open Datasets are published on thousands of disparate websites, usually with poor (or even wrong) metadata. They are there to be used and their value realized, but locating the relevant ones is often difficult or even impossible.

According to experts at Dataversity, even though “tools for enabling such efforts exist, most are typically useable only by deep application technical specialists within the organization or by consultants focused on this area. Business data pros need to be able to have a Metadata Discovery solution to aid them in surfing and slicing data by themselves.”

The Schema.org framework is a step in the right direction, as it calls for upgrading and standardizing dataset metadata to improve discoverability. But even with this in place, data professionals cannot discover datasets according to more detailed parameters such as granular locations, company names, professional terms and other information found within the datasets. Furthermore, Schema.org relies on data publishers adhering to the format, and they have little incentive to do so.
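To make this limitation concrete, here is a minimal sketch of a Schema.org-style Dataset record and a search that only looks at the published metadata. The property names (name, description, keywords, spatialCoverage) come from the Schema.org Dataset type; the record's values, the sample rows and the helper matches_metadata are hypothetical and exist only for illustration.

```python
import json

# Hypothetical Schema.org "Dataset" record, as a publisher might embed it as JSON-LD.
dataset_record = {
    "@context": "https://schema.org",
    "@type": "Dataset",
    "name": "Regional Business Registrations",
    "description": "Annual counts of newly registered businesses by region.",
    "keywords": ["business", "registrations", "economy"],
    "spatialCoverage": "United Kingdom",
}

# Hypothetical rows inside the dataset file itself.
# A metadata-only search never sees these values.
dataset_rows = [
    {"region": "Leeds", "company": "Acme Ltd", "year": 2018},
]


def matches_metadata(record, term):
    """Return True if `term` appears in the published metadata fields only."""
    term = term.lower()
    fields = [
        record.get("name", ""),
        record.get("description", ""),
        record.get("spatialCoverage", ""),
        *record.get("keywords", []),
    ]
    return any(term in field.lower() for field in fields)


if __name__ == "__main__":
    print(json.dumps(dataset_record, indent=2))
    print(matches_metadata(dataset_record, "business"))  # found in the metadata
    print(matches_metadata(dataset_record, "Leeds"))     # only present in the rows
```

A search for a broad term such as "business" succeeds, while a granular term such as "Leeds" fails even though it appears in the data, because it was never written into the metadata.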

Choosing third party solutions with metadata enrichment

The way we access Open Data is changing, and it’s being led by a drive for real discovery. Many independent and third-party solutions can provide deeper metadata enrichment, making it easier for data professionals to benefit from a wealth of relevant data. The right access to this data is opening up incredible economic value across dozens of industries, as McKinsey outlines below.

The best solutions are providing innovative and exciting ways to access metadata. Some sources, such as Archive-IT, a leading provider of cultural heritage data, allow smart phrase extraction from the files themselves rather than limiting search to descriptions or classifications. Others take a broader look at the context of the dataset to resolve ambiguity, such as whether a word like ‘Jordan’ refers to the country, the river, or the NBA icon.
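As a rough illustration of context-based disambiguation, the toy sketch below scores each possible meaning of ‘Jordan’ by how many of its typical context words appear in the surrounding text. It is not how any named product works: the SENSES vocabularies and the functions tokenize and disambiguate are assumptions made up for this example, and a real system would use far richer signals.

```python
import re
from collections import Counter

# Hypothetical context vocabularies for each sense of "Jordan".
SENSES = {
    "country": {"amman", "kingdom", "population", "border", "gdp"},
    "river": {"water", "basin", "flow", "valley", "tributary"},
    "basketball player": {"nba", "bulls", "points", "season", "championship"},
}


def tokenize(text):
    """Lowercase the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


def disambiguate(text, senses=SENSES):
    """Return the sense whose context words occur most often in `text`.

    Returns None when no context word from any sense is present.
    """
    counts = Counter(tokenize(text))
    scores = {
        sense: sum(counts[word] for word in vocab)
        for sense, vocab in senses.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


if __name__ == "__main__":
    print(disambiguate("Jordan scored 32 points for the Bulls in the NBA season opener."))
    print(disambiguate("Water flow in the Jordan basin fell sharply."))
    print(disambiguate("Jordan"))
```

The resolved sense could then be stored alongside the dataset as an extra metadata field, so that a search for the country does not return basketball statistics.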

More in-depth solutions might be able to compare and contrast datasets, looking at the relationship between them to help you fill in the gaps. Additionally, in a similar way to how OpenUp Hub advocates for Open Science, peer wisdom and usage-based information on data quality and related datasets could also be beneficial for Open Data.

In developing a solution for Open Data that takes metadata enrichment to the next level, Apertio has built a deep search engine with all these enhancements and more. For the first time, data scientists and analysts have cutting-edge tools for the effortless discovery of quality datasets.

Bio: Assaf Katan, Apertio CEO & Co-Founder, is an accomplished executive with 20 years of experience in both startup and corporate environments. He has vast experience in leading strategic business initiatives, M&A, growth processes and transformations from planning to execution. He is passionate about closing deals and desert hiking.
