Towards information profiling: data lake content metadata management

Abelló Gamazo, Alberto; Al-serafi, Ayman Mounir Mohamed; Calders, Toon; Romero Moral, Óscar

Publication date: 1 January 2016
Publisher: Institute of Electrical and Electronics Engineers (IEEE)

Abstract

There is currently a burst of Big Data (BD) processed and stored in huge raw data repositories, commonly called Data Lakes (DL). These BD require new techniques of data integration and schema alignment in order to make the data usable by its consumers and to discover the relationships linking their content. This can be provided by metadata services which discover and describe their content. However, there is currently a lack of a systematic approach for this kind of metadata discovery and management. Thus, we propose a framework for the profiling of informational content stored in the DL, which we call information profiling. The profiles are stored as metadata to support data analysis. We formally define a metadata management process which identifies the key activities required to effectively handle this metadata. We demonstrate the alternative techniques and performance of our process using a prototype implementation handling a real-life case study from the OpenML DL, which showcases the value and feasibility of our approach.

Peer reviewed. Postprint (author's final draft).


Available Versions

UPCommons (Universitat Politècnica de Catalunya)

oai:upcommons.upc.edu:2117/100...

Last time updated on 28/02/2025
