research

Entity reconciliation in big data sources: A systematic mapping study

Abstract

The entity reconciliation (ER) problem aroused much interest as a research topic in today’s Big Dataera, full of big and open heterogeneous data sources. This problem poses when relevant information ona topic needs to be obtained using methods based on: (i) identifying records that represent the samereal world entity, and (ii) identifying those records that are similar but do not correspond to the samereal-world entity. ER is an operational intelligence process, whereby organizations can unify differentand heterogeneous data sources in order to relate possible matches of non-obvious entities. Besides, thecomplexity that the heterogeneity of data sources involves, the large number of records and differencesamong languages, for instance, must be added. This paper describes a Systematic Mapping Study (SMS) ofjournal articles, conferences and workshops published from 2010 to 2017 to solve the problem describedbefore, first trying to understand the state-of-the-art, and then identifying any gaps in current research.Eleven digital libraries were analyzed following a systematic, semiautomatic and rigorous process thathas resulted in 61 primary studies. They represent a great variety of intelligent proposals that aim tosolve ER. The conclusion obtained is that most of the research is based on the operational phase asopposed to the design phase, and most studies have been tested on real-world data sources, where a lotof them are heterogeneous, but just a few apply to industry. There is a clear trend in research techniquesbased on clustering/blocking and graphs, although the level of automation of the proposals is hardly evermentioned in the research work.Ministerio de Economía y Competitividad TIN2013-46928-C3-3-RMinisterio de Economía y Competitividad TIN2016-76956-C3-2-RMinisterio de Economía y Competitividad TIN2015-71938-RED

    Similar works