22 research outputs found

Entity Identification Problem in Big and Open Data

Author: Domínguez Mayo Francisco José
Escalona Cuaresma María José
González Enríquez José
Goto Masatomo
Lee Vivian
Publication venue: ScitePress Digital Library
Publication date: 01/01/2015

Big and Open Data offer businesses great opportunities to enhance their competitive advantage if utilized properly. However, during the past few years of research on Big and Open Data processing, we have encountered a major challenge in entity identification reconciliation when trying to establish accurate relationships between entities from different data sources. In this paper, we present our innovative Intelligent Reconciliation Platform and Virtual Graphs solution, which addresses this issue. With this solution, we are able to efficiently extract Big and Open Data from heterogeneous sources and integrate them into a common analysable format. Further enhanced with the Virtual Graphs technology, entity identification reconciliation is processed dynamically at system runtime to produce more accurate results. Moreover, we believe that our technology can be applied to a wide diversity of entity identification problems in several domains, e.g., e-Health, cultural heritage, and company identities in the financial world. Ministerio de Ciencia e Innovación TIN2013-46928-C3-3-

idUS. Depósito de Investigación Universidad de Sevilla

TAPER: query-aware, partition-enhancement for large, heterogenous, graphs

Author: Firth Hugo
Missier Paolo
Publication date: 23/06/2016

Graph partitioning has long been seen as a viable approach to address Graph DBMS scalability. A partitioning, however, may introduce extra query processing latency unless it is sensitive to a specific query workload, and optimised to minimise inter-partition traversals for that workload. Additionally, it should also be possible to incrementally adjust the partitioning in reaction to changes in the graph topology, the query workload, or both. Because of their complexity, current partitioning algorithms fall short of one or both of these requirements, as they are designed for offline use and as one-off operations. The TAPER system aims to address both requirements, whilst leveraging existing partitioning algorithms. TAPER takes any given initial partitioning as a starting point, and iteratively adjusts it by swapping chosen vertices across partitions, heuristically reducing the probability of inter-partition traversals for a given workload of pattern matching queries. Iterations are inexpensive thanks to time and space optimisations in the underlying support data structures. We evaluate TAPER on two different large test graphs and over realistic query workloads. Our results indicate that, given a hash-based partitioning, TAPER reduces the number of inter-partition traversals by around 80%; given an unweighted METIS partitioning, by around 30%. These reductions are achieved within 8 iterations and with the additional advantage of being workload-aware and usable online. Comment: 12 pages, 11 figures, unpublishe
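
The idea of iteratively moving vertices to reduce inter-partition traversals can be illustrated with a minimal Python sketch. This is not the TAPER algorithm itself: it is a plain greedy refinement, swapped in for illustration, and the names `cut_weight`, `refine`, `max_size`, the toy graph, and the edge weights (standing in for how often a workload traverses each edge) are all assumptions that do not come from the abstract.

```python
from collections import defaultdict


def cut_weight(edges, part):
    """Total traversal weight on edges whose endpoints lie in different partitions."""
    return sum(w for (u, v), w in edges.items() if part[u] != part[v])


def refine(edges, part, max_size, iterations=8):
    """Greedy refinement (illustrative only, not TAPER's heuristic).

    A vertex moves to the partition that holds most of its traversal weight,
    provided the move strictly lowers the cut and the target partition has
    fewer than max_size vertices.
    """
    part = dict(part)  # work on a copy of the initial partitioning
    adj = defaultdict(list)  # vertex -> [(neighbour, weight), ...]
    for (u, v), w in edges.items():
        adj[u].append((v, w))
        adj[v].append((u, w))
    sizes = defaultdict(int)
    for p in part.values():
        sizes[p] += 1

    for _ in range(iterations):
        moved = False
        for v in sorted(part):
            pull = defaultdict(float)  # partition -> weight of edges from v into it
            for n, w in adj[v]:
                pull[part[n]] += w
            if not pull:
                continue
            best = max(sorted(pull), key=pull.get)
            here = part[v]
            if best != here and pull[best] > pull[here] and sizes[best] < max_size:
                sizes[here] -= 1
                sizes[best] += 1
                part[v] = best
                moved = True
        if not moved:  # no vertex wanted to move: stop early
            break
    return part


# Hypothetical workload weights on a six-vertex path graph.
edges = {("a", "b"): 0.9, ("b", "c"): 0.8, ("c", "d"): 0.1,
         ("d", "e"): 0.9, ("e", "f"): 0.7}
# A hash-like starting point that ignores the workload entirely.
initial = {"a": 0, "b": 1, "c": 0, "d": 1, "e": 0, "f": 1}
refined = refine(edges, initial, max_size=4)
print(cut_weight(edges, initial), cut_weight(edges, refined))
```

In this toy case the only edge left crossing partitions is the rarely traversed `c`-`d` edge. The `max_size` bound is a crude stand-in for a balance constraint; without it, a greedy scheme like this could collapse everything into one partition.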

arXiv.org e-Print Archive

Crossref

University of Birmingham Research Portal

Newcastle University E-Prints

Entity Identity Reconciliation based Big Data Federation-A MDE approach

Author: Domínguez-Mayo Francisco Jose
Enríquez Jose Gonzalez
Escalona María José
García García Julián Alberto
Goto Masatomo
Lee Vivian
Publication venue: AIS Electronic Library (AISeL)
Publication date: 01/01/2015

“Information is power” is a sentence attributed to Francis Bacon that has acquired great importance in the current information era. However, too much information can be a drawback. The term “infoxication” refers to the difficulty a person can have in understanding an issue and making decisions when too much information is present. With the increasing relevance of open data and big databases, the application of mechanisms and solutions to manage information is critical. This paper introduces the problem of unique identification and data reconciliation and discusses how to solve it in big and open data environments. Data reconciliation across multiple databases and the unique identification of entities are not new problems, but how effective are classical mechanisms in the new internet environment? In this paper, a solution based on model-driven engineering and virtual graphs is presented in order to improve the processing of information in big open repositories. The paper illustrates the idea with a real example concerning the proper exploitation of heritage information in the south of Spain.

AIS Electronic Library (AISeL)

idUS. Depósito de Investigación Universidad de Sevilla

Entity Identity Reconciliation based Big Data Federation A MDE approach

Author: Domínguez Mayo Francisco José
Escalona Cuaresma María José
García García Julián Alberto
González Enríquez José
Goto Masatomo
Lee Vivian
Publication venue: Association for Information Systems (AIS)
Publication date: 01/01/2015

idUS. Depósito de Investigación Universidad de Sevilla

Investigation of Database Models for Evolving Graphs

Author: Gounaris Anastasios
Kosmatopoulos Andreas
Spitalas Alexandros
Tsichlas Kostas
Publication venue: LIPIcs - Leibniz International Proceedings in Informatics. 28th International Symposium on Temporal Representation and Reasoning (TIME 2021)
Publication date: 01/01/2021

We deal with the efficient implementation of storage models for time-varying graphs. To this end, we present an improved approach for the HiNode vertex-centric model based on MongoDB. This approach, apart from its inherent space optimality, exhibits significant improvements in the execution times of global queries, which are the most challenging query type for entity-centric approaches. Compared to an implementation based on Cassandra, not only are significant speedups achieved, but more expensive queries can be executed as well, owing to the capability to exploit indices to a larger extent and to benefit from in-database query processing.
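
To make the distinction between local and global queries in a vertex-centric model concrete, here is a small Python sketch. It is not the HiNode schema and does not use MongoDB: the record layout, the field names (`_id`, `start`, `end`, `edges`, `to`) and the helper functions are hypothetical, chosen only to show why a global query has to touch every vertex record.

```python
import math

# Each vertex is one self-contained record holding its own lifetime and the
# validity interval of each outgoing edge (hypothetical layout).
vertices = [
    {"_id": "u", "start": 0, "end": math.inf,
     "edges": [{"to": "v", "start": 1, "end": 5},
               {"to": "w", "start": 3, "end": math.inf}]},
    {"_id": "v", "start": 1, "end": 5, "edges": []},
    {"_id": "w", "start": 2, "end": math.inf,
     "edges": [{"to": "u", "start": 4, "end": math.inf}]},
]


def alive(item, t):
    """True if the half-open interval [start, end) contains time t."""
    return item["start"] <= t < item["end"]


def neighbours_at(vertex_id, t):
    """Local query: only the record of one vertex is needed."""
    for doc in vertices:
        if doc["_id"] == vertex_id and alive(doc, t):
            return sorted(e["to"] for e in doc["edges"] if alive(e, t))
    return []


def snapshot_at(t):
    """Global query: every vertex record is visited to rebuild the graph at t."""
    return sorted((doc["_id"], e["to"])
                  for doc in vertices if alive(doc, t)
                  for e in doc["edges"] if alive(e, t))


print(neighbours_at("u", 2))
print(snapshot_at(4))
print(snapshot_at(6))
```

In this in-memory sketch `snapshot_at` scans the whole list. In a database-backed version, an index over the interval fields could plausibly narrow that scan, which is the kind of benefit the abstract attributes to index use and in-database processing.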

Dagstuhl Research Online Publication Server