22 research outputs found
Entity Identification Problem in Big and Open Data
Big and Open Data provide great opportunities to businesses to enhance their competitive advantages if
utilized properly. However, during past few years’ research in Big and Open Data process, we have
encountered big challenge in entity identification reconciliation, when trying to establish accurate
relationships between entities from different data sources. In this paper, we present our innovative Intelligent
Reconciliation Platform and Virtual Graphs solution that addresses this issue. With this solution, we are able
to efficiently extract Big and Open Data from heterogeneous source, and integrate them into a common
analysable format. Further enhanced with the Virtual Graphs technology, entity identification reconciliation
is processed dynamically to produce more accurate result at system runtime. Moreover, we believe that our
technology can be applied to a wide diversity of entity identification problems in several domains, e.g., e-
Health, cultural heritage, and company identities in financial world.Ministerio de Ciencia e InnovaciĂłn TIN2013-46928-C3-3-
TAPER: query-aware, partition-enhancement for large, heterogenous, graphs
Graph partitioning has long been seen as a viable approach to address Graph
DBMS scalability. A partitioning, however, may introduce extra query processing
latency unless it is sensitive to a specific query workload, and optimised to
minimise inter-partition traversals for that workload. Additionally, it should
also be possible to incrementally adjust the partitioning in reaction to
changes in the graph topology, the query workload, or both. Because of their
complexity, current partitioning algorithms fall short of one or both of these
requirements, as they are designed for offline use and as one-off operations.
The TAPER system aims to address both requirements, whilst leveraging existing
partitioning algorithms. TAPER takes any given initial partitioning as a
starting point, and iteratively adjusts it by swapping chosen vertices across
partitions, heuristically reducing the probability of inter-partition
traversals for a given pattern matching queries workload. Iterations are
inexpensive thanks to time and space optimisations in the underlying support
data structures. We evaluate TAPER on two different large test graphs and over
realistic query workloads. Our results indicate that, given a hash-based
partitioning, TAPER reduces the number of inter-partition traversals by around
80%; given an unweighted METIS partitioning, by around 30%. These reductions
are achieved within 8 iterations and with the additional advantage of being
workload-aware and usable online.Comment: 12 pages, 11 figures, unpublishe
Entity Identity Reconciliation based Big Data Federation-A MDE approach
“Information is power” is a sentence attributed to Francis Bacon that acquired a high important in the current era of the information. However, too much information can be a negative aspect. The term of “Infoxication” refers to the difficulty a person can have understanding an issue and making decisions that can be caused by the presence of too much information. With the increasing of relevance of open data and big database, the application of mechanisms and solutions to manage information is critical. This paper introduces the problem of unique identification and data reconciliation and offers a discussion about how to solve this problem in big and open data environment. The problem of data reconciliation in multiple databases and the unique identification of entities is not a new problem, but, how effective are classical mechanisms in the new internet environment? In this paper a solution based on model-driven engineering and virtual graph is presented in order to improve the processing of information in big open repositories. The paper illustrates the idea with a real example for the right exploitation of heritage information in the south of Spain
Entity Identity Reconciliation based Big Data Federation A MDE approach
“Information is power” is a sentence attributed to Francis Bacon that acquired a high important in the current era of the information. However, too much information can be a negative aspect. The term of “Infoxication” refers to the difficulty a person can have understanding an issue and making decisions that can be caused by the presence of too much information. With the increasing of relevance of open data and big database, the application of mechanisms and solutions to manage information is critical. This paper introduces the problem of unique identification and data reconciliation and offers a discussion about how to solve this problem in big and open data environment. The problem of data reconciliation in multiple databases and the unique identification of entities is not a new problem, but, how effective are classical mechanisms in the new internet environment? In this paper a solution based on model-driven engineering and virtual graph is presented in order to improve the processing of information in big open repositories. The paper illustrates the idea with a real example for the right exploitation of heritage information in the south of Spain.Ministerio de Ciencia e Innovación TIN2013-46928-C3-3-
Investigation of Database Models for Evolving Graphs
We deal with the efficient implementation of storage models for time-varying graphs. To this end, we present an improved approach for the HiNode vertex-centric model based on MongoDB. This approach, apart from its inherent space optimality, exhibits significant improvements in global query execution times, which is the most challenging query type for entity-centric approaches. Not only significant speedups are achieved but more expensive queries can be executed as well, when compared to an implementation based on Cassandra due to the capability to exploit indices to a larger extent and benefit from in-database query processing