1,053 research outputs found

Putting Context into Schema Matching

Author: Bohannon Philip
Elnahrawy Eiman
Fan Wenfei
Flaster Michael
Publication date: 01/01/2006

Edinburgh Research Explorer

Extending and inferring functional dependencies in schema transformation

Publication venue: Association for Computing Machinery (ACM)
Publication date: 01/01/2004

Crossref

From Causes for Database Queries to Repairs and Model-Based Diagnosis and Back

Author: Bertossi Leopoldo
Salimi Babak
Publication date: 13/12/2014

In this work we establish and investigate connections between causes for query answers in databases, database repairs with respect to denial constraints, and consistency-based diagnosis. The first two are relatively new research areas in databases, and the third is an established subject in knowledge representation. We show how to obtain database repairs from causes, and the other way around. Causality problems are formulated as diagnosis problems, and the diagnoses provide causes and their responsibilities. The vast body of research on database repairs can be applied to the newer problems of computing actual causes for query answers and their responsibilities. These connections, which are interesting per se, allow us, after a transition (inspired by consistency-based diagnosis) to computational problems on hitting sets and vertex covers in hypergraphs, to obtain several new algorithmic and complexity results for database causality.

Comment: To appear in Theory of Computing Systems. By invitation to special issue with extended papers from ICDT 2015 (paper arXiv:1412.4311

arXiv.org e-Print Archive

Carleton University's Institutional Repository

Dagstuhl Research Online Publication Server

Profiling relational data: a survey

Author: Abedjan Ziawasch
Golab Lukasz
Naumann Felix
Publication venue: Springer Science and Business Media LLC
Publication date: 18/08/2016

Profiling data to determine metadata about a given dataset is an important and frequent activity of any IT professional and researcher and is necessary for various use cases. It encompasses a vast array of methods to examine datasets and produce metadata. Among the simpler results are statistics, such as the number of null values and distinct values in a column, its data type, or the most frequent patterns of its data values. Metadata that are more difficult to compute involve multiple columns, namely correlations, unique column combinations, functional dependencies, and inclusion dependencies. Further techniques detect conditional properties of the dataset at hand. This survey provides a classification of data profiling tasks and comprehensively reviews the state of the art for each class. In addition, we review data profiling tools and systems from research and industry. We conclude with an outlook on the future of data profiling beyond traditional profiling tasks and beyond relational databases.
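As a rough illustration of the profiling tasks this abstract names, the sketch below computes single-column statistics (null count, distinct count, most frequent value) and checks one multi-column property, a functional dependency. It is not taken from the survey: the helper names `profile_column` and `holds_fd` and the sample rows are assumptions made up for this example, and the dependency check is the naive single-pass approach, not one of the discovery algorithms the survey reviews.

```python
from collections import Counter


def profile_column(rows, column):
    """Single-column statistics: nulls, distinct values, most frequent value."""
    values = [row.get(column) for row in rows]
    non_null = [v for v in values if v is not None]
    return {
        "nulls": len(values) - len(non_null),
        "distinct": len(set(non_null)),
        "most_common": Counter(non_null).most_common(1),
    }


def holds_fd(rows, lhs, rhs):
    """Return True if the functional dependency lhs -> rhs holds in rows."""
    seen = {}
    for row in rows:
        key = tuple(row[c] for c in lhs)
        val = tuple(row[c] for c in rhs)
        # The first row with this key fixes the expected right-hand side.
        if seen.setdefault(key, val) != val:
            return False
    return True


# Hypothetical sample data, invented for this sketch.
rows = [
    {"zip": "10115", "city": "Berlin", "phone": None},
    {"zip": "10115", "city": "Berlin", "phone": "030-1"},
    {"zip": "14482", "city": "Potsdam", "phone": "0331-2"},
    {"zip": "14467", "city": "Potsdam", "phone": None},
]

print(profile_column(rows, "phone"))
print(holds_fd(rows, ["zip"], ["city"]))
print(holds_fd(rows, ["city"], ["zip"]))
```

In this sample, `zip -> city` holds while `city -> zip` does not, because Potsdam appears with two different zip codes. Verifying a given dependency is cheap; the hard part the abstract alludes to is discovering all dependencies across column combinations.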

DSpace@MIT

Inconsistency and Incompleteness in Relational Databases and Logic Programs

Author: Viswanath Navin
Publication venue: ScholarWorks @ Georgia State University
Publication date: 08/07/2009

The aim of this thesis is to study the role played by negation in databases and to develop data models that can handle inconsistent and incomplete information. We develop models that also allow incompleteness through disjunctive information under both the closed-world assumption (CWA) and the open-world assumption (OWA) in relational databases. In the area of logic programming, extended logic programs allow explicit representation of negative information. As a result, a number of extended logic programs have an inconsistent semantics. We present a translation of extended logic programs to normal logic programs that is more tolerant of inconsistencies. Extended logic programs have also been used widely to compute the repairs of an inconsistent database. We present some preliminary ideas on how source information can be incorporated into the repair program in order to produce a subset of the set of all repairs based on a preference for certain sources over others.

ScholarWorks @ Georgia State University

GraphX: Unifying Data-Parallel and Graph-Parallel Analytics

Author: Crankshaw Daniel
Dave Ankur
Franklin Michael J.
Gonzalez Joseph E.
Stoica Ion
Xin Reynold S.
Publication date: 11/02/2014

From social networks to language modeling, the growing scale and importance of graph data has driven the development of numerous new graph-parallel systems (e.g., Pregel, GraphLab). By restricting the computation that can be expressed and introducing new techniques to partition and distribute the graph, these systems can efficiently execute iterative graph algorithms orders of magnitude faster than more general data-parallel systems. However, the same restrictions that enable the performance gains also make it difficult to express many of the important stages in a typical graph-analytics pipeline: constructing the graph, modifying its structure, or expressing computation that spans multiple graphs. As a consequence, existing graph analytics pipelines compose graph-parallel and data-parallel systems using external storage systems, leading to extensive data movement and a complicated programming model. To address these challenges we introduce GraphX, a distributed graph computation framework that unifies graph-parallel and data-parallel computation. GraphX provides a small, core set of graph-parallel operators expressive enough to implement the Pregel and PowerGraph abstractions, yet simple enough to be cast in relational algebra. GraphX uses a collection of query optimization techniques such as automatic join rewrites to efficiently implement these graph-parallel operators. We evaluate GraphX on real-world graphs and workloads and demonstrate that GraphX achieves performance comparable to that of specialized graph computation systems, while outperforming them in end-to-end graph pipelines. Moreover, GraphX achieves a balance between expressiveness, performance, and ease of use.
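To make the idea of casting a graph-parallel step in relational terms more concrete, the sketch below runs Pregel-style label propagation for connected components in plain Python. Each superstep is written as a join of the edge list with the current vertex labels, followed by a group-by-minimum per receiving vertex. This is only an illustration under stated assumptions: it does not use GraphX or its API, and the function names `superstep` and `connected_components` and the sample graph are hypothetical.

```python
def superstep(labels, edges):
    """One Pregel-style round, expressed as join + group-by-min."""
    messages = {}
    for src, dst in edges:
        # Join step: look up the sender's label for each edge direction
        # (edges are treated as undirected here).
        for sender, receiver in ((src, dst), (dst, src)):
            msg = labels[sender]
            # Aggregate step: keep the minimum message per receiver.
            messages[receiver] = min(messages.get(receiver, msg), msg)
    # Vertex update: adopt the smaller of the old label and the message.
    return {v: min(label, messages.get(v, label)) for v, label in labels.items()}


def connected_components(vertices, edges):
    """Iterate supersteps until no label changes."""
    labels = {v: v for v in vertices}
    while True:
        new_labels = superstep(labels, edges)
        if new_labels == labels:
            return labels
        labels = new_labels


# Hypothetical sample graph with two components: {1, 2, 3} and {4, 5}.
components = connected_components([1, 2, 3, 4, 5], [(1, 2), (2, 3), (4, 5)])
print(components)
```

Every vertex ends up labelled with the smallest vertex id in its component. A distributed system would partition the edge and vertex tables and optimize the join, which is the part the abstract attributes to GraphX's query optimization techniques.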

arXiv.org e-Print Archive

CiteSeerX