Efficient Discovery of Ontology Functional Dependencies
Poor data quality has become a pervasive issue due to the increasing
complexity and size of modern datasets. Constraint based data cleaning
techniques rely on integrity constraints as a benchmark to identify and correct
errors. Data values that do not satisfy the given set of constraints are
flagged as dirty, and data updates are made to re-align the data and the
constraints. However, many errors require user input to resolve, because
domain expertise defines the relevant terminology and relationships. For example,
in pharmaceuticals, 'Advil' is-a brand name for 'ibuprofen' that can be
captured in a pharmaceutical ontology. While functional dependencies (FDs) have
traditionally been used in existing data cleaning solutions to model syntactic
equivalence, they are not able to model broader relationships (e.g., is-a)
defined by an ontology. In this paper, we take a first step towards extending
the set of data quality constraints used in data cleaning by defining and
discovering Ontology Functional Dependencies (OFDs). We lay out
theoretical and practical foundations for OFDs, including a set of sound and
complete axioms, and a linear inference procedure. We then develop effective
algorithms for discovering OFDs, and a set of optimizations that efficiently
prune the search space. Our experimental evaluation using real data shows the
scalability and accuracy of our algorithms.
Comment: 12 page
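The contrast between a plain FD and an ontology-aware dependency can be illustrated with a minimal sketch. It assumes one plausible reading of a synonym-style OFD: within each group of rows that agree on the left-hand side, the right-hand-side values must share at least one common concept. The `SENSES` table, the `senses` and `violations` helpers, and the sample rows are all hypothetical and are not taken from the paper.

```python
from collections import defaultdict

# Hypothetical toy ontology: each surface value maps to the concepts it may denote.
SENSES = {
    "Advil": {"ibuprofen"},
    "ibuprofen": {"ibuprofen"},
    "Motrin": {"ibuprofen"},
    "Tylenol": {"acetaminophen"},
    "acetaminophen": {"acetaminophen"},
}


def senses(value):
    # Values missing from the ontology denote only themselves.
    return SENSES.get(value, {value})


def violations(rows, lhs, rhs, sense_of=senses):
    """Return the lhs keys whose rhs values share no common concept."""
    groups = defaultdict(list)
    for row in rows:
        groups[tuple(row[a] for a in lhs)].append(row[rhs])
    bad = []
    for key, values in groups.items():
        # Intersect the concept sets of all rhs values in the group.
        common = set.intersection(*(sense_of(v) for v in values))
        if not common:
            bad.append(key)
    return bad


rows = [
    {"symptom": "headache", "drug": "Advil"},
    {"symptom": "headache", "drug": "ibuprofen"},
    {"symptom": "fever", "drug": "Tylenol"},
    {"symptom": "fever", "drug": "Motrin"},
]

# Ontology-aware check: only the "fever" group mixes unrelated drugs.
ofd_bad = violations(rows, ["symptom"], "drug")

# Plain FD check (syntactic equality): every value denotes only itself,
# so 'Advil' vs 'ibuprofen' is also flagged.
fd_bad = violations(rows, ["symptom"], "drug", sense_of=lambda v: {v})
```

Under these assumptions, the syntactic check flags both groups, while the ontology-aware check accepts the 'Advil'/'ibuprofen' group and flags only the group that mixes 'Tylenol' with 'Motrin'.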
Implication and axiomatization of functional and constant constraints
Akhtar et al. introduced equality-generating constraints and functional constraints as a first step towards dependency-like integrity constraints for RDF data [3]. Here, we focus on functional constraints. Since the usefulness of functional constraints is not limited to the RDF data model, we study the functional constraints in the more general setting of relations with arbitrary arity. We further introduce constant constraints and study the functional and constant constraints combined. Our main results are sound and complete axiomatizations for the functional and constant constraints, both separately and combined. These axiomatizations are derived using the chase algorithm for equality-generating constraints. For derivations of constant constraints, we show how every chase step can be simulated by a bounded number of applications of inference rules. For derivations of functional constraints, we show that the chase algorithm can be normalized to a more specialized symmetry-preserving chase algorithm performing so-called symmetry-preserving steps. We then show how each symmetry-preserving step can be simulated by a bounded number of applications of inference rules. The axiomatization for functional constraints is in particular applicable to the RDF data model, solving a major open problem of Akhtar et al.
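As background for the chase, here is a minimal sketch of the textbook two-row chase for classical functional dependencies, treated as equality-generating constraints. It does not implement the functional or constant constraints of the paper, nor its symmetry-preserving chase; the function name `chase_implies` and the sample dependencies are hypothetical.

```python
def chase_implies(fds, attrs, lhs, rhs):
    """Decide whether the FDs in `fds` imply lhs -> rhs using a two-row chase.

    `fds` is a list of (left, right) pairs of attribute sets.
    """
    # Two-row tableau: row0 holds symbol a_X for every attribute X;
    # row1 agrees with row0 on lhs and holds a fresh symbol b_X elsewhere.
    row0 = {a: f"a_{a}" for a in attrs}
    row1 = {a: (f"a_{a}" if a in lhs else f"b_{a}") for a in attrs}

    changed = True
    while changed:
        changed = False
        for left, right in fds:
            # The dependency fires when both rows agree on its left side.
            if all(row0[a] == row1[a] for a in left):
                for a in right:
                    if row0[a] != row1[a]:
                        # Chase step: equate the two symbols. Each b_X occurs
                        # only once, so overwriting this cell is sufficient.
                        row1[a] = row0[a]
                        changed = True

    # lhs -> rhs is implied iff the chased rows agree on every rhs attribute.
    return all(row0[a] == row1[a] for a in rhs)


attrs = ["A", "B", "C"]
fds = [({"A"}, {"B"}), ({"B"}, {"C"})]

a_implies_c = chase_implies(fds, attrs, {"A"}, {"C"})  # transitivity
c_implies_a = chase_implies(fds, attrs, {"C"}, {"A"})
```

In this restricted setting, each chase step corresponds to a bounded number of applications of the classical inference rules for FDs, which is the kind of correspondence the abstract describes for its own constraint classes.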