4 research outputs found

    Efficient Discovery of Ontology Functional Dependencies

    Full text link
    Poor data quality has become a pervasive issue due to the increasing complexity and size of modern datasets. Constraint based data cleaning techniques rely on integrity constraints as a benchmark to identify and correct errors. Data values that do not satisfy the given set of constraints are flagged as dirty, and data updates are made to re-align the data and the constraints. However, many errors often require user input to resolve due to domain expertise defining specific terminology and relationships. For example, in pharmaceuticals, 'Advil' \emph{is-a} brand name for 'ibuprofen' that can be captured in a pharmaceutical ontology. While functional dependencies (FDs) have traditionally been used in existing data cleaning solutions to model syntactic equivalence, they are not able to model broader relationships (e.g., is-a) defined by an ontology. In this paper, we take a first step towards extending the set of data quality constraints used in data cleaning by defining and discovering \emph{Ontology Functional Dependencies} (OFDs). We lay out theoretical and practical foundations for OFDs, including a set of sound and complete axioms, and a linear inference procedure. We then develop effective algorithms for discovering OFDs, and a set of optimizations that efficiently prune the search space. Our experimental evaluation using real data show the scalability and accuracy of our algorithms.Comment: 12 page

    Implication and axiomatization of functional and constant constraints

    No full text
    Abstract: Akhtar et al. introduced equality-generating constraints and functional constraints as a first step towards dependency-like integrity constraints for RDF data [3]. Here, we focus on functional constraints. Since the usefulness of functional constraints is not limited to the RDF data model, we study the functional constraints in the more general setting of relations with arbitrary arity. We further introduce constant constraints and study the functional and constant constraints combined. Our main results are sound and complete axiomatizations for the functional and constant constraints, both separately and combined. These axiomatizations are derived using the chase algorithm for equality-generating constraints. For derivations of constant constraints, we show how every chase step can be simulated by a bounded number of applications of inference rules. For derivations of functional constraints, we show that the chase algorithm can be normalized to a more specialized symmetry-preserving chase algorithm performing so-called symmetry-preserving steps. We then show how each symmetry-preserving step can be simulated by a bounded number of applications of inference rules. The axiomatization for functional constraints is in particular applicable to the RDF data model, solving a major open problem of Akhtar et al
    corecore