
    Efficient Discovery of Ontology Functional Dependencies

    Poor data quality has become a pervasive issue due to the increasing complexity and size of modern datasets. Constraint-based data cleaning techniques rely on integrity constraints as a benchmark to identify and correct errors. Data values that do not satisfy the given set of constraints are flagged as dirty, and data updates are made to re-align the data and the constraints. However, many errors require user input to resolve, because domain expertise defines specific terminology and relationships. For example, in pharmaceuticals, 'Advil' is-a brand name for 'ibuprofen', a relationship that can be captured in a pharmaceutical ontology. While functional dependencies (FDs) have traditionally been used in existing data cleaning solutions to model syntactic equivalence, they are not able to model broader relationships (e.g., is-a) defined by an ontology. In this paper, we take a first step towards extending the set of data quality constraints used in data cleaning by defining and discovering Ontology Functional Dependencies (OFDs). We lay out theoretical and practical foundations for OFDs, including a set of sound and complete axioms, and a linear inference procedure. We then develop effective algorithms for discovering OFDs, and a set of optimizations that efficiently prune the search space. Our experimental evaluation using real data shows the scalability and accuracy of our algorithms. Comment: 12 pages
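The contrast between a syntactic FD and an ontology-aware one can be illustrated with a minimal sketch. It is a simplification for illustration, not the discovery algorithm from the paper: the function `fd_holds`, the attribute names `symptom` and `drug`, the sample rows, and the flat `ontology` mapping from a brand name to its canonical concept are all assumptions made here.

```python
def fd_holds(rows, lhs, rhs, canon=None):
    """Check whether lhs -> rhs holds over rows (a list of dicts).

    canon is an optional, hypothetical mapping from a value to a canonical
    concept; values that share a concept are treated as equal. With
    canon=None this is the usual syntactic FD check.
    """
    canon = canon or {}
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        # Replace each right-hand-side value by its canonical concept, if any.
        val = tuple(canon.get(row[a], row[a]) for a in rhs)
        if seen.setdefault(key, val) != val:
            return False  # same lhs values, different rhs concepts
    return True


# Assumed toy data: two rows that differ only in how the drug is named.
rows = [
    {"symptom": "headache", "drug": "Advil"},
    {"symptom": "headache", "drug": "ibuprofen"},
]
ontology = {"Advil": "ibuprofen"}  # hypothetical is-a / brand-name mapping

syntactic = fd_holds(rows, ["symptom"], ["drug"])
with_ontology = fd_holds(rows, ["symptom"], ["drug"], canon=ontology)
```

Under these assumptions the syntactic check flags the two rows as a violation, while the ontology-aware check accepts them because both drug values resolve to the same concept.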

    On Independence Atoms and Keys

    Uniqueness and independence are two fundamental properties of data. Their enforcement in database systems can lead to higher quality data, faster data service response time, better data-driven decision making and knowledge discovery from data. The applications can be effectively unlocked by providing efficient solutions to the underlying implication problems of keys and independence atoms. Indeed, for the sole class of keys and the sole class of independence atoms the associated finite and general implication problems coincide and enjoy simple axiomatizations. However, the situation changes drastically when keys and independence atoms are combined. We show that the finite and the general implication problems are already different for keys and unary independence atoms. Furthermore, we establish a finite axiomatization for the general implication problem, and show that the finite implication problem does not enjoy a k-ary axiomatization for any k.

    Lattices with non-Shannon Inequalities

    We study the existence or absence of non-Shannon inequalities for variables that are related by functional dependencies. Although the power-set on four variables is the smallest Boolean lattice with non-Shannon inequalities, there exist lattices with many more variables without non-Shannon inequalities. We search for conditions that ensure that no non-Shannon inequalities exist. It is demonstrated that 3-dimensional distributive lattices cannot have non-Shannon inequalities and planar modular lattices cannot have non-Shannon inequalities. The existence of non-Shannon inequalities is related to the question of whether a lattice is isomorphic to a lattice of subgroups of a group. Comment: Ten pages. Submitted to ISIT 2015. The appendix will not appear in the proceedings.

    Justification for inclusion dependency normal form

    Functional dependencies (FDs) and inclusion dependencies (INDs) are the most fundamental integrity constraints that arise in practice in relational databases. In this paper, we address the issue of normalization in the presence of FDs and INDs and, in particular, the semantic justification for Inclusion Dependency Normal Form (IDNF), a normal form which combines Boyce-Codd normal form with the restriction on the INDs that they be noncircular and key-based. We motivate and formalize three goals of database design in the presence of FDs and INDs: noninteraction between FDs and INDs, elimination of redundancy and update anomalies, and preservation of entity integrity. We show that, as for FDs, in the presence of INDs being free of redundancy is equivalent to being free of update anomalies. Then, for each of these properties, we derive equivalent syntactic conditions on the database design. Individually, each of these syntactic conditions is weaker than IDNF, and the restriction that an FD not be embedded in the right-hand side of an IND is common to three of the conditions. However, we also show that, for these three goals of database design to be satisfied simultaneously, IDNF is both a necessary and sufficient condition.

    Guaranteeing no interaction between functional dependencies and tree-like inclusion dependencies

    Functional dependencies (FDs) and inclusion dependencies (INDs) are the most fundamental integrity constraints that arise in practice in relational databases. A given set of FDs does not interact with a given set of INDs if logical implication of any FD can be determined solely by the given set of FDs, and logical implication of any IND can be determined solely by the given set of INDs. The set of tree-like INDs constitutes a useful subclass of INDs whose implication problem is polynomial time decidable. We exhibit a necessary and sufficient condition for a set of FDs and tree-like INDs not to interact; this condition can be tested in polynomial time.
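When FDs and INDs do not interact, implication of an FD depends on the FDs alone, and for FDs alone the classical attribute-closure test decides implication in polynomial time. The sketch below shows that standard test only; it does not implement the non-interaction condition from the paper, and the names `closure`, `implies`, and the sample dependencies over attributes A, B, C are assumptions made for illustration.

```python
def closure(attrs, fds):
    """Return the set of attributes determined by attrs under fds.

    fds is a list of (lhs, rhs) pairs, each side a set of attribute names.
    """
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # Fire an FD once its whole left-hand side is already derived.
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def implies(fds, lhs, rhs):
    """True if the FD lhs -> rhs is logically implied by fds."""
    return set(rhs) <= closure(lhs, fds)


# Assumed example: A -> B and B -> C.
fds = [({"A"}, {"B"}), ({"B"}, {"C"})]
a_determines_c = implies(fds, {"A"}, {"C"})
c_determines_a = implies(fds, {"C"}, {"A"})
```

With these two dependencies, A -> C follows by transitivity, while C -> A does not.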