Efficient Discovery of Ontology Functional Dependencies
Poor data quality has become a pervasive issue due to the increasing
complexity and size of modern datasets. Constraint-based data cleaning
techniques rely on integrity constraints as a benchmark to identify and correct
errors. Data values that do not satisfy the given set of constraints are
flagged as dirty, and data updates are made to re-align the data and the
constraints. However, many errors can only be resolved with user input, because
the relevant terminology and relationships are defined by domain experts. For
example, in pharmaceuticals, 'Advil' is a brand name for 'ibuprofen' (an is-a
relationship) that can be captured in a pharmaceutical ontology. While functional dependencies (FDs) have
traditionally been used in existing data cleaning solutions to model syntactic
equivalence, they are not able to model broader relationships (e.g., is-a)
defined by an ontology. In this paper, we take a first step towards extending
the set of data quality constraints used in data cleaning by defining and
discovering Ontology Functional Dependencies (OFDs). We lay out
theoretical and practical foundations for OFDs, including a set of sound and
complete axioms, and a linear inference procedure. We then develop effective
algorithms for discovering OFDs, and a set of optimizations that efficiently
prune the search space. Our experimental evaluation using real data shows the
scalability and accuracy of our algorithms. Comment: 12 pages
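To make the contrast between syntactic FDs and ontology-aware dependencies concrete, here is a minimal sketch of one simple reading of OFD semantics, which is our assumption rather than the paper's definition: a "synonym-style" dependency X -> A holds if, within every group of tuples that agree on X, all A values share at least one common sense in the ontology. The toy ontology, the helper names (`senses`, `holds_fd`, `holds_synonym_ofd`) and the sample rows are all hypothetical.

```python
from collections import defaultdict

# Hypothetical toy ontology: each value maps to the set of senses
# (concepts) it can denote. Not taken from the paper.
ONTOLOGY = {
    "ibuprofen": {"ibuprofen"},
    "Advil": {"ibuprofen"},
    "Motrin": {"ibuprofen"},
    "acetaminophen": {"acetaminophen"},
    "Tylenol": {"acetaminophen"},
}


def senses(value):
    # A value missing from the ontology is treated as its own sense,
    # so the check falls back to plain syntactic equality for it.
    return ONTOLOGY.get(value, {value})


def holds_fd(rows, lhs, rhs):
    """Plain FD lhs -> rhs: each lhs group has exactly one rhs value."""
    groups = defaultdict(set)
    for row in rows:
        groups[tuple(row[a] for a in lhs)].add(row[rhs])
    return all(len(values) == 1 for values in groups.values())


def holds_synonym_ofd(rows, lhs, rhs):
    """Assumed synonym-style OFD: in each lhs group, all rhs values
    must share at least one common sense in the ontology."""
    groups = defaultdict(list)
    for row in rows:
        groups[tuple(row[a] for a in lhs)].append(row[rhs])
    for values in groups.values():
        common = set.intersection(*(set(senses(v)) for v in values))
        if not common:
            return False
    return True


rows = [
    {"code": "C1", "drug": "ibuprofen"},
    {"code": "C1", "drug": "Advil"},
    {"code": "C2", "drug": "Tylenol"},
]
print(holds_fd(rows, ["code"], "drug"))           # False: 'ibuprofen' != 'Advil'
print(holds_synonym_ofd(rows, ["code"], "drug"))  # True: both denote ibuprofen
```

The plain FD flags the C1 rows as dirty, while the ontology-aware check accepts them because both values denote the same drug; a third C1 row with 'Tylenol' would violate both.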
On Independence Atoms and Keys
Uniqueness and independence are two fundamental properties of data. Their
enforcement in database systems can lead to higher-quality data, faster
response times for data services, better data-driven decision making, and
knowledge discovery from data. These applications can be unlocked by providing
efficient solutions to the underlying implication problems for keys and
independence atoms. Indeed, for keys alone, and for independence atoms alone,
the finite and general implication problems coincide and enjoy simple
axiomatizations. However, the situation changes
drastically when keys and independence atoms are combined. We show that the
finite and the general implication problems are already different for keys and
unary independence atoms. Furthermore, we establish a finite axiomatization for
the general implication problem, and show that the finite implication problem
does not enjoy a k-ary axiomatization for any k.
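The two constraint classes can be checked directly on a finite relation. The sketch below uses the standard definitions: a key K holds if no two distinct tuples agree on K, and an independence atom X ⊥ Y holds if for any tuples t1, t2 there is a tuple t3 with t3[X] = t1[X] and t3[Y] = t2[Y], i.e. the set of (X, Y) value pairs equals the product of the X and Y projections. The relation is assumed to be a Python set of tuples over a named schema; the function names are hypothetical.

```python
from itertools import product


def project(schema, row, attrs):
    """Values of row on attrs, where schema lists the attribute names."""
    return tuple(row[schema.index(a)] for a in attrs)


def satisfies_key(schema, rows, key):
    """Key: no two distinct tuples agree on every attribute in key.
    rows is a set, so duplicate tuples cannot occur."""
    seen = set()
    for row in rows:
        k = project(schema, row, key)
        if k in seen:
            return False
        seen.add(k)
    return True


def satisfies_independence(schema, rows, x, y):
    """Independence atom x ⊥ y: every combination of an x-value and a
    y-value occurring in rows also occurs together in some tuple."""
    px = {project(schema, r, x) for r in rows}
    py = {project(schema, r, y) for r in rows}
    pairs = {(project(schema, r, x), project(schema, r, y)) for r in rows}
    return pairs == set(product(px, py))


schema = ("A", "B")
full = {(1, 1), (1, 2), (2, 1), (2, 2)}
diag = {(1, 1), (2, 2)}
print(satisfies_independence(schema, full, ["A"], ["B"]))  # True
print(satisfies_key(schema, full, ["A"]))                  # False
print(satisfies_key(schema, diag, ["A"]))                  # True
print(satisfies_independence(schema, diag, ["A"], ["B"]))  # False
```

A small illustration of why combining the two classes changes things: if {A} is a key and A ⊥ B both hold, every A-value must co-occur with every B-value, yet each A-value appears in only one tuple, so B can take at most one value in the relation.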
Lattices with non-Shannon Inequalities
We study the existence or absence of non-Shannon inequalities for variables
that are related by functional dependencies. Although the power set of four
variables is the smallest Boolean lattice with non-Shannon inequalities, there
exist lattices with many more variables that have no non-Shannon inequalities.
We search for conditions that ensure that no non-Shannon inequalities exist. We
show that neither 3-dimensional distributive lattices nor planar modular
lattices can have non-Shannon inequalities. The existence of non-Shannon
inequalities is related to the question of whether a lattice is isomorphic to a
lattice of subgroups of a group. Comment: Ten pages. Submitted to ISIT 2015. The
appendix will not appear in the proceedings.
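For background, Shannon inequalities are exactly those implied by the elemental inequalities (monotonicity and conditional mutual information being non-negative), so a set function satisfying them is a polymatroid; a non-Shannon inequality holds for every entropy vector yet fails for some polymatroid. The sketch below only covers the full Boolean lattice (the power set of n variables), not the general lattices studied in the paper: it computes an entropy vector from a joint distribution and tests the elemental inequalities on any candidate set function. Function names and the example distributions are hypothetical.

```python
import math
from itertools import combinations


def entropy_vector(pmf, n):
    """Map each subset S of {0..n-1} (as a frozenset) to H(X_S) in bits.
    pmf maps outcome tuples of length n to probabilities."""
    h = {}
    for r in range(n + 1):
        for s in combinations(range(n), r):
            marginal = {}
            for outcome, p in pmf.items():
                key = tuple(outcome[i] for i in s)
                marginal[key] = marginal.get(key, 0.0) + p
            h[frozenset(s)] = -sum(p * math.log2(p) for p in marginal.values() if p > 0)
    return h


def is_polymatroid(h, n, tol=1e-9):
    """Check the elemental Shannon inequalities on a set function h."""
    full = frozenset(range(n))
    if abs(h[frozenset()]) > tol:
        return False
    # Monotonicity: H(X_N) - H(X_{N minus i}) >= 0 for each i.
    for i in range(n):
        if h[full] - h[full - {i}] < -tol:
            return False
    # Submodularity: I(X_i; X_j | X_K) >= 0 for all K avoiding i and j.
    for i, j in combinations(range(n), 2):
        others = sorted(full - {i, j})
        for r in range(len(others) + 1):
            for k in combinations(others, r):
                K = frozenset(k)
                if h[K | {i}] + h[K | {j}] - h[K | {i, j}] - h[K] < -tol:
                    return False
    return True


# Three bits with X_2 = X_0 XOR X_1, each outcome equally likely.
xor_pmf = {(0, 0, 0): 0.25, (0, 1, 1): 0.25, (1, 0, 1): 0.25, (1, 1, 0): 0.25}
print(is_polymatroid(entropy_vector(xor_pmf, 3), 3))  # True

# A set function violating submodularity: h(0) + h(1) < h(01).
bad = {frozenset(): 0.0, frozenset({0}): 1.0, frozenset({1}): 1.0, frozenset({0, 1}): 3.0}
print(is_polymatroid(bad, 2))  # False
```

Every entropy vector passes this check by construction; the interesting question in the abstract is the converse, namely which lattices admit polymatroids that no actual distribution realizes.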
Justification for inclusion dependency normal form
Functional dependencies (FDs) and inclusion dependencies (INDs) are the most fundamental integrity constraints that arise in practice in relational databases. In this paper, we address the issue of normalization in the presence of FDs and INDs and, in particular, the semantic justification for Inclusion Dependency Normal Form (IDNF), a normal form which combines Boyce-Codd normal form with the restriction on the INDs that they be noncircular and key-based. We motivate and formalize three goals of database design in the presence of FDs and INDs: noninteraction between FDs and INDs, elimination of redundancy and update anomalies, and preservation of entity integrity. We show that, as with FDs alone, in the presence of INDs, being free of redundancy is equivalent to being free of update anomalies. Then, for each of these properties, we derive equivalent syntactic conditions on the database design. Individually, each of these syntactic conditions is weaker than IDNF, and the restriction that an FD not be embedded in the right-hand side of an IND is common to three of the conditions. However, we also show that, for these three goals of database design to be satisfied simultaneously, IDNF is both a necessary and sufficient condition.
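The two IND restrictions in IDNF can be tested mechanically. Below is a minimal sketch that assumes the usual graph-based reading of "noncircular" (the directed graph with an edge R -> S for each IND R[X] ⊆ S[Y], self-loops included, has no cycle) and treats "key-based" as requiring the right-hand side attributes Y to be a declared key of S. Keys are passed in explicitly rather than derived from FDs, and the Boyce-Codd part of IDNF is not checked. The relation names and function names are hypothetical.

```python
def is_noncircular(inds):
    """inds: list of (lhs_rel, lhs_attrs, rhs_rel, rhs_attrs).
    True if the graph lhs_rel -> rhs_rel has no directed cycle."""
    graph = {}
    for lhs_rel, _, rhs_rel, _ in inds:
        graph.setdefault(lhs_rel, set()).add(rhs_rel)
        graph.setdefault(rhs_rel, set())
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on DFS stack, finished
    color = {v: WHITE for v in graph}

    def visit(v):
        color[v] = GREY
        for w in graph[v]:
            if color[w] == GREY:  # back edge (or self-loop): a cycle
                return False
            if color[w] == WHITE and not visit(w):
                return False
        color[v] = BLACK
        return True

    return all(color[v] != WHITE or visit(v) for v in graph)


def is_key_based(inds, keys):
    """keys: dict relation -> list of declared keys (frozensets).
    True if every IND's right-hand side is a declared key."""
    return all(frozenset(rhs_attrs) in keys.get(rhs_rel, [])
               for _, _, rhs_rel, rhs_attrs in inds)


keys = {"Emp": [frozenset({"eid"})], "Dept": [frozenset({"dname"})]}
inds = [("Emp", ("dept",), "Dept", ("dname",))]
print(is_noncircular(inds), is_key_based(inds, keys))  # True True

cyclic = inds + [("Dept", ("mgr",), "Emp", ("eid",))]
print(is_noncircular(cyclic), is_key_based(cyclic, keys))  # False True
```

Depth-first search with three colors detects a cycle exactly when it meets a node still on the current path, which also catches an IND of the form R[X] ⊆ R[Y].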
Guaranteeing no interaction between functional dependencies and tree-like inclusion dependencies
Functional dependencies (FDs) and inclusion dependencies (INDs) are the most fundamental integrity constraints that arise in practice in relational databases. A given set of FDs does not interact with a given set of INDs if logical implication of any FD can be determined solely by the given set of FDs, and logical implication of any IND can be determined solely by the given set of INDs. The set of tree-like INDs constitutes a useful subclass of INDs whose implication problem is decidable in polynomial time. We exhibit a necessary and sufficient condition for a set of FDs and tree-like INDs not to interact; this condition can be tested in polynomial time.
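When FDs and INDs do not interact, FD implication reduces to the FDs alone, which is the classical attribute-closure test: a set F implies X -> Y exactly when Y is contained in the closure of X under F. The sketch below is the simple fixpoint version of that textbook algorithm, not the paper's noninteraction condition (which the abstract does not spell out); FDs are assumed to be pairs of frozensets, and the function names are hypothetical.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs)
    pairs of frozensets. Repeats until no FD adds a new attribute."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def implies(fds, lhs, rhs):
    """True if the FD lhs -> rhs follows from fds."""
    return set(rhs) <= closure(lhs, fds)


fds = [(frozenset("A"), frozenset("B")), (frozenset("B"), frozenset("C"))]
print(sorted(closure("A", fds)))  # ['A', 'B', 'C']
print(implies(fds, "A", "C"))     # True (by transitivity)
print(implies(fds, "C", "A"))     # False
```

This loop is quadratic in the worst case; linear-time variants exist that track, for each FD, how many left-hand attributes are still missing.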