12,970 research outputs found
Characterization of order-like dependencies with formal concept analysis
Functional Dependencies (FDs) play a key role in many fields
of the relational database model, one of the most widely used database
systems. FDs have also been applied in data analysis, data quality, knowl-
edge discovery and the like, but in a very limited scope, because of their
fixed semantics. To overcome this limitation, many generalizations have
been defined to relax the crisp definition of FDs. FDs and a few of their
generalizations have been characterized with Formal Concept Analysis
which reveals itself to be an interesting unified framework for charac-
terizing dependencies, that is, understanding and computing them in a
formal way. In this paper, we extend this work by taking into account
order-like dependencies. Such dependencies, well defined in the database
field, consider an ordering on the domain of each attribute, and not sim-
ply an equality relation as with standard FDs.Peer ReviewedPostprint (published version
Integrity Constraints Revisited: From Exact to Approximate Implication
Integrity constraints such as functional dependencies (FD), and multi-valued
dependencies (MVD) are fundamental in database schema design. Likewise,
probabilistic conditional independences (CI) are crucial for reasoning about
multivariate probability distributions. The implication problem studies whether
a set of constraints (antecedents) implies another constraint (consequent), and
has been investigated in both the database and the AI literature, under the
assumption that all constraints hold exactly. However, many applications today
consider constraints that hold only approximately. In this paper we define an
approximate implication as a linear inequality between the degree of
satisfaction of the antecedents and consequent, and we study the relaxation
problem: when does an exact implication relax to an approximate implication? We
use information theory to define the degree of satisfaction, and prove several
results. First, we show that any implication from a set of data dependencies
(MVDs+FDs) can be relaxed to a simple linear inequality with a factor at most
quadratic in the number of variables; when the consequent is an FD, the factor
can be reduced to 1. Second, we prove that there exists an implication between
CIs that does not admit any relaxation; however, we prove that every
implication between CIs relaxes "in the limit". Finally, we show that the
implication problem for differential constraints in market basket analysis also
admits a relaxation with a factor equal to 1. Our results recover, and
sometimes extend, several previously known results about the implication
problem: implication of MVDs can be checked by considering only 2-tuple
relations, and the implication of differential constraints for frequent item
sets can be checked by considering only databases containing a single
transaction
Integrity Constraints Revisited: From Exact to Approximate Implication
Integrity constraints such as functional dependencies (FD), and multi-valued dependencies (MVD) are fundamental in database schema design. Likewise, probabilistic conditional independences (CI) are crucial for reasoning about multivariate probability distributions. The implication problem studies whether a set of constraints (antecedents) implies another constraint (consequent), and has been investigated in both the database and the AI literature, under the assumption that all constraints hold exactly. However, many applications today consider constraints that hold only approximately. In this paper we define an approximate implication as a linear inequality between the degree of satisfaction of the antecedents and consequent, and we study the relaxation problem: when does an exact implication relax to an approximate implication? We use information theory to define the degree of satisfaction, and prove several results. First, we show that any implication from a set of data dependencies (MVDs+FDs) can be relaxed to a simple linear inequality with a factor at most quadratic in the number of variables; when the consequent is an FD, the factor can be reduced to 1. Second, we prove that there exists an implication between CIs that does not admit any relaxation; however, we prove that every implication between CIs relaxes "in the limit". Finally, we show that the implication problem for differential constraints in market basket analysis also admits a relaxation with a factor equal to 1. Our results recover, and sometimes extend, several previously known results about the implication problem: implication of MVDs can be checked by considering only 2-tuple relations, and the implication of differential constraints for frequent item sets can be checked by considering only databases containing a single transaction
Efficient Discovery of Ontology Functional Dependencies
Poor data quality has become a pervasive issue due to the increasing
complexity and size of modern datasets. Constraint based data cleaning
techniques rely on integrity constraints as a benchmark to identify and correct
errors. Data values that do not satisfy the given set of constraints are
flagged as dirty, and data updates are made to re-align the data and the
constraints. However, many errors often require user input to resolve due to
domain expertise defining specific terminology and relationships. For example,
in pharmaceuticals, 'Advil' \emph{is-a} brand name for 'ibuprofen' that can be
captured in a pharmaceutical ontology. While functional dependencies (FDs) have
traditionally been used in existing data cleaning solutions to model syntactic
equivalence, they are not able to model broader relationships (e.g., is-a)
defined by an ontology. In this paper, we take a first step towards extending
the set of data quality constraints used in data cleaning by defining and
discovering \emph{Ontology Functional Dependencies} (OFDs). We lay out
theoretical and practical foundations for OFDs, including a set of sound and
complete axioms, and a linear inference procedure. We then develop effective
algorithms for discovering OFDs, and a set of optimizations that efficiently
prune the search space. Our experimental evaluation using real data show the
scalability and accuracy of our algorithms.Comment: 12 page
The role of assumptions in causal discovery
The paper looks at the conditional independence search approach to causal discovery, proposed by Spirtes et al. and Pearl and Verma, from the point of view of the mechanism-based view of causality in econometrics, explicated by Simon. As demonstrated by Simon, the problem of determining the causal structure from data is severely underconstrained and the perceived causal structure depends on the a priori assumptions that one is willing to make. I discuss the assumptions made in the independence search-based causal discovery and their identifying strength
- …