1,449 research outputs found
Integrity Constraints Revisited: From Exact to Approximate Implication
Integrity constraints such as functional dependencies (FD), and multi-valued dependencies (MVD) are fundamental in database schema design. Likewise, probabilistic conditional independences (CI) are crucial for reasoning about multivariate probability distributions. The implication problem studies whether a set of constraints (antecedents) implies another constraint (consequent), and has been investigated in both the database and the AI literature, under the assumption that all constraints hold exactly. However, many applications today consider constraints that hold only approximately. In this paper we define an approximate implication as a linear inequality between the degree of satisfaction of the antecedents and consequent, and we study the relaxation problem: when does an exact implication relax to an approximate implication? We use information theory to define the degree of satisfaction, and prove several results. First, we show that any implication from a set of data dependencies (MVDs+FDs) can be relaxed to a simple linear inequality with a factor at most quadratic in the number of variables; when the consequent is an FD, the factor can be reduced to 1. Second, we prove that there exists an implication between CIs that does not admit any relaxation; however, we prove that every implication between CIs relaxes "in the limit". Finally, we show that the implication problem for differential constraints in market basket analysis also admits a relaxation with a factor equal to 1. Our results recover, and sometimes extend, several previously known results about the implication problem: implication of MVDs can be checked by considering only 2-tuple relations, and the implication of differential constraints for frequent item sets can be checked by considering only databases containing a single transaction
Efficient Discovery of Ontology Functional Dependencies
Poor data quality has become a pervasive issue due to the increasing
complexity and size of modern datasets. Constraint based data cleaning
techniques rely on integrity constraints as a benchmark to identify and correct
errors. Data values that do not satisfy the given set of constraints are
flagged as dirty, and data updates are made to re-align the data and the
constraints. However, many errors often require user input to resolve due to
domain expertise defining specific terminology and relationships. For example,
in pharmaceuticals, 'Advil' \emph{is-a} brand name for 'ibuprofen' that can be
captured in a pharmaceutical ontology. While functional dependencies (FDs) have
traditionally been used in existing data cleaning solutions to model syntactic
equivalence, they are not able to model broader relationships (e.g., is-a)
defined by an ontology. In this paper, we take a first step towards extending
the set of data quality constraints used in data cleaning by defining and
discovering \emph{Ontology Functional Dependencies} (OFDs). We lay out
theoretical and practical foundations for OFDs, including a set of sound and
complete axioms, and a linear inference procedure. We then develop effective
algorithms for discovering OFDs, and a set of optimizations that efficiently
prune the search space. Our experimental evaluation using real data show the
scalability and accuracy of our algorithms.Comment: 12 page
Fuzzy inequational logic
We present a logic for reasoning about graded inequalities which generalizes
the ordinary inequational logic used in universal algebra. The logic deals with
atomic predicate formulas of the form of inequalities between terms and
formalizes their semantic entailment and provability in graded setting which
allows to draw partially true conclusions from partially true assumptions. We
follow the Pavelka approach and define general degrees of semantic entailment
and provability using complete residuated lattices as structures of truth
degrees. We prove the logic is Pavelka-style complete. Furthermore, we present
a logic for reasoning about graded if-then rules which is obtained as
particular case of the general result
On the Interaction of Inclusion Dependencies with Independence Atoms
Proceeding volume: 46Inclusion dependencies are one of the most important database constraints. In isolation their finite and unrestricted implication problems coincide, are finitely axiomatizable, PSPACE-complete, and fixed-parameter tractable in their arity. In contrast, finite and unrestricted implication problems for the combined class of functional and inclusion de- pendencies deviate from one another and are each undecidable. The same holds true for the class of embedded multivalued dependencies. An important embedded tractable fragment of embedded multivalued dependencies are independence atoms. These stipulate independence between two attribute sets in the sense that for every two tuples there is a third tuple that agrees with the first tuple on the first attribute set and with the second tuple on the second attribute set. For independence atoms, their finite and unrestricted implication problems coincide, are finitely axiomatizable, and decidable in cubic time. In this article, we study the implication problems of the combined class of independence atoms and inclusion dependencies. We show that their finite and unrestricted implication problems coincide, are finitely axiomatizable, PSPACE-complete, and fixed-parameter tractable in their arity. Hence, significant expressivity is gained without sacrificing any of the desirable properties that inclusion dependencies have in isolation. Finally, we establish an efficient condition that is sufficient for independence atoms and inclusion dependencies not to inter- act. The condition ensures that we can apply known algorithms for deciding implication of the individual classes of independence atoms and inclusion dependencies, respectively, to decide implication for an input that combines both individual classes.Peer reviewe
Endogenous Prospect Theory.
In previous models of (cumulative) prospect theory reference-dependence of preferences is imposed beforehand and the location of the reference point is exogenously determined. This paper provides an axiomatization of a new specification of cumulative prospect theory, termed endogenous prospect theory, where reference-dependence is derived from preference conditions and a unique reference point arises endogenously.prospect theory; reference point; diminishing sensitivity; loss aversion;
Fundamentals and applications of order dependencies
Business-intelligence queries often involve SQL functions and algebraic expressions. There can be clear semantic relationships between a column's values and the values of a function over that column. A common property is monotonicity: as the column's values ascend, so do the function's values (or the other column's values). This we call an order dependency (OD). Queries can be evaluated more efficiently when the query optimizer uses order dependencies. They can be run even faster when the optimizer can also reason over known ODs to infer new ones.
Order dependencies can be declared as integrity constraints, and they can be detected automatically for many types of SQL functions and algebraic expressions. We present optimization techniques using ODs for queries that involve join, order by, group by, partition by, and distinct. Essentially, ODs can further exploit interesting orders to eliminate or simplify potentially expensive sorts in the query plan. We evaluate these techniques over our prototype implementation in IBM® DB2® using the TPC-DS® benchmark schema and some customer inspired queries. Our experimental results demonstrate a significant performance gain.
Dependencies have played an important role in database theory. We study the theoretical aspects of order dependencies-and unidirectional order dependencies (UODs), a proper sub-class of ODs-which describe the relationships among lexicographical orderings of sets of tuples. We investigate the inference problem for order dependencies. We establish the following: (i) a sound and complete axiomatization for UODs which is sound for ODs; (ii) a hierarchy of order dependency classes; (iii) a proof of co-NP-completeness of the inference problem for ODs and for the subclass of UODs; (iv) a proof of co-NP-completeness of the inference problem of functional dependencies (FDs) from ODs in general, but demonstrate linear time complexity for the inference of FDs from UODs; (v) a sound and complete elimination procedure for testing logical implication over ODs; and (vi) a sound and complete polynomial inference algorithm for sets of UODs over natural domains
- …