3,275 research outputs found

    Query Stability in Monotonic Data-Aware Business Processes [Extended Version]

    Get PDF
    Organizations continuously accumulate data, often according to some business processes. If one poses a query over such data for decision support, it is important to know whether the query is stable, that is, whether the answers will stay the same or may change in the future because business processes may add further data. We investigate query stability for conjunctive queries. To this end, we define a formalism that combines an explicit representation of the control flow of a process with a specification of how data is read and inserted into the database. We consider different restrictions of the process model and the state of the system, such as negation in conditions, cyclic executions, read access to written data, presence of pending process instances, and the possibility to start fresh process instances. We identify for which facet combinations stability of conjunctive queries is decidable and provide encodings into variants of Datalog that are optimal with respect to the worst-case complexity of the problem.Comment: This report is the extended version of a paper accepted at the 19th International Conference on Database Theory (ICDT 2016), March 15-18, 2016 - Bordeaux, Franc

    Model Counting of Query Expressions: Limitations of Propositional Methods

    Full text link
    Query evaluation in tuple-independent probabilistic databases is the problem of computing the probability of an answer to a query given independent probabilities of the individual tuples in a database instance. There are two main approaches to this problem: (1) in `grounded inference' one first obtains the lineage for the query and database instance as a Boolean formula, then performs weighted model counting on the lineage (i.e., computes the probability of the lineage given probabilities of its independent Boolean variables); (2) in methods known as `lifted inference' or `extensional query evaluation', one exploits the high-level structure of the query as a first-order formula. Although it is widely believed that lifted inference is strictly more powerful than grounded inference on the lineage alone, no formal separation has previously been shown for query evaluation. In this paper we show such a formal separation for the first time. We exhibit a class of queries for which model counting can be done in polynomial time using extensional query evaluation, whereas the algorithms used in state-of-the-art exact model counters on their lineages provably require exponential time. Our lower bounds on the running times of these exact model counters follow from new exponential size lower bounds on the kinds of d-DNNF representations of the lineages that these model counters (either explicitly or implicitly) produce. Though some of these queries have been studied before, no non-trivial lower bounds on the sizes of these representations for these queries were previously known.Comment: To appear in International Conference on Database Theory (ICDT) 201

    Datalog with Negation and Monotonicity

    Get PDF
    Positive Datalog has several nice properties that are lost when the language is extended with negation. One example is that fixpoints of positive Datalog programs are robust w.r.t. the order in which facts are inserted, which facilitates efficient evaluation of such programs in distributed environments. A natural question to ask, given a (stratified) Datalog program with negation, is whether an equivalent positive Datalog program exists. In this context, it is known that positive Datalog can express only a strict subset of the monotone queries, yet the exact relationship between the positive and monotone fragments of semi-positive and stratified Datalog was previously left open. In this paper, we complete the picture by showing that monotone queries expressible in semi-positive Datalog exist which are not expressible in positive Datalog. To provide additional insight into this gap, we also characterize a large class of semi-positive Datalog programs for which the dichotomy `monotone if and only if rewritable to positive Datalog\u27 holds. Finally, we give best-effort techniques to reduce the amount of negation that is exhibited by a program, even if the program is not monotone

    Integrity Constraints Revisited: From Exact to Approximate Implication

    Get PDF
    Integrity constraints such as functional dependencies (FD), and multi-valued dependencies (MVD) are fundamental in database schema design. Likewise, probabilistic conditional independences (CI) are crucial for reasoning about multivariate probability distributions. The implication problem studies whether a set of constraints (antecedents) implies another constraint (consequent), and has been investigated in both the database and the AI literature, under the assumption that all constraints hold exactly. However, many applications today consider constraints that hold only approximately. In this paper we define an approximate implication as a linear inequality between the degree of satisfaction of the antecedents and consequent, and we study the relaxation problem: when does an exact implication relax to an approximate implication? We use information theory to define the degree of satisfaction, and prove several results. First, we show that any implication from a set of data dependencies (MVDs+FDs) can be relaxed to a simple linear inequality with a factor at most quadratic in the number of variables; when the consequent is an FD, the factor can be reduced to 1. Second, we prove that there exists an implication between CIs that does not admit any relaxation; however, we prove that every implication between CIs relaxes "in the limit". Finally, we show that the implication problem for differential constraints in market basket analysis also admits a relaxation with a factor equal to 1. Our results recover, and sometimes extend, several previously known results about the implication problem: implication of MVDs can be checked by considering only 2-tuple relations, and the implication of differential constraints for frequent item sets can be checked by considering only databases containing a single transaction

    A Theory of Pricing Private Data

    Full text link
    Personal data has value to both its owner and to institutions who would like to analyze it. Privacy mechanisms protect the owner's data while releasing to analysts noisy versions of aggregate query results. But such strict protections of individual's data have not yet found wide use in practice. Instead, Internet companies, for example, commonly provide free services in return for valuable sensitive information from users, which they exploit and sometimes sell to third parties. As the awareness of the value of the personal data increases, so has the drive to compensate the end user for her private information. The idea of monetizing private data can improve over the narrower view of hiding private data, since it empowers individuals to control their data through financial means. In this paper we propose a theoretical framework for assigning prices to noisy query answers, as a function of their accuracy, and for dividing the price amongst data owners who deserve compensation for their loss of privacy. Our framework adopts and extends key principles from both differential privacy and query pricing in data markets. We identify essential properties of the price function and micro-payments, and characterize valid solutions.Comment: 25 pages, 2 figures. Best Paper Award, to appear in the 16th International Conference on Database Theory (ICDT), 201

    When Can Matrix Query Languages Discern Matrices?

    Get PDF
    We investigate when two graphs, represented by their adjacency matrices, can be distinguished by means of sentences formed in MATLANG, a matrix query language which supports a number of elementary linear algebra operators. When undirected graphs are concerned, and hence the adjacency matrices are real and symmetric, precise characterisations are in place when two graphs (i.e., their adjacency matrices) can be distinguished. Turning to directed graphs, one has to deal with asymmetric adjacency matrices. This complicates matters. Indeed, it requires to understand the more general problem of when two arbitrary matrices can be distinguished in MATLANG. We provide characterisations of the distinguishing power of MATLANG on real and complex matrices, and on adjacency matrices of directed graphs in particular. The proof techniques are a combination of insights from the symmetric matrix case and results from linear algebra and linear control theory

    A Simple Parallel Algorithm for Natural Joins on Binary Relations

    Get PDF
    • …
    corecore