25 research outputs found

    Structurally Tractable Uncertain Data

    Many data management applications must deal with data which is uncertain, incomplete, or noisy. However, on existing uncertain data representations, we cannot tractably perform the important query evaluation tasks of determining query possibility, certainty, or probability: these problems are hard on arbitrary uncertain input instances. We thus ask whether we could restrict the structure of uncertain data so as to guarantee the tractability of exact query evaluation. We present our tractability results for tree and tree-like uncertain data, and a vision for probabilistic rule reasoning. We also study uncertainty about order, proposing a suitable representation, and study uncertain data conditioned by additional observations. Comment: 11 pages, 1 figure, 1 table. To appear in SIGMOD/PODS PhD Symposium 201

    Fast Möbius and Zeta Transforms

    Möbius inversion of functions on partially ordered sets (posets) $\mathcal{P}$ is a classical tool in combinatorics. For finite posets it consists of two mutually inverse linear transformations, called the zeta and Möbius transforms. In this paper we provide novel fast algorithms for both that require $O(nk)$ time and space, where $n = |\mathcal{P}|$ and $k$ is the width (length of the longest antichain) of $\mathcal{P}$, compared to $O(n^2)$ for a direct computation. Our approach assumes that $\mathcal{P}$ is given as a directed acyclic graph (DAG) $(\mathcal{E}, \mathcal{P})$. The algorithms are then constructed using a chain decomposition for a one-time cost of $O(|\mathcal{E}| + |\mathcal{E}_\text{red}| k)$, where $|\mathcal{E}_\text{red}|$ is the number of edges in the DAG's transitive reduction. We show benchmarks with implementations of all algorithms, including parallelized versions. The results show that our algorithms enable Möbius inversion on posets with millions of nodes in seconds if the defining DAGs are sufficiently sparse. Comment: 16 pages, 7 figures, submitted for review
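For orientation, the direct computation that the abstract uses as its baseline can be sketched as follows. This is not the paper's chain-decomposition algorithm: it is a minimal illustration of the two mutually inverse transforms, assuming the poset is given as a hypothetical dictionary `preds` that maps each element to the elements directly below it. The function names are illustrative only.

```python
from graphlib import TopologicalSorter


def downsets(preds):
    """Return a linear extension and, for each x, the set of all y strictly below x.

    `preds` (an assumed input format) maps each element to the elements
    directly below it, i.e. the incoming edges of the defining DAG.
    """
    order = list(TopologicalSorter(preds).static_order())  # smaller elements first
    below = {}
    for x in order:
        s = set()
        for y in preds.get(x, ()):
            s.add(y)
            s |= below[y]  # y precedes x in the order, so below[y] is ready
        below[x] = s
    return order, below


def zeta(f, preds):
    """Zeta transform: g(x) = sum of f(y) over all y <= x."""
    order, below = downsets(preds)
    return {x: f[x] + sum(f[y] for y in below[x]) for x in order}


def mobius(g, preds):
    """Moebius transform (inverse of zeta): f(x) = g(x) - sum of f(y) over y < x."""
    order, below = downsets(preds)
    f = {}
    for x in order:
        # All y < x were handled earlier because `order` is a linear extension.
        f[x] = g[x] - sum(f[y] for y in below[x])
    return f
```

On a diamond-shaped poset with `preds = {"a": [], "b": ["a"], "c": ["a"], "d": ["b", "c"]}`, applying `mobius` to the result of `zeta` returns the original function. Each transform touches every comparable pair once, which is the roughly quadratic behaviour the abstract contrasts with its $O(nk)$ bound.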

    Lifted Probabilistic Inference: A Guide for the Database Researcher


    Provenance and Probabilities in Relational Databases: From Theory to Practice

    We review the basics of data provenance in relational databases. We describe different provenance formalisms, from Boolean provenance to provenance semirings and beyond, that can be used for a wide variety of purposes, to obtain additional information on the output of a query. We discuss representation systems for data provenance, circuits in particular, with a focus on practical implementation. Finally, we explain how provenance is practically used for probabilistic query evaluation in probabilistic databases.
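The step from Boolean provenance to provenance semirings can be illustrated with a small sketch. It assumes a hypothetical query `q(x) :- R(x, y), S(y)` over relations stored as dictionaries from tuples to annotations; the `Semiring` class and `join_project` function are illustrative names, not taken from the paper or from any library.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class Semiring:
    zero: Any
    one: Any
    add: Callable[[Any, Any], Any]  # combines alternative derivations
    mul: Callable[[Any, Any], Any]  # combines tuples used jointly


BOOLEAN = Semiring(False, True, lambda a, b: a or b, lambda a, b: a and b)
COUNTING = Semiring(0, 1, lambda a, b: a + b, lambda a, b: a * b)


def join_project(R, S, K):
    """Evaluate q(x) :- R(x, y), S(y) over annotated relations in semiring K."""
    out = {}
    for (x, y), r_ann in R.items():
        if y in S:
            term = K.mul(r_ann, S[y])                  # joint use: multiply
            out[x] = K.add(out.get(x, K.zero), term)   # alternatives: add
    return out
```

With `R = {("a", 1): 2, ("a", 2): 1, ("b", 3): 1}` and `S = {1: 3, 2: 1}`, evaluating in `COUNTING` gives the multiplicity of each answer under bag semantics, while replacing every annotation by `True` and evaluating in `BOOLEAN` only records whether the answer is derivable. The same evaluation code serves both purposes, which is the point of the semiring view.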

    Range Queries on Uncertain Data

    Given a set $P$ of $n$ uncertain points on the real line, each represented by its one-dimensional probability density function, we consider the problem of building data structures on $P$ to answer range queries of the following three types for any query interval $I$: (1) top-$1$ query: find the point in $P$ that lies in $I$ with the highest probability, (2) top-$k$ query: given any integer $k \leq n$ as part of the query, return the $k$ points in $P$ that lie in $I$ with the highest probabilities, and (3) threshold query: given any threshold $\tau$ as part of the query, return all points of $P$ that lie in $I$ with probabilities at least $\tau$. We present data structures for these range queries with linear or nearly linear space and efficient query time. Comment: 26 pages. A preliminary version of this paper appeared in ISAAC 2014. In this full version, we also present solutions to the most general case of the problem (i.e., the histogram bounded case), which were left as open problems in the preliminary version
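The three query types can be made concrete with a linear-scan baseline. This sketch is not one of the paper's data structures: it assumes, purely for illustration, that every point has a uniform density on an interval `[lo, hi]`, and the class and function names are hypothetical.

```python
import heapq
from dataclasses import dataclass


@dataclass(frozen=True)
class UncertainPoint:
    name: str
    lo: float
    hi: float  # assumed: lo < hi, uniform density on [lo, hi]

    def prob_in(self, a, b):
        """Probability that the point lies in the query interval [a, b]."""
        overlap = min(self.hi, b) - max(self.lo, a)
        return max(0.0, overlap) / (self.hi - self.lo)


def top_k(points, a, b, k):
    """Top-k query by scanning all points; k = 1 gives the top-1 query."""
    return heapq.nlargest(k, points, key=lambda p: p.prob_in(a, b))


def threshold(points, a, b, tau):
    """Threshold query: all points lying in [a, b] with probability >= tau."""
    return [p for p in points if p.prob_in(a, b) >= tau]
```

Every query here inspects all $n$ points, which is the cost the data structures described in the abstract are meant to avoid.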

    Model Counting of Query Expressions: Limitations of Propositional Methods

    Query evaluation in tuple-independent probabilistic databases is the problem of computing the probability of an answer to a query given independent probabilities of the individual tuples in a database instance. There are two main approaches to this problem: (1) in 'grounded inference' one first obtains the lineage for the query and database instance as a Boolean formula, then performs weighted model counting on the lineage (i.e., computes the probability of the lineage given probabilities of its independent Boolean variables); (2) in methods known as 'lifted inference' or 'extensional query evaluation', one exploits the high-level structure of the query as a first-order formula. Although it is widely believed that lifted inference is strictly more powerful than grounded inference on the lineage alone, no formal separation has previously been shown for query evaluation. In this paper we show such a formal separation for the first time. We exhibit a class of queries for which model counting can be done in polynomial time using extensional query evaluation, whereas the algorithms used in state-of-the-art exact model counters on their lineages provably require exponential time. Our lower bounds on the running times of these exact model counters follow from new exponential size lower bounds on the kinds of d-DNNF representations of the lineages that these model counters (either explicitly or implicitly) produce. Though some of these queries have been studied before, no non-trivial lower bounds on the sizes of these representations for these queries were previously known. Comment: To appear in International Conference on Database Theory (ICDT) 201
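The two approaches can be contrasted on a toy instance. The sketch below assumes a hypothetical query `q = exists x, y. R(x) and S(x, y)` and made-up tuple probabilities. The grounded version enumerates all possible worlds of the lineage by brute force, which is far cruder than the exact model counters the abstract analyses, and this query is not one of the paper's separating queries; the example only shows what each approach computes.

```python
from itertools import product
from math import prod

# Hypothetical tuple-independent instance: tuple -> marginal probability.
R = {"a": 0.5, "b": 0.8}
S = {("a", 1): 0.4, ("a", 2): 0.5, ("b", 1): 0.9}


def grounded_probability(R, S):
    """Weighted model counting on the lineage by enumerating every world.

    Exponential in the number of tuples; for illustration only.
    """
    probs = {("R", x): p for x, p in R.items()}
    probs.update({("S", xy): p for xy, p in S.items()})
    variables = list(probs)
    total = 0.0
    for bits in product([False, True], repeat=len(variables)):
        world = dict(zip(variables, bits))
        # Lineage: OR over S-tuples (x, y) of R(x) AND S(x, y).
        holds = any(
            world[("S", (x, y))] and world.get(("R", x), False) for (x, y) in S
        )
        if holds:
            total += prod(probs[v] if world[v] else 1 - probs[v] for v in variables)
    return total


def lifted_probability(R, S):
    """Extensional evaluation: independent OR over x, and over y within each x."""
    none_x = 1.0
    for x, p_r in R.items():
        none_s = prod(1 - p for (x2, _), p in S.items() if x2 == x)
        none_x *= 1 - p_r * (1 - none_s)
    return 1 - none_x
```

Both functions return the same probability on this instance, but the lifted version runs in time polynomial in the number of tuples because it works from the structure of the query instead of the expanded lineage.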