
    Learning Tuple Probabilities

    Learning the parameters of complex probabilistic-relational models from labeled training data is a standard technique in machine learning, which has been intensively studied in the subfield of Statistical Relational Learning (SRL), but so far it remains an under-investigated topic in the context of Probabilistic Databases (PDBs). In this paper, we focus on learning the probability values of base tuples in a PDB from labeled lineage formulas. The resulting learning problem can be viewed as the inverse problem to confidence computations in PDBs: given a set of labeled query answers, learn the probability values of the base tuples, such that the marginal probabilities of the query answers again yield the assigned probability labels. We analyze the learning problem from a theoretical perspective, cast it into an optimization problem, and provide an algorithm based on stochastic gradient descent. Finally, we conclude with an experimental evaluation on three real-world datasets and one synthetic dataset, comparing our approach to various techniques from SRL, reasoning in information extraction, and optimization.
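
The inverse problem described in this abstract can be illustrated with a toy sketch. This is not the paper's algorithm: it assumes independent base tuples, a squared-error loss, exact marginals obtained by enumerating possible worlds (feasible only for a handful of tuples), and hypothetical tuple names `a`, `b`, `c` with made-up labels.

```python
import itertools
import random

def marginal(lineage, probs):
    """Exact P(lineage is true), enumerating all worlds of independent tuples."""
    names = sorted(probs)
    total = 0.0
    for values in itertools.product([False, True], repeat=len(names)):
        world = dict(zip(names, values))
        if lineage(world):
            weight = 1.0
            for n in names:
                weight *= probs[n] if world[n] else 1.0 - probs[n]
            total += weight
    return total

def gradient(lineage, probs, name):
    """dP/dp for one tuple. P is linear in each tuple probability, so the
    derivative equals P(lineage | tuple present) - P(lineage | tuple absent)."""
    present = marginal(lineage, {**probs, name: 1.0})
    absent = marginal(lineage, {**probs, name: 0.0})
    return present - absent

def learn(examples, names, steps=5000, lr=0.1, seed=0):
    """Stochastic gradient descent on (marginal - label)^2, one example per step."""
    rng = random.Random(seed)
    probs = {n: 0.5 for n in names}  # assumed uninformed starting point
    for _ in range(steps):
        lineage, label = rng.choice(examples)
        err = marginal(lineage, probs) - label
        grads = {n: 2.0 * err * gradient(lineage, probs, n) for n in names}
        for n in names:
            # Clip so every value stays a valid probability.
            probs[n] = min(1.0, max(0.0, probs[n] - lr * grads[n]))
    return probs

# Hypothetical labeled lineage formulas: (formula over a world, target probability).
examples = [
    (lambda w: w["a"] and w["b"], 0.2),
    (lambda w: w["a"] or w["c"], 0.7),
]
learned = learn(examples, ["a", "b", "c"])
for lineage, label in examples:
    print(round(marginal(lineage, learned), 3), "target", label)
```

With two labels and three unknowns the solution is not unique, so the sketch checks only that the learned values reproduce the labels, not that they recover any particular ground truth.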

    Provenance and Probabilities in Relational Databases: From Theory to Practice

    We review the basics of data provenance in relational databases. We describe different provenance formalisms, from Boolean provenance to provenance semirings and beyond, that can be used for a wide variety of purposes, to obtain additional information on the output of a query. We discuss representation systems for data provenance, circuits in particular, with a focus on practical implementation. Finally, we explain how provenance is practically used for probabilistic query evaluation in probabilistic databases.
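
As a rough illustration of the semiring idea mentioned in this abstract, the sketch below evaluates one provenance expression under three standard semirings. The sum-of-products encoding, the tuple names `t1` to `t3`, and the annotation values are assumptions made for the example, and nothing here reflects the circuit representations the paper discusses.

```python
from dataclasses import dataclass
from typing import Any, Callable

@dataclass(frozen=True)
class Semiring:
    zero: Any                       # neutral element of plus
    one: Any                        # neutral element of times
    plus: Callable[[Any, Any], Any]   # combines alternative derivations
    times: Callable[[Any, Any], Any]  # combines tuples joined in one derivation

BOOLEAN = Semiring(False, True, lambda a, b: a or b, lambda a, b: a and b)
COUNTING = Semiring(0, 1, lambda a, b: a + b, lambda a, b: a * b)
TROPICAL = Semiring(float("inf"), 0.0, min, lambda a, b: a + b)

# Hypothetical provenance of one query answer, as a sum of products:
# the answer is derived either from t1 joined with t2, or from t1 joined with t3.
provenance = [["t1", "t2"], ["t1", "t3"]]

def evaluate(prov, semiring, annotation):
    """Evaluate a sum-of-products provenance expression in the given semiring."""
    total = semiring.zero
    for monomial in prov:
        product = semiring.one
        for name in monomial:
            product = semiring.times(product, annotation[name])
        total = semiring.plus(total, product)
    return total

# Does the answer survive if t2 is deleted?
print(evaluate(provenance, BOOLEAN, {"t1": True, "t2": False, "t3": True}))
# How many derivations are there under bag semantics?
print(evaluate(provenance, COUNTING, {"t1": 2, "t2": 1, "t3": 3}))
# What is the cheapest derivation if each tuple has a cost?
print(evaluate(provenance, TROPICAL, {"t1": 1.0, "t2": 5.0, "t3": 2.0}))
```

The same expression answers three different questions depending only on the semiring supplied, which is the property that makes a single provenance representation reusable.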

    Approximate Lifted Inference with Probabilistic Databases

    This paper proposes a new approach for approximate evaluation of #P-hard queries with probabilistic databases. In our approach, every query is evaluated entirely in the database engine by evaluating a fixed number of query plans, each providing an upper bound on the true probability, then taking their minimum. We provide an algorithm that takes into account important schema information to enumerate only the minimal necessary plans among all possible plans. Importantly, this algorithm is a strict generalization of all known results of PTIME self-join-free conjunctive queries: A query is safe if and only if our algorithm returns one single plan. We also apply three relational query optimization techniques to evaluate all minimal safe plans very fast. We give a detailed experimental evaluation of our approach and, in the process, provide a new way of thinking about the value of probabilistic methods over non-probabilistic methods for ranking query answers. Comment: 12 pages, 5 figures, pre-print for a paper appearing in VLDB 2015. arXiv admin note: text overlap with arXiv:1310.625
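
The "minimum over plans" idea in this abstract can be illustrated on a small example. The sketch below assumes the Boolean query q :- R(x), S(x,y), T(y) over tuple-independent tables with made-up probabilities, and it runs in plain Python rather than inside a database engine. Each plan treats one relation's tuples as independent copies, which can only overestimate the probability, and the exact value is computed by brute force for comparison. It does not implement the paper's plan enumeration.

```python
import itertools

# Hypothetical tuple-independent tables: key -> probability that the tuple exists.
R = {1: 0.5, 2: 0.6}
T = {1: 0.7, 2: 0.4}
S = {(1, 1): 0.9, (1, 2): 0.8, (2, 1): 0.3, (2, 2): 0.5}

def ior(ps):
    """Independent-or: probability that at least one independent event holds."""
    none = 1.0
    for p in ps:
        none *= 1.0 - p
    return 1.0 - none

def plan_project_y_first(R, S, T):
    # Per x, combine S(x,y) with T(y) over y, then join with R(x).
    # T(y) is reused across different x as if it were a fresh independent copy.
    return ior(R[x] * ior(S[x, y] * T[y] for y in T if (x, y) in S) for x in R)

def plan_project_x_first(R, S, T):
    # Symmetric plan: R(x) is reused across different y as an independent copy.
    return ior(T[y] * ior(S[x, y] * R[x] for x in R if (x, y) in S) for y in T)

def exact(R, S, T):
    """Ground truth by enumerating every possible world (tiny inputs only)."""
    tuples = [("R", k) for k in R] + [("S", k) for k in S] + [("T", k) for k in T]
    probs = [R[k] for k in R] + [S[k] for k in S] + [T[k] for k in T]
    total = 0.0
    for world in itertools.product([False, True], repeat=len(tuples)):
        present = {t for t, here in zip(tuples, world) if here}
        if any(("R", x) in present and ("S", (x, y)) in present
               and ("T", y) in present for (x, y) in S):
            weight = 1.0
            for p, here in zip(probs, world):
                weight *= p if here else 1.0 - p
            total += weight
    return total

bounds = [plan_project_y_first(R, S, T), plan_project_x_first(R, S, T)]
print("plan bounds:", bounds)
print("min of bounds:", min(bounds), "exact:", exact(R, S, T))
```

Both plans use only joins and independent projections, which is why this style of evaluation can be pushed into a relational engine as ordinary SQL aggregates.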

    Structurally Tractable Uncertain Data

    Many data management applications must deal with data that is uncertain, incomplete, or noisy. However, on existing uncertain data representations, we cannot tractably perform the important query evaluation tasks of determining query possibility, certainty, or probability: these problems are hard on arbitrary uncertain input instances. We thus ask whether we could restrict the structure of uncertain data so as to guarantee the tractability of exact query evaluation. We present our tractability results for tree and tree-like uncertain data, and a vision for probabilistic rule reasoning. We also study uncertainty about order, proposing a suitable representation, and study uncertain data conditioned by additional observations. Comment: 11 pages, 1 figure, 1 table. To appear in SIGMOD/PODS PhD Symposium 201

    Storing and Querying Probabilistic XML Using a Probabilistic Relational DBMS

    This work explores the feasibility of storing and querying probabilistic XML in a probabilistic relational database. Our approach is to adapt known techniques for mapping XML to relational data such that the possible worlds are preserved. We show that this approach can work for any XML-to-relational technique by adapting a representative schema-based (inlining) as well as a representative schemaless technique (XPath Accelerator). We investigate the maturity of probabilistic relational databases for this task with experiments with one of the state-of-the-art systems, called Trio.

    The relationship between IR and multimedia databases

    Modern extensible database systems support multimedia data through ADTs. However, because of the problems with multimedia query formulation, this support is not sufficient. Multimedia querying requires an iterative search process involving many different representations of the objects in the database. The support that is needed is very similar to the processes in information retrieval. Based on this observation, we develop the miRRor architecture for multimedia query processing. We design a layered framework based on information retrieval techniques, to provide a usable query interface to the multimedia database. First, we introduce a concept layer to enable reasoning over low-level concepts in the database. Second, we add an evidential reasoning layer as an intermediate between the user and the concept layer. Third, we add the functionality to process the users' relevance feedback. We then adapt the inference network model from text retrieval to an evidential reasoning model for multimedia query processing. We conclude with an outline for implementation of miRRor on top of the Monet extensible database system.