Search CORE

190 research outputs found

Faster Query Answering in Probabilistic Databases using Read-Once Functions

Author: Perduca Vittorio
Roy Sudeepa
Tannen Val
Publication venue
Publication date: 04/12/2010
Field of study

A boolean expression is in read-once form if each of its variables appears exactly once. When the variables denote independent events in a probability space, the probability of the event denoted by the whole expression in read-once form can be computed in polynomial time (whereas the general problem for arbitrary expressions is #P-complete). Known approaches to checking read-once property seem to require putting these expressions in disjunctive normal form. In this paper, we tell a better story for a large subclass of boolean event expressions: those that are generated by conjunctive queries without self-joins and on tuple-independent probabilistic databases. We first show that given a tuple-independent representation and the provenance graph of an SPJ query plan without self-joins, we can, without using the DNF of a result event expression, efficiently compute its co-occurrence graph. From this, the read-once form can already, if it exists, be computed efficiently using existing techniques. Our second and key contribution is a complete, efficient, and simple to implement algorithm for computing the read-once forms (whenever they exist) directly, using a new concept, that of co-table graph, which can be significantly smaller than the co-occurrence graph.Comment: Accepted in ICDT 201

arXiv.org e-Print Archive

CiteSeerX

Crossref

Tractability in probabilistic databases

Author: Dan Suciu
Dan Suciu
See Profile
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2011
Field of study

All in-text references underlined in blue are linked to publications on ResearchGate, letting you access and read them immediately

CiteSeerX

Crossref

Approximate Confidence Computation in Probabilistic Databases.

Author: Huang Jiewen
Koch Christoph
Olteanu Dan
Publication venue
Publication date: 01/01/2010
Field of study

This paper introduces a deterministic approximation algorithm with error guarantees for computing the probability of propositional formulas over discrete random variables. The algorithm is based on an incremental compilation of formulas into decision diagrams using three types of decompositions: Shannon expansion, independence partitioning, and product factorization. With each decomposition step, lower and upper bounds on the probability of the partially compiled formula can be quickly computed and checked against the allowed error. This algorithm can be effectively used to compute approximate confidence values of answer tuples to positive relational algebra queries on general probabilistic databases (c-tables with discrete probability distributions). We further tune our algorithm so as to capture all known tractable conjunctive queries without selfjoins on tuple-independent probabilistic databases: In this case, the algorithm requires time polynomial in the input size even for exact computation. We implemented the algorithm as an extension of the SPROUT query engine. An extensive experimental effort shows that it consistently outperforms state-of-art approximation techniques by several orders of magnitude

Infoscience - École polytechnique fédérale de Lausanne

CiteSeerX

Crossref

Probabilistic Query Evaluation with Bag Semantics

Author: Grohe Martin
Lindner Peter
Standke Christoph
Publication venue: LIPIcs - Leibniz International Proceedings in Informatics. 26th International Conference on Database Theory (ICDT 2023)
Publication date: 13/04/2022
Field of study

We initiate the study of probabilistic query evaluation under bag semantics where tuples are allowed to be present with duplicates. We focus on self-join free conjunctive queries, and probabilistic databases where occurrences of different facts are independent, which is the natural generalization of tuple-independent probabilistic databases to the bag semantics setting. For set semantics, the data complexity of this problem is well understood, even for the more general class of unions of conjunctive queries: it is either in polynomial time, or #P-hard, depending on the query (Dalvi & Suciu, JACM 2012). Due to potentially unbounded multiplicities, the bag probabilistic databases we discuss are no longer finite objects, which requires a treatment of representation mechanisms. Moreover, the answer to a Boolean query is a probability distribution over non-negative integers, rather than a probability distribution over {true, false}. Therefore, we discuss two flavors of probabilistic query evaluation: computing expectations of answer tuple multiplicities, and computing the probability that a tuple is contained in the answer at most k times for some parameter k. Subject to mild technical assumptions on the representation systems, it turns out that expectations are easy to compute, even for unions of conjunctive queries. For query answer probabilities, we obtain a dichotomy between solvability in polynomial time and #P-hardness for self-join free conjunctive queries

arXiv.org e-Print Archive

Dagstuhl Research Online Publication Server

A dichotomy for non-repeating queries with negation in probabilistic databases

Author: Abiteboul S.
Fink R.
Valiant L.
Wang T.-Y.
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2014
Field of study

This paper shows that any non-repeating conjunctive rela-tional query with negation has either polynomial time or #P-hard data complexity on tuple-independent probabilis-tic databases. This result extends a dichotomy by Dalvi and Suciu for non-repeating conjunctive queries to queries with negation. The tractable queries with negation are precisely the hierarchical ones and can be recognised efficiently. 1

CiteSeerX

Crossref

The Dichotomy of Evaluating Homomorphism-Closed Queries on Probabilistic Graphs

Author: Amarilli Antoine
Ceylan İsmail İlkan
Publication venue
Publication date: 06/01/2021
Field of study

We study the problem of probabilistic query evaluation on probabilistic graphs, namely, tuple-independent probabilistic databases on signatures of arity two. Our focus is the class of queries that is closed under homomorphisms, or equivalently, the infinite unions of conjunctive queries. Our main result states that all unbounded queries from this class are #P-hard for probabilistic query evaluation. As bounded queries from this class are equivalent to a union of conjunctive queries, they are already classified by the dichotomy of Dalvi and Suciu (2012). Hence, our result and theirs imply a complete data complexity dichotomy, between polynomial time and #P-hardness, for evaluating infinite unions of conjunctive queries over probabilistic graphs. This dichotomy covers in particular all fragments of infinite unions of conjunctive queries such as negation-free (disjunctive) Datalog, regular path queries, and a large class of ontology-mediated queries on arity-two signatures. Our result is shown by reducing from counting the valuations of positive partitioned 2-DNF formulae for some queries, or from the source-to-target reliability problem in an undirected graph for other queries, depending on properties of minimal models. The presented dichotomy result applies to even a special case of probabilistic query evaluation called generalized model counting, where fact probabilities must be 0, 0.5, or 1.Comment: 30 pages. Journal version of the ICDT'20 paper https://drops.dagstuhl.de/opus/volltexte/2020/11939/. Submitted to LMCS. The previous version (version 2) was the same as the ICDT'20 paper with some minor formatting tweaks and 7 extra pages of technical appendi

arXiv.org e-Print Archive

Episciences.org

Directory of Open Access Journals