
Structurally Tractable Uncertain Data

Author: Abiteboul S.
Agrawal R.
Amarilli A.
Carlson A.
Courcelle B.
Deutch D.
Dong X.
Galárraga L.
Gottlob G.
Lauritzen S. L.
Maniu S.
Raedt L. D.
Robertson N.
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 17/07/2015

Many data management applications must deal with data which is uncertain, incomplete, or noisy. However, on existing uncertain data representations, we cannot tractably perform the important query evaluation tasks of determining query possibility, certainty, or probability: these problems are hard on arbitrary uncertain input instances. We thus ask whether we could restrict the structure of uncertain data so as to guarantee the tractability of exact query evaluation. We present our tractability results for tree and tree-like uncertain data, and a vision for probabilistic rule reasoning. We also study uncertainty about order, proposing a suitable representation, and study uncertain data conditioned by additional observations. Comment: 11 pages, 1 figure, 1 table. To appear in SIGMOD/PODS PhD Symposium 201

arXiv.org e-Print Archive

Crossref

Fast Möbius and Zeta Transforms

Author: Pegolotti Tommaso
Püschel Markus
Seifert Bastian
Publication date: 24/11/2022

Möbius inversion of functions on partially ordered sets (posets) $\mathcal{P}$ is a classical tool in combinatorics. For finite posets it consists of two mutually inverse linear transformations, called the zeta and Möbius transforms, respectively. In this paper we provide novel fast algorithms for both that require $O(nk)$ time and space, where $n = |\mathcal{P}|$ and $k$ is the width (length of longest antichain) of $\mathcal{P}$, compared to $O(n^2)$ for a direct computation. Our approach assumes that $\mathcal{P}$ is given as a directed acyclic graph (DAG) $(\mathcal{E}, \mathcal{P})$. The algorithms are then constructed using a chain decomposition for a one-time cost of $O(|\mathcal{E}| + |\mathcal{E}_\text{red}| k)$, where $\mathcal{E}_\text{red}$ is the number of edges in the DAG's transitive reduction. We show benchmarks with implementations of all algorithms including parallelized versions. The results show that our algorithms enable Möbius inversion on posets with millions of nodes in seconds if the defining DAGs are sufficiently sparse. Comment: 16 pages, 7 figures, submitted for review
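To make the two transforms concrete, the following is a minimal sketch of the direct computation that the abstract uses as its baseline, not the paper's fast chain-decomposition algorithm. It assumes the convention that the zeta transform sums over all elements below or equal to a given element; the function names (`reachability`, `zeta_transform`, `mobius_transform`) and the small diamond-shaped poset are illustrative assumptions. Building the order relation here uses a simple cubic transitive closure, which is not part of the quadratic transform cost quoted above.

```python
def reachability(n, edges):
    """Return leq, where leq[x][y] is True when x <= y in the poset.

    The poset is assumed to be given as a DAG on nodes 0..n-1 whose
    edges (u, v) mean u < v; the order is the reflexive transitive closure.
    """
    leq = [[i == j for j in range(n)] for i in range(n)]
    for u, v in edges:
        leq[u][v] = True
    # Warshall-style transitive closure (cubic time, fine for a sketch).
    for k in range(n):
        for i in range(n):
            if leq[i][k]:
                for j in range(n):
                    if leq[k][j]:
                        leq[i][j] = True
    return leq


def zeta_transform(f, leq):
    """Direct zeta transform: g[x] = sum of f[y] over all y <= x."""
    n = len(f)
    return [sum(f[y] for y in range(n) if leq[y][x]) for x in range(n)]


def mobius_transform(g, leq):
    """Direct Moebius transform, the inverse of zeta_transform.

    Solves g[x] = sum_{y <= x} f[y] for f by forward substitution,
    visiting elements in an order compatible with the poset.
    """
    n = len(g)
    # Elements with fewer predecessors come first; if y < x then y has
    # strictly fewer elements below it, so this is a linear extension.
    order = sorted(range(n), key=lambda x: sum(leq[y][x] for y in range(n)))
    f = [0] * n
    for x in order:
        f[x] = g[x] - sum(f[y] for y in range(n) if y != x and leq[y][x])
    return f


# Hypothetical example: the diamond poset 0 < 1, 0 < 2, 1 < 3, 2 < 3.
leq = reachability(4, [(0, 1), (0, 2), (1, 3), (2, 3)])
g = zeta_transform([1, 2, 3, 4], leq)
f = mobius_transform(g, leq)
```

In this example the round trip recovers the original values, which is the mutual-inverse property the abstract refers to.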

arXiv.org e-Print Archive

Lifted Probabilistic Inference: A Guide for the Database Researcher

Publication date: 06/03/2020

CiteSeerX

Provenance and Probabilities in Relational Databases: From Theory to Practice

Author: Senellart Pierre
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/12/2017

We review the basics of data provenance in relational databases. We describe different provenance formalisms, from Boolean provenance to provenance semirings and beyond, that can be used for a wide variety of purposes, to obtain additional information on the output of a query. We discuss representation systems for data provenance, circuits in particular, with a focus on practical implementation. Finally, we explain how provenance is practically used for probabilistic query evaluation in probabilistic databases.
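As a rough illustration of the semiring idea mentioned in this abstract, the sketch below evaluates one fixed query, q(x) :- R(x, y), S(y), over relations whose tuples carry annotations. Joins multiply annotations and alternative derivations of the same output add them. The function name `join_project`, the query, and the toy relations are assumptions made for this example and are not taken from the paper.

```python
from collections import defaultdict


def join_project(R, S, plus, times, zero):
    """Evaluate q(x) :- R(x, y), S(y) over annotated relations.

    R maps (x, y) pairs to annotations and S maps y values to annotations.
    `plus`, `times`, and `zero` are the operations of the chosen semiring.
    """
    out = defaultdict(lambda: zero)
    for (x, y), r_ann in R.items():
        if y in S:
            # Joint use of two tuples multiplies their annotations;
            # several derivations of the same x are added together.
            out[x] = plus(out[x], times(r_ann, S[y]))
    return dict(out)


# Counting semiring: annotations are multiplicities (bag semantics).
R_count = {("a", 1): 2, ("a", 2): 1, ("b", 2): 3}
S_count = {1: 1, 2: 4}
counts = join_project(R_count, S_count,
                      plus=lambda u, v: u + v,
                      times=lambda u, v: u * v,
                      zero=0)

# Boolean semiring: annotations say whether a tuple is present.
R_bool = {("a", 1): True, ("b", 2): False}
S_bool = {1: True, 2: True}
present = join_project(R_bool, S_bool,
                       plus=lambda u, v: u or v,
                       times=lambda u, v: u and v,
                       zero=False)
```

The same evaluation code serves both semirings; only the operations passed in change, which is the point of the semiring formulation.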

INRIA a CCSD electronic archive server

Range Queries on Uncertain Data

Author: B Chazelle
G Frederickson
J Driscoll
J Mitchell
M Yiu
P Agarwal
Publication date: 09/01/2015

Given a set $P$ of $n$ uncertain points on the real line, each represented by its one-dimensional probability density function, we consider the problem of building data structures on $P$ to answer range queries of the following three types for any query interval $I$: (1) top-$1$ query: find the point in $P$ that lies in $I$ with the highest probability, (2) top-$k$ query: given any integer $k \leq n$ as part of the query, return the $k$ points in $P$ that lie in $I$ with the highest probabilities, and (3) threshold query: given any threshold $\tau$ as part of the query, return all points of $P$ that lie in $I$ with probabilities at least $\tau$. We present data structures for these range queries with linear or nearly linear space and efficient query time. Comment: 26 pages. A preliminary version of this paper appeared in ISAAC 2014. In this full version, we also present solutions to the most general case of the problem (i.e., the histogram bounded case), which were left as open problems in the preliminary version
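The three query types can be illustrated with a brute-force sketch that scans every point, which is far from the space- and time-efficient data structures the abstract describes. It assumes, purely for illustration, that each uncertain point is uniformly distributed on an interval `(a, b)` with `a < b`; the function names `prob_in_interval`, `top_k`, and `threshold` are hypothetical.

```python
def prob_in_interval(point, lo, hi):
    """Probability that a point uniform on (a, b) falls inside [lo, hi]."""
    a, b = point
    overlap = min(b, hi) - max(a, lo)
    return max(0.0, overlap) / (b - a)


def top_k(points, lo, hi, k):
    """Indices of the k points most likely to lie in [lo, hi] (top-1 is k=1)."""
    ranked = sorted(range(len(points)),
                    key=lambda i: prob_in_interval(points[i], lo, hi),
                    reverse=True)
    return ranked[:k]


def threshold(points, lo, hi, tau):
    """Indices of all points lying in [lo, hi] with probability at least tau."""
    return [i for i, p in enumerate(points)
            if prob_in_interval(p, lo, hi) >= tau]


# Hypothetical input: three uncertain points and the query interval [2, 6].
points = [(0.0, 4.0), (2.0, 3.0), (5.0, 9.0)]
best_two = top_k(points, 2.0, 6.0, 2)
likely = threshold(points, 2.0, 6.0, 0.5)
```

Each query here costs time linear in the number of points or worse, which is the cost that dedicated data structures aim to avoid.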

arXiv.org e-Print Archive

Crossref

Model Counting of Query Expressions: Limitations of Propositional Methods

Author: Beame Paul
Li Jerry
Roy Sudeepa
Suciu Dan
Publication date: 15/12/2013

Query evaluation in tuple-independent probabilistic databases is the problem of computing the probability of an answer to a query given independent probabilities of the individual tuples in a database instance. There are two main approaches to this problem: (1) in 'grounded inference' one first obtains the lineage for the query and database instance as a Boolean formula, then performs weighted model counting on the lineage (i.e., computes the probability of the lineage given probabilities of its independent Boolean variables); (2) in methods known as 'lifted inference' or 'extensional query evaluation', one exploits the high-level structure of the query as a first-order formula. Although it is widely believed that lifted inference is strictly more powerful than grounded inference on the lineage alone, no formal separation has previously been shown for query evaluation. In this paper we show such a formal separation for the first time. We exhibit a class of queries for which model counting can be done in polynomial time using extensional query evaluation, whereas the algorithms used in state-of-the-art exact model counters on their lineages provably require exponential time. Our lower bounds on the running times of these exact model counters follow from new exponential size lower bounds on the kinds of d-DNNF representations of the lineages that these model counters (either explicitly or implicitly) produce. Though some of these queries have been studied before, no non-trivial lower bounds on the sizes of these representations for these queries were previously known. Comment: To appear in International Conference on Database Theory (ICDT) 201
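The grounded approach can be sketched by brute force: enumerate every possible world over the lineage variables and add up the weights of the worlds in which the lineage is true. This is a naive exponential-time illustration of weighted model counting, not one of the exact model counters the abstract analyses. The function name `lineage_probability`, the example query q :- R(x), S(x, y), and the tuple probabilities are assumptions made for the example.

```python
from itertools import product


def lineage_probability(lineage, probs):
    """Probability that a Boolean lineage holds under independent variables.

    `lineage` takes a dict mapping variable names to booleans;
    `probs` maps each variable name to its marginal probability.
    """
    names = sorted(probs)
    total = 0.0
    for values in product([False, True], repeat=len(names)):
        world = dict(zip(names, values))
        if lineage(world):
            # Independence: the weight of a world is the product of the
            # per-tuple probabilities of being present or absent.
            weight = 1.0
            for name in names:
                weight *= probs[name] if world[name] else 1.0 - probs[name]
            total += weight
    return total


# Hypothetical lineage of q :- R(x), S(x, y) on a tiny instance:
# (r1 AND s11) OR (r1 AND s12) OR (r2 AND s21).
def lineage(w):
    return (w["r1"] and (w["s11"] or w["s12"])) or (w["r2"] and w["s21"])


probs = {"r1": 0.5, "r2": 0.5, "s11": 0.5, "s12": 0.5, "s21": 0.5}
p = lineage_probability(lineage, probs)
```

For this particular query a lifted computation gives the same value directly from the query structure, as 1 - (1 - 0.5 * 0.75) * (1 - 0.5 * 0.5) = 0.53125, without enumerating worlds.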

arXiv.org e-Print Archive

CiteSeerX