Structurally Tractable Uncertain Data
Many data management applications must deal with data which is uncertain,
incomplete, or noisy. However, on existing uncertain data representations, we
cannot tractably perform the important query evaluation tasks of determining
query possibility, certainty, or probability: these problems are hard on
arbitrary uncertain input instances. We thus ask whether we could restrict the
structure of uncertain data so as to guarantee the tractability of exact query
evaluation. We present our tractability results for tree and tree-like
uncertain data, and a vision for probabilistic rule reasoning. We also study
uncertainty about order, proposing a suitable representation, and study
uncertain data conditioned by additional observations.
Comment: 11 pages, 1 figure, 1 table. To appear in SIGMOD/PODS PhD Symposium
201
Fast Möbius and Zeta Transforms
Möbius inversion of functions on partially ordered sets (posets)
is a classical tool in combinatorics. For finite posets it
consists of two mutually inverse linear transformations, called the zeta and
Möbius transforms, respectively. In this paper we provide novel fast
algorithms for both that require time and space, where and is the width (length of longest antichain) of
, compared to for a direct computation. Our approach
assumes that is given as a directed acyclic graph (DAG)
. The algorithms are then constructed using a chain
decomposition for a one-time cost of , where is the number of
edges in the DAG's transitive reduction. We show benchmarks with
implementations of all algorithms including parallelized versions. The results
show that our algorithms enable Möbius inversion on posets with millions of
nodes in seconds if the defining DAGs are sufficiently sparse.
Comment: 16 pages, 7 figures, submitted for review
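To make the two transforms concrete, here is a minimal sketch of the direct (baseline) computation on a poset given as a DAG. It is not the fast chain-decomposition algorithm from the abstract. The function names `down_sets`, `zeta`, and `moebius`, the edge convention (an edge `(u, v)` means `u < v`), and the divisor example are all assumptions made for illustration.

```python
from graphlib import TopologicalSorter  # Python 3.9+


def down_sets(nodes, edges):
    """Map each node x to the set of all y with y <= x (x included).

    Assumption: an edge (u, v) means u < v in the poset.
    """
    preds = {x: set() for x in nodes}
    for u, v in edges:
        preds[v].add(u)
    # static_order() lists every node after all of its predecessors.
    order = list(TopologicalSorter(preds).static_order())
    below = {}
    for x in order:
        s = {x}
        for p in preds[x]:
            s |= below[p]
        below[x] = s
    return order, below


def zeta(f, order, below):
    """Zeta transform: g(x) = sum of f(y) over all y <= x."""
    return {x: sum(f[y] for y in below[x]) for x in order}


def moebius(g, order, below):
    """Moebius transform (inverse of zeta), solved in topological order."""
    f = {}
    for x in order:  # every y < x has already been processed
        f[x] = g[x] - sum(f[y] for y in below[x] if y != x)
    return f


# Hypothetical example: divisors of 12 ordered by divisibility.
nodes = [1, 2, 3, 4, 6, 12]
edges = [(1, 2), (1, 3), (2, 4), (2, 6), (3, 6), (4, 12), (6, 12)]
order, below = down_sets(nodes, edges)
g = zeta({x: 1 for x in nodes}, order, below)  # g[x] = number of divisors of x
f = moebius(g, order, below)                   # recovers the all-ones function
```

This baseline stores every down-set explicitly, so its cost grows with the number of comparable pairs; it only illustrates what the transforms compute and that they invert each other.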
Provenance and Probabilities in Relational Databases: From Theory to Practice
We review the basics of data provenance in relational databases. We describe different provenance formalisms, from Boolean provenance to provenance semirings and beyond, that can be used for a wide variety of purposes, to obtain additional information on the output of a query. We discuss representation systems for data provenance, circuits in particular, with a focus on practical implementation. Finally, we explain how provenance is practically used for probabilistic query evaluation in probabilistic databases.
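As a rough illustration of the step from Boolean provenance to provenance semirings, the sketch below evaluates one small join-and-project query over annotated relations, once in the Boolean semiring and once in the counting semiring. The `Semiring` class, the query, and the toy relations are assumptions for illustration and do not come from the reviewed work.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class Semiring:
    """A commutative semiring (hypothetical helper type)."""
    zero: Any
    one: Any
    plus: Callable[[Any, Any], Any]
    times: Callable[[Any, Any], Any]


BOOLEAN = Semiring(False, True, lambda a, b: a or b, lambda a, b: a and b)
COUNTING = Semiring(0, 1, lambda a, b: a + b, lambda a, b: a * b)


def join_project(R, S, K):
    """Evaluate q(x) :- R(x, y), S(y) over K-annotated relations.

    R maps (x, y) to an annotation and S maps y to an annotation.
    Joint use of tuples multiplies annotations; alternative
    derivations of the same output tuple are added.
    """
    out = {}
    for (x, y), r_ann in R.items():
        if y in S:
            term = K.times(r_ann, S[y])
            out[x] = K.plus(out.get(x, K.zero), term)
    return out


# Counting semiring: annotations are multiplicities.
R_count = {("a", 1): 2, ("a", 2): 1, ("b", 2): 3}
S_count = {1: 1, 2: 2}
counts = join_project(R_count, S_count, COUNTING)

# Boolean semiring: annotations say whether a tuple is present.
R_bool = {key: True for key in R_count}
S_bool = {1: True, 2: False}
present = join_project(R_bool, S_bool, BOOLEAN)
```

The same query code serves both interpretations; only the semiring passed as `K` changes, which is the design point behind semiring provenance.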
Range Queries on Uncertain Data
Given a set $P$ of uncertain points on the real line, each represented by
its one-dimensional probability density function, we consider the problem of
building data structures on $P$ to answer range queries of the following three
types for any query interval $I$: (1) top-$1$ query: find the point in $P$ that
lies in $I$ with the highest probability, (2) top-$k$ query: given any integer
$k$ as part of the query, return the $k$ points in $P$ that lie in $I$
with the highest probabilities, and (3) threshold query: given any threshold
$\tau$ as part of the query, return all points of $P$ that lie in $I$ with
probabilities at least $\tau$. We present data structures for these range
queries with linear or nearly linear space and efficient query time.
Comment: 26 pages. A preliminary version of this paper appeared in ISAAC 2014.
In this full version, we also present solutions to the most general case of
the problem (i.e., the histogram bounded case), which were left as open
problems in the preliminary version.
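The three query types can be pinned down with a brute-force linear scan, shown below as a minimal sketch. It is a reference baseline only, not the data structures the abstract refers to. It assumes each uncertain point has a uniform density on an interval `(lo, hi)`; that modelling choice and the function names are illustrative assumptions.

```python
import heapq


def prob_in(point, a, b):
    """Probability that a point, uniform on (lo, hi), lies in [a, b]."""
    lo, hi = point
    overlap = min(hi, b) - max(lo, a)
    return max(0.0, overlap) / (hi - lo)


def top_k(points, a, b, k):
    """The k points most likely to lie in [a, b]; k = 1 is the top-1 query."""
    return heapq.nlargest(k, points, key=lambda p: prob_in(p, a, b))


def threshold(points, a, b, tau):
    """All points whose probability of lying in [a, b] is at least tau."""
    return [p for p in points if prob_in(p, a, b) >= tau]


# Hypothetical input: three uncertain points and the query interval [2, 3].
points = [(0, 4), (1, 3), (2, 10)]
best = top_k(points, 2, 3, 1)
likely = threshold(points, 2, 3, 0.25)
```

Every query here scans all points, so the sketch is useful mainly as a correctness oracle when testing a faster structure.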
Model Counting of Query Expressions: Limitations of Propositional Methods
Query evaluation in tuple-independent probabilistic databases is the problem
of computing the probability of an answer to a query given independent
probabilities of the individual tuples in a database instance. There are two
main approaches to this problem: (1) in `grounded inference' one first obtains
the lineage for the query and database instance as a Boolean formula, then
performs weighted model counting on the lineage (i.e., computes the probability
of the lineage given probabilities of its independent Boolean variables); (2)
in methods known as `lifted inference' or `extensional query evaluation', one
exploits the high-level structure of the query as a first-order formula.
Although it is widely believed that lifted inference is strictly more powerful
than grounded inference on the lineage alone, no formal separation has
previously been shown for query evaluation. In this paper we show such a formal
separation for the first time.
We exhibit a class of queries for which model counting can be done in
polynomial time using extensional query evaluation, whereas the algorithms used
in state-of-the-art exact model counters on their lineages provably require
exponential time. Our lower bounds on the running times of these exact model
counters follow from new exponential size lower bounds on the kinds of d-DNNF
representations of the lineages that these model counters (either explicitly or
implicitly) produce. Though some of these queries have been studied before, no
non-trivial lower bounds on the sizes of these representations for these
queries were previously known.
Comment: To appear in International Conference on Database Theory (ICDT) 201
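The two approaches named in the abstract can be contrasted on a toy case. The sketch below computes the probability of the Boolean query q = ∃x ∃y R(x) ∧ S(x, y) in two ways: by brute-force weighted model counting over the lineage (a naive stand-in for grounded inference, not a d-DNNF-based counter) and by a closed-form extensional formula that exploits tuple independence. The query, the instance, and the function names are assumptions for illustration; this simple query is not claimed to belong to the class for which the paper proves a separation.

```python
from itertools import product
from math import isclose, prod

# Hypothetical tuple-independent instance: tuple -> marginal probability.
R = {"a": 0.5, "b": 0.8}
S = {("a", 1): 0.5, ("a", 2): 0.25, ("b", 1): 0.5}


def grounded(R, S):
    """Weighted model counting on the lineage OR_{(x,y)} (r_x AND s_xy),
    by enumerating all truth assignments (exponential in the tuple count)."""
    variables = [("R", x) for x in R] + [("S", t) for t in S]
    probs = [R[x] for x in R] + [S[t] for t in S]
    total = 0.0
    for world in product([False, True], repeat=len(variables)):
        present = dict(zip(variables, world))
        satisfied = any(
            present[("R", x)] and present[("S", (x, y))]
            for (x, y) in S
            if x in R
        )
        if satisfied:
            total += prod(p if bit else 1 - p for p, bit in zip(probs, world))
    return total


def lifted(R, S):
    """Extensional evaluation: independent OR over x of
    P(R(x)) * (1 - prod_y (1 - P(S(x, y))))."""
    fail = 1.0
    for x, px in R.items():
        no_partner = prod(1 - p for (x2, _), p in S.items() if x2 == x)
        fail *= 1 - px * (1 - no_partner)
    return 1 - fail


# Both routes should agree up to floating-point rounding.
agree = isclose(grounded(R, S), lifted(R, S))
```

The enumeration touches every possible world, while the extensional formula takes one pass over the tuples; the paper's result concerns much stronger grounded methods than this enumeration, so the sketch only illustrates the distinction between the two styles.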