Answering Queries using Views over Probabilistic XML: Complexity and Tractability
We study the complexity of query answering using views in a probabilistic XML
setting, identifying large classes of XPath queries -- with child and
descendant navigation and predicates -- for which there are efficient (PTime)
algorithms. We consider this problem under the two possible semantics for XML
query results: with persistent node identifiers and in their absence.
Accordingly, we consider rewritings that can exploit a single view, by means of
compensation, and rewritings that can use multiple views, by means of
intersection. Since in a probabilistic setting queries return answers with
probabilities, the problem of rewriting goes beyond the classic one of
retrieving XML answers from views. For both semantics of XML queries, we show
that, even when XML answers can be retrieved from views, their probabilities
may not be computable. For rewritings that use only compensation, we describe a
PTime decision procedure, based on easily verifiable criteria that distinguish
between the feasible cases -- when probabilistic XML results are computable --
and the infeasible ones. For rewritings that can use multiple views, with
compensation and intersection, we identify the most permissive conditions that
make probabilistic rewriting feasible, and we describe an algorithm that is
sound in general, and becomes complete under fairly permissive restrictions,
running in PTime modulo worst-case exponential time equivalence tests. This is
the best we can hope for since intersection makes query equivalence intractable
already over deterministic data. Our algorithm runs in PTime whenever
deterministic rewritings can be found in PTime.
Comment: VLDB201
Aggregate Queries for Discrete and Continuous Probabilistic XML
Sources of data uncertainty and imprecision are numerous. One way to handle this uncertainty is to attach probabilistic annotations to data. Many such probabilistic database models have been proposed, both in the relational and in the semi-structured setting. The latter is particularly well suited to the management of uncertain data coming from a variety of automatic processes. An important problem in the context of probabilistic XML databases is that of answering aggregate queries (count, sum, avg, etc.), which has received limited attention so far. In a model unifying the various (discrete) semi-structured probabilistic models studied up to now, we present algorithms to compute the distribution of the aggregation values (exploiting some regularity properties of the aggregate functions) and probabilistic moments (especially expectation and variance) of this distribution. We also prove the intractability of some of these problems and investigate approximation techniques. We finally extend the discrete model to a continuous one, in order to take into account continuous data values, such as measurements from sensor networks, and present algorithms to compute distribution functions and moments for various classes of continuous distributions of data values.
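
To make the idea of an aggregate distribution and its moments concrete, here is a minimal Python sketch. It is not the paper's algorithm: it assumes a deliberately simplified discrete model in which each numeric leaf is present independently with a given probability, and the names `sum_distribution` and `moments` are hypothetical. Under that assumption, the distribution of `sum` can be built one leaf at a time, while expectation and variance follow directly from linearity and independence.

```python
from collections import defaultdict
from fractions import Fraction


def sum_distribution(items):
    """Return {total: probability} for the sum of independently present values.

    items: list of (value, probability) pairs. Assumption: each value is
    present independently of the others (a simplified stand-in for a
    probabilistic XML document, not the unified model of the paper).
    """
    dist = {0: Fraction(1)}  # empty document: the sum is 0 with certainty
    for value, p in items:
        nxt = defaultdict(Fraction)
        for total, q in dist.items():
            nxt[total] += q * (1 - p)      # leaf absent
            nxt[total + value] += q * p    # leaf present
        dist = dict(nxt)
    return dist


def moments(items):
    """Expectation and variance of the sum, without building the distribution."""
    mean = sum(p * v for v, p in items)
    var = sum(p * (1 - p) * v * v for v, p in items)
    return mean, var


items = [(10, Fraction(1, 2)), (20, Fraction(1, 4))]
dist = sum_distribution(items)
mean, var = moments(items)
```

The distribution can grow exponentially in the number of leaves when the values are unrelated, whereas the two moments are computed in a single linear pass. This illustrates, in the simplified setting only, why moments can be cheaper to obtain than the full distribution.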