Tractability in probabilistic databases
Generalized Lineage-Aware Temporal Windows: Supporting Outer and Anti Joins in Temporal-Probabilistic Databases
The result of a temporal-probabilistic (TP) join with negation includes, at
each time point, the probability with which a tuple of a positive relation
matches none of the tuples in a negative relation, for a given join condition.
TP outer and anti joins thus resemble the characteristics of relational outer
and anti joins also in the case when there exist time points at which input
tuples from the positive relation have non-zero probabilities to be true and
input tuples from the negative relation have non-zero probabilities to be
false, respectively. For the computation of TP joins with
negation, we introduce generalized lineage-aware temporal windows, a mechanism
that binds an output interval to the lineages of all the matching valid tuples
of each input relation. We group the windows of two TP relations into three
disjoint sets based on the way attributes, lineage expressions and intervals
are produced. We compute all windows in an incremental manner, and we show that
pipelined computations allow for the direct integration of our approach into
PostgreSQL. We thereby alleviate the prevalent redundancies in the interval
computations of existing approaches, as demonstrated by an extensive
experimental evaluation with real-world datasets.
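To make the notion of an output window concrete, here is a minimal sketch of a TP anti join for a single positive tuple. It is a naive sweep, not the incremental, pipelined algorithm of the paper. The tuple layout `(name, start, end, prob)`, half-open intervals, independent tuples, and the function name `anti_join_windows` are all assumptions made for illustration.

```python
from math import prod

def anti_join_windows(pos, negs):
    """Naive TP anti join for one positive tuple.

    pos and each entry of negs are (name, start, end, prob) with half-open
    intervals [start, end). The negative tuples are assumed to satisfy the
    join condition already and to be independent of each other and of pos.
    """
    name, start, end, p = pos
    # Window boundaries: endpoints of pos plus negative endpoints inside pos.
    points = {start, end}
    for _, s, e, _ in negs:
        for t in (s, e):
            if start < t < end:
                points.add(t)
    cuts = sorted(points)

    windows = []
    for lo, hi in zip(cuts, cuts[1:]):
        # Negative tuples valid over the whole window [lo, hi).
        valid = [n for n in negs if n[1] <= lo and hi <= n[2]]
        lineage = name + "".join(f" AND NOT {n[0]}" for n in valid)
        # P(pos is true and every valid negative tuple is false).
        prob = p * prod(1 - n[3] for n in valid)
        windows.append((lo, hi, lineage, prob))
    return windows

windows = anti_join_windows(
    ("p1", 1, 10, 0.5),
    [("n1", 3, 6, 0.5), ("n2", 5, 12, 0.5)],
)
for w in windows:
    print(w)
```

Each window binds an interval to the lineage of the negative tuples valid over it, so the probability is computed once per window rather than once per time point.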
10 Years of Probabilistic Querying – What Next?
Over the past decade, the two research areas of probabilistic databases and probabilistic programming have intensively studied the problem of making structured probabilistic inference scalable, but so far both areas have developed almost independently of one another. While probabilistic databases have focused on describing tractable query classes based on the structure of query plans and data lineage, probabilistic programming has contributed sophisticated inference techniques based on knowledge compilation and lifted (first-order) inference. Both fields have developed their own variants of top-k algorithms for query evaluation, both exact and approximate, and both investigate query optimization techniques known from SQL, Datalog, and Prolog. All of this calls for a more intensive study of the commonalities and integration of the two fields. Moreover, we believe that natural-language processing and information extraction will remain a driving factor, and in fact a longstanding challenge, for developing expressive representation models that can be combined with structured probabilistic inference, also for the decades to come.
A dichotomy for non-repeating queries with negation in probabilistic databases
This paper shows that any non-repeating conjunctive relational query with negation has either polynomial time or #P-hard data complexity on tuple-independent probabilistic databases. This result extends a dichotomy by Dalvi and Suciu for non-repeating conjunctive queries to queries with negation. The tractable queries with negation are precisely the hierarchical ones and can be recognised efficiently.
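As an illustration of why hierarchical queries can be recognised efficiently, the following sketch tests the standard hierarchical property for a Boolean query without self-joins: for any two variables, the sets of atoms they occur in must be nested or disjoint. The dictionary encoding and the name `is_hierarchical` are assumptions for this example, and the sketch ignores any negation-specific details of the paper's definition.

```python
from itertools import combinations

def is_hierarchical(atoms):
    """atoms maps each relation name to the set of variables it uses.

    Assumes a Boolean query in which every relation occurs once.
    """
    # occurs[x] is the set of atoms that mention variable x.
    occurs = {}
    for relation, variables in atoms.items():
        for v in variables:
            occurs.setdefault(v, set()).add(relation)
    # Every pair of variables must have nested or disjoint atom sets.
    for x, y in combinations(occurs, 2):
        a, b = occurs[x], occurs[y]
        if not (a <= b or b <= a or a.isdisjoint(b)):
            return False
    return True

# R(x), S(x, y): atom sets {R, S} and {S} are nested.
easy = {"R": {"x"}, "S": {"x", "y"}}
# R(x), S(x, y), T(y): atom sets {R, S} and {S, T} overlap without nesting.
hard = {"R": {"x"}, "S": {"x", "y"}, "T": {"y"}}

print(is_hierarchical(easy), is_hierarchical(hard))
```

The check is quadratic in the number of variables, which is consistent with the claim that tractable queries can be identified cheaply.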
Faster Query Answering in Probabilistic Databases using Read-Once Functions
A boolean expression is in read-once form if each of its variables appears
exactly once. When the variables denote independent events in a probability
space, the probability of the event denoted by the whole expression in
read-once form can be computed in polynomial time (whereas the general problem
for arbitrary expressions is #P-complete). Known approaches to checking
the read-once property seem to require putting these expressions in disjunctive
normal form. In this paper, we tell a better story for a large subclass of
boolean event expressions: those that are generated by conjunctive queries
without self-joins and on tuple-independent probabilistic databases. We first
show that given a tuple-independent representation and the provenance graph of
an SPJ query plan without self-joins, we can, without using the DNF of a result
event expression, efficiently compute its co-occurrence graph. From this, the
read-once form can already, if it exists, be computed efficiently using
existing techniques. Our second and key contribution is a complete, efficient,
and simple to implement algorithm for computing the read-once forms (whenever
they exist) directly, using a new concept, that of co-table graph, which can be
significantly smaller than the co-occurrence graph.
Comment: Accepted in ICDT 201
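The polynomial-time claim for read-once forms can be illustrated with a short sketch: when every variable occurs once and variables are independent, a conjunction is a product and a disjunction is one minus the product of complements. The nested-tuple encoding and the function names below are assumptions for illustration. The sketch only evaluates an expression that is already in read-once form; it does not implement the co-table graph algorithm described above.

```python
from math import prod

def variables(expr):
    """List every variable occurrence in a ("var" | "and" | "or", ...) tree."""
    if expr[0] == "var":
        return [expr[1]]
    return [v for child in expr[1] for v in variables(child)]

def is_read_once_form(expr):
    """True if no variable occurs more than once in the expression."""
    occurrences = variables(expr)
    return len(occurrences) == len(set(occurrences))

def _prob(expr, p):
    kind = expr[0]
    if kind == "var":
        return p[expr[1]]
    child_probs = [_prob(child, p) for child in expr[1]]
    if kind == "and":
        # Independent conjuncts: multiply.
        return prod(child_probs)
    if kind == "or":
        # Independent disjuncts: complement of "all false".
        return 1.0 - prod(1.0 - q for q in child_probs)
    raise ValueError(f"unknown node type: {kind}")

def read_once_probability(expr, p):
    if not is_read_once_form(expr):
        raise ValueError("expression repeats a variable; not in read-once form")
    return _prob(expr, p)

# x1 AND (y1 OR y2) is a read-once form of the DNF  x1 y1 OR x1 y2.
expr = ("and", [("var", "x1"), ("or", [("var", "y1"), ("var", "y2")])])
p = {"x1": 0.5, "y1": 0.5, "y2": 0.5}
result = read_once_probability(expr, p)
print(result)  # 0.5 * (1 - 0.5 * 0.5) = 0.375
```

Evaluation visits each node once, so the cost is linear in the size of the read-once form.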
Factorised Representations of Query Results
Query tractability has been traditionally defined as a function of input
database and query sizes, or of both input and output sizes, where the query
result is represented as a bag of tuples. In this report, we introduce a
framework for investigating tractability beyond this setting. The key
insight is that, although the cardinality of a query result can be exponential,
its structure can be very regular and thus factorisable into a nested
representation whose size is only polynomial in the size of both the input
database and query.
For a given query result, there may be several equivalent representations,
and we quantify the regularity of the result by its readability, which is the
minimum over all its representations of the maximum number of occurrences of
any tuple in that representation. We give a characterisation of
select-project-join queries based on the bounds on readability of their results
for any input database. We complement it with an algorithm that can find
asymptotically optimal upper bounds and corresponding factorised
representations.
Comment: 44 pages, 13 figures
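A small sketch can show how a nested representation is more succinct than a flat one. It encodes the Cartesian product of two unary relations as a product of two unions and compares the number of stored values with the flat listing. The encoding (`"value"`, `"union"`, `"product"` nodes) and the helper names are assumptions for this example, not the notation of the report.

```python
from itertools import product

def tuples(expr):
    """Enumerate the flat tuples represented by a factorised expression."""
    kind = expr[0]
    if kind == "value":
        return [(expr[1],)]
    if kind == "union":
        return [t for child in expr[1] for t in tuples(child)]
    if kind == "product":
        parts = [tuples(child) for child in expr[1]]
        # Concatenate one tuple from each factor.
        return [sum(combo, ()) for combo in product(*parts)]
    raise ValueError(f"unknown node type: {kind}")

def size(expr):
    """Number of value occurrences stored in the representation."""
    if expr[0] == "value":
        return 1
    return sum(size(child) for child in expr[1])

a_values = ["a1", "a2", "a3"]
b_values = ["b1", "b2", "b3", "b4"]
factorised = ("product", [
    ("union", [("value", a) for a in a_values]),
    ("union", [("value", b) for b in b_values]),
])

flat = tuples(factorised)
flat_size = sum(len(t) for t in flat)
print(len(flat), flat_size, size(factorised))  # 12 tuples, 24 values, 7 values
```

For n and m input values the flat result stores 2nm values while the factorised form stores n + m, and each value occurs exactly once in it.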
Oblivious Bounds on the Probability of Boolean Functions
This paper develops upper and lower bounds for the probability of Boolean
functions by treating multiple occurrences of variables as independent and
assigning them new individual probabilities. We call this approach dissociation
and give an exact characterization of optimal oblivious bounds, i.e. when the
new probabilities are chosen independent of the probabilities of all other
variables. Our motivation comes from the weighted model counting problem (or,
equivalently, the problem of computing the probability of a Boolean function),
which is #P-hard in general. By performing several dissociations, one can
transform a Boolean formula whose probability is difficult to compute, into one
whose probability is easy to compute, and which is guaranteed to provide an
upper or lower bound on the probability of the original formula by choosing
appropriate probabilities for the dissociated variables. Our new bounds shed
light on the connection between previous relaxation-based and model-based
approximations and unify them as concrete choices in a larger design space. We
also show how our theory allows a standard relational database management
system (DBMS) to both upper and lower bound hard probabilistic queries in
guaranteed polynomial time.
Comment: 34 pages, 14 figures, supersedes: http://arxiv.org/abs/1105.281
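The following sketch illustrates dissociation on a tiny formula, checking the bounds by brute-force enumeration. It assumes our reading of the disjunctive case: keeping the original probability p for both copies gives an upper bound, and choosing copies p' with (1 - p')(1 - p') = 1 - p gives a lower bound. The formula, the probabilities, and all function names are made up for illustration.

```python
from itertools import product
from math import sqrt

def exact_probability(formula, probs):
    """Sum the weights of all satisfying worlds (exponential; tiny inputs only)."""
    names = list(probs)
    total = 0.0
    for values in product([False, True], repeat=len(names)):
        world = dict(zip(names, values))
        if formula(world):
            weight = 1.0
            for n in names:
                weight *= probs[n] if world[n] else 1.0 - probs[n]
            total += weight
    return total

def original(w):
    # x occurs in both disjuncts.
    return (w["x"] and w["y"]) or (w["x"] and w["z"])

def dissociated(w):
    # x replaced by independent copies x1 and x2.
    return (w["x1"] and w["y"]) or (w["x2"] and w["z"])

p_x = 0.5
exact = exact_probability(original, {"x": p_x, "y": 0.5, "z": 0.5})

# Upper bound (assumed rule): both copies keep the original probability.
upper = exact_probability(
    dissociated, {"x1": p_x, "x2": p_x, "y": 0.5, "z": 0.5})

# Lower bound (assumed rule): (1 - q) * (1 - q) = 1 - p_x.
q = 1.0 - sqrt(1.0 - p_x)
lower = exact_probability(
    dissociated, {"x1": q, "x2": q, "y": 0.5, "z": 0.5})

print(lower, exact, upper)
```

The dissociated formula is read-once, so in practice its probability would be computed directly rather than by enumeration; the brute-force evaluator is used here only to confirm the ordering of the three values.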