Generalized Lineage-Aware Temporal Windows: Supporting Outer and Anti Joins in Temporal-Probabilistic Databases
The result of a temporal-probabilistic (TP) join with negation includes, at
each time point, the probability with which a tuple of the positive relation
matches none of the tuples in the negative relation, for a given join
condition. TP outer and anti joins thus resemble the characteristics of
relational outer and anti joins also in the case when there exist time points
at which input tuples from the positive relation have non-zero probabilities
to be valid and input tuples from the negative relation have non-zero
probabilities to be invalid, respectively. For the computation of TP joins with
negation, we introduce generalized lineage-aware temporal windows, a mechanism
that binds an output interval to the lineages of all the matching valid tuples
of each input relation. We group the windows of two TP relations into three
disjoint sets based on the way attributes, lineage expressions and intervals
are produced. We compute all windows in an incremental manner, and we show that
pipelined computations allow for the direct integration of our approach into
PostgreSQL. We thereby alleviate the redundant interval computations that are
prevalent in existing approaches, as demonstrated by an extensive
experimental evaluation on real-world datasets.
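The per-time-point semantics described above can be sketched in a few lines. Assuming independent base tuples, the probability that a positive tuple matches none of its join partners in the negative relation is its own probability times the product of the partners' complements; the function name and list-based representation are illustrative, not the paper's implementation.

```python
# Sketch of the per-time-point semantics of a TP anti join, assuming
# independent base tuples.

def anti_join_prob(pos_prob, matching_neg_probs):
    """Probability that a positive tuple (valid with probability pos_prob)
    matches none of the negative tuples it joins with at this time point."""
    result = pos_prob
    for q in matching_neg_probs:
        result *= (1.0 - q)  # the negative tuple must be absent
    return result

# A positive tuple valid with probability 0.9 at some time point, joined
# against two matching negative tuples with probabilities 0.5 and 0.2:
print(anti_join_prob(0.9, [0.5, 0.2]))  # 0.9 * 0.5 * 0.8 = 0.36
```

With no matching negative tuples the positive tuple's probability passes through unchanged, which is the anti-join analogue of an unmatched outer-join tuple.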
Spatial Probabilistic Temporal Databases
Research in spatio-temporal probabilistic reasoning examines algorithms for handling data such as cell phone triangulation, GPS systems, movement prediction software, and other inexact but useful data sources. In this thesis I describe a probabilistic model theory for such data. The Spatial PrObabilistic Temporal database framework (or SPOT database framework) provides methods for interpreting, checking consistency, automatically revising, and querying such databases. This thesis examines two different semantics within the SPOT framework and presents polynomial-time consistency checking algorithms for both. It introduces several revision techniques for repairing inconsistent databases and compares them to the AGM axioms for belief revision, identifying an algorithm that, by changing only the probability bounds in the SPOT atoms, can repair a SPOT database in polynomial time while still satisfying the AGM axioms. Also included is an investigation into optimistic and cautious versions of a selection query that returns all objects in a given region with at least (or at most) a certain probability. For these queries, I introduce an indexing structure akin to the R-tree called a SPOT tree, and show experiments where indexing speeds up selection with both artificial and real-world data. I also introduce query preprocessing techniques that bound the sets of solutions with both circumscribing and inscribing regions, and find that these also provide query-time improvements in practice. By covering semantics, consistency checking, database revision, indexing, and query preprocessing techniques for SPOT databases, this thesis provides a significant step towards a SPOT database framework that may be applied to real-world problems arising from the vast amount of semi-accurate spatio-temporal data available today.
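The optimistic/cautious distinction for selection queries can be illustrated with a toy sketch. Here a SPOT-style atom is simplified to (object, time, point, [lower, upper] probability bounds); this layout, the region encoding, and the function name are assumptions for exposition, not the thesis's actual API.

```python
# Illustrative sketch of optimistic vs. cautious selection over simplified
# SPOT-style atoms; real SPOT atoms constrain regions, not single points.

def select(atoms, region, threshold, cautious=False):
    """Return objects that some atom places inside `region` with probability
    at least `threshold`. Cautious selection requires the LOWER probability
    bound to meet the threshold; optimistic selection only the UPPER bound."""
    (x0, y0, x1, y1) = region
    hits = set()
    for (obj, t, (x, y), lower, upper) in atoms:
        if x0 <= x <= x1 and y0 <= y <= y1:
            bound = lower if cautious else upper
            if bound >= threshold:
                hits.add(obj)
    return hits

atoms = [("a", 1, (2, 2), 0.4, 0.9), ("b", 1, (3, 3), 0.1, 0.3)]
print(select(atoms, (0, 0, 5, 5), 0.5))                 # {'a'}
print(select(atoms, (0, 0, 5, 5), 0.5, cautious=True))  # set()
```

The gap between the two answers reflects the width of the probability intervals: an object can optimistically qualify while cautiously failing the same threshold.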
Lineage-Aware Temporal Windows: Supporting Set Operations in Temporal-Probabilistic Databases
In temporal-probabilistic (TP) databases, the combination of the temporal and
the probabilistic dimension adds significant overhead to the computation of set
operations. Although set queries are guaranteed to yield linearly sized output
relations, existing solutions exhibit quadratic runtime complexity. They suffer
from redundant interval comparisons and additional joins for the formation of
lineage expressions. In this paper, we formally define the semantics of set
operations in TP databases and study their properties. For their efficient
computation, we introduce the lineage-aware temporal window, a mechanism that
directly binds intervals with lineage expressions. We suggest the lineage-aware
window advancer (LAWA) for producing the windows of two TP relations in
linearithmic time, and we implement all TP set operations based on LAWA. By
exploiting the flexibility of lineage-aware temporal windows, we perform direct
filtering of irrelevant intervals and finalization of output lineage
expressions and thus guarantee that no additional computational cost or buffer
space is needed. A series of experiments over both synthetic and real-world
datasets shows that (a) our approach has predictable performance, depending
only on the input size and not on the number of time intervals per fact or
their overlap, and that (b) it outperforms state-of-the-art approaches in
both temporal and probabilistic databases.
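The core idea of binding intervals to lineage can be shown with a naive sweep over interval endpoints: split the timeline into maximal windows during which the set of contributing tuples is constant, and attach that set (a stand-in for a lineage expression) to each window. This toy version is quadratic, unlike LAWA's linearithmic advancer, and its representation is an assumption made for illustration.

```python
# Toy construction of lineage-aware temporal windows: each output window
# carries the set of tuple ids valid throughout it (a stand-in for lineage).

def windows(tuples):
    """tuples: list of (tuple_id, start, end), half-open intervals [start, end)."""
    points = sorted({p for (_, s, e) in tuples for p in (s, e)})
    out = []
    for lo, hi in zip(points, points[1:]):
        # A tuple contributes to [lo, hi) iff its interval covers it entirely.
        lineage = {tid for (tid, s, e) in tuples if s <= lo and hi <= e}
        if lineage:
            out.append((lo, hi, lineage))
    return out

r = [("x1", 1, 5), ("x2", 3, 8)]
print(windows(r))  # [(1, 3, {'x1'}), (3, 5, {'x1', 'x2'}), (5, 8, {'x2'})]
```

Set operations then reduce to combining the lineage sets of aligned windows, which is what makes the window abstraction convenient.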
Probabilistic Temporal Databases, I: Algebra
Dyreson and Snodgrass have drawn attention to the fact that in many
temporal database applications, there is often uncertainty present
about the start time of events, the end time of events, the duration of
events, etc. When the granularity of time is small (e.g. milliseconds),
a statement such as "Packet p was shipped sometime during the
first 5 days of January, 1998" leads to a massive amount of uncertainty:
5 × 24 × 60 × 60 × 1000 = 432,000,000 possibilities. As noted by
Zaniolo et al., past
attempts to deal with uncertainty in databases have been restricted
to relatively small amounts of uncertainty in attributes.
Dyreson and Snodgrass have taken an important first
step towards solving this problem.
In this paper, we first introduce the syntax of Temporal-Probabilistic
(TP) relations and then show how they can be converted to an explicit,
significantly more space-consuming form called Annotated Relations.
We then present a Theoretical Annotated Temporal
Algebra (TATA). Being explicit, TATA
is convenient for specifying how the
algebraic operations should behave, but is impractical to use because
annotated relations are overwhelmingly large.
Next, we present a Temporal Probabilistic Algebra (TPA).
We show that our definition of the TP-Algebra
provides a correct implementation of TATA despite the fact that
it operates on implicit, succinct TP-relations instead of the
overwhelmingly large annotated relations.
Finally, we report on timings for an implementation of the TP-Algebra
built on top of ODBC.
(Also cross-referenced as UMIACS-TR-99-09.)
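Why annotated relations are "overwhelmingly large" is easy to see with a sketch: a single TP tuple with an interval and a distribution (here uniform, for simplicity) expands into one explicit annotated tuple per time point. The representation below is illustrative only, not the paper's formalism.

```python
# Expanding an implicit TP tuple into an explicit annotated relation:
# one (data, time point, probability) row per point in the interval.

def annotate(data, start, end):
    """Expand a TP tuple valid uniformly over [start, end) into explicit
    per-time-point annotated tuples (data, t, probability)."""
    n = end - start
    return [(data, t, 1.0 / n) for t in range(start, end)]

# "Shipped sometime in the first 5 days" at day granularity: 5 rows.
print(annotate("packet p shipped", 0, 5))
# At millisecond granularity the same tuple would expand to 432,000,000 rows,
# which is why TATA is a specification device rather than an implementation.
```

The TPA's point is precisely to compute over the compact left-hand form while provably agreeing with operations defined on the expanded form.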
Time-Aware Probabilistic Knowledge Graphs
The emergence of open information extraction as a tool for constructing and expanding knowledge graphs has aided the growth of temporal data, for instance in YAGO, NELL and Wikidata. While YAGO and Wikidata maintain the valid time of facts, NELL records the time point at which a fact is retrieved from some Web corpora. Collectively, these knowledge graphs (KGs) store facts extracted from Wikipedia and other sources. Due to the imprecise nature of the extraction tools used to build and expand KGs such as NELL, the facts in the KG are weighted (each fact carries a confidence value representing its correctness). Additionally, NELL can be considered a transaction-time KG because every fact is associated with an extraction date. On the other hand, YAGO and Wikidata use the valid-time model because they maintain facts together with their validity time (temporal scope). In this paper, we propose a bitemporal model (combining the transaction- and valid-time models) for maintaining and querying bitemporal probabilistic knowledge graphs. We study coalescing and the scalability of marginal and MAP inference. Moreover, we show that the complexity of reasoning tasks in atemporal probabilistic KGs carries over to the bitemporal setting. Finally, we report our evaluation results for the proposed model.
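Coalescing, one of the operations studied above, merges value-equivalent facts whose valid-time intervals meet or overlap. A minimal sketch under an assumed fact layout of (triple, confidence, valid_start, valid_end); the layout and merge rule (same triple, same confidence) are illustrative simplifications.

```python
# Minimal coalescing sketch for a bitemporal probabilistic KG: merge
# value-equivalent facts with equal confidence whose half-open valid-time
# intervals touch or overlap.

def coalesce(facts):
    """facts: list of (triple, confidence, valid_start, valid_end)."""
    facts = sorted(facts, key=lambda f: (f[0], f[1], f[2]))
    out = []
    for triple, conf, s, e in facts:
        if out and out[-1][0] == triple and out[-1][1] == conf and s <= out[-1][3]:
            prev = out[-1]  # extend the previous interval instead of appending
            out[-1] = (triple, conf, prev[2], max(prev[3], e))
        else:
            out.append((triple, conf, s, e))
    return out

f = [(("Obama", "presidentOf", "USA"), 0.9, 2009, 2013),
     (("Obama", "presidentOf", "USA"), 0.9, 2013, 2017)]
print(coalesce(f))  # [(('Obama', 'presidentOf', 'USA'), 0.9, 2009, 2017)]
```

A full bitemporal treatment would coalesce over transaction time as well; the one-dimensional case above conveys the idea.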
Learning Tuple Probabilities
Learning the parameters of complex probabilistic-relational models from
labeled training data is a standard technique in machine learning, which has
been intensively studied in the subfield of Statistical Relational Learning
(SRL), but---so far---this is still an under-investigated topic in the context
of Probabilistic Databases (PDBs). In this paper, we focus on learning the
probability values of base tuples in a PDB from labeled lineage formulas. The
resulting learning problem can be viewed as the inverse problem to confidence
computations in PDBs: given a set of labeled query answers, learn the
probability values of the base tuples, such that the marginal probabilities of
the query answers again yield the assigned probability labels. We analyze
the learning problem from a theoretical perspective, cast it into an
optimization problem, and provide an algorithm based on stochastic gradient
descent. Finally, we conclude with an experimental evaluation on three
real-world datasets and one synthetic dataset, comparing our approach to
various techniques from SRL, reasoning in information extraction, and
optimization.
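A toy instance of this inverse problem makes the setup concrete. Assuming independent base tuples and purely conjunctive lineage (so a labeled answer's marginal is the product of its tuples' probabilities), stochastic gradient descent on squared error recovers the tuple probabilities; all names, the parameterization, and the clamping are assumptions of this sketch, not the paper's algorithm.

```python
# Toy "learning tuple probabilities" instance: fit base-tuple probabilities
# so that product-form (conjunctive-lineage) marginals match given labels.

import random

def learn(examples, tuple_ids, steps=5000, lr=0.5, seed=0):
    """examples: list of (tuple_id_list, label). Returns learned probabilities."""
    rng = random.Random(seed)
    p = {t: 0.5 for t in tuple_ids}          # start at an uninformative 0.5
    for _ in range(steps):
        tids, label = rng.choice(examples)   # stochastic: one example per step
        marginal = 1.0
        for t in tids:
            marginal *= p[t]
        err = marginal - label
        for t in tids:
            others = marginal / p[t]         # product of the other tuples' probs
            # gradient of (marginal - label)^2 w.r.t. p[t] is 2 * err * others;
            # clamp to keep each value a valid probability
            p[t] = min(1.0, max(1e-6, p[t] - lr * 2 * err * others))
    return p

# Two labeled lineage formulas: (t1 AND t2) -> 0.32, (t2) -> 0.8.
probs = learn([(["t1", "t2"], 0.32), (["t2"], 0.8)], ["t1", "t2"])
print(round(probs["t1"], 2), round(probs["t2"], 2))  # converges near 0.4 0.8
```

With disjunctions and shared tuples the marginal is no longer a simple product and confidence computation itself becomes hard, which is what makes the general learning problem interesting.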