
    Generalized Lineage-Aware Temporal Windows: Supporting Outer and Anti Joins in Temporal-Probabilistic Databases

    The result of a temporal-probabilistic (TP) join with negation includes, at each time point, the probability with which a tuple of a positive relation p matches none of the tuples in a negative relation n, for a given join condition θ. TP outer and anti joins thus retain the characteristics of relational outer and anti joins even when there are time points at which input tuples from p have non-zero probabilities of being true and input tuples from n have non-zero probabilities of being false, respectively. For the computation of TP joins with negation, we introduce generalized lineage-aware temporal windows, a mechanism that binds an output interval to the lineages of all the matching valid tuples of each input relation. We group the windows of two TP relations into three disjoint sets based on the way attributes, lineage expressions and intervals are produced. We compute all windows incrementally, and we show that pipelined computation allows our approach to be integrated directly into PostgreSQL. We thereby alleviate the redundant interval computations prevalent in existing approaches, as shown by an extensive experimental evaluation with real-world datasets.
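The negation semantics above can be made concrete for a single positive tuple. The following is a minimal sketch, not the authors' algorithm: it assumes that θ is an equality on a hypothetical `key` attribute, that intervals are half-open `[start, end)`, and that all base tuples are independent, so the probability that p holds while none of the matching valid n tuples holds is P(p) multiplied by each 1 - P(n). `TPTuple` and `anti_join_windows` are illustrative names.

```python
from dataclasses import dataclass
from math import prod
from typing import List, Tuple


@dataclass(frozen=True)
class TPTuple:
    key: str      # join attribute; theta is assumed to be equality on key
    lineage: str  # identifier of the base tuple, e.g. "p1"
    start: int    # valid interval [start, end), half-open
    end: int
    prob: float   # marginal probability of the base tuple


Window = Tuple[int, int, str, float]


def anti_join_windows(p: TPTuple, negatives: List[TPTuple]) -> List[Window]:
    """Split p's interval into windows with lineage p AND NOT (n_1 OR ... OR n_k)."""
    # Negative tuples that satisfy theta and overlap p in time.
    matches = [n for n in negatives
               if n.key == p.key and n.start < p.end and p.start < n.end]
    # Window boundaries: p's endpoints plus every match endpoint strictly inside p.
    cuts = {p.start, p.end}
    for n in matches:
        cuts.update(t for t in (n.start, n.end) if p.start < t < p.end)
    bounds = sorted(cuts)

    windows = []
    for s, e in zip(bounds, bounds[1:]):
        # No match endpoint lies strictly inside [s, e), so each match covers it fully or not at all.
        valid = [n for n in matches if n.start <= s and e <= n.end]
        if valid:
            lineage = f"{p.lineage} & !({' | '.join(n.lineage for n in valid)})"
        else:
            lineage = p.lineage
        # Independence assumption: P(p and no valid n) = P(p) * prod(1 - P(n)).
        probability = p.prob * prod(1.0 - n.prob for n in valid)
        windows.append((s, e, lineage, probability))
    return windows
```

For example, with p1 valid over [1, 10) with probability 0.8 and a single match n1 over [3, 6) with probability 0.5, the window [3, 6) receives the lineage `p1 & !(n1)` and probability 0.4, while [1, 3) and [6, 10) keep the lineage `p1` and probability 0.8.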

    Spatial Probabilistic Temporal Databases

    Research in spatio-temporal probabilistic reasoning examines algorithms for handling data such as cell phone triangulation, GPS systems, movement prediction software, and other inexact but useful data sources. In this thesis I describe a probabilistic model theory for such data. The Spatial PrObabilistic Temporal database framework (or SPOT database framework) provides methods for interpreting, checking the consistency of, automatically revising, and querying such databases. This thesis examines two different semantics within the SPOT framework and presents polynomial-time consistency checking algorithms for both. It introduces several revision techniques for repairing inconsistent databases and compares them to the AGM axioms for belief state revision, identifying an algorithm that repairs a SPOT database in polynomial time by changing only the probability bounds in the SPOT atoms, while still satisfying the AGM axioms. Also included is an investigation into optimistic and cautious versions of a selection query that returns all objects in a given region with at least (or at most) a certain probability. For these queries, I introduce an indexing structure akin to the R-tree, called a SPOT tree, and present experiments on both artificial and real-world data in which indexing speeds up selection. I also introduce query preprocessing techniques that bound the sets of solutions with both circumscribing and inscribing regions, and find that these also improve query time in practice. By covering semantics, consistency checking, database revision, indexing, and query preprocessing techniques for SPOT databases, this thesis provides a significant step towards a SPOT database framework that can be applied to real-world problems involving the vast amount of semi-accurate spatio-temporal data available today.
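The idea of bounding a query region from inside and outside can be illustrated with a minimal sketch. It assumes, purely for illustration, a circular query region and rectangular atom regions; `Rect`, `Disk` and `classify` are hypothetical names, and the thesis's actual regions and preprocessing may differ. The inscribed and circumscribing squares give cheap accept and reject tests; only atoms that fall between them need an exact geometric check.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Rect:
    x1: float  # assumes x1 <= x2 and y1 <= y2
    y1: float
    x2: float
    y2: float

    def within(self, other: "Rect") -> bool:
        return (other.x1 <= self.x1 and self.x2 <= other.x2
                and other.y1 <= self.y1 and self.y2 <= other.y2)

    def disjoint(self, other: "Rect") -> bool:
        return (self.x2 < other.x1 or other.x2 < self.x1
                or self.y2 < other.y1 or other.y2 < self.y1)

    def corners(self) -> List[Tuple[float, float]]:
        return [(self.x1, self.y1), (self.x1, self.y2), (self.x2, self.y1), (self.x2, self.y2)]


@dataclass(frozen=True)
class Disk:
    cx: float
    cy: float
    r: float

    def circumscribing(self) -> Rect:
        return Rect(self.cx - self.r, self.cy - self.r, self.cx + self.r, self.cy + self.r)

    def inscribing(self) -> Rect:
        h = self.r / math.sqrt(2)  # half-side of the largest square inside the disk
        return Rect(self.cx - h, self.cy - h, self.cx + h, self.cy + h)

    def contains_point(self, x: float, y: float) -> bool:
        return math.hypot(x - self.cx, y - self.cy) <= self.r

    def contains_rect(self, rect: Rect) -> bool:
        # The disk is convex, so containing all four corners suffices.
        return all(self.contains_point(x, y) for x, y in rect.corners())

    def intersects_rect(self, rect: Rect) -> bool:
        # Closest point of the rectangle to the disk's centre.
        nx = min(max(self.cx, rect.x1), rect.x2)
        ny = min(max(self.cy, rect.y1), rect.y2)
        return self.contains_point(nx, ny)


def classify(rect: Rect, disk: Disk) -> str:
    """Classify an atom region against the query region: inside, outside or partial."""
    if rect.within(disk.inscribing()):
        return "inside"    # cheap accept via the inscribed square
    if rect.disjoint(disk.circumscribing()):
        return "outside"   # cheap reject via the circumscribing square
    if disk.contains_rect(rect):
        return "inside"    # exact test, only for the remaining atoms
    if not disk.intersects_rect(rect):
        return "outside"
    return "partial"
```

Only atoms classified as `partial` would need further probabilistic reasoning; the first two tests avoid any distance computation at all.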

    Lineage-Aware Temporal Windows: Supporting Set Operations in Temporal-Probabilistic Databases

    In temporal-probabilistic (TP) databases, the combination of the temporal and the probabilistic dimension adds significant overhead to the computation of set operations. Although set queries are guaranteed to yield linearly sized output relations, existing solutions exhibit quadratic runtime complexity. They suffer from redundant interval comparisons and require additional joins to form lineage expressions. In this paper, we formally define the semantics of set operations in TP databases and study their properties. For their efficient computation, we introduce the lineage-aware temporal window, a mechanism that directly binds intervals with lineage expressions. We propose the lineage-aware window advancer (LAWA) for producing the windows of two TP relations in linearithmic time, and we implement all TP set operations based on LAWA. By exploiting the flexibility of lineage-aware temporal windows, we directly filter out irrelevant intervals and finalize output lineage expressions, and thus guarantee that no additional computational cost or buffer space is needed. A series of experiments over both synthetic and real-world datasets shows that (a) our approach has predictable performance, depending only on the input size and not on the number of time intervals per fact or their overlap, and that (b) it outperforms state-of-the-art approaches in both temporal and probabilistic databases.
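To illustrate how a window binds an interval to lineage expressions, here is a simplified sweep for a single fact, written in the spirit of LAWA but not claiming to reproduce it. It assumes each input is duplicate-free (no two tuples of the same relation overlap for that fact), intervals are half-open, and lineages are plain strings; `windows`, `tp_union`, `tp_intersect` and `tp_except` are hypothetical names. Sorting the endpoints costs O(n log n); the sweep itself only advances each input pointer forward.

```python
from typing import List, Optional, Tuple

Interval = Tuple[int, int, str]                      # (start, end, lineage), half-open [start, end)
Win = Tuple[int, int, Optional[str], Optional[str]]  # (start, end, lineage from r, lineage from s)


def windows(r: List[Interval], s: List[Interval]) -> List[Win]:
    """Windows over one fact; each records the valid r tuple and s tuple, if any."""
    r, s = sorted(r), sorted(s)
    points = sorted({t for a, b, _ in r + s for t in (a, b)})
    i = j = 0
    out: List[Win] = []
    for a, b in zip(points, points[1:]):
        # Skip tuples that ended at or before the window start.
        while i < len(r) and r[i][1] <= a:
            i += 1
        while j < len(s) and s[j][1] <= a:
            j += 1
        # The next tuple covers [a, b) exactly when it has already started.
        lr = r[i][2] if i < len(r) and r[i][0] <= a else None
        ls = s[j][2] if j < len(s) and s[j][0] <= a else None
        if lr is not None or ls is not None:
            out.append((a, b, lr, ls))
    return out


def tp_union(r, s):
    return [(a, b, " | ".join(x for x in (lr, ls) if x is not None))
            for a, b, lr, ls in windows(r, s)]


def tp_intersect(r, s):
    return [(a, b, f"{lr} & {ls}")
            for a, b, lr, ls in windows(r, s) if lr is not None and ls is not None]


def tp_except(r, s):
    return [(a, b, lr if ls is None else f"{lr} & !{ls}")
            for a, b, lr, ls in windows(r, s) if lr is not None]
```

Each set operation is then just a filter plus a lineage-forming step over the same windows, which mirrors the abstract's point that windows avoid extra joins for building lineage.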

    Probabilistic Temporal Databases, I: Algebra

    Dyreson and Snodgrass have drawn attention to the fact that in many temporal database applications, there is often uncertainty about the start time of events, the end time of events, the duration of events, etc. When the granularity of time is small (e.g., milliseconds), a statement such as "Packet p was shipped sometime during the first 5 days of January, 1998" leads to a massive amount of uncertainty: 5 × 24 × 60 × 60 × 1000 = 432,000,000 possibilities. As noted by Zaniolo et al., past attempts to deal with uncertainty in databases have been restricted to relatively small amounts of uncertainty in attributes. Dyreson and Snodgrass have taken an important first step towards solving this problem. In this paper, we first introduce the syntax of Temporal-Probabilistic (TP) relations and then show how they can be converted to an explicit, significantly more space-consuming form called Annotated Relations. We then present a Theoretical Annotated Temporal Algebra (TATA). Being explicit, TATA is convenient for specifying how the algebraic operations should behave, but it is impractical to use because annotated relations are overwhelmingly large. Next, we present a Temporal Probabilistic Algebra (TPA). We show that our definition of the TP-Algebra provides a correct implementation of TATA even though it operates on implicit, succinct TP-relations instead of the overwhelmingly large annotated relations. Finally, we report timings for an implementation of the TP-Algebra built on top of ODBC. (Also cross-referenced as UMIACS-TR-99-09.)
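The gap between succinct TP-relations and explicit annotated relations can be shown with a deliberately simplified sketch. It assumes a single point probability per chronon under a uniform distribution, whereas the paper's TP-relations may carry richer distributions and probability bounds; `TPEvent`, `annotated` and `prob_between` are hypothetical names. The succinct form answers a probability query with a few arithmetic operations, while the annotated form materializes one row per chronon.

```python
from dataclasses import dataclass
from fractions import Fraction
from typing import List, Tuple

MS_PER_DAY = 24 * 60 * 60 * 1000


@dataclass(frozen=True)
class TPEvent:
    data: str
    lo: int  # first possible chronon (inclusive)
    hi: int  # last possible chronon (inclusive)

    def num_chronons(self) -> int:
        return self.hi - self.lo + 1

    def annotated(self) -> List[Tuple[str, int, Fraction]]:
        """Explicit form: one (data, chronon, probability) row per possible chronon."""
        n = self.num_chronons()
        return [(self.data, t, Fraction(1, n)) for t in range(self.lo, self.hi + 1)]

    def prob_between(self, a: int, b: int) -> Fraction:
        """P(event occurred in chronons [a, b]), computed on the succinct form."""
        overlap = max(0, min(b, self.hi) - max(a, self.lo) + 1)
        return Fraction(overlap, self.num_chronons())


# Day granularity: five annotated rows, each with probability 1/5.
by_day = TPEvent("packet p shipped", 1, 5)
# Millisecond granularity: the same statement, but far too many rows to materialize.
by_ms = TPEvent("packet p shipped", 0, 5 * MS_PER_DAY - 1)
```

Here `by_ms.num_chronons()` is 432,000,000, yet `by_ms.prob_between(0, MS_PER_DAY - 1)` returns 1/5 without expanding a single row.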

    Time-Aware Probabilistic Knowledge Graphs

    The emergence of open information extraction as a tool for constructing and expanding knowledge graphs has aided the growth of temporal data in knowledge graphs such as YAGO, NELL and Wikidata. While YAGO and Wikidata maintain the valid time of facts, NELL records the time point at which a fact is retrieved from some Web corpus. Collectively, these knowledge graphs (KGs) store facts extracted from Wikipedia and other sources. Because the extraction tools used to build and expand KGs, such as NELL, are imprecise, the facts in these KGs are weighted, with a confidence value representing the correctness of each fact. Additionally, NELL can be considered a transaction-time KG because every fact is associated with an extraction date. YAGO and Wikidata, on the other hand, use the valid-time model because they maintain facts together with their validity time (temporal scope). In this paper, we propose a bitemporal model (combining the transaction-time and valid-time models) for maintaining and querying bitemporal probabilistic knowledge graphs. We study coalescing and the scalability of marginal and MAP inference. Moreover, we show that the complexity of reasoning tasks in atemporal probabilistic KGs carries over to the bitemporal setting. Finally, we report the evaluation results of the proposed model.
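A bitemporal fact carries both time dimensions described above, so a point query must fix a valid time and a transaction time. The sketch below is a minimal illustration under stated assumptions: half-open integer intervals (e.g. years), a hypothetical open-ended sentinel `NOW`, and a single confidence weight per fact; `BitemporalFact` and `snapshot` are hypothetical names, the facts are made up, and no marginal or MAP inference is performed.

```python
from dataclasses import dataclass
from typing import List

NOW = 10**9  # hypothetical sentinel meaning "still current"


@dataclass(frozen=True)
class BitemporalFact:
    subject: str
    predicate: str
    obj: str
    valid_from: int  # valid time [valid_from, valid_to): when the fact holds in the world
    valid_to: int
    tx_from: int     # transaction time [tx_from, tx_to): when the KG recorded it
    tx_to: int
    weight: float    # extraction confidence


def snapshot(facts: List[BitemporalFact], valid_at: int, known_at: int) -> List[BitemporalFact]:
    """Facts that held at valid_at according to the KG as it stood at known_at."""
    return [f for f in facts
            if f.valid_from <= valid_at < f.valid_to
            and f.tx_from <= known_at < f.tx_to]


# Made-up example: an extraction recorded in 2011 is superseded by a corrected one in 2013.
facts = [
    BitemporalFact("alice", "worksFor", "acme", 2010, 2015, 2011, 2013, 0.7),
    BitemporalFact("alice", "worksFor", "initech", 2010, 2015, 2013, NOW, 0.9),
]
```

Asking about 2012 as the KG stood in 2012 returns the `acme` fact; asking the same valid-time question as of 2014 returns the corrected `initech` fact instead.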

    Learning Tuple Probabilities

    Learning the parameters of complex probabilistic-relational models from labeled training data is a standard technique in machine learning, which has been intensively studied in the subfield of Statistical Relational Learning (SRL), but so far it remains an under-investigated topic in the context of Probabilistic Databases (PDBs). In this paper, we focus on learning the probability values of base tuples in a PDB from labeled lineage formulas. The resulting learning problem can be viewed as the inverse of confidence computation in PDBs: given a set of labeled query answers, learn the probability values of the base tuples such that the marginal probabilities of the query answers again yield the assigned probability labels. We analyze the learning problem from a theoretical perspective, cast it as an optimization problem, and provide an algorithm based on stochastic gradient descent. Finally, we conclude with an experimental evaluation on three real-world datasets and one synthetic dataset, comparing our approach to various techniques from SRL, reasoning in information extraction, and optimization.
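A toy version of the learning problem makes the inverse relationship concrete. The sketch assumes independent base tuples, read-once lineage formulas (so probabilities can be computed bottom-up), a squared-error loss, and projected SGD that clips each probability to [0, 1]; these choices, and the names `prob`, `variables` and `learn`, are illustrative assumptions rather than the paper's formulation. Because the marginal probability is multilinear in each base-tuple probability p_v, the partial derivative with respect to p_v equals the formula's probability with p_v set to 1 minus its probability with p_v set to 0.

```python
from typing import Dict, List, Set, Tuple

# A lineage formula is a nested tuple:
#   ("var", name) | ("not", f) | ("and", f1, f2, ...) | ("or", f1, f2, ...)
Formula = tuple


def prob(f: Formula, p: Dict[str, float]) -> float:
    """Marginal probability of a read-once formula over independent base tuples."""
    op = f[0]
    if op == "var":
        return p[f[1]]
    if op == "not":
        return 1.0 - prob(f[1], p)
    if op == "and":
        result = 1.0
        for g in f[1:]:
            result *= prob(g, p)
        return result
    if op == "or":
        none_true = 1.0
        for g in f[1:]:
            none_true *= 1.0 - prob(g, p)
        return 1.0 - none_true
    raise ValueError(f"unknown operator: {op!r}")


def variables(f: Formula) -> Set[str]:
    if f[0] == "var":
        return {f[1]}
    return set().union(*(variables(g) for g in f[1:]))


def learn(examples: List[Tuple[Formula, float]], init: Dict[str, float],
          lr: float = 0.1, epochs: int = 2000) -> Dict[str, float]:
    """Projected SGD on the squared error between formula probabilities and labels."""
    p = dict(init)
    for _ in range(epochs):
        for f, label in examples:
            err = prob(f, p) - label
            # Multilinearity: dP/dp_v = P(f | v true) - P(f | v false).
            grads = {v: 2.0 * err * (prob(f, {**p, v: 1.0}) - prob(f, {**p, v: 0.0}))
                     for v in variables(f)}
            for v, g in grads.items():
                p[v] = min(1.0, max(0.0, p[v] - lr * g))  # keep p_v a valid probability
    return p


a, b = ("var", "a"), ("var", "b")
# Labels derived from hypothetical "true" values P(a) = 0.6 and P(b) = 0.4.
examples = [(a, 0.6), (("and", a, b), 0.24), (("or", a, b), 0.76)]
learned = learn(examples, {"a": 0.5, "b": 0.5})
```

Starting from 0.5 for both tuples, the learned values approach 0.6 and 0.4, i.e. the base-tuple probabilities whose query marginals reproduce the three labels.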