Generalized Lineage-Aware Temporal Windows: Supporting Outer and Anti Joins in Temporal-Probabilistic Databases
The result of a temporal-probabilistic (TP) join with negation includes, at
each time point, the probability with which a tuple of a positive relation
matches none of the tuples in a negative relation, for a given join
condition. TP outer and anti joins thus resemble the
characteristics of relational outer and anti joins also in the case when there
exist time points at which input tuples from the positive relation have non-zero
probabilities to be true and input tuples from the negative relation have non-zero
probabilities to be false, respectively. For the computation of TP joins with
negation, we introduce generalized lineage-aware temporal windows, a mechanism
that binds an output interval to the lineages of all the matching valid tuples
of each input relation. We group the windows of two TP relations into three
disjoint sets based on the way attributes, lineage expressions and intervals
are produced. We compute all windows in an incremental manner, and we show that
pipelined computations allow for the direct integration of our approach into
PostgreSQL. We thereby alleviate the prevalent redundancies in the interval
computations of existing approaches, as demonstrated by an extensive
experimental evaluation with real-world datasets.
Snapshot Semantics for Temporal Multiset Relations (Extended Version)
Snapshot semantics is widely used for evaluating queries over temporal data:
temporal relations are seen as sequences of snapshot relations, and queries are
evaluated at each snapshot. In this work, we demonstrate that current
approaches for snapshot semantics over interval-timestamped multiset relations
are subject to two bugs regarding snapshot aggregation and bag difference. We
introduce a novel temporal data model based on K-relations that overcomes these
bugs and prove it to correctly encode snapshot semantics. Furthermore, we
present an efficient implementation of our model as a database middleware and
demonstrate experimentally that our approach is competitive with native
implementations and significantly outperforms such implementations on queries
that involve aggregation. Comment: extended version of PVLDB paper
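As a rough illustration of the snapshot semantics described in this abstract, the sketch below evaluates a count and a bag difference independently at each time point of an interval-timestamped multiset relation. It is not the K-relation model or the middleware from the paper; the tuple layout `(value, start, end)` with half-open intervals, the function names, and the sample data are assumptions made for this example.

```python
from collections import Counter

def snapshot(relation, t):
    """Multiset of values whose half-open interval [start, end) contains t."""
    return Counter(v for v, start, end in relation if start <= t < end)

def snapshot_count(relation, domain):
    """Evaluate COUNT(*) separately at every time point in the domain."""
    return {t: sum(snapshot(relation, t).values()) for t in domain}

def snapshot_difference(r, s, domain):
    """Evaluate bag difference r - s separately at every time point."""
    return {t: snapshot(r, t) - snapshot(s, t) for t in domain}

# Hypothetical data: duplicates are allowed (multiset relation).
works = [("ann", 1, 4), ("bob", 2, 6), ("ann", 2, 3)]
leave = [("ann", 2, 3)]

counts = snapshot_count(works, range(0, 7))
diff = snapshot_difference(works, leave, range(0, 7))
```

Note that the count is defined at every time point, including those where no tuple is valid; there it is 0 rather than absent.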
Query Results over Ongoing Databases that Remain Valid as Time Passes By (Extended Version)
The ongoing time point now is used to state that a tuple is valid from its start
point onward. For database systems, ongoing time points have far-reaching
implications since they change continuously as time passes by. State-of-the-art
approaches deal with ongoing time points by instantiating them to the reference
time. The instantiation yields query results that are only valid at the chosen
time and get invalidated as time passes by. We propose a solution that keeps
ongoing time points uninstantiated during query processing. We do so by
evaluating predicates and functions at all possible reference times. This
renders query results independent of a specific reference time and yields
results that remain valid as time passes by. As query results, we propose
ongoing relations that include a reference time attribute. The value of the
reference time attribute is restricted by predicates and functions on ongoing
attributes. We describe and evaluate an efficient implementation of ongoing
data types and operations in PostgreSQL. Comment: Extended version of ICDE paper
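The idea of evaluating a predicate at all possible reference times, rather than at one instantiated time, can be sketched as follows. This is only a toy model under stated assumptions: time points are integers, a tuple's validity is the half-open interval `[start, rt)` where `rt` is the reference time standing in for now, and the predicate is an overlap test against a fixed interval `[a, b)`. The function names are hypothetical and this is not the PostgreSQL implementation from the paper.

```python
def overlap_from(start, a, b):
    """Smallest integer reference time rt at which [start, rt) overlaps [a, b).

    Returns None if the predicate is false at every reference time.
    The result describes all reference times at once: the predicate
    holds exactly for rt >= the returned value.
    """
    if a >= b or start >= b:
        return None
    return max(start, a) + 1

def holds_at(start, a, b, rt):
    """Check the predicate at one reference time using the restriction."""
    first = overlap_from(start, a, b)
    return first is not None and rt >= first

# A tuple valid from 5 onward, compared with the fixed interval [8, 12).
restriction = overlap_from(5, 8, 12)
```

Because the result is a restriction on the reference time instead of a single truth value, it does not have to be recomputed as time passes.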
Lineage-Aware Temporal Windows: Supporting Set Operations in Temporal-Probabilistic Databases
In temporal-probabilistic (TP) databases, the combination of the temporal and
the probabilistic dimension adds significant overhead to the computation of set
operations. Although set queries are guaranteed to yield linearly sized output
relations, existing solutions exhibit quadratic runtime complexity. They suffer
from redundant interval comparisons and additional joins for the formation of
lineage expressions. In this paper, we formally define the semantics of set
operations in TP databases and study their properties. For their efficient
computation, we introduce the lineage-aware temporal window, a mechanism that
directly binds intervals with lineage expressions. We suggest the lineage-aware
window advancer (LAWA) for producing the windows of two TP relations in
linearithmic time, and we implement all TP set operations based on LAWA. By
exploiting the flexibility of lineage-aware temporal windows, we perform direct
filtering of irrelevant intervals and finalization of output lineage
expressions and thus guarantee that no additional computational cost or buffer
space is needed. A series of experiments over both synthetic and real-world
datasets show that (a) our approach has predictable performance, depending only
on the input size and not on the number of time intervals per fact or their
overlap, and that (b) it outperforms state-of-the-art approaches in both
temporal and probabilistic databases.
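To make the notion of a window that binds an interval to lineage more concrete, here is a deliberately naive sketch for a single fact. It is not LAWA and does not run in linearithmic time; it only shows what kind of output such windows carry. The tuple layout `(start, end, lineage)`, the string encoding of lineage, and the function names are assumptions made for this example.

```python
def windows(r, s):
    """Split the timeline at all interval endpoints of r and s.

    r, s: lists of (start, end, lineage) for ONE fact, with half-open
    intervals that do not overlap within the same list.
    Returns (start, end, lineage_r, lineage_s) for every elementary
    interval covered by at least one input tuple.
    """
    points = sorted({p for st, en, _ in r + s for p in (st, en)})
    out = []
    for lo, hi in zip(points, points[1:]):
        lr = next((l for st, en, l in r if st <= lo and hi <= en), None)
        ls = next((l for st, en, l in s if st <= lo and hi <= en), None)
        if lr is not None or ls is not None:
            out.append((lo, hi, lr, ls))
    return out

def tp_difference(r, s):
    """Set difference r - s: keep windows with an r tuple, negate s lineage."""
    result = []
    for lo, hi, lr, ls in windows(r, s):
        if lr is not None:
            lineage = lr if ls is None else f"{lr} AND NOT {ls}"
            result.append((lo, hi, lineage))
    return result

r = [(1, 5, "r1")]
s = [(3, 8, "s1")]
w = windows(r, s)
d = tp_difference(r, s)
```

Each window already carries the lineage of both inputs, so the set operation itself reduces to a filter plus a lineage formula per window.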
Dynamic Spanning Trees for Connectivity Queries on Fully-dynamic Undirected Graphs (Extended version)
Answering connectivity queries is fundamental to fully dynamic graphs where
edges and vertices are inserted and deleted frequently. Existing work proposes
data structures and algorithms with worst-case guarantees. We propose a new
data structure, the dynamic tree (D-tree), together with algorithms to
construct and maintain it. The D-tree is the first data structure that scales
to fully dynamic graphs with millions of vertices and edges and, on average,
answers connectivity queries much faster than data structures with worst-case
guarantees.
Leveraging range joins for the computation of overlap joins
Joins are essential and potentially expensive operations in database management systems. When data is associated with time periods, joins commonly include predicates that require pairs of argument tuples to overlap in order to qualify for the result. Our goal is to enable built-in systems support for such joins. In particular, we present an approach where overlap joins are formulated as unions of range joins, which are more general-purpose joins than overlap joins, i.e., are useful in their own right, and are supported well by B+-trees. The approach is sufficiently flexible that it also supports joins with additional equality predicates, as well as open, closed, and half-open time periods over discrete and continuous domains, thus offering both generality and simplicity, which is important in a system setting. We provide both a stand-alone solution that performs on par with the state-of-the-art and a DBMS embedded solution that is able to exploit standard indexing and clearly outperforms existing DBMS solutions that depend on specialized indexing techniques. We offer both analytical and empirical evaluations of the proposals. The empirical study includes comparisons with pertinent existing proposals and offers detailed insight into the performance characteristics of the proposals.
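One way to picture an overlap join as a union of range joins is sketched below for half-open, non-empty periods `[start, end)`. Two periods overlap exactly when either the start of the first falls inside the second, or the start of the second falls strictly inside the first; each case is a range lookup on a start point. The sorted lists with `bisect` stand in for a B+-tree, and the function names and data are assumptions for this example, not the formulation used in the paper.

```python
from bisect import bisect_left, bisect_right

def overlap_join(r, s):
    """Overlap join of two lists of (start, end) periods via two range joins."""
    r_sorted = sorted(r)
    s_sorted = sorted(s)
    r_starts = [p[0] for p in r_sorted]
    s_starts = [p[0] for p in s_sorted]
    result = []
    # Range join 1: s.start <= r.start < s.end
    for sv in s:
        lo = bisect_left(r_starts, sv[0])
        hi = bisect_left(r_starts, sv[1])
        result.extend((rv, sv) for rv in r_sorted[lo:hi])
    # Range join 2: r.start < s.start < r.end
    for rv in r:
        lo = bisect_right(s_starts, rv[0])
        hi = bisect_left(s_starts, rv[1])
        result.extend((rv, sv) for sv in s_sorted[lo:hi])
    return result

def overlap_join_naive(r, s):
    """Reference nested-loop join used only to check the sketch."""
    return [(rv, sv) for rv in r for sv in s
            if rv[0] < sv[1] and sv[0] < rv[1]]

r = [(1, 5), (6, 9)]
s = [(4, 7), (9, 12)]
pairs = sorted(overlap_join(r, s))
```

The two range joins are disjoint, so their union needs no duplicate elimination.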
Speeding Up Reachability Queries in Public Transport Networks Using Graph Partitioning
Computing path queries such as the shortest path in public transport networks is challenging because the path costs between nodes change over time. A reachability query from a node at a given start time on such a network retrieves all points of interest (POIs) that are reachable within a given cost budget. Reachability queries are essential building blocks in many applications, for example, group recommendations, ranking spatial queries, or geomarketing. We propose an efficient solution for reachability queries in public transport networks. Currently, there are two options to solve reachability queries. (1) Execute a modified version of Dijkstra’s algorithm that supports time-dependent edge traversal costs; this solution is slow since it must expand edge by edge and does not use an index. (2) Issue a separate path query for each single POI, i.e., a single reachability query requires answering many path queries. Neither of these solutions scales to large networks with many POIs. We propose a novel and lightweight reachability index. The key idea is to partition the network into cells. Then, in contrast to other approaches, we expand the network cell by cell. Empirical evaluations on synthetic and real-world networks confirm the efficiency and the effectiveness of our index-based reachability query solution.
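Option (1) above, the edge-by-edge baseline, can be sketched as an earliest-arrival variant of Dijkstra's algorithm over a timetable. This is the baseline that the abstract describes as slow, not the proposed cell-based index. The timetable layout (a dict mapping a node to `(neighbor, departure, arrival)` connections), the use of travel time as the cost, and all names and data are assumptions for this example.

```python
import heapq

def reachable_pois(timetable, pois, source, start_time, budget):
    """POIs whose earliest arrival time is within start_time + budget."""
    deadline = start_time + budget
    best = {source: start_time}          # earliest known arrival per node
    heap = [(start_time, source)]
    while heap:
        t, u = heapq.heappop(heap)
        if t > best.get(u, float("inf")):
            continue                     # stale queue entry
        for v, dep, arr in timetable.get(u, []):
            # A connection is usable only if it departs after we arrive at u.
            if dep >= t and arr <= deadline and arr < best.get(v, float("inf")):
                best[v] = arr
                heapq.heappush(heap, (arr, v))
    return {p for p in pois if p in best}

# Hypothetical timetable with times in minutes.
timetable = {
    "A": [("B", 10, 15), ("B", 5, 9), ("C", 12, 30)],
    "B": [("D", 9, 14), ("D", 16, 20)],
}
near = reachable_pois(timetable, {"C", "D"}, "A", start_time=4, budget=12)
far = reachable_pois(timetable, {"C", "D"}, "A", start_time=4, budget=30)
```

Every connection leaving a settled node is inspected individually, which illustrates why this baseline expands edge by edge.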
abcOD: Mining Band Order Dependencies
We present the design of and a demonstration plan for abcOD, a tool for efficiently discovering approximate band conditional order dependencies (abcODs) from data. abcOD utilizes a dynamic programming algorithm based on a longest monotonic band. Using real datasets, we demonstrate how the discovered abcODs can help users understand ordered data semantics, identify potential data quality problems, and interactively clean the data.