Snapshot Semantics for Temporal Multiset Relations (Extended Version)
Snapshot semantics is widely used for evaluating queries over temporal data:
temporal relations are seen as sequences of snapshot relations, and queries are
evaluated at each snapshot. In this work, we demonstrate that current
approaches for snapshot semantics over interval-timestamped multiset relations
are subject to two bugs regarding snapshot aggregation and bag difference. We
introduce a novel temporal data model based on K-relations that overcomes these
bugs and prove it to correctly encode snapshot semantics. Furthermore, we
present an efficient implementation of our model as a database middleware and
demonstrate experimentally that our approach is competitive with native
implementations and significantly outperforms such implementations on queries
that involve aggregation. Comment: extended version of PVLDB paper.
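The following is a minimal sketch of plain snapshot semantics. It is not the paper's K-relation model or its middleware. It assumes that an interval-timestamped multiset relation is a Python list of (tuple, start, end) entries with half-open periods over integer time points, and it evaluates each query independently on every snapshot. The helper names snapshot, snapshot_count, and snapshot_difference are hypothetical. Because COUNT(*) is computed per time point, a time point at which no tuple is valid yields a count of 0. Bag difference uses multiset monus, so counts never drop below zero. The sketch enumerates every time point, so it serves only as a reference definition for small finite time domains, not as an efficient implementation.

```python
from collections import Counter


def snapshot(rel, t):
    """Multiset of tuples valid at time point t (periods are half-open [start, end))."""
    return Counter(tup for tup, start, end in rel if start <= t < end)


def snapshot_count(rel, t_min, t_max):
    """COUNT(*) evaluated independently at each time point in [t_min, t_max),
    coalesced into maximal runs of equal count as (count, start, end) triples."""
    result = []
    for t in range(t_min, t_max):
        c = sum(snapshot(rel, t).values())
        if result and result[-1][0] == c:
            # Same count as the previous time point: extend the current run.
            result[-1] = (c, result[-1][1], t + 1)
        else:
            result.append((c, t, t + 1))
    return result


def snapshot_difference(r, s, t):
    """Bag difference (EXCEPT ALL) of the snapshots of r and s at time point t."""
    # Counter subtraction keeps only positive counts, i.e. multiset monus.
    return snapshot(r, t) - snapshot(s, t)
```

For rel = [("ann", 0, 3), ("bob", 2, 5), ("ann", 2, 4)], the call snapshot_count(rel, 0, 7) returns [(1, 0, 2), (3, 2, 3), (2, 3, 4), (1, 4, 5), (0, 5, 7)]. The last entry is the zero-count gap that per-snapshot evaluation requires.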
Finding k-Dissimilar Paths with Minimum Collective Length
Shortest path computation is a fundamental problem in road networks. However,
in many real-world scenarios, determining solely the shortest path is not
enough. In this paper, we study the problem of finding k-Dissimilar Paths with
Minimum Collective Length (kDPwML), which aims at computing a set of paths from
a source s to a target t such that all paths are pairwise dissimilar by at
least θ and the sum of the path lengths is minimal. We introduce an exact
algorithm for the kDPwML problem, which iterates over all possible s-t paths
while employing two pruning techniques to reduce the prohibitively expensive
computational cost. To achieve scalability, we also define the much smaller set
of the simple single-via paths, and we adapt two algorithms for kDPwML queries
to iterate over this set. Our experimental analysis on real road networks shows
that iterating over all paths is impractical, while iterating over the set of
simple single-via paths can lead to scalable solutions with only a small
trade-off in the quality of the results. Comment: Extended version of the SIGSPATIAL'18 paper under the same title.
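To make the problem statement concrete, the following brute-force sketch selects k pairwise dissimilar paths with minimum total length from a given, small list of candidate s-t paths. It is not the paper's exact algorithm: it neither enumerates all s-t paths nor applies any pruning, and it does not use single-via paths. It assumes a directed graph whose edge weights are stored in a dict keyed by (u, v). It also assumes an overlap-ratio similarity, defined as the length of the shared edges divided by the length of the shorter path, with dissimilarity defined as 1 minus similarity. The paper's exact similarity measure may differ. All function names are hypothetical.

```python
from itertools import combinations


def edges(path):
    """Set of directed edges (u, v) traversed by a node sequence."""
    return set(zip(path, path[1:]))


def length(path, w):
    """Total weight of a path under edge weights w[(u, v)]."""
    return sum(w[e] for e in zip(path, path[1:]))


def similarity(p, q, w):
    """Assumed overlap ratio: shared edge length / length of the shorter path."""
    shared = sum(w[e] for e in edges(p) & edges(q))
    return shared / min(length(p, w), length(q, w))


def k_dissimilar_min_length(paths, w, k, theta):
    """Exhaustively pick k candidate paths that are pairwise dissimilar by at
    least theta and have the minimum sum of lengths; returns (paths, cost)."""
    best, best_cost = None, float("inf")
    for combo in combinations(paths, k):
        if all(1 - similarity(p, q, w) >= theta for p, q in combinations(combo, 2)):
            cost = sum(length(p, w) for p in combo)
            if cost < best_cost:
                best, best_cost = combo, cost
    return best, best_cost
```

The number of combinations grows combinatorially with the number of candidates, which illustrates why an exact method needs pruning and why restricting the candidate set helps scalability.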
GEDLIB: A C++ Library for Graph Edit Distance Computation
The graph edit distance (GED) is a flexible graph dissimilarity measure widely used in the field of structural pattern recognition. In this paper, we present GEDLIB, a C++ library for computing GED exactly or approximately. Many existing GED algorithms are already implemented in GEDLIB. Moreover, GEDLIB is designed to be easily extensible: to implement new edit cost functions and GED algorithms, it suffices to implement abstract classes contained in the library. For implementing these extensions, the user has access to a wide range of utilities, such as deep neural networks, support vector machines, mixed integer linear programming solvers, a blackbox optimizer, and solvers for the linear sum assignment problem with and without error-correction.
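The extension mechanism described above (implement an abstract class, and generic code works with it) can be illustrated with the following Python sketch of the pattern. It is not GEDLIB's C++ API: the class and function names EditCosts, UniformCosts, and node_mapping_cost are hypothetical. For brevity, the sketch covers only node operations, with edges omitted. A node map is assumed to be a dict from nodes of the first graph to nodes of the second graph, where None denotes deletion.

```python
from abc import ABC, abstractmethod


class EditCosts(ABC):
    """Hypothetical edit cost interface: one method per node edit operation."""

    @abstractmethod
    def node_sub(self, label_u, label_v): ...

    @abstractmethod
    def node_del(self, label_u): ...

    @abstractmethod
    def node_ins(self, label_v): ...


class UniformCosts(EditCosts):
    """A concrete cost function: unit cost for every non-trivial operation."""

    def node_sub(self, label_u, label_v):
        return 0.0 if label_u == label_v else 1.0

    def node_del(self, label_u):
        return 1.0

    def node_ins(self, label_v):
        return 1.0


def node_mapping_cost(costs, labels_g, labels_h, mapping):
    """Node edit cost induced by a node map; works with any EditCosts subclass.

    Nodes of h that no node of g maps to are counted as insertions.
    """
    total = 0.0
    mapped = set()
    for u, v in mapping.items():
        if v is None:
            total += costs.node_del(labels_g[u])
        else:
            total += costs.node_sub(labels_g[u], labels_h[v])
            mapped.add(v)
    for v in labels_h:
        if v not in mapped:
            total += costs.node_ins(labels_h[v])
    return total
```

Under these assumptions, adding a new cost model only means subclassing EditCosts. The generic function node_mapping_cost needs no changes.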
Upper Bounding the Graph Edit Distance Based on Rings and Machine Learning
The graph edit distance (GED) is a flexible distance measure which is widely
used for inexact graph matching. Since its exact computation is NP-hard,
heuristics are used in practice. A popular approach is to obtain upper bounds
for GED via transformations to the linear sum assignment problem with
error-correction (LSAPE). Typically, this transformation is carried out using
local structures and the distances between them, but machine learning
techniques have recently been used as well. In this paper, we formally define a
unifying framework LSAPE-GED for transformations from GED to LSAPE. We also
introduce rings, a new kind of local structures designed for graphs where most
information resides in the topology rather than in the node labels.
Furthermore, we propose two new ring-based heuristics, RING and RING-ML, which
instantiate LSAPE-GED using the traditional and the machine-learning-based
approach for transforming GED to LSAPE, respectively. Extensive experiments
show that using rings for upper bounding GED significantly improves the state
of the art on datasets where most information resides in the graphs'
topologies. This closes the gap between fast but rather inaccurate LSAPE-based
heuristics and more accurate but significantly slower GED algorithms based on
local search.
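A standard observation behind LSAPE-based heuristics is that every node map induces a complete edit path, so the cost of that edit path is an upper bound on GED. The sketch below illustrates this under several assumptions. Graphs are undirected with labeled nodes and unlabeled edges, and each graph is given as a label dict plus an adjacency dict with integer nodes. All edit operations have unit cost. The local structure used to fill the LSAPE cost matrix is a simple stand-in (label mismatch plus half the degree difference), not the paper's rings. The LSAPE instance is reduced to a square linear sum assignment problem with infinite costs for forbidden cells. That problem is solved by brute force over permutations, which is only feasible for tiny graphs. Practical implementations use a Hungarian-type solver instead. All function names are hypothetical.

```python
from itertools import permutations

INF = float("inf")


def lsape_matrix(lab_g, adj_g, lab_h, adj_h):
    """Square (n+m) x (n+m) cost matrix encoding substitutions, deletions, insertions."""
    g, h = list(lab_g), list(lab_h)
    n, m = len(g), len(h)
    size = n + m
    C = [[INF] * size for _ in range(size)]
    for i, u in enumerate(g):
        for j, v in enumerate(h):
            # Assumed local cost: label mismatch plus half the degree difference.
            C[i][j] = (lab_g[u] != lab_h[v]) + abs(len(adj_g[u]) - len(adj_h[v])) / 2
        C[i][m + i] = 1 + len(adj_g[u]) / 2          # delete u (and half its edges)
    for j, v in enumerate(h):
        C[n + j][j] = 1 + len(adj_h[v]) / 2          # insert v (and half its edges)
    for i in range(n, size):
        for j in range(m, size):
            C[i][j] = 0.0                            # dummy-to-dummy assignments
    return g, h, C


def solve_lsap_bruteforce(C):
    """Optimal assignment by exhaustive search (tiny instances only)."""
    best, best_perm = INF, None
    for perm in permutations(range(len(C))):
        cost = sum(C[i][perm[i]] for i in range(len(C)))
        if cost < best:
            best, best_perm = cost, perm
    return best_perm


def induced_edit_cost(lab_g, adj_g, lab_h, adj_h, f):
    """Unit-cost edit path induced by node map f (G node -> H node or None)."""
    inv = {v: u for u, v in f.items() if v is not None}
    cost = sum(1 if v is None else int(lab_g[u] != lab_h[v]) for u, v in f.items())
    cost += sum(1 for v in lab_h if v not in inv)    # node insertions
    for u in adj_g:
        for w in adj_g[u]:
            if u < w:                                # each undirected edge once
                fu, fw = f[u], f[w]
                if fu is None or fw is None or fw not in adj_h[fu]:
                    cost += 1                        # edge deletion
    for x in adj_h:
        for y in adj_h[x]:
            if x < y:
                gx, gy = inv.get(x), inv.get(y)
                if gx is None or gy is None or gy not in adj_g[gx]:
                    cost += 1                        # edge insertion
    return cost


def ged_upper_bound(lab_g, adj_g, lab_h, adj_h):
    """Solve the LSAPE instance, read off a node map, and cost its edit path."""
    g, h, C = lsape_matrix(lab_g, adj_g, lab_h, adj_h)
    perm = solve_lsap_bruteforce(C)
    m = len(h)
    f = {u: (h[perm[i]] if perm[i] < m else None) for i, u in enumerate(g)}
    return induced_edit_cost(lab_g, adj_g, lab_h, adj_h, f)
```

Whatever local cost matrix is used, the returned value remains a valid upper bound, because it is the cost of an actual edit path. The quality of the local structure only affects how tight the bound is.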
Leveraging range joins for the computation of overlap joins
Joins are essential and potentially expensive operations in database management systems. When data is associated with time periods, joins commonly include predicates that require pairs of argument tuples to overlap in order to qualify for the result. Our goal is to enable built-in systems support for such joins. In particular, we present an approach where overlap joins are formulated as unions of range joins, which are more general-purpose than overlap joins (and thus useful in their own right) and are well supported by B+-trees. The approach is sufficiently flexible that it also supports joins with additional equality predicates, as well as open, closed, and half-open time periods over discrete and continuous domains, thus offering both generality and simplicity, which is important in a system setting. We provide both a stand-alone solution that performs on par with the state of the art and a DBMS-embedded solution that is able to exploit standard indexing and clearly outperforms existing DBMS solutions that depend on specialized indexing techniques. We offer both analytical and empirical evaluations of the proposals. The empirical study includes comparisons with pertinent existing proposals and offers detailed insight into the performance characteristics of the proposals.
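One way to see why an overlap join decomposes into range joins is shown below. This is a sketch for nonempty half-open periods only. The paper also handles open and closed periods and equality predicates, and its exact formulation may differ. Two periods [r.start, r.end) and [s.start, s.end) overlap exactly when either r.start ≤ s.start < r.end or s.start < r.start < s.end. The two cases are disjoint, so the overlap join is the union of two range joins on start points, with no duplicates to remove. In the sketch, sorted start lists with binary search stand in for B+-tree range scans. Tuples are assumed to be (id, start, end) triples, and the function name overlap_join is hypothetical.

```python
from bisect import bisect_left, bisect_right


def overlap_join(r, s):
    """Overlap join of r and s, computed as the union of two range joins.

    Tuples are (id, start, end) with nonempty half-open periods [start, end).
    Sorted start lists plus binary search stand in for B+-tree range scans.
    """
    s_sorted = sorted(s, key=lambda t: t[1])
    s_starts = [t[1] for t in s_sorted]
    r_sorted = sorted(r, key=lambda t: t[1])
    r_starts = [t[1] for t in r_sorted]
    result = []
    # Range join 1: s tuples whose start lies in [r.start, r.end).
    for r_id, r_start, r_end in r:
        lo = bisect_left(s_starts, r_start)
        hi = bisect_left(s_starts, r_end)
        result.extend((r_id, s_id) for s_id, _, _ in s_sorted[lo:hi])
    # Range join 2: r tuples whose start lies strictly inside (s.start, s.end).
    for s_id, s_start, s_end in s:
        lo = bisect_right(r_starts, s_start)
        hi = bisect_left(r_starts, s_end)
        result.extend((r_id, s_id) for r_id, _, _ in r_sorted[lo:hi])
    return result
```

For r = [("a", 1, 5), ("b", 6, 8)] and s = [("x", 3, 7), ("y", 8, 9), ("z", 0, 2)], the result contains exactly the pairs ("a", "x"), ("b", "x"), and ("a", "z"). Each range join touches only the qualifying index entries, which is why ordinary ordered indexes suffice.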