4,856 research outputs found
Continuous Nearest Neighbor Queries over Sliding Windows
Abstract—This paper studies continuous monitoring of nearest neighbor (NN) queries over sliding window streams. According to this model, data points continuously stream in the system, and they are considered valid only while they belong to a sliding window that contains 1) the W most recent arrivals (count-based) or 2) the arrivals within a fixed interval W covering the most recent time stamps (time-based). The task of the query processor is to constantly maintain the result of long-running NN queries among the valid data. We present two processing techniques that apply to both count-based and time-based windows. The first one adapts conceptual partitioning, the best existing method for continuous NN monitoring over update streams, to the sliding window model. The second technique reduces the problem to skyline maintenance in the distance-time space and precomputes the future changes in the NN set. We analyze the performance of both algorithms and extend them to variations of NN search. Finally, we compare their efficiency through a comprehensive experimental evaluation. The skyline-based algorithm achieves lower CPU cost, at the expense of slightly larger space overhead. Index Terms—Location-dependent and sensitive, spatial databases, query processing, nearest neighbors, data streams, sliding windows.
Reverse k Nearest Neighbor Search over Trajectories
GPS enables mobile devices to continuously provide new opportunities to
improve our daily lives. For example, the data collected in applications
created by Uber or Public Transport Authorities can be used to plan
transportation routes, estimate capacities, and proactively identify low
coverage areas. In this paper, we study a new kind of query-Reverse k Nearest
Neighbor Search over Trajectories (RkNNT), which can be used for route planning
and capacity estimation. Given a set of existing routes DR, a set of passenger
transitions DT, and a query route Q, a RkNNT query returns all transitions that
take Q as one of its k nearest travel routes. To solve the problem, we first
develop an index to handle dynamic trajectory updates, so that the most
up-to-date transition data are available for answering a RkNNT query. Then we
introduce a filter refinement framework for processing RkNNT queries using the
proposed indexes. Next, we show how to use RkNNT to solve the optimal route
planning problem MaxRkNNT (MinRkNNT), which is to search for the optimal route
from a start location to an end location that could attract the maximum (or
minimum) number of passengers based on a pre-defined travel distance threshold.
Experiments on real datasets demonstrate the efficiency and scalability of our
approaches. To the best of our best knowledge, this is the first work to study
the RkNNT problem for route planning.Comment: 12 page
Hierarchical Categories in Colored Searching
In colored range counting (CRC), the input is a set of points where each point is assigned a "color" (or a "category") and the goal is to store them in a data structure such that the number of distinct categories inside a given query range can be counted efficiently. CRC has strong motivations as it allows data structure to deal with categorical data.
However, colors (i.e., the categories) in the CRC problem do not have any internal structure, whereas this is not the case for many datasets in practice where hierarchical categories exists or where a single input belongs to multiple categories. Motivated by these, we consider variants of the problem where such structures can be represented. We define two variants of the problem called hierarchical range counting (HCC) and sub-category colored range counting (SCRC) and consider hierarchical structures that can either be a DAG or a tree. We show that the two problems on some special trees are in fact equivalent to other well-known problems in the literature. Based on these, we also give efficient data structures when the underlying hierarchy can be represented as a tree. We show a conditional lower bound for the general case when the existing hierarchy can be any DAG, through reductions from the orthogonal vectors problem
Efficient Construction of Probabilistic Tree Embeddings
In this paper we describe an algorithm that embeds a graph metric
on an undirected weighted graph into a distribution of tree metrics
such that for every pair , and
. Such embeddings have
proved highly useful in designing fast approximation algorithms, as many hard
problems on graphs are easy to solve on tree instances. For a graph with
vertices and edges, our algorithm runs in time with high
probability, which improves the previous upper bound of shown by
Mendel et al.\,in 2009.
The key component of our algorithm is a new approximate single-source
shortest-path algorithm, which implements the priority queue with a new data
structure, the "bucket-tree structure". The algorithm has three properties: it
only requires linear time in the number of edges in the input graph; the
computed distances have a distance preserving property; and when computing the
shortest-paths to the -nearest vertices from the source, it only requires to
visit these vertices and their edge lists. These properties are essential to
guarantee the correctness and the stated time bound.
Using this shortest-path algorithm, we show how to generate an intermediate
structure, the approximate dominance sequences of the input graph, in time, and further propose a simple yet efficient algorithm to converted
this sequence to a tree embedding in time, both with high
probability. Combining the three subroutines gives the stated time bound of the
algorithm.
Then we show that this efficient construction can facilitate some
applications. We proved that FRT trees (the generated tree embedding) are
Ramsey partitions with asymptotically tight bound, so the construction of a
series of distance oracles can be accelerated
- …