972 research outputs found

    Continuous Nearest Neighbor Queries over Sliding Windows

    Get PDF
    Abstract—This paper studies continuous monitoring of nearest neighbor (NN) queries over sliding window streams. According to this model, data points continuously stream in the system, and they are considered valid only while they belong to a sliding window that contains 1) the W most recent arrivals (count-based) or 2) the arrivals within a fixed interval W covering the most recent time stamps (time-based). The task of the query processor is to constantly maintain the result of long-running NN queries among the valid data. We present two processing techniques that apply to both count-based and time-based windows. The first one adapts conceptual partitioning, the best existing method for continuous NN monitoring over update streams, to the sliding window model. The second technique reduces the problem to skyline maintenance in the distance-time space and precomputes the future changes in the NN set. We analyze the performance of both algorithms and extend them to variations of NN search. Finally, we compare their efficiency through a comprehensive experimental evaluation. The skyline-based algorithm achieves lower CPU cost, at the expense of slightly larger space overhead. Index Terms—Location-dependent and sensitive, spatial databases, query processing, nearest neighbors, data streams, sliding windows.

    Orthogonal Range Reporting and Rectangle Stabbing for Fat Rectangles

    Full text link
    In this paper we study two geometric data structure problems in the special case when input objects or queries are fat rectangles. We show that in this case a significant improvement compared to the general case can be achieved. We describe data structures that answer two- and three-dimensional orthogonal range reporting queries in the case when the query range is a \emph{fat} rectangle. Our two-dimensional data structure uses O(n)O(n) words and supports queries in O(loglogU+k)O(\log\log U +k) time, where nn is the number of points in the data structure, UU is the size of the universe and kk is the number of points in the query range. Our three-dimensional data structure needs O(nlogεU)O(n\log^{\varepsilon}U) words of space and answers queries in O(loglogU+k)O(\log \log U + k) time. We also consider the rectangle stabbing problem on a set of three-dimensional fat rectangles. Our data structure uses O(n)O(n) space and answers stabbing queries in O(logUloglogU+k)O(\log U\log\log U +k) time.Comment: extended version of a WADS'19 pape

    I/O-Efficient Planar Range Skyline and Attrition Priority Queues

    Full text link
    In the planar range skyline reporting problem, we store a set P of n 2D points in a structure such that, given a query rectangle Q = [a_1, a_2] x [b_1, b_2], the maxima (a.k.a. skyline) of P \cap Q can be reported efficiently. The query is 3-sided if an edge of Q is grounded, giving rise to two variants: top-open (b_2 = \infty) and left-open (a_1 = -\infty) queries. All our results are in external memory under the O(n/B) space budget, for both the static and dynamic settings: * For static P, we give structures that answer top-open queries in O(log_B n + k/B), O(loglog_B U + k/B), and O(1 + k/B) I/Os when the universe is R^2, a U x U grid, and a rank space grid [O(n)]^2, respectively (where k is the number of reported points). The query complexity is optimal in all cases. * We show that the left-open case is harder, such that any linear-size structure must incur \Omega((n/B)^e + k/B) I/Os for a query. We show that this case is as difficult as the general 4-sided queries, for which we give a static structure with the optimal query cost O((n/B)^e + k/B). * We give a dynamic structure that supports top-open queries in O(log_2B^e (n/B) + k/B^1-e) I/Os, and updates in O(log_2B^e (n/B)) I/Os, for any e satisfying 0 \le e \le 1. This leads to a dynamic structure for 4-sided queries with optimal query cost O((n/B)^e + k/B), and amortized update cost O(log (n/B)). As a contribution of independent interest, we propose an I/O-efficient version of the fundamental structure priority queue with attrition (PQA). Our PQA supports FindMin, DeleteMin, and InsertAndAttrite all in O(1) worst case I/Os, and O(1/B) amortized I/Os per operation. We also add the new CatenateAndAttrite operation that catenates two PQAs in O(1) worst case and O(1/B) amortized I/Os. This operation is a non-trivial extension to the classic PQA of Sundar, even in internal memory.Comment: Appeared at PODS 2013, New York, 19 pages, 10 figures. arXiv admin note: text overlap with arXiv:1208.4511, arXiv:1207.234

    Efficient MaxCount and threshold operators of moving objects

    Get PDF
    Calculating operators of continuously moving objects presents some unique challenges, especially when the operators involve aggregation or the concept of congestion, which happens when the number of moving objects in a changing or dynamic query space exceeds some threshold value. This paper presents the following six d-dimensional moving object operators: (1) MaxCount (or MinCount), which finds the Maximum (or Minimum) number of moving objects simultaneously present in the dynamic query space at any time during the query time interval. (2) CountRange, which finds a count of point objects whose trajectories intersect the dynamic query space during the query time interval. (3) ThresholdRange, which finds the set of time intervals during which the dynamic query space is congested. (4) ThresholdSum, which finds the total length of all the time intervals during which the dynamic query space is congested. (5) ThresholdCount, which finds the number of disjoint time intervals during which the dynamic query space is congested. And (6) ThresholdAverage, which finds the average length of time of all the time intervals when the dynamic query space is congested. For these operators separate algorithms are given to find only estimate or only precise values. Experimental results from more than 7,500 queries indicate that the estimation algorithms produce fast, efficient results with error under 5%

    Indexing the Earth Mover's Distance Using Normal Distributions

    Full text link
    Querying uncertain data sets (represented as probability distributions) presents many challenges due to the large amount of data involved and the difficulties comparing uncertainty between distributions. The Earth Mover's Distance (EMD) has increasingly been employed to compare uncertain data due to its ability to effectively capture the differences between two distributions. Computing the EMD entails finding a solution to the transportation problem, which is computationally intensive. In this paper, we propose a new lower bound to the EMD and an index structure to significantly improve the performance of EMD based K-nearest neighbor (K-NN) queries on uncertain databases. We propose a new lower bound to the EMD that approximates the EMD on a projection vector. Each distribution is projected onto a vector and approximated by a normal distribution, as well as an accompanying error term. We then represent each normal as a point in a Hough transformed space. We then use the concept of stochastic dominance to implement an efficient index structure in the transformed space. We show that our method significantly decreases K-NN query time on uncertain databases. The index structure also scales well with database cardinality. It is well suited for heterogeneous data sets, helping to keep EMD based queries tractable as uncertain data sets become larger and more complex.Comment: VLDB201
    corecore