136,544 research outputs found

    Hybrid LSH: Faster Near Neighbors Reporting in High-dimensional Space

    We study the $r$-near neighbors reporting problem ($r$-NN), i.e., reporting \emph{all} points in a high-dimensional point set $S$ that lie within a radius $r$ of a given query point $q$. Our approach builds on the locality-sensitive hashing (LSH) framework because of its asymptotically sublinear query time for near neighbor search problems in high-dimensional space. A bottleneck of the traditional LSH scheme for solving $r$-NN is that its performance is sensitive to data-dependent and query-dependent parameters. On datasets whose data distributions have diverse local density patterns, LSH with inappropriate tuning parameters can sometimes be outperformed by a simple linear search. In this paper, we introduce a hybrid search strategy that combines LSH-based search and linear search for $r$-NN in high-dimensional space. By integrating an auxiliary data structure into the LSH hash tables, we can efficiently estimate the computational cost of LSH-based search for a given query regardless of the data distribution. This lets us choose, per query, whichever of LSH-based search and linear search is expected to perform better. Moreover, the integrated data structure is time efficient and fits well with many recent state-of-the-art LSH-based approaches. Our experiments on real-world datasets show that the hybrid search approach outperforms (or is comparable to) both LSH-based search and linear search for a wide range of search radii and data distributions in high-dimensional space. Comment: Accepted as a short paper in EDBT 201
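The per-query choice described in this abstract can be illustrated with a minimal Python sketch. It is not the paper's method: the abstract does not specify the auxiliary data structure, so this sketch assumes a much cruder cost proxy, the total size of the buckets the query hashes to. The class name `HybridIndex`, its parameters, and the use of a standard Euclidean (p-stable) LSH family are all assumptions made for illustration.

```python
import math
import random
from collections import defaultdict


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


class HybridIndex:
    """Toy r-NN index that picks LSH or linear search per query (illustrative only)."""

    def __init__(self, points, num_tables=4, num_hashes=3, width=4.0, seed=0):
        rng = random.Random(seed)
        dim = len(points[0])
        self.points = points
        self.width = width
        # One list of (random direction, random offset) pairs per hash table.
        self.hashes = [
            [([rng.gauss(0.0, 1.0) for _ in range(dim)], rng.uniform(0.0, width))
             for _ in range(num_hashes)]
            for _ in range(num_tables)
        ]
        self.tables = [defaultdict(list) for _ in range(num_tables)]
        for idx, p in enumerate(points):
            for table, funcs in zip(self.tables, self.hashes):
                table[self._key(funcs, p)].append(idx)

    def _key(self, funcs, p):
        # Euclidean LSH: quantize each random projection into cells of size `width`.
        return tuple(
            math.floor((sum(a_i * x_i for a_i, x_i in zip(a, p)) + b) / self.width)
            for a, b in funcs
        )

    def query(self, q, r):
        buckets = [
            table.get(self._key(funcs, q), [])
            for table, funcs in zip(self.tables, self.hashes)
        ]
        # Assumed cost proxy: candidates LSH would touch, counting duplicates.
        estimated_cost = sum(len(b) for b in buckets)
        if estimated_cost >= len(self.points):
            strategy, candidates = "linear", range(len(self.points))
        else:
            strategy, candidates = "lsh", set().union(*buckets)
        hits = sorted(
            i for i in candidates
            if squared_distance(self.points[i], q) <= r * r
        )
        return strategy, hits
```

In this sketch the linear branch is exact, while the LSH branch can miss true neighbors but never reports a point farther than $r$, because every candidate is verified against the query. Summing bucket sizes overcounts points that collide with the query in several tables, which is one reason a dedicated estimator of the distinct candidate count would give a sharper decision.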

    Orthogonal Range Reporting and Rectangle Stabbing for Fat Rectangles

    In this paper we study two geometric data structure problems in the special case when input objects or queries are fat rectangles. We show that in this case a significant improvement compared to the general case can be achieved. We describe data structures that answer two- and three-dimensional orthogonal range reporting queries in the case when the query range is a \emph{fat} rectangle. Our two-dimensional data structure uses $O(n)$ words and supports queries in $O(\log\log U + k)$ time, where $n$ is the number of points in the data structure, $U$ is the size of the universe and $k$ is the number of points in the query range. Our three-dimensional data structure needs $O(n\log^{\varepsilon} U)$ words of space and answers queries in $O(\log\log U + k)$ time. We also consider the rectangle stabbing problem on a set of three-dimensional fat rectangles. Our data structure uses $O(n)$ space and answers stabbing queries in $O(\log U \log\log U + k)$ time. Comment: extended version of a WADS'19 paper

    Optimal Color Range Reporting in One Dimension

    Color (or categorical) range reporting is a variant of the orthogonal range reporting problem in which every point in the input is assigned a \emph{color}. While the answer to an orthogonal point reporting query contains all points in the query range $Q$, the answer to a color reporting query contains only the distinct colors of points in $Q$. In this paper we describe an $O(N)$-space data structure that answers one-dimensional color reporting queries in optimal $O(k+1)$ time, where $k$ is the number of colors in the answer and $N$ is the number of points in the data structure. Our result can also be dynamized and extended to the external memory model.
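To make the query semantics concrete, here is a small Python baseline for one-dimensional color reporting. It is not the data structure from the abstract: it scans every point in the range, so its query time grows with the number of points in $Q$ rather than with the number of colors $k$. The function names `build_index` and `report_colors` are hypothetical.

```python
from bisect import bisect_left, bisect_right


def build_index(points):
    """Sort (coordinate, color) pairs and split them into parallel lists."""
    ordered = sorted(points)
    xs = [x for x, _ in ordered]
    colors = [c for _, c in ordered]
    return xs, colors


def report_colors(index, lo, hi):
    """Return the distinct colors of points with lo <= x <= hi, in first-seen order."""
    xs, colors = index
    start = bisect_left(xs, lo)
    end = bisect_right(xs, hi)
    # dict.fromkeys removes duplicates while preserving order.
    return list(dict.fromkeys(colors[start:end]))


index = build_index([(1, "red"), (3, "blue"), (4, "red"), (7, "green"), (9, "blue")])
print(report_colors(index, 2, 8))  # ['blue', 'red', 'green']
```

A point reporting query on the same range would return three points; the color query returns three colors here only because each point in the range happens to have a different color. If the range held many points of a single color, this baseline would still scan all of them, which is the cost the $O(k+1)$ bound avoids.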