18,893 research outputs found

    Lower Bounds for Oblivious Near-Neighbor Search

    Get PDF
    We prove an Ω(dlg⁥n/(lg⁥lg⁥n)2)\Omega(d \lg n/ (\lg\lg n)^2) lower bound on the dynamic cell-probe complexity of statistically oblivious\mathit{oblivious} approximate-near-neighbor search (ANN\mathsf{ANN}) over the dd-dimensional Hamming cube. For the natural setting of d=Θ(log⁥n)d = \Theta(\log n), our result implies an Ω~(lg⁥2n)\tilde{\Omega}(\lg^2 n) lower bound, which is a quadratic improvement over the highest (non-oblivious) cell-probe lower bound for ANN\mathsf{ANN}. This is the first super-logarithmic unconditional\mathit{unconditional} lower bound for ANN\mathsf{ANN} against general (non black-box) data structures. We also show that any oblivious static\mathit{static} data structure for decomposable search problems (like ANN\mathsf{ANN}) can be obliviously dynamized with O(log⁥n)O(\log n) overhead in update and query time, strengthening a classic result of Bentley and Saxe (Algorithmica, 1980).Comment: 28 page

    Dynamic Ordered Sets with Exponential Search Trees

    Full text link
    We introduce exponential search trees as a novel technique for converting static polynomial space search structures for ordered sets into fully-dynamic linear space data structures. This leads to an optimal bound of O(sqrt(log n/loglog n)) for searching and updating a dynamic set of n integer keys in linear space. Here searching an integer y means finding the maximum key in the set which is smaller than or equal to y. This problem is equivalent to the standard text book problem of maintaining an ordered set (see, e.g., Cormen, Leiserson, Rivest, and Stein: Introduction to Algorithms, 2nd ed., MIT Press, 2001). The best previous deterministic linear space bound was O(log n/loglog n) due Fredman and Willard from STOC 1990. No better deterministic search bound was known using polynomial space. We also get the following worst-case linear space trade-offs between the number n, the word length w, and the maximal key U < 2^w: O(min{loglog n+log n/log w, (loglog n)(loglog U)/(logloglog U)}). These trade-offs are, however, not likely to be optimal. Our results are generalized to finger searching and string searching, providing optimal results for both in terms of n.Comment: Revision corrects some typoes and state things better for applications in subsequent paper

    High-dimensional approximate nearest neighbor: k-d Generalized Randomized Forests

    Get PDF
    We propose a new data-structure, the generalized randomized kd forest, or kgeraf, for approximate nearest neighbor searching in high dimensions. In particular, we introduce new randomization techniques to specify a set of independently constructed trees where search is performed simultaneously, hence increasing accuracy. We omit backtracking, and we optimize distance computations, thus accelerating queries. We release public domain software geraf and we compare it to existing implementations of state-of-the-art methods including BBD-trees, Locality Sensitive Hashing, randomized kd forests, and product quantization. Experimental results indicate that our method would be the method of choice in dimensions around 1,000, and probably up to 10,000, and pointsets of cardinality up to a few hundred thousands or even one million; this range of inputs is encountered in many critical applications today. For instance, we handle a real dataset of 10610^6 images represented in 960 dimensions with a query time of less than 11sec on average and 90\% responses being true nearest neighbors

    Analysis of approximate nearest neighbor searching with clustered point sets

    Full text link
    We present an empirical analysis of data structures for approximate nearest neighbor searching. We compare the well-known optimized kd-tree splitting method against two alternative splitting methods. The first, called the sliding-midpoint method, which attempts to balance the goals of producing subdivision cells of bounded aspect ratio, while not producing any empty cells. The second, called the minimum-ambiguity method is a query-based approach. In addition to the data points, it is also given a training set of query points for preprocessing. It employs a simple greedy algorithm to select the splitting plane that minimizes the average amount of ambiguity in the choice of the nearest neighbor for the training points. We provide an empirical analysis comparing these two methods against the optimized kd-tree construction for a number of synthetically generated data and query sets. We demonstrate that for clustered data and query sets, these algorithms can provide significant improvements over the standard kd-tree construction for approximate nearest neighbor searching.Comment: 20 pages, 8 figures. Presented at ALENEX '99, Baltimore, MD, Jan 15-16, 199
    • 

    corecore