5,146 research outputs found

    Fast Cross-Polytope Locality-Sensitive Hashing

    Get PDF
    We provide a variant of cross-polytope locality sensitive hashing with respect to angular distance which is provably optimal in asymptotic sensitivity and enjoys O(dlnd)\mathcal{O}(d \ln d ) hash computation time. Building on a recent result (by Andoni, Indyk, Laarhoven, Razenshteyn, Schmidt, 2015), we show that optimal asymptotic sensitivity for cross-polytope LSH is retained even when the dense Gaussian matrix is replaced by a fast Johnson-Lindenstrauss transform followed by discrete pseudo-rotation, reducing the hash computation time from O(d2)\mathcal{O}(d^2) to O(dlnd)\mathcal{O}(d \ln d ). Moreover, our scheme achieves the optimal rate of convergence for sensitivity. By incorporating a low-randomness Johnson-Lindenstrauss transform, our scheme can be modified to require only O(ln9(d))\mathcal{O}(\ln^9(d)) random bitsComment: 14 pages, 6 figure

    Fast Locality-Sensitive Hashing Frameworks for Approximate Near Neighbor Search

    Full text link
    The Indyk-Motwani Locality-Sensitive Hashing (LSH) framework (STOC 1998) is a general technique for constructing a data structure to answer approximate near neighbor queries by using a distribution H\mathcal{H} over locality-sensitive hash functions that partition space. For a collection of nn points, after preprocessing, the query time is dominated by O(nρlogn)O(n^{\rho} \log n) evaluations of hash functions from H\mathcal{H} and O(nρ)O(n^{\rho}) hash table lookups and distance computations where ρ(0,1)\rho \in (0,1) is determined by the locality-sensitivity properties of H\mathcal{H}. It follows from a recent result by Dahlgaard et al. (FOCS 2017) that the number of locality-sensitive hash functions can be reduced to O(log2n)O(\log^2 n), leaving the query time to be dominated by O(nρ)O(n^{\rho}) distance computations and O(nρlogn)O(n^{\rho} \log n) additional word-RAM operations. We state this result as a general framework and provide a simpler analysis showing that the number of lookups and distance computations closely match the Indyk-Motwani framework, making it a viable replacement in practice. Using ideas from another locality-sensitive hashing framework by Andoni and Indyk (SODA 2006) we are able to reduce the number of additional word-RAM operations to O(nρ)O(n^\rho).Comment: 15 pages, 3 figure

    Author recognition using Locality Sensitive Hashing & Alergia (Stochastic Finite Automata)

    Get PDF
    In today’s world data grows very fast. It is difficult to answer questions like 1) Is the content completely written by this author, 2) Did he get few sentences or pages from another author, 3) Is there any way to identify actual author. There are many plagiarism software’s available in the market which identify duplicate content. It doesn’t understand writing pattern involved. There is always a necessity to make an effort to find the original author. Locality sensitive hashing is one such standard for applying hashing to recognize authors writing pattern

    Kernelized Locality-Sensitive Hashing for Fast Image Landmark Association

    Get PDF
    As the concept of war has evolved, navigation in urban environments where GPS may be degraded is increasingly becoming more important. Two existing solutions are vision-aided navigation and vision-based Simultaneous Localization and Mapping (SLAM). The problem, however, is that vision-based navigation techniques can require excessive amounts of memory and increased computational complexity resulting in a decrease in speed. This research focuses on techniques to improve such issues by speeding up and optimizing the data association process in vision-based SLAM. Specifically, this work studies the current methods that algorithms use to associate a current robot pose to that of one previously seen and introduce another method to the image mapping arena for comparison. The current method, kd-trees, is effcient in lower dimensions, but does not narrow the search space enough in higher dimensional datasets. In this research, Kernelized Locality-Sensitive Hashing (KLSH) is implemented to conduct the aforementioned pose associations. Results on KLSH shows that fewer image comparisons are required for location identification than that of other methods. This work can then be extended into a vision-SLAM implementation to subsequently produce a map

    Fast anomaly detection with locality-sensitive hashing and hyperparameter autotuning

    Get PDF
    This paper presents LSHAD, an anomaly detection (AD) method based on Locality Sensitive Hashing (LSH), capable of dealing with large-scale datasets. The resulting algorithm is highly parallelizable and its implementation in Apache Spark further increases its ability to handle very large datasets. Moreover, the algorithm incorporates an automatic hyperparameter tuning mechanism so that users do not have to implement costly manual tuning. Our LSHAD method is novel as both hyperparameter automation and distributed properties are not usual in AD techniques. Our results for experiments with LSHAD across a variety of datasets point to state-of-the-art AD performance while handling much larger datasets than state-of-the-art alternatives. In addition, evaluation results for the tradeoff between AD performance and scalability show that our method offers significant advantages over competing methods.This research has been financially supported in part by the Spanish Ministerio de Economía y Competitividad (project PID-2019-109238GB-C22) and by the Xunta de Galicia (grants ED431C 2018/34 and ED431G 2019/01) through European Union ERDF funds. CITIC, as a research center accredited by the Galician University System, is funded by the Consellería de Cultura, Educación e Universidades of the Xunta de Galicia, supported 80% through ERDF Funds (ERDF Operational Programme Galicia 2014–2020) and 20% by the Secretaría Xeral de Universidades (Grant ED431G 2019/01).This work was also supported by National Funds through the Portuguese FCT - Fundação para a Ciência e a Tecnologia (projects UIDB/00760/2020 and UIDP/00760/2020).info:eu-repo/semantics/publishedVersio

    A reliable order-statistics-based approximate nearest neighbor search algorithm

    Full text link
    We propose a new algorithm for fast approximate nearest neighbor search based on the properties of ordered vectors. Data vectors are classified based on the index and sign of their largest components, thereby partitioning the space in a number of cones centered in the origin. The query is itself classified, and the search starts from the selected cone and proceeds to neighboring ones. Overall, the proposed algorithm corresponds to locality sensitive hashing in the space of directions, with hashing based on the order of components. Thanks to the statistical features emerging through ordering, it deals very well with the challenging case of unstructured data, and is a valuable building block for more complex techniques dealing with structured data. Experiments on both simulated and real-world data prove the proposed algorithm to provide a state-of-the-art performance

    Fast kNN Graph Construction with Locality Sensitive Hashing

    Get PDF
    Abstract. The k nearest neighbors (kNN) graph, perhaps the most popular graph in machine learning, plays an essential role for graphbased learning methods. Despite its many elegant properties, the brute force kNN graph construction method has computational complexity of O(n 2 ), which is prohibitive for large scale data sets. In this paper, based on the divide-and-conquer strategy, we propose an efficient algorithm for approximating kNN graphs, which has the time complexity of O(l(d + log n)n) only (d is the dimensionality and l is usually a small number). This is much faster than most existing fast methods. Specifically, we engage the locality sensitive hashing technique to divide items into small subsets with equal size, and then build one kNN graph on each subset using the brute force method. To enhance the approximation quality, we repeat this procedure for several times to generate multiple basic approximate graphs, and combine them to yield a high quality graph. Compared with existing methods, the proposed approach has features that are: (1) much more efficient in speed (2) applicable to generic similarity measures; (3) easy to parallelize. Finally, on three benchmark large-scale data sets, our method beats existing fast methods with obvious advantages
    corecore