2,341 research outputs found

    Accelerating Nearest Neighbor Search on Manycore Systems

    Full text link
    We develop methods for accelerating metric similarity search that are effective on modern hardware. Our algorithms factor into easily parallelizable components, making them simple to deploy and efficient on multicore CPUs and GPUs. Despite the simple structure of our algorithms, their search performance is provably sublinear in the size of the database, with a factor dependent only on its intrinsic dimensionality. We demonstrate that our methods provide substantial speedups on a range of datasets and hardware platforms. In particular, we present results on a 48-core server machine, on graphics hardware, and on a multicore desktop.
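    The claim above hinges on the search factoring into easily parallelizable components. As a rough illustration of that flavor of building block (not the authors' algorithm; all names and parameters below are hypothetical), here is an exact nearest-neighbor scan sharded across worker threads:

```python
# Minimal sketch of a data-parallel nearest-neighbor scan: the database is
# split into shards, each shard is searched independently, and the per-shard
# winners are merged. Illustrative only; the paper's contribution is achieving
# provably sublinear query time, which a flat scan like this does not.
import numpy as np
from concurrent.futures import ThreadPoolExecutor

def _chunk_nn(query, chunk, offset):
    # Squared Euclidean distances from the query to one shard of the database.
    d = np.sum((chunk - query) ** 2, axis=1)
    i = int(np.argmin(d))
    return d[i], offset + i

def parallel_nn(query, database, n_workers=8):
    shards = np.array_split(database, n_workers)
    offsets = np.cumsum([0] + [len(s) for s in shards[:-1]])
    with ThreadPoolExecutor(n_workers) as pool:
        results = pool.map(_chunk_nn, [query] * n_workers, shards, offsets)
        return min(results)[1]   # index of the global nearest neighbor

rng = np.random.default_rng(0)
db = rng.normal(size=(100_000, 64)).astype(np.float32)
q = rng.normal(size=64).astype(np.float32)
print(parallel_nn(q, db))
```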

    Bolt: Accelerated Data Mining with Fast Vector Compression

    Full text link
    Vectors of data are at the heart of machine learning and data mining. Recently, vector quantization methods have shown great promise in reducing both the time and space costs of operating on vectors. We introduce a vector quantization algorithm that can compress vectors over 12x faster than existing techniques while also accelerating approximate vector operations such as distance and dot product computations by up to 10x. Because it can encode over 2GB of vectors per second, it makes vector quantization cheap enough to employ in many more circumstances. For example, using our technique to compute approximate dot products in a nested loop can multiply matrices faster than a state-of-the-art BLAS implementation, even when our algorithm must first compress the matrices. In addition to showing the above speedups, we demonstrate that our approach can accelerate nearest neighbor search and maximum inner product search by over 100x compared to floating point operations and up to 10x compared to other vector quantization methods. Our approximate Euclidean distance and dot product computations are not only faster than those of related algorithms with slower encodings, but also faster than Hamming distance computations, which have direct hardware support on the tested platforms. We also assess the errors of our algorithm's approximate distances and dot products, and find that it is competitive with existing, slower vector quantization algorithms.
    Comment: Research track paper at KDD 2017
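    Bolt belongs to the product-quantization family: each vector is split into subspaces, each subspace is encoded as the index of its nearest codebook centroid, and approximate distances are then read out of per-subspace lookup tables. Below is a minimal sketch of that family (classic 8-bit PQ with asymmetric distance computation); Bolt itself uses smaller, faster codebooks and learned quantization, which this toy omits:

```python
# Hedged sketch of product quantization with lookup-table distances, the
# technique family Bolt accelerates. Not Bolt's actual algorithm.
import numpy as np

def train_pq(X, n_subspaces=8, n_centroids=256, n_iter=10, seed=0):
    rng = np.random.default_rng(seed)
    books = []
    for S in np.split(X, n_subspaces, axis=1):
        C = S[rng.choice(len(S), n_centroids, replace=False)]
        for _ in range(n_iter):                  # plain k-means per subspace
            assign = np.argmin(((S[:, None] - C) ** 2).sum(-1), axis=1)
            for k in range(n_centroids):
                pts = S[assign == k]
                if len(pts):
                    C[k] = pts.mean(0)
        books.append(C)
    return books

def encode(X, books):
    codes = [np.argmin(((S[:, None] - C) ** 2).sum(-1), axis=1)
             for S, C in zip(np.split(X, len(books), axis=1), books)]
    return np.stack(codes, axis=1).astype(np.uint8)

def adc_distances(q, codes, books):
    # One lookup table per subspace: distance from the query slice to each centroid.
    luts = [((C - qs) ** 2).sum(-1) for qs, C in zip(np.split(q, len(books)), books)]
    return sum(lut[codes[:, j]] for j, lut in enumerate(luts))

rng = np.random.default_rng(0)
X = rng.normal(size=(2_000, 64)).astype(np.float32)
books = train_pq(X)
codes = encode(X, books)
print(np.argmin(adc_distances(X[0], codes, books)))   # should usually be 0
```

    The speed of this scheme comes from replacing full-dimensional arithmetic with a handful of table lookups per encoded vector, which is what makes hardware-friendly variants like Bolt competitive with, and faster than, native Hamming-distance instructions.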

    High-dimensional approximate nearest neighbor: k-d Generalized Randomized Forests

    Get PDF
    We propose a new data structure, the generalized randomized k-d forest, or k-GeRaF, for approximate nearest neighbor searching in high dimensions. In particular, we introduce new randomization techniques to specify a set of independently constructed trees in which search is performed simultaneously, hence increasing accuracy. We omit backtracking, and we optimize distance computations, thus accelerating queries. We release the public-domain software GeRaF and compare it to existing implementations of state-of-the-art methods, including BBD-trees, Locality Sensitive Hashing, randomized k-d forests, and product quantization. Experimental results indicate that our method would be the method of choice in dimensions around 1,000, and probably up to 10,000, for pointsets of cardinality up to a few hundred thousand or even one million; this range of inputs is encountered in many critical applications today. For instance, we handle a real dataset of 10^6 images represented in 960 dimensions with a query time of less than 1 sec on average and 90% of responses being true nearest neighbors.
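    The search pattern described above, several independently randomized trees each descended once with no backtracking, with the pooled leaf candidates re-ranked by exact distance, can be sketched as follows. This is a simplified toy, not the GeRaF code; the split heuristic and parameters are assumptions:

```python
# Toy randomized k-d forest: each tree randomizes its split dimension among
# the highest-variance coordinates; queries descend every tree to one leaf.
import numpy as np

def build_tree(idx, X, rng, leaf_size=32):
    if len(idx) <= leaf_size:
        return ('leaf', idx)
    var = X[idx].var(0)
    dim = rng.choice(np.argsort(var)[-5:])   # random high-variance dimension
    med = np.median(X[idx, dim])
    left, right = idx[X[idx, dim] <= med], idx[X[idx, dim] > med]
    if len(left) == 0 or len(right) == 0:
        return ('leaf', idx)
    return ('node', dim, med,
            build_tree(left, X, rng, leaf_size),
            build_tree(right, X, rng, leaf_size))

def descend(tree, q):
    # No backtracking: follow a single root-to-leaf path.
    while tree[0] == 'node':
        _, dim, med, lo, hi = tree
        tree = lo if q[dim] <= med else hi
    return tree[1]

def forest_query(q, X, trees):
    cand = np.unique(np.concatenate([descend(t, q) for t in trees]))
    d = ((X[cand] - q) ** 2).sum(1)           # exact re-ranking of candidates
    return cand[np.argmin(d)]

rng = np.random.default_rng(0)
X = rng.normal(size=(20_000, 128)).astype(np.float32)
trees = [build_tree(np.arange(len(X)), X, rng) for _ in range(8)]
print(forest_query(rng.normal(size=128).astype(np.float32), X, trees))
```

    The accuracy gain of the forest comes from the union of leaves: a single tree's leaf may miss the true neighbor, but eight independently randomized trees rarely all miss it.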

    Efficient Large-scale Approximate Nearest Neighbor Search on the GPU

    Full text link
    We present a new approach for efficient approximate nearest neighbor (ANN) search in high dimensional spaces, extending the idea of Product Quantization. We propose a two-level product and vector quantization tree that reduces the number of vector comparisons required during tree traversal. Our approach also includes a novel, highly parallelizable re-ranking method for candidate vectors that efficiently reuses already computed intermediate values. Due to its small memory footprint during traversal, the method lends itself to an efficient, parallel GPU implementation. This Product Quantization Tree (PQT) approach significantly outperforms recent state-of-the-art methods for high dimensional nearest neighbor queries on standard reference datasets. Ours is the first work that demonstrates GPU performance superior to CPU performance on high dimensional, large scale ANN problems in time-critical real-world applications, like loop closing in videos.
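    For intuition about the two-level structure, here is a hedged sketch of the general coarse-quantize-then-re-rank pattern that the PQT refines. This toy uses a single k-means coarse level and exact re-ranking; the actual method uses product quantizers at both levels plus a GPU-parallel re-ranking pass, neither of which is shown here:

```python
# Coarse quantizer narrows the search to a few cells; only vectors in those
# cells are compared against the query. Illustrative pattern, not the PQT.
import numpy as np

def kmeans_assign(X, C):
    # Distances computed one centroid at a time to keep memory modest.
    return np.argmin(np.stack([((X - c) ** 2).sum(1) for c in C]), axis=0)

def build_index(X, n_cells=64, n_iter=10, seed=0):
    rng = np.random.default_rng(seed)
    C = X[rng.choice(len(X), n_cells, replace=False)].copy()
    for _ in range(n_iter):                       # coarse k-means
        assign = kmeans_assign(X, C)
        for k in range(n_cells):
            if (assign == k).any():
                C[k] = X[assign == k].mean(0)
    lists = [np.flatnonzero(assign == k) for k in range(n_cells)]
    return C, lists

def ann_query(q, X, C, lists, n_probe=4):
    cells = np.argsort(((C - q) ** 2).sum(1))[:n_probe]   # closest coarse cells
    cand = np.concatenate([lists[c] for c in cells])
    return cand[np.argmin(((X[cand] - q) ** 2).sum(1))]

X = np.random.default_rng(1).normal(size=(20_000, 32)).astype(np.float32)
C, lists = build_index(X)
print(ann_query(X[123], X, C, lists))   # usually 123
```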

    Acceleration and Parallelization of ZENO/Walk-on-Spheres

    Get PDF
    This paper describes our ongoing work to accelerate ZENO, a software tool based on Monte Carlo methods (MCMs) used for computing material properties at the nanoscale. ZENO employs three main algorithms: (1) Walk on Spheres (WoS), (2) interior sampling, and (3) surface sampling. We have accelerated the first two; for brevity, this paper discusses only the first, as it is the most commonly used and the acceleration techniques were similar in both cases.

    WoS is a Brownian-motion MCM for solving a class of partial differential equations (PDEs). It provides a stochastic solution to a PDE by estimating the probability that a random walk started at infinity will hit the surface of the material under consideration. WoS is highly effective when the problem's geometry is additive, as this greatly reduces the number of walk steps needed to achieve accurate results. The walks start on the surface of an enclosing sphere and can make much larger jumps than in a direct simulation of Brownian motion. Our current implementation represents the molecular structure of nanomaterials as a union of possibly overlapping spheres. The core processing bottleneck in WoS is a computational-geometry one: the algorithm repeatedly determines the distance from the query point to the material surface in each step of the random walk.

    In this paper, we present results from benchmarking spatial data structures, including several open-source implementations of k-d trees, for accelerating WoS algorithmically. The paper also presents results from our multicore and cluster parallel implementation, showing that it exhibits linear strong scaling with the number of cores and compute nodes; this implementation delivers up to four orders of magnitude speedup compared to the original FORTRAN code when run on 8 nodes (each with dual 6-core Intel Xeon CPUs) with 24 threads per node.
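    For readers unfamiliar with WoS, a minimal sketch of the kernel on a union-of-spheres geometry follows. The linear-scan distance query marks exactly the bottleneck that the benchmarked k-d trees replace; the escape/return handling below is a crude assumption for illustration, not ZENO's exact scheme:

```python
# Hedged sketch of a Walk-on-Spheres hit/escape walk over a union of spheres.
import numpy as np

rng = np.random.default_rng(0)

def dist_to_surface(p, centers, radii):
    # Distance from p to the union of spheres: a linear scan over all spheres.
    # This per-step query is the hot spot that spatial data structures speed up.
    return np.min(np.linalg.norm(centers - p, axis=1) - radii)

def random_unit_vector():
    v = rng.normal(size=3)
    return v / np.linalg.norm(v)

def walk_on_spheres(centers, radii, launch_radius, eps=1e-4, max_steps=10_000):
    p = launch_radius * random_unit_vector()    # start on the launch sphere
    for _ in range(max_steps):
        d = dist_to_surface(p, centers, radii)
        if d < eps:
            return True                          # walk hit the material surface
        p = p + d * random_unit_vector()         # jump to the WoS sphere boundary
        r = np.linalg.norm(p)
        if r > launch_radius:
            # Outside the launch sphere, the walk returns with probability
            # launch_radius / r, otherwise it escapes to infinity.
            if rng.random() > launch_radius / r:
                return False
            p = p * (launch_radius / r)  # crude re-projection (assumption; the
                                         # exact method uses a return distribution)
    return False

# Two overlapping unit spheres; the launch sphere comfortably encloses them.
centers = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
radii = np.array([1.0, 1.0])
hits = sum(walk_on_spheres(centers, radii, launch_radius=3.0) for _ in range(2_000))
print("hit fraction ~", hits / 2_000)
```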