7,334 research outputs found

    Fast kk-NNG construction with GPU-based quick multi-select

    Full text link
    In this paper we describe a new brute force algorithm for building the kk-Nearest Neighbor Graph (kk-NNG). The kk-NNG algorithm has many applications in areas such as machine learning, bio-informatics, and clustering analysis. While there are very efficient algorithms for data of low dimensions, for high dimensional data the brute force search is the best algorithm. There are two main parts to the algorithm: the first part is finding the distances between the input vectors which may be formulated as a matrix multiplication problem. The second is the selection of the kk-NNs for each of the query vectors. For the second part, we describe a novel graphics processing unit (GPU) -based multi-select algorithm based on quick sort. Our optimization makes clever use of warp voting functions available on the latest GPUs along with use-controlled cache. Benchmarks show significant improvement over state-of-the-art implementations of the kk-NN search on GPUs

    Contract-Based General-Purpose GPU Programming

    Get PDF
    Using GPUs as general-purpose processors has revolutionized parallel computing by offering, for a large and growing set of algorithms, massive data-parallelization on desktop machines. An obstacle to widespread adoption, however, is the difficulty of programming them and the low-level control of the hardware required to achieve good performance. This paper suggests a programming library, SafeGPU, that aims at striking a balance between programmer productivity and performance, by making GPU data-parallel operations accessible from within a classical object-oriented programming language. The solution is integrated with the design-by-contract approach, which increases confidence in functional program correctness by embedding executable program specifications into the program text. We show that our library leads to modular and maintainable code that is accessible to GPGPU non-experts, while providing performance that is comparable with hand-written CUDA code. Furthermore, runtime contract checking turns out to be feasible, as the contracts can be executed on the GPU
    • …
    corecore