9 research outputs found

    Accelerating Nearest Neighbor Search on Manycore Systems

    We develop methods for accelerating metric similarity search that are effective on modern hardware. Our algorithms factor into easily parallelizable components, making them simple to deploy and efficient on multicore CPUs and GPUs. Despite the simple structure of our algorithms, their search performance is provably sublinear in the size of the database, with a factor dependent only on its intrinsic dimensionality. We demonstrate that our methods provide substantial speedups on a range of datasets and hardware platforms. In particular, we present results on a 48-core server machine, on graphics hardware, and on a multicore desktop.
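
    One way to realize the "easily parallelizable components" the abstract describes is a two-level scheme: pick a set of representative points, group every database point under its nearest representative, and answer a query by first comparing it against the representatives and then refining inside the closest few groups. The sketch below is a minimal NumPy illustration of that general idea, not the paper's actual algorithm; all names and parameters are invented.

```python
# Hypothetical sketch of a two-level, batch-parallel nearest neighbor search
# (representatives + per-group refinement). Names and parameters are illustrative.
import numpy as np

def build_index(X, n_reps=None, rng=None):
    rng = np.random.default_rng(rng)
    n = len(X)
    n_reps = n_reps or int(np.sqrt(n))
    reps = rng.choice(n, size=n_reps, replace=False)       # representative points
    R = X[reps]
    # assign every database point to its nearest representative (one dense matrix op)
    d = ((X[:, None, :] - R[None, :, :]) ** 2).sum(-1)
    owner = d.argmin(axis=1)
    groups = [np.flatnonzero(owner == j) for j in range(n_reps)]
    return R, groups

def query(X, R, groups, q, probe=3):
    dr = ((R - q) ** 2).sum(-1)                             # phase 1: distances to representatives
    cand = np.concatenate([groups[j] for j in np.argsort(dr)[:probe]])
    dc = ((X[cand] - q) ** 2).sum(-1)                       # phase 2: refine in the closest groups
    return cand[dc.argmin()]

X = np.random.rand(10000, 8).astype(np.float32)
R, groups = build_index(X, rng=0)
print(query(X, R, groups, X[42]))   # should print 42 (or a very close neighbor)
```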

    Bigger Buffer k-d Trees on Multi-Many-Core Systems

    A buffer k-d tree is a k-d tree variant for massively parallel nearest neighbor search. While providing valuable speed-ups on modern many-core devices when both a large number of reference and query points are given, buffer k-d trees are limited by the number of points that can fit on a single device. In this work, we show how to modify the original data structure and the associated workflow to make the overall approach capable of dealing with massive data sets. We further provide a simple yet efficient way of using the multiple devices available in a single workstation. The applicability of the modified framework is demonstrated in the context of astronomy, a field that is faced with huge amounts of data.
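
    The buffering idea can be illustrated on the CPU alone: route each query down to a leaf, collect the queries that land in the same leaf into a buffer, and answer every buffer with one batched distance computation, which is exactly the kind of regular, dense work that maps well to a GPU. The sketch below is a toy version with invented names; it uses only 1-D splits and omits the backtracking and GPU offloading of the real buffer k-d tree.

```python
# Toy, CPU-only illustration of the "buffer" idea: queries are routed to leaves,
# collected in per-leaf buffers, and each buffer is answered in one batched pass.
import numpy as np

def build_leaves(X, leaf_size=256):
    order = np.argsort(X[:, 0])                  # 1-D splits only, to keep the sketch short
    chunks = np.array_split(order, max(1, len(X) // leaf_size))
    bounds = [X[c[-1], 0] for c in chunks[:-1]]  # split values between consecutive chunks
    return np.array(bounds), [X[c] for c in chunks]

def route(bounds, q):
    return int(np.searchsorted(bounds, q[0]))    # leaf index a query falls into

def process_buffer(leaf_pts, queries):
    # batched brute force: one distance matrix per buffer (GPU-friendly shape)
    d = ((queries[:, None, :] - leaf_pts[None, :, :]) ** 2).sum(-1)
    return d.min(axis=1)                         # distance to nearest point in this leaf

X = np.random.rand(50000, 3)
Q = np.random.rand(1000, 3)
bounds, leaves = build_leaves(X)
buffers = {}
for q in Q:                                      # fill per-leaf buffers
    buffers.setdefault(route(bounds, q), []).append(q)
results = {leaf: process_buffer(leaves[leaf], np.array(qs))
           for leaf, qs in buffers.items()}      # process each buffer as one batch
```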

    A Resource-Aware Nearest-Neighbor Search Algorithm for K-Dimensional Trees

    Kd-tree search is widely used in computer vision today, for example in object recognition, to process a large set of features and identify the objects in a scene. However, search times vary widely with the size of the data set to be processed, the number of objects present in the frame, the size and shape of the kd-tree, and other factors. Constraining the search interval is critical for real-time applications in order to avoid frame drops and achieve a good response time. The inherent parallelism in the algorithm can be exploited by massively parallel architectures such as many-core processors. However, the variation in execution time is more pronounced on such hardware due to shared resources and the dynamically varying load created by concurrently running applications. In this work, we propose a new resource-aware nearest-neighbor search algorithm for kd-trees on many-core processors. The algorithm adapts itself to the dynamically varying load on a many-core processor, achieving a good response time and avoiding frame drops. The results show significant improvements in performance and detection rate compared to the conventional approach, while the simplicity of the conventional algorithm is retained in the new model.
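
    The core of a resource-aware search is a policy that shrinks the per-query effort budget as the observed load grows, combined with a search routine that can stop early and return its best result so far. The sketch below illustrates that idea with an anytime linear scan and an invented load model (Unix-only load probe); the algorithm described above operates on kd-trees, so treat this purely as an illustration of the budgeting mechanism.

```python
# Hedged sketch: derive a per-query time budget from the current load and stop
# the (approximate) search early when the budget runs out. Constants are invented.
import os
import time
import numpy as np

def effort_budget(frame_deadline_s=0.010, load=None):
    # Scale the per-query time budget down when the machine is busy (Unix-only probe).
    load = os.getloadavg()[0] / os.cpu_count() if load is None else load
    return frame_deadline_s / (1.0 + max(0.0, load))

def anytime_nn(X, q, budget_s, block=2048):
    # Scan candidate blocks until the time budget expires; return best-so-far.
    deadline = time.perf_counter() + budget_s
    best_d, best_i = np.inf, -1
    for start in range(0, len(X), block):
        d = ((X[start:start + block] - q) ** 2).sum(-1)
        i = int(d.argmin())
        if d[i] < best_d:
            best_d, best_i = float(d[i]), start + i
        if time.perf_counter() > deadline:       # budget exhausted: return early
            break
    return best_i, best_d

X = np.random.rand(200000, 16).astype(np.float32)
idx, dist = anytime_nn(X, X[7], effort_budget())
```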

    Sistema de digitalização 3D usando super-resolução em imagens RGBD

    Advisor: Prof. Dr. Luciano Silva; co-advisor: Prof. Dr. Olga R. P. Bellon. Master's dissertation, Universidade Federal do Paraná, Setor de Ciências Exatas, Graduate Program in Informatics. Defended in Curitiba, 10/09/2013. Includes references. Abstract: With the advent of new low-cost depth sensors and the increasing parallel processing power of graphics cards, there has been a significant increase in research on real-time 3D reconstruction. The IMAGO research group maintains a 3D reconstruction system for digital preservation built around high-resolution laser scanners. To increase the flexibility of this system, the goal of this work is to extend IMAGO's current 3D reconstruction pipeline so that models can be created with the new real-time depth sensors. A further objective is to apply a method that processes the sensors' low-quality images, so that models are reconstructed from images of higher resolution. The main aim of digital preservation is fidelity in both the geometry and the texture of the final model; computational time and cost are secondary goals. The new pipeline therefore comprises three steps: real-time geometric modeling, super-resolution, and high-cost 3D reconstruction. The first step provides complete capture and storage of all images in real time, using the continuously updated model only to guide the user. In the second step, the quality and resolution of the captured images are increased so that a more faithful model can be created in the final step, the 3D reconstruction carried out with IMAGO's current system.
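
    The three-stage flow described above can be pictured as a short pipeline: capture and store every low-resolution frame in real time, super-resolve the stored frames, and only then run the expensive reconstruction. The toy sketch below is runnable, but every operation is a NumPy stand-in (random frames, nearest-neighbor upsampling, averaging); it reflects none of IMAGO's actual code.

```python
# Toy, runnable sketch of the three-stage flow. All operations are placeholders.
import numpy as np

def capture_stage(n_frames=30, h=60, w=80, rng=np.random.default_rng(0)):
    # Stage 1: store every low-resolution depth frame in real time.
    return [rng.random((h, w)).astype(np.float32) for _ in range(n_frames)]

def super_resolution_stage(frames, scale=2):
    # Stage 2: upscale the stored frames (nearest-neighbor stand-in for a real SR method).
    return [np.kron(f, np.ones((scale, scale), dtype=f.dtype)) for f in frames]

def reconstruction_stage(frames):
    # Stage 3: expensive offline reconstruction (placeholder: fuse by averaging).
    return np.mean(np.stack(frames), axis=0)

model = reconstruction_stage(super_resolution_stage(capture_stage()))
print(model.shape)   # (120, 160)
```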

    Exploiting Graphics Processing Units for Massively Parallel Multi-Dimensional Indexing

    Department of Computer Engineering. Scientific applications process truly large amounts of multi-dimensional data. To efficiently navigate such datasets, various multi-dimensional indexing structures, such as the R-tree, have been studied extensively for the past couple of decades. Since the GPU has emerged as a new cost-effective performance accelerator, it is now common to leverage its massive parallelism in applications such as medical image processing, computational chemistry, and particle physics. However, hierarchical multi-dimensional indexing structures are inherently ill suited for parallel processing because their irregular memory access patterns make it difficult to exploit massive parallelism. Moreover, recursive tree traversal often fails due to the small run-time stack and cache memory of the GPU. First, we propose the Massively Parallel Three-phase Scanning (MPTS) R-tree traversal algorithm, which avoids irregular memory access patterns and recursive tree traversal so that the GPU can access tree nodes in a sequential manner. The experimental study shows that the MPTS traversal algorithm consistently outperforms the traditional recursive R-tree search algorithm for multi-dimensional range query processing. Next, we focus on reducing query response time by extending n-ary multi-dimensional indexing structures such as the R-tree so that a large number of GPU threads cooperate to process a single query in parallel, since the number of concurrent queries submitted in scientific data analysis is relatively small compared to enterprise database systems or ray tracing in computer graphics. Hence, we propose a novel R-tree variant, the Massively Parallel Hilbert R-Tree (MPHR-Tree), designed for a novel parallel tree traversal algorithm, Massively Parallel Restart Scanning (MPRS). The MPRS algorithm traverses the MPHR-Tree with mostly contiguous memory access patterns and without recursion, which offers more opportunities to optimize the parallel SIMD algorithm. Our extensive experimental results show that the MPRS algorithm outperforms other stackless tree traversal algorithms designed for efficient ray tracing in the computer graphics community. Furthermore, we develop a query co-processing scheme that makes use of both the CPU and the GPU. In this approach, we store the internal nodes and leaf nodes of the tree in CPU host memory and GPU device memory, respectively. The CPU traverses the internal nodes, because the conditional branches in hierarchical tree structures often cause a serious warp divergence problem on the GPU, while the GPU scans a large number of leaf nodes in parallel based on the selection ratio of a given range query; the GPU is well known to be superior to the CPU for parallel scanning. The experimental results show that our proposed multi-dimensional range query co-processing scheme improves query response time by up to 12x and query throughput by up to 4x compared to the state-of-the-art GPU tree traversal algorithm.
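
    The co-processing scheme's division of labor can be illustrated with a flat two-level index: a small array of upper-level bounding boxes is scanned sequentially (the CPU's job, with no recursion or run-time stack), and the surviving leaf blocks are then filtered in one bulk, vectorized pass (standing in for the GPU leaf scan). The layout and names below are invented for illustration and are not the MPTS or MPHR-Tree structures.

```python
# Hedged sketch of CPU/GPU co-processing on a flat two-level index.
import numpy as np

def build_two_level(X, block=1024):
    order = np.argsort(X[:, 0])                       # crude spatial ordering
    blocks = np.array_split(order, max(1, len(X) // block))
    lo = np.array([X[b].min(axis=0) for b in blocks]) # per-block bounding boxes
    hi = np.array([X[b].max(axis=0) for b in blocks])
    return blocks, lo, hi

def range_query(X, blocks, lo, hi, qlo, qhi):
    # "CPU" phase: sequential, stackless scan of the upper-level boxes.
    hits = [i for i in range(len(blocks))
            if np.all(hi[i] >= qlo) and np.all(lo[i] <= qhi)]
    if not hits:
        return np.empty(0, dtype=int)
    # "GPU" phase: one bulk scan over all candidate leaf points.
    cand = np.concatenate([blocks[i] for i in hits])
    P = X[cand]
    inside = np.all((P >= qlo) & (P <= qhi), axis=1)
    return cand[inside]

X = np.random.rand(100000, 3)
blocks, lo, hi = build_two_level(X)
ids = range_query(X, blocks, lo, hi, np.array([0.2] * 3), np.array([0.25] * 3))
```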

    Massively Parallel Algorithms for Point Cloud Based Object Recognition on Heterogeneous Architecture

    With the advent of new commodity depth sensors, point cloud data processing plays an increasingly important role in object recognition and perception. However, the computational cost of point cloud data processing is extremely high due to the large data size, high dimensionality, and algorithmic complexity. To address the computational challenges of real-time processing, this work investigates the use of modern heterogeneous computing platforms and their supporting ecosystem, such as massively parallel architectures (MPA), computing clusters, the compute unified device architecture (CUDA), and multithreaded programming, to accelerate point cloud based object recognition. These platforms do not yield high performance unless their specific features are properly utilized; failing that, the result can actually be inferior performance. To achieve high-speed performance in image descriptor computation, indexing, and matching for point cloud based object recognition, this work explores both coarse- and fine-grained parallelism, identifies acceptable levels of algorithmic approximation, and analyzes various performance impactors. A set of heterogeneous parallel algorithms is designed and implemented: exact and approximate scalable massively parallel image descriptors for descriptor computation; parallel construction of the k-dimensional tree (KD-tree) and forests of KD-trees for descriptor indexing; and parallel approximate nearest neighbor search (ANNS) and buffered ANNS (BANNS) on the KD-tree and the forest of KD-trees for descriptor matching. The results show that the proposed massively parallel algorithms on heterogeneous computing platforms can significantly improve the execution time of feature computation, indexing, and matching. This work also demonstrates that heterogeneous computing architectures, with appropriate architecture-specific algorithm design and optimization, have distinct advantages in improving the performance of multimedia applications.
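
    For the descriptor-matching side, a common pattern behind forests of KD-trees is to index several randomly rotated copies of the descriptors, search each tree only approximately, and keep the best candidate across trees. The sketch below illustrates that pattern using SciPy's cKDTree (with its eps parameter for approximate queries) and a thread pool as stand-ins for the custom massively parallel structures; it is not the ANNS/BANNS implementation from the work above.

```python
# Hedged sketch: descriptor matching with a small forest of k-d trees queried in parallel.
from concurrent.futures import ThreadPoolExecutor
import numpy as np
from scipy.spatial import cKDTree

rng = np.random.default_rng(0)
db = rng.random((50000, 32))                        # database descriptors
queries = rng.random((500, 32))

# Forest: one random rotation + one k-d tree per member (rotations preserve L2 distances).
rotations = [np.linalg.qr(rng.standard_normal((32, 32)))[0] for _ in range(4)]
forest = [cKDTree(db @ R) for R in rotations]

def query_tree(args):
    tree, R = args
    # eps > 0 makes each individual search approximate (less backtracking, cheaper)
    dist, idx = tree.query(queries @ R, k=1, eps=1.0)
    return dist, idx

with ThreadPoolExecutor() as pool:                  # coarse-grained parallelism over trees
    results = list(pool.map(query_tree, zip(forest, rotations)))

# Merge: for every query keep the candidate with the smallest distance across trees.
dists = np.stack([d for d, _ in results])
idxs = np.stack([i for _, i in results])
best = idxs[dists.argmin(axis=0), np.arange(len(queries))]
```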