2 research outputs found

    OUT-OF-CORE CONE BEAM RECONSTRUCTION USING MULTIPLE GPUS

    ABSTRACT This paper presents a graphics processing unit (GPU) based method capable of accelerating cone-beam reconstruction of large volume data, which cannot be entirely stored in video memory. Our method accelerates the Feldkamp, Davis and Kress (FDK) algorithm in a multi-GPU environment. We present how the entire volume can be efficiently decomposed into small portions to reduce the usage of video memory on each graphics card. Experimental results are also presented to understand the reconstruction throughput on an nVIDIA Tesla S1070 server. It takes approximately three minutes to reconstruct a 2048³-voxel volume from 720 2048²-pixel projections. The effective bandwidth of video memory reaches 137 GB/s per GPU, demonstrating a higher utilization of texture caches.
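
To give a sense of why the volume cannot sit in video memory and what "decomposing into small portions" might look like, the following is a minimal Python sketch, not the paper's actual scheme. It assumes 4-byte (float32) voxels, four GPUs, a hypothetical 1 GiB per-GPU budget for the working sub-volume, and a simple split into contiguous z-slabs handed out round-robin. The function name `plan_slabs` and all of its parameters are illustrative only.

```python
def plan_slabs(n, bytes_per_voxel, n_gpus, budget_bytes):
    """Split an n*n*n volume into contiguous z-slabs that each fit in
    budget_bytes, assigning slabs to GPUs round-robin.

    Returns a list of (gpu_index, z_start, z_end) with z_end exclusive.
    This is an illustrative decomposition, not the one used in the paper.
    """
    slice_bytes = n * n * bytes_per_voxel        # one z-slice of the volume
    max_slices = budget_bytes // slice_bytes     # slices that fit in the budget
    if max_slices < 1:
        raise ValueError("budget too small for even one slice")
    slabs = []
    for i, z0 in enumerate(range(0, n, max_slices)):
        z1 = min(z0 + max_slices, n)
        slabs.append((i % n_gpus, z0, z1))
    return slabs


GIB = 1024 ** 3
N = 2048                      # voxels per axis, as in the abstract
BYTES_PER_VOXEL = 4           # assumption: float32 voxels
N_GPUS = 4                    # assumption: four GPUs in the server
BUDGET = 1 * GIB              # assumption: per-GPU budget for one slab

volume_gib = N ** 3 * BYTES_PER_VOXEL / GIB
plan = plan_slabs(N, BYTES_PER_VOXEL, N_GPUS, BUDGET)

print(f"full volume: {volume_gib:.0f} GiB")
print(f"slabs: {len(plan)}, slices per slab: {plan[0][2] - plan[0][1]}")
```

Under these assumptions the full volume occupies 32 GiB, so it must be processed in pieces; the sketch yields 32 slabs of 64 slices each, eight per GPU.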

    Surfing the optimization space of a multiple-GPU parallel implementation of a X-ray tomography reconstruction algorithm

    The increasing popularity of massively parallel architectures based on accelerators has opened up the possibility of significantly improving the performance of X-ray computed tomography (CT) applications towards achieving real-time imaging. However, achieving this goal is a challenging process, as most CT applications have not been designed to exploit the amount of parallelism available in these architectures. In this paper we present the massively parallel implementation and optimization of Mangoose(++), a CT application for reconstructing 3D volumes from 2D images collected by scanners based on cone-beam geometry. The main contributions of this paper are the following. First, we develop a modular application design that exploits the functional parallelism inside the application and facilitates the parallelization of individual application phases. Second, we identify a set of optimizations that can be applied individually and in combination for optimally deploying the application on a massively parallel multi-GPU system. Third, we present a study of surfing the optimization space of the modularized application and demonstrate that a significant benefit can be obtained from employing an adequate combination of application optimizations. (C) 2014 Elsevier Inc. All rights reserved. This work was partially funded by the Spanish Ministry of Science and Technology under the grant TIN2010-16497, the AMIT project (CEN-20101014) from the CDTI-CENIT program, the RECAVA-RETIC Network (RD07/0014/2009), projects TEC2010-21619-C04-01, TEC2011-28972-C02-01, and PI11/00616 from the Spanish Ministerio de Ciencia e Innovacion, and the ARTEMIS program (S2009/DPI-1802) from the Comunidad de Madrid.