19 research outputs found

    Multi-GPU Acceleration of Iterative X-ray CT Image Reconstruction

    X-ray computed tomography is a widely used medical imaging modality for screening and diagnosing diseases and for image-guided radiation therapy treatment planning. Statistical iterative reconstruction (SIR) algorithms have the potential to significantly reduce image artifacts by minimizing a cost function that models the physics and statistics of the data acquisition process in X-ray CT. SIR algorithms have superior performance compared to traditional analytical reconstructions for a wide range of applications including nonstandard geometries arising from irregular sampling, limited angular range, missing data, and low-dose CT. The main hurdle for the widespread adoption of SIR algorithms in multislice X-ray CT reconstruction problems is their slow convergence rate and associated computational time. We seek to design and develop fast parallel SIR algorithms for clinical X-ray CT scanners. Each of the following approaches is implemented on real clinical helical CT data acquired from a Siemens Sensation 16 scanner and compared to the straightforward implementation of the Alternating Minimization (AM) algorithm of O’Sullivan and Benac [1]. We parallelize the computationally expensive projection and backprojection operations by exploiting the massively parallel hardware architecture of 3 NVIDIA TITAN X Graphical Processing Unit (GPU) devices with CUDA programming tools and achieve an average speedup of 72X over a straightforward CPU implementation. We implement a multi-GPU based voxel-driven multislice analytical reconstruction algorithm called Feldkamp-Davis-Kress (FDK) [2] and achieve an average overall speedup of 1382X over the baseline CPU implementation by using 3 TITAN X GPUs. Moreover, we propose a novel adaptive surrogate-function based optimization scheme for the AM algorithm, resulting in more aggressive update steps in every iteration. 
On average, we double the convergence rate of our baseline AM algorithm and also improve image quality by using the adaptive surrogate function. We extend the multi-GPU and adaptive surrogate-function based acceleration techniques to dual-energy reconstruction problems as well. Furthermore, we design and develop a GPU-based deep Convolutional Neural Network (CNN) to denoise simulated low-dose X-ray CT images. Our experiments show significant improvements in image quality with our proposed deep CNN-based algorithm compared with some widely used denoising techniques, including Block Matching 3-D (BM3D) and Weighted Nuclear Norm Minimization (WNNM). Overall, we have developed novel fast, parallel, computationally efficient methods to perform multislice statistical reconstruction and image-based denoising on clinically sized datasets.

    Optimization of GPU-Accelerated Iterative CT Reconstruction Algorithm for Clinical Use

    To transition the GPU-accelerated CT reconstruction algorithm to a more clinical environment, a graphical user interface is implemented. Several optimizations of the implementation are presented. We describe the alternating minimization (AM) algorithm as the updating algorithm, and the branchless distance-driven method for the system forward operator. We introduce a version of the Feldkamp-Davis-Kress algorithm to generate the initial image for our alternating minimization algorithm and compare it to a constant initial image. To improve the rate of convergence, we introduce the ordered-subsets method, find the optimal number of ordered subsets, and discuss the possibility of using a hybrid ordered-subsets method. Based on the run-time analysis, we implement a GPU-accelerated combination and accumulation process using a Hillis-Steele scan and shared memory. We then analyze some code-related problems, which indicate that our implementation of the AM algorithm may reach the limit of single precision after approximately 3,500 iterations. The Hotelling observer, as an estimate of the human observer, is introduced to assess the quality of reconstructed images. The estimate of human observer performance may enable us to optimize the algorithm parameters with respect to clinical use.
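
    The Hillis-Steele scan mentioned in this abstract can be illustrated with a minimal sequential sketch. It is not the authors' GPU code: the function name `hillis_steele_scan` is hypothetical, and the inner loop stands in for what would be one thread per element on a GPU, with the double buffer playing the role of the synchronization between steps.

```python
def hillis_steele_scan(values):
    """Inclusive prefix sum using the Hillis-Steele doubling pattern.

    Illustrative only: each pass of the inner loop would run as one
    parallel step on a GPU (one thread per index i).
    """
    current = list(values)
    n = len(current)
    offset = 1
    while offset < n:
        # Double buffering: read from `current`, write to `nxt`, so that
        # every element in this step sees the previous step's values.
        nxt = list(current)
        for i in range(offset, n):
            nxt[i] = current[i] + current[i - offset]
        current = nxt
        offset *= 2  # the reach of each partial sum doubles every step
    return current
```

    The scan finishes in about log2(n) steps, at the cost of more total additions than a sequential running sum; that trade-off is why it tends to suit short arrays held in fast on-chip memory.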

    Fast GPU-Based Approach to Branchless Distance-Driven Projection and Back-Projection in Cone Beam CT

    Modern CT image reconstruction algorithms rely on projection and back-projection operations to refine an image estimate in iterative image reconstruction. A widely used state-of-the-art technique is distance-driven projection and back-projection. While the distance-driven technique yields superior image quality in iterative algorithms, it is computationally demanding, which limits the relevance of these algorithms in clinical settings. A few methods have been proposed for enhancing the distance-driven technique in order to take advantage of modern computer hardware. This study explores a two-dimensional extension of the branchless method, which is a technique that does not compromise image quality. The extension of the branchless method is named “pre-projection integration” because it gets a performance boost by integrating the data before the projection and back-projection operations. It was written with Nvidia’s CUDA framework and carefully designed for massively parallel graphics processing units (GPUs). The performance and the image quality of the pre-projection integration method were analyzed. Both projection and back-projection are significantly faster with pre-projection integration. The image quality was analyzed using cone beam CT image reconstruction algorithms within Jeffrey Fessler’s Image Reconstruction Toolbox. Images produced from regularized, iterative image reconstruction algorithms using the pre-projection integration method show no significant artifacts.
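
    The idea of integrating the data before projecting can be sketched in one dimension. This is a simplified illustration, not the two-dimensional pre-projection integration method of the study: the function names are hypothetical, the pixel row is assumed to be non-empty, piecewise constant, and to start at position 0, and the cell boundaries are assumed to be already mapped onto the pixel axis.

```python
from itertools import accumulate


def integrate_row(pixels, pixel_width=1.0):
    """Cumulative integral of a piecewise-constant row at each pixel boundary."""
    return [0.0] + [s * pixel_width for s in accumulate(pixels)]


def eval_integral(cumulative, x, pixel_width=1.0):
    """Linearly interpolate the cumulative integral at position x."""
    n = len(cumulative) - 1               # number of pixels
    t = min(max(x / pixel_width, 0.0), float(n))  # clamp to the row extent
    i = min(int(t), n - 1)                # index of the pixel containing x
    frac = t - i
    return cumulative[i] + frac * (cumulative[i + 1] - cumulative[i])


def cell_average(cumulative, left, right, pixel_width=1.0):
    """Mean pixel value over [left, right] as a difference of two lookups."""
    upper = eval_integral(cumulative, right, pixel_width)
    lower = eval_integral(cumulative, left, pixel_width)
    return (upper - lower) / (right - left)
```

    For example, `cell_average(integrate_row([2.0, 4.0, 6.0]), 0.5, 2.5)` evaluates to 4.0. Each detector cell costs two interpolated lookups and a subtraction, with no loop over the pixels it overlaps, which is what makes the pattern attractive for GPUs.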

    3D Forward and Back-Projection for X-Ray CT Using Separable Footprints

    Iterative methods for 3D image reconstruction have the potential to improve image quality over conventional filtered back projection (FBP) in X-ray computed tomography (CT). However, the computational burden of 3D cone-beam forward and back-projectors is one of the greatest challenges facing practical adoption of iterative methods for X-ray CT. Moreover, projector accuracy is also important for iterative methods. This paper describes two new separable footprint (SF) projector methods that approximate the voxel footprint functions as 2D separable functions. Because of the separability of these footprint functions, calculating their integrals over a detector cell is greatly simplified and can be implemented efficiently. The SF-TR projector uses trapezoid functions in the transaxial direction and rectangular functions in the axial direction, whereas the SF-TT projector uses trapezoid functions in both directions. Simulations and experiments showed that both SF projector methods are more accurate than the distance-driven (DD) projector, which is a current state-of-the-art method in the field. The SF-TT projector is more accurate than the SF-TR projector for rays associated with large cone angles. The SF-TR projector has a computation speed similar to that of the DD projector, and the SF-TT projector is about two times slower. Peer Reviewed. http://deepblue.lib.umich.edu/bitstream/2027.42/85876/1/Fessler5.pd
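
    Why separability simplifies the detector-cell integral can be shown with a small sketch of the trapezoid-times-rectangle case. It assumes unit-height footprint functions and omits the amplitude scaling a real projector would apply; all function names and parameters are hypothetical and are not taken from the paper.

```python
def trapezoid_antiderivative(x, t0, t1, t2, t3):
    """Integral from -inf to x of a unit-height trapezoid with vertices t0 <= t1 <= t2 <= t3."""
    if x <= t0:
        return 0.0
    if x <= t1:
        return (x - t0) ** 2 / (2.0 * (t1 - t0))           # rising edge
    area = (t1 - t0) / 2.0
    if x <= t2:
        return area + (x - t1)                              # flat top
    area += t2 - t1
    if x <= t3:
        d = x - t2
        return area + d - d ** 2 / (2.0 * (t3 - t2))        # falling edge
    return area + (t3 - t2) / 2.0


def trapezoid_integral(a, b, t0, t1, t2, t3):
    """Integral of the trapezoid over the detector cell extent [a, b]."""
    return (trapezoid_antiderivative(b, t0, t1, t2, t3)
            - trapezoid_antiderivative(a, t0, t1, t2, t3))


def rectangle_integral(a, b, r0, r1):
    """Integral of a unit-height rectangle on [r0, r1] over [a, b]."""
    return max(0.0, min(b, r1) - max(a, r0))


def separable_cell_weight(s_range, t_range, trapezoid, rectangle):
    """2D integral over one detector cell as a product of two 1D integrals."""
    return (trapezoid_integral(s_range[0], s_range[1], *trapezoid)
            * rectangle_integral(t_range[0], t_range[1], *rectangle))
```

    Because the 2D integral factors into two 1D integrals, the transaxial and axial factors can each be computed once per row or column of detector cells and then reused.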

    Accelerating iterative CT reconstruction algorithms using Tensor Cores

    Tensor Cores are specialized hardware units added to recent NVIDIA GPUs to speed up matrix multiplication-related tasks, such as convolutions and densely connected layers in neural networks. Due to their specific hardware implementation and programming model, Tensor Cores cannot be straightforwardly applied to applications outside machine learning. In this paper, we demonstrate the feasibility of using NVIDIA Tensor Cores for the acceleration of a non-machine learning application: iterative Computed Tomography (CT) reconstruction. For large CT images and real-time CT scanning, the reconstruction time for many existing iterative reconstruction methods is relatively high, ranging from seconds to minutes, depending on the size of the image. Therefore, CT reconstruction is an application area that could potentially benefit from Tensor Core hardware acceleration. We first studied the reconstruction algorithm's performance as a function of the hardware-related parameters and proposed an approach to accelerate reconstruction on Tensor Cores. The results show that the proposed method provides about a 5x increase in speed and energy saving using the NVIDIA RTX 2080 Ti GPU for the parallel projection of 32 images of size 512 x 512. The relative reconstruction error due to the mixed-precision computations was almost equal to the error of single-precision (32-bit) floating-point computations. We then presented an approach for real-time and memory-limited applications by exploiting the symmetry of the system (i.e., the acquisition geometry). As the proposed approach is based on the conjugate gradient method, it can be generalized to extend its application to many research and industrial fields.
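
    For readers unfamiliar with the conjugate gradient method on which this approach is based, a minimal pure-Python sketch follows. It is not the paper's Tensor Core implementation: the function names are hypothetical, and `apply_a` is assumed to apply a symmetric positive-definite operator, which in CT would typically be a projection followed by a back-projection rather than the small dense matrix used here.

```python
def dot(u, v):
    """Inner product of two equal-length sequences."""
    return sum(ui * vi for ui, vi in zip(u, v))


def conjugate_gradient(apply_a, b, iterations=50, tol=1e-20):
    """Solve A x = b for symmetric positive-definite A, given only x -> A x."""
    x = [0.0] * len(b)
    r = list(b)            # residual b - A x for the zero initial guess
    p = list(r)            # first search direction
    rs_old = dot(r, r)
    for _ in range(iterations):
        if rs_old <= tol:  # squared residual norm is small enough
            break
        ap = apply_a(p)
        alpha = rs_old / dot(p, ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, ap)]
        rs_new = dot(r, r)
        beta = rs_new / rs_old
        p = [ri + beta * pi for ri, pi in zip(r, p)]
        rs_old = rs_new
    return x


def dense_operator(matrix):
    """Wrap a dense matrix (list of rows) as a function x -> A x."""
    return lambda v: [dot(row, v) for row in matrix]
```

    The only expensive step per iteration is the single call to `apply_a`, a matrix-vector style product; that is the part a hardware matrix unit could plausibly accelerate.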

    Development and Implementation of Fully 3D Statistical Image Reconstruction Algorithms for Helical CT and Half-Ring PET Insert System

    X-ray computed tomography (CT) and positron emission tomography (PET) have become widely used imaging modalities for screening, diagnosis, and image-guided treatment planning. Along with the increased clinical use are increased demands for high image quality with reduced ionizing radiation dose to the patient. Despite their significantly high computational cost, statistical iterative reconstruction algorithms are known to reconstruct high-quality images from noisy tomographic datasets. The overall goal of this work is to design statistical reconstruction software for clinical x-ray CT scanners, and for a novel PET system that utilizes high-resolution detectors within the field of view of a whole-body PET scanner. The complex choices involved in the development and implementation of image reconstruction algorithms are fundamentally linked to the ways in which the data is acquired, and they require detailed knowledge of the various sources of signal degradation. Both of the imaging modalities investigated in this work have their own set of challenges. However, by utilizing an underlying statistical model for the measured data, we are able to use a common framework for this class of tomographic problems. We first present the details of a new fully 3D regularized statistical reconstruction algorithm for multislice helical CT. To reduce the computation time, the algorithm was carefully parallelized by identifying and taking advantage of the specific symmetry found in helical CT. Some basic image quality measures were evaluated using measured phantom and clinical datasets, and they indicate that our algorithm achieves comparable or superior performance over the fast analytical methods considered in this work. Next, we present our fully 3D reconstruction efforts for a high-resolution half-ring PET insert. We found that this unusual geometry requires extensive redevelopment of existing reconstruction methods in PET.
We redesigned the major components of the data modeling process and incorporated them into our reconstruction algorithms. The algorithms were tested using simulated Monte Carlo data and phantom data acquired by a PET insert prototype system. Overall, we have developed new, computationally efficient methods to perform fully 3D statistical reconstructions on clinically sized datasets.