
    Mapping Iterative Medical Imaging Algorithm on Cell Accelerator

    Algebraic reconstruction techniques require about half as many projections as Fourier backprojection methods, which makes them safer in terms of required radiation dose. The algebraic reconstruction technique (ART) and its variant OS-SART (ordered subset simultaneous ART) provide faster convergence with comparatively good image quality. However, the prohibitively long processing time of these techniques prevents their adoption in commercial CT machines. Parallel computing is one solution to this problem. With the advent of heterogeneous multicore architectures that exploit data-parallel applications, medical imaging algorithms such as OS-SART can be studied to produce increased performance. In this paper, we map OS-SART onto the Cell Broadband Engine (Cell BE). We use the architectural features of the Cell BE to provide an efficient mapping. The Cell BE consists of one PowerPC processor element (PPE) and eight SIMD coprocessors known as synergistic processor elements (SPEs). The limited memory storage on each SPE makes the mapping challenging. Therefore, we present optimization techniques to efficiently map the algorithm onto the Cell BE for improved performance over the CPU version. We compare the performance of our proposed algorithm on the Cell BE to that on a Sun Fire X4600, a shared-memory machine. The Cell BE is five times faster than an AMD Opteron dual-core processor. The speedup of the algorithm on the Cell BE increases with the number of SPEs. We also experiment with various parameters that impact the performance of the algorithm, such as the number of subsets, the number of processing elements, and the number of DMA transfers between main memory and local memory.
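For readers unfamiliar with the algebraic approach mentioned in this abstract, the following is a minimal sketch of plain ART (the Kaczmarz row-action update) on a toy 2x2 image. It is not the paper's OS-SART variant or its Cell BE mapping; the function name `art_solve`, the relaxation parameter, and the toy system matrix are assumptions made for illustration only.

```python
def art_solve(A, b, sweeps=50, relax=1.0):
    """Plain ART (Kaczmarz): project the estimate onto one ray equation at a time.

    A     : list of rows, each row holds the weights of one ray over all pixels
    b     : measured ray sums (one per row of A)
    relax : relaxation factor (1.0 gives the exact projection onto each row)
    """
    x = [0.0] * len(A[0])  # start from an empty image
    for _ in range(sweeps):
        for row, measured in zip(A, b):
            norm2 = sum(a * a for a in row)
            if norm2 == 0.0:
                continue  # a ray that touches no pixel carries no information
            predicted = sum(a * xj for a, xj in zip(row, x))
            step = relax * (measured - predicted) / norm2
            # Spread the residual back along the ray, weighted by the ray itself.
            x = [xj + step * a for xj, a in zip(x, row)]
    return x


# Toy example: a 2x2 image [1, 2, 3, 4] measured by two row sums and two column sums.
A = [
    [1, 1, 0, 0],  # top row
    [0, 0, 1, 1],  # bottom row
    [1, 0, 1, 0],  # left column
    [0, 1, 0, 1],  # right column
]
b = [3, 7, 4, 6]
image = art_solve(A, b)
```

Each inner-loop update touches only one ray, which is why simultaneous and ordered-subset variants that batch many rays per update are the usual candidates for parallel hardware.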

    Streaming Model Based Volume Ray Casting Implementation for Cell Broadband Engine


    Fast GPU-Based Approach to Branchless Distance-Driven Projection and Back-Projection in Cone Beam CT

    Modern iterative CT image reconstruction algorithms rely on projection and back-projection operations to refine an image estimate. A widely used state-of-the-art technique is distance-driven projection and back-projection. While the distance-driven technique yields superior image quality in iterative algorithms, it is computationally demanding, which limits the relevance of these algorithms in clinical settings. A few methods have been proposed for enhancing the distance-driven technique to take advantage of modern computer hardware. This study explores a two-dimensional extension of the branchless method, a technique that does not compromise image quality. The extension is named “pre-projection integration” because it gains performance by integrating the data before the projection and back-projection operations. It was written with Nvidia’s CUDA framework and carefully designed for massively parallel graphics processing units (GPUs). The performance and the image quality of the pre-projection integration method were analyzed. Both projection and back-projection are significantly faster with pre-projection integration. The image quality was analyzed using cone beam CT image reconstruction algorithms within Jeffrey Fessler’s Image Reconstruction Toolbox. Images produced by regularized, iterative image reconstruction algorithms using the pre-projection integration method show no significant artifacts.
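The idea of integrating the data first can be illustrated in one dimension. The sketch below is an assumption-laden illustration, not the study's CUDA implementation or its two-dimensional extension: it treats a row of unit-width pixels as a piecewise-constant signal, builds its running integral once, and then obtains the integral over any interval as the difference of two interpolated lookups, with no per-pixel overlap tests. The helper names are hypothetical.

```python
import math


def cumulative_integral(values):
    """Running integral of unit-width, piecewise-constant pixels, sampled at pixel edges."""
    table = [0.0]
    for v in values:
        table.append(table[-1] + v)
    return table


def integral_at(table, x):
    """Linearly interpolate the running integral at position x (clamped to the signal)."""
    x = min(max(x, 0.0), float(len(table) - 1))
    i = min(int(math.floor(x)), len(table) - 2)
    return table[i] + (x - i) * (table[i + 1] - table[i])


def interval_integral(table, a, b):
    """Integral of the signal over [a, b] as a difference of two lookups."""
    return integral_at(table, b) - integral_at(table, a)


pixels = [1.0, 2.0, 3.0, 4.0]
table = cumulative_integral(pixels)
# Interval [0.5, 2.5] covers half of pixel 0, all of pixel 1, and half of pixel 2.
overlap = interval_integral(table, 0.5, 2.5)
```

Because every output element performs the same two lookups and one subtraction, this access pattern suits hardware where divergent branches are costly.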

    CUDA accelerated cone‐beam reconstruction

    Cone-Beam Computed Tomography (CBCT) is an imaging method that reconstructs a 3D representation of an object from its 2D X-ray images. It is an important diagnostic tool in the medical field, especially in dentistry. However, most 3D reconstruction algorithms are computationally intensive and time consuming, and this limitation constrains the use of CBCT. In recent years, high-end graphics cards, such as those powered by NVIDIA graphics processing units (GPUs), have become able to perform general-purpose computation. Because 3D reconstruction algorithms are highly parallel, they can be implemented on the GPU to reduce the processing time to a practical level. Two of the most popular 3D cone-beam reconstruction algorithms are the Feldkamp-Davis-Kress algorithm (FDK) and the Algebraic Reconstruction Technique (ART). FDK constructs 3D images quickly, but the quality of its images is lower than that of ART images. However, ART requires significantly more computation. Material ART is a recently developed algorithm that uses beam-hardening correction to eliminate artifacts. In this thesis, these three algorithms were implemented on NVIDIA's CUDA platform. These CUDA-based algorithms were tested on three different graphics cards, using phantom and real data. The test results show significant speedup compared to the CPU software implementation. The speedup is sufficient to allow a moderate-cost personal computer with an NVIDIA graphics card to process CBCT images in real time.

    High Speed 3D Tomography on CPU, GPU, and FPGA

    12 pages; 50% acceptance rate. International audience. Back-projection (BP) is a costly computational step in tomographic image reconstruction, such as in positron emission tomography (PET). To reduce the computation time, this paper presents a pipelined, prefetch, and parallelized architecture for PET BP (3PA-PET). The key feature of this architecture is its original memory access strategy, which masks the high latency of the external memory. Indeed, the pattern of memory references to the acquired data hinders the processing unit. The memory access bottleneck is overcome by an efficient use of the intrinsic temporal and spatial locality of the BP algorithm. A loop reordering allows efficient use of a general-purpose processor's caches for software implementations, as well as of the 3D predictive and adaptive cache (3D-AP cache) for hardware implementations. Parallel hardware pipelines are also efficient thanks to a hierarchical 3D-AP cache: each pipeline performs a memory reference in about one clock cycle to reach a computational throughput close to 100%. The 3PA-PET architecture is prototyped on a system on programmable chip (SoPC) to validate the system and to measure its expected performance. Execution times are compared with those of a desktop PC, a workstation, and a graphics processing unit (GPU).

    Partial-Data Interpolation During Arcing of an X-Ray Tube in a Computed Tomography Scanner

    X-ray tubes are used in computed tomography (CT) scanners as the energy source for generating images. These tubes occasionally arc, an undesired phenomenon in which high current surges through the tube. While the x-ray tube recovers to full voltage after an arc, image data is still being collected. Normally this data, acquired at less than full voltage, is discarded and interpolation is performed over the arc duration. However, this is not ideal, and some residual imperfections in images, called artifacts, still remain. Proposed here is an algorithm that corrects for improper tube voltage, allowing previously discarded data to be used for imaging. Instead of throwing away all data during the arc period, we use some of the data that is available as the voltage rises back to its programmed value. This method reduces the length of the interpolation period, thus reducing artifacts. Results of implementation on a CT scanner show an improvement in image quality with the partial-data interpolation method compared to standard interpolation, and that up to 30% of the data can be saved from being lost during an arc. With the continuous drive from the imaging field toward faster scanners with short image acquisition times, adverse effects due to arcing are becoming more significant, and the improvement proposed in this research is increasingly relevant.
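The standard interpolation step described in this abstract can be sketched as follows. This is a minimal illustration that assumes one detector reading per view and simple linear interpolation between the last good view before the arc and the first good view after it; the function name `interpolate_gap` and the sample values are hypothetical, and the voltage-correction algorithm itself is not specified in the abstract and is not reproduced here.

```python
def interpolate_gap(values, start, stop):
    """Replace values[start:stop] by linear interpolation between its neighbours.

    values[start - 1] is the last trusted sample before the gap and
    values[stop] is the first trusted sample after it.
    """
    if start < 1 or stop >= len(values) or start >= stop:
        raise ValueError("gap must lie strictly inside the sequence")
    left, right = values[start - 1], values[stop]
    span = stop - (start - 1)  # distance between the two trusted samples
    out = list(values)
    for i in range(start, stop):
        t = (i - (start - 1)) / span
        out[i] = left + t * (right - left)
    return out


# Views 3..5 were acquired during an arc and are unusable (marked with 99).
readings = [0.0, 1.0, 2.0, 99.0, 99.0, 99.0, 6.0, 7.0]
standard = interpolate_gap(readings, 3, 6)

# If the last arc view could be recovered (here assumed to read 5.0 after
# correction), only views 3..4 would need to be interpolated.
recovered = [0.0, 1.0, 2.0, 99.0, 99.0, 5.0, 6.0, 7.0]
partial = interpolate_gap(recovered, 3, 5)
```

The shorter the gap, the fewer views are synthesized rather than measured, which is the mechanism by which the abstract's method is said to reduce artifacts.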

    High-performance and hardware-aware computing: proceedings of the first International Workshop on New Frontiers in High-performance and Hardware-aware Computing (HipHaC'08)

    The HipHaC workshop aims to combine new aspects of parallel, heterogeneous, and reconfigurable microprocessor technologies with concepts of high-performance computing and, in particular, numerical solution methods. Compute- and memory-intensive applications can benefit from the full hardware potential only if all features on all levels are taken into account in a holistic approach.