217 research outputs found
Real-Time Compressive Sensing MRI Reconstruction Using GPU Computing and Split Bregman Methods
Compressive sensing (CS) has been shown to enable dramatic acceleration of MRI acquisition in some applications. Being an iterative reconstruction technique, CS MRI reconstructions can be more time-consuming than traditional inverse Fourier reconstruction. We have accelerated our CS MRI reconstruction by factors of up to 27 by using a split Bregman solver combined with a graphics processing unit (GPU) computing platform. The increases in speed we find are similar to those we measure for matrix multiplication on this platform, suggesting that the split Bregman methods parallelize efficiently. We demonstrate that the combination of the rapid convergence of the split Bregman algorithm and the massively parallel strategy of GPU computing can enable real-time CS reconstruction of even acquisition data matrices of dimension 40962 or more, depending on available GPU VRAM. Reconstruction of two-dimensional data matrices of dimension 10242 and smaller took ~0.3 s or less, showing that this platform also provides very fast iterative reconstruction for small-to-moderate size images
Towards extending the SWITCH platform for time-critical, cloud-based CUDA applications: Job scheduling parameters influencing performance
SWITCH (Software Workbench for Interactive, Time Critical and Highly self-adaptive cloud applications) allows for the development and deployment of real-time applications in the cloud, but it does not yet support instances backed by Graphics Processing Units (GPUs). Wanting to explore how SWITCH might support CUDA (a GPU architecture) in the future, we have undertaken a review of time-critical CUDA applications, discovering that run-time requirements (which we call ‘wall time’) are in many cases regarded as the most important. We have performed experiments to investigate which parameters have the greatest impact on wall time when running multiple Amazon Web Services GPU-backed instances. Although a maximum of 8 single-GPU instances can be launched in a single Amazon Region, launching just 2 instances rather than 1 gives a 42% decrease in wall time. Also, instances are often wasted doing nothing, and there is a moderately-strong relationship between how problems are distributed across instances and wall time. These findings can be used to enhance the SWITCH provision for specifying Non-Functional Requirements (NFRs); in the future, GPU-backed instances could be supported. These findings can also be used more generally, to optimise the balance between the computational resources needed and the resulting wall time to obtain results
High Speed 3D Tomography on CPU, GPU, and FPGA
12 pages; 50% d'acceptationInternational audienceBack-projection (BP) is a costly computational step in tomography image reconstruction such as positron emission tomography (PET). To reduce the computation time, this paper presents a pipelined, prefetch, and parallelized architecture for PET BP (3PA-PET). The key feature of this architecture is its original memory access strategy, masking the high latency of the external memory. Indeed, the pattern of the memory references to the data acquired hinders the processing unit. The memory access bottleneck is overcome by an efficient use of the intrinsic temporal and spatial locality of the BP algorithm. A loop reordering allows an efficient use of general purpose processor's caches, for software implementation, as well as the 3D predictive and adaptive cache (3D-AP cache), when considering hardware implementations. Parallel hardware pipelines are also efficient thanks to a hierarchical 3D-AP cache: each pipeline performs a memory reference in about one clock cycle to reach a computational throughput close to 100%. The 3PA-PET architecture is prototyped on a system on programmable chip (SoPC) to validate the system and to measure its expected performances. Time performances are compared with a desktop PC, a workstation, and a graphic processor unit (GPU)
Novel high performance techniques for high definition computer aided tomography
Mención Internacional en el título de doctorMedical image processing is an interdisciplinary field in which multiple research areas are involved:
image acquisition, scanner design, image reconstruction algorithms, visualization, etc.
X-Ray Computed Tomography (CT) is a medical imaging modality based on the attenuation
suffered by the X-rays as they pass through the body. Intrinsic differences in attenuation properties
of bone, air, and soft tissue result in high-contrast images of anatomical structures. The
main objective of CT is to obtain tomographic images from radiographs acquired using X-Ray
scanners. The process of building a 3D image or volume from the 2D radiographs is known as
reconstruction. One of the latest trends in CT is the reduction of the radiation dose delivered
to patients through the decrease of the amount of acquired data. This reduction results in artefacts
in the final images if conventional reconstruction methods are used, making it advisable to
employ iterative reconstruction algorithms.
There are numerous reconstruction algorithms available, from which we can highlight two
specific types: traditional algorithms, which are fast but do not enable the obtaining of high
quality images in situations of limited data; and iterative algorithms, slower but more reliable
when traditional methods do not reach the quality standard requirements. One of the priorities
of reconstruction is the obtaining of the final images in near real time, in order to reduce the
time spent in diagnosis. To accomplish this objective, new high performance techniques and methods
for accelerating these types of algorithms are needed. This thesis addresses the challenges
of both traditional and iterative reconstruction algorithms, regarding acceleration and image
quality. One common approach for accelerating these algorithms is the usage of shared-memory
and heterogeneous architectures. In this thesis, we propose a novel simulation/reconstruction
framework, namely FUX-Sim. This framework follows the hypothesis that the development of
new flexible X-ray systems can benefit from computer simulations, which may also enable performance
to be checked before expensive real systems are implemented. Its modular design
abstracts the complexities of programming for accelerated devices to facilitate the development
and evaluation of the different configurations and geometries available. In order to obtain near
real execution times, low-level optimizations for the main components of the framework are
provided for Graphics Processing Unit (GPU) architectures.
Other alternative tackled in this thesis is the acceleration of iterative reconstruction algorithms
by using distributed memory architectures. We present a novel architecture that unifies
the two most important computing paradigms for scientific computing nowadays: High Performance
Computing (HPC). The proposed architecture combines Big Data frameworks with the
advantages of accelerated computing.
The proposed methods presented in this thesis provide more flexible scanner configurations
as they offer an accelerated solution. Regarding performance, our approach is as competitive as
the solutions found in the literature. Additionally, we demonstrate that our solution scales with
the size of the problem, enabling the reconstruction of high resolution images.El procesamiento de imágenes médicas es un campo interdisciplinario en el que participan múltiples
áreas de investigación como la adquisición de imágenes, diseño de escáneres, algoritmos de
reconstrucción de imágenes, visualización, etc. La tomografía computarizada (TC) de rayos X es
una modalidad de imágen médica basada en el cálculo de la atenuación sufrida por los rayos X a
medida que pasan por el cuerpo a escanear. Las diferencias intrínsecas en la atenuación de hueso,
aire y tejido blando dan como resultado imágenes de alto contraste de estas estructuras anatómicas.
El objetivo principal de la TC es obtener imágenes tomográficas a partir estas radiografías
obtenidas mediante escáneres de rayos X. El proceso de construir una imagen o volumen en 3D a
partir de las radiografías 2D se conoce como reconstrucción. Una de las últimas tendencias en la
tomografía computarizada es la reducción de la dosis de radiación administrada a los pacientes
a través de la reducción de la cantidad de datos adquiridos. Esta reducción da como resultado
artefactos en las imágenes finales si se utilizan métodos de reconstrucción convencionales, por
lo que es aconsejable emplear algoritmos de reconstrucción iterativos.
Existen numerosos algoritmos de reconstrucción disponibles a partir de los cuales podemos
destacar dos categorías: algoritmos tradicionales, rápidos pero no permiten obtener imágenes de
alta calidad en situaciones en las que los datos son limitados; y algoritmos iterativos, más lentos
pero más estables en situaciones donde los métodos tradicionales no alcanzan los requisitos en
cuanto a la calidad de la imagen. Una de las prioridades de la reconstrucción es la obtención
de las imágenes finales en tiempo casi real, con el fin de reducir el tiempo de diagnóstico. Para
lograr este objetivo, se necesitan nuevas técnicas y métodos de alto rendimiento para acelerar
estos algoritmos.
Esta tesis aborda los desafíos de los algoritmos de reconstrucción tradicionales e iterativos,
con respecto a la aceleración y la calidad de imagen. Un enfoque común para acelerar estos
algoritmos es el uso de arquitecturas de memoria compartida y heterogéneas. En esta tesis,
proponemos un nuevo sistema de simulación/reconstrucción, llamado FUX-Sim. Este sistema se
construye alrededor de la hipótesis de que el desarrollo de nuevos sistemas de rayos X flexibles
puede beneficiarse de las simulaciones por computador, en los que también se puede realizar
un control del rendimiento de los nuevos sistemas a desarrollar antes de su implementación
física. Su diseño modular abstrae las complejidades de la programación para aceleradores con el
objetivo de facilitar el desarrollo y la evaluación de las diferentes configuraciones y geometrías
disponibles. Para obtener ejecuciones en casi tiempo real, se proporcionan optimizaciones de
bajo nivel para los componentes principales del sistema en las arquitecturas GPU.
Otra alternativa abordada en esta tesis es la aceleración de los algoritmos de reconstrucción
iterativa mediante el uso de arquitecturas de memoria distribuidas. Presentamos una arquitectura
novedosa que unifica los dos paradigmas informáticos más importantes en la actualidad:
computación de alto rendimiento (HPC) y Big Data. La arquitectura propuesta combina sistemas
Big Data con las ventajas de los dispositivos aceleradores.
Los métodos propuestos presentados en esta tesis proporcionan configuraciones de escáner
más flexibles y ofrecen una solución acelerada. En cuanto al rendimiento, nuestro enfoque es tan
competitivo como las soluciones encontradas en la literatura. Además, demostramos que nuestra
solución escala con el tamaño del problema, lo que permite la reconstrucción de imágenes de
alta resolución.This work has been mainly funded thanks to a FPU fellowship (FPU14/03875) from the Spanish
Ministry of Education.
It has also been partially supported by other grants:
• DPI2016-79075-R. “Nuevos escenarios de tomografía por rayos X”, from the Spanish Ministry
of Economy and Competitiveness.
• TIN2016-79637-P Towards unification of HPC and Big Data Paradigms from the Spanish
Ministry of Economy and Competitiveness.
• Short-term scientific missions (STSM) grant from NESUS COST Action IC1305.
• TIN2013-41350-P, Scalable Data Management Techniques for High-End Computing Systems
from the Spanish Ministry of Economy and Competitiveness.
• RTC-2014-3028-1 NECRA Nuevos escenarios clinicos con radiología avanzada from the
Spanish Ministry of Economy and Competitiveness.Programa Oficial de Doctorado en Ciencia y Tecnología InformáticaPresidente: José Daniel García Sánchez.- Secretario: Katzlin Olcoz Herrero.- Vocal: Domenico Tali
Software architecture for multi-bed FDK-based reconstruction in X-ray CT scanners
Most small-animal X-ray computed tomography (CT) scanners are based on cone-beam geometry with a flat-panel detector orbiting in a circular trajectory. Image reconstruction in these systems is usually performed by approximate methods based on the algorithm proposed by Feldkamp et al. (FDK). Besides the implementation of the reconstruction algorithm itself, in order to design a real system it is necessary to take into account numerous issues so as to obtain the best quality images from the acquired data. This work presents a comprehensive, novel software architecture for small-animal CT scanners based on cone-beam geometry with circular scanning trajectory. The proposed architecture covers all the steps from the system calibration to the volume reconstruction and conversion into Hounsfield units. It includes an efficient implementation of an FDK-based reconstruction algorithm that takes advantage of system symmetries and allows for parallel reconstruction using a multiprocessor computer. Strategies for calibration and artifact correction are discussed to justify the strategies adopted. New procedures for multi-bed misalignment, beam-hardening, and
Housfield units calibration are proposed. Experiments with phantoms and real data showed the suitability of the proposed software architecture for an X-ray small animal CT based on
cone-beam geometry.This work was partially funded by AMIT project from the CDTI CENIT program, TEC2007-64731, TEC2008-06715- C02-01, RD07/0014/2009, TRA2009 0175, RECAVA-RETIC, and RD09/0077/00087 (Ministerio de Ciencia e Inovación), and ARTEMIS S2009/DPI-1802 (Comunidad de Madrid).Publicad
3D Alternating Direction TV-Based Cone-Beam CT Reconstruction with Efficient GPU Implementation
Iterative image reconstruction (IIR) with sparsity-exploiting methods, such as total variation (TV) minimization, claims potentially large reductions in sampling requirements. However, the computation complexity becomes a heavy burden, especially in 3D reconstruction situations. In order to improve the performance for iterative reconstruction, an efficient IIR algorithm for cone-beam computed tomography (CBCT) with GPU implementation has been proposed in this paper. In the first place, an algorithm based on alternating direction total variation using local linearization and proximity technique is proposed for CBCT reconstruction. The applied proximal technique avoids the horrible pseudoinverse computation of big matrix which makes the proposed algorithm applicable and efficient for CBCT imaging. The iteration for this algorithm is simple but convergent. The simulation and real CT data reconstruction results indicate that the proposed algorithm is both fast and accurate. The GPU implementation shows an excellent acceleration ratio of more than 100 compared with CPU computation without losing numerical accuracy. The runtime for the new 3D algorithm is about 6.8 seconds per loop with the image size of 256×256×256 and 36 projections of the size of 512×512
OUT-OF-CORE CONE BEAM RECONSTRUCTION USING MULTIPLE GPUS
ABSTRACT This paper presents a graphics processing unit (GPU) based method capable of accelerating cone-beam reconstruction of large volume data, which cannot be entirely stored in video memory. Our method accelerates the Feldkamp, Davis and Kress (FDK) algorithm in a multi-GPU environment. We present how the entire volume can be efficiently decomposed into small portions to reduce the usage of video memory on each graphics card. Experimental results are also presented to understand the reconstruction throughput on an nVIDIA Tesla S1070 server. It takes approximately three minutes to reconstruct a 2048 3 -voxel volume from 720 2048 2 -pixel projections. The effective bandwidth of video memory reaches 137 GB/s per GPU, demonstrating a higher utilization of texture caches
- …