8 research outputs found
High-Performance 3D Compressive Sensing MRI Reconstruction Using Many-Core Architectures
Compressive sensing (CS) describes how sparse
signals can be accurately reconstructed from many fewer samples
than required by the Nyquist criterion. Since MRI scan duration
is proportional to the number of acquired samples, CS has been
gaining significant attention in MRI. However, the computationally
intensive nature of CS reconstructions has precluded their
use in routine clinical practice. In this work, we investigate how
different throughput-oriented architectures can benefit one CS
algorithm and what levels of acceleration are feasible on different
modern platforms. We demonstrate that a CUDA-based code
running on an NVIDIA Tesla C2050 GPU can reconstruct a
256 × 160 × 80 volume from an 8-channel acquisition in 19 seconds,
which is in itself a significant improvement over the state of the art. We then
show that Intel's Knights Ferry can perform the same 3D MRI
reconstruction in only 12 seconds, bringing CS methods even
closer to clinical viability.
MR Shuffling: Accelerated Single-Scan Multi-Contrast Magnetic Resonance Imaging
Magnetic resonance imaging (MRI) is an attractive medical imaging modality as it is non-invasive and does not involve ionizing radiation. Routine clinical MRI exams obtain MR images corresponding to different soft tissue contrasts by performing multiple scans. When two-dimensional (2D) imaging is used, these scans are often repeated in other scanning planes. As a result, the number of scans comprising an MRI exam leads to prohibitively long exam times compared to other medical imaging modalities such as computed tomography. Many approaches have been designed to accelerate the MRI acquisition while maintaining diagnostic quality.
One approach is to collect multiple measurements while the MRI signal is evolving due to relaxation. This enables a reduction in scan time, as fewer acquisition windows are needed to collect the same number of measurements. However, when the temporal aspect of the acquisition is left unmodeled, artifacts are likely to appear in the reconstruction. Most often, these artifacts manifest as image blurring. The effect depends on the acquisition parameters as well as the tissue relaxation itself, resulting in spatially varying blurring. The severity of the artifacts is directly related to the level of acceleration, and thus presents a tradeoff with scan time. The effect is amplified when imaging in three dimensions, severely limiting scan efficiency. Volumetric variants would be used if not for the blurring, as they are able to reconstruct images at isotropic resolution and support multi-planar reformatting.
Another established acceleration technique, called parallel imaging, takes advantage of spatially sensitive receive coil arrays to collect multiple MRI measurements in parallel. Thus, the acquisition is shortened, and the reconstruction uses the spatial sensitivity information to recover the image.
More recently, methods have been developed that leverage image structure such as sparsity and low rank to reduce the required number of samples for a well-posed reconstruction. Compressed sensing and its low rank extensions use these concepts to acquire incoherent measurements below the Nyquist rate. These techniques are especially suited to MRI, as incoherent measurements can be easily achieved through pseudo-random under-sampling. As the mechanisms behind parallel imaging and compressed sensing are fundamentally different, they can be combined to achieve even higher acceleration.
This dissertation proposes accelerated MRI acquisition and reconstruction techniques that account for the temporal dynamics of the MR signal. The methods build on parallel imaging and compressed sensing to reduce scan time and flexibly model the temporal relaxation behavior. By randomly shuffling the sampling in the acquisition stage and imposing low rank constraints in the reconstruction stage, intrinsic physical parameters are modeled and their dynamics are recovered as multiple images of varying tissue contrast. Additionally, blurring artifacts are significantly reduced, as the temporal dynamics are accounted for in the reconstruction.
This dissertation first introduces T2 Shuffling, a volumetric technique that reduces blurring and reconstructs multiple T2-weighted image contrasts from a single acquisition. The method is integrated into a clinical hospital environment and evaluated on patients. Next, this dissertation develops a fast and distributed reconstruction for T2 Shuffling that achieves clinically relevant processing latency. Clinical validation results are shown comparing T2 Shuffling as a single-sequence alternative to conventional pediatric knee MRI. Based on these results, a fast targeted knee MRI using T2 Shuffling is implemented, enabling same-day access to MRI at one-third the cost of the conventional exam.
To date, over 2,400 T2 Shuffling patient scans have been performed.
Continuing the theme of accelerated multi-contrast imaging, this dissertation extends the temporal signal model with T1-T2 Shuffling. Building on T2 Shuffling, the new method additionally samples multiple points along the saturation recovery curve by varying the repetition time durations during the scan. Since the signal dynamics are governed by both T1 recovery and T2 relaxation, the reconstruction captures information about both intrinsic tissue parameters. As a result, multiple target synthetic contrast images are reconstructed, all from a single scan. Approaches for selecting the sequence parameters are provided, and the method is evaluated on in vivo brain imaging of a volunteer.
Altogether, these methods comprise the theme of MR Shuffling and may open new pathways toward fast clinical MRI.
Image Reconstructions of Compressed Sensing MRI with Multichannel Data
Magnetic resonance imaging (MRI) provides high spatial resolution, high-quality soft-tissue contrast, and multi-dimensional images. However, the speed of data acquisition limits potential applications. Compressed sensing (CS) theory, which allows data to be sampled at a sub-Nyquist rate, makes it possible to shorten the MRI scan time. Since most MRI scanners are currently equipped with multi-channel receiver systems, integrating CS with multi-channel systems can further shorten the scan time and also provide better image quality. In this dissertation, we develop several techniques for integrating CS with parallel MRI.
First, we propose a method that extends reweighted l1 minimization to CS-MRI with multi-channel data. The individual channel images are recovered with the reweighted l1 minimization algorithm. Then, the final image is formed by the sum-of-squares method. Computer simulations show that the new method can improve the reconstruction quality at a slightly increased computational cost.
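The two building blocks named above can be sketched in a few lines. This is a minimal illustration, not the dissertation's implementation: the weight update `1 / (|x| + eps)` is the commonly used reweighting rule and is assumed here, and the function names, the `eps` parameter, and the toy pixel values are all hypothetical.

```python
import math

def reweight(coeffs, eps=1e-3):
    """Weights for the next weighted-l1 pass (assumed rule: 1 / (|x| + eps)).

    Small coefficients receive large weights, so the next pass
    penalizes them more strongly and promotes sparsity.
    """
    return [1.0 / (abs(c) + eps) for c in coeffs]

def sum_of_squares(channel_images):
    """Combine per-channel images pixel by pixel.

    channel_images is a list of equal-length lists of complex pixels;
    the result is the root of the summed squared magnitudes.
    """
    n_pixels = len(channel_images[0])
    return [
        math.sqrt(sum(abs(img[p]) ** 2 for img in channel_images))
        for p in range(n_pixels)
    ]

# Two hypothetical 3-pixel channel images (made-up values).
channels = [[3 + 0j, 0j, 1j], [4j, 0j, 1 + 0j]]
combined = sum_of_squares(channels)

# Hypothetical sparse-domain coefficients from a previous pass.
weights = reweight([2.0, 0.0], eps=0.5)
```

In a full reconstruction, the weights would be recomputed after each weighted l1 solve for every channel, and the sum-of-squares step would run once on the final channel images.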
Second, we propose a reconstruction approach that uses ubiquitously available multi-core CPUs to accelerate CS reconstructions of multi-channel data. CS reconstructions for phased-array systems using iterative l1 minimization are time-consuming, and the computational complexity scales with the number of channels. The experimental results show that reconstruction efficiency benefits significantly from parallelizing the CS reconstructions and pipelining multi-channel data on multi-core processors. In our experiments, an additional speedup factor of 1.6 to 2.0 was achieved using the proposed method on a quad-core CPU.
Finally, we present an efficient reconstruction method for high-dimensional CS MRI on a GPU platform to shorten the time of iterative computations. Data management and the iterative algorithm are designed to suit SIMD (single instruction, multiple data) parallelization. For three-dimensional multi-channel data, all slices along the frequency-encoding direction and multiple channels are processed simultaneously on the GPU. The GPU requires only 2.3 seconds to reconstruct simulated 4-channel data with a volume size of 256 × 256 × 32, compared to 67 seconds on the CPU, making the proposed method 28 times faster. The rapid reconstruction algorithms demonstrated in this work are expected to help bring high-dimensional, multi-channel parallel CS MRI closer to clinical applications.
High performance communication on reconfigurable clusters
High Performance Computing (HPC) has matured to where it is an essential third pillar, along with theory and experiment, in most domains of science and engineering. Communication latency is a key factor that is limiting the performance of HPC, but can be addressed by integrating communication into accelerators. This integration allows accelerators to communicate with each other without CPU interactions, and even bypassing the network stack. Field Programmable Gate Arrays (FPGAs) are the accelerators that currently best integrate communication with computation. The large number of Multi-gigabit Transceivers (MGTs) on most high-end FPGAs can provide high-bandwidth and low-latency inter-FPGA connections. Additionally, the reconfigurable FPGA fabric enables tight coupling between computation kernel and network interface.
Our thesis is that an application-aware communication infrastructure for a multi-FPGA system makes substantial progress in solving the HPC communication bottleneck. This dissertation aims to provide an application-aware solution for communication infrastructure for FPGA-centric clusters. Specifically, our solution demonstrates application-awareness across multiple levels in the network stack, including low-level link protocols, router microarchitectures, routing algorithms, and applications.
We start by investigating the low-level link protocol and the impact of its latency variance on performance. Our results demonstrate that, although some link jitter is always present, we can still assume near-synchronous communication on an FPGA-cluster. This provides the necessary condition for statically-scheduled routing. We then propose two novel router microarchitectures for two different kinds of workloads: a wormhole Virtual Channel (VC)-based router for workloads with dynamic communication, and a statically-scheduled Virtual Output Queueing (VOQ)-based router for workloads with static communication. For the first (VC-based) router, we propose a framework that generates application-aware router configurations. Our results show that, by adding application-awareness into router configuration, the network performance of FPGA clusters can be substantially improved. For the second (VOQ-based) router, we propose a novel offline collective routing algorithm. This shows a significant advantage over a state-of-the-art collective routing algorithm.
We apply our communication infrastructure to a critical strong-scaling HPC kernel, the 3D FFT. The experimental results demonstrate that our design is faster than CPUs and GPUs by at least one order of magnitude (achieving strong scaling for the target applications). Surprisingly, the FPGA cluster performance is similar to that of an ASIC cluster. We also implement the 3D FFT on another multi-FPGA platform: the Microsoft Catapult II cloud. Its performance is also comparable or superior to CPU and GPU HPC clusters. The second application we investigate is Molecular Dynamics Simulation (MD). We model MD on both FPGA clouds and clusters. We find that combining processing and general communication in the same device leads to extremely promising performance and the prospect of MD simulations well into the µs/day range with a commodity cloud.
Real-Time Locating Systems Based on Wireless Communication Networks (Systèmes de localisation en temps réel basés sur les réseaux de communication sans fil)
Reliable radiolocation techniques are indispensable to the development of a large number of new systems. Localization techniques based on wireless communication networks (WNs) are particularly well suited to confined and densely urbanized spaces. This research project addresses real-time locating systems (RTLS) based on existing wireless communication technologies. Two new alternative radiolocation techniques are proposed to improve the positioning accuracy of mobile wireless nodes relative to conventional methods based on received signal strength (RSS). The first, a geometric method, proposes a new compensation metric between the received signal powers with respect to pairs of fixed receiving stations. The advantage of this technique is that it reduces the effect of variations in the propagation medium and in signal transmit power on localization accuracy. In the second method, of the scene-analysis type, the same metric is used to form the signatures that make up the radio map of the localization environment during the offline phase. During the real-time localization phase, compressive sensing (CS) is applied to recover the positions of the mobile nodes from a reduced number of received-signal samples by comparing them to the pre-established radio map. The multilinear algebra computation proposed in this work allows this type of ternary metric, or equivalently the time difference of arrival (TDOA), to be used to compute target positions with the CS technique. Both methods are then validated through simulations and experiments carried out in two- and three-dimensional environments.
The experiments were conducted in a multi-floor building (MFB) using the existing wireless infrastructure to jointly recover the position and floor of the targets with the proposed techniques. An emblematic example of the application of RTLS in urban areas is that of intelligent transportation systems (ITS) for improving road safety. This project also examines the performance of a pedestrian safety application at road intersections. Such a system relies on the reliable exchange, under strict timing constraints, of geographic positioning data between mobile nodes so that they keep each other informed of their presence and position in order to prevent collision risks. This project carries out a comparative study of two architectures of an ITS system enabling communication between pedestrians and vehicles, directly and via an infrastructure unit, in accordance with the communication standards for vehicular ad hoc networks (VANETs).
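To illustrate why a metric defined over pairs of fixed receivers can reduce sensitivity to transmit power, the sketch below uses the plain RSS difference between two receivers under a log-distance path-loss model. Both the model and the choice of a simple difference are assumptions made for illustration; the abstract does not define the thesis's metric, and the function names and all parameter values here are hypothetical.

```python
import math

def rss_dbm(tx_power_dbm, distance_m, path_loss_exp=3.0, ref_loss_db=40.0):
    """RSS under an assumed log-distance path-loss model.

    ref_loss_db is the assumed loss at the 1 m reference distance.
    """
    return tx_power_dbm - ref_loss_db - 10.0 * path_loss_exp * math.log10(distance_m)

def rss_difference(tx_power_dbm, d_i, d_j, **model):
    """Difference of the RSS seen by two fixed receivers at distances d_i and d_j."""
    return rss_dbm(tx_power_dbm, d_i, **model) - rss_dbm(tx_power_dbm, d_j, **model)

# Same geometry, two different (made-up) transmit powers.
diff_low = rss_difference(0.0, 5.0, 20.0)
diff_high = rss_difference(20.0, 5.0, 20.0)

# The transmit power and reference loss cancel; only the distance
# ratio and the path-loss exponent remain.
expected = 10.0 * 3.0 * math.log10(20.0 / 5.0)
```

Under this model the pairwise difference is unchanged when the transmit power changes, while a single RSS reading shifts by the full 20 dB. The path-loss exponent still appears in the result, so the difference reduces, but does not remove, the dependence on the propagation environment.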
Imaging Methods for the Breast with a 3D Ultrasound Computer Tomograph (Abbildungsmethoden für die Brust mit einem 3D-Ultraschall-Computertomographen)
This work covers the theory, implementation, and evaluation of ultrasound transmission tomography algorithms for the 3D-USCT II prototype developed at KIT. Whereas previous work assumes ideal conditions, this work addresses image reconstruction from noisy real data acquired in a clinical pilot study. Of three cancer cases, two could be clearly identified with the methods of this work.