8 research outputs found

    High-Performance 3D Compressive Sensing MRI Reconstruction Using Many-Core Architectures

    Compressive sensing (CS) describes how sparse signals can be accurately reconstructed from far fewer samples than the Nyquist criterion requires. Since MRI scan duration is proportional to the number of acquired samples, CS has been gaining significant attention in MRI. However, the computationally intensive nature of CS reconstructions has precluded their use in routine clinical practice. In this work, we investigate how different throughput-oriented architectures can benefit one CS algorithm and what levels of acceleration are feasible on different modern platforms. We demonstrate that a CUDA-based code running on an NVIDIA Tesla C2050 GPU can reconstruct a 256 × 160 × 80 volume from an 8-channel acquisition in 19 seconds, which is in itself a significant improvement over the state of the art. We then show that Intel's Knights Ferry can perform the same 3D MRI reconstruction in only 12 seconds, bringing CS methods even closer to clinical viability.

    Image Reconstructions of Compressed Sensing MRI with Multichannel Data

    Magnetic resonance imaging (MRI) provides high spatial resolution, high-quality soft-tissue contrast, and multi-dimensional images. However, the speed of data acquisition limits its potential applications. Compressed sensing (CS) theory, which allows data to be sampled at a sub-Nyquist rate, offers a way to shorten MRI scan time. Since most MRI scanners are now equipped with multi-channel receiver systems, integrating CS with multi-channel systems can further shorten the scan time and also improve image quality. In this dissertation, we develop several techniques for integrating CS with parallel MRI. First, we propose a method that extends reweighted l1 minimization to CS-MRI with multi-channel data. The individual channel images are recovered with the reweighted l1 minimization algorithm, and the final image is then formed by combining them with the sum-of-squares method. Computer simulations show that the new method can improve the reconstruction quality at a slightly increased computation cost. Second, we propose a reconstruction approach that uses the ubiquitous multi-core CPU to accelerate CS reconstructions of multi-channel data. CS reconstructions for phased-array systems using iterative l1 minimization are time-consuming, and the computational complexity scales with the number of channels. The experimental results show that reconstruction efficiency benefits significantly from parallelizing the CS reconstructions and pipelining multi-channel data on multi-core processors. In our experiments, an additional speedup factor of 1.6 to 2.0 was achieved using the proposed method on a quad-core CPU. Finally, we present an efficient reconstruction method for high-dimensional CS MRI on a GPU platform to shorten the time of iterative computations. Data management and the iterative algorithm are designed to suit SIMD (single instruction, multiple data) parallelization. For three-dimensional multi-channel data, all slices along the frequency-encoding direction and all channels are processed in parallel on the GPU. The GPU needs only 2.3 seconds to reconstruct simulated 4-channel data with a volume size of 256×256×32. Compared with 67 seconds on a CPU, the proposed method is 28 times faster. The rapid reconstruction algorithms demonstrated in this work are expected to help bring high-dimensional, multichannel parallel CS MRI closer to clinical applications.
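The sum-of-squares combination named in this abstract can be illustrated with a minimal sketch. It assumes each channel image has already been reconstructed and is stored as a 2D list of complex pixel values; the function name `sum_of_squares` and the data layout are hypothetical and are not taken from the dissertation.

```python
import math

def sum_of_squares(channel_images):
    """Combine per-channel complex images into one magnitude image.

    channel_images: list of equally sized 2D lists of complex pixels,
    one entry per receiver channel (hypothetical layout for illustration).
    """
    rows = len(channel_images[0])
    cols = len(channel_images[0][0])
    # Accumulate the squared magnitude of every channel at each pixel.
    combined = [[0.0] * cols for _ in range(rows)]
    for image in channel_images:
        for r in range(rows):
            for c in range(cols):
                combined[r][c] += abs(image[r][c]) ** 2
    # The combined pixel is the square root of the accumulated sum.
    return [[math.sqrt(value) for value in row] for row in combined]

# Two channels, each a 1x2 image.
channels = [
    [[3 + 0j, 1j]],
    [[4j, 0j]],
]
combined = sum_of_squares(channels)
print(combined)  # [[5.0, 1.0]]
```

Because each pixel is combined independently of every other pixel, this step maps naturally onto the multi-core and GPU parallelization the abstract describes.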

    High performance communication on reconfigurable clusters

    High Performance Computing (HPC) has matured to where it is an essential third pillar, along with theory and experiment, in most domains of science and engineering. Communication latency is a key factor that is limiting the performance of HPC, but can be addressed by integrating communication into accelerators. This integration allows accelerators to communicate with each other without CPU interactions, and even bypassing the network stack. Field Programmable Gate Arrays (FPGAs) are the accelerators that currently best integrate communication with computation. The large number of Multi-gigabit Transceivers (MGTs) on most high-end FPGAs can provide high-bandwidth and low-latency inter-FPGA connections. Additionally, the reconfigurable FPGA fabric enables tight coupling between computation kernel and network interface. Our thesis is that an application-aware communication infrastructure for a multi-FPGA system makes substantial progress in solving the HPC communication bottleneck. This dissertation aims to provide an application-aware solution for communication infrastructure for FPGA-centric clusters. Specifically, our solution demonstrates application-awareness across multiple levels in the network stack, including low-level link protocols, router microarchitectures, routing algorithms, and applications. We start by investigating the low-level link protocol and the impact of its latency variance on performance. Our results demonstrate that, although some link jitter is always present, we can still assume near-synchronous communication on an FPGA-cluster. This provides the necessary condition for statically-scheduled routing. We then propose two novel router microarchitectures for two different kinds of workloads: a wormhole Virtual Channel (VC)-based router for workloads with dynamic communication, and a statically-scheduled Virtual Output Queueing (VOQ)-based router for workloads with static communication. 
For the first (VC-based) router, we propose a framework that generates application-aware router configurations. Our results show that, by adding application-awareness into router configuration, the network performance of FPGA clusters can be substantially improved. For the second (VOQ-based) router, we propose a novel offline collective routing algorithm. This shows a significant advantage over a state-of-the-art collective routing algorithm. We apply our communication infrastructure to a critical strong-scaling HPC kernel, the 3D FFT. The experimental results demonstrate that our design is faster than CPU and GPU implementations by at least one order of magnitude (achieving strong scaling for the target applications). Surprisingly, the FPGA cluster performance is similar to that of an ASIC cluster. We also implement the 3D FFT on another multi-FPGA platform: the Microsoft Catapult II cloud. Its performance is also comparable or superior to that of CPU and GPU HPC clusters. The second application we investigate is Molecular Dynamics (MD) simulation. We model MD on both FPGA clouds and clusters. We find that combining processing and general communication in the same device leads to extremely promising performance and the prospect of MD simulations well into the µs/day range with a commodity cloud.

    Real-time localization systems based on wireless communication networks (original French title: Systèmes de localisation en temps réel basés sur les réseaux de communication sans fil)

    Reliable radiolocation techniques are essential to the development of a large number of relevant new systems. Localization techniques based on wireless communication networks (WNs) are particularly well suited to confined and densely urbanized spaces. This research project addresses real-time localization systems (RTLS) based on existing wireless communication technologies. Two new alternative radiolocation techniques are proposed to improve the positioning accuracy of mobile wireless nodes relative to conventional methods based on received signal strength (RSS). The first method, of the geometric type, proposes a new compensation metric between the received signal powers with respect to pairs of fixed receiving stations. The advantage of this technique is that it reduces the effect of variations in the propagation medium and in signal transmit power on localization accuracy. The same metric is used to form the signatures that make up the radio map of the localization environment during the offline phase of the second method, which is of the scene-analysis type. During the real-time localization phase, compressed sensing (CS) is applied to recover the positions of mobile nodes from a reduced number of received signal samples by comparing them with the pre-established radio map. The multilinear algebra computation proposed in this work allows this type of ternary metric, or equivalently the time difference of arrival (TDOA), to be used to compute target positions with the CS technique. Both methods are then validated through simulations and experiments carried out in two- and three-dimensional environments. The experiments were conducted in a multi-floor building (MFB) using the existing wireless infrastructure to jointly recover the position and the floor of the targets with the proposed techniques. An emblematic example of RTLS applications in urban areas is intelligent transportation systems (ITS) for improving road safety. This project also examines the performance of a pedestrian safety application at road intersections. Such a system relies on the reliable exchange, under strict timing constraints, of geographic positioning data between mobile nodes so that they keep each other informed of their presence and positions in order to prevent collision risks. This project carries out a comparative study of two architectures of an ITS system that enable communication between pedestrians and vehicles, directly and via an infrastructure unit, in accordance with the communication standards for vehicular ad hoc networks (VANETs).
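One way to see why a metric formed over pairs of fixed receivers can reduce sensitivity to transmit power is the following sketch. The abstract does not define the metric, so this is not the author's formulation: it assumes a log-distance path-loss model and uses the plain difference in received power (in dB) between two receivers as a hypothetical stand-in. The function names and the path-loss exponent of 3.0 are illustrative assumptions.

```python
import math

def rss_dbm(tx_power_dbm, distance_m, path_loss_exponent):
    """Received power in dBm under a log-distance path-loss model (assumed)."""
    return tx_power_dbm - 10.0 * path_loss_exponent * math.log10(distance_m)

def pairwise_difference_db(tx_power_dbm, d_i, d_j, path_loss_exponent=3.0):
    """Received-power difference between two fixed receivers (hypothetical metric)."""
    rss_i = rss_dbm(tx_power_dbm, d_i, path_loss_exponent)
    rss_j = rss_dbm(tx_power_dbm, d_j, path_loss_exponent)
    # The transmit power appears in both terms and cancels in the difference.
    return rss_i - rss_j

# Same node position (10 m and 100 m from the two receivers), two transmit powers.
diff_low_power = pairwise_difference_db(10.0, 10.0, 100.0)
diff_high_power = pairwise_difference_db(20.0, 10.0, 100.0)
print(diff_low_power, diff_high_power)
```

Under this assumed model the difference depends only on the ratio of the two distances and on the path-loss exponent, so it is unchanged when the node transmits at a different power. It still depends on the propagation environment through the exponent.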

    Imaging methods for the breast with a 3D ultrasound computer tomograph (original German title: Abbildungsmethoden für die Brust mit einem 3D-Ultraschall-Computertomographen)

    This work covers the theory, implementation, and evaluation of ultrasound transmission tomography algorithms for the 3D-USCT II prototype developed at KIT. Previous work assumes ideal conditions, whereas this work addresses image reconstruction from noisy real data acquired in a clinical pilot study. Of three cancer cases, two could be unambiguously identified with the methods of this work.

    Probing the sub-thalamic nucleus: development of bio-markers from very Local Field Potentials
