
    Accelerated High-Resolution Photoacoustic Tomography via Compressed Sensing

    Current 3D photoacoustic tomography (PAT) systems offer either high image quality or high frame rates but cannot deliver high spatial and temporal resolution simultaneously, which limits their ability to image dynamic processes in living tissue. A particular example is the planar Fabry-Perot (FP) scanner, which yields high-resolution images but takes several minutes to sequentially map the photoacoustic field on the sensor plane, point by point. However, as the spatio-temporal complexity of many absorbing tissue structures is rather low, the data recorded in such a conventional, regularly sampled fashion is often highly redundant. We demonstrate that combining variational image reconstruction methods using spatial sparsity constraints with the development of novel PAT acquisition systems capable of sub-sampling the acoustic wave field can dramatically increase the acquisition speed while maintaining good spatial resolution. First, we describe and model two general spatial sub-sampling schemes. Then, we discuss how to implement them using the FP scanner and demonstrate the potential of these novel compressed sensing PAT devices through simulated data from a realistic numerical phantom and through measured data from a dynamic experimental phantom as well as from in-vivo experiments. Our results show that images with good spatial resolution and contrast can be obtained from highly sub-sampled PAT data if variational image reconstruction methods that describe the tissue structures with suitable sparsity constraints are used. In particular, we examine the use of total variation regularization enhanced by Bregman iterations. These novel reconstruction strategies offer new opportunities to dramatically increase the acquisition speed of PAT scanners that employ point-by-point sequential scanning and to reduce the channel count of parallelized schemes that use detector arrays. Comment: submitted to "Physics in Medicine and Biology"

    CUDA Implementation of a Navier-Stokes Solver on Multi-GPU Desktop Platforms for Incompressible Flows

    Graphics processing units (GPUs), traditionally designed for graphics rendering, have emerged as massively parallel co-processors to the central processing unit (CPU). Small-footprint desktop supercomputers with hundreds of cores that can deliver teraflops peak performance at the price of conventional workstations have been realized. A computational fluid dynamics (CFD) simulation capability with rapid computational turnaround time has the potential to transform engineering analysis and design optimization procedures. We describe the implementation of a Navier-Stokes solver for incompressible fluid flow using desktop platforms equipped with multiple GPUs. Specifically, NVIDIA’s Compute Unified Device Architecture (CUDA) programming model is used to implement the discretized form of the governing equations. The projection algorithm to solve the incompressible fluid flow equations is divided into distinct CUDA kernels, and a unique implementation that exploits the memory hierarchy of the CUDA programming model is suggested. Using a quad-GPU platform, we observe two orders of magnitude speedup relative to a serial CPU implementation. Our results demonstrate that multi-GPU desktops can serve as a cost-effective small-footprint parallel computing platform to accelerate CFD simulations substantially.
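
The projection step mentioned in this abstract can be illustrated with a small serial sketch. This is not the authors' CUDA implementation: it assumes a periodic, staggered 2D grid and uses plain Gauss-Seidel sweeps for the pressure Poisson equation, and all names (`divergence`, `project`, `demo`, the grid size and time step) are hypothetical choices made for illustration.

```python
import math

def divergence(u, v, h):
    """Discrete divergence on a periodic staggered grid (u, v live on cell faces)."""
    n = len(u)
    return [[(u[(i + 1) % n][j] - u[i][j] + v[i][(j + 1) % n] - v[i][j]) / h
             for j in range(n)] for i in range(n)]

def max_abs_div(u, v, h):
    return max(abs(d) for row in divergence(u, v, h) for d in row)

def project(u, v, h, dt, sweeps=300):
    """Return a (nearly) divergence-free copy of the intermediate velocity (u, v).

    Solves  laplacian(p) = div(u*) / dt  with Gauss-Seidel sweeps, then applies
    u = u* - dt * grad(p).  The solver choice is an assumption of this sketch.
    """
    n = len(u)
    div = divergence(u, v, h)
    p = [[0.0] * n for _ in range(n)]
    for _ in range(sweeps):
        for i in range(n):
            for j in range(n):
                neighbours = (p[(i + 1) % n][j] + p[(i - 1) % n][j]
                              + p[i][(j + 1) % n] + p[i][(j - 1) % n])
                p[i][j] = (neighbours - h * h * div[i][j] / dt) / 4.0
    # Subtract the pressure gradient, evaluated at the faces where u and v live.
    u_new = [[u[i][j] - dt * (p[i][j] - p[(i - 1) % n][j]) / h
              for j in range(n)] for i in range(n)]
    v_new = [[v[i][j] - dt * (p[i][j] - p[i][(j - 1) % n]) / h
              for j in range(n)] for i in range(n)]
    return u_new, v_new

def demo(n=8, dt=0.01):
    """Project a compressible test field and report max |div| before and after."""
    h = 1.0 / n
    u = [[math.sin(2.0 * math.pi * i / n) for _ in range(n)] for i in range(n)]
    v = [[0.0] * n for _ in range(n)]
    before = max_abs_div(u, v, h)
    u2, v2 = project(u, v, h, dt)
    return before, max_abs_div(u2, v2, h)
```

In a GPU version, each of these loops (divergence, pressure sweep, velocity correction) would be a natural candidate for its own kernel, which is consistent with the abstract's remark that the projection algorithm is divided into distinct CUDA kernels.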

    MeshfreeFlowNet: A Physics-Constrained Deep Continuous Space-Time Super-Resolution Framework

    We propose MeshfreeFlowNet, a novel deep learning-based super-resolution framework to generate continuous (grid-free) spatio-temporal solutions from low-resolution inputs. While being computationally efficient, MeshfreeFlowNet accurately recovers the fine-scale quantities of interest. MeshfreeFlowNet allows for: (i) the output to be sampled at all spatio-temporal resolutions, (ii) a set of Partial Differential Equation (PDE) constraints to be imposed, and (iii) training on fixed-size inputs on arbitrarily sized spatio-temporal domains owing to its fully convolutional encoder. We empirically study the performance of MeshfreeFlowNet on the task of super-resolution of turbulent flows in the Rayleigh-Bénard convection problem. Across a diverse set of evaluation metrics, we show that MeshfreeFlowNet significantly outperforms existing baselines. Furthermore, we provide a large-scale implementation of MeshfreeFlowNet and show that it efficiently scales across large clusters, achieving 96.80% scaling efficiency on up to 128 GPUs and a training time of less than 4 minutes. Comment: Supplementary Video: https://youtu.be/mjqwPch9gDo. Accepted to SC2

    Fast Solvers for Partial Differential Equations (Schnelle Löser für Partielle Differentialgleichungen)

    This workshop was well attended by 52 participants with broad geographic representation from 11 countries and 3 continents. It was a nice blend of researchers with various backgrounds

    Modeling and Development of Iterative Reconstruction Algorithms in Emerging X-ray Imaging Technologies

    Many promising new X-ray-based biomedical imaging technologies have emerged over the last two decades. Five of them are discussed in this dissertation: differential phase-contrast tomography (DPCT), grating-based phase-contrast tomography (GB-PCT), spectral CT (K-edge imaging), cone-beam computed tomography (CBCT), and in-line X-ray phase-contrast (XPC) tomosynthesis. For each imaging modality, we address one or more specific problems that prevent it from being employed effectively or efficiently in clinical applications. First, to mitigate the long data-acquisition times and large radiation doses associated with the use of analytic reconstruction methods in DPCT, we analyze the numerical and statistical properties of two classes of discrete imaging models that form the basis for iterative image reconstruction. Second, to improve image quality in grating-based phase-contrast tomography, we incorporate second-order statistical properties of the object property sinograms, including correlations between them, into the formulation of an advanced multi-channel (MC) image reconstruction algorithm, which reconstructs three object properties simultaneously. We developed an advanced algorithm based on the proximal point algorithm and the augmented Lagrangian method to rapidly solve the MC reconstruction problem. Third, to mitigate image artifacts that arise from reduced-view and/or noisy decomposed sinogram data in K-edge imaging, we exploited the inherent sparseness of typical K-edge objects and incorporated the statistical properties of the decomposed sinograms to formulate two penalized weighted least-squares (PWLS) problems, one with a total variation (TV) penalty and one with a weighted sum of a TV penalty and an l1-norm penalty with a wavelet sparsifying transform. We employed a fast iterative shrinkage/thresholding algorithm (FISTA) and a splitting-based FISTA algorithm to solve these two PWLS problems. 
Fourth, to enable advanced iterative algorithms to obtain better diagnostic images and accurate patient positioning information within a few minutes in CBCT for image-guided radiation therapy, two accelerated variants of FISTA for PLS-based image reconstruction are proposed. The acceleration is obtained by replacing the original gradient-descent step with a sub-problem that is solved by use of the ordered subset concept (OS-SART). In addition, we present efficient numerical implementations of the proposed algorithms that exploit the massive data parallelism of multiple graphics processing units (GPUs). Finally, we employed our accelerated version of FISTA to deal with the incomplete (and often noisy) data inherent to in-line XPC tomosynthesis, which combines the concepts of tomosynthesis and in-line XPC imaging to utilize the advantages of both for biological imaging applications. We also investigate the depth resolution properties of XPC tomosynthesis and demonstrate that its z-resolution properties are superior to those of conventional absorption-based tomosynthesis. To investigate these proposed strategies and algorithms in the different imaging modalities, we conducted computer simulation studies and studies with real experimental data. The proposed reconstruction methods will facilitate the clinical or preclinical translation of these emerging imaging methods.
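
As a rough illustration of the FISTA iteration that this abstract builds on, the following sketch solves a toy problem of the form min_x 0.5*||Ax - b||^2 + lam*||x||_1. It is only a sketch under stated assumptions: it uses a plain l1 penalty and a plain gradient step, not the TV or wavelet penalties, the PWLS weighting, or the OS-SART sub-problem described in the dissertation, and the function names and the loose Lipschitz bound are hypothetical choices for illustration.

```python
import math

def matvec(A, x):
    """Compute A x for a dense matrix stored as a list of rows."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

def rmatvec(A, r):
    """Compute A^T r."""
    return [sum(A[i][j] * r[i] for i in range(len(A))) for j in range(len(A[0]))]

def soft_threshold(v, t):
    """Proximal operator of t * ||.||_1 (component-wise shrinkage)."""
    return [math.copysign(max(abs(vi) - t, 0.0), vi) for vi in v]

def fista(A, b, lam, iters=500):
    """FISTA for 0.5 * ||A x - b||^2 + lam * ||x||_1 on a small dense problem."""
    n = len(A[0])
    # The squared Frobenius norm bounds the largest eigenvalue of A^T A,
    # so it is a valid (if loose) Lipschitz constant for the gradient.
    L = sum(a * a for row in A for a in row)
    x = [0.0] * n
    y = x[:]
    t = 1.0
    for _ in range(iters):
        residual = [ri - bi for ri, bi in zip(matvec(A, y), b)]
        grad = rmatvec(A, residual)
        # Gradient step at the extrapolated point, followed by shrinkage.
        x_new = soft_threshold([yi - gi / L for yi, gi in zip(y, grad)], lam / L)
        # Momentum update that distinguishes FISTA from plain ISTA.
        t_new = (1.0 + math.sqrt(1.0 + 4.0 * t * t)) / 2.0
        y = [xn + ((t - 1.0) / t_new) * (xn - xo) for xn, xo in zip(x_new, x)]
        x, t = x_new, t_new
    return x
```

A simple sanity check is the case where A is the identity matrix: the minimizer is then the soft-thresholded data, so `fista(I, [3.0, 0.5, -2.0], 1.0)` should approach `[2.0, 0.0, -1.0]`.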

    Microwave Tomography Using Stochastic Optimization And High Performance Computing

    This thesis discusses the application of parallel computing in microwave tomography for detection and imaging of dielectric objects. The main focus is on microwave tomography with the use of a parallelized Finite Difference Time Domain (FDTD) forward solver in conjunction with non-linear stochastic optimization based inverse solvers. Because such solvers require very heavy computation, their investigation has been limited in favour of deterministic inverse solvers that make use of assumptions and approximations of the imaging target. Without the use of linearization assumptions, a non-linear stochastic microwave tomography system is able to resolve targets of arbitrary permittivity contrast profiles while avoiding convergence to local minima of the microwave tomography optimization space. This work is focused on ameliorating this computational load with the use of heavy parallelization. The presented microwave tomography system is capable of modelling complex, heterogeneous, and dispersive media using the Debye model. A detailed explanation of the dispersive FDTD is presented herein. The system uses scattered field data due to multiple excitation angles, frequencies, and observation angles in order to improve target resolution, reduce the ill-posedness of the microwave tomography inverse problem, and improve the accuracy of the complex permittivity profile of the imaging target. The FDTD forward solver is parallelized with the use of the Compute Unified Device Architecture (CUDA) programming model developed by NVIDIA Corporation. In the forward solver, the time stepping of the fields is computed on a Graphics Processing Unit (GPU). In addition, the inverse solver makes use of the Message Passing Interface (MPI) system to distribute computation across multiple workstations. The FDTD method was chosen due to its ease of parallelization using GPU computing, in addition to its ability to simulate wideband excitation signals during a single forward simulation. 
We investigated the use of distributed Particle Swarm Optimization (PSO) and Differential Evolution (DE) methods in the inverse solver for this microwave tomography system. In these optimization algorithms, candidate solutions are farmed out to separate workstations to be evaluated. As fitness evaluations are returned asynchronously, the optimization algorithm updates the population of candidate solutions and assigns new candidate solutions to idle workstations for evaluation. In this manner, we used a total of eight graphics processing units during optimization with minimal downtime. Presented in this thesis is a microwave tomography algorithm that does not rely on linearization assumptions and is capable of imaging a target in a reasonable amount of time for clinical applications. The proposed algorithm was tested using numerical phantoms with material parameters similar to those found in normal or malignant human tissue.
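
The particle swarm update at the heart of such an inverse solver can be sketched in a few lines. The version below is a serial toy, not the distributed, asynchronous scheme of the thesis: the `sphere` objective merely stands in for the expensive forward-solver-based fitness evaluation, and the function names, swarm size, and coefficients `w`, `c1`, `c2` are assumptions chosen for illustration.

```python
import random

def sphere(x):
    """Toy objective with its minimum at the origin (placeholder fitness)."""
    return sum(xi * xi for xi in x)

def pso(f, dim, bounds, n_particles=20, iters=200, w=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimize f over a box with a basic global-best particle swarm."""
    rng = random.Random(seed)  # seeded so that runs are repeatable
    lo, hi = bounds
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_val = [f(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]
    history = [gbest_val]
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Inertia + pull toward personal best + pull toward global best.
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)
            # In a distributed setting this evaluation is the unit of work
            # that would be sent to a worker; here it runs in-process.
            val = f(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
        history.append(gbest_val)
    return gbest, gbest_val, history
```

Because the global best is refreshed as soon as any single evaluation returns, the same update rule carries over naturally to a setting in which fitness values arrive out of order from different workers.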

    An MPI-CUDA Implementation for Massively Parallel Incompressible Flow Computations on Multi-GPU Clusters

    Modern graphics processing units (GPUs) with many-core architectures have emerged as general-purpose parallel computing platforms that can accelerate simulation science applications tremendously. While multi-GPU workstations with several TeraFLOPS of peak computing power are available to accelerate computational problems, larger problems require even more resources. Conventional clusters of central processing units (CPUs) are now being augmented with multiple GPUs in each compute node to tackle large problems. The heterogeneous architecture of a multi-GPU cluster with a deep memory hierarchy creates unique challenges in developing scalable and efficient simulation codes. In this study, we pursue mixed MPI-CUDA implementations and investigate three strategies to probe the efficiency and scalability of incompressible flow computations on the Lincoln Tesla cluster at the National Center for Supercomputing Applications (NCSA). We exploit some of the advanced features of MPI and CUDA programming to overlap both GPU data transfer and MPI communications with computations on the GPU. We sustain approximately 2.4 TeraFLOPS on the 64 nodes of the NCSA Lincoln Tesla cluster using 128 GPUs with a total of 30,720 processing elements. Our results demonstrate that multi-GPU clusters can substantially accelerate computational fluid dynamics (CFD) simulations.
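
The aggregate figures quoted in this abstract can be broken down per node and per GPU with simple arithmetic. The snippet below uses only the numbers stated above; the variable names are illustrative, and the per-GPU rate is an average derived from the approximate 2.4 TeraFLOPS total, not a measured value.

```python
# Figures taken from the abstract.
nodes = 64
gpus = 128
processing_elements = 30_720
sustained_tflops = 2.4  # stated as approximate

# Derived quantities.
gpus_per_node = gpus // nodes                        # 2 GPUs in each node
elements_per_gpu = processing_elements // gpus       # 240 processing elements per GPU
gflops_per_gpu = sustained_tflops * 1000.0 / gpus    # about 18.75 GFLOPS per GPU

print(gpus_per_node, elements_per_gpu, round(gflops_per_gpu, 2))
```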