563 research outputs found

    GPU-Accelerated Simulation of Elastic Wave Propagation

    Get PDF
    Modeling of ultrasound waves propagation in hard biological materials such as bones and skull has a rapidly growing area of applications, e.g. brain cancer treatment planing, deep brain neurostimulation and neuromodulation, and opening blood brain barriers. Recently, we have developed a novel numerical model of elastic wave propagation based on the Kelvin-Voigt model accounting for linear elastic wave proration in heterogeneous absorption media. Although, the model offers unprecedented fidelity, its computational requirements have been prohibitive for realistic simulations. This paper presents an optimized version of the simulation model accelerated by the Nvidia CUDA language and deployed on the best GPUs including the Nvidia P100 accelerators present in the Piz Daint supercomputer. The native CUDA code reaches a speed-up of 5.4 when compared to the Matlab prototype accelerated by the Parallel Computing Toolbox running on the same GPU. Such reduction in computation time enables computation of large-scale treatment plans in terms of hours

    High Resolution 3D Ultrasonic Breast Imaging by Time-Domain Full Waveform Inversion

    Get PDF
    Ultrasound tomography (UST) scanners allow quantitative images of the human breast's acoustic properties to be derived with potential applications in screening, diagnosis and therapy planning. Time domain full waveform inversion (TD-FWI) is a promising UST image formation technique that fits the parameter fields of a wave physics model by gradient-based optimization. For high resolution 3D UST, it holds three key challenges: Firstly, its central building block, the computation of the gradient for a single US measurement, has a restrictively large memory footprint. Secondly, this building block needs to be computed for each of the 10310410^3-10^4 measurements, resulting in a massive parallel computation usually performed on large computational clusters for days. Lastly, the structure of the underlying optimization problem may result in slow progression of the solver and convergence to a local minimum. In this work, we design and evaluate a comprehensive computational strategy to overcome these challenges: Firstly, we introduce a novel gradient computation based on time reversal that dramatically reduces the memory footprint at the expense of one additional wave simulation per source. Secondly, we break the dependence on the number of measurements by using source encoding (SE) to compute stochastic gradient estimates. Also we describe a more accurate, TD-specific SE technique with a finer variance control and use a state-of-the-art stochastic LBFGS method. Lastly, we design an efficient TD multi-grid scheme together with preconditioning to speed up the convergence while avoiding local minima. All components are evaluated in extensive numerical proof-of-concept studies simulating a bowl-shaped 3D UST breast scanner prototype. Finally, we demonstrate that their combination allows us to obtain an accurate 442x442x222 voxel image with a resolution of 0.5mm using Matlab on a single GPU within 24h

    Anelastic sensitivity kernels with parsimonious storage for adjoint tomography and full waveform inversion

    Full text link
    We introduce a technique to compute exact anelastic sensitivity kernels in the time domain using parsimonious disk storage. The method is based on a reordering of the time loop of time-domain forward/adjoint wave propagation solvers combined with the use of a memory buffer. It avoids instabilities that occur when time-reversing dissipative wave propagation simulations. The total number of required time steps is unchanged compared to usual acoustic or elastic approaches. The cost is reduced by a factor of 4/3 compared to the case in which anelasticity is partially accounted for by accommodating the effects of physical dispersion. We validate our technique by performing a test in which we compare the KαK_\alpha sensitivity kernel to the exact kernel obtained by saving the entire forward calculation. This benchmark confirms that our approach is also exact. We illustrate the importance of including full attenuation in the calculation of sensitivity kernels by showing significant differences with physical-dispersion-only kernels

    Development and validation of real-time simulation of X-ray imaging with respiratory motion

    Get PDF
    International audienceWe present a framework that combines evolutionary optimisation, soft tissue modelling and ray tracing on GPU to simultaneously compute the respiratory motion and X-ray imaging in real-time. Our aim is to provide validated building blocks with high fidelity to closely match both the human physiology and the physics of X-rays. A CPU-based set of algorithms is presented to model organ behaviours during respiration. Soft tissue deformation is computed with an extension of the Chain Mail method. Rigid elements move according to kinematic laws. A GPU-based surface rendering method is proposed to compute the X-ray image using the Beer-Lambert law. It is provided as an open-source library. A quantitative validation study is provided to objectively assess the accuracy of both components: i) the respiration against anatomical data, and ii) the X-ray against the Beer-Lambert law and the results of Monte Carlo simulations. Our implementation can be used in various applications, such as interactive medical virtual environment to train percutaneous transhepatic cholangiography in interventional radiology, 2D/3D registration, computation of digitally reconstructed radiograph, simulation of 4D sinograms to test tomography reconstruction tools

    Use of multi-GPU systems for large FFTs: with applications in ultrasound simulations

    Get PDF
    Ultrasound simulations are a type of application that are both computationally and communicatively intensive. With better performance, implementations of these can be used in designing new ultrasound probes, developing better signal processing techniques, training new ultrasonographers, in treatment planning and many other uses [11]. The pseudo-spectral technique can be used effectively to express the wave-propagation model used in these simulations, and is characterised by its use of the Fast Fourier Transform (FFT). The FFT can account for over half of the time spent by ultrasound simulations, with the remaining consisting of embarrassingly parallel arithmetic [28]. The use of a Graphics Processing Unit (GPU) for general computations like the FFT has become ubiquitous with favourable performance. The current trend in the design of the Central Processing Unit (CPU) of most systems has seen a shift from single-core to multi-core processing with these now being assembled into multi-socket configurations. GPUs are already massively multi-core processors typically with three or four times as many cores the question remains: will GPUs follow a similar trend and incorporate multiple devices in individual sockets when implemented? The purpose of the work in this thesis is to assess the viability of multi-GPU systems for ultrasound simulations in terms of cost and performance compared to other system designs that offer similar computational resources. Current machine hardware is capable of supporting multiple GPU through peripheral devices and offers a glimpse of the potential of future machines however, relatively little work has been reported on the use of such systems for ultrasound simulations and the FFT algorithm. In this thesis, to address this issue, we benchmark and model the device-to-device communication potential of an existing multi-GPU system. Four different methods are considered, namely: via CPU, pointer swapping, hybrid-staged, and kernel. The results reveal that the pointer swapping and kernel based methods of managing communication can be up to twice as efficient as other methods. The methods for communication identified in the benchmarks are then used as the basis for a number of important generic communication functions, which are in turn used to implement a distributed 3D FFT algorithm as required by the ultrasound simulation. The multi-GPU distributed 3D FFT with four GPUs was found to be up to 18% faster than an existing FFT implementation on a six core CPU. This multi-GPU distributed 3D FFT implementation is then used in an ultra- sound simulation as a proof-of-concept case study of the thesis. By overlapping communication and computation between the CPU and GPU resources a speed up of 8% is observed

    Acceleration of Axisymetric Ultrasound Simulations

    Get PDF
    Simulácia šírenia ultrazvuku prostredníctvom mäkkých biologických tkanív má širokú škálu praktických aplikácií. Patria sem dizajn prevodníkov pre diagnostický a terapeutický ultrazvuk, vývoj nových metód spracovania signálov a zobrazovacích techník, štúdium anomálií ultrazvukových lúčov v heterogénnych médiách, ultrazvuková klasifikácia tkanív, učenie rádiológov používať ultrazvukové zariadenia a interpretáciu ultrazvukových obrazov, modelové vrstvenie medicínskeho obrazu a plánovanie liečby pre ultrazvuk s vysokou intenzitou. Ultrazvuková simulácia však predstavuje výpočtovo zložitý problém, pretože simulačné domény sú veľmi veľké v porovnaní s akustickými vlnovými dĺžkami, ktoré sú predmetom záujmu. Ale ak je problém osovo symetrický, problém môže byť riešený v 2D.To umožňuje spúšťanie simulácií na mriežke s väčším počtom bodov, s menším využitím výpoč- tových zdrojov za kratšiu dobu. Táto práca modeluje a implementuje zrýchlenie vlnovej nelineárnej ultrazvukovej simulácie v axisymetrickom súradnicovom systéme realizovanom v Matlabe pomocou Mex súborov pre diskrétne sínové a kosínové transformácie. Axisymetrická simulácia bola implementovaná v C++ ako open source rozšírenie K-WAVE toolboxu. Kód je optimalizovaný na beh na jednom uzle superpočítaču Salomon (IT4Innovations, Ostrava, Česká republika) s dvoma dvanásť-jadrovými procesormi Intel Xeon E5-2680v3. Na maximalizáciu výpočtovej efektívnosti boli vykonané viaceré optimalizácie kódu. Po prvé, fourierové tramsformácie boli vypočítané pomocou real-to-complex FFT z knižnice FFTW. V porovnaní s complex-to-complex FFT to znížilo čas výpočtu a pamäť spojenú s výpočtom FFT o takmer 50%. Taktiež diskrétne sínové a kosínové transformácie sa počítali pomocou knižnice FFTW, ktoré v Matlab verzii museli byť vyvolané z dynamicky načítaných MEX súborov. Po druhé, aby sa znížilo zaťaženie priepustnosti pamäte, boli všetky operácie počítané jednoduchej presnosti pohyblivej rádovej čiarky. Po tretie, elementárne operá- cie boli paralelizované pomocou OpenMP a potom vektorizované pomocou rozšírení SIMD (SSE). Celkový výpočet C++ verzie je až do 34-násobne rýchlejší a využíva menej ako tretinu pamäte ako Matlab verzia simulácie. Simulácia ktorá by trvala takmer dva dni tak môže byť vypočítaná za jeden a pol hodinu. Toto všetko umožňuje počítať simuláciu na výpočetnej mriežke s veľkosťou 16384 × 8192 bodov v primeranom čase.The simulation of ultrasound propagation through soft biological tissue has a wide range of practical applications. These include the design of transducers for diagnostic and therapeutic ultrasound, the development of new signal processing and imaging techniques, studying the aberration of ultrasound beams in heterogeneous media, ultrasonic tissue classification, training ultrasonographers to use ultrasound equipment and interpret ultrasound images, model-based medical image registration, and treatment planning and dosimetry for high-intensity focused ultrasound. However, ultrasound simulation presents a computationally difficult problem, as simulation domains are very large compared with the acoustic wavelengths of interest. But if the problem is axisymmetric, the governing equations can also be solved in 2D. This allows running simulations with larger grid size, with less computational resources and in a shorter time. This paper model and implements an acceleration of the Full-wave Nonlinear Ultrasound Simulation in an Axisymmetric Coordinate System implemented in Matlab using Mex Files for FFTW DST and DCT transformations. The axisymmetric simulation was implemented in C++ as an extension to the open source K-WAVE toolbox. The codes were optimized to run using one node of Salomon supercomputer cluster (IT4Innovations, Ostrava, Czechia) with two twelve-core Intel Xeon E5-2680v3 processors. To maximize computational efficiency, several stages of code optimization were performed. First, the FFTs were computed using the real-to-complex FFT from the FFTW library. Compared to the complex-to-complex FFT, this reduced the compute time and memory associated with the FFT by nearly 50%. Also, real-to-real DCTs and DSTs were computed using FFTW library, which ones in Matlab version, had to be invoked from dynamically loaded MEX Files. Second, to save memory bandwidth, all operations were computed in single precision. Third, element-wise operations were parallelized using OpenMP and then optimized using streaming SIMD extensions (SSE). The overall computation of the C++ k-space model is up to 34-times faster and uses less than one-third of the memory than Matlab version. The simulation which would take nearly two days by Matlab implementation can be now computed in one and half hour. This all allows running the simulation on the computational grid with 16384 × 8192 grid points within a reasonable time.

    Three-Dimensional Photoacoustic Computed Tomography: Imaging Models and Reconstruction Algorithms

    Get PDF
    Photoacoustic computed tomography: PACT), also known as optoacoustic tomography, is a rapidly emerging imaging modality that holds great promise for a wide range of biomedical imaging applications. Much effort has been devoted to the investigation of imaging physics and the optimization of experimental designs. Meanwhile, a variety of image reconstruction algorithms have been developed for the purpose of computed tomography. Most of these algorithms assume full knowledge of the acoustic pressure function on a measurement surface that either encloses the object or extends to infinity, which poses many difficulties for practical applications. To overcome these limitations, iterative image reconstruction algorithms have been actively investigated. However, little work has been conducted on imaging models that incorporate the characteristics of data acquisition systems. Moreover, when applying to experimental data, most studies simplify the inherent three-dimensional wave propagation as two-dimensional imaging models by introducing heuristic assumptions on the transducer responses and/or the object structures. One important reason is because three-dimensional image reconstruction is computationally burdensome. The inaccurate imaging models severely limit the performance of iterative image reconstruction algorithms in practice. In the dissertation, we propose a framework to construct imaging models that incorporate the characteristics of ultrasonic transducers. Based on the imaging models, we systematically investigate various iterative image reconstruction algorithms, including advanced algorithms that employ total variation-norm regularization. In order to accelerate three-dimensional image reconstruction, we develop parallel implementations on graphic processing units. In addition, we derive a fast Fourier-transform based analytical image reconstruction formula. By use of iterative image reconstruction algorithms based on the proposed imaging models, PACT imaging scanners can have a compact size while maintaining high spatial resolution. The research demonstrates, for the first time, the feasibility and advantages of iterative image reconstruction algorithms in three-dimensional PACT
    corecore