Decreasing time consumption of microscopy image segmentation through parallel processing on the GPU
The computational performance of graphics processing units (GPUs) has improved significantly. Thanks to parallel processing, speedup factors of more than 50x over single-threaded CPU execution are not uncommon. This makes GPUs very appealing for high-throughput microscopy image analysis. Unfortunately, GPU programming is not straightforward and requires considerable programming skill and effort. Additionally, the attainable speedup factor is hard to predict, since it depends on the type of algorithm, the input data and the way in which the algorithm is implemented. In this paper, we identify the characteristic algorithm- and data-dependent properties that significantly relate to the achievable GPU speedup. We find that the overall GPU speedup depends on three major factors: (1) the coarse-grained parallelism of the algorithm, (2) the size of the data and (3) the computation/memory transfer ratio. This is illustrated on two types of well-known segmentation methods that are extensively used in microscopy image analysis: SLIC superpixels and high-level geometric active contours. In particular, we find that the geometric active contour segmentation algorithm we use is very suitable for parallel processing, resulting in acceleration factors of 50x for 0.1 megapixel images and 100x for 10 megapixel images.
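The dependence on the computation/memory transfer ratio can be illustrated with a toy timing model. The sketch below is an assumption made for illustration, not the analysis from the paper: it takes the GPU run time to be the CPU run time divided by a hypothetical kernel speedup, plus a host-device transfer time. The function name and all parameters are hypothetical.

```python
def estimated_speedup(t_cpu, kernel_speedup, t_transfer):
    """Overall speedup under a toy model (an assumption, not the paper's model).

    t_cpu          -- single-threaded CPU run time in seconds
    kernel_speedup -- how much faster the GPU kernel alone runs than the CPU code
    t_transfer     -- time spent copying data between host and device, in seconds
    """
    if t_cpu <= 0 or kernel_speedup <= 0 or t_transfer < 0:
        raise ValueError("times must be positive and transfer time non-negative")
    # Total GPU time is the accelerated compute time plus the transfer overhead.
    t_gpu = t_cpu / kernel_speedup + t_transfer
    return t_cpu / t_gpu


if __name__ == "__main__":
    # Hypothetical numbers: the same kernel speedup with growing transfer cost.
    for t_transfer in (0.0, 0.01, 0.1):
        s = estimated_speedup(t_cpu=1.0, kernel_speedup=100.0, t_transfer=t_transfer)
        print(f"transfer={t_transfer:.2f}s -> overall speedup {s:.1f}x")
```

In this model the overall speedup can never exceed `t_cpu / t_transfer`, which is one way to see why a low computation-to-transfer ratio limits the gain no matter how fast the kernel is.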
BioEM: GPU-accelerated computing of Bayesian inference of electron microscopy images
In cryo-electron microscopy (EM), molecular structures are determined from
large numbers of projection images of individual particles. To harness the full
power of this single-molecule information, we use the Bayesian inference of EM
(BioEM) formalism. By ranking structural models using posterior probabilities
calculated for individual images, BioEM in principle addresses the challenge of
working with highly dynamic or heterogeneous systems not easily handled in
traditional EM reconstruction. However, the calculation of these posteriors for
large numbers of particles and models is computationally demanding. Here we
present highly parallelized, GPU-accelerated computer software that performs
this task efficiently. Our flexible formulation employs CUDA, OpenMP, and MPI
parallelization combined with both CPU and GPU computing. The resulting BioEM
software scales nearly ideally both on pure CPU and on CPU+GPU architectures,
thus enabling Bayesian analysis of tens of thousands of images in a reasonable
time. The general mathematical framework and robust algorithms are not limited
to cryo-electron microscopy but can be generalized for electron tomography and
other imaging experiments.
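The idea of ranking structural models by posterior probabilities computed per image can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the BioEM implementation: the per-image likelihood is averaged over a handful of sampled nuisance parameters with a uniform prior, models are given equal prior weight, and the function names and data layout are hypothetical.

```python
import math


def log_sum_exp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def rank_models(log_like):
    """Rank models by their summed per-image log evidence.

    log_like maps a model name to a list with one entry per image; each entry
    is a list of log-likelihoods over sampled nuisance parameters
    (hypothetical layout, uniform prior over the samples assumed).
    """
    scores = {}
    for model, images in log_like.items():
        total = 0.0
        for samples in images:
            # Average the likelihood over the samples, working in log space.
            total += log_sum_exp(samples) - math.log(len(samples))
        scores[model] = total
    ranking = sorted(scores, key=scores.get, reverse=True)
    return ranking, scores


if __name__ == "__main__":
    # Made-up log-likelihoods for two models and two images.
    data = {
        "model_A": [[-10.0, -12.0], [-9.0, -11.0]],
        "model_B": [[-15.0, -14.0], [-13.0, -16.0]],
    }
    ranking, scores = rank_models(data)
    print(ranking, scores)
```

Each image contributes an independent term to the sum, which is why this kind of calculation parallelizes well across images and models.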
Development of a GPU-based Monte Carlo dose calculation code for coupled electron-photon transport
Monte Carlo simulation is the most accurate method for absorbed dose
calculations in radiotherapy. Its efficiency still requires improvement for
routine clinical applications, especially for online adaptive radiotherapy. In
this paper, we report our recent development of a GPU-based Monte Carlo dose
calculation code for coupled electron-photon transport. We have implemented the
Dose Planning Method (DPM) Monte Carlo dose calculation package (Sempau et al,
Phys. Med. Biol., 45(2000)2263-2291) on the GPU architecture under the CUDA platform.
The implementation has been tested with respect to the original sequential DPM
code on CPU in phantoms with water-lung-water or water-bone-water slab
geometry. A 20 MeV mono-energetic electron point source or a 6 MV photon point
source is used in our validation. The results demonstrate adequate accuracy of
our GPU implementation for both electron and photon beams in the radiotherapy
energy range. Speedup factors of about 5.0 to 6.6 have been observed
using an NVIDIA Tesla C1060 GPU card against a 2.27 GHz Intel Xeon CPU
processor.
Comment: 13 pages, 3 figures, and 1 table. Paper revised. Figures update
A Streaming Multi-GPU Implementation of Image Simulation Algorithms for Scanning Transmission Electron Microscopy
Simulation of atomic resolution image formation in scanning transmission
electron microscopy can require significant computation times using traditional
methods. A recently developed method, termed plane-wave reciprocal-space
interpolated scattering matrix (PRISM), demonstrates potential for significant
acceleration of such simulations with negligible loss of accuracy. Here we
present a software package called Prismatic for parallelized simulation of
image formation in scanning transmission electron microscopy (STEM) using both
the PRISM and multislice methods. By distributing the workload between multiple
CUDA-enabled GPUs and multicore processors, accelerations as high as 1000x for
PRISM and 30x for multislice are achieved relative to traditional multislice
implementations using a single 4-GPU machine. We demonstrate a potentially
important application of Prismatic, using it to compute images for atomic
electron tomography at sufficient speeds to include in the reconstruction
pipeline. Prismatic is freely available both as an open-source CUDA/C++ package
with a graphical user interface and as a Python package, PyPrismatic.
Accurate and efficient spin integration for particle accelerators
Accurate spin tracking is a valuable tool for understanding spin dynamics in
particle accelerators and can help improve the performance of an accelerator.
In this paper, we present a detailed discussion of the integrators in the spin
tracking code gpuSpinTrack. We have implemented orbital integrators based on
drift-kick, bend-kick, and matrix-kick splits. On top of the orbital
integrators, we have implemented various integrators for the spin motion. These
integrators use quaternions and Romberg quadratures to accelerate both the
computation and the convergence of spin rotations. We evaluate their
performance and accuracy in quantitative detail for individual elements as well
as for the entire RHIC lattice. We exploit the inherently data-parallel nature
of spin tracking to accelerate our algorithms on graphics processing units.
Comment: 43 pages, 17 figure
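The use of quaternions for spin rotations can be illustrated with a short, self-contained Python sketch. It is not code from gpuSpinTrack and does not include the Romberg quadrature step; it only shows how rotations about an axis can be represented as unit quaternions, composed by multiplication, and applied to a spin vector. All function names are hypothetical.

```python
import math


def quat_from_axis_angle(axis, angle):
    """Unit quaternion (w, x, y, z) for a rotation by `angle` radians about `axis`."""
    ax, ay, az = axis
    norm = math.sqrt(ax * ax + ay * ay + az * az)
    if norm == 0.0:
        raise ValueError("rotation axis must be non-zero")
    s = math.sin(angle / 2.0) / norm
    return (math.cos(angle / 2.0), ax * s, ay * s, az * s)


def quat_mul(p, q):
    """Hamilton product p * q; as a rotation, q is applied first, then p."""
    pw, px, py, pz = p
    qw, qx, qy, qz = q
    return (
        pw * qw - px * qx - py * qy - pz * qz,
        pw * qx + px * qw + py * qz - pz * qy,
        pw * qy - px * qz + py * qw + pz * qx,
        pw * qz + px * qy - py * qx + pz * qw,
    )


def rotate(q, v):
    """Rotate the 3-vector v by the unit quaternion q via q * v * conj(q)."""
    w, x, y, z = q
    conj = (w, -x, -y, -z)
    _, rx, ry, rz = quat_mul(quat_mul(q, (0.0, v[0], v[1], v[2])), conj)
    return (rx, ry, rz)


if __name__ == "__main__":
    # Two successive quarter turns about z, composed into one quaternion.
    quarter = quat_from_axis_angle((0.0, 0.0, 1.0), math.pi / 2.0)
    half = quat_mul(quarter, quarter)
    print(rotate(quarter, (1.0, 0.0, 0.0)))
    print(rotate(half, (1.0, 0.0, 0.0)))
```

Composing the rotations of many elements into a single quaternion costs one product per element, and each particle's spin can be handled independently, which matches the data-parallel structure the abstract mentions.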
A GPU-accelerated package for simulation of flow in nanoporous source rocks with many-body dissipative particle dynamics
Mesoscopic simulations of hydrocarbon flow in source shales are challenging,
in part due to the heterogeneous shale pores with sizes ranging from a few
nanometers to a few micrometers. Additionally, the sub-continuum fluid-fluid
and fluid-solid interactions in nano- to micro-scale shale pores, which are
physically and chemically sophisticated, must be captured. To address those
challenges, we present a GPU-accelerated package for simulation of flow in
nano- to micro-pore networks with a many-body dissipative particle dynamics
(mDPD) mesoscale model. Based on a fully distributed parallel paradigm, the
code offloads all intensive workloads to GPUs. Other advancements, such as
smart particle packing and no-slip boundary condition in complex pore
geometries, are also implemented for the construction and the simulation of the
realistic shale pores from 3D nanometer-resolution stack images. Our code is
validated for accuracy and compared against the CPU counterpart for speedup. In
our benchmark tests, the code delivers nearly perfect strong scaling and weak
scaling (with up to 512 million particles) on up to 512 K20X GPUs on Oak Ridge
National Laboratory's (ORNL) Titan supercomputer. Moreover, a single-GPU
benchmark on ORNL's SummitDev and IBM's AC922 suggests that the host-to-device
NVLink can boost performance over PCIe by a remarkable 40%. Lastly, we
demonstrate, through a flow simulation in realistic shale pores, that the CPU
counterpart requires 840 Power9 cores to rival the performance delivered by our
package with four V100 GPUs on ORNL's Summit architecture. This simulation
package enables quick-turnaround and high-throughput mesoscopic numerical
simulations for investigating complex flow phenomena in nano- to micro-porous
rocks with realistic pore geometries.
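The scaling terms used in this abstract follow standard definitions, which the Python sketch below spells out. The function names are hypothetical and the timings in the usage example are made up for illustration; only the 840-core versus four-GPU comparison is taken from the abstract.

```python
def strong_scaling_efficiency(base_gpus, base_time, gpus, time):
    """Strong scaling: fixed total problem size, more GPUs.

    Returns 1.0 when run time shrinks in exact proportion to the GPU count.
    """
    if min(base_gpus, gpus) <= 0 or min(base_time, time) <= 0:
        raise ValueError("GPU counts and times must be positive")
    return (base_time * base_gpus) / (time * gpus)


def weak_scaling_efficiency(base_time, time):
    """Weak scaling: problem size grows with the GPU count.

    Returns 1.0 when run time stays constant as GPUs and work grow together.
    """
    if min(base_time, time) <= 0:
        raise ValueError("times must be positive")
    return base_time / time


def cpu_cores_per_gpu(cpu_cores, gpus):
    """CPU cores needed to match one GPU, given an equal-performance pairing."""
    if cpu_cores <= 0 or gpus <= 0:
        raise ValueError("counts must be positive")
    return cpu_cores / gpus


if __name__ == "__main__":
    # Hypothetical timings, not measurements from the paper.
    print(strong_scaling_efficiency(base_gpus=1, base_time=100.0, gpus=4, time=26.0))
    print(weak_scaling_efficiency(base_time=10.0, time=10.5))
    # Figures quoted in the abstract: 840 Power9 cores versus four V100 GPUs.
    print(cpu_cores_per_gpu(840, 4))
```

With the figures quoted in the abstract, the last call corresponds to 210 Power9 cores per V100 GPU for that particular simulation.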