A Customized 3D GPU Poisson Solver for Free Boundary Conditions
A 3-dimensional GPU Poisson solver is developed for all possible combinations
of free and periodic boundary conditions (BCs) along the three directions. It
is benchmarked for various grid sizes and different BCs and a significant
performance gain is observed for problems including one or more free BCs. The
GPU Poisson solver is also benchmarked against two different CPU
implementations of the same method and a significant amount of acceleration of
the computation is observed with the GPU version. Comment: 10 pages, 5 figures
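For orientation, the fully periodic case is the simplest of the boundary-condition combinations mentioned above, because the Poisson equation becomes a division by the squared wavenumber in Fourier space. The following is a minimal 1D sketch of that idea only, not the paper's GPU solver and not its treatment of free BCs. The function name `solve_poisson_periodic`, the sign convention -u'' = f, and the direct O(n^2) transform are assumptions chosen for readability; the right-hand side is assumed to have zero mean.

```python
import cmath
import math

def solve_poisson_periodic(f, length=2.0 * math.pi):
    """Solve -u'' = f on a periodic 1D grid; returns the zero-mean u.

    Hypothetical helper: uses a direct O(n^2) DFT for clarity,
    where a production solver would use an FFT.
    """
    n = len(f)
    # Forward DFT of the right-hand side.
    f_hat = [sum(f[j] * cmath.exp(-2j * math.pi * k * j / n) for j in range(n))
             for k in range(n)]
    u_hat = [0j] * n
    for k in range(1, n):
        # Map index k to a signed mode number (negative for the upper half).
        m = k if k <= n // 2 else k - n
        wavenumber = 2.0 * math.pi * m / length
        u_hat[k] = f_hat[k] / (wavenumber ** 2)
    # Inverse DFT; the k = 0 mode stays zero, which fixes the mean of u.
    return [sum(u_hat[k] * cmath.exp(2j * math.pi * k * j / n)
                for k in range(n)).real / n
            for j in range(n)]

# Example: f = sin(x) on [0, 2*pi) has the exact solution u = sin(x).
n = 16
grid = [2.0 * math.pi * j / n for j in range(n)]
u = solve_poisson_periodic([math.sin(x) for x in grid])
max_error = max(abs(u[j] - math.sin(grid[j])) for j in range(n))
```

Free boundary conditions cannot be handled by this division alone, which is consistent with the abstract singling them out as the case where a customized solver pays off.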
A portable platform for accelerated PIC codes and its application to GPUs using OpenACC
We present a portable platform, called PIC_ENGINE, for accelerating
Particle-In-Cell (PIC) codes on heterogeneous many-core architectures such as
Graphics Processing Units (GPUs). The aim of this development is efficient
simulations on future exascale systems by allowing different parallelization
strategies depending on the application problem and the specific architecture.
To this end, this platform contains the basic steps of the PIC algorithm and
has been designed as a test bed for different algorithmic options and data
structures. Among the architectures that this engine can explore, particular
attention is given here to systems equipped with GPUs. The study demonstrates
that our portable PIC implementation based on the OpenACC programming model can
achieve performance closely matching theoretical predictions. Using the Cray
XC30 system, Piz Daint, at the Swiss National Supercomputing Centre (CSCS), we
show that PIC_ENGINE running on an NVIDIA Kepler K20X GPU can outperform the
same code on an Intel Sandy Bridge 8-core CPU by a factor of 3.4.
A Multi-Code Analysis Toolkit for Astrophysical Simulation Data
The analysis of complex multiphysics astrophysical simulations presents a
unique and rapidly growing set of challenges: reproducibility, parallelization,
and vast increases in data size and complexity chief among them. In order to
meet these challenges, and in order to open up new avenues for collaboration
between users of multiple simulation platforms, we present yt (available at
http://yt.enzotools.org/), an open source, community-developed astrophysical
analysis and visualization toolkit. Analysis and visualization with yt are
oriented around physically relevant quantities rather than quantities native to
astrophysical simulation codes. While originally designed for handling Enzo's
structured adaptive mesh refinement (AMR) data, yt has been extended to work
with several different simulation methods and simulation codes including Orion,
RAMSES, and FLASH. We report on its methods for reading, handling, and
visualizing data, including projections, multivariate volume rendering,
multi-dimensional histograms, halo finding, light cone generation and
topologically-connected isocontour identification. Furthermore, we discuss the
underlying algorithms yt uses for processing and visualizing data, and its
mechanisms for parallelization of analysis tasks. Comment: 18 pages, 6 figures, emulateapj format. Resubmitted to Astrophysical
Journal Supplement Series with revisions from referee. yt can be found at
http://yt.enzotools.org
Particle-in-Cell algorithms for emerging computer architectures
We have designed Particle-in-Cell algorithms for emerging architectures. These algorithms share a common approach, using fine-grained tiles, but have different implementations depending on the architecture. On the GPU, there were two different implementations, one with atomic operations and one with no data collisions, using CUDA C and Fortran. Speedups of up to about 50 compared to a single core of the Intel i7 processor have been achieved. There was also an implementation for traditional multi-core processors using OpenMP, which achieved high parallel efficiency. We believe that this approach should work for other emerging designs such as the Intel Phi coprocessor from the Intel MIC architecture.
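To illustrate why the abstract distinguishes an atomic-operation variant from a collision-free one, here is a minimal serial sketch of the charge deposition step of a PIC code in 1D with linear weighting. It is not the authors' code: the function name `deposit_charge`, the periodic grid, and the unit cell spacing are assumptions made for illustration. The two `rho[...] +=` updates are the scatter-add that would collide if several particles in the same cell were processed in parallel.

```python
import math

def deposit_charge(positions, num_cells, charge=1.0):
    """Deposit particle charge on a periodic 1D grid with linear weighting.

    Hypothetical helper for illustration; assumes unit cell spacing.
    """
    rho = [0.0] * num_cells
    for x in positions:
        x = x % num_cells                     # wrap into the periodic domain
        left = int(math.floor(x)) % num_cells
        right = (left + 1) % num_cells
        frac = x - math.floor(x)              # distance from the left grid point
        # Scatter-add: particles sharing a cell update the same entries,
        # which is the data collision a parallel version must handle.
        rho[left] += charge * (1.0 - frac)
        rho[right] += charge * frac
    return rho

# Two particles; the second one straddles the periodic boundary.
rho = deposit_charge([0.25, 3.5], num_cells=4)
```

In a parallel setting, one option is to make each update atomic; another is to group particles into small tiles so that each tile accumulates into its own private copy of the grid, which is the kind of trade-off the two GPU implementations represent.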
A preliminary approach to intelligent x-ray imaging for baggage inspection at airports
Identifying explosives in baggage at airports relies on being able to characterize the materials that make up an X-ray image. If a suspicion is generated during the imaging process (step 1), the image data could be enhanced by adapting the scanning parameters (step 2). This paper addresses the first part of this problem and uses textural signatures to recognize and characterize materials, thereby enabling system control. Directional Gabor-type filtering was applied to a series of different X-ray images. Images were processed in such a way as to simulate a line scanning geometry. Based on our experiments with images of industrial standards and our own samples, it was found that different materials could be characterized in terms of the frequency range and orientation of the filters. It was also found that the signal strength generated by the filters could be used as an indicator of visibility, and optimum imaging conditions could be predicted.
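The abstract does not give the filter definition or parameters, so the following is only a sketch of a generic real-valued Gabor kernel (a Gaussian envelope multiplied by an oriented cosine carrier) and a simple "signal strength" measure. The names `gabor_kernel` and `filter_response`, the parameter values, and the synthetic striped image are all assumptions; `size` is assumed to be odd.

```python
import math

def gabor_kernel(size, wavelength, theta, sigma, gamma=1.0):
    """Return a size x size real Gabor kernel as a list of rows.

    Illustrative only: parameter names and values are assumptions,
    not the settings used in the paper.
    """
    half = size // 2
    kernel = []
    for y in range(-half, half + 1):
        row = []
        for x in range(-half, half + 1):
            # Rotate coordinates so the carrier wave runs along angle theta.
            xr = x * math.cos(theta) + y * math.sin(theta)
            yr = -x * math.sin(theta) + y * math.cos(theta)
            envelope = math.exp(-(xr * xr + (gamma * yr) ** 2) / (2.0 * sigma ** 2))
            carrier = math.cos(2.0 * math.pi * xr / wavelength)
            row.append(envelope * carrier)
        kernel.append(row)
    return kernel

def filter_response(image, kernel):
    """Sum of squared filter outputs over all valid positions."""
    k = len(kernel)
    total = 0.0
    for i in range(len(image) - k + 1):
        for j in range(len(image[0]) - k + 1):
            acc = sum(kernel[a][b] * image[i + a][j + b]
                      for a in range(k) for b in range(k))
            total += acc * acc
    return total

# Synthetic texture: stripes that vary along x with a period of 4 pixels.
image = [[math.cos(2.0 * math.pi * j / 4.0) for j in range(12)] for _ in range(12)]
aligned = filter_response(image, gabor_kernel(5, 4.0, 0.0, 2.0))
crossed = filter_response(image, gabor_kernel(5, 4.0, math.pi / 2.0, 2.0))
```

Sweeping `theta` and `wavelength` and recording the response gives an orientation and frequency signature per texture, which is the kind of characterization the abstract describes.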
Andy's Algorithms: new automated digital image analysis pipelines for FIJI.
Quantification of cellular antigens and their interactions via antibody-based detection methods is widely used in scientific research. Accurate high-throughput quantitation of these assays using general image analysis software can be time consuming and challenging, particularly when attempted by users with limited image processing and analysis knowledge. To overcome this, we have designed Andy's Algorithms, a series of automated image analysis pipelines for FIJI that permits rapid, accurate and reproducible batch-processing of 3,3'-diaminobenzidine (DAB) immunohistochemistry, proximity ligation assays (PLAs) and other common assays. Andy's Algorithms incorporates a step-by-step tutorial and optimization pipeline to make batch image analysis simple for the untrained user and adaptable across laboratories. Andy's Algorithms provide a simpler, faster, standardized workflow compared to existing programs, while offering equivalent performance and additional features, in a free-to-use, open-source application of FIJI. Andy's Algorithms are publicly available on GitHub at https://github.com/andlaw1841/Andy-s-Algorithm
An efficient mixed-precision, hybrid CPU-GPU implementation of a fully implicit particle-in-cell algorithm
Recently, a fully implicit, energy- and charge-conserving particle-in-cell
method has been proposed for multi-scale, full-f kinetic simulations [G. Chen,
et al., J. Comput. Phys. 230,18 (2011)]. The method employs a Jacobian-free
Newton-Krylov (JFNK) solver, capable of using very large timesteps without loss
of numerical stability or accuracy. A fundamental feature of the method is the
segregation of particle-orbit computations from the field solver, while
remaining fully self-consistent. This paper describes a very efficient,
mixed-precision hybrid CPU-GPU implementation of the implicit PIC algorithm
exploiting this feature. The JFNK solver is kept on the CPU in double precision
(DP), while the implicit, charge-conserving, and adaptive particle mover is
implemented on a GPU (graphics processing unit) using CUDA in single-precision
(SP). Performance-oriented optimizations are introduced with the aid of the
roofline model. The implicit particle mover algorithm is shown to achieve up to
400 GOp/s on an NVIDIA GeForce GTX580. This corresponds to 25% absolute GPU
efficiency against the peak theoretical performance, and is about 300 times
faster than an equivalent serial CPU (Intel Xeon X5460) execution. For the test
case chosen, the mixed-precision hybrid CPU-GPU solver is shown to outperform
the DP CPU-only serial version by a factor of about 100, without apparent loss
of robustness or accuracy in a challenging long-timescale ion acoustic wave
simulation. Comment: 25 pages, 6 figures, submitted to J. Comput. Phys.
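The roofline model mentioned above bounds attainable throughput by the smaller of the peak compute rate and the product of memory bandwidth and arithmetic intensity. The sketch below states that bound and checks the arithmetic behind the quoted figures: 400 GOp/s at 25% efficiency implies a theoretical peak of about 1600 GOp/s. The function names are hypothetical, and the bandwidth value is a placeholder chosen to show both regimes, not a hardware specification.

```python
def roofline_bound(peak_gops, bandwidth_gbs, intensity_ops_per_byte):
    """Attainable throughput (GOp/s) under the roofline model."""
    return min(peak_gops, bandwidth_gbs * intensity_ops_per_byte)

def implied_peak(achieved_gops, efficiency):
    """Peak throughput implied by an achieved rate and a fractional efficiency."""
    return achieved_gops / efficiency

# Figures quoted in the abstract: 400 GOp/s at 25% of theoretical peak.
peak = implied_peak(400.0, 0.25)

# Placeholder bandwidth (not a measured or vendor value) to show both regimes.
memory_bound = roofline_bound(peak, 200.0, 1.0)     # limited by bandwidth
compute_bound = roofline_bound(peak, 200.0, 16.0)   # limited by peak compute
```

A kernel with low arithmetic intensity sits on the sloped, bandwidth-limited part of the roofline, so optimizations that raise the number of operations per byte moved are the ones that help until the flat compute ceiling is reached.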