PyFR: An Open Source Framework for Solving Advection-Diffusion Type Problems on Streaming Architectures using the Flux Reconstruction Approach
High-order numerical methods for unstructured grids combine the superior
accuracy of high-order spectral or finite difference methods with the geometric
flexibility of low-order finite volume or finite element schemes. The Flux
Reconstruction (FR) approach unifies various high-order schemes for
unstructured grids within a single framework. Additionally, the FR approach
exhibits a significant degree of element locality, and is thus able to run
efficiently on modern streaming architectures, such as Graphics Processing
Units (GPUs). The aforementioned properties of FR mean it offers a promising
route to performing affordable, and hence industrially relevant,
scale-resolving simulations of hitherto intractable unsteady flows within the
vicinity of real-world engineering geometries. In this paper we present PyFR,
an open-source Python based framework for solving advection-diffusion type
problems on streaming architectures using the FR approach. The framework is
designed to solve a range of governing systems on mixed unstructured grids
containing various element types. It is also designed to target a range of
hardware platforms via use of an in-built domain specific language based on the
Mako templating engine. The current release of PyFR is able to solve the
compressible Euler and Navier-Stokes equations on grids of quadrilateral and
triangular elements in two dimensions, and hexahedral elements in three
dimensions, targeting clusters of CPUs, and NVIDIA GPUs. Results are presented
for various benchmark flow problems, single-node performance is discussed, and
scalability of the code is demonstrated on up to 104 NVIDIA M2090 GPUs. The
software is freely available under a 3-Clause New Style BSD license (see
www.pyfr.org).
Spectral Ewald Acceleration of Stokesian Dynamics for polydisperse suspensions
In this work we develop the Spectral Ewald Accelerated Stokesian Dynamics
(SEASD), a novel computational method for dynamic simulations of polydisperse
colloidal suspensions with full hydrodynamic interactions. SEASD is based on
the framework of Stokesian Dynamics (SD) with extension to compressible
solvents, and uses the Spectral Ewald (SE) method [Lindbo & Tornberg, J.
Comput. Phys. 229 (2010) 8994] for the wave-space mobility computation. To meet
the performance requirement of dynamic simulations, we use Graphics Processing
Units (GPUs) to evaluate the suspension mobility, and achieve an order of
magnitude speedup compared to a CPU implementation. For further speedup, we
develop a novel far-field block-diagonal preconditioner to reduce the far-field
evaluations in the iterative solver, and SEASD-nf, a polydisperse extension of
the mean-field Brownian approximation of Banchio & Brady [J. Chem. Phys. 118
(2003) 10323]. We extensively discuss implementation and parameter selection
strategies in SEASD, and demonstrate the spectral accuracy in the mobility
evaluation and the overall computation scaling. We
present three computational examples to further validate SEASD and SEASD-nf in
monodisperse and bidisperse suspensions: the short-time transport properties,
the equilibrium osmotic pressure and viscoelastic moduli, and the steady shear
Brownian rheology. Our validation results show that the agreement between SEASD
and SEASD-nf is satisfactory over a wide range of parameters, and also provide
significant insight into the dynamics of polydisperse colloidal suspensions.
Comment: 39 pages, 21 figures
QMCPACK: Advances in the development, efficiency, and application of auxiliary field and real-space variational and diffusion Quantum Monte Carlo
We review recent advances in the capabilities of the open source ab initio
Quantum Monte Carlo (QMC) package QMCPACK and the workflow tool Nexus used for
greater efficiency and reproducibility. The auxiliary field QMC (AFQMC)
implementation has been greatly expanded to include k-point symmetries,
tensor-hypercontraction, and accelerated graphics processing unit (GPU)
support. These scaling and memory reductions greatly increase the number of
orbitals that can practically be included in AFQMC calculations, increasing
accuracy. Advances in real space methods include techniques for accurate
computation of band gaps and for systematically improving the nodal surface of
ground state wavefunctions. Results of these calculations can be used to
validate application of more approximate electronic structure methods including
GW and density functional based techniques. To provide an improved foundation
for these calculations we utilize a new set of correlation-consistent effective
core potentials (pseudopotentials) that are more accurate than previous sets;
these can also be applied in quantum-chemical and other many-body applications,
not only QMC. These advances increase the efficiency, accuracy, and range of
properties that can be studied in both molecules and materials with QMC and
QMCPACK.
Achieving High Speed CFD simulations: Optimization, Parallelization, and FPGA Acceleration for the unstructured DLR TAU Code
Today, large-scale parallel simulations are fundamental tools for handling complex problems. The number of processors in current computing platforms has recently increased, so it is necessary to optimize application performance and to enhance scalability on massively parallel systems. In addition, new heterogeneous architectures, which combine conventional processors with specialized hardware such as FPGAs to accelerate the most time-consuming functions, are considered a strong alternative for boosting performance.
In this paper, the performance of the DLR TAU code is analyzed and optimized. The improvement of code efficiency is addressed through three key activities: optimization, parallelization, and hardware acceleration. First, a profiling analysis of the most time-consuming processes of the Reynolds-averaged Navier-Stokes flow solver on a three-dimensional unstructured mesh is performed. Then, the scalability of the code is studied, and new partitioning algorithms are tested to identify the most suitable ones for the selected applications. Finally, a feasibility study on the application of FPGAs and GPUs for the hardware acceleration of CFD simulations is presented.
Pushing the Limits of 3D Color Printing: Error Diffusion with Translucent Materials
Accurate color reproduction is important in many applications of 3D printing,
from design prototypes to 3D color copies or portraits. Although full color is
available via other technologies, multi-jet printers have greater potential for
graphical 3D printing, in terms of reproducing complex appearance properties.
However, to date these printers cannot produce full color, and doing so poses
substantial technical challenges, from the sheer amount of data to the
translucency of the available color materials. In this paper, we propose an
error diffusion halftoning approach to achieve full color with multi-jet
printers, which operates on multiple isosurfaces or layers within the object.
We propose a novel traversal algorithm for voxel surfaces, which allows the
transfer of existing error diffusion algorithms from 2D printing. The resulting
prints faithfully reproduce colors, color gradients and fine-scale details.
Comment: 15 pages, 14 figures; includes supplemental figures
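The abstract above refers to transferring existing error diffusion algorithms from 2D printing. As a rough illustration of what such a 2D algorithm looks like, here is a minimal sketch of classic Floyd-Steinberg error diffusion on a grayscale image. It is not the paper's voxel-surface traversal or its multi-layer scheme; the function name `floyd_steinberg`, the list-of-lists image representation, and the 0.5 threshold are assumptions made for this example.

```python
def floyd_steinberg(image):
    """Binarize a 2D grayscale image (values in [0, 1]) by error diffusion.

    Illustrative sketch only: `image` is assumed to be a non-empty list of
    equal-length rows of floats. Returns a new list of rows of 0/1 ints.
    """
    height = len(image)
    width = len(image[0])
    # Work on a copy so the caller's data is left untouched.
    work = [list(row) for row in image]
    out = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            old = work[y][x]
            new = 1 if old >= 0.5 else 0  # assumed threshold
            out[y][x] = new
            err = old - new
            # Distribute the quantization error to unvisited neighbours
            # using the standard 7/16, 3/16, 5/16, 1/16 weights.
            if x + 1 < width:
                work[y][x + 1] += err * 7 / 16
            if y + 1 < height:
                if x > 0:
                    work[y + 1][x - 1] += err * 3 / 16
                work[y + 1][x] += err * 5 / 16
                if x + 1 < width:
                    work[y + 1][x + 1] += err * 1 / 16
    return out
```

Calling `floyd_steinberg([[0.5] * 8 for _ in range(8)])` returns an 8x8 pattern of zeros and ones whose local density approximates the input gray level. Error that would fall outside the image is simply dropped at the borders in this sketch.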
Finding faint HI structure in and around galaxies: scraping the barrel
Soon to be operational HI survey instruments such as APERTIF and ASKAP will
produce large datasets. These surveys will provide information about the HI in
and around hundreds of galaxies with a typical signal-to-noise ratio of
10 in the inner regions and 1 in the outer regions. In addition, such
surveys will make it possible to probe faint HI structures, typically located
in the vicinity of galaxies, such as extra-planar-gas, tails and filaments.
These structures are crucial for understanding galaxy evolution, particularly
when they are studied in relation to the local environment. Our aim is to find
optimized kernels for the discovery of faint and morphologically complex HI
structures. Therefore, using HI data from a variety of galaxies, we explore
state-of-the-art filtering algorithms. We show that the intensity-driven
gradient filter, due to its adaptive characteristics, is the optimal choice. In
fact, this filter requires only minimal tuning of the input parameters to
enhance the signal-to-noise ratio of faint components. In addition, it does not
degrade the resolution of the high signal-to-noise component of a source. The
filtering process must be fast and be embedded in an interactive visualization
tool in order to support fast inspection of a large number of sources. To
achieve such interactive exploration, we implemented a multi-core CPU (OpenMP)
and a GPU (OpenGL) version of this filter in a 3D visualization environment
().
Comment: 17 pages, 9 figures, 4 tables. Astronomy and Computing, accepted