197 research outputs found
A hybrid MPI-OpenMP scheme for scalable parallel pseudospectral computations for fluid turbulence
A hybrid scheme that utilizes MPI for distributed memory parallelism and
OpenMP for shared memory parallelism is presented. The work is motivated by the
desire to achieve exceptionally high Reynolds numbers in pseudospectral
computations of fluid turbulence on emerging petascale, high core-count,
massively parallel processing systems. The hybrid implementation derives from
and augments a well-tested scalable MPI-parallelized pseudospectral code. The
hybrid paradigm leads to a new picture for the domain decomposition of the
pseudospectral grids, which is helpful in understanding, among other things,
the 3D transpose of the global data that is necessary for the parallel fast
Fourier transforms that are the central component of the numerical
discretizations. Details of the hybrid implementation are provided, and
performance tests illustrate the utility of the method. It is shown that the
hybrid scheme achieves near ideal scalability up to ~20000 compute cores with a
maximum mean efficiency of 83%. Data are presented that demonstrate how to
choose the optimal number of MPI processes and OpenMP threads in order to
optimize code performance on two different platforms.Comment: Submitted to Parallel Computin
SpF: Enabling Petascale Performance for Pseudospectral Dynamo Models
Pseudospectral (PS) methods possess a number of characteristics (e.g., efficiency, accuracy, natural boundary conditions) that are extremely desirable for dynamo models. Unfortunately, dynamo models based upon PS methods face a number of daunting challenges, which include exposing additional parallelism, leveraging hardware accelerators, exploiting hybrid parallelism, and improving the scalability of global memory transposes. Although these issues are a concern for most models, solutions for PS methods tend to require far more pervasive changes to underlying data and control structures. Further, improvements in performance in one model are difficult to transfer to other models, resulting in significant duplication of effort across the research community.We have developed an extensible software framework for pseudospectral methods called SpF that is intended to enable extreme scalability and optimal performance. High-level abstractions provided by SpF unburden applications of the responsibility of managing domain decomposition and load balance while reducing the changes in code required to adapt to new computing architectures. The key design concept in SpF is that each phase of the numerical calculation is partitioned into disjoint numerical kernels that can be performed entirely in-processor. The granularity of domain-decomposition provided by SpF is only constrained by the data-locality requirements of these kernels. SpF builds on top of optimized vendor libraries for common numerical operations such as transforms, matrix solvers, etc., but can also be configured to use open source alternatives for portability. SpF includes several alternative schemes for global data redistribution and is expected to serve as an ideal testbed for further research into optimal approaches for different network architectures.In this presentation, we will describe the basic architecture of SpF as well as preliminary performance data and experience with adapting legacy dynamo codes. We will conclude with a discussion of planned extensions to SpF that will provide pseudospectral applications with additional flexibility with regard to time integration, linear solvers, and discretization in the radial direction
Solving the Klein-Gordon equation using Fourier spectral methods: A benchmark test for computer performance
The cubic Klein-Gordon equation is a simple but non-trivial partial
differential equation whose numerical solution has the main building blocks
required for the solution of many other partial differential equations. In this
study, the library 2DECOMP&FFT is used in a Fourier spectral scheme to solve
the Klein-Gordon equation and strong scaling of the code is examined on
thirteen different machines for a problem size of 512^3. The results are useful
in assessing likely performance of other parallel fast Fourier transform based
programs for solving partial differential equations. The problem is chosen to
be large enough to solve on a workstation, yet also of interest to solve
quickly on a supercomputer, in particular for parametric studies. Unlike other
high performance computing benchmarks, for this problem size, the time to
solution will not be improved by simply building a bigger supercomputer.Comment: 10 page
High performance Python for direct numerical simulations of turbulent flows
Direct Numerical Simulations (DNS) of the Navier Stokes equations is an
invaluable research tool in fluid dynamics. Still, there are few publicly
available research codes and, due to the heavy number crunching implied,
available codes are usually written in low-level languages such as C/C++ or
Fortran. In this paper we describe a pure scientific Python pseudo-spectral DNS
code that nearly matches the performance of C++ for thousands of processors and
billions of unknowns. We also describe a version optimized through Cython, that
is found to match the speed of C++. The solvers are written from scratch in
Python, both the mesh, the MPI domain decomposition, and the temporal
integrators. The solvers have been verified and benchmarked on the Shaheen
supercomputer at the KAUST supercomputing laboratory, and we are able to show
very good scaling up to several thousand cores.
A very important part of the implementation is the mesh decomposition (we
implement both slab and pencil decompositions) and 3D parallel Fast Fourier
Transforms (FFT). The mesh decomposition and FFT routines have been implemented
in Python using serial FFT routines (either NumPy, pyFFTW or any other serial
FFT module), NumPy array manipulations and with MPI communications handled by
MPI for Python (mpi4py). We show how we are able to execute a 3D parallel FFT
in Python for a slab mesh decomposition using 4 lines of compact Python code,
for which the parallel performance on Shaheen is found to be slightly better
than similar routines provided through the FFTW library. For a pencil mesh
decomposition 7 lines of code is required to execute a transform
Data Mining and Machine Learning in Astronomy
We review the current state of data mining and machine learning in astronomy.
'Data Mining' can have a somewhat mixed connotation from the point of view of a
researcher in this field. If used correctly, it can be a powerful approach,
holding the potential to fully exploit the exponentially increasing amount of
available data, promising great scientific advance. However, if misused, it can
be little more than the black-box application of complex computing algorithms
that may give little physical insight, and provide questionable results. Here,
we give an overview of the entire data mining process, from data collection
through to the interpretation of results. We cover common machine learning
algorithms, such as artificial neural networks and support vector machines,
applications from a broad range of astronomy, emphasizing those where data
mining techniques directly resulted in improved science, and important current
and future directions, including probability density functions, parallel
algorithms, petascale computing, and the time domain. We conclude that, so long
as one carefully selects an appropriate algorithm, and is guided by the
astronomical problem at hand, data mining can be very much the powerful tool,
and not the questionable black box.Comment: Published in IJMPD. 61 pages, uses ws-ijmpd.cls. Several extra
figures, some minor additions to the tex
- …