Nuclear Physics from Lattice QCD
We review recent progress toward establishing lattice Quantum Chromodynamics
as a predictive calculational framework for nuclear physics. A survey of the
current techniques that are used to extract low-energy hadronic scattering
amplitudes and interactions is followed by a review of recent two-body and
few-body calculations by the NPLQCD collaboration and others. An outline of the
nuclear physics that is expected to be accomplished with Lattice QCD in the
next decade, along with estimates of the required computational resources, is
presented.
Comment: 56 pages, 39 pdf figures. Final published version
Hybrid-parallel sparse matrix-vector multiplication with explicit communication overlap on current multicore-based systems
We evaluate optimized parallel sparse matrix-vector operations for several
representative application areas on widespread multicore-based cluster
configurations. First the single-socket baseline performance is analyzed and
modeled with respect to basic architectural properties of standard multicore
chips. Beyond the single node, the performance of parallel sparse matrix-vector
operations is often limited by communication overhead. Starting from the
observation that nonblocking MPI is not able to hide communication cost using
standard MPI implementations, we demonstrate that explicit overlap of
communication and computation can be achieved by using a dedicated
communication thread, which may run on a virtual core. Moreover we identify
performance benefits of hybrid MPI/OpenMP programming due to improved load
balancing even without explicit communication overlap. We compare performance
results for pure MPI, the widely used "vector-like" hybrid programming
strategies, and explicit overlap on a modern multicore-based cluster and a Cray
XE6 system.
Comment: 16 pages, 10 figures
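The explicit-overlap idea in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the matrix has already been split into a local part (columns this process owns) and a remote part (columns that need halo data), and it uses a plain Python thread in place of a dedicated MPI communication thread. The names `csr_spmv`, `spmv_with_overlap`, and `fetch_halo` are hypothetical.

```python
import threading


def csr_spmv(indptr, indices, data, x, y):
    """Accumulate y += A @ x for a matrix A stored in CSR form."""
    for row in range(len(indptr) - 1):
        acc = 0.0
        for k in range(indptr[row], indptr[row + 1]):
            acc += data[k] * x[indices[k]]
        y[row] += acc


def spmv_with_overlap(local, remote, x_local, fetch_halo):
    """Compute y = A_local @ x_local + A_remote @ x_halo.

    `local` and `remote` are (indptr, indices, data) CSR triples.
    `fetch_halo` is a hypothetical stand-in for the halo exchange
    (for example a blocking receive) and returns the remote x entries.
    """
    n_rows = len(local[0]) - 1
    y = [0.0] * n_rows
    halo = {}

    def communicate():
        # Runs on the dedicated communication thread.
        halo["x"] = fetch_halo()

    comm_thread = threading.Thread(target=communicate)
    comm_thread.start()

    # Local part needs no remote data, so it proceeds during the exchange.
    csr_spmv(*local, x_local, y)

    # The remote part can only start once the halo values have arrived.
    comm_thread.join()
    csr_spmv(*remote, halo["x"], y)
    return y


# Toy 2x4 matrix [[2, 1, 0, 3], [0, 4, 5, 0]]; columns 0-1 are local,
# columns 2-3 are remote and re-indexed as 0-1 in the remote part.
local_part = ([0, 2, 3], [0, 1, 1], [2.0, 1.0, 4.0])
remote_part = ([0, 1, 2], [1, 0], [3.0, 5.0])
result = spmv_with_overlap(local_part, remote_part, [1.0, 2.0],
                           lambda: [3.0, 4.0])
```

In CPython the global interpreter lock prevents two compute-bound threads from running at once, so real overlap here appears only when `fetch_halo` blocks on I/O; the sketch shows the structure of the technique rather than its performance.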
Tackling Exascale Software Challenges in Molecular Dynamics Simulations with GROMACS
GROMACS is a widely used package for biomolecular simulation, and over the
last two decades it has evolved from small-scale efficiency to advanced
heterogeneous acceleration and multi-level parallelism targeting some of the
largest supercomputers in the world. Here, we describe some of the ways we have
been able to realize this through the use of parallelization on all levels,
combined with a constant focus on absolute performance. Release 4.6 of GROMACS
uses SIMD acceleration on a wide range of architectures, GPU offloading
acceleration, and both OpenMP and MPI parallelism within and between nodes,
respectively. The recent work on acceleration made it necessary to revisit the
fundamental algorithms of molecular simulation, including the concept of
neighborsearching, and we discuss the present and future challenges we see for
exascale simulation - in particular a very fine-grained task parallelism. We
also discuss the software management, code peer review and continuous
integration testing required for a project of this complexity.
Comment: EASC 2014 conference proceedings
Kaon physics from lattice QCD
I review lattice calculations and results for hadronic parameters relevant
for kaon physics, in particular the vector form factor f+(0) of semileptonic
kaon decays, the ratio fK/fpi of leptonic decay constants and the kaon bag
parameter BK. For each lattice calculation a colour code rating is assigned, by
following a procedure which is being proposed by the Flavianet Lattice
Averaging Group (FLAG), and the following final averages are obtained:
f+(0)=0.962(3)(4), fK/fpi = 1.196(1)(10) and \hat BK = 0.731(7)(35). In the
last part of the talk, the present status of lattice studies of non-leptonic
K--> pi pi decays is also briefly summarized.
Comment: Plenary talk at 27th International Symposium on Lattice Field Theory
(Lattice 2009), Beijing, China, 25-31 Jul 2009. v2: two references and one
comment added, typos corrected
Preparing HPC Applications for the Exascale Era: A Decoupling Strategy
Production-quality parallel applications are often a mixture of diverse
operations, such as computation- and communication-intensive, regular and
irregular, tightly coupled and loosely linked operations. In conventional
construction of parallel applications, each process performs all the
operations, which can be inefficient and can seriously limit scalability,
especially at large scale. We propose a decoupling strategy to improve the
scalability of applications running on large-scale systems.
Our strategy separates application operations onto groups of processes and
enables a dataflow processing paradigm among the groups. This mechanism is
effective in reducing the impact of load imbalance and increases the parallel
efficiency by pipelining multiple operations. We provide a proof-of-concept
implementation using MPI, the de-facto programming system on current
supercomputers. We demonstrate the effectiveness of this strategy by decoupling
the reduce, particle communication, halo exchange and I/O operations in a set
of scientific and data-analytics applications. A performance evaluation on
8,192 processes of a Cray XC40 supercomputer shows that the proposed approach
can achieve up to 4x performance improvement.
Comment: The 46th International Conference on Parallel Processing (ICPP-2017
- …
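The decoupling strategy described in this abstract can be sketched as a two-stage dataflow pipeline. This is an illustrative assumption, not the paper's MPI implementation: two Python threads stand in for two groups of processes, a `queue.Queue` stands in for the data stream between them, and the names `run_decoupled`, `compute_step`, and `reduce_step` are hypothetical.

```python
import queue
import threading

_DONE = object()  # unique sentinel marking the end of the stream


def run_decoupled(n_steps, compute_step, reduce_step):
    """Run compute and reduce operations in separate, pipelined groups.

    One group only produces per-step data; the other only consumes it,
    so a slow reduce for step i can overlap with the compute for step i+1.
    """
    channel = queue.Queue()
    results = []

    def compute_group():
        for step in range(n_steps):
            channel.put(compute_step(step))
        channel.put(_DONE)

    def reduce_group():
        while True:
            item = channel.get()
            if item is _DONE:
                break
            results.append(reduce_step(item))

    workers = [threading.Thread(target=compute_group),
               threading.Thread(target=reduce_group)]
    for worker in workers:
        worker.start()
    for worker in workers:
        worker.join()
    return results


# Each step produces a small list; the decoupled group sums it.
totals = run_decoupled(4, lambda step: [step, step + 1], sum)
```

Because a single consumer reads from a FIFO queue, results come back in step order. The same structure would extend to the other decoupled operations the abstract names, such as halo exchange or I/O, by swapping `reduce_step`.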