11,910 research outputs found

Nuclear Physics from Lattice QCD

Author: Aoki
Basak
Beane
Bernard
Blossier
Bulava
Chandrasekharan
Chen
Clark
Colangelo
de Forcrand
de Prony
Della
Detmold
Duane
Dudek
Dürr
Edwards
Epelbaum
Feng
Fleming
Foley
Frezzotti
Fromm
Fukugita
Furman
Gattringer
Ginsparg
Gottlieb
Gupta
Hagler
Hamber
Hasenbusch
Hildebrand
Huang
Ishii
Jenkins
Jung
K. Orginos
Kaplan
Khan
Kogut
Kreuzer
Lee
Lin
Liu
Lu
Luscher
Lüscher
M.J. Savage
Mahbub
Mai
Maiani
Mandula
Matsui
Miao
Michael
Morningstar
Nagata
Narayanan
Nemura
Neuberger
Okamoto
Orginos
Page
Peardon
Pieper
Renner
Roy
S.R. Beane
Sasaki
Shamir
Sheikholeslami
Smigielski
Son
Symanzik
Takaishi
Tan
Torok
Toussaint
W. Detmold
Weinberg
Weingarten
Wilson
Wiringa
Yamazaki
Publication venue: 'Elsevier BV'
Publication date: 25/10/2010

We review recent progress toward establishing lattice Quantum Chromodynamics as a predictive calculational framework for nuclear physics. A survey of the current techniques that are used to extract low-energy hadronic scattering amplitudes and interactions is followed by a review of recent two-body and few-body calculations by the NPLQCD collaboration and others. An outline of the nuclear physics that is expected to be accomplished with Lattice QCD in the next decade, along with estimates of the required computational resources, is presented. Comment: 56 pages, 39 PDF figures. Final published version

arXiv.org e-Print Archive

Crossref

Hybrid-parallel sparse matrix-vector multiplication with explicit communication overlap on current multicore-based systems

Author: Barrett R.
GEORG HAGER
GERALD SCHUBERT
GERHARD WELLEIN
HOLGER FEHSKE
Stüben K.
Publication venue: 'World Scientific Pub Co Pte Lt'
Publication date: 29/06/2011

We evaluate optimized parallel sparse matrix-vector operations for several representative application areas on widespread multicore-based cluster configurations. First, the single-socket baseline performance is analyzed and modeled with respect to basic architectural properties of standard multicore chips. Beyond the single node, the performance of parallel sparse matrix-vector operations is often limited by communication overhead. Starting from the observation that nonblocking MPI is not able to hide communication cost using standard MPI implementations, we demonstrate that explicit overlap of communication and computation can be achieved by using a dedicated communication thread, which may run on a virtual core. Moreover, we identify performance benefits of hybrid MPI/OpenMP programming due to improved load balancing even without explicit communication overlap. We compare performance results for pure MPI, the widely used "vector-like" hybrid programming strategies, and explicit overlap on a modern multicore-based cluster and a Cray XE6 system. Comment: 16 pages, 10 figures
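
The overlap idea in this abstract can be illustrated with a minimal single-process sketch. It is not the authors' implementation: a Python thread stands in for the dedicated communication thread, the hypothetical `fetch_halo` callback stands in for the MPI halo exchange, and the matrix is assumed to be split into a part that multiplies locally owned vector entries and a part that multiplies remote (halo) entries. All function names here are illustrative assumptions.

```python
import threading
import time

import numpy as np


def to_csr(dense):
    """Convert a dense 2-D array to CSR arrays (indptr, indices, data)."""
    indptr, indices, data = [0], [], []
    for row in dense:
        cols = np.nonzero(row)[0]
        indices.extend(cols)
        data.extend(row[cols])
        indptr.append(len(indices))
    return (np.array(indptr), np.array(indices, dtype=int),
            np.array(data, dtype=float))


def spmv_csr(indptr, indices, data, x, y):
    """Accumulate y += A @ x for a matrix A stored in CSR form."""
    for row in range(len(indptr) - 1):
        start, end = indptr[row], indptr[row + 1]
        y[row] += np.dot(data[start:end], x[indices[start:end]])


def overlapped_spmv(a_local, a_remote, x_local, fetch_halo):
    """Compute the local part of the product while the halo is fetched."""
    y = np.zeros(len(a_local[0]) - 1)
    halo = {}

    def communicate():
        # Assumption: fetch_halo() plays the role of the MPI halo exchange.
        halo["x"] = fetch_halo()

    comm_thread = threading.Thread(target=communicate)
    comm_thread.start()             # communication starts in the background
    spmv_csr(*a_local, x_local, y)  # local work overlaps with communication
    comm_thread.join()              # halo values are needed from here on
    spmv_csr(*a_remote, halo["x"], y)
    return y


# Illustrative data: 4 rows, 4 locally owned columns, 2 remote columns.
rng = np.random.default_rng(0)
A = rng.random((4, 6)) * (rng.random((4, 6)) < 0.5)
x_full = np.arange(1.0, 7.0)
n_local = 4


def fetch_halo():
    time.sleep(0.01)  # simulated network latency
    return x_full[n_local:].copy()


y = overlapped_spmv(to_csr(A[:, :n_local]), to_csr(A[:, n_local:]),
                    x_full[:n_local], fetch_halo)
```

The result matches the unsplit product `A @ x_full`. The split into local and remote columns is what makes the overlap possible: only the second, usually smaller, multiplication has to wait for communication to finish.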

arXiv.org e-Print Archive

Crossref

Tackling Exascale Software Challenges in Molecular Dynamics Simulations with GROMACS

Author: A Arnold
A Faradjian
B Hess
C Schütte
G Wilson
JA Anderson
JC Phillips
KJ Bowers
L Verlet
M Eleftheriou
M Shirts
MJ Abraham
P Eastman
R Yokota
S Pronk
S Páll
U Essmann
W Humphrey
WM Brown
Y Andoh
Y Sugita
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2015

GROMACS is a widely used package for biomolecular simulation, and over the last two decades it has evolved from small-scale efficiency to advanced heterogeneous acceleration and multi-level parallelism targeting some of the largest supercomputers in the world. Here, we describe some of the ways we have been able to realize this through the use of parallelization on all levels, combined with a constant focus on absolute performance. Release 4.6 of GROMACS uses SIMD acceleration on a wide range of architectures, GPU offloading acceleration, and both OpenMP and MPI parallelism within and between nodes, respectively. The recent work on acceleration made it necessary to revisit the fundamental algorithms of molecular simulation, including the concept of neighbor searching, and we discuss the present and future challenges we see for exascale simulation, in particular a very fine-grained task parallelism. We also discuss the software management, code peer review and continuous integration testing required for a project of this complexity. Comment: EASC 2014 conference proceedings

arXiv.org e-Print Archive

Publikationer från KTH

Crossref

Digitala Vetenskapliga Arkivet - Academic Archive On-line

MPG.PuRe

Kaon physics from lattice QCD

Author: Lubicz Vittorio
Publication date: 01/01/2009

I review lattice calculations and results for hadronic parameters relevant for kaon physics, in particular the vector form factor f+(0) of semileptonic kaon decays, the ratio fK/fpi of leptonic decay constants and the kaon bag parameter BK. For each lattice calculation a colour code rating is assigned, by following a procedure which is being proposed by the Flavianet Lattice Averaging Group (FLAG), and the following final averages are obtained: f+(0) = 0.962(3)(4), fK/fpi = 1.196(1)(10) and \hat BK = 0.731(7)(35). In the last part of the talk, the present status of lattice studies of non-leptonic K -> pi pi decays is also briefly summarized. Comment: Plenary talk at 27th International Symposium on Lattice Field Theory (Lattice 2009), Beijing, China, 25-31 Jul 2009. v2: two references and one comment added, typos corrected

arXiv.org e-Print Archive

Archivio Aperto di Ateneo

Archivio della Ricerca - Università di Roma 3

Preparing HPC Applications for the Exascale Era: A Decoupling Strategy

Author: Gioiosa Roberto
Kestor Gokcen
Laure Erwin
Markidis Stefano
Peng Ivy Bo
Publication date: 03/08/2017

Production-quality parallel applications are often a mixture of diverse operations, such as computation- and communication-intensive, regular and irregular, tightly coupled and loosely linked operations. In conventional construction of parallel applications, each process performs all the operations, which can be inefficient and seriously limit scalability, especially at large scale. We propose a decoupling strategy to improve the scalability of applications running on large-scale systems. Our strategy separates application operations onto groups of processes and enables a dataflow processing paradigm among the groups. This mechanism is effective in reducing the impact of load imbalance and increases the parallel efficiency by pipelining multiple operations. We provide a proof-of-concept implementation using MPI, the de facto programming system on current supercomputers. We demonstrate the effectiveness of this strategy by decoupling the reduce, particle communication, halo exchange and I/O operations in a set of scientific and data-analytics applications. A performance evaluation on 8,192 processes of a Cray XC40 supercomputer shows that the proposed approach can achieve up to 4x performance improvement. Comment: The 46th International Conference on Parallel Processing (ICPP-2017)
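
As a rough illustration of the decoupling idea described in this abstract, the sketch below separates a compute operation and a reduce operation onto different worker groups connected by a queue, so results stream from one group to the other instead of every worker performing both steps. It is only a toy under stated assumptions: Python threads stand in for groups of MPI processes, a `queue.Queue` stands in for the dataflow channel, and the names `compute_group` and `reduce_group` are hypothetical rather than taken from the paper.

```python
import queue
import threading


def compute_group(chunks, out_q):
    """Compute group: produce one partial result per chunk and stream it out."""
    for chunk in chunks:
        out_q.put(sum(v * v for v in chunk))  # stand-in for the compute step
    out_q.put(None)  # sentinel: this producer has finished


def reduce_group(in_q, n_producers, result):
    """Reduce group: consume partial results as they arrive (pipelined)."""
    total, finished = 0, 0
    while finished < n_producers:
        item = in_q.get()
        if item is None:
            finished += 1
        else:
            total += item
    result["total"] = total


# Illustrative workload: two producers, each with two chunks of integers.
work = [
    [[0, 1], [2, 3]],
    [[4, 5], [6, 7]],
]
channel = queue.Queue()
result = {}

reducer = threading.Thread(target=reduce_group,
                           args=(channel, len(work), result))
producers = [threading.Thread(target=compute_group, args=(chunks, channel))
             for chunks in work]

reducer.start()
for p in producers:
    p.start()
for p in producers:
    p.join()
reducer.join()
```

After the threads finish, `result["total"]` holds the sum of squares over all chunks. Because the reducer consumes items while the producers are still running, a slow producer delays only its own items rather than stalling a collective step shared by every worker.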

arXiv.org e-Print Archive

Crossref