4.45 Pflops Astrophysical N-Body Simulation on K computer -- The Gravitational Trillion-Body Problem
As an entry for the 2012 Gordon-Bell performance prize, we report performance
results of astrophysical N-body simulations of one trillion particles performed
on the full system of K computer. This is the first gravitational trillion-body
simulation in the world. We describe the scientific motivation, the numerical
algorithm, the parallelization strategy, and the performance analysis. Unlike
many previous Gordon-Bell prize winners that used the tree algorithm for
astrophysical N-body simulations, we used the hybrid TreePM method, which
reaches a similar level of accuracy: the short-range force is calculated by the
tree algorithm, and the long-range force is solved by the particle-mesh
algorithm.
We developed a highly-tuned gravity kernel for short-range forces, and a novel
communication algorithm for long-range forces. The average performance on 24576
and 82944 nodes of K computer is 1.53 and 4.45 Pflops, respectively,
corresponding to 49% and 42% of the peak speed.
Comment: 10 pages, 6 figures, Proceedings of Supercomputing 2012
(http://sc12.supercomputing.org/), Gordon Bell Prize Winner. Additional
information at http://www.ccs.tsukuba.ac.jp/CCS/eng/gbp201
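The force splitting at the heart of TreePM methods can be sketched in a few lines. This is a minimal, hypothetical Python illustration (not the authors' tuned kernel): the 1/r^2 gravitational force is divided with a Gaussian-smoothed cutoff into a rapidly decaying short-range part, evaluated by the tree walk, and a smooth long-range part, solved on the particle mesh. The splitting scale `r_cut` is an assumed parameter.

```python
import math

def force_split(r, r_cut=1.0):
    """Split the gravitational force magnitude 1/r^2 (unit masses, G=1)
    into short- and long-range parts via an erfc smoothing, in the style
    of TreePM methods.  r_cut is a hypothetical splitting scale."""
    total = 1.0 / r**2
    # Short-range part: falls off quickly beyond r_cut, handled by the tree.
    short = (math.erfc(r / (2.0 * r_cut))
             + r / (r_cut * math.sqrt(math.pi))
               * math.exp(-r**2 / (4.0 * r_cut**2))) / r**2
    # Long-range part: smooth everywhere, solved on the particle mesh.
    long_range = total - short
    return short, long_range

s, l = force_split(0.5)
assert abs((s + l) - 1.0 / 0.5**2) < 1e-12  # the two parts sum to the full force
```

By construction the two parts always sum to the exact force, so the split trades no accuracy away; it only routes each scale to the algorithm that handles it cheaply.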
A low-cost parallel implementation of direct numerical simulation of wall turbulence
A numerical method for the direct numerical simulation of incompressible wall
turbulence in rectangular and cylindrical geometries is presented. The
distinctive feature is a design targeted at efficient distributed-memory
parallel computing on commodity hardware. The adopted
discretization is spectral in the two homogeneous directions; fourth-order
accurate, compact finite-difference schemes over a variable-spacing mesh in the
wall-normal direction are key to our parallel implementation. The parallel
algorithm is designed in such a way as to minimize data exchange among the
computing machines, and in particular to avoid taking a global transpose of the
data during the pseudo-spectral evaluation of the non-linear terms. The
computing machines can then be connected to each other through low-cost network
devices. The code is optimized for memory requirements, which can moreover be
subdivided among the computing nodes. The layout of a simple, dedicated and
optimized computing system based on commodity hardware is described. The
performance of the numerical method on this computing system is evaluated and
compared with that of other codes described in the literature, as well as with
that of the same code implementing a commonly employed strategy for the
pseudo-spectral calculation.
Comment: To be published in J. Comp. Phys.
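The "commonly employed strategy" for the pseudo-spectral evaluation of non-linear terms mentioned above can be illustrated in one dimension. This is a generic sketch, not the authors' transpose-free variant: the spectra are padded by the 3/2 rule, transformed to physical space where the product is taken point-wise, then transformed back and truncated, which removes aliasing errors.

```python
import numpy as np

def dealiased_product(u_hat, v_hat):
    """Pseudo-spectral product of two 1-D periodic fields given by their
    Fourier coefficients (length n, n even), dealiased by the 3/2 rule:
    pad the spectra, multiply in physical space, transform back, truncate."""
    n = u_hat.size
    m = 3 * n // 2                              # extended grid for dealiasing
    def pad(a):
        # Insert zeros between the positive and negative wavenumbers.
        return np.concatenate([a[: n // 2],
                               np.zeros(m - n, dtype=complex),
                               a[n // 2:]])
    u = np.fft.ifft(pad(u_hat)) * (m / n)       # to physical space, rescaled
    v = np.fft.ifft(pad(v_hat)) * (m / n)
    w_hat = np.fft.fft(u * v) * (n / m)         # point-wise product, back
    return np.concatenate([w_hat[: n // 2], w_hat[m - n // 2:]])
```

On a distributed 3-D grid this step is what normally forces the global transpose the abstract refers to, since the FFTs need all points along each transformed direction on one node.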
A Hybrid Decomposition Parallel Implementation of the Car-Parrinello Method
We have developed a flexible hybrid decomposition parallel implementation of
the first-principles molecular dynamics algorithm of Car and Parrinello. The
code allows the problem to be decomposed spatially, over the electronic
orbitals, or by any combination of the two. Performance statistics for 32, 64, 128
and 512 Si atom runs on the Touchstone Delta and Intel Paragon parallel
supercomputers and comparison with the performance of an optimized code running
the smaller systems on the Cray Y-MP and C90 are presented.
Comment: Accepted by Computer Physics Communications, LaTeX, 34 pages without
figures; 15 figures available in PostScript form via WWW at
http://www-theory.chem.washington.edu/~wiggs/hyb_figures.htm
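The hybrid decomposition idea can be made concrete with a toy rank-mapping function. This is an illustrative sketch, not the paper's implementation: each process is assigned both a block of orbitals and a spatial slab, so a single layout covers pure spatial decomposition, pure orbital decomposition, or any mix.

```python
def hybrid_layout(n_ranks, n_orbital_groups):
    """Map each of n_ranks processes to an (orbital group, spatial slab)
    pair.  n_orbital_groups == 1 gives pure spatial decomposition,
    n_orbital_groups == n_ranks pure orbital decomposition; anything in
    between is a hybrid.  Names and layout are illustrative only."""
    assert n_ranks % n_orbital_groups == 0, "groups must tile the ranks evenly"
    slabs_per_group = n_ranks // n_orbital_groups
    return [(rank // slabs_per_group,   # which block of orbitals this rank owns
             rank % slabs_per_group)    # which spatial slab of the grid
            for rank in range(n_ranks)]
```

For example, `hybrid_layout(8, 2)` splits 8 processes into 2 orbital groups of 4 spatial slabs each; the same code run with `(8, 1)` or `(8, 8)` recovers the two pure decompositions.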
Matched filters for coalescing binaries detection on massively parallel computers
We discuss some computational problems associated with matched filtering of
experimental signals from gravitational wave interferometric detectors in a
parallel-processing environment. We then specialize our discussion to the use
of the APEmille and apeNEXT processors for this task. Finally, we accurately
estimate the performance of an APEmille system on a computational load
appropriate for the LIGO and VIRGO experiments, and extrapolate our results to
apeNEXT.
Comment: 19 pages, 6 figures
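The computational core of matched filtering is an FFT-based correlation of the data stream against a bank of templates. The following is a deliberately simplified sketch under a white-noise assumption; real interferometer pipelines such as those for LIGO and Virgo additionally weight the correlation by the detector's noise power spectral density.

```python
import numpy as np

def matched_filter_snr(data, template):
    """Correlate data against a known template via the FFT (circular
    correlation), assuming white noise for simplicity.  Returns an
    SNR-like statistic as a function of time lag."""
    t = template / np.linalg.norm(template)      # unit-norm template
    corr = np.fft.ifft(np.fft.fft(data) * np.conj(np.fft.fft(t)))
    return np.abs(corr)

# Toy usage: hide a 64-sample sinusoidal "chirp" in noise at lag 100.
rng = np.random.default_rng(0)
tmpl = np.sin(2 * np.pi * 0.1 * np.arange(64))
data = 0.1 * rng.standard_normal(256)
data[100:164] += tmpl
snr = matched_filter_snr(data, np.pad(tmpl, (0, 192)))  # peak near lag 100
```

Because the correlation is done with FFTs, the cost per template is O(N log N) rather than O(N^2), which is what makes large template banks tractable on parallel machines like APEmille.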
Benchmarking CPUs and GPUs on embedded platforms for software receiver usage
Smartphones containing multi-core central processing units (CPUs) and powerful many-core graphics processing units (GPUs) bring supercomputing technology into your pocket (or into your embedded devices). This can be exploited to produce power-efficient, customized receivers with flexible correlation schemes and more advanced positioning techniques. For example, promising techniques such as the Direct Position Estimation paradigm, or tracking solutions based on particle filtering, seem very appealing in challenging environments but are likewise computationally quite demanding. This article sheds some light on recent embedded processor developments, benchmarks Fast Fourier Transform (FFT) and correlation algorithms on representative embedded platforms, and relates the results to their use in GNSS software radios. The use of embedded CPUs for signal tracking seems straightforward, but more research is required to fully achieve the nominal peak performance of an embedded GPU for FFT computation. Electrical power consumption is also measured at various load levels.
Peer reviewed. Postprint (published version)
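The shape of such an FFT benchmark is easy to sketch. This is a hypothetical Python harness, not the article's native CPU/GPU code: it times repeated complex FFTs of a fixed size, with a warm-up transform excluded so setup costs do not distort the result.

```python
import time
import numpy as np

def bench_fft(n=4096, reps=50):
    """Time repeated complex FFTs of length n and return the mean cost in
    microseconds per transform -- a toy analogue of benchmarking FFT
    kernels on embedded CPUs and GPUs."""
    x = (np.random.standard_normal(n)
         + 1j * np.random.standard_normal(n))
    np.fft.fft(x)                       # warm-up run, excluded from timing
    t0 = time.perf_counter()
    for _ in range(reps):
        np.fft.fft(x)
    return (time.perf_counter() - t0) / reps * 1e6

us_per_fft = bench_fft()
```

Sweeping `n` over the sizes a receiver actually uses (acquisition grids, correlator lengths) is what turns a micro-benchmark like this into a meaningful comparison between platforms.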
Computational Physics on Graphics Processing Units
The use of graphics processing units for scientific computations is an
emerging strategy that can significantly speed up a wide range of algorithms.
In this review, we discuss advances made in the field of computational physics,
focusing on classical molecular dynamics, and on quantum simulations for
electronic structure calculations using density functional theory,
wave-function techniques, and quantum field theory.
Comment: Proceedings of the 11th International Conference, PARA 2012,
Helsinki, Finland, June 10-13, 2012
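What makes classical molecular dynamics such a good fit for GPUs is the uniform, data-parallel structure of the force computation. As a stand-in illustration (NumPy array operations playing the role of a GPU kernel; not from the review itself), an all-pairs Lennard-Jones force evaluation looks like this:

```python
import numpy as np

def lj_forces(pos, eps=1.0, sigma=1.0):
    """All-pairs Lennard-Jones forces written as whole-array operations --
    the same uniform structure a GPU kernel would exploit, one thread per
    particle pair.  pos has shape (N, 3); returns forces of shape (N, 3)."""
    d = pos[:, None, :] - pos[None, :, :]     # pairwise displacement vectors
    r2 = (d * d).sum(-1)                      # squared distances
    np.fill_diagonal(r2, np.inf)              # exclude self-interaction
    s6 = (sigma**2 / r2) ** 3
    # Force magnitude over r, expressed through r^2 to avoid square roots.
    coef = 24.0 * eps * (2.0 * s6 * s6 - s6) / r2
    return (coef[:, :, None] * d).sum(axis=1) # net force on each particle
```

Every pair is independent, so the work maps directly onto thousands of GPU threads; production codes add neighbor lists to cut the O(N^2) cost.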