A hybrid MPI-OpenMP scheme for scalable parallel pseudospectral computations for fluid turbulence

Author: Mininni Pablo D.
Pouquet Annick
Reddy Raghu
Rosenberg Duane L.
Publication date: 22/03/2010

A hybrid scheme that utilizes MPI for distributed memory parallelism and OpenMP for shared memory parallelism is presented. The work is motivated by the desire to achieve exceptionally high Reynolds numbers in pseudospectral computations of fluid turbulence on emerging petascale, high core-count, massively parallel processing systems. The hybrid implementation derives from and augments a well-tested scalable MPI-parallelized pseudospectral code. The hybrid paradigm leads to a new picture for the domain decomposition of the pseudospectral grids, which is helpful in understanding, among other things, the 3D transpose of the global data that is necessary for the parallel fast Fourier transforms that are the central component of the numerical discretizations. Details of the hybrid implementation are provided, and performance tests illustrate the utility of the method. It is shown that the hybrid scheme achieves near-ideal scalability up to ~20000 compute cores with a maximum mean efficiency of 83%. Data are presented that demonstrate how to choose the optimal number of MPI processes and OpenMP threads in order to optimize code performance on two different platforms. Comment: Submitted to Parallel Computing

arXiv.org e-Print Archive

CONICET Digital

A Parallel Adaptive P3M code with Hierarchical Particle Reordering

Author: H.M.P. Couchman
Robert J. Thacker
Publication venue: 'Elsevier BV'
Publication date: 01/01/2005

We discuss the design and implementation of HYDRA_OMP, a parallel implementation of the Smoothed Particle Hydrodynamics-Adaptive P3M (SPH-AP3M) code HYDRA. The code is designed primarily for conducting cosmological hydrodynamic simulations and is written in Fortran77+OpenMP. A number of optimizations for RISC processors and SMP-NUMA architectures have been implemented, the most important optimization being hierarchical reordering of particles within chaining cells, which greatly improves data locality, thereby removing the cache misses typically associated with linked lists. Parallel scaling is good, with a minimum parallel scaling of 73% achieved on 32 nodes for a variety of modern SMP architectures. We give performance data in terms of the number of particle updates per second, which is a more useful performance metric than raw MFlops. A basic version of the code will be made available to the community in the near future. Comment: 34 pages, 12 figures, accepted for publication in Computer Physics Communications

arXiv.org e-Print Archive

CiteSeerX

Crossref

CERN Document Server

Estimating the Potential Speedup of Computer Vision Applications on Embedded Multiprocessors

Author: Cleyet-Merle Sébastien
Issard Alain
Mancini Stéphane
Schwambach Vítor
Publication date: 26/02/2015

Computer vision applications constitute one of the key drivers for embedded multicore architectures. Although the number of available cores is increasing in new architectures, designing an application to maximize the utilization of the platform is still a challenge. In this sense, parallel performance prediction tools can aid developers in understanding the characteristics of an application and finding the most adequate parallelization strategy. In this work, we present a method for early parallel performance estimation on embedded multiprocessors from sequential application traces. We describe its implementation in Parana, a fast trace-driven simulator targeting OpenMP applications on the STMicroelectronics' STxP70 Application-Specific Multiprocessor (ASMP). Results for the FAST key point detector application show an error margin of less than 10% compared to the reference cycle-approximate simulator, with lower modeling effort and up to 20x faster execution time. Comment: Presented at DATE Friday Workshop on Heterogeneous Architectures and Design Methods for Embedded Image Systems (HIS 2015) (arXiv:1502.07241)

arXiv.org e-Print Archive

Hal - Université Grenoble Alpes

New Algebraic Formulation of Density Functional Calculation

Author: Arias T. A.
Ismail-Beigi Sohrab
Publication venue: 'Elsevier BV'
Publication date: 01/01/2000

This article addresses a fundamental problem faced by the ab initio community: the lack of an effective formalism for the rapid exploration and exchange of new methods. To rectify this, we introduce a novel, basis-set independent, matrix-based formulation of generalized density functional theories which reduces the development, implementation, and dissemination of new ab initio techniques to the derivation and transcription of a few lines of algebra. This new framework enables us to concisely demystify the inner workings of fully functional, highly efficient modern ab initio codes and to give complete instructions for the construction of such for calculations employing arbitrary basis sets. Within this framework, we also discuss in full detail a variety of leading-edge ab initio techniques, minimization algorithms, and highly efficient computational kernels for use with scalar as well as shared and distributed-memory supercomputer architectures.

arXiv.org e-Print Archive

CiteSeerX

Crossref

A Parallel Mesh-Adaptive Framework for Hyperbolic Conservation Laws

Author: Jürgen Dreher
Rainer Grauer
Publication venue: 'Elsevier BV'
Publication date: 01/02/2006

We report on the development of a computational framework for the parallel, mesh-adaptive solution of systems of hyperbolic conservation laws like the time-dependent Euler equations in compressible gas dynamics or Magneto-Hydrodynamics (MHD) and similar models in plasma physics. Local mesh refinement is realized by the recursive bisection of grid blocks along each spatial dimension; implemented numerical schemes include standard finite-differences as well as shock-capturing central schemes, both in connection with Runge-Kutta type integrators. Parallel execution is achieved through a configurable hybrid of POSIX-multi-threading and MPI-distribution with dynamic load balancing. One-, two-, and three-dimensional test computations for the Euler equations have been carried out and show good parallel scaling behavior. The Racoon framework is currently used to study the formation of singularities in plasmas and fluids. Comment: late submission

arXiv.org e-Print Archive

Crossref

CERN Document Server

SpECTRE: A Task-based Discontinuous Galerkin Code for Relativistic Astrophysics

Author: Bohn Andy
Deppe Nils
Diener Peter
Field Scott E.
Foucart Francois
Hébert François
Kidder Lawrence E.
Lippuner Jonas
Miller Jonah
Ott Christian D.
Scheel Mark A.
Schnetter Erik
Teukolsky Saul A.
Vincent Trevor
Publication venue: 'Elsevier BV'
Publication date: 15/04/2017

We introduce a new relativistic astrophysics code, SpECTRE, that combines a discontinuous Galerkin method with a task-based parallelism model. SpECTRE's goal is to achieve more accurate solutions for challenging relativistic astrophysics problems such as core-collapse supernovae and binary neutron star mergers. The robustness of the discontinuous Galerkin method allows for the use of high-resolution shock capturing methods in regions where (relativistic) shocks are found, while exploiting high-order accuracy in smooth regions. A task-based parallelism model allows efficient use of the largest supercomputers for problems with a heterogeneous workload over disparate spatial and temporal scales. We argue that the locality and algorithmic structure of discontinuous Galerkin methods will exhibit good scalability within a task-based parallelism framework. We demonstrate the code on a wide variety of challenging benchmark problems in (non)-relativistic (magneto)-hydrodynamics. We demonstrate the code's scalability including its strong scaling on the NCSA Blue Waters supercomputer up to the machine's full capacity of 22,380 nodes using 671,400 threads. Comment: 41 pages, 13 figures, and 7 tables. Ancillary data contains simulation input files

arXiv.org e-Print Archive

Louisiana State University

Caltech Authors

FullSWOF_Paral: Comparison of two parallelization strategies (MPI and SKELGIS) on a software designed for hydrology applications

Author: Cordier Stéphane
Coullon Hélène
Delestre Olivier
Laguerre Christian
Le Minh Hoang
Pierre Daniel
Sadaka Georges
Publication venue: 'EDP Sciences'
Publication date: 18/07/2013

In this paper, we perform a comparison of two approaches for the parallelization of an existing, free software, FullSWOF 2D (http://www.univ-orleans.fr/mapmo/soft/FullSWOF/), which solves shallow water equations for applications in hydrology, based on a domain decomposition strategy. The first approach is based on the classical MPI library while the second approach uses Parallel Algorithmic Skeletons and more precisely a library named SkelGIS (Skeletons for Geographical Information Systems). The first results presented in this article show that the two approaches are similar in terms of performance and scalability. The two implementation strategies are however very different and we discuss the advantages of each one. Comment: 27 pages

arXiv.org e-Print Archive

EDP Sciences OAI-PMH repository (1.2.0)

HAL Descartes