A hybrid MPI-OpenMP scheme for scalable parallel pseudospectral computations for fluid turbulence

Author: Mininni Pablo D.
Pouquet Annick
Reddy Raghu
Rosenberg Duane L.
Publication date: 22/03/2010

A hybrid scheme that uses MPI for distributed-memory parallelism and OpenMP for shared-memory parallelism is presented. The work is motivated by the desire to achieve exceptionally high Reynolds numbers in pseudospectral computations of fluid turbulence on emerging petascale, high core-count, massively parallel processing systems. The hybrid implementation derives from and augments a well-tested, scalable, MPI-parallelized pseudospectral code. The hybrid paradigm leads to a new picture for the domain decomposition of the pseudospectral grids, which helps in understanding, among other things, the 3D transpose of the global data required by the parallel fast Fourier transforms that are the central component of the numerical discretizations. Details of the hybrid implementation are provided, and performance tests illustrate the utility of the method. The hybrid scheme is shown to achieve near-ideal scalability up to ~20,000 compute cores, with a maximum mean efficiency of 83%. Data are presented that demonstrate how to choose the number of MPI processes and OpenMP threads that optimizes code performance on two different platforms.
Comment: Submitted to Parallel Computing
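The trade-off between MPI processes and OpenMP threads mentioned in the abstract can be illustrated with a minimal Python sketch. It is not taken from the paper: the function names `hybrid_layouts` and `parallel_efficiency`, the assumption that each MPI process stays within one node, and the sample core counts are all hypothetical. It only enumerates the layouts one might benchmark and shows the usual strong-scaling efficiency ratio.

```python
def hybrid_layouts(total_cores, cores_per_node):
    """List (mpi_processes, omp_threads) pairs that use every core.

    Assumption (not from the paper): an MPI process never spans nodes,
    so the thread count must divide the number of cores per node.
    """
    if total_cores % cores_per_node != 0:
        raise ValueError("total_cores must be a multiple of cores_per_node")
    layouts = []
    for threads in range(1, cores_per_node + 1):
        if cores_per_node % threads == 0:
            layouts.append((total_cores // threads, threads))
    return layouts


def parallel_efficiency(t_ref, cores_ref, t, cores):
    """Strong-scaling efficiency of a run relative to a reference run."""
    return (t_ref * cores_ref) / (t * cores)


if __name__ == "__main__":
    # Hypothetical machine: 48 cores in total, 12 cores per node.
    for procs, threads in hybrid_layouts(48, 12):
        print(f"{procs:3d} MPI processes x {threads:2d} OpenMP threads")
```

Each layout would then be timed on the target platform, and the fastest one kept; the sketch says nothing about which layout wins in practice.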

arXiv.org e-Print Archive

Crossref

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

CONICET Digital

Large scale ab initio calculations based on three levels of parallelization

Author: Andrew Knyazev
François Bottin
Gilles Zérah
Stéphane Leroux
Publication venue: Elsevier BV
Publication date: 01/01/2007

We suggest and implement a parallelization scheme based on an efficient multiband eigenvalue solver, the locally optimal block preconditioned conjugate gradient (LOBPCG) method, and on an optimized three-dimensional (3D) fast Fourier transform (FFT) in the ab initio plane-wave code ABINIT. In addition to the standard data partitioning over processors corresponding to different k-points, we introduce data partitioning with respect to blocks of bands, as well as spatial partitioning in the Fourier space of coefficients over the plane-wave basis set used in ABINIT. This k-points-multiband-FFT parallelization avoids any collective communications on the whole set of processors, relying instead on one-dimensional communications only. For a single k-point, super-linear scaling is achieved for up to 100 processors due to extensive use of hardware-optimized BLAS, LAPACK, and SCALAPACK routines, mainly in the LOBPCG routine. We observe good performance up to 200 processors. With 10 k-points, our three-way data partitioning results in linear scaling up to 1000 processors for a practical system used for testing.
Comment: 8 pages, 5 figures. Accepted to Computational Material Science
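The three-way partitioning over k-points, bands, and FFT coefficients described above can be pictured as choosing a 3D processor grid. The following Python sketch is only an illustration under stated assumptions, not ABINIT's actual logic: the function name `three_level_grids` is hypothetical, and it assumes the k-point and band group counts must divide the number of k-points and bands evenly.

```python
def three_level_grids(n_procs, n_kpoints, n_bands):
    """Enumerate (kpt_groups, band_groups, fft_groups) processor grids.

    Assumptions (illustrative only): the three factors multiply to
    n_procs, kpt_groups divides n_kpoints, and band_groups divides
    n_bands, so every group receives an equal share of work.
    """
    grids = []
    for pk in range(1, n_kpoints + 1):
        if n_kpoints % pk != 0 or n_procs % pk != 0:
            continue
        rest = n_procs // pk  # processors left for the band and FFT levels
        for pb in range(1, rest + 1):
            if rest % pb != 0 or n_bands % pb != 0:
                continue
            grids.append((pk, pb, rest // pb))
    return grids


if __name__ == "__main__":
    # Hypothetical case: 8 processors, 2 k-points, 4 bands.
    for grid in three_level_grids(8, 2, 4):
        print(grid)
```

In such a grid, communication along one axis involves only the processors that share the other two coordinates, which is one way to read the abstract's "one-dimensional communications only".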

arXiv.org e-Print Archive

CiteSeerX

Crossref

COLAB: A Collaborative Multi-factor Scheduler for Asymmetric Multicore Processors

Author: Janjic Vladimir
Leather Hugh
Petoumenos Pavlos
Thomson John Donald
Yu Teng
Zhu Mingcan
Publication venue: Association for Computing Machinery (ACM)
Publication date: 01/01/2019

Funding: Partially funded by the UK EPSRC grants Discovery: Pattern Discovery and Program Shaping for Many-core Systems (EP/P020631/1) and ABC: Adaptive Brokerage for Cloud (EP/R010528/1); Royal Academy of Engineering under the Research Fellowship scheme.

Increasingly prevalent asymmetric multicore processors (AMP) are necessary for delivering performance in the era of limited power budgets and dark silicon. However, software fails to use them efficiently. OS schedulers, in particular, handle asymmetry only under restricted scenarios. We have efficient symmetric schedulers, efficient asymmetric schedulers for single-threaded workloads, and efficient asymmetric schedulers for single-program workloads. What we do not have is a scheduler that can handle all runtime factors affecting AMP for multi-threaded, multi-programmed workloads. This paper introduces the first general-purpose asymmetry-aware scheduler for multi-threaded, multi-programmed workloads. It estimates the performance of each thread on each type of core and identifies communication patterns and bottleneck threads. The scheduler then makes coordinated core assignment and thread selection decisions that still provide each application its fair share of the processor's time. We evaluate our approach using the GEM5 simulator on four distinct big.LITTLE configurations and 26 mixed workloads composed of PARSEC and SPLASH2 benchmarks. Compared to the state-of-the-art Linux CFS and AMP-aware schedulers, we demonstrate performance gains of up to 25%, and of 5% to 15% on average, depending on the hardware setup.

Postprint

Crossref

The University of Manchester - Institutional Repository

University of Dundee Online Publications

University of St. Andrews - Pure

St Andrews Research Repository

Write-Avoiding Algorithms

Author: Demmel James W.
Erin Carson
Grigori Laura
Harsha Vardhan
Nick Knight
Penporn Koanantakool
Schwartz Oded
Publication venue: HAL CCSD
Publication date: 01/01/2016

Short version of the technical report available at http://www.eecs.berkeley.edu/Pubs/TechRpts/2015/EECS-2015-163.pdf as Technical Report No. UCB/EECS-2015-163.
International audience

INRIA a CCSD electronic archive server

Hal-Diderot