6 research outputs found

Propagation and Decay of Injected One-Off Delays on Clusters: A Case Study

Author: Afzal Ayesha
Hager Georg
Wellein Gerhard
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 28/08/2019

Analytic, first-principles performance modeling of distributed-memory applications is difficult due to a wide spectrum of random disturbances caused by the application and the system. These disturbances (commonly called "noise") destroy the assumptions of regularity that one usually employs when constructing simple analytic models. Despite numerous efforts to quantify, categorize, and reduce such effects, a comprehensive quantitative understanding of their performance impact is not available, especially for long delays that have global consequences for the parallel application. In this work, we investigate various traces collected from synthetic benchmarks that mimic real applications on simulated and real message-passing systems in order to pinpoint the mechanisms behind delay propagation. We analyze the dependence of the propagation speed of idle waves emanating from injected delays with respect to the execution and communication properties of the application, study how such delays decay under increased noise levels, and how they interact with each other. We also show how fine-grained noise can make a system immune against the adverse effects of propagating idle waves. Our results contribute to a better understanding of the collective phenomena that manifest themselves in distributed-memory parallel applications.

Comment: 10 pages, 9 figures; title change

arXiv.org e-Print Archive

Crossref

Analytic Performance Modeling and Analysis of Detailed Neuron Simulations

Author: Cremonesi Francesco
Hager Georg
Schürmann Felix
Wellein Gerhard
Publication venue: 'SAGE Publications'
Publication date: 16/01/2019

Big science initiatives are trying to reconstruct and model the brain by attempting to simulate brain tissue at larger scales and with increasingly more biological detail than previously thought possible. The exponential growth of parallel computer performance has been supporting these developments, and at the same time maintainers of neuroscientific simulation code have strived to optimally and efficiently exploit new hardware features. Current state of the art software for the simulation of biological networks has so far been developed using performance engineering practices, but a thorough analysis and modeling of the computational and performance characteristics, especially in the case of morphologically detailed neuron simulations, is lacking. Other computational sciences have successfully used analytic performance engineering and modeling methods to gain insight on the computational properties of simulation kernels, aid developers in performance optimizations and eventually drive co-design efforts, but to our knowledge a model-based performance analysis of neuron simulations has not yet been conducted. We present a detailed study of the shared-memory performance of morphologically detailed neuron simulations based on the Execution-Cache-Memory (ECM) performance model. We demonstrate that this model can deliver accurate predictions of the runtime of almost all the kernels that constitute the neuron models under investigation. The gained insight is used to identify the main governing mechanisms underlying performance bottlenecks in the simulation. The implications of this analysis on the optimization of neural simulation software and eventually co-design of future hardware architectures are discussed. In this sense, our work represents a valuable conceptual and quantitative contribution to understanding the performance properties of biological networks simulations.

Comment: 18 pages, 6 figures, 15 tables

arXiv.org e-Print Archive

Infoscience - École polytechnique fédérale de Lausanne

Benchmarking the first generation of production quality Arm-based supercomputers

Author: Deakin Tom J
McIntosh-Smith Simon N
Poenaru Andrei
Price James
Publication venue: 'Wiley'
Publication date: 11/11/2019

Crossref

Explore Bristol Research

Efficient Parallel Solution of the 3D Stationary Boltzmann Transport Equation for Diffusive Problems

Author: Faverge Mathieu
Févotte François
Moustafa Salli
Plagne Laurent
Ramet Pierre
Publication venue: 'Elsevier BV'
Publication date: 21/03/2019

This paper presents an efficient parallel method for the deterministic solution of the 3D stationary Boltzmann transport equation applied to diffusive problems such as nuclear core criticality computations. Based on standard MultiGroup-Sn-DD discretization schemes, our approach combines a highly efficient nested parallelization strategy [1] with the PDSA parallel acceleration technique [2] applied for the first time to 3D transport problems. These two key ingredients enable us to solve extremely large neutronic problems involving up to 10^12 degrees of freedom in less than an hour using 64 supercomputer nodes.

INRIA a CCSD electronic archive server

A performance analysis of the first generation of HPC-optimized Arm processors

Author: Deakin Tom
McIntosh-Smith Simon
Poenaru Andrei
Price James
Publication venue: 'Wiley'
Publication date: 25/08/2019

Crossref

Explore Bristol Research