Parallel 3D Sweep Kernel with PARSEC
High-fidelity nuclear power plant core simulations require solving the Boltzmann transport equation. In discrete ordinates methods, the most computationally demanding operation is the sweep. Considering the evolution of computer architectures, we propose in this paper, as a first step toward heterogeneous distributed architectures, a hybrid parallel implementation of the sweep operation on top of the generic task-based runtime system PaRSEC. This implementation targets three nested levels of parallelism: message passing, multi-threading, and vectorization. The proposed parallel sweep achieves a sustained performance of 6.1 Tflop/s, corresponding to 33.9% of the peak performance of the targeted supercomputer. It compares favorably with state-of-the-art solvers such as PARTISN, and it can therefore serve as a building block for a massively parallel version of the neutron transport solver DOMINO developed at EDF.
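The sweep's parallelism comes from a wavefront dependency structure: for a given angular direction, a cell can be processed only after its upstream neighbors. A minimal sketch of that ordering (illustrative only, not the DOMINO/PaRSEC implementation) for one octant with positive direction cosines:

```python
# Diagonal-wavefront ordering that exposes the sweep's parallelism: for an
# octant with positive direction cosines, cell (i, j, k) depends on
# (i-1, j, k), (i, j-1, k) and (i, j, k-1), so all cells sharing the same
# i + j + k value are mutually independent and can run concurrently.
def wavefronts(nx, ny, nz):
    levels = [[] for _ in range(nx + ny + nz - 2)]
    for i in range(nx):
        for j in range(ny):
            for k in range(nz):
                levels[i + j + k].append((i, j, k))
    return levels

fronts = wavefronts(4, 4, 4)
# Each inner list is an independent task set that a task-based runtime such
# as PaRSEC can schedule in parallel; successive lists must run in order.
```

A task-based runtime expresses exactly these dependencies as a task graph, which also lets it overlap the message passing between distributed subdomains with computation.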
Efficient Parallel Solution of the 3D Stationary Boltzmann Transport Equation for Diffusive Problems
This paper presents an efficient parallel method for the deterministic solution of the 3D stationary Boltzmann transport equation applied to diffusive problems such as nuclear core criticality computations. Based on standard MultiGroup-Sn-DD discretization schemes, our approach combines a highly efficient nested parallelization strategy [1] with the PDSA parallel acceleration technique [2], applied for the first time to 3D transport problems. These two key ingredients enable us to solve extremely large neutronic problems involving up to 10^12 degrees of freedom in less than an hour using 64 supercomputer nodes.
Scalable electronic structure methods to solve the Kohn-Sham equation
From a single hydrogen atom to proteins of hundreds of thousands of kilodaltons, scientists can use the electronic structure of interacting atoms to predict material properties. Knowing material properties through solving the electronic structure problem would allow for the controlled prediction and corresponding design of materials. The Kohn-Sham equations, based on density functional theory, transform a many-body problem impossible to solve for anything but the smallest molecules into a practical one that can be used to predict material properties. Although Kohn-Sham density functional theory (KS-DFT) scales as the cube of the number of electrons in the system, there are additional well-documented approximations, such as the pseudopotential method, to further reduce the number of electrons.
The incoming exascale era will lead to unavoidable challenges in solving the Kohn-Sham equations, including communication and hardware considerations. Old paradigms, epitomized by repeated series of globally forced synchronization points, will give way to new breeds of algorithms that maximize scaling performance while maintaining portability.
This thesis focuses on the solution of Kohn-Sham DFT in real space at scale. Key to this effort is a parallel treatment of the numerical elements of the Rayleigh-Ritz method. At minimum, the Rayleigh-Ritz projection requires a number of distributed matrix-vector operations equal to the number of electrons solved for in a system, and on the order of half that number squared of dot products. The memory cost of such an algorithm also grows very large very quickly, making explicit, intelligent memory management essential. I demonstrate the computational requirements of the various steps in solving the electronic structure problem for both large and small molecular systems. This thesis also discusses opportunities in real-space Kohn-Sham DFT to further utilize floating-point-optimized hardware with higher-order stencils.
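The cost structure described above can be seen in a minimal dense sketch of a Rayleigh-Ritz step (illustrative NumPy code, not the thesis implementation): for N trial vectors it performs N matrix-vector products to form H·V and on the order of N²/2 independent dot products for the symmetric projected matrix.

```python
import numpy as np

# Minimal dense Rayleigh-Ritz sketch (hypothetical example, not the thesis
# code): project the Hamiltonian H onto a trial subspace V, solve the small
# projected eigenproblem, and rotate the basis onto the Ritz vectors.
def rayleigh_ritz(H, V):
    HV = H @ V               # one matvec per column (per electron state)
    S = V.T @ HV             # ~N^2/2 independent dot products (S is symmetric)
    w, U = np.linalg.eigh(S) # small N x N eigenproblem, eigenvalues ascending
    return w, V @ U          # Ritz values and Ritz vectors

rng = np.random.default_rng(0)
A = rng.standard_normal((50, 50))
H = (A + A.T) / 2                                  # symmetric "Hamiltonian"
Q, _ = np.linalg.qr(rng.standard_normal((50, 8)))  # orthonormal trial subspace
w, X = rayleigh_ritz(H, Q)
```

In a distributed real-space code, H @ V is applied matrix-free via the stencil, and both the matvecs and the dot products become distributed reductions, which is why their parallel treatment dominates the design.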
Evaluation of Distributed Programming Models and Extensions to Task-based Runtime Systems
High Performance Computing (HPC) has always been a key foundation for scientific simulation and discovery. More recently, the training of deep learning models has further accelerated the demand for computational power and lower-precision arithmetic. In this era following the end of Dennard scaling, when Moore's Law seemingly still holds true to a lesser extent, it is no coincidence that HPC systems are equipped with multi-core CPUs and a variety of hardware accelerators that are all massively parallel. Coupled with interconnect speed improvements lagging behind those of computational power, the current state of HPC systems is heterogeneous and extremely complex.
This was heralded as a great challenge to software stacks and their ability to extract performance from these systems, but also as a great opportunity to innovate at the programming model level, exploring different approaches and proposing new solutions. With usability, portability, and performance as the main factors to consider, this dissertation first evaluates the ability of some widely used parallel programming models (MPI, MPI+OpenMP, and task-based runtime systems) to manage the load imbalance among the processes computing the LU factorization of a large dense matrix stored in the Block Low-Rank (BLR) format.
Next, I proposed a number of optimizations and implemented them in PaRSEC's Dynamic Task Discovery (DTD) model, including user-level graph trimming and direct Application Programming Interface (API) calls to perform the data broadcast operation, further extending the limits of the Sequential Task Flow (STF) model. The Parameterized Task Graph (PTG) approach in PaRSEC, on the other hand, is the most scalable approach for many applications; I therefore explored the possibility of combining the algorithmic benefits of Communication-Avoiding (CA) methods with the communication-computation overlap provided by runtime systems, using a 2D five-point stencil as the test case. This broad evaluation and extension of programming models highlighted the ability of task-based runtime systems to achieve scalable performance and portability on contemporary heterogeneous HPC systems. Finally, I summarized the profiling capability of the PaRSEC runtime system and demonstrated with a use case its important role in identifying performance bottlenecks that lead to optimizations.
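For reference, the 2D five-point stencil used as the test case is the classic nearest-neighbor update; the serial sketch below (illustrative only, not the dissertation's distributed code) shows the computation whose halo exchanges a task-based runtime can overlap with work on interior cells.

```python
# One Jacobi-style sweep of the 2D five-point stencil: each interior point
# becomes the average of its four neighbors; boundary values are held fixed.
# In a distributed run, the first/last rows and columns of each subdomain
# form the halo that must be exchanged with neighboring ranks.
def five_point_step(u):
    n, m = len(u), len(u[0])
    v = [row[:] for row in u]          # copy so the update uses old values only
    for i in range(1, n - 1):
        for j in range(1, m - 1):
            v[i][j] = 0.25 * (u[i-1][j] + u[i+1][j] + u[i][j-1] + u[i][j+1])
    return v
```

Because each sweep touches only nearest neighbors, a runtime can schedule interior-cell tasks while halo messages for the next sweep are still in flight, which is the communication-computation overlap the text refers to.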
Design trade-offs for emerging HPC processors based on mobile market technology
This is a post-peer-review, pre-copyedit version of an article published in The Journal of Supercomputing. The final authenticated version is available online at: http://dx.doi.org/10.1007/s11227-019-02819-4
High-performance computing (HPC) is at the crossroads of a potential transition toward mobile market processor technology. Unlike in prior transitions, numerous hardware vendors and integrators will have access to state-of-the-art processor designs due to Arm's licensing business model. This fact gives them greater flexibility to implement custom HPC-specific designs. In this paper, we undertake a study to quantify the different energy-performance trade-offs when architecting a processor based on mobile market technology. Through detailed simulations over a representative set of benchmarks, our results show that: (i) a modest amount of last-level cache per core is sufficient, leading to significant power and area savings; (ii) in-order cores offer favorable trade-offs when compared to out-of-order cores for a wide range of benchmarks; and (iii) heterogeneous configurations help to improve processor performance and energy efficiency.
DAMOV: A New Methodology and Benchmark Suite for Evaluating Data Movement Bottlenecks
Data movement between the CPU and main memory is a first-order obstacle
against improving performance, scalability, and energy efficiency in modern
systems. Computer systems employ a range of techniques to reduce overheads tied
to data movement, spanning from traditional mechanisms (e.g., deep multi-level
cache hierarchies, aggressive hardware prefetchers) to emerging techniques such
as Near-Data Processing (NDP), where some computation is moved close to memory.
Our goal is to methodically identify potential sources of data movement over a
broad set of applications and to comprehensively compare traditional
compute-centric data movement mitigation techniques to more memory-centric
techniques, thereby developing a rigorous understanding of the best techniques
to mitigate each source of data movement.
With this goal in mind, we perform the first large-scale characterization of
a wide variety of applications, across a wide range of application domains, to
identify fundamental program properties that lead to data movement to/from main
memory. We develop the first systematic methodology to classify applications
based on the sources contributing to data movement bottlenecks. From our
large-scale characterization of 77K functions across 345 applications, we
select 144 functions to form the first open-source benchmark suite (DAMOV) for
main memory data movement studies. We select a diverse range of functions that
(1) represent different types of data movement bottlenecks, and (2) come from a
wide range of application domains. Using NDP as a case study, we identify new
insights about the different data movement bottlenecks and use these insights
to determine the most suitable data movement mitigation mechanism for a
particular application. We open-source DAMOV and the complete source code for
our new characterization methodology at https://github.com/CMU-SAFARI/DAMOV.
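A basic ingredient of any such compute-centric vs. memory-centric comparison is arithmetic intensity. The toy classifier below (a hypothetical roofline-style sketch; DAMOV's actual methodology is far richer, combining multiple metrics) shows the idea of labeling a function by its flops per byte moved relative to the machine balance point.

```python
# Toy roofline-style classifier (hypothetical values, not DAMOV's method):
# a function whose arithmetic intensity (flops per byte moved to/from main
# memory) falls below the machine balance point is limited by memory
# bandwidth; above it, by compute throughput.
def classify(flops, bytes_moved, peak_flops=1e12, mem_bw=1e11):
    intensity = flops / bytes_moved
    balance = peak_flops / mem_bw  # flops/byte where the roofline bends (10 here)
    return "compute-bound" if intensity >= balance else "memory-bound"

classify(1e9, 1e9)   # 1 flop/byte, below the balance of 10 -> "memory-bound"
```

Memory-bound functions in this sense are the natural candidates for Near-Data Processing, since moving the computation to memory removes the bandwidth bottleneck that dominates their runtime.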
Hydrodynamic Evolution of Sgr A East: The Imprint of A Supernova Remnant in the Galactic Center
We perform three-dimensional numerical simulations to study the hydrodynamic
evolution of Sgr A East, the only known supernova remnant (SNR) in the center
of our Galaxy, to infer its debated progenitor SN type and its potential impact
on the Galactic center environment. Three sets of simulations are performed,
each of which represents a certain type of SN explosion (SN Iax, SN
Ia or core-collapse SN) expanding against a nuclear outflow of hot gas driven
by massive stars, whose thermodynamical properties have been well established
by previous work and fixed in the simulations. All three simulations can
simultaneously roughly reproduce the extent of Sgr A East and the position and
morphology of an arc-shaped thermal X-ray feature, known as the "ridge".
Confirming previous work, our simulations show that the ridge is the
manifestation of a strong collision between the expanding SN ejecta and the
nuclear outflow. The simulation of the core-collapse SN, with an assumed
explosion energy of 5x10^50 erg and an ejecta mass of 10 M_sun, can well match
the X-ray flux of the ridge, whereas the simulations of the SN Iax and SN Ia
explosions underpredict its X-ray emission, due to a smaller ejecta mass. All
three simulations constrain the age of Sgr A East to be <1500 yr and predict
that the ridge should fade out over the next few hundred years. We address the
implications of these results for our understanding of the Galactic center
environment. (21 pages, 18 figures. Accepted for publication in MNRAS.)