
1,792 research outputs found

nSharma: Numerical Simulation Heterogeneity Aware Runtime Manager for OpenFOAM

Author: A Basermann
AR Brodtkorb
C Chevalier
D Clarke
G Costa Da
JA Martínez
K Barker
K Schloegel
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2018

CFD simulations are a fundamental engineering application, implying huge workloads, often with dynamic behaviour due to run-time mesh refinement. Parallel processing over heterogeneous distributed memory clusters is often used to process such workloads. The execution of dynamic workloads over a set of heterogeneous resources leads to load imbalances that severely impact execution time when static uniform load distribution is used. This paper proposes applying dynamic, heterogeneity aware, load balancing techniques within CFD simulations. nSharma, a software package that fully integrates with OpenFOAM, is presented and assessed. Performance gains are demonstrated, achieved by reducing the standard deviation of busy times among resources, i.e. heterogeneous computing resources are kept busy with useful work due to an effective workload distribution. To the best of the authors' knowledge, nSharma is the first implementation and integration of heterogeneity aware load balancing in OpenFOAM and will be made publicly available in order to foster its adoption by the large community of OpenFOAM users.

The authors would like to acknowledge the funding by FEDER through the COMPETE 2020 Program and the National Funds through FCT under the project UID/CTM/50025/2013. The first author was partially funded by the PT-FLAD Chair on Smart Cities & Smart Governance and also by the School of Engineering, University of Minho within project Performance Portability on Scalable Heterogeneous Computing Systems. The authors also wish to thank Kyle Mooney for making available his code supporting migration of dynamically refined meshes, as well as acknowledge the Texas Advanced Computing Center (TACC) at The University of Texas at Austin for providing HPC resources.
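The nSharma abstract above measures balance by the standard deviation of busy times across resources. The following is a minimal sketch of that idea, not nSharma's actual algorithm or any OpenFOAM API: it assumes each rank has a known, constant processing speed in cells per second (the names `proportional_shares`, `busy_times`, and the sample speeds are hypothetical) and compares a uniform split with a speed-proportional one.

```python
from statistics import pstdev

def proportional_shares(total_cells, speeds):
    """Split total_cells across ranks in proportion to their speeds (cells/second)."""
    total_speed = sum(speeds)
    shares = [int(total_cells * s / total_speed) for s in speeds]
    # Hand out any cells lost to rounding down, fastest ranks first.
    leftover = total_cells - sum(shares)
    fastest_first = sorted(range(len(speeds)), key=lambda i: -speeds[i])
    for i in fastest_first[:leftover]:
        shares[i] += 1
    return shares

def busy_times(shares, speeds):
    """Time each rank spends working on its share of cells."""
    return [cells / speed for cells, speed in zip(shares, speeds)]

# Hypothetical cluster: two slow ranks and two fast ranks.
speeds = [100.0, 100.0, 400.0, 400.0]
total = 1_000_000

uniform = [total // len(speeds)] * len(speeds)
balanced = proportional_shares(total, speeds)

uniform_spread = pstdev(busy_times(uniform, speeds))
balanced_spread = pstdev(busy_times(balanced, speeds))
print(uniform_spread, balanced_spread)
```

In a real dynamic setting the speeds would have to be re-measured as the mesh is refined; the sketch treats them as fixed inputs.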

Universidade do Minho: RepositoriUM

Crossref

Recommended from our members

Preparing sparse solvers for exascale computing.

Author: Anzt Hartwig
Boman Erik
Curfman McInnes Lois
Falgout Rob
Ghysels Pieter
Heroux Michael
Li Xiaoye
Meier Yang Ulrike
Rajamanickam Sivasankaran
Rupp Karl
Smith Barry
Tran Mills Richard
Yamazaki Ichitaro
Publication venue: eScholarship, University of California
Publication date: 01/03/2020

Sparse solvers provide essential functionality for a wide variety of scientific applications. Highly parallel sparse solvers are essential for continuing advances in high-fidelity, multi-physics and multi-scale simulations, especially as we target exascale platforms. This paper describes the challenges, strategies and progress of the US Department of Energy Exascale Computing Project towards providing sparse solvers for exascale computing platforms. We address the demands of systems with thousands of high-performance node devices where exposing concurrency, hiding latency and creating alternative algorithms become essential. The efforts described here are works in progress, highlighting current success and upcoming challenges. This article is part of a discussion meeting issue 'Numerical algorithms for high-performance computational science'.

eScholarship - University of California

Speculative Segmented Sum for Sparse Matrix-Vector Multiplication on Heterogeneous Processors

Author: Liu Weifeng
Vinter Brian
Publication venue: 'Elsevier BV'
Publication date: 14/09/2015

Sparse matrix-vector multiplication (SpMV) is a central building block for scientific software and graph applications. Recently, heterogeneous processors composed of different types of cores have attracted much attention because of their flexible core configuration and high energy efficiency. In this paper, we propose a compressed sparse row (CSR) format based SpMV algorithm utilizing both types of cores in a CPU-GPU heterogeneous processor. We first speculatively execute segmented sum operations on the GPU part of a heterogeneous processor and generate possibly incorrect results. Then the CPU part of the same chip is triggered to re-arrange the predicted partial sums for a correct resulting vector. On three heterogeneous processors from Intel, AMD and nVidia, using 20 sparse matrices as a benchmark suite, the experimental results show that our method obtains significant performance improvement over the best existing CSR-based SpMV algorithms. The source code of this work is downloadable at https://github.com/bhSPARSE/Benchmark_SpMV_using_CSR

Comment: 22 pages, 8 figures, Published at Parallel Computing (PARCO)
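For readers unfamiliar with the terms in the abstract above, here is a minimal sequential sketch of CSR-based SpMV expressed as an elementwise product followed by a segmented sum. It does not reproduce the paper's speculative CPU-GPU scheme; the function name `spmv_csr_segmented` and the sample matrix are assumptions made for illustration.

```python
def spmv_csr_segmented(row_ptr, col_idx, values, x):
    """Compute y = A @ x for a matrix A stored in CSR format.

    row_ptr[r]:row_ptr[r + 1] is the slice of col_idx/values that
    belongs to row r, so row_ptr also delimits the segments to sum.
    """
    n_rows = len(row_ptr) - 1
    # Step 1: one product per stored nonzero.
    products = [v * x[c] for v, c in zip(values, col_idx)]
    # Step 2: segmented sum, one segment per row (empty rows sum to 0).
    y = [0.0] * n_rows
    for r in range(n_rows):
        y[r] = float(sum(products[row_ptr[r]:row_ptr[r + 1]]))
    return y

# Sample 3x3 matrix with an empty middle row:
# [[1, 0, 2],
#  [0, 0, 0],
#  [0, 3, 4]]
row_ptr = [0, 2, 2, 4]
col_idx = [0, 2, 1, 2]
values = [1.0, 2.0, 3.0, 4.0]
x = [1.0, 2.0, 3.0]

y = spmv_csr_segmented(row_ptr, col_idx, values, x)
print(y)
```

Empty rows are the awkward case for parallel segmented sums, which is why the sample matrix includes one.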

arXiv.org e-Print Archive

Copenhagen University Research Information System

Dataflow Programming Paradigms for Computational Chemistry Methods

Author: Jagode Heike
Publication venue: TRACE: Tennessee Research and Creative Exchange
Publication date: 01/05/2017

The transition to multicore and heterogeneous architectures has shaped the High Performance Computing (HPC) landscape over the past decades. With the increase in scale, complexity, and heterogeneity of modern HPC platforms, one of the grim challenges for traditional programming models is to sustain the expected performance at scale. By contrast, dataflow programming models have been growing in popularity as a means to deliver a good balance between performance and portability in the post-petascale era. This work introduces dataflow programming models for computational chemistry methods, and compares different dataflow executions in terms of programmability, resource utilization, and scalability. This effort is driven by computational chemistry applications, considering that they comprise one of the driving forces of HPC. In particular, many-body methods, such as Coupled Cluster methods (CC), which are the gold standard to compute energies in quantum chemistry, are of particular interest for the applied chemistry community. On that account, the latest development for CC methods is used as the primary vehicle for this research, but our effort is not limited to CC and can be applied across other application domains. Two programming paradigms for expressing CC methods in a dataflow form, in order to make them capable of utilizing task scheduling systems, are presented. Explicit dataflow, the programming model in which the dataflow is explicitly specified by the developer, is contrasted with implicit dataflow, where a task scheduling runtime derives the dataflow. An abstract model is derived to explore the limits of the different dataflow programming paradigms.

University of Tennessee, Knoxville: Trace

Parallel Programming Models for Dense Linear Algebra on Heterogeneous Systems

Author: Abalenkovs Maksims
Abdelfattah Ahmad
Dongarra Jack
Gates M.
Haidar A
Kurzak Jakub
Luszczek Piotr
Tomov Stanimire
Yamazaki I.
YarKhan A.
Publication venue: 'FSAEIHE South Ural State University (National Research University)'
Publication date: 01/01/2015

The University of Manchester - Institutional Repository

SignalPU: A programming model for DSP applications on parallel and heterogeneous clusters

Author: Houzet Dominique
Huet Sylvain
Mansouri Farouk
Publication venue: IEEE Computer Society
Publication date: 20/08/2014

Biomedical imaging, digital communications, acoustic signal processing and many other digital signal processing (DSP) applications are increasingly present in the digital world. They process growing volumes of data, represented with increasing accuracy, using complex algorithms that must satisfy time constraints. Consequently, they require high computing power. To meet this need, it is now unavoidable to use parallel and heterogeneous architectures to speed up processing; the best examples are supercomputers such as "Tianhe-2" and "Titan" in the TOP500 ranking. These architectures, with their multi-core nodes supported by many-core accelerators, respond well to this problem, but they remain hard to program for performance because of issues such as synchronization, memory management and hardware specifics. In the present work, we propose a high-level programming model for implementing digital signal processing applications easily and efficiently on heterogeneous clusters.

Hal - Université Grenoble Alpes

Dynamic load balancing of parallel road traffic simulation

Author: Igbe D.
Publication venue
Publication date: 01/01/2010

The objective of this research was to investigate, develop and evaluate dynamic load-balancing strategies for parallel execution of microscopic road traffic simulations. Urban road traffic simulation presents an irregular and dynamically varying distributed computational load for a parallel processor system. The dynamic nature of road traffic simulation systems leads to uneven load distribution during simulation, even for a system that starts off with an even load distribution. Load balancing is a potential way of achieving improved performance by reallocating work from highly loaded processors to lightly loaded processors, leading to a reduction in the overall computational time. In dynamic load balancing, workloads are adjusted continually or periodically throughout the computation. In this thesis, load balancing strategies were evaluated and some load balancing policies developed. A load index and a profitability determination algorithm were developed. These were used to enhance two load balancing algorithms. One of the algorithms exhibits local communications and distributed load evaluation between the neighbour partitions (diffusion algorithm), and the other exhibits both local and global communications while the decision making is centralized (MaS algorithm). The enhanced algorithms were implemented and integrated with a research parallel traffic simulator. The performance of the research parallel traffic simulator, optimized with the two modified dynamic load balancing strategies, was studied.
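The diffusion algorithm mentioned in the abstract above exchanges load only between neighbouring partitions. The sketch below shows a generic first-order diffusion scheme, not the thesis's enhanced algorithm with its load index and profitability check; the function name `diffusion_step`, the chain topology, the starting loads, and the coefficient `alpha` are all assumptions for illustration.

```python
def diffusion_step(loads, neighbours, alpha=0.25):
    """One diffusion step: each partition moves a fraction alpha of the
    load difference to or from each of its neighbours."""
    new_loads = list(loads)
    for i, nbrs in neighbours.items():
        for j in nbrs:
            new_loads[i] += alpha * (loads[j] - loads[i])
    return new_loads

# Hypothetical chain of four partitions: 0 - 1 - 2 - 3.
# The neighbour relation is symmetric, so the total load is conserved.
neighbours = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
loads = [100.0, 20.0, 20.0, 20.0]

history = [loads]
for _ in range(50):
    history.append(diffusion_step(history[-1], neighbours))

final = history[-1]
print(final)
```

For this explicit update to stay stable, `alpha` has to be small relative to the largest number of neighbours any partition has; 0.25 is a conservative choice for a chain, where that number is 2.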

WestminsterResearch