
    On the accuracy and usefulness of analytic energy models for contemporary multicore processors

    This paper presents refinements to the execution-cache-memory (ECM) performance model and a previously published power model for multicore processors. The combination of the two enables very accurate predictions of performance and energy consumption for contemporary multicore processors as a function of relevant parameters such as the number of active cores and the core and Uncore clock frequencies. Model validation is performed on the Sandy Bridge-EP and Broadwell-EP microarchitectures. Production-related variations in chip quality are demonstrated through a statistical analysis of the fit parameters obtained on one hundred Broadwell-EP CPUs of the same model. Insights from the models are used to explain the performance- and energy-related behavior of the processors for scalable as well as saturating (i.e., memory-bound) codes. In the process we demonstrate the models' capability to identify optimal operating points with respect to highest performance, lowest energy-to-solution, and lowest energy-delay product, and we derive a set of best practices for energy-efficient execution.
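    To make the idea of model-based operating points concrete, here is a minimal sketch in the spirit of the paper: a toy analytic power model of the form W(n, f) = W0 + n(w1 f + w2 f^2) combined with a saturating performance model, scanned over core counts and frequencies to locate the lowest energy-delay product. The model form and all coefficients below are illustrative assumptions, not the paper's fitted values.

```python
# Minimal sketch: locating an energy-optimal operating point with a simple
# analytic model. Coefficients are illustrative assumptions only.
import numpy as np

W0 = 30.0            # baseline (Uncore/static) power in watts, assumed
w1, w2 = 1.5, 0.8    # per-core dynamic power coefficients, assumed
p0 = 2.0             # per-core performance per GHz (Gflop/s), assumed
P_sat = 40.0         # memory-bandwidth performance ceiling, assumed

cores = np.arange(1, 19)               # number of active cores
freqs = np.arange(1.2, 3.01, 0.1)      # core clock in GHz

best = None
for f in freqs:
    for n in cores:
        perf = min(n * p0 * f, P_sat)  # saturating (memory-bound) code
        power = W0 + n * (w1 * f + w2 * f**2)
        t = 1.0 / perf                 # time per unit of work
        energy = power * t             # energy to solution
        edp = energy * t               # energy-delay product
        if best is None or edp < best[0]:
            best = (edp, n, f, perf, energy)

edp, n, f, perf, energy = best
print(f"EDP-optimal point: {n} cores @ {f:.1f} GHz "
      f"-> {perf:.1f} Gflop/s, {energy:.2f} J per unit of work")
```

    Scanning the same grid for lowest energy-to-solution or highest performance instead of lowest EDP generally yields different operating points, which is exactly the trade-off the models are used to expose.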

    Propagation and Decay of Injected One-Off Delays on Clusters: A Case Study

    Analytic, first-principles performance modeling of distributed-memory applications is difficult due to a wide spectrum of random disturbances caused by the application and the system. These disturbances (commonly called "noise") destroy the assumptions of regularity that one usually employs when constructing simple analytic models. Despite numerous efforts to quantify, categorize, and reduce such effects, a comprehensive quantitative understanding of their performance impact is not available, especially for long delays that have global consequences for the parallel application. In this work, we investigate traces collected from synthetic benchmarks that mimic real applications on simulated and real message-passing systems in order to pinpoint the mechanisms behind delay propagation. We analyze how the propagation speed of idle waves emanating from injected delays depends on the execution and communication properties of the application, study how such delays decay under increased noise levels, and examine how they interact with each other. We also show how fine-grained noise can make a system immune to the adverse effects of propagating idle waves. Our results contribute to a better understanding of the collective phenomena that manifest themselves in distributed-memory parallel applications.
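    The propagation mechanism can be illustrated with a toy bulk-synchronous simulation: each rank computes, then blocks on a nearest-neighbor exchange, so a one-off delay forces its neighbors to idle, and the resulting idle wave spreads by one rank per iteration. The setup below is a hedged sketch with assumed timings and no background noise, not the paper's benchmarks.

```python
# Toy model of one-off delay propagation in a bulk-synchronous
# nearest-neighbor code. All parameters are illustrative assumptions.
import numpy as np

NRANKS, NITER = 16, 30
T_COMP = 1.0                               # compute time per iteration
DELAY_RANK, DELAY_ITER, DELAY = 8, 5, 4.0  # one-off injected delay

finish = np.zeros((NITER + 1, NRANKS))     # finish[i, r]: end of iteration i
for i in range(1, NITER + 1):
    for r in range(NRANKS):
        # blocking exchange: iteration i cannot start before both
        # neighbors have finished iteration i-1
        deps = [finish[i - 1, r]]
        if r > 0:
            deps.append(finish[i - 1, r - 1])
        if r < NRANKS - 1:
            deps.append(finish[i - 1, r + 1])
        extra = DELAY if (r == DELAY_RANK and i == DELAY_ITER) else 0.0
        finish[i, r] = max(deps) + T_COMP + extra

# lag behind the undisturbed schedule shows the idle wave spreading
for i in (DELAY_ITER, DELAY_ITER + 3, DELAY_ITER + 6):
    lag = finish[i] - i * T_COMP
    print(f"iter {i:2d}: delayed ranks ->",
          np.nonzero(lag > 1e-9)[0].tolist())
```

    In this noiseless toy the idle wave never decays; adding random per-rank disturbances to T_COMP is the natural extension for reproducing the decay and interaction effects the paper studies.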

    SWARM Parallelism: Training Large Models Can Be Surprisingly Communication-Efficient

    Many deep learning applications benefit from using large models with billions of parameters. Training these models is notoriously expensive due to the need for specialized HPC clusters. In this work, we consider alternative setups for training large models: using cheap "preemptible" instances or pooling existing resources from multiple regions. We analyze the performance of existing model-parallel algorithms in these conditions and find configurations where training larger models becomes less communication-intensive. Based on these findings, we propose SWARM parallelism, a model-parallel training algorithm designed for poorly connected, heterogeneous, and unreliable devices. SWARM creates temporary randomized pipelines between nodes that are rebalanced in case of failure. We empirically validate our findings and compare SWARM parallelism with existing large-scale training approaches. Finally, we combine our insights with compression strategies to train a large Transformer language model with 1B shared parameters (approximately 13B before sharing) on preemptible T4 GPUs with less than 200 Mb/s of network bandwidth.
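    The scheduling idea stated in the abstract can be sketched as follows: route each microbatch through a temporary pipeline of randomly chosen live workers, and rebalance stage membership when a worker fails. This is an illustrative toy with invented worker names and a simplistic rebalancing policy, not the actual SWARM implementation.

```python
# Toy sketch of temporary randomized pipelines with failure rebalancing.
# Worker names, stage layout, and the rebalancing rule are assumptions.
import random

stages = {0: ["w0", "w1", "w2"], 1: ["w3", "w4"], 2: ["w5", "w6", "w7"]}

def route_microbatch():
    """Pick a temporary random pipeline: one live worker per stage."""
    return [random.choice(stages[s]) for s in sorted(stages)]

def handle_failure(worker):
    """Drop a failed worker; refill a starved stage from the largest one."""
    for pool in stages.values():
        if worker in pool:
            pool.remove(worker)
            if not pool:  # stage starved: pull a peer from the largest stage
                donor = max(stages, key=lambda s: len(stages[s]))
                pool.append(stages[donor].pop())
            return

print("pipeline:", route_microbatch())
handle_failure("w3"); handle_failure("w4")   # stage 1 loses both workers
print("after rebalancing:", stages)
print("pipeline:", route_microbatch())
```

    Because every microbatch draws a fresh pipeline, no single node is on the critical path, which is what makes the scheme tolerant of preemption and stragglers.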

    Long-Term Durability of Rooftop Grid-Connected Solar Photovoltaic Systems

    Compared to their initial performance, solar photovoltaic (PV) arrays show long-term performance degradation, resulting in lower like-for-like efficiencies and performance ratios. The long-term durability of polycrystalline silicon (p-Si) solar PV modules in three roof-top grid-connected arrays has been examined. Electrical output, ambient temperature, cell temperature, solar irradiance, solar irradiation, and wind speed data were collected at hourly intervals from 2017 to 2021 from three 50 kWp PV installations in Northern Ireland. The results show the extent to which the higher PV temperatures associated with more intense solar radiation decrease the efficiency, fill factor, and maximum power output of PV arrays in a temperate climate. Long-term durability trends for grid-connected roof-top solar photovoltaic systems can be obscured by diurnal and seasonal changes in environmental conditions. To reduce the influence of variable conditions, performance ratios (PRcorr) were "corrected" using the measured annual average cell temperature (Tcell_avg). Introducing this temperature correction reduced the seasonal variation of the performance ratio. Using temperature-corrected performance ratios, long-term performance degradation trends (here, those seen after five years of operation) became evident with high confidence after six months for one PV array and within three years for the two other arrays. If lower statistical confidence in trends is acceptable, long-term degradation rates can be identified within one year of operation for all PV arrays examined. These results have the important implication that relatively short-duration outdoor PV performance monitoring may be used reliably to estimate long-term degradation and/or to calibrate routinely conducted accelerated testing.
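    A hedged sketch of the temperature correction described above, loosely following common weather-corrected performance-ratio formulations: the expected output at each hour is scaled by the cell temperature's deviation from the annual average before the ratio is formed. All numbers below (temperature coefficient, average cell temperature, synthetic hourly data) are illustrative assumptions, not the study's measurements.

```python
# Sketch of a temperature-corrected performance ratio (PRcorr).
# Data and coefficients are synthetic stand-ins, not the study's values.
import numpy as np

P_STC = 50_000.0    # array rating in W (50 kWp, as in the study)
G_STC = 1000.0      # irradiance at standard test conditions, W/m^2
GAMMA = -0.0040     # power temperature coefficient per degC, assumed for p-Si

# synthetic hourly measurements
g_poa  = np.array([200.0, 600.0, 950.0, 400.0])      # plane-of-array W/m^2
t_cell = np.array([12.0, 28.0, 45.0, 20.0])          # cell temperature, degC
e_ac   = np.array([9.5e3, 27.0e3, 40.0e3, 18.5e3])   # AC output per hour, Wh

t_cell_avg = 21.0   # annual average cell temperature, assumed

pr = e_ac.sum() / (P_STC * (g_poa / G_STC)).sum()
expected = P_STC * (g_poa / G_STC) * (1 + GAMMA * (t_cell - t_cell_avg))
pr_corr = e_ac.sum() / expected.sum()
print(f"PR = {pr:.3f}, temperature-corrected PR = {pr_corr:.3f}")
```

    The correction lowers the expected output in hot hours and raises it in cold ones, so seasonal temperature swings largely cancel out of PRcorr and the underlying degradation trend emerges sooner.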