On the accuracy and usefulness of analytic energy models for contemporary multicore processors
This paper presents refinements to the execution-cache-memory performance
model and a previously published power model for multicore processors. The
combination of both enables a very accurate prediction of performance and
energy consumption of contemporary multicore processors as a function of
relevant parameters such as number of active cores as well as core and Uncore
frequencies. Model validation is performed on the Sandy Bridge-EP and
Broadwell-EP microarchitectures. Production-related variations in chip quality
are demonstrated through a statistical analysis of the fit parameters obtained
on one hundred Broadwell-EP CPUs of the same model. Insights from the models
are used to explain the performance- and energy-related behavior of the
processors for scalable as well as saturating (i.e., memory-bound) codes. In
the process, we demonstrate the models' capability to identify optimal
operating points with respect to highest performance, lowest
energy-to-solution, and lowest energy-delay product, and we identify a set of
best practices for energy-efficient execution.
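The three optimization targets named in this abstract can be illustrated with a minimal sketch. It assumes the usual definitions, energy-to-solution E = P * T and energy-delay product EDP = E * T, and it does not reproduce the paper's analytic models: the `OperatingPoint` class and `best_operating_points` function are hypothetical names, and the runtime and power of each operating point are assumed to be supplied by the caller (from measurements or from a model).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OperatingPoint:
    """One candidate configuration (hypothetical container, not from the paper)."""
    cores: int          # number of active cores
    freq_ghz: float     # core clock frequency in GHz
    runtime_s: float    # time to solution in seconds
    power_w: float      # average power draw in watts

    @property
    def energy_j(self) -> float:
        # Energy-to-solution: average power times runtime.
        return self.power_w * self.runtime_s

    @property
    def edp(self) -> float:
        # Energy-delay product: energy-to-solution times runtime.
        return self.energy_j * self.runtime_s


def best_operating_points(points):
    """Return the point that is optimal for each of the three targets."""
    if not points:
        raise ValueError("need at least one operating point")
    return {
        "performance": min(points, key=lambda p: p.runtime_s),
        "energy": min(points, key=lambda p: p.energy_j),
        "edp": min(points, key=lambda p: p.edp),
    }
```

The three optima generally do not coincide: a slower clock may minimize energy while a faster one minimizes runtime, with the EDP optimum in between, which is why the abstract lists them as separate targets.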
Propagation and Decay of Injected One-Off Delays on Clusters: A Case Study
Analytic, first-principles performance modeling of distributed-memory
applications is difficult due to a wide spectrum of random disturbances caused
by the application and the system. These disturbances (commonly called "noise")
destroy the assumptions of regularity that one usually employs when
constructing simple analytic models. Despite numerous efforts to quantify,
categorize, and reduce such effects, a comprehensive quantitative understanding
of their performance impact is not available, especially for long delays that
have global consequences for the parallel application. In this work, we
investigate various traces collected from synthetic benchmarks that mimic real
applications on simulated and real message-passing systems in order to pinpoint
the mechanisms behind delay propagation. We analyze how the propagation speed
of idle waves emanating from injected delays depends on the execution and
communication properties of the application, and study how such delays decay
under increased noise levels and how they interact with each other. We also
show how fine-grained noise can make a system immune to the adverse effects of
propagating idle waves. Our results contribute to a better
understanding of the collective phenomena that manifest themselves in
distributed-memory parallel applications.
Comment: 10 pages, 9 figures; title change
SWARM Parallelism: Training Large Models Can Be Surprisingly Communication-Efficient
Many deep learning applications benefit from using large models with billions
of parameters. Training these models is notoriously expensive due to the need
for specialized HPC clusters. In this work, we consider alternative setups for
training large models: using cheap "preemptible" instances or pooling existing
resources from multiple regions. We analyze the performance of existing
model-parallel algorithms in these conditions and find configurations where
training larger models becomes less communication-intensive. Based on these
findings, we propose SWARM parallelism, a model-parallel training algorithm
designed for poorly connected, heterogeneous and unreliable devices. SWARM
creates temporary randomized pipelines between nodes that are rebalanced in
case of failure. We empirically validate our findings and compare SWARM
parallelism with existing large-scale training approaches. Finally, we combine
our insights with compression strategies to train a large Transformer language
model with 1B shared parameters (approximately 13B before sharing) on
preemptible T4 GPUs with less than 200 Mb/s of network bandwidth.
Comment: Accepted to International Conference on Machine Learning (ICML) 2023. 25 pages, 8 figures
Long-Term Durability of Rooftop Grid-Connected Solar Photovoltaic Systems
Compared to their initial performance, solar photovoltaic (PV) arrays show long-term performance degradation, resulting in lower like-for-like efficiencies and performance ratios. The long-term durability of polycrystalline silicon (p-Si) solar PV modules in three roof-top grid-connected arrays has been examined. Electrical output, ambient temperature, cell temperature, solar irradiance, solar irradiation, and wind speed data were collected at hourly intervals from 2017 to 2021 from three 50 kWp PV installations in Northern Ireland. The results show the extent to which higher PV temperatures associated with more intense solar radiation decrease efficiency, fill factor and maximum power output for PV arrays in a temperate climate.
Long-term durability trends for grid-connected roof-top solar photovoltaic systems can be obscured by diurnal and seasonal changes in environmental conditions. To reduce the influence of variable conditions, performance ratios (PRcorr) were “corrected” using the measured annual average cell temperature (Tcell_avg). Introduction of this temperature-correction reduced the seasonal variation of the performance ratio.
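The abstract does not state the correction formula, so the following is only a sketch of one commonly used form of temperature-corrected performance ratio: measured energy divided by the energy expected from irradiance, with the expected power scaled by a linear temperature term referenced to the average cell temperature. The function name, its parameters, the default power temperature coefficient of -0.004 per °C (a typical order of magnitude for crystalline silicon), and the use of a plain arithmetic mean for Tcell_avg are all assumptions, not details taken from the paper.

```python
def corrected_performance_ratio(
    energy_kwh,            # measured energy per hourly interval, in kWh
    irradiance_w_m2,       # mean plane-of-array irradiance per interval, in W/m^2
    cell_temp_c,           # mean cell temperature per interval, in deg C
    p_stc_kw,              # rated array power at standard test conditions, in kWp
    gamma_per_c=-0.004,    # ASSUMED power temperature coefficient, per deg C
    g_stc_w_m2=1000.0,     # irradiance at standard test conditions
    t_ref_c=None,          # reference temperature; defaults to mean of cell_temp_c
):
    """Temperature-corrected performance ratio for hourly samples (sketch)."""
    n = len(energy_kwh)
    if n == 0 or len(irradiance_w_m2) != n or len(cell_temp_c) != n:
        raise ValueError("inputs must be non-empty and of equal length")
    if t_ref_c is None:
        # Assumption: Tcell_avg is the arithmetic mean over the period.
        t_ref_c = sum(cell_temp_c) / n
    # Expected energy per 1-hour interval: rated power scaled by irradiance
    # and by the linear temperature term relative to the reference temperature.
    expected_kwh = sum(
        p_stc_kw * (g / g_stc_w_m2) * (1.0 + gamma_per_c * (t - t_ref_c))
        for g, t in zip(irradiance_w_m2, cell_temp_c)
    )
    if expected_kwh <= 0:
        raise ValueError("expected energy must be positive")
    return sum(energy_kwh) / expected_kwh
```

Setting `gamma_per_c=0.0` recovers the uncorrected performance ratio, which makes it straightforward to compare the seasonal variation of the two quantities on the same data.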
Using temperature-corrected performance ratios, long-term performance degradation trends (in this case, those seen after five years of operation) become evident with high confidence after six months for one PV array and within three years for the two other arrays. If lower statistical confidence in trends is acceptable, long-term degradation rates can be identified within one year of operation for all PV arrays examined.
These results have the important implication that relatively short-duration outdoor PV performance monitoring may be reliably used to estimate long-term degradation and/or to calibrate normally conducted accelerated testing.