
    On the accuracy and usefulness of analytic energy models for contemporary multicore processors

    This paper presents refinements to the execution-cache-memory (ECM) performance model and a previously published power model for multicore processors. The combination of the two enables very accurate predictions of performance and energy consumption for contemporary multicore processors as a function of relevant parameters such as the number of active cores and the core and Uncore clock frequencies. Model validation is performed on the Sandy Bridge-EP and Broadwell-EP microarchitectures. Production-related variations in chip quality are demonstrated through a statistical analysis of the fit parameters obtained on one hundred Broadwell-EP CPUs of the same model. Insights from the models are used to explain the performance- and energy-related behavior of the processors for scalable as well as saturating (i.e., memory-bound) codes. In the process we demonstrate the models' capability to identify optimal operating points with respect to highest performance, lowest energy-to-solution, and lowest energy-delay product, and we derive a set of best practices for energy-efficient execution.
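    To make the idea of model-based operating points concrete, here is a minimal sketch in the spirit of the paper: a toy analytic power model of the form W(n, f) = W0 + n(w1 f + w2 f^2) combined with a saturating performance model, scanned over core counts and frequencies to locate the lowest energy-delay product. The model form and all coefficients below are illustrative assumptions, not the paper's fitted values.

```python
# Minimal sketch: locating an energy-optimal operating point with a simple
# analytic model. Coefficients are illustrative assumptions only.
import numpy as np

W0 = 30.0            # baseline (Uncore/static) power in watts, assumed
w1, w2 = 1.5, 0.8    # per-core dynamic power coefficients, assumed
p0 = 2.0             # per-core performance per GHz (Gflop/s), assumed
P_sat = 40.0         # memory-bandwidth performance ceiling, assumed

cores = np.arange(1, 19)               # number of active cores
freqs = np.arange(1.2, 3.01, 0.1)      # core clock in GHz

best = None
for f in freqs:
    for n in cores:
        perf = min(n * p0 * f, P_sat)  # saturating (memory-bound) code
        power = W0 + n * (w1 * f + w2 * f**2)
        t = 1.0 / perf                 # time per unit of work
        energy = power * t             # energy to solution
        edp = energy * t               # energy-delay product
        if best is None or edp < best[0]:
            best = (edp, n, f, perf, energy)

edp, n, f, perf, energy = best
print(f"EDP-optimal point: {n} cores @ {f:.1f} GHz "
      f"-> {perf:.1f} Gflop/s, {energy:.2f} J per unit of work")
```

    Scanning the same grid for lowest energy-to-solution or highest performance instead of lowest EDP generally yields different operating points, which is exactly the trade-off the models are used to expose.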

    Propagation and Decay of Injected One-Off Delays on Clusters: A Case Study

    Analytic, first-principles performance modeling of distributed-memory applications is difficult due to a wide spectrum of random disturbances caused by the application and the system. These disturbances (commonly called "noise") destroy the assumptions of regularity that one usually employs when constructing simple analytic models. Despite numerous efforts to quantify, categorize, and reduce such effects, a comprehensive quantitative understanding of their performance impact is not available, especially for long delays that have global consequences for the parallel application. In this work, we investigate traces collected from synthetic benchmarks that mimic real applications on simulated and real message-passing systems in order to pinpoint the mechanisms behind delay propagation. We analyze how the propagation speed of idle waves emanating from injected delays depends on the execution and communication properties of the application, study how such delays decay under increased noise levels, and examine how they interact with each other. We also show how fine-grained noise can make a system immune to the adverse effects of propagating idle waves. Our results contribute to a better understanding of the collective phenomena that manifest themselves in distributed-memory parallel applications.
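    The propagation mechanism can be illustrated with a toy bulk-synchronous simulation: each rank computes, then blocks on a nearest-neighbor exchange, so a one-off delay forces its neighbors to idle, and the resulting idle wave spreads by one rank per iteration. The setup below is a hedged sketch with assumed timings and no background noise, not the paper's benchmarks.

```python
# Toy model of one-off delay propagation in a bulk-synchronous
# nearest-neighbor code. All parameters are illustrative assumptions.
import numpy as np

NRANKS, NITER = 16, 30
T_COMP = 1.0                               # compute time per iteration
DELAY_RANK, DELAY_ITER, DELAY = 8, 5, 4.0  # one-off injected delay

finish = np.zeros((NITER + 1, NRANKS))     # finish[i, r]: end of iteration i
for i in range(1, NITER + 1):
    for r in range(NRANKS):
        # blocking exchange: iteration i cannot start before both
        # neighbors have finished iteration i-1
        deps = [finish[i - 1, r]]
        if r > 0:
            deps.append(finish[i - 1, r - 1])
        if r < NRANKS - 1:
            deps.append(finish[i - 1, r + 1])
        extra = DELAY if (r == DELAY_RANK and i == DELAY_ITER) else 0.0
        finish[i, r] = max(deps) + T_COMP + extra

# lag behind the undisturbed schedule shows the idle wave spreading
for i in (DELAY_ITER, DELAY_ITER + 3, DELAY_ITER + 6):
    lag = finish[i] - i * T_COMP
    print(f"iter {i:2d}: delayed ranks ->",
          np.nonzero(lag > 1e-9)[0].tolist())
```

    In this noiseless toy the idle wave never decays; adding random per-rank disturbances to T_COMP is the natural extension for reproducing the decay and interaction effects the paper studies.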

    SWARM Parallelism: Training Large Models Can Be Surprisingly Communication-Efficient

    Many deep learning applications benefit from using large models with billions of parameters. Training these models is notoriously expensive due to the need for specialized HPC clusters. In this work, we consider alternative setups for training large models: using cheap "preemptible" instances or pooling existing resources from multiple regions. We analyze the performance of existing model-parallel algorithms in these conditions and find configurations where training larger models becomes less communication-intensive. Based on these findings, we propose SWARM parallelism, a model-parallel training algorithm designed for poorly connected, heterogeneous, and unreliable devices. SWARM creates temporary randomized pipelines between nodes that are rebalanced in case of failure. We empirically validate our findings and compare SWARM parallelism with existing large-scale training approaches. Finally, we combine our insights with compression strategies to train a large Transformer language model with 1B shared parameters (approximately 13B before sharing) on preemptible T4 GPUs with less than 200 Mb/s of network bandwidth.
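    The scheduling idea stated in the abstract can be sketched as follows: route each microbatch through a temporary pipeline of randomly chosen live workers, and rebalance stage membership when a worker fails. This is an illustrative toy with invented worker names and a simplistic rebalancing policy, not the actual SWARM implementation.

```python
# Toy sketch of temporary randomized pipelines with failure rebalancing.
# Worker names, stage layout, and the rebalancing rule are assumptions.
import random

stages = {0: ["w0", "w1", "w2"], 1: ["w3", "w4"], 2: ["w5", "w6", "w7"]}

def route_microbatch():
    """Pick a temporary random pipeline: one live worker per stage."""
    return [random.choice(stages[s]) for s in sorted(stages)]

def handle_failure(worker):
    """Drop a failed worker; refill a starved stage from the largest one."""
    for pool in stages.values():
        if worker in pool:
            pool.remove(worker)
            if not pool:  # stage starved: pull a peer from the largest stage
                donor = max(stages, key=lambda s: len(stages[s]))
                pool.append(stages[donor].pop())
            return

print("pipeline:", route_microbatch())
handle_failure("w3"); handle_failure("w4")   # stage 1 loses both workers
print("after rebalancing:", stages)
print("pipeline:", route_microbatch())
```

    Because every microbatch draws a fresh pipeline, no single node is on the critical path, which is what makes the scheme tolerant of preemption and stragglers.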

    Long-Term Durability of Rooftop Grid-Connected Solar Photovoltaic Systems

    Compared to their initial performance, solar photovoltaic (PV) arrays show long-term performance degradation, resulting in lower like-for-like efficiencies and performance ratios. The long-term durability of polycrystalline silicon (p-Si) solar PV modules in three roof-top grid-connected arrays has been examined. Electrical output, ambient temperature, cell temperature, solar irradiance, solar irradiation, and wind speed data were collected at hourly intervals from 2017 to 2021 from three 50 kWp PV installations in Northern Ireland. The results show the extent to which the higher PV temperatures associated with more intense solar radiation decrease the efficiency, fill factor, and maximum power output of PV arrays in a temperate climate. Long-term durability trends for grid-connected roof-top solar photovoltaic systems can be obscured by diurnal and seasonal changes in environmental conditions. To reduce the influence of variable conditions, performance ratios (PRcorr) were "corrected" using the measured annual average cell temperature (Tcell_avg). Introducing this temperature correction reduced the seasonal variation of the performance ratio. Using temperature-corrected performance ratios, long-term performance degradation trends (here, those seen after five years of operation) became evident with high confidence after six months for one PV array and within three years for the two other arrays. If lower statistical confidence in trends is acceptable, long-term degradation rates can be identified within one year of operation for all PV arrays examined. These results have the important implication that relatively short-duration outdoor PV performance monitoring may be used reliably to estimate long-term degradation and/or to calibrate routinely conducted accelerated testing.
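    A hedged sketch of the temperature correction described above, loosely following common weather-corrected performance-ratio formulations: the expected output at each hour is scaled by the cell temperature's deviation from the annual average before the ratio is formed. All numbers below (temperature coefficient, average cell temperature, synthetic hourly data) are illustrative assumptions, not the study's measurements.

```python
# Sketch of a temperature-corrected performance ratio (PRcorr).
# Data and coefficients are synthetic stand-ins, not the study's values.
import numpy as np

P_STC = 50_000.0    # array rating in W (50 kWp, as in the study)
G_STC = 1000.0      # irradiance at standard test conditions, W/m^2
GAMMA = -0.0040     # power temperature coefficient per degC, assumed for p-Si

# synthetic hourly measurements
g_poa  = np.array([200.0, 600.0, 950.0, 400.0])      # plane-of-array W/m^2
t_cell = np.array([12.0, 28.0, 45.0, 20.0])          # cell temperature, degC
e_ac   = np.array([9.5e3, 27.0e3, 40.0e3, 18.5e3])   # AC output per hour, Wh

t_cell_avg = 21.0   # annual average cell temperature, assumed

pr = e_ac.sum() / (P_STC * (g_poa / G_STC)).sum()
expected = P_STC * (g_poa / G_STC) * (1 + GAMMA * (t_cell - t_cell_avg))
pr_corr = e_ac.sum() / expected.sum()
print(f"PR = {pr:.3f}, temperature-corrected PR = {pr_corr:.3f}")
```

    The correction lowers the expected output in hot hours and raises it in cold ones, so seasonal temperature swings largely cancel out of PRcorr and the underlying degradation trend emerges sooner.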