13 research outputs found

    Investigation of LSTM Based Prediction for Dynamic Energy Management in Chip Multiprocessors

    In this paper, we investigate the effectiveness of using long short-term memory (LSTM) instead of Kalman filtering to do prediction for the purpose of constructing dynamic energy management (DEM) algorithms in chip multi-processors (CMPs). Either of the two prediction methods is employed to estimate the workload in the next control period for each of the processor cores. These estimates are then used to select voltage-frequency (VF) pairs for each core of the CMP during the next control period as part of a dynamic voltage and frequency scaling (DVFS) technique. The objective of the DVFS technique is to reduce energy consumption under performance constraints that are set by the user. We conduct our investigation using a custom Sniper system simulation framework. Simulation results for 16 and 64 core network-on-chip based CMP architectures and using several benchmarks demonstrate that the LSTM is slightly better than Kalman filtering

    Runtime estimation of performance–power in CMPs under QoS constraints

    One of the main challenges in data center systems is operating under certain Quality of Service (QoS) while minimizing power consumption. Increasingly, data centers are exploring and adopting heterogeneous server architectures with different power and performance trade-offs. This not only requires careful understanding of the application behavior across multiple architectures at runtime so as to enable meeting power and performance requirements but also an understanding of individual and aggregated behaviour of application and server level performance and power metrics

    REPP-H: runtime estimation of power and performance on heterogeneous data centers

    Modern data centers increasingly demand improved performance with minimal power consumption. Managing the power and performance requirements of the applications is challenging because these data centers, incidentally or intentionally, have to deal with server architecture heterogeneity [19], [22]. One critical challenge that data centers have to face is how to manage system power and performance given the different application behavior across multiple different architectures. This work has been supported by the EU FP7 program (Mont-Blanc 2, ICT-610402), by the Ministerio de Economia (CAP-VII, TIN2015-65316-P), and the Generalitat de Catalunya (MPEXPAR, 2014-SGR-1051). The material herein is based in part upon work supported by the US NSF, grant numbers ACI-1535232 and CNS-1305220. Peer reviewed. Postprint (author's final draft).

    Dynamic Energy Management for Chip Multi-processors under Performance Constraints

    We introduce a novel algorithm for dynamic energy management (DEM) under performance constraints in chip multi-processors (CMPs). Using the novel concept of delayed instructions count, performance loss estimations are calculated at the end of each control period for each core. In addition, a Kalman filtering based approach is employed to predict workload in the next control period for which voltage-frequency pairs must be selected. This selection is done with a novel dynamic voltage and frequency scaling (DVFS) algorithm whose objective is to reduce energy consumption but without degrading performance beyond the user set threshold. Using our customized Sniper based CMP system simulation framework, we demonstrate the effectiveness of the proposed algorithm for a variety of benchmarks for 16 core and 64 core network-on-chip based CMP architectures. Simulation results show consistent energy savings across the board. We present our work as an investigation of the tradeoff between the achievable energy reduction via DVFS when predictions are done using the effective Kalman filter for different performance penalty thresholds
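
    The predict-then-select loop described in this abstract can be illustrated with a minimal sketch. It is not the authors' algorithm: the random-walk state model, the noise variances, the `VF_PAIRS` table, and the rule "pick the slowest pair that still finishes the predicted cycles within the period plus an allowed loss" are all assumptions made for illustration, and the paper's delayed instructions count is not modeled.

```python
from dataclasses import dataclass


@dataclass
class ScalarKalman:
    """1-D Kalman filter with a random-walk state model (an assumption)."""
    x: float = 0.0   # current workload estimate (cycles per control period)
    p: float = 1.0   # variance of the estimate
    q: float = 0.01  # process noise variance (hypothetical value)
    r: float = 0.1   # measurement noise variance (hypothetical value)

    def update(self, z: float) -> float:
        """Fold in the measured workload z; return the next-period prediction."""
        p_pred = self.p + self.q          # predict step: variance grows
        k = p_pred / (p_pred + self.r)    # Kalman gain
        self.x += k * (z - self.x)        # correct the estimate
        self.p = (1.0 - k) * p_pred
        return self.x                     # random walk: prediction equals estimate


# Hypothetical (voltage in V, frequency in Hz) pairs, slowest first.
VF_PAIRS = [(0.8, 1.0e9), (0.9, 1.5e9), (1.0, 2.0e9)]


def select_vf(pred_cycles: float, period_s: float, max_loss: float):
    """Return the slowest VF pair whose run time fits the period plus allowed loss."""
    budget_s = period_s * (1.0 + max_loss)
    for volt, freq in VF_PAIRS:
        if pred_cycles / freq <= budget_s:
            return (volt, freq)
    return VF_PAIRS[-1]  # nothing fits: fall back to the fastest pair


if __name__ == "__main__":
    kf = ScalarKalman()
    for measured in [1.1e6, 1.3e6, 1.2e6, 1.2e6]:
        predicted = kf.update(measured)
    print(select_vf(predicted, period_s=1e-3, max_loss=0.05))
```

    One filter instance per core would be kept in such a scheme, with `update` called once at the end of every control period.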

    Utilizing Criticality Stacks for Dynamic Voltage and Frequency Scaling

    Thread imbalance is inevitable for multithreaded applications due to the necessity of synchronization primitives to coordinate access to memory and system resources. This imbalance leads to a bounding of application performance, but, more importantly for mobile devices, this imbalance also leads to energy inefficiencies. Recent works have begun to quantify this imbalance and look to leverage it not only for performance improvements, but for energy savings as well. All these works, though, test the theory through the use of simulators and power estimation tools. These results may show that the theory is sound, but the complexities of how a real machine handles synchronization may lead to diminished results by either having too large a performance impact, or too little energy savings. In this work, we implement one such algorithm, PCSLB, and improve upon it in order to see if the results shown for this technique are feasible for use in real machines. With the improved algorithm, PCSLB-Max, and the CritScale Linux kernel module, we show that, in fact, there are energy savings available to us while mitigating the performance

    GDP : using dataflow properties to accurately estimate interference-free performance at runtime

    Multi-core memory systems commonly share resources between processors. Resource sharing improves utilization at the cost of increased inter-application interference which may lead to priority inversion, missed deadlines and unpredictable interactive performance. A key component to effectively manage multi-core resources is performance accounting which aims to accurately estimate interference-free application performance. Previously proposed accounting systems are either invasive or transparent. Invasive accounting systems can be accurate, but slow down latency-sensitive processes. Transparent accounting systems do not affect performance, but tend to provide less accurate performance estimates. We propose a novel class of performance accounting systems that achieve both performance-transparency and superior accuracy. We call the approach dataflow accounting, and the key idea is to track dynamic dataflow properties and use these to estimate interference-free performance. Our main contribution is Graph-based Dynamic Performance (GDP) accounting. GDP dynamically builds a dataflow graph of load requests and periods where the processor commits instructions. This graph concisely represents the relationship between memory loads and forward progress in program execution. More specifically, GDP estimates interference-free stall cycles by multiplying the critical path length of the dataflow graph with the estimated interference-free memory latency. GDP is very accurate with mean IPC estimation errors of 3.4% and 9.8% for our 4- and 8-core processors, respectively. When GDP is used in a cache partitioning policy, we observe average system throughput improvements of 11.9% and 20.8% compared to partitioning using the state-of-the-art Application Slowdown Model
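
    The stall-cycle estimate stated in this abstract (critical path length of the dataflow graph multiplied by the estimated interference-free memory latency) can be sketched as follows. This is a simplified illustration, not the GDP hardware mechanism: the graph here contains only load requests, it is represented as a hypothetical `deps` mapping from each load to the loads it depends on, and path length is counted in loads.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def critical_path_length(deps):
    """Longest dependency chain, counted in loads.

    deps maps each load to the set of loads it depends on (hypothetical format).
    """
    depth = {}
    # static_order() yields every node after all of its predecessors.
    for node in TopologicalSorter(deps).static_order():
        preds = deps.get(node, ())
        depth[node] = 1 + max((depth[p] for p in preds), default=0)
    return max(depth.values(), default=0)


def estimate_stall_cycles(deps, interference_free_latency):
    """Stall estimate = critical path length x interference-free memory latency."""
    return critical_path_length(deps) * interference_free_latency


if __name__ == "__main__":
    # Load D waits on B and C, which both wait on A; E is independent.
    loads = {"B": {"A"}, "C": {"A"}, "D": {"B", "C"}, "E": set()}
    print(estimate_stall_cycles(loads, interference_free_latency=100))
```

    Loads that can overlap (B and C above) add nothing to the estimate, which is why only the longest chain is multiplied by the latency.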

    Adaptive Resource and Job Management for Limited Power Consumption

    The last decades have been characterized by an ever-growing requirement in terms of computing and storage resources. This tendency has recently put the pressure on the ability to efficiently manage the power required to operate the huge amount of electrical components associated with state-of-the-art high performance computing systems. The power consumption of a supercomputer needs to be adjusted based on varying power budget or electricity availabilities. As a consequence, Resource and Job Management Systems have to be adequately adapted in order to efficiently schedule jobs with optimized performance while limiting power usage whenever needed. We introduce in this paper a new scheduling strategy that can adapt the executed workload to a limited power budget. The originality of this approach relies upon a combination of speed scaling and node shutdown techniques for power reductions. It is implemented into the widely used resource and job management system SLURM. Finally, it is validated through large scale emulations using real production workload traces of the supercomputer Curie.

    Exploiting variability for energy optimization of parallel programs

    In this paper we present optimizations that use DVFS mechanisms to reduce the total energy usage in scientific applications. Our main insight is that noise is intrinsic to large scale parallel executions and it appears whenever shared resources are contended. The presence of noise allows us to identify and manipulate any program regions amenable to DVFS. When compared to previous energy optimizations that make per core decisions using predictions of the running time, our scheme uses a qualitative approach to recognize the signature of executions amenable to DVFS. By recognizing the "shape of variability" we can optimize codes with highly dynamic behavior, which pose challenges to all existing DVFS techniques. We validate our approach using offline and online analyses for one-sided and two-sided communication paradigms. We have applied our methods to NWChem, and we show best case improvements in energy use of 12% at no loss in performance when using online optimizations running on 720 Haswell cores with one-sided communication. With NWChem on MPI two-sided and offline analysis, capturing the initialization, we find energy savings of up to 20%, with less than 1% performance cost