3,996 research outputs found

    EECluster: An Energy-Efficient Tool for managing HPC Clusters

    Get PDF
    High Performance Computing clusters have become a very important element in research, academic and industrial communities because they are an excellent platform for solving a wide range of problems through parallel and distributed applications. Nevertheless, this high performance comes at the price of consuming large amounts of energy, which combined with notably increasing electricity prices are having an important economical impact, driving up power and cooling costs and forcing IT companies to reduce operation costs. To reduce the high energy consumptions of HPC clusters we propose a tool, named EECluster, for managing the energy-efficient allocation of the cluster resources, that works with both OGE/SGE and PBS/TORQUE Resource Management Systems (RMS) and whose decision-making mechanism is tuned automatically in a machine learning approach. Experimental studies have been made using actual workloads from the Scientific Modelling Cluster at Oviedo University and the academic-cluster used by the Oviedo University for teaching high performance computing subjects to evaluate the results obtained with the adoption of this too

    Hierarchical Agent-based Adaptation for Self-Aware Embedded Computing Systems

    Get PDF
    Siirretty Doriast

    EECluster: An Energy-Efficient Tool for managing HPC Clusters

    Get PDF
    High Performance Computing clusters have become a very important element in research, academic and industrial communities because they are an excellent platform for solving a wide range of problems through parallel and distributed applications. Nevertheless, this high performance comes at the price of consuming large amounts of energy, which combined with notably increasing electricity prices are having an important economical impact, driving up power and cooling costs and forcing IT companies to reduce operation costs. To reduce the high energy consumptions of HPC clusters we propose a tool, named EECluster, for managing the energy-efficient allocation of the cluster resources, that works with both OGE/SGE and PBS/TORQUE Resource Management Systems (RMS) and whose decision-making mechanism is tuned automatically in a machine learning approach. Experimental studies have been made using actual workloads from the Scientific Modelling Cluster at Oviedo University and the academic-cluster used by the Oviedo University for teaching high performance computing subjects to evaluate the results obtained with the adoption of this tool

    Energy Concerns with HPC Systems and Applications

    Full text link
    For various reasons including those related to climate changes, {\em energy} has become a critical concern in all relevant activities and technical designs. For the specific case of computer activities, the problem is exacerbated with the emergence and pervasiveness of the so called {\em intelligent devices}. From the application side, we point out the special topic of {\em Artificial Intelligence}, who clearly needs an efficient computing support in order to succeed in its purpose of being a {\em ubiquitous assistant}. There are mainly two contexts where {\em energy} is one of the top priority concerns: {\em embedded computing} and {\em supercomputing}. For the former, power consumption is critical because the amount of energy that is available for the devices is limited. For the latter, the heat dissipated is a serious source of failure and the financial cost related to energy is likely to be a significant part of the maintenance budget. On a single computer, the problem is commonly considered through the electrical power consumption. This paper, written in the form of a survey, we depict the landscape of energy concerns in computer activities, both from the hardware and the software standpoints.Comment: 20 page

    Chapter One – An Overview of Architecture-Level Power- and Energy-Efficient Design Techniques

    Get PDF
    Power dissipation and energy consumption became the primary design constraint for almost all computer systems in the last 15 years. Both computer architects and circuit designers intent to reduce power and energy (without a performance degradation) at all design levels, as it is currently the main obstacle to continue with further scaling according to Moore's law. The aim of this survey is to provide a comprehensive overview of power- and energy-efficient “state-of-the-art” techniques. We classify techniques by component where they apply to, which is the most natural way from a designer point of view. We further divide the techniques by the component of power/energy they optimize (static or dynamic), covering in that way complete low-power design flow at the architectural level. At the end, we conclude that only a holistic approach that assumes optimizations at all design levels can lead to significant savings.Peer ReviewedPostprint (published version

    Energy-Efficient and Reliable Computing in Dark Silicon Era

    Get PDF
    Dark silicon denotes the phenomenon that, due to thermal and power constraints, the fraction of transistors that can operate at full frequency is decreasing in each technology generation. Moore’s law and Dennard scaling had been backed and coupled appropriately for five decades to bring commensurate exponential performance via single core and later muti-core design. However, recalculating Dennard scaling for recent small technology sizes shows that current ongoing multi-core growth is demanding exponential thermal design power to achieve linear performance increase. This process hits a power wall where raises the amount of dark or dim silicon on future multi/many-core chips more and more. Furthermore, from another perspective, by increasing the number of transistors on the area of a single chip and susceptibility to internal defects alongside aging phenomena, which also is exacerbated by high chip thermal density, monitoring and managing the chip reliability before and after its activation is becoming a necessity. The proposed approaches and experimental investigations in this thesis focus on two main tracks: 1) power awareness and 2) reliability awareness in dark silicon era, where later these two tracks will combine together. In the first track, the main goal is to increase the level of returns in terms of main important features in chip design, such as performance and throughput, while maximum power limit is honored. In fact, we show that by managing the power while having dark silicon, all the traditional benefits that could be achieved by proceeding in Moore’s law can be also achieved in the dark silicon era, however, with a lower amount. Via the track of reliability awareness in dark silicon era, we show that dark silicon can be considered as an opportunity to be exploited for different instances of benefits, namely life-time increase and online testing. We discuss how dark silicon can be exploited to guarantee the system lifetime to be above a certain target value and, furthermore, how dark silicon can be exploited to apply low cost non-intrusive online testing on the cores. After the demonstration of power and reliability awareness while having dark silicon, two approaches will be discussed as the case study where the power and reliability awareness are combined together. The first approach demonstrates how chip reliability can be used as a supplementary metric for power-reliability management. While the second approach provides a trade-off between workload performance and system reliability by simultaneously honoring the given power budget and target reliability

    3rd Many-core Applications Research Community (MARC) Symposium. (KIT Scientific Reports ; 7598)

    Get PDF
    This manuscript includes recent scientific work regarding the Intel Single Chip Cloud computer and describes approaches for novel approaches for programming and run-time organization
    corecore