6 research outputs found

    Green Queue: Customized large-scale clock frequency scaling

    We examine the scalability of a set of techniques related to Dynamic Voltage-Frequency Scaling (DVFS) on HPC systems to reduce the energy consumption of scientific applications through an application-aware analysis and runtime framework, Green Queue. Green Queue supports making CPU clock frequency changes in response to intra-node and inter-node observations about application behavior. Our intra-node approach reduces CPU clock frequencies, and therefore power consumption, while CPUs lack computational work due to inefficient data movement. Our inter-node approach reduces clock frequencies for MPI ranks that lack computational work. We investigate these techniques on a set of large scientific applications on 1024 cores of Gordon, an Intel Sandy Bridge-based supercomputer at the San Diego Supercomputer Center. Our optimal intra-node technique showed an average measured energy savings of 10.6% and a maximum of 21.0% over regular application runs. Our optimal inter-node technique showed an average of 17.4% and a maximum of 31.7% energy savings.

    Enabling fair pricing on HPC systems with node sharing

    Abstract not provided

    Selective Core Boosting: The Return of the Turbo Button

    Several modern multi-core architectures support dynamic control of the CPU's clock rate, allowing processor cores to temporarily operate at speeds exceeding the operational base frequency. Conversely, cores can operate at a lower speed or be disabled altogether to save power. Such facilities are notably provided by Intel's Turbo Boost and AMD's Turbo CORE technologies. Frequency control is typically driven by the operating system, which requests changes to the performance state of the processor based on the current load of the system. In this paper, we investigate the use of dynamic frequency scaling from user space to speed up multi-threaded applications that must occasionally execute time-critical tasks or to solve problems that have heterogeneous computing requirements. We propose a general-purpose library that allows selective control of the frequency of the cores, subject to the limitations of the target architecture. We analyze the performance trade-offs and illustrate the library's benefits using several benchmarks and real-world workloads when temporarily boosting selected cores executing time-critical operations. While our study primarily focuses on AMD's architecture, we also provide a comparative evaluation of the features, limitations, and runtime overheads of both Turbo Boost and Turbo CORE technologies. Our results show that we can successfully exploit these new hardware facilities to accelerate the execution of key sections of code (critical paths), improving the overall performance of some multi-threaded applications. Unlike prior research, we focus on performance instead of power conservation. Our results can further provide guidelines for the design of hardware power management facilities and the operating system interfaces to those facilities.

    Application aware performance, power consumption, and reliability tradeoff

    There has been an unprecedented increase in the drive for microprocessor performance. This drive is motivated by the increase in software complexity, the opportunity to solve previously unattempted problems, especially in the scientific domain, and a need to crunch the ever-growing 'Big Data' to enable a multitude of technological advances to benefit mankind. A consequence of these phenomena is the ever-increasing transistor count in deployed computing systems. Although technology scaling leads to lower power consumption per transistor, the overall system-level power consumption is on the rise. This leads to a variety of power supply related issues. As the chip die area is not increasing significantly, and the supply voltage reduction is not keeping pace with the reduction in device dimensions, an increase in power density is observed. This manifests as an increased temperature profile on the chip floorplan. A rise in temperature necessitates aggressive and costly cooling mechanisms, adding to the design complexity and manufacturing effort. It also triggers various failure mechanisms, leading to a reduction in the expected chip lifetime and reliability. Given the conflicting trends in Performance, Power consumption, and chip Reliability (PPR), it is imperative to balance them in a fine-grained fashion to meet system-level goals and expectations. Sole dependence on advancements in manufacturing technology is no longer sufficient. Alternative avenues for PPR management are receiving increasing attention. Moreover, PPR demands are usually time dependent. For example, the constraint on power consumption in a green data center is dictated by the energy reserve. The demand on performance in a cloud-based platform depends on the agreed Quality of Service (QoS) requirements. The reliability of a microprocessor is dependent on the deployment domain.
The goal of our research is to address the issue of growing microprocessor power consumption subject to performance and/or reliability goals. Through our developed schemes, we tailor the execution context to match application requirements. This leads to judicious use of power while adhering to the aforementioned constraints. Note that the actual demands on performance, power consumption, and reliability are highly variable and depend upon the executing applications and operating conditions. As such, we develop schemes to cater to these varying demands. To meet these demands efficiently, the solutions developed are tailored to current hardware-software interaction characteristics. Two techniques that are very relevant in this area, namely dynamic voltage and frequency scaling (DVFS) and microarchitectural adaptation, are leveraged to produce the expected PPR characteristics when executing a wide variety of tasks. In this dissertation, we demonstrate how the expected chip lifetime can be extended in a real-time setting using DVFS while respecting performance constraints modeled as QoS requirements. Individual tasks in a task queue are assigned specific voltage and frequency pairs to use for their execution. This assignment is informed by knowledge of application-wise hardware-software interactions to reach solutions that are tailored to the current execution scenario. Our observations indicate that a 2- to 18-fold improvement in chip lifetime can be expected from the schemes we develop in this regard. Capitalizing on the power of microarchitectural adaptation, we further improve chip lifetime expectations 2 to 8 times, depending on the failure mechanism investigated. This increase in expected chip lifetime directly translates to a reduction of both operational and replacement costs. We also provide mechanisms to co-manage performance and power consumption constraints.
The comprehensive microarchitectural adaptation space is very complex, and its usage thus leads to significant runtime overhead. To tackle this, we devote considerable attention to pruning it so as to identify and use only the most effective adaptations. A two-stage adaptation process is provided to a) improve the optimality of the solutions delivered, and b) keep the runtime overhead in check. We observe that our schemes provide 20% higher normalized energy efficiency compared to the state-of-the-art techniques proposed, while using just a very small fraction of the configuration space. We also find that our schemes effectively cater to a wide variety of demands on performance and power consumption, providing the necessary hardware characteristics within a 10% bound. Since only the most useful configuration space is retained for adaptation, the occurrence of a fault that prohibits the usage of a certain adaptive control can lead to the inability to satisfy a subset of hardware demands. A detailed analysis has been carried out to understand how the remaining active configurations can preserve the expected hardware behavior. To a good extent, we observe that the system behavior under a failure closely tracks (with less than 5% tracking error) the behavior obtainable without the fault. We believe that application-tailored schemes for PPR management will become increasingly relevant as microprocessor design advancements saturate in the future. They will be extremely relevant for extracting every possible ounce of performance while conforming to constraints on power consumption and reliability. Given the effectiveness of our schemes, we are confident that such schemes are applicable in different markets such as embedded computing, desktop computing, cloud platforms, and high performance computing. Insights drawn from our research will guide chip designers in the provision of effective adaptive controls to combat increasing demands on PPR characteristics.

    XSEDE: eXtreme Science and Engineering Discovery Environment Third Quarter 2012 Report

    The Extreme Science and Engineering Discovery Environment (XSEDE) is the most advanced, powerful, and robust collection of integrated digital resources and services in the world. It is an integrated cyberinfrastructure ecosystem with singular interfaces for allocations, support, and other key services that researchers can use to interactively share computing resources, data, and expertise. This is a report of project activities and highlights from the third quarter of 2012. National Science Foundation, OCI-105357