35 research outputs found
Thermally-aware composite run-time CPU power models
Accurate and stable CPU power models are fundamental in modern system-on-chips (SoCs) for two main reasons: 1) they enable significant online energy savings by providing a run-time manager with reliable power consumption data for controlling CPU energy-saving techniques; 2) they can be used as accurate and trusted reference models for system design and exploration. We begin by showing the limitations of typical performance monitoring counter (PMC) based power modelling approaches and illustrate how an improved model formulation results in a more stable model that efficiently captures relationships between the input variables and the power consumption. Using this as a solid foundation, we present a methodology for adding thermal-awareness and analytically decomposing the power into its constituent parts. We develop and validate our methodology using data recorded from a quad-core ARM Cortex-A15 mobile CPU and we achieve an average prediction error of 3.7% across 39 diverse workloads, 8 Dynamic Voltage-Frequency Scaling (DVFS) levels and with a CPU temperature ranging from 31 degrees C to 91 degrees C. Moreover, we measure the effect of switching cores offline and decompose the existing power model to estimate the static power of each CPU and L2 cache, the dynamic power due to constant background (BG) switching, and the dynamic power caused by the activity of each CPU individually. Finally, we provide our model equations and software tools for implementing in a run-time manager or for using with an architectural simulator, such as gem5.
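The decomposition described in this abstract (static power, constant background switching, and per-CPU activity) can be illustrated with a minimal sketch. This is not the authors' model: the class name, the coefficient names, the linear temperature term, and every numeric value are hypothetical, chosen only to show how a PMC-based estimate might be split into separately reportable parts.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class CompositePowerModel:
    """Illustrative composite power model; all coefficients are hypothetical."""
    static_w_per_volt: float        # static power per volt at the reference temperature
    static_temp_slope: float        # assumed fractional static increase per degree C
    background_coeff: float         # background switching, W per (V^2 * GHz)
    event_coeffs: Dict[str, float]  # PMC event name -> W per (V^2 * GHz * event/cycle)
    ref_temp_c: float = 40.0        # assumed reference temperature

    def static_power(self, volts: float, temp_c: float) -> float:
        # Assumption: static power grows linearly with temperature around ref_temp_c.
        scale = 1.0 + self.static_temp_slope * (temp_c - self.ref_temp_c)
        return self.static_w_per_volt * volts * scale

    def background_power(self, volts: float, freq_ghz: float) -> float:
        # Activity-independent dynamic power, scaled by V^2 * f.
        return self.background_coeff * volts ** 2 * freq_ghz

    def activity_power(self, volts: float, freq_ghz: float,
                       events_per_cycle: Dict[str, float]) -> float:
        # Dynamic power attributed to PMC event rates, scaled by V^2 * f.
        weighted = sum(self.event_coeffs[name] * rate
                       for name, rate in events_per_cycle.items())
        return weighted * volts ** 2 * freq_ghz

    def breakdown(self, volts: float, freq_ghz: float, temp_c: float,
                  events_per_cycle: Dict[str, float]) -> Dict[str, float]:
        parts = {
            "static": self.static_power(volts, temp_c),
            "background": self.background_power(volts, freq_ghz),
            "activity": self.activity_power(volts, freq_ghz, events_per_cycle),
        }
        parts["total"] = sum(parts.values())
        return parts


if __name__ == "__main__":
    # Hypothetical coefficients and operating point, for illustration only.
    model = CompositePowerModel(0.2, 0.01, 0.1, {"inst_retired": 0.3})
    print(model.breakdown(1.0, 2.0, 50.0, {"inst_retired": 1.5}))
```

Keeping the three terms separate is what allows a run-time manager or simulator to report each component on its own; in practice the coefficients would be fitted to measured data rather than chosen by hand.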
Selective Core Boosting: The Return of the Turbo Button
Several modern multi-core architectures support the dynamic control of the CPU's clock rate, allowing processor cores to temporarily operate at speeds exceeding the operational base frequency. Conversely, cores can operate at a lower speed or be disabled altogether to save power. Such facilities are notably provided by Intel's Turbo Boost and AMD's Turbo CORE technologies. Frequency control is typically driven by the operating system which requests changes to the performance state of the processor based on the current load of the system.
In this paper, we investigate the use of dynamic frequency scaling from user space to speed up multi-threaded applications that must occasionally execute time-critical tasks or to solve problems that have heterogeneous computing requirements. We propose a general-purpose library that allows selective control of the frequency of the cores, subject to the limitations of the target architecture. We analyze the performance trade-offs and illustrate its benefits using several benchmarks and real-world workloads when temporarily boosting selected cores executing time-critical operations. While our study primarily focuses on AMD's architecture, we also provide a comparative evaluation of the features, limitations, and runtime overheads of both Turbo Boost and Turbo CORE technologies. Our results show that we can successfully exploit these new hardware facilities to accelerate the execution of key sections of code (critical paths), improving the overall performance of some multi-threaded applications. Unlike prior research, we focus on performance instead of power conservation. Our results can further provide guidelines for the design of hardware power management facilities and the operating system interfaces to those facilities.
Receptor Density-Dependent Motility of Influenza Virus Particles on Surface Gradients
Influenza viruses can move across the surface of host cells while interacting with their glycocalyx. This motility may assist in finding or forming locations for cell entry and thereby promote cellular uptake. Because the binding to and cleavage of cell surface receptors forms the driving force for the process, the surface-bound motility of influenza is expected to be dependent on the receptor density. Surface gradients with gradually varying receptor densities are thus a valuable tool to study binding and motility processes of influenza and can function as a mimic for local receptor density variations at the glycocalyx that may steer the directionality of a virus particle in finding the proper site of uptake. We have tracked individual influenza virus particles moving over surfaces with receptor density gradients. We analyzed the extracted virus tracks first at a general level to verify neuraminidase activity and subsequently with increasing detail to quantify the receptor density-dependent behavior on the level of individual virus particles. While a directional bias was not observed, most likely due to limitations of the steepness of the surface gradient, the surface mobility and the probability of sticking were found to be significantly dependent on receptor density. A combination of high surface mobility and high dissociation probability of influenza was observed at low receptor densities, while the opposite occurred at higher receptor densities. These properties result in an effective mechanism for finding high-receptor density patches, which are believed to be a key feature of potential locations for cell entry.
Interaction of Hardware Transactional Memory and Microprocessor Microarchitecture
Microprocessors have experienced a significant stall in single-thread performance since about 2004. Instead of significant annual performance improvements for a single core, it is easier to increase performance by providing multiple, independent cores that the application programmer has to coordinate. Exposing concurrency to the applications requires mechanisms to control it. Hardware Transactional Memory (HTM) is an abstraction that provides optimistic, fine-grained concurrency control with a simple application interface, and received significant research attention from 2004 to 2010, with initial publications in the mid-90s.
The central thesis of my work is that detailed analysis and ISA modelling of HTM is necessary to understand actual implementation and usage challenges, and to obtain more realistic results. Instead of overly complicating the design of HTM with features that would be extremely hard to implement correctly in a more detailed microarchitecture and ISA proposal, I suggest that getting a baseline HTM specification and microarchitecture right is a challenge in itself. Yet, despite the complexity, there are interesting implementation options and extensions that can provide benefits to applications using HTM, but they are not on the trajectory taken by most papers published between 2004 and 2010.
Accurate and stable empirical CPU power modelling for multi- and many-core systems
Modern processors must provide an increasing level of performance, and are therefore including higher numbers of Heterogeneous Multi-Processing (HMP) elements. Intelligent run-time control of performance and power consumption is required to extend battery life in mobile systems, reduce energy and cooling costs in data centres, and increase peak performance while respecting thermal and power constraints. Accurate online power estimation is essential in guiding run-time power management mechanisms and energy-aware scheduling decisions. We present a statistically rigorous methodology for developing accurate and stable run-time power models and we experimentally demonstrate their ability to perform more accurately across a wider range of workloads. We highlight significant shortcomings in existing techniques and present an improved model formulation that also accounts for thermal effects. Moreover, we present the Powmon software tools that automate our methodology, allowing power models to be developed for other platforms.
Accurate performance and power modelling is also essential in full-system simulation. We present the GemStone open-source software tool, which automates the process of characterising hardware platforms; identifying sources of error in gem5 performance models using machine learning techniques; applying the empirical power models to simulation data; and quantifying the effect of simulation errors on the performance, power and energy estimations, including their scaling across Dynamic Voltage-Frequency Scaling (DVFS) levels and HMP core types.
The presented work enables the development and implementation of smart run-time power management and energy-aware scheduling algorithms, as well as hardware-validated performance, power and energy simulation for design-space exploration and optimisation of future systems.