374 research outputs found

    Energy Efficient Task Mapping and Resource Management on Multi-core Architectures

    Reducing energy consumption of parallel applications executing on chip multi-processors (CMPs) is important for green computing. Hardware vendors have been developing a variety of system features to support energy efficient computing, for example, integrating asymmetric core types on a single chip, referred to as static asymmetry, and supporting dynamic voltage and frequency scaling (DVFS), referred to as dynamic asymmetry. A common parallelization scheme to exploit CMPs is task parallelism, which can express a wide range of computations in the form of task directed acyclic graphs (DAGs). Existing studies that target energy efficient task scheduling have demonstrated the benefits of leveraging DVFS, particularly per-core DVFS. Their scheduling decisions are mainly based on heuristics, such as task criticality, task dependencies and workload sizes. To enable energy efficient task scheduling, we identify multiple crucial factors that influence energy consumption - varying task characteristics, exploitation of intra-task parallelism (task moldability), and task granularity - which we collectively refer to as task heterogeneity. Task heterogeneity and architecture asymmetry features together complicate the task scheduling problem, since the most energy efficient configuration of resource allocation and frequency setting varies with each task. Our analysis shows that leveraging task heterogeneity in conjunction with static and dynamic asymmetry provides significant opportunities for energy reduction. This thesis contributes two scheduling techniques - ERASE and STEER - that target different scenarios. ERASE focuses on fine-grained tasking in environments where DVFS is not under user control. It leverages insights into task characteristics, task moldability, and instantaneous task parallelism detection to guide scheduling decisions. ERASE comprises four modules: online performance modeling, power profiling, core activity tracing and a task scheduler. Online performance modeling and power profiling provide the runtime with execution time and power predictions. Core activity tracing reports the instantaneous task parallelism, and the task scheduler combines this information to produce energy predictions and dynamically determine the best resource allocation for each task at runtime. STEER focuses on environments where DVFS is under user control and where the platform comprises multiple asymmetric cores grouped into clusters. STEER explores how much energy could potentially be saved by leveraging static asymmetry, dynamic asymmetry and task heterogeneity in conjunction. STEER comprises two predictive models for performance and power, and a task scheduler that uses these models for energy predictions and then identifies the best resource allocation and frequency settings for tasks. Moreover, it applies adaptive scheduling techniques based on task granularity to manage DVFS overheads, and coordinates the cluster frequency settings to reduce interference from concurrently running tasks on cluster-based architectures. The evaluation on an NVIDIA Jetson TX2 shows that ERASE achieves 10% energy savings on average compared to the state-of-the-art DVFS-based schedulers and can adapt to external DVFS changes, and STEER consumes 38% less energy on average than both the state-of-the-art and ERASE.
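
As a rough illustration of the energy-prediction step this abstract describes (predicted execution time multiplied by predicted power, minimized over candidate resource allocations and frequencies), the selection could be sketched as follows. The `Config` type, the toy time and power models and every constant here are hypothetical placeholders chosen for the example; they are not the models used by ERASE or STEER.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Config:
    """One candidate allocation: core type, core count, clock frequency."""
    core_type: str
    num_cores: int
    freq_ghz: float


# Hypothetical per-core-type constants (illustrative values only).
CORE_SPEED = {"big": 2.0, "little": 1.0}   # relative work per GHz
STATIC_W = {"big": 0.5, "little": 0.1}     # static power per core (W)
DYN_COEFF = {"big": 1.0, "little": 0.25}   # dynamic power coefficient


def predict_time_s(work: float, cfg: Config) -> float:
    # Toy performance model: sub-linear speedup with the number of cores.
    return work / (CORE_SPEED[cfg.core_type] * cfg.freq_ghz * cfg.num_cores ** 0.8)


def predict_power_w(cfg: Config) -> float:
    # Toy power model: static part plus a dynamic part growing as f^3.
    per_core = STATIC_W[cfg.core_type] + DYN_COEFF[cfg.core_type] * cfg.freq_ghz ** 3
    return cfg.num_cores * per_core


def predict_energy_j(work: float, cfg: Config) -> float:
    # Energy prediction = predicted time * predicted power.
    return predict_time_s(work, cfg) * predict_power_w(cfg)


def best_config(work: float, candidates: list[Config]) -> Config:
    # Pick the candidate with the lowest predicted energy for this task.
    return min(candidates, key=lambda cfg: predict_energy_j(work, cfg))


candidates = [
    Config(t, n, f)
    for t, n, f in product(["big", "little"], [1, 2], [0.5, 1.0, 2.0])
]
choice = best_config(work=10.0, candidates=candidates)
print(choice, round(predict_energy_j(10.0, choice), 3))
```

A real scheduler would replace the two toy models with predictions fitted per task type at runtime, and would also have to account for performance constraints and DVFS transition overheads, which this sketch ignores.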

    Hipster: hybrid task manager for latency-critical cloud workloads

    In 2013, U.S. data centers accounted for 2.2% of the country's total electricity consumption, a figure that is projected to increase rapidly over the next decade. Many important workloads are interactive, and they demand strict levels of quality-of-service (QoS) to meet user expectations, making it challenging to reduce power consumption due to increasing performance demands. This paper introduces Hipster, a technique that combines heuristics and reinforcement learning to manage latency-critical workloads. Hipster's goal is to improve resource efficiency in data centers while respecting the QoS of the latency-critical workloads. Hipster achieves its goal by exploring heterogeneous multi-cores and dynamic voltage and frequency scaling (DVFS). To improve data center utilization and make the best use of the available resources, Hipster can dynamically assign remaining cores to batch workloads without violating the QoS constraints for the latency-critical workloads. We perform experiments using a 64-bit ARM big.LITTLE platform, and show that, compared to prior work, Hipster improves the QoS guarantee for Web-Search from 80% to 96%, and for Memcached from 92% to 99%, while reducing the energy consumption by up to 18%. Peer reviewed. Postprint (author's final draft).

    Modeling DVFS and Power-Gating Actuators for Cycle-Accurate NoC-Based Simulators

    Networks-on-chip (NoCs) are widely recognized as a viable interconnection paradigm to support the multi-core revolution. One of the major design issues of multicore architectures is still power, which can no longer be attributed mainly to the cores, since the NoC contribution to the overall energy budget is significant. To address both static and dynamic power while balancing NoC performance, different actuators have been exploited in the literature, mainly dynamic voltage and frequency scaling (DVFS) and power gating. Typically, simulation-based tools are employed to explore the huge design space by adopting simplified models of the components. As a consequence, the majority of state-of-the-art work on NoC power-performance optimization does not accurately consider the timing and power overheads of actuators, or (even worse) does not consider them at all, with the risk of overestimating the benefits of the proposed methodologies. This article presents a simulation framework for power-performance analysis of multicore architectures with a specific focus on the NoC. It integrates accurate power gating and DVFS models that also encompass their timing and power overheads. The added value of our proposal is manifold: (i) DVFS and power gating actuators are modeled starting from SPICE-level simulations; (ii) such models have been integrated in the simulation environment; (iii) policy analysis support is plugged into the framework to enable assessment of different policies; (iv) flexible GALS (globally asynchronous, locally synchronous) support is provided, covering both handshake and FIFO re-synchronization schemes. To demonstrate both the flexibility and extensibility of our proposal, two simple policies exploiting the modeled actuators are discussed in the article.

    FlexClock: Generic Clock Reconfiguration for Low-end IoT Devices

    Clock configuration within constrained general-purpose microcontrollers plays a key role in tuning performance, power consumption, and timing accuracy of applications in the Internet of Things (IoT). Subsystems governing the underlying clock tree must nonetheless cope with a huge parameter space, complex dependencies, and dynamic constraints. Manufacturers expose the underlying functions in very diverse ways, which leads to specialized implementations of low portability. In this paper, we propose FlexClock, an approach for generic online clock reconfiguration on constrained IoT devices. We argue that the (costly) generic clock configuration of general-purpose computers and powerful mobile devices needs to slim down to the lower end of the device spectrum. In search of a generalized solution, we identify recurring patterns and building blocks, which we use to decompose clock trees into independent, reusable components. With this segmentation we derive an abstract representation of vendor-specific clock trees, which can then be dynamically reconfigured at runtime. We evaluate our implementation on common hardware. Our measurements demonstrate how FlexClock significantly improves peak power consumption and energy efficiency by enabling dynamic voltage and frequency scaling (DVFS) in a platform-agnostic way.

    Simulation of Efficient Real-Time Scheduling and Power Optimisation

    Sophisticated applications are increasingly executed on more than one CPU for practical and economic reasons. Due to advances in circuit technology and performance limitations, multi-core technology has become the mainstream in CPU designs. However, the most serious limitation of these devices is battery lifetime, since battery technology is not keeping up with the power-hungry processors and peripherals used in today's mobile devices. As a solution, many investigations have turned toward power management algorithms combined with scheduling policies. These can yield significant energy savings while preserving the temporal constraints of these embedded systems. Reducing energy consumption, in particular, not only affects battery lifetime but also reduces the heat generated by real-time embedded controllers in various products, and can even lower the cooling requirements and costs of large-scale multiprocessor computers. To assess the behavior and performance of a scheduling strategy, a flexible multiprocessor scheduling simulation and evaluation platform is needed. This paper puts forth the claim that the STORM simulator improves application quality both in terms of execution time and energy consumption for high-performance mobile computing embedded system design.

    Economic impact of energy saving techniques in cloud server

    In recent years, a lot of research has been carried out in the field of cloud computing and distributed systems to investigate and understand their performance. The economic impact of energy consumption is a major concern for large companies. Cloud computing companies (Google, Yahoo, Gaikai, ONLIVE, Amazon and eBay) use large data centers that comprise virtual computers placed globally and incur high power costs to maintain. Energy demand in IT firms is increasing day by day. Therefore, cloud computing companies face economic challenges in terms of power costs. Energy consumption depends upon several factors, e.g., service level agreements, virtual machine selection techniques, optimization policies, workload types, etc. We propose a solution to the energy saving problem by enabling the dynamic voltage and frequency scaling technique for gaming data centers. The dynamic voltage and frequency scaling technique is compared against non-power-aware and static threshold detection techniques. This helps service providers to meet quality of service and quality of experience constraints by meeting service level agreements. The CloudSim platform is used to implement the scenario, in which game traces are used as a workload for testing the technique. Selecting better techniques can help gaming servers to save energy costs and maintain a better quality of service for users placed globally. The novelty of the work provides an opportunity to investigate which technique behaves better, i.e., dynamic, static or non-power-aware. The results demonstrate that less energy is consumed by implementing a dynamic voltage and frequency approach in comparison with static threshold consolidation or the non-power-aware technique. Therefore, a more economical quality of service could be provided to end users.

    Low-power high-efficiency video decoding using general purpose processors

    In this article, we investigate how code optimization techniques and low-power states of general-purpose processors improve the power efficiency of HEVC decoding. The power and performance efficiency of the use of SIMD instructions, multicore architectures, and low-power active and idle states are analyzed in detail for offline video decoding. In addition, the power efficiency of techniques such as “race to idle” and “exploiting slack” with DVFS is evaluated for real-time video decoding. Results show that “exploiting slack” is more power efficient than “race to idle” for all evaluated platforms representing smartphone, tablet, laptop, and desktop computing systems.
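
To make the two strategies named in this abstract concrete, the sketch below compares per-frame energy for "race to idle" (decode at the highest frequency, then sleep until the next frame) and "exploiting slack" (decode at the lowest frequency that still meets the frame deadline). The power model, the frequency list and all numbers are hypothetical and chosen only for illustration; they are not measurements from the article, and with a different static or idle power the ranking could change.

```python
def active_power_w(freq_ghz: float) -> float:
    # Hypothetical power model: static part plus a dynamic part growing as f^3.
    return 0.3 + 0.5 * freq_ghz ** 3


IDLE_POWER_W = 0.05  # hypothetical power in the low-power idle state


def frame_energy_j(gcycles: float, period_s: float, freq_ghz: float) -> float:
    """Energy for one frame: decode at freq_ghz, then idle until the deadline."""
    busy_s = gcycles / freq_ghz
    if busy_s > period_s:
        raise ValueError("frequency too low to meet the frame deadline")
    return active_power_w(freq_ghz) * busy_s + IDLE_POWER_W * (period_s - busy_s)


def race_to_idle_j(gcycles: float, period_s: float, freqs_ghz: list[float]) -> float:
    # Run as fast as possible, then spend the remaining time idle.
    return frame_energy_j(gcycles, period_s, max(freqs_ghz))


def exploit_slack_j(gcycles: float, period_s: float, freqs_ghz: list[float]) -> float:
    # Run at the lowest frequency that still finishes before the deadline.
    feasible = [f for f in freqs_ghz if gcycles / f <= period_s]
    return frame_energy_j(gcycles, period_s, min(feasible))


freqs = [0.6, 1.2, 2.0]   # available DVFS levels in GHz (assumed)
period = 1 / 30           # 30 frames per second
work = 0.018              # giga-cycles needed per frame (assumed)

race = race_to_idle_j(work, period, freqs)
slack = exploit_slack_j(work, period, freqs)
print(f"race to idle: {race:.4f} J, exploiting slack: {slack:.4f} J")
```

Under these assumed numbers the slack-exploiting policy uses less energy per frame, because the cubic dynamic term dominates at the highest frequency.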