4,882 research outputs found

    A software-hardware hybrid steering mechanism for clustered microarchitectures

    Clustered microarchitectures provide a promising paradigm to solve or alleviate the problems of increasing microprocessor complexity and wire delays. High-performance out-of-order processors rely on hardware-only steering mechanisms to achieve balanced workload distribution among clusters. However, the additional steering logic results in a significant increase in complexity, which actually decreases the benefits of the clustered design. In this paper, we address this complexity issue and present a novel software-hardware hybrid steering mechanism for out-of-order processors. The proposed software-hardware cooperative scheme makes use of the concept of virtual clusters. Instructions are distributed to virtual clusters at compile time using static properties of the program such as data dependences. Then, at runtime, virtual clusters are mapped into physical clusters by considering workload information. Experiments using SPEC CPU2000 benchmarks show that our hybrid approach can achieve almost the same performance as a state-of-the-art hardware-only steering scheme, while requiring low hardware complexity. In addition, the proposed mechanism outperforms state-of-the-art software-only steering mechanisms by 5% and 10% on average for 2-cluster and 4-cluster machines, respectively. Peer Reviewed. Postprint (published version).
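The runtime half of the scheme, mapping virtual clusters onto physical clusters using workload information, might be sketched as follows. This is only an illustration: the function name, the use of instruction counts as the load metric, and the greedy least-loaded policy are assumptions and are not taken from the paper.

```python
def map_virtual_to_physical(virtual_sizes, num_physical):
    """Greedily map each virtual cluster to the least-loaded physical cluster.

    virtual_sizes: dict of virtual cluster name -> instruction count
                   (a hypothetical stand-in for 'workload information').
    Returns (mapping, load), where load[p] is the total assigned to cluster p.
    """
    load = [0] * num_physical
    mapping = {}
    for vc, size in virtual_sizes.items():
        # Pick the physical cluster with the smallest current load;
        # ties go to the lowest-numbered cluster.
        target = min(range(num_physical), key=lambda p: load[p])
        mapping[vc] = target
        load[target] += size
    return mapping, load


mapping, load = map_virtual_to_physical(
    {"vc0": 6, "vc1": 4, "vc2": 3, "vc3": 2}, num_physical=2
)
print(mapping, load)
```

The virtual cluster contents would be fixed at compile time; only the final placement reacts to load, which is what keeps the hardware side simple in a scheme of this kind.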

    Power, Energy, and Thermal Management for Clustered Manycores

    Efficient and effective system-level power, energy, and thermal management are very important issues in modern computing systems, for which clustered architectures with multiple voltage islands are an expected compromise between global and per-core DVFS. In this dissertation, we focus on two of the most relevant problems for such architectures, specifically, optimizing performance under power/thermal constraints, and minimizing energy under performance constraints.

    Energy Saving and Scavenging in Stand-alone and Large Scale Distributed Systems.

    This thesis focuses on energy management techniques for distributed systems such as hand-held mobile devices, sensor nodes, and data center servers. One of the major design problems in multiple application domains is the mismatch between workloads and resources. Sub-optimal assignment of workloads to resources can cause underloaded or overloaded resources, resulting in performance degradation or energy waste. This work specifically focuses on the heterogeneity in system hardware components and workloads. It includes energy management solutions for unregulated or batteryless embedded systems; and data center servers with heterogeneous workloads, machines, and processor wear states. This thesis describes four major contributions: (1) This thesis describes a battery test and energy delivery system design process to maintain battery life in embedded systems without voltage regulators. (2) In battery-less sensor nodes, this thesis demonstrates a routing protocol to maintain reliable transmission through the sensor network. (3) This thesis has characterized typical workloads and developed two models to capture the heterogeneity of data center tasks and machines: a task performance model and a machine resource utilization model. These models allow users to predict task finish time on individual machines. It then integrates these two models into a task scheduler based on the Hadoop framework for MapReduce tasks, and uses this scheduler for server energy minimization using task concentration. (4) In addition to saving server energy consumption, this thesis describes a method of reducing data center cooling energy by maintaining optimal server processor temperature setpoints through a task assignment algorithm. This algorithm considers the reliability impact of processor wear states. It records processor wear states through automatic timing slack tests on a cluster of machines with varying core temperatures, voltages, and frequencies. 
These optimal temperature setpoints are used in a task scheduling algorithm that saves both server and cooling energy. PhD, Electrical Engineering, University of Michigan, Horace H. Rackham School of Graduate Studies. http://deepblue.lib.umich.edu/bitstream/2027.42/116746/1/xjhe_1.pd

    Mapreduce and Heterogeneity: Power-Aware Bag-of-Tasks, Framework Parameter Sensitivity, and Dynamic Cluster Aware Framework Configuration

    This dissertation presents techniques for adapting MapReduce frameworks to incorporate heterogeneity-aware scheduling algorithms, an inspection of cluster configurations and how they impact these scheduling algorithms, an analysis regarding how the cluster configuration and the heterogeneity-aware scheduling can work together to minimize turnaround time and/or power consumption of the cluster when executing MapReduce applications, and how these lessons can be applied more broadly to Big Data infrastructure outside of MapReduce that supports multiple Big Data frameworks simultaneously. Heterogeneity exists in various capacities in any given cluster, from static (Physical and Platform) heterogeneity to dynamic heterogeneity (Transient Data, Transient Applications, and Irregular Hardware Behavior). Within the cluster there are historically several types of mitigation strategies for each of these types of heterogeneity, and each has its pros and cons. We discuss these mitigation strategies and the types of heterogeneity each of these strategies is able to address, and the history of the related work in the field. After this, we consider taking host-level metrics and using them to schedule tasks in real time, with a desire to address cluster-wide energy usage. To do this, we consider estimators for power consumption that are available on-chip, namely temperature. We establish a correlation between CPU temperature and power consumption, then derive a scheduling algorithm that eliminates nodes that are consuming too much power from the pool of schedule-able resources. In order to do this we focus on the ability of MapReduce frameworks, constructed as we have constructed the frameworks described in this thesis, to delay binding of tasks to specific workers. 
We analyze the impacts this has on turnaround time of a MapReduce application, with analysis around setting this threshold properly to reduce impact on turnaround time while shifting power consumption around in the cluster, away from nodes that are over-consuming. We also address concerns with respect to upgrading a cluster in stages, introducing more Physical Heterogeneity at various levels and the types of adjustments that need to be made to MapReduce configurations in order to combat the increased Heterogeneity. In particular, we look at the concerns for MapReduce platform mis-configuration and its impacts on turnaround time, analyzing the ways in which these types of errors can be mitigated between incremental platform upgrades. In an effort to address this, we introduce a Dynamic Heterogeneity Awareness (DHA) module to our MapReduce framework in order to address these upgrades, and allow better spreading of tasks by the framework, in order to further improve turnaround time and resource utilization. Finally we consider the implications for framework and application co-tenancy, and we describe the state of the art in these areas. We focus on describing what co-tenancy is, why it's important, and how the state of the art can be expanded in order to leverage findings from this thesis to make these co-tenant clusters increase application and framework performance as well as improving these clusters with considerations for energy efficiency.
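The temperature-threshold idea described in this abstract (dropping over-consuming nodes from the schedulable pool, then binding tasks late) could look roughly like the sketch below. The function names, the 70 °C threshold, and the round-robin binding are hypothetical choices made for illustration; the dissertation's actual algorithm is not reproduced here.

```python
def schedulable_nodes(temps_c, threshold_c):
    """Return nodes whose CPU temperature (used as a power proxy) is within the threshold."""
    return [node for node, temp in sorted(temps_c.items()) if temp <= threshold_c]


def assign_tasks(tasks, temps_c, threshold_c):
    """Bind tasks round-robin to nodes that pass the temperature filter.

    Returns (assignment, deferred). If no node qualifies, every task is
    deferred, mirroring the idea of delaying binding until a worker is eligible.
    """
    pool = schedulable_nodes(temps_c, threshold_c)
    if not pool:
        return {}, list(tasks)
    assignment = {task: pool[i % len(pool)] for i, task in enumerate(tasks)}
    return assignment, []


temps = {"n1": 55.0, "n2": 82.5, "n3": 61.0}  # made-up readings in degrees Celsius
assignment, deferred = assign_tasks(["t0", "t1", "t2"], temps, threshold_c=70.0)
print(assignment, deferred)
```

A lower threshold shifts more load away from hot nodes but shrinks the pool, which is the turnaround-time trade-off the abstract refers to.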

    Thermal profiling of homogeneous multi-core processors using sensor mini-networks

    With large-scale integration and high power density in current generation microprocessors, thermal management is becoming a critical component of system design. Specifically, accurate thermal monitoring using on-die sensors is vital for system reliability and recovery. Achieving an accurate thermal profile of a system with an optimal number of sensors is integral for thermal management. This work focuses on a sensor placement mechanism and an on-chip sensor mini-network to combine temperatures from multiple sensors to determine the full thermal profile of a chip. The sensor placement mechanism proposed in this work uses non-uniform subsampling of thermal maps with k-means clustering. Using this sensing technique with cubic interpolation, an 8-core architecture thermal map was successfully recovered with an average error improvement of 90% over sensor placement via basic k-means clustering. All the simulations were run using HotSpot 5.0, modeling an Alpha 21364 processor as a baseline core. The sensor mini-network using both differential encoding and distributed source coding was analyzed on a 1024-core architecture. Distributed source coding compression required fewer transmissions than differential encoding and reduced the number of transmitted bits by 36% over a sensor mini-network with no compression.
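Differential encoding, one of the two compression schemes mentioned above, can be illustrated with a minimal sketch: send the first reading in full and only the change for each later one. The integer readings and function names are assumptions for illustration, and the sketch does not model the paper's mini-network or its distributed source coding.

```python
from itertools import accumulate


def delta_encode(readings):
    """Keep the first reading, then store each reading as a difference from the previous one."""
    if not readings:
        return []
    return [readings[0]] + [b - a for a, b in zip(readings, readings[1:])]


def delta_decode(deltas):
    """Rebuild the original readings by running a cumulative sum over the deltas."""
    return list(accumulate(deltas))


readings = [70, 71, 71, 73, 72]  # made-up sensor temperatures
encoded = delta_encode(readings)
decoded = delta_decode(encoded)
print(encoded, decoded)
```

Because neighbouring readings tend to be close, the deltas are small numbers that need fewer bits than the raw values, which is where the saving in transmitted bits would come from.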

    Virtual cluster scheduling through the scheduling graph

    This paper presents an instruction scheduling and cluster assignment approach for clustered processors. The proposed technique makes use of a novel representation named the scheduling graph, which describes all possible schedules. A powerful deduction process is applied to this graph, reducing at each step the set of possible schedules. In contrast to traditional list scheduling techniques, the proposed scheme tries to establish relations among instructions rather than assigning each instruction to a particular cycle. The main advantage is that wrong or poor schedules can be anticipated and discarded earlier. In addition, cluster assignment of instructions is performed using another novel concept called virtual clusters, which define sets of instructions that must execute in the same cluster. These clusters are managed during the deduction process to identify incompatibilities among instructions. The mapping of virtual to physical clusters is postponed until the scheduling of the instructions has been finalized. The advantages of this novel approach include: (1) accurate scheduling information when assigning, and (2) accurate information on the cluster assignment constraints imposed by scheduling decisions. We have implemented and evaluated the proposed scheme with superblocks extracted from SPECint95 and MediaBench. The results show that this approach produces better schedules than the previous state of the art. Speed-ups are up to 15%, with average speed-ups ranging from 2.5% (2-Clusters) to 9.5% (4-Clusters). Peer Reviewed. Postprint (published version).
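One way to picture virtual clusters as "sets of instructions that must execute in the same cluster" is a disjoint-set (union-find) structure, sketched below. This data structure is an illustrative assumption; the paper does not state how virtual clusters are represented, and the class and method names are hypothetical.

```python
class VirtualClusters:
    """Track groups of instructions that must share a cluster, using union-find."""

    def __init__(self, instructions):
        # Each instruction starts in its own virtual cluster.
        self.parent = {i: i for i in instructions}

    def find(self, i):
        """Return the representative of i's virtual cluster (with path halving)."""
        while self.parent[i] != i:
            self.parent[i] = self.parent[self.parent[i]]
            i = self.parent[i]
        return i

    def must_share_cluster(self, a, b):
        """Record a constraint that a and b execute in the same cluster."""
        self.parent[self.find(a)] = self.find(b)

    def same_cluster(self, a, b):
        return self.find(a) == self.find(b)


vc = VirtualClusters(["i0", "i1", "i2", "i3"])
vc.must_share_cluster("i0", "i1")
vc.must_share_cluster("i1", "i2")
print(vc.same_cluster("i0", "i2"), vc.same_cluster("i0", "i3"))
```

Only the grouping is decided at this stage; which physical cluster each group lands on can be chosen after scheduling, in line with the postponed mapping the abstract describes.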

    ENERGY-AWARE OPTIMIZATION FOR EMBEDDED SYSTEMS WITH CHIP MULTIPROCESSOR AND PHASE-CHANGE MEMORY

    Over the last two decades, functions of embedded systems have evolved from simple real-time control and monitoring to more complicated services. Embedded systems equipped with powerful chips can provide the performance that computationally demanding information processing applications need. However, due to the power issue, the easy way to gain increasing performance by scaling up chip frequencies is no longer feasible. Recently, low-power architecture designs have been the main trend in embedded system designs. In this dissertation, we present our approaches to attack the energy-related issues in embedded system designs, such as thermal issues in the 3D chip multiprocessor (CMP), the endurance issue in the phase-change memory (PCM), the battery issue in the embedded system designs, the impact of inaccurate information in embedded systems, and the use of cloud computing to move the workload to remote cloud computing facilities. We propose a real-time constrained task scheduling method to reduce peak temperature on a 3D CMP, including an online 3D CMP temperature prediction model and a set of algorithms for scheduling tasks to different cores in order to minimize the peak temperature on chip. To address the challenging issues in applying PCM in embedded systems, we propose a PCM main memory optimization mechanism through the utilization of the scratch pad memory (SPM). Furthermore, we propose an MLC/SLC configuration optimization algorithm to enhance the efficiency of the hybrid DRAM + PCM memory. We also propose an energy-aware task scheduling algorithm for parallel computing in mobile systems powered by batteries. When scheduling tasks in embedded systems, we make the scheduling decisions based on information, such as estimated execution time of tasks. Therefore, we design an evaluation method for impacts of inaccurate information on the resource allocation in embedded systems. 
Finally, in order to move workload from embedded systems to remote cloud computing facilities, we present a resource optimization mechanism in heterogeneous federated multi-cloud systems. We also propose two online dynamic algorithms for resource allocation and task scheduling. We consider the resource contention in the task scheduling.