9 research outputs found

    Cooperative Power Management for Chip Multiprocessors using Space-Shared Scheduling

    ํ•™์œ„๋…ผ๋ฌธ (์„์‚ฌ)-- ์„œ์šธ๋Œ€ํ•™๊ต ๋Œ€ํ•™์› : ์ „๊ธฐยท์ปดํ“จํ„ฐ๊ณตํ•™๋ถ€, 2015. 8. Bernhard Egger.์ตœ๊ทผ Cloud Computing ์„œ๋น„์Šค๋ฅผ ์ œ๊ณตํ•˜๋Š” ๋ฐ์ดํ„ฐ์„ผํ„ฐ ๋“ฑ์—์„œ๋Š” Many-core chip์ด ๊ธฐ์กด Multi-core๋ฅผ ๋Œ€์ฒดํ•˜์—ฌ ์‚ฌ์šฉ๋˜๊ณ  ์žˆ์œผ๋ฉฐ Operating System๋„ Many-core ์‹œ์Šคํ…œ์„ ์‚ฌ์šฉํ•  ์ˆ˜ ์žˆ๊ฒŒ Space-sharing ๋ฐฉ์‹์œผ๋กœ ์„ค๊ณ„๊ฐ€ ๋ณ€๊ฒฝ๋˜๊ณ  ์žˆ๋‹ค. ์ด๋Ÿฌํ•œ ์ถ”์„ธ์†์—์„œ ๊ธฐ์กด์˜ ์ „ํ†ต์ ์ธ DVFS ๋ฐฉ์‹์„ ์ด์šฉํ•ด์„œ๋Š” Many-core ํ™˜๊ฒฝ์—์„œ ํšจ์œจ์ ์ธ ์ „๋ ฅ ์‚ฌ์šฉ์ด ์–ด๋ ต๊ธฐ ๋•Œ๋ฌธ์— ์ถ”๊ฐ€์ ์ธ ์ „๋ ฅ ๊ด€๋ฆฌ ๋ฐฉ๋ฒ•๊ณผ Many-core์˜ ํŠน์„ฑ์„ ๊ณ ๋ คํ•œ Core ์žฌ๋ฐฐ์น˜ ๊ธฐ์ˆ ์ด ํ•„์š”ํ•˜๋‹ค. Space-shared OS๋Š” Core์™€ ๋ฌผ๋ฆฌ์ ์ธ ๋ฉ”๋ชจ๋ฆฌ์˜ ๊ตฌ์„ฑ์— ๋Œ€ํ•œ ์ž์› ๊ด€๋ฆฌ๋ฅผ ํ•˜๋Š”๋ฐ, ์ตœ๊ทผ์˜ Chip multiprocessor (CMP) ๋“ค์€ ๊ฐ๊ฐ์˜ Core์—์„œ ๋…๋ฆฝ์ ์œผ๋กœ DVFS๋ฅผ ๋™์ž‘ํ•˜๋„๋ก ํ•˜์ง€ ์•Š๊ณ  ๋ช‡๊ฐœ์˜ Core๋“ค์„ ๊ทธ๋ฃนํ™”ํ•˜์—ฌ Voltage ๋˜๋Š” Frequency๋ฅผ ํ•จ๊ป˜ ๋ณ€๊ฒฝํ•  ์ˆ˜ ์žˆ๋„๋ก ์ง€์›ํ•˜๊ณ  ์žˆ์œผ๋ฉฐ ๋ฉ”๋ชจ๋ฆฌ ๋˜ํ•œ Coarse-grained ๋ฐฉ์‹์œผ๋กœ ๋…๋ฆฝ๋œ ํŒŒํ‹ฐ์…˜์œผ๋กœ ํ• ๋‹น ํ•  ์ˆ˜ ์žˆ๊ฒŒ ๊ด€๋ฆฌ๋œ๋‹ค. ๋ณธ ์—ฐ๊ตฌ๋Š” ์ด๋Ÿฌํ•œ CMP์˜ ํŠน์„ฑ์„ ๊ณ ๋ คํ•˜์—ฌ Core ์žฌ๋ฐฐ์น˜์™€ DVFS ๊ธฐ์ˆ ์„ ์ด์šฉํ•œ ๊ณ„์ธต์  ์ „๋ ฅ ๊ด€๋ฆฌ ์‹œ์Šคํ…œ์„ ์—ฐ๊ตฌํ•˜๋Š”๋ฐ ๋ชฉํ‘œ๊ฐ€ ์žˆ๋‹ค. ํŠนํžˆ Core ์žฌ๋ฐฐ์น˜ ๊ธฐ์ˆ ์€ Core์˜ ์œ„์น˜์— ๋”ฐ๋ฅธ Data ์„ฑ๋Šฅ๋„ ํ•จ๊ป˜ ๊ณ ๋ คํ•˜๊ณ  ์žˆ๋‹ค. ์ด์— ์ถ”๊ฐ€๋กœ DVFS ์„ฑ๋Šฅ ์†์‹ค์„ ๊ณ ๋ คํ•œ ์—๋„ˆ์ง€ ํšจ์œจ์„ฑ ์ƒ์Šน๊ณผ Core ์žฌ๋ฐฐ์น˜์‹œ ๋ฐœ์ƒํ•  ์ˆ˜ ์žˆ๋Š” ํšจ๊ณผ๋ฅผ ๋ฏธ๋ฆฌ ๊ณ„์‚ฐํ•˜์—ฌ ์ตœ์†Œํ•œ์˜ ์„ฑ๋Šฅ์ €ํ•˜๋กœ ๋” ์ข‹์€ ์—๋„ˆ์ง€ ํšจ์œจ์„ฑ์„ ์–ป์„ ์ˆ˜ ์žˆ๋„๋ก ์—ฐ๊ตฌ๋ฅผ ์ง„ํ–‰ํ•˜์˜€๋‹ค. ๋˜ํ•œ ์‹ค์ œ ๊ตฌํ˜„ ๋ฐ ์‹คํ—˜์€ Intel์—์„œ ์ถœ์‹œํ•œ Single-chip Cloud Computer (SCC)์—์„œ ์ง„ํ–‰ํ•˜์˜€์œผ๋ฉฐ ์‹œ๋‚˜๋ฆฌ์˜ค๋ณ„๋กœ 1-2%์˜ ์„ฑ๋Šฅ ์†์‹ค๋กœ Performance per watt ratio๊ฐ€ 27-32% ํ–ฅ์ƒ๋˜์—ˆ๋‹ค. 
๋˜ํ•œ Migration ํšจ๊ณผ์™€ Data ์ง€์—ญ์„ฑ ๋“ฑ์„ ๊ณ ๋ คํ•˜์ง€ ์•Š์•˜๋˜ ๊ธฐ์กด ์—ฐ๊ตฌ๋ณด๋‹ค ์„ฑ๋Šฅ์ด 5-11% ์ข‹์•„์กŒ๋‹ค.Nowadays, many-core chips are especially attractive for data center operators to provide cloud computing service models. The trend in operating system designs, furthermore, is changing from traditional time-sharing to space-shared approaches to support recent many-core architectures. These CPU and OS changes make power and thermal constraints becoming one of most important design issues. Additional power management methods and core re-allocation techniques are necessary to overcome the limitations of traditional dynamic voltage and frequency scaling (DVFS). In this thesis, we present a cooperative hierarchical power management for many-core systems running a space-shared operating system. We consider two levels of space-shared system resources: space in the form of cores and physical memory. Recent chip multiprocessors (CMPs) provide group-level DVFS in which the voltage/frequency of cores is managed at the level of several cores instead of every single core. Memory is also allocated by a coarse-grained resource manager to isolate space partitions. Our research reflects these characteristics of CMPs. We show how to integrate core re-allocation and DVFS techniques through cooperative hierarchical power management. The core re-allocation technique considers the data performance in dependence of the core location. In addition, two important factors are performance loss caused by DVFS and the benefit of core re-allocation. We have implemented this framework on the Intel Single Chip Cloud Computer (SCC) and achieve a 27-32% better performance per watt ratio than naive DVFS policies at the expense of a minimal 1-2% overall performance loss. 
Furthermore, we achieve a 5-11% higher performance than previous research whose naive migration algorithm considers neither the migration benefit nor data locality.

Contents:
Abstract
List of Figures
List of Tables
Chapter 1 Introduction
Chapter 2 Related Work
Chapter 3 Many-core Architectures
  3.1 The Intel Single-chip Cloud Computer
    3.1.1 Architecture Overview
    3.1.2 Memory Addressing
    3.1.3 DVFS Capabilities
  3.2 Tilera
    3.2.1 Architecture Overview
    3.2.2 Memory Architecture
    3.2.3 Switch Interface and Mesh
Chapter 4 Zero-copy OS Migration
  4.1 Cooperative OS Migration
  4.2 Migration Steps
  4.3 Migrating Volatile State
  4.4 Networking
Chapter 5 Cooperative Hierarchical Power Management
  5.1 Cooperative Core Re-Allocation
  5.2 Hierarchical Organization
Chapter 6 Core Re-Allocation and DVFS Policies
  6.1 Core Re-Allocation Considerations
  6.2 Core Re-Allocation Algorithm
  6.3 Evaluation of Core Re-Allocation
  6.4 DVFS Policies
Chapter 7 Experimentation and Evaluation
  7.1 Experimental Setup
  7.2 Power Management Considerations
    7.2.1 DVFS Performance Loss
    7.2.2 Migration Benefit
    7.2.3 Data-location Aware Migration
  7.3 Results
    7.3.1 Synthetic Periodic Workload
    7.3.2 Profiled Workload
    7.3.3 World Cup Workload
    7.3.4 Overall Results
Chapter 8 Conclusion
Appendices
Chapter A Profiled Workload Benchmark Scenarios
  A.1 Synthetic Benchmark Scenario based on Periodic Workloads
    A.1.1 Synthetic Benchmark Scenario 1
    A.1.2 Synthetic Benchmark Scenario 2
  A.2 Memory Synthetic Benchmark Scenario based on Periodic Workloads
    A.2.1 Memory Synthetic Benchmark Scenario 1
    A.2.2 Memory Synthetic Benchmark Scenario 2
  A.3 Benchmark Scenario based on Profiled Workloads
    A.3.1 Profiled Benchmark Scenario 1
    A.3.2 Profiled Benchmark Scenario 2
    A.3.3 Profiled Benchmark Scenario 3
Abstract (Korean)
Acknowledgements
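The trade-off this abstract describes, accepting a small performance loss from DVFS and migration in exchange for a better performance-per-watt ratio, can be illustrated with a minimal decision sketch. The function names, the cost model, and the numbers below are hypothetical assumptions for illustration, not the thesis's actual algorithm:

```python
# Hypothetical sketch: deciding whether to re-allocate a workload onto a
# shared voltage domain, trading DVFS/migration performance loss against
# power savings. All names and numbers are illustrative assumptions.

def perf_per_watt(throughput, power):
    return throughput / power

def worth_migrating(cur_tp, cur_power, new_tp, new_power,
                    migration_penalty, max_perf_loss=0.02):
    """Migrate only if performance per watt improves and the combined
    slowdown (DVFS plus one-time migration cost) stays within budget."""
    perf_loss = 1.0 - (new_tp - migration_penalty) / cur_tp
    gain = perf_per_watt(new_tp, new_power) / perf_per_watt(cur_tp, cur_power)
    return gain > 1.0 and perf_loss <= max_perf_loss

# Example: consolidation lets the domain run at a lower voltage, saving
# power at a small throughput cost.
print(worth_migrating(cur_tp=100.0, cur_power=50.0,
                      new_tp=99.0, new_power=38.0,
                      migration_penalty=0.5))  # prints True
```

With these invented numbers, the move yields roughly a 30% perf/watt gain for a 1.5% slowdown, which is the shape of result the abstract reports.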

    Physical Planning and Uncore Power Management for Multi-Core Processors

    For the microprocessor technology of today and the foreseeable future, multi-core is a key engine that drives performance growth under very tight power dissipation constraints. While previous research has mostly focused on individual processor cores, there is a compelling need to study how to efficiently manage the resources shared among cores, including physical space, on-chip communication, and on-chip storage. In managing physical space, floorplanning is the first and most critical step, largely determining the communication efficiency and cost-effectiveness of chip designs. We consider floorplanning with regularity constraints that require identical processing/memory cores to form an array. Such regularity can greatly facilitate design modularity and therefore shorten design turn-around time. Very little attention has been paid to automatic floorplanning under regularity constraints, because manual floorplanning has difficulty handling the complexity as the chip core count increases. In this dissertation work, we investigate regularity constraints in a simulated-annealing-based floorplanner for multi/many-core processor designs. A simple and effective technique is proposed to encode the regularity constraints in a sequence pair, a classic data representation in automatic floorplanning. To the best of our knowledge, this is the first work on regularity-constrained floorplanning in the context of multi/many-core processor designs. On-chip communication and the shared last-level cache (LLC) play a role at least as important as that of processor cores in terms of chip performance and power. This dissertation research studies dynamic voltage and frequency scaling for the on-chip network and LLC, which together form a single uncore voltage and frequency domain. This is in contrast to most previous works, where the network and LLC are partitioned and associated with processor cores based on physical proximity.
The single shared domain largely avoids the interfacing overhead across domain boundaries and is practical and very useful for industrial products. Our goal is to minimize uncore energy dissipation with little (e.g., 5% or less) performance degradation. The first part of this study identifies a metric that reflects the chip performance determined by the uncore voltage/frequency. The second part addresses how to monitor this metric with low overhead and high fidelity. The last part is the control policy that sets the uncore voltage/frequency based on the monitoring results. Our approach is validated through full-system simulations on public architecture benchmarks.
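The three-part approach (a performance metric, low-overhead monitoring, and a control policy) can be illustrated with a toy threshold controller. The frequency table, the linear slowdown model, and the wiring of the 5% budget are assumptions made purely for illustration:

```python
# Illustrative sketch of a metric-driven uncore DVFS policy: pick the lowest
# uncore frequency whose predicted slowdown stays under a 5% budget.
# The metric, the model, and the frequency table are assumptions.

FREQS_GHZ = [1.0, 1.5, 2.0, 2.5, 3.0]  # assumed uncore operating points

def predicted_slowdown(metric, freq, f_max=3.0):
    """Toy linear model: slowdown grows with uncore sensitivity (`metric`,
    0..1, e.g. the fraction of cycles stalled on NoC/LLC accesses) and with
    the relative frequency reduction."""
    return metric * (f_max / freq - 1.0)

def choose_uncore_freq(metric, budget=0.05):
    for f in FREQS_GHZ:  # try the lowest frequency first
        if predicted_slowdown(metric, f) <= budget:
            return f
    return FREQS_GHZ[-1]

print(choose_uncore_freq(0.02))  # uncore-insensitive phase -> 1.0
print(choose_uncore_freq(0.60))  # uncore-bound phase -> 3.0
```

The design choice mirrored here is that the controller never guesses an energy optimum directly; it only lowers frequency as far as the monitored metric says the performance budget allows.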

    Performance Controlled Power Optimization for Virtualized Internet Datacenters

    Modern data centers must provide performance assurance for complex system software such as web applications. In addition, the power consumption of data centers needs to be minimized to reduce operating costs and avoid system overheating. In recent years, more and more data centers have started to adopt server virtualization strategies for resource sharing, reducing hardware and operating costs by consolidating applications previously running on multiple physical servers onto a single physical server. In this dissertation, several power-efficient algorithms are proposed to effectively reduce server power consumption while achieving the required application-level performance for virtualized servers. First, at the server level, this dissertation proposes two control solutions based on dynamic voltage and frequency scaling (DVFS) and request batching. The two solutions share a performance-balancing technique that keeps all virtual machines at approximately the same performance level relative to their allowed peak values. When the workload intensity is light, we adopt request batching, using a controller to determine the interval for periodically batching incoming requests while putting the processor into sleep mode. When the workload intensity changes from light to moderate, request batching automatically switches to DVFS, which increases the processor frequency to guarantee performance. Second, at the data center level, this dissertation proposes a performance-controlled power optimization solution for virtualized server clusters running multi-tier applications. The solution utilizes both DVFS and server consolidation for maximized power savings by integrating feedback control with optimization strategies.
At the application level, a multi-input multi-output controller is designed to achieve the desired performance for applications spanning multiple VMs on a short time scale, by reallocating CPU resources and applying DVFS. At the cluster level, a power optimizer incrementally consolidates VMs onto the most power-efficient servers on a longer time scale. Finally, this dissertation proposes a VM scheduling algorithm that exploits core performance heterogeneity to optimize overall system energy efficiency. The four algorithms at the three different levels are demonstrated with empirical results on hardware testbeds and trace-driven simulations, and compared against state-of-the-art baselines.
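The mode-switching idea at the server level, batching requests and sleeping under light load, falling back to DVFS as load rises, can be sketched roughly as below. The intensity threshold and the batching-interval bound are invented for illustration and are not the dissertation's controllers:

```python
# Hedged sketch of light-load request batching vs. DVFS mode selection.
# The threshold and the deadline/service-time model are assumptions.

def power_mode(req_rate, light_threshold=100.0):
    """Batch (and sleep) under light load; use DVFS otherwise."""
    return "batching" if req_rate < light_threshold else "dvfs"

def batching_interval(req_rate, deadline_ms=50.0, service_ms=2.0):
    """Longest interval we can batch while still serving every queued
    request before its deadline (toy back-of-the-envelope bound)."""
    backlog = req_rate * deadline_ms / 1000.0  # requests queued per interval
    return max(0.0, deadline_ms - backlog * service_ms)

rate = 40.0  # requests per second (light load in this toy model)
if power_mode(rate) == "batching":
    print(f"sleep for {batching_interval(rate):.1f} ms per cycle")
```

The point of the sketch is the hand-off: the same performance target constrains both regimes, and the controller only chooses which knob (sleep interval or frequency) enforces it.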

    Distributed IC Power Delivery: Stability-Constrained Design Optimization and Workload-Aware Power Management

    Power delivery presents key design challenges in today's systems, ranging from high-performance microprocessors to mobile systems-on-a-chip (SoCs). A robust power delivery system is essential to ensure reliable operation of on-die devices. It has become an important design trend to place multiple voltage regulators on-chip in a distributed manner to cope with power supply noise. However, stability concerns arise because of the complex interactions between multiple voltage regulators and the bulky network of surrounding passive parasitics. The recently developed hybrid stability theorem (HST) is a promising way to deal with the stability of such systems by efficiently capturing the effects of all interactions; however, the intrinsic conservativeness of the underlying HST framework causes large overdesign and hence severe performance degradation. To address this challenge, this dissertation first extends the HST by proposing a frequency-dependent system partitioning technique that substantially reduces the pessimism in stability evaluation. By systematically exploring the theoretical foundation of the HST framework, we identify all the critical constraints under which the partitioning technique can be performed rigorously to remove conservativeness while maintaining the key theoretical properties of the partitioned subsystems. On this basis, we develop an efficient stability-ensuring automatic design flow for large power delivery systems with distributed on-chip regulation. Using the proposed approach, we further uncover new design insights for circuit designers, such as how the regulator topology, the on-chip decoupling capacitance, and the number of integrated voltage regulators can be optimized to improve system trade-offs between stability and performance. Besides stability, power efficiency must be improved in every possible way while maintaining high power quality.
It can be argued that the ultimate power integrity and efficiency may best be achieved via a heterogeneous chain of voltage processing, starting from on-board switching voltage regulators (VRs), to on-chip switching VRs, and finally to networks of distributed on-chip linear VRs. As such, we propose a heterogeneous voltage regulation (HVR) architecture encompassing regulators with complementary characteristics in response time, size, and efficiency. By exploiting the rich heterogeneity and tunability in HVR, we develop systematic workload-aware control policies that adapt the heterogeneous VRs to workload changes at multiple temporal scales, significantly improving system power efficiency while guaranteeing power integrity. The proposed techniques are further supported by hardware-accelerated machine-learning prediction of non-uniform spatial workload distributions for more accurate HVR adaptation at fine time granularity. Our evaluations based on the PARSEC benchmark suite show that the proposed adaptive 3-stage HVR reduces total system energy dissipation by up to 23.9%, and by 15.7% on average, compared with conventional static two-stage voltage regulation using off- and on-chip switching VRs. Compared with the 3-stage static HVR, our runtime control reduces system energy by up to 17.9%, and by 12.2% on average. Furthermore, the proposed machine-learning prediction offers up to a 4.1% reduction in system energy.
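The multi-time-scale division of labor, slow but efficient switching VRs tracking the average demand while fast on-chip linear VRs absorb transients, might be sketched as follows. The windowed-average split and all numbers are illustrative assumptions, not the dissertation's control policy:

```python
# Toy split of a load trace between a slow switching-VR stage and a fast
# linear-VR stage. The windowing scheme and the trace are assumptions.

def assign_stages(load_trace, switching_step=8):
    """For each sample, the switching VR supplies a windowed average of
    recent demand; linear VRs supply the residual transient."""
    out = []
    for i, load in enumerate(load_trace):
        window = load_trace[max(0, i - switching_step + 1): i + 1]
        base = sum(window) / len(window)            # slow setpoint
        out.append((base, max(0.0, load - base)))   # (switching, linear)
    return out

trace = [1.0, 1.0, 1.0, 1.0, 4.0, 1.0, 1.0, 1.0]   # a single load spike
split = assign_stages(trace, switching_step=4)
print(split[4])  # the spike is absorbed mostly by the fast linear stage
```

A workload predictor, as the abstract describes, would effectively shift the slow setpoint ahead of the spike instead of after it.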

    Multiple clock and voltage domains for chip multi processors

    No full text
    Power and thermal dissipation are major constraints on delivering compute performance in high-end CPUs and are expected to remain so in the future. CMPs are becoming important by delivering more compute performance within these power constraints. Dynamic voltage and frequency scaling (DVFS) has been studied in past work as a means to save power and improve overall processor performance while meeting total power and/or thermal constraints. For such systems, power delivery limitations are becoming a significant practical design consideration; unfortunately, this aspect of the design has been largely ignored by prior research. This paper explores the various possible topologies for building a high-end multi-core CPU and the available policies that maximize performance within the set of physical limitations. It evaluates single and multiple voltage and frequency domains and introduces a new clustered topology that groups several cores together. A hybrid model is introduced, combining measurements of a real CPU, a cycle-accurate simulator, and an analytical model. The results indicate that taking power delivery limitations into account changes the conclusions reached when such limitations are ignored. This paper shows that a single power domain topology performs up to 30% better than multiple power domains on lightly-threaded workloads; for fully threaded applications the results differ. The clustered topology performs well for any number of threads.
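A back-of-the-envelope model hints at why a single shared domain can win on lightly-threaded workloads: the idle cores' share of the power-delivery budget can be pooled into the one active core. The cubic power law and sublinear performance scaling below are common rules of thumb, not the paper's calibrated model, and real chips also cap the maximum frequency, which this sketch ignores:

```python
# Illustrative single- vs. per-core power domain comparison under a fixed
# chip-level power delivery budget. All models and numbers are assumptions.

def perf(freq):
    return freq ** 0.8          # assumed sublinear performance vs frequency

def max_freq(power_budget):
    return power_budget ** (1.0 / 3.0)  # dynamic power ~ f^3, so f ~ P^(1/3)

N_CORES, CHIP_BUDGET = 8, 64.0  # one active thread on an 8-core chip

# Per-core domains: the active core is capped at its equal budget slice.
f_split = max_freq(CHIP_BUDGET / N_CORES)
# Single shared domain: the active core may draw the whole chip budget.
f_shared = max_freq(CHIP_BUDGET)

print(f"speedup of shared domain: {perf(f_shared) / perf(f_split):.2f}x")
```

Under a fully threaded load every core draws its slice in either topology, so the pooling advantage vanishes, which is consistent with the paper's observation that the results differ there.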

    NUMA-aware Hierarchical Power Management for Chip Multiprocessors

    ํ•™์œ„๋…ผ๋ฌธ (์„์‚ฌ)-- ์„œ์šธ๋Œ€ํ•™๊ต ๋Œ€ํ•™์› ๊ณต๊ณผ๋Œ€ํ•™ ์ปดํ“จํ„ฐ๊ณตํ•™๋ถ€, 2017. 8. Bernhard Egger.๋Œ€์นญํ˜• ๋‹ค์ค‘ ์ฒ˜๋ฆฌ ์šด์˜์ฒด์ œ๋ฅผ ์‹คํ–‰ ์‹œํ‚ค๋Š” ์บ์‰ฌ ์ผ๊ด€์„ฑ์„ ๊ฐ€์ง€๋Š” ๊ณต์œ  ๋ฉ”๋ชจ๋ฆฌ ์•„ํ‚คํ…์ฒ˜๋ฅผ ์œ„ํ•œ ์ „ํ†ต์ ์ธ ์ ‘๊ทผ ๋ฐฉ๋ฒ•์€ ์ „๋ ฅ๊ด€๋ฆฌ๊ฐ€ ๊ฐ€์žฅ ์ค‘์š”ํ•œ ๋ฌธ์ œ ์ค‘ ํ•˜๋‚˜๋กœ ์กด์žฌํ•˜๋Š” ๋ฏธ๋ž˜์˜ ๋งค๋‹ˆ์ฝ”์–ด ์‹œ์Šคํ…œ์—๋Š” ์ ํ•ฉํ•˜์ง€ ์•Š๋‹ค. ๋ณธ ๋…ผ๋ฌธ์—์„œ๋Š” ๋งค๋‹ˆ์ฝ”์–ด ์‹œ์Šคํ…œ์„ ์œ„ํ•œ ๊ณ„์ธต์  ์ „๋ ฅ๊ด€๋ฆฌ ํ”„๋ ˆ์ž„์›Œํฌ๋ฅผ ์†Œ๊ฐœํ•œ๋‹ค. ์ œ์•ˆํ•œ ํ”„๋ ˆ์ž„์›Œํฌ๋Š” ์บ์‰ฌ ์ผ๊ด€์„ฑ์„ ๊ฐ€์ง€๋Š” ๊ณต์œ  ๋ฉ”๋ชจ๋ฆฌ๊ฐ€ ํ•„์š” ์—†์œผ๋ฉฐ, ๋‹ค์ˆ˜์˜ ์ฝ”์–ด๋“ค์ด ์ „์••/์ฃผํŒŒ์ˆ˜๋ฅผ ๊ณต์œ ํ•˜๊ณ  ๋‹ค์ค‘ ์ „์••/๋‹ค์ค‘ ์ฃผํŒŒ์ˆ˜๋ฅผ ์ง€์›ํ•˜๋Š” ์•„ํ‚คํ…์ฒ˜์—์„œ ์‚ฌ์šฉ ๊ฐ€๋Šฅํ•˜๋‹ค. ์ด ํ”„๋ ˆ์ž„์›Œํฌ๋Š” NUMA-์ธ์ง€ ๊ณ„์ธต์  ์ „๋ ฅ๊ด€๋ฆฌ ๊ธฐ์ˆ ๋กœ ๋™์  ์ „์•• ๋ฐ ์ฃผํŒŒ์ˆ˜ ๊ตํ™˜(DVFS)๊ณผ ์›Œํฌ๋กœ๋“œ ๋งˆ์ด๊ทธ๋ž˜์ด์…˜์„ ์‚ฌ์šฉํ•œ๋‹ค. ์—ฌ๊ธฐ์„œ ์›Œํฌ๋กœ๋“œ ๋งˆ์ด๊ทธ๋ž˜์ด์…˜ ๊ณ„ํš์„ ์œ„ํ•ด ์‚ฌ์šฉ๋œ ํƒ์š• ์•Œ๊ณ ๋ฆฌ์ฆ˜์€ ์„œ๋กœ ์ƒ์ถฉํ•˜๋Š” ๋น„์Šทํ•œ ์ž‘์—…๋Ÿ‰์˜ ํŒจํ„ด์„ ๊ฐ€์ง„ ์ž‘์—…์„ ๊ฐ™์€ ์ „์•• ์˜์—ญ์œผ๋กœ ๋ชจ์œผ๋Š” ๋ชฉํ‘œ์™€ ์ž‘์—…์„ ๋ฐ์ดํ„ฐ๊ฐ€ ์žˆ๋Š” ์œ„์น˜์™€ ๊ฐ€๊นŒ์šด ๊ณณ์œผ๋กœ ์ด๋™ํ•˜๋Š” ๋ชฉํ‘œ๋ฅผ ๊ณ ๋ คํ•œ๋‹ค. ์ œ์•ˆ๋œ ํ”„๋ ˆ์ž„์›Œํฌ๋Š” ์†Œํ”„ํŠธ์›จ์–ด๋กœ ๊ตฌํ˜„๋˜์–ด ์บ์‰ฌ ์ผ๊ด€์„ฑ์ด ์—†๋Š” 48 ์ฝ”์–ด์˜ ์นฉ ๋ ˆ๋ฒจ ๋ฉ€ํ‹ฐํ”„๋กœ์„ธ์„œ ํ•˜๋“œ์›จ์–ด์—์„œ ํ‰๊ฐ€๋˜์—ˆ๋‹ค. ๋ณธ ๋…ผ๋ฌธ์˜ ํ”„๋ ˆ์ž„์›Œํฌ๋ฅผ ๋ฐ ์ดํ„ฐ ์„ผํ„ฐ ์ž‘์—… ํŒจํ„ด์œผ๋กœ ๊ด‘๋ฒ”์œ„์— ๊ฑธ์นœ ์‹คํ—˜์„ ์ˆ˜ํ–‰ํ•œ ๊ฒฐ๊ณผ ์ตœ์ฒจ๋‹จ์˜ DVFS ๊ธฐ์ˆ ๊ณผ DVFS์™€ NUMA-๋น„์ธ์ง€ ์›Œํฌ๋กœ๋“œ ๋งˆ์ด๊ทธ๋ž˜์ด์…˜์„ ๊ฐ™์ด ์‚ฌ์šฉํ•œ ์ „๋ ฅ๊ด€๋ฆฌ ๊ธฐ์ˆ ์— ๋น„ํ•ด ์ƒ๋Œ€์ ์œผ๋กœ ๊ฐ๊ฐ 30%์™€ 5%์˜ ์ „๋ ฅ์†Œ๋ชจ๋‹น ์ฒ˜๋ฆฌ ์ž‘์—…๋Ÿ‰ ํ–ฅ์ƒ์„ ํฐ ์„ฑ๋Šฅ์†์‹ค ์—†์ด ์ด๋ฃจ์—ˆ๋‹ค.Traditional approaches for cache-coherent shared-memory architectures running symmetric multiprocessing (SMP) operating systems are not adequate for future many-core chips where power management presents one of the most important challenges. 
In this thesis, we present a hierarchical power management framework for many-core systems. The framework does not require coherent shared memory and supports multiple voltage/multiple-frequency (MVMF) architectures where several cores share the same voltage/frequency. We propose a hierarchical NUMA-aware power management technique that combines dynamic voltage and frequency scaling (DVFS) with workload migration. A greedy algorithm considers the conflicing goals of grouping workloads with similar utilization patterns in voltage domains and placing workloads as close as possible to their data. We implement the proposed scheme in software and evaluated it on existing hardware, a non-cache-coherent 48-core CMP. Compared to state-of-the-art power management techniques using DVFS-only and DVFS with NUMA-unaware migration, we achieve on average, a relative performance-per-watt improvement of 30 and 5 percent, respectively, for a wide range of datacenter workloads at no significant performance degradation.1 Introduction 1 2 Motivation and RelatedWork 5 2.1 Characteristics of Chip Multiprocessors 5 2.2 Dynamic Voltage and Frequency Scaling 7 2.3 Power Management on CMPs 8 2.4 Related Work 10 3 Cooperative Power Management 13 3.1 Cooperative Workload Migration 13 3.2 Hierarchical Organization 14 3.3 Domain Controllers 15 3.3.1 Core Controller 15 3.3.2 Frequency Controller 15 3.3.3 Voltage Controller 16 3.3.4 Chip Controller 16 3.3.5 Location of the Controllers 16 4 DVFS andWorkload Migration Policies 18 4.1 DVFS Policies 18 4.2 Phase Ordering and Frequency Considerations 19 4.3 Migration of Workloads 20 4.4 Scheduling Workload Migration 20 4.4.1 Schedule migration 21 4.4.2 Level migration 22 4.4.3 Assign target 25 4.4.4 Assign victim 26 4.5 Workload Migration Evaluation Model 27 5 Implementation 29 5.1 The Intel Single-chip Cloud Computer 29 5.2 Implementing Workload Migration 31 5.2.1 Migration Steps 31 5.2.2 Networking 33 5.3 Domain Controller Implementation 33 6 
Experimental Setup 34 6.1 Hardware 34 6.2 Benchmark Scenarios 35 6.3 Comparison of Results 37 7 Results 38 7.1 Synthetic Scenarios 38 7.2 Datacenter Scenarios 42 7.2.1 Varying Number of Workloads 42 7.2.2 Independent Workloads 45 7.3 Overall Results Comparison 46 8 Discussion 48 8.1 Limitations 48 8.2 Extra Hardware Support 49 9 Conclusion 50 Appendices 51 A Benchmark Scenario Details 51 A.1 Synthetic Benchmark 53 A.2 Real World Benchmark 56 Bibliography 67 ์š”์•ฝ 73Maste
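The greedy algorithm's two conflicting goals, utilization similarity within a voltage domain versus proximity to data, suggest a weighted placement score. The weighting, the hop-count distance metric, and all data below are hypothetical, not the thesis's implementation:

```python
# Hypothetical greedy scoring of candidate cores for a workload, balancing
# (a) utilization similarity to the candidate's voltage domain against
# (b) NUMA distance to the workload's data. Weights/data are assumptions.

def score(candidate, workload, domains, alpha=0.7):
    domain = domains[candidate["domain"]]
    similarity = 1.0 - abs(workload["util"] - domain["avg_util"])
    proximity = 1.0 / (1.0 + candidate["hops_to_data"])  # mesh hop count
    return alpha * similarity + (1.0 - alpha) * proximity

def best_core(workload, candidates, domains):
    return max(candidates, key=lambda c: score(c, workload, domains))

domains = {0: {"avg_util": 0.2}, 1: {"avg_util": 0.9}}
cores = [
    {"id": 3, "domain": 0, "hops_to_data": 1},  # idle domain, near data
    {"id": 7, "domain": 1, "hops_to_data": 4},  # busy domain, far from data
]
print(best_core({"util": 0.85}, cores, domains)["id"])  # prints 7
```

With this weighting, a busy workload is pulled into the already-busy voltage domain (so the idle domain can be slowed down) even at some cost in data proximity; a lighter workload would land on core 3 instead.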

    Embedded computing systems design: architectural and application perspectives

    This dissertation addresses various problems in the design and implementation of modern embedded computing systems, highlighting, and at times contrasting, the challenges that emerge as technology advances and the requirements that emerge at the application level, driven by the needs of end users and by market trends. The discussion is organized around two viewpoints: hardware design and system-level application. At the hardware level, on-chip interconnection problems are addressed in detail, an aspect that concerns both the parallelization of computation and the integration of heterogeneous functionality. A Network-on-Chip (NoC) interconnection architecture is then discussed. The proposed solution supports advanced networking functionality directly in hardware, while always allowing an optimal trade-off between traffic performance and implementation requirements, depending on the specific application. In discussing this topic, emphasis is placed on the configurability of the blocks that make up a NoC. Configurability is an increasingly pressing problem in the design of complex systems, where the goal is to develop functionality, even highly advanced functionality, that is easily reusable. To this end, a new methodology called Metacoding is introduced, which abstracts configurability problems through high-level programming languages. Based on metacoding, an automatic design flow is also proposed that simplifies the design and configuration of a NoC for the network designer.
As anticipated, the discussion then moves to the system level, addressing the design of such systems from the application perspective and focusing in particular on remote monitoring applications. In this regard, all aspects of designing a system for monitoring patients with chronic heart failure are studied in detail, starting from the definition of requirements, which, as often happens at this level, derive mainly from the needs of the end users, in our case physicians and patients. The problems of acquiring, processing, and managing the measurements are discussed. The proposed system introduces several innovative aspects, including the concept of an operating protocol and a high degree of interoperability. Finally, the results of the experimental evaluation of the implemented system are reported. The topic of remote monitoring concludes with a study of intelligent electric distribution networks, the Smart Grids, surveying the state of the art in the field, proposing a Home Area Network (HAN) architecture, and suggesting a possible implementation based on Commercial Off-the-Shelf (COTS) components.
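The metacoding idea, abstracting block configurability behind a high-level language that generates concrete configurations, might look like the following toy generator for a 2D-mesh NoC. The configuration fields and the mesh layout are assumptions for illustration, not the dissertation's actual design flow:

```python
# Toy "metacoding" sketch: derive per-router configuration records for a
# width x height mesh NoC from a few high-level parameters. The fields
# (ports, data_width) are illustrative assumptions.

def make_mesh_routers(width, height, data_width=32):
    routers = []
    for y in range(height):
        for x in range(width):
            ports = 1                              # local (core) port
            ports += (x > 0) + (x < width - 1)     # west / east links
            ports += (y > 0) + (y < height - 1)    # north / south links
            routers.append({"node": (x, y), "ports": ports,
                            "data_width": data_width})
    return routers

mesh = make_mesh_routers(3, 3)
print(len(mesh), mesh[4]["ports"])  # prints: 9 5 (center node has 5 ports)
```

The reusable part is the generator, not any single configuration: corner, edge, and center routers get different port counts from one high-level description, which is the kind of boilerplate a metacoding flow would automate.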