206 research outputs found

    Intelligent Management of Mobile Systems through Computational Self-Awareness

    Runtime resource management for many-core systems is increasingly complex. The complexity can stem from diverse workload characteristics with conflicting demands, or from limited shared resources such as memory bandwidth and power. Resource management strategies for many-core systems must distribute shared resources appropriately across workloads while coordinating high-level system goals at runtime in a scalable and robust manner. To address the complexity of dynamic resource management in many-core systems, state-of-the-art techniques based on heuristics have been proposed; these methods, however, lack the formalism needed to provide robustness against unexpected runtime behavior. A common solution to this problem is to deploy classical control approaches with bounds and formal guarantees, but traditional control-theoretic methods lack the ability to adapt to (1) changing goals at runtime (i.e., self-adaptivity) and (2) changing dynamics of the modeled system (i.e., self-optimization). In this chapter, we explore adaptive resource management techniques that provide self-optimization and self-adaptivity by employing principles of computational self-awareness, specifically reflection. By supporting these self-awareness properties, the system can reason about the actions it takes by considering the significance of competing objectives, user requirements, and operating conditions while executing unpredictable workloads.
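The reflective management loop this abstract describes can be pictured as an observe-reflect-decide cycle that re-weights competing objectives at runtime. The sketch below is a minimal illustration under invented objectives and weighting rules; the class, method names, and numeric goals are hypothetical, not taken from the chapter.

```python
# Minimal sketch of a reflective resource-management loop (hypothetical API;
# the objectives and weight-update rule are illustrative assumptions).

class ReflectiveManager:
    """Observe-reflect-decide loop that re-weights competing objectives."""

    def __init__(self, goals):
        # goals: dict mapping objective name -> (target, weight)
        self.goals = dict(goals)

    def observe(self, sensors):
        # sensors: measured values, e.g. {"power": 12.0, "latency": 4.0};
        # returns per-objective error (positive = target violated)
        return {g: sensors[g] - target for g, (target, _w) in self.goals.items()}

    def reflect(self, errors):
        # Self-adaptivity: shift weight toward the most-violated objective,
        # so the next decision prioritizes it.
        worst = max(errors, key=lambda g: errors[g])
        for g in self.goals:
            target, w = self.goals[g]
            self.goals[g] = (target, w * 1.1 if g == worst else w * 0.95)

    def decide(self, errors):
        # Scalarize weighted errors into one knob adjustment (e.g. a DVFS step).
        return -sum(self.goals[g][1] * errors[g] for g in errors)

mgr = ReflectiveManager({"power": (10.0, 1.0), "latency": (5.0, 1.0)})
errs = mgr.observe({"power": 12.0, "latency": 4.0})
mgr.reflect(errs)          # power is over budget, so its weight grows
step = mgr.decide(errs)    # negative step: throttle to cut power
```

Here reflection means the manager reasons about its own objective weights before acting, rather than applying a fixed control law.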

    Cooperative Power Management for Chip Multiprocessors using Space-Shared Scheduling

    Master's thesis, Seoul National University Graduate School, Department of Electrical and Computer Engineering, August 2015. Advisor: Bernhard Egger.
    Nowadays, many-core chips are especially attractive for data center operators providing cloud computing service models. The trend in operating system design, furthermore, is shifting from traditional time-sharing to space-shared approaches to support recent many-core architectures. These CPU and OS changes make power and thermal constraints some of the most important design issues. Additional power management methods and core re-allocation techniques are necessary to overcome the limitations of traditional dynamic voltage and frequency scaling (DVFS). In this thesis, we present cooperative hierarchical power management for many-core systems running a space-shared operating system. We consider two levels of space-shared system resources: space in the form of cores and physical memory. Recent chip multiprocessors (CMPs) provide group-level DVFS, in which voltage/frequency is managed for groups of several cores instead of for each core individually. Memory is also allocated by a coarse-grained resource manager to isolate space partitions. Our research reflects these characteristics of CMPs. We show how to integrate core re-allocation and DVFS techniques through cooperative hierarchical power management. The core re-allocation technique considers data performance as a function of core location, and additionally accounts for two important factors: the performance loss caused by DVFS and the benefit of core re-allocation. We have implemented this framework on the Intel Single-chip Cloud Computer (SCC) and achieve a 27-32% better performance-per-watt ratio than naive DVFS policies at the expense of a minimal 1-2% overall performance loss.
Furthermore, we achieve 5-11% higher performance than previous research, whose naive migration algorithm considers neither the migration benefit nor data locality.
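The central trade-off the thesis describes, re-allocating cores only when the energy saved by scaling down a vacated voltage/frequency group outweighs the migration cost and any data-locality penalty, can be illustrated with a toy energy model. All quantities and the cost model below are invented for illustration; they are not measured SCC values.

```python
# Hedged sketch of a migration-benefit check for cooperative power management:
# consolidate workloads (vacating a V/F group) only if the net energy saving
# over the next decision interval is positive. Illustrative model only.

def should_migrate(migration_cost_j, group_power_w, low_power_w, interval_s,
                   data_penalty_j=0.0):
    """Return True if vacating a voltage/frequency group saves net energy.

    migration_cost_j : energy spent copying state during migration (J)
    group_power_w    : group power at the current V/F level (W)
    low_power_w      : group power after scaling the vacated group down (W)
    interval_s       : expected time until the next re-allocation decision (s)
    data_penalty_j   : extra energy from worse data locality after the move (J)
    """
    saving = (group_power_w - low_power_w) * interval_s
    return saving > migration_cost_j + data_penalty_j

# A group drawing 8 W can idle at 2 W for 10 s; the migration itself costs
# 5 J and the new core placement adds a 3 J data-locality penalty.
decision = should_migrate(5.0, 8.0, 2.0, 10.0, data_penalty_j=3.0)
```

Accounting for the data-locality penalty is what distinguishes this check from the naive migration policy the thesis compares against.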

    Dynamic Lifetime Reliability and Energy Management for Network-on-Chip based Chip Multiprocessors

    In this dissertation, we study dynamic reliability management (DRM) and dynamic energy management (DEM) techniques for network-on-chip (NoC) based chip multiprocessors (CMPs). In the first part, the proposed DRM algorithm takes both the computational and the communication components of the CMP into consideration and combines thread migration and dynamic voltage and frequency scaling (DVFS) as the two primary techniques to change the CMP operation. The goal is to increase the lifetime reliability of the overall system to the desired target with minimal performance degradation. Simulation results on a variety of benchmarks on 16- and 64-core NoC based CMP architectures demonstrate that lifetime reliability can be improved by 100% for an average performance penalty of 7.7% and 8.7% for the two CMP architectures, respectively. In the second part of this dissertation, we first propose novel algorithms that employ Kalman filtering and long short-term memory (LSTM) networks for workload prediction. These predictions serve as the basis on which voltage/frequency (V/F) pairs are selected for each core by an effective dynamic voltage and frequency scaling algorithm whose objective is to reduce energy consumption without degrading performance beyond a user-set threshold. Secondly, we investigate the use of deep neural network (DNN) models for energy optimization under performance constraints in CMPs. The proposed algorithm is implemented in three phases. The first phase collects the training data by employing Kalman filtering for workload prediction and an efficient DVFS-based heuristic algorithm. The second phase trains the DNN model, and in the last phase the DNN model is used to directly identify V/F pairs that can achieve lower energy consumption without performance degradation beyond the acceptable threshold set by the user.
    Simulation results on 16- and 64-core NoC based architectures demonstrate that the proposed approach can achieve up to 55% energy reduction under a 10% performance-degradation constraint. Simulation experiments compare the proposed algorithm against existing approaches based on reinforcement learning and Kalman filtering and show that the proposed DNN technique provides average improvements in energy-delay product (EDP) of 6.3% and 6% for the 16-core architecture and of 7.4% and 5.5% for the 64-core architecture.
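The prediction-then-selection step of the first phase can be sketched as a scalar Kalman filter feeding a V/F lookup: predict the next interval's utilization, then choose the slowest V/F pair that can absorb it. The filter below assumes a random-walk workload model, and the V/F table and noise variances are hypothetical, not taken from the dissertation.

```python
# Illustrative sketch: 1-D Kalman filter for workload prediction driving a
# lowest-sufficient V/F selection. All constants are invented assumptions.

class ScalarKalman:
    def __init__(self, q=0.01, r=0.1):
        self.x, self.p = 0.0, 1.0   # state estimate and its variance
        self.q, self.r = q, r       # process and measurement noise variances

    def update(self, z):
        self.p += self.q                   # predict (random-walk model)
        k = self.p / (self.p + self.r)     # Kalman gain
        self.x += k * (z - self.x)         # correct with measurement z
        self.p *= (1.0 - k)
        return self.x                      # prediction for the next interval

# (frequency GHz, voltage V) pairs, slowest first -- hypothetical table
VF_TABLE = [(0.8, 0.75), (1.2, 0.85), (1.6, 0.95), (2.0, 1.05)]

def pick_vf(predicted_util, f_max=2.0):
    """Choose the slowest pair whose frequency covers the predicted load."""
    needed = predicted_util * f_max        # required effective frequency
    for f, v in VF_TABLE:
        if f >= needed:
            return f, v
    return VF_TABLE[-1]                    # saturate at the fastest pair

kf = ScalarKalman()
for util in (0.3, 0.35, 0.4):              # observed per-interval utilizations
    pred = kf.update(util)
f, v = pick_vf(pred)
```

Selecting the slowest sufficient pair is what ties the energy objective to the performance constraint: any slower pair would violate the predicted demand.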

    Dynamic Thermal Management for Microprocessors through Task Scheduling

    With continuous IC (integrated circuit) technology scaling, more and more transistors are integrated into a tiny area of the processor. Microprocessors experience unprecedentedly high power and temperatures on chip, which can easily violate the thermal constraint. High temperature on the chip, if not controlled, can damage or even burn the chip. Emerging technologies can further exacerbate the thermal condition of modern processors. For example, 3D stacking is an IC technology that stacks several die layers together in order to shorten the communication path between the dies and improve chip performance. This technology unfortunately increases the power density per unit volume, and the heat from each layer needs to dissipate vertically through the same heat sink. Another example is the chip multiprocessor (CMP), which integrates two or more independent processors (called "cores") onto a single integrated circuit die. As IC technology nodes scale down to 45nm and below, there is significant within-die process variation (PV) in current and near-future CMPs. Process variation makes the cores on a chip differ in their maximum operable frequency and in the amount of leakage power they consume. This can result in immense spatial variation of the temperatures of cores on the same chip, meaning that some cores can run much hotter than others. One of the most commonly used methods to keep a CPU from overheating is hardware dynamic thermal management (HW DTM), owing to the high cost and inefficiency of current mechanical cooling techniques. Dynamic voltage/frequency scaling (DVFS) is a broad-spectrum dynamic thermal management technique applicable to all types of processors, so we adopt DVFS as the HW DTM method in this thesis to simplify the discussion.
DVFS lowers CPU power consumption by reducing CPU frequency or voltage when the temperature overshoots, which constrains the temperature at the price of performance loss, in terms of reduced CPU throughput or longer program execution time. This thesis mainly addresses this problem, with the goal of eliminating unnecessary hardware-level DVFS and improving chip performance. The methodology of the experiments in this thesis is based on accurate estimation of power and temperature on the processor. The CPU power usage of different benchmarks is estimated by reading the performance counters on a real P4 chip and measuring the activities of different CPU functional units. Jobs are then categorized into power-intensive (hot) ones and power-non-intensive (cool) ones. Many combinations of jobs with mixed power (thermal) characteristics are used to evaluate the effectiveness of the algorithms we propose. When the experiments are conducted on a single-core processor, a compact dynamic thermal model embedded in the Linux kernel is used to calculate the CPU temperature. When the experiments are conducted on a CMP with 3D stacked dies, or a CMP affected by significant process variation, a thermal simulation tool well recognized in academia is used. The contribution of this thesis is that it proposes new software-level task scheduling algorithms to avoid unnecessary hardware-level DVFS. New task scheduling algorithms are proposed not only for the single-core processor, but also for the CMP with 3D stacked dies and the CMP under process variation. Compared with state-of-the-art algorithms proposed by other researchers, the new algorithms all show significant performance improvement. To improve the performance of single-core processors, which is harmed by thermal overshoots and HW DTM, we propose a heuristic algorithm named ThreshHot, which judiciously schedules hot jobs before cool jobs to make the future temperature lower.
Furthermore, it always keeps the temperature as close to the threshold as possible without overshooting. For CMPs with 3D stacked dies, three heuristics are proposed and combined into one algorithm. First, the vertically stacked cores are treated as a core stack, and job power is balanced among the core stacks instead of the individual cores. Second, hot jobs are moved close to the heat sink to expedite heat dissipation. Third, when thermal emergencies happen, the most power-intensive job in a core stack is penalized in order to lower the temperature quickly. When CMPs are under significant process variation, each core has a distinct maximum frequency and leakage power. Maximizing the overall CPU throughput across all cores conflicts with satisfying the on-chip thermal constraints imposed on each core. A maximum bipartite matching algorithm is used to resolve this dilemma and exploit the maximum performance of the chip.
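The process-variation dilemma maps naturally onto maximum bipartite matching: a job may be placed on a core only if that core's PV-limited maximum frequency and thermal headroom make the placement feasible, and we want to place as many jobs as possible. The sketch below uses Kuhn's standard augmenting-path algorithm; the feasibility sets are invented for illustration, and the thesis' actual feasibility model may differ.

```python
# Sketch of job-to-core assignment under process variation as maximum
# bipartite matching (Kuhn's augmenting-path algorithm). The feasibility
# data are hypothetical.

def max_matching(feasible, n_jobs, n_cores):
    """feasible[i] lists the cores job i may use (fast/cool enough).

    Returns (matching size, match) where match[c] is the job on core c.
    """
    match = [-1] * n_cores                 # match[c] = job assigned to core c

    def try_assign(i, seen):
        for c in feasible[i]:
            if not seen[c]:
                seen[c] = True
                # Core c is free, or its current job can move elsewhere.
                if match[c] == -1 or try_assign(match[c], seen):
                    match[c] = i
                    return True
        return False

    size = sum(try_assign(i, [False] * n_cores) for i in range(n_jobs))
    return size, match

# 3 jobs, 3 PV-variable cores: job 0 needs a fast core (only core 2),
# job 1 runs anywhere, job 2 fits cores 0 and 2.
feasible = [[2], [0, 1, 2], [0, 2]]
size, assignment = max_matching(feasible, 3, 3)
```

The augmenting-path step is what resolves conflicts: when job 2 wants core 0, job 1 is re-routed to the still-free core 1, so all three jobs run within their per-core constraints.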

    Maximizing heterogeneous processor performance under power constraints


    Exploiting heterogeneity in Chip-Multiprocessor Design

    In the past decade, semiconductor manufacturers have persistently built faster and smaller transistors in order to boost processor performance as projected by Moore's Law. Recently, as we enter the deep submicron regime, continuing the same pace of processor development has become increasingly difficult due to constraints on power, temperature, and the scalability of transistors. To overcome these challenges, researchers have proposed several innovations at both the architecture and device levels that partially solve these problems. These diversities in processor architecture and manufacturing materials provide solutions for continuing Moore's Law by effectively exploiting heterogeneity; however, they also introduce a set of unprecedented challenges that have rarely been addressed in prior work. In this dissertation, we present a series of in-depth studies to comprehensively investigate the design and optimization of future multi-core and many-core platforms through exploiting heterogeneities. First, we explore a large design space of heterogeneous chip multiprocessors by exploiting architectural- and device-level heterogeneities, aiming to identify the optimal design patterns that lead to attractive energy and cost efficiencies at the pre-silicon stage. After this high-level study, we pay specific attention to architectural asymmetry, developing a heterogeneity-aware task scheduler that optimizes energy efficiency on a given single-ISA heterogeneous multiprocessor; an advanced statistical tool is employed to facilitate the algorithm development. In the third study, we shift our focus to device-level heterogeneity and propose to effectively leverage the advantages of different materials to address the increasingly important reliability issues of future processors.
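A heterogeneity-aware placement decision of the kind the second study targets can be sketched as picking, for each task, the core type with the lowest energy per unit of work that still meets the task's performance need. The big/little speed and power numbers below are illustrative assumptions, not values from the dissertation.

```python
# Hedged sketch of task placement on a single-ISA asymmetric CMP: minimize
# energy per unit of work subject to the task's speed requirement. The
# core-type table is a hypothetical big/little pair.

CORE_TYPES = {
    # name: (relative speed, power in W)
    "big":    (2.0, 4.0),
    "little": (1.0, 1.0),
}

def place(required_speed):
    """Pick the core type with the lowest energy that satisfies the task."""
    best = None
    for name, (speed, power) in CORE_TYPES.items():
        if speed >= required_speed:
            energy = power / speed          # energy per unit of work
            if best is None or energy < best[1]:
                best = (name, energy)
    return best[0] if best else "big"       # fall back to the fastest core

cpu_bound = place(1.8)     # only the big core is fast enough
memory_bound = place(0.6)  # the little core is the more efficient choice
```

The example captures the usual asymmetric-CMP intuition: compute-bound tasks earn their place on big cores, while tasks that cannot exploit the extra speed waste energy there.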

    Energy-Efficient and Reliable Computing in Dark Silicon Era

    Get PDF
    Dark silicon denotes the phenomenon that, due to thermal and power constraints, the fraction of transistors that can operate at full frequency decreases with each technology generation. Moore's law and Dennard scaling complemented each other for five decades, delivering commensurate exponential performance gains via single-core and, later, multi-core designs. However, recalculating Dennard scaling for recent small technology nodes shows that continued multi-core growth demands exponentially increasing thermal design power to achieve a linear performance increase. This process hits a power wall that raises the amount of dark or dim silicon on future multi-/many-core chips more and more. Furthermore, the growing number of transistors on a single chip, together with susceptibility to internal defects and aging phenomena (both exacerbated by high on-chip thermal density), makes monitoring and managing chip reliability before and after activation a necessity. The proposed approaches and experimental investigations in this thesis focus on two main tracks: 1) power awareness and 2) reliability awareness in the dark silicon era; later, these two tracks are combined. In the first track, the main goal is to maximize returns on the most important chip-design metrics, such as performance and throughput, while honoring the maximum power limit. In fact, we show that by managing power in the presence of dark silicon, all the traditional benefits of proceeding along Moore's law can still be achieved in the dark silicon era, albeit to a lesser extent. Along the reliability-awareness track, we show that dark silicon can be considered an opportunity to be exploited for several benefits, namely lifetime increase and online testing.
    We discuss how dark silicon can be exploited to guarantee that the system lifetime stays above a certain target value and, furthermore, how it can be exploited to apply low-cost, non-intrusive online testing to the cores. After demonstrating power and reliability awareness in the presence of dark silicon, two approaches are discussed as case studies in which the two are combined. The first approach demonstrates how chip reliability can be used as a supplementary metric for power-reliability management, while the second provides a trade-off between workload performance and system reliability by simultaneously honoring the given power budget and the target reliability.
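The power-awareness track's most basic budgeting decision, how many cores may be lit under a fixed TDP while the rest stay dark, can be sketched with a simple additive power model. The per-core and uncore power figures below are invented for illustration; real chips have non-uniform and workload-dependent power draw.

```python
# Minimal sketch of dark-silicon budgeting: activate as many cores as the
# power budget allows and leave the remainder dark. Illustrative model only.

def active_core_count(total_cores, core_power_w, uncore_power_w, tdp_w):
    """Largest number of cores whose total power fits within the TDP."""
    budget = tdp_w - uncore_power_w        # power left after fixed uncore draw
    if budget < 0:
        return 0
    return min(total_cores, int(budget // core_power_w))

# 64-core chip, 2 W per active core, 20 W uncore, 100 W TDP:
n_active = active_core_count(64, 2.0, 20.0, 100.0)
dark_fraction = 1 - n_active / 64
```

Under these invented numbers only 40 of 64 cores may run, which is exactly the kind of idle capacity the reliability track proposes to repurpose for lifetime balancing and online testing.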