18 research outputs found

    Design and Analysis of an Adaptive Asynchronous System Architecture for Energy Efficiency

    Get PDF
    Power has become a critical design parameter for digital CMOS integrated circuits. With performance still a central concern, one idea has emerged: minimize power consumption while maintaining performance. Dynamic voltage scaling (DVS) combined with parallelism has been shown to be an effective way of saving power while maintaining performance. However, the potency of DVS and parallelism in traditional, clocked synchronous systems is limited by the strict timing requirements such systems must comply with. Delay-insensitive (DI) asynchronous systems stand to benefit more from these techniques thanks to their flexible timing requirements and high modularity. This dissertation presents the design and analysis of a real-time adaptive DVS architecture for paralleled Multi-Threshold NULL Convention Logic (MTNCL) systems. Results show that energy-efficient systems with low area overhead can be created using this approach.
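The trade-off this abstract relies on follows from the classic dynamic-power model P = C·V²·f: two parallel units at half frequency can run at a lower supply voltage, cutting power at the same nominal throughput. A minimal numeric sketch (the 0.7x voltage figure is an illustrative assumption, not a result from the dissertation):

```python
def dynamic_power(c_eff, vdd, freq):
    """Classic CMOS dynamic-power model: P = C_eff * Vdd^2 * f."""
    return c_eff * vdd ** 2 * freq

# One unit at full voltage/frequency vs. two parallel units at half
# frequency and a (hypothetical) 0.7x supply voltage:
serial = dynamic_power(c_eff=1.0, vdd=1.0, freq=1.0)
parallel = 2 * dynamic_power(c_eff=1.0, vdd=0.7, freq=0.5)
# Same nominal throughput, but the parallel design draws roughly half the power.
```

The quadratic dependence on Vdd is what makes voltage scaling so much more rewarding than frequency scaling alone.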

    Cooperative Power Management for Chip Multiprocessors using Space-Shared Scheduling

    Get PDF
    ํ•™์œ„๋…ผ๋ฌธ (์„์‚ฌ)-- ์„œ์šธ๋Œ€ํ•™๊ต ๋Œ€ํ•™์› : ์ „๊ธฐยท์ปดํ“จํ„ฐ๊ณตํ•™๋ถ€, 2015. 8. Bernhard Egger.์ตœ๊ทผ Cloud Computing ์„œ๋น„์Šค๋ฅผ ์ œ๊ณตํ•˜๋Š” ๋ฐ์ดํ„ฐ์„ผํ„ฐ ๋“ฑ์—์„œ๋Š” Many-core chip์ด ๊ธฐ์กด Multi-core๋ฅผ ๋Œ€์ฒดํ•˜์—ฌ ์‚ฌ์šฉ๋˜๊ณ  ์žˆ์œผ๋ฉฐ Operating System๋„ Many-core ์‹œ์Šคํ…œ์„ ์‚ฌ์šฉํ•  ์ˆ˜ ์žˆ๊ฒŒ Space-sharing ๋ฐฉ์‹์œผ๋กœ ์„ค๊ณ„๊ฐ€ ๋ณ€๊ฒฝ๋˜๊ณ  ์žˆ๋‹ค. ์ด๋Ÿฌํ•œ ์ถ”์„ธ์†์—์„œ ๊ธฐ์กด์˜ ์ „ํ†ต์ ์ธ DVFS ๋ฐฉ์‹์„ ์ด์šฉํ•ด์„œ๋Š” Many-core ํ™˜๊ฒฝ์—์„œ ํšจ์œจ์ ์ธ ์ „๋ ฅ ์‚ฌ์šฉ์ด ์–ด๋ ต๊ธฐ ๋•Œ๋ฌธ์— ์ถ”๊ฐ€์ ์ธ ์ „๋ ฅ ๊ด€๋ฆฌ ๋ฐฉ๋ฒ•๊ณผ Many-core์˜ ํŠน์„ฑ์„ ๊ณ ๋ คํ•œ Core ์žฌ๋ฐฐ์น˜ ๊ธฐ์ˆ ์ด ํ•„์š”ํ•˜๋‹ค. Space-shared OS๋Š” Core์™€ ๋ฌผ๋ฆฌ์ ์ธ ๋ฉ”๋ชจ๋ฆฌ์˜ ๊ตฌ์„ฑ์— ๋Œ€ํ•œ ์ž์› ๊ด€๋ฆฌ๋ฅผ ํ•˜๋Š”๋ฐ, ์ตœ๊ทผ์˜ Chip multiprocessor (CMP) ๋“ค์€ ๊ฐ๊ฐ์˜ Core์—์„œ ๋…๋ฆฝ์ ์œผ๋กœ DVFS๋ฅผ ๋™์ž‘ํ•˜๋„๋ก ํ•˜์ง€ ์•Š๊ณ  ๋ช‡๊ฐœ์˜ Core๋“ค์„ ๊ทธ๋ฃนํ™”ํ•˜์—ฌ Voltage ๋˜๋Š” Frequency๋ฅผ ํ•จ๊ป˜ ๋ณ€๊ฒฝํ•  ์ˆ˜ ์žˆ๋„๋ก ์ง€์›ํ•˜๊ณ  ์žˆ์œผ๋ฉฐ ๋ฉ”๋ชจ๋ฆฌ ๋˜ํ•œ Coarse-grained ๋ฐฉ์‹์œผ๋กœ ๋…๋ฆฝ๋œ ํŒŒํ‹ฐ์…˜์œผ๋กœ ํ• ๋‹น ํ•  ์ˆ˜ ์žˆ๊ฒŒ ๊ด€๋ฆฌ๋œ๋‹ค. ๋ณธ ์—ฐ๊ตฌ๋Š” ์ด๋Ÿฌํ•œ CMP์˜ ํŠน์„ฑ์„ ๊ณ ๋ คํ•˜์—ฌ Core ์žฌ๋ฐฐ์น˜์™€ DVFS ๊ธฐ์ˆ ์„ ์ด์šฉํ•œ ๊ณ„์ธต์  ์ „๋ ฅ ๊ด€๋ฆฌ ์‹œ์Šคํ…œ์„ ์—ฐ๊ตฌํ•˜๋Š”๋ฐ ๋ชฉํ‘œ๊ฐ€ ์žˆ๋‹ค. ํŠนํžˆ Core ์žฌ๋ฐฐ์น˜ ๊ธฐ์ˆ ์€ Core์˜ ์œ„์น˜์— ๋”ฐ๋ฅธ Data ์„ฑ๋Šฅ๋„ ํ•จ๊ป˜ ๊ณ ๋ คํ•˜๊ณ  ์žˆ๋‹ค. ์ด์— ์ถ”๊ฐ€๋กœ DVFS ์„ฑ๋Šฅ ์†์‹ค์„ ๊ณ ๋ คํ•œ ์—๋„ˆ์ง€ ํšจ์œจ์„ฑ ์ƒ์Šน๊ณผ Core ์žฌ๋ฐฐ์น˜์‹œ ๋ฐœ์ƒํ•  ์ˆ˜ ์žˆ๋Š” ํšจ๊ณผ๋ฅผ ๋ฏธ๋ฆฌ ๊ณ„์‚ฐํ•˜์—ฌ ์ตœ์†Œํ•œ์˜ ์„ฑ๋Šฅ์ €ํ•˜๋กœ ๋” ์ข‹์€ ์—๋„ˆ์ง€ ํšจ์œจ์„ฑ์„ ์–ป์„ ์ˆ˜ ์žˆ๋„๋ก ์—ฐ๊ตฌ๋ฅผ ์ง„ํ–‰ํ•˜์˜€๋‹ค. ๋˜ํ•œ ์‹ค์ œ ๊ตฌํ˜„ ๋ฐ ์‹คํ—˜์€ Intel์—์„œ ์ถœ์‹œํ•œ Single-chip Cloud Computer (SCC)์—์„œ ์ง„ํ–‰ํ•˜์˜€์œผ๋ฉฐ ์‹œ๋‚˜๋ฆฌ์˜ค๋ณ„๋กœ 1-2%์˜ ์„ฑ๋Šฅ ์†์‹ค๋กœ Performance per watt ratio๊ฐ€ 27-32% ํ–ฅ์ƒ๋˜์—ˆ๋‹ค. 
๋˜ํ•œ Migration ํšจ๊ณผ์™€ Data ์ง€์—ญ์„ฑ ๋“ฑ์„ ๊ณ ๋ คํ•˜์ง€ ์•Š์•˜๋˜ ๊ธฐ์กด ์—ฐ๊ตฌ๋ณด๋‹ค ์„ฑ๋Šฅ์ด 5-11% ์ข‹์•„์กŒ๋‹ค.Nowadays, many-core chips are especially attractive for data center operators to provide cloud computing service models. The trend in operating system designs, furthermore, is changing from traditional time-sharing to space-shared approaches to support recent many-core architectures. These CPU and OS changes make power and thermal constraints becoming one of most important design issues. Additional power management methods and core re-allocation techniques are necessary to overcome the limitations of traditional dynamic voltage and frequency scaling (DVFS). In this thesis, we present a cooperative hierarchical power management for many-core systems running a space-shared operating system. We consider two levels of space-shared system resources: space in the form of cores and physical memory. Recent chip multiprocessors (CMPs) provide group-level DVFS in which the voltage/frequency of cores is managed at the level of several cores instead of every single core. Memory is also allocated by a coarse-grained resource manager to isolate space partitions. Our research reflects these characteristics of CMPs. We show how to integrate core re-allocation and DVFS techniques through cooperative hierarchical power management. The core re-allocation technique considers the data performance in dependence of the core location. In addition, two important factors are performance loss caused by DVFS and the benefit of core re-allocation. We have implemented this framework on the Intel Single Chip Cloud Computer (SCC) and achieve a 27-32% better performance per watt ratio than naive DVFS policies at the expense of a minimal 1-2% overall performance loss. 
Furthermore, we have achieved a 5-11% higher performance than previous research with a migration technique that uses a naive migration algorithm that does also not consider the migration benefit and data locality.Abstract i Contents iii List of Figures vi List of Tables viii Chapter 1 Introduction 1 Chapter 2 Related Work 4 Chapter 3 Many-core Architectures 6 3.1 The Intel Single-chip Cloud Computer 6 3.1.1 Architecture Overview 6 3.1.2 Memory Addressing 7 3.1.3 DVFS Capabilities 8 3.2 Tilera 10 3.2.1 Architecture Overview 10 3.2.2 Memory Architecture 10 3.2.3 Switch Interface and Mesh 11 Chapter 4 Zero-copy OS Migration 13 4.1 Cooperative OS Migration 14 4.2 Migration Steps 14 4.3 Migration Volatile State 15 4.4 Networking 16 Chapter 5 Cooperative Hierarchical Power Management 17 5.1 Cooperative Core Re-Allocation 17 5.2 Hierarchical Organization 18 Chapter 6 Core Re-Allocation and DVFS Policies 21 6.1 Core Re-Allocation Considerations 22 6.2 Core Re-Allocation Algorithm 24 6.3 Evaluation of Core Re-Allocation 27 6.4 DVFS Policies 28 Chapter 7 Experimentation and Evaluation 29 7.1 Experimental Setup 29 7.2 Power Management Considerations 30 7.2.1 DVFS Performance Loss 31 7.2.2 Migration Benefit 32 7.2.3 Data-location Aware Migration 33 7.3 Results 34 7.3.1 Synthetic Periodic Workload 34 7.3.2 Profiled Workload 37 7.3.3 World Cup Workload 40 7.3.4 Overall Results 40 Chapter 8 Conclusion 43 APPENDICES 43 Chapter A Profiled Workload Benchmark Scenarios 44 A.1 Synthetic Benchmark Scenario based on Periodic Workloads 45 A.1.1 Synthetic Benchmark Scenario 1 45 A.1.2 Synthetic Benchmark Scenario 2 45 A.2 Memory Synthetic Benchmark Scenario based on Periodic Workloads 46 A.2.1 Memory Synthetic Benchmark Scenario 1 46 A.2.2 Memory Synthetic Benchmark Scenario 2 46 A.3 Benchmark Scenario based on Profiled Workloads 47 A.3.1 Profiled Benchmark Scenario 1 47 A.3.2 Profiled Benchmark Scenario 2 47 A.3.3 Profiled Benchmark Scenario 3 48 ์š”์•ฝ 54 Acknowledgements 55Maste
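The idea of precomputing the benefit of a core re-allocation before performing it reduces to an energy comparison. A minimal sketch, assuming a fixed planning horizon, a known one-time migration cost, and a linear model (all names and the model itself are illustrative, not the thesis's actual estimator):

```python
def should_migrate(power_saved_w, horizon_s, migration_cost_j, dvfs_loss_j=0.0):
    """Re-allocate a core only when the energy saved over the planning
    horizon outweighs the one-time migration cost plus the energy
    equivalent of any DVFS-induced performance loss."""
    return power_saved_w * horizon_s > migration_cost_j + dvfs_loss_j

# Saving 2 W over a 10 s horizon justifies a 5 J migration;
# saving only 0.1 W does not.
```

Precomputing this kind of check is what lets the framework avoid migrations whose cost would eat the energy they save.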

    Resource and thermal management in 3D-stacked multi-/many-core systems

    Full text link
    Continuous semiconductor technology scaling and the rapid increase in computational needs have stimulated the emergence of multi-/many-core processors. While up to hundreds of cores can be placed on a single chip, the performance capacity of the cores cannot be fully exploited due to high latencies of interconnects and memory, high power consumption, and low manufacturing yield in traditional (2D) chips. 3D stacking is an emerging technology that aims to overcome these limitations of 2D designs by stacking processor dies over each other and using through-silicon-vias (TSVs) for on-chip communication, and thus, provides a large amount of on-chip resources and shortens communication latency. These benefits, however, are limited by challenges in high power densities and temperatures. 3D stacking also enables integrating heterogeneous technologies into a single chip. One example of heterogeneous integration is building many-core systems with silicon-photonic network-on-chip (PNoC), which reduces on-chip communication latency significantly and provides higher bandwidth compared to electrical links. However, silicon-photonic links are vulnerable to on-chip thermal and process variations. These variations can be countered by actively tuning the temperatures of optical devices through micro-heaters, but at the cost of substantial power overhead. This thesis claims that unearthing the energy efficiency potential of 3D-stacked systems requires intelligent and application-aware resource management. Specifically, the thesis improves energy efficiency of 3D-stacked systems via three major components of computing systems: cache, memory, and on-chip communication. We analyze characteristics of workloads in computation, memory usage, and communication, and present techniques that leverage these characteristics for energy-efficient computing. 
This thesis introduces 3D cache resource pooling, a cache design that allows for flexible heterogeneity in cache configuration across a 3D-stacked system and improves cache utilization and system energy efficiency. We also demonstrate the impact of resource pooling on a real prototype 3D system with scratchpad memory. At the main memory level, we claim that utilizing heterogeneous memory modules and memory object level management significantly helps with energy efficiency. This thesis proposes a memory management scheme at a finer granularity: memory object level, and a page allocation policy to leverage the heterogeneity of available memory modules and cater to the diverse memory requirements of workloads. On the on-chip communication side, we introduce an approach to limit the power overhead of PNoC in (3D) many-core systems through cross-layer thermal management. Our proposed thermally-aware workload allocation policies coupled with an adaptive thermal tuning policy minimize the required thermal tuning power for PNoC, and in this way, help broader integration of PNoC. The thesis also introduces techniques in placement and floorplanning of optical devices to reduce optical loss and, thus, laser source power consumption.
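Object-level management on heterogeneous memory modules can be caricatured as a greedy hotness-based allocator. A minimal sketch (the two-tier fast/slow split, the accesses-per-byte metric, and all names are assumptions for illustration, not the thesis's actual policy):

```python
def place_objects(objects, fast_capacity):
    """Greedily place the hottest memory objects (accesses per byte)
    into the fast module until it is full; the rest go to slow memory.

    objects: list of (name, size_bytes, access_count) tuples.
    """
    placement, used = {}, 0
    for name, size, _ in sorted(objects, key=lambda o: o[2] / o[1], reverse=True):
        if used + size <= fast_capacity:
            placement[name] = "fast"
            used += size
        else:
            placement[name] = "slow"
    return placement
```

Working at object rather than page granularity lets such a policy see access intensity per allocation site instead of averaging it over a whole page.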

    Tangle: Route-oriented dynamic voltage minimization for variation-afflicted, energy-efficient on-chip networks

    Full text link
    On-chip networks are especially vulnerable to within-die parameter variations. Since they connect distant parts of the chip, they need to be designed to work under the most unfavorable parameter values in the chip. This results in energy-inefficient designs. To improve the energy efficiency of on-chip networks, this paper presents a novel approach that relies on monitoring the errors of messages as they traverse the network. Based on the observed errors of messages, the system dynamically decreases or increases the voltage (Vdd) of groups of network routers. With this approach, called Tangle, the different Vdd values applied to different groups of network routers progressively converge to their lowest, variation-aware, error-free values, always keeping the network frequency unchanged. This saves substantial network energy. In a simulated 64-router network with 4 Vdd domains, Tangle reduces the network energy consumption by an average of 22% with negligible performance impact. In a future network design with one Vdd domain per router, Tangle lowers the network Vdd by an average of 21%, reducing the network energy consumption by an average of 28% with negligible performance impact.
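Tangle's per-group convergence can be sketched as a feedback loop that keeps stepping a router group's Vdd down while messages stay error-free (millivolt units, step size, and the `has_errors` probe are illustrative; the real mechanism reacts to message errors observed at runtime, not to a queryable oracle):

```python
def converge_vdd(has_errors, start_mv=1000, step_mv=20, floor_mv=700):
    """Lower a router group's supply voltage step by step until the next
    step would produce message errors, then hold at the last safe value.
    The network frequency is never touched."""
    vdd = start_mv
    while vdd - step_mv >= floor_mv and not has_errors(vdd - step_mv):
        vdd -= step_mv
    return vdd

# A group whose routers start failing below 840 mV settles at 840 mV:
safe = converge_vdd(lambda mv: mv < 840)
```

Because each group converges independently, groups in fast process corners end up at lower voltages than groups in slow corners, which is where the variation-aware savings come from.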

    RUNTIME METHODS TO IMPROVE ENERGY EFFICIENCY IN SUPERCOMPUTING APPLICATIONS

    Get PDF
    Energy efficiency in supercomputing is critical to limit operating costs and carbon footprints. While the energy efficiency of future supercomputing centers needs to improve at all levels, the energy consumed by the processing units is a large fraction of the total energy consumed by High Performance Computing (HPC) systems. HPC applications use a parallel programming paradigm like the Message Passing Interface (MPI) to coordinate computation and communication among thousands of processors. With dynamically-changing factors both in hardware and software affecting energy usage of processors, there exists a need for power monitoring and regulation at runtime to achieve savings in energy. This dissertation highlights an adaptive runtime framework that enables processors with core-specific power control by dynamically adapting to workload characteristics to reduce power with little or no performance impact. Two opportunities to improve the energy efficiency of processors running MPI applications are identified - computational workload imbalance and waiting on memory. Monitoring of performance and power regulation is performed by the framework transparently within the MPI runtime system, eliminating the need for code changes to MPI applications. The effect of enforcing power limits (capping) on processors is also investigated. Experiments on 32 nodes (1024 cores) show that in presence of workload imbalance, the runtime reduces Central Processing Unit (CPU) frequency on cores not on the critical path, thereby reducing power and hence energy usage without deteriorating performance. Using this runtime, six MPI mini-applications and a full MPI application show an overall 20% decrease in energy use with less than 1% increase in execution time. In addition, the lowering of frequency on non-critical cores reduces run-to-run performance variation and improves performance. 
For the full application, an average speedup of 11% is seen, while power is lowered by about 31%, for energy savings of up to 42%. Another experiment on 16 power-capped nodes (256 cores) also shows performance improvement along with power reduction. Thus, energy optimization can also be a performance optimization. For applications that are limited by memory access times, the memory metrics we identify facilitate lowering power by up to 32% without adversely impacting performance.
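The core mechanism, lowering frequency only on cores off the critical path, amounts to picking, per core, the lowest frequency whose slowdown still fits within the critical-path time. A toy sketch assuming work scales linearly with frequency (the frequency set and the measurement model are hypothetical, not the framework's actual policy):

```python
def pick_frequencies(busy_times, freqs=(1.2, 1.8, 2.4)):
    """busy_times: per-core compute time (s) measured at the top
    frequency. Returns, per core, the lowest frequency (GHz) at which
    the core still finishes within the critical-path time."""
    critical = max(busy_times)
    top = max(freqs)
    chosen = []
    for t in busy_times:
        for f in sorted(freqs):
            if t * top / f <= critical:   # projected time at frequency f
                chosen.append(f)
                break
    return chosen
```

Cores on the critical path keep the top frequency, so overall execution time is unchanged while the lightly loaded cores burn less power.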

    Addressing Manufacturing Challenges in NoC-based ULSI Designs

    Full text link
    Hernรกndez Luz, C. (2012). Addressing Manufacturing Challenges in NoC-based ULSI Designs [Tesis doctoral no publicada]. Universitat Politรจcnica de Valรจncia. https://doi.org/10.4995/Thesis/10251/1669

    Design for Reliability and Low Power in Emerging Technologies

    Get PDF
    The continued shrinking of transistor feature sizes is one of the most important drivers of growth in the semiconductor industry. For decades, both the integration density and the complexity of circuits have increased, a trend that spans all modern process nodes. Until recently, shrinking transistors went hand in hand with a reduction of the supply voltage, which lowered power consumption and kept the power density constant. With the advent of nanometer feature sizes, however, this scaling slowed down. Numerous difficulties, such as physical manufacturing limits and non-idealities in supply-voltage scaling, led to rising power densities and, with them, aggravated reliability problems. These include, among others, transistor aging and excessive heating, not least through the increased occurrence of self-heating effects within the transistors. To keep such problems from compromising a circuit's reliability, internal signal delays are usually estimated very pessimistically. The resulting timing guardband ensures correct functionality, but at the cost of performance. Alternatively, reliability can be increased through other techniques, such as operation at the zero-temperature coefficient or approximate computing. Although these techniques can recover a large share of the usual timing guardband, they bring their own consequences and trade-offs.
Persistent challenges in scaling CMOS technologies have also led to a stronger focus on promising emerging technologies. One example is the Negative Capacitance Field-Effect Transistor (NCFET), which shows a remarkable performance gain over conventional FinFET transistors and could replace them in the future. Furthermore, circuit designers increasingly rely on complex, parallel structures instead of higher clock frequencies; such designs require modern power-management techniques in every aspect of the design. With the arrival of novel transistor technologies such as NCFET, these power-management techniques must be re-evaluated, because the underlying dependencies and proportions change. This thesis presents new approaches to both the analysis and the modeling of circuit reliability that address the aforementioned challenges at several design levels. They divide into conventional techniques (a)-(d) and unconventional techniques (e) and (f), as follows:
(a) Analysis of the performance gains obtained when maximizing power efficiency by operating near the transistor threshold voltage, in particular at the optimal performance point. Accurately determining such an optimal performance point is especially challenging in multicore designs, since it shifts with the optimization objectives and the workload.
(b) Revealing hidden interdependencies between transistor aging and supply-voltage fluctuations caused by IR drops. A novel technique is presented that avoids both over- and underestimation when determining the timing guardband and thus finds the smallest yet sufficient guardband.
(c) Mitigation of transistor aging through "graceful approximation", a technique for raising the clock frequency on demand. The aging-induced timing guardband is replaced by approximate-computing techniques, and quantization is used to guarantee sufficient computational accuracy.
(d) Mitigation of temperature-dependent delay degradation by operating near the zero-temperature coefficient (N-ZTC). Operation at N-ZTC minimizes temperature-induced variations in performance and power consumption. Qualitative and quantitative comparisons with the traditional timing guardband are presented.
(e) Modeling of power-management techniques for NCFET-based processors. NCFET technology has unique properties under which conventional runtime voltage and frequency scaling (DVS/DVFS) yields suboptimal results, calling for the NCFET-specific power-management techniques presented in this thesis.
(f) A novel heterogeneous multicore design in NCFET technology. The design consists of identical cores; heterogeneity arises from applying each core's individually optimal configuration. Amdahl's law is extended to cover new system- and application-specific parameters and to demonstrate the advantages of the new design.
The presented techniques are evaluated using gate-level implementations and simulations, and system-level simulators are used to implement and simulate multicore designs. Validation and comparison against the state of the art rely on analytical, gate-level, and system-level simulations covering both synthetic and real applications.

    Cross-Layer Approaches for an Aging-Aware Design of Nanoscale Microprocessors

    Get PDF
    Thanks to aggressive scaling of transistor dimensions, computers have revolutionized our life. However, the increasing unreliability of devices fabricated in nanoscale technologies emerged as a major threat for the future success of computers. In particular, accelerated transistor aging is of great importance, as it reduces the lifetime of digital systems. This thesis addresses this challenge by proposing new methods to model, analyze and mitigate aging at microarchitecture-level and above

    NUMA-Aware Hierarchical Power Management for Chip Multiprocessors

    Get PDF
    ํ•™์œ„๋…ผ๋ฌธ (์„์‚ฌ)-- ์„œ์šธ๋Œ€ํ•™๊ต ๋Œ€ํ•™์› ๊ณต๊ณผ๋Œ€ํ•™ ์ปดํ“จํ„ฐ๊ณตํ•™๋ถ€, 2017. 8. Bernhard Egger.๋Œ€์นญํ˜• ๋‹ค์ค‘ ์ฒ˜๋ฆฌ ์šด์˜์ฒด์ œ๋ฅผ ์‹คํ–‰ ์‹œํ‚ค๋Š” ์บ์‰ฌ ์ผ๊ด€์„ฑ์„ ๊ฐ€์ง€๋Š” ๊ณต์œ  ๋ฉ”๋ชจ๋ฆฌ ์•„ํ‚คํ…์ฒ˜๋ฅผ ์œ„ํ•œ ์ „ํ†ต์ ์ธ ์ ‘๊ทผ ๋ฐฉ๋ฒ•์€ ์ „๋ ฅ๊ด€๋ฆฌ๊ฐ€ ๊ฐ€์žฅ ์ค‘์š”ํ•œ ๋ฌธ์ œ ์ค‘ ํ•˜๋‚˜๋กœ ์กด์žฌํ•˜๋Š” ๋ฏธ๋ž˜์˜ ๋งค๋‹ˆ์ฝ”์–ด ์‹œ์Šคํ…œ์—๋Š” ์ ํ•ฉํ•˜์ง€ ์•Š๋‹ค. ๋ณธ ๋…ผ๋ฌธ์—์„œ๋Š” ๋งค๋‹ˆ์ฝ”์–ด ์‹œ์Šคํ…œ์„ ์œ„ํ•œ ๊ณ„์ธต์  ์ „๋ ฅ๊ด€๋ฆฌ ํ”„๋ ˆ์ž„์›Œํฌ๋ฅผ ์†Œ๊ฐœํ•œ๋‹ค. ์ œ์•ˆํ•œ ํ”„๋ ˆ์ž„์›Œํฌ๋Š” ์บ์‰ฌ ์ผ๊ด€์„ฑ์„ ๊ฐ€์ง€๋Š” ๊ณต์œ  ๋ฉ”๋ชจ๋ฆฌ๊ฐ€ ํ•„์š” ์—†์œผ๋ฉฐ, ๋‹ค์ˆ˜์˜ ์ฝ”์–ด๋“ค์ด ์ „์••/์ฃผํŒŒ์ˆ˜๋ฅผ ๊ณต์œ ํ•˜๊ณ  ๋‹ค์ค‘ ์ „์••/๋‹ค์ค‘ ์ฃผํŒŒ์ˆ˜๋ฅผ ์ง€์›ํ•˜๋Š” ์•„ํ‚คํ…์ฒ˜์—์„œ ์‚ฌ์šฉ ๊ฐ€๋Šฅํ•˜๋‹ค. ์ด ํ”„๋ ˆ์ž„์›Œํฌ๋Š” NUMA-์ธ์ง€ ๊ณ„์ธต์  ์ „๋ ฅ๊ด€๋ฆฌ ๊ธฐ์ˆ ๋กœ ๋™์  ์ „์•• ๋ฐ ์ฃผํŒŒ์ˆ˜ ๊ตํ™˜(DVFS)๊ณผ ์›Œํฌ๋กœ๋“œ ๋งˆ์ด๊ทธ๋ž˜์ด์…˜์„ ์‚ฌ์šฉํ•œ๋‹ค. ์—ฌ๊ธฐ์„œ ์›Œํฌ๋กœ๋“œ ๋งˆ์ด๊ทธ๋ž˜์ด์…˜ ๊ณ„ํš์„ ์œ„ํ•ด ์‚ฌ์šฉ๋œ ํƒ์š• ์•Œ๊ณ ๋ฆฌ์ฆ˜์€ ์„œ๋กœ ์ƒ์ถฉํ•˜๋Š” ๋น„์Šทํ•œ ์ž‘์—…๋Ÿ‰์˜ ํŒจํ„ด์„ ๊ฐ€์ง„ ์ž‘์—…์„ ๊ฐ™์€ ์ „์•• ์˜์—ญ์œผ๋กœ ๋ชจ์œผ๋Š” ๋ชฉํ‘œ์™€ ์ž‘์—…์„ ๋ฐ์ดํ„ฐ๊ฐ€ ์žˆ๋Š” ์œ„์น˜์™€ ๊ฐ€๊นŒ์šด ๊ณณ์œผ๋กœ ์ด๋™ํ•˜๋Š” ๋ชฉํ‘œ๋ฅผ ๊ณ ๋ คํ•œ๋‹ค. ์ œ์•ˆ๋œ ํ”„๋ ˆ์ž„์›Œํฌ๋Š” ์†Œํ”„ํŠธ์›จ์–ด๋กœ ๊ตฌํ˜„๋˜์–ด ์บ์‰ฌ ์ผ๊ด€์„ฑ์ด ์—†๋Š” 48 ์ฝ”์–ด์˜ ์นฉ ๋ ˆ๋ฒจ ๋ฉ€ํ‹ฐํ”„๋กœ์„ธ์„œ ํ•˜๋“œ์›จ์–ด์—์„œ ํ‰๊ฐ€๋˜์—ˆ๋‹ค. ๋ณธ ๋…ผ๋ฌธ์˜ ํ”„๋ ˆ์ž„์›Œํฌ๋ฅผ ๋ฐ ์ดํ„ฐ ์„ผํ„ฐ ์ž‘์—… ํŒจํ„ด์œผ๋กœ ๊ด‘๋ฒ”์œ„์— ๊ฑธ์นœ ์‹คํ—˜์„ ์ˆ˜ํ–‰ํ•œ ๊ฒฐ๊ณผ ์ตœ์ฒจ๋‹จ์˜ DVFS ๊ธฐ์ˆ ๊ณผ DVFS์™€ NUMA-๋น„์ธ์ง€ ์›Œํฌ๋กœ๋“œ ๋งˆ์ด๊ทธ๋ž˜์ด์…˜์„ ๊ฐ™์ด ์‚ฌ์šฉํ•œ ์ „๋ ฅ๊ด€๋ฆฌ ๊ธฐ์ˆ ์— ๋น„ํ•ด ์ƒ๋Œ€์ ์œผ๋กœ ๊ฐ๊ฐ 30%์™€ 5%์˜ ์ „๋ ฅ์†Œ๋ชจ๋‹น ์ฒ˜๋ฆฌ ์ž‘์—…๋Ÿ‰ ํ–ฅ์ƒ์„ ํฐ ์„ฑ๋Šฅ์†์‹ค ์—†์ด ์ด๋ฃจ์—ˆ๋‹ค.Traditional approaches for cache-coherent shared-memory architectures running symmetric multiprocessing (SMP) operating systems are not adequate for future many-core chips where power management presents one of the most important challenges. 
In this thesis, we present a hierarchical power management framework for many-core systems. The framework does not require coherent shared memory and supports multiple voltage/multiple-frequency (MVMF) architectures where several cores share the same voltage/frequency. We propose a hierarchical NUMA-aware power management technique that combines dynamic voltage and frequency scaling (DVFS) with workload migration. A greedy algorithm considers the conflicing goals of grouping workloads with similar utilization patterns in voltage domains and placing workloads as close as possible to their data. We implement the proposed scheme in software and evaluated it on existing hardware, a non-cache-coherent 48-core CMP. Compared to state-of-the-art power management techniques using DVFS-only and DVFS with NUMA-unaware migration, we achieve on average, a relative performance-per-watt improvement of 30 and 5 percent, respectively, for a wide range of datacenter workloads at no significant performance degradation.1 Introduction 1 2 Motivation and RelatedWork 5 2.1 Characteristics of Chip Multiprocessors 5 2.2 Dynamic Voltage and Frequency Scaling 7 2.3 Power Management on CMPs 8 2.4 Related Work 10 3 Cooperative Power Management 13 3.1 Cooperative Workload Migration 13 3.2 Hierarchical Organization 14 3.3 Domain Controllers 15 3.3.1 Core Controller 15 3.3.2 Frequency Controller 15 3.3.3 Voltage Controller 16 3.3.4 Chip Controller 16 3.3.5 Location of the Controllers 16 4 DVFS andWorkload Migration Policies 18 4.1 DVFS Policies 18 4.2 Phase Ordering and Frequency Considerations 19 4.3 Migration of Workloads 20 4.4 Scheduling Workload Migration 20 4.4.1 Schedule migration 21 4.4.2 Level migration 22 4.4.3 Assign target 25 4.4.4 Assign victim 26 4.5 Workload Migration Evaluation Model 27 5 Implementation 29 5.1 The Intel Single-chip Cloud Computer 29 5.2 Implementing Workload Migration 31 5.2.1 Migration Steps 31 5.2.2 Networking 33 5.3 Domain Controller Implementation 33 6 
Experimental Setup 34 6.1 Hardware 34 6.2 Benchmark Scenarios 35 6.3 Comparison of Results 37 7 Results 38 7.1 Synthetic Scenarios 38 7.2 Datacenter Scenarios 42 7.2.1 Varying Number of Workloads 42 7.2.2 Independent Workloads 45 7.3 Overall Results Comparison 46 8 Discussion 48 8.1 Limitations 48 8.2 Extra Hardware Support 49 9 Conclusion 50 Appendices 51 A Benchmark Scenario Details 51 A.1 Synthetic Benchmark 53 A.2 Real World Benchmark 56 Bibliography 67 ์š”์•ฝ 73Maste
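The greedy algorithm's two conflicting goals, grouping similarly utilized workloads into one voltage domain versus keeping a workload near its data, can be captured by a weighted cost over candidate domains. A minimal sketch (the linear cost, the domain-index distance metric, and the weights are assumptions for illustration, not the thesis's actual policy):

```python
def choose_domain(util, data_domain, domains, alpha=1.0, beta=0.5):
    """Pick the voltage domain minimizing a weighted sum of utilization
    mismatch and distance from the workload's data.

    domains: {domain_id: [utilizations of workloads already placed there]}
    """
    def cost(d):
        members = domains[d]
        avg = sum(members) / len(members) if members else 0.0
        return alpha * abs(util - avg) + beta * abs(d - data_domain)
    return min(domains, key=cost)
```

Tuning alpha against beta shifts the policy between pure utilization clustering (best for DVFS) and pure data locality (best for memory performance).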