

Task Mapping and Resource Management Techniques Considering Faults in Many-Core Accelerators

Author: 이찬희
Publication venue: Seoul National University Graduate School
Publication date: 01/08/2014

Thesis (Ph.D.) -- Seoul National University Graduate School: Department of Electrical and Computer Engineering, August 2014. 하순회.

Abstract

Owing to continual technology improvement, the number of processors integrated into a single chip keeps increasing, and a single chip hosts more and more applications. In addition, the demand for higher computing capability has made the many-core accelerator an important computing resource in a system-on-chip (SoC). Handling the accelerator efficiently at run-time, however, is very challenging because the system status changes dynamically under various factors. At the system level, the set of applications running concurrently may change according to user requests. At the application level, the application behavior may change dynamically depending on input data or operation mode. At the architecture level, hardware resource availability may vary since hardware components may experience transient or permanent failures. To resolve these difficulties in handling a many-core accelerator, this thesis proposes three techniques.

The first technique re-schedules the entire application to minimize throughput degradation under a latency constraint when a permanent processor failure occurs. Sub-optimal re-scheduling results are obtained at compile-time with a genetic algorithm, one for each processor failure scenario. If a failure is detected at run-time, the live processors retrieve the saved schedule, perform task migration, and execute the remaining tasks of the current iteration. To obtain better performance, preemptive and non-preemptive migration policies and a hybrid policy are proposed. Experiments with real-life DSP applications and randomly generated graphs show the viability of the technique under timing constraints and random fault scenarios.

The second technique is a hybrid resource management scheme. It extends the first technique to handle multiple applications specified as synchronous data-flow (SDF) graphs and their dynamic behavior, such as application and task arrivals and terminations, in addition to permanent processor failures. At design-time, a throughput-maximized mapping of each SDF graph is determined for each number of allocated processors. At run-time, the pre-computed mapping information is used to adjust the mapping of active applications to the processors, without user intervention, whenever the system status changes. The proposed resource management is evaluated through extensive experiments with an in-house simulator built on top of Noxim, a Network-on-Chip simulator. Experimental results show better adaptability to dynamic system status changes than other state-of-the-art approaches.

Finally, a software platform that implements the second technique is proposed for a homogeneous many-core architecture, in order to evaluate system performance more accurately before SoC fabrication. Existing approaches usually estimate performance with a high-level simulation model, without knowing how far actual performance will deviate from the estimate. To overcome this limitation, the software platform is proposed together with implementation details on a virtual prototyping system and on an emulation system realized with an Intel Xeon Phi coprocessor. An actual implementation makes it possible to investigate in detail the overheads involved in the hybrid resource management technique, which was not possible in high-level simulation. Experimental results confirm that the proposed software platform adapts effectively to dynamic workload variation through dynamic mapping of tasks and tolerates unexpected core failures through check-pointing.

Contents

Abstract
Contents
List of Figures
List of Tables
Chapter 1 Introduction
  1.1 Motivation
  1.2 Contribution
  1.3 Thesis Organization
Chapter 2 Preliminaries
  2.1 Application Model
  2.2 Architecture Model
  2.3 Fault Model
  2.4 Thesis Overview
Chapter 3 Fault-aware Task Mapping
  3.1 Introduction
  3.2 Related Work
    3.2.1 Static Approach
    3.2.2 Dynamic Approach
  3.3 Proposed Task Remapping/Rescheduling Technique
    3.3.1 Remapping Technique
    3.3.2 Rescheduling Technique
  3.4 Experiments
    3.4.1 Remapping Results
    3.4.2 Rescheduling Results
Chapter 4 Fault-aware Resource Management
  4.1 Introduction
  4.2 Related Work
    4.2.1 Static Approach
    4.2.2 Dynamic Approach
    4.2.3 Hybrid Approach
    4.2.4 Summary
  4.3 Background
    4.3.1 Energy Model
    4.3.2 Notation
  4.4 Proposed Resource Management Technique
    4.4.1 Motivational Example
    4.4.2 Overall Procedure
    4.4.3 Design-time Analysis
    4.4.4 Run-time Mapping
  4.5 Experiments
    4.5.1 Setup
    4.5.2 Analysis of Run-time Overheads
    4.5.3 Comparison with Other Approaches
Chapter 5 Software Platform for Resource Management
  5.1 Introduction
  5.2 Related Work
  5.3 Overall Structure
  5.4 Components of Software Platform
    5.4.1 Application API Layer
    5.4.2 Communication Interface Module
    5.4.3 Host Interface Layer
    5.4.4 Memory Management Module
    5.4.5 Design-time Analysis
    5.4.6 Slave Manager
  5.5 Software Platform Implementation
    5.5.1 Scheduling Information
    5.5.2 Function Migration and Execution
    5.5.3 Function Migration and Execution
  5.6 Virtual Prototyping System
  5.7 Xeon Emulation System
  5.8 Experiments
    5.8.1 Setup
    5.8.2 Experiments on the Virtual Prototyping System
    5.8.3 Experiments on the Xeon Emulation System
Chapter 6 Conclusion
Bibliography
Abstract in Korean
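
The abstract's first technique prepares one schedule per processor failure scenario ahead of time and only looks it up when a failure is detected. The following is a minimal sketch of that idea, not the thesis's implementation: the function names, the task and processor labels, and the round-robin assignment (a stand-in for the genetic algorithm the abstract mentions) are all assumptions made for illustration.

```python
from itertools import combinations


def build_schedule_table(tasks, processors, max_failures=1):
    """Design-time step: save one task-to-processor map per failure scenario.

    Round-robin assignment is a placeholder for the genetic algorithm
    named in the abstract; it is NOT the thesis's optimizer.
    """
    ordered = sorted(processors)
    table = {}
    for k in range(max_failures + 1):
        for failed in combinations(ordered, k):
            live = [p for p in ordered if p not in failed]
            if not live:
                continue  # no processor left to run anything
            table[frozenset(failed)] = {
                task: live[i % len(live)] for i, task in enumerate(tasks)
            }
    return table


def tasks_to_migrate(table, failed_before, failed_now):
    """Run-time step: look up the saved schedule and list the tasks that move.

    Returns {task: (old_processor, new_processor)} for every task whose
    processor differs between the two saved schedules.
    """
    old = table[frozenset(failed_before)]
    new = table[frozenset(failed_now)]
    return {t: (old[t], new[t]) for t in old if old[t] != new[t]}


# Hypothetical example: four tasks on three processors, then p1 fails.
table = build_schedule_table(["t0", "t1", "t2", "t3"], ["p0", "p1", "p2"])
moves = tasks_to_migrate(table, failed_before=(), failed_now=("p1",))
```

The run-time cost here is a dictionary lookup plus a comparison of two maps, which is the point of moving the search to design time. The round-robin placeholder also moves tasks that were not on the failed processor; an optimizer that accounts for migration cost would presumably avoid that.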

SNU Open Repository and Archive