Search CORE

6 research outputs found

Task Migration for Fault-Tolerance in Mixed-Criticality Embedded Systems

Author: Madsen Jan
Pop Paul
Saraswat Prabhat Kumar
Publication venue
Publication date: 01/01/2009
Field of study

In this paper we are interested in mixed-criticality embed-ded applications implemented on distributed architectures. Depending on their time-criticality, tasks can be hard or soft real-time and regarding safety-criticality, tasks can be fault-tolerant to transient faults, permanent faults, or have no dependability requirements. We use Earliest Deadline First (EDF) scheduling for the hard tasks and the Constant Bandwidth Server (CBS) for the soft tasks. The CBS pa-rameters determine the quality of service (QoS) of soft tasks. Transient faults are tolerated using checkpointing with roll-back recovery. For tolerating permanent faults in proces-sors, we use task migration, i.e., restarting the safety-critical tasks on other processors. We propose a Greedy-based on-line heuristic for the migration of safety-critical tasks, in response to permanent faults, and the adjustment of CBS parameters on the target processors, such that the faults are tolerated, the deadlines for the hard real-time tasks are sat-isfied and the QoS for soft tasks is maximized. The proposed online adaptive approach has been evaluated using several synthetic benchmarks and a real-life case study. 1

CiteSeerX

Crossref

Online Research Database In Technology

Task migration for fault-tolerance in mixed-criticality embedded systems

Author: Jan Madsen
Kopetz H.
Paul Pop
Prabhat Kumar Saraswat
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date
Field of study

Crossref

Tradeoff analysis for Dependable Real-Time Embedded Systems during the Early Design Phases

Author: Gan Junhe
Publication venue: Technical University of Denmark
Publication date: 01/01/2014
Field of study

Online Research Database In Technology

매니코어 가속기의 결함을 고려한 태스크 매핑 및 자원 관리 기법

Author: 이찬희
Publication venue: 서울대학교 대학원
Publication date: 01/08/2014
Field of study

학위논문 (박사)-- 서울대학교 대학원 : 전기·컴퓨터공학부, 2014. 8. 하순회.기술이 발전함에 따라 하나의 칩 안에 집적되는 프로세서의 갯수가 점점 증가하게 되었다. 또한, 응용들의 보다 높은 연산 능력에 대한 요구로 인해 매니코어 가속기는 시스템-온-칩에서 중요한 연산 장치가 되었다. 시스템의 상태가 여러가지 요인에 의해 동적으로 변하기 때문에, 시스템 수행중에 그러한 가속기를 효과적으로 다루는 것은 매우 어려운 문제이다. 시스템 수준에서는 응용들이 사용자의 요구에 따라 시작 또는 종료가 되고, 응용 레벨에서는 응용 자체의 동작이 입력 데이타나 수행모드에 따라 동적으로 변하게 된다. 아키텍처 수준에서는 프로세서의 영구 고장으로 인해 하드웨어 컴포넌트의 사용 가능한 상황이 변하게 된다. 본 학위논문에서는 가속기를 다루는데 있어서의 위와 같은 어려움들을 해결하기 위해 세가지 기법을 제시하였다. 첫번째 기법은 프로세서의 영구 고장이 발생하였을 때, 전체 응용들을 시간 제약 하에 처리량의 저하를 최소화하며 재스케쥴을 하는 것이다. 최적의 재스케쥴 결과들은 진화 알고리즘을 이용하여 컴파일 시에, 각각의 프로세서 고장 상황에 따라 준비가 된다. 수행 시간에 프로세서 고장이 감지되면, 정상적으로 동작하는 프로세서들이 저장된 스케쥴을 가지고 태스크 이주를 수행한 후 태스크들의 나머지 수행을 지속한다. 이 기법에서는 또한 더 좋은 성능을 얻기 위해, 선점, 비선점 및 융합 이주 정책이 제안되었다. 제안된 기법의 가능성은 실제 디지털 신호처리 응용들과 임의로 생성된 응용들에 대해 시간제약과 다양한 프로세서 고장 상황에 대해 검증되었다. 두 번째로 제안된 기법은 복합 자원 관리 기법으로, 첫번째 기법에서 다룬 프로세서 영구고장 뿐만 아니라, 동기화 데이타-흐름 그래프로 기술된 여러 응용들과 응용들의 동적 양상을 다루는 것까지로 확장이 된 것이다. 제안된 기법에서는, 우선 설계 수준에서 할당되는 프로세서의 갯수를 변화시켜가면서 동기화된 데이타-흐름 그래프들의 처리량이 최대로 얻어지는 매핑 결과들을 얻는다. 그리고나서 수행 시간에는 미리 계산된 매핑 정보들을 가지고 수행중인 응용들의 매핑을, 동적인 시스템 변화가 발생할 때마다 적용하게 된다. 제안된 자원 관리 기법은 Noxim이라는 네트워크-온-칩 시뮬레이터 위에서 구현이 되었으며, 실험 결과들은 제안된 기법이 최신의 다른 기법들과 비교하여 더 좋은 성능을 보였다. 마지막으로는, 시스템의 성능을 시스템-온-칩 제작 이전에 보다 정확하게 평가하기 위해서, 두 번째 기법을 구현한 소프트웨어 플랫폼이 매니코어 아키텍처를 대상으로 제안되었다. 기존의 매니코어 아키텍처를 대상으로 한 연구들은 주로 상위 수준의 시뮬레이션 모델을 사용하여 성능을 측정하였기 때문에, 실제 성능과 시뮬레이션 성능이 얼마나 차이가 날지를 정확하게 알 수가 없었다. 이러한 한계를 극복하기 위하여 소프트웨어 플랫폼과, 가상 프로토타이핑 시스템 및 제온 에뮬레이션 시스템에서의 플랫폼 구현 방법이 제안이 되었다. 이러한 실제 시스템 구현을 통하여 제안된 복합 자원 관리 기법에서의 다양한 동적 비용들이 정확하게 추산이 될 수 있었다. 실험에서는 제안된 소프트웨어 기법이 태스크들의 동적 매핑과 체크-포인팅을 통한 프로세서 영구 고장을 효과적으로 감내할 수 있음을 보였다.Owing to the incessant technology improvement, the number of processors integrated into a single chip increases consistently, integrating more and more applications. Also, demand for higher computing capability for applications makes a many-core accelerator become an important computing resource in a system-on-chip. Efficient handling of the accelerator at run-time, however, is very challenging because the system status is subject to change dynamically by various factors. At the system level, the set of applications running concurrently may change according to user request. At the application level, the application behavior may change dynamically depending on input data or operation mode. At the architecture level, hardware resource availability may vary since hardware components may experience transient or permanent failures. In this thesis, to resolve the difficulties in handling many-core accelerator, three techniques are proposed. The first technique is the re-scheduling of the entire application to minimize throughput degradation under a latency constraint when a permanent processor failure occurs. Sub-optimal re-scheduling results using a genetic algorithm for each scenario of processor failures are obtained at compile-time. If a failure is detected at run-time, the live processors obtain the saved schedule, perform task transfer, and execute the remaining tasks of the current iteration. In this technique, preemptive and non-preemptive migration policies and a hybrid policy are proposed to obtain better performance. The viability of the proposed technique with real-life DSP applications as well as randomly generated graphs under timing constraints and random fault scenarios are shown through experiments. The second technique is a hybrid resource management scheme, expanded version of the first technique that also handles multi-applications specified as SDF graph and their relevant dynamisms such as application/task arrivals/ends as well as processor permanent failures. In the proposed technique, at design-time, throughput-maximized mappings of each SDF graph by varying the number of allocated processors are determined. Then, at run-time, the pre-computed mapping information is exploited to adjust the mapping of active applications to the processors without user intervention on the system status change. The proposed resource management is evaluated through intensive experiments with an in-house simulator built on top of Noxim, a Network-on-Chip simulator. Experimental results show the enhanced adaptability to dynamic system status change compared to other state-of-the-art approaches. Finally, the software platform for a homogeneous many-core architecture that implements the second technique is proposed to evaluate the system performance more accurately before SoC fabrication. Existing approaches usually use a high-level simulation model to estimate the performance without knowing how much actual performance will be deviated from the estimation. To overcome the limitation, the software platform is proposed and implementation details on a virtual prototyping system and on an emulation system realized with an Intel Xeon-Phi coprocessor are presented. Actual implementation enables us to investigate the overheads involved in the hybrid resource management technique in detail, which was not possible in high-level simulation. Experimental results confirm that the proposed software platform adapts to the dynamic workload variation effectively by dynamic mapping of tasks and tolerate unexpected core failures by check-pointing.Abstract i Contents iv List of Figures viii List of Tables xii Chapter 1 Introduction 1 1.1 Motivation . . . . . . . . . . . . 1 1.2 Contribution . . . . . . . . . . . . 5 1.3 Thesis Organization . . . . . . . . . . . 7 Chapter 2 Preliminaries 8 2.1 Application Model . . . . . . . . . . 8 2.2 Architecture Model . . . . . . . . . . 13 2.3 Fault Model . . . . . . . . . . . . 15 2.4 Thesis Overview . . . . . . . . . . . 15 Chapter 3 Fault-aware Task Mapping 17 3.1 Introduction . . . . . . . . . . . . 17 3.2 Related Work . . . . . . . . . . . . 20 3.2.1 Static Approach . . . . . . . . . . 21 3.2.2 Dynamic Approach . . . . . . . . . . 22 3.3 Proposed Task Remapping/Rescheduling Technique . . 23 3.3.1 Remapping Technique . . . . . . . . 23 3.3.2 Rescheduling Technique . . . . . . . . 31 3.4 Experiments . . . . . . . . . . . . . 38 3.4.1 Remapping Results . . . . . . . . 38 3.4.2 Rescheduling Results . . . . . . . . 46 Chapter 4 Fault-aware Resource Management 53 4.1 Introduction . . . . . . . . . . . . 53 4.2 Related Work . . . . . . . . . . . . 54 4.2.1 Static Approach . . . . . . . . . . 55 4.2.2 Dynamic Approach . . . . . . . . . 55 4.2.3 Hybrid Approach . . . . . . . . . . 57 4.2.4 Summary . . . . . . . . . . . . 57 4.3 Background . . . . . . . . . . . . . 58 4.3.1 Energy Model . . . . . . . . . . . 59 4.3.2 Notation . . . . . . . . . . . . 60 4.4 Proposed Resource Management Technique . . . . 61 4.4.1 Motivational Example . . . . . . . . . 61 4.4.2 Overall Procedure . . . . . . . . . . 65 4.4.3 Design-time Analysis . . . . . . . . . 66 4.4.4 Run-time Mapping . . . . . . . . . . 67 4.5 Experiments . . . . . . . . . . . . . 74 4.5.1 Setup . . . . . . . . . . . . . . 74 4.5.2 Analysis of Run-time Overheads . . . . . . 75 4.5.3 Comparison with Other Approaches . . . . 79 Chapter 5 Software Platform for Resource Management 86 5.1 Introduction . . . . . . . . . . . . 86 5.2 Related Work . . . . . . . . . . . . 87 5.3 Overall Structure . . . . . . . . . . . . 88 5.4 Components of Software Platform . . . . . . 89 5.4.1 Application API Layer . . . . . . . . . 89 5.4.2 Communication Interface Module . . . . . 92 5.4.3 Host Interface Layer . . . . . . . . . 93 5.4.4 Memory Management Module . . . . . . 94 5.4.5 Design-time Analysis . . . . . . . . . 94 5.4.6 Slave Manager . . . . . . . . . . . 98 5.5 Software Platform Implementation . . . . . . 99 5.5.1 Scheduling Information . . . . . . . . 100 5.5.2 Function Migration and Execution . . . . . 101 5.5.3 Function Migration and Execution . . . . . 102 5.6 Virtual Prototyping System . . . . . . . . 105 5.7 Xeon Emulation System . . . . . . . . . 106 5.8 Experiments . . . . . . . . . . . . . 107 5.8.1 Setup . . . . . . . . . . . . . . 107 5.8.2 Experiments on the Virtual Prototyping System . . 108 5.8.3 Experiments on the Xeon Emulation System . . . 111 Chapter 6 Conclusion 116 Bibliography 119 Abstract in Korean 130Docto

SNU Open Repository and Archive

Cross-layer fault tolerance in networks-on-chip

Author: Schley Gert
Publication venue
Publication date: 01/01/2018
Field of study

The design of Networks-on-Chip follows the Open Systems Interconnection (OSI) reference model. The OSI model defines strictly separated network abstraction layers and specifies their functionality. Each layer has layer-specific information about the network that can be exclusively accessed by the methods of the layer. Adhering to the strict layer boundaries, however, leads to methods of the individual layers working in isolation from each other. This lack of interaction between methods is disadvantageous for fault diagnosis and fault tolerance in Networks-on-Chip as it results in solutions that have a high effort in terms of the time and implementation costs required to deal with faults. For Networks-on-Chip cross-layer design is considered as a promising method to remedy these shortcomings. It removes the strict layer boundaries by the exchange of information between layers. This interaction enables methods of different layers to cooperate, and thus, deal with faults more efficiently. Furthermore, providing lower layer information to the software allows hardware methods to be implemented as software tasks resulting in a reduction of the hardware complexity. The goal of this dissertation is the investigation of cross-layer design for fault diagnosis and fault tolerance in Networks-on-Chip. For fault diagnosis a scheme is proposed that allows the interaction of protocol-based diagnosis of the transport layer with functional diagnosis of the network layer and structural diagnosis of the physical layer by exchanging diagnostic information. The techniques use this information for optimizing their own diagnosis process. For protocol-based diagnosis on the transport layer, a diagnosis protocol is proposed that is able to locate faulty links, switches, and crossbar connections. For this purpose, the technique utilizes available information of lower layers. As proof of concept for the proposed interaction scheme, the diagnosis protocol is combined with a functional and a structural diagnosis approach and the performance and diagnosis quality of the resulting combinations is investigated. The results show that the combinations of the diagnosis protocol with one of the lower layer techniques have a considerably reduced fault localization latency compared to the functional and the structural standalone techniques. This reduction, however, comes at the expense of a reduced diagnosis quality. In terms of fault tolerance, the focus of this dissertation is on the design and implementation of cross-layer approaches utilizing software methods to provide fault tolerance for network layer routings. Two approaches for different routings are presented. The requirements to provide information of lower layers to the software using the available Network-on-Chip resources and interfaces for data communication are discussed. The concepts of two mechanisms of the data link layer are presented for converting status information into communicable units and for preventing communication resources from being blocked. In the first approach, software-based packet rerouting is proposed. By incorporating information from different layers, this approach provides fault tolerance for deterministic network layer routings. As specialization of software-based rerouting, dimension-order XY rerouting is presented. In the second approach, a reconfigurable routing for Networks-on-Chip with logical hierarchy is proposed in which cross-layer interaction is used to enable hierarchical units to manage themselves autonomously and to reconfigure the routing. Both approaches are evaluated regarding their performance as well as their implementation costs. In a final study, the cross-layer diagnosis technique and cross-layer fault tolerance approaches are combined. The information obtained by the diagnosis technique is used by the fault tolerance approaches for packet rerouting or for routing reconfiguration. The combinations are evaluated regarding their impact on Networks-on-Chip performance. The results show that the crosslayer information exchange with software has a considerable impact on performance when the amount of information becomes too large. In case of crosslayer diagnosis, however, the impact on Networks-on-Chip performance is significantly lower compared to functional and structural diagnosis

Design Space Exploration and Resource Management of Multi/Many-Core Systems

Author
Publication venue: 'MDPI AG'
Publication date: 11/01/2022
Field of study

The increasing demand of processing a higher number of applications and related data on computing platforms has resulted in reliance on multi-/many-core chips as they facilitate parallel processing. However, there is a desire for these platforms to be energy-efficient and reliable, and they need to perform secure computations for the interest of the whole community. This book provides perspectives on the aforementioned aspects from leading researchers in terms of state-of-the-art contributions and upcoming trends

Directory of Open Access Books (DOAB)