
Distributed real-time fault tolerance in a virtualized separation kernel

Author: Missimer Eric
Publication date: 14/02/2018

Computers are increasingly deployed in scenarios where a computer error could cause the loss of human life or significant financial loss. Fault-tolerant techniques must be employed to prevent such an error from escalating into a failure that causes these losses. Two types of errors that are common in real-time and embedded systems are soft errors, i.e., data bit corruption, and timing errors, such as missed deadlines. Purely software-based techniques for addressing these errors have the advantage of not requiring specialized hardware and can run on more readily available commercial off-the-shelf hardware. Timing errors are addressed using Adaptive Mixed-Criticality, a scheduling technique in which higher-criticality tasks are given precedence over lower-criticality ones when the schedulability of all tasks cannot be guaranteed. While mixed-criticality scheduling has gained attention in recent years, most approaches assume a periodic task model and a single system-wide criticality level that dictates the budget available to all tasks. In practice these assumptions do not hold: different types of tasks are better served by different scheduling approaches, and only a subset of high-criticality tasks might require additional capacity to meet their deadlines, for example when a process has experienced a fault and needs extra capacity to perform recovery. In this thesis, soft errors are addressed using a novel real-time fault tolerance method based on a virtualized separation kernel. Instead of executing redundant copies of an application on separate machines, the applications are consolidated onto one multi-core processor and partitioned using hardware virtualization extensions. This allows new recovery schemes to be explored. In addition, the maximum recovery time is bounded tightly enough to ensure that recovery completes in a timely manner without affecting the normal execution of the application.
Combining a virtualized separation kernel with Adaptive Mixed-Criticality techniques yields a fault-tolerant system that predictably detects and recovers from timing and soft errors.
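The idea of giving higher-criticality tasks precedence can be made concrete with the widely used AMC-rtb response-time test for fixed-priority mixed-criticality scheduling. The following is a minimal sketch under assumed conditions: two criticality levels, implicit deadlines equal to periods, integer time units, and tasks listed from highest to lowest priority. The `Task` fields and function names are hypothetical, and the sketch models the conventional single system-wide mode switch that the thesis argues is too restrictive, not the thesis's own per-task scheme.

```python
import math
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    period: int   # T; deadline assumed implicit (D = T)
    c_lo: int     # LO-mode WCET budget
    c_hi: int     # HI-mode WCET budget (set equal to c_lo for LO tasks)
    hi: bool      # True if the task is HI-criticality


def _response_time(base, interference, deadline):
    """Fixed-point iteration R = base + interference(R); None if R exceeds the deadline."""
    r = base
    while r <= deadline:
        nxt = base + interference(r)
        if nxt == r:
            return r
        r = nxt
    return None


def amc_rtb(tasks):
    """AMC-rtb test; `tasks` must be ordered from highest to lowest priority."""
    for i, t in enumerate(tasks):
        hp = tasks[:i]
        # LO mode: every higher-priority task interferes with its LO budget.
        r_lo = _response_time(
            t.c_lo,
            lambda r: sum(math.ceil(r / j.period) * j.c_lo for j in hp),
            t.period,
        )
        if r_lo is None:
            return False
        if not t.hi:
            continue  # LO tasks are abandoned after a mode switch
        # After the switch, LO tasks can only have been released before R_LO,
        # while HI tasks interfere with their larger HI budgets.
        lo_part = sum(math.ceil(r_lo / k.period) * k.c_lo for k in hp if not k.hi)
        r_hi = _response_time(
            t.c_hi + lo_part,
            lambda r: sum(math.ceil(r / j.period) * j.c_hi for j in hp if j.hi),
            t.period,
        )
        if r_hi is None:
            return False
    return True


# Hypothetical task set, highest priority first.
example = [
    Task("t1", period=10, c_lo=2, c_hi=4, hi=True),
    Task("t2", period=10, c_lo=3, c_hi=3, hi=False),
    Task("t3", period=20, c_lo=3, c_hi=6, hi=True),
]
print(amc_rtb(example))  # True: t3's HI-mode response time converges to 17 <= 20
```

Raising `t3`'s HI budget to 10 pushes its HI-mode response time past 20 and the test fails, which is exactly the situation where a per-task rather than system-wide notion of extra capacity becomes attractive.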

Boston University Institutional Repository (OpenBU)

ATMP: An Adaptive Tolerance-based Mixed-criticality Protocol for Multi-core Systems

Author: Iacovelli Saverio
Kirner Raimund
Menon Catherine
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 08/06/2018

© 2018 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.

The challenge of mixed-criticality scheduling is to keep tasks of higher criticality running in case of resource shortages caused by faults. Traditionally, mixed-criticality scheduling has focused on methods to handle faults where tasks overrun their optimistic worst-case execution time (WCET) estimate. In this paper we present the Adaptive Tolerance-based Mixed-criticality Protocol (ATMP), which generalises the concept of mixed-criticality scheduling to also handle faults of other kinds, such as the failure of cores in a multi-core system. ATMP is an adaptation method triggered by resource shortage at runtime. The first step of ATMP is to re-partition the tasks onto the available cores, and the second step is to optimise the utility at each core using the tolerance-based real-time computing model (TRTCM). The evaluation shows that the utility optimisation of ATMP can achieve a smoother degradation of service compared to simply abandoning tasks.
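The first ATMP step, re-partitioning tasks onto the cores that survive a failure, is essentially a bin-packing problem. The sketch below uses first-fit decreasing by utilization with a per-core capacity of 1.0 (as under partitioned EDF) as a stand-in heuristic; the abstract does not state which partitioning method ATMP uses, and the TRTCM utility optimisation of the second step is not modelled. Task names and utilizations are hypothetical.

```python
def repartition(utils, n_cores, capacity=1.0, eps=1e-9):
    """First-fit decreasing: place each task on the first core with enough spare capacity.

    utils: dict mapping task name -> utilization (C/T).
    Returns (assignment, loads, unplaced).
    """
    loads = [0.0] * n_cores
    assignment = {}
    unplaced = []
    # Largest tasks first; sorted() is stable, so ties keep insertion order.
    for task, u in sorted(utils.items(), key=lambda kv: kv[1], reverse=True):
        for core in range(n_cores):
            if loads[core] + u <= capacity + eps:
                loads[core] += u
                assignment[task] = core
                break
        else:
            # No core can host the task. ATMP would degrade utility here
            # instead of simply dropping it (not modelled in this sketch).
            unplaced.append(task)
    return assignment, loads, unplaced


utils = {"a": 0.6, "b": 0.5, "c": 0.5, "d": 0.4, "e": 0.4, "f": 0.3}
print(repartition(utils, 3)[2])  # []: all tasks still fit on 3 cores
print(repartition(utils, 2)[2])  # ['e', 'f']: losing another core causes a shortage
```

The `unplaced` list is the point where a plain mixed-criticality scheme would abandon tasks, whereas ATMP's second step tries to redistribute utility instead.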

Crossref

University of Hertfordshire Research Archive


Towards a Fault-tolerant, Scheduling Methodology for Safety-critical Certified Information Systems

Author: Lin Jian
Publication venue: CSUSB ScholarWorks
Publication date: 01/01/2019

Today, many critical information systems execute safety-critical and non-safety-critical functions on the same platform in order to reduce design and implementation costs. The safety-critical functionality is subject to certification requirements, while the remaining functionality either does not need to be certified or is certified to a lower level. The resulting mixed-criticality systems are challenging to design, especially when the critical tasks must complete within a timing constraint. This paper studies the problem of scheduling a mixed-criticality system with fault tolerance. A fault-recovery technique called checkpointing is used, in which a program can roll back to a recent checkpoint and re-execute when errors occur. A novel schedulability test is derived to ensure that the safety-critical tasks complete before their deadlines, and its theoretical correctness is shown.
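The timing cost that any checkpoint-aware schedulability test must account for can be illustrated with a simple, commonly used model: n equidistant checkpoints, each costing `overhead`, and at most k faults, each forcing re-execution of one segment plus its checkpoint. This is an assumed illustrative model, not the schedulability test derived in the paper, and the function names are hypothetical.

```python
import math


def checkpointed_wcet(c, n, overhead, k):
    """Worst-case execution time with n equidistant checkpoints and up to k faults.

    Each fault rolls back to the last checkpoint, so it re-executes at most one
    segment of length c/n and re-takes that segment's checkpoint.
    """
    segment = c / n
    return c + n * overhead + k * (segment + overhead)


def best_checkpoint_count(c, overhead, k, n_max=1000):
    """Integer n minimising checkpointed_wcet (brute force over 1..n_max)."""
    return min(range(1, n_max + 1), key=lambda n: checkpointed_wcet(c, n, overhead, k))


c, overhead, k = 100, 1, 1
n = best_checkpoint_count(c, overhead, k)
print(n, checkpointed_wcet(c, n, overhead, k))  # 10 121.0
print(math.sqrt(k * c / overhead))              # 10.0, the continuous optimum sqrt(kC/O)
```

Under this model a safety-critical task would be checked for schedulability with the inflated value (121 here) rather than its fault-free WCET of 100.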

CSUSB ScholarWorks

FANTOM: Fault Tolerant Task-Drop Aware Scheduling for Mixed-Criticality Systems

Author: Ejlali Alireza
Kumar Akash
Ranjbar Behnaz
Safaei Bardia
Publication venue: Institute of Electrical and Electronics Engineers
Publication date: 27/01/2021

Mixed-Criticality (MC) systems have emerged as an effective solution in various industries, where multiple tasks with various real-time and safety requirements (different levels of criticality) are integrated onto a common hardware platform. In these systems, a fault may occur for different reasons, e.g., hardware defects, software errors, or the arrival of unexpected events. To tolerate faults in MC systems, re-execution is typically employed. Re-execution may cause high-criticality tasks (HCTs) to overrun, which in turn necessitates dropping low-criticality tasks (LCTs) or degrading their quality. However, frequent drops or relatively long execution times of LCTs (especially mission-critical tasks) are not always desirable and may negatively affect the performance or functionality of MC systems. This article therefore proposes a realistic MC task model and develops a design-time, task-drop-aware schedulability analysis based on the Earliest Deadline First with Virtual Deadline (EDF-VD) algorithm. Under this analysis and the proposed scheduling policy for the new MC task model, when an HCT overruns and the system switches to the high-criticality (HI) mode, the number of drops per LCT is prevented from exceeding a predefined threshold. In addition, a corresponding scheduling mechanism has been developed to guarantee the real-time constraints and safety requirements of MC tasks in the presence of faults (transient faults are assumed in this article). According to the results of an extensive set of simulations, validated through a realistic avionic application, the proposed method improves the acceptance ratio by up to 43.9% compared to the state of the art.
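FANTOM's analysis builds on EDF-VD. For reference, the classic EDF-VD utilization test for dual-criticality, implicit-deadline task sets is sketched below: in LO mode, HI tasks run with virtual deadlines scaled by a factor x so that enough slack remains after a mode switch. This is only the base test; FANTOM's per-LCT drop threshold and its fault model are not represented, and the `(u_lo, u_hi, is_hi)` tuple format is an assumption for illustration.

```python
def edf_vd_test(tasks):
    """Classic EDF-VD test for dual-criticality implicit-deadline tasks.

    tasks: list of (u_lo, u_hi, is_hi) utilizations; LO tasks only need u_lo.
    Returns (schedulable, x) where x scales HI-task deadlines in LO mode.
    """
    u_lo_lo = sum(u_lo for u_lo, _, hi in tasks if not hi)   # LO tasks, LO budgets
    u_hi_lo = sum(u_lo for u_lo, _, hi in tasks if hi)       # HI tasks, LO budgets
    u_hi_hi = sum(u_hi for _, u_hi, hi in tasks if hi)       # HI tasks, HI budgets

    if u_lo_lo + u_hi_hi <= 1.0:
        return True, 1.0           # plain EDF with real deadlines suffices
    if u_lo_lo >= 1.0:
        return False, None
    x = u_hi_lo / (1.0 - u_lo_lo)  # smallest x that keeps LO mode feasible
    if x > 1.0:
        return False, None
    return x * u_lo_lo + u_hi_hi <= 1.0, x   # HI-mode condition


ok, x = edf_vd_test([(0.5, 0.5, False), (0.3, 0.6, True)])
print(ok, round(x, 3))  # True 0.6
print(edf_vd_test([(0.6, 0.6, False), (0.35, 0.7, True)])[0])  # False
```

In the second task set the LO task consumes too much capacity for any virtual-deadline factor to leave room for the HI budget; this is the kind of set where dropping LO jobs, and bounding how often that happens, becomes necessary.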

KITopen

Software Fault Tolerance in Real-Time Systems: Identifying the Future Research Questions

Author: FEDERICO REGHENZANI
WILLIAM FORNACIARI
ZHISHAN GUO
Publication venue: Association for Computing Machinery (ACM)
Publication date: 01/01/2023

Tolerating hardware faults in modern architectures is becoming a prominent problem due to the miniaturization of hardware components, their increasing complexity, and the need to reduce costs. Software-Implemented Hardware Fault Tolerance approaches have been developed to improve system dependability against hardware faults without resorting to custom hardware solutions. However, they make the timing constraints of applications/activities harder to satisfy from a scheduling standpoint. This paper surveys the current state of the art of fault tolerance approaches when used in the context of real-time systems, identifying the main challenges and the cross-links between these two topics. We propose a joint scheduling-failure analysis model that highlights the formal interactions between software fault tolerance mechanisms and timing properties. This model allows us to present and discuss many open research questions, with the ultimate aim of spurring future research.
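A common software-implemented hardware fault tolerance technique is temporal redundancy with majority voting: run the same computation several times and vote on the results. The sketch below shows both the masking effect and why it complicates scheduling, since the task's WCET grows to roughly the number of replicas times its original WCET plus the voting cost. The function names and the injected bit flip are hypothetical illustrations, not a mechanism taken from the survey.

```python
from collections import Counter


def tmr(fn, *args, replicas=3):
    """Run fn `replicas` times and return the majority result (results must be hashable)."""
    results = [fn(*args) for _ in range(replicas)]
    value, votes = Counter(results).most_common(1)[0]
    if votes <= replicas // 2:
        raise RuntimeError("no majority: fault cannot be masked")
    return value


def tmr_wcet(c, vote_cost, replicas=3):
    """Timing cost the scheduler must reserve: every replica plus the vote."""
    return replicas * c + vote_cost


calls = [0]  # mutable call counter used to inject a single fault


def faulty_square(x):
    """Squares x, but flips bit 3 of the result on the second call (simulated soft error)."""
    calls[0] += 1
    r = x * x
    if calls[0] == 2:
        r ^= 1 << 3
    return r


print(tmr(faulty_square, 7))       # 49: the corrupted replica (57) is out-voted
print(tmr_wcet(c=5, vote_cost=1))  # 16: a 5-unit task now needs 16 units
```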

Archivio istituzionale della ricerca - Politecnico di Milano

A Survey of Fault-Tolerance Techniques for Embedded Systems from the Perspective of Power, Energy, and Thermal Issues

Author: Ansari M.
Ejlali A.
Henkel J.
Hessabi S.
Khdr H.
Nazari P. G.
Safari S.
Yari-Karin S.
Yeganeh-Khaksar A.
Publication venue: Institute of Electrical and Electronics Engineers
Publication date: 02/02/2022

Relentless technology scaling has provided a significant increase in processor performance, but it has also had adverse impacts on system reliability. In particular, technology scaling increases the processor's susceptibility to radiation-induced transient faults. Moreover, technology scaling combined with the end of Dennard scaling increases power densities, and thereby temperatures, on the chip. High temperature, in turn, accelerates transistor aging mechanisms, which may ultimately lead to permanent faults on the chip. To assure reliable system operation despite these potential reliability concerns, fault-tolerance techniques have emerged. Specifically, fault-tolerance techniques employ some form of redundancy to satisfy specific reliability requirements. However, integrating fault-tolerance techniques into real-time embedded systems makes it harder to preserve timing constraints. As a remedy, many task mapping/scheduling policies have been proposed that integrate fault-tolerance techniques and enforce both timing and reliability guarantees for real-time embedded systems. More advanced techniques additionally aim to minimize power and energy while satisfying timing and reliability constraints. Recently, some scheduling techniques have started to tackle a new challenge: the temperature increase induced by employing fault-tolerance techniques. These emerging techniques aim to satisfy temperature constraints in addition to timing and reliability constraints. This paper provides an in-depth survey of the emerging research efforts that exploit fault-tolerance techniques while considering timing, power/energy, and temperature from the perspective of real-time embedded system design. In particular, the task mapping/scheduling policies for fault-tolerant real-time embedded systems are reviewed and classified according to their considered goals and constraints.
Moreover, the employed fault-tolerance techniques, application models, and hardware models are considered as additional dimensions of the presented classification. Lastly, this survey gives deep insights into the main achievements and shortcomings of the existing approaches and highlights the most promising ones.
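The tension between redundancy and timing described above can be seen in a simple re-execution model. The sketch assumes that transient faults arrive as a Poisson process with a constant rate and that executions fail independently; both are simplifying assumptions, since the surveyed work also considers temperature- and voltage-dependent fault rates. Each extra re-execution raises reliability but adds a full WCET to the time the scheduler must reserve. Names and numbers are illustrative.

```python
import math


def failure_prob(wcet, fault_rate):
    """P(at least one transient fault during one execution), assuming Poisson arrivals."""
    return 1.0 - math.exp(-fault_rate * wcet)


def min_reexecutions(wcet, fault_rate, target, k_max=10):
    """Smallest k such that 1 + k executions reach the target reliability."""
    p = failure_prob(wcet, fault_rate)
    for k in range(k_max + 1):
        if 1.0 - p ** (k + 1) >= target:  # the task fails only if every copy fails
            return k
    return None


wcet, rate = 10.0, 1e-5  # ms and faults per ms (illustrative values)
k = min_reexecutions(wcet, rate, target=1 - 1e-9)
print(k, (k + 1) * wcet)  # 2 30.0: three executions must fit before the deadline
```

Reserving 30 ms for a 10 ms task is exactly the overhead that the power-, energy-, and thermal-aware policies in the survey try to reduce or place on cooler, less loaded cores.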

KITopen

CSP channels for CAN-bus connected embedded control systems

Author: Broenink Jan F.
Orlic Bojan
Publication venue: STW Technology Foundation
Publication date: 01/01/2002

A closed-loop control system typically contains a multitude of sensors and actuators operating simultaneously, so it is inherently parallel and distributed. However, mapping this parallelism to software raises many obstacles concerning multithreaded communication and synchronization. To overcome this problem, the CT kernel/library based on the CSP algebra has been developed. This project (TES.5410) develops a communication extension to the CT library to make it applicable to distributed systems. Since the library is tailored for control systems, the properties and requirements of control systems are given special consideration. The applicability of existing middleware solutions is examined. Applicable fieldbus protocols are compared to determine the most suitable ones, and CAN is chosen as the first fieldbus to be supported. A brief overview of CSP and existing CSP-based libraries is given. A middleware architecture is proposed, along with a few novel ideas.
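CSP channels are synchronous: a writer blocks until a reader has taken the value (a rendezvous). The sketch below reproduces that semantics for threads on a single node using Python's `threading.Condition`. It illustrates the CSP channel concept only; it is not the CT library's API, and it does not show how the project maps channels onto CAN frames. The class and method names are hypothetical.

```python
import threading


class Channel:
    """Unbuffered CSP-style channel: write() returns only after a reader has taken the value."""

    def __init__(self):
        self._cond = threading.Condition()
        self._item = None
        self._full = False   # a writer has deposited a value
        self._taken = False  # the deposited value has been consumed

    def write(self, value):
        with self._cond:
            while self._full:            # another writer is mid-rendezvous
                self._cond.wait()
            self._item, self._full, self._taken = value, True, False
            self._cond.notify_all()
            while not self._taken:       # block until a reader arrives
                self._cond.wait()
            self._full = False           # rendezvous complete; free the slot
            self._cond.notify_all()

    def read(self):
        with self._cond:
            while not self._full or self._taken:
                self._cond.wait()
            value = self._item
            self._taken = True
            self._cond.notify_all()      # release the blocked writer
            return value


ch = Channel()
received = []


def consumer():
    for _ in range(3):
        received.append(ch.read())


t = threading.Thread(target=consumer)
t.start()
for v in (1, 2, 3):
    ch.write(v)
t.join()
print(received)  # [1, 2, 3]
```

Extending such a channel across nodes means the rendezvous must be completed over the bus, which is where the fieldbus properties compared in the project come into play.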

University of Twente Research Information

A Survey of Research into Mixed Criticality Systems

Author: Burns Alan
Davis Robert Ian
Publication venue: Association for Computing Machinery (ACM)
Publication date: 22/11/2017

This survey covers research into mixed criticality systems that has been published since Vestal’s seminal paper in 2007, up until the end of 2016. The survey is organised along the lines of the major research areas within this topic. These include single-processor analysis (including fixed-priority and EDF scheduling, shared resources, and static and synchronous scheduling), multiprocessor analysis, realistic models, and systems issues. The survey also explores the relationship between research into mixed criticality systems and other topics such as hard and soft time constraints, fault-tolerant scheduling, hierarchical scheduling, cyber-physical systems, probabilistic real-time systems, and industrial safety standards.

White Rose Research Online