Search CORE

406 research outputs found

Comparison of Different Methods Making Use of Backup Copies for Fault-Tolerant Scheduling on Embedded Multiprocessor Systems

Author: Casseau Emmanuel
Dobiáš Petr
Sinnen Oliver
Publication venue: HAL CCSD
Publication date: 09/10/2018
Field of study

International audienceAs transistors scale down, systems are more vulnerable to faults. Their reliability consequently becomes the main concern, especially in safety-critical applications such as automotive sector, aeronautics or nuclear plants. Many methods have already been introduced to conceive fault-tolerant systems and therefore improve the reliability. Nevertheless, several of them are not suitable for real-time embedded systems since they incur significant overheads, other methods may be less intrusive but at the cost of being too specific to a dedicated system. The aim of this paper is to analyse a method making use of two task copies when on-line scheduling tasks on multiprocessor systems. This method can guarantee the system reliability without causing too much overhead and requiring any special hardware components. In addition, it remains general and thus applicable to large amount of systems. Last but not least, this paper studies two techniques of processor allocation policies: the exhaustive search and the first found solution search. It is shown that the exhaustive search is not necessary for efficient fault-tolerant scheduling and that the latter search significantly reduces the computation complexity, which is interesting for embedded systems

INRIA a CCSD electronic archive server

Designing and Valuating System on Dependability Analysis of Cluster-Based Multiprocessor System

Author: Dr. Sudarson Jena
Pulicherla Radhika
Publication venue: Global Journals Inc. (US)
Publication date: 15/03/2020
Field of study

Analysis of dependability is a significant stage in structuring and examining the safety of protection systems and computer systems. The introduction of virtual machines and multiprocessors leads to increasing the faults of the system, particularly for the failures that are software- induced, affecting the overall dependability. Also, it is different for the successful operation of the safety system at any dynamic stage, since there is a tremendous distinction in the rate of failure among the failures that are induced by the software and the hardware. Thus this paper presents a review or different dependability analysis techniques employed in multiprocessor system

Global Journal of Computer Science and Technology (GJCST)

A Survey of Fault-Tolerance Techniques for Embedded Systems from the Perspective of Power, Energy, and Thermal Issues

Author: Ansari M.
Ejlali A.
Henkel J.
Hessabi S.
Khdr H.
Nazari P. G.
Safari S.
Yari-Karin S.
Yeganeh-Khaksar A.
Publication venue: Institute of Electrical and Electronics Engineers
Publication date: 02/02/2022
Field of study

The relentless technology scaling has provided a significant increase in processor performance, but on the other hand, it has led to adverse impacts on system reliability. In particular, technology scaling increases the processor susceptibility to radiation-induced transient faults. Moreover, technology scaling with the discontinuation of Dennard scaling increases the power densities, thereby temperatures, on the chip. High temperature, in turn, accelerates transistor aging mechanisms, which may ultimately lead to permanent faults on the chip. To assure a reliable system operation, despite these potential reliability concerns, fault-tolerance techniques have emerged. Specifically, fault-tolerance techniques employ some kind of redundancies to satisfy specific reliability requirements. However, the integration of fault-tolerance techniques into real-time embedded systems complicates preserving timing constraints. As a remedy, many task mapping/scheduling policies have been proposed to consider the integration of fault-tolerance techniques and enforce both timing and reliability guarantees for real-time embedded systems. More advanced techniques aim additionally at minimizing power and energy while at the same time satisfying timing and reliability constraints. Recently, some scheduling techniques have started to tackle a new challenge, which is the temperature increase induced by employing fault-tolerance techniques. These emerging techniques aim at satisfying temperature constraints besides timing and reliability constraints. This paper provides an in-depth survey of the emerging research efforts that exploit fault-tolerance techniques while considering timing, power/energy, and temperature from the real-time embedded systems’ design perspective. In particular, the task mapping/scheduling policies for fault-tolerance real-time embedded systems are reviewed and classified according to their considered goals and constraints. Moreover, the employed fault-tolerance techniques, application models, and hardware models are considered as additional dimensions of the presented classification. Lastly, this survey gives deep insights into the main achievements and shortcomings of the existing approaches and highlights the most promising ones

KITopen

Special session: Operating systems under test: An overview of the significance of the operating system in the resiliency of the computing continuum

Author: Bosio A.
Casseau E.
Di Carlo S.
Dobias P.
Kastensmidt F.
Rebaudengo M.
Rodrigues G. S.
Savino A.
Sinnen O.
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2021
Field of study

The computing continuum's actual trend is facing a growth in terms of devices with any degree of computational capability. Those devices may or may not include a full-stack, including the Operating System layer and the Application layer, or just facing pure bare-metal solutions. In either case, the reliability of the full system stack has to be guaranteed. It is crucial to provide data regarding the impact of faults at all system stack levels and potential hardening solutions to design highly resilient systems. While most of the work usually concentrates on the application reliability, the special session aims to provide a deep comprehension of the impact on the reliability of an embedded system when faults in the hardware substrate of the system stack surface at the Operating System layer. For this reason, we will cover a comparison from an application perspective when hardware faults happen in bare metal vs. real-time OS vs. general-purpose OS. Then we will go deeper within a FreeRTOS to evaluate the contribution of all parts of the OS. Eventually, the Special Session will propose some hardening techniques at the Operating System level by exploiting the scheduling capabilities

PORTO@iris (Publications Open Repository TOrino - Politecnico di Torino)

Comparison of Enhancing Methods for Primary/Backup Approach Meant for Fault Tolerant Scheduling

Author: Casseau Emmanuel
Dobiáš Petr
Sinnen Oliver
Publication venue: HAL CCSD
Publication date: 26/10/2021
Field of study

This report explores algorithms aiming at reducing the algorithm run-time and rejection rate when online scheduling tasks on real-time embedded systems consisting of several processors prone to fault occurrence. The authors introduce a new processor scheduling policy and propose new enhancing methods for the primary/backup approach and analyse their performances. The studied techniques are as follows: (i) the method of restricted scheduling windows within which the primary and backup copies can be scheduled, (ii) the method of limitation on the number of comparisons, accounting for the algorithm run-time, when scheduling a task on a system, and (iii) the method of several scheduling attempts. Last but not least, we inject faults to evaluate the impact on scheduling algorithms. Thorough experiments show that the best proposed method is based on the combination of the limitation on the number of comparisons and two scheduling attempts. When it is compared to the primary/backup approach without this method, the algorithm run-time is reduced by 23% (mean value) and 67% (maximum value) and the rejection rate is decreased by 4%. This improvement in the algorithm run-time is significant, especially for embedded systems dealing with hard real-time tasks. Finally, we found out that the studied algorithm performs well in a harsh environment

INRIA a CCSD electronic archive server

Contribution à l’ordonnancement dynamique, tolérant aux fautes, de tâches pour les systèmes embarqués temps-réel multiprocesseurs

Author: Dobiáš Petr
Publication venue: HAL CCSD
Publication date: 02/10/2020
Field of study

The thesis is concerned with online mapping and scheduling of tasks on multiprocessor embedded systems in order to improve the reliability subject to various constraints regarding e.g. time, or energy. To evaluate system performances, the number of rejected tasks, algorithm complexity and resilience assessed by injecting faults are analysed. The research was applied to: (i) the primary/backup approach technique, which is a fault tolerant one based on two task copies, and (ii) the scheduling algorithms for small satellites called CubeSats. The chief objective for the primary/backup approach is to analyse processor allocation strategies, devise novel enhancing scheduling methods and to choose one, which significantly reduces the algorithm run-time without worsening the system performances. Regarding CubeSats, the proposed idea is to gather all processors built into satellites on one board and design scheduling algorithms to make CubeSats more robust as to the faults. Two real CubeSat scenarios are analysed and it is found that it is useless to consider systems with more than six processors and that the presented algorithms perform well in a harsh environment and with energy constraints.La thèse se focalise sur le placement et l’ordonnancement dynamique des tâches sur les systèmes embarqués multiprocesseurs pour améliorer leur fiabilité tout en tenant compte des contraintes telles que le temps réel ou l’énergie. Afin d’évaluer les performances du système, le nombre de tâches rejetées, la complexité de l’algorithme et la résilience estimée en injectant des fautes sont principalement analysés. La recherche est appliquée (i) à l’approche de « primary/backup » qui est une technique de tolérance aux fautes basée sur deux copies d’une tâche et (ii) aux algorithmes de placement pour les petits satellites appelés CubeSats. Quant à l’approche de « primary/backup », l’objectif principal est d’étudier les stratégies d’allocation des processeurs, de proposer de nouvelles méthodes d’amélioration pour l’ordonnancement et d’en choisir une qui diminue considérablement la durée de l’exécution de l’algorithme sans dégrader les performances du système. En ce qui concerne les CubeSats, l’idée est de regrouper tous les processeurs à bord et de concevoir des algorithmes d’ordonnancement afin de rendre les CubeSats plus robustes. Les scénarios provenant de deux CubeSats réels sont étudiés et les résultats montrent qu’il est inutile de considérer les systèmes ayant plus de six processeurs et que les algorithmes proposés fonctionnent bien même avec des capacités énergétiques limitées et dans un environnement hostile

INRIA a CCSD electronic archive server

Fault Tolerant Real Time Dynamic Scheduling Algorithm For Heterogeneous Distributed System

Author: Ekka A A
Publication venue
Publication date: 01/01/2007
Field of study

Fault-tolerance becomes an important key to establish dependability in Real Time Distributed Systems (RTDS). In fault-tolerant Real Time Distributed systems, detection of fault and its recovery should be executed in timely manner so that in spite of fault occurrences the intended output of real-time computations always take place on time. Hardware and software redundancy are well-known e ective methods for faulttolerance, where extra hard ware (e.g., processors, communication links) and software (e.g., tasks, messages) are added into the system to deal with faults. Performances of RTDS are mostly guided by eciency of scheduling algorithm and schedulability analysis are performed on the system to ensure the timing constrains. This thesis examines the scenarios where a real time system requires very little redundant hardware resources to tolerate failures in heterogeneous real time distributed systems with point-to-point communication links. Fault tolerance can be achieved by..

ethesis@nitr

Space Station Freedom data management system growth and evolution report

Author: Bartlett R.
Davis G.
Gibson J.
Grant T. L.
Hedges R.
Johnson M. J.
Liu Y. K.
Patterson-Hine A.
Sliwa N.
Sowizral H.
Publication venue
Publication date
Field of study

The Information Sciences Division at the NASA Ames Research Center has completed a 6-month study of portions of the Space Station Freedom Data Management System (DMS). This study looked at the present capabilities and future growth potential of the DMS, and the results are documented in this report. Issues have been raised that were discussed with the appropriate Johnson Space Center (JSC) management and Work Package-2 contractor organizations. Areas requiring additional study have been identified and suggestions for long-term upgrades have been proposed. This activity has allowed the Ames personnel to develop a rapport with the JSC civil service and contractor teams that does permit an independent check and balance technique for the DMS

NASA Technical Reports Server