Search CORE

245 research outputs found

Synthesis of Fault-Tolerant Embedded Systems with Checkpointing and Replication

Author: Eles Petru
Izosimov Viacheslav
Peng Zebo
Pop Paul
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2006
Field of study

We present an approach to the synthesis of fault-tolerant hard real-time systems for safety-critical applications. We use checkpointing with rollback recovery and active replication for tolerating transient faults. Processes are statically scheduled and communications are performed using the time-triggered protocol. Our synthesis approach decides the assignment of fault-tolerance policies to processes, the optimal placement of checkpoints and the mapping of processes to processors such that transient faults are tolerated and the timing constraints of the application are satisfied. We present several synthesis algorithms which are able to find fault-tolerant implementations given a limited amount of resources. The developed algorithms are evaluated using extensive experiments, including a real-life example. 1

CiteSeerX

Online Research Database In Technology

Synthesis of Fault-Tolerant Embedded Systems

Author: Eles Petru
Izosimov Viacheslav
Peng Zebo
Pop Paul
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2008
Field of study

This work addresses the issue of design optimization for faulttolerant hard real-time systems. In particular, our focus is on the handling of transient faults using both checkpointing with rollback recovery and active replication. Fault tolerant schedules are generated based on a conditional process graph representation. The formulated system synthesis approaches decide the assignment of fault-tolerance policies to processes, the optimal placement of checkpoints and the mapping of processes to processors, such that multiple transient faults are tolerated, transparency requirements are considered, and the timing constraints of the application are satisfied. 1

CiteSeerX

Crossref

Online Research Database In Technology

Design Optimization of Time- and Cost-Constrained Fault-Tolerant Embedded Systems with Checkpointing and Replication

Author: Eles Petru
Izosimov Viacheslav
Peng Zebo
Pop Paul
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2009
Field of study

We present an approach to the synthesis of fault-tolerant hard real-time systems for safety-critical applications. We use checkpointing with rollback recovery and active replication for tolerating transient faults. Processes and communications are statically scheduled. Our synthesis approach decides the assign-ment of fault-tolerance policies to processes, the optimal place-ent of checkpoints and the mapping of processes to processors such that multiple transient faults are tolerated and the timing con-straints of the application are satisfied. We present several design optimization approaches which are able to find fault-tolerant im-plementations given a limited amount of resources. The developed algorithms are evaluated using extensive experiments, including a real-life example

CiteSeerX

Online Research Database In Technology

A Survey of Fault-Tolerance Techniques for Embedded Systems from the Perspective of Power, Energy, and Thermal Issues

Author: Ansari M.
Ejlali A.
Henkel J.
Hessabi S.
Khdr H.
Nazari P. G.
Safari S.
Yari-Karin S.
Yeganeh-Khaksar A.
Publication venue: Institute of Electrical and Electronics Engineers
Publication date: 02/02/2022
Field of study

The relentless technology scaling has provided a significant increase in processor performance, but on the other hand, it has led to adverse impacts on system reliability. In particular, technology scaling increases the processor susceptibility to radiation-induced transient faults. Moreover, technology scaling with the discontinuation of Dennard scaling increases the power densities, thereby temperatures, on the chip. High temperature, in turn, accelerates transistor aging mechanisms, which may ultimately lead to permanent faults on the chip. To assure a reliable system operation, despite these potential reliability concerns, fault-tolerance techniques have emerged. Specifically, fault-tolerance techniques employ some kind of redundancies to satisfy specific reliability requirements. However, the integration of fault-tolerance techniques into real-time embedded systems complicates preserving timing constraints. As a remedy, many task mapping/scheduling policies have been proposed to consider the integration of fault-tolerance techniques and enforce both timing and reliability guarantees for real-time embedded systems. More advanced techniques aim additionally at minimizing power and energy while at the same time satisfying timing and reliability constraints. Recently, some scheduling techniques have started to tackle a new challenge, which is the temperature increase induced by employing fault-tolerance techniques. These emerging techniques aim at satisfying temperature constraints besides timing and reliability constraints. This paper provides an in-depth survey of the emerging research efforts that exploit fault-tolerance techniques while considering timing, power/energy, and temperature from the real-time embedded systems’ design perspective. In particular, the task mapping/scheduling policies for fault-tolerance real-time embedded systems are reviewed and classified according to their considered goals and constraints. Moreover, the employed fault-tolerance techniques, application models, and hardware models are considered as additional dimensions of the presented classification. Lastly, this survey gives deep insights into the main achievements and shortcomings of the existing approaches and highlights the most promising ones

KITopen

Synthesis of fault-tolerant embedded systems

Author
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2008
Field of study

Crossref

Task Migration for Fault-Tolerance in Mixed-Criticality Embedded Systems

Author: Madsen Jan
Pop Paul
Saraswat Prabhat Kumar
Publication venue
Publication date: 01/01/2009
Field of study

In this paper we are interested in mixed-criticality embed-ded applications implemented on distributed architectures. Depending on their time-criticality, tasks can be hard or soft real-time and regarding safety-criticality, tasks can be fault-tolerant to transient faults, permanent faults, or have no dependability requirements. We use Earliest Deadline First (EDF) scheduling for the hard tasks and the Constant Bandwidth Server (CBS) for the soft tasks. The CBS pa-rameters determine the quality of service (QoS) of soft tasks. Transient faults are tolerated using checkpointing with roll-back recovery. For tolerating permanent faults in proces-sors, we use task migration, i.e., restarting the safety-critical tasks on other processors. We propose a Greedy-based on-line heuristic for the migration of safety-critical tasks, in response to permanent faults, and the adjustment of CBS parameters on the target processors, such that the faults are tolerated, the deadlines for the hard real-time tasks are sat-isfied and the QoS for soft tasks is maximized. The proposed online adaptive approach has been evaluated using several synthetic benchmarks and a real-life case study. 1

CiteSeerX

Online Research Database In Technology

Design Optimization of Time- and Cost-Constrained Fault-Tolerant Distributed Embedded Systems

Author: Eles Petru
Izosimov Viacheslav
Peng Zebo
Pop Paul
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2005
Field of study

Submitted on behalf of EDAA (http://www.edaa.com/)International audienceIn this paper we present an approach to the design optimization of fault-tolerant embedded systems for safety-critical applications. Processes are statically scheduled and communications are performed using the time-triggered protocol. We use process re-execution and replication for tolerating transient faults. Our design optimization approach decides the mapping of processes to processors and the assignment of fault-tolerant policies to processes such that transient faults are tolerated and the timing constraints of the application are satisfied. We present several heuristics which are able to find fault-tolerant implementations given a limited amount of resources. The developed algorithms are evaluated using extensive experiments, including a real-life example

Crossref

Online Research Database In Technology

Mapping of Fault-Tolerant Applications with Transparency on Distributed Embedded Systems

Author: Eles Petru
Izosimov Viacheslav
Peng Zebo
Pop Paul
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2006
Field of study

Online Research Database In Technology

A Constraint Logic Programming Framework for the Synthesis of Fault-Tolerant Schedules for Distributed Embedded Systems

Author: Izosimov Viacheslav
Pop Paul
Poulsen Kåre Harbo
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2007
Field of study

Archivio della ricerca - Università degli studi di Napoli Federico II

Crossref

Online Research Database In Technology

Shadow replication: An energy-aware, fault-tolerant computational model for green cloud computing

Author: Cui X
Melhem R
Mills B
Znati T
Publication venue: 'MDPI AG'
Publication date: 01/01/2014
Field of study

As the demand for cloud computing continues to increase, cloud service providers face the daunting challenge to meet the negotiated SLA agreement, in terms of reliability and timely performance, while achieving cost-effectiveness. This challenge is increasingly compounded by the increasing likelihood of failure in large-scale clouds and the rising impact of energy consumption and CO2 emission on the environment. This paper proposes Shadow Replication, a novel fault-tolerance model for cloud computing, which seamlessly addresses failure at scale, while minimizing energy consumption and reducing its impact on the environment. The basic tenet of the model is to associate a suite of shadow processes to execute concurrently with the main process, but initially at a much reduced execution speed, to overcome failures as they occur. Two computationally-feasible schemes are proposed to achieve Shadow Replication. A performance evaluation framework is developed to analyze these schemes and compare their performance to traditional replication-based fault tolerance methods, focusing on the inherent tradeoff between fault tolerance, the specified SLA and profit maximization. The results show that Shadow Replication leads to significant energy reduction, and is better suited for compute-intensive execution models, where up to 30% more profit increase can be achieved due to reduced energy consumption

Multidisciplinary Digital Publishing Institute

Directory of Open Access Journals

D-Scholarship@Pitt