Timing failures refer to situations in which the environment in which a system operates does not behave as expected with respect to the timing assumptions; that is, the timing constraints are not met. In the immense body of work on the design of fault-tolerant systems, the types of failures usually considered are process failures, link failures, message loss, and memory failures, and it is usually (implicitly) assumed that there are no timing failures. In this paper we investigate the ability to recover automatically from transient timing failures. We introduce and formally define the concept of algorithms that are resilient to timing failures, and demonstrate the importance of the new concept by presenting consensus and mutual exclusion algorithms, using atomic registers only, that are resilient to timing failures.