10,334 research outputs found
Rapid Recovery for Systems with Scarce Faults
Our goal is to achieve a high degree of fault tolerance through the control
of a safety critical systems. This reduces to solving a game between a
malicious environment that injects failures and a controller who tries to
establish a correct behavior. We suggest a new control objective for such
systems that offers a better balance between complexity and precision: we seek
systems that are k-resilient. In order to be k-resilient, a system needs to be
able to rapidly recover from a small number, up to k, of local faults
infinitely many times, provided that blocks of up to k faults are separated by
short recovery periods in which no fault occurs. k-resilience is a simple but
powerful abstraction from the precise distribution of local faults, but much
more refined than the traditional objective to maximize the number of local
faults. We argue why we believe this to be the right level of abstraction for
safety critical systems when local faults are few and far between. We show that
the computational complexity of constructing optimal control with respect to
resilience is low and demonstrate the feasibility through an implementation and
experimental results.Comment: In Proceedings GandALF 2012, arXiv:1210.202
Recommended from our members
Protective wrapping of off-the-shelf components
System designers using off-the-shelf components (OTSCs), whose internals they cannot change, often use add-on “wrappers” to adapt the OTSCs’ behaviour as required. In most cases, wrappers are used to change “functional” properties of the components they wrap. In this paper we discuss instead protective wrapping, the use of wrappers to improve the dependability – i.e., “non-functional” properties like availability, reliability, security, and/or safety – of a component and thus of a system. Wrappers can improve dependability by adding fault tolerance, e.g. graceful degradation, or error recovery mechanisms. We discuss the rational specification of such protective wrappers in view of system dependability requirements, and highlight some of the design trade-offs and uncertainties that affect system design with OTSCs and wrappers, and that differentiate it from other forms of fault-tolerant design
Design diversity: an update from research on reliability modelling
Diversity between redundant subsystems is, in various forms, a common design approach for improving system dependability. Its value in the case of software-based systems is still controversial. This paper gives an overview of reliability modelling work we carried out in recent projects on design diversity, presented in the context of previous knowledge and practice. These results provide additional insight for decisions in applying diversity and in assessing diverseredundant systems. A general observation is that, just as diversity is a very general design approach, the models of diversity can help conceptual understanding of a range of different situations. We summarise results in the general modelling of common-mode failure, in inference from observed failure data, and in decision-making for diversity in development.
Automatic Software Repair: a Bibliography
This article presents a survey on automatic software repair. Automatic
software repair consists of automatically finding a solution to software bugs
without human intervention. This article considers all kinds of repairs. First,
it discusses behavioral repair where test suites, contracts, models, and
crashing inputs are taken as oracle. Second, it discusses state repair, also
known as runtime repair or runtime recovery, with techniques such as checkpoint
and restart, reconfiguration, and invariant restoration. The uniqueness of this
article is that it spans the research communities that contribute to this body
of knowledge: software engineering, dependability, operating systems,
programming languages, and security. It provides a novel and structured
overview of the diversity of bug oracles and repair operators used in the
literature
What is a quantum computer, and how do we build one?
The DiVincenzo criteria for implementing a quantum computer have been seminal
in focussing both experimental and theoretical research in quantum information
processing. These criteria were formulated specifically for the circuit model
of quantum computing. However, several new models for quantum computing
(paradigms) have been proposed that do not seem to fit the criteria well. The
question is therefore what are the general criteria for implementing quantum
computers. To this end, a formal operational definition of a quantum computer
is introduced. It is then shown that according to this definition a device is a
quantum computer if it obeys the following four criteria: Any quantum computer
must (1) have a quantum memory; (2) facilitate a controlled quantum evolution
of the quantum memory; (3) include a method for cooling the quantum memory; and
(4) provide a readout mechanism for subsets of the quantum memory. The criteria
are met when the device is scalable and operates fault-tolerantly. We discuss
various existing quantum computing paradigms, and how they fit within this
framework. Finally, we lay out a roadmap for selecting an avenue towards
building a quantum computer. This is summarized in a decision tree intended to
help experimentalists determine the most natural paradigm given a particular
physical implementation
Pinwheel Scheduling for Fault-tolerant Broadcast Disks in Real-time Database Systems
The design of programs for broadcast disks which incorporate real-time and fault-tolerance requirements is considered. A generalized model for real-time fault-tolerant broadcast disks is defined. It is shown that designing programs for broadcast disks specified in this model is closely related to the scheduling of pinwheel task systems. Some new results in pinwheel scheduling theory are derived, which facilitate the efficient generation of real-time fault-tolerant broadcast disk programs.National Science Foundation (CCR-9308344, CCR-9596282
- …