11,482 research outputs found

Rapid Recovery for Systems with Scarce Faults

Author: Huang Chung-Hao
Peled Doron
Schewe Sven
Wang Farn
Publication venue: 'Open Publishing Association'
Publication date: 01/10/2012

Our goal is to achieve a high degree of fault tolerance through the control of safety-critical systems. This reduces to solving a game between a malicious environment that injects failures and a controller that tries to establish correct behavior. We suggest a new control objective for such systems that offers a better balance between complexity and precision: we seek systems that are k-resilient. To be k-resilient, a system needs to be able to rapidly recover from a small number, up to k, of local faults infinitely many times, provided that blocks of up to k faults are separated by short recovery periods in which no fault occurs. k-resilience is a simple but powerful abstraction from the precise distribution of local faults, but much more refined than the traditional objective to maximize the number of local faults. We argue why we believe this to be the right level of abstraction for safety-critical systems when local faults are few and far between. We show that the computational complexity of constructing optimal control with respect to resilience is low and demonstrate the feasibility through an implementation and experimental results.
Comment: In Proceedings GandALF 2012, arXiv:1210.202
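The fault assumption in this abstract (blocks of up to k faults separated by fault-free recovery periods) can be illustrated on a finite trace. The following is a minimal sketch, not the paper's construction: the function name, the boolean trace encoding, and the `recovery_len` parameter are assumptions made for illustration, and the paper's formal definition may differ.

```python
def respects_fault_assumption(trace, k, recovery_len):
    """Check a finite fault trace against a block-of-k-faults assumption.

    trace        -- iterable of booleans, True meaning "a local fault occurs
                    at this step" (hypothetical encoding)
    k            -- maximum number of faults allowed in one block
    recovery_len -- number of consecutive fault-free steps that counts as a
                    recovery period (assumed parameter)
    """
    faults_in_block = 0
    quiet_steps = 0
    for fault in trace:
        if fault:
            faults_in_block += 1
            quiet_steps = 0
            if faults_in_block > k:
                # More than k faults without a full recovery period between.
                return False
        else:
            quiet_steps += 1
            if quiet_steps >= recovery_len:
                # A full recovery period has passed: start a new block.
                faults_in_block = 0
    return True


trace = [True, True, False, False, False, True, True, False]
print(respects_fault_assumption(trace, k=2, recovery_len=3))
print(respects_fault_assumption(trace, k=2, recovery_len=4))
```

With a recovery period of three steps the two pairs of faults form separate blocks. With four steps required, they merge into one block of four faults, which exceeds k.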

arXiv.org e-Print Archive

Directory of Open Access Journals

A group membership algorithm with a practical specification

Author: Bruck Jehoshua
Franceschetti Martin
Publication date: 01/11/2001

Presents a solvable specification and gives an algorithm for the group membership problem in asynchronous systems with crash failures. Our specification requires processes to maintain a consistent history in their sequences of views. This allows processes to order failures and recoveries in time and simplifies the programming of high-level applications. Previous work has proven that the group membership problem cannot be solved in asynchronous systems with crash failures. We circumvent this impossibility result by building a weaker, yet nontrivial, specification. We show that our solution is an improvement upon previous attempts to solve this problem using a weaker specification. We also relate our solution to other methods and give a classification of progress properties that can be achieved under different models.

Caltech Authors

Integrating Scale Out and Fault Tolerance in Stream Processing using Operator State Management

Author: Fernandez RC
Kalyvianaki E
Migliavacca M
Pietzuch P
Publication venue: 'American College of Medical Physics (ACMP)'
Publication date: 01/01/2013

As users of big data applications expect fresh results, we witness a new breed of stream processing systems (SPS) that are designed to scale to large numbers of cloud-hosted machines. Such systems face new challenges: (i) to benefit from the pay-as-you-go model of cloud computing, they must scale out on demand, acquiring additional virtual machines (VMs) and parallelising operators when the workload increases; (ii) failures are common with deployments on hundreds of VMs, so systems must be fault-tolerant with fast recovery times, yet low per-machine overheads. An open question is how to achieve these two goals when stream queries include stateful operators, which must be scaled out and recovered without affecting query results. Our key idea is to expose internal operator state explicitly to the SPS through a set of state management primitives. Based on them, we describe an integrated approach for dynamic scale out and recovery of stateful operators. Externalised operator state is checkpointed periodically by the SPS and backed up to upstream VMs. The SPS identifies individual operator bottlenecks and automatically scales them out by allocating new VMs and partitioning the checkpointed state. At any point, failed operators are recovered by restoring checkpointed state on a new VM and replaying unprocessed tuples. We evaluate this approach with the Linear Road Benchmark on the Amazon EC2 cloud platform and show that it can scale automatically to a load factor of L=350 with 50 VMs, while recovering quickly from failures. Copyright © 2013 ACM
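The recovery scheme described in this abstract (periodic checkpoints backed up upstream, then restore and replay of unprocessed tuples) can be sketched in a few lines. This is an illustrative toy under stated assumptions, not the authors' system: `CountingOperator`, `UpstreamBuffer`, `run`, and the `checkpoint_every` and `fail_at` parameters are all hypothetical names, and everything runs in a single process rather than across VMs.

```python
class CountingOperator:
    """Hypothetical stateful operator: counts occurrences of each key."""

    def __init__(self, state=None):
        # Copy the given state so a restored operator owns its own dict.
        self.state = dict(state or {})

    def process(self, key):
        self.state[key] = self.state.get(key, 0) + 1


class UpstreamBuffer:
    """Holds the latest checkpoint and the tuples sent since it was taken."""

    def __init__(self):
        self.checkpoint = {}
        self.pending = []

    def record(self, key):
        self.pending.append(key)

    def store_checkpoint(self, state):
        # Tuples reflected in the checkpoint no longer need replaying.
        self.checkpoint = dict(state)
        self.pending.clear()


def run(stream, checkpoint_every, fail_at=None):
    """Process the stream, optionally simulating a failure before step fail_at."""
    op = CountingOperator()
    upstream = UpstreamBuffer()
    for i, key in enumerate(stream):
        if i == fail_at:
            # Simulated failure: the operator's in-memory state is lost.
            # Recover by restoring the checkpoint and replaying pending tuples.
            op = CountingOperator(upstream.checkpoint)
            for old_key in upstream.pending:
                op.process(old_key)
        upstream.record(key)
        op.process(key)
        if (i + 1) % checkpoint_every == 0:
            upstream.store_checkpoint(op.state)
    return op.state


stream = list("abacabad")
assert run(stream, checkpoint_every=3) == run(stream, checkpoint_every=3, fail_at=5)
```

The final assertion checks the property the abstract emphasises: recovery does not affect query results, since the run with a simulated failure ends in the same state as the failure-free run.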

CiteSeerX

City Research Online

Crossref

Spiral - Imperial College Digital Repository

Kent Academic Repository

Fault estimation and active fault tolerant control for linear parameter varying descriptor systems

Author: Patton Ron J.
Shi Fengming
Publication venue: 'Wiley'
Publication date: 06/11/2014

Starting with the baseline controller design, this paper proposes an integrated approach of active fault tolerant control based on a proportional derivative extended state observer (PDESO) for linear parameter varying descriptor systems. The PDESO can simultaneously provide estimates of the system states, sensor faults, and actuator faults. The L₂ robust performance of the closed-loop system to bounded exogenous disturbance and bounded uncertainty is achieved by a two-step design procedure adapted from the traditional observer-based controller design. Furthermore, an LMI pole-placement region and the L₂ robustness performance are combined into a multiobjective formulation by suitably combining the appropriate LMI descriptions. A parameter-varying system example is given to illustrate the design procedure and the validity of the proposed integrated design approach.

Repository@Hull - Worktribe

FOREVER: Fault/intrusiOn REmoVal through Evolution & Recovery

Author: Bessani A. N.
Daidone A.
Distler T.
Gashi I.
Kapitza R.
Obelheiro R. R.
Reiser H. P.
Sousa P.
Stankovic V.
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2008

The goal of the FOREVER project is to develop a service for Fault/intrusiOn REmoVal through Evolution & Recovery. In order to achieve this goal, our work addresses three main tasks: the definition of the FOREVER service architecture; the analysis of how diversity techniques can improve resilience; and the evaluation of the FOREVER service. The FOREVER service is an important contribution to intrusion-tolerant replication middleware and significantly enhances the resilience

City Research Online

Universidade de Lisboa: Repositório.UL

Layered architecture for quantum computing

Author: Alexei Yu. Kitaev
Andrew M. Steane
Austin G. Fowler
Christopher M. Dawson
D. Aharonov
Daniel A. Lidar
Dean Copsey
G. N. Nielson
John Paul Shen
Jungsang Kim
M. Oskin
M. Whitney
Michael A. Nielsen
N. Isailovic
Panos Aliferis
Stéphane Beauregard
Thomas G. Draper
Tzvetan S. Metodi
Yasuhiro Takahashi
Publication venue: 'American Physical Society (APS)'
Publication date: 01/07/2012

We develop a layered quantum computer architecture, which is a systematic framework for tackling the individual challenges of developing a quantum computer while constructing a cohesive device design. We discuss many of the prominent techniques for implementing circuit-model quantum computing and introduce several new methods, with an emphasis on employing surface code quantum error correction. In doing so, we propose a new quantum computer architecture based on optical control of quantum dots. The timescales of physical hardware operations and logical, error-corrected quantum gates differ by several orders of magnitude. By dividing functionality into layers, we can design and analyze subsystems independently, demonstrating the value of our layered architectural approach. Using this concrete hardware platform, we provide resource analysis for executing fault-tolerant quantum algorithms for integer factoring and quantum simulation, finding that the quantum dot architecture we study could solve such problems on the timescale of days.
Comment: 27 pages, 20 figures

arXiv.org e-Print Archive

Crossref

Directory of Open Access Journals