Search CORE

3,572 research outputs found

Study of fault-tolerant software technology

Author: Broglio C.
Goldberg J.
Hitt E.
Levitt K.
Slivinski T.
Webb J.
Wild C.
Publication venue
Publication date
Field of study

Presented is an overview of the current state of the art of fault-tolerant software and an analysis of quantitative techniques and models developed to assess its impact. It examines research efforts as well as experience gained from commercial application of these techniques. The paper also addresses the computer architecture and design implications on hardware, operating systems and programming languages (including Ada) of using fault-tolerant software in real-time aerospace applications. It concludes that fault-tolerant software has progressed beyond the pure research state. The paper also finds that, although not perfectly matched, newer architectural and language capabilities provide many of the notations and functions needed to effectively and efficiently implement software fault-tolerance

NASA Technical Reports Server

An automated wrapper-based approach to the design of dependable software

Author: Jhumka Arshad
Leeke Matthew
Publication venue
Publication date: 01/01/2011
Field of study

The design of dependable software systems invariably comprises two main activities: (i) the design of dependability mechanisms, and (ii) the location of dependability mechanisms. It has been shown that these activities are intrinsically difficult. In this paper we propose an automated wrapper-based methodology to circumvent the problems associated with the design and location of dependability mechanisms. To achieve this we replicate important variables so that they can be used as part of standard, efficient dependability mechanisms. These well-understood mechanisms are then deployed in all relevant locations. To validate the proposed methodology we apply it to three complex software systems, evaluating the dependability enhancement and execution overhead in each case. The results generated demonstrate that the system failure rate of a wrapped software system can be several orders of magnitude lower than that of an unwrapped equivalent

CiteSeerX

Warwick Research Archives Portal Repository

Towards automatic Markov reliability modeling of computer architectures

Author: Liceaga C. A.
Siewiorek D. P.
Publication venue
Publication date
Field of study

The analysis and evaluation of reliability measures using time-varying Markov models is required for Processor-Memory-Switch (PMS) structures that have competing processes such as standby redundancy and repair, or renewal processes such as transient or intermittent faults. The task of generating these models is tedious and prone to human error due to the large number of states and transitions involved in any reasonable system. Therefore model formulation is a major analysis bottleneck, and model verification is a major validation problem. The general unfamiliarity of computer architects with Markov modeling techniques further increases the necessity of automating the model formulation. This paper presents an overview of the Automated Reliability Modeling (ARM) program, under development at NASA Langley Research Center. ARM will accept as input a description of the PMS interconnection graph, the behavior of the PMS components, the fault-tolerant strategies, and the operational requirements. The output of ARM will be the reliability of availability Markov model formulated for direct use by evaluation programs. The advantages of such an approach are (a) utility to a large class of users, not necessarily expert in reliability analysis, and (b) a lower probability of human error in the computation

NASA Technical Reports Server

Deja Fu: a concurrency testing library for Haskell

Author: Walker Michael Stewart
Runciman Colin
Publication venue: ASSOC COMPUTING MACHINERY
Publication date: 01/01/2015
Field of study

Biblioteca Digital de la Comunidad de Madrid

White Rose Research Online

Deja Fu: a concurrency testing library for Haskell

Author: Runciman Colin
Walker Michael Stewart
Publication venue: 'American College of Medical Physics (ACMP)'
Publication date: 01/01/2015
Field of study

White Rose Research Online

Recommended from our members

Fault tolerance via diversity for off-the-shelf products: A study with SQL database servers

Author: Gashi I.
Popov P. T.
Strigini L.
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/10/2007
Field of study

If an off-the-shelf software product exhibits poor dependability due to design faults, then software fault tolerance is often the only way available to users and system integrators to alleviate the problem. Thanks to low acquisition costs, even using multiple versions of software in a parallel architecture, which is a scheme formerly reserved for few and highly critical applications, may become viable for many applications. We have studied the potential dependability gains from these solutions for off-the-shelf database servers. We based the study on the bug reports available for four off-the-shelf SQL servers plus later releases of two of them. We found that many of these faults cause systematic noncrash failures, which is a category ignored by most studies and standard implementations of fault tolerance for databases. Our observations suggest that diverse redundancy would be effective for tolerating design faults in this category of products. Only in very few cases would demands that triggered a bug in one server cause failures in another one, and there were no coincident failures in more than two of the servers. Use of different releases of the same product would also tolerate a significant fraction of the faults. We report our results and discuss their implications, the architectural options available for exploiting them, and the difficulties that they may present

City Research Online

Crossref

Exception handling in the development of fault-tolerant component-based systems

Author: Lima Filho Fernando Jose Castor de
Publication venue: [s.n.]
Publication date: 10/08/2018
Field of study

Orientador: Cecilia Mary Fischer RubiraTese (doutorado) - Universidade Estadual de Campinas, Instituto de ComputaçãoResumo: Mecanismos de tratamento de exceções foram concebidos com o intuito de facilitar o gerenciamento da complexidade de sistemas de software tolerantes a falhas. Eles promovem uma separação textual explícita entre o código normal e o código que lida com situações anormais, afim de dar suporte a construção de programas que são mais concisos fáceis de evoluir e confáveis. Diversas linguagens de programação modernas e a maioria dos modelos de componentes implementam mecanismos de tratamento de exceções. Apesar de seus muitos benefícios, tratamento de exceções pode ser a fonte de diversas falhas de projeto se usado de maneira indisciplinada. Estudos recentes mostram que desenvolvedores de sistemas de grande escala baseados em infra-estruturas de componentes têm hábitos, no tocante ao uso de tratamento de exceções, que tornam suas aplicações vulneráveis a falhas e difíceis de se manter. Componentes de software criam novos desafios com os quais mecanismos de tratamento de exceções tradicionais não lidam, o que aumenta a probabilidade de que problemas ocorram. Alguns exemplos são indisponibilidade de código fonte e incompatibilidades arquiteturais. Neste trabalho propomos duas técnicas complementares centradas em tratamento de exceções para a construção de sistemas tolerantes a falhas baseados em componentes. Ambas têm ênfase na estrutura do sistema como um meio para se reduzir o impacto de mecanismos de tolerância a falhas em sua complexidade total e o número de falhas de projeto decorrentes dessa complexidade. A primeira é uma abordagem para o projeto arquitetural dos mecanismos de recuperação de erros de um sistema. Ela trata do problema de verificar se uma arquitetura de software satisfaz certas propriedades relativas ao fluxo de exceções entre componentes arquiteturais, por exemplo, se todas as exceções lançadas no nível arquitetural são tratadas. A abordagem proposta lança de diversas ferramentas existentes para automatizar ao máximo esse processo. A segunda consiste em aplicar programação orientada a aspectos (AOP) afim de melhorar a modularização de código de tratamento de exceções. Conduzimos um estudo aprofundado com o objetivo de melhorar o entendimento geral sobre o efeitos de AOP no código de tratamento de exceções e identificar as situações onde seu uso é vantajoso e onde não éAbstract: Exception handling mechanisms were conceived as a means to help managing the complexity of fault-tolerant software. They promote an explicit textual separation between normal code and the code that deals with abnormal situations, in order to support the construction of programs that are more concise, evolvable, and reliable. Several mainstream programming languages and most of the existing component models implement exception handling mechanisms. In spite of its many bene?ts, exception handling can be a source of many design faults if used in an ad hoc fashion. Recent studies show that developers of large-scale software systems based on component infrastructures have habits concerning the use of exception handling that make applications vulnerable to faults and hard to maintain. Software components introduce new challenges which are not addressed by traditional exception handling mechanisms and increase the chances of problems occurring. Examples include unavailability of source code and architectural mismatches. In this work, we propose two complementary techniques centered on exception handling for the construction of fault-tolerant component-based systems. Both of them emphasize system structure as a means to reduce the impactof fault tolerance mechanisms on the overall complexity of a software system and the number of design faults that stem from complexity. The ?rst one is an approach for the architectural design of a system?s error handling capabilities. It addresses the problem of verifying whether a software architecture satis?es certain properties of interest pertaining the ?ow of exceptions between architectural components, e.g., if all the exceptions signaled at the architectural level are eventually handled. The proposed approach is based on a set of existing tools that automate this process as much as possible. The second one consists in applying aspect-oriented programming (AOP) to better modularize exception handling code. We have conducted a through study aimed at improving our understanding of the efects of AOP on exception handling code and identifying the situations where its use is advantageous and the ones where it is notDoutoradoDoutor em Ciência da Computaçã

Repositorio da Producao Cientifica e Intelectual da Unicamp