3,572 research outputs found
Study of fault-tolerant software technology
Presented is an overview of the current state of the art of fault-tolerant software and an analysis of quantitative techniques and models developed to assess its impact. It examines research efforts as well as experience gained from commercial application of these techniques. The paper also addresses the computer architecture and design implications on hardware, operating systems and programming languages (including Ada) of using fault-tolerant software in real-time aerospace applications. It concludes that fault-tolerant software has progressed beyond the pure research state. The paper also finds that, although not perfectly matched, newer architectural and language capabilities provide many of the notations and functions needed to effectively and efficiently implement software fault-tolerance
An automated wrapper-based approach to the design of dependable software
The design of dependable software systems invariably comprises two main activities: (i) the design of dependability mechanisms, and (ii) the location of dependability mechanisms. It has been shown that these activities are intrinsically difficult. In this paper we propose an automated wrapper-based methodology to circumvent the problems associated with the design and location of dependability mechanisms. To achieve this we replicate important variables so that they can be used as part of standard, efficient dependability mechanisms. These well-understood mechanisms are then deployed in all relevant locations. To validate the proposed methodology we apply it to three complex software systems, evaluating the dependability enhancement and execution overhead in each case. The results generated demonstrate that the system failure rate of a wrapped software system can be several orders of magnitude lower than that of an unwrapped equivalent
Towards automatic Markov reliability modeling of computer architectures
The analysis and evaluation of reliability measures using time-varying Markov models is required for Processor-Memory-Switch (PMS) structures that have competing processes such as standby redundancy and repair, or renewal processes such as transient or intermittent faults. The task of generating these models is tedious and prone to human error due to the large number of states and transitions involved in any reasonable system. Therefore model formulation is a major analysis bottleneck, and model verification is a major validation problem. The general unfamiliarity of computer architects with Markov modeling techniques further increases the necessity of automating the model formulation. This paper presents an overview of the Automated Reliability Modeling (ARM) program, under development at NASA Langley Research Center. ARM will accept as input a description of the PMS interconnection graph, the behavior of the PMS components, the fault-tolerant strategies, and the operational requirements. The output of ARM will be the reliability of availability Markov model formulated for direct use by evaluation programs. The advantages of such an approach are (a) utility to a large class of users, not necessarily expert in reliability analysis, and (b) a lower probability of human error in the computation
Recommended from our members
Fault tolerance via diversity for off-the-shelf products: A study with SQL database servers
If an off-the-shelf software product exhibits poor dependability due to design faults, then software fault tolerance is often the only way available to users and system integrators to alleviate the problem. Thanks to low acquisition costs, even using multiple versions of software in a parallel architecture, which is a scheme formerly reserved for few and highly critical applications, may become viable for many applications. We have studied the potential dependability gains from these solutions for off-the-shelf database servers. We based the study on the bug reports available for four off-the-shelf SQL servers plus later releases of two of them. We found that many of these faults cause systematic noncrash failures, which is a category ignored by most studies and standard implementations of fault tolerance for databases. Our observations suggest that diverse redundancy would be effective for tolerating design faults in this category of products. Only in very few cases would demands that triggered a bug in one server cause failures in another one, and there were no coincident failures in more than two of the servers. Use of different releases of the same product would also tolerate a significant fraction of the faults. We report our results and discuss their implications, the architectural options available for exploiting them, and the difficulties that they may present
Exception handling in the development of fault-tolerant component-based systems
Orientador: Cecilia Mary Fischer RubiraTese (doutorado) - Universidade Estadual de Campinas, Instituto de ComputaçãoResumo: Mecanismos de tratamento de exceçÔes foram concebidos com o intuito de facilitar o gerenciamento da complexidade de sistemas de software tolerantes a falhas. Eles promovem uma separação textual explĂcita entre o cĂłdigo normal e o cĂłdigo que lida com situaçÔes anormais, afim de dar suporte a construção de programas que sĂŁo mais concisos fĂĄceis de evoluir e confĂĄveis. Diversas linguagens de programação modernas e a maioria dos modelos de componentes implementam mecanismos de tratamento de exceçÔes. Apesar de seus muitos benefĂcios, tratamento de exceçÔes pode ser a fonte de diversas falhas de projeto se usado de maneira indisciplinada. Estudos recentes mostram que desenvolvedores de sistemas de grande escala baseados em infra-estruturas de componentes tĂȘm hĂĄbitos, no tocante ao uso de tratamento de exceçÔes, que tornam suas aplicaçÔes vulnerĂĄveis a falhas e difĂceis de se manter. Componentes de software criam novos desafios com os quais mecanismos de tratamento de exceçÔes tradicionais nĂŁo lidam, o que aumenta a probabilidade de que problemas ocorram. Alguns exemplos sĂŁo indisponibilidade de cĂłdigo fonte e incompatibilidades arquiteturais. Neste trabalho propomos duas tĂ©cnicas complementares centradas em tratamento de exceçÔes para a construção de sistemas tolerantes a falhas baseados em componentes. Ambas tĂȘm ĂȘnfase na estrutura do sistema como um meio para se reduzir o impacto de mecanismos de tolerĂąncia a falhas em sua complexidade total e o nĂșmero de falhas de projeto decorrentes dessa complexidade. A primeira Ă© uma abordagem para o projeto arquitetural dos mecanismos de recuperação de erros de um sistema. Ela trata do problema de verificar se uma arquitetura de software satisfaz certas propriedades relativas ao fluxo de exceçÔes entre componentes arquiteturais, por exemplo, se todas as exceçÔes lançadas no nĂvel arquitetural sĂŁo tratadas. A abordagem proposta lança de diversas ferramentas existentes para automatizar ao mĂĄximo esse processo. A segunda consiste em aplicar programação orientada a aspectos (AOP) afim de melhorar a modularização de cĂłdigo de tratamento de exceçÔes. Conduzimos um estudo aprofundado com o objetivo de melhorar o entendimento geral sobre o efeitos de AOP no cĂłdigo de tratamento de exceçÔes e identificar as situaçÔes onde seu uso Ă© vantajoso e onde nĂŁo Ă©Abstract: Exception handling mechanisms were conceived as a means to help managing the complexity of fault-tolerant software. They promote an explicit textual separation between normal code and the code that deals with abnormal situations, in order to support the construction of programs that are more concise, evolvable, and reliable. Several mainstream programming languages and most of the existing component models implement exception handling mechanisms. In spite of its many bene?ts, exception handling can be a source of many design faults if used in an ad hoc fashion. Recent studies show that developers of large-scale software systems based on component infrastructures have habits concerning the use of exception handling that make applications vulnerable to faults and hard to maintain. Software components introduce new challenges which are not addressed by traditional exception handling mechanisms and increase the chances of problems occurring. Examples include unavailability of source code and architectural mismatches. In this work, we propose two complementary techniques centered on exception handling for the construction of fault-tolerant component-based systems. Both of them emphasize system structure as a means to reduce the impactof fault tolerance mechanisms on the overall complexity of a software system and the number of design faults that stem from complexity. The ?rst one is an approach for the architectural design of a system?s error handling capabilities. It addresses the problem of verifying whether a software architecture satis?es certain properties of interest pertaining the ?ow of exceptions between architectural components, e.g., if all the exceptions signaled at the architectural level are eventually handled. The proposed approach is based on a set of existing tools that automate this process as much as possible. The second one consists in applying aspect-oriented programming (AOP) to better modularize exception handling code. We have conducted a through study aimed at improving our understanding of the efects of AOP on exception handling code and identifying the situations where its use is advantageous and the ones where it is notDoutoradoDoutor em CiĂȘncia da Computaçã
- âŠ