
    CSP methods for identifying atomic actions in the design of fault tolerant concurrent systems

    Limiting the extent of error propagation when faults occur and localizing the subsequent error recovery are common concerns in the design of fault-tolerant parallel processing systems. Both activities are made easier if the designer associates fault tolerance mechanisms with the underlying atomic actions of the system. With this in mind, this paper investigates two methods for identifying atomic actions in parallel processing systems described using CSP. Explicit trace evaluation forms the basis of the first algorithm, which enables a designer to analyze interprocess communications and thereby locate atomic action boundaries in a hierarchical fashion. The second method takes CSP descriptions of the parallel processes and uses structural arguments to infer the atomic action boundaries. This method avoids the difficulty of producing full trace sets, but at the cost of a more complex algorithm.
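
The intuition behind trace-based identification can be illustrated with a deliberately simplified sketch. It is not the paper's hierarchical algorithm. It assumes a hypothetical trace encoding in which each CSP synchronization is a tuple `(event, sender, receiver)`. Processes that communicate directly or transitively are grouped with union-find. No event in the trace then crosses a group boundary, which makes each group a candidate atomic action.

```python
from collections import defaultdict


def candidate_atomic_actions(trace):
    """Group processes into candidate atomic actions (flat simplification).

    trace: list of (event, sender, receiver) tuples, one per synchronization.
    This encoding is a hypothetical assumption, not the paper's notation.
    """
    parent = {}

    def find(p):
        parent.setdefault(p, p)
        while parent[p] != p:
            parent[p] = parent[parent[p]]  # path halving keeps trees shallow
            p = parent[p]
        return p

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    # Any two processes that synchronize must belong to the same action.
    for _event, sender, receiver in trace:
        union(sender, receiver)

    groups = defaultdict(set)
    for p in parent:
        groups[find(p)].add(p)
    return sorted((sorted(g) for g in groups.values()), key=lambda g: g[0])


trace = [("a", "P1", "P2"), ("b", "P2", "P3"), ("c", "P4", "P5")]
print(candidate_atomic_actions(trace))  # [['P1', 'P2', 'P3'], ['P4', 'P5']]
```

A real analysis would also segment the trace in time, since the same processes can take part in successive atomic actions. This sketch only captures the "no communication crosses the boundary" property.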

    Implementing atomic actions in Ada 95

    Atomic actions are an important dynamic structuring technique that aids the construction of fault-tolerant concurrent systems. Although they were developed some years ago, none of the well-known commercially available programming languages directly supports their use. This paper summarizes software fault tolerance techniques for concurrent systems, evaluates the Ada 95 programming language from the perspective of its support for software fault tolerance, and shows how Ada 95 can be used to implement these techniques. In particular, it shows how packages, protected objects, requeue, exceptions, asynchronous transfer of control, tagged types, and controlled types can serve as building blocks for constructing atomic actions with forward and backward error recovery that are resilient to deserter tasks and task abortion.
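
The core behavior of an atomic action with backward error recovery can be sketched outside Ada. Participants share state and leave together, and if any participant fails, the state of all of them is rolled back. The sketch below is a minimal Python illustration, not the paper's Ada 95 design. A `threading.Lock` stands in for a protected object, and `run_atomic_action` and `AtomicActionFailed` are hypothetical names. It does not handle deserter tasks, task abortion, or forward recovery.

```python
import copy
import threading


class AtomicActionFailed(Exception):
    """Raised when an atomic action aborts and its shared state is rolled back."""


def run_atomic_action(state, participants):
    """Run `participants` concurrently as one atomic action (hypothetical API).

    state: dict shared by all participants.
    participants: callables taking (state, lock); the lock stands in for
    an Ada protected object guarding the shared state.
    """
    checkpoint = copy.deepcopy(state)  # recovery point established on entry
    lock = threading.Lock()
    errors = []

    def guarded(fn):
        try:
            fn(state, lock)
        except Exception as exc:  # record the fault; do not leave early
            with lock:
                errors.append(exc)

    threads = [threading.Thread(target=guarded, args=(p,)) for p in participants]
    for t in threads:
        t.start()
    for t in threads:
        t.join()  # synchronous exit: no participant leaves before the others

    if errors:
        state.clear()
        state.update(checkpoint)  # backward recovery applies to all participants
        raise AtomicActionFailed(errors)
    return state


# Usage: a transfer whose second participant fails after updating state.
def debit(state, lock):
    with lock:
        state["a"] -= 10


def credit_then_fail(state, lock):
    with lock:
        state["b"] += 10
    raise ValueError("simulated fault in participant")


accounts = {"a": 100, "b": 0}
try:
    run_atomic_action(accounts, [debit, credit_then_fail])
except AtomicActionFailed:
    print(accounts)  # {'a': 100, 'b': 0}
```

Joining every thread before deciding the outcome mirrors the synchronized exit of an atomic action. No participant can observe a partially recovered state.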

    The implementation and use of Ada on distributed systems with high reliability requirements

    This work shows that Ada is generally inadequate for programming systems that must survive processor loss, and proposes a solution that requires no syntactic changes to Ada. The approach was evaluated using a full-scale, realistic application: the Advanced Transport Operating System (ATOPS), an experimental computer control system developed for a modified Boeing 737 aircraft. ATOPS is a full-authority, real-time avionics system providing a wide variety of advanced features. Methods of building fault tolerance into concurrent systems were explored, and a set of criteria by which the proposed method will be judged was examined. Extensive interaction with personnel from Computer Sciences Corporation and NASA Langley took place to determine the requirements of the ATOPS software. Backward error recovery in concurrent systems was also assessed.

    EOS: A project to investigate the design and construction of real-time distributed embedded operating systems

    The EOS project is investigating the design and construction of a family of real-time distributed embedded operating systems for reliable, distributed aerospace applications. Using the real-time programming techniques developed in co-operation with NASA in earlier research, the project staff is building a kernel for a multiple-processor networked system. The first six months of the grant included a study of scheduling in an object-oriented system, the design philosophy of the kernel, and an architectural overview of the operating system. This report describes the operating system and kernel concepts. An environment for the experiments has been built, and several of the key concepts of the system have been prototyped. The kernel and operating system are intended to support future experimental studies in multiprocessing, load balancing, routing, software fault tolerance, distributed database design, and real-time processing.

    The implementation and use of Ada on distributed systems with high reliability requirements

    A preliminary analysis of the Ada implementation of the Advanced Transport Operating System (ATOPS), an experimental computer control system developed at NASA Langley for a modified Boeing 737 aircraft, is presented. The criteria determined for evaluating this approach are described. A preliminary version of the ATOPS requirements is included. This requirements specification is not a formal document, but rather a description of certain aspects of the ATOPS system at the level of detail that best suits the needs of the research. A survey of backward error recovery techniques is also presented.

    Implementing fault-tolerant systems using object-oriented programming

    Advisor: Cecilia Mary Fischer Rubira. Master's dissertation, Universidade Estadual de Campinas, Instituto de Computação.
    Summary (translated from Portuguese): This work aims to develop an object-oriented architecture to support software-fault-tolerant applications. Object-oriented techniques such as data abstraction, inheritance, dynamic binding, and polymorphism are explored in order to obtain software applications of better reliability and quality. Our goal is to support applications that require software fault tolerance through well-known design diversity techniques, integrating these techniques with the exception handling mechanism. The result is a framework of generic software components that form an infrastructure for developing fault-tolerant distributed systems (FOOD).
    Abstract: The major goal of this work is to develop an object-oriented architecture for software-fault-tolerant applications. Object-oriented techniques, such as data abstraction, inheritance, and polymorphism, are explored to improve software reliability and quality. Our goal is thus to support software fault tolerance using design diversity, so that this support can be incorporated into the exception handling mechanism of the application. To understand and validate these techniques, we have developed a fault-tolerant object-oriented distributed framework (FOOD).
    Degree: Master's in Computer Science (Mestre em Ciência da Computação).
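
Design diversity combined with exception handling is classically realized as a recovery block. The state is checkpointed, a primary variant runs, and its result is checked by an acceptance test. On failure, the state is restored and a diverse alternate is tried. The sketch below is a generic Python illustration of that pattern, not the FOOD framework's API. The names `recovery_block` and `AllVariantsFailed` are hypothetical.

```python
import copy


class AllVariantsFailed(Exception):
    """Signalled when no variant passes the acceptance test."""


def recovery_block(state, variants, acceptance_test):
    """Run design-diverse variants in order until one passes the acceptance test.

    Each failed attempt (exception or rejected result) is undone by restoring
    the checkpoint taken on entry before the next variant runs.
    """
    checkpoint = copy.deepcopy(state)
    for variant in variants:
        try:
            result = variant(state)
            if acceptance_test(state, result):
                return result
        except Exception:
            pass  # an exception is treated like a failed acceptance test
        state.clear()
        state.update(copy.deepcopy(checkpoint))  # backward recovery
    raise AllVariantsFailed("no variant passed the acceptance test")


# Usage: a deliberately faulty primary and a diverse alternate for sqrt.
def primary(state):
    state["attempts"] += 1
    return -1.0  # injected fault


def alternate(state):
    state["attempts"] += 1
    return state["x"] ** 0.5


def accept(state, result):
    return result >= 0 and abs(result * result - state["x"]) < 1e-9


s = {"x": 16.0, "attempts": 0}
print(recovery_block(s, [primary, alternate], accept))  # 4.0
print(s["attempts"])  # 1: the primary's update was rolled back
```

In an object-oriented framework, the variants, the acceptance test, and the state-saving policy would typically be separate classes. The function-based form here only shows the control flow.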

    Using mobility and exception handling to achieve mobile agents that survive server crash failures

    Mobile agent technology, when designed and used effectively, can minimize bandwidth consumption and autonomously provide a snapshot of the current context of a distributed system. Protecting mobile agents from server crashes is a challenging issue, since developers normally have no control over remote servers. Server crash failures can leave replicas held in stable storage unavailable for an unknown period of time. Furthermore, few systems have considered the need for a fault-tolerant protocol among a group of collaborating mobile agents. This thesis uses exception handling to protect mobile agents from server crash failures. An exception model is proposed for mobile agents, and two exception handler designs are investigated. The first resides at the server that created the mobile agent and uses a timeout mechanism. The second, the mobile shadow scheme, migrates with the mobile agent and operates at the previous server visited by the mobile agent. A case study application has been developed to compare the performance of the two exception handler designs. Performance results demonstrate that although the second design is slower, it offers the shorter trip time when handling a server crash. Furthermore, no modification of the server environment is necessary. This thesis shows that the mobile shadow exception handling scheme reduces the complexity of enabling a group of mobile agents to survive server crashes. At each stage of the itinerary, the scheme deploys a replica that monitors the server occupied by the master; the replica resides at the previous server visited in the itinerary. Consequently, each group member is a single fault-tolerant entity with respect to server crash failures. Other schemes introduce greater complexity and performance overhead because, for each stage of the itinerary, a group of replicas is sent to servers that offer an equivalent service. In addition, directions for future research on fault tolerance in groups of collaborating mobile agents are established.
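
The mobile shadow scheme can be made concrete with a small, single-process simulation. The master visits servers in order while a shadow replica waits at the previous server. If the master's server crashes, the shadow's handler promotes the shadow to master. Several details are hypothetical simplifications, not the thesis's protocol:

- Crashes are injected through a `crashes` set instead of being detected by a timeout.
- The crashed server is simply skipped.
- `Agent` and `run_itinerary` are illustrative names.

```python
from dataclasses import dataclass, field


@dataclass
class Agent:
    """Hypothetical agent state carried along the itinerary."""
    results: list = field(default_factory=list)


def run_itinerary(home, itinerary, crashes):
    """Simulate the mobile shadow scheme (simplified, hypothetical).

    home: server that created the agent; it hosts the first shadow.
    itinerary: servers to visit, in order.
    crashes: servers that crash while the master is on them (fault injection
    standing in for crash detection by the shadow's monitor).
    Returns the agent's results and a log of shadow take-overs.
    """
    master = Agent()
    shadow, shadow_at = Agent(), home  # replica kept at the previous server
    log = []
    for server in itinerary:
        # The master migrates to `server`; the shadow stays at `shadow_at`.
        if server in crashes:
            # Shadow's handler for the server-crash exception: promote the
            # shadow to master and skip the crashed server.
            log.append(f"shadow at {shadow_at} took over from crashed {server}")
            master = Agent(list(shadow.results))
            continue
        master.results.append(f"work@{server}")
        # Stage complete: replace the old shadow with one at the server just left.
        shadow, shadow_at = Agent(list(master.results)), server
    return master.results, log


results, log = run_itinerary("home", ["s1", "s2", "s3"], crashes={"s2"})
print(results)  # ['work@s1', 'work@s3']
print(log)      # ['shadow at s1 took over from crashed s2']
```

Only one replica exists per stage, which reflects the abstract's point that the scheme avoids sending a group of replicas to equivalent servers at each step.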

    A software architecture based on design patterns for developing dependable concurrent applications

    Advisor: Cecilia Mary Fischer Rubira. Doctoral thesis, Universidade Estadual de Campinas, Instituto de Computação.
    Summary (translated from Portuguese): Complex computer systems are subject to different kinds of faults, and the most appropriate way of dealing with them is to accept that any system may exhibit them and to employ suitable techniques to tolerate them while the system runs. Accordingly, the most appropriate approach to building dependable complex systems is to use fault tolerance techniques that allow regions of error confinement and recovery to be defined. However, fault tolerance techniques are generally applied in the implementation phase of the system development cycle. They are therefore often hard to employ, since designers must take many implementation details into account. In this context, this work makes two relevant contributions. The first is the practical use of recent system structuring techniques to define a generic software architecture for introducing atomicity, software redundancy, exception handling, and coordinated error recovery into dependable object-oriented systems throughout the development cycle. This runs from the architectural design phase, through detailed design, to the implementation/coding phase. The second is the definition of a cohesive set of design patterns that refine the architectural elements of the proposed software architecture and provide a clear and transparent separation of concerns between the application functionality and the functionality related to providing system dependability.
    Abstract: Complex computer systems are prone to errors of many kinds, and the most reasonable way of dealing with them is to accept that any complex system has faults and to employ appropriate features for tolerating them at run time. We claim that the most beneficial way of achieving fault tolerance in complex systems is to use system structuring that has fault tolerance measures associated with it. In this case, structuring units serve as natural areas of error containment and error recovery. However, these techniques are mainly developed for use in the implementation phase of system development. Hence, it is often not easy to apply them correctly, as designers have to take many implementation details into account. In this context, this work makes two main contributions. The first is the practical use of recent system structuring techniques in defining a generic software architecture for introducing atomicity, exception handling, and coordinated error recovery into dependable object-oriented systems at the earlier phases of system development, that is, from architectural design through detailed design to coding. The second is the definition of a set of design patterns that refine the architectural elements of the proposed software architecture and provide a clear and transparent separation of concerns between the application functionality and the functionality related to providing system dependability.
    Degree: Doctorate in Computer Science (Doutor em Ciência da Computação).