750 research outputs found

    Performance and evaluation of real-time multicomputer control systems

    Get PDF
    New performance measures, detailed examples, modeling of error detection process, performance evaluation of rollback recovery methods, experiments on FTMP, and optimal size of an NMR cluster are discussed

    Validation of multiprocessor systems

    Get PDF
    Experiments that can be used to validate fault free performance of multiprocessor systems in aerospace systems integrating flight controls and avionics are discussed. Engineering prototypes for two fault tolerant multiprocessors are tested

    On Fault Tolerance Methods for Networks-on-Chip

    Get PDF
    Technology scaling has proceeded into dimensions in which the reliability of manufactured devices is becoming endangered. The reliability decrease is a consequence of physical limitations, relative increase of variations, and decreasing noise margins, among others. A promising solution for bringing the reliability of circuits back to a desired level is the use of design methods which introduce tolerance against possible faults in an integrated circuit. This thesis studies and presents fault tolerance methods for network-onchip (NoC) which is a design paradigm targeted for very large systems-onchip. In a NoC resources, such as processors and memories, are connected to a communication network; comparable to the Internet. Fault tolerance in such a system can be achieved at many abstraction levels. The thesis studies the origin of faults in modern technologies and explains the classification to transient, intermittent and permanent faults. A survey of fault tolerance methods is presented to demonstrate the diversity of available methods. Networks-on-chip are approached by exploring their main design choices: the selection of a topology, routing protocol, and flow control method. Fault tolerance methods for NoCs are studied at different layers of the OSI reference model. The data link layer provides a reliable communication link over a physical channel. Error control coding is an efficient fault tolerance method especially against transient faults at this abstraction level. Error control coding methods suitable for on-chip communication are studied and their implementations presented. Error control coding loses its effectiveness in the presence of intermittent and permanent faults. Therefore, other solutions against them are presented. The introduction of spare wires and split transmissions are shown to provide good tolerance against intermittent and permanent errors and their combination to error control coding is illustrated. At the network layer positioned above the data link layer, fault tolerance can be achieved with the design of fault tolerant network topologies and routing algorithms. Both of these approaches are presented in the thesis together with realizations in the both categories. The thesis concludes that an optimal fault tolerance solution contains carefully co-designed elements from different abstraction levelsSiirretty Doriast

    Definition and trade-off study of reconfigurable airborne digital computer system organizations

    Get PDF
    A highly-reliable, fault-tolerant reconfigurable computer system for aircraft applications was developed. The development and application reliability and fault-tolerance assessment techniques are described. Particular emphasis is placed on the needs of an all-digital, fly-by-wire control system appropriate for a passenger-carrying airplane

    Achieving fault tolerance via robust partitioning and N-Modular Redundancy

    Get PDF
    Thesis (S.M.)--Massachusetts Institute of Technology, Dept. of Aeronautics and Astronautics, 2007.Includes bibliographical references (p. 165-169).This thesis describes the design and performance results for the P-NMR fault tolerant avionics system architecture being developed at Draper Laboratory. The two key principles of the architecture are robust software partitioning (P), as defined by the ARINC 653 open standard, and N-Modular Redundancy (NMR). The P-NMR architecture uses cross channel data exchange and voting to implement fault detection, isolation and recovery (FDIR). The FDIR function is implemented in software that executes on commercial-off-the-shelf (COTS) hardware components that are also based on open standards. The FDIR function and the user applications execute on the same processor. The robust partitioning is provided by a COTS real-time operating system that complies with the ARINC 653 standard. A Triple Modular Redundant (TMR) prototype was developed and various performance metrics were collected. Evaluation of the TMR prototype indicates that the ARINC 653 standard is compatible with an NMR and FDIR architecture. Application partitions can be considered software fault containment regions which enhance the overall integrity of the system. The P-NMR performance metrics were compared with a previous Draper Laboratory design called the Fault Tolerant Parallel Processor (FTPP). This design did not make use of robust partitioning and it used proprietary hardware for implementing certain FDIR functions. The comparison demonstrated that the P-NMR system prototype could perform at an acceptable level and that the development of the system should continue. This research was done in the context of developing cost effective avionics systems for space exploration vehicles such as those being developed for NASA's Constellation program.by Brendan Anthony O'Connell.S.M

    Dynamic reconfiguration in distributed hard real-time systems

    Get PDF

    Constructing fail-controlled nodes for distributed systems: a software approach

    Get PDF
    PhD ThesisDesigning and implementing distributed systems which continue to provide specified services in the presence of processing site and communication failures is a difficult task. To facilitate their development, distributed systems have been built assuming that their underlying hardware components are Jail-controlled, i.e. present a well defined failure mode. However, if conventional hardware cannot provide the assumed failure mode, there is a need to build processing sites or nodes, and communication infra-structure that present the fail-controlled behaviour assumed. Coupling a number of redundant processors within a replicated node is a well known way of constructing fail-controlled nodes. Computation is replicated and executed simultaneously at each processor, and by employing suitable validation techniques to the outputs generated by processors (e.g. majority voting, comparison), outputs from faulty processors can be prevented from appearing at the application level. One way of constructing replicated nodes is by introducing hardwired mechanisms to couple replicated processors with specialised validation hardware circuits. Processors are tightly synchronised at the clock cycle level, and have their outputs validated by a reliable validation hardware. Another approach is to use software mechanisms to perform synchronisation of processors and validation of the outputs. The main advantage of hardware based nodes is the minimum performance overhead incurred. However, the introduction of special circuits may increase the complexity of the design tremendously. Further, every new microprocessor architecture requires considerable redesign overhead. Software based nodes do not present these problems, on the other hand, they introduce much bigger performance overheads to the system. In this thesis we investigate alternative ways of constructing efficient fail-controlled, software based replicated nodes. In particular, we present much more efficient order protocols, which are necessary for the implementation of these nodes. Our protocols, unlike others published to date, do not require processors' physical clocks to be explicitly synchronised. The main contribution of this thesis is the precise definition of the semantics of a software based Jail-silent node, along with its efficient design, implementation and performance evaluation.The Brazilian National Research Council (CNPq/Brasil)

    The Second NASA Formal Methods Workshop 1992

    Get PDF
    The primary goal of the workshop was to bring together formal methods researchers and aerospace industry engineers to investigate new opportunities for applying formal methods to aerospace problems. The first part of the workshop was tutorial in nature. The second part of the workshop explored the potential of formal methods to address current aerospace design and verification problems. The third part of the workshop involved on-line demonstrations of state-of-the-art formal verification tools. Also, a detailed survey was filled in by the attendees; the results of the survey are compiled

    Department of Computer Science Activity 1998-2004

    Get PDF
    This report summarizes much of the research and teaching activity of the Department of Computer Science at Dartmouth College between late 1998 and late 2004. The material for this report was collected as part of the final report for NSF Institutional Infrastructure award EIA-9802068, which funded equipment and technical staff during that six-year period. This equipment and staff supported essentially all of the department\u27s research activity during that period

    Design of a modular digital computer system, DRL 4

    Get PDF
    The design is reported of an advanced modular computer system designated the Automatically Reconfigurable Modular Multiprocessor System, which anticipates requirements for higher computing capacity and reliability for future spaceborne computers. Subjects discussed include: an overview of the architecture, mission analysis, synchronous and nonsynchronous scheduling control, reliability, and data transmission
    corecore