315 research outputs found

    Fault-tolerant computer study

    Get PDF
    A set of building block circuits is described which can be used with commercially available microprocessors and memories to implement fault tolerant distributed computer systems. Each building block circuit is intended for VLSI implementation as a single chip. Several building blocks and associated processor and memory chips form a self checking computer module with self contained input output and interfaces to redundant communications buses. Fault tolerance is achieved by connecting self checking computer modules into a redundant network in which backup buses and computer modules are provided to circumvent failures. The requirements and design methodology which led to the definition of the building block circuits are discussed

    Computer Sciences and Data Systems, volume 1

    Get PDF
    Topics addressed include: software engineering; university grants; institutes; concurrent processing; sparse distributed memory; distributed operating systems; intelligent data management processes; expert system for image analysis; fault tolerant software; and architecture research

    Sharing memory in distributed systems

    Full text link
    We propose an algorithm for simulating atomic registers, test-and-set, fetch-and-add, and read-modify-write registers in a message passing system. The algorithm is fault tolerant and works correctly in presence of up to (N/2) -1 node failures where N is the number of processors in the system. The high resilience of the algorithm is obtained by using randomized consensus algorithms and a robust communication primitive. The use of this primitive allows a processor to exchange local information with a majority of processors in a consistent way, and therefore to take decisions safely. The simulator makes it possible to translate algorithms for the shared memory model to that for the message passing model. With some minor modifications the algorithm can be used to robustly simulate shared queues, shared stacks, etc. (Abstract shortened with permission of author.)

    Fault recovery in distributed processing loop networks

    Full text link
    A graph model is introduced to formalize the fault recovery process in distributed loop networks. This model is applicable to centralized as well as distributed recovery. Key fault tolerance and recovery parameters including redundancy, fault model, recovery time, and recovery strategy are characterized. Centralized recovery strategies for a given fault-tolerant loop network are presented and analyzed. A distributed recovery strategy, which depends on the cooperation of a set of processors, is given, and its application to a new class of fault-tolerant loop networksis evaluated.Peer Reviewedhttp://deepblue.lib.umich.edu/bitstream/2027.42/27151/1/0000145.pd

    Software implemented fault tolerance for microprocessor controllers: fault tolerance for microprocessor controllers

    Get PDF
    It is generally accepted that transient faults are a major cause of failure in micro processor systems. Industrial controllers with embedded microprocessors are particularly at risk from this type of failure because their working environments are prone to transient disturbances which can generate transient faults. In order to improve the reliability of processor systems for industrial applications within a limited budget, fault tolerant techniques for uniprocessors are implemented. These techniques aim to identify characteristics of processor operation which are attributed to erroneous behaviour. Once detection is achieved, a programme of restoration activity can be initiated. This thesis initially develops a previous model of erroneous microprocessor behaviour from which characteristics particular to mal-operation are identified. A new technique is proposed, based on software implemented fault tolerance which, by recognizing a particular behavioural characteristic, facilitates the self-detection of erroneous execution. The technique involves inserting detection mechanisms into the target software. This can be quite a complex process and so a prototype software tool called Post-programming Automated Recovery UTility (PARUT) is developed to automate the technique's application. The utility can be used to apply the proposed behavioural fault tolerant technique for a selection of target processors. Fault injection and emulation experiments assess the effectiveness of the proposed fault tolerant technique for three application programs implemented on an 8, 16, and 32- bit processors respectively. The modified application programs are shown to have an improved detection capability and hence reliability when the proposed fault tolerant technique is applied. General assessment of the technique cannot be made, however, because its effectiveness is application specific. The thesis concludes by considering methods of generating non-hazardous application programs at the compilation stage, and design features for incorporation into the architecture of a microprocessor which inherently reduce the hazard, and increase the detection capability of the target software. Particular suggestions are made to add a 'PARUT' phase to the translation process, and to orientate microprocessor design towards the instruction opcode map

    Distributed Operating Systems

    Get PDF
    Distributed operating systems have many aspects in common with centralized ones, but they also differ in certain ways. This paper is intended as an introduction to distributed operating systems, and especially to current university research about them. After a discussion of what constitutes a distributed operating system and how it is distinguished from a computer network, various key design issues are discussed. Then several examples of current research projects are examined in some detail, namely, the Cambridge Distributed Computing System, Amoeba, V, and Eden. © 1985, ACM. All rights reserved

    Building a generalized distributed system model

    Get PDF
    The key elements in the second year (1991-92) of our project are: (1) implementation of the distributed system prototype; (2) successful passing of the candidacy examination and a PhD proposal acceptance by the funded student; (3) design of storage efficient schemes for replicated distributed systems; and (4) modeling of gracefully degrading reliable computing systems. In the third year of the project (1992-93), we propose to: (1) complete the testing of the prototype; (2) enhance the functionality of the modules by enabling the experimentation with more complex protocols; (3) use the prototype to verify the theoretically predicted performance of locking protocols, etc.; and (4) work on issues related to real-time distributed systems. This should result in efficient protocols for these systems

    A Touch of Evil: High-Assurance Cryptographic Hardware from Untrusted Components

    Get PDF
    The semiconductor industry is fully globalized and integrated circuits (ICs) are commonly defined, designed and fabricated in different premises across the world. This reduces production costs, but also exposes ICs to supply chain attacks, where insiders introduce malicious circuitry into the final products. Additionally, despite extensive post-fabrication testing, it is not uncommon for ICs with subtle fabrication errors to make it into production systems. While many systems may be able to tolerate a few byzantine components, this is not the case for cryptographic hardware, storing and computing on confidential data. For this reason, many error and backdoor detection techniques have been proposed over the years. So far all attempts have been either quickly circumvented, or come with unrealistically high manufacturing costs and complexity. This paper proposes Myst, a practical high-assurance architecture, that uses commercial off-the-shelf (COTS) hardware, and provides strong security guarantees, even in the presence of multiple malicious or faulty components. The key idea is to combine protective-redundancy with modern threshold cryptographic techniques to build a system tolerant to hardware trojans and errors. To evaluate our design, we build a Hardware Security Module that provides the highest level of assurance possible with COTS components. Specifically, we employ more than a hundred COTS secure crypto-coprocessors, verified to FIPS140-2 Level 4 tamper-resistance standards, and use them to realize high-confidentiality random number generation, key derivation, public key decryption and signing. Our experiments show a reasonable computational overhead (less than 1% for both Decryption and Signing) and an exponential increase in backdoor-tolerance as more ICs are added

    Issues in providing a reliable multicast facility

    Get PDF
    Issues involved in point-to-multipoint communication are presented and the literature for proposed solutions and approaches surveyed. Particular attention is focused on the ideas and implementations that align with the requirements of the environment of interest. The attributes of multicast receiver groups that might lead to useful classifications, what the functionality of a management scheme should be, and how the group management module can be implemented are examined. The services that multicasting facilities can offer are presented, followed by mechanisms within the communications protocol that implements these services. The metrics of interest when evaluating a reliable multicast facility are identified and applied to four transport layer protocols that incorporate reliable multicast
    • …
    corecore