315 research outputs found

Fault-tolerant computer study

Author: Avizienis A. A.
Ercegovac M. D.
Rennels D. A.

A set of building-block circuits is described which can be used with commercially available microprocessors and memories to implement fault-tolerant distributed computer systems. Each building-block circuit is intended for VLSI implementation as a single chip. Several building blocks and associated processor and memory chips form a self-checking computer module with self-contained input/output and interfaces to redundant communications buses. Fault tolerance is achieved by connecting self-checking computer modules into a redundant network in which backup buses and computer modules are provided to circumvent failures. The requirements and design methodology which led to the definition of the building-block circuits are discussed

NASA Technical Reports Server

Computer Sciences and Data Systems, volume 1

Topics addressed include: software engineering; university grants; institutes; concurrent processing; sparse distributed memory; distributed operating systems; intelligent data management processes; expert system for image analysis; fault tolerant software; and architecture research

NASA Technical Reports Server

Sharing memory in distributed systems

Author: Aguilar Oscar Rodrigo
Publication venue: Digital Scholarship@UNLV
Publication date: 01/01/1990

We propose an algorithm for simulating atomic registers, test-and-set, fetch-and-add, and read-modify-write registers in a message passing system. The algorithm is fault tolerant and works correctly in the presence of up to (N/2) - 1 node failures, where N is the number of processors in the system. The high resilience of the algorithm is obtained by using randomized consensus algorithms and a robust communication primitive. The use of this primitive allows a processor to exchange local information with a majority of processors in a consistent way, and therefore to take decisions safely. The simulator makes it possible to translate algorithms for the shared memory model into algorithms for the message passing model. With some minor modifications the algorithm can be used to robustly simulate shared queues, shared stacks, etc. (Abstract shortened with permission of author.)

University of Nevada, Las Vegas Repository
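
The failure bound quoted in the abstract above (up to (N/2) - 1 failed nodes out of N) can be checked with a little arithmetic. The sketch below is not the thesis's algorithm; it only illustrates why that bound leaves a majority of processors alive. The function names are hypothetical, and rounding N/2 up for odd N is an assumption, since the abstract does not say how odd N is handled.

```python
def majority_quorum(n: int) -> int:
    """Smallest number of processors that forms a strict majority of n."""
    return n // 2 + 1


def max_tolerated_failures(n: int) -> int:
    """Largest failure count that still leaves a majority alive.

    Equals (n/2) - 1 for even n; for odd n we ASSUME n/2 is rounded up,
    which the abstract does not specify.
    """
    return (n + 1) // 2 - 1


def majority_still_alive(n: int, failed: int) -> bool:
    """True if the surviving processors can still form a majority quorum."""
    return n - failed >= majority_quorum(n)


if __name__ == "__main__":
    for n in (4, 5, 6, 7):
        f = max_tolerated_failures(n)
        print(n, f, majority_still_alive(n, f), majority_still_alive(n, f + 1))
```

With one more failure than the bound allows, the survivors no longer form a majority, so a primitive that relies on hearing from a majority could not complete.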

Fault recovery in distributed processing loop networks

Author: Hayes John P. (John Patrick)
Yanney Raif M.
Publication venue: Elsevier BV
Publication date: 01/09/1988

A graph model is introduced to formalize the fault recovery process in distributed loop networks. This model is applicable to centralized as well as distributed recovery. Key fault tolerance and recovery parameters including redundancy, fault model, recovery time, and recovery strategy are characterized. Centralized recovery strategies for a given fault-tolerant loop network are presented and analyzed. A distributed recovery strategy, which depends on the cooperation of a set of processors, is given, and its application to a new class of fault-tolerant loop networks is evaluated. Peer Reviewed. http://deepblue.lib.umich.edu/bitstream/2027.42/27151/1/0000145.pd

Deep Blue Documents at the University of Michigan

Software implemented fault tolerance for microprocessor controllers: fault tolerance for microprocessor controllers

Author: Wingate Guy A.S.
Publication date: 01/01/1992

It is generally accepted that transient faults are a major cause of failure in microprocessor systems. Industrial controllers with embedded microprocessors are particularly at risk from this type of failure because their working environments are prone to transient disturbances which can generate transient faults. In order to improve the reliability of processor systems for industrial applications within a limited budget, fault tolerant techniques for uniprocessors are implemented. These techniques aim to identify characteristics of processor operation which are attributed to erroneous behaviour. Once detection is achieved, a programme of restoration activity can be initiated. This thesis initially develops a previous model of erroneous microprocessor behaviour from which characteristics particular to mal-operation are identified. A new technique is proposed, based on software implemented fault tolerance which, by recognizing a particular behavioural characteristic, facilitates the self-detection of erroneous execution. The technique involves inserting detection mechanisms into the target software. This can be quite a complex process and so a prototype software tool called Post-programming Automated Recovery UTility (PARUT) is developed to automate the technique's application. The utility can be used to apply the proposed behavioural fault tolerant technique for a selection of target processors. Fault injection and emulation experiments assess the effectiveness of the proposed fault tolerant technique for three application programs implemented on 8-, 16-, and 32-bit processors respectively. The modified application programs are shown to have an improved detection capability and hence reliability when the proposed fault tolerant technique is applied. General assessment of the technique cannot be made, however, because its effectiveness is application specific.
The thesis concludes by considering methods of generating non-hazardous application programs at the compilation stage, and design features for incorporation into the architecture of a microprocessor which inherently reduce the hazard, and increase the detection capability of the target software. Particular suggestions are made to add a 'PARUT' phase to the translation process, and to orientate microprocessor design towards the instruction opcode map

Durham e-Theses

Distributed Operating Systems

Author: ADAMS C. J.
ALMES G. T.
Andrew S. Tanenbaum
AVIZIENIS A.
BALL J. E.
BIRMAN K. P.
BIRRELL A. D.
BLACK A. P.
BOGGS D. R.
BROWNBRIDGE D. R.
BRYANT R. M.
CHERITON D. R.
CHESSON G.
CHOW T. C.
CHOW Y. C.
CHU W. W.
DELLAR C.
EFE K.
FARBER D. J.
FINKEL R. A.
FITZGERALD R.
FRIDRICH M.
GAGLIANELLO R. D.
GLIGOR V. D.
GYLYS V. B.
HWANG K.
ISLOOR S.
JANSON P.
JENSEN E. D.
JESSOP W. H.
LAMPSON B. W.
LAZOWSKA E. D.
LISKOV B.
LO V. M.
LUDERER G. W.
MAMRAK S. A.
MENASCE D.
MILLSTEIN R. E.
MOHAN C. K.
MULLENDER S. J.
OKI B. M.
OUSTERHOUT J. K.
PASHTAN A.
POPEK G.
PU C.
RASHID R. F.
REED D. P.
Robbert Van Renesse
SATYANARAYANAN M.
SCHROEDER M.
SMITH R.
SOLOMON M. H.
STANKOVIC J. A.
STONE H. S.
SVOBODOVA L.
SW HART
TANENBAUM A. S.
VAN TILBORG A. M.
WAMBECQ A.
WEINSTEIN M. J.
WITTIE L.
WITTIE L. D.
WUPIT A.
ZIMMERMANN H.
Publication date: 01/01/1985

Distributed operating systems have many aspects in common with centralized ones, but they also differ in certain ways. This paper is intended as an introduction to distributed operating systems, and especially to current university research about them. After a discussion of what constitutes a distributed operating system and how it is distinguished from a computer network, various key design issues are discussed. Then several examples of current research projects are examined in some detail, namely, the Cambridge Distributed Computing System, Amoeba, V, and Eden. © 1985, ACM. All rights reserved

VU Research Portal

Crossref

Building a generalized distributed system model

Author: Mukkamala R.

The key elements in the second year (1991-92) of our project are: (1) implementation of the distributed system prototype; (2) successful passing of the candidacy examination and a PhD proposal acceptance by the funded student; (3) design of storage efficient schemes for replicated distributed systems; and (4) modeling of gracefully degrading reliable computing systems. In the third year of the project (1992-93), we propose to: (1) complete the testing of the prototype; (2) enhance the functionality of the modules by enabling the experimentation with more complex protocols; (3) use the prototype to verify the theoretically predicted performance of locking protocols, etc.; and (4) work on issues related to real-time distributed systems. This should result in efficient protocols for these systems

NASA Technical Reports Server

A Touch of Evil: High-Assurance Cryptographic Hardware from Untrusted Components

Author: Cerulli Andrea
Cvrcek Dan
Danezis George
Klinec Dusan
Mavroudis Vasilios
Svenda Petr
Publication date: 28/10/2017

The semiconductor industry is fully globalized and integrated circuits (ICs) are commonly defined, designed and fabricated in different premises across the world. This reduces production costs, but also exposes ICs to supply chain attacks, where insiders introduce malicious circuitry into the final products. Additionally, despite extensive post-fabrication testing, it is not uncommon for ICs with subtle fabrication errors to make it into production systems. While many systems may be able to tolerate a few Byzantine components, this is not the case for cryptographic hardware, storing and computing on confidential data. For this reason, many error and backdoor detection techniques have been proposed over the years. So far all attempts have been either quickly circumvented, or come with unrealistically high manufacturing costs and complexity. This paper proposes Myst, a practical high-assurance architecture that uses commercial off-the-shelf (COTS) hardware and provides strong security guarantees, even in the presence of multiple malicious or faulty components. The key idea is to combine protective redundancy with modern threshold cryptographic techniques to build a system tolerant to hardware trojans and errors. To evaluate our design, we build a Hardware Security Module that provides the highest level of assurance possible with COTS components. Specifically, we employ more than a hundred COTS secure crypto-coprocessors, verified to FIPS 140-2 Level 4 tamper-resistance standards, and use them to realize high-confidentiality random number generation, key derivation, public key decryption and signing. Our experiments show a reasonable computational overhead (less than 1% for both Decryption and Signing) and an exponential increase in backdoor-tolerance as more ICs are added

arXiv.org e-Print Archive

UCL Discovery

Issues in providing a reliable multicast facility

Author: Dempsey Bert J.
Strayer W. Timothy
Weaver Alfred C.

Issues involved in point-to-multipoint communication are presented and the literature for proposed solutions and approaches surveyed. Particular attention is focused on the ideas and implementations that align with the requirements of the environment of interest. The attributes of multicast receiver groups that might lead to useful classifications, what the functionality of a management scheme should be, and how the group management module can be implemented are examined. The services that multicasting facilities can offer are presented, followed by mechanisms within the communications protocol that implements these services. The metrics of interest when evaluating a reliable multicast facility are identified and applied to four transport layer protocols that incorporate reliable multicast

NASA Technical Reports Server