Search CORE

312 research outputs found

Recommended from our members

Algorithm Based Fault Tolerance in Massively Parallel Systems

Author: Lerner Mark D.
Publication venue: 'Columbia University Libraries/Information Services'
Publication date: 01/01/1988
Field of study

An A complex computer system consists of billions of transistors, miles of wires, and many interactions with an unpredictable environment. Correct results must be produced despite faults that dynamically occur in some of these components. Many techniques have been developed for fault tolerant computation. General purpose methods are independent of the application, yet incur an overhead cost which may be unacceptable for massively parallel systems. Algorithm-specific methods, which can operate at lower cost, are a developing alternative [1, 72]. This paper first reviews the general-purpose approach and then focuses on the algorithm-specific method, with an eye toward massively parallel processors. Algorithm-based fault tolerance has the attraction of low overhead; furthermore it addresses both the detection and also the correction problems. The principle is to build low-cost checking and correcting mechanism based exclusively on the redundancies inherent in the system

Columbia University Academic Commons

Analysis and design of algorithm-based fault-tolerant systems

Author: Nair V. S. Sukumaran
Publication venue
Publication date
Field of study

An important consideration in the design of high performance multiprocessor systems is to ensure the correctness of the results computed in the presence of transient and intermittent failures. Concurrent error detection and correction have been applied to such systems in order to achieve reliability. Algorithm Based Fault Tolerance (ABFT) was suggested as a cost-effective concurrent error detection scheme. The research was motivated by the complexity involved in the analysis and design of ABFT systems. To that end, a matrix-based model was developed and, based on that, algorithms for both the design and analysis of ABFT systems are formulated. These algorithms are less complex than the existing ones. In order to reduce the complexity further, a hierarchical approach is developed for the analysis of large systems

NASA Technical Reports Server

Coding approaches to fault tolerance in dynamic systems

Author
Publication venue: Research Laboratory of Electronics, Massachusetts Institute of Technology
Publication date: 01/01/1999
Field of study

Also issued as Thesis (Ph.D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 1999.Includes bibliographical references (p. 189-196).Sponsored through a contract with Sanders, A Lockheed Martin Company.Christoforos N. Hadjicostis

DSpace@MIT

Fault-Tolerant Computing: An Overview

Author: Banerjee P.
Fuchs W.K.
Horst R.
Iyer R.K.
Patel J.H.
Publication venue: Center for Reliable and High-Performance Computing, Coordinated Science Laboratory, University of Illinois at Urbana-Champaign
Publication date: 01/06/1991
Field of study

Coordinated Science Laboratory was formerly known as Control Systems LaboratoryNASA / NAG-1-613Semiconductor Research Corporation / 90-DP-109Joint Services Electronics Program / N00014-90-J-127

Illinois Digital Environment for Access to Learning and Scholarship Repository

Self test and self repair strategies in VLSI architectures for high speed digital correlation

Author: Blackley William Sinclair
Publication venue: The University of Edinburgh
Publication date: 01/01/1985
Field of study

Edinburgh Research Archive

The NASA computer science research program plan

Author
Publication venue
Publication date
Field of study

A taxonomy of computer science is included, one state of the art of each of the major computer science categories is summarized. A functional breakdown of NASA programs under Aeronautics R and D, space R and T, and institutional support is also included. These areas were assessed against the computer science categories. Concurrent processing, highly reliable computing, and information management are identified

NASA Technical Reports Server

Testability and redundancy techniques for improved yield and reliability of CMOS VLSI circuits

Author: Bensouiah Djamel Abderrahmane
Publication venue
Publication date: 01/01/1992
Field of study

The research presented in this thesis is concerned with the design of fault-tolerant integrated circuits as a contribution to the design of fault-tolerant systems. The economical manufacture of very large area ICs will necessitate the incorporation of fault-tolerance features which are routinely employed in current high density dynamic random access memories. Furthermore, the growing use of ICs in safety-critical applications and/or hostile environments in addition to the prospect of single-chip systems will mandate the use of fault-tolerance for improved reliability. A fault-tolerant IC must be able to detect and correct all possible faults that may affect its operation. The ability of a chip to detect its own faults is not only necessary for fault-tolerance, but it is also regarded as the ultimate solution to the problem of testing. Off-line periodic testing is selected for this research because it achieves better coverage of physical faults and it requires less extra hardware than on-line error detection techniques. Tests for CMOS stuck-open faults are shown to detect all other faults. Simple test sequence generation procedures for the detection of all faults are derived. The test sequences generated by these procedures produce a trivial output, thereby, greatly simplifying the task of test response analysis. A further advantage of the proposed test generation procedures is that they do not require the enumeration of faults. The implementation of built-in self-test is considered and it is shown that the hardware overhead is comparable to that associated with pseudo-random and pseudo-exhaustive techniques while achieving a much higher fault coverage through-the use of the proposed test generation procedures. The consideration of the problem of testing the test circuitry led to the conclusion that complete test coverage may be achieved if separate chips cooperate in testing each other's untested parts. An alternative approach towards complete test coverage would be to design the test circuitry so that it is as distributed as possible and so that it is tested as it performs its function. Fault correction relies on the provision of spare units and a means of reconfiguring the circuit so that the faulty units are discarded. This raises the question of what is the optimum size of a unit? A mathematical model, linking yield and reliability is therefore developed to answer such a question and also to study the effects of such parameters as the amount of redundancy, the size of the additional circuitry required for testing and reconfiguration, and the effect of periodic testing on reliability. The stringent requirement on the size of the reconfiguration logic is illustrated by the application of the model to a typical example. Another important result concerns the effect of periodic testing on reliability. It is shown that periodic off-line testing can achieve approximately the same level of reliability as on-line testing, even when the time between tests is many hundreds of hours

Durham e-Theses

An overview of decision table literature 1982-1995.

Author: Vanthienen Jan
Verhelle M
Publication venue
Publication date
Field of study

This report gives an overview of the literature on decision tables over the past 15 years. As much as possible, for each reference, an author supplied abstract, a number of keywords and a classification are provided. In some cases own comments are added. The purpose of these comments is to show where, how and why decision tables are used. The literature is classified according to application area, theoretical versus practical character, year of publication, country or origin (not necessarily country of publication) and the language of the document. After a description of the scope of the interview, classification results and the classification by topic are presented. The main body of the paper is the ordered list of publications with abstract, classification and comments.

Research Papers in Economics

Multiple bit error correcting architectures over finite fields

Author: Poolakkaparambil M
Publication venue: 'Oxford Brookes University'
Publication date: 01/01/2012
Field of study

This thesis proposes techniques to mitigate multiple bit errors in GF arithmetic circuits. As GF arithmetic circuits such as multipliers constitute the complex and important functional unit of a crypto-processor, making them fault tolerant will improve the reliability of circuits that are employed in safety applications and the errors may cause catastrophe if not mitigated. Firstly, a thorough literature review has been carried out. The merits of efficient schemes are carefully analyzed to study the space for improvement in error correction, area and power consumption. Proposed error correction schemes include bit parallel ones using optimized BCH codes that are useful in applications where power and area are not prime concerns. The scheme is also extended to dynamically correcting scheme to reduce decoder delay. Other method that suits low power and area applications such as RFIDs and smart cards using cross parity codes is also proposed. The experimental evaluation shows that the proposed techniques can mitigate single and multiple bit errors with wider error coverage compared to existing methods with lesser area and power consumption. The proposed scheme is used to mask the errors appearing at the output of the circuit irrespective of their cause. This thesis also investigates the error mitigation schemes in emerging technologies (QCA, CNTFET) to compare area, power and delay with existing CMOS equivalent. Though the proposed novel multiple error correcting techniques can not ensure 100% error mitigation, inclusion of these techniques to actual design can improve the reliability of the circuits or increase the difficulty in hacking crypto-devices. Proposed schemes can also be extended to non GF digital circuits

Oxford Brookes University: RADAR