Redundant Logic Insertion and Fault Tolerance Improvement in Combinational Circuits

Author: Balasubramanian P
Naayagi R T
Publication date: 21/07/2017

This paper presents a novel method to identify and insert redundant logic into a combinational circuit to improve its fault tolerance without having to replicate the entire circuit, as is the case with conventional redundancy techniques. In this context, it is discussed how to estimate the fault masking capability of a combinational circuit using the truth-cum-fault enumeration table, and then it is shown how to identify the logic that can be introduced to add redundancy into the original circuit without affecting its native functionality, with the aim of improving its fault tolerance, though this would involve some trade-off in the design metrics. However, care should be taken while introducing redundant logic, since redundant logic insertion may give rise to new internal nodes, and faults on those nodes may impact the fault tolerance of the resulting circuit. The combinational circuit that is considered and its redundant counterparts are all implemented in semi-custom design style using a 32/28nm CMOS digital cell library, and their respective design metrics and fault tolerances are compared.
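The abstract does not spell out how the truth-cum-fault enumeration table is built, so the following is only a minimal sketch of the general idea, assuming a single stuck-at fault model. The circuit `f = (a AND b) OR (b AND c)`, the node names `n1`, `n2`, `out`, and the helper functions are all hypothetical and chosen for illustration. It enumerates every input pattern against every single fault and counts how often the output is still correct.

```python
from itertools import product

# Hypothetical example circuit (not from the paper): f = (a AND b) OR (b AND c).
# n1 and n2 are the AND-gate outputs; "out" is the primary output.
NODES = ("n1", "n2", "out")


def evaluate(a, b, c, fault=None):
    """Evaluate the circuit; `fault` is (node, stuck_value) or None."""
    def apply(node, value):
        # Force the node to its stuck-at value if it is the faulty node.
        if fault is not None and fault[0] == node:
            return fault[1]
        return value

    n1 = apply("n1", a & b)
    n2 = apply("n2", b & c)
    return apply("out", n1 | n2)


def fault_masking_ratio():
    """Count (input, single stuck-at fault) pairs whose output stays correct.

    Returns (masked, total), where total = inputs x faults.
    """
    faults = [(node, v) for node in NODES for v in (0, 1)]
    masked = total = 0
    for a, b, c in product((0, 1), repeat=3):
        good = evaluate(a, b, c)  # fault-free reference output
        for fault in faults:
            total += 1
            masked += evaluate(a, b, c, fault) == good
    return masked, total


if __name__ == "__main__":
    masked, total = fault_masking_ratio()
    print(f"masked {masked} of {total} input/fault combinations")
```

Running the same enumeration on a candidate circuit with inserted redundant logic, with its new internal nodes added to `NODES`, would give a like-for-like comparison of masking ratios under these assumptions.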

arXiv.org e-Print Archive

Crossref

Algorithmic Based Fault Tolerance Applied to High Performance Computing

Author: Bosilca George
Delmas Remi
Dongarra Jack
Langou Julien
Publication date: 01/01/2008

We present a new approach to fault tolerance for High Performance Computing systems. Our approach is based on a careful adaptation of the Algorithmic Based Fault Tolerance technique (Huang and Abraham, 1984) to the needs of parallel distributed computation. We obtain a strongly scalable mechanism for fault tolerance. We can also detect and correct errors (bit-flips) on the fly during a computation. To assess the viability of our approach, we have developed a fault-tolerant matrix-matrix multiplication subroutine, and we propose some models to predict its running time. Our parallel fault-tolerant matrix-matrix multiplication scores 1.4 TFLOPS on 484 processors (cluster jacquard.nersc.gov) and returns a correct result even though one process failure has happened. This represents 65% of the machine peak efficiency and less than 12% overhead with respect to the fastest failure-free implementation. We predict (and have observed) that, as we increase the processor count, the overhead of the fault tolerance drops significantly.
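As a rough illustration of the checksum idea behind Algorithmic Based Fault Tolerance, the sketch below multiplies a column-checksum matrix by a row-checksum matrix and uses the resulting checksums to locate and repair a single corrupted entry. It is a sequential toy, not the authors' parallel subroutine: the function names are hypothetical, integer entries are assumed so that checksums can be compared exactly (floating-point data would need a tolerance), and process failures are not modeled.

```python
def matmul(A, B):
    """Plain dense matrix product of two lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def with_column_checksum(A):
    """Append one row holding the sum of each column of A."""
    return A + [[sum(col) for col in zip(*A)]]


def with_row_checksum(B):
    """Append to each row of B the sum of that row."""
    return [row + [sum(row)] for row in B]


def locate_and_correct(C):
    """Repair at most one corrupted data entry of full-checksum matrix C in place.

    The last column of C holds row sums and the last row holds column sums.
    Returns the (row, col) that was fixed, or None if no single error was found.
    """
    n, m = len(C) - 1, len(C[0]) - 1
    bad_rows = [i for i in range(n) if sum(C[i][:m]) != C[i][m]]
    bad_cols = [j for j in range(m) if sum(C[i][j] for i in range(n)) != C[n][j]]
    if len(bad_rows) == 1 and len(bad_cols) == 1:
        i, j = bad_rows[0], bad_cols[0]
        # Rebuild the entry from the row checksum and the other entries in its row.
        C[i][j] = C[i][m] - sum(C[i][k] for k in range(m) if k != j)
        return (i, j)
    return None


if __name__ == "__main__":
    A = [[1, 2], [3, 4]]
    B = [[5, 6], [7, 8]]
    C = matmul(with_column_checksum(A), with_row_checksum(B))
    C[0][1] ^= 4  # simulate a single bit-flip in one result entry
    print("corrected entry:", locate_and_correct(C))
    print("data part:", [row[:-1] for row in C[:-1]])
```

The design point is that the checksums are carried through the multiplication itself, so a corrupted entry shows up as exactly one inconsistent row and one inconsistent column.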

arXiv.org e-Print Archive

CiteSeerX

MIMS EPrints

The University of Manchester - Institutional Repository

Fault tolerance in super-scalar and VLIW processors

Author: Blough Douglas M.
Nicolau Alexandru
Publication venue: eScholarship, University of California
Publication date: 01/01/1991

In this paper, we present a method for utilizing the spare capacity in super-scalar and very long instruction word (VLIW) processors to tolerate functional unit failures. Unlike previous work that was primarily interested in detection of transient faults, we are concerned with more permanent and/or intermittent faults which necessitate processor reconfiguration. Our method utilizes the VLIW compiler or the super-scalar scheduler to insert redundant operations whenever idle functional units exist. The results of these redundant operations are used to detect and diagnose functional unit failures. For super-scalar processors, the scheduler can then utilize this information to ensure that operations are performed only on non-faulty units. In VLIW processors, this is equivalent to recompiling the code to run on the remaining non-faulty functional units. Since recompilation may not be possible in certain applications, we consider two alternative reconfiguration strategies for VLIW processors. These strategies sacrifice storage space and execution time, respectively, in order to reconfigure without recompiling. We present Markov models that describe the behavior of processors using these different approaches, and we evaluate their reliabilities. The results show that, while super-scalar and VLIW with recompilation provide the highest reliability, all proposed strategies significantly increase reliability over that of an unprotected processor.
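The abstract does not give the Markov models themselves, so the sketch below is only a simplified stand-in: a pure-death Markov chain in which each of `n` functional units fails independently at a constant rate `lam`, detection and reconfiguration are assumed perfect and instantaneous, and the processor keeps working while at least `m` units survive. All names and parameters are hypothetical. Under those assumptions the chain has the familiar k-of-n closed form, which can be compared with an unprotected processor that needs every unit.

```python
from math import comb, exp


def reliability_with_reconfiguration(n, m, lam, t):
    """P(at least m of n units still work at time t).

    Assumes independent exponential unit lifetimes with rate `lam` and
    perfect, instantaneous reconfiguration (an idealization).
    """
    p = exp(-lam * t)  # probability that one unit survives to time t
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(m, n + 1))


def reliability_unprotected(n, lam, t):
    """All n units must work: any single failure brings the processor down."""
    return exp(-n * lam * t)


if __name__ == "__main__":
    n, lam = 4, 0.1  # illustrative values only
    for t in (1.0, 5.0, 10.0):
        r_reconf = reliability_with_reconfiguration(n, 1, lam, t)
        r_plain = reliability_unprotected(n, lam, t)
        print(f"t={t:5.1f}  reconfigurable={r_reconf:.4f}  unprotected={r_plain:.4f}")
```

A model closer to the one described would add states for imperfect diagnosis and for the storage and time penalties of the two no-recompile strategies. Those details are not available in the abstract.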

eScholarship - University of California

Model Prediction-Based Approach to Fault Tolerant Control with Applications

Author: Khalid Dr. Haris M.
Mahmoud Professor Magdi S.
Publication date: 01/02/2013

Fault-tolerant control (FTC) is an integral component in industrial processes as it enables the system to continue robust operation under some conditions. In this paper, an FTC scheme is proposed for interconnected systems within an integrated design framework to yield timely monitoring and detection of faults and to reconfigure the controller according to those faults. The unscented Kalman filter (UKF)-based fault detection and diagnosis system is initially run on the main plant, and parameter estimation is done for the local faults. This critical information is shared through information fusion to the main system, where the whole system is decentralized using the overlapping decomposition technique. Using these parameter estimates of the decentralized subsystems, a model predictive control (MPC) adjusts its parameters according to the fault scenarios, thereby striving to maintain the stability of the system. Experimental results on interconnected continuous time stirred tank reactors (CSTR) with recycle and a quadruple tank system indicate that the proposed method is capable of correctly identifying various faults and then controlling the system under some conditions.

CogPrints Cognitive Sciences Eprint Archive

On the reliability of electrical drives for safety-critical applications

Author: Cacciato M.
Caricchi F.
De Donato G.
Giulii Capponi F.
Scarcella G.
Scelba G.
Publication venue: Publications Office of the European Union
Publication date: 01/01/2018

The aim of this work is to present some issues related to fault tolerant electric drives, which are able to overcome different types of faults occurring in the sensors, in the power converter and in the electrical machine, without compromising the overall functionality of the system. These features are of utmost importance in safety-critical applications. In this paper, the reliability of both commercial and innovative drive configurations, which use redundant hardware and suitable control algorithms, will be investigated for the most common types of fault: besides standard three phase motor drives, also multiphase topologies, open-end winding solutions, and multi-machine configurations will be analyzed, applied to various electric motor technologies. The complexity of hardware and control strategies will also be compared in this paper, since this has a tremendous impact on the investment costs.

Archivio della ricerca- Università di Roma La Sapienza

On quantifying fault patterns of the mesh interconnect networks

Author: Fathy M.
Khonsari A.
Khosravipour S.
Ould-Khaoua M.
Safaei F.
Shafiei H.
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 01/01/2007

One of the key issues in the design of Multiprocessor Systems-on-Chip (MP-SoCs), multicomputers, and peer-to-peer networks is the development of an efficient communication network that provides high throughput and low latency and can survive beyond the failure of individual components. Generally, the faulty components may be coalesced into fault regions, which are classified into convex and concave shapes. In this paper, we propose a mathematical solution for counting the number of common fault patterns in a 2-D mesh interconnect network, including both convex (|-shape, ||-shape, □-shape) and concave (L-shape, U-shape, T-shape, +-shape, H-shape) regions. The results presented in this paper, which have been validated through simulation experiments, can play a key role in studying, in particular, the performance of fault-tolerant routing algorithms and measures of a network's fault tolerance expressed as the probability of a disconnection.

Crossref

Enlighten