
Identification of Crash Fault & Value Fault for Random Network in Dynamic Environment

Author: Nayak Arindam
Panda Sandeep
Publication date: 12/05/2009

During the past few years, distributed systems have been the focus of considerable research in computer science. Fault tolerance in distributed systems is a wide area with a significant body of literature that is highly diverse in methodology and terminology. Fault tolerance is the ability of a system to perform its function correctly even in the presence of internal faults. An extensive methodology has been developed in this field over the past few years, and a number of fault-tolerant machines have been built, but most deal with random hardware faults, while a smaller number deal with software, design, and operator faults to varying degrees. Our work focuses on simulating a system that handles software faults, meaning faults that occur because of a failure or error in an internal software component. The work is restricted to distributed diagnosis in a dynamic fault environment. We created different not-completely connected random networks with the number of nodes ranging from 8 to 256, then induced faults in these networks dynamically using a Poisson distribution. Three different algorithms were implemented to detect the faults, and the comparison among these algorithms, based on latency and number of message exchanges, is presented graphically. The software faults we dealt with are crash faults and value faults in a distributed system (a not-completely connected network). Although much research has been done on crash faults, very little work has addressed diagnosing value faults in a dynamic fault environment.
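The simulation setup described in this abstract (a sparse random network plus dynamically injected faults) can be illustrated with a minimal Python sketch. This is not the authors' simulator: the function names, the `extra_edge_prob` and `rate` parameters, and the choice to draw fault times from exponential inter-arrival gaps (which gives Poisson-distributed fault counts per time interval) are all assumptions made for illustration.

```python
import random

def random_connected_network(n, extra_edge_prob=0.1, seed=None):
    """Build a connected random graph as an adjacency dict (hypothetical helper).

    A random spanning tree guarantees connectivity; a few extra edges are
    added with a small probability, so the graph is usually far from complete.
    """
    rng = random.Random(seed)
    adj = {v: set() for v in range(n)}
    for v in range(1, n):
        u = rng.randrange(v)          # attach v to an earlier node
        adj[u].add(v)
        adj[v].add(u)
    for u in range(n):
        for v in range(u + 1, n):
            if rng.random() < extra_edge_prob:
                adj[u].add(v)
                adj[v].add(u)
    return adj

def poisson_fault_schedule(n, rate, horizon, seed=None):
    """Return (time, node, kind) fault events up to `horizon` (hypothetical helper).

    Gaps between faults are exponential with mean 1/rate, so the number of
    faults in any interval follows a Poisson distribution.
    """
    rng = random.Random(seed)
    t, schedule = 0.0, []
    while True:
        t += rng.expovariate(rate)
        if t > horizon:
            return schedule
        node = rng.randrange(n)
        kind = rng.choice(["crash", "value"])  # the two fault types studied
        schedule.append((t, node, kind))

network = random_connected_network(8, seed=1)
faults = poisson_fault_schedule(8, rate=0.5, horizon=100.0, seed=2)
```

A diagnosis algorithm would then consume `faults` in time order and exchange test messages along the edges in `network`; the abstract does not specify those algorithms, so they are left out here.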

ethesis@nitr

Control Caching : a fault-tolerant architecture for SEU mitigation in microprocessor control logic

Author: Subramanian Ganesh Tiruvaiyaru
Publication venue: Iowa State University Digital Repository
Publication date: 01/01/2006

Fault tolerance at the processor architecture level has become increasingly important due to rapid advancements in the design and usage of high-performance devices and embedded processors. System-level solutions to the challenge of fault tolerance flag errors and use penalty cycles to recover through the re-execution of instructions. This motivates the need for a hybrid technique providing fault detection as well as fault masking, with minimal penalty cycles for recovery from detected errors. In this research, we propose Control Caching, an architectural technique comprising three schemes to protect the control logic of microprocessors against Single Event Upsets (SEUs). High fault coverage with relatively low hardware overhead is obtained by using both fault detection with recovery and fault masking. Control signals are classified as either static or dynamic, and static signals are further classified as opcode-dependent or instruction-dependent. The strategy for protecting static instruction-dependent control signals uses a distributed cache of the history of the control bits along with the Triple Modular Redundancy (TMR) concept, while the opcode-dependent control signals are protected by a distributed cache that can be used to flag errors. Dynamic signals are protected by selective duplication of datapath components. The techniques are implemented on the OpenRISC 1200 processor. Our simulation results show that fault detection with single-cycle recovery is provided for 92% of all instruction executions. FPGA synthesis is performed to analyze the associated cycle time and area overheads.
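The fault-masking idea behind TMR can be shown with a short Python sketch of a bitwise majority voter. This illustrates generic TMR voting only, not the Control Caching hardware described in the abstract; the function names and the example control word are hypothetical.

```python
def tmr_vote(a: int, b: int, c: int) -> int:
    """Bitwise majority of three redundant copies of a control word.

    Each output bit takes the value held by at least two of the three
    inputs, so an upset in any single copy is masked.
    """
    return (a & b) | (a & c) | (b & c)

def flip_bit(word: int, bit: int) -> int:
    """Model a single event upset by inverting one bit (hypothetical helper)."""
    return word ^ (1 << bit)

golden = 0b10110010              # assumed fault-free control bits
upset = flip_bit(golden, 3)      # one copy is hit by an SEU

masked = tmr_vote(golden, upset, golden)
error_detected = upset != masked  # a copy that disagrees with the vote flags an error
```

Comparing each copy against the voted result is one simple way to combine masking with detection, which is the pairing the abstract emphasizes.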

Digital Repository @ Iowa State University (ISU)

A clustering-based hybrid replica control protocol for high availability in grid environment

Author: Abdullah Azizol
Hussin Masnida
Ibrahim Hamidah
Latip Rohaya
Mabni Zulaile
Publication venue: Science Publications
Publication date: 01/01/2014

In recent years, with the emergence of grid computing systems, the number of distributed sites has become very large. When thousands of sites are involved in a grid computing system, data replication can improve data availability, reduce communication cost, and provide fault tolerance. In the literature, many replica control protocols have been proposed for managing replicated data. However, in large-scale distributed systems, most of these protocols still require a large number of replicas to maintain consistency, which degrades their performance. Therefore, in this study, we propose a new replica control protocol named the Clustering-Based Hybrid (CBH) protocol. The CBH protocol employs a hybrid replication strategy that combines the advantages of two common replica control protocols to improve on the performance of existing protocols. We analyze the communication cost and availability of the operations and compare the CBH protocol with a recently proposed replica control protocol named the Dynamic Hybrid (DH) protocol. A simulation model was developed in Java to evaluate the CBH protocol. Our results show that the proposed protocol provides lower communication cost and higher data availability than the DH protocol.
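To make the notion of operation availability concrete, the following Python sketch computes availability for a plain majority-quorum scheme. This is a textbook baseline, not the CBH or DH protocol, whose quorum structures the abstract does not describe; the function name and the assumption that replicas fail independently with the same availability `p` are illustrative.

```python
from math import comb

def majority_quorum_availability(n: int, p: float) -> float:
    """Probability that a majority of n replicas is up.

    Assumes each replica is independently available with probability p.
    An operation succeeds when at least floor(n/2) + 1 replicas respond.
    """
    quorum = n // 2 + 1
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k)
               for k in range(quorum, n + 1))

# With p = 0.9, three replicas give 3 * 0.9^2 * 0.1 + 0.9^3 = 0.972.
availability_3 = majority_quorum_availability(3, 0.9)
```

In this baseline the communication cost of an operation grows with the quorum size, which is the trade-off that hybrid protocols aim to improve.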

Crossref

Universiti Putra Malaysia Institutional Repository

SEU Mitigation Techniques for Microprocessor Control Logic

Author: Ganesh T. S.
Somani Arun
Subramanian Viswanathan
Publication venue: Iowa State University Digital Repository
Publication date: 01/01/2006

Fault tolerance at the processor architecture level has become increasingly important due to rapid advancements in the design and usage of high-performance devices and embedded processors. System-level solutions to the challenge of fault tolerance flag errors and use penalty cycles to recover through the re-execution of instructions. This motivates the need for a hybrid technique providing fault detection as well as fault masking, with minimal penalty cycles for recovery from detected errors. We propose three architectural schemes to protect the control logic of microprocessors against single event upsets (SEUs). High fault coverage with relatively low hardware overhead is obtained by using both fault detection with recovery and fault masking. Control signals are classified as either static or dynamic, and static signals are further classified as opcode-dependent or instruction-dependent. The strategy for protecting static instruction-dependent control signals uses a distributed cache of the history of the control bits along with the triple modular redundancy (TMR) concept, while the opcode-dependent control signals are protected by a distributed cache that is used to flag errors. Dynamic signals are protected by selective duplication of datapath components. The techniques are implemented on the OpenRISC 1200 processor. Our simulation results show that fault detection with single-cycle recovery is provided for 92% of all instruction executions. FPGA synthesis is performed to analyze the associated cycle time and area overheads.

Digital Repository @ Iowa State University (ISU)