
    Applying Prolog to Develop Distributed Systems

    Development of distributed systems is a difficult task. Declarative programming techniques hold promising potential for effectively supporting programmers in this challenge. While Datalog-based languages have been actively explored for programming distributed systems, Prolog has received relatively little attention in this application area so far. In this paper we present a Prolog-based programming system, called DAHL, for the declarative development of distributed systems. DAHL extends Prolog with an event-driven control mechanism and built-in networking procedures. Our experimental evaluation using a distributed hash-table data structure, a protocol for achieving Byzantine fault tolerance, and a distributed software model checker - all implemented in DAHL - indicates the viability of the approach.
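
    Although DAHL itself is a Prolog system, the event-driven control plus built-in networking it describes can be loosely approximated in Python for illustration. The handler registry, on() decorator, and JSON message format below are assumptions of this sketch, not DAHL's actual interface.

    ```python
    # Loose analogue of an event-driven distributed node: register handlers
    # for named events and dispatch incoming network messages to them.
    import json
    import socket

    HANDLERS = {}

    def on(event):
        """Register a handler for a named event, akin to an event rule."""
        def register(fn):
            HANDLERS[event] = fn
            return fn
        return register

    def send(addr, event, payload):
        """Networking-primitive stand-in: ship an event to a peer node."""
        with socket.create_connection(addr) as s:
            s.sendall(json.dumps({"event": event, "payload": payload}).encode())

    @on("lookup")                      # e.g. a DHT lookup request arrives
    def handle_lookup(payload):
        print("looking up key", payload["key"])

    def event_loop(port):
        """Accept incoming events and dispatch to registered handlers."""
        with socket.create_server(("", port)) as srv:
            while True:
                conn, _ = srv.accept()
                with conn:
                    msg = json.loads(conn.recv(65536).decode())
                    HANDLERS[msg["event"]](msg["payload"])
    ```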

    Analytical sensor redundancy assessment

    The rationale and mechanization of sensor fault tolerance based on analytical redundancy principles are described. The concept involves substituting software procedures, such as an observer algorithm, for additional hardware components. The observer synthesizes values of sensor states in lieu of their direct measurement. Such information can then be used, for example, to determine which of two disagreeing sensors is more correct, thus enhancing sensor fault survivability. Here a stability augmentation system is used as an example application, with the required modifications being made to a quadruplex digital flight control system. The impact on software structure and the resultant revalidation effort are illustrated as well. Also, the use of an observer algorithm for wind gust filtering of the angle-of-attack sensor signal is presented.
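
    A minimal numerical sketch of the observer-based arbitration idea follows: an observer propagates an estimate of the sensed state, and when two sensors disagree, the reading closer to that estimate is taken as the more credible one. The scalar plant model, gains, and threshold are assumptions for illustration, not the flight system's values.

    ```python
    def observer_step(x_hat, u, y, a=0.95, b=0.1, L=0.3):
        """One Luenberger-style update: model prediction, then correction
        toward y, an independent (trusted) measurement of the state."""
        x_pred = a * x_hat + b * u          # model-based prediction
        return x_pred + L * (y - x_pred)    # measurement correction

    def arbitrate(sensor_a, sensor_b, x_hat, threshold=0.5):
        """If two sensors disagree, side with the one nearer the estimate."""
        if abs(sensor_a - sensor_b) < threshold:
            return (sensor_a + sensor_b) / 2.0   # agreement: average them
        if abs(sensor_a - x_hat) <= abs(sensor_b - x_hat):
            return sensor_a
        return sensor_b

    # e.g. track one state, then arbitrate two angle-of-attack readings:
    x_hat = observer_step(x_hat=0.0, u=1.0, y=0.08)
    print(arbitrate(sensor_a=0.11, sensor_b=0.93, x_hat=x_hat))  # -> 0.11
    ```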

    Using real options to select stable Middleware-induced software architectures

    The requirements that force decisions towards building distributed system architectures are usually of a non-functional nature. Scalability, openness, heterogeneity, and fault-tolerance are examples of such non-functional requirements. The current trend is to build distributed systems with middleware, which provides the application developer with primitives for managing the complexity of distribution and system resources, and for realising many of the non-functional requirements. As non-functional requirements evolve, the 'coupling' between the middleware and the architecture becomes the focal point for understanding the stability of the distributed software system architecture in the face of change. It is hypothesised that the choice of a stable distributed software architecture depends on the choice of the underlying middleware and its flexibility in responding to future changes in non-functional requirements. Drawing on a case study that adequately represents a medium-size component-based distributed architecture, it is reported how a likely future change in scalability could impact the architectural structure of two versions, each induced with a distinct middleware: one with CORBA and the other with J2EE. An option-based model is derived to value the flexibility of the induced architectures and to guide the selection. The hypothesis is verified to be true for the given change. The paper concludes with some observations that could stimulate future research in the area of relating requirements to software architectures.
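
    To make the option-based reasoning concrete, here is a toy one-step binomial valuation: the flexibility to accommodate a future scalability change is treated as a call option and valued as a discounted expected payoff. All payoffs, probabilities, costs, and the comparison figures are invented for illustration and are not the case study's numbers or the paper's actual model.

    ```python
    def option_value(v_up, v_down, p_up, exercise_cost, r=0.05):
        """Discounted expected payoff of adapting the architecture later;
        adaptation is 'exercised' only when its value exceeds its cost."""
        payoff_up = max(v_up - exercise_cost, 0.0)
        payoff_down = max(v_down - exercise_cost, 0.0)
        return (p_up * payoff_up + (1 - p_up) * payoff_down) / (1 + r)

    # Compare two middleware-induced architectures by the flexibility they
    # buy; here they differ only in the (hypothetical) cost of adapting:
    arch_a = option_value(v_up=120.0, v_down=40.0, p_up=0.6, exercise_cost=80.0)
    arch_b = option_value(v_up=120.0, v_down=40.0, p_up=0.6, exercise_cost=30.0)
    print(f"option value A: {arch_a:.2f}, option value B: {arch_b:.2f}")
    ```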

    Fault Tolerance Middleware for a Multi-Core System

    Fault Tolerance Middleware (FTM) provides a framework that runs on a dedicated core of a multi-core system and handles the detection of single-event upsets (SEUs), and the responses to those SEUs, occurring in an application running on multiple cores of the processor. This software was written expressly for a multi-core system and can support different fault-tolerance strategies, such as introspection, algorithm-based fault tolerance (ABFT), and triple modular redundancy (TMR). It focuses on providing fault tolerance for the application code, and represents the first step in a plan to eventually include fault tolerance in message passing and in the FTM itself. In the multi-core system, the FTM resides on a single, dedicated core, separate from the cores used by the application. This is done in order to isolate the FTM from application faults and to allow it to swap out any application core for a substitute. The structure of the FTM consists of an interface to a fault-tolerance strategy module, a responder module, a fault manager module, an error factory, and an error mapper that determines the severity of the error. In the present reference implementation, the only fault-tolerance strategy implemented is introspection. The introspection code waits for an application node to send an error notification to it. It then uses the error factory to create an error object, and at this time a severity level is assigned to the error. The introspection code uses its built-in knowledge base to generate a recommended response to the error. Responses might include ignoring the error, logging it, rolling back the application to a previously saved checkpoint, swapping in a new node to replace a bad one, or restarting the application. The original error and recommended response are passed to the top-level fault manager module, which invokes the response. The responder module also notifies the introspection module of the generated response. This provides additional information that the introspection module can use in generating its next response. For example, if the responder triggers an application rollback and errors are still occurring, the introspection module may decide to recommend an application restart.
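
    The control flow described above can be rendered schematically as follows. Class and function names are illustrative guesses at the structure (error factory, severity mapper, introspection with responder feedback), not the actual FTM code.

    ```python
    from dataclasses import dataclass

    SEVERITY = {"transient": 1, "data_corruption": 2, "node_dead": 3}  # error mapper

    @dataclass
    class Error:
        kind: str
        node: int
        severity: int

    def make_error(kind, node):
        """Error factory: build an error object and assign its severity."""
        return Error(kind, node, SEVERITY.get(kind, 2))

    class Introspection:
        def __init__(self):
            self.history = []   # past responses, fed back by the responder

        def recommend(self, err):
            """Knowledge-base stand-in: escalate if a rollback didn't help."""
            if err.severity >= 3:
                return "swap_node"
            if "rollback" in self.history:
                return "restart_application"   # rollback already tried
            return "rollback_to_checkpoint" if err.severity == 2 else "log_only"

        def notify(self, response):
            self.history.append(response.split("_")[0])  # responder feedback

    def fault_manager(intro, kind, node):
        err = make_error(kind, node)
        response = intro.recommend(err)
        print(f"node {node}: {kind} -> {response}")      # invoke the response
        intro.notify(response)

    intro = Introspection()
    fault_manager(intro, "data_corruption", node=3)  # -> rollback_to_checkpoint
    fault_manager(intro, "data_corruption", node=3)  # still failing -> restart
    ```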

    Fault-Tolerant FPGA-Based Systems

    This paper presents a new approach to on-line fault tolerance via reconfiguration for systems mapped onto field-programmable gate arrays (FPGAs). The fault detection, based on a self-checking technique, is introduced at the application level; therefore our approach can detect faults in the configurable logic blocks (CLBs) and routing interconnections of the FPGA concurrently with normal system operation. A grid of tiles is projected onto the FPGA structure and a certain number of spare CLBs is reserved inside every tile. The number of spare CLBs per tile, which serve as backups upon detection of any faulty CLB, is estimated in accordance with the probability of failure. After the faulty CLBs are located, the faulty tile is reconfigured so as to avoid them. Our proposed approach uses a combination of hardware and software redundancy. We assume that a module external to the FPGA automatically controls the reconfiguration process in addition to the diagnosis process (DIRC); typically this is an embedded microprocessor with some storage for the various tile configurations. We have implemented our approach using a Xilinx Virtex FPGA, with the DIRC code written using the JBits software tools. In response to a component failure, this approach capitalizes on the unique reconfiguration capabilities of FPGAs and replaces the affected tile with a functionally equivalent one that does not rely on the faulty component. Unlike fixed-structure fault-tolerance techniques for ASICs and microprocessors, this approach allows a single physical component to provide redundant backup for several types of components.
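
    One step the approach depends on, sizing the spare CLBs per tile from the probability of failure, can be sketched with a simple binomial model: reserve the smallest number of spares such that the chance of exhausting them stays below a target. The independent-failure model and all numbers are assumptions of this example, not the paper's estimation method.

    ```python
    from math import comb

    def p_exhausted(n_clbs, p_fail, spares):
        """P(more than `spares` of a tile's n_clbs CLBs fail), assuming
        independent per-CLB failures with probability p_fail."""
        p_ok = sum(comb(n_clbs, k) * p_fail**k * (1 - p_fail)**(n_clbs - k)
                   for k in range(spares + 1))
        return 1.0 - p_ok

    def spares_needed(n_clbs, p_fail, target=1e-6):
        """Smallest spare count meeting the survivability target."""
        spares = 0
        while p_exhausted(n_clbs, p_fail, spares) > target:
            spares += 1
        return spares

    print(spares_needed(n_clbs=64, p_fail=1e-4))  # spares for a 64-CLB tile
    ```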

    GRAPHICAL MODELING AND SIMULATION OF A HYBRID HETEROGENEOUS AND DYNAMIC SINGLE-CHIP MULTIPROCESSOR ARCHITECTURE

    A single-chip, hybrid, heterogeneous, and dynamic shared-memory multiprocessor architecture is being developed which may be used for real-time and non-real-time applications. This architecture can execute any application described by a dataflow (process flow) graph of any topology; it can also dynamically reconfigure its structure at the node and processor-architecture levels and reallocate its resources to maximize performance and to increase reliability and fault tolerance. Dynamic change in the architecture is triggered by changes in parameters such as application input data rates, process execution times, and process request rates. The architecture is a Hybrid Data/Command Driven Architecture (HDCA). It operates as a dataflow architecture, but at the process level rather than the instruction level. This thesis focuses on the development, testing, and evaluation of new graphical software (hdca) developed first to perform a static resource allocation for the architecture to meet the timing requirements of an application, and then to simulate the architecture executing the application using the statically assigned resources and parameters. While simulating the architecture executing an application, the software graphically and dynamically displays parameters and mechanisms important to the architecture's operation and performance. The new graphical software is able to show the system- and node-level dynamic capability of the HDCA, can model a fixed or varying input data rate, and also allows fault-tolerance analysis of the architecture.
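
    A simplified cut at the static allocation step might size each node of the process-level dataflow graph from the input data rate and per-process execution times, keeping each node's offered load under its processors' capacity. The utilization rule and the example figures are assumptions of this sketch, not the thesis's actual algorithm.

    ```python
    from math import ceil

    def allocate(graph, input_rate):
        """graph maps process name -> (exec_time_s, invocations_per_input);
        returns the processor count needed at each node of the graph."""
        allocation = {}
        for process, (exec_time, invocations) in graph.items():
            utilization = input_rate * invocations * exec_time
            allocation[process] = max(1, ceil(utilization))
        return allocation

    # e.g. a three-process pipeline fed at 50 inputs/s:
    print(allocate({"filter": (0.004, 1),
                    "fft": (0.030, 1),
                    "classify": (0.012, 2)}, input_rate=50))
    # -> {'filter': 1, 'fft': 2, 'classify': 2}
    ```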

    Comparing a Traditional and a Multi-Agent Load-Balancing System

    This article presents a comparison between agent and non-agent based approaches to building network-load-balancing systems. In particular, two large software systems are compared, one traditional and the other agent-based, both performing the same load-balancing functions. Due to the two different architectures, several differences emerge. The differences are analyzed theoretically and practically in terms of design, scalability, and fault-tolerance. The advantages and disadvantages of both approaches are presented by combining an analysis of the systems with the experience of designers, developers, and users. Traditionally, designers specify a rigid software structure, while for multi-agent systems the emphasis is on specifying the different tasks and roles, as well as the interconnections between the agents that cooperate autonomously and simultaneously. The major advantages of the multi-agent approach are the introduced abstract design layers and, as a consequence, the more comprehensible top-level design, the increased redundancy, and the improved fault tolerance. The major improvement in performance due to the agent architecture is observed in the case of one or more failed computers; this failover behaviour is sketched below. Although the agent-oriented design might not be a silver bullet for building large distributed systems, our analysis and application confirm that it does have a number of advantages over non-agent approaches.
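
    A minimal sketch of that failover behaviour: load-balancing roles are held by agents spread across hosts, and when a machine fails, the survivors absorb its roles. The structure and names are illustrative only, not the compared systems' code.

    ```python
    def redistribute(roles_by_host, failed_host):
        """Reassign the failed host's roles round-robin to the survivors."""
        survivors = [h for h in roles_by_host if h != failed_host]
        if not survivors:
            raise RuntimeError("no surviving hosts to absorb the load")
        orphaned = roles_by_host.pop(failed_host)
        for i, role in enumerate(orphaned):
            roles_by_host[survivors[i % len(survivors)]].append(role)
        return roles_by_host

    hosts = {"h1": ["dns-lb"], "h2": ["http-lb"], "h3": ["smtp-lb", "monitor"]}
    print(redistribute(hosts, "h3"))   # h3 fails; h1 and h2 pick up its roles
    ```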