Search CORE

88,483 research outputs found

Recommended from our members

Reliability modeling of a 1-out-of-2 system: Research with diverse Off-the-shelf SQL database servers

Author: Bishop P. G.
Gashi I.
Littlewood B.
Wright D.
Publication venue
Publication date: 01/11/2007
Field of study

Fault tolerance via design diversity is often the only viable way of achieving sufficient dependability levels when using off-the-shelf components. We have reported previously on studies with bug reports of four open-source and commercial off-the-shelf database servers and later release of two of them. The results were very promising for designers of fault-tolerant solutions that wish to employ diverse servers: very few bugs caused failures in more than one server and none caused failure in more than two. In this paper we offer details of two approaches we have studied to construct reliability growth models for a 1-out-of-2 fault-tolerant server which utilize the bug reports. The models presented are of practical significance to system designers wishing to employ diversity with off-the-shelf components since often the bug reports are the only direct dependability evidence available to them

City Research Online

Crossref

Resilience in Numerical Methods: A Position on Fault Models and Methodologies

Author: Elliott James
Hoemmen Mark
Mueller Frank
Publication venue
Publication date: 13/01/2014
Field of study

Future extreme-scale computer systems may expose silent data corruption (SDC) to applications, in order to save energy or increase performance. However, resilience research struggles to come up with useful abstract programming models for reasoning about SDC. Existing work randomly flips bits in running applications, but this only shows average-case behavior for a low-level, artificial hardware model. Algorithm developers need to understand worst-case behavior with the higher-level data types they actually use, in order to make their algorithms more resilient. Also, we know so little about how SDC may manifest in future hardware, that it seems premature to draw conclusions about the average case. We argue instead that numerical algorithms can benefit from a numerical unreliability fault model, where faults manifest as unbounded perturbations to floating-point data. Algorithms can use inexpensive "sanity" checks that bound or exclude error in the results of computations. Given a selective reliability programming model that requires reliability only when and where needed, such checks can make algorithms reliable despite unbounded faults. Sanity checks, and in general a healthy skepticism about the correctness of subroutines, are wise even if hardware is perfectly reliable.Comment: Position Pape

arXiv.org e-Print Archive

CiteSeerX

Agent Based Test and Repair of Distributed Systems

Author: Benso Alfredo
Miclea L.
Prinetto Paolo Ernesto
Szilárd E.
Publication venue: IOS Press
Publication date: 01/01/2005
Field of study

This article demonstrates how to use intelligent agents for testing and repairing a distributed system, whose elements may or may not have embedded BIST (Built-In Self-Test) and BISR (Built-In Self-Repair) facilities. Agents are software modules that perform monitoring, diagnosis and repair of the faults. They form together a society whose members communicate, set goals and solve tasks. An experimental solution is presented, and future developments of the proposed approach are explore

PORTO Publications Open Repository TOrino

Software reliability and dependability: a roadmap

Author: Littlewood B.
Strigini L.
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2000
Field of study

Shifting the focus from software reliability to user-centred measures of dependability in complete software-based systems. Influencing design practice to facilitate dependability assessment. Propagating awareness of dependability issues and the use of existing, useful methods. Injecting some rigour in the use of process-related evidence for dependability assessment. Better understanding issues of diversity and variation as drivers of dependability. Bev Littlewood is founder-Director of the Centre for Software Reliability, and Professor of Software Engineering at City University, London. Prof Littlewood has worked for many years on problems associated with the modelling and evaluation of the dependability of software-based systems; he has published many papers in international journals and conference proceedings and has edited several books. Much of this work has been carried out in collaborative projects, including the successful EC-funded projects SHIP, PDCS, PDCS2, DeVa. He has been employed as a consultant t

CiteSeerX

City Research Online

Crossref