Search CORE

14,339 research outputs found

Design diversity: an update from research on reliability modelling

Author: B Littlewood
B Littlewood
B Littlewood
B Littlewood
DE Eckhardt
JC Knight
L Hatton
P Popov
P Popov
P Popov
Publication venue
Publication date: 01/01/2001
Field of study

Diversity between redundant subsystems is, in various forms, a common design approach for improving system dependability. Its value in the case of software-based systems is still controversial. This paper gives an overview of reliability modelling work we carried out in recent projects on design diversity, presented in the context of previous knowledge and practice. These results provide additional insight for decisions in applying diversity and in assessing diverseredundant systems. A general observation is that, just as diversity is a very general design approach, the models of diversity can help conceptual understanding of a range of different situations. We summarise results in the general modelling of common-mode failure, in inference from observed failure data, and in decision-making for diversity in development.

CiteSeerX

City Research Online

Crossref

Fault Tolerant Adaptive Parallel and Distributed Simulation through Functional Replication

Author: D'Angelo Gabriele
Ferretti Stefano
Marzolla Moreno
Publication venue: 'Elsevier BV'
Publication date: 01/01/2019
Field of study

This paper presents FT-GAIA, a software-based fault-tolerant parallel and distributed simulation middleware. FT-GAIA has being designed to reliably handle Parallel And Distributed Simulation (PADS) models, which are needed to properly simulate and analyze complex systems arising in any kind of scientific or engineering field. PADS takes advantage of multiple execution units run in multicore processors, cluster of workstations or HPC systems. However, large computing systems, such as HPC systems that include hundreds of thousands of computing nodes, have to handle frequent failures of some components. To cope with this issue, FT-GAIA transparently replicates simulation entities and distributes them on multiple execution nodes. This allows the simulation to tolerate crash-failures of computing nodes. Moreover, FT-GAIA offers some protection against Byzantine failures, since interaction messages among the simulated entities are replicated as well, so that the receiving entity can identify and discard corrupted messages. Results from an analytical model and from an experimental evaluation show that FT-GAIA provides a high degree of fault tolerance, at the cost of a moderate increase in the computational load of the execution units.Comment: arXiv admin note: substantial text overlap with arXiv:1606.0731

arXiv.org e-Print Archive

Archivio istituzionale della ricerca - Università di Urbino

Archivio istituzionale della ricerca - Alma Mater Studiorum Università di Bologna

Recommended from our members

Modeling software design diversity

Author: ANDERSON T.
BABBAGE C.
Bev Littlewood
BISHOP P. G.
BISHOP P.G.
FAA
HAGELIN G.
KNIGHT J.C.
LARYD A.
LINDEBERG J. F.
Lorenzo Strigini
Peter Popov
POPOV P.
TRAVERSE P. J.
VOGES U.
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2001
Field of study

Design diversity has been used for many years now as a means of achieving a degree of fault tolerance in software-based systems. Whilst there is clear evidence that the approach can be expected to deliver some increase in reliability compared with a single version, there is not agreement about the extent of this. More importantly, it remains difficult to evaluate exactly how reliable a particular diverse fault-tolerant system is. This difficulty arises because assumptions of independence of failures between different versions have been shown not to be tenable: assessment of the actual level of dependence present is therefore needed, and this is hard. In this tutorial we survey the modelling issues here, with an emphasis upon the impact these have upon the problem of assessing the reliability of fault tolerant systems. The intended audience is one of designers, assessors and project managers with only a basic knowledge of probabilities, as well as reliability experts without detailed knowledge of software, who seek an introduction to the probabilistic issues in decisions about design diversity

City Research Online

Crossref

Quantitative evaluation of Pandora Temporal Fault Trees via Petri Nets

Author: Kabir Sohag
Papadopoulos Yiannis
Walker Martin
Publication venue: 'Elsevier BV'
Publication date: 15/10/2015
Field of study

© 2015, IFAC (International Federation of Automatic Control) Hosting by Elsevier Ltd. All rights reserved. Using classical combinatorial fault trees, analysts are able to assess the effects of combinations of failures on system behaviour but are unable to capture sequence dependent dynamic behaviour. Pandora introduces temporal gates and temporal laws to fault trees to allow sequence-dependent dynamic analysis of events. Pandora can be easily integrated in model-based design and analysis techniques; however, the combinatorial quantification techniques used to solve classical fault trees cannot be applied to temporal fault trees. Temporal fault trees capture state and therefore require a state space solution for quantification of probability. In this paper, we identify Petri Nets as a possible framework for quantifying temporal trees. We describe how Pandora fault trees can be mapped to Petri Nets for dynamic dependability analysis and demonstrate the process on a fault tolerant fuel distribution system model

Repository@Hull - Worktribe

A synthesis of logic and bio-inspired techniques in the design of dependable systems

Author: Azevedo Luis
Bottaci Leonardo
Kabir Sohag
Papadopoulos Yiannis
Parker David
Sharvia Septavera
Sorokos Ioannis
Walker Martin
Publication venue: 'Elsevier BV'
Publication date: 04/05/2016
Field of study

Much of the development of model-based design and dependability analysis in the design of dependable systems, including software intensive systems, can be attributed to the application of advances in formal logic and its application to fault forecasting and verification of systems. In parallel, work on bio-inspired technologies has shown potential for the evolutionary design of engineering systems via automated exploration of potentially large design spaces. We have not yet seen the emergence of a design paradigm that effectively combines these two techniques, schematically founded on the two pillars of formal logic and biology, from the early stages of, and throughout, the design lifecycle. Such a design paradigm would apply these techniques synergistically and systematically to enable optimal refinement of new designs which can be driven effectively by dependability requirements. The paper sketches such a model-centric paradigm for the design of dependable systems, presented in the scope of the HiP-HOPS tool and technique, that brings these technologies together to realise their combined potential benefits. The paper begins by identifying current challenges in model-based safety assessment and then overviews the use of meta-heuristics at various stages of the design lifecycle covering topics that span from allocation of dependability requirements, through dependability analysis, to multi-objective optimisation of system architectures and maintenance schedules

Repository@Hull - Worktribe

Recommended from our members

Enhancing Fault / Intrusion Tolerance through Design and Configuration Diversity

Author: Bessani A. N.
Daidone A.
Gashi I.
Obelheiro R. R.
Sousa P.
Stankovic V.
Publication venue
Publication date: 01/01/2009
Field of study

Fault/intrusion tolerance is usually the only viable way of improving the system dependability and security in the presence of continuously evolving threats. Many of the solutions in the literature concern a specific snapshot in the production or deployment of a fault-tolerant system and no immediate considerations are made about how the system should evolve to deal with novel threats. In this paper we outline and evaluate a set of operating systems’ and applications’ reconfiguration rules which can be used to modify the state of a system replica prior to deployment or in between recoveries, and hence increase the replicas chance of a longer intrusion-free operation

City Research Online

Universidade de Lisboa: Repositório.UL

Advanced techniques in reliability model representation and solution

Author: Nicol David M.
Palumbo Daniel L.
Publication venue
Publication date
Field of study

The current tendency of flight control system designs is towards increased integration of applications and increased distribution of computational elements. The reliability analysis of such systems is difficult because subsystem interactions are increasingly interdependent. Researchers at NASA Langley Research Center have been working for several years to extend the capability of Markov modeling techniques to address these problems. This effort has been focused in the areas of increased model abstraction and increased computational capability. The reliability model generator (RMG) is a software tool that uses as input a graphical object-oriented block diagram of the system. RMG uses a failure-effects algorithm to produce the reliability model from the graphical description. The ASSURE software tool is a parallel processing program that uses the semi-Markov unreliability range evaluator (SURE) solution technique and the abstract semi-Markov specification interface to the SURE tool (ASSIST) modeling language. A failure modes-effects simulation is used by ASSURE. These tools were used to analyze a significant portion of a complex flight control system. The successful combination of the power of graphical representation, automated model generation, and parallel computation leads to the conclusion that distributed fault-tolerant system architectures can now be analyzed

NASA Technical Reports Server