
    Techniques for the Fast Simulation of Models of Highly Dependable Systems

    With the ever-increasing complexity and requirements of highly dependable systems, their evaluation during design and operation is becoming more crucial. Realistic models of such systems are often not amenable to analysis using conventional analytic or numerical methods. Therefore, analysts and designers turn to simulation to evaluate these models. However, accurate estimation of dependability measures of these models requires that the simulation frequently observes system failures, which are rare events in highly dependable systems. This renders ordinary simulation impractical for evaluating such systems. To overcome this problem, simulation techniques based on importance sampling have been developed, and they are very effective in certain settings. When importance sampling works well, simulation run lengths can be reduced by several orders of magnitude when estimating transient as well as steady-state dependability measures. This paper reviews some of the importance-sampling techniques that have been developed in recent years to estimate dependability measures efficiently in Markov and non-Markov models of highly dependable systems.
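
    The core trick the survey refers to, drawing samples from a biased distribution under which failures are common and reweighting each sample by its likelihood ratio, fits in a few lines. The sketch below is illustrative only and not taken from the paper; the rates and mission time are made-up values.

```python
import random
import math

LAMBDA = 1e-7      # true (rare) per-hour failure rate -- illustrative
LAMBDA_IS = 1e-2   # biased rate: failures become frequent under sampling
T_MISSION = 10.0   # mission time in hours -- illustrative

def estimate_naive(n):
    """Standard Monte Carlo: almost never observes a failure."""
    hits = sum(1 for _ in range(n)
               if random.expovariate(LAMBDA) <= T_MISSION)
    return hits / n

def estimate_is(n):
    """Importance sampling: draw failure times under LAMBDA_IS and
    reweight each hit by the likelihood ratio f(t)/g(t)."""
    total = 0.0
    for _ in range(n):
        t = random.expovariate(LAMBDA_IS)
        if t <= T_MISSION:
            # ratio of the exponential densities at t
            w = (LAMBDA * math.exp(-LAMBDA * t)) / (LAMBDA_IS * math.exp(-LAMBDA_IS * t))
            total += w
    return total / n

exact = 1.0 - math.exp(-LAMBDA * T_MISSION)
print(f"exact    = {exact:.3e}")
print(f"naive MC = {estimate_naive(100_000):.3e}")   # typically 0.0
print(f"IS       = {estimate_is(100_000):.3e}")      # close to exact
```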

    Rich Interfaces for Dependability: Compositional Methods for Dynamic Fault Trees and Arcade models

    This paper discusses two behavioural interfaces for reliability analysis: dynamic fault trees (DFTs), which model the system reliability in terms of the reliability of its components, and Arcade, which models the system reliability at an architectural level. For both formalisms, the reliability is analyzed by transforming the DFT or Arcade model into a set of input-output Markov chains. By using compositional aggregation techniques based on weak bisimilarity, significant reductions in the state space can be obtained.
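
    The transformation from a fault-tree gate to a Markov chain can be illustrated on the smallest possible case. The sketch below is not the paper's compositional aggregation algorithm; it simply builds the four-state chain for a single AND gate over two components with assumed exponential failure rates and checks the result against the closed form.

```python
import numpy as np
from scipy.linalg import expm

# Illustrative failure rates for two components feeding an AND gate.
l1, l2 = 1e-3, 2e-3   # per hour

# CTMC states: 0 = both up, 1 = only component 1 failed,
# 2 = only component 2 failed, 3 = both failed (the AND gate fires).
Q = np.array([
    [-(l1 + l2), l1,  l2,  0.0],
    [0.0,       -l2,  0.0, l2 ],
    [0.0,        0.0, -l1, l1 ],
    [0.0,        0.0,  0.0, 0.0],   # state 3 is absorbing
])

t = 1000.0                        # mission time in hours
p0 = np.array([1.0, 0.0, 0.0, 0.0])
pt = p0 @ expm(Q * t)             # transient distribution at time t

print(f"P(AND gate failed by t={t}h) = {pt[3]:.4e}")
# Cross-check against the closed form (1 - e^{-l1 t})(1 - e^{-l2 t}),
# valid here because the two components fail independently.
print(f"closed form                  = {(1 - np.exp(-l1*t)) * (1 - np.exp(-l2*t)):.4e}")
```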

    Experimental analysis of computer system dependability

    This paper reviews an area which has evolved over the past 15 years: experimental analysis of computer system dependability. Methodologies and advances are discussed for three basic approaches used in the area: simulated fault injection, physical fault injection, and measurement-based analysis. The three approaches are suited, respectively, to dependability evaluation in the three phases of a system's life: design phase, prototype phase, and operational phase. Before the discussion of these phases, several statistical techniques used in the area are introduced. For each phase, a classification of research methods or study topics is outlined, followed by discussion of these methods or topics as well as representative studies. The statistical techniques introduced include the estimation of parameters and confidence intervals, probability distribution characterization, and several multivariate analysis methods. Importance sampling, a statistical technique used to accelerate Monte Carlo simulation, is also introduced. The discussion of simulated fault injection covers electrical-level, logic-level, and function-level fault injection methods as well as representative simulation environments such as FOCUS and DEPEND. The discussion of physical fault injection covers hardware, software, and radiation fault injection methods as well as several software and hybrid tools including FIAT, FERRARI, HYBRID, and FINE. The discussion of measurement-based analysis covers measurement and data processing techniques, basic error characterization, dependency analysis, Markov reward modeling, software dependability, and fault diagnosis. The discussion involves several important issues studied in the area, including fault models, fast simulation techniques, workload/failure dependency, correlated failures, and software fault tolerance.
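
    As a flavour of the function-level simulated fault injection the survey covers, the sketch below runs a toy workload repeatedly, flips a single randomly chosen bit in its input data on each run, and classifies the outcome as masked or as silent data corruption. The workload and fault model are illustrative assumptions, not any of the tools surveyed.

```python
import random

def target(data):
    """Toy workload: the final output is the maximum element."""
    best = 0
    for v in data:
        if v > best:
            best = v
    return best

def inject(data, idx, bit):
    """Flip one bit of one data word before the workload reads it,
    mimicking a transient fault in a register or memory word."""
    faulty = list(data)
    faulty[idx] ^= (1 << bit)
    return target(faulty)

random.seed(0)
data = [random.randrange(2**16) for _ in range(64)]
golden = target(data)            # fault-free reference result

counts = {"masked": 0, "sdc": 0}
for _ in range(10_000):
    r = inject(data, random.randrange(len(data)), random.randrange(16))
    counts["sdc" if r != golden else "masked"] += 1

# Many injections are masked: only flips that create or alter the
# maximum element change the output.
print(counts)
```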

    Rare event simulation for highly dependable systems with fast repairs

    Stochastic model checking has recently been used to assess, among other properties, dependability measures for a variety of systems. However, the employed numerical methods, as supported by model-checking tools such as PRISM and MRMC, suffer from the state-space explosion problem. The main alternative is statistical model checking, which uses standard simulation, but this performs poorly when small probabilities need to be estimated. Therefore, we propose a method based on importance sampling to speed up the simulation process in cases where the failure probabilities are small due to the high speed of the system's repair units. This setting arises naturally in Markovian models of highly dependable systems. We show that our method compares favourably to standard simulation, to existing importance-sampling techniques, and to the numerical techniques of PRISM.
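
    A common importance-sampling scheme in this setting is failure biasing: in the embedded discrete-time chain, the probability of taking a failure transition is inflated to a constant and each step's likelihood ratio is accumulated. The sketch below applies this to an assumed machine-repair model with one fast repair unit; it is a generic illustration of failure biasing, not the method proposed in the paper.

```python
import random

N, LAM, MU = 3, 1e-3, 1.0   # illustrative: 3 components, fast single repair unit
BIAS = 0.5                  # biased probability of taking a failure step

def trial(biased):
    """One regenerative cycle: starting from the first failure (state 1),
    does the chain reach total failure (state N) before full repair
    (state 0)?  Returns the cycle's importance-sampling weight."""
    k, w = 1, 1.0
    while 0 < k < N:
        p_fail = (N - k) * LAM / ((N - k) * LAM + MU)  # embedded-chain prob.
        q = BIAS if biased else p_fail
        if random.random() < q:            # failure transition
            w *= p_fail / q
            k += 1
        else:                              # repair transition
            w *= (1 - p_fail) / (1 - q)
            k -= 1
    return w if k == N else 0.0

for biased in (False, True):
    n = 200_000
    est = sum(trial(biased) for _ in range(n)) / n
    print(f"{'biased' if biased else 'standard':8s} estimate = {est:.3e}")
```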

    A comparison of numerical splitting-based methods for Markovian dependability and performability models

    Iterative numerical methods are an important ingredient for the solution of continuous-time Markov dependability models of fault-tolerant systems. In this paper we make a numerical comparison of several splitting-based iterative methods. We consider the computation of the steady-state reward rate of rewarded models. This measure requires the solution of a singular linear system. We consider two classes of models. The first class includes failure/repair models. The second class is more general and includes the modeling of periodic preventive testing of spare components to reduce the probability of latent failures in inactive components. The periodic preventive test is approximated by an Erlang distribution with a sufficient number of stages. We show that for each class of model there is a splitting-based method which is significantly more efficient than the other methods.
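
    As a reminder of what a splitting-based iteration looks like, the sketch below applies a Gauss-Seidel-style sweep, with renormalization after each pass, to the singular system pi Q = 0 for a small assumed machine-repair generator. It is a minimal illustration, not one of the methods compared in the paper.

```python
import numpy as np

# Generator of a small machine-repair CTMC (illustrative rates);
# states = number of failed machines, 0..2, with two machines total.
lam, mu = 0.1, 1.0
Q = np.array([
    [-2*lam,  2*lam,      0.0],
    [ mu,    -(mu + lam), lam],
    [ 0.0,    mu,        -mu ],
])

def gauss_seidel_steady_state(Q, sweeps=100):
    """Splitting-based iteration for the singular system pi @ Q = 0.
    Each sweep solves pi_j from the j-th balance equation using the
    most recent values, then renormalizes to sum to one."""
    n = Q.shape[0]
    pi = np.full(n, 1.0 / n)
    for _ in range(sweeps):
        for j in range(n):
            inflow = sum(pi[i] * Q[i, j] for i in range(n) if i != j)
            pi[j] = inflow / -Q[j, j]
        pi /= pi.sum()
    return pi

pi = gauss_seidel_steady_state(Q)
print("steady-state distribution:", pi)
# An example reward rate: availability, assuming the system is up
# while fewer than 2 machines are failed.
print("availability:", pi[0] + pi[1])
```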

    Approximate performability and dependability analysis using generalized stochastic Petri Nets

    Since current-day fault-tolerant and distributed computer and communication systems tend to be large and complex, their corresponding performability models suffer from the same characteristics. Therefore, calculating performability measures from these models is a difficult and time-consuming task.

    To alleviate the largeness and complexity problem to some extent, we use generalized stochastic Petri nets to describe the models and to automatically generate the underlying Markov reward models. Still, however, many models cannot be solved with current numerical techniques, although they are conveniently and often compactly described.

    In this paper we discuss two heuristic state-space truncation techniques that allow us to obtain very good approximations for the steady-state performability while only assessing a few percent of the states of the untruncated model. For a class of reversible models we derive explicit lower and upper bounds on the exact steady-state performability. For a much wider class of models a truncation theorem exists that allows one to obtain bounds on the error made in the truncation. We discuss this theorem in the context of approximate performability models and comment on its applicability. For all the proposed truncation techniques we present examples showing their usefulness.
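
    The effect of state-space truncation is easy to see on a birth-death model: solve ever-larger truncated generators and watch a steady-state reward converge towards the untruncated value. The sketch below assumes an M/M/1-like model with illustrative rates; it is not the paper's GSPN-based technique and computes no error bounds.

```python
import numpy as np

lam, mu = 0.8, 1.0   # illustrative arrival/service rates (utilization 0.8)

def truncated_steady_state(K):
    """Solve the birth-death model truncated at K states; probability
    mass that would flow past state K-1 is simply cut off."""
    Q = np.zeros((K, K))
    for i in range(K):
        if i + 1 < K:
            Q[i, i + 1] = lam
        if i > 0:
            Q[i, i - 1] = mu
        Q[i, i] = -Q[i].sum()
    # Replace one (redundant) balance equation with sum(pi) = 1.
    A = np.vstack([Q.T[:-1], np.ones(K)])
    b = np.zeros(K)
    b[-1] = 1.0
    return np.linalg.solve(A, b)

for K in (5, 10, 20, 40):
    pi = truncated_steady_state(K)
    mean_jobs = (np.arange(K) * pi).sum()   # a steady-state reward
    print(f"K={K:3d}  mean population = {mean_jobs:.4f}")
print(f"exact (untruncated M/M/1)  = {lam / (mu - lam):.4f}")
```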

    Run-time risk management in adaptive ICT systems

    We present results of the SERSCIS project related to risk management and mitigation strategies in adaptive multi-stakeholder ICT systems. The SERSCIS approach involves using semantic threat models to support automated design-time threat identification and mitigation analysis. The focus of this paper is the use of these models at run-time for automated threat detection and diagnosis. This is based on a combination of semantic reasoning and Bayesian inference applied to run-time system monitoring data. The resulting dynamic risk-management approach is compared to a conventional ISO 27000-type approach, and validation test results are presented from an Airport Collaborative Decision Making (A-CDM) scenario involving data exchange between multiple airport service providers.
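
    The Bayesian-inference ingredient can be illustrated with a naive-Bayes update: a prior probability of a threat is revised as run-time monitoring symptoms arrive. The monitors, probabilities, and conditional-independence assumption below are all invented for illustration and are not taken from SERSCIS.

```python
# Illustrative numbers only: a prior on "service under attack" updated
# by two run-time symptoms via Bayes' rule.
prior = 0.01                    # assumed base rate of the threat
# (P(symptom | threat), P(symptom | no threat)) for each monitor:
likelihoods = {
    "latency_spike": (0.90, 0.10),
    "auth_failures": (0.70, 0.02),
}

def posterior(observed):
    """Naive-Bayes update, assuming symptoms are conditionally
    independent given the threat state."""
    p_t, p_n = prior, 1.0 - prior
    for name, (p_given_t, p_given_n) in likelihoods.items():
        if name in observed:
            p_t *= p_given_t
            p_n *= p_given_n
        else:
            p_t *= 1.0 - p_given_t
            p_n *= 1.0 - p_given_n
    return p_t / (p_t + p_n)

print(f"no symptoms:   {posterior(set()):.4f}")
print(f"latency only:  {posterior({'latency_spike'}):.4f}")
print(f"both symptoms: {posterior({'latency_spike', 'auth_failures'}):.4f}")
```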

    Towards Accurate Estimation of Error Sensitivity in Computer Systems

    Fault injection is an increasingly important method for assessing, measuring, and observing the system-level impact of hardware and software faults in computer systems. This thesis presents the results of a series of experimental studies in which fault injection was used to investigate the impact of bit-flip errors on program execution. The studies were motivated by the fact that transient hardware faults in microprocessors can cause bit-flip errors that can propagate to the microprocessor's instruction set architecture (ISA) registers and main memory. As the rate of such hardware faults is expected to increase with technology scaling, there is a need to better understand how these errors (known as 'soft errors') influence program execution, especially in safety-critical systems.

    Using ISA-level fault injection, we investigate how five aspects, or factors, influence the error sensitivity of a program. We define error sensitivity as the conditional probability that a bit-flip error in live data in an ISA register or main-memory word will cause a program to produce silent data corruption (SDC; i.e., an erroneous result). We also consider the estimation of a measure called SDC count, which represents the number of ISA-level bit flips that cause an SDC.

    The five factors addressed are (a) the inputs processed by a program, (b) the level of compiler optimization, (c) the implementation of the program in the source code, (d) the fault model (single bit flips vs. double bit flips), and (e) the fault-injection technique (inject-on-write vs. inject-on-read). Our results show that these factors affect the error sensitivity in many ways; some factors strongly impact the error sensitivity or the SDC count, whereas others show a weaker impact. For example, our experiments show that single bit flips tend to cause SDCs more often than double bit flips; compiler optimization positively impacts the SDC count but not necessarily the error sensitivity; the error sensitivity varies between 20% and 50% among the programs we tested; and variations in input affect the error sensitivity significantly for most of the tested programs.
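
    Since error sensitivity is a conditional probability estimated from a finite fault-injection campaign, any estimate carries statistical error. The sketch below computes a point estimate and a normal-approximation 95% confidence interval from assumed campaign counts; the numbers are invented, not results from the thesis.

```python
import math

def error_sensitivity(n_sdc, n_injections, z=1.96):
    """Point estimate and normal-approximation confidence interval for
    the conditional probability that an injected bit flip causes SDC."""
    p = n_sdc / n_injections
    half = z * math.sqrt(p * (1 - p) / n_injections)
    return p, max(0.0, p - half), min(1.0, p + half)

# Illustrative campaign outcome: 12,000 injections, 3,480 SDCs.
p, lo, hi = error_sensitivity(3480, 12000)
print(f"error sensitivity = {p:.3f}  (95% CI: [{lo:.3f}, {hi:.3f}])")
```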