    Techniques for the Fast Simulation of Models of Highly Dependable Systems

    With the ever-increasing complexity and requirements of highly dependable systems, their evaluation during design and operation is becoming more crucial. Realistic models of such systems are often not amenable to analysis using conventional analytic or numerical methods. Therefore, analysts and designers turn to simulation to evaluate these models. However, accurate estimation of dependability measures of these models requires that the simulation frequently observes system failures, which are rare events in highly dependable systems. This renders ordinary simulation impractical for evaluating such systems. To overcome this problem, simulation techniques based on importance sampling have been developed, and are very effective in certain settings. When importance sampling works well, simulation run lengths can be reduced by several orders of magnitude when estimating transient as well as steady-state dependability measures. This paper reviews some of the importance-sampling techniques that have been developed in recent years to estimate dependability measures efficiently in Markov and non-Markov models of highly dependable systems.
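
    The importance-sampling idea described in this abstract can be illustrated with a minimal sketch. It is not taken from the paper: the toy birth-death chain, the function names `exact_probability` and `is_estimate`, and all parameter values are assumptions chosen for illustration. The chain counts failed components; each step is a failure with small probability `p` or a repair otherwise, and the rare event is reaching `n` failed components before returning to 0. Simulating with an inflated failure probability `bias` and reweighting each path by its likelihood ratio keeps the estimator unbiased while making the rare event common.

```python
import random


def exact_probability(p, n):
    """Gambler's-ruin formula for P(reach n before 0 | start at 1).

    p is the per-step failure (up) probability; assumes p != 0.5.
    """
    r = (1.0 - p) / p
    return (r - 1.0) / (r ** n - 1.0)


def is_estimate(p, n, bias, runs, seed=0):
    """Importance-sampling estimate of the same probability.

    Paths are simulated with failure probability `bias` instead of p,
    and each path is weighted by its likelihood ratio (original path
    probability divided by biased path probability).
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(runs):
        state, weight = 1, 1.0
        while 0 < state < n:
            if rng.random() < bias:
                state += 1                            # failure under the biased measure
                weight *= p / bias
            else:
                state -= 1                            # repair under the biased measure
                weight *= (1.0 - p) / (1.0 - bias)
        if state == n:                                # rare event observed
            total += weight
    return total / runs


if __name__ == "__main__":
    p = 1e-3 / (1.0 + 1e-3)      # hypothetical failure probability per step
    print("exact:", exact_probability(p, 3))
    print("IS   :", is_estimate(p, 3, bias=0.5, runs=20000, seed=1))
```

    With these hypothetical values the target probability is on the order of one in a million, so plain simulation with 20,000 runs would almost never observe the event, whereas the biased simulation observes it in roughly a third of the runs.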

    Rare event simulation for highly dependable systems with fast repairs

    Stochastic model checking has been used recently to assess, among others, dependability measures for a variety of systems. However, the numerical methods employed, such as those supported by the model checking tools PRISM and MRMC, suffer from the state-space explosion problem. The main alternative is statistical model checking, which uses standard simulation, but this performs poorly when small probabilities need to be estimated. Therefore, we propose a method based on importance sampling to speed up the simulation process in cases where the failure probabilities are small due to the high speed of the system's repair units. This setting arises naturally in Markovian models of highly dependable systems. We show that our method compares favourably to standard simulation, to existing importance sampling techniques and to the numerical techniques of PRISM.

    On-Line Dependability Enhancement of Multiprocessor SoCs by Resource Management

    This paper describes a new approach towards dependable design of homogeneous multi-processor SoCs in an example satellite-navigation application. First, the NoC dependability is functionally verified via embedded software. Then the Xentium processor tiles are periodically verified via on-line self-testing techniques, by using a new IIP Dependability Manager. Based on the Dependability Manager results, faulty tiles are electronically excluded and replaced by fault-free spare tiles via on-line resource management. This integrated approach enables fast electronic fault detection/diagnosis and repair, and hence a high system availability. The dependability application runs in parallel with the actual application, resulting in a very dependable system. All parts have been verified by simulation.

    Dependable Digitally-Assisted Mixed-Signal IPs Based on Integrated Self-Test & Self-Calibration

    Heterogeneous SoC devices, which combine sensors, analogue and mixed-signal front-end circuits, and massive digital processing capability, are increasingly used in safety-critical applications such as those in the automotive, medical, and security arenas. The literature has already paid significant attention to the dependability of the digital parts of heterogeneous SoCs. This is in contrast to the sensors and front-end mixed-signal electronics in particular; these are, however, particularly sensitive to external influences over time and hence determine the dependability of the SoC. This paper provides an integrated SoC/IP approach to enhance the dependability. It gives an example of a digitally-assisted mixed-signal front-end IP which is being evaluated under the mission profile of an automotive tyre pressure monitoring system. It shows how internal monitoring and digitally-controlled adaptation using embedded processors can improve the dependability of this mixed-signal part under harsh conditions over a long time.

    Stochastic model checking for predicting component failures and service availability

    When a component fails in a critical communications service, how urgent is a repair? If we repair within 1 hour, 2 hours, or n hours, how does this affect the likelihood of service failure? Can a formal model support assessing the impact, prioritisation, and scheduling of repairs in the event of component failures, and forecasting of maintenance costs? These are some of the questions posed to us by a large organisation, and here we report on our experience of developing a stochastic framework based on a discrete space model and temporal logic to answer them. We define and explore both standard steady-state and transient temporal logic properties concerning the likelihood of service failure within certain time bounds and the forecasting of maintenance costs, and we introduce a new concept of envelopes of behaviour that quantify the effect of the status of lower level components on service availability. The resulting model is highly parameterised and user interaction for experimentation is supported by a lightweight, web-based interface.

    Estimating the Probability of a Rare Event Over a Finite Time Horizon

    We study an approximation for the zero-variance change of measure to estimate the probability of a rare event in a continuous-time Markov chain. The rare event occurs when the chain reaches a given set of states before some fixed time limit. The jump rates of the chain are expressed as functions of a rarity parameter in a way that the probability of the rare event goes to zero when the rarity parameter goes to zero, and the behavior of our estimators is studied in this asymptotic regime. After giving a general expression for the zero-variance change of measure in this situation, we develop an approximation of it via a power series and show that this approximation provides a bounded relative error when the rarity parameter goes to zero. We illustrate the performance of our approximation on small numerical examples of highly reliable Markovian systems. We compare it to a previously proposed heuristic that combines forcing with balanced failure biasing. We also exhibit the exact zero-variance change of measure for these examples and compare it with these two approximations.

    Reliability of Erasure Coded Storage Systems: A Geometric Approach

    We consider the probability of data loss, or equivalently, the reliability function for an erasure coded distributed data storage system under worst case conditions. Data loss in an erasure coded system depends on probability distributions for the disk repair duration and the disk failure duration. In previous works, the data loss probability of such systems has been studied under the assumption of exponentially distributed disk failure and disk repair durations, using well-known analytic methods from the theory of Markov processes. These methods lead to an estimate of the integral of the reliability function. Here, we address the problem of directly calculating the data loss probability for general repair and failure duration distributions. A closed limiting form is developed for the probability of data loss and it is shown that the probability of the event that a repair duration exceeds a failure duration is sufficient for characterizing the data loss probability. For the case of constant repair duration, we develop an expression for the conditional data loss probability given the number of failures experienced by each node in a given time window. We do so by developing a geometric approach that relies on the computation of volumes of a family of polytopes that are related to the code. An exact calculation is provided and an upper bound on the data loss probability is obtained by posing the problem as a set avoidance problem. Theoretical calculations are compared to simulation results. Comment: 28 pages. 8 figures. Presented in part at IEEE International Conference on BigData 2013, Santa Clara, CA, Oct. 2013 and to be presented in part at 2014 IEEE Information Theory Workshop, Tasmania, Australia, Nov. 2014. New analysis added May 2015. Further Update Aug. 201
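
    The quantity singled out in this abstract, the probability that a repair duration exceeds a failure duration, can be estimated by direct sampling for any pair of distributions. The sketch below is an illustration only, not the paper's geometric method: the function names, the rates, and the choice of exponential distributions are assumptions. The exponential case is used because it has a closed form to check against: for independent durations with failure rate λ and repair rate μ, the probability is λ / (λ + μ).

```python
import random


def prob_repair_exceeds_failure(sample_repair, sample_failure, runs, seed=0):
    """Monte Carlo estimate of P(repair duration > failure duration).

    sample_repair and sample_failure are callables that take a
    random.Random instance and return one sampled duration each.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(runs):
        if sample_repair(rng) > sample_failure(rng):
            hits += 1
    return hits / runs


def exponential_closed_form(failure_rate, repair_rate):
    """P(R > F) for independent exponential R (repair) and F (failure)."""
    return failure_rate / (failure_rate + repair_rate)


if __name__ == "__main__":
    failure_rate, repair_rate = 0.1, 1.0   # hypothetical rates, per unit time
    estimate = prob_repair_exceeds_failure(
        lambda rng: rng.expovariate(repair_rate),
        lambda rng: rng.expovariate(failure_rate),
        runs=100000,
        seed=1,
    )
    print("simulated  :", estimate)
    print("closed form:", exponential_closed_form(failure_rate, repair_rate))
```

    Because the samplers are passed in as callables, the same estimator can be reused with non-exponential durations, for example a constant repair duration via `lambda rng: 2.0`.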