    Techniques for the Fast Simulation of Models of Highly Dependable Systems

    With the ever-increasing complexity and requirements of highly dependable systems, their evaluation during design and operation is becoming more crucial. Realistic models of such systems are often not amenable to analysis using conventional analytic or numerical methods. Therefore, analysts and designers turn to simulation to evaluate these models. However, accurate estimation of dependability measures of these models requires that the simulation frequently observes system failures, which are rare events in highly dependable systems. This renders ordinary simulation impractical for evaluating such systems. To overcome this problem, simulation techniques based on importance sampling have been developed, and are very effective in certain settings. When importance sampling works well, simulation run lengths can be reduced by several orders of magnitude when estimating transient as well as steady-state dependability measures. This paper reviews some of the importance-sampling techniques that have been developed in recent years to estimate dependability measures efficiently in Markov and non-Markov models of highly dependable systems.
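
    As a minimal sketch of the failure-biasing flavour of importance sampling surveyed above, consider a hypothetical three-component repairable system: failure transitions of the embedded Markov chain are sampled with an inflated probability and each run is reweighted by its likelihood ratio, so the estimator stays unbiased. The model, parameters and names below are illustrative assumptions, not taken from the paper.

        import random

        N = 3        # components (assumed)
        LAM = 1e-3   # per-component failure rate (assumed)
        MU = 1.0     # repair rate, single repair crew (assumed)
        BIAS = 0.5   # probability mass given to failure steps under biasing

        def cycle(biased):
            """One regenerative cycle of the embedded chain from the all-up
            state; returns (system reached total failure?, likelihood ratio)."""
            k, L = 1, 1.0   # first jump is a failure w.p. 1 under both measures
            while 0 < k < N:
                p_fail = (N - k) * LAM / ((N - k) * LAM + MU)  # true failure prob
                q_fail = BIAS if biased else p_fail            # sampling prob
                if random.random() < q_fail:
                    L *= p_fail / q_fail
                    k += 1
                else:
                    L *= (1 - p_fail) / (1 - q_fail)
                    k -= 1
            return k == N, L

        def estimate(runs, biased):
            s = 0.0
            for _ in range(runs):
                hit, L = cycle(biased)
                if hit:
                    s += L
            return s / runs

        # Standard simulation typically prints 0 here; the biased run does not.
        print("standard MC    :", estimate(100_000, biased=False))
        print("failure biasing:", estimate(100_000, biased=True))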

    Rare event simulation for highly dependable systems with fast repairs

    Stochastic model checking has been used recently to assess, among others, dependability measures for a variety of systems. However, the employed numerical methods, such as those supported by the model-checking tools PRISM and MRMC, suffer from the state-space explosion problem. The main alternative is statistical model checking, which uses standard simulation, but this performs poorly when small probabilities need to be estimated. Therefore, we propose a method based on importance sampling to speed up the simulation process in cases where the failure probabilities are small due to the high speed of the system's repair units. This setting arises naturally in Markovian models of highly dependable systems. We show that our method compares favourably to standard simulation, to existing importance sampling techniques and to the numerical techniques of PRISM.
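
    A back-of-the-envelope calculation shows why fast repairs defeat standard simulation, assuming a generic two-component system (not the paper's model) with component failure rate $\lambda$ and repair rate $\mu \gg \lambda$. The chance that the second component fails before the first repair completes is

        p = \frac{\lambda}{\lambda + \mu} \approx \frac{\lambda}{\mu},

    and the relative error of a standard Monte Carlo estimator over $n$ runs is roughly $\sqrt{(1-p)/(np)}$, so holding 10% relative error needs $n \approx 100/p$ runs: about $10^8$ when $\lambda/\mu = 10^{-6}$. This is the regime the proposed importance-sampling method targets.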

    Stochastic model checking for predicting component failures and service availability

    When a component fails in a critical communications service, how urgent is a repair? If we repair within 1 hour, 2 hours, or n hours, how does this affect the likelihood of service failure? Can a formal model support assessing the impact, prioritisation, and scheduling of repairs in the event of component failures, and forecasting of maintenance costs? These are some of the questions posed to us by a large organisation, and here we report on our experience of developing a stochastic framework based on a discrete-space model and temporal logic to answer them. We define and explore both standard steady-state and transient temporal logic properties concerning the likelihood of service failure within certain time bounds and the forecasting of maintenance costs, and we introduce a new concept of envelopes of behaviour that quantify the effect of the status of lower-level components on service availability. The resulting model is highly parameterised, and user interaction for experimentation is supported by a lightweight, web-based interface.
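
    The transient properties mentioned above ("likelihood of service failure within n hours") can be sketched on a toy three-state Markov chain, computing P(reach the failed state within t hours) by uniformisation. The states, rates and time bounds below are invented for illustration and are not the organisation's model.

        import numpy as np

        lam1, lam2, mu = 0.01, 0.1, 0.5   # per-hour rates (assumed)
        Q = np.array([[-lam1,  lam1,         0.0 ],   # 0: healthy
                      [ mu,   -(mu + lam2),  lam2],   # 1: degraded, repair under way
                      [ 0.0,   0.0,          0.0 ]])  # 2: service failed (absorbing)

        def p_fail_by(t, tol=1e-12):
            """P(state 2 is reached within t hours), via uniformisation."""
            q = 1.1 * max(-Q[i, i] for i in range(3))  # uniformisation rate
            P = np.eye(3) + Q / q                      # DTMC of the uniformised chain
            v = np.array([1.0, 0.0, 0.0])              # start healthy
            acc, total = np.zeros(3), 0.0
            w, k = np.exp(-q * t), 0                   # Poisson(qt) weight for k = 0
            while total < 1.0 - tol:
                acc += w * v
                total += w
                k += 1
                w *= q * t / k                         # next Poisson weight
                v = v @ P                              # distribution after k jumps
            return acc[2]

        print("P(service failure within 2h):", p_fail_by(2.0))
        print("P(service failure within 8h):", p_fail_by(8.0))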

    Fast simulation of the leaky bucket algorithm

    We use fast simulation methods, based on importance sampling, to efficiently estimate cell loss probability in queueing models of the Leaky Bucket algorithm. One of these models was introduced by Berger (1991), in which the rare event of a cell loss is related to the rare event of an empty finite buffer in an "overloaded" queue. In particular, we propose a heuristic change of measure for importance sampling to efficiently estimate the probability of the rare empty-buffer event in an asymptotically unstable GI/GI/1/k queue. This change of measure is, in a way, "dual" to that proposed by Parekh and Walrand (1989) to estimate the probability of a rare buffer overflow event. We present empirical results to demonstrate the effectiveness of our fast simulation method. Since we have not yet obtained a mathematical proof, we can only conjecture that our heuristic is asymptotically optimal, as k → ∞.
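
    The "dual" change of measure is easy to see in an M/M/1/k stand-in for the GI/GI/1/k setting (parameters assumed): the queue is overloaded (λ > μ), so emptying is rare, and the importance-sampling measure simply swaps the arrival and service rates, mirroring how Parekh-Walrand biasing swaps them to force an overflow. The gambler's-ruin closed form is included purely as a sanity check for this simplified stand-in.

        import random

        LAM, MU, K = 1.0, 0.5, 30   # arrival rate > service rate; buffer size K

        def hits_empty(biased):
            """From level K-1, does the embedded chain reach 0 before K?
            Returns (indicator, likelihood ratio)."""
            j, L = K - 1, 1.0
            p_down = MU / (LAM + MU)                          # true departure-step prob
            q_down = LAM / (LAM + MU) if biased else p_down   # swapped under IS
            while 0 < j < K:
                if random.random() < q_down:
                    L *= p_down / q_down
                    j -= 1
                else:
                    L *= (1 - p_down) / (1 - q_down)
                    j += 1
            return j == 0, L

        def estimate(runs, biased):
            s = 0.0
            for _ in range(runs):
                hit, L = hits_empty(biased)
                if hit:
                    s += L
            return s / runs

        r = MU / LAM   # step-probability ratio of the embedded random walk
        print("standard MC    :", estimate(200_000, biased=False))
        print("swapped-rate IS:", estimate(200_000, biased=True))
        print("exact value    :", (r**(K - 1) - r**K) / (1 - r**K))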

    Methodologies synthesis

    This deliverable deals with the modelling and analysis of interdependencies between critical infrastructures, focussing attention on two interdependent infrastructures studied in the context of CRUTIAL: the electric power infrastructure and the information infrastructures supporting management, control and maintenance functionality. The main objectives are to: 1) investigate the main challenges to be addressed for the analysis and modelling of interdependencies; 2) review the modelling methodologies and tools that can be used to address these challenges and support the evaluation of the impact of interdependencies on the dependability and resilience of the service delivered to the users; and 3) present the preliminary directions investigated so far by the CRUTIAL consortium for describing and modelling interdependencies.

    Fast simulation of packet loss rates in a shared buffer communications switch

    This paper describes an efficient technique for estimating, via simulation, the probability of buffer overflows in a queueing model that arises in the analysis of ATM (Asynchronous Transfer Mode) communication switches. There are multiple streams of (autocorrelated) traffic feeding the switch, which has a buffer of finite capacity. Each stream is designated as either high or low priority. When the queue length reaches a certain threshold, only high-priority packets are admitted to the switch's buffer. The problem is to estimate the loss rate of high-priority packets. An asymptotically optimal importance sampling approach is developed for this rare event simulation problem. In this approach, the importance sampling is done in two distinct phases. In the first phase, an importance sampling change of measure is used to bring the queue length up to the threshold at which low-priority packets are rejected. In the second phase, a different importance sampling change of measure is used to move the queue length from the threshold to the buffer capacity.
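
    The two-phase structure can be sketched on a drastically simplified single queue: one high- and one low-priority Poisson stream, exponential service, and no autocorrelation, so this is a stand-in rather than the paper's ATM model. Below the threshold both streams feed the queue; above it only high-priority traffic does; and the change of measure in each phase swaps that regime's arrival rate with the service rate. All rates and parameters are assumptions.

        import random

        LAM_HI, LAM_LO, MU = 0.3, 0.4, 1.0   # assumed rates; the queue is stable
        T, B = 8, 20                         # rejection threshold, buffer capacity

        def busy_cycle(biased):
            """From level 1, does the embedded chain reach B (overflow) before 0?"""
            j, L = 1, 1.0
            while 0 < j < B:
                a = LAM_HI + LAM_LO if j < T else LAM_HI   # phase 1 vs phase 2 arrivals
                p_up = a / (a + MU)                        # true arrival-step prob
                q_up = MU / (a + MU) if biased else p_up   # phase-specific swap
                if random.random() < q_up:
                    L *= p_up / q_up
                    j += 1
                else:
                    L *= (1 - p_up) / (1 - q_up)
                    j -= 1
            return j == B, L

        def estimate(runs, biased):
            s = 0.0
            for _ in range(runs):
                overflow, L = busy_cycle(biased)
                if overflow:
                    s += L
            return s / runs

        print("standard MC :", estimate(200_000, biased=False))
        print("two-phase IS:", estimate(200_000, biased=True))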

    Efficient exploration of availability models guided by failure distances

    Recently, a method to bound the steady-state availability using the failure distance concept has been proposed. In this paper we refine that method by introducing state-space exploration techniques. In the methods proposed here, the state space is incrementally generated based on the contributions to the steady-state availability band of the states in the frontier of the currently generated state space. Several state-space exploration algorithms are evaluated in terms of bounds quality and memory and CPU time requirements. The most efficient seems to be a waved algorithm which expands transition groups. We compare our new methods with the method based on the failure distance concept without state exploration and with a method proposed by Souza e Silva and Ochoa which uses state-space exploration but does not use the failure distance concept. Using typical examples, we show that the methods proposed here can be significantly more efficient than any of the previous methods.
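
    The flavour of contribution-guided generation can be sketched generically with a best-first search: keep the frontier in a priority queue, always expand the state with the largest estimated contribution, and stop at a state budget. The toy "contribution" below is just the largest path probability found so far; the paper's methods instead use failure-distance-based bounds on each state's contribution to the availability band. Everything in the snippet is an illustrative assumption.

        import heapq

        P_FAIL = (1e-3, 1e-4)   # per-transition failure prob by component type (assumed)
        LIMIT = (3, 2)          # redundancy of each component type (assumed)

        def neighbours(s):
            """Successor states (one more failed component) with step probability."""
            for t in range(2):
                if s[t] < LIMIT[t]:
                    nxt = list(s)
                    nxt[t] += 1
                    yield tuple(nxt), P_FAIL[t]

        def explore(budget):
            """Generate up to `budget` states in order of estimated contribution."""
            start = (0, 0)
            best = {start: 1.0}      # best path-probability estimate per state
            heap = [(-1.0, start)]   # max-heap via negated priorities
            done, generated = set(), []
            while heap and len(generated) < budget:
                neg, s = heapq.heappop(heap)
                if s in done:
                    continue
                done.add(s)
                generated.append((s, -neg))
                for nxt, p in neighbours(s):
                    e = -neg * p
                    if e > best.get(nxt, 0.0):
                        best[nxt] = e
                        heapq.heappush(heap, (-e, nxt))
            return generated

        for s, e in explore(6):
            print(s, f"{e:.1e}")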

    Resource management of replicated service systems provisioned in the cloud

    Service providers seek scalable and cost-effective cloud solutions for hosting their applications. Despite significant recent advances facilitating the deployment and management of services on cloud platforms, a number of challenges still remain. Service providers are confronted with time-varying requests for the provided applications, interdependencies between different components, performance variability of the procured virtual resources, and cost structures that differ from those of conventional data centers. Moreover, fulfilling service level agreements, such as throughput and response time percentiles, becomes of paramount importance for ensuring business advantages.

    In this thesis, we explore service provisioning in clouds from multiple points of view. The aim is to best provide service replicas in the form of VMs to various service applications, such that their tail throughput and tail response times, as well as resource utilization, meet the service level agreements in the most cost-effective manner. In particular, we develop models, algorithms and replication strategies that consider multi-tier composed services provisioned in clouds. We also investigate how a service provider can opportunistically take advantage of observed performance variability in the cloud. Finally, we provide means of guaranteeing tail throughput and response times in the face of performance variability of VMs, using Markov chain modeling and large deviation theory. We employ methods from analytical modeling, event-driven simulations and experiments. Overall, this thesis provides not only a multi-faceted approach to exploring several crucial aspects of hosting services in clouds, i.e., cost, tail throughput, and tail response times, but our proposed resource management strategies are also rigorously validated via trace-driven simulation and extensive experiments.

    Validation of approximate dependability models of a RAID architecture with orthogonal organization

    RAID (Redundant Array of Inexpensive Disks) architectures are widely used in storage servers, and level-5 RAID is one of the most popular among them. Numerical analysis of exact Markovian dependability models of the level-5 RAID architecture with orthogonal organization is unfeasible for many realistic model parameters due to the size of the resulting state space. In this paper we develop approximate dependability models with small state spaces for a level-5 RAID architecture with orthogonal organization. We consider two measures: the steady-state unavailability and the unreliability. The models encompass disk hot spares and imperfect disk reconstruction. Using bounding techniques, we analyze the accuracy of the models and show that they are extremely accurate.
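
    For a sense of scale, here is a deliberately tiny dependability model of a single RAID-5 group: no hot spares, and data loss modelled as repairable at a slow restore rate so that the chain is ergodic and steady-state unavailability is well defined. This is far simpler than the paper's orthogonal-organization models; all rates are assumptions.

        import numpy as np

        n, lam, mu, delta = 5, 1e-5, 0.1, 1e-3   # disks, per-hour rates (assumed)
        Q = np.array([
            [-n * lam,  n * lam,                 0.0          ],  # 0: all disks up
            [ mu,      -(mu + (n - 1) * lam),   (n - 1) * lam ],  # 1: reconstructing
            [ delta,    0.0,                    -delta        ],  # 2: data lost (down)
        ])
        # Solve pi Q = 0 with sum(pi) = 1 for the steady-state distribution.
        A = np.vstack([Q.T, np.ones(3)])
        b = np.array([0.0, 0.0, 0.0, 1.0])
        pi, *_ = np.linalg.lstsq(A, b, rcond=None)
        print("steady-state unavailability:", pi[2])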