174 research outputs found

    Two methods for computing bounds for the distribution of cumulative reward for large Markov models

    Degradable fault-tolerant systems can be evaluated using rewarded continuous-time Markov chain (CTMC) models. In that context, a useful measure to consider is the distribution of the cumulative reward over a time interval [0, t]. All currently available numerical methods for computing that measure tend to be very expensive when the product of the maximum output rate of the CTMC model and t is large and, in that case, their application is limited to CTMC models of moderate size. In this paper, we develop two methods to compute bounds for the cumulative reward distribution of CTMC models with reward rates associated with states: BT/RT (Bounding Transformation/Regenerative Transformation) and BT/BRT (Bounding Transformation/Bounding Regenerative Transformation). The methods require the selection of a regenerative state, are numerically stable, and compute the bounds with well-controlled error. For a class of rewarded CTMC models, class C′′′_1, and a particular, natural selection for the regenerative state, the BT/BRT method allows bound tightness to be traded off against computational cost and will provide bounds at a moderate computational cost in many cases of interest. For a class of models, class C′′_1, slightly wider than class C′′′_1, and a particular, natural selection for the regenerative state, the BT/RT method will yield tighter bounds at a higher computational cost. Under additional conditions, the bounds obtained by the less expensive version of BT/BRT and by BT/RT seem to be tight either for any value of t or for values of t that are not small, depending on the initial probability distribution of the model. Class C′′_1 and class C′′′_1 models with those additional conditions include both exact and bounding typical failure/repair performability models of fault-tolerant systems with exponential failure and repair time distributions, repair in every state with failed components, and a reward rate structure which is a non-increasing function of the collection of failed components.
We illustrate both the applicability and the performance of the methods using a large CTMC performability example of a fault-tolerant multiprocessor system.
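To make the role of the product of the maximum output rate and t concrete, the following is a minimal sketch of standard uniformization (randomization). It computes only the expected cumulative reward over [0, t], not its distribution, and it is not the BT/RT or BT/BRT method of the paper. The two-state model, the rates `FAIL` and `REPAIR`, and the function name are assumptions made for illustration.

```python
import math

def expected_cumulative_reward(Q, rewards, p0, t, eps=1e-12):
    """Return (E[cumulative reward over [0, t]], number of series terms used).

    Standard uniformization: E[Y(t)] = (1/rate) * sum_k P(N >= k+1) * (p0 P^k) . rewards,
    where N ~ Poisson(rate * t) and P = I + Q / rate.
    """
    n = len(Q)
    rate = max(-Q[i][i] for i in range(n))  # uniformization rate = maximum output rate
    P = [[(1.0 if i == j else 0.0) + Q[i][j] / rate for j in range(n)]
         for i in range(n)]
    v = list(p0)                            # v = p0 * P^k, starting with k = 0
    poisson = math.exp(-rate * t)           # P(N = k)
    tail = 1.0 - poisson                    # P(N >= k + 1)
    total, k = 0.0, 0
    while tail > eps:                       # truncate once the Poisson tail is negligible
        total += tail * sum(v[i] * rewards[i] for i in range(n))
        v = [sum(v[i] * P[i][j] for i in range(n)) for j in range(n)]
        k += 1
        poisson *= rate * t / k
        tail -= poisson
    return total / rate, k

# Hypothetical two-state model: state 0 = up (reward rate 1), state 1 = down (reward rate 0).
FAIL, REPAIR = 0.01, 1.0
Q = [[-FAIL, FAIL], [REPAIR, -REPAIR]]
value, terms = expected_cumulative_reward(Q, [1.0, 0.0], [1.0, 0.0], t=10.0)
_, terms_long = expected_cumulative_reward(Q, [1.0, 0.0], [1.0, 0.0], t=100.0)
```

The number of series terms (`terms` versus `terms_long`) grows roughly in proportion to the product of the uniformization rate and t, and each term costs one vector-matrix product, which is why methods of this kind become expensive for large models and long intervals.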

    Performability modelling of homogenous and heterogeneous multiserver systems with breakdowns and repairs

    This thesis presents analytical modelling of homogeneous multi-server systems with reconfiguration and rebooting delays, heterogeneous multi-server systems with one main and several identical servers, and farm paradigm multi-server systems. This thesis also includes a number of other research works, such as fast performability evaluation models of open networks of nodes with repairs and finite queuing capacities, multi-server systems with deferred repairs, and two-stage tandem networks with failures, repairs and multiple servers at the second stage. Applications of these models to the popular Beowulf cluster systems and to memory servers are also presented. Existing techniques used in performance evaluation of multi-server systems are investigated and analysed in detail. Pure performance modelling techniques, pure availability models, and performability models are also considered. First, the existing approaches for pure performance modelling are critically analysed, with a discussion of their merits and demerits. Then relevant terminology is defined and explained. Since pure performance models tend to be too optimistic and pure availability models too conservative, performability models are used for the evaluation of multi-server systems. Fault-tolerant multi-server systems can continue service in case of certain failures. If a failure does not occur at a critical point (such as a breakdown of the head processor of a farm paradigm system), the system continues serving in a degraded mode of operation. In such systems, reconfiguration and/or rebooting delays are expected while a processor is being mapped out of the system. These delay stages are also taken into account, in addition to failures and repairs, in the exact performability models that are developed. Two-dimensional Markov state space representations of the systems are used for performability modelling.
Following the critical analysis of the existing solution techniques, the Spectral Expansion method is chosen for the solution of the models developed. In this work, open queuing networks are also considered. To evaluate their performability, existing modelling approaches are extended, and validated by simulation, for performability analysis of multistage open networks with finite queuing capacities. The performances of two extended modelling approaches are compared in terms of accuracy for open networks with various queuing capacities. Deferred repair strategies are becoming popular because of the cost reductions they can provide. The effects of using deferred repairs are analysed, and performability models are provided for homogeneous multi-server systems and highly available farm paradigm multi-server systems. Since one of the random variables is used to represent the number of jobs in one of the queues, analytical models for performance evaluation of two-stage tandem networks are numerically cumbersome. Existing approaches for modelling these systems are actually pure performance models, since breakdowns and repairs cannot be considered. One way of modelling these systems is to split one of the random variables so that it represents both the operative and non-operative states of the server in one dimension. However, this gives rise to the state explosion problem, severely limiting the maximum queue capacity that can be handled. In order to overcome this problem, a new approach is presented for modelling two-stage tandem networks in three dimensions. An approximate solution is presented to solve such a system. This approach is a novel contribution to alleviating the state space explosion problem for large and/or complex systems. When two-stage tandem networks with feedback are modelled using this approach, the operative states can be handled independently, and this makes it possible to consider multiple operative states at the second stage.
The analytical models presented can be used with various parameters and can be extended to systems with similar architectures. The three-dimensional approach developed is capable of handling two-stage tandem networks with various characteristics for performability measures. All the approaches presented give accurate results. Numerical solutions are presented for all models developed. Where the solution presented is not exact, simulations are performed to validate the accuracy of the results obtained.

    Techniques for the Fast Simulation of Models of Highly dependable Systems

    With the ever-increasing complexity and requirements of highly dependable systems, their evaluation during design and operation is becoming more crucial. Realistic models of such systems are often not amenable to analysis using conventional analytic or numerical methods. Therefore, analysts and designers turn to simulation to evaluate these models. However, accurate estimation of dependability measures of these models requires that the simulation frequently observes system failures, which are rare events in highly dependable systems. This renders ordinary simulation impractical for evaluating such systems. To overcome this problem, simulation techniques based on importance sampling have been developed, and are very effective in certain settings. When importance sampling works well, simulation run lengths can be reduced by several orders of magnitude when estimating transient as well as steady-state dependability measures. This paper reviews some of the importance-sampling techniques that have been developed in recent years to estimate dependability measures efficiently in Markov and non-Markov models of highly dependable systems.
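As a rough illustration of the idea, the sketch below estimates a rare failure probability by importance sampling on a small random walk. The model (a walk on the number of failed components, with failure probability `p_fail` per step), the biasing probability of 0.5, and all names are assumptions for this example; they are not taken from the paper. Failures are sampled far more often than they really occur, and each run is reweighted by its likelihood ratio so that the estimate stays unbiased.

```python
import random

def estimate_failure_prob(p_fail, n_down, n_runs, bias=0.5, seed=1):
    """Estimate P(reach n_down failed components before returning to 0), starting from 1.

    Each step is a failure with true probability p_fail, otherwise a repair.
    Steps are sampled with failure probability `bias` instead, and the
    likelihood ratio (true probability / sampling probability) is accumulated.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_runs):
        state, weight = 1, 1.0
        while 0 < state < n_down:
            if rng.random() < bias:             # biased failure step
                state += 1
                weight *= p_fail / bias
            else:                               # biased repair step
                state -= 1
                weight *= (1.0 - p_fail) / (1.0 - bias)
        if state == n_down:                     # only runs that end in system failure count
            total += weight
    return total / n_runs

def exact_failure_prob(p_fail, n_down):
    """Closed form for this toy walk (gambler's ruin), used only as a check."""
    r = (1.0 - p_fail) / p_fail
    return (r - 1.0) / (r ** n_down - 1.0)

# Hypothetical parameters: failure probability 0.01 per step, system down at 4 failures.
estimate = estimate_failure_prob(0.01, 4, 20_000)
exact = exact_failure_prob(0.01, 4)
```

With these parameters the true probability is on the order of one in a million, so ordinary simulation with the same number of runs would most likely observe no failure at all, whereas about a quarter of the biased runs end in failure.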

    Matrix-geometric solution of infinite stochastic Petri nets

    We characterize a class of stochastic Petri nets that can be solved using matrix-geometric techniques. The advantages of such an approach are that very efficient mathematical techniques become available for practical usage and that the problem of large state spaces can be circumvented. We first characterize the class of stochastic Petri nets of interest by formally defining a number of constraints that have to be fulfilled. We then discuss the matrix-geometric solution technique that can be employed and present some boundary conditions on tool support. We illustrate the practical usage of the class of stochastic Petri nets with two examples: a queueing system with delayed service and a model of connection management in ATM networks.
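For readers unfamiliar with the technique, here is a minimal sketch of the core matrix-geometric step on a hand-written quasi-birth-death process rather than on a stochastic Petri net. The model (a single queue whose server alternates between an up and a down phase), its rates, and the simple fixed-point iteration are assumptions chosen for brevity; the paper's own algorithm and tool support may differ. The rate matrix R is the minimal non-negative solution of A0 + R A1 + R² A2 = 0, and the steady-state vectors of successive levels then satisfy π_{n+1} = π_n R.

```python
def mat_mul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)] for i in range(2)]

def mat_add(A, B):
    return [[A[i][j] + B[i][j] for j in range(2)] for i in range(2)]

def inv2(A):
    """Inverse of a 2x2 matrix."""
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    return [[A[1][1] / det, -A[0][1] / det], [-A[1][0] / det, A[0][0] / det]]

def solve_R(A0, A1, A2, tol=1e-13, max_iter=100_000):
    """Fixed-point iteration R <- -(A0 + R^2 A2) A1^{-1}, started from R = 0."""
    neg_A1_inv = [[-x for x in row] for row in inv2(A1)]
    R = [[0.0, 0.0], [0.0, 0.0]]
    for _ in range(max_iter):
        R_next = mat_mul(mat_add(A0, mat_mul(mat_mul(R, R), A2)), neg_A1_inv)
        diff = max(abs(R_next[i][j] - R[i][j]) for i in range(2) for j in range(2))
        R = R_next
        if diff < tol:
            break
    return R

def residual(R, A0, A1, A2):
    """Largest absolute entry of A0 + R A1 + R^2 A2 (zero for an exact solution)."""
    M = mat_add(mat_add(A0, mat_mul(R, A1)), mat_mul(mat_mul(R, R), A2))
    return max(abs(x) for row in M for x in row)

# Hypothetical rates: arrivals 1.0, service 3.0 (only while up), breakdown 0.1, repair 1.0.
ARR, SRV, BRK, REP = 1.0, 3.0, 0.1, 1.0
A0 = [[ARR, 0.0], [0.0, ARR]]                            # one level up (arrival)
A1 = [[-(ARR + SRV + BRK), BRK], [REP, -(ARR + REP)]]    # phase changes within a level
A2 = [[SRV, 0.0], [0.0, 0.0]]                            # one level down (service, up phase only)
R = solve_R(A0, A1, A2)
```

Only 2x2 blocks are ever stored or multiplied, regardless of how many levels the infinite state space has, which is the sense in which the large state space is circumvented.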

    Algorithms for Performance, Dependability, and Performability Evaluation using Stochastic Activity Networks

    Modeling tools and technologies are important for aerospace development. At the University of Illinois, we have worked on advancing the state of the art in modeling by Markov reward models in two important areas: reducing the memory necessary to numerically solve systems represented as stochastic activity networks and other stochastic Petri net extensions while still obtaining solutions in a reasonable amount of time, and finding numerically stable and memory-efficient methods to solve for the reward accumulated during a finite mission time. A long-standing problem when modeling with high-level formalisms such as stochastic activity networks is the so-called state space explosion, where the number of states increases exponentially with the size of the high-level model. Thus, the corresponding Markov model becomes prohibitively large and its solution is constrained by the size of primary memory. To reduce the memory necessary to numerically solve complex systems, we propose new methods that can tolerate such large state spaces and do not require any special structure in the model (as many other techniques do). First, we develop methods that generate rows and columns of the state transition-rate-matrix on-the-fly, eliminating the need to explicitly store the matrix at all. Next, we introduce a new iterative solution method, called modified adaptive Gauss-Seidel, that exhibits locality in its use of data from the state transition-rate-matrix, permitting us to cache portions of the matrix and hence reduce the solution time. Finally, we develop a new memory and computationally efficient technique for Gauss-Seidel based solvers that avoids the need for generating rows of A in order to solve Ax = b. This is a significant performance improvement for on-the-fly methods as well as other recent solution techniques based on Kronecker operators. Taken together, these new results show that one can solve very large models without any special structure.
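The on-the-fly idea can be sketched as follows, under several assumptions: the model is a hypothetical machine-repair chain with `N` components and a single repair facility, the solver is plain Gauss-Seidel (not the modified adaptive variant described above), and the helper names are invented for this example. The point is only that the transition-rate matrix is never stored; each column is regenerated from the model description whenever the solver needs it.

```python
# Hypothetical model: N components, each failing at rate LAM; one repair facility with rate MU.
N, LAM, MU = 5, 0.1, 1.0

def exit_rate(k):
    """Total rate out of state k (k = number of failed components)."""
    return (N - k) * LAM + (MU if k > 0 else 0.0)

def incoming(k):
    """Yield (source_state, rate) for every transition into state k, computed on demand."""
    if k > 0:
        yield k - 1, (N - (k - 1)) * LAM    # a failure in state k-1
    if k < N:
        yield k + 1, MU                     # a repair in state k+1

def gauss_seidel_steady_state(n_states, tol=1e-12, max_sweeps=10_000):
    """Solve pi Q = 0 by Gauss-Seidel sweeps without ever storing Q."""
    pi = [1.0 / n_states] * n_states
    for _ in range(max_sweeps):
        change = 0.0
        for j in range(n_states):
            # Balance equation for state j, using the newest available values of pi.
            new = sum(pi[i] * rate for i, rate in incoming(j)) / exit_rate(j)
            change = max(change, abs(new - pi[j]))
            pi[j] = new
        total = sum(pi)
        pi = [x / total for x in pi]        # renormalise so the entries sum to 1
        if change < tol:
            break
    return pi

def exact_steady_state():
    """Product-form solution of this birth-death chain, used only as a check."""
    w = [1.0]
    for k in range(N):
        w.append(w[-1] * (N - k) * LAM / MU)
    s = sum(w)
    return [x / s for x in w]

pi = gauss_seidel_steady_state(N + 1)
```

Memory use is one probability vector plus whatever `incoming` needs to enumerate a column, so the trade-off is extra computation per sweep in exchange for not holding the matrix in primary memory.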

    Stochastic Activity Networks Templates: Supporting Variability in Performability Models

    Model-based evaluation is extensively used to estimate performance and reliability of dependable systems. Traditionally, those systems were small and self-contained, and the main challenge for model-based evaluation has been the efficiency of the solution process. Recently, the problem of specifying and maintaining complex models has increasingly gained attention, as modern systems are characterized by many components and complex interactions. Components share similarities, but also exhibit variations in their behavior due to different configurations or roles in the system. From the modeling perspective, variations lead to replicating and altering a small set of base models multiple times. Variability is taken into account only informally, by defining a sample model and explaining its possible variations. In this paper we address the problem of including variability in performability models, focusing on Stochastic Activity Networks (SANs). We introduce the formal definition of Stochastic Activity Networks Templates (SAN-T), a formalism based on SANs with the addition of variability aspects. Unlike in other approaches, parameters can also affect the structure of the model, such as the number of cases of activities. We apply the SAN-T formalism to the modeling of the backbone network of an environmental monitoring infrastructure. In particular, we show how existing SAN models from the literature can be generalized using the newly introduced formalism.

    Failure distance based bounds of dependability measures

    The subject of this dissertation is the development of bounding methods for a class of continuous-time Markov chain (CTMC) dependability models of fault-tolerant systems. The systems considered in the dissertation are conceptualized as made up of components (hardware or software) that fail and, for repairable systems, are repaired. Components are grouped into classes, the components of the same class being indistinguishable. Thus, a component is regarded as an instance of some component class and the system includes a bag of component classes defined over a certain domain. The up/down state of the system is determined from the unfailed/failed state of the components through a coherent structure function specified by a fault tree with basic event classes. (A basic event class is the failure of a component of a component class.) The class of CTMC models considered in the dissertation is quite wide and allows, for instance, modeling the fact that a component may have different failure modes. It also allows modeling coverage failures by introducing fictitious components that do not fail by themselves and to which uncovered failures of other components are propagated. In the case of repairable systems, the considered class of models supports very complex repair policies (e.g., limited repairpersons, priorities, repair preemption) as well as group repair (i.e., simultaneous repair of several components).
However, deferred repair (i.e., the deferring of repair until some condition is met) is not allowed. Two dependability measures are considered in the dissertation: the unreliability at a given time epoch for non-repairable systems and the steady-state unavailability for repairable systems. The bounding methods developed in the dissertation are based on the concept of "failure distance from a state," which is defined as the minimum number of components that have to fail, in addition to those already failed, to take the system down. We develop four bounding methods. The first method gives bounds for the unreliability of non-repairable fault-tolerant systems using (exact) failure distances. Those distances are computed using the set of minimal cuts of the structure function of the system. The set of minimal cuts is obtained using an algorithm developed in the dissertation that obtains the minimal cuts for fault trees with basic event classes. The second method gives bounds for the unreliability using easily computable lower bounds for failure distances. Those lower bounds are obtained by analyzing the fault tree of the system and do not require knowledge of the set of minimal cuts. The third method gives bounds for the steady-state unavailability using (exact) failure distances. The fourth method gives bounds for the steady-state unavailability using the lower bounds for failure distances. Finally, the performance of each method is illustrated by means of several large examples. We conclude that the methods can significantly outperform previously existing methods and significantly extend the complexity of the fault-tolerant systems for which tight bounds for the unreliability or steady-state unavailability can be computed.
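The definition of failure distance can be made concrete with a small sketch. It assumes that the minimal cuts are already known and that each cut, like the set of failed components, is represented as a bag (multiset) mapping a component class to a count; the example classes `P`, `B` and `M` and the function name are hypothetical, and the dissertation's own algorithms for obtaining the cuts or the lower bounds are not reproduced here.

```python
from collections import Counter

def failure_distance(failed, minimal_cuts):
    """Minimum number of additional component failures needed to take the system down.

    `failed` and every cut are bags: component class -> number of components.
    For each cut, count how many of its components have not failed yet,
    then take the smallest such count over all cuts.
    """
    return min(
        sum(max(0, needed - failed.get(cls, 0)) for cls, needed in cut.items())
        for cut in minimal_cuts
    )

# Hypothetical system: it is down if 2 processors (P) have failed,
# or 1 bus (B) and 1 memory (M), or 3 memories.
minimal_cuts = [Counter({"P": 2}), Counter({"B": 1, "M": 1}), Counter({"M": 3})]

d_all_up = failure_distance(Counter(), minimal_cuts)             # no component failed
d_two_mem = failure_distance(Counter({"M": 2}), minimal_cuts)    # two memories failed
d_down = failure_distance(Counter({"P": 2}), minimal_cuts)       # system already down
```

A distance of 0 means the state is already a down state; larger distances identify states from which system failure requires several further component failures.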

    Compositional Performance Modelling with the TIPPtool

    Stochastic process algebras have been proposed as compositional specification formalisms for performance models. In this paper, we describe a tool which aims at realising all beneficial aspects of compositional performance modelling, the TIPPtool. It incorporates methods for compositional specification as well as solution, based on state-of-the-art techniques, and wrapped in a user-friendly graphical front end. Apart from highlighting the general benefits of the tool, we also discuss some lessons learned during development and application of the TIPPtool. A non-trivial model of a real-life communication system serves as a case study to illustrate benefits and limitations.