56 research outputs found

    A comparison of numerical splitting-based methods for Markovian dependability and performability models

    Get PDF
    Iterative numerical methods are an important ingredient for the solution of continuous time Markov dependability models of fault-tolerant systems. In this paper we make a numerical comparison of several splitting-based iterative methods. We consider the computation of steady-state reward rate on rewarded models. This measure requires the solution of a singular linear system. We consider two classes of models. The first class includes failure/repair models. The second class is more general and includes the modeling of periodic preventive test of spare components to reduce the probability of latent failures in inactive components. The periodic preventive test is approximated by an Erlang distribution with enough number of stages. We show that for each class of model there is a splitting-based method which is significantly more efficient than the other methods.Postprint (published version

    Techniques for the Fast Simulation of Models of Highly dependable Systems

    Get PDF
    With the ever-increasing complexity and requirements of highly dependable systems, their evaluation during design and operation is becoming more crucial. Realistic models of such systems are often not amenable to analysis using conventional analytic or numerical methods. Therefore, analysts and designers turn to simulation to evaluate these models. However, accurate estimation of dependability measures of these models requires that the simulation frequently observes system failures, which are rare events in highly dependable systems. This renders ordinary Simulation impractical for evaluating such systems. To overcome this problem, simulation techniques based on importance sampling have been developed, and are very effective in certain settings. When importance sampling works well, simulation run lengths can be reduced by several orders of magnitude when estimating transient as well as steady-state dependability measures. This paper reviews some of the importance-sampling techniques that have been developed in recent years to estimate dependability measures efficiently in Markov and nonMarkov models of highly dependable system

    Extended Abstracts: PMCCS3: Third International Workshop on Performability Modeling of Computer and Communication Systems

    Get PDF
    Coordinated Science Laboratory was formerly known as Control Systems LaboratoryThe pages of the front matter that are missing from the PDF were blank

    Numerical iterative methods for Markovian dependability and performability models: new results and a comparison

    Get PDF
    In this paper we deal with iterative numerical methods to solve linear systems arising in continuous-time Markov chain (CTMC) models. We develop an algorithm to dynamically tune the relaxation parameter of the successive over-relaxation method. We give a sufficient condition for the Gauss-Seidel method to converge when computing the steady-state probability vector of a finite irreducible CTMC, an a suffient condition for the Generalized Minimal Residual projection method not to converge to the trivial solution 0 when computing that vector. Finally, we compare several splitting-based iterative methods an a variant of the Generalized Minimal Residual projection method.Postprint (published version

    Computation of Bounds for Transient Measures of Large Rewarded Markov Models using Regenerative Randomization

    Get PDF
    In this paper we generalize a method (called regenerative randomization) for the transient solution of continuous time Markov models. The generalized method allows to compute two transient measures (the expected transient reward rate and the expected averaged reward rate) for rewarded continuous time Markov models with a structure covering bounding models which are useful when a complete, exact model has unmanageable size. The method has the same good properties as the well-known (standard) randomization method: numerical stability, well-controlled computation error, and ability to specify the computation error in advance, and, for large enough models and long enough times, is significantly faster than the standard randomization method. The method requires the selection of a regenerative state and its performance depends on that selection. For a class of models, class C', including typical failure/repair models with exponential failure and repair time distributions and repair in every state with failed components, a natural selection for the regenerative state exists, and results are available assessing approximately the performance of the method for that natural selection in terms of “visible” model characteristics. Those results can be used to anticipate when the method can be expected to be significantly faster than standard randomization for models in that class. The potentially superior e6ciency ofthe regenerative randomization method compared to standard randomization for models not in class C' is illustrated using a large performability model of a fault-tolerant multiprocessor system.Postprint (published version

    Methodologies synthesis

    Get PDF
    This deliverable deals with the modelling and analysis of interdependencies between critical infrastructures, focussing attention on two interdependent infrastructures studied in the context of CRUTIAL: the electric power infrastructure and the information infrastructures supporting management, control and maintenance functionality. The main objectives are: 1) investigate the main challenges to be addressed for the analysis and modelling of interdependencies, 2) review the modelling methodologies and tools that can be used to address these challenges and support the evaluation of the impact of interdependencies on the dependability and resilience of the service delivered to the users, and 3) present the preliminary directions investigated so far by the CRUTIAL consortium for describing and modelling interdependencies

    Performability modelling of homogenous and heterogeneous multiserver systems with breakdowns and repairs

    Get PDF
    This thesis presents analytical modelling of homogeneous multi-server systems with reconfiguration and rebooting delays, heterogeneous multi-server systems with one main and several identical servers, and farm paradigm multi-server systems. This thesis also includes a number of other research works such as, fast performability evaluation models of open networks of nodes with repairs and finite queuing capacities, multi-server systems with deferred repairs, and two stage tandem networks with failures, repairs and multiple servers at the second stage. Applications of these for the popular Beowulf cluster systems and memory servers are also accomplished. Existing techniques used in performance evaluation of multi-server systems are investigated and analysed in detail. Pure performance modelling techniques, pure availability models, and performability models are also considered. First, the existing approaches for pure performance modelling are critically analysed with the discussions on merits and demerits. Then relevant terminology is defined and explained. Since the pure performance models tend to be too optimistic and pure availability models are too conservative, performability models are used for the evaluation of multi-server systems. Fault-tolerant multi-server systems can continue service in case of certain failures. If failure does not occur at a critical point (such as breakdown of the head processor of a farm paradigm system) the system continues serving in a degraded mode of operation. In such systems, reconfiguration and/or rebooting delays are expected while a processor is being mapped out from the system. These delay stages are also taken into account in addition to failures and repairs, in the exact performability models that are developed. Two dimensional Markov state space representations of the systems are used for performability modelling. Following the critical analysis of the existing solution techniques, the Spectral Expansion method is chosen for the solution of the models developed. In this work, open queuing networks are also considered. To evaluate their performability, existing modelling approaches are expanded and validated by simulations, for performability analysis of multistage open networks with finite queuing capacities. The performances of two extended modelling approaches are compared in terms of accuracy for open networks with various queuing capacities. Deferred repair strategies are becoming popular because of the cost reductions they can provide. Effects of using deferred repairs are analysed and performability models are provided for homogeneous multi-server systems and highly available farm paradigm multi-server systems. Since one of the random variables is used to represent the number of jobs in one of the queues, analytical models for performance evaluation of two stage tandem networks suffer because of numerical cumbersomeness. Existing approaches for modelling these systems are actually pure performance models since breakdowns and repairs cannot be considered. One way of modelling these systems can be to divide one of the random variables to present both the operative and non-operative states of the server in one dimension. However, this will give rise to state explosion problem severely limiting the maximum queue capacity that can be handled. In order to overcome this problem a new approach is presented for modelling two stage tandem networks in three dimensions. An approximate solution is presented to solve such a system. This approach manifests itself as a novel contribution for alleviating the state space explosion problem for large and/or complex systems. When two state tandem networks with feedback are modelled using this approach, the operative states can be handled independently and this makes it possible to consider multiple operative states at the second stage. The analytical models presented can be used with various parameters and they are extendible to consider systems with similar architectures. The developed three dimensional approach is capable to handle two stage tandem networks with various characteristics for performability measures. All the approaches presented give accurate results. Numerical solutions are presented for all models developed. In case the solution presented is not exact, simulations are performed to validate the accuracy of the results obtained

    New variance reduction methods in Monte Carlo rare event simulation

    Get PDF
    Para sistemas que proveen algún tipo de servicio mientras están operativos y dejan de proveerlo cuando fallan, es de interés determinar parámetros como, por ejemplo, la probabilidad de encontrar el sistema en falla en un instante cualquiera, el tiempo medio transcurrido entre fallas, o cualquier medida capaz de reflejar la capacidad del sistema para proveer servicio. Las determinaciones de estas medidas de seguridad de funcionamiento se ven afectadas por diversos factores, entre ellos, el tamaño del sistema y la rareza de las fallas. En esta tesis se estudian algunos métodos concebidos para determinar estas medidas sobre sistemas grandes y altamente confiables, es decir sistemas formados por gran cantidad de componentes, en los que las fallas del sistema son eventos raros. Ya sea en forma directa o indirecta, parte de las las expresiones que permiten determinar las medidas de interés corresponden a la probabilidad de que el sistema se encuentre en algún estado de falla. De un modo u otro, estas expresiones evaluan la fracción —ponderada por la distribución de probabilidad de las configuraciones del sistema—entre el número de configuraciones en las que el sistema falla y la totalidad de las configuraciones posibles. Si el sistema es grande el cálculo exacto de estas probabilidades, y consecuentemente de las medidas de interés, puede resultar inviable. Una solución alternativa es estimar estas probabilidades mediante simulación. Uno de los mecanismos para hacer estas estimaciones es la simulación de tipo Monte Carlo, cuya versión más simple es la simulación en crudo o estándar. El problema es que si las fallas son raras, el número de iteraciones necesario para estimar estas probabilidades mediante simulación estándar con una precisión aceptable, puede resultar desmesuradamente grande. En esta tesis se analizan algunos métodos existentes para mejorar la simulación estándar en el contexto de eventos raros, se hacen análisis de varianza y se prueban los métodos sobre una variedad de modelos. En todos los casos la mejora se consigue a costa de una reducción de la varianza del estimador con respecto a la varianza del estimador estándar. Gracias a la reducción de varianza es posible estimar la probabilidad de ocurrencia de eventos raros con una precisión aceptable, a partir de un número razonable de iteraciones. Como parte central del trabajo se proponen dos métodos nuevos, uno relacionado con Spliting y otro relacionado con Monte Carlo Condicional. Splitting es un método de probada eficiencia en entornos en los que se busca evaluar desempeño y confiabilidad combinados, escasamente utilizado en la simulación de sistemas altamente confiables sobre modelos estáticos (sin evolución temporal). En vi su formulación básica Splitting hace un seguimiento de las trayectorias de un proceso estocástico a través de su espacio de estados y multiplica su número ante cada cruce de umbral, para un conjunto dado de umbrales distribuidos entre los estados inicial y final. Una de las propuestas de esta tesis es una adaptación de Splitting a un modelo estático de confiabilidad de redes. En el método propuesto se construye un proceso estocástico a partir de un tiempo ficticio en el cual los enlaces van cambiando de estado y se aplica Splitting sobre ese proceso. El método exhibe elevados niveles de precisión y robustez. Monte Carlo Condicional es un método clásico de reducción de varianza cuyo uso no está muy extendido en el contexto de eventos raros. En su formulación básica Monte Carlo Condicional evalúa las probabilidades de los eventos de interés, condicionando las variables indicatrices a eventos no raros y simples de detectar. El problema es que parte de esa evaluación incluye el cálculo exacto de algunas probabilidades del modelo. Uno de los métodos propuestos en esta tesis es una adaptación de Monte Carlo Condicional al análisis de modelos Markovianos de sistemas altamente confiables. La propuesta consiste en estimar las probabilidades cuyo valor exacto se necesita, mediante una aplicación recursiva de Monte Carlo Condicional. Se estudian algunas características de este modelo y se verifica su eficiencia en forma experimental.For systems that provide some kind of service while they are operational and stop providing it when they fail, it is of interest to determine parameters such as, for example, the probability of finding the system failed at any moment, the mean time between failures, or any measure that reflects the capacity of the system to provide service. The determination of these measures —known as dependability measures— is affected by a variety of factors, including the size of the system and the rarity of failures. This thesis studies some methods designed to determine these measures on large and highly reliable systems, i.e. systems formed by a large number of components, such that systems’ failures are rare events. Either directly or indirectly, part of the expressions for determining the measures of interest correspond to the probability that the system is in some state of failure. Somehow, this expressions evaluate the ratio —weighted by the probability distribution of the systems’ configurations— between the number of configurations in which the system fails and all possible configurations. If the system is large, the exact calculation of these probabilities, and consequently of the measures of interest, may be unfeasible. An alternative solution is to estimate these probabilities by simulation. One mechanism to make such estimation is Monte Carlo simulation, whose simplest version is crude or standard simulation. The problem is that if failures are rare, the number of iterations required to estimate this probabilities by standard simulation, with acceptable accuracy, may be extremely large. In this thesis some existing methods to improve the standard simulation in the context of rare events are analyzed, some variance analyses are made and the methods are tested empirically over a variety of models. In all cases the improvement is achieved at the expense of reducing the variance of the estimator with respect to the standard estimator’s variance. Due to this variance reduction, the probability of the occurrence of rare events, with acceptable accuracy, can be achieved in a reasonable number of iterations. As a central part of this work, two new methods are proposed, one of them related to Splitting and the other one related to Conditional Monte Carlo. Splitting is a widely used method in performance and performability analysis, but scarcely applied for simulating highly reliable systems over static models (models with no temporal evolution). In its basic formulation Splitting keeps track of the trajectories of a stochastic process through its state space and it splits or multiplies the number of them at each threshold cross, for a given set of thresholds distributed between the initial and the final state. One of the proposals of this thesis is an adaptation of Splitting to a static network reliability model. In the proposed method, a fictitious time stochastic process in which the network links keep changing their state is built, and Splitting is applied to this process. The method shows to be highly accurate and robust. Conditional Monte Carlo is a classical variance reduction technique, whose use is not widespread in the field of rare events. In its basic formulation Conditional Monte Carlo evaluates the probabilities of the events of interest, conditioning the indicator variables to not rare and easy to detect events. The problem is that part of this assessment includes the exact calculation of some probabilities in the model. One of the methods proposed in this thesis is an adaptation of Conditional Monte Carlo to the analysis of highly reliable Markovian systems. The proposal consists in estimating the probabilities whose exact value is needed, by means of a recursive application of Conditional Monte Carlo. Some features of this model are discussed and its efficiency is verified experimentally

    Modelling and performability evaluation of Wireless Sensor Networks

    Get PDF
    This thesis presents generic analytical models of homogeneous clustered Wireless Sensor Networks (WSNs) with a centrally located Cluster Head (CH) coordinating cluster communication with the sink directly or through other intermediate nodes. The focus is to integrate performance and availability studies of WSNs in the presence of sensor nodes and channel failures and repair/replacement. The main purpose is to enhance improvement of WSN Quality of Service (QoS). Other research works also considered in this thesis include modelling of packet arrival distribution at the CH and intermediate nodes, and modelling of energy consumption at the sensor nodes. An investigation and critical analysis of wireless sensor network architectures, energy conservation techniques and QoS requirements are performed in order to improve performance and availability of the network. Existing techniques used for performance evaluation of single and multi-server systems with several operative states are investigated and analysed in details. To begin with, existing approaches for independent (pure) performance modelling are critically analysed with highlights on merits and drawbacks. Similarly, pure availability modelling approaches are also analysed. Considering that pure performance models tend to be too optimistic and pure availability models are too conservative, performability, which is the integration of performance and availability studies is used for the evaluation of the WSN models developed in this study. Two-dimensional Markov state space representations of the systems are used for performability modelling. Following critical analysis of the existing solution techniques, spectral expansion method and system of simultaneous linear equations are developed and used to solving the proposed models. To validate the results obtained with the two techniques, a discrete event simulation tool is explored. In this research, open queuing networks are used to model the behaviour of the CH when subjected to streams of traffic from cluster nodes in addition to dynamics of operating in the various states. The research begins with a model of a CH with an infinite queue capacity subject to failures and repair/replacement. The model is developed progressively to consider bounded queue capacity systems, channel failures and sleep scheduling mechanisms for performability evaluation of WSNs. Using the developed models, various performance measures of the considered system including mean queue length, throughput, response time and blocking probability are evaluated. Finally, energy models considering mean power consumption in each of the possible operative states is developed. The resulting models are in turn employed for the evaluation of energy saving for the proposed case study model. Numerical solutions and discussions are presented for all the queuing models developed. Simulation is also performed in order to validate the accuracy of the results obtained. In order to address issues of performance and availability of WSNs, current research present independent performance and availability studies. The concerns resulting from such studies have therefore remained unresolved over the years hence persistence poor system performance. The novelty of this research is a proposed integrated performance and availability modelling approach for WSNs meant to address challenges of independent studies. In addition, a novel methodology for modelling and evaluation of power consumption is also offered. Proposed model results provide remarkable improvement on system performance and availability in addition to providing tools for further optimisation studies. A significant power saving is also observed from the proposed model results. In order to improve QoS for WSN, it is possible to improve the proposed models by incorporating priority queuing in a mixed traffic environment. A model of multi-server system is also appropriate for addressing traffic routing. It is also possible to extend the proposed energy model to consider other sleep scheduling mechanisms other than On-demand proposed herein. Analysis and classification of possible arrival distribution of WSN packets for various application environments would be a great idea for enabling robust scientific research

    Dependency-Based Decomposition of Systems Involving Rare Events

    Get PDF
    Coordinated Science Laboratory was formerly known as Control Systems LaboratoryIBM Ph.D. Fellowshi
    • …
    corecore