120 research outputs found
Survivability modeling for cyber-physical systems subject to data corruption
Cyber-physical critical infrastructures are created when traditional physical infrastructure is supplemented with advanced monitoring, control, computing, and communication capability. More intelligent decision support and improved efficacy, dependability, and security are expected. Quantitative models and evaluation methods are required for determining the extent to which a cyber-physical infrastructure improves on its physical predecessors. It is essential that these models reflect both cyber and physical aspects of operation and failure. In this dissertation, we propose quantitative models for dependability attributes, in particular, survivability, of cyber-physical systems. Any malfunction or security breach, whether cyber or physical, that causes the system operation to depart from specifications will affect these dependability attributes. Our focus is on data corruption, which compromises decision support -- the fundamental role played by cyber infrastructure. The first research contribution of this work is a Petri net model for information exchange in cyber-physical systems, which facilitates i) evaluation of the extent of data corruption at a given time, and ii) illuminates the service degradation caused by propagation of corrupt data through the cyber infrastructure. In the second research contribution, we propose metrics and an evaluation method for survivability, which captures the extent of functionality retained by a system after a disruptive event. We illustrate the application of our methods through case studies on smart grids, intelligent water distribution networks, and intelligent transportation systems. Data, cyber infrastructure, and intelligent control are part and parcel of nearly every critical infrastructure that underpins daily life in developed countries. Our work provides means for quantifying and predicting the service degradation caused when cyber infrastructure fails to serve its intended purpose. It can also serve as the foundation for efforts to fortify critical systems and mitigate inevitable failures --Abstract, page iii
Techniques for the Fast Simulation of Models of Highly dependable Systems
With the ever-increasing complexity and requirements of highly dependable systems, their evaluation during design and operation is becoming more crucial. Realistic models of such systems are often not amenable to analysis using conventional analytic or numerical methods. Therefore, analysts and designers turn to simulation to evaluate these models. However, accurate estimation of dependability measures of these models requires that the simulation frequently observes system failures, which are rare events in highly dependable systems. This renders ordinary Simulation impractical for evaluating such systems. To overcome this problem, simulation techniques based on importance sampling have been developed, and are very effective in certain settings. When importance sampling works well, simulation run lengths can be reduced by several orders of magnitude when estimating transient as well as steady-state dependability measures. This paper reviews some of the importance-sampling techniques that have been developed in recent years to estimate dependability measures efficiently in Markov and nonMarkov models of highly dependable system
Availability modeling and evaluation of web-based services - A pragmatic approach
Cette thèse porte sur le développement d’une approche de modélisation pragmatique permettant aux concepteurs d’applications et systèmes mis en oeuvre sur le web d’évaluer la disponibilité du service fourni aux utilisateurs. Plusieurs sources d’indisponibilité du service sont prises en compte, en particulier i) les défaillances matérielles ou logicielles affectant les serveurs et ii) des dégradations de performance (surcharge des serveurs, temps de réponse trop long, etc.). Une approche hiérarchique multi-niveau basée sur une modélisation de type performabilité est proposée, combinant des chaînes de Markov et des modèles de files d’attente. Les principaux concepts et la faisabilité de cette approche sont illustrés à travers l’exemple d’une agence de voyage. Plusieurs modèles analytiques et études de sensibilité sont présentés en considérant différentes hypothèses concernant l’architecture, les stratégies de recouvrement, les fautes, les profils d’utilisateurs, et les caractéristiques du trafic. ABSTRACT : This thesis presents a pragmatic modeling approach allowing designers of web-based applications and systems to evaluate the service availability provided to the users. Multiple sources of service unavailability are taken into account, in particular i) hardware and software failures affecting the servers, and ii) performance degradation (overload of servers, very long response time, etc.). An hierarchical multi-level approach is proposed based on performability modeling, combining Markov chains and queueing models. The main concepts and the feasibility of this approach are illustrated using a web-based travel agency. Various analytical models and sensitivity studies are presented considering different assumptions with respect to the architectures, recovery strategies, faults, users profile and traffic characteristics
Performability modelling of homogenous and heterogeneous multiserver systems with breakdowns and repairs
This thesis presents analytical modelling of homogeneous multi-server systems with reconfiguration and rebooting delays, heterogeneous multi-server systems with one main and several identical servers, and farm paradigm multi-server systems. This thesis also includes a number of other research works such as, fast performability evaluation models of open networks of nodes with repairs and finite queuing capacities, multi-server systems with deferred repairs, and two stage tandem networks with failures, repairs and multiple servers at the second stage. Applications of these for the popular Beowulf cluster systems and memory servers are
also accomplished.
Existing techniques used in performance evaluation of multi-server systems are investigated and analysed in detail. Pure performance modelling techniques, pure availability models,
and performability models are also considered. First, the existing approaches for pure performance modelling are critically analysed with the discussions on merits and demerits. Then relevant terminology is defined and explained. Since the pure performance models tend to be too optimistic and pure availability models are too conservative, performability models are used for the evaluation of multi-server systems. Fault-tolerant multi-server systems can
continue service in case of certain failures. If failure does not occur at a critical point (such as breakdown of the head processor of a farm paradigm system) the system continues serving in a degraded mode of operation. In such systems, reconfiguration and/or rebooting delays are expected while a processor is being mapped out from the system. These delay stages are also taken into account in addition to failures and repairs, in the exact performability models that are developed. Two dimensional Markov state space representations of the systems are used for performability modelling. Following the critical analysis of the existing solution techniques, the Spectral Expansion method is chosen for the solution of the models developed.
In this work, open queuing networks are also considered. To evaluate their performability, existing modelling approaches are expanded and validated by simulations, for performability
analysis of multistage open networks with finite queuing capacities. The performances of two extended modelling approaches are compared in terms of accuracy for open networks with various queuing capacities.
Deferred repair strategies are becoming popular because of the cost reductions they can provide. Effects of using deferred repairs are analysed and performability models are provided
for homogeneous multi-server systems and highly available farm paradigm multi-server systems.
Since one of the random variables is used to represent the number of jobs in one of the queues, analytical models for performance evaluation of two stage tandem networks suffer
because of numerical cumbersomeness. Existing approaches for modelling these systems are actually pure performance models since breakdowns and repairs cannot be considered. One way of modelling these systems can be to divide one of the random variables to present both the operative and non-operative states of the server in one dimension. However, this will give rise to state explosion problem severely limiting the maximum queue capacity that can be handled. In order to overcome this problem a new approach is presented for modelling two stage tandem networks in three dimensions. An approximate solution is presented to solve such a system.
This approach manifests itself as a novel contribution for alleviating the state space explosion problem for large and/or complex systems. When two state tandem networks with feedback are modelled using this approach, the operative states can be handled independently and this makes it possible to consider multiple operative states at the second stage.
The analytical models presented can be used with various parameters and they are extendible to consider systems with similar architectures. The developed three dimensional approach is capable to handle two stage tandem networks with various characteristics for performability measures. All the approaches presented give accurate results.
Numerical solutions are presented for all models developed. In case the solution presented is not exact, simulations are performed to validate the accuracy of the results obtained
Fault-Tolerant Computing: An Overview
Coordinated Science Laboratory was formerly known as Control Systems LaboratoryNASA / NAG-1-613Semiconductor Research Corporation / 90-DP-109Joint Services Electronics Program / N00014-90-J-127
List of requirements on formalisms and selection of appropriate tools
This deliverable reports on the activities for the set-up of the modelling environments for the evaluation activities of WP5. To this objective, it reports on the identified modelling peculiarities of the electric power infrastructure and the information infrastructures and of their interdependencies, recalls the tools that have been considered and concentrates on the tools that are, and will be, used in the project: DrawNET, DEEM and EPSys which have been developed before and during the project by the partners, and M\uf6bius and PRISM, developed respectively at the University of Illinois at Urbana Champaign and at the University of Birmingham (and recently at the University of Oxford)
Recommended from our members
Multi-perspective Evaluation of Self-Healing Systems Using Simple Probabilistic Models
Quantifying the efficacy of self-healing systems is a challenging but important task, which has implications for increasing designer, operator and end-user confidence in these systems. During design system architects benefit from tools and techniques that enhance their understanding of the system, allowing them to reason about the tradeoffs of proposed or existing self-healing mechanisms and the overall effectiveness of the system as a result of different mechanism-compositions. At deployment time, system integrators and operators need to understand how the selfhealing mechanisms work and how their operation impacts the system's reliability, availability and serviceability (RAS) in order to cope with any limitations of these mechanisms when the system is placed into production. In this paper we construct an evaluation framework for selfhealing systems around simple, yet powerful, probabilistic models that capture the behavior of the system's selfhealing mechanisms from multiple perspectives (designer, operator, and end-user). We combine these analytical models with runtime fault-injection to study the operation of VM-Rejuv — a virtual machine based rejuvenation scheme for web-application servers. We use the results from the fault-injection experiments and model-analysis to reason about the efficacy of VM-Rejuv, its limitations and strategies for managing/mitigating these limitations in system deployments. Whereas we use VM-Rejuv as the subject of our evaluation in this paper, our main contribution is a practical evaluation approach that can be generalized to other self-healing systems
- …