290 research outputs found

    Transient Reward Approximation for Continuous-Time Markov Chains

    Full text link
    We are interested in the analysis of very large continuous-time Markov chains (CTMCs) with many distinct rates. Such models arise naturally in the context of reliability analysis, e.g., of computer network performability analysis, of power grids, of computer virus vulnerability, and in the study of crowd dynamics. We use abstraction techniques together with novel algorithms for the computation of bounds on the expected final and accumulated rewards in continuous-time Markov decision processes (CTMDPs). These ingredients are combined in a partly symbolic and partly explicit (symblicit) analysis approach. In particular, we circumvent the use of multi-terminal decision diagrams, because the latter do not work well if facing a large number of different rates. We demonstrate the practical applicability and efficiency of the approach on two case studies.Comment: Accepted for publication in IEEE Transactions on Reliabilit

    Failure distance based bounds of dependability measures

    Get PDF
    El tema d'aquesta tesi és el desenvolupament de mètodes de fitació per a una classe de models de confiabilitat basats en cadenes de Markov de temps continu (CMTC) de sistemes tolerants a fallades.Els sistemes considerats a la tesi es conceptualitzen com formats per components (hardware o software) que fallen i, en el cas de sistemes reparables, són reparats. Els components s'agrupen en classes de forma que els components d'una mateixa classe són indistingibles. Per tant, un component és considerat com a una instància d'una classe de components i el sistema inclou un bag de classes de components definit sobre un cert domini. L'estat no fallada/fallada del sistema es determina a partir de l'estat no fallada/fallada dels components mitjançant una funció d'estructura coherent que s'especifica amb un arbre de fallades amb classes d'esdeveniments bàsics. (Una classe d'esdeveniment bàsic és la fallada d'un component d'una classe de components.)La classe de models basats en CMTC considerada a la tesi és força àmplia i permet, per exemple, de modelar el fet que un component pot tenir diversos modes de fallada. També permet de modelar fallades de cobertura mitjançant la introducció de components ficticis que no fallen per ells mateixos i als quals es propaguen les fallades d'altres components. En el cas de sistemes reparables, la classe de models considerada admet polítiques de reparació complexes (per exemple, nombre limitat de reparadors, prioritats, inhibició de reparació) així com reparació en grup (reparació simultània de diversos components). Tanmateix, no és possible de modelar la reparació diferida (és a dir, el fet de diferir la reparació d'un component fins que una certa condició es compleixi).A la tesi es consideren dues mesures de confiabilitat: la no fiabilitat en un instant de temps donat en el cas de sistemes no reparables i la no disponibilitat en règim estacionari en el cas sistemes reparables.Els mètodes de fitació desenvolupats a la tesi es basen en el concepte de "distància a la fallada", que es defineix com el nombre mínim de components que han de fallar a més dels que ja han fallat per fer que el sistema falli.A la tesi es desenvolupen quatre mètodes de fitació. El primer mètode dóna fites per a la no fiabilitat de sistemes no reparables emprant distàncies a la fallada exactes. Aquestes distàncies es calculen usant el conjunt de talls mínims de la funció d'estructura del sistema. El conjunt de talls mínims s'obté amb un algorisme desenvolupat a la tesi que obté els talls mínims per a arbres de fallades amb classes d'esdeveniments bàsics. El segon mètode dóna fites per a la no fiabilitat usant fites inferiors per a les distàncies a la fallada. Aquestes fites inferiors s'obtenen analitzant l'arbre de fallades del sistema, no requereixen de conèixer el conjunt de talls mínims i el seu càlcul és poc costós. El tercer mètode dóna fites per a la no disponibilitat en règim estacionari de sistemes reparables emprant distàncies a la fallada exactes. El quart mètode dóna fites per a la no disponibilitat en règim estacionari emprant les fites inferiors per a les distàncies a la fallada.Finalment, s'il·lustren les prestacions de cada mètode usant diversos exemples. La conclusió és que cada un dels mètodes pot funcionar molt millor que altres mètodes prèviament existents i estendre de forma significativa la complexitat de sistemes tolerants a fallades per als quals és possible de calcular fites ajustades per a la no fiabilitat o la no disponibilitat en règim estacionari.The subject of this dissertation is the development of bounding methods for a class of continuous-time Markov chain (CTMC) dependability models of fault-tolerant systems.The systems considered in the dissertation are conceptualized as made up of components (hardware or software) that fail and, for repairable systems, are repaired. Components are grouped into classes, the components of the same class being indistinguishable. Thus, a component is regarded as an instance of some component class and the system includes a bag of component classes defined over a certain domain. The up/down state of the system is determined from the unfailed/failed state of the components through a coherent structure function specified by a fault tree with basic event classes. (A basic event class is the failure of a component of a component class.)The class of CTMC models considered in the dissertation is quite wide and allows, for instance, to model the fact that a component may have different failure modes. It also allows to model coverage failures by means of introducing fictitious components that do not fail by themselves and to which uncovered failures of other components are propagated. In the case of repairable systems, the considered class of models supports very complex repair policies (e.g., limited repairpersons, priorities, repair preemption) as well as group repair (i.e., simultaneous repair of several components). However, deferred repair (i.e., the deferring of repair until some condition is met) is not allowed.Two dependability measures are considered in the dissertation: the unreliability at a given time epoch for non-repairable systems and the steady-state unavailability for repairable systems.The bounding methods developed in the dissertation are based on the concept of "failure distance from a state," which is defined as the minimum number of components that have to fail in addition to those already failed to take the system down.We develop four bounding methods. The first method gives bounds for the unreliability of non-repairable fault-tolerant systems using (exact) failure distances. Those distances are computed using the set of minimal cuts of the structure function of the system. The set of minimal cuts is obtained using an algorithm developed in the dissertation that obtains the minimal cuts for fault trees with basic event classes. The second method gives bounds for the unreliability using easily computable lower bounds for failure distances. Those lower bounds are obtained analyzing the fault tree of the system and do not require the knowledge of the set of minimal cuts. The third method gives bounds for the steady-state unavailability using (exact) failure distances. The fourth method gives bounds for the steady-state unavailability using the lower bounds for failure distances.Finally, the performance of each method is illustrated by means of several large examples. We conclude that the methods can outperform significantly previously existing methods and extend significantly the complexity of the fault-tolerant systems for which tight bounds for the unreliability or steady-state unavailability can be computed

    Techniques for the Fast Simulation of Models of Highly dependable Systems

    Get PDF
    With the ever-increasing complexity and requirements of highly dependable systems, their evaluation during design and operation is becoming more crucial. Realistic models of such systems are often not amenable to analysis using conventional analytic or numerical methods. Therefore, analysts and designers turn to simulation to evaluate these models. However, accurate estimation of dependability measures of these models requires that the simulation frequently observes system failures, which are rare events in highly dependable systems. This renders ordinary Simulation impractical for evaluating such systems. To overcome this problem, simulation techniques based on importance sampling have been developed, and are very effective in certain settings. When importance sampling works well, simulation run lengths can be reduced by several orders of magnitude when estimating transient as well as steady-state dependability measures. This paper reviews some of the importance-sampling techniques that have been developed in recent years to estimate dependability measures efficiently in Markov and nonMarkov models of highly dependable system

    Perfomance Analysis and Resource Optimisation of Critical Systems Modelled by Petri Nets

    Get PDF
    Un sistema crítico debe cumplir con su misión a pesar de la presencia de problemas de seguridad. Este tipo de sistemas se suele desplegar en entornos heterogéneos, donde pueden ser objeto de intentos de intrusión, robo de información confidencial u otro tipo de ataques. Los sistemas, en general, tienen que ser rediseñados después de que ocurra un incidente de seguridad, lo que puede conducir a consecuencias graves, como el enorme costo de reimplementar o reprogramar todo el sistema, así como las posibles pérdidas económicas. Así, la seguridad ha de ser concebida como una parte integral del desarrollo de sistemas y como una necesidad singular de lo que el sistema debe realizar (es decir, un requisito no funcional del sistema). Así pues, al diseñar sistemas críticos es fundamental estudiar los ataques que se pueden producir y planificar cómo reaccionar frente a ellos, con el fin de mantener el cumplimiento de requerimientos funcionales y no funcionales del sistema. A pesar de que los problemas de seguridad se consideren, también es necesario tener en cuenta los costes incurridos para garantizar un determinado nivel de seguridad en sistemas críticos. De hecho, los costes de seguridad puede ser un factor muy relevante ya que puede abarcar diferentes dimensiones, como el presupuesto, el rendimiento y la fiabilidad. Muchos de estos sistemas críticos que incorporan técnicas de tolerancia a fallos (sistemas FT) para hacer frente a las cuestiones de seguridad son sistemas complejos, que utilizan recursos que pueden estar comprometidos (es decir, pueden fallar) por la activación de los fallos y/o errores provocados por posibles ataques. Estos sistemas pueden ser modelados como sistemas de eventos discretos donde los recursos son compartidos, también llamados sistemas de asignación de recursos. Esta tesis se centra en los sistemas FT con recursos compartidos modelados mediante redes de Petri (Petri nets, PN). Estos sistemas son generalmente tan grandes que el cálculo exacto de su rendimiento se convierte en una tarea de cálculo muy compleja, debido al problema de la explosión del espacio de estados. Como resultado de ello, una tarea que requiere una exploración exhaustiva en el espacio de estados es incomputable (en un plazo prudencial) para sistemas grandes. Las principales aportaciones de esta tesis son tres. Primero, se ofrecen diferentes modelos, usando el Lenguaje Unificado de Modelado (Unified Modelling Language, UML) y las redes de Petri, que ayudan a incorporar las cuestiones de seguridad y tolerancia a fallos en primer plano durante la fase de diseño de los sistemas, permitiendo así, por ejemplo, el análisis del compromiso entre seguridad y rendimiento. En segundo lugar, se proporcionan varios algoritmos para calcular el rendimiento (también bajo condiciones de fallo) mediante el cálculo de cotas de rendimiento superiores, evitando así el problema de la explosión del espacio de estados. Por último, se proporcionan algoritmos para calcular cómo compensar la degradación de rendimiento que se produce ante una situación inesperada en un sistema con tolerancia a fallos

    A formalism for describing and simulating systems with interacting components.

    Get PDF
    This thesis addresses the problem of descriptive complexity presented by systems involving a high number of interacting components. It investigates the evaluation measure of performability and its application to such systems. A new description and simulation language, ICE and it's application to performability modelling is presented. ICE (Interacting ComponEnts) is based upon an earlier description language which was first proposed for defining reliability problems. ICE is declarative in style and has a limited number of keywords. The ethos in the development of the language has been to provide an intuitive formalism with a powerful descriptive space. The full syntax of the language is presented with discussion as to its philosophy. The implementation of a discrete event simulator using an ICE interface is described, with use being made of examples to illustrate the functionality of the code and the semantics of the language. Random numbers are used to provide the required stochastic behaviour within the simulator. The behaviour of an industry standard generator within the simulator and different methods of number allocation are shown. A new generator is proposed that is a development of a fast hardware shift register generator and is demonstrated to possess good statistical properties and operational speed. For the purpose of providing a rigorous description of the language and clarification of its semantics, a computational model is developed using the formalism of extended coloured Petri nets. This model also gives an indication of the language's descriptive power relative to that of a recognised and well developed technique. Some recognised temporal and structural problems of system event modelling are identified. and ICE solutions given. The growing research area of ATM communication networks is introduced and a sophisticated top down model of an ATM switch presented. This model is simulated and interesting results are given. A generic ICE framework for performability modelling is developed and demonstrated. This is considered as a positive contribution to the general field of performability research

    List of requirements on formalisms and selection of appropriate tools

    Get PDF
    This deliverable reports on the activities for the set-up of the modelling environments for the evaluation activities of WP5. To this objective, it reports on the identified modelling peculiarities of the electric power infrastructure and the information infrastructures and of their interdependencies, recalls the tools that have been considered and concentrates on the tools that are, and will be, used in the project: DrawNET, DEEM and EPSys which have been developed before and during the project by the partners, and M\uf6bius and PRISM, developed respectively at the University of Illinois at Urbana Champaign and at the University of Birmingham (and recently at the University of Oxford)

    Energy efficient transport technology: Program summary and bibliography

    Get PDF
    The Energy Efficient Transport (EET) Program began in 1976 as an element of the NASA Aircraft Energy Efficiency (ACEE) Program. The EET Program and the results of various applications of advanced aerodynamics and active controls technology (ACT) as applicable to future subsonic transport aircraft are discussed. Advanced aerodynamics research areas included high aspect ratio supercritical wings, winglets, advanced high lift devices, natural laminar flow airfoils, hybrid laminar flow control, nacelle aerodynamic and inertial loads, propulsion/airframe integration (e.g., long duct nacelles) and wing and empennage surface coatings. In depth analytical/trade studies, numerous wind tunnel tests, and several flight tests were conducted. Improved computational methodology was also developed. The active control functions considered were maneuver load control, gust load alleviation, flutter mode control, angle of attack limiting, and pitch augmented stability. Current and advanced active control laws were synthesized and alternative control system architectures were developed and analyzed. Integrated application and fly by wire implementation of the active control functions were design requirements in one major subprogram. Additional EET research included interdisciplinary technology applications, integrated energy management, handling qualities investigations, reliability calculations, and economic evaluations related to fuel savings and cost of ownership of the selected improvements

    Stochastic scheduling and workload allocation : QoS support and profitable brokering in computing grids

    No full text
    Abstract: The Grid can be seen as a collection of services each of which performs some functionality. Users of the Grid seek to use combinations of these services to perform the overall task they need to achieve. In general this can be seen as aset of services with a workflow document describing how these services should be combined. The user may also have certain constraints on the workflow operations, such as execution time or cost ----t~ th~ user, specified in the form of a Quality of Service (QoS) document. The users . submit their workflow to a brokering service along with the QoS document. The brokering service's task is to map any given workflow to a subset of the Grid services taking the QoS and state of the Grid into account -- service availability and performance. We propose an approach for generating constraint equations describing the workflow, the QoS requirements and the state of the Grid. This set of equations may be solved using Mixed-Integer Linear Programming (MILP), which is the traditional method. We further develop a novel 2-stage stochastic MILP which is capable of dealing with the volatile nature of the Grid and adapting the selection of the services during the lifetime of the workflow. We present experimental results comparing our approaches, showing that the . 2-stage stochastic programming approach performs consistently better than other traditional approaches. Next we addresses workload allocation techniques for Grid workflows in a multi-cluster Grid We model individual clusters as MIMIk. queues and obtain a numerical solutio~ for missed deadlines (failures) of tasks of Grid workflows. We also present an efficient algorithm for obtaining workload allocations of clusters. Next we model individual cluster resources as G/G/l queues and solve an optimisation problem that minimises QoS requirement violation, provides QoS guarantee and outperforms reservation based scheduling algorithms. Both approaches are evaluated through an experimental simulation and the results confirm that the proposed workload allocation strategies combined with traditional scheduling algorithms performs considerably better in terms of satisfying QoS requirements of Grid workflows than scheduling algorithms that don't employ such workload allocation techniques. Next we develop a novel method for Grid brokers that aims at maximising profit whilst satisfying end-user needs with a sufficient guarantee in a volatile utility Grid. We develop a develop a 2-stage stochastic MILP which is capable of dealing with the volatile nature . of the Grid and obtaining cost bounds that ensure that end-user cost is minimised or satisfied and broker's profit is maximised with sufficient guarantee. These bounds help brokers know beforehand whether the budget limits of end-users can be satisfied and. if not then???????? obtain appropriate future leases from service providers. Experimental results confirm the efficacy of our approach.Imperial Users onl
    corecore