
    Quantification of over-speed risk in wind turbine fleets

    The effective life management of large and diverse fleets of wind turbines is a new problem facing power system utilities. More specifically, minimizing over-speed risk is of high importance due to the possible loss of life and the economic implications of over-speed, such as a loss-of-containment event. Meeting the goal of risk minimization is complicated by the large range of turbine types present in a typical fleet. These turbines may have different pitch systems, different over-speed detection systems, and different levels of functional redundancy, implying different levels of risk. The purpose of this work is to carry out a quantitative comparison of over-speed risk across turbine configurations, using a Markov process to model the detection of faults and subsequent repair actions. In the medium to long term, the risk associated with different assets can be used as a decision-making aid. For example, if the operator is a utility, it may want to avoid purchasing high-risk sites in the future, or may need to develop mitigation strategies for turbines at high risk of over-speed.
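    As a toy illustration of this approach (not the paper's actual model or data), the sketch below uses a discrete-time Markov chain over three turbine states and takes the long-run fraction of time spent with an undetected fault as a proxy for over-speed risk exposure; all transition probabilities are invented.

```python
# Hypothetical three-state Markov model of fault detection and repair.
# All probabilities below are illustrative, not taken from the paper.
STATES = ["ok", "latent_fault", "repair"]

# P[i][j]: probability of moving from state i to state j in one time step.
P = [
    [0.995, 0.005, 0.000],  # ok: a fault arises with probability 0.005
    [0.000, 0.900, 0.100],  # latent fault: detected with probability 0.10
    [0.200, 0.000, 0.800],  # repair: completed with probability 0.20
]

def steady_state(P, iters=10_000):
    """Approximate the stationary distribution by power iteration."""
    dist = [1.0, 0.0, 0.0]
    for _ in range(iters):
        dist = [sum(dist[i] * P[i][j] for i in range(3)) for j in range(3)]
    return dist

pi = steady_state(P)
# Long-run fraction of time with an undetected fault: a risk proxy that
# could be compared across turbine configurations (different detection and
# redundancy levels translate into different transition probabilities).
risk = pi[STATES.index("latent_fault")]
```

    Comparing `risk` across transition matrices parameterized per turbine type is one way such a model could rank assets for the decision-making aid the abstract describes.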

    Dynamic FTSS in Asynchronous Systems: the Case of Unison

    Distributed fault-tolerance can mask the effect of a limited number of permanent faults, while self-stabilization provides forward recovery after an arbitrary number of transient faults hit the system. FTSS protocols combine the best of both worlds since they are simultaneously fault-tolerant and self-stabilizing. To date, FTSS solutions either consider static (i.e., fixed-point) tasks or assume synchronous scheduling of the system components. In this paper, we present the first study of dynamic tasks in asynchronous systems, considering the unison problem as a benchmark. Unison can be seen as a local clock synchronization problem, as neighbors must maintain digital clocks at most one time unit away from each other and increment their own clock value infinitely often. We present several impossibility results for this difficult problem and, for the cases in which the problem is solvable, propose an FTSS solution that exhibits optimal fault containment.
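    The safety rule behind unison can be sketched independently of the paper's FTSS machinery: a node ticks only when no neighbour lags behind it. Below is a minimal synchronous simulation (illustrative only; the paper treats asynchronous scheduling and faults).

```python
# Toy illustration of the unison safety rule, not the paper's FTSS protocol:
# a node increments its clock only when every neighbour's clock is at least
# its own, which keeps neighbouring clocks at most one unit apart.
def unison_round(clocks, neighbours):
    """One synchronous round: node i ticks iff no neighbour lags behind it."""
    return [
        c + 1 if all(clocks[n] >= c for n in neighbours[i]) else c
        for i, c in enumerate(clocks)
    ]

# A 4-node ring starting from a legitimate (within-one) configuration.
ring = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
clocks = [0, 1, 0, 1]
for _ in range(10):
    clocks = unison_round(clocks, ring)
    # Safety: neighbouring clocks never drift more than one unit apart.
    assert all(abs(clocks[i] - clocks[j]) <= 1 for i in ring for j in ring[i])
```

    Liveness is visible too: the lagging nodes catch up in the first round and every clock keeps increasing thereafter.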

    Reliability Analysis of Complex NASA Systems with Model-Based Engineering

    The emergence of model-based engineering, with Model-Based Systems Engineering (MBSE) leading the way, is transforming design and analysis methodologies. The recognized benefits to systems development include moving from document-centric information systems and document-centric project communication to a model-centric environment in which control of design changes across the life cycle is facilitated. In addition, a single source of truth about the system, up to date in all respects of the design, becomes the authoritative source of data and information about the system. This promotes consistency and efficiency in the integration of the system elements as the design emerges, and may thereby further optimize the design. Therefore, Reliability Engineers (REs) supporting NASA missions must be integrated into model-based engineering to ensure the outputs of their analyses remain relevant and valuable to the design, development, and operational processes for failure risk assessment and communication.

    Risk analysis and reliability of the GERDA Experiment extraction and ventilation plant at Gran Sasso mountain underground laboratory of Italian National Institute for Nuclear Physics

    The aim of this study is a risk analysis of argon release from the GERDA experiment in the Gran Sasso underground National Laboratories (LNGS) of the Italian National Institute for Nuclear Physics (INFN). The GERDA apparatus, located in Hall A of the LNGS, is a facility with germanium detectors immersed in a wide tank filled with about 70 m3 of cold liquefied argon. This cryo-tank sits in another, water-filled tank (700 m3) at atmospheric pressure. In such cryogenic processes, the main cause of an accidental scenario is a loss of insulation of the cryo-tank. A preliminary HazOp analysis was carried out on the whole system. The risk assessment identified two possible top events: an explosion due to a Rapid Phase Transition (RPT) and argon runaway evaporation. The risk analysis highlighted a higher probability of occurrence for the latter top event. To avoid emission into Hall A, HazOp, Fault Tree, and Event Tree analyses of the cryogenic gas extraction and ventilation plant were performed. Failures in the ventilation system are the main cause of this top event. To improve system reliability, some corrective actions were proposed: the use of a UPS and the upgrade of the damper opening devices. Furthermore, the Human Reliability Analysis identified some operating and management improvements: action procedure optimization, alert warnings, and staff training. The proposed model integrates the existing analysis techniques by applying them to an atypical work environment, and provides useful suggestions for improving system reliability.
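    The fault-tree part of such an analysis reduces to combining basic-event probabilities through AND/OR gates. The sketch below is a generic illustration with invented numbers, not GERDA reliability data: the top event requires insulation loss together with failure of a hypothetical twin-train ventilation function, where each train fails if its fan or its damper fails.

```python
# Generic fault-tree gate arithmetic with illustrative probabilities
# (independent basic events assumed; none of these numbers are GERDA data).
def p_or(*ps):
    """OR gate: probability that at least one independent event occurs."""
    q = 1.0
    for p in ps:
        q *= 1.0 - p
    return 1.0 - q

def p_and(*ps):
    """AND gate: probability that all independent events occur."""
    q = 1.0
    for p in ps:
        q *= p
    return q

p_insulation_loss = 1e-3   # initiating event
p_fan_fail = 5e-2          # per ventilation train
p_damper_stuck = 2e-2      # damper fails to open

# One train fails if its fan OR damper fails; ventilation fails only if
# both redundant trains fail; the top event needs insulation loss AND
# ventilation failure.
p_train_fail = p_or(p_fan_fail, p_damper_stuck)
p_ventilation_fail = p_and(p_train_fail, p_train_fail)
p_top = p_and(p_insulation_loss, p_ventilation_fail)
```

    The same arithmetic shows why the proposed corrective actions pay off: upgrading the damper opening devices lowers `p_damper_stuck`, which shrinks `p_train_fail` and, quadratically through the redundancy, `p_top`.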

    Avoiding core's DUE & SDC via acoustic wave detectors and tailored error containment and recovery

    The trend of downsizing transistors and scaling operating voltage has made processor chips more sensitive to radiation phenomena, making soft errors an important challenge. New reliability techniques for handling soft errors in logic and memories that allow meeting the desired failures-in-time (FIT) target are key to keep harnessing the benefits of Moore's law. The failure to scale the soft error rate caused by particle strikes may soon limit the total number of cores that one may have running at the same time. This paper proposes a lightweight and scalable architecture to eliminate silent data corruption (SDC) errors and detected unrecoverable errors (DUE) of a core. The architecture uses acoustic wave detectors for error detection. We propose to recover by confining errors to the cache hierarchy, allowing us to deal with the relatively long detection latencies. Our results show that the proposed mechanism protects the whole core (logic, latches, and memory arrays) while incurring a performance overhead as low as 0.60%.
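    A back-of-the-envelope calculation (with invented numbers, not the paper's data) shows why a flat per-core soft-error rate caps core counts, and how high detection-and-containment coverage relaxes that cap:

```python
# Illustrative FIT budgeting; all numbers are assumptions, not measurements.
FIT_BUDGET = 4000      # chip-level target: failures per 10^9 device-hours
fit_per_core = 50      # assumed per-core contribution (logic + SRAM)

# Without protection, the budget directly limits the core count.
max_cores = FIT_BUDGET // fit_per_core

# With a detection-plus-containment scheme covering most SDC/DUE events,
# only the residual uncovered fraction counts against the budget.
coverage = 0.99
residual_fit_per_core = fit_per_core * (1 - coverage)
max_cores_protected = round(FIT_BUDGET / residual_fit_per_core)
```

    Under these assumed figures, coverage of 99% raises the admissible core count by two orders of magnitude, which is the scalability argument the abstract makes.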

    On Byzantine Broadcast in Loosely Connected Networks

    We consider the problem of reliably broadcasting information in a multihop asynchronous network that is subject to Byzantine failures. Most existing approaches give conditions for perfect reliable broadcast (all correct nodes deliver the authentic message and nothing else), but they require a highly connected network. An approach giving only probabilistic guarantees (correct nodes deliver the authentic message with high probability) was recently proposed for loosely connected networks, such as grids and tori. Yet, the proposed solution requires a specific initialization (that includes global knowledge) of each node, which may be difficult or impossible to guarantee in self-organizing networks - for instance, a wireless sensor network, especially when nodes are prone to Byzantine failures. In this paper, we propose a new protocol offering guarantees for loosely connected networks that does not require such global-knowledge-dependent initialization. In more detail, we give a methodology to determine whether a set of nodes will always deliver the authentic message, in any execution. Then, we give conditions for perfect reliable broadcast in a torus network. Finally, we provide an experimental evaluation of our solution and determine the number of randomly distributed Byzantine failures that can be tolerated for a given correct-broadcast probability.
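    As a point of comparison for threshold-based delivery (this is the classic Certified Propagation Algorithm, not the paper's protocol), a node accepts a value heard directly from the source, or relayed identically by at least f+1 distinct neighbours. The toy run below also shows why such thresholds struggle in loosely connected topologies; the graph and parameters are invented for illustration.

```python
from collections import defaultdict

def cpa(neighbours, source, value, byzantine, f):
    """Certified Propagation: deliver a value received directly from the
    source, or identically from at least f + 1 distinct neighbours."""
    delivered = {source: value}
    changed = True
    while changed:
        changed = False
        for node in neighbours:
            if node in delivered or node in byzantine:
                continue
            votes = defaultdict(set)
            for n in neighbours[node]:
                if n == source:
                    votes[value].add(n)     # heard directly from the source
                elif n in byzantine:
                    votes["forged"].add(n)  # worst case: all liars collude
                elif n in delivered:
                    votes[delivered[n]].add(n)
            for v, senders in votes.items():
                if source in senders or len(senders) >= f + 1:
                    delivered[node] = v
                    changed = True
                    break
    return delivered

# Small example: node 3 is Byzantine, f = 1; node 5 hangs off node 4 alone.
neighbours = {0: [1, 2, 3], 1: [0, 4], 2: [0, 4],
              3: [0, 4], 4: [1, 2, 3, 5], 5: [4]}
out = cpa(neighbours, source=0, value="m", byzantine={3}, f=1)
```

    Node 4 delivers the authentic message (two honest neighbours outvote the liar), but the degree-1 node 5 can never reach the f+1 threshold: exactly the kind of loose connectivity the abstract targets with probabilistic guarantees instead.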

    System-of-Systems Complexity

    The global availability of communication services makes it possible to interconnect independently developed systems, called constituent systems, to provide new synergistic services and more efficient economic processes. The characteristics of these new Systems-of-Systems are qualitatively different from those of classic monolithic systems. In the first part of this presentation we elaborate on these differences, particularly with respect to the autonomy of the constituent systems, dependability, continuous evolution, and emergence. In the second part we look at an SoS from the point of view of cognitive complexity. Cognitive complexity is seen as a relation between a model of an SoS and the observer. In order to understand the behavior of a large SoS, we have to generate models of adequate simplicity, i.e., of a cognitive complexity that can be handled by the limited capabilities of the human mind. We discuss the importance of properly specifying and placing the relied-upon message interfaces between the constituent systems that form an open SoS, and discuss simplification strategies that help to reduce cognitive complexity. Comment: In Proceedings AiSoS 2013, arXiv:1311.319

    Bounding the Impact of Unbounded Attacks in Stabilization

    Self-stabilization is a versatile approach to fault-tolerance, since it permits a distributed system to recover from any transient fault that arbitrarily corrupts the contents of all memories in the system. Byzantine tolerance is an attractive feature of distributed systems that makes it possible to cope with arbitrary malicious behaviors. Combining these two properties has proved difficult: it is impossible to contain the spatial impact of Byzantine nodes in a self-stabilizing context for global tasks such as tree orientation and tree construction. We present and illustrate a new concept of Byzantine containment in stabilization. Our property, called Strong Stabilization, makes it possible to contain the impact of Byzantine nodes if they actually perform too many Byzantine actions. We derive impossibility results for strong stabilization and present strongly stabilizing protocols for tree orientation and tree construction that are optimal with respect to the number of Byzantine nodes that can be tolerated in a self-stabilizing context.