8,139 research outputs found

Multi-State System Reliability: A New and Systematic Review

Authors: Jing Li, Yingkui Gu
Publication venue: Published by Elsevier Ltd.
Publication date: 31/12/2012

Reliability analysis that considers multiple possible states is known as multi-state (MS) reliability analysis. Multi-state system (MSS) reliability models allow both the system and its components to assume more than two levels of performance. Though multi-state reliability models provide more realistic and more precise representations of engineering systems, they are much more complex and present major difficulties in system definition and performance evaluation. MSS reliability has received a substantial amount of attention in the past four decades. This article presents a new and systematic review of multi-state system reliability. A timely review is an effective way to advance the development of MSS theory. The paper summarizes the latest studies and advances in multi-state system reliability evaluation, multi-state system optimization, and multi-state system maintenance.

Elsevier - Publisher Connector
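
To make the idea of "more than two levels of performance" in the abstract above concrete, here is a minimal sketch in Python. It is not taken from the reviewed paper. It assumes independent components and a series structure whose performance equals that of its weakest component; the component names, performance levels, and probabilities are hypothetical.

```python
from itertools import product


def series_system_distribution(components):
    """Return {performance level: probability} for a series system.

    Each component is a dict mapping a performance level to its probability.
    Assumption: components are independent and the system performs at the
    level of its weakest component (the minimum over components).
    """
    dist = {}
    for combo in product(*(c.items() for c in components)):
        level = min(perf for perf, _ in combo)
        prob = 1.0
        for _, p in combo:
            prob *= p
        dist[level] = dist.get(level, 0.0) + prob
    return dist


def reliability(dist, demand):
    """Probability that system performance meets or exceeds the demand."""
    return sum(p for level, p in dist.items() if level >= demand)


# Hypothetical three-state components: 0 = failed, 50 = degraded, 100 = perfect.
pump = {0: 0.1, 50: 0.3, 100: 0.6}
valve = {0: 0.05, 50: 0.15, 100: 0.8}
dist = series_system_distribution([pump, valve])
```

Calling `reliability(dist, 50)` then gives the probability that the system delivers at least the degraded performance level. Enumerating every state combination grows exponentially with the number of components, which is one face of the complexity the abstract mentions.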

A Condition-Based Maintenance Model for Assets with Accelerated Deterioration Due to Fault Propagation

Authors: Liang Z, Parlikad AK
Publication venue: IEEE Transactions on Reliability
Publication date: 01/01/2015

Complex industrial assets such as power transformers are subject to accelerated deterioration when one of their constituent components malfunctions and affects the condition of other components, a phenomenon called fault propagation. In this paper, we present a novel approach for optimizing condition-based maintenance policies for such assets by modelling their deterioration as a multiple dependent deterioration path process. The aim of the policy is to replace the malfunctioning component and mitigate accelerated deterioration at minimal impact to the business. The maintenance model provides guidance on determining inspection and maintenance strategies to optimize asset availability and operational cost. This is the author accepted manuscript. The final version is available from IEEE via http://dx.doi.org/10.1109/TR.2015.243913

Crossref

Apollo (Cambridge)

Selective maintenance for multistate series systems with S-dependent components

Authors: Dao Cuong D., Zuo M.J.
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 01/06/2016

In this paper, we consider the selective maintenance problem for multistate series systems with stochastically dependent components. In multistate systems, the health state of a component may vary from perfect functioning to complete failure. The stochastic dependence (S-dependence) between components is discussed and categorized into two types in the multistate context. First, the failure of a component can immediately cause complete failures of some other components in the system. Second, as components deteriorate, the reduced working performance rate of a multistate component affects the state as well as the degradation rate of its subsequent components in the series structure. The system reliability is evaluated using an approach based on stochastic processes. A cost-based selective maintenance model is developed for the multistate system with S-dependent components to maximize the total system profit, which includes the production gain and loss in the next mission as well as possible maintenance costs for the system. Analyses of systems with independent and dependent components are provided. It is observed that ignoring S-dependence in the system may lead to different maintenance decisions and an optimistic estimate of system performance.

Crossref

Bradford Scholars

University of Twente Research Information

Cross-layer Soft Error Analysis and Mitigation at Nanoscale Technologies

Author: Ebrahimi Mojtaba
Publication venue: KIT-Bibliothek, Karlsruhe
Publication date: 01/01/2016

This thesis addresses the challenge of soft error modeling and mitigation in nanoscale technology nodes and pushes the state of the art forward by proposing novel modeling, analysis, and mitigation techniques. The proposed soft error sensitivity analysis platform accurately models both error generation and propagation, starting from technology-dependent device-level simulations all the way to workload-dependent application-level analysis.

KITopen

A Flexible and Efficient Protocol for Multi-Scope Service Registry Replication

Authors: Schulzrinne Henning G., Zhao Weibin
Publication venue: Columbia University Libraries/Information Services
Publication date: 01/01/2002

Service registries play an important role in service discovery systems by accepting service registrations and answering service queries; they can serve a wide range of purposes, such as membership services, lookup services, and search services. To provide fault tolerance and enhance scalability, availability, and performance, service registries often need to be replicated. In this paper, we present Swift (Selective anti-entropy WIth FasT update propagation), a flexible and efficient protocol for multi-scope service registry replication. As consistency is less of a concern than availability in service registry replication, we choose to build Swift on top of anti-entropy to support high-availability replication. Swift makes two contributions. First, it defines a more general and flexible form of anti-entropy called selective anti-entropy, which extends the applicability of anti-entropy from full replication to partial replication by selectively reconciling inconsistent states between two replicas, and improves anti-entropy efficiency by finely controlling update propagation within each subset. To our knowledge, selective anti-entropy is the first use of anti-entropy to support generic partial replication. Second, Swift integrates service registry overlay networks with selective anti-entropy. Different topologies, such as full mesh and spanning tree, can be used for constructing service registry overlay networks. These overlay networks are used to propagate new updates quickly so as to minimize inconsistency among replicas. We have implemented Swift for replicating multi-scope Directory Agents in the Service Location Protocol. Our experience shows that Swift is flexible, efficient, and lightweight.

Columbia University Academic Commons
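
The abstract above describes selective anti-entropy as reconciling inconsistent state between two replicas only within a subset. The Python sketch below illustrates that general idea and is not the Swift protocol itself. The record layout `(scope, version, value)`, the function name, and the "higher version wins" rule are assumptions made for illustration, and each key is assumed to keep the same scope across versions.

```python
def selective_anti_entropy(replica_a, replica_b, shared_scopes):
    """Reconcile two replicas, but only for entries in scopes they share.

    Each replica maps key -> (scope, version, value). For every key whose
    scope is in shared_scopes, both replicas end up holding the entry with
    the higher version. Entries in other scopes are left untouched, which
    is what makes the replication partial rather than full.
    """
    for key in set(replica_a) | set(replica_b):
        candidates = [e for e in (replica_a.get(key), replica_b.get(key))
                      if e is not None]
        newest = max(candidates, key=lambda entry: entry[1])
        if newest[0] not in shared_scopes:
            continue  # outside the shared subset: do not propagate
        replica_a[key] = newest
        replica_b[key] = newest


# Hypothetical registrations: two replicas that share only the "lab" scope.
a = {"printer": ("lab", 2, "10.0.0.5"), "scanner": ("office", 1, "10.0.1.9")}
b = {"printer": ("lab", 1, "10.0.0.4"), "camera": ("lab", 1, "10.0.0.7")}
selective_anti_entropy(a, b, shared_scopes={"lab"})
```

After the call, both replicas hold version 2 of `printer` and the `camera` entry, while `scanner` stays only on the first replica because its scope is not shared.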

Study of fault-tolerant software technology

Authors: Broglio C., Goldberg J., Hitt E., Levitt K., Slivinski T., Webb J., Wild C.

This paper presents an overview of the current state of the art of fault-tolerant software and an analysis of the quantitative techniques and models developed to assess its impact. It examines research efforts as well as experience gained from commercial application of these techniques. The paper also addresses the implications for computer architecture and design (hardware, operating systems, and programming languages, including Ada) of using fault-tolerant software in real-time aerospace applications. It concludes that fault-tolerant software has progressed beyond the pure research stage. The paper also finds that, although not perfectly matched, newer architectural and language capabilities provide many of the notations and functions needed to implement software fault tolerance effectively and efficiently.

NASA Technical Reports Server

The implementation and use of Ada on distributed systems with reliability requirements

Authors: Knight J. C., Reynolds P. F., Urquhart J. I. A.

The issues involved in the use of the programming language Ada on distributed systems are discussed. The effects of hardware failures, such as loss of a processor, on Ada programs are emphasized. It is shown that many Ada language elements are not well suited to this environment. Processor failure can easily lead to difficulties on the processors that remain. As an example, the calling task in a rendezvous may be suspended forever if the processor executing the serving task fails. A mechanism for detecting failure is proposed, and changes to the Ada run-time support system are suggested which avoid most of the difficulties. Ada program structures are defined which allow programs to reconfigure and continue to provide service following processor failure.

NASA Technical Reports Server

Design of a fault tolerant airborne digital computer. Volume 2: Computational requirements and technology

Authors: Clark C. B., Goldberg J., Ratner R. S., Shapiro E. B., Wahlstrom S. E., Zeidler H. M.

This final report summarizes the work on the design of a fault-tolerant digital computer for aircraft. Volume 2 is composed of two parts. Part 1 is concerned with the computational requirements associated with an advanced commercial aircraft. Part 2 reviews the technology that will be available for the implementation of the computer in the 1975-1985 period. With regard to the computation task, 26 computations have been categorized according to computational load, memory requirements, criticality, permitted down-time, and the need to save data in order to effect a roll-back. The technology part stresses the impact of large scale integration (LSI) on the realization of logic and memory. Module interconnection possibilities were also considered, with the aim of minimizing fault propagation.

NASA Technical Reports Server

Reliability for exascale computing : system modelling and error mitigation for task-parallel HPC applications

Author: Subasi Omer
Publication venue: Universitat Politècnica de Catalunya
Publication date: 01/01/2016

As high performance computing (HPC) systems continue to grow, their fault rate increases. Applications running on these systems have to deal with failure intervals on the order of hours or days. Furthermore, some studies of future Exascale systems predict intervals on the order of minutes. As a result, efficient fault tolerance solutions are needed to tolerate frequent failures. A fault tolerance solution for future HPC and Exascale systems must be low-cost, efficient, and highly scalable. It should have low overhead in fault-free execution and provide fast restart, because long-running applications are expected to experience many faults during execution. Meanwhile, task-based dataflow parallel programming models (PM) are becoming a popular paradigm in HPC applications at large scale. For instance, we see the adoption of task-based dataflow parallelism in OpenMP 4.0, the OmpSs PM, Argobots, and Intel Threading Building Blocks. In this thesis we propose fault-tolerance solutions for task-parallel dataflow HPC applications. Specifically, we first design and implement a checkpoint/restart and message-logging framework to recover from errors. We then develop performance models to investigate the benefits of our task-level frameworks when integrated with system-wide checkpointing. Moreover, we design and implement selective task replication mechanisms to detect and recover from silent data corruptions in task-parallel dataflow HPC applications. Finally, we introduce a runtime-based coding scheme to detect and recover from memory errors in these applications. Taken together, our schemes provide fairly high failure coverage in which both computation and memory are protected against errors. Postprint (published version)

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

UPCommons. Portal del coneixement obert de la UPC

Tesis Doctorals en Xarxa
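
The thesis abstract above mentions selective task replication as a way to detect and recover from silent data corruption. The Python sketch below shows one common form of that idea and is not the thesis's implementation. The function name, the `critical` flag, and the majority-vote recovery step are assumptions for illustration, and tasks are assumed to be deterministic and free of side effects.

```python
def run_with_replication(task, args, critical):
    """Run a task, replicating it only when it is marked critical.

    Non-critical tasks run once, so they pay no overhead. Critical tasks
    run twice and the results are compared; a mismatch signals a silent
    corruption, and a third run breaks the tie by majority vote.
    """
    first = task(*args)
    if not critical:
        return first
    second = task(*args)
    if first == second:
        return first
    third = task(*args)
    if third == first or third == second:
        return third
    raise RuntimeError("no two replicas agree; result cannot be trusted")
```

The "selective" part lies in choosing which tasks to mark as critical, which trades protection against the cost of running a task two or three times.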