Towards the Automated Verification of Weibull Distributions for System Failure Rates
Weibull distributions can be used to accurately model failure
behaviours of a wide range of critical systems such as on-orbit satellite
subsystems. Markov chains have been used extensively to model reliability
and performance of engineering systems or applications. However,
the exponentially distributed sojourn time of Continuous-Time Markov
Chains (CTMCs) can sometimes be unrealistic for satellite systems that
exhibit Weibull failures. In this paper, we develop novel semi-Markov
models that characterise failure behaviours, based on Weibull failure
modes inferred from realistic data sources. We approximate and encode
these new models with CTMCs and use the PRISM probabilistic model
checker. The key benefit of this integration is that CTMC-based model
checking tools allow us to automatically and efficiently verify reliability
properties relevant to industrial critical systems.
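As a rough illustration of the approximation step the abstract describes, the sketch below compares a Weibull reliability curve with a k-phase Erlang chain matched to the same mean, the simplest phase-type structure a CTMC-based tool such as PRISM can represent. The shape and scale parameters are illustrative placeholders, not values from the paper, and the paper's actual encoding may differ.

```python
import math

# Weibull reliability R(t) = exp(-(t/eta)**beta); beta > 1 models wear-out.
# Hypothetical illustrative parameters, not taken from the paper.
beta, eta = 1.8, 10.0   # shape, scale (years)

def weibull_reliability(t):
    return math.exp(-((t / eta) ** beta))

# A CTMC cannot represent Weibull sojourn times directly (its sojourns are
# exponential), but k exponential stages in series -- an Erlang(k) chain with
# per-stage rate k/mean -- is a simple phase-type approximation that matches
# the Weibull mean.
def erlang_reliability(t, k):
    mean = eta * math.gamma(1.0 + 1.0 / beta)   # Weibull mean
    rate = k / mean                              # per-stage rate
    # P(Erlang(k, rate) > t) = sum_{n<k} (rate*t)^n e^{-rate*t} / n!
    return math.exp(-rate * t) * sum(
        (rate * t) ** n / math.factorial(n) for n in range(k))

for t in (2.0, 5.0, 10.0):
    print(f"t={t:4.1f}  Weibull={weibull_reliability(t):.4f}  "
          f"Erlang(k=5)={erlang_reliability(t, 5):.4f}")
```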
Towards the Formal Reliability Analysis of Oil and Gas Pipelines
It is customary to assess the reliability of underground oil and gas
pipelines in the presence of excessive loading and corrosion effects to ensure
leak-free transport of hazardous materials. The main idea behind this
reliability analysis is to model the given pipeline system as a Reliability
Block Diagram (RBD) of segments such that the reliability of an individual
pipeline segment can be represented by a random variable. Traditionally,
computer simulation is used to perform this reliability analysis but it
provides approximate results and requires an enormous amount of CPU time for
attaining reasonable estimates. Due to its approximate nature, simulation is
not very suitable for analyzing safety-critical systems like oil and gas
pipelines, where even minor analysis flaws may result in catastrophic
consequences. As an accurate alternative, we propose to use a
higher-order-logic theorem prover (HOL) for the reliability analysis of
pipelines. As a first step towards this idea, this paper provides a
higher-order-logic formalization of reliability and the series RBD using the
HOL theorem prover. For illustration, we present the formal analysis of a
simple pipeline that can be modeled as a series RBD of segments with
exponentially distributed failure times.
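A worked sketch of the series-RBD computation the abstract describes: the pipeline works only if every segment works, so the system reliability is the product of the segment reliabilities. The per-segment failure rates below are hypothetical placeholders, not data from the paper.

```python
import math

# Series RBD: R_sys(t) = prod_i R_i(t). With exponentially distributed
# failure times, R_i(t) = exp(-lambda_i * t), so the product collapses to
# R_sys(t) = exp(-sum(lambda_i) * t).
# Illustrative per-segment failure rates (per year); not from the paper.
failure_rates = [0.002, 0.0035, 0.001, 0.0042]

def series_reliability(t, rates):
    return math.exp(-sum(rates) * t)

for t in (1, 5, 20):
    print(f"t={t:2d} years  R_sys={series_reliability(t, failure_rates):.4f}")
```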
Towards automatic Markov reliability modeling of computer architectures
The analysis and evaluation of reliability measures using time-varying Markov models is required for Processor-Memory-Switch (PMS) structures that have competing processes such as standby redundancy and repair, or renewal processes such as transient or intermittent faults. The task of generating these models is tedious and prone to human error due to the large number of states and transitions involved in any reasonable system. Therefore model formulation is a major analysis bottleneck, and model verification is a major validation problem. The general unfamiliarity of computer architects with Markov modeling techniques further increases the necessity of automating the model formulation. This paper presents an overview of the Automated Reliability Modeling (ARM) program, under development at NASA Langley Research Center. ARM will accept as input a description of the PMS interconnection graph, the behavior of the PMS components, the fault-tolerant strategies, and the operational requirements. The output of ARM will be the reliability or availability Markov model, formulated for direct use by evaluation programs. The advantages of such an approach are (a) utility to a large class of users, not necessarily expert in reliability analysis, and (b) a lower probability of human error in the computation.
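To make the modeling task concrete, here is a minimal hand-built example of the kind of Markov reliability model ARM is meant to generate automatically: a two-unit standby pair with repair, solved numerically for transient reliability. The states, rates, and structure are illustrative assumptions, not drawn from ARM itself.

```python
import numpy as np
from scipy.linalg import expm

# States: 0 = both units up, 1 = one up (standby engaged, repair underway),
# 2 = system failed. State 2 is absorbing, so p(t) solves for reliability.
# Failure and repair rates (per hour) are illustrative placeholders.
lam, mu = 1e-3, 1e-1

Q = np.array([
    [-lam,         lam,  0.0],   # both up -> one up (active unit fails)
    [  mu, -(mu + lam),  lam],   # one up  -> repaired, or second failure
    [ 0.0,         0.0,  0.0],   # failed (absorbing)
])

p0 = np.array([1.0, 0.0, 0.0])   # start with both units up
for t in (100.0, 1000.0, 10000.0):
    p_t = p0 @ expm(Q * t)       # transient state probabilities p(t) = p0 e^{Qt}
    print(f"t={t:7.0f} h  reliability={1.0 - p_t[2]:.6f}")
```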
A general graphical user interface for automatic reliability modeling
Reported here is a general Graphical User Interface (GUI) for automatic reliability modeling of Processor-Memory-Switch (PMS) structures using a Markov model. This GUI is based on a hierarchy of windows. One window has graphical editing capabilities for specifying the system's communication structure, hierarchy, reconfiguration capabilities, and requirements. Other windows have text fields, popup menus, and buttons for specifying parameters and selecting actions. An example application of the GUI is given.
A study of the efficacy of a reliability management system - with suggestions for improved data collection and decision making.
Master's thesis in Risk Management.

Product reliability is very important, especially from the perspective of new product development. Making highly reliable drilling and well equipment is an expensive and time-consuming process, but ignoring product reliability could prove even more costly. Manufacturers therefore need to decide on a reliability performance target that strikes a proper balance between time, cost, and reliability to ensure the desired results. A reliability management system is a tool that manufacturers can use to manage this process and produce reliable equipment. However, if this system is not well structured and lacks important features, it can affect the outcomes of reliability analysis and decision making. A lot of research has been done on creating good reliability and maintenance databases to improve system reliability in the petroleum industry; the Offshore & Onshore Reliability Data (OREDA) project and ISO 14224 are products of such research. The main objective of this research is to analyze the existing reliability management system (RMS) in Petroleum Technology Company (PTC) in terms of its structure, features, functionality, and the quality of the data recorded in RMS, and how these affect decision making. The research was motivated by the following issues: 1) the RMS of PTC is not automated in terms of extracting data from other sources within the company; 2) PTC lacks a dedicated platform for failure reporting of its equipment; 3) the activities related to data collection and management are not well organized and hence demand extra effort. To analyze these issues, a literature study was performed to review the existing standards in the industry. ISO 14224 and OREDA define a highly structured database that gives easy access to reliability and maintenance data. The OREDA database has a well-defined taxonomy, boundaries, and database structure, and it has a well-organized procedure in place for collecting and storing reliability data. Quality assessment of the collected data is done through predefined procedural guidelines. OREDA also has a very consistent list of codes for storing information in coded form in the reliability
and maintenance database. By reviewing the existing industry standards, a few shortcomings have been identified in both the RMS and PTC's failure reporting procedures. It is observed that data from the sources is collected by the responsible person, but the collection method is usually neither tested nor planned. Data collection sources, methods, and procedures within or outside the company lack well-defined criteria and data quality assurance processes. Currently, the company is using Field Service Reports (FSR) and the company's other databases as data sources for RMS. The company cannot access clients' systems, which contain equipment utilization and process-related information. This can lead to missing information or ambiguous data, because the person responsible for data entry sometimes needs to make assumptions to complete missing operational and environmental data.
The RMS database structure lacks a well-defined taxonomy, design parameters, and adequate failure mode classification. Failure modes are an important aspect of a high-quality database, since they can help identify the need for changes to maintenance periodicities or for additional checks. Companies participating in the Offshore & Onshore Reliability Data (OREDA) project, e.g. Statoil, can calculate failure rates for selected data populations within well-defined boundaries of manufacturer, design, and operational parameters. These features are missing in the RMS database.
It is recommended that PTC consider developing a failure reporting database to handle their failure event data in an organized way. For this purpose, the failure reporting, analysis, and corrective action system (FRACAS) technique is suggested. Data from the FRACAS database can be used effectively to verify failure modes and failure causes in failure mode, effects, and criticality analysis (FMECA). The failure review board in the FRACAS process includes personnel from mixed disciplines (design, manufacturing, systems, quality, and reliability engineering) as well as leadership (technical or managerial leads), to make sure that a well-rounded discussion takes place for particular failure-related issues. The Failure Review Board (FRB) analyzes the failures in terms of time, cost, and required corrective actions. Finally, management makes decisions on the basis of the identified corrective actions.
Data quality has a high impact on the outcomes of reliability analysis performed through a reliability management system. To achieve good data quality, data collection procedures and process management should be well organized, and it is crucial to perform data quality assessment on the collected data. A data mining technique is discussed as part of the suggestions to improve data quality in the RMS database: once data is stored in the RMS database, a data mining method known as data quality mining can help assess the quality of the data. This is done by applying a data mining (DM) tool to look for interesting patterns in the data for the purpose of quality assessment, as sketched below. Various data mining models are available on the market, but PTC needs to select the DM model that best suits its business objectives.
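As a hedged illustration of what a data-quality-mining pass over an RMS-style failure table might look like, the sketch below applies a few simple pattern checks (missing identifiers, inconsistent failure-mode codes, implausible values) and flags rows for review. All column names, example records, and checks are hypothetical, not taken from PTC's RMS.

```python
import pandas as pd

# Hypothetical RMS-style failure records for illustration only.
records = pd.DataFrame({
    "equipment_id":     ["PV-01", "PV-02", None,   "PV-04"],
    "failure_mode":     ["leak",  "LEAK",  "leak", ""],
    "hours_in_service": [1200,    -50,     8700,   4300],  # negative is suspect
})

# Simple quality patterns: missing keys, blank or inconsistently coded
# failure modes, and physically implausible values.
issues = pd.DataFrame({
    "missing_id":         records["equipment_id"].isna(),
    "blank_failure_mode": records["failure_mode"].str.strip() == "",
    "inconsistent_code":  records["failure_mode"].str.lower() != records["failure_mode"],
    "negative_hours":     records["hours_in_service"] < 0,
})

# Flag rows for the person responsible for data entry to review.
flagged = records[issues.any(axis=1)]
print(flagged)
print(f"data-quality score: {1 - len(flagged) / len(records):.0%} clean rows")
```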
The RMS database is hard-wired, so it is difficult to change its features and database structure. However, if PTC emphasizes improving failure reporting procedures and data quality in the data sources located within the company, this will directly and positively affect the data quality in RMS and the results of data analysis in RMS. This, in turn, can improve the decision-making process regarding new product development and the redesign of existing products.
Probabilistic verification of satellite systems for mission critical applications
In this thesis, we present a quantitative approach using probabilistic verification techniques for the analysis of reliability, availability, maintainability, and safety (RAMS) properties of satellite systems. The subject of our research is satellites used in mission-critical industrial applications. Our verification results make a strong case for using probabilistic model checking to support RAMS analysis of satellite systems. This study is intended to build a foundation to help reliability engineers with a basic background in model checking to apply probabilistic model checking to small satellite systems.
We make two major contributions. One of these is the approach of RAMS analysis to satellite systems. In the past, RAMS analysis has been extensively applied to the field of electrical and electronics engineering. It allows system designers and reliability engineers to predict the likelihood of failures from the indication of historical or current operational data. There is a high potential for the application of RAMS analysis in the field of space science and engineering. However, there is a lack of standardisation and suitable procedures for the correct study of RAMS characteristics for satellite systems. This thesis considers the promising application of RAMS analysis to the case of satellite design, use, and maintenance, focusing on its system segments. Data collection and verification procedures are discussed, and a number of considerations are also presented on how to predict the probability of failure.
Our second contribution is leveraging the power of probabilistic model checking to analyse satellite systems. We present techniques for analysing satellite systems that differ from the more common quantitative approaches based on traditional simulation and testing. These techniques have not been applied in this context before. We present the use of probabilistic techniques via a suite of detailed examples, together with their analysis. Our presentation proceeds incrementally, in terms of the complexity of the application domains and system models, with a detailed PRISM model of each scenario. We also provide results from practical work together with a discussion about future improvements.
Formal Availability Analysis using Theorem Proving
Availability analysis is used to assess the possible failures and their
restoration process for a given system. This analysis involves calculating
the instantaneous and steady-state availabilities of the individual system
components and using this information, along with commonly used
availability modeling techniques such as Availability Block Diagrams (ABD) and
Fault Trees (FTs), to determine the system-level availability. Traditionally,
availability analyses are conducted using paper-and-pencil methods and
simulation tools, but these cannot ascertain absolute correctness due to their
inherent inaccuracies. As a complementary approach, we propose to use the
higher-order-logic theorem prover HOL4 to conduct the availability analysis of
safety-critical systems. For this purpose, we present a higher-order-logic
formalization of instantaneous and steady-state availability, ABD
configurations and generic unavailability FT gates. For illustration purposes,
these formalizations are utilized to conduct formal availability analysis of a
satellite solar array, which is used as the main source of power for the Dong
Fang Hong-3 (DFH-3) satellite.
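For a single repairable component with constant failure rate λ and repair rate μ, the instantaneous and steady-state availabilities the abstract mentions have well-known closed forms, which the sketch below simply evaluates numerically (the HOL4 development proves such results formally rather than computing them). The rates are illustrative, not the DFH-3 solar array figures.

```python
import math

# Two-state repairable component starting in the working state:
#   A(t)   = mu/(lam+mu) + lam/(lam+mu) * exp(-(lam+mu)*t)
#   A(inf) = mu/(lam+mu) = MTBF / (MTBF + MTTR)
# Illustrative rates (per hour); not taken from the paper.
lam, mu = 2e-4, 5e-2

def availability(t):
    s = lam + mu
    return mu / s + (lam / s) * math.exp(-s * t)

for t in (10.0, 100.0, 1000.0):
    print(f"A({t:6.1f} h) = {availability(t):.6f}")
print(f"steady-state A = {mu / (lam + mu):.6f}")
```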
Improving System Reliability for Cyber-Physical Systems
Cyber-physical systems (CPS) are systems featuring a tight combination of, and coordination between, the system's computational and physical elements. Cyber-physical systems range from critical infrastructure such as power grids and transportation systems to health and biomedical devices. System reliability, i.e., the ability of a system to perform its intended function under a given set of environmental and operational conditions for a given period of time, is a fundamental requirement of cyber-physical systems. An unreliable system often leads to disruption of service, financial cost, and even loss of human life. An important and prevalent type of cyber-physical system meets the following criteria: it processes large amounts of data; employs software as a system component; runs online continuously; and has an operator in the loop because of human judgment and accountability requirements for safety-critical systems. This thesis aims to improve system reliability for this type of cyber-physical system.

To this end, I present a system evaluation approach entitled automated online evaluation (AOE), a data-centric runtime monitoring and reliability evaluation approach that works in parallel with the cyber-physical system to conduct automated evaluation along the workflow of the system continuously, using computational intelligence and self-tuning techniques, and to provide operator-in-the-loop feedback on reliability improvement. For example, abnormal input and output data at or between the multiple stages of the system can be detected and flagged through data quality analysis; as a result, alerts can be sent to the operator-in-the-loop, who can then take actions and make changes to the system in order to achieve minimal system downtime and increased system reliability. One technique used by the approach is data quality analysis using computational intelligence, which evaluates data quality in an automated and efficient way to make sure the running system performs reliably as expected. Another technique is self-tuning, which automatically self-manages and self-configures the evaluation system so that it adapts to changes in the system and feedback from the operator. To implement the proposed approach, I further present a system architecture called autonomic reliability improvement system (ARIS).

This thesis investigates three hypotheses. First, I claim that automated online evaluation empowered by data quality analysis using computational intelligence can effectively improve system reliability for cyber-physical systems in the domain of interest indicated above. In order to prove this hypothesis, a prototype system needs to be developed and deployed in various cyber-physical systems, while certain reliability metrics are required to measure the system reliability improvement quantitatively. Second, I claim that self-tuning can effectively self-manage and self-configure the evaluation system, based on changes in the system and feedback from the operator-in-the-loop, to improve system reliability. Third, I claim that the approach is efficient: it should not have a large impact on overall system performance and should introduce only minimal extra overhead to the cyber-physical system. Performance metrics should be used to measure the efficiency and added overhead quantitatively.
Additionally, in order to conduct efficient and cost-effective automated online evaluation for data-intensive CPS, which require large volumes of data and devote much of their processing time to I/O and data manipulation, this thesis presents COBRA, a cloud-based reliability assurance framework. COBRA provides automated multi-stage runtime reliability evaluation along the CPS workflow using data relocation services, a cloud data store, data quality analysis, and process scheduling with self-tuning to achieve scalability, elasticity, and efficiency. Finally, in order to provide a generic way to compare and benchmark system reliability for CPS and to extend the approach described above, this thesis presents FARE, a reliability benchmark framework that employs a CPS reliability model and a set of methods and metrics for evaluation environment selection, failure analysis, and reliability estimation. The main contributions of this thesis include validation of the above hypotheses and empirical studies of the ARIS automated online evaluation system, the COBRA cloud-based reliability assurance framework for data-intensive CPS, and the FARE framework for benchmarking the reliability of cyber-physical systems. This work has advanced the state of the art in CPS reliability research, expanded the body of knowledge in this field, and provided useful studies for further research.
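A toy sketch of the kind of data-centric runtime check AOE performs between workflow stages: flag a stage output that deviates sharply from its recent history and raise an alert for the operator-in-the-loop. The window size, z-score threshold, and function names are illustrative choices, not taken from the thesis.

```python
import statistics

# Illustrative parameters; the thesis does not prescribe these values.
WINDOW, Z_THRESHOLD = 50, 4.0

def check_stage_output(history, value):
    """Return an alert string if `value` is anomalous vs. recent history."""
    window = history[-WINDOW:]
    if len(window) < 10:               # not enough context to judge yet
        return None
    mean = statistics.fmean(window)
    stdev = statistics.stdev(window) or 1e-12   # guard against zero spread
    z = abs(value - mean) / stdev
    if z > Z_THRESHOLD:
        return f"ALERT operator-in-the-loop: output {value:.3f}, z-score {z:.1f}"
    return None

# Smoothly trending history; one typical and one abnormal new output.
history = [1.0 + 0.01 * i for i in range(60)]
print(check_stage_output(history, 1.62))   # in line with the trend -> None
print(check_stage_output(history, 9.90))   # abnormal stage output  -> alert
```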
Addressing Complexity and Intelligence in Systems Dependability Evaluation
Engineering and computing systems are increasingly complex, intelligent, and open adaptive. When it comes to the dependability evaluation of such systems, there are certain challenges posed by the characteristics of “complexity” and “intelligence”. The first aspect of complexity is the dependability modelling of large systems with many interconnected components and dynamic behaviours such as priority, sequencing, and repairs. To address this, the thesis proposes a novel hierarchical solution to dynamic fault tree analysis using semi-Markov processes. A second aspect of complexity is the environmental conditions that may impact dependability, and their modelling. For instance, weather and logistics can influence maintenance actions and hence the dependability of an offshore wind farm. The thesis proposes a semi-Markov-based maintenance model called the “Butterfly Maintenance Model (BMM)” to model this complexity and accommodate it in dependability evaluation. A third aspect of complexity is the open nature of systems of systems, such as swarms of drones, which makes complete design-time dependability analysis infeasible. To address this aspect, the thesis proposes a dynamic dependability evaluation method using fault trees and Markov models at runtime. The challenge of “intelligence” arises because Machine Learning (ML) components do not exhibit programmed behaviour; their behaviour is learned from data. However, in traditional dependability analysis, systems are assumed to be programmed or designed. When a system has learned from data, a distributional shift of operational data away from the training data may cause the ML to behave incorrectly, e.g., to misclassify objects. To address this, a new approach called SafeML is developed that uses statistical distance measures to monitor the performance of ML against such distributional shifts. The thesis develops the proposed models and evaluates them on case studies, highlighting improvements to the state of the art, limitations, and future work.
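As an illustration of the SafeML idea, the sketch below monitors one input feature by comparing its runtime distribution against the training distribution using an ECDF-based statistical distance, here the two-sample Kolmogorov-Smirnov test, one of the family of measures SafeML draws on. The data, feature, and significance threshold are synthetic assumptions, not the thesis's case-study setup.

```python
import numpy as np
from scipy.stats import ks_2samp

# Synthetic "training" distribution for one ML input feature.
rng = np.random.default_rng(0)
train_feature = rng.normal(loc=0.0, scale=1.0, size=5000)

def check_shift(operational_feature, alpha=0.01):
    """Flag a distributional shift between training and runtime data."""
    stat, p_value = ks_2samp(train_feature, operational_feature)
    return stat, p_value, p_value < alpha   # shifted if the ECDFs differ

# An in-distribution runtime batch vs. a shifted one.
for batch in (rng.normal(0.0, 1.0, 500), rng.normal(1.5, 1.0, 500)):
    stat, p, shifted = check_shift(batch)
    print(f"KS distance={stat:.3f}  p={p:.3g}  distrust ML output: {shifted}")
```

When a shift is flagged, a SafeML-style monitor would lower confidence in the ML component's outputs or hand control to a fallback, rather than trusting predictions made far from the training data.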