Abstract
Introduction
Technological improvement and innovation have led to a continuous increase in the complexity and functionality of systems. Failures of these systems can disrupt their operation and may lead to huge losses. Fault diagnosis has therefore become a primary objective in engineering applications. Effective diagnostic approaches, which restore the system at the lowest cost, can decrease downtime and consequently enhance operational functionality. To address this issue, researchers have developed many effective theories and methodologies. As far as the reasoning model is concerned, there are the dependency model [1], fault tree [2], Petri net [3], directed graph [4], neural networks [5], Bayesian network (BN) [6] and so on. The fault tree, widely used in reliability analysis and fault diagnosis, is a graphical method that models how failures propagate through the system. Assaf was the first to put forward a diagnostic importance factor (DIF) to determine the diagnosis sequence using static fault tree analysis [7]. The ratio of efficiency to time for fault diagnosis was introduced to take into consideration the mean time to detection of each unit and find the best program to remove the faults [8]. However, these methods determined the diagnostic sequence by the components' DIF or the minimal cut sets' DIF alone, and usually caused the minimal cut sets with a smaller DIF to be diagnosed first [9]. Furthermore, these diagnosis methods were based on static fault tree analysis and could not model dynamic fault behaviors. To address this, Assaf used dynamic fault tree analysis to locate the fault and, on this basis, put forward a method to incorporate evidence data from sensors into the diagnostic process in order to reduce the number of minimal cut sets (MCS) [10] [11]. Nonetheless, the solution for the dynamic fault tree was based on a Markov model, which suffers from the infamous state space explosion problem.
Moreover, it did not incorporate sensor data to update the components' posterior failure probability, which affects the diagnostic efficiency. In the work of [12], reliability results were calculated by mapping a dynamic fault tree to an equivalent discrete-time BN (DTBN), and an efficient diagnostic decision algorithm, based on the DIF of both components and cut sequences, could overcome the above disadvantages. Unfortunately, DTBN is an approximate solution for the dynamic fault tree and requires huge memory resources to obtain the probabilities of the query variables accurately. The execution time of the DTBN method increases as the accuracy increases. An innovative algorithm has been introduced to reduce the dimension of conditional probability tables by an order of magnitude. However, this method cannot perform probability updating [13]. The online implementation of these diagnosis techniques is attracting increasing attention due to the demand for higher performance, efficiency, reliability and safety of system equipment. An online fault diagnostic scheme for nonlinear systems based on neurofuzzy networks was proposed [14]. This scheme needed complete historical data about the process operation under various normal and faulty conditions, which is very difficult to obtain. C. Nan et al. proposed a knowledge-based fault diagnosis approach [15], which used valuable knowledge from experts and operators, as well as real-time data from many sensors. Fuzzy logic was also used to make inferences based on the real-time data and the knowledge. However, this methodology is data-driven and its performance depends on the quality of the expert knowledge and the frequency of data processing. A probabilistic model-based real-time diagnosis method was proposed to locate faults in an electrical power system [16] [17], which used a high-level modeling language to construct the BN and compiled the BN into an arithmetic circuit for real-time reasoning.
The novel method could support real-time diagnosis and was an order of magnitude or more faster than the join tree propagation algorithm in comparative experiments. Nevertheless, its diagnostic efficiency depended greatly on its BN model. A fault tree circuit derived from a reliability block diagram was introduced to aid the analyst in efficient diagnosis, sensitivity analysis, and decision support for many typical reliability problems [18]. However, this approach constructed the system fault model using a static reliability block diagram and could not capture dynamic fault behaviors.
Motivated by the problems mentioned above, this paper presents a novel fault diagnosis framework based on the dynamic fault tree and the arithmetic circuit, shown in Figure 1. It pays special attention to two challenges: model development and real-time reasoning. To address the challenge of model development, it uses a dynamic fault tree model to capture the dynamic behavior of system failure mechanisms and calculates quantitative parameters using an algebraic technique and BN in order to avoid the state space explosion problem. Furthermore, the BN can incorporate evidence data from sensors and update the DIF accordingly. To address the real-time reasoning challenge, a logic compilation based inference is used and divided into two phases: an offline phase, which compiles a BN into an arithmetic circuit and is run once; and an online phase, which answers many queries each time it is invoked and may be invoked multiple times. Moreover, we incorporate sensor data into the diagnosis process and propose schemes for updating the DIF and the minimal cut sets. Finally, we propose an efficient diagnostic algorithm to generate a diagnostic decision tree (DDT), which guides the maintenance crew toward more efficient decisions when repairing a system. The proposed approach takes full advantage of the dynamic fault tree for modeling, BN for inference and the arithmetic circuit for real-time diagnosis, which makes it especially suitable for online diagnosis of complex systems.
Dynamic Fault Tree Analysis

Qualitative Analysis of Dynamic Fault Tree
Qualitative analysis of a dynamic fault tree is used to generate all minimal cut sequences. There are many traditional algorithms for solving MCS; however, they are not suitable for dynamic fault trees. The zero-suppressed binary decision diagram (ZBDD) separates logic constraints from timing constraints and converts the dynamic fault tree into a static fault tree [19]. We generate the MCS of the resulting static fault tree using set operations and expand each MCS into minimal cut sequences by considering the timing constraints. Let $S_1$, $S_2$ be the inputs of MCS-AND and MCS-OR, respectively; the basic set operations are as follows:
,,
So the output of MCS-AND and MCS-OR are respectively. The MCS generation algorithm is executed recursively during the depth-first left-most traversal of a fault tree. It first generates the MCS of the inputs of a connection gate, and then performs a series of set operations to combine the MCS of the inputs into the MCS of the output of the connection gate. Finally, we obtain all the minimal cut sequences from the minimal cut sets by considering the timing constraints.
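The recursive generation described above can be sketched as follows. Because the set-operation formulas did not survive in this text, the sketch assumes the textbook construction: an OR gate takes the union of its inputs' cut sets, an AND gate takes all pairwise unions, and non-minimal sets are then discarded. The tree encoding and the names `mcs` and `minimize` are hypothetical, and the expansion into cut sequences under timing constraints is not covered.

```python
from itertools import product


def minimize(cut_sets):
    """Drop every cut set that strictly contains another cut set."""
    unique = set(cut_sets)
    return {c for c in unique if not any(other < c for other in unique)}


def mcs(node):
    """Return the minimal cut sets of a static fault tree.

    A node is either a basic-event name (str) or a tuple (gate, children)
    with gate in {"AND", "OR"}. This encoding is an assumption of the sketch.
    """
    if isinstance(node, str):
        return {frozenset([node])}
    gate, children = node
    # Depth-first, left-most traversal: solve the inputs first.
    child_sets = [mcs(child) for child in children]
    if gate == "OR":
        combined = set().union(*child_sets)
    elif gate == "AND":
        combined = {frozenset().union(*combo) for combo in product(*child_sets)}
    else:
        raise ValueError(f"unsupported gate: {gate}")
    return minimize(combined)


# Hypothetical tree: TOP = (A AND B) OR (A AND (B OR C))
tree = ("OR", [("AND", ["A", "B"]), ("AND", ["A", ("OR", ["B", "C"])])])
print(sorted(sorted(c) for c in mcs(tree)))
```

Representing cut sets as `frozenset` objects keeps the minimization step a plain subset test.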
Quantitative Analysis of Dynamic Fault Tree
Quantitative analysis of a dynamic fault tree is used to calculate reliability results, such as the components' DIF and the minimal cut sequences' DIF. The traditional solution for a dynamic fault tree is based on the Markov chain (MC) model [10] [11], which has the infamous state space explosion problem and cannot solve larger dynamic fault trees. Therefore, DTBN was proposed to solve the dynamic fault tree in [12, 20]. Dynamic logic gates are converted to a DTBN and the reliability results are calculated using a standard BN inference algorithm. However, this is an approximate solution and requires huge memory resources to obtain the probability distribution accurately. In addition, as the number of intervals increases, both the accuracy and the execution time increase greatly. An innovative algorithm has been introduced to reduce the dimension of conditional probability tables by an order of magnitude [13]. However, this method cannot perform posterior probability updating. Yuge and Yanagi proposed a novel approach for dynamic fault tree solution [21]. In their method, a modularization algorithm classifies dynamic fault tree modules into two types: one satisfies the parent Markov condition and the other does not. If the module top is a dynamic gate and a repeated event is contained in the module, or if the module top is a static gate and the module contains a repeated event that has distinct dynamic gates as its upper-level gates within the module, the module does not satisfy the parent Markov condition. A module without the parent Markov condition is replaced with an equivalent single event. The occurrence probability of this event is obtained as the sum of disjoint sequence probabilities. After the contraction of modules without the parent Markov condition, the BN algorithm is applied to the dynamic fault tree.
International Journal of Security and Its Applications Vol. 9, No.7 (2015), Copyright ⓒ 2015 SERSC
In this paper, we use BN and an algebraic technique to calculate the reliability parameters in order to overcome the disadvantages mentioned above.
Fault Probability of a Module with Sequence Dependence:
Let us consider an event sequence composed of $n$ events, $e_1, e_2, \ldots, e_n$, including several spare events. An event in the sequence is denoted by $e_j^i$, which means that the event that failed in the $j$-th order of the sequence is designated a spare of an event that failed in the $i$-th order. $e_j^0$ denotes an event that was originally in active mode.
where $a_0 = 0$.
Mapping Static Fault Tree into BN:
There is a clear correspondence between a static fault tree and a BN. The fault tree can be seen as a particular deterministic case of the BN. Conceptually, it is straightforward to map a fault tree into a BN: one only needs to "re-draw" the nodes and connect them while correctly enumerating reliabilities. Figure 2 shows the conversion of an OR gate and an AND gate into equivalent nodes in a BN. Parent nodes A and B are assigned prior probabilities, which coincide with the probability values assigned to the corresponding basic nodes in the fault tree, and child node C is assigned its conditional probability table (CPT). Since the OR and AND gates represent deterministic causal relationships, all the entries of the corresponding CPT are either 0 or 1. The detailed algorithm for converting a fault tree into a BN was proposed in [22] [23].
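As a minimal sketch of this mapping, the deterministic CPTs of the OR and AND nodes can be written out directly and the failure probability of child node C obtained by summing over the parent states. The function name `prob_child_failed` and the prior values 0.1 and 0.2 are hypothetical and chosen only for illustration.

```python
from itertools import product

# Deterministic CPTs: P(C = 1 | A = a, B = b) is either 0 or 1.
OR_CPT = {(a, b): float(a or b) for a, b in product((0, 1), repeat=2)}
AND_CPT = {(a, b): float(a and b) for a, b in product((0, 1), repeat=2)}


def prob_child_failed(cpt, p_a, p_b):
    """Marginal P(C = 1) for independent parents A and B.

    p_a and p_b are the prior failure probabilities of the basic events.
    """
    total = 0.0
    for a, b in product((0, 1), repeat=2):
        weight = (p_a if a else 1.0 - p_a) * (p_b if b else 1.0 - p_b)
        total += cpt[(a, b)] * weight
    return total


# Hypothetical priors for the two basic events.
print(prob_child_failed(OR_CPT, 0.1, 0.2))
print(prob_child_failed(AND_CPT, 0.1, 0.2))
```

With these priors the OR node reproduces 1 - (1 - 0.1)(1 - 0.2) and the AND node reproduces 0.1 × 0.2, which matches the usual fault tree gate formulas.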
Mapping Dynamic Fault Tree into BN:
The dynamic fault tree extends the traditional fault tree by defining special gates to capture the components' sequential and functional dependencies. Currently there are six types of dynamic gates defined: the functional dependency gate (FDEP), the cold, hot, and warm spare gates (CSP, HSP, WSP), the priority AND gate (PAND), and the sequence enforcing gate (SEQ). Here, we briefly discuss the FDEP and WSP gates, as they are used later in our examples.
Figure 2. The Equivalent BN of OR and AND Gate
(1) WSP Gate. The WSP gate has one primary input and one or more alternate inputs. The primary input is initially powered on and the alternate inputs are in standby mode. When the primary fails, it is replaced by an alternate input; when this alternate input fails, it is in turn replaced by the next available alternate input, and so on. In standby mode, the component failure rate is reduced by a factor $\alpha$ called the dormancy factor, which is a number between 0 and 1. A cold spare has a dormancy factor $\alpha = 0$, and a hot spare has a dormancy factor $\alpha = 1$. The WSP gate output is true when the primary and all the alternate inputs fail. Figure 3 shows the WSP gate and its equivalent BN.
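To make the role of the dormancy factor concrete, the following sketch computes the failure probability of a WSP gate with one primary and one spare. It assumes exponentially distributed lifetimes, which this text does not state, and it is a direct integration over the primary's failure time, not the sequence-probability formula of the paper. The function name and the failure rates are hypothetical.

```python
import math


def wsp_failure_probability(rate_primary, rate_spare, alpha, mission_time):
    """P(primary and its single spare have both failed by mission_time).

    Assumes exponential lifetimes. The spare fails at rate
    alpha * rate_spare while dormant and at rate_spare once active.
    """
    T = mission_time
    # Integrating over the primary's failure time t in [0, T] gives
    # P = (1 - exp(-lp*T)) - lp * exp(-ls*T) * integral of exp(-k*t) dt,
    # with k = lp + (alpha - 1) * ls.
    k = rate_primary + (alpha - 1.0) * rate_spare
    if abs(k) < 1e-12:
        integral = T
    else:
        integral = (1.0 - math.exp(-k * T)) / k
    p_primary = 1.0 - math.exp(-rate_primary * T)
    return p_primary - rate_primary * math.exp(-rate_spare * T) * integral


# Hypothetical rates; cold (alpha = 0), warm (alpha = 0.5), hot (alpha = 1).
for alpha in (0.0, 0.5, 1.0):
    print(alpha, wsp_failure_probability(0.001, 0.001, alpha, 600.0))
```

For $\alpha = 1$ the result reduces to the product of the two independent failure probabilities (a plain AND gate), and for $\alpha = 0$ with equal rates it reduces to the two-stage Erlang distribution, which gives a quick sanity check on the formula.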
( , )( )
are sequence probabilities calculated by equation (7). 
The output of node WSP is an AND gate whose CPT is shown in Figure 2.
(2) FDEP Gate. The FDEP gate has a single trigger input, a non-dependent output reflecting the status of the trigger, and one or more dependent basic events. Figure 4 shows the FDEP gate and its equivalent BN.
The CPT of output node FDEP is shown in Table 3 . 
Inference based on Arithmetic Circuit
Compilation based Inference Approach
After mapping a dynamic fault tree into a corresponding BN, we can apply inference algorithms to the model to calculate the components' DIF and update them according to real-time evidence. Some popular algorithms exploit global structure to a certain extent and run in time that is exponential in a measure known as treewidth. However, for a BN with a large treewidth, exact real-time inference with these algorithms is difficult. To deal with this problem, many approaches that seek to exploit local structure have been proposed [24] [25]. These techniques have achieved only limited success and cannot answer multiple queries simultaneously. A BN derived from a dynamic fault tree has a lot of local structure, including determinism (parameters equal to 0 or 1), equal parameters and context-specific independence. In addition, multiple queries need to be calculated simultaneously. A logic compilation based inference is therefore adopted, because logic allows many types of structure to be represented explicitly and allows us to leverage state-of-the-art algorithms for knowledge compilation. The compilation based inference method is divided into two phases: offline compilation and online inference. The offline phase compiles the network into an arithmetic circuit and is run once [26] [27]. In particular, the approach encodes the BN into conjunctive normal form (CNF), converts the CNF into a smooth deterministic decomposable negation normal form (sd-DNNF) circuit that satisfies some properties, and then extracts an arithmetic circuit from the sd-DNNF circuit. The online phase uses the resulting arithmetic circuit to answer many queries through a simple process of circuit propagation. The main advantages of the compilation based inference approach can be summarized as follows. First, the separation of the inference process into offline and online phases allows us to push much of the computational overhead into the offline phase, which can then be amortized over many online queries. Next, the simplicity of the compiled arithmetic circuit and its propagation algorithms facilitates the development of online reasoning systems. Finally, compilation provides an effective framework for exploiting the network's local structure, allowing an inference complexity that is not necessarily exponential in the network treewidth. The next section describes the arithmetic circuit representation and presents the circuit propagation process. For more details, we refer the reader to [28].
Compiling BN into Arithmetic Circuit with Local Structure
The compilation from a BN to an arithmetic circuit is based on the following connection between BN and multilinear functions (MLF). With each BN, we associate a corresponding MLF that computes the probability of evidence. To escape the exponential size of the network MLF, Darwiche proposed a CNF approach that encodes the BN so as to exploit the network topology and local structure. According to this approach, one encodes the MLF using a propositional theory in CNF, factors the CNF, and then immediately extracts the arithmetic circuit from the CNF factorization. The critical computational step in this approach is that of factoring (compiling) the CNF. The quality of the CNF encoding and the amount of local structure it captures can have a significant effect on both the offline compile time and the online inference time. Some new encoding methods have been proposed that obtain order-of-magnitude improvements in compile time and online inference for some networks with local structure, as compared to baseline jointree inference, which does not exploit local structure [26]. Figure 5 shows a simple BN and its corresponding arithmetic circuit, which exploits the network topology and local structure. Exploiting this local structure yields a smaller arithmetic circuit.
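To illustrate the connection between a BN and its MLF, the sketch below enumerates the multilinear function of a hypothetical two-node network A → B, with one evidence indicator per variable value and one parameter per CPT entry. The network, its numbers, and the function name `network_polynomial` are assumptions made for illustration. The brute-force sum has one term per network instantiation, which is exactly the exponential form that the compiled arithmetic circuit is meant to avoid.

```python
from itertools import product

# Hypothetical network A -> B (1 = failed, 0 = working).
P_A = {1: 0.3, 0: 0.7}
P_B_GIVEN_A = {(1, 1): 0.9, (0, 1): 0.1,   # keys are (b, a)
               (1, 0): 0.2, (0, 0): 0.8}


def network_polynomial(lam_a, lam_b):
    """Evaluate the MLF: sum over (a, b) of indicator * parameter products.

    lam_a and lam_b map each value to its evidence indicator: 1 if the
    value is compatible with the evidence, 0 otherwise.
    """
    return sum(
        lam_a[a] * lam_b[b] * P_A[a] * P_B_GIVEN_A[(b, a)]
        for a, b in product((0, 1), repeat=2)
    )


no_evidence = {1: 1.0, 0: 1.0}
b_failed = {1: 1.0, 0: 0.0}

print(network_polynomial(no_evidence, no_evidence))  # sums to 1 with no evidence
print(network_polynomial(no_evidence, b_failed))     # probability of evidence B = 1
```

Setting the indicators according to the evidence and evaluating the function returns the probability of that evidence, which is the property the compiled circuit preserves.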
Online Inference Using Arithmetic Circuit
Once the arithmetic circuit is generated, it can be evaluated and differentiated to answer queries. Two registers, vr(v) and dr(v), are associated with each circuit node v. To evaluate the circuit, the propagation algorithm performs a bottom-up pass that computes the value of the vr(v) register of each node after having computed the values of the vr registers of all its child nodes. It then performs a top-down pass to differentiate the circuit, computing the value of the dr(v) register of each node after having computed the values of the dr registers of all its parents. The algorithm for evaluating and differentiating an arithmetic circuit can be found in [27] [28].
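The two passes can be sketched on a small hand-built circuit. The example below is a minimal illustration and not the algorithm of [27] [28] verbatim: the `Node` class, the `propagate` function and the two-node network A → B with its parameter values are all hypothetical. The derivative rule used is the standard one for sums and products: a sum node passes its dr to each child, and a product node passes its dr multiplied by the values of the other children.

```python
from math import prod


class Node:
    """Circuit node with a value register (vr) and a derivative register (dr)."""

    def __init__(self, kind, children=(), value=0.0):
        self.kind = kind                  # "leaf", "+" or "*"
        self.children = list(children)
        self.vr = value
        self.dr = 0.0


def propagate(order):
    """Evaluate, then differentiate. `order` lists children before parents, root last."""
    for v in order:                       # bottom-up pass fills vr
        v.dr = 0.0
        if v.kind == "+":
            v.vr = sum(c.vr for c in v.children)
        elif v.kind == "*":
            v.vr = prod(c.vr for c in v.children)
    order[-1].dr = 1.0
    for p in reversed(order):             # top-down pass fills dr
        for i, c in enumerate(p.children):
            if p.kind == "+":
                c.dr += p.dr
            else:
                others = prod(s.vr for j, s in enumerate(p.children) if j != i)
                c.dr += p.dr * others
    return order[-1].vr


def build_example(evidence_b=None):
    """Circuit for a hypothetical network A -> B; evidence_b is None, 0 or 1."""
    leaf = lambda x: Node("leaf", value=x)
    lam_a, lam_na = leaf(1.0), leaf(1.0)
    lam_b = leaf(0.0 if evidence_b == 0 else 1.0)
    lam_nb = leaf(0.0 if evidence_b == 1 else 1.0)
    th_a, th_na = leaf(0.3), leaf(0.7)
    th_b_a, th_nb_a, th_b_na, th_nb_na = leaf(0.9), leaf(0.1), leaf(0.2), leaf(0.8)
    t1, t2 = Node("*", [lam_b, th_b_a]), Node("*", [lam_nb, th_nb_a])
    t3, t4 = Node("*", [lam_b, th_b_na]), Node("*", [lam_nb, th_nb_na])
    s_a, s_na = Node("+", [t1, t2]), Node("+", [t3, t4])
    m_a, m_na = Node("*", [lam_a, th_a, s_a]), Node("*", [lam_na, th_na, s_na])
    root = Node("+", [m_a, m_na])
    order = [lam_a, lam_na, lam_b, lam_nb, th_a, th_na, th_b_a, th_nb_a,
             th_b_na, th_nb_na, t1, t2, t3, t4, s_a, s_na, m_a, m_na, root]
    return order, lam_a


order, lam_a = build_example(evidence_b=1)
p_evidence = propagate(order)
print(p_evidence, lam_a.dr / p_evidence)  # P(B = 1) and posterior P(A = 1 | B = 1)
```

After one upward and one downward pass, the derivative with respect to the indicator of A = 1 holds the joint probability of A = 1 and the evidence, so posteriors for every variable are available at once.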
Implementation of the Proposed Real-Time Diagnosis Method
Sensors Diagnostic Model
When the system fails, additional evidence from diagnostic sensors is sometimes observed as well, and this may be used to optimize the system diagnosis. A monitor layer for capturing evidence can be appended to the dynamic fault tree, using static gates to represent sensors [11]. However, this approach only uses sensors to update the qualitative information and does not update the quantitative information. The BN created from the dynamic fault tree is appropriate for reliability analysis. To use the BN for fault diagnosis, we need to add nodes representing the evidence to the network. Each evidence node is linked to the component in the BN that is observed by the corresponding sensor, with the link directed from the component to the evidence node. The conditional probability table of an evidence node is built from the probabilities of producing each observation result.
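The effect of a single evidence node can be sketched with Bayes' rule. The sketch below assumes a binary component and a binary sensor reading; the function name and the detection and false-alarm probabilities are hypothetical values standing in for the CPT of an evidence node.

```python
def posterior_failure(prior, p_alarm_given_failed, p_alarm_given_ok, alarm):
    """Posterior P(component failed | sensor reading).

    The two conditional probabilities form the evidence node's CPT:
    P(alarm | failed) and P(alarm | working).
    """
    like_failed = p_alarm_given_failed if alarm else 1.0 - p_alarm_given_failed
    like_ok = p_alarm_given_ok if alarm else 1.0 - p_alarm_given_ok
    joint_failed = prior * like_failed
    joint_ok = (1.0 - prior) * like_ok
    return joint_failed / (joint_failed + joint_ok)


# Hypothetical numbers: 10% prior failure probability, a sensor that
# detects 95% of failures and raises false alarms 5% of the time.
print(posterior_failure(0.1, 0.95, 0.05, alarm=True))
print(posterior_failure(0.1, 0.95, 0.05, alarm=False))
```

A less reliable sensor (lower detection probability or higher false-alarm probability) moves the posterior less, which is why sensor reliability enters the diagnosis through the CPT of the evidence node.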
DIF allows us to discriminate between components or minimal cut sets by their importance from a diagnostic point of view. The higher the DIF, the more important the component or minimal cut set, so DIF can be used to decide between candidate monitor locations. The components that maximize the evidence information function will be monitored by sensors. Thus the designer can simply select the components with higher DIF as the sensor locations [29]. This sensor placement optimization considers the quantitative and qualitative data obtained from reliability analysis and can guarantee a lower expected diagnostic cost.
Sensors might not be completely reliable. A sensor that provides false information can misguide the diagnosis process, so a sensor failure can make the diagnosis meaningless. We must therefore consider the effect of sensor reliability. Sensor reliability influences not only the DIF but also the minimal cut sets. As to the effect on the DIF, we simply change the conditional probability tables of the evidence nodes when mapping the fault tree into a BN, and update the DIF according to the evidence data. As far as the effect on the minimal cut sets is concerned, we augment the system function by adding sensors as cut sets, since a failed sensor can lead to a faulty diagnosis process.
Updating Reliability Results Using Sensors Data
After the sensor locations are determined, we can use sensor data to optimize the diagnostic process. On the one hand, we can use sensor data to narrow down the number of minimal cut sets to be diagnosed. The cut sets under evidence (CUE) are the set of all essential minimal cut sets obtained after evidence eliminates some cut sets. For example, a system has 4 minimal cut sets: {A, B}, {A, D}, {C, D} and {D, E}. Ignoring evidence, these minimal cut sets are captured in the system's characteristic function:
$F = AB + AD + CD + DE$ (9)
If sensors detect the failure of B and D, the updated CUE function is generated:
$F_{CUE} = A + C + E$ (10)
On the other hand, we can update the DIF of the components and the CUE. The components' DIF can be updated by solving the BN according to the sensor data, while the DIF of the CUE can be calculated using:
where S and E represent the system and the variables with given evidence respectively.
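The reduction of the minimal cut sets under evidence in the example above can be sketched as follows. The function name `cut_sets_under_evidence` is hypothetical; the sketch removes the components known to have failed from each cut set and then discards non-minimal results.

```python
def cut_sets_under_evidence(cut_sets, failed):
    """Reduce minimal cut sets given components observed as failed.

    Each cut set loses the components already known to have failed;
    results that strictly contain another reduced cut set are dropped.
    """
    reduced = {frozenset(cs) - frozenset(failed) for cs in cut_sets}
    return {c for c in reduced if not any(other < c for other in reduced)}


mcs = [{"A", "B"}, {"A", "D"}, {"C", "D"}, {"D", "E"}]
cue = cut_sets_under_evidence(mcs, failed={"B", "D"})
print(sorted(sorted(c) for c in cue))  # the cut sets {A}, {C} and {E}
```

With B and D observed as failed, {A, B} and {A, D} both collapse to {A}, so only three single-component cut sets remain to be diagnosed.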
Diagnosis Algorithm and its Evaluation
As cut sets represent minimal sets of component failures that can cause a system failure, we should diagnose them one by one to find the root cause of the system failure. Only when we have finished diagnosing one minimal cut set can we move on to the next. The order in which cut sets are checked depends on their DIF ordering, while the order of components within the same cut set is determined by the components' DIF. The cut sets with larger DIF are checked first and, likewise, components with larger DIF in a cut set are checked first [29]. This ensures a reduced number of system checks while fixing the system. Average diagnostic cost is often used to evaluate a fault diagnosis method: the lower the diagnostic cost, the better the method. Since the output of the fault diagnosis method is the DDT, we can evaluate it with the help of several decision tree evaluation measures. We adopt the expected diagnostic cost (EDC), which incorporates the qualitative (structure) and quantitative (reliability analysis) information into one measure for predicting diagnosis cost. The EDC can be calculated by the following expression:
where $Q_s$ is the unreliability of the system, $cp_i$ is the sum of all test costs from the top node to the cut set's leaf node, and $q_{cutset_i}$ is the unreliability of the cut sequences.
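The ordering rule and the cost measure can be sketched together. Since the EDC expression itself did not survive in this text, the sketch assumes the form EDC = Σ_i cp_i · q_cutset_i / Q_s, built only from the three quantities defined above; this form is an assumption and may differ from the paper's equation. The function names and all numbers are hypothetical.

```python
def diagnosis_order(cut_set_dif, component_dif):
    """Order cut sets by descending DIF, and components within each likewise.

    cut_set_dif maps frozenset(components) -> DIF of that cut set;
    component_dif maps component name -> DIF of that component.
    """
    ranked = sorted(cut_set_dif, key=cut_set_dif.get, reverse=True)
    return [sorted(cs, key=component_dif.get, reverse=True) for cs in ranked]


def expected_diagnostic_cost(path_costs, cutset_unreliabilities, system_unreliability):
    """Assumed form: sum of cp_i * q_cutset_i, normalized by Q_s."""
    weighted = sum(c * q for c, q in zip(path_costs, cutset_unreliabilities))
    return weighted / system_unreliability


# Hypothetical DIF values for two cut sets and three components.
cut_set_dif = {frozenset({"A", "B"}): 0.6, frozenset({"C"}): 0.3}
component_dif = {"A": 0.5, "B": 0.7, "C": 0.3}
print(diagnosis_order(cut_set_dif, component_dif))

# Hypothetical path costs and unreliabilities.
print(expected_diagnostic_cost([1.0, 2.0], [0.02, 0.01], 0.03))
```

Checking likely cut sets first puts high-probability cut sets on low-cost paths of the DDT, which is what lowers the weighted sum.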
Application of Diagnostic Method
We generate all minimal cut sequences via the efficient ZBDD: 
Assuming a mission time of 600, we map the dynamic fault tree into the equivalent BN and calculate the DIF values using the inference algorithm based on the arithmetic circuit. Tables 4 and 5 show the DIF of components and the DIF of minimal cut sets, respectively, for the train-ground communication system without sensors.
We can also update the DIF of the components and the CUE after receiving the evidence data. Tables 6 and 7 show the diagnostic data with sensor data. Based on the diagnostic decision algorithm described above, we can generate the DDT. Assuming all components and sensors have a unit test cost, the diagnostic cost of the different algorithms is shown in Table 8, which indicates that the proposed approach is more efficient than the others. Experimental results demonstrate that the EDC is lower when the sensors have higher reliability, so sensors with higher reliability should be chosen. Table 9 shows results for different inference algorithms. The compiled algorithm based on the arithmetic circuit is faster than the join tree algorithm. The timing measurements reported here were made on a PC with an Intel 2.5 GHz processor, 4 GB RAM, and Windows 7.
Conclusions
In this paper, we have proposed an efficient framework for real-time fault diagnosis based on reliability analysis and the arithmetic circuit. It addresses two important issues that arise in engineering diagnostic applications, namely the challenges of model development and real-time reasoning. For model development, we adopt a dynamic fault tree model to capture the dynamic failure mechanisms and calculate reliability results using an algebraic technique and BN in order to avoid the infamous state space explosion problem. For real-time reasoning, we use a logic compilation based inference algorithm, which compiles a BN into an arithmetic circuit and retrieves answers to probabilistic queries by evaluating and differentiating the arithmetic circuit. Furthermore, we incorporate sensor data into the diagnosis process, deal with the sensor reliability problem and propose schemes for updating the diagnostic importance factor and the minimal cut sequences. A case study is given to demonstrate the efficiency of this framework. The proposed method combines the advantages of the dynamic fault tree for modeling, BN for inference and the arithmetic circuit for real-time reasoning, which makes it especially suitable for the diagnosis of complex systems.
