Abstract: Recent feedback obtained while applying model-based diagnosis (MBD) in industry suggests that the costs involved in behavioral modeling (both expertise and labor) can outweigh the benefits of MBD as a high-performance diagnosis approach. In this paper, we propose an automatic approach, called ANTARES, that completely avoids behavioral modeling. Reducing modeling sacrifices diagnostic accuracy, because the size of the ambiguity groups (i.e., components that cannot be discriminated for lack of information) increases, which in turn increases the misdiagnosis penalty. ANTARES further breaks up the ambiguity groups by considering each component's false negative rate (FNR), which is estimated using an analytical expression. We study the performance of ANTARES for a number of logic circuits taken from the 74XXX/ISCAS85 benchmark suite. Our results clearly indicate that sacrificing modeling information degrades the diagnosis quality. However, considering FNR information improves the quality, attaining the diagnostic performance of an MBD approach.
INTRODUCTION
In model-based diagnosis (MBD) the cost of the diagnostic process can be broken down into modeling cost and solution cost. Solution cost includes algorithmic cost as well as identification penalty (pointlessly testing incorrectly diagnosed candidates), where identification cost is often used as a diagnostic utility measure. Traditionally, MBD studies the trade-offs between these cost dimensions in a model-once-diagnose-often context, where modeling cost is amortized over many observations. While solution cost has been an important success factor (especially in time-critical applications), a recent design and maintenance case study in Dutch industry suggests that modeling cost is much more of a bottleneck for the acceptance of MBD than previously considered. At ASML a LYDIA-based diagnoser has successfully been used to diagnose faults in an important electro-mechanical subsystem that frequently suffered from failures [1]. It was shown that MBD can reduce solution cost from days to minutes for a once-only investment of 25 man-days of modeling effort (approximately 2,000 lines of code (LOC), comprising sensor modeling, electrical circuits, and some simple mechanisms). Despite the obvious financial gains, management discontinued the project once it became clear that only 80% of the model could be obtained automatically from the system's source code [2], [3].

978-1-4799-1622-1/14/$31.00 © 2014 IEEE.
In view of the continuous evolution of a large fraction of the subsystems (component upgrading, new lithographic technology, many machine versions), their reluctance to embrace non-automated modeling for anything other than their core business (lithography) seems understandable. Behavioral modeling is primarily a complex, manual process that can be extremely time-intensive and error-prone, and for certain systems it is even impossible to build behavioral models. For simple components, as found in combinational logic circuits, a library approach to behavioral component modeling can amortize much of that cost, reducing the modeling process to compiling the structural information of the circuit into a system model. Still, a considerable manual factor remains, as components evolve and compositionality in complex systems is typically limited.
The fact that real-world software of realistic size still cannot be modeled for the purpose of efficient, automatic debugging [4] has led the software engineering community to investigate approaches that are not based on behavioral modeling, such as spectrum-based fault localization (SFL). Unlike MBD, SFL correlates the dynamic program execution profiles of tests (called spectra, hence the name SFL) with the test outcomes (pass/fail), typically using statistical similarity coefficients. The components are subsequently ranked in order of the likelihood that they are responsible for test failures. As the spectra are captured by automatic profiling, and the test oracles are readily implemented from existing specifications, no modeling effort is required. Benchmark studies, as well as case studies by the authors diagnosing embedded software (100 KLOC) from Philips Semiconductors (now NXP), have shown promising results [5], [6], [7]. Recently, a model-based approach to SFL has been presented [8] in which the statistical approach is replaced by a reasoning approach. Grounded in (Bayesian) probability theory, the reasoning approach outperforms the statistical approach, in particular for multiple faults, at polynomial cost owing to a number of approximations within the diagnosis algorithm. Moreover, as the reasoning approach is based on a generic component model, no software modeling effort is involved.
In industrial situations where software and hardware are constantly evolving, a critical success factor in the mainstream adoption of MBD is whether modeling can be fully automated. Given the results of (model-based) SFL in the software domain, in this paper we study to what extent (model-based) SFL can offer an alternative to MBD in the logic hardware domain. Apart from the above industrial motivation, there are two additional reasons: (1) to study the relationship between SFL and MBD, where SFL's greater ability to handle large time series of observation data can partly compensate for its inherently limited precision compared to MBD, and (2) the benefits of a unified approach to diagnosing software and hardware simultaneously, of particular interest in the embedded systems domain (e.g., abundant in the aerospace industry), where the root cause of software-level failures can then be traced down to the hardware level.
This paper makes the following contributions:
(1) We present a spectrum-based diagnosis approach to logic circuits, which is part of ANTARES (AutoMAtic systems Diagnosis wIthout behaviOral modelS), to generate diagnosers based on circuit topology without modeling the behavior of the circuit's components. (2) We describe a particular ANTARES feature that automatically estimates the error propagation characteristics of a circuit, a critical parameter that significantly improves the quality of SFL's Bayesian posterior probability computation. (3) We compare the performance of ANTARES with a state-of-the-art MBD approach (GDE [9]) using the 74XXX/ISCAS85 benchmark suite of logic circuits.
Our results show that, for the logic circuits we studied, ANTARES is indeed capable of approaching the performance of MBD, provided that accurate information is available for each component on the average pass rate of tests that cover the component when it is faulted (the false negative rate, FNR).
Approaches that abstract from specific component behavior, also known as structural diagnosis, have been proposed in the past, e.g., [10], [11], [12]. None of these approaches is able to deal with intermittent faults. The Analytic Redundancy Relation (ARR) based approach [13] is close to our approach. However, (i) it does not scale well for multiple faults, (ii) it has not been studied in a probabilistic framework, and (iii) it is incapable of diagnosing intermittent faults. To the best of our knowledge, we are the first to propose the use of SFL for the multiple-fault diagnosis of logic circuits with both persistent and intermittent faults. Note that the technique presented in this paper is orthogonal to automatic testing techniques such as automatic test pattern generation (ATPG): our approach is started once something is found to be failing, in order to pinpoint the root cause of the observed failure.
The paper is organized as follows. In Section 2 we briefly describe the principles behind SFL, as well as the diagnostic utility metric (identification cost) that we use to compare diagnostic performance. In Section 3 we present the ANTARES approach to modeling hardware, featuring an analytic FNR estimation technique. In Section 4 we compare ANTARES with GDE using the 74XXX/ISCAS85 benchmark circuits. In Section 5 we summarize our contributions.
SFL
This section briefly reviews SFL. More detailed descriptions can be found in [8], [14]. In SFL the following is given:
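• A finite set $C = \{c_1, \ldots, c_j, \ldots, c_M\}$ of $M$ components, of which $M_f$ are faulted.
• A finite set $T = \{t_1, \ldots, t_i, \ldots, t_N\}$ of $N$ tests with binary outcomes $O = (o_1, \ldots, o_i, \ldots, o_N)$, where $o_i = 1$ if test $t_i$ failed, and $o_i = 0$ otherwise.
• An $N \times M$ activity matrix $A = [a_{ij}]$, where $a_{ij} = 1$ if test $t_i$ involves component $c_j$, and 0 otherwise. Each row is also called a spectrum.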
For a Bayesian approach to SFL, the following additional information is also required:
• The prior fault probability of a component $c_j$, denoted $p_j$.
• The false negative rate (FNR) of a component, denoted $g_j$, which expresses the probability that a test involving component $c_j$ will still pass when $c_j$ is faulted. In software, FNR is related to coincidental correctness [15] and failure exposing potential [16], while in hardware FNR is related to failure intermittency [17].
The result of SFL is a component ranking $R = \langle c_{r(1)}, \ldots, c_{r(j)}, \ldots, c_{r(M)} \rangle$, ordered by decreasing likelihood $\Pr(c_j)$ that $c_j$ is at fault. In statistical approaches to SFL, $\Pr(c_j)$ is approximated using statistical similarity coefficients [18]. In this paper we consider a reasoning approach in which the $\Pr(c_j)$ are posteriors based on Bayesian probability theory.
The diagnostic utility of $R$ is measured in terms of the identification cost $C_d$, which models the verification effort of a diagnostician going down the suspect ranking $R$ searching for the actual faults (true positives). In particular, we measure the identification effort wasted on false positives (i.e., excluding the components found to be faulted). Let $c_r$ denote the actually faulted component with the lowest posterior in $R$, where $r \in \{1, \ldots, M\}$ denotes its rank in $R$. Then $C_d = r - M_f$. In our studies we typically consider the normalized value $C_d/(M - M_f)$, which ranges from 0 to 1, in order to compare across varying system sizes; note that $M - M_f$ is the number of actually non-faulted components. This normalized metric is essentially the inverse of the DXC utility metric [19] for diagnosers that produce no false negatives ($R$ includes all components, so it cannot miss any faulted component).
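The identification cost can be computed directly from a ranking once the actual faults are known, as in simulated experiments. The following is a minimal Python sketch; the function names are ours, and ties in $R$ are assumed to be already broken by the order of the list.

```python
from typing import Sequence, Set


def identification_cost(ranking: Sequence[str], faulty: Set[str]) -> int:
    """C_d = r - M_f, with r the 1-based rank of the lowest-ranked faulty component."""
    r = max(pos for pos, comp in enumerate(ranking, start=1) if comp in faulty)
    return r - len(faulty)


def normalized_cost(ranking: Sequence[str], faulty: Set[str]) -> float:
    """C_d / (M - M_f), which lies in [0, 1]; assumes at least one healthy component."""
    return identification_cost(ranking, faulty) / (len(ranking) - len(faulty))


# c3 and c5 are faulty; the diagnostician wastes one inspection on c1.
R = ["c3", "c1", "c5", "c2", "c4"]
print(identification_cost(R, {"c3", "c5"}))  # 1
print(normalized_cost(R, {"c3", "c5"}))      # 1/3
```

Because $r$ is the rank of the *last* true fault, every healthy component ranked above it counts as wasted effort, which is exactly $r - M_f$.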
Candidate Generation
In ANTARES, $R$ is derived from the multiple-fault diagnosis $D = \langle d_1, \ldots, d_k, \ldots, d_{|D|} \rangle$, which is an ordered set of all $|D|$ minimal candidates, ordered by decreasing posterior probability $\Pr(d_k)$. Each candidate $d_k$ comprises a minimal set of components $c_j$ that, when faulted, are consistent with all test observations (i.e., a minimal diagnosis).
Candidate generation is based on modeling each component by the generic, weak (i.e., faulty behavior is not specified) model given by
$$h_j \wedge \text{inputs-ok} \Rightarrow \text{output-ok}$$
where $h_j$ denotes component health (true when nominal, false when faulted), while inputs-ok and output-ok denote whether the component's inputs and output are error-free (an error being produced by some faulted component upstream).
Depending on the test outcome, each row $i$ of the spectrum matrix $A$ yields either a pass set $\{c_j \mid a_{ij} = 1, o_i = 0\}$ or a fail set $\{c_j \mid a_{ij} = 1, o_i = 1\}$. Under the weak model a fail set is equivalent to a conflict: at least one of its members must be faulted. Candidate generation is based on computing the minimal hitting sets (MHS) of all fail sets.
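To make the candidate-generation step concrete, the sketch below derives pass and fail sets from $(A, O)$ and computes their minimal hitting sets by exhaustive enumeration. This brute-force search is exponential in $M$ and is only an illustrative stand-in; it is not the STACCATO algorithm that ANTARES uses (see Implementation Details). Components are indexed from 0.

```python
from itertools import combinations
from typing import FrozenSet, List, Sequence, Tuple


def pass_fail_sets(A: Sequence[Sequence[int]], O: Sequence[int]) -> Tuple[List[FrozenSet[int]], List[FrozenSet[int]]]:
    """Split the rows of A into pass sets (o_i = 0) and fail sets (o_i = 1)."""
    pass_sets: List[FrozenSet[int]] = []
    fail_sets: List[FrozenSet[int]] = []
    for row, o in zip(A, O):
        covered = frozenset(j for j, a in enumerate(row) if a == 1)
        (fail_sets if o == 1 else pass_sets).append(covered)
    return pass_sets, fail_sets


def minimal_hitting_sets(fail_sets: List[FrozenSet[int]], m: int) -> List[FrozenSet[int]]:
    """Brute-force MHS: try candidates by increasing cardinality and keep those
    that hit every fail set and do not contain an already-found hitting set."""
    if not fail_sets:
        return [frozenset()]  # no conflicts: the all-healthy candidate
    found: List[FrozenSet[int]] = []
    for k in range(1, m + 1):
        for combo in combinations(range(m), k):
            cand = frozenset(combo)
            if any(h <= cand for h in found):
                continue  # a subset already explains all failures
            if all(cand & f for f in fail_sets):
                found.append(cand)
    return found


# Index 0 stands for c1, index 1 for c2, and so on.
passes, fails = pass_fail_sets([[1, 1, 0], [1, 0, 1]], [0, 1])
print(passes, fails)                     # [frozenset({0, 1})] [frozenset({0, 2})]
print(minimal_hitting_sets(fails, m=3))  # [frozenset({0}), frozenset({2})]
```

Note that the pass set plays no role in the hitting-set result; it only enters through the posterior computation.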
Because components have non-zero FNR, tests that cover faulted components often still pass, which leads to many pass sets. While not useful for deriving candidates, the pass sets do influence a candidate's posterior probability, and are also useful for speeding up (focusing) the MHS computation.
Probability Computation
Given the typically large number of candidates in $D$ that have equal fault cardinality, for large systems the ranking induced by the posterior probability computation is critical to diagnostic accuracy. For each observation $obs_i = (A_{i*}, o_i)$ the posteriors are updated according to Bayes' rule
$$\Pr(d_k \mid obs_i) = \frac{\Pr(obs_i \mid d_k)\,\Pr(d_k \mid obs_{i-1})}{\Pr(obs_i)} \qquad (1)$$
where $\Pr(d_k \mid obs_0)$ is computed from the priors according to
$$\Pr(d_k \mid obs_0) = \prod_{c_j \in d_k} p_j \prod_{c_j \notin d_k} (1 - p_j)$$
assuming components fail independently. The denominator $\Pr(obs_i)$ is a normalizing term that is identical for all $d_k$ and need not be computed directly. $\Pr(obs_i \mid d_k)$ is defined as
$$\Pr(obs_i \mid d_k) = \begin{cases} \prod_{j \in J_{ik}} g_j & \text{if } o_i = 0 \\ 1 - \prod_{j \in J_{ik}} g_j & \text{if } o_i = 1 \end{cases} \qquad (2)$$
where
$$J_{ik} = \{\, j \mid c_j \in d_k \wedge a_{ij} = 1 \,\}$$
is the set of component indices in $d_k$ covered by the test. Eq. (2) assumes an OR-model, i.e., the test fails if any of the covered faulted components fails. In general, the OR-model is an acceptable approximation [20], [8], not least because $D$'s probability mass is often dominated by single faults, even when the system has multiple faults.
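A minimal sketch of Eqs. (1) and (2) together with the prior, assuming candidates are given as sets of 0-based component indices. Dividing by the sum over $D$ replaces the explicit computation of $\Pr(obs_i)$, which is only exact if $D$ carries all posterior mass. This illustrates the equations; it is not the BARINEL implementation.

```python
from math import prod
from typing import FrozenSet, List, Sequence


def prior(d: FrozenSet[int], p: Sequence[float]) -> float:
    """Pr(d | obs_0): components in d faulted, all others healthy, independently."""
    return prod(p[j] if j in d else 1.0 - p[j] for j in range(len(p)))


def likelihood(d: FrozenSet[int], row: Sequence[int], o: int, g: Sequence[float]) -> float:
    """Eq. (2), OR-model: the test passes only if every covered faulted component
    happens to behave correctly (probability g_j each); prod() of nothing is 1."""
    pass_prob = prod(g[j] for j in d if row[j] == 1)
    return pass_prob if o == 0 else 1.0 - pass_prob


def posteriors(D: List[FrozenSet[int]], A, O, p, g) -> List[float]:
    """Eq. (1) applied row by row; Pr(obs_i) is replaced by normalization over D."""
    post = [prior(d, p) for d in D]
    for row, o in zip(A, O):
        post = [w * likelihood(d, row, o, g) for w, d in zip(post, D)]
        total = sum(post)
        post = [w / total for w in post]
    return post


# Two single-fault candidates; c1 (index 0) is partly exonerated by a passing test.
D = [frozenset({0}), frozenset({1})]
print(posteriors(D, [[1, 0], [1, 1]], [0, 1], p=[0.1, 0.1], g=[0.5, 0.5]))
```

The passing first test multiplies the posterior of $\{c_1\}$ by $g_1 = 0.5$ while leaving $\{c_2\}$ untouched, so $c_2$ ends up twice as likely.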
$R$ is derived from $D$ by aggregating the posteriors of the $d_k$ into posterior component probabilities according to
$$\Pr(c_j) \approx \sum_{d_k \in D \,:\, c_j \in d_k} \Pr(d_k)$$
The approximation is due to the fact that formally D should be expanded with all non-minimal candidates for the above equation to be correct. The reason for our approximation is discussed in the next section.
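The aggregation step can be sketched as follows, again with 0-based indices and function names of our own choosing. Only the minimal candidates in $D$ are summed, which is exactly the approximation discussed above.

```python
from typing import FrozenSet, List, Sequence, Tuple


def component_ranking(D: List[FrozenSet[int]], post: Sequence[float], m: int) -> List[Tuple[int, float]]:
    """Pr(c_j) ~ sum of Pr(d_k) over the minimal candidates d_k containing c_j,
    sorted by decreasing probability (the stable sort keeps index order on ties)."""
    pr = [0.0] * m
    for d, w in zip(D, post):
        for j in d:
            pr[j] += w
    return sorted(((j, pr[j]) for j in range(m)), key=lambda t: -t[1])


D = [frozenset({2}), frozenset({0, 1})]
print(component_ranking(D, [0.99, 0.01], m=3))  # [(2, 0.99), (0, 0.01), (1, 0.01)]
```

Applied to the minimal MBD diagnosis of the three-inverter example in the ANTARES section, this yields 0.99/0.01/0.01 rather than the 0.92/0.21/0.21 quoted there, because those figures were computed from the full, non-minimal diagnosis.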
Implementation Details
In ANTARES we use the STACCATO MHS algorithm [8], [21] to compute $D$ from $(A, O)$. STACCATO exploits pass sets in its any-time computation of the most probable minimal candidates $d_k$. Typically, the first few hundred candidates cover practically all posterior probability mass, after which the MHS algorithm is terminated. As a result, random problems comprising $N = 1{,}000$ tests and $M = 1{,}000{,}000$ components, of which $M_f = 1{,}000$ are faulted, are diagnosed in less than 0.1 CPU second on a contemporary PC [8].
As discussed earlier, our performance metric $C_d$ is formally defined in terms of the full (non-minimal) diagnoses rather than the minimal diagnoses $D$. For the large 74XXX/ISCAS85 circuits we are considering, however, the posterior probability mass covered by $D$ is virtually equal to unity. As the number of minimal diagnoses is considerably smaller than the number of all diagnoses, the huge computational savings far outweigh the small estimation error. The error is small because, once multiple observations are combined, $D$ already involves all components, owing to the use of a weak component model and the fact that the observations are random (modeling a practical application situation). Even when multiple faults are present, random observations lead to many observations that can already be explained by candidates of lower cardinality. Had we chosen to limit our observations to MFMC (Max-Fault Min-Cardinality) observations (such as those generated by MIRANDA [22]), we would have had to extend $D$ with non-minimal candidates (e.g., by a low-cost, first-order extension approach such as in SEQUOIA [14]).

Fig. 1. Three-inverter circuit.
The posterior probability computation is a straightforward application of Bayes' rule, implemented within the BARINEL toolset [8], using an option to read the $g_j$ parameters externally from a file. How the $g_j$ are generated is described in the FNR Estimation section.
ANTARES
ANTARES applies SFL to diagnose hardware, exploiting topological information only. Whereas in software the spectra are obtained by tracing the components that are executed per test run (dynamic control flow), in hardware a spectrum originates from a cone, i.e., all components involved in the computation of a circuit output (determined by topology). A particular feature of ANTARES is that it includes a method to estimate the $g_j$ parameters, which is vital to diagnostic performance. In the following we outline the principle for logic circuits. Note, however, that the approach generalizes to any causal system.
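Deriving spectra from topology amounts to computing, for each primary output, the transitive fan-in (cone) of the gate that drives it. The sketch below assumes a netlist given as a hypothetical `fanin` dictionary (gate to list of driving gates, primary inputs omitted) and assumes, as the cones described for Fig. 1 imply, that $c_1$ drives both $c_2$ and $c_3$.

```python
from typing import Dict, List, Sequence, Set


def cone(fanin: Dict[str, List[str]], out_gate: str) -> Set[str]:
    """Transitive fan-in of the gate driving a primary output (including that gate).
    fanin[g] lists the gates feeding g; primary inputs are simply not listed."""
    seen: Set[str] = set()
    stack = [out_gate]
    while stack:
        gate = stack.pop()
        if gate not in seen:
            seen.add(gate)
            stack.extend(fanin.get(gate, []))
    return seen


def spectra(fanin: Dict[str, List[str]], out_gates: Sequence[str], components: Sequence[str]) -> List[List[int]]:
    """One row per primary output: a_ij = 1 iff component c_j lies in cone i."""
    rows = []
    for out_gate in out_gates:
        members = cone(fanin, out_gate)
        rows.append([1 if c in members else 0 for c in components])
    return rows


# Three-inverter circuit (Fig. 1), assuming c1 drives both c2 and c3.
fanin = {"c1": [], "c2": ["c1"], "c3": ["c1"]}
A = spectra(fanin, ["c2", "c3"], ["c1", "c2", "c3"])
print(A)  # [[1, 1, 0], [1, 0, 1]]
```

Since the rows depend only on topology, every test vector produces the same rows; only the outcomes $O$ change.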
Consider the example circuit shown in Fig. 1. Each primary output observation is interpreted as one test. Thus one test vector yields two tests: one involving $c_1$ and $c_2$ (the cone of the top primary output), and one involving $c_1$ and $c_3$ (the bottom cone). Since SFL assumes the existence of test oracles, we observe a pass for the top output and a failure for the bottom output. The observations are given in terms of $A$ and $O$ according to
$$A = \begin{pmatrix} 1 & 1 & 0 \\ 1 & 0 & 1 \end{pmatrix}, \qquad O = \begin{pmatrix} 0 \\ 1 \end{pmatrix}$$
Note that in this small example we have actually computed $R$ from the full, non-minimal diagnosis, in order to include $c_2$ in the probability computation (since $c_2$ is not in $D$). In our 74XXX/ISCAS85 experiments, however, we simply derive $R$ from $D$ with negligible loss of accuracy.

Using the inverter models, MBD instead derives two conflicts, $(c_1, c_3)$ and $(c_2, c_3)$. In SFL terms both conflicts are expressed as failed rows
$$A = \begin{pmatrix} 1 & 0 & 1 \\ 0 & 1 & 1 \end{pmatrix}, \qquad O = \begin{pmatrix} 1 \\ 1 \end{pmatrix}$$
which yield the minimal candidates $c_3$ and $(c_1, c_2)$. Assuming the same priors and FNR, we obtain the diagnosis $D = \langle c_3\,(0.99),\ c_1 \wedge c_2\,(0.01) \rangle$. The derived ranking is $R = \langle c_3\,(0.92),\ c_1\,(0.21),\ c_2\,(0.21) \rangle$ (again computed from the full, non-minimal diagnosis). Note that the higher defect density estimate ($M_f = \sum_r \Pr(c_r) = 1.34$)³ compared to the SFL solution is due to the fact that now two fail sets are found, versus one fail set and one pass set for SFL. In SFL the pass set exonerates $c_1$ and $c_2$, leading to lower posteriors.
SFL vs. MBD
Despite the difference in posterior distribution, the diagnostic accuracy of both approaches is equal in terms of $C_d$. However, the SFL approach suffers from the fact that no modeling information is exploited. This becomes particularly clear when considering $D$. While MBD correctly infers a single fault ($c_3$ with 0.99 probability) or a double fault ($c_1, c_2$ with 0.01 probability), SFL infers a single fault ($c_3$ with 0.67 probability) and another single fault $c_1$ (with 0.33 probability). As the latter cannot be true (given the inverter behavior, a faulty $c_1$ alone would corrupt both outputs), SFL suffers from false positives (in terms of $D$) compared to MBD. However, note that a diagnostician will typically only consider $R$, which comprises all components anyway. Thus the diagnostic accuracy is effectively determined by the quality of the posterior computation, which is key in the comparison between SFL and MBD.
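The candidate posteriors quoted above can be reproduced with Eqs. (1) and (2). The sketch assumes equal priors $p_j = 0.01$ and equal FNR $g_j = 0.5$ for all three inverters; the text does not state these values, but they yield exactly the 0.67/0.33 (SFL) and 0.99/0.01 (MBD) splits. The candidate sets are taken from the text.

```python
from math import prod
from typing import List, Sequence, Set


def candidate_posteriors(D: List[Set[int]], A: Sequence[Sequence[int]], O: Sequence[int],
                         p: Sequence[float], g: Sequence[float]) -> List[float]:
    """Prior times OR-model likelihood over all rows, normalized over D."""
    m = len(p)
    w = [prod(p[j] if j in d else 1 - p[j] for j in range(m)) for d in D]
    for row, o in zip(A, O):
        for k, d in enumerate(D):
            all_pass = prod(g[j] for j in d if row[j] == 1)
            w[k] *= all_pass if o == 0 else 1 - all_pass
    total = sum(w)
    return [x / total for x in w]


p = [0.01] * 3  # assumed equal priors (value not stated in the text)
g = [0.5] * 3   # assumed equal FNR (value not stated in the text)

# SFL: top output (cone c1, c2) passes, bottom output (cone c1, c3) fails.
sfl = candidate_posteriors([{2}, {0}], [[1, 1, 0], [1, 0, 1]], [0, 1], p, g)
# MBD: conflicts (c1, c3) and (c2, c3) encoded as two failed rows.
mbd = candidate_posteriors([{2}, {0, 1}], [[1, 0, 1], [0, 1, 1]], [1, 1], p, g)
print([round(x, 2) for x in sfl])  # [0.67, 0.33]
print([round(x, 2) for x in mbd])  # [0.99, 0.01]
```

In the SFL case the 2:1 ratio comes purely from the pass set (factor $g_1 = 0.5$ on $\{c_1\}$); in the MBD case both candidates have equal likelihoods and the 99:1 ratio comes from the prior of a double fault.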
An aspect in favor of SFL is that it exploits the information in the pass sets, whereas MBD does not (except internally, e.g., in an MHS engine such as STACCATO, where pass sets allow better focusing and thus reduce computational cost). This explains why $c_3$ is ranked higher than $c_1$ although, according to SFL, both are single-fault candidates. Exploiting pass-set information is one of the reasons why SFL's diagnostic performance is of practical interest.
Ambiguity Groups
In ANTARES, $A$ derives directly from the circuit's topology. Each of the circuit's $N$ outputs generates one row in $A$, leading to $N$ rows in $A$ per test vector. For multiple test vectors, $A$ simply grows in multiples of $N$ rows. Since, regardless of the number of test vectors, $A$ contains only $N$ different rows, many of the columns in $A$ are equal. Consider the well-known system topology in Fig. 2. Each of the two cones, as well as the cone intersection, generates a set of equal columns in $A$. The associated components are called an ambiguity group (AG). In the above example $A$ has three ambiguity groups: $(c_1, c_4)$, $(c_2)$, and $(c_3, c_5)$. While the 1-member AG does not pose any problem, the two 2-member AGs introduce a lower bound on $C_d$, since their member components cannot be distinguished unless they have different posteriors. Obtaining such different posteriors is key to our SFL approach in ANTARES.

³ Note that the total probability mass ($M_f$) may well exceed unity.
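Ambiguity groups can be read off $A$ by grouping identical columns. The sketch below assumes that the first cone of Fig. 2 contains $c_1, c_2, c_4$ and the second $c_2, c_3, c_5$; this membership is our reading, chosen because it is consistent with the three AGs listed above.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def ambiguity_groups(A: Sequence[Sequence[int]], names: Sequence[str]) -> List[List[str]]:
    """Group components whose columns in A are identical."""
    groups: Dict[Tuple[int, ...], List[str]] = defaultdict(list)
    for j, name in enumerate(names):
        groups[tuple(row[j] for row in A)].append(name)
    return list(groups.values())


names = ["c1", "c2", "c3", "c4", "c5"]
A = [[1, 1, 0, 1, 0],   # cone of the first output: c1, c2, c4 (assumed)
     [0, 1, 1, 0, 1]]   # cone of the second output: c2, c3, c5 (assumed)
print(ambiguity_groups(A, names))      # [['c1', 'c4'], ['c2'], ['c3', 'c5']]
print(ambiguity_groups(A * 3, names))  # same groups after three test vectors
```

Repeating the rows for more test vectors (`A * 3`) leaves the groups unchanged, which is why adding random vectors alone cannot resolve hardware AGs.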
Ambiguity Reduction
As shown by the three-inverter example, MBD generates additional fail sets (conflicts) compared to SFL. Consequently, SFL gives rise to AGs that can be much larger than those of MBD. In software the AG problem can often be resolved by adding better tests (e.g., distinguishing components by introducing different control flow, unless they belong to the same basic block). In hardware, however, the AGs are determined by circuit topology (which we assume to be static). As the ratio between the number of components and the number of primary outputs is determined by area versus circumference, the AG problem scales with the size of the circuit, leading to very large AGs. This implies that $R$ would also contain large groups of components with equal posteriors if the $p_j$ and $g_j$ were equal, which can greatly affect $C_d$. As the $p_j$ are typically not available, one often sets all $p_j$ to some arbitrary value $p$. Even if the $p_j$ were different, the diagnostic performance of SFL would still be largely determined by the quality of the $g_j$, as has been shown in the software domain [23]. The reason is that $g_j$ is involved in the Bayesian update every time a new observation is processed. In software the $g_j$ are typically measured using mutation analysis [24]. While these measurements significantly increase SFL accuracy, the cost of such a Monte Carlo approach scales linearly with system size. In ANTARES we therefore also consider an analytic approach to estimating the $g_j$, since the estimation quality is sufficient to tackle the ambiguity problem.
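Why distinct $g_j$ break ties inside an AG can be seen from Eq. (2) for single-fault candidates. With equal priors, two members of the same AG see exactly the same covering tests, so their posterior ratio equals the ratio of $g_j^{n_\text{pass}} (1 - g_j)^{n_\text{fail}}$. The counts and $g_j$ values below are hypothetical, and every failed test is assumed to cover the AG.

```python
def single_fault_likelihood(g_j: float, n_pass: int, n_fail: int) -> float:
    """OR-model likelihood of the single-fault candidate {c_j} given n_pass passed
    and n_fail failed tests that cover c_j (every failed test is assumed to cover c_j)."""
    return g_j ** n_pass * (1 - g_j) ** n_fail


# c1 and c4 share one spectrum column (same AG); suppose their cone passed 3 times, failed once.
n_pass, n_fail = 3, 1
equal_g = {c: single_fault_likelihood(0.5, n_pass, n_fail) for c in ("c1", "c4")}
distinct_g = {"c1": single_fault_likelihood(0.5, n_pass, n_fail),
              "c4": single_fault_likelihood(0.9, n_pass, n_fail)}
print(equal_g)     # {'c1': 0.0625, 'c4': 0.0625}: tie, the AG stays unresolved
print(distinct_g)  # c4 now outranks c1
```

A component with high FNR remains a plausible culprit despite many passing tests, so it moves up the ranking; with equal $g_j$ the tie persists no matter how many observations are added.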
Gate EPP Estimation
Computing the $g_j$ of a component $c_j$ can be framed as the problem of computing the error propagation probability (EPP) through a logic circuit [25], [26]. For example, consider the simple logic circuit shown in Figure 3. For input $x = (X, 0)$ ($X$ = don't care), an error at the output of $c_1$ is masked, since $c_2$ always produces $y = 0$. However, for input $x = (X, 1)$, an inverter error always propagates to $y$. Assuming a uniform input value probability distribution, the EPP at $y$ is 0.5, and consequently $g_1 = 1 - 0.5 = 0.5$.
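The Figure 3 calculation can be checked by enumerating the four input vectors. The sketch assumes that $c_1$ inverts $x_1$ and that $c_2$ is a two-input AND of $c_1$'s output and $x_2$ (consistent with $y = 0$ whenever $x_2 = 0$), and models the inverter error as a bit flip.

```python
from itertools import product


def epp_fig3() -> float:
    """Fraction of inputs (x1, x2) for which an error (bit flip) at the inverter
    output c1 changes y = AND(NOT x1, x2); inputs are uniformly distributed."""
    propagated = 0
    for x1, x2 in product((0, 1), repeat=2):
        y_good = (1 - x1) & x2   # fault-free inverter output: NOT x1
        y_bad = x1 & x2          # flipped inverter output: x1
        propagated += int(y_good != y_bad)
    return propagated / 4


epp = epp_fig3()
g1 = 1 - epp
print(epp, g1)  # 0.5 0.5
```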
EPP in logic circuits has been studied in the context of reliability, primarily motivated by soft error rates that increase as gate sizes keep decreasing [25], [27]. While a deterministic approach exists to compute the EPP through circuits of arbitrary topology given the model of each gate involved, in this paper we use a novel, probabilistic approach to EPP computation, since in ANTARES we refrain from modeling the actual components. While our method produces exact estimates of the mean EPP over the corresponding gate space, the EPP value found for a particular circuit output may differ from the correct value. However, some error is acceptable provided the posterior probability ranking produced by our diagnosis algorithm is not too seriously affected.
Due to space limitations, we refrain from explaining in detail how the EPP is computed; interested readers are referred to [28]. In this paper we use the general EPP model for a binary gate derived in [28]. Let $e_1$ and $e_2$ denote the probability that inputs $x_1$ and $x_2$ of the binary gate, respectively, have an error. The EPP value (the probability $e$ that the gate's output has an error) is given by
$$E[e] = \tfrac{1}{2}\,(e_1 + e_2 - e_1 e_2) \qquad (3)$$
Note that Eq. (3) is averaged over the space of all 16 conceivable binary gate functions. Computing the EPP at circuit level simply requires composition of Eq. (3) per gate between the faulted gate(s) and the circuit output.
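A brute-force check of the binary-gate EPP model, under our assumptions that input errors are independent bit flips and that inputs are uniformly distributed. Averaging over all 16 gate functions gives the closed form $(e_1 + e_2 - e_1 e_2)/2$, which reduces to $e_1/2$ when $e_2 = 0$, as used in FNR Estimation. The helper `epp_along_path` illustrates the per-gate composition along a single error path; all function names are illustrative.

```python
from itertools import product


def epp_mean_over_gates(e1: float, e2: float) -> float:
    """Average, over all 16 two-input Boolean functions and uniform inputs, of the
    probability that independent input errors (bit flips with probabilities e1, e2)
    change the gate output."""
    total = 0.0
    for table in product((0, 1), repeat=4):           # truth table, indexed by 2*x1 + x2
        for x1, x2 in product((0, 1), repeat=2):      # uniformly distributed inputs
            for f1, f2 in product((0, 1), repeat=2):  # f1/f2 = 1: error on x1/x2
                weight = (e1 if f1 else 1 - e1) * (e2 if f2 else 1 - e2)
                if table[2 * x1 + x2] != table[2 * (x1 ^ f1) + (x2 ^ f2)]:
                    total += weight
    return total / (16 * 4)


def epp_eq3(e1: float, e2: float) -> float:
    return 0.5 * (e1 + e2 - e1 * e2)


def epp_along_path(e_fault: float, stages: int) -> float:
    """Compose the closed form over `stages` downstream gates, each with one erroneous input (e2 = 0)."""
    e = e_fault
    for _ in range(stages):
        e = epp_eq3(e, 0.0)
    return e


for e1, e2 in [(0.0, 0.0), (0.5, 0.0), (0.3, 0.2), (1.0, 1.0)]:
    print(e1, e2, epp_mean_over_gates(e1, e2), epp_eq3(e1, e2))
```

The agreement follows because, for any two distinct input vectors, exactly half of the 16 truth tables give different outputs; so any non-empty error pattern propagates with probability 1/2 on average.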
FNR Estimation
The above EPP model allows us to compute $\Pr(obs_i \mid d_k)$ directly for multiple-fault candidates $d_k$ in the Bayesian update (Eq. (1)), circumventing the use of single-fault $g_j$ parameters in combination with the OR-model (Eq. (2)). However, its accuracy decreases with increasing cardinality, which makes the EPP model less attractive for multiple faults. Although the OR-model assumes failure independence, in practice it is more accurate than a direct computation [28]. Consequently, in the following we outline an FNR estimation procedure based on $g_j$ parameters obtained through single-fault EPP modeling.
Obtaining the EPP for single faults is straightforward. In the following we assume that a faulted gate is either SA0 (stuck at 0) or SA1 (stuck at 1), so that its output is erroneous with an average probability of 0.5. As this gate is the only faulted gate in the circuit, every gate downstream of it on the path to the primary output has exactly one input with an error. Consequently, Eq. (3) can be simplified. Without loss of generality, assume that for each gate on the path between the faulted gate and the primary output, $x_1$ is in the error path. Then $e_2 = 0$ and Eq. (3) reduces to $E[e] = e_1/2$, which implies halving the EPP per stage. Let $m_j$ denote the 'depth' of gate $c_j$ relative to the output considered (for the gate at the output, $m_j = 0$). Assuming that the faulted gate produces $e = 1/2$, it follows that $g_j$ is given by
$$g_j = 1 - \left(\tfrac{1}{2}\right)^{m_j + 1}$$
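A minimal sketch of this FNR estimate, assuming the netlist is given as a hypothetical `fanout` dictionary. The depth $m_j$ is computed by breadth-first search backwards from the output gate; when several paths reach the output we take the shortest one, which is our own assumption since the text considers a single path.

```python
from collections import deque
from typing import Dict, List


def depth_to_output(fanout: Dict[str, List[str]], out_gate: str) -> Dict[str, int]:
    """m_j for every gate that reaches out_gate: the number of gates downstream of c_j
    on the path to the output (the gate driving the output has m_j = 0).
    With reconvergent paths we take the shortest path (our assumption)."""
    fanin: Dict[str, List[str]] = {}
    for gate, succs in fanout.items():
        for s in succs:
            fanin.setdefault(s, []).append(gate)
    depth = {out_gate: 0}
    queue = deque([out_gate])
    while queue:
        gate = queue.popleft()
        for pred in fanin.get(gate, []):
            if pred not in depth:
                depth[pred] = depth[gate] + 1
                queue.append(pred)
    return depth


def fnr_from_depth(m_j: int) -> float:
    """g_j = 1 - (1/2)^(m_j + 1): faulty output wrong with prob. 1/2, halved per stage."""
    return 1.0 - 0.5 ** (m_j + 1)


# Hypothetical chain c1 -> c2 -> c3, where c3 drives the primary output.
fanout = {"c1": ["c2"], "c2": ["c3"], "c3": []}
m = depth_to_output(fanout, "c3")
print({c: fnr_from_depth(d) for c, d in sorted(m.items())})  # {'c1': 0.875, 'c2': 0.75, 'c3': 0.5}
```

Gates deep inside a cone thus receive FNR values close to 1, which is what separates them from shallow gates sharing the same spectrum column.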
EXPERIMENTAL RESULTS
In this section we evaluate the diagnostic performance of ANTARES for the circuits described earlier, in comparison with an MBD approach (Tables 1, 2, …). We consider three versions of ANTARES:
• A version, denoted $A_{\mathrm{BAR}}$, where $g_j$ is not determined by topology but is computed internally by BARINEL based on $(A, O)$ [8]. This reference version [29] is intended to assess the added value of using topology-specific $g_j$ information.
• A version, denoted $A_{\mathrm{MC}}$, where $g_j$ is determined from the circuit using Monte Carlo (MC) simulation and is supplied externally to BARINEL. This version uses the most accurate $g_j$ information.
• A version, denoted $A_{\mathrm{EPP}}$, where $g_j$ is estimated from the circuit using the analytical EPP model and is supplied externally to BARINEL.
In order to compare ANTARES to MBD we include results for GDE, a state-of-the-art MBD engine [9]. Since GDE does not provide posterior probabilities, we have incorporated GDE within our SFL approach as follows. The additional conflicts that GDE infers due to its MBD capability are appended as rows to the $A$ matrix obtained by ANTARES, with $o_i = 1$. The extended $(A, O)$ are processed as usual using STACCATO and BARINEL.
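The row-appending step can be sketched on the three-inverter example, using the two MBD conflicts $(c_1, c_3)$ and $(c_2, c_3)$ given there. The brute-force hitting-set routine is a small stand-in for STACCATO, and all names are hypothetical. Appending the conflicts changes the minimal candidates from the SFL pair $\{c_1\}, \{c_3\}$ to the MBD pair $\{c_3\}, \{c_1, c_2\}$.

```python
from itertools import combinations
from typing import FrozenSet, List, Sequence, Set, Tuple


def append_conflicts(A: Sequence[Sequence[int]], O: Sequence[int],
                     conflicts: List[Set[int]], m: int) -> Tuple[List[List[int]], List[int]]:
    """Append each MBD conflict as an extra spectrum row with outcome o_i = 1."""
    A_ext = [list(row) for row in A]
    O_ext = list(O)
    for conflict in conflicts:
        A_ext.append([1 if j in conflict else 0 for j in range(m)])
        O_ext.append(1)
    return A_ext, O_ext


def minimal_candidates(A: Sequence[Sequence[int]], O: Sequence[int], m: int) -> List[FrozenSet[int]]:
    """Brute-force minimal hitting sets of the fail sets (small examples only)."""
    fail_sets = [frozenset(j for j in range(m) if row[j] == 1) for row, o in zip(A, O) if o == 1]
    if not fail_sets:
        return [frozenset()]
    found: List[FrozenSet[int]] = []
    for k in range(1, m + 1):
        for combo in combinations(range(m), k):
            cand = frozenset(combo)
            if not any(h <= cand for h in found) and all(cand & f for f in fail_sets):
                found.append(cand)
    return found


A, O = [[1, 1, 0], [1, 0, 1]], [0, 1]   # ANTARES spectra, three-inverter example
A_ext, O_ext = append_conflicts(A, O, [{0, 2}, {1, 2}], m=3)
print(minimal_candidates(A, O, 3))          # [frozenset({0}), frozenset({2})]
print(minimal_candidates(A_ext, O_ext, 3))  # [frozenset({2}), frozenset({0, 1})]
```

The conflict $(c_1, c_3)$ duplicates an existing fail row; only $(c_2, c_3)$ carries new information, yet it suffices to rule out $\{c_1\}$ as a single-fault candidate.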
Comparison of the $A_{\mathrm{BAR}}$ and MBD results shows that some diagnostic accuracy is lost by merely taking into account structure (topology) with a standard, weak component model without FNR information. The results for $A_{\mathrm{MC}}$ show that knowledge of the $g_j$ has a significant impact on ANTARES' diagnostic performance. Although the results are for logic …

The results for $A_{\mathrm{EPP}}$ show that analytically estimating the $g_j$ using our generic component model does not always improve ANTARES' performance compared to not using them ($A_{\mathrm{BAR}}$). Given the impact of the $g_j$ shown by $A_{\mathrm{MC}}$, however, there is great potential in developing more elaborate analytic schemes. An obvious extension of the analytic EPP model takes into account the truth probability of a gate (the fraction of 1-entries in its truth table). For instance, the EPP characteristics of an AND (truth probability 1/4) and an OR (truth probability 3/4) are given exactly by Eq. (3), while an XOR (truth probability 1/2) has a higher EPP (a single error on one of its inputs always propagates to the output). In some cases the truth probability of components may be known by design, or can be measured in isolation using MC simulation.
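The AND/OR/XOR comparison can be verified by enumeration, again assuming uniform inputs and independent bit-flip errors on the two inputs. The gate dictionary and function names are illustrative; `eq3` is the generic closed form $(e_1 + e_2 - e_1 e_2)/2$.

```python
from itertools import product
from typing import Callable

Gate = Callable[[int, int], int]
GATES = {"AND": lambda a, b: a & b, "OR": lambda a, b: a | b, "XOR": lambda a, b: a ^ b}


def truth_probability(f: Gate) -> float:
    """Fraction of input vectors for which the gate outputs 1."""
    return sum(f(a, b) for a, b in product((0, 1), repeat=2)) / 4


def gate_epp(f: Gate, e1: float, e2: float) -> float:
    """EPP of a specific gate for uniform inputs and independent input bit flips."""
    total = 0.0
    for x1, x2 in product((0, 1), repeat=2):
        for f1, f2 in product((0, 1), repeat=2):
            weight = (e1 if f1 else 1 - e1) * (e2 if f2 else 1 - e2)
            if f(x1, x2) != f(x1 ^ f1, x2 ^ f2):
                total += weight
    return total / 4


def eq3(e1: float, e2: float) -> float:
    return 0.5 * (e1 + e2 - e1 * e2)


for name, f in GATES.items():
    print(name, truth_probability(f), gate_epp(f, 1.0, 0.0), gate_epp(f, 0.3, 0.2), eq3(0.3, 0.2))
```

For AND and OR the gate-specific EPP coincides with the generic model, whereas for XOR a single input error always propagates ($\text{EPP} = 1$ for $e_1 = 1, e_2 = 0$), while a simultaneous error on both inputs cancels out.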
Single faults dominate the ranking yielded by random vectors, since there are many single-fault diagnosis candidates that explain all failed observations. In other words, unlike MBD, our study does not make use of Max-Fault Min-Cardinality (MFMC) observation vectors, but instead uses random vectors in an attempt to mimic reality. As a consequence, ANTARES is limited in the presence of multiple faults. Even for large circuits (c1355, c2670 and c7552), only one of the faulty components appears in the diagnosis, and therefore ANTARES could not isolate all of them. Hence we do not report results for these circuits for $M_f = 2$ and 3. However, in many real-life scenarios a diagnostician looking for the root cause of a system failure does not know in advance how many faults are in the system; every time a faulty component has been found and replaced, the system is typically re-tested to ascertain that all faults have been found. With such an iterative process, multiple faults can often be detected using the single-fault diagnosis approach, making …
CONCLUSION
Results clearly show that MBD outperforms every variant of ANTARES, which demonstrates the importance of modeling information in diagnosis. However, there are situations where it is impossible to create behavioral models. For instance, in software of realistic size and complexity, the choice not to model is typically born of necessity. Our industrial feedback suggests that there is a business case for sacrificing some diagnostic performance in an approach where modeling is no longer required.
In this paper we addressed the trade-off between modeling and identification costs in diagnosis. We also proposed exploiting FNR information to boost diagnosis quality without actually using behavioral models. Our results show that ANTARES, given detailed FNR information, is capable of approaching the performance of MBD. While in software mutation analysis is relatively easy to implement, measuring FNR data in hardware is only possible when simulators are available. Consequently, we also studied a simple, abstract EPP modeling technique to estimate the FNR data analytically. Our results show that a more detailed EPP model is required to attain the performance of Monte Carlo measurements.
Future work will address improved EPP modeling to further exploit the potential of ANTARES. Currently we use a generic component EPP model; we will investigate whether we can reach the quality of the Monte Carlo approach by dynamically measuring each component's EPP. The main benefit of such an empirical study would be measuring the EPP without injecting faults into the circuit, while still not modeling the components' behavior.