The test patterns computed to detect manufacturing defects in electronic circuits are generally insufficient for diagnosis. Test set compaction and failure log truncation lead to the loss of critical failure observations on which diagnosis may depend. In this context, it is beneficial to know the diagnostic usefulness of failures so that the more useful failures can be logged instead of the initial ones. In this paper, we evaluate three metrics that gauge such diagnostic usefulness in real time by observing the circuit responses on the tester. We implement a pattern selection framework for failure logging and compare the results with those achieved by logging the initial failures. Using one of our proposed metrics, we improved the diagnosis quality for a significant number of faulty instances of the ISCAS'89 and IWLS'05 benchmarks having 1-7 inserted stuck-at and transition faults.
INTRODUCTION
Diagnosis of defective VLSI circuits plays an important role in improving the yield of chip manufacturing processes. During diagnosis, we analyze the test, layout and design data to identify suspect locations of manufacturing defects, thereby reducing the cost of physical failure analysis (PFA).
Figure 1 provides a simplified overview of the test and diagnosis process. A test suite consisting of a mix of random and deterministic patterns targeting stuck-at and transition faults is applied to the manufactured chips, and the failures are logged. These failures, i.e., mismatches with the fault-free circuit response, are then analyzed by offline diagnosis tools to find a set of suspect locations. In an ideal scenario, we uncover the real defects by physically analyzing a small number of suspect locations. However, in the low yield regime, we observe a large number of failures in most faulty circuits, and not all of them can be logged due to memory limitations. We must therefore discard a significant portion of the failure log, which likely includes information that is critical for diagnosis.
Recently, the test data explosion caused by increasing design complexity and continued technology scaling has sparked interest in methods that minimize failure data without a large loss in diagnosis quality. The authors of [6] used machine learning techniques to predict a termination point for failure data collection, achieving a data reduction of 30.4% at about 90% accuracy. The main drawback of their method is that it requires a large population of full failure data for learning: in their implementation, they learned from 90% of the initial full failure data in order to reduce the remaining (roughly 10%) data by 30%. Moreover, their method improves only the accuracy (not the resolution) and is not practical for industrial adoption.
* Supported in part by NSF Grant 1422054 and Intel Corp.
In [4] and subsequent papers, the authors propose saving a predetermined subset of response bits. They rank the output bits by their single stuck-at and transition fault detection counts and log the top-ranked bits that together cover all modeled faults. This ranking is fixed in advance and cannot adapt to the observed failures. Moreover, a change of fault model requires re-ranking, which is too computationally expensive to perform in real time. Finally, since different chips fail differently, a single output ranking may not work for all failing chips.
The authors of [3] used rule mining to reduce failure log sizes. Their method depends on the availability of diagnosis results and circuit design data; however, in the low yield regime, the diagnosis data is usually not good enough to support such learning. In [2] and [1], the authors proposed static and dynamic N-cover algorithms, respectively, for failure data minimization. They model the N-cover requirement as a monotonically increasing function of an output's total failing frequency, obtained by repeating diagnosis on a small sample of failing chips. Once this function is available, they use it together with the total failing frequencies of the outputs to determine how many failing instances of each output to log. One concern with this method is that a single N-cover curve may not be applicable to all failing outputs of the circuit. Second, the method is highly dependent on the population of defects in the initial sample. Third, the fact that different chips fail differently makes the technique hard to generalize.
In another recent paper [5], the authors experimented with various metrics for the online selection of failure logs, some of which require additional metadata. Their best techniques may be computationally expensive and thus unsuitable for making log selection decisions in real time for large circuits. Another limitation of their work is that they evaluated their metrics only on circuits with single stuck-at faults.
In this paper, we evaluate three metrics that may be used to gauge the diagnostic usefulness of test patterns in real time. To reduce cost, we do not use any internal signals and rely only on the scan and primary outputs. In our experiments, using one of our proposed metrics, we improved the diagnosis quality for 29 and 25 benchmark circuits having multiple stuck-at and transition faults, respectively.
The paper is organized as follows. Section II explains the evaluated metrics in detail. Sections IV and V describe our evaluation framework and the results of our experiments, respectively. Finally, Section VI presents our conclusions.
THE EVALUATED METRICS
We evaluate and compare three different metrics on faulty circuits having one or more stuck-at and transition faults:
Total Number of Failing Outputs
When using this metric, we aim to maximize the total number of failing outputs across the logged failing responses. Figure 2 shows two output responses of a hypothetical circuit to failing test patterns p1 and p2. These responses r1 and r2 consist of the failing outputs {o3, o5, o6, o8, o10} and {o2, o5, o6}, respectively. The total number of failing outputs in these two responses is |r1| + |r2| = 5 + 3 = 8. The motivation for this metric is that a larger number of failing outputs offers a larger set of observation points.
Number of Unique Failing Outputs
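When using this metric, we aim to keep as many unique failing outputs in the log as possible. For the same responses r1 and r2, the number of unique failing outputs is the size of their union, i.e., |r1 ∪ r2| = |{o2, o3, o5, o6, o8, o10}| = 6. The motivation for this metric is that a larger number of unique failing outputs is likely to be explained by different internal signals and, in circuits with multiple faults, may correspond to different activated faults.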
Sum of Total and Unique Failing Outputs
This is a hybrid metric, where we try to maximize both the total and unique failures. For the above example responses r1 and r2, the sum of total and unique failing outputs is |r1| + |r2| + |r1 ∪ r2| = 5 + 3 + 6 = 14.
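The three metrics can be made concrete with a short sketch. It assumes, for illustration only, that each failing response is stored as the set of output names that mismatched the fault-free response; the function names are hypothetical. Run on the responses r1 and r2 from the Figure 2 example, it reproduces the values 8, 6 and 14 given above.

```python
from typing import List, Set

# Assumption: a failing response is modeled as the set of output names
# (scan cells or primary outputs) that mismatched the fault-free response.
Response = Set[str]


def total_failing_outputs(log: List[Response]) -> int:
    """Metric 1: sum of |r| over all logged responses."""
    return sum(len(r) for r in log)


def unique_failing_outputs(log: List[Response]) -> int:
    """Metric 2: size of the union of all logged responses."""
    return len(set().union(*log))


def total_plus_unique(log: List[Response]) -> int:
    """Metric 3: hybrid metric, the sum of metrics 1 and 2."""
    return total_failing_outputs(log) + unique_failing_outputs(log)


# Responses r1 and r2 from the Figure 2 example.
r1 = {"o3", "o5", "o6", "o8", "o10"}
r2 = {"o2", "o5", "o6"}
print(total_failing_outputs([r1, r2]))   # 8
print(unique_failing_outputs([r1, r2]))  # 6
print(total_plus_unique([r1, r2]))       # 14
```

Note that the overlap {o5, o6} counts twice in the total metric but only once in the unique metric, which is exactly the difference the two metrics are meant to capture.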
Algorithm for Log Selection
We use a greedy algorithm to select failing responses based on the metrics enumerated above. First, we let the log fill up with the initial failing responses. Then, for each new failing pattern, we compute how much the metric would improve if its response replaced each of the existing responses in turn. We replace the existing response whose replacement yields the maximum gain. If no replacement yields a gain, we discard the new response.
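The greedy replacement procedure can be sketched as follows. This is a minimal illustration, not the authors' implementation: responses are again assumed to be sets of failing output names, `select_log` and the toy response stream are hypothetical, and ties between equally good replacements are broken in favor of the earliest log entry, a detail the text does not specify.

```python
from typing import Callable, List, Optional, Sequence, Set

Response = Set[str]  # assumption: set of failing output names
Metric = Callable[[List[Response]], int]


def unique_failing_outputs(log: List[Response]) -> int:
    """Number of unique failing outputs in the log (size of the union)."""
    return len(set().union(*log))


def select_log(failing_responses: Sequence[Response], k: int,
               metric: Metric) -> List[Response]:
    """Greedy replacement that keeps at most k failing responses."""
    log: List[Response] = []
    for resp in failing_responses:
        if len(log) < k:
            # Let the log fill up with the initial failing responses.
            log.append(resp)
            continue
        current = metric(log)
        best_gain = 0
        best_idx: Optional[int] = None
        for i in range(len(log)):
            # Score the log obtained by replacing entry i with the new response.
            candidate = log[:i] + [resp] + log[i + 1:]
            gain = metric(candidate) - current
            if gain > best_gain:  # strict: ties keep the earliest index
                best_gain, best_idx = gain, i
        if best_idx is not None:
            log[best_idx] = resp
        # Otherwise there is no positive gain and the new response is discarded.
    return log


stream = [{"o1"}, {"o1"}, {"o2", "o3"}, {"o1"}]
selected = select_log(stream, k=2, metric=unique_failing_outputs)
print(unique_failing_outputs(selected))    # 3
print(unique_failing_outputs(stream[:2]))  # 1 (initial-failures baseline)
```

Each new failing pattern costs k metric evaluations, so the per-pattern work stays small for the log sizes considered here; an incremental metric update would be a natural optimization for larger logs.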
EVALUATION FRAMEWORK
We use 29 full scan circuits from the ISCAS'89 benchmarks and 8 full scan circuits from the Opencores library in the IWLS'05 benchmarks to test the effectiveness of the considered metrics.
Test Generation
We use a state-of-the-art commercial tool to generate two separate test sets for stuck-at and transition faults. For the transition faults, we generate double-capture (broadside) patterns. Table 1 lists the characteristics of the generated tests. Column 1 lists the benchmark names. Columns 2 and 3 list the number of generated tests and the fault coverage for stuck-at faults, respectively. Columns 4 and 5 list the same for the transition faults.
Generation of Faulty Circuits
For our evaluation experiments, we create two sets of faulty circuits for each of the stuck-at and transition fault models. The first set consists of 300 circuits, each with a single inserted fault. The second set also consists of 300 faulty circuits, except that each circuit has 2 to 7 inserted faults. Both the faults and the number of faults inserted in each circuit are chosen at random.
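A sketch of the random fault selection is shown below. It assumes a uniform choice of the fault count in the given range and uniform sampling of distinct faults from a fault list; the fault names, `make_faulty_instances` and the seeds are hypothetical. The sketch does not exclude combinations such as both polarities on the same net, since the text does not say whether such combinations were filtered.

```python
import random
from typing import List, Sequence


def make_faulty_instances(fault_list: Sequence[str], n_instances: int,
                          min_faults: int, max_faults: int,
                          seed: int = 0) -> List[List[str]]:
    """Draw n_instances fault sets; each has a random size in
    [min_faults, max_faults] and contains distinct faults."""
    rng = random.Random(seed)  # fixed seed so the experiment is reproducible
    pool = list(fault_list)
    instances: List[List[str]] = []
    for _ in range(n_instances):
        n = rng.randint(min_faults, max_faults)  # inclusive on both ends
        instances.append(rng.sample(pool, n))     # sampling without replacement
    return instances


# Hypothetical fault list: stuck-at-0/1 faults on 20 made-up nets.
faults = [f"n{i}/sa{v}" for i in range(20) for v in (0, 1)]
single = make_faulty_instances(faults, 300, 1, 1)
multi = make_faulty_instances(faults, 300, 2, 7, seed=1)
print(len(single), len(multi))  # 300 300
```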
Test Application and Response Collection
We apply the computed tests to the generated faulty circuits, using the commercial tool to compute the responses that make up the failure files. From each full failure file, we create two shortened failure files. The first file keeps the failing outputs of the initial k = 5 failing patterns. The second file stores the failing outputs of the k failing patterns selected using a proposed metric.
Diagnosis
We perform diagnosis on both the truncated and the selected failure files. A consistent improvement in accuracy and resolution across a large number of benchmarks would indicate that the metric is useful for improving diagnosis. In our comparison, we consider two measures.
Accuracy
We define accuracy as the number of actual (inserted) faults reported by the diagnosis tool among the suspects, divided by the total number of actual faults inserted in the circuit. For example, if 6 faults were inserted in a circuit and only 2 of them appear in the list of suspects, the diagnosis accuracy is 2/6 × 100% ≈ 33.33%.
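The accuracy definition translates directly into code. This sketch assumes that suspects and inserted faults share the same identifiers and ignores fault equivalence, which a real diagnosis flow would have to account for; the fault names and `diagnosis_accuracy` are hypothetical. It reproduces the 2-of-6 example above.

```python
from typing import Iterable


def diagnosis_accuracy(inserted: Iterable[str], suspects: Iterable[str]) -> float:
    """Percentage of inserted faults that appear among the reported suspects."""
    inserted_set = set(inserted)
    if not inserted_set:
        raise ValueError("at least one inserted fault is required")
    hits = inserted_set & set(suspects)  # inserted faults reported as suspects
    return 100.0 * len(hits) / len(inserted_set)


inserted = ["f1", "f2", "f3", "f4", "f5", "f6"]
suspects = ["f2", "f5", "x9"]  # x9 is a suspect that is not an inserted fault
print(f"{diagnosis_accuracy(inserted, suspects):.2f}%")  # 33.33%
```

Extra suspects such as x9 do not lower accuracy under this definition; they affect resolution, which is captured by the second measure instead.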
Average Inverse Hit Index
In the context of our experiments, the hit index is the in-
EXPERIMENT RESULTS
Tables 2 and 3 provide the results for about 300 faulty instances per benchmark having 2-7 stuck-at and transition faults, respectively. Column 1 of Table 2 lists the benchmark circuits. Columns 2 and 6 list the average accuracy and the average inverse hit index for the baseline, obtained using the 5 initial failing patterns (and all passing patterns from the test set). Columns 3, 4 and 5 give the diagnosis accuracy achieved when the log is optimized for total failing outputs, unique failing outputs, and both total and unique failing outputs, respectively, while columns 7, 8 and 9 give the average inverse hit index for these three cases.
As can be seen from the tables, optimizing the total number of failures did improve the accuracy and resolution for some benchmarks. However, for most benchmarks, diagnosis quality drops. In contrast, when we optimize the number of unique failing outputs, diagnosis quality improves for 29 out of 35 benchmarks with stuck-at faults and 25 out of 28 benchmarks with transition faults. Optimizing both is less effective, because diagnosis quality deteriorates for more benchmarks.
Fig. 4 depicts the improvement in diagnostic resolution achieved by the three metrics for circuits having single faults. For each metric, it shows the ratio of the number of circuits diagnosed better using the metric to the number of circuits diagnosed better using the initial failures. The most significant improvement in diagnostic resolution was achieved by optimizing the unique failing outputs, which yielded better diagnosis for 28× and 12× as many circuits having single stuck-at and transition faults, respectively.
CONCLUSION
In this paper, we evaluated three log selection metrics for improving VLSI diagnosis. Using the metric that optimizes the number of unique failing outputs in the failure log, we achieved a significant improvement in both accuracy and resolution for a majority of ISCAS'89 and IWLS'05 circuits having single and multiple stuck-at and transition faults. Increasing the number of unique failing outputs is effective because different failing outputs are likely to be explained by different internal signals. In circuits with multiple faults in particular, different unique failing outputs may correspond to different activated faults. This helps capture a greater number of fault symptoms, thereby improving accuracy.
REFERENCES
[1] S. Bodhe, M. E. Amyeen, C. Galendez, H. Mooers, I. Pomeranz, and S. Venkataraman. Reduction of diagnostic fail data volume and tester time using a dynamic n-cover algorithm. In 2016 IEEE 34th VLSI Test Symposium (VTS), pages 1-6. IEEE, 2016. 
