Abstract
Introduction
Semiconductor devices are becoming increasingly complex in terms of transistor count, frequency and integration. Emerging design styles coupled with aggressive design methods pose significant challenges for testing. These challenges include testing for manufacturing defects as well as speed binning of devices such as microprocessors. Functional tests are derived manually or through automated test generation techniques [1] [2] [3] in the design validation phase. They exercise the design and build confidence that the design matches its specification. In addition to design validation, functional tests can also be reused for manufacturing test. Studies have shown additional fallout from functional tests even for test sets with high structural coverage [4, 5].
The number of functional tests is normally large. Due to test time and tester memory limitations, only those tests that provide good manufacturing defect screening value are added to production test tape. The process of selecting a subset of tests from a pool of functional test sequences is called functional test selection.
Exact methods for test selection, such as fault simulation of the entire test suite, are not practical due to their computational cost. Even though it is preferable to use register-transfer-level (RTL) coverage metrics, existing RTL metrics either do not establish a correlation with gate-level fault models [6] [7] [8] [9] [10] or require expensive fault simulation [11] [12] [13] [14]. Current approaches to functional test selection for manufacturing test are ad hoc and often use structural coverage metrics such as toggle coverage [15], which gives suboptimal results. We propose a new RTL coverage metric that is simple yet effective and has low computational overhead. This coverage metric can be used in evaluating functional tests for high volume manufacturing (HVM) as well as in early testability analysis. Another recently published functional coverage metric [16] monitors events during logic simulation; however, defining events comprehensively for adequate coverage by automated means is an unsolved problem.
The rest of the paper is organized as follows. Section 2 covers the necessary background on the test selection problem. Section 3 introduces the proposed coverage metric. Sections 4 and 5 show test selection results for ISCAS89 benchmarks and industrial circuits respectively. Section 6 concludes the paper and identifies future extensions.
Background
We assume that a pool of tests is available for selection, where each test in the pool consists of a sequence of test vectors. As mentioned, test selection is the problem of selecting a subset from the pool of tests. This is done with the help of some coverage metric (say, M) using a two-step process [17]:
1. Evaluate the coverage of each test in the pool according to M.
2. Select the smallest number of tests that cover all faults covered by the whole set.
An optimum selection in the second step requires solving the set-cover problem, which is NP-hard. Hence, a greedy heuristic is used that iteratively selects the test with the highest incremental coverage. When no remaining test further improves coverage with respect to metric M, we say that test selection using M saturates. Beyond this point, M provides no further guidance in selecting tests. The outcome of the greedy approach is an ordered set of selected tests.
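The greedy step can be sketched as follows. This is a minimal illustration, not the authors' implementation. It assumes that each test's coverage under M has already been computed as a set of covered fault identifiers, and the function name `greedy_select` is hypothetical. The loop stops as soon as no remaining test adds new coverage, which is the saturation point of M.

```python
from typing import Dict, Hashable, List, Set

def greedy_select(coverage: Dict[str, Set[Hashable]]) -> List[str]:
    """Order tests by greedy incremental coverage until metric M saturates.

    `coverage` maps a (hypothetical) test name to the set of items it covers under M.
    """
    covered: Set[Hashable] = set()
    selected: List[str] = []
    remaining = dict(coverage)
    while remaining:
        # Pick the test adding the most new items; ties go to the lexicographically first name.
        best = max(sorted(remaining), key=lambda t: len(remaining[t] - covered))
        gain = remaining.pop(best) - covered
        if not gain:
            break  # no test improves coverage: selection using M has saturated
        covered |= gain
        selected.append(best)
    return selected
```

Consider the pool `{"t1": {1, 2, 3}, "t2": {3, 4}, "t3": {4, 5, 6, 7}, "t4": {1}}`. The sketch picks `t3` and then `t1`. After that, `t2` and `t4` add nothing, so selection saturates with two tests.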
The goodness of test selection can be measured along three dimensions: the quality of the selected tests, the time spent in test selection, and the number of selected tests. Quality would ideally be measured as actual defect coverage. For practical reasons, however, traditional measures such as gate-level stuck-at or transition fault coverage are used as proxies. Clearly, all three dimensions are important and can be traded off depending on the context. Test time and data volume are important concerns, but they are typically optimized on the tester as long as they fit within capacity targets. Generally, as long as the selection time is affordable, we prefer a metric that does not saturate too soon and provides the highest defect coverage.
In addition, because functional tests are available for RTL designs, the metric should preferably be evaluated at the RTL, so that test selection can take place before gate-level netlists are generated. Such an early coverage metric could also identify testability holes early in the design cycle.
RTL toggle coverage has been used as an approximation of gate-level fault coverage [18]. Because of its small overhead, it has also been used for test selection [17]. However, toggle coverage does not take propagation into account, and its correlation with stuck-at and transition fault models is limited. When used in test selection, it saturates too early and results in low coverage. VVG [12] extends the toggle metric to include propagation but requires RTL fault simulation, which is expensive. Later works extend the fault model by considering not only RTL line stuck-at faults but also condition stuck faults [13], and additional stuck-at faults inside blocks whose structures are unknown at the RTL [14]. In this paper, we concentrate on an efficient technique with the same order of complexity as logic simulation. The more expensive metrics above could be used when greater accuracy is required.
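For contrast with the metric proposed next, the following sketch computes toggle coverage from a simulation trace. It assumes a hypothetical trace format: a list of per-time-frame dictionaries mapping bit names to 0/1 values. A toggle fault counts as covered once the bit is seen rising or falling. Nothing here checks whether the transition affects any other bit; this is the missing propagation aspect discussed above.

```python
from typing import Dict, List, Set, Tuple

Trace = List[Dict[str, int]]  # hypothetical format: one {bit: value} dict per time frame

def toggle_faults_covered(trace: Trace) -> Set[Tuple[str, str]]:
    """Return the (bit, 'rise'|'fall') toggle faults exercised by the trace."""
    covered: Set[Tuple[str, str]] = set()
    for prev, curr in zip(trace, trace[1:]):
        for bit, value in curr.items():
            if bit in prev and prev[bit] != value:
                covered.add((bit, "rise" if value == 1 else "fall"))
    return covered

def toggle_coverage(trace: Trace) -> float:
    """Fraction of toggle faults covered; each traced bit contributes two faults."""
    bits = set(trace[0]) if trace else set()
    if not bits:
        return 0.0
    return len(toggle_faults_covered(trace)) / (2 * len(bits))
```

Applying the sketch to the single-bit sequence 011110 on an input A covers both toggle faults on A. The full toggle coverage is reached without saying anything about what A's transitions influence.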
The proposed metric
Our proposed metric also extends the toggle metric at the RTL. However, in contrast to VVG, we do not introduce additional variables, and we add only partial observability. Further, unlike VVG, we preserve the transition aspect of the toggle, so that a single measure can capture both static and dynamic faults. Guided by these considerations, we formally define our extension next.
TRIO Metric
Definition 1: The Input/Output TRansition (TRIO) fault model is defined with respect to a subset S of the RTL variables of an RTL module. S consists of the primary inputs, primary outputs, and state variables (i.e., registers and latches) of the module. A TRIO fault is a pair (<V_i, T_i>, <V_j, T_j>), where:
- V_i is one bit of a primary input or a (current) state variable.
- V_j is one bit of a primary output or a (next) state variable.
- T_i is a rising or falling transition on V_i.
- T_j is a rising or falling transition on V_j.
Further, there exists a combinational path from V_i to V_j with the correct polarity, so that transition T_i on V_i can cause transition T_j on V_j. In the TRIO model, we ignore clock signals.
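A structural over-approximation of the TRIO fault list can be enumerated from combinational connectivity alone, as sketched below. The sketch makes two assumptions. First, a hypothetical `support` map gives, for each destination bit V_j, the source bits V_i in its combinational support. Second, an optional `polarity` annotation (`"positive"`, `"negative"` or `"binate"`) is given per (V_i, V_j) pair. A positive-unate path admits only same-direction transition pairs, and a negative-unate path admits only opposite-direction pairs. Functional constraints, which would rule out some pairs, are not captured. The result may therefore include pairs that no input sequence can produce.

```python
from itertools import product
from typing import Dict, Set, Tuple

RISE, FALL = "rise", "fall"
Event = Tuple[str, str]        # (bit, RISE or FALL)
Fault = Tuple[Event, Event]    # (<V_i, T_i>, <V_j, T_j>)

def candidate_trio_faults(support: Dict[str, Set[str]],
                          polarity: Dict[Tuple[str, str], str]) -> Set[Fault]:
    """Enumerate TRIO fault candidates allowed by connectivity and (assumed) polarity."""
    faults: Set[Fault] = set()
    for vj, sources in support.items():
        for vi in sources:
            pol = polarity.get((vi, vj), "binate")  # unknown polarity: allow all four pairs
            for ti, tj in product((RISE, FALL), repeat=2):
                same_direction = ti == tj
                if pol == "positive" and not same_direction:
                    continue
                if pol == "negative" and same_direction:
                    continue
                faults.add(((vi, ti), (vj, tj)))
    return faults
```

For an output Y that is positive unate in A and negative unate in B, the sketch yields four candidates. An input with unknown (binate) polarity contributes all four transition pairs.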
The TRIO faults can be represented in graphical form. In this graph, a node represents a bit with transition <V_i, T_i>. An edge from node <V_i, T_i> to <V_j, T_j> means that transition T_i on V_i could cause transition T_j on V_j. Each edge in the graph represents a TRIO fault. Consider an example circuit in which input A controls a counter. From the function of this circuit, we know that a rising transition on A will increment the counter state but a falling transition will not. Similarly, two successive 1s on A will cause successive increments of the counter state. From this analysis we can obtain all possible TRIO faults in this circuit.
The TRIO graph for the example circuit is shown in Figure 2. There are ten TRIO faults in this circuit, represented by the ten edges in the TRIO graph. The TRIO graph in Figure 2 reflects the functional constraints of the circuit. For actual circuits, we may not know, or be able to derive, all functional constraints. Fortunately, for test selection it is sufficient to determine the absolute coverage (the number of covered TRIO faults) of the functional tests being compared, which does not require explicit construction of the TRIO graph. The absolute coverage of a test can be determined by analyzing the results of its functional simulation. This analysis also requires knowing which bits are directly connected through a combinational path, and that information can be generated efficiently from the parse tree built by an RTL compiler.
Now consider the coverage of TRIO faults by a test. For the example circuit, assume that the initial counter state is zero when the test sequence 011110 is applied to input A. The result of simulating this test on the circuit is shown in Table 1. There is a rising transition T_a on signal A from time frame 0 to 1, accompanied by a rising transition T_n0 on signal N[0]. A is the only signal in the support set of N[0] that has changed during this cycle, so transition T_n0 must be caused by transition T_a. In this case, the cause-effect relationship between <A, ↑> and <N[0], ↑> is trivial, and the TRIO fault (<A, ↑>, <N[0], ↑>) is covered.
In general, the cause-effect relationship is harder to deduce. Transition T_j may depend on multiple input transitions as well as on bits that remain stable. Because TRIO is intended to be a fault model at the RTL, the definition of TRIO coverage needs to be based on the function, not the structure, of the circuit. Accordingly, an exact cause-effect relationship would be defined from a subset of inputs to an output. Every subset of changed inputs for which the Boolean difference of the function of V_j with respect to that subset is true would be a cause of the change on V_j. However, with this definition both the fault list and the fault-evaluation time would be exponential. On the other hand, crediting only single-input changes in V_j's fanin cone is unduly pessimistic, because it rules out T_i causing the change in combination with other changing bits. In the current version of TRIO, we take an optimistic interpretation by simply checking that V_i is in the support set of the function of V_j:
Definition 2: A TRIO fault (<V_i, T_i>, <V_j, T_j>) is covered if all three of the following hold:
- Transitions T_i and T_j both appear in the simulation trace.
- T_j occurs one cycle later than T_i if V_j corresponds to a state variable; otherwise, the two transitions occur in the same time frame.
- V_i is in the support set of V_j.
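The sketch below compares the three interpretations of the cause-effect relationship discussed before Definition 2 on a single cycle. It is illustrative only: the Boolean function `f`, the helper names, and the reading of the exact criterion are our assumptions. For the exact criterion, we credit the inputs of every minimal subset of changed inputs whose joint flip changes the output. Minimality is an added assumption, so that an input is not credited merely for changing alongside another input that alone explains the change. The subset enumeration is exponential in the number of changed inputs, which is the cost referred to above.

```python
from itertools import combinations
from typing import Callable, Dict, FrozenSet, Iterable, List, Set

Assignment = Dict[str, int]            # bit name -> 0/1
BoolFn = Callable[[Assignment], int]   # function computing V_j from its inputs

def flipped(x: Assignment, bits: Iterable[str]) -> Assignment:
    """Copy of x with the given bits inverted."""
    y = dict(x)
    for b in bits:
        y[b] ^= 1
    return y

def credited_inputs(f: BoolFn, support: Set[str],
                    prev: Assignment, curr: Assignment) -> Dict[str, Set[str]]:
    """Changed inputs credited with V_j's transition under three criteria."""
    result: Dict[str, Set[str]] = {"pessimistic": set(), "optimistic": set(), "exact": set()}
    if f(prev) == f(curr):
        return result  # no transition on V_j: nothing to credit
    changed = sorted(b for b in support if prev[b] != curr[b])
    if len(changed) == 1:                  # pessimistic: single-input changes only
        result["pessimistic"] = set(changed)
    result["optimistic"] = set(changed)    # TRIO: every changed bit in the support set
    causes: List[FrozenSet[str]] = []      # exact (our reading): minimal causing subsets
    for k in range(1, len(changed) + 1):   # exponential in len(changed)
        for subset in combinations(changed, k):
            s = frozenset(subset)
            if any(c <= s for c in causes):
                continue                   # contains a smaller cause, so not minimal
            if f(flipped(prev, s)) != f(prev):
                causes.append(s)
    for c in causes:
        result["exact"] |= c
    return result
```

Take f = A AND B with both inputs rising. The pessimistic rule credits neither input, whereas the exact and optimistic rules credit both. Now take f = (A AND C) OR B with C = 0, and A and B both rising. Only B is an exact cause, but the optimistic TRIO rule also credits A.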
Considering the simulation trace in Table 1, from time frame 2 [...]
The definition of the TRIO fault model was guided by considerations that strike a balance between our desire for a functional metric at the RTL and the need for accuracy and computational feasibility. Compared with the computationally efficient toggle fault model, TRIO is stricter: it requires not only that V_i toggles but also that the toggle propagates to V_j. This addresses the problem of early saturation with toggle coverage. For example, the input sequence in Table 1 already covers all the toggle faults in the circuit, so toggle-based selection would gain nothing from additional tests on this circuit. TRIO does not require propagation all the way to primary outputs. That requirement would be equivalent to defining an RTL stuck-at fault model and would need expensive fault simulation [12]. Implementation independence guided us in restricting the TRIO definition to bits corresponding to primary inputs, primary outputs, and registers. At the same time, we require signal sensitization paths from every input of a combinational block to all reachable outputs, with the expectation of covering a large fraction of lines in any structural implementation. The TRIO model also ensures that faults on these lines are further propagated to the block outputs. This results in a better correlation of the TRIO metric with structural models.
Two other fault models in the literature are superficially similar to TRIO. The double-transition fault (DTF) model [19] approximates path-delay faults by transitions on all pairs <g_1, g_2> of connected gates in the circuit. It requires robust propagation of the transition from g_1 to g_2 and from g_2 to a primary output. These requirements, together with the huge fault list, limit the DTF to the gate level and to implicit evaluation of coverage.
The coupling fault (CF) model [20] is also defined by an input/output pair. However, detecting a CF requires applying all vectors that satisfy the Boolean difference of the output with respect to the input; this set is called the coupling test set (CTS). The CF model is extended to cover delay faults by requiring that all adjacent pairs of vectors in the CTS be applied. These pairs correspond to the subset of all single-input change (SIC) pairs in the CTS that yield different outputs. The twin requirements of SIC and all pairs were shown to be useful in generating realization-independent robust path-delay tests, but they are unduly pessimistic for coverage evaluation.
Evaluation of TRIO Metric
TRIO-coverage evaluation could either be integrated tightly with or carried out after logic simulation. The second option may be slower but was preferred in our work because of its ease of implementation and independence from logic simulation. During logic simulation we capture the simulation trace on the bits of interest and post-process it to get the TRIO fault coverage. The latter involves determining at each signal-change step whether the associated TRIO fault is covered according to the cause-effect relationship described above. The total time for TRIO-based fault evaluation is the sum of the time for logic simulation and post-processing.
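The post-processing step can be sketched as a single pass over the captured trace that applies Definition 2. The sketch relies on three hypothetical inputs:
- a trace, given as a list of per-time-frame dictionaries mapping bit names to 0/1;
- a `support` map from each V_j to its combinational support;
- a set `state_vars` of destination bits that are registers.

The toy circuit in the demo (Y = A AND B, and a register Q loaded from A each cycle) is our own example, not the circuit of Table 1.

```python
from typing import Dict, List, Optional, Set, Tuple

RISE, FALL = "rise", "fall"
Trace = List[Dict[str, int]]   # hypothetical: one {bit: 0/1} dict per time frame
Event = Tuple[str, str]        # (bit, RISE or FALL)
Fault = Tuple[Event, Event]    # (<V_i, T_i>, <V_j, T_j>)

def transition(trace: Trace, t: int, bit: str) -> Optional[str]:
    """Transition on `bit` between frames t-1 and t, or None if it is stable."""
    if t < 1 or t >= len(trace):
        return None
    before, after = trace[t - 1][bit], trace[t][bit]
    if before == after:
        return None
    return RISE if after == 1 else FALL

def trio_covered(trace: Trace, support: Dict[str, Set[str]],
                 state_vars: Set[str]) -> Set[Fault]:
    """Collect the TRIO faults covered by the trace under Definition 2."""
    covered: Set[Fault] = set()
    for t in range(1, len(trace)):
        for vj, sources in support.items():
            tj = transition(trace, t, vj)
            if tj is None:
                continue
            # A register changes one cycle after its cause; an output changes in the same frame.
            t_cause = t - 1 if vj in state_vars else t
            for vi in sources:
                ti = transition(trace, t_cause, vi)
                if ti is not None:  # optimistic: any changing bit in the support is credited
                    covered.add(((vi, ti), (vj, tj)))
    return covered

if __name__ == "__main__":
    # Toy circuit (our own example, not Table 1): Y = A AND B; register Q loads A each cycle.
    rows = [(0, 1, 0, 0), (1, 1, 1, 0), (1, 1, 1, 1), (1, 0, 0, 1), (0, 0, 0, 1), (0, 0, 0, 0)]
    trace = [dict(zip("ABYQ", r)) for r in rows]
    support = {"Y": {"A", "B"}, "Q": {"A"}}
    for fault in sorted(trio_covered(trace, support, {"Q"})):
        print(fault)
```

On this trace the sketch finds four covered TRIO faults. The falling transition on A between frames 3 and 4 is credited to Q, which falls one cycle later. It is not credited to Y, because Y does not change in that frame.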
An Extension of TRIO
For comparison, we implement an extended version of TRIO, called E_TRIO, which employs a stricter cause-effect relationship in its fault definition and includes observability to a primary output. The following steps summarize the E_TRIO implementation.
1. From the circuit description, obtain the list of E_TRIO faults, which is identical to the list of TRIO faults.
2. For each E_TRIO fault (<V_i, T_i>, <V_j, T_j>), inject a transition fault at V_i according to the direction of T_i.
3. Using V_j as the observation point, perform transition-fault simulation for the injected transition fault at V_i, and record the cycles at which this fault is detected at V_j. This step gathers information about E_TRIO fault excitation.
4. Inject a transition fault at V_j in each cycle in which the transition fault on V_i was detected at V_j. Use a transition-fault simulator to determine whether the newly added transition fault is detectable at an observable output. This step gathers information about E_TRIO fault propagation.
5. Combine the results of E_TRIO fault excitation and propagation to determine whether the E_TRIO fault is detected.
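The control flow of steps 2 to 5 is sketched below. The transition-fault simulator is abstracted by two hypothetical callbacks:
- `excite(src, observe_bit)` returns the cycles at which a transition fault at the source event is detected at `observe_bit` (steps 2-3).
- `propagate(dst, cycle)` reports whether a transition fault injected at the destination event in that cycle reaches an observable output (step 4).

Neither callback corresponds to a real tool's API. They stand in for the fault-simulation runs whose cost limits E_TRIO to small circuits. Injecting at V_j in the direction of T_j is our assumption.

```python
from typing import Callable, Iterable, Set, Tuple

Event = Tuple[str, str]        # (bit, "rise" | "fall")
Fault = Tuple[Event, Event]    # (<V_i, T_i>, <V_j, T_j>)
ExciteFn = Callable[[Event, str], Set[int]]   # hypothetical simulator hook (steps 2-3)
PropagateFn = Callable[[Event, int], bool]    # hypothetical simulator hook (step 4)

def etrio_detected(fault: Fault, excite: ExciteFn, propagate: PropagateFn) -> bool:
    src, dst = fault
    # Steps 2-3: inject at V_i (direction of T_i), observe at V_j, record excitation cycles.
    cycles = excite(src, dst[0])
    # Steps 4-5: detected if, in some excitation cycle, a transition fault injected
    # at V_j (assumed direction of T_j) reaches an observable output.
    return any(propagate(dst, c) for c in sorted(cycles))

def etrio_coverage(faults: Iterable[Fault], excite: ExciteFn,
                   propagate: PropagateFn) -> float:
    """Fraction of the given E_TRIO faults detected."""
    fault_list = list(faults)
    if not fault_list:
        return 0.0
    hits = sum(etrio_detected(f, excite, propagate) for f in fault_list)
    return hits / len(fault_list)
```

Each call to `excite` or `propagate` corresponds to a fault-simulation run. This is why the cost grows with the size of the fault list and the number of excitation cycles.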
Because E_TRIO evaluation is expensive, it is feasible only for small circuits.
Experiments on ISCAS89 benchmarks
No pool of functional tests is available for the ISCAS89 sequential benchmark circuits. We therefore generated a test set for each circuit using sequential ATPG [21] and augmented it with random tests. We used the test generation tool repeatedly, targeting a single stuck-at fault each time, to generate a set of short tests instead of a single long one. Each test in our set starts from an unknown state, which increases the test generation difficulty. As a result, the stuck-at fault coverage of our test set is not as high as that reported using a single test [21]. This was done to generate validation-like, independent test sequences. For TRIO evaluation, every circuit was considered as a single block with the state and I/O signals visible. For toggle coverage, we could have used the same signals. We chose gate-level toggle coverage instead, so as to compare TRIO against a stronger baseline that saturates later and correlates better with gate-level fault models.
Test selection was carried out for the toggle, TRIO and E_TRIO metrics using the two-step process described in Section 2. In addition, to compare the quality of the selected tests, we also performed test selection using the reference metrics, i.e., the gate-level stuck-at and transition fault models. Tests selected using each of these five metrics were then evaluated for both their gate-level stuck-at and transition fault coverage. We did not include large circuits because of the E_TRIO evaluation cost. Table 2 summarizes the results. The details of the test pools are shown in columns 2 to 4. Column 2 is the number of tests in each test pool, and columns 3 and 4 show the total stuck-at and transition coverage, respectively, of each test pool. The goal of test selection is to maximize fault coverage on standard HVM fault models (e.g., stuck-at and transition). Even so, some coverage loss can be expected when the selection metric differs from the standard model used as the reference. Columns 5, 6 and 7 show the stuck-at coverage loss in test selection for the toggle, TRIO and E_TRIO metrics, respectively. Similarly, columns 9-12 display the loss in transition fault coverage for the toggle, TRIO, E_TRIO and stuck-at metrics, respectively. In all cases, TRIO achieved higher coverage (both stuck-at and transition) than toggle and, in many cases, TRIO did as well as E_TRIO. Table 3 shows the number of tests selected by each metric. Because of early saturation, the toggle metric selects the fewest tests in all cases. For example, for s298, toggle selects only three tests. Both TRIO and E_TRIO select a number of tests closer to that selected by the reference (stuck-at) metric.
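Evaluating tests selected by one metric under a reference fault model amounts to the bookkeeping sketched below. The sketch makes two assumptions. First, each test's detected reference faults (e.g., gate-level stuck-at faults) are available as sets of fault identifiers. Second, coverage loss is measured in percentage points relative to the whole pool. The function names and this definition of loss are our assumptions, not necessarily how Table 2 was computed.

```python
from typing import Dict, List, Set

def cumulative_coverage(order: List[str], detected: Dict[str, Set[int]],
                        total_faults: int) -> List[float]:
    """Reference-model coverage (%) after each test in the selection order."""
    seen: Set[int] = set()
    curve: List[float] = []
    for test in order:
        seen |= detected[test]
        curve.append(100.0 * len(seen) / total_faults)
    return curve

def coverage_loss(order: List[str], detected: Dict[str, Set[int]],
                  total_faults: int) -> float:
    """Percentage points of reference coverage lost versus keeping the whole pool."""
    pool = set().union(*detected.values())
    selected = set().union(*(detected[t] for t in order))
    return 100.0 * (len(pool) - len(selected)) / total_faults
```

Suppose a metric selects `["t2", "t1"]` from a pool `{"t1": {1, 2}, "t2": {2, 3, 4}, "t3": {5}}` with 10 reference faults. The cumulative coverage is `[30.0, 40.0]` and the loss is `10.0` points. The cumulative list is the kind of curve plotted for the industrial circuits in Figures 3 and 4.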
Experiments on industrial circuits
TRIO was also evaluated against toggle for test selection on two real industrial circuits, called F and S. These circuits are data path blocks in the execution cluster of an x86 CPU design. The whole cluster is about 20 times the size of block S. Table 4 shows the number of collapsed stuck-at faults for each circuit.
The test pool consisted of 1,010 functional tests for the whole cluster, derived from micro-architectural and system-level validation. The length of each test ranges from 100K to a few million vectors. Many of these tests have good fault coverage in some portion of the cluster. The goal of the experiment was to see whether the TRIO metric could effectively mine this test pool and select a subset of tests with only a small coverage loss for faults in these blocks. The computational cost of gate-level fault simulation was quite high (see Table 5), but we again wanted to include the test-selection results for stuck-at and transition faults as a reference. However, E_TRIO runs could not be completed on these circuits because of excessive run time. Both logic and fault simulation runs were performed at the cluster level on dual-core Pentium machines running Linux. Figures 3 and 4 show the cumulative stuck-at and transition fault coverage of tests selected by the stuck-at, transition, TRIO and toggle metrics. Note that the set of tests selected by a particular metric is the same in both figures; only the evaluation criterion (stuck-at vs. transition fault coverage) changes. The results are consistent with those observed on the ISCAS89 circuits. TRIO-based test selection achieves higher stuck-at and transition fault coverage than toggle-based test selection. Premature saturation of the toggle metric is again apparent on both circuits. Furthermore, for any given number of tests, TRIO-selected tests consistently have higher coverage than toggle-selected tests. TRIO-based selection achieves most of the stuck-at and transition fault coverage, whereas stuck-at-based selection on circuit S fails to reach high transition fault coverage. The results on both circuits highlight the effectiveness of TRIO over toggle. Further, TRIO is almost as effective as stuck-at (transition) for selecting tests with high transition (stuck-at) coverage.
Table 5 shows the average and maximum time to evaluate functional tests. Since the simulation is performed at the cluster level, logic simulation time is about the same for both blocks; the difference is caused by tracing and dumping different sets of signals. The TRIO metric involves logic simulation and post-processing of the simulation trace on RTL signals. Transition fault simulation run times are similar to those of stuck-at fault simulation. The toggle simulation time is roughly the same as that of logic simulation. As can be seen, TRIO evaluation has very small computational overhead and can be folded into the RTL simulation already performed as part of design validation. On the other hand, the fault-simulation time for block S alone is already high, so fault simulating the whole cluster or circuit would be impractical.
Conclusion and future work
We have described an efficient RTL coverage metric, TRIO, and shown its effectiveness for solving a practical problem of functional test selection for high volume manufacturing. The proposed metric has very small computational overhead and it is easy to incorporate into existing RTL simulation flows used in design validation. Results on both ISCAS89 and industrial circuits show that it can be used to effectively mine validation tests for HVM.
We are investigating ways to improve the accuracy of the TRIO metric without substantially increasing the cost of evaluation. We may be able to improve the criterion for fault excitation when multiple inputs in the support set of an output have transitions. First, we could implement a linear-time cause-effect analysis that is more sophisticated than the support-set check used in TRIO. Second, we could assign weights, based on functional characteristics, to edges in the TRIO graph and estimate TRIO coverage as a weighted sum. We may also be able to use inexpensive graph-based measures to add fault propagation to TRIO at a cost well below that of E_TRIO. We also plan to study the relationship of TRIO with other delay fault models, such as the robust path-delay fault model. Finally, we would like to explore other applications of TRIO, including early testability analysis.
