Experimental results show that BIST TPG's based on input reduction achieve complete stuck-at fault coverage in practical test lengths (at most $2^{30}$ vectors) for many benchmark circuits. This coverage is achieved with low area overhead and a low performance penalty for the circuit under test. Results also show that the memory storage and test application time for external testing using deterministic test sets can be reduced by as much as 85%.
can be easily identified by traversing the circuit structure in polynomial time, identification of compatible inputs requires complex analysis of circuit logic and detectabilities of all target faults. This paper details the mathematical and graph models used to represent input compatibility and a branch-and-bound framework to solve the input reduction problem. Discussion in this paper is restricted to input reduction for single stuck-at faults in combinational blocks of sequential circuits, assumed to be accessed via BIST or full scan circuitry. Input reduction for other types of faults, e.g., transition delay faults and path delay faults, can be found in [8] .
Test signals derived by input reduction can be used to design efficient test pattern generators (TPG's) for built-in self-test (BIST) [1], [5]. The efficiency of a BIST TPG is normally evaluated by the test length required to achieve the target fault coverage, its area overhead, and the performance degradation it causes. Formation of the test signals guarantees that complete stuck-at fault coverage is achieved in $2^m$ test vectors, where $m$ is the number of test signals. (In the following, the term complete stuck-at fault coverage is used to describe the coverage of all detectable stuck-at faults in the given circuit.) In verification testing, the number of inputs in the largest cone is the lower bound on the number of test signals. Our technique, however, does not have such a restriction because compatible inputs may belong to the same cone. Experimental results show that the number of test signals derived by input reduction is below 30 for a wide range of circuits, including many circuits which are impractical to test using verification testing (because one or more cones are too large). At prevalent clock rates, the corresponding test length requires a few seconds of test application time, and hence is very practical. The proposed BIST TPG's can be implemented based on the built-in logic block observation (BILBO) [30] or the scan-based BIST [5] architecture, both of which use maximal length linear feedback shift registers (LFSR's) [11] as random pattern generators. These architectures have acceptable area overhead for large circuits. By carefully designing the LFSR, the performance degradation incurred by the BIST circuit can be as low as that incurred by conventional scan design.
Test signals can also be used to configure multiple-scan chains and to derive compact test sets for external testing using automatic test equipment (ATE). To reduce test application time, a long scan chain is often broken into multiple-scan chains, each with dedicated scan-in and scan-out pins [16]. Test signals can be used to design multiple-scan chains that are driven by a single scan-in pin. This is achieved by reconfiguring the multiple-scan chains such that circuit inputs fed with identical values are compatible. ATE memory required to store test vectors can also be reduced because only values for the test signals (instead of the circuit inputs) need to be stored. Experimental results show that a 50-85% reduction in test data volume and test application time can be achieved by using the proposed multiple-scan design. Even for a single-scan design, test signals can be used to reduce test data volume without a significant increase in test application time.
Many techniques have been proposed in the literature to design BIST TPG's for stuck-at faults. Exhaustive testing by applying all $2^n$ possible test vectors for an $n$-input circuit is impractical for circuits with a large number of inputs. In pseudoexhaustive testing [3], [15], [25], all logic feeding each circuit cone is tested exhaustively. If the largest cone size is large, the test length to guarantee complete fault coverage is also impractical. Test points may be inserted to partition the circuit to reduce the largest cone size (and hence the test length). This introduces significant area overhead, and may deteriorate circuit performance during normal operation. Pseudorandom testing uses simple LFSR's to generate pseudorandom sequences, but may require impractical test length to achieve acceptable fault coverage for random-pattern resistant circuits. LFSR's with multiple seeds or reconfigurable feedback polynomials [24] can reduce test length, but result in complicated TPG designs and BIST control circuits. In weighted random testing (e.g., [18]), weight circuits are used to bias a pseudorandom sequence using precomputed weight sets to reduce test length. This comes at the cost of the high area overhead required to implement the weight and control circuits, which also degrade normal circuit performance.
The main objective of ATPG algorithms is to generate tests for all target faults in the least amount of time. Obtaining a compact test set has been considered as a secondary objective by many algorithms (see [1] for a discussion on techniques for test set compaction). COMPACTEST [17] employs various compaction techniques, including reverse order fault simulation [22], the use of independent fault sets [4] for fault ordering, and dynamic line justification, and can generate the smallest test sets found in the literature for ISCAS85 [7] and ISCAS89 [6] benchmark circuits. COMPACTEST is used (by the courtesy of the tool's developers) in this paper to analyze further compaction that can be obtained using input reduction.

This paper is organized as follows. Basic concepts and formal definitions are presented in Section II. General applications of input reduction are described in Section III. A branch-and-bound framework to solve the input reduction problem is presented in Section IV. Experimental results are presented in Section V, including a comparison of our technique with existing BIST schemes. Finally, the conclusions are presented in Section VI.
II. NEW CONCEPTS
Input reduction is based on new concepts of compatibility and inverse compatibility, which can be used to identify circuit inputs that can be combined into test signals without reducing fault coverage. In this section, a simple example is used to illustrate the basic approach, followed by formal definitions of compatibility and inverse compatibility. By allowing inverted connections, the number of possible ways to combine circuit inputs into test signals increases significantly [8] .
A. A Simple Example
We will use c17, the smallest ISCAS85 benchmark circuit [7] , to illustrate the key advantages of input reduction over other techniques, such as verification testing. In the following TPG design, an LFSR with primitive feedback polynomial is used to generate a maximal length sequence. If the all-zero pattern is required, then a complete LFSR (also called a de Bruijn counter [5] ) must be used.
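The difference between a maximal length LFSR and a complete LFSR can be illustrated with a small software model. The sketch below is an assumption-laden illustration rather than the TPG hardware used in the paper: the function name, the shift direction, and the tap positions (which correspond to the primitive polynomials $x^4 + x^3 + 1$ and $x^2 + x + 1$) are chosen here for convenience. The complete variant uses the common de Bruijn modification of XORing the feedback with the NOR of the low-order stages.

```python
def lfsr_cycle(n, taps, seed=1, complete=False):
    """Return the list of states visited before the seed repeats.

    n        -- register width in bits
    taps     -- bit positions (0 = LSB) XORed to form the feedback bit
    complete -- if True, apply the de Bruijn modification so that the
                all-zero state is also visited
    """
    mask = (1 << n) - 1
    low_mask = mask >> 1  # selects the n-1 low-order bits
    state, seen = seed, []
    while True:
        seen.append(state)
        fb = 0
        for t in taps:
            fb ^= (state >> t) & 1
        if complete and (state & low_mask) == 0:
            fb ^= 1  # the NOR of the low n-1 bits is 1 in this case
        state = ((state << 1) | fb) & mask
        if state == seed:
            return seen


if __name__ == "__main__":
    print(len(lfsr_cycle(4, (3, 2))))                 # maximal length
    print(len(lfsr_cycle(4, (3, 2), complete=True)))  # includes all-zero
```

The plain LFSR visits every nonzero state once per period, while the complete variant inserts the all-zero state into the same cycle at the cost of a small amount of extra logic.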
Example 1: The circuit c17 (shown in Fig. 1 ) has five inputs and two outputs, and the size of its largest cone is 4. The dependence matrix [15] for c17 is shown in Fig. 2(a) . Outputs and depend on inputs -and inputs -, respectively. Based on verification testing [15] , inputs and can be combined into a test signal and connected to the same TPG stage. [ Fig. 2 Fig. 2(b) , can be used to test the circuit pseudoexhaustively with test vectors. A complete test set derived using the input reduction procedure (described later in Procedure 1) for all stuck-at faults in c17 is shown in Fig. 2(c) . Note that columns 2 and 3 of are identical to columns 4 and 5, respectively. In addition, columns 1 and 2 of are complements of each other. This implies that all stuck-at faults in c17 can be detected using a 2-bit complete LFSR with outputs , that are connected to the circuit inputs as , and , as shown in Fig. 2(d) . (A complete LFSR must be used in this case because the all-zero pattern is necessary to detect some faults. But, in general, this is rare, and a simple LFSR can be used in most cases.) The 2-bit complete LFSR can guarantee the detection of all stuck-at faults in c17 with only test vectors. Note that the complete test set used above to obtain an optimal TPG is algorithmically derived using the input reduction procedure (to be presented later), and is used here to illustrate how circuit inputs can be combined into test signals without reducing fault coverage. If an arbitrary complete test set is chosen, it is very unlikely to obtain such reduction simply by comparing each pair of input values along all test patterns in the test set. This is due to the fact that even if some test patterns in a complete test set have different values for a pair of inputs, it may still be possible to combine the two inputs into a test signal by replacing these patterns by others that have identical values for these inputs.
Based on the complete test set derived by input reduction, inputs , and (as well as and ) can be combined into a test signal, even though they belong to the same circuit cone. Inputs and (also, and ) that can be combined into a test signal are said to be compatible, while inputs and (also, and ) that can be combined into a test signal via an inverter are said to be inversely compatible. The main focus of this paper is on the design of efficient techniques to identify compatible and inversely compatible inputs that can be combined into test signals. It can be shown that nonadjacent inputs are both compatible and inversely compatible [8]. Hence, the proposed definitions of compatibility are more general than that of nonadjacency and help design more compact TPG's.
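The way two test signals drive five circuit inputs can be made concrete with a short sketch. The mapping below follows the column relations stated in Example 1 (columns 2 and 3 identical to columns 4 and 5, and columns 1 and 2 complementary); the numbering of the two test signals and all identifiers are assumptions made for illustration.

```python
from itertools import product

# (test-signal index, inverted?) for circuit inputs 1..5. Which LFSR
# stage is called signal 0 or signal 1 is an arbitrary assumption.
C17_MAP = [(0, True), (0, False), (1, False), (0, False), (1, False)]


def expand(signals, mapping):
    """Expand a tuple of test-signal values into a full input vector."""
    return tuple(signals[k] ^ int(inv) for k, inv in mapping)


def all_vectors(num_signals, mapping):
    """All input vectors reachable by exhausting the test signals."""
    return [expand(s, mapping) for s in product((0, 1), repeat=num_signals)]


if __name__ == "__main__":
    for vec in all_vectors(2, C17_MAP):
        print(vec)
```

Exhausting the two test signals yields only four distinct input vectors, which is the test length quoted for the 2-bit complete LFSR.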
B. Compatibility and Inverse Compatibility
One goal of BIST TPG design is to minimize the test length required to achieve target fault coverage. For a circuit with $n$ inputs which can be combined into $m$ test signals, the test length to achieve complete fault coverage is reduced from $2^n$ to $2^m$. The objective of this paper is therefore to solve the following problem.
Definition 4 (Input Reduction Problem):
The problem of finding the minimum number of test signals based on the notions of compatibility and inverse compatibility is called the input reduction problem (IRP).
III. APPLICATIONS OF INPUT REDUCTION
Test signals derived by input reduction can be configured as a BIST TPG based on the BILBO or the scan-based BIST architecture, or as multiple-scan chains for external testing using ATE. They can also be used to generate compact test sets as described in this section.
A. BIST TPG Design

1) BILBO:
Test signals derived by input reduction can be configured as a BILBO register by: 1) replacing normal flip-flops by scan flip-flops; 2) configuring one flip-flop per test signal as a maximal length LFSR; and 3) making appropriate feedforward connections for each test signal. Area overhead for this structure is due to the replacement of normal flip-flops by scan flip-flops, addition of one or three two-input XOR gates and, sometimes, one inverter, and some local routing for feedback/feedforward connections. For circuits with a large number of flip-flops, the area overhead for this structure is very close to that required for scan-based DFT techniques.
Example 2: Using the test signals derived in Example 1, the hardware implementation based on the BILBO architecture for c17 is shown in Fig. 3 a test signal. The sets can be sorted in decreasing order of the number of inputs in each set. Assume that the largest set has inputs. Then the scan cells can be configured into scan chains, where each scan chain is obtained by selecting one input from each set and connecting the corresponding scan cells together. In this implementation, the scan input of each scan cell may be connected to or of the preceding scan cell to ensure that inputs in the same set have the designated values (identical or complementary). The inputs of the scan chains are connected directly or via an inverter (depending on whether inputs in the largest set require complementary values) and fed by an -stage LFSR. The outputs of the scan chains are connected to a multiple-input signature register (MISR), which compresses the test responses into a signature. This structure has all the advantages of scan-based BIST, plus the capability to detect all detectable single stuck-at faults with test vectors.

Example 3: A scan-based BIST TPG design for c17 is shown in Fig. 4. The set of inputs for c17 has been partitioned into two sets and . Therefore, and for c17. Since and ( is the number of inputs in ), the two sets are ordered as . The scan cells can be configured into three scan chains given by , and . The scan inputs of flip-flops and (whose outputs are connected to and , respectively) are tied together and connected to the scan input of flip-flop via an inverter (so that and have identical values, which are complementary to the values applied to ). For scan chain , the scan input of flip-flop is connected to the inverted output of (so that and have identical values).
B. External Testing 1) Test Set Compaction for Multiple-Scan Chain Design:
If the LFSR and the MISR are eliminated from Fig. 4 , then the hardware implementation becomes a full-scan design with multiple-scan chains. Precomputed test patterns can then be applied and test responses analyzed by an ATE. While other scan techniques with multiple-scan chains [16] normally require multiple scan-in pins, the proposed design only requires one scan-in pin.
Consider a circuit with inputs which can be combined into test signals. Let and be the numbers of tests derived for the original circuit and for the reduced circuit (obtained by combining inputs into test signals), respectively. Then the memory storage required to store the tests is bits for a conventional single-scan design and bits for the proposed multiple-scan chain design. The test application time for a conventional single-scan chain design is given by clock cycles, where a single normal clock is required to latch the response of each test, and the last clocks are required to flush the response of the last test. For the proposed multiple-scan chain design, the test application time is given by clock cycles because each test has only bits. In general, the value of is usually close to that of , but is much smaller than . Therefore, memory storage and test application time can be significantly reduced by using the proposed multiple scan chain design.
Example 4: A conventional single-scan chain design and the proposed multiple-scan chain design for c17 are shown in . The memory storage and test application time for the conventional single-scan chain design are bits and clock cycles, respectively. For the proposed design, the memory storage and test application time are bits and clock cycles, which correspond to 60 and 52% reductions in memory storage and test application time over the conventional single scan chain design, respectively.
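The storage and timing comparison can be reproduced numerically. The sketch below is one reading of the description above, not a formula quoted from the paper: storage is taken as (bits per test) × (number of tests), and test application time as one shift per bit plus one capture clock per test, followed by a final flush. The values $n = 5$ inputs and $m = 2$ test signals come from the c17 example; the test count of 4 for both designs is an assumption, chosen because it reproduces the quoted 60% and 52% reductions.

```python
def storage_bits(width, num_tests):
    """Bits stored on the tester: one value per scanned-in bit per test."""
    return width * num_tests


def scan_cycles(chain_len, num_tests):
    """chain_len shifts plus one capture clock per test, then a flush."""
    return (chain_len + 1) * num_tests + chain_len


def reduction(before, after):
    """Fractional saving of `after` relative to `before`."""
    return 1.0 - after / before


if __name__ == "__main__":
    n, m, tests = 5, 2, 4  # tests = 4 is an assumed value
    print(f"storage: {reduction(storage_bits(n, tests), storage_bits(m, tests)):.0%}")
    print(f"time:    {reduction(scan_cycles(n, tests), scan_cycles(m, tests)):.0%}")
```

With these assumed values the conventional design needs 20 bits and 29 clock cycles, and the proposed design needs 8 bits and 14 clock cycles.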
2) Test Set Compaction for Single-Scan Chain Design:
Input reduction can also be used to reduce the volume of test data in a conventional full-scan design. As described above, only bits of test data need to be stored in the ATE. The information to construct the test signals from the circuit inputs is also stored in the ATE. Note that this information requires only bits to store, which is negligible compared to the test set size. During external testing, each -bit test fetched from the disk is expanded to an -bit test by the ATE before shifting into the scan chain with flip-flops. The test application time with this approach is clock cycles. Since is much smaller than , memory storage required to store the tests can usually be reduced dramatically using this approach.
Example 5: The mechanism of test set compaction using input reduction for a conventional scan design is shown in Fig. 5 (c) for c17. The memory storage and test application time are bits and clock cycles, respectively. For c17, the memory storage can be reduced by 60% without any increase in test application time.
IV. FRAMEWORK FOR INPUT REDUCTION
In this section, a complete framework for input reduction is described. By representing input compatibilities as relations, the problem of identifying the minimum number of test signals for a circuit can be formulated as a new graph optimization problem. A new concept called reinforcement is introduced to deal with the variant that compatibilities may not coexist. Efficient polynomial-time filters to determine input compatibilities are described. Finally, a branch-and-bound algorithm that incorporates several efficient branching rules and bounding conditions is presented.
A. Compatibilities as Relations
A relation on a set , called the base, is any subset of the Cartesian product . Each element is an ordered pair. The notion of compatibility between circuit inputs can be represented by relations on the set of inputs . A pair of compatible inputs can be viewed as an ordered pair in the Cartesian product of the base . Therefore, the set of compatible inputs can be represented by a relation on , denoted by is compatible . Two compatible (incompatible) inputs and can be represented by . Similarly, the set of inversely compatible inputs can be represented by a relation on , denoted by is inversely compatible . Two inversely compatible (inversely incompatible) inputs and can be represented by . By definition, an input is assumed to be compatible to itself and not inversely compatible to itself, i.e., and . Also, the relations and are symmetric, i.e., and . A relational structure , where is a birelation on , can be built to represent the compatibility relations for the circuit inputs.
Example 6: Consider the benchmark circuit c17 with the input set shown in Fig. 1 . Assume that all single stuck-at faults in the circuit belong to the target fault list. The relation matrix for the birelation on is shown in Fig. 6 (a) (only the upper triangular matrix is shown because is symmetric). Each entry in consists of two bits , with and corresponding to entries in the relation matrices and , respectively. Note that this relation matrix is determined by checking compatibility relations for each pair of inputs (as will be described later in Section IV-D) with respect to the set of all stuck-at faults in c17. The set of related inputs used to derive the optimal TPG for c17 in Example 1 is a subset of . The relation graph for is shown in Fig. 6(b) . The solid (dashed) edges correspond to . (For simplicity, the self-loop on each vertex, which corresponds to the identity relation contained in , has been removed from and from all subsequent relation graphs.) the induced subgraph of by each is a clique in . The minimum clique covering (CLIQUE-COVER) problem is to find a clique covering with the minimum cardinality for a graph.
B. Graph Structure
If we only consider the relational structure and assume that the compatibility relation does not change after combining a pair of compatible inputs (which is generally not true, as will be illustrated in Example 8), then the IRP can be relaxed to the CLIQUE-COVER problem, as in the case of verification testing. Vertices in each clique correspond to the inputs that can be combined into a test signal.
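The relaxation to CLIQUE-COVER can be illustrated with a simple greedy cover of a compatibility graph. This is a generic heuristic swapped in for illustration; it is not the exact solver used for the lower bounds in this paper, it ignores inverse compatibility, and it ignores the reinforcement issue discussed later. The edge set in the usage example is hypothetical.

```python
def greedy_clique_cover(vertices, compatible):
    """Partition vertices into cliques of the compatibility graph.

    compatible -- set of frozenset({u, v}) edges between compatible inputs
    Each vertex joins the first existing clique in which it is compatible
    with every member; otherwise it starts a new clique.
    """
    cliques = []
    for v in vertices:
        for clique in cliques:
            if all(frozenset((v, u)) in compatible for u in clique):
                clique.append(v)
                break
        else:
            cliques.append([v])
    return cliques


if __name__ == "__main__":
    # Hypothetical compatibility edges on five inputs.
    edges = {frozenset(p) for p in [(1, 5), (2, 4), (3, 5)]}
    print(greedy_clique_cover([1, 2, 3, 4, 5], edges))
```

A greedy cover gives only an upper bound on the minimum clique cover, and the result depends on the order in which vertices are visited.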
To obtain a minimal TPG design, however, the birelation which corresponds to compatible as well as inversely compatible inputs, must be considered. If the birelation does not change after combining a pair of compatible or inversely compatible inputs, then inputs in a coterie can be combined into a test signal and the IRP can be relaxed to the COTERIE-COVER problem for the relation graph. Unfortunately, due to a variant which will be described in the next section, the converse of Theorem 1 is generally not true, and hence the coterie condition is only a necessary condition for a set of inputs to be a test signal.
C. Reinforcement
A minimum coterie cover for the relation graph may not correspond to a valid set of test signals because it may contain pairs of compatible inputs that cannot coexist. This variant is illustrated by the following example.
Example 8: Consider the circuit in Fig. 8(a) . Assume that all single stuck-at faults in the circuit belong to the target fault list. The relation matrix for the circuit is shown in Fig. 8(c) . Since and are compatible, they can be combined into a test signal as shown in Fig. 8(b) . The relation matrix for the reduced circuit is shown in Fig. 8(d) . Consider the pair of inputs and which are compatible in the original circuit . Since the assignment is a necessary assignment to detect the fault line 7 s-a-0 in , inputs and are incompatible in (even though neither nor are involved in the formation of the test signal ). As shown in the above example, by combining a pair of related inputs, another pair of related inputs may become unrelated. Therefore, the relation must be reinforced after each pair of related inputs is combined into a test signal. 
D. Determination of Relation Matrix
To determine whether , we can check if all detectable target faults in the original circuit remain detectable in the modified circuit , obtained by combining and into a test signal. If any target fault detectable in becomes undetectable in , then . The existence of redundancy can always be verified by generating a test for every target fault using an automatic test pattern generator/fault simulator (ATPG/FSIM). However, there are $O(n^2)$ possible pairwise input connections for an $n$-input circuit. In addition, after combining any pair of related inputs into a test signal, the birelation needs to be reinforced. Applying ATPG/FSIM to check the compatibility of each pair of inputs would hence be impractical.
In practice, many entries in can be determined using polynomial-time algorithms, called filters. Use of filters can dramatically reduce the computation time required to calculate and update the relation matrix . Two types of filters are utilized:
• compatibility filters, which determine whether a pair of inputs is compatible (or inversely compatible), and
• incompatibility filters, which determine whether a pair of inputs is incompatible (or inversely incompatible).

Filters have low computational complexity (polynomial time), but are one sided: if a pair of inputs passes a compatibility (incompatibility) filter, then the two inputs are compatible (incompatible); but if it fails, then no relation can be assumed between the two inputs. A filter is fault dependent if it determines input relations by examining the detectability of each fault in the fault list; otherwise, it is fault independent.

1) Compatibility Filters: Nonadjacent inputs: Nonadjacent inputs are both compatible and inversely compatible (see [8] for the proof), and thus can be used as a fault-independent compatibility filter. Nonadjacent inputs can be identified from the dependence matrix [15], which can be determined from the circuit structure in time complexity , where is the number of nodes in the circuit.
Deterministic test set: If a complete test set (a set of test cubes that together detect all target faults in a circuit) is given, then it can be used to build a simple fault-dependent compatibility filter. Let and be the bits corresponding to input and in a test cube for a fault . Compatible inputs can be determined by applying the intersection operator, defined in Fig. 9, on the input values and . An intersection is inconsistent if ; otherwise, the intersection is consistent. If for each fault in the target fault list there exists at least one test cube such that the intersection of and is consistent, then and are compatible. A similar technique can be used to determine inversely compatible inputs by intersecting the values and in each test. In the extreme case, if all test cubes for all target faults are available (which is very unlikely), then input compatibility can be determined exactly by traversing the test cubes. However, if only some of the test cubes for each target fault are given, then this simple scheme becomes a compatibility filter. The time complexity to check if a pair of inputs is compatible using this filter is , because we need to traverse all the test cubes. Since there are pairs of inputs, the overall time complexity for this filter is . Even if this compatibility filter fails, the complete test set can still be used to reduce the run time of other fault-dependent filters. For example, if the current objective is to determine whether inputs and are compatible, then any target fault that has a test cube with consistent bits and (according to Fig. 9) is detectable and need not be considered. The fault-dependent filters must be applied only to the remaining target faults.
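The test-cube filter described above can be sketched in a few lines. The encoding of cubes as strings over '0', '1', and 'x', the data layout, and the function names are assumptions made for illustration; the intersection rule used is the usual one, in which only the pair of opposite specified values is inconsistent.

```python
INV = {'0': '1', '1': '0', 'x': 'x'}


def consistent(a, b):
    """True if two cube values from {'0', '1', 'x'} can be made equal."""
    return a == 'x' or b == 'x' or a == b


def passes_cube_filter(cubes_by_fault, i, j, inverse=False):
    """True if every fault has a cube whose bits i and j can be tied.

    cubes_by_fault -- dict: fault name -> list of test-cube strings
    inverse        -- check inverse compatibility (bit j complemented)
    A False result is inconclusive, because the filter is one sided.
    """
    for cubes in cubes_by_fault.values():
        found = False
        for cube in cubes:
            bj = INV[cube[j]] if inverse else cube[j]
            if consistent(cube[i], bj):
                found = True
                break
        if not found:
            return False
    return True


if __name__ == "__main__":
    # Hypothetical cubes for two faults of a three-input circuit.
    cubes = {"f1": ["1x0", "011"], "f2": ["10x"]}
    print(passes_cube_filter(cubes, 1, 2))
    print(passes_cube_filter(cubes, 0, 1, inverse=True))
```

Faults for which a consistent cube is found can be dropped from the list handed to the more expensive fault-dependent filters.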
2) Incompatibility Filters: Level-1 incompatibility: This fault-independent incompatibility filter is based on the following lemma.
Lemma 3: If primary inputs and are connected to the same primitive gate (i.e., AND, NAND, OR, or NOR gate), where both stuck-at faults at each input of are detectable and targeted, then and are neither compatible nor inversely compatible.
Proof: Assume that is a two-input AND gate. A complete test for a two-input AND gate must contain the three patterns: 11, 10, and 01. Since all input faults of are detectable, these three patterns must appear at the two inputs and in order to achieve complete fault coverage. If and are combined, either directly or via an inverter, at least one stuck-at fault at the inputs of cannot be detected. Hence, the two inputs and are neither compatible nor inversely compatible. The same argument holds for other types of primitive gates with arbitrary number of inputs.
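The argument in the proof can be checked exhaustively for the two-input AND gate. The sketch below is an illustrative model with assumed names: it lists the input stuck-at faults that remain undetected when the gate inputs are restricted to the patterns available after tying them together directly or via an inverter.

```python
from itertools import product


def and_gate(a, b, fault=None):
    """Two-input AND with an optional input stuck-at fault (line, value)."""
    if fault:
        line, value = fault
        a = value if line == 'a' else a
        b = value if line == 'b' else b
    return a & b


INPUT_FAULTS = [('a', 0), ('a', 1), ('b', 0), ('b', 1)]


def undetected(patterns):
    """Input faults that no pattern in `patterns` can detect."""
    return [f for f in INPUT_FAULTS
            if all(and_gate(a, b) == and_gate(a, b, f) for a, b in patterns)]


if __name__ == "__main__":
    print(undetected(list(product((0, 1), repeat=2))))  # free inputs
    print(undetected([(0, 0), (1, 1)]))                 # tied directly
    print(undetected([(0, 1), (1, 0)]))                 # tied via inverter
```

With free inputs every input fault is detectable; a direct tie loses both stuck-at-1 faults, and an inverted tie loses both stuck-at-0 faults.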
Lemma 3 is most useful for circuits with many primary inputs connected to the same gates. However, if some stuck-at faults at the inputs of are undetectable or if some of these faults are not targeted, then the inputs connected to may still be compatible [8] . The time complexity for this filter is for an -input circuit. Uncontrollability analysis: Uncontrollability and unobservability are used to identify redundancies in combinational and sequential circuits [2] , [12] . This technique can be used to identify incompatible inputs by checking whether any redundancy is introduced in the reduced circuit. If two inputs and are combined into a test signal, then the assignments and become illegal combinations. On the other hand, if two inputs and are combined into a test signal via an inverter, then the assignments and become illegal combinations. If any detectable target fault requires any illegal combination for its detection, then and are incompatible. Note that this incompatibility filter is fault independent because input incompatibilities are determined without individually targeting each detectable fault in the target fault list. In uncontrollability analysis, undetectable faults are identified using implication and set intersection, which can be done in time, where is the number of nodes in the circuit. Example 9: Consider the benchmark circuit c17 shown in Fig. 1 . Again, assume that the target fault list contains all single stuck-at faults in the circuit. Suppose we want to verify if inputs and are inversely incompatible. The assignments and become illegal. The implications for the illegal assignment are shown in Fig. 10 , based on the procedure in [2] . The value on a line means that is uncontrollable for value . Let denote a s-a-v fault on a line . The uncontrollability of setting to 1 makes all faults in undetectable. Similarly, the uncontrollability of setting to 1 makes all faults in undetectable. 
The set of undetectable faults due to the illegal assignment is therefore . Since the fault is detectable in the original circuit, inputs and are inversely incompatible.
Necessary assignment: A necessary assignment for a target fault is a value that must be assigned to a line in order to detect . Necessary assignments at the primary inputs can be used as a fault-dependent incompatibility filter to determine many incompatible inputs. Let and (where ) be the necessary Boolean assignments on a pair of inputs and . If (i.e., inputs and must have opposite values in order to detect ), then (otherwise, the fault will become undetectable after combining and into a test signal). On the other hand, if , then . Given a set of necessary assignments for a target fault which has determined values at the primary inputs, then pairs of inputs can be declared as (inversely) incompatible inputs [8] . Unfortunately, identifying all necessary assignments for a fault is an NP-complete problem. Efficient techniques developed to identify necessary assignments [10] , [14] , [21] , [22] can be used for our purpose. Example 10: The above filters are used to calculate the relation matrix for c17 assuming that all detectable single stuck-at faults in the circuit are targeted. The result is shown in Fig. 11 . The entries identified by verification, level-1 incompatibility, uncontrollability analysis, and necessary assignments (using direct implications only) are shaded as shown. All pairs of incompatible inputs have been identified by these filters. The remaining seven pairs of compatible inputs cannot be determined by these simple filters, but may be identified using more complicated filters or ATPG/FSIM.
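The necessary-assignment filter described above reduces to a pairwise comparison of the required input values for one fault. The sketch below assumes that the necessary assignments have already been computed by some implication procedure; the data layout and names are illustrative.

```python
def incompatible_pairs(necessary):
    """Derive incompatibilities from one fault's necessary assignments.

    necessary -- dict: input index -> required Boolean value (0 or 1)
    Returns (not_compatible, not_inversely_compatible) as sets of pairs.
    Inputs that need opposite values cannot be tied directly; inputs
    that need equal values cannot be tied via an inverter.
    """
    not_c, not_ic = set(), set()
    inputs = sorted(necessary)
    for idx, i in enumerate(inputs):
        for j in inputs[idx + 1:]:
            if necessary[i] != necessary[j]:
                not_c.add((i, j))
            else:
                not_ic.add((i, j))
    return not_c, not_ic


if __name__ == "__main__":
    # Hypothetical necessary assignments for a single target fault.
    print(incompatible_pairs({1: 1, 3: 1, 4: 0}))
```

Each fault with $k$ determined input values classifies all $k(k-1)/2$ pairs among those inputs in a single pass.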
3) ATPG/FSIM: The problem of determining the relation matrix is an NP-complete problem. In fact, determining a single entry in the relation matrix may require exponential run time in the worst case. Therefore, no filter (which has, by definition, polynomial time complexity) can guarantee the identification of all compatibility relations. When all filters fail, ATPG and FSIM are used to identify compatible inputs. Fig. 12 shows how ATPG/FSIM can be applied efficiently to identify compatible inputs. The procedure utilizes FSIM extensively so that ATPG needs to be performed only on a small number of faults. First, a complete test set for all target faults in the CUT is generated. Assume that the compatibility of and is being checked. Let be the circuit obtained by combining and into a test signal. Note that has one less input than . Let be the reduced test set obtained by removing the column corresponding to . The test set is used to perform simulation in the circuit on the target faults whose corresponding tests in have complemented values at and . (All other target faults are guaranteed to be detected by the tests in .) Since only and are combined, most detectable faults in remain detectable in by the tests in . Let be the target faults in that cannot be detected by any test in . The ATPG is then used to generate new tests for the faults in . If any fault in the circuit is aborted or proven redundant, then inputs and are declared incompatible. Otherwise, the two inputs and are compatible. There are several features that reduce the run time of ATPG/FSIM for input reduction. In early iterations, there exist many inputs that can be combined without introducing redundancy. Hence, many related inputs can be identified by FSIM and less time is spent on ATPG. As more inputs are combined, it becomes more difficult to find compatible inputs by FSIM alone. 
However, the time complexity for ATPG decreases because many inputs are already combined into test signals and a smaller space of test vectors is searched by the ATPG.
E. Branch-and-Bound Algorithm
A branch-and-bound algorithm, which incorporates various techniques described earlier, to solve the IRP (input reduction problem) is outlined next. Note that, although in the above discussion complete relation matrices were shown, in the implementation of the following algorithm, compatibility of a pair of inputs (direct or inverse) is computed only when they are considered for reduction. Furthermore, once an incompatibility is discovered, it is used repeatedly to reduce run time. is initially set to the number of inputs . The first call to the routine REINFORCE determines the birelation reinforced by the identity relation . The reinforced birelation is simply the collection of all possible relations determined for each pair of circuit inputs. Procedure REDUCE is then called to combine the inputs into test signals, one pair at a time. An ordered pair is selected by the routine BRANCH based on various branching rules. The ordered pair is included in together with all ordered pairs in to make a coterie birelation, denoted by the set . The routine REINFORCE is called again with the parameters and to determine the reinforced birelation . The number of test signals is decremented by one by combining a pair of related inputs into a test signal. The routine BOUND is used to prune the parts of the search space that cannot lead to better solutions. A minimal set of test signals (a local optimum) is obtained when as proven in Theorem 2. The global variables and are updated if the minimal solution returned by the previous iteration of REDUCE is better than the best solution so far.
The worst case time complexity for a branch-and-bound algorithm remains exponential in terms of problem size. We have to rely on heuristics to obtain good solutions in reasonable time. The criteria of a good heuristic are very similar to that of a good branching rule. Therefore, the approach obtained by using any branching rule and stopping at the first solution can be used as a heuristic. Various branching rules with different complexities have been implemented in the routine BRANCH [8] . However, the simple branching rule of selecting the first or the last pair of related inputs in topological input order has been empirically found to be very efficient. (Topological order of inputs is the order in which the inputs appear in the original circuit description. Furthermore, when two or more inputs are combined into a single test signal, the input that occurs earlier in the original list is replaced by the test signal and the other inputs in the test signal are deleted.) In many benchmark circuits that we experimented on, the first solution obtained by using this heuristic is also the optimum TPG design, as will be demonstrated in the next section.
At each step of reduction, the heuristic uses the techniques described in Sections IV-C and IV-D to determine the compatibility of the pair of inputs (or, in general, test signals) being considered for reduction. This further reduces the complexity by avoiding the computation of all birelations in the beginning. Furthermore, in this approach, the birelation is naturally reinforced at each step.
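The first-solution heuristic can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not the published procedure: the four-input circuit, the fault list (stuck-at faults on primary inputs only), the names `circuit`, `all_faults_detectable`, and `greedy_reduce`, and the exhaustive detectability check that replaces the filters and ATPG/FSIM are all hypothetical. Only direct compatibility is considered; inverse compatibility is omitted.

```python
from itertools import product

N_INPUTS = 4  # inputs of the toy circuit (hypothetical)

def circuit(x, fault=None):
    """Toy CUT (hypothetical): y0 = x0 AND x1, y1 = x2 OR x3.
    fault = (input_index, stuck_value) models a stuck-at fault on a primary input."""
    x = list(x)
    if fault is not None:
        x[fault[0]] = fault[1]
    return (x[0] & x[1], x[2] | x[3])

FAULTS = [(k, v) for k in range(N_INPUTS) for v in (0, 1)]

def all_faults_detectable(groups):
    """Exhaustive stand-in for the compatibility check: every input in a
    group is driven by the same test signal."""
    vectors = []
    for values in product((0, 1), repeat=len(groups)):
        x = [0] * N_INPUTS
        for group, value in zip(groups, values):
            for k in group:
                x[k] = value
        vectors.append(tuple(x))
    return all(any(circuit(x) != circuit(x, f) for x in vectors) for f in FAULTS)

def greedy_reduce():
    """Combine the first compatible pair in topological order and repeat
    until no pair can be combined (a minimal, not necessarily minimum, solution)."""
    groups = [[k] for k in range(N_INPUTS)]
    merged = True
    while merged:
        merged = False
        for a in range(len(groups)):
            for b in range(a + 1, len(groups)):
                # The merged signal takes the position of the earlier input.
                trial = (groups[:a] + [groups[a] + groups[b]]
                         + groups[a + 1:b] + groups[b + 1:])
                if all_faults_detectable(trial):
                    groups, merged = trial, True
                    break
            if merged:
                break
    return groups

signals = greedy_reduce()
print(signals, "exhaustive test length:", 2 ** len(signals))
```

For this toy circuit the heuristic ends with two test signals, so four vectors suffice instead of the sixteen needed to exhaust all four inputs. Compatibility is re-evaluated against the current grouping at every step, which mirrors the remark that the birelation is reinforced as the reduction proceeds.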
V. EXPERIMENTAL RESULTS
The input reduction procedure was implemented and used to design BIST TPG's and derive compact test sets for all ISCAS85 [7], combinational parts of most ISCAS89 [6], and MCNC logic synthesis benchmark circuits. For all circuits, all detectable single stuck-at faults are targeted.
A. BIST TPG Size
The experimental results for BIST TPG sizes for ISCAS85 and ISCAS89 benchmark circuits are shown in Table I(a). Columns 1 and 2 give the circuit name and the number of circuit inputs. Column 3 gives the largest cone size. Column 4 gives the number of TPG stages derived by a heuristic version of Procedure 1 (the program is terminated at the first minimal solution obtained by selecting input pairs in input topological order). For example, the circuit c7552 has 207 inputs and a largest cone size of 194. The lower bound on the test length for any pseudoexhaustive approach (without partitioning) is therefore $2^{194}$. Using the proposed technique, however, the number of TPG stages is 28. In other words, the proposed TPG design guarantees the detection of all detectable single stuck-at faults with test length $2^{28}$. To determine the quality of the heuristic, lower bounds for BIST TPG sizes based on input reduction are calculated by relaxing the IRP to the CLIQUE-COVER problem [19]. The results are shown in column 5 of Table I(a). A "-" in the column indicates that the corresponding CLIQUE-COVER problem cannot be solved in reasonable time by our simple implementation. For most circuits, the number of TPG stages derived by the heuristic is close to the lower bound obtained. For the remaining circuits, we believe that more realistic lower bounds can be obtained by solving the COTERIE-COVER problem.
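To give a feel for what these stage counts mean in practice, the short Python sketch below converts a number of TPG stages into an exhaustive test application time. The function name `exhaustive_test_seconds` and the 100 MHz test clock are assumptions chosen for illustration; the text does not state a clock rate.

```python
def exhaustive_test_seconds(stages, clock_hz):
    """Time to apply all 2**stages vectors, one vector per clock cycle."""
    return 2 ** stages / clock_hz

CLOCK_HZ = 100e6  # assumed test clock rate (hypothetical)

# c7552: 28 TPG stages after input reduction vs. a largest cone of 194 inputs.
print(f"28 stages:  {exhaustive_test_seconds(28, CLOCK_HZ):.2f} s")
print(f"194 stages: {exhaustive_test_seconds(194, CLOCK_HZ):.2e} s")
```

Under this assumed clock, the 28-stage TPG finishes in a few seconds, whereas exhausting a 194-input cone is far beyond any practical test time.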
Most ISCAS85 benchmark circuits are data path circuits, while most ISCAS89 benchmark circuits are control circuits. These two classes of circuits have very different characteristics, but our program performs equally well on both sets of benchmark circuits. Except for the circuits s838 and s15850, the proposed TPG's guarantee the detection of all detectable single stuck-at faults with test length bounded by $2^{30}$, including circuits that are known to be random-pattern resistant (e.g., c2670 and c7552). Also note that greater improvements have been obtained for larger circuits.
We also applied our procedure to some MCNC logic synthesis benchmark circuits, with the results shown in Table I(b). The circuits are synthesized in SIS [23] using two different scripts: rugged and fx [20]. It is observed in [9] that these synthesized circuits are random-pattern resistant compared to manually designed circuits. Except for the circuit rckl, the proposed TPG's guarantee the detection of all stuck-at faults with test length bounded by $2^{30}$. (The circuit rckl has a node which is …)
The time taken to perform all steps of input reduction for a circuit was between 1.5 and 5 times the time taken to generate a complete set of deterministic tests for the circuit.
B. Fault Coverage for Pseudorandom Test Sequences
To show the efficiency of the proposed TPG's, fault simulation was performed for some random-pattern resistant circuits using test sequences generated by full-length ($n$-stage, where $n$ is the number of circuit inputs) TPG's and the proposed TPG's, with the results shown in Table II. The total number of faults, the number of detectable faults, and the maximum fault coverage are given in columns 2, 3, and 4, respectively. The number of TPG stages and the test length required to achieve the maximum fault coverage for a full-length TPG and the proposed TPG are shown in columns 5-6 and columns 7-8, respectively. As an example, the circuit chkn has 589 stuck-at faults (after collapsing), of which 587 faults (99.66%) are detectable. The numbers of TPG stages for the full-length TPG and the proposed TPG are 29 and 19, respectively. The test lengths to assure complete fault coverage are therefore $2^{29}$ for the full-length TPG and $2^{19}$ for the proposed TPG. In actual fault simulation, all stuck-at faults can be detected after simulating 1 634 428 test vectors using the full-length TPG. For the proposed TPG, all stuck-at faults can be detected after simulating 395 029 test vectors, a 76% reduction in pseudorandom test length. The fault coverage for chkn is plotted as a function of the test length in Fig. 13(a). As illustrated by the figure, the proposed TPG achieves complete fault coverage in a fraction of the test length required for the full-length TPG.
Fault simulation was also performed for the circuit c2670, which is known to be random-pattern resistant, using 2 million test vectors generated by the full-length TPG and the proposed TPG. The fault coverage as a function of the test length for c2670 is shown in Fig. 13(b). For c2670, 2396 (95%) stuck-at faults out of the 2408 detectable ones are detected after simulating 2 million test vectors using the proposed 22-stage LFSR, while only 2379 (94.33%) stuck-at faults are detected using a 233-stage LFSR (the full-length TPG). The proposed TPG reaches the coverage of the 233-stage LFSR with only 0.29 million vectors, a factor-of-seven improvement.
C. Test Set Compaction for External Testing
As shown in Section III-B, test signals derived by input reduction can be used to perform test set compaction. The experimental results shown in Table III are obtained using COMPACTEST [17], a state-of-the-art method to generate compact test sets. After the circuit name, the number of inputs and the number of tests in the smallest test sets reported in [17] are shown in columns 2 and 3, followed by the memory storage in bits and the test application time in clock cycles. The number of bits for each test [which is also the number of TPG stages shown in Table I(a)] and the number of tests derived for the reduced circuits (obtained by combining inputs into test signals) are shown in columns 6 and 7. The memory required to store the tests and its ratio to that of the original circuit are given in columns 8 and 9, respectively. The test application times required to apply the tests derived by input reduction for single-scan chain (SSC) and multiple-scan chain (MSC) configurations, together with their ratios to the test application time required for the original circuit, are given in the last four columns.
As an example, for the circuit c2670, COMPACTEST can generate a test set of size 67, each test having 233 bits. The memory storage required for this test set is therefore $67 \times 233 = 15\,611$ bits. The corresponding test application time in clock cycles is listed in Table III. The number of tests generated by COMPACTEST for the reduced circuit is 103, each with 22 bits. The memory storage is therefore $103 \times 22 = 2266$ bits, which is 85% lower than that required for the original circuit. The test application times for the tests generated by the proposed technique, for the single-scan chain and multiple-scan chain configurations, are likewise listed in Table III. The test application time using the single-scan chain configuration increases by 53% compared to that without input reduction. Using multiple-scan chains, however, the test application time is only 15% of that required to apply the tests derived for the original circuit.
TABLE III. TEST SET COMPACTION FOR EXTERNAL TESTING
As shown in Table III, the memory storage and test application time are reduced by 50-85% if the proposed multiple-scan chains are used. For conventional single-scan chain designs, the proposed technique can still be used to reduce the memory storage by 50-85%. This comes at the cost of a 1-2 times increase in the test application time, because more tests need to be applied.
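The c2670 figures quoted above can be checked with a few lines of arithmetic. The Python sketch below is only an illustration: the function `scan_test_cycles` encodes a common scan timing model (one full shift per test plus a final shift-out, and one capture cycle per test), which is an assumption since the exact expression is not recoverable from the text. It is also assumed that the single-scan chain keeps its full length of 233 cells, while with multiple-scan chains the longest chain has one cell per test signal (22 cells).

```python
def scan_test_cycles(num_tests, chain_length):
    """Assumed timing model: (tests + 1) shift operations of chain_length
    cycles each, plus one capture cycle per test."""
    return (num_tests + 1) * chain_length + num_tests

# Original circuit: 67 tests of 233 bits. Reduced circuit: 103 tests of 22 bits.
orig_bits = 67 * 233
reduced_bits = 103 * 22
storage_saving = 1 - reduced_bits / orig_bits

t_orig = scan_test_cycles(67, 233)
t_ssc = scan_test_cycles(103, 233)  # single-scan chain, full-length chain
t_msc = scan_test_cycles(103, 22)   # multiple-scan chains, 22-cell chains

print(f"storage: {orig_bits} -> {reduced_bits} bits ({storage_saving:.0%} lower)")
print(f"SSC time increase: {t_ssc / t_orig - 1:.0%}")
print(f"MSC time relative to original: {t_msc / t_orig:.0%}")
```

Under this assumed model, the computed ratios agree with the 85%, 53%, and 15% figures quoted for c2670, which suggests the model is close to the one used, although the absolute cycle counts should be taken from Table III.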
D. Comparison with Other BIST Techniques
The proposed technique is compared with other BIST schemes based on the following criteria: test length, fault coverage, hardware overhead, and performance penalty.
Weighted random testing can achieve the shortest test length among various TPG design techniques. However, it necessitates the insertion of weight circuits between the outputs of flip-flops and CUT inputs, causing area overhead and degrading CUT performance. Since the test time for the proposed scheme is practical, the shorter test length obtained with weighted random testing may not justify its higher area overhead and performance penalty.
Pseudoexhaustive testing can assure complete fault coverage if the TPG is allowed to generate a maximal test sequence. However, for practical circuits, the number of TPG stages is large, and it is impossible to generate all tests. Partitioning cells are often inserted at selected circuit lines to make pseudoexhaustive testing possible. These partitioning cells introduce significant area overhead and degrade circuit performance. In contrast, the proposed technique obtains complete fault coverage in practical test time without partitioning, thereby keeping the area overhead and performance degradation very low. The performance penalty can be minimized by carefully designing the TPG's (e.g., Fig. 3).
One major advantage of pseudoexhaustive testing is the coverage of unmodeled combinational faults (e.g., bridging faults) within each cone. However, for most circuits, many test points need to be inserted to reduce the number of TPG stages and the test length. As the cone sizes in the test mode become smaller, the number of bridging faults that are guaranteed to be detected by pseudoexhaustive testing also decreases. In contrast, our technique is fault oriented, and does not guarantee the coverage of bridging faults. However, since no partitioning is required, we believe that the proposed scheme can detect a large portion of unmodeled faults as well. Finally, it is easy to include possible bridging faults in the fault list, and synthesize a TPG that guarantees their detection with the proposed technique. However, this may increase the design effort and the number of stages in the resulting TPG.
VI. CONCLUSION
A new TPG design technique that can be viewed as a significant generalization of verification testing [15] has been proposed. The technique uses all (or, if desired, a subset of) single stuck-at faults as the target fault set and analyzes the logic of the CUT to identify pairs of compatible inputs, i.e., inputs that can be connected to a common TPG stage (a test signal) while still guaranteeing the detection of all target faults. An input reduction procedure that can determine a small number of test signals for a given CUT while guaranteeing complete stuck-at fault coverage has been presented. Experiments have been performed on a large number of benchmark circuits (ISCAS85, ISCAS89, and MCNC synthesis benchmarks). The results show that, for most circuits, the proposed procedure can design TPG's that guarantee the detection of all single stuck-at faults in practical test length. The resulting TPG designs replace normal flip-flops in the circuit by scan flip-flops, and only require the addition of feedback logic and routing typical in BILBO designs [30]. The performance penalty of the proposed TPG's is as low as that for standard scan designs. The proposed technique can also be used to derive compact test sets for external testing. Experimental results show that up to 85% reduction in memory storage and test application time can be achieved compared to those achieved by existing test set compaction algorithms.
The input reduction technique has also been generalized to design BIST TPG's that guarantee the detection of all or selected delay faults, to determine the scan cell order in a scan chain (so that complete delay fault coverage can be obtained), and to derive compact test sets for delay faults [8]. Several enhancements to the proposed input reduction procedure are currently under investigation; they aim to make it applicable to larger circuits by further reducing the TPG size as well as the run time complexity of the IRP. Enhancements being considered include reconfigurable TPG's that can apply tests in multiple sessions and TPG's that use more complicated compatibility types (e.g., combining inputs into test signals via logic gates). Finally, a technique to synthesize a multilevel circuit with the target of minimizing the number of test signals is also under investigation.
