Abstract: A test pattern generator (TPG) for built-in self-test (BIST), which can reduce switching activity during test application, is proposed. The proposed TPG, called dual-speed LFSR (DS-LFSR), consists of two linear feedback shift registers (LFSRs), a slow LFSR and a normal-speed LFSR. The slow LFSR is driven by a slow clock whose speed is $1/d$ that of the normal clock, which drives the normal-speed LFSR. The use of DS-LFSR reduces the frequency of transitions at the circuit inputs driven by the slow LFSR, leading to a reduction in switching activity during test application. A procedure is presented to design a DS-LFSR so as to achieve high fault coverage by ensuring that patterns generated by it are unique and uniformly distributed. A new gain function and a method to compute its value for each circuit input are proposed to select inputs to be driven by the slow LFSR. Also, a procedure to increase the number of inputs driven by the slow LFSR by combining compatible inputs is presented to further decrease the switching activity. Finally, DS-LFSRs are designed for the ISCAS85 and ISCAS89 benchmark circuits and shown to provide a 13% to 70% reduction in the number of load-capacitance weighted transitions with no loss of fault coverage (for stuck-at as well as transition delay faults) and at very slight area overhead.
DS-LFSR: A BIST TPG for Low Switching Activity

I. INTRODUCTION
THE LINEAR feedback shift register (LFSR) is commonly used as a test pattern generator (TPG) in low overhead built-in self-test (BIST). This is because an LFSR can be built with little area overhead and used not only as a TPG, which provides high fault coverage for a large class of circuits, but also as an output response analyzer.
A significant correlation exists between consecutive vectors applied to a circuit during its normal operation. This fact is what has motivated several architectural concepts, such as cache memories, and is central to their effectiveness. This is also true for the high-speed circuits that process digital audio and video signals-the inputs to most of whose modules change relatively slowly over time. In contrast, the consecutive vectors of a sequence generated by an LFSR are proven to have low correlation. Since the correlation between consecutive vectors applied to a circuit during BIST is significantly lower, the switching activity in the circuit can be significantly higher during BIST than during its normal operation.
Manuscript received May 15, 2001; revised November 26, 2001. This work was supported in part by the National Science Foundation under CAREER Award MIP-9502300. This paper was recommended by Associate Editor R. Aitken.
S. Wang is with CCRL, NEC USA, Princeton, NJ 08540 USA (email: swang@nec-lab.com).
S. K. Gupta is with Electrical Engineering Systems, University of Southern California, Los Angeles, CA 90089-2562 USA (e-mail: sandeep@boole.usc.edu).
Publisher Item Identifier S 0278-0070(02)05633-6.
Excessive switching activity during test can cause several problems. Foremost, since power dissipation in a CMOS circuit is proportional to weighted switching activity, a circuit under test (CUT) can be permanently damaged due to high temperature that is caused by excessive power dissipation if the switching activity in the circuit during test application is much higher than that during its normal operation. (In this paper, the number of transitions and hazards at a line are always weighted by the line's load capacitance. For simplicity, we often drop the modifier "weighted".)
Power dissipated during test application is already influencing the design of test methodologies for practical circuits. For example, it is reported in [1] and [2] that one of their major considerations in test scheduling is the fact that power dissipated during test application is typically significantly higher than that during normal circuit operation (sometimes 100%-200% higher).
The seriousness of excessive power dissipation during test application is exacerbated by trends such as circuit miniaturization for portability and high performance (smaller chips can be placed closer, decreasing interconnect delays). These objectives are typically achieved by using circuit designs that decrease power dissipation and reducing the package size to aggressively match the average power dissipation during the circuit's normal operation. In order to ensure nondestructive testing of such a circuit, it is necessary to either apply test vectors that cause switching activity that is comparable to that during normal circuit operation or remove any excessive heat generated during test using special cooling equipment. The use of special cooling equipment to remove excessive heat dissipated during test application becomes increasingly difficult and costly as tests are applied at higher levels of circuit integration, such as BIST at board and system levels. More importantly, it cannot solve other problems caused by excessive switching activity described next.
It has been observed that metal migration (electromigration) causes the erosion of conductors and subsequent failure of circuits [3] , [4] . Since temperature and current density are major factors that determine electromigration rate, elevated temperature and current density caused by excessive switching activity during test application can severely decrease the reliability of circuits under test. This is even more severe in circuits equipped with BIST, since such circuits may be tested frequently.
To test a bare die, power must be supplied during the period of test through probes that typically have higher inductance than the power and ground pins of a circuit package. Hence, the bare die under test will experience higher power/ground noise, which is given by $L\,(di/dt)$, where $L$ is the inductance of the power and ground lines and $di/dt$ is the rate of change of the current flowing in the power and ground lines. Excessive power/ground noise can erroneously change the logic state of circuit lines, causing some good dice to fail the test and leading to unnecessary loss of yield.
All the above-mentioned problems are being brought to light by increasing acceptance of at-speed testing. In the past, the tests were typically applied at rates much lower than a circuit's normal clock rate (since only the coverage of stuck-at faults was deemed to be important and slow testers provided an inexpensive way of testing). However, in recent years, aggressive timing has made it essential for the tests to identify slow chips via delay testing. Delay testing is almost imperative for the growing number of circuits manufactured for use in MCMs, a fact reflected in the extensive demand for performance-certified die [5] , [6] . Circuits are now tested at higher clock rates, if possible, at the circuit's normal clock rate (at-speed testing), to achieve coverage of delay faults. Consequently, power dissipation during test application is on the rise and is fast becoming a problem that requires close attention. (It should be noted, however, that at-speed testing is not necessary to achieve high delay fault coverage. It is sufficient to apply a rich set of two-pattern tests and capture the circuit response one normal clock delay after the application of the second pattern in each two-pattern test.)
Recently, several papers that address the problem of reducing power dissipation during built-in self-test have been published [7]-[11]. Methods to reduce power dissipation in deterministic tests are proposed in [12]-[14]. Techniques proposed in [7], [8], and [11] require extra gates (AND gates, latches, or pass transistors) to be inserted to block transitions at the TPG outputs. These extra gates are inserted between TPG stages and CUT inputs and hence degrade circuit performance beyond typical BIST methodologies. Furthermore, if many extra gates are required to achieve the desired reduction in power dissipation, the additional hardware overhead becomes significant. This paper, which is a significant extension of [10], presents a BIST technique that reduces switching activity and thus mitigates the above-mentioned problems by decreasing, during test application, the magnitudes of power dissipation, average power supply current, and current spikes in the power and ground lines ($di/dt$). We assume that the given circuit is sequential but that during self-test all flip-flops in the circuit are configured as pattern generators and/or response analyzers. Hence, the CUT is combinational. Unlike [7], [8], and [11], the technique proposed in this paper can achieve a significant reduction in switching activity without affecting circuit performance. Simulation results presented in Section V show that the proposed technique can be implemented at low hardware overhead.
The random pattern test length required to achieve high fault coverage is sometimes determined by only a few hard-to-detect faults [15]. These faults are also called random pattern resistant faults because they escape most randomly generated test patterns. Hence, uniformly distributed test patterns may not achieve high fault coverage for circuits that have many random pattern resistant faults. We have also developed a test pattern generator that can be used to cover random pattern resistant faults without causing excessive switching activity during test application [16]. Due to space limitations, here we focus on the construction of pattern generators that are counterparts of LFSRs and that help reduce weighted switching activity.

This paper is organized as follows. In Section II, the architecture of the proposed DS-LFSR TPG is described. In addition, the sequences generated by the proposed TPGs are analyzed and compared with the sequences generated by LFSRs with primitive feedback polynomials. In Section III, a procedure to select inputs to be driven at slow speed is described. In Section IV, a compatibility analysis-based method to increase the number of inputs driven at slow speed without loss of fault coverage is proposed. In Section V, simulation results are reported for ISCAS85 and ISCAS89 circuits. Concluding remarks are given in Section VI.
II. DUAL-SPEED LFSR
Let $n_l(T)$ be the number of transitions at a circuit line $l$ in the time interval $(-T/2, +T/2]$. The transition density at $l$, i.e., the number of transitions per second at $l$, is defined as [17], [18]

$$D(l) = \lim_{T \to \infty} \frac{n_l(T)}{T}. \tag{1}$$
Consider a CUT with $n$ inputs, $x_1, x_2, \ldots, x_n$, which are driven by an $n$-bit LFSR. Assume that outputs of the LFSR (inputs of the CUT) are not correlated, so that the value applied to any input $x_i$ of the CUT is independent of the value applied to any other input $x_j$, where $i \neq j$. The Boolean difference of the Boolean function implemented by line $l$, $f_l$, with respect to input $x_i$ is defined as

$$\frac{\partial f_l}{\partial x_i} = f_l\big|_{x_i = 1} \oplus f_l\big|_{x_i = 0} \tag{2}$$

where $\oplus$ denotes an exclusive-OR operation. The transition density of a line $l$, $D(l)$, can be redefined in terms of the Boolean difference with respect to each input $x_i$ and the transition density of each input, $D(x_i)$, as

$$D(l) = \sum_{i=1}^{n} P\!\left(\frac{\partial f_l}{\partial x_i}\right) D(x_i) \tag{3}$$

where $P(\partial f_l / \partial x_i)$ is the probability that the Boolean difference, $\partial f_l / \partial x_i$, evaluates to a 1. Finally, the average power dissipated in the CUT during BIST is given by

$$P_{avg} = \frac{1}{2} V_{dd}^2 \sum_{l} C_l\, D(l) \tag{4}$$

where $V_{dd}$ is the power supply voltage and $C_l$ is the load capacitance at line $l$. Equation (4) shows that the average power dissipated in a CUT during BIST is proportional to the transition density at the inputs of the CUT. In this paper, the average power dissipation during BIST is reduced by lowering the transition density at a subset of the inputs of the CUT; at the same time, the same, or sometimes even higher, fault coverage is obtained.
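As a rough illustration of how the Boolean-difference form of the transition density feeds into the average power expression, the following sketch evaluates both for a toy circuit by exhaustive enumeration. It assumes equiprobable, independent inputs; the function names, the two-line example circuit, and the capacitance values are hypothetical and are not taken from the paper.

```python
from itertools import product

def boolean_difference_prob(f, n_inputs, i):
    """Probability that toggling input i changes f, assuming independent,
    equiprobable inputs (computed by exhaustive enumeration)."""
    hits = 0
    for bits in product((0, 1), repeat=n_inputs):
        if bits[i] == 0:
            flipped = bits[:i] + (1,) + bits[i + 1:]
            hits += f(bits) != f(flipped)
    return hits / 2 ** (n_inputs - 1)

def line_density(f, input_densities):
    """Transition density of a line: sum over inputs of
    P(Boolean difference = 1) times the input's transition density."""
    n = len(input_densities)
    return sum(boolean_difference_prob(f, n, i) * input_densities[i]
               for i in range(n))

def average_power(lines, input_densities, vdd):
    """Average power: 0.5 * Vdd^2 * sum of C_l * D(l) over all lines.
    `lines` is a list of (function, load capacitance) pairs."""
    return 0.5 * vdd ** 2 * sum(c * line_density(f, input_densities)
                                for f, c in lines)

# Hypothetical two-line circuit: an AND gate and an XOR gate on inputs x0, x1.
toy_lines = [
    (lambda b: b[0] & b[1], 2.0),  # line 1, capacitance 2.0 (arbitrary units)
    (lambda b: b[0] ^ b[1], 1.0),  # line 2, capacitance 1.0
]
full_speed = average_power(toy_lines, [1.0, 1.0], vdd=1.0)
slowed_x0 = average_power(toy_lines, [0.25, 1.0], vdd=1.0)
```

In this toy setting, lowering the transition density of a single input lowers the computed power, which is the effect the slow LFSR is meant to exploit.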
Let $S = \{v_1, v_2, \ldots, v_m\}$ be a sequence of random patterns generated by an $n$-bit LFSR, where $m$ is the length of the test sequence and $v_i$ is an $n$-bit pattern, for $i = 1, 2, \ldots, m$. This pattern sequence, $S$, can also be written as an $m \times n$ matrix, each of whose rows corresponds to a pattern and whose $j$th column corresponds to the bit sequence appearing at the output of the $j$th stage of the LFSR. Let $S^k$ be a sequence of $k$-bit patterns consisting of $k$ arbitrary columns of $S$ and let $S^{n-k}$ be the sequence consisting of the remaining $(n-k)$ columns of $S$ not included in $S^k$. In the following, we will refer to the sequences $S^k$ and $S^{n-k}$ as the $k$ portion and $(n-k)$ portion, respectively, of $S$. We will also assume that all LFSRs have been modified to generate the all-zero pattern in addition to generating maximal length sequences.
First, consider the case where $m = 2^n$. In this case, $S^k$ contains exactly $2^{n-k}$ repetitions of each of the $2^k$ distinct $k$-bit patterns. Hence, $S$ can be partitioned into $2^k$ groups such that the $k$ portions of all patterns within each group are identical. Let us first reorder $S$ such that patterns that belong to the same group are placed next to each other. Next, let us reorder the $2^{n-k}$ elements in each of the $2^k$ groups in such a manner that the $(n-k)$ portions of the patterns in the group appear in the same order as they would be generated by an $(n-k)$-bit LFSR. Let the reordered sequence be $S'$. Fig. 1 shows an example. The random pattern sequence shown in Fig. 1(a) is called the original sequence $S$ and the one in Fig. 1(b) is called the reordered sequence $S'$. Because $S'$ is obtained by merely reordering the patterns in $S$, all patterns in $S$ are also in $S'$. Therefore, the sequence $S'$ is equivalent to $S$ in the sense that the stuck-at fault coverage obtained by the application of $S'$ to a combinational CUT is the same as that obtained by applying $S$ (since, for a combinational CUT, the stuck-at fault coverage is independent of the order in which test patterns are applied). However, $S'$ will cause less switching activity in the CUT than $S$, since the patterns in the $k$ portion of $S'$ change at a much slower speed, namely, once every $2^{n-k}$ patterns. For example, the leftmost bit of the reordered sequence $S'$ in Fig. 1 changes only two times, while in the corresponding original sequence, $S$, the same bit changes eight times.
A sequence of the type $S'$ can be generated by two independent LFSRs: a $k$-bit LFSR to generate the $k$ portion and an $(n-k)$-bit LFSR to generate the $(n-k)$ portion. Both these LFSRs will be designed to generate maximum length sequences including the all-zero pattern. Since the patterns in the $k$ portion of $S'$ change every $2^{n-k}$ clocks but the patterns in the $(n-k)$ portion change every clock, the $k$-bit LFSR must be driven by a slow clock whose period is $2^{n-k}$ times that of the normal clock, which drives the $(n-k)$-bit LFSR. We will call the $k$-bit LFSR the slow LFSR and the $(n-k)$-bit LFSR the normal-speed LFSR to distinguish them from the original single $n$-bit LFSR that generates the sequence $S$. Recall that the slow and normal-speed LFSRs also generate the $k$-bit and $(n-k)$-bit all-zero patterns, respectively. Fig. 2 shows a BIST architecture that is equipped with a slow and a normal-speed LFSR. SCLK denotes the slow speed clock and CLK the normal-speed clock. The slow LFSR will be clocked by a clock whose frequency is $1/d$ of that of the normal clock, i.e., slow clock speed $= (1/d) \times$ normal clock speed. (To simplify the following discussion as well as the hardware, $d$ will be assumed to be a power of 2.) Note that the slow LFSR has both SCLK and CLK as clock inputs and has a control signal SEL_CLK that selects either SCLK or CLK. SCLK is selected when the slow LFSR is used as a test pattern generator; CLK is selected when the CUT is in the normal mode or when the slow LFSR functions as a multiple input signature register (MISR). The TPG that is composed of the slow and normal-speed LFSRs is called the dual-speed LFSR or DS-LFSR.
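The behavior of the two registers can be sketched in software. The following is a minimal model, not the hardware of Fig. 2: it assumes Fibonacci-style LFSRs with a NOR-type (de Bruijn) modification so that the all-zero state is visited, and the tap sets in `TAPS` are assumed choices corresponding to primitive polynomials for widths 1 to 4. All names are hypothetical.

```python
# Assumed tap positions (indices into the state tuple) giving full-period
# sequences for register widths 1..4 once the all-zero state is included.
TAPS = {1: (0,), 2: (0, 1), 3: (0, 2), 4: (0, 3)}

def lfsr_step(state, taps):
    """One shift of a Fibonacci LFSR modified to also visit the all-zero state."""
    fb = 0
    for t in taps:
        fb ^= state[t]
    if not any(state[:-1]):   # de Bruijn modification: flip feedback when
        fb ^= 1               # every bit except the last one is zero
    return (fb,) + state[:-1]

def ds_lfsr(k, n, d, m):
    """Generate m patterns: a k-bit slow register advanced every d clocks
    concatenated with an (n-k)-bit register advanced every clock."""
    slow = (0,) * k
    fast = (0,) * (n - k)
    patterns = []
    for t in range(m):
        patterns.append(slow + fast)
        fast = lfsr_step(fast, TAPS[n - k])
        if (t + 1) % d == 0:
            slow = lfsr_step(slow, TAPS[k])
    return patterns

def transitions_per_bit(patterns):
    """Number of value changes at each bit position between consecutive patterns."""
    width = len(patterns[0])
    return [sum(p[j] != q[j] for p, q in zip(patterns, patterns[1:]))
            for j in range(width)]

# Example: n = 6, k = 2, d = 2**(n-k) = 16, m = 2**n = 64.
pats = ds_lfsr(k=2, n=6, d=16, m=64)
per_bit = transitions_per_bit(pats)
```

In this example the first two entries of `per_bit` (the slow bits) are far smaller than the remaining four, while the 64 patterns still cover every 6-bit value once.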
When $m < 2^n$, the sequence $S$ of random patterns generated by an $n$-bit LFSR is not complete in that it does not contain all possible $n$-bit patterns. Hence, $S^k$ may have fewer than $2^{n-k}$ repetitions of some of the $k$-bit patterns. The patterns generated are determined by the polynomial and seed of the LFSR. Hence, when $m < 2^n$, the patterns in the modified sequence generated by the slow and normal-speed LFSRs may not be equivalent to those in the original sequence. Our objective is not only to reduce switching activity of the CUT during BIST but also to achieve fault coverage that is comparable to, or higher than, that obtained by the sequence generated by the corresponding original LFSR. Hence, we need a quantitative analysis to compare the modified sequence with the original sequence.
A. Uniqueness of Patterns
The first condition to be satisfied by a DS-LFSR to achieve the same fault coverage is that the sequence it generates should not contain any repeated $n$-bit patterns. (Recall that, for simplicity of notation, $m$ and $d$ are both assumed to be powers of 2.) If $m \le 2^{n-k}$, $m$ consecutive $n$-bit patterns in the sequence generated by a DS-LFSR with a $k$-stage slow LFSR and an $(n-k)$-stage normal-speed LFSR (both of which have primitive feedback polynomials and generate the all-zero pattern) are distinct. This is due to the fact that when $m \le 2^{n-k}$, patterns in $S^{n-k}$ cannot repeat, since an $(n-k)$-bit LFSR that has a primitive feedback polynomial and generates the all-zero pattern has a period of $2^{n-k}$. Even though the pattern sequence generated by a DS-LFSR of the above type contains distinct patterns, this sequence is not suitable for pseudorandom testing, since the $k$ portion of each vector contains only one fixed bit pattern. Next we show that in a more practical case, where $m > 2^{n-k}$, the slow and normal-speed LFSRs can collectively generate $m$ distinct patterns, provided that the values of $k$ and $d$ satisfy certain conditions.
Lemma 1: If $m \le 2^k \times d$ and $d \le 2^{n-k}$, $m$ consecutive $n$-bit patterns in the sequence generated by a DS-LFSR with the slow and normal-speed LFSRs (both of which have primitive feedback polynomials and also generate the all-zero pattern) are distinct.
Proof: Since a $k$-stage slow LFSR that has a primitive feedback polynomial and generates the all-zero pattern has a period of $2^k \times d$ normal clock cycles, when $m \le 2^k \times d$, the slow LFSR does not exhaust its period. Hence, each distinct $k$-bit pattern in the $k$ portion appears over exactly one time frame of $d$ consecutive cycles in the entire period of $2^k \times d$ normal clock cycles.

In order for any two $n$-bit patterns, $v_i$ and $v_j$, to be identical, both the $k$-bit patterns in the $k$ portion and the $(n-k)$-bit patterns in the $(n-k)$ portion of $v_i$ and $v_j$ must be identical. Hence, in order for the DS-LFSR to generate any repeated patterns, the $(n-k)$-stage normal-speed LFSR, which has a period of $2^{n-k}$ cycles (the normal-speed LFSR also has a primitive feedback polynomial and generates the all-zero pattern), must repeat its period within a time frame of $d$ consecutive cycles during which the $k$-stage slow LFSR holds the same pattern. However, since $d \le 2^{n-k}$, the normal-speed LFSR does not repeat within $d$ consecutive cycles. Hence, the DS-LFSR does not generate any repeated patterns. Q.E.D.

The sequence generated by a DS-LFSR satisfying the conditions specified in Lemma 1 always contains $m$ distinct patterns. However, if a large number is selected for $d$, only a few distinct $k$-bit patterns in the space of $2^k$ possible $k$-bit patterns will be contained in the $k$ portion of the patterns. In an extreme case where $d = m$, the $k$ portion contains only one $k$-bit pattern for the entire sequence of $m$ $n$-bit patterns. In contrast, when $m = 2^k \times d$, the slow LFSR generates all $2^k$ possible $k$-bit patterns and hence the DS-LFSR will generate patterns that are shown ahead to be uniformly distributed in the space of all $2^n$ possible $n$-bit patterns. Due to this property, which is demonstrated next, DS-LFSRs that satisfy the conditions $m = 2^k \times d$ and $d \le 2^{n-k}$ are the ones that must be used.
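The uniqueness argument depends only on the periods of the two registers, so it can be checked with an abstract model in which each register is represented by its position within a full-period cycle. The sketch below makes that simplifying assumption (it does not simulate actual feedback polynomials), and all names are hypothetical.

```python
def ds_positions(n, k, d, m):
    """Abstract DS-LFSR: the slow register is modelled by its cycle position
    (t // d) mod 2**k and the normal-speed register by t mod 2**(n-k)."""
    return [((t // d) % 2 ** k, t % 2 ** (n - k)) for t in range(m)]

def lemma1_conditions(n, k, d, m):
    """The two conditions of Lemma 1: m <= 2**k * d and d <= 2**(n-k)."""
    return m <= 2 ** k * d and d <= 2 ** (n - k)

def all_distinct(n, k, d, m):
    """True if the m generated (slow, normal) state pairs are pairwise distinct."""
    states = ds_positions(n, k, d, m)
    return len(set(states)) == len(states)

def check_small_cases(max_n=6):
    """Brute-force check: whenever the conditions hold, no pattern repeats."""
    for n in range(2, max_n + 1):
        for k in range(1, n):
            for d in (2 ** e for e in range(0, n + 1)):
                for m in (2 ** e for e in range(0, n + 1)):
                    if lemma1_conditions(n, k, d, m):
                        if not all_distinct(n, k, d, m):
                            return False
    return True
```

A case that violates the second condition, such as `n=4, k=2, d=8, m=16`, shows why it is needed: the 2-bit normal-speed register wraps around twice while the slow register holds one value, so patterns repeat.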
The number of DS-LFSR stages, $n$, is determined by the number of circuit inputs, and the test sequence length, $m$, is determined by the desired fault coverage. Hence, the conditions of Lemma 1 are used to determine the number of stages in the slow LFSR, $k$, when the clock ratio, $d$, is given, or to determine $d$ when $k$ is given.
Theorem 1: If $m \le 2^n$, there always exists $k$ that satisfies the conditions $m = 2^k \times d$ and $d \le 2^{n-k}$, for any given $n$, $m$, and $d$. Similarly, there always exists $d$ that satisfies the conditions $m = 2^k \times d$ and $d \le 2^{n-k}$, for any given $n$, $m$, and $k$.

Proof: We will prove only the first statement, since the proof for the second statement is similar. Suppose, to the contrary, that for given values of $n$, $m$, and $d$, the condition $m = 2^k \times d$ is satisfied only if $d > 2^{n-k}$. Substituting the latter condition, i.e., $d > 2^{n-k}$, into the former, we get $m > 2^k \times 2^{n-k}$. This implies that $m > 2^n$. This contradicts the condition that $m \le 2^n$. Q.E.D.

Theorem 1 shows that for any combination of $n$, $m$, and $d$, or $n$, $m$, and $k$, we can design a DS-LFSR that satisfies the conditions $m = 2^k \times d$ and $d \le 2^{n-k}$, i.e., a DS-LFSR that can generate a sequence with uniform distribution. In the remainder of this paper, we only study DS-LFSRs that satisfy all requirements identified in this section.
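The design rule implied by Theorem 1 can be written as a small helper that, given the number of stages, the test length, and the clock ratio, returns the number of slow LFSR stages. This is a sketch under the stated assumptions that the test length and clock ratio are powers of 2 and that the clock ratio does not exceed the test length; the function name `slow_stages` is hypothetical.

```python
def is_power_of_two(v):
    """True for 1, 2, 4, 8, ..."""
    return v > 0 and (v & (v - 1)) == 0

def slow_stages(n, m, d):
    """Return k such that m == 2**k * d, and confirm that d <= 2**(n-k).

    n: total number of DS-LFSR stages
    m: test sequence length (power of 2, at most 2**n)
    d: ratio of normal clock speed to slow clock speed (power of 2, at most m)
    """
    if not (is_power_of_two(m) and is_power_of_two(d)):
        raise ValueError("m and d are assumed to be powers of 2")
    if m > 2 ** n:
        raise ValueError("test length m must not exceed 2**n")
    if d > m:
        raise ValueError("clock ratio d must not exceed m")
    k = (m // d).bit_length() - 1      # exact log2 of the power of two m/d
    if d > 2 ** (n - k):
        # Should never trigger when m <= 2**n; kept as a sanity check.
        raise ValueError("no valid k for these parameters")
    return k
```

For instance, a six-stage DS-LFSR that applies eight patterns with a clock ratio of 2 yields `slow_stages(6, 8, 2) == 2`.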
B. Equidistribution of Patterns
The second condition to be satisfied by sequences generated by a DS-LFSR is randomness. Several methods to test randomness of a sequence are described in [19] .
The sequences generated by the proposed DS-LFSR are already shown, by Lemma 1, to pass collision test, which tests for the repetition of any pattern. Since the stuck-at fault coverage for a combinational CUT is independent of the order in which the test patterns are applied, the tests that test randomness of the sequence, such as serial, poker, and run tests, are not useful to quantify the quality of random test patterns for stuck-at faults. Therefore, for our application, the most important test is the equidistribution test.
The $\chi^2$ test [19] is a method used to test the degree of agreement between the distribution of a sample of generated random values and a targeted distribution. Assume that we generate $m$ $n$-bit patterns where $m \le 2^n$ and $2^n$ is divisible by $m$. Let $i$ be a natural number. For the generated patterns to be uniformly distributed in the space of $2^n$ patterns, when expressed as natural numbers, there is only one generated pattern in any interval $[i \cdot 2^n/m,\ (i+1) \cdot 2^n/m)$, for $i = 0, 1, \ldots, m-1$. Therefore, the formula for the $\chi^2$ test is given by

$$\chi^2 = \sum_{i=0}^{m-1} \frac{(o_i - e_i)^2}{e_i} \tag{5}$$

where $e_i$ is the expected number of patterns that belong to the interval $[i \cdot 2^n/m,\ (i+1) \cdot 2^n/m)$, $i = 0, 1, \ldots, m-1$, and $o_i$ is the number of generated patterns belonging to the same interval. For the uniform distribution, $e_i = 1$, for $i = 0, 1, \ldots, m-1$. If test patterns are not uniformly distributed, then there might be some inputs that are assigned the same values in most test patterns. Hence, faults that can be detected only by patterns that are not applied may escape and cause low fault coverage. For example, if inputs $x_i$ and $x_j$ of a circuit under test are mostly assigned 00 when a sequence of patterns is applied to the circuit, then all faults that can be detected only when inputs $x_i$ and $x_j$ are assigned values other than 00, i.e., 01, 10, and 11, will escape detection. Hence, uniformly distributed test patterns generally achieve higher fault coverage for most circuits than test patterns that are not uniformly distributed. Table I compares the values of $\chi^2$ for the sequences generated by single LFSRs (for short, LFSR sequences) with those for sequences generated by DS-LFSRs (for short, DS-LFSR sequences).
As described in the preceding paragraph, the binary bit pattern of each pattern generated by DS-LFSRs and single LFSRs is translated into the corresponding natural number to calculate $\chi^2$. Since the use of a different bit order can cause the same pattern to be translated into a different natural number, the value of $\chi^2$ for the sequences generated by DS-LFSRs is highly dependent on the positions of the bits generated by the slow LFSR in the patterns. Fig. 3 shows two identical sets of six-bit patterns with different bit orders. Let us assume that the patterns are generated by a DS-LFSR that is composed of a two-bit slow LFSR and a four-bit normal-speed LFSR. For convenience, let us call the set of patterns on the left-hand side of the figure the patterns with the given bit order and the set of patterns on the right-hand side of the figure the patterns with the reverse bit order. The patterns with the given bit order are obtained by considering the bits generated by the slow LFSR as the most significant bits (MSBs) and those generated by the normal-speed LFSR as the least significant bits (LSBs). The patterns with the reverse bit order are obtained by simply reversing the bit positions in the given bit order; hence the MSB (LSB) in the given bit order becomes the LSB (MSB) in the reverse bit order. Since the DS-LFSR has six stages and only eight patterns are generated in this example, the space of patterns is partitioned into eight groups, each of which has an interval of 8. The integers in the columns next to each set of patterns in the figure (the column to the right of the patterns with the given bit order and that to the left of the reverse bit order) are natural numbers (radix 10) that correspond to each binary bit pattern in the two pattern sets. In the set of patterns with the given bit order, exactly one pattern is mapped onto each of the eight intervals, i.e., $\chi^2 = 0$.

However, in the set of patterns with the reverse bit order, all eight patterns are mapped to only two intervals, namely [0, 8) and [32, 40), resulting in a much larger $\chi^2$. This illustrates that $\chi^2$ results for the patterns generated by a DS-LFSR may be highly dependent on the bit order.
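The sensitivity of the equidistribution statistic to bit order can be reproduced with a small synthetic example. The eight six-bit values below are hypothetical (they are not the patterns of Fig. 3); they are chosen so that the given bit order places one pattern in each interval of width 8, while the reversed bit order concentrates all patterns in the intervals [0, 8) and [32, 40). The statistic assumes an expected count of one pattern per interval.

```python
def chi_square(values, n):
    """Chi-square statistic for m values in [0, 2**n), using m equal-width
    intervals and an expected count of one value per interval."""
    m = len(values)
    width = 2 ** n // m          # assumes m divides 2**n
    observed = [0] * m
    for v in values:
        observed[v // width] += 1
    expected = 1
    return sum((o - expected) ** 2 / expected for o in observed)

def reverse_bits(value, n):
    """Interpret the n-bit pattern of `value` with its bit order reversed."""
    return int(format(value, f"0{n}b")[::-1], 2)

# Hypothetical patterns: the three MSBs enumerate the eight intervals and the
# three LSBs take only the values 000 and 001.
given_order = [8 * i + (i % 2) for i in range(8)]
reverse_order = [reverse_bits(v, 6) for v in given_order]

chi_given = chi_square(given_order, 6)
chi_reverse = chi_square(reverse_order, 6)
```

Here `chi_given` is 0, whereas `chi_reverse` is 24: two intervals hold four patterns each and the other six are empty.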
A sequence where the bits generated by the slow LFSR are considered the MSBs of each vector of the sequence will produce a better $\chi^2$ (i.e., a smaller $\chi^2$) than a sequence where the bits generated by the slow LFSR are considered the LSBs. This is due to the fact that, by our construction, the slow LFSR always generates an exhaustive sequence of $k$-bit patterns, which are uniformly distributed. Hence, in the experiments shown in Table I, each vector in a sequence generated by DS-LFSRs is shuffled according to a randomly generated order to reduce the effect due to biased bit ordering.
Results are shown in Table I, where the numbers in the column headings denote the values of $d$. Thus, for example, the column labeled "4" shows normalized $\chi^2$ values for the sequence of the patterns generated by a DS-LFSR containing a slow LFSR which is driven by a slow clock whose speed is 1/4 of that of the normal clock.
Normalized s for most DS-LFSR sequences are smaller than 1 except when and or 4096. Note that for the DS-LFSR sequence is smaller than 0.1 when and . These results indicate that DS-LFSR sequences are more uniformly distributed than LFSR sequences if . Typical IC chips have hundreds of thousands of flip-flops where many flip-flops are configured as TPG stages in order to achieve high fault coverage. However, in order to finish testing in a reasonable time, test vectors that can be applied to CUTs should be limited to, say, 2 . Hence, 2 is many orders of magnitude greater than in typical IC chips. This implies that the DS-LFSR can generate more uniformly distributed sequences and achieve higher fault coverage for most practical IC chips than the traditional LFSR. Variations in for different clock speeds, i.e., for different values of , are not significant except when . A circuit consists of multiple cones. A cone is a set of circuit lines and gates that are logically connected to an output. Fig. 4 shows a circuit that has two cones, cone A and cone B. Inputs of cone A and cone B are driven by both a four-stage slow LFSR and an stage normal-speed LFSR. Only two stages out of four stages of the slow LFSR drive cone A while all four stages of the slow LFSR drive cone B. Hence, even though the modified sequence of -bit patterns is uniformly distributed, a sequence obtained by selecting columns of the sequence, where one or more columns are selected from the sequences generated by the normal-speed as well as the slow LFSR, may not be as uniformly distributed as the sequence obtained by selecting columns of the sequence generated by an -stage original LFSR. The sequence of patterns generated by an LFSR is said to hold a good randomness property. Hence, the portion of the modified sequence that is generated by the normal-speed LFSR can be assumed to hold a good randomness property [20] . 
Since the slow LFSR always generates 2 -bit patterns, the -bit patterns in the portion are also uniformly distributed even though the slow LFSR is driven by the slow clock. Thus, our concern is restricted to the -bit patterns that consist of parts of both and portions. Table II shows the ratios of test results for ( )-bit portions of the -bit DS-LFSR sequences to those for ( )-bit portions of the single LFSR sequences, where for DS-LFSRs, all stages of normal-speed LFSR, and , and four stages of slow LFSRs are selected. Recall that when a number shown in the table is smaller than 1, the result for the corresponding DS-LFSR sequence is lower (i.e., better) than that for a sequence generated by a single LFSR. The heading "clk ratio ( )" denotes the ratios of the slow clock speed to the normal clock speed. The column labeled "# bits from " denotes the number of bits in the portion that are concatenated with the portion, i.e., the in ( ) bits. The headings below the values of or denote the values of as multiples of 1024. In most cases, ( )-bit patterns obtained from sequences generated by DS-LFSRs are more uniformly distributed than those obtained from sequences generated by LFSRs. As in Table I, when , sequences generated by DS-LFSRs produce much smaller results than those generated by LFSRs.
III. SELECTING INPUTS DRIVEN BY THE SLOW LFSR
Depending on the circuit structure, the transitions at some inputs of a CUT cause more transitions at internal lines than those at other inputs. Therefore, driving at slower speed those inputs that may cause more transitions in the internal circuit will yield greater reductions in switching activity.
We have developed a procedure to select the inputs to be driven at slower speed. A gain function, which is based on the transition density formulation, is calculated for each input of the circuit. The transition density of a circuit line $l$ is the sum of the transitions at each input that propagate to line $l$, as shown in (3). Hence, the portion of the transition density of line $l$ due to the transitions at a specific input $x_i$ is given by

$$D(l)\big|_{x_i} = P\!\left(\frac{\partial f_l}{\partial x_i}\right) D(x_i) \tag{6}$$

where $f_l$ is the Boolean function of line $l$. The sum of the transition densities of all lines in the circuit, each weighted by the line's load capacitance $C_l$, that can be attributed to the transitions at $x_i$ is given by

$$G(x_i) = \sum_{l} C_l\, P\!\left(\frac{\partial f_l}{\partial x_i}\right) D(x_i). \tag{7}$$

The gain function $G(x_i)$ is computed for all inputs of the circuit and used as the criterion to select the inputs that are to be driven by the slow LFSR of the DS-LFSR. In other words, the inputs that have the greatest $G(x_i)$ values are chosen to be driven by the slow LFSR.
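The selection step can be sketched as follows. For simplicity this example obtains the Boolean-difference probabilities by exact enumeration over all input combinations, which is only feasible for very small circuits and is not the estimation procedure used in the paper. The three-line circuit, its capacitances, and all function names are hypothetical.

```python
from itertools import product

def sensitization_prob(f, n_inputs, i):
    """Exact probability that toggling input i toggles line function f,
    assuming independent, equiprobable inputs."""
    hits = 0
    for bits in product((0, 1), repeat=n_inputs):
        if bits[i] == 0:
            flipped = bits[:i] + (1,) + bits[i + 1:]
            hits += f(bits) != f(flipped)
    return hits / 2 ** (n_inputs - 1)

def gain(lines, n_inputs, i, input_density=1.0):
    """Capacitance-weighted transition density attributable to input i."""
    return sum(cap * sensitization_prob(f, n_inputs, i) * input_density
               for f, cap in lines)

def select_slow_inputs(lines, n_inputs, k):
    """Indices of the k inputs with the greatest gain values."""
    ranked = sorted(range(n_inputs),
                    key=lambda i: gain(lines, n_inputs, i),
                    reverse=True)
    return sorted(ranked[:k])

# Hypothetical circuit with inputs x0, x1, x2 and three internal lines,
# each given as (Boolean function, load capacitance).
toy_lines = [
    (lambda b: b[0] & b[1], 1.0),           # l1 = x0 AND x1
    (lambda b: (b[0] & b[1]) | b[2], 1.0),  # l2 = l1 OR x2
    (lambda b: b[0] ^ b[2], 3.0),           # l3 = x0 XOR x2 (heavily loaded)
]
slow_inputs = select_slow_inputs(toy_lines, 3, k=2)
```

In this toy circuit the heavily loaded XOR line dominates, so the two inputs that feed it are the ones selected for the slow LFSR.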
The probability that the Boolean difference of $f_l$ with respect to $x_i$ evaluates to a 1, $P(\partial f_l / \partial x_i)$, is derived from the signal probability of each line using a procedure similar to what is used to calculate detection probability in [15]. An auxiliary AND gate is introduced for each path from $x_i$ to $l$, where $l$ is in the transitive fanout of $x_i$. The controlling value of a gate is the value which, when applied to an input of a gate, determines the value of the output of the gate independent of the values applied to its other inputs. The controlling value of an AND gate is a 0 and the controlling value of an OR gate is a 1. If the controlling value of a gate that is on a path from input $x_i$ to $l$ is 0, all inputs of the gate that are not on the path are directly connected to the inputs of the auxiliary AND gate. Otherwise, all such inputs are connected to the inputs of the auxiliary AND gate through inverters. Hence, the probability that a path from $x_i$ to $l$ is sensitive to $x_i$ is the probability that the output of the auxiliary AND gate evaluates to a 1. The outputs of the auxiliary AND gates, which are introduced for each path from $x_i$ to $l$, are connected to the inputs of an auxiliary OR gate. Finally, the probability that $\partial f_l / \partial x_i$ evaluates to a 1 is the probability that the output of the auxiliary OR gate evaluates to a 1. In an example with two paths from the input to the line, the side inputs of gates whose controlling value is 1 are connected to the inputs of the auxiliary AND gates through inverters, and the signal lines that drive AND gates along the two paths are connected to the inputs of the auxiliary AND gates directly. A transition at the input can propagate to the line when the output of either auxiliary AND gate evaluates to a 1. Hence, the probability that the Boolean difference evaluates to a 1 is given by the probability that the output of the auxiliary OR gate evaluates to a 1.
IV. MERGING COMPATIBLE INPUTS
Assume that $m$ random patterns generated by a combination of a slow and a normal-speed LFSR are applied to the CUT. Recall that the slow clock speed is $1/d$ of that of the normal clock. Then, the number of flip-flops in the slow LFSR required to avoid repetition of any pattern and to generate uniformly distributed patterns is given by $k = \log_2 (m/d)$, which can be derived from the condition $m = 2^k \times d$. Note that this value of $k$ is determined by $m$ and $d$, without taking the structure of the CUT into consideration. Hence, if the CUT has many inputs and $k$ is small, the number of inputs that are driven by the outputs of the slow LFSR will be low compared with the number of inputs that are driven by the normal-speed LFSR. This means that most lines of the CUT will be driven by the normal-speed LFSR. Therefore, for such circuits, the reduction in the number of transitions obtained by driving $k$ inputs at the slow speed may not be significant.
If no output of a circuit is driven by input as well as , then and are said to be compatible [21] and denoted by . The testability of the cones driven by is independent of the value of and vice versa. Hence, a set of compatible inputs can be merged into a test signal and driven, during test application, by the same output of an LFSR without any loss of test coverage. This implies that the actual number of inputs driven by the slow LFSR can be greater than when it drives test signals comprised of multiple inputs.
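The compatibility check can be sketched as follows, assuming the structural support of each output (the set of inputs in its transitive fanin) has already been extracted from the netlist. The dictionary layout and the function name are hypothetical.

```python
from itertools import combinations

def compatible_pairs(output_support):
    """Return all pairs of inputs that never drive a common output.

    output_support: dict mapping each output name to the set of inputs
    in its transitive fanin (assumed to be precomputed).
    """
    inputs = sorted(set().union(*output_support.values()))
    conflicting = set()
    for support in output_support.values():
        # Any two inputs feeding the same output are incompatible.
        for a, b in combinations(sorted(support), 2):
            conflicting.add((a, b))
    return [(a, b) for a, b in combinations(inputs, 2) if (a, b) not in conflicting]

support = {"o1": {"a", "b"}, "o2": {"b", "c"}, "o3": {"d"}}
print(compatible_pairs(support))
```

The resulting pairs are the edges of a compatibility graph; a set of inputs can be merged into one test signal only if every pair within the set appears in this list.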
In order to take merged inputs into consideration, the gain function given by (7), which was calculated for each input, is now calculated for each set of inputs in a test signal, or a clique, where a clique [22] consists of inputs that are compatible with each other. The modified gain function is given by (8) where denotes a line in the circuit and a set of inputs in test signal . The outputs of the -bit slow LFSR are connected to test signals instead of compatible inputs that have the greatest gain function values. Let denote the total number of inputs driven by the slow LFSR. Thus , if any multiple input test signals are driven by the slow LFSR. In this case, only , instead of , inputs are driven by the normal-speed LFSR after compatible inputs are merged.

The problem of identifying test signals by combining compatible inputs and selecting test signals to be connected to the outputs of the slow LFSR can be formulated as a maximum clique problem [22]. Since the maximum clique problem is a well-known NP-complete problem, a heuristic that is based on the Kernighan-Lin bipartitioning algorithm [23] is developed to select the clique that has the greatest gain function value for circuits with many inputs and outputs. Since the inputs in a clique must be disjoint from the inputs in other cliques, after a clique with the greatest gain function value is selected, all inputs in the selected clique are removed from all unselected cliques. This is repeated until cliques are selected.

Fig. 6 shows a slow LFSR with . (Even though this register has five stages, as will become clear later, .) Two pairs of inputs, ( ) and ( ), are compatible (denoted by , ). Note that the inputs of the flip-flops whose outputs are connected to inputs belonging to a clique are driven by the output of the same flip-flop. In Fig. 6, the inputs of and are both connected to the LFSR feedback signal ( ) and the inputs of and are both connected to . Hence, the output value of is always identical to that of and the output of is always identical to that of . As a result, the LFSR generates only three independent bit patterns even though it has five flip-flops. The flip-flops, and , have two clock inputs, SLCK and CLK, and a control input SEL_CLK. Hence, these dual-clocked flip-flops, such as , are more expensive than the normal flip-flops, such as and , which have only a single clock, CLK. If we do not merge inputs, five dual-clocked flip-flops are required to drive five inputs at slow speed. However, by merging ( ) and ( ) into test signals, only three dual-clocked flip-flops and two normal flip-flops are used in the slow LFSR shown in Fig. 6.
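The select-and-remove loop described above can be sketched as follows. This shows only the greedy selection of disjoint cliques and does not reproduce the Kernighan-Lin-based heuristic that finds the candidate cliques. The `gain` callable stands in for the modified gain function of (8); summing hypothetical per-input gains, as in the example, is an assumption made purely for illustration.

```python
def select_test_signals(cliques, gain, k):
    """Greedily pick up to k mutually disjoint cliques with the greatest gain.

    cliques: iterable of sets of mutually compatible inputs (candidates).
    gain:    callable mapping a frozenset of inputs to a number (placeholder
             for the modified gain function; supplied by the caller).
    k:       number of test signals, i.e., stages of the slow LFSR.
    """
    remaining = [frozenset(c) for c in cliques if c]
    selected = []
    while remaining and len(selected) < k:
        best = max(remaining, key=gain)
        selected.append(best)
        # Remove the chosen inputs from all unselected cliques; any subset of
        # a clique is still a clique, so compatibility is preserved.
        remaining = [c - best for c in remaining]
        remaining = [c for c in remaining if c]
    return selected

# Hypothetical per-input gain values, for illustration only.
per_input_gain = {"a": 5, "b": 3, "c": 4, "d": 1}
candidates = [{"a", "c"}, {"a", "d"}, {"b", "d"}, {"c", "d"}]
chosen = select_test_signals(candidates, lambda s: sum(per_input_gain[i] for i in s), 2)
print([sorted(s) for s in chosen])
```

Because the gain of each shrunken clique is re-evaluated on every iteration, a clique that loses its most valuable inputs to an earlier selection naturally drops in priority.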
In summary, by merging compatible inputs, we can increase the number of inputs driven by the slow LFSR, which is determined by the length and clock ratio , without any loss of testability. The second advantage of merging inputs is a reduction in the number of dual-clocked flip-flops, and hence a reduction in area overhead.
Some circuits have few compatible inputs. In those circuits, the number of inputs that can be driven by the slow LFSR will be low, resulting in only a small reduction in the number of transitions. However, even though two inputs and are not compatible, if merging and does not decrease the testability of the circuit, we can merge and . This idea has not been pursued in this paper but is the subject of ongoing research.
The procedure to design a DS-LFSR can now be described.
1) For an -input CUT, perform fault simulation using patterns generated by an -stage LFSR to determine the length of the test sequence , to be applied. is the test length which either achieves desired fault coverage or after which the sequence does not detect any fault for a predefined number of consecutive patterns.
2) Select a clock ratio . (In all the following experiments, we selected .)
3) Calculate the number of stages in the slow LFSR .
4) Find compatible inputs, combine them into sets, and select sets of inputs to be driven by the slow LFSR using the gain function. The number of inputs to be driven by the normal-speed LFSR is then , where is the total number of inputs in the sets of inputs connected to the slow LFSR.
V. SIMULATION RESULTS
Fault simulations and true value simulations (to count the number of transitions and potential hazards) were performed on ISCAS85 and ISCAS89 benchmark circuits. Table III compares the numbers of transitions caused and transition delay as well as stuck-at fault coverages achieved by the DS-LFSR sequence with those for the original LFSR sequence. (Recall that the term "number of transitions" is being used as a short form for "number of weighted transitions," since our methodology weights the number of transitions at each line with the line's load capacitance.) The slow clock speed used for all the circuits is 1/4 of the normal clock speed. The column labeled shows the number of patterns in both sequences, shows the number of inputs (sum of primary and state inputs for ISCAS89 and primary inputs for ISCAS85) of the circuit, and SFC and TFC stand for stuck-at and transition delay fault coverage, respectively. The columns under the headings LFSR and DS-LFSR show data for patterns generated by the original LFSR and DS-LFSR, respectively. For both TPGs, the columns entitled # haz. and # trans. show the average number of circuit lines that can potentially have hazards and the average number of transitions in the circuits, per test pattern, during the application of these sequences. The number of transitions is counted under the zero-delay model. For the DS-LFSR, the numbers in parentheses under this column denote the ratios of the numbers of transitions for the DS-LFSR sequences to those for the original LFSR sequences. The column labeled under the heading DS-LFSR displays the number of test signals that are driven by the slow LFSR, in other words, the number of stages in the slow LFSR. Note that these numbers are smaller than the numbers of inputs driven by the slow LFSR, , for most circuits, because multiple compatible inputs are merged into test signals. The column labeled shows the number of inputs driven by the normal-speed LFSR.
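The weighted transition count under the zero-delay model can be sketched as follows. The sketch assumes that a true value simulator has already produced the settled value of every line for each applied pattern; the data layout and the function name are hypothetical and are not taken from the tool used in the experiments.

```python
def weighted_transitions(line_values, load_cap):
    """Sum of load-capacitance weighted transitions over a pattern sequence.

    line_values: list of dicts, one per applied pattern, mapping each circuit
                 line to its settled (zero-delay) logic value, 0 or 1.
    load_cap:    dict mapping each line to its load capacitance (weight).
    """
    total = 0.0
    for prev, cur in zip(line_values, line_values[1:]):
        for line, value in cur.items():
            # Under the zero-delay model a line toggles at most once per pattern.
            if prev[line] != value:
                total += load_cap[line]
    return total

values = [{"x": 0, "y": 0}, {"x": 1, "y": 0}, {"x": 0, "y": 1}]
caps = {"x": 2.0, "y": 1.0}
print(weighted_transitions(values, caps))
```

Dividing the returned total by the number of applied patterns gives a per-pattern average of the kind reported in the # trans. columns.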
Since is determined solely by values of and , the number of dual-clocked flip-flops required to construct a DS-LFSR is low even in circuits that have many inputs. For example, even though 1011 inputs of s38584 are driven at slow speed, only 18 ( ) dual-clocked flip-flops are required. This shows that the hardware overhead required to implement a DS-LFSR is very low.
The reductions in the average numbers of transitions range from 13% to 70%. As expected, large reductions in the average numbers of transitions occur in circuits in which many inputs are driven by the slow LFSR. For example, the largest reduction occurs for s1488, in which 11 out of 14 inputs are driven by the slow LFSR. The reductions for ISCAS85 benchmark circuits are not as significant as those for ISCAS89 benchmark circuits. This is primarily due to the fact that most ISCAS85 benchmark circuits do not have many compatible inputs. Furthermore, the lengths of the sequences applied to these circuits are short compared with the number of inputs of these circuits; since these circuits are easily tested with random patterns, long test sequences are not necessary. Since , the number of stages in the slow LFSR, is determined by the test length and clock ratio, the number of inputs that are driven by the slow LFSR is much smaller than the number of inputs that are driven by the normal-speed LFSR of the DS-LFSR.
Since the average numbers of transitions are counted under the zero-delay model, we report the average numbers of circuit lines that can possibly have hazards to predict the number of transitions under the general-delay model. The reductions in these numbers are greater than those in the average numbers of transitions, for most circuits. This indicates that the reductions in the average numbers of transitions under the general-delay model will be at least as great as those under the zero-delay model.
For almost all circuits, the stuck-at fault coverages obtained by the DS-LFSR and LFSR are very similar. (s420, for which DS-LFSR achieves higher fault coverage, and s838, for which LFSR coverage is higher, are some of the notable exceptions.) Table III also compares the transition delay fault coverages (TFC) obtained by applying the original sequences with those obtained by applying the DS-LFSR sequences. DS-LFSR sequences achieve higher (by 0.5% or more) fault coverage for 12 circuits, and LFSR sequences achieve higher fault coverage for 9 circuits. For the remaining 11 circuits, DS-LFSR and LFSR sequences achieve almost the same fault coverage. These results show that the sequences generated by the proposed DS-LFSRs can achieve transition delay fault coverages comparable to those provided by sequences generated by the original LFSRs.
Since the slow LFSR is driven by a slow clock, gates that are driven by only the slow LFSR are not exercised at-speed. However, the transition delay fault coverage reported is accurate, since the response at the outputs of the CUT is captured one normal clock period after the application of each distinct pattern at its inputs.
Consider a scenario in which one is interested in going beyond delay testing and intends to cover unmodeled faults via at-speed testing. First note that a large portion of a fanout cone that is typically driven by the slow LFSR is also driven by the normal LFSR (as illustrated in the circuit shown in Fig. 4) and many gates in that portion may still be exercised at-speed. We can also exercise at-speed the gates that are driven by only the slow LFSR (e.g., gates that are in the nonoverlapping portion of cone A of the circuit shown in Fig. 4) by clocking the slow LFSR by the normal speed clock for a short period of time between two normal sessions when the slow LFSR is clocked by the slow clock speed of the normal clock speed. However, the period during which the slow LFSR is clocked by the normal clock should be short enough not to risk damaging the circuit under test.
VI. CONCLUSION
A BIST TPG that can reduce switching activity during test application is proposed. The reduction in switching activity is achieved by lowering the transition densities at selected inputs. The proposed TPG, called DS-LFSR, consists of two LFSRs, a slow LFSR and a normal-speed LFSR. The slow LFSR is driven by a slow clock whose speed is th that of the normal clock that drives the normal-speed LFSR, thereby lowering transition densities at inputs driven by the slow LFSR. The DS-LFSR is designed in such a way that the generated patterns are all unique and uniformly distributed to achieve high fault coverage. The empirical analysis using tests demonstrates that the DS-LFSR generated sequences are more uniformly distributed than the sequences generated by single LFSRs with primitive feedback polynomials.
The inputs to be driven by the slow LFSR are selected using a gain function whose value is computed for all inputs. The gain function of an input denotes the sum of load-capacitance weighted transition density values of the circuit lines in its transitive fanout. inputs that have the greatest gain functions are selected to be driven by the slow LFSR. The number of inputs driven by the slow LFSR is further increased by merging multiple compatible inputs into test signals. Merging compatible inputs also reduces the area overhead by reducing the number of dual clocked flip-flops required to implement the slow LFSR.
Reductions of 13% to 70% in the numbers of weighted transitions are attained for the ISCAS85 and ISCAS89 benchmark circuits. High reductions in the numbers of weighted transitions are achieved for the circuits that have many compatible inputs. If the definition of compatible inputs is extended to inputs that can be merged without loss of random pattern testability, even higher reductions in the numbers of weighted transitions will be possible.
For most circuits, the stuck-at fault coverages obtained by applying the sequences generated by the proposed DS-LFSRs are equal to or higher than those obtained by applying the sequences generated by LFSRs with primitive feedback polynomials. The additional area overhead due to DS-LFSR is low, since only a few (no more than 20) dual-clocked flip-flops are required even in circuits with hundreds or even thousands of flip-flops.
The simulation results demonstrate that the DS-LFSR generated sequences typically achieve high transition delay fault coverages as well. If at-speed testing is desired to serendipitously cover unmodeled faults beyond the modeled stuck-at and delay faults, the test application scheme can be somewhat modified to accomplish that goal as well.
