Abstract-We dynamically monitor per cycle scan activity to speed up the scan clock for low activity cycles without exceeding the specified peak power budget. The activity monitor is implemented as on-chip hardware. Two models, one for test sets with peak activity factor of 1 and the other for test sets with peak activity factor lower than 1, have been proposed. In test sets with peak activity factors of 1, the test time reduction accomplished depends upon an average activity factor of αin. For low αin, about 50% test time reduction is analytically shown. With moderate activity, αin = 0.5, simulated test data gives about 25% test time reduction for ITC02 benchmarks. BIST with dynamic clock showed about 19% test time reduction for the largest ISCAS89 circuits in which the hardware activity monitor and scan clock control required about 2-3% hardware overhead. In test sets with peak activity factors lower than 1, the test time reduction depends on an input activity factor of αin and an output activity factor of αout. For low αin and high αout, a test time reduction of about 50% is analytically shown.
I. INTRODUCTION
Scan testing [1] spends a large fraction of the test time loading (scan-in) and unloading (scan-out) test data in flip-flops that are chained as shift registers. During this process, random combinational logic activity can produce large unintended power consumption, resulting in power supply noise and heating. If this consumption is higher than that of the normal functional operation for which the circuit is designed, the test can cause yield loss [2]. Therefore, scan testing uses a slower clock than normal operation. The scan clock frequency is determined by the maximum power consumption the circuit under test can withstand. The power P dissipated at a node is given by [2]:
where C is the capacitance of the node, V is supply voltage, f is clock frequency and α is a node activity factor.
α = Number of transitions per clock cycle (2)
The activity factor α for a clock signal is 2 because there are two (rising and falling) transitions per cycle. For a combinational node, α ranges between 0 (no transition) and 1 (a toggle every clock cycle). In the worst case, the frequency of the scan clock can be based on the maximum activity, i.e., α = 1, so that the test power can never exceed the power limit. Therefore,
where f test is the scan clock frequency and P budget is the maximum power dissipation the circuit can withstand without malfunctioning. Thus,
In general, the worst-case assumption (α = 1.0) can be replaced by any other value. Although all vectors are scanned in and scanned out at this frequency, many may not cause the maximum activity. It is possible to scan in these vectors at higher clock frequencies without exceeding the power budget. When the number of transitions in the circuit reduces to 1/i of the maximum,
From (3) and (5),
The capacitance and the voltage are constant for a node and so the power is proportional to the product of activity and frequency. Since the circuit can withstand a power P budget , the frequency can be multiplied by i, and the power dissipated in every cycle can still be kept within the allowed limit. Girard [2] defines peak power as the highest energy consumed during one clock period divided by the clock period and the average power as the total energy consumed during test divided by the test time. Since the power must never exceed P budget in any clock cycle, both peak power and average power will be below P budget in spite of the increased shift frequency. Also, instantaneous peak power [2] is consumed right after the application of the clock edge. This power depends on the vectors scanned in and is unaffected by changes in the scan clock frequency. Hence, it can be reduced only by changing the test vectors. In this work we assume that the vectors conform to the instantaneous peak power requirement. During scan tests, gates are either driven by outputs of the scan flip-flops or by primary inputs. Primary inputs do not change during scan in and scan out. Thus, scan chain activity is a direct measure of the test power [3] and by monitoring and controlling this activity, we can speed up the test as well as limit the test power. That is the idea presented in this paper.
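The scaling argument above can be summarized in a short worked derivation. This is only a sketch: it assumes the conventional dynamic-power form P = C V^2 α f, and any constant prefactor (an assumption here, not taken from the text) cancels in the final step.

```latex
\begin{align*}
P &= C V^{2} \alpha f
  && \text{(assumed form of the node power)} \\
P_{budget} &= C V^{2} \cdot 1 \cdot f_{test}
  \;\Rightarrow\; f_{test} = \frac{P_{budget}}{C V^{2}}
  && \text{(worst case, } \alpha = 1\text{)} \\
\alpha = \tfrac{1}{i}: \quad
P &= C V^{2} \cdot \tfrac{1}{i} \cdot \left(i\, f_{test}\right) = P_{budget}
  && \text{(clock sped up by the factor } i\text{)}
\end{align*}
```

Under this assumption, a cycle with 1/i of the maximum activity can run at i times the worst-case frequency and still dissipate exactly the budgeted power.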
Section II discusses the existing techniques used to optimize test time. Section III describes the past work done to reduce test time in circuits for which the peak activity factor of the test set is assumed to be 1. Section IV discusses the implementation proposed in this paper for circuits where the peak activity factor of the test set is found to be lower than 1. Section V gives a mathematical analysis of the scheme. Section VI explains the experimental results obtained. Section VII discusses the conclusion of this work.
II. EXISTING TECHNIQUES
Many test time reduction methods for scan circuits use compression. In a simple compression technique, the number of scan chains is increased reducing the number of flip-flops per chain. This reduces the time for shifting the input vector bits through scan flip-flops resulting in an overall reduction in test time. However, compression techniques require alterations in the design and may also suffer from linear dependencies.
One compression technique keeps the functionality of the ATE intact by moving the decompression task to the circuit under test [4] . Another technique [5] uses a dynamically reconfigurable scan tree that applies a part of the test sequence in scan tree mode and the other part in single scan mode. Reference [6] describes a decompression hardware scheme for test pattern compression. References [6] and [7] use compression algorithms with concurrent application of compaction and compression. Reference [8] implements a compression technique with embedded deterministic test logic on chip to provide vectors for the internal scan chains. Reference [9] employs alternating run-length codes for test data compression.
Reference [10] employs a two-phase testing strategy where the first phase is a scan-less phase for easy-to-detect faults and the second phase is a scan phase for hard-to-detect faults. Scan is performed only until all effective test bits are shifted to the right position and until all fault-affected response bits are shifted out. Reference [11] uses genetic algorithms to obtain compact test sets, which limit the scan operations. Reference [12] reduces test application time by generating a test for a sequential circuit using combinational test generation and sequential test generation adaptively. Reference [13] proposes a strategy to identify flip-flops to be removed from scan chains to increase the observability of the circuit so that faults activated during scan cycles can be observed at a primary output. The technique proposed in this paper can be applied to any scan circuitry, and can be used in addition to many of the methods mentioned above.
III. PAST WORK
Dynamic scan clock control was previously proposed in [14], [15] for BIST circuits in which the peak activity factor of the test vectors is assumed to be 1. That implementation is described in this section.
A. BIST circuit with a single scan chain
We add flip-flops at primary inputs and outputs as shown in Figure 1 and connect all flip-flops into a single scan chain. A linear feedback shift register (LFSR), a signature analysis register (SAR) and a BIST controller are added to the circuit to implement the test per scan BIST architecture [13] . BIST vectors are scanned in and combinational outputs are captured through scan flip-flops. Application of a vector includes scanning in LFSR bits into flip-flops, normal mode capture and scan out (overlapped with next scan in) into SAR.
The proposed dynamic frequency control is shown in Figure 2. The shaded parts of the circuit are not used for this implementation. As test vectors are scanned in, the activity (or inactivity) in the scan chain is monitored at the first flip-flop of the chain. The entering transitions ripple through other flip-flops in subsequent cycles. This activity does not change if there are inversions in the chain. When a transition passes through an inverting flip-flop, a rising transition becomes a falling transition and vice versa, leaving the number of transitions unchanged.
An XNOR gate between the input and output of the first flip-flop monitors the activity. The output of the XNOR gate is 0 when a transition enters the scan chain and is 1 when a non-transition enters. The XNOR output is fed to a counter, which counts up for each 1, i.e., a non-transition. The counter is set to 0 at the start of every scan in sequence. According to (6) , the scan frequency can be raised as the number of non-transitions entering the scan chain increases. This is accomplished through frequency control and frequency divider blocks in Figure 2 . We assume that the response captured from the combinational circuit for the previous vector has a transition density of 1, i.e., the scan chain is filled with alternating 1s and 0s before scan-in begins. This pessimistic worst-case assumption guarantees that the power budget shall not be exceeded. Correspondingly, the scan in of each vector begins with the slowest frequency, f test , permitted by the power budget for α = 1. The f test clock is the lowest frequency generated by the frequency divider that divides the frequency of an externally supplied fast tester clock. The frequency control circuit monitors the state of the counter. As the count goes up it lowers the frequency division ratio of the clock divider in several steps.
The counter states at which the clock is sped up can be found by simulation, which establishes the correlation between the circuit activity and scan chain activity. If each transition in the scan chain causes a large number of transitions in the circuit, power consumption reaches large values for low scan chain transition numbers. Thus, a large number of scan chain non-transitions should be counted before the scan clock frequency is stepped up. Similarly, if a transition in the scan chain has a small effect on the circuit activity, then only a few non-transitions in the scan chain are sufficient to increase the scan clock frequency. The reset generator in Figure 2 applies a reset signal to the counter, frequency control block and frequency divider at the positive edge of the scan enable signal, i.e., at the start of scan-in for every combinational vector. Since the frequency divider cannot generate an f/1 (divide by 1) clock, a multiplexer selects either the frequency divider output or the fastest clock.
Let us consider a circuit with 1000 flip-flops. If the slowest scan clock period based on the power budget is 80ns and we raise the frequency in 8 steps, then a modulo 125 (1000/8) counter will be implemented. Assuming the worst-case activity by the captured states, every scan-in is started with the 80ns clock and counter set to 0. The count goes up by 1 at every clock in which a non-transition enters the scan chain. When the count reaches 125, the counter is reset and the frequency divider generates a 70ns clock to scan-in the subsequent bits. The counter may again count up to 125 and the clock period would be reduced to 60ns. This process repeats until all 1000 bits are scanned in. Thus, if the input were a series of 1000 1s, the first 125 bits are scanned in at a clock of period 80ns, the second 125 bits at 70ns, until the last 125 bits are scanned in using a clock period 10ns. If the scan-in bits were a series of alternating 0s and 1s, the counter would never count up since there are no non-transitions entering the scan chain and hence the entire scan-in will use the 80ns clock. Notice that due to the worst-case assumption we start each scan-in with slowest clock and so the activity monitor only raises the clock rate without ever having to lower it during the same scan-in.
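The 1000 flip-flop example above can be reproduced with a small behavioral model. This is a minimal sketch, not the hardware of Figure 2: the function name `scan_in_time_ns`, its parameters, and the assumption that the first flip-flop initially holds `prev_bit` are all illustrative choices that do not come from the text.

```python
def scan_in_time_ns(bits, periods_ns, threshold, prev_bit):
    """Total scan-in time for one vector under the alpha_peak = 1 scheme.

    bits       : scan-in bit stream (sequence of 0/1)
    periods_ns : clock periods, slowest first (e.g. 80, 70, ..., 10)
    threshold  : non-transitions counted before each speed-up (e.g. 125)
    prev_bit   : value assumed to be held by the first flip-flop (assumption)
    """
    level = 0   # index into periods_ns; every scan-in starts at the slowest clock
    count = 0   # non-transition counter, reset at the start of scan-in
    total = 0
    for b in bits:
        total += periods_ns[level]          # this bit is shifted at the current period
        if b == prev_bit:                   # XNOR output is 1: a non-transition entered
            count += 1
            if count == threshold:          # threshold reached: reset and speed up
                count = 0
                level = min(level + 1, len(periods_ns) - 1)
        prev_bit = b
    return total


periods = [80, 70, 60, 50, 40, 30, 20, 10]           # 8 steps, in ns
all_ones = [1] * 1000
alternating = [0, 1] * 500

fast = scan_in_time_ns(all_ones, periods, 125, prev_bit=1)
slow = scan_in_time_ns(alternating, periods, 125, prev_bit=1)
print(fast, slow)
```

In this model the all-1s stream spends 125 cycles at each of the eight periods, while the alternating stream never leaves the 80ns clock, matching the two cases described in the text.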
Clearly, a bit stream with fewer transitions will be scanned in faster than one with many transitions. Don't cares in deterministic ATPG patterns can be filled in such that the number of transitions is minimum [16] . Also, techniques to generate BIST patterns with low transition densities [17] may be useful. This technique would perform well for such patterns.
B. BIST circuit with multiple scan chains
When the circuit has multiple scan chains, the activity of all chains must be monitored. XNOR gates are added across the input and output of the first flip-flop in every scan chain. Outputs of XNOR gates are supplied to a parallel counter [18] that counts up by the number of 1s at its input. The rest of the circuitry remains unaltered and still resembles the unshaded part of Figure 2 . When the count reaches a certain threshold value, the frequency is stepped up and the counter is reset. Except for the use of the parallel counter the control scheme is similar to that in the unshaded portion of Figure 2 .
IV. IMPLEMENTATION
The implementation of the previous section works well for circuits with test vectors having peak activity factors of 1. It has low area overhead and does not require simulation of the test vectors to estimate the peak activity factor. However, it is not suitable when the scan clock frequency is computed based on a peak activity factor (α peak ) that is lower than 1. In such cases, it becomes necessary to generalize that model [15], as proposed in this paper.
A. BIST circuit with single scan chain and α peak < 1
The slowest scan clock frequency is chosen from Eq. 1 using the values of α peak and the peak power limit. The number of transitions in the scan chain is continuously monitored at the input and output of the scan chain. Figure 2 shows the implementation of the technique for BIST circuits with a single scan chain and a peak activity factor lower than 1. The activity monitor comprises an XNOR gate connected between the input and output of the first flip-flop, and an XNOR gate connected between the input and output of the last flip-flop. The former monitors the number of non-transitions entering the scan chain and the latter monitors the number of non-transitions leaving the scan chain. An up-down counter keeps track of the number of non-transitions in the scan chain: the former XNOR drives the count-up signal and the latter drives the count-down signal. The number of non-transitions in the scan chain during any cycle is the difference between the number that have entered the scan chain and the number that have left it.
Since power is proportional to the activity in the scan chain, test power is lower when the number of transitions in the scan chain is lower or, in other words, when the number of nontransitions in the scan chain is higher. As discussed earlier, from (6), the scan frequency can be increased when the number of non-transitions in the scan chain increases.
The up-down counter is reset to 0 at the start of scan-in. When a non-transition enters the scan chain, the counter counts up, and when a non-transition leaves the scan chain, the counter counts down. When the counter counts up to a certain threshold value, it signals the frequency control block to increase the frequency of the scan clock, and the counter is reset to 0. Similarly, when the counter counts down to 0, the frequency control block is signaled to lower the frequency of the scan clock, and the counter is reset to the threshold value. Thus, whenever the number of non-transitions in the scan chain increases, the frequency is increased, and when the number reduces, the frequency is decreased. The rest of the circuitry functions as described earlier.
At the start of scan-in of a vector, the frequency control block is reset such that the frequency of scan clock is the slowest possible. This is based on the assumption that the activity factor of the vector captured in the scan chain before the start of scan-in equals α peak . The scan clock frequency is never increased beyond the highest or decreased below the lowest possible frequency regardless of the signal from the counter.
It can be observed that this implementation can be easily modified for circuits with activity factors equal to 1, by removing the flip-flop at the end of the scan chain and tying the count down signal of the up-down counter to 0.
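The up-down counting scheme of this section can be sketched behaviorally. The model below is an assumption-laden simplification rather than the circuit itself: it keeps the counter and the frequency step together as a single net count, derives the frequency level as `net // threshold`, and saturates the net count at both ends so that the clock never leaves the allowed range. The name `frequency_levels` and its parameters are hypothetical.

```python
def frequency_levels(enter_nt, leave_nt, threshold, num_levels):
    """Frequency level (0 = slowest) used in each scan cycle.

    enter_nt  : per-cycle 0/1 flags, 1 when a non-transition enters the chain
    leave_nt  : per-cycle 0/1 flags, 1 when a non-transition leaves the chain
    threshold : net non-transitions needed for one frequency step
    num_levels: number of available scan clock frequencies
    """
    net = 0                                   # reset at the start of scan-in
    max_net = threshold * num_levels - 1      # assumed saturation at the fastest clock
    levels = []
    for up, down in zip(enter_nt, leave_nt):
        levels.append(net // threshold)       # level in force during this cycle
        net += int(up) - int(down)            # count up on entry, down on exit
        net = max(0, min(net, max_net))       # assumed saturation, never below slowest
    return levels


# Four non-transitions enter, then four leave (threshold 2, three frequencies).
ramp = frequency_levels([1, 1, 1, 1, 0, 0, 0, 0],
                        [0, 0, 0, 0, 1, 1, 1, 1],
                        threshold=2, num_levels=3)
print(ramp)
```

Passing an all-zero `leave_nt` sequence corresponds to tying the count-down signal to 0, which is the α peak = 1 case described above: the level can then only rise during a scan-in.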
B. BIST circuit with multiple scan chains and α peak < 1
When the circuit has multiple scan chains, the activity of all chains must be monitored. XNOR gates are added across the input and output of the first flip-flop and across the input and output of the last flip-flop in every scan chain. The outputs of the XNOR gates at the inputs of the scan chains are fed to the count up inputs of a parallel counter [18] which counts up by the number of 1s at its count up inputs. Similarly, the outputs of the XNOR gates at the end of the scan chains are fed to the count down inputs of the parallel counter [18] which counts down by the number of 1s at its count down inputs. The rest of the circuitry remains unaltered and still resembles Figure 2 . When the count reaches a certain threshold value, the frequency is stepped up and the counter is reset. Except for the use of the parallel counter the control scheme is similar to that in Figure 2 .
V. ANALYSIS
Let N be the number of flip-flops, k be the peak activity factor of the test vectors (k = α peak ), α in be the activity factor of the scan-in vector, α out be the activity factor of the vector captured in the scan chain prior to scan-in, A be the number of non-transitions that enter the scan chain per cycle, v be the number of frequencies and T be the time period corresponding to the fastest clock.
The period of the fastest scan clock is v times shorter than the slowest clock. Therefore, the period of the slowest clock is given by vT . If the vectors were scanned in at the slowest clock, the total scan-in time per vector would be N vT .
This analysis considers uniform α in and α out . If α in ≥ α out , the number of non-transitions in the scan chain never increases and hence there is no change in scan clock frequency. However, if α in < α out , the number of non-transitions in the scan chain increases and the scan clock frequency is progressively increased. The scan-in of test vectors starts at the slowest clock, whose period is vT , and the clock is then progressively sped up.
The number of transitions in the scan chain can range from 0 to kN . Therefore, the number of non-transitions in the scan chain can range from N − kN to N − 0, i.e., from N (1 − k) to N . To simplify the values, N (1 − k) is subtracted from both limits. Thus, the number of non-transitions is monitored between N (1 − k) − N (1 − k) and N − N (1 − k), i.e., between 0 and kN .
Since the maximum number of non-transitions encountered by the activity monitor is kN , a scan clock frequency is specified for every 
The total scan-in time per combinational vector is the sum of all clock periods used. The test time at each frequency is given by the product of the number of cycles run at that frequency and the clock period. Total time per vector is given by
where v is usually chosen as a power of 2 because a divide-by-2^n frequency divider can be designed with n flip-flops. Time per vector if a single speed is used is N vT , and hence, the reduction in test time is given by
If N and v were chosen as powers of 2, Eq. 9 reduces to 
Time per vector if a single speed is used is N vT , and
The number of non-transitions in the scan chain in any cycle equals the difference between the number of non-transitions entering and leaving the scan chain. Non-transitions enter the scan chain at a rate of (1 − α in ) and leave at the rate of (1 − α out ). The non-transition density, A is therefore given by A = α out − α in . Thus, the reduction in test time is given by
where k = 1 for the model where the peak activity factor is assumed to be 1. In this model, the scan chain is assumed to be filled with transitions prior to scan-in and hence, the scan-in vector is assumed to be the sole contributor of non-transitions in the scan chain. Thus, non-transitions enter the scan chain at a rate of (1 − α in ) and hence, A = 1 − α in . The reduction in test time for this model is given by
A C program was written to generate random vectors for a circuit with 1000 flip-flops. The test time reduction for these vectors was estimated and compared with the values obtained from the formula. Table I shows the test time reduction versus number of frequencies for an activity factor of 0.5. Table II shows the variation of test time reduction with activity factor when the number of frequencies is 8. Both tables compare the test times estimated for random vectors (column 2) with those obtained from the accurate formula (10) (column 3) and from the approximate formula (14) (column 4). Tables I and II show that, for a chosen number of frequencies, vectors with lower activity achieve a higher reduction in test time. The test time reduction also increases with the number of frequencies: it grows rapidly up to 8 frequencies and only gradually thereafter.
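The trends of this section can be seen from an idealized closed form. The derivation below is a sketch under stated assumptions and is not claimed to reproduce the numbered equations of this section: the net non-transition rate A is taken as constant with 0 < A ≤ k, the clock period is assumed to drop by T after every kN/v counted non-transitions (so the periods are vT, (v − 1)T, ..., T), and the quantities c and m (symbols introduced here) are treated as integers.

```latex
\begin{align*}
c &= \frac{kN}{vA}
  && \text{cycles spent at each frequency} \\
m &= \frac{N}{c} = \frac{vA}{k}
  && \text{number of frequencies used in } N \text{ cycles} \\
t &= cT \sum_{j=0}^{m-1} (v - j)
   = cT \left[ mv - \frac{m(m-1)}{2} \right]
   = NT \left[ v - \frac{m-1}{2} \right]
  && \text{scan-in time per vector} \\
R &= 1 - \frac{t}{NvT}
   = \frac{m-1}{2v}
   = \frac{A}{2k} - \frac{1}{2v}
  && \text{fractional reduction in scan-in time}
\end{align*}
```

With k = 1 and A = 1 − α in , this gives R = (1 − α in )/2 − 1/(2v), which approaches 50% for low α in and many frequencies and stays just under 25% for α in = 0.5. With k < 1 and A = α out − α in , the same 50% limit is approached for low α in and α out close to k.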
VI. EXPERIMENTAL RESULTS

A. Circuits with α peak = 1
In Verilog netlists of the ISCAS89 benchmark circuits, flip-flops were added at all primary inputs and primary outputs. All flip-flops were converted to scan types and chained together. Thus, the number of flip-flops in the circuit is the sum of the number of primary inputs, number of primary outputs and number of D-type flip-flops. A 23-bit linear feedback shift register (LFSR), a 23-bit signature analysis register (SAR), and a test-per-scan BIST controller were implemented [19], [20]. A single-bit output of the LFSR supplied the scan input and the scan output was fed into the SAR. The sequential circuit along with the BIST circuitry was treated as the core circuit for test time and area analysis. The counter, frequency control circuitry, and frequency divider circuitry for dynamic frequency control were implemented as shown in the unshaded portions of Figure 2. The number of frequencies for each circuit was chosen according to the size of the circuit. The number of random patterns required to achieve sufficient fault coverage was chosen for each circuit from [21] and was incorporated into the BIST controller.
ModelSim from Mentor Graphics was used to simulate the circuits with and without the dynamic frequency control circuitry. The time required for test application was recorded in each case. Design Compiler, a synthesis tool from Synopsys, was used to analyze the area of the circuits with and without the dynamic frequency control circuitry. Table III shows the results. The number of frequencies chosen for each circuit is shown in column 3. The percentage reduction in test time with respect to the test time for the core circuit is shown in column 4 and the percentage increase in area with respect to the area of the core circuit is shown in column 5.
At any node, the capacitance and the voltage are constant. From (1), the power dissipated at any node is proportional to the product of activity and frequency. Thus, the activity per unit time is a direct measure of the power dissipated in the circuit. Therefore, an analysis to find activity per unit time was performed on the s386 benchmark circuit. The Synopsys power analysis tool, PrimeTime PX, was used. The activity per unit time in every cycle was found for the circuit for a scan vector with an activity factor of 1. The peak among these values was set as the limit for activity per unit time. The values of activity per unit time of the circuit in every cycle were then found for a vector with an activity factor of 0.25, both with and without the dynamic clock method (Figure 3). Notably, the activity per unit time in every cycle is closer to the peak limit when the dynamic clock method is used. Also, the peak limit is not exceeded in either method. A reduction of 11.25% was observed when the dynamic clock method was used.
The results for multiple scan chain implementation would be very similar to that obtained for single scan chain since the activity of the circuit will be very similar in both single and multiple chain implementations. However, there would be a marginal increase in area due to the additional XNOR gates at the first flip-flop of every scan chain and also due to the use of a parallel counter as opposed to the simple counter used for the single scan chain.
These results for reduction in test time conform to the theoretical results given in Tables I and II. Two trends are clearly observed in Table III: as circuit size increases, the area overhead drops and the test time reduction improves. These circuits are not very large by today's standards, and we can expect better results for larger circuits, as predicted by the analysis.
To estimate the test time reduction for larger circuits, an accurate mathematical analysis was applied to ITC02 circuits. Test vectors with different activity factors (α in ≈ 0, α in = 0.5 and α in ≈ 1) were generated. Test vectors were generated randomly to achieve α in = 0.5. To generate test vectors with low activity factors (α in ≈ 0), one transition was randomly placed per test vector. Test vectors with high activity factors (α in ≈ 1) were generated to resemble clock signals.
The test time reduction with the proposed implementation (using test-per-scan BIST model) was computed for the generated test vectors. Table IV shows the results. The number of scan flip-flops in column 2 is the sum of number of inputs, number of outputs and number of flip-flops. The number of frequencies for circuits are shown in column 3. The test time reductions achieved for best, moderate and worst case activity factors are shown in columns 4, 5 and 6, respectively. A simulation tool was not used for these circuits due to the large sizes of the circuits. However, it is important to note that any simulation tool would produce the same results since the input activity at the scan chain was closely monitored during estimation of test time. Evidently, more test time reduction can be achieved in larger circuits. The reduction in test time varies from 0% for patterns causing very high activity to 50% for patterns with almost no activity.
When external tests are used and an ATPG tool generates them, the vectors may have very few care bits. The don't care bits can be filled in using heuristics [22] to minimize scan transitions. Dynamic control of the scan clock will then provide a large reduction in test time. This is illustrated using the ISCAS89 benchmark s38584. The Synopsys ATPG tool TetraMAX was used to generate two sets of vectors: a set of 961 vectors with no don't care bits and a set of 14,196 vectors with don't care bits. The vector set without don't cares was found to have an activity factor around 0.5 and the vector set with don't care bits was found to have a low activity factor around 0.01. The don't care bits in the second set were filled using a minimum transition heuristic [22].
In another typical scenario, a test set may initially contain a few (say, 10%) high activity (α in = 0.5) vectors. These resemble fully specified random vectors and achieve about 70-75% fault coverage. The remaining 90% of the vectors then detect about 20-25% of the faults, which are hard to detect, and contain many don't cares, which may be filled in for reduced (α in ≤ 0.05) activity. The adaptive test will be potentially beneficial in such cases.
B. Circuits with α peak < 1
In order to estimate the reduction in scan-in time achieved with the model proposed for dynamic scan clock frequency control in circuits with peak activity factors lower than 1, the t512505 ITC02 benchmark circuit was chosen. This circuit is large enough to employ 512 different scan clock frequencies because it has 76714 scan flip-flops.
The pattern sets of various large benchmark circuits were studied to analyze trends in peak activity factors. The mean value of the peak activity factor (α peak ) in these pattern sets was found to be around 0.57 and the standard deviation (σ) was around 0.025. The value of mean + 3σ was found to be around 0.65. Assuming a normal distribution, this indicates that the probability that the peak activity factor of the test patterns of a circuit lies below 0.65 is at least 99.7%. Therefore, the peak activity factor for the t512505 circuit was set at 0.65. The pattern sets generated by TetraMAX ATPG for large benchmark circuits were analyzed and it was found that the peak activity factor in these test vectors never exceeded 0.65. The value of 0.65 for the peak activity factor can be used only for large circuits with several hundred or more flip-flops. For smaller circuits with just a few tens of flip-flops, the peak activity factor was found to be 1.
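The 3σ bound quoted above can be checked numerically. The sketch below assumes that the peak activity factors are normally distributed, which the text does not state; the helper name `three_sigma_bound` is hypothetical. Under that assumption the one-sided probability of falling below mean + 3σ is about 99.87%, so the quoted 99.7% (the two-sided 3σ figure) is a conservative value.

```python
import math


def three_sigma_bound(mean, sigma):
    """Return (mean + 3*sigma, P[X < mean + 3*sigma]) for a normal X (assumed)."""
    bound = mean + 3.0 * sigma
    # Standard normal CDF evaluated at z = 3, written with the error function.
    p_below = 0.5 * (1.0 + math.erf(3.0 / math.sqrt(2.0)))
    return bound, p_below


bound, p_below = three_sigma_bound(0.57, 0.025)   # values quoted in the text
print(f"bound = {bound:.3f}, one-sided probability = {p_below:.4%}")
```

The bound evaluates to 0.645, which the text rounds to 0.65.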
Accurate mathematical analysis was used to estimate the reduction in scan-in time achieved in the t512505 circuit when α peak = 0.65 and 512 steps of frequencies were chosen. The activity factor of the captured vector was assumed to be 0.65 and the activity was monitored at the input and output of the scan chain. Test vectors with different activity factors ranging from 0 to 0.65 were generated and the test time reduction obtained using the proposed implementation was determined for these vectors. The results are listed in Table V, which shows the variation of scan-in time reduction with variations in α in and α out . It can be seen from Table V that when the activity factor of the scan-in vector (α in ) is greater than or equal to the activity factor of the captured vector (α out ), there is no reduction in scan-in time. The frequency is increased only when the number of non-transitions in the scan chain increases. When α in ≥ α out , the number of non-transitions (as counted by the counter) never increases and hence the scan-in is carried out at the starting frequency, which is the frequency employed when dynamic scan clock frequency control is not implemented. Thus, the reduction in scan-in time is 0% in such cases. Table V indicates that scan-in time reduction is higher for lower values of α in and for higher values of α out . This can be explained in terms of the number of non-transitions in the scan chain. If α in is low, the number of non-transitions entering the scan chain is high, and if α out is high, the number of non-transitions leaving the scan chain is low. Thus, the net number of non-transitions in the scan chain is high, giving a higher reduction in scan-in time.
VII. CONCLUSION
Reduction of test application time in power-constrained testing by adaptively adjusting the scan frequency to the circuit activity is demonstrated. On-chip hardware, whose overhead reduces as the circuit becomes large, provides the adaptive control. The technique is particularly beneficial when the peak circuit activity during test is very high but the average activity is quite low.
