In this paper, we propose a general test application scheme for existing scan-based BIST architectures. The objective is to further improve the test quality without inserting additional logic into the Circuit Under Test (CUT). The proposed test scheme divides the entire test process into multiple test sessions. A different number of capture cycles is applied after scanning in a test pattern in each test session to maximize the fault detection for a distinct subset of faults. We present a procedure to find the optimal number of capture cycles following each scan sequence for every fault. Based on this information, the number of test sessions and the number of capture cycles after each scan sequence are determined to maximize the random testability of the CUT. We conduct experiments on ISCAS89 benchmark circuits to demonstrate the effectiveness of our approach.
Introduction
Agrawal et al. classified the test scheme of scan-based BIST as either test-per-clock or test-per-scan [1]. In test-per-clock BIST, a test vector is applied and its response is compressed every clock cycle. Examples of test-per-clock BIST are the BILBO-based design [2] and circular BIST [3]. In test-per-scan BIST, a test vector is applied and its response is captured into the scan chains only after the test is scanned into the scan chains. The well-known STUMP architecture [4] falls into this category.
There are tradeoffs between these two test application schemes in terms of area overhead, performance degradation, and test application time. Test-per-clock BIST typically has a shorter test time but incurs higher area and performance overheads than test-per-scan BIST. Recently, PSBIST has been proposed to incorporate partial-scan and pseudorandom testing into scan-based BIST [5]. The test application scheme of PSBIST is a combination of test-per-clock BIST and test-per-scan BIST. It results in a shorter test time without increasing the area and performance overheads compared to conventional test-per-scan BIST. This work is based on the PSBIST architecture, and we give an overview of PSBIST in the next section.
Unlike PSBIST (or any other test-per-scan BIST), which always applies a single capture cycle after scanning in a new test pattern, we propose to apply multiple capture cycles after each scan sequence. It has been observed that applying a different number of capture cycles per scan can help to detect a different subset of faults. We illustrate this concept and discuss the motivation in Section 3. We then propose a general test application scheme for PSBIST: multiple test sessions with a different number of capture cycles per scan in each session. A procedure for finding the optimal parameters of the test scheme for a given circuit (i.e. the number of test sessions and the number of capture cycles per scan in each session) is described in Section 4. The proposed scheme has been implemented and evaluated on ISCAS89 benchmark circuits using an industrial scan-based BIST system, psb2. The results presented in Section 5 illustrate the effectiveness of the proposed approach.
Bell Laboratories, Lucent Technologies, Princeton, NJ 08542
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. DAC 99, New Orleans, Louisiana. ©1999 ACM 1-58113-092-9/99/...$5.00
PSBIST
Fig. 1 is the schematic of PSBIST, which is similar to STUMP [4]. The BIST capability is incorporated into the circuit through the following steps:
1. Replace crucial flip-flops with scan flip-flops, then connect them into scan chains. For full-scan BIST, all flip-flops are replaced; for partial-scan BIST, the crucial flip-flops are those that, if scanned, remove all sequential loops with length greater than one.
2. Optionally, add test points (control points or observation points) to increase the random testability of the circuit.
3. Add a Test Pattern Generator (TPG) and an Output Data Compactor (ODC). The TPG consists of a Linear Feedback Shift Register (LFSR) and a Phase Shifter (PS). The PS is used to avoid structural dependency among the outputs of the TPG [6]. The ODC consists of a Multiple Input Signature Register (MISR) and a Space Compactor (SC) [7].
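As a rough illustration of the LFSR at the core of the TPG, the following minimal sketch steps a 4-stage Fibonacci LFSR. The width, seed, tap positions and the name `lfsr_states` are all illustrative assumptions and are not taken from PSBIST (the experiments later in the paper use 21-stage and 19-stage LFSRs); the phase shifter is not modeled.

```python
from itertools import islice


def lfsr_states(seed=0b0001, width=4, taps=(0, 1)):
    """Yield successive states of a right-shifting Fibonacci LFSR.

    Illustrative assumption: bits 0 and 1 are XORed to form the feedback
    bit, which is shifted in at the most significant position.
    """
    state = seed
    while True:
        yield state
        feedback = 0
        for t in taps:
            feedback ^= (state >> t) & 1
        state = (state >> 1) | (feedback << (width - 1))


# Collect 16 consecutive states starting from the seed.
states = list(islice(lfsr_states(), 16))
# With these taps the register visits all 15 nonzero states before repeating.
print(len(set(states[:15])), states[15] == states[0])
```

With this particular tap choice the register cycles through every nonzero 4-bit state, which is the property that makes an LFSR usable as a pseudorandom pattern source.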
During the test, the patterns are continuously applied to the network from the primary inputs and through the scan chains. The responses at the primary outputs are compressed by the MISR every clock cycle, which is similar to the test-per-clock scheme. However, the responses are captured into the scan chains only after a new test pattern is scanned in, and are then compressed by the MISR when they are scanned out. This is similar to test-per-scan BIST. By doing so, the desired fault coverage can be reached earlier than with the conventional test-per-scan approach, without extra hardware.
Motivation
Several techniques have been developed to improve the test quality of pseudorandom testing. Among them, weighted random testing uses multiple weight sets to provide different signal probability profiles. It can achieve the desired fault coverage with a reasonable test length [8][9][10][11]. However, if the number of weight sets is large, the storage and extra hardware required for the pattern generator become costly. Instead of providing different signal probability profiles purely from the test pattern generator, we found that under the PSBIST architecture, test patterns with different signal probability profiles can be generated by changing the test application scheme. Specifically, using multiple capture cycles after each scan sequence results in different profiles for patterns in the scan flip-flops.
Before further discussion, we define the following terms.
• A test cycle is one scan cycle followed by one functional cycle.
A graphical representation of the entire test procedure and one test cycle is shown in Fig. 2.
Let us now consider the example in Fig. 3: A, B and C are the primary inputs and their signal probabilities are 0.5, assuming they are driven by an LFSR. PSI is the pseudo-input driven by the scan flip-flop FF. During the scan cycle, the values appearing at PSI are random; therefore, its signal probability is 0.5. If we apply only one capture cycle after each scan sequence, the signal probability of PSI remains 0.5 in the functional cycle because its value is derived from the scan chain. As a result, only one signal probability profile is produced using one capture cycle per scan. Notice that at the end of the scan cycle, the signal probability of F is 0.9375. If we use two capture cycles after the scan cycle, the value of F will be latched into FF after the first capture cycle, and thus the signal probability of PSI changes to 0.9375 at the second capture cycle. Consequently, two profiles are generated at PSI during the functional cycle.
Figure 3: Different signal probability profiles at PSI
In this example, the effects of using two capture cycles per scan are the following. At the second capture cycle:
• The stuck-at faults at A, B and C become more observable (side input PSI has a greater chance of being a non-controlling value).
• Fault F s/1 becomes easier to activate (the signal probability of F decreases from 0.9375 to 0.8828125).
• Fault PSI s/1 becomes harder to activate (the signal probability of PSI increases from 0.5 to 0.9375).
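The numbers quoted in this example can be reproduced with a short calculation. The sketch below assumes that F is a 4-input NAND of A, B, C and PSI and that the inputs are statistically independent. Fig. 3 is not reproduced here, so the gate type is an inference from the quoted probabilities rather than something the text states, and the function names are hypothetical.

```python
def nand_prob(*input_probs):
    """Probability that a NAND gate outputs 1, assuming independent inputs."""
    p_all_ones = 1.0
    for p in input_probs:
        p_all_ones *= p
    return 1.0 - p_all_ones


def psi_and_f_profile(num_capture_cycles, p_primary=0.5, p_scan=0.5):
    """Return (P[PSI=1], P[F=1]) for each capture cycle.

    Assumption: F = NAND(A, B, C, PSI), and F is latched into the scan
    flip-flop that drives PSI at the end of every capture cycle.
    """
    profile = []
    p_psi = p_scan  # the first capture cycle sees the scanned-in value
    for _ in range(num_capture_cycles):
        p_f = nand_prob(p_primary, p_primary, p_primary, p_psi)
        profile.append((p_psi, p_f))
        p_psi = p_f  # the captured value of F becomes the next PSI
    return profile


print(psi_and_f_profile(1))  # [(0.5, 0.9375)]
print(psi_and_f_profile(2))  # [(0.5, 0.9375), (0.9375, 0.8828125)]
```

Under these assumptions the second capture cycle shows PSI at 0.9375 and F at 0.8828125, matching the values in the list above.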
This result suggests that the optimal number (k) of capture cycles varies from one fault to another. To achieve the best test result, a set of k's needs to be applied during the entire test. Thus, it is intuitive to divide the entire test process into multiple test sessions with a different k in each session.
Using multiple test sessions has been shown to be effective for test point insertion [12]. In [12], different subsets of control points are activated in different test sessions and each subset of control points targets only a subset of faults. Here, instead of adding test points to increase the random testability of the CUT, we explore different test application schemes to be used in different test sessions. Each test session likewise targets only a subset of faults.
In addition, it has been observed that under the PSBIST architecture, the test quality can be improved by increasing the observability of the scan flip-flops' data inputs [13]. Using multiple capture cycles per scan increases the chance of latching the fault effects and thus increases the possibility of observing them at the primary outputs through functional logic during the functional cycle.
Multiple test sessions with multiple capture cycles
In this section, we first introduce a testability computation model to find the detection probability of a fault using multiple capture cycles per scan. Based on this information, a procedure is proposed to find the required number of test sessions and the corresponding number of capture cycles per scan in each session for achieving the highest possible test quality.
Testability computation
Under the stuck-at fault model, the detection probability $Pd_i$ of a fault $i$ can be estimated by one of the following two equations:
$Pd_i = C_s \cdot O_s$, for s-a-0 at signal s; (1)
$Pd_i = (1 - C_s) \cdot O_s$, for s-a-1 at signal s; (2)
where $C_s$ and $O_s$ are the 1-controllability and observability of signal s, respectively. They can be estimated very efficiently using COP [14]. Note that, without excessive iterations, COP can only be applied to a combinational circuit or a sequential circuit with only self loops in its sequential graph [15]. However, the CUT is sequential during the functional cycle, so computing the COP testability measures of the CUT in the functional cycle can be costly because the iterations may not converge quickly. Fortunately, the length of the functional cycle is fixed and relatively small in this application. By dividing one test cycle into multiple phases, we can calculate the testability measures using the original COP method with proper boundary conditions (controllabilities of the inputs and observabilities of the outputs).
Unlike PIs and POs, the controllability of a PSI, cntl(PSI), and the observability of a PSO, obs(PSO), are different in different phases. The first phase is the scan cycle. In this phase, the PSI receives random patterns, thus cntl(PSI) is set to 0.5. Meanwhile, the responses at the PSOs are not captured, therefore obs(PSO) is 0. The second phase is the first k-1 capture cycles, in which the scan flip-flops are in the functional mode. In this phase, the patterns at the PSIs are the responses captured at the PSOs at the previous clock cycle. Consequently, cntl(PSI) is set to the corresponding cntl(PSO) at the previous clock cycle. Similarly, obs(PSO) at the n-th cycle is equal to the corresponding obs(PSI) at the (n+1)-th cycle. The last phase corresponds to the last capture cycle. The response captured by the scan flip-flops will eventually be scanned out and compressed by the MISR, therefore obs(PSO) is 1. Once we know the proper cntl(PSI) and obs(PSO) in each phase, $C_s$ and $O_s$ for every signal s can be computed accordingly using the original COP method.
For a test cycle with k capture cycles, the number of time frames required to be expanded is k+1 (1 for the scan cycle and k for the functional cycle). This expanded circuit is combinational, so the original COP method can be applied directly. The controllabilities are computed from the inputs of the 1st copy of the circuit (the leftmost one in Fig. 5) toward the (k+1)-th copy of the circuit (the rightmost one in Fig. 5). Similarly, the observabilities are computed backward from the outputs of the (k+1)-th copy of the circuit toward the inputs of the 1st copy.
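The forward and backward passes can be sketched on a toy circuit. This is a minimal illustration under stated assumptions, not the paper's implementation: the circuit is a single gate F = NAND(A, B, C, PSI) whose output feeds only the PSO of one scan flip-flop, only the k capture-cycle frames are modeled (the scan-cycle frame is omitted), and the first capture cycle sees cntl(PSI) = 0.5 as in the Fig. 3 example. The function name and dictionary keys are hypothetical.

```python
def cop_functional_cycle(k, c_primary=0.5, c_scan=0.5):
    """COP-style measures for signal F over k capture cycles.

    Toy assumption: F = NAND(A, B, C, PSI) and F is observed only through
    the scan flip-flop (PSO). Returns one record per capture cycle.
    """
    side_inputs = c_primary ** 3  # P[A = B = C = 1], the NAND side inputs

    # Forward pass: 1-controllability of F, frame by frame.
    c_f = []
    c_psi = c_scan  # the first capture cycle sees the scanned-in value
    for _ in range(k):
        c_f.append(1.0 - side_inputs * c_psi)
        c_psi = c_f[-1]  # cntl(PSI) of the next frame = cntl(PSO) of this one

    # Backward pass: observability of F, from the last frame to the first.
    o_f = [0.0] * k
    obs_pso = 1.0  # the last capture is scanned out into the MISR
    for n in reversed(range(k)):
        o_f[n] = obs_pso
        obs_psi = o_f[n] * side_inputs  # PSI is seen through the NAND gate
        obs_pso = obs_psi  # obs(PSO) at frame n-1 = obs(PSI) at frame n

    return [
        {
            "cycle": n + 1,
            "C_F": c_f[n],
            "O_F": o_f[n],
            "Pd_sa0": c_f[n] * o_f[n],          # Equation (1)
            "Pd_sa1": (1.0 - c_f[n]) * o_f[n],  # Equation (2)
        }
        for n in range(k)
    ]


for row in cop_functional_cycle(2):
    print(row)
```

For k = 2 the s-a-1 fault at F has a much higher detection probability in the second capture cycle than in the first, because F is directly observable there and is more likely to be 0.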
Finding the optimal number of capture cycles for each fault
After computing the COP testability measures, we can find the detection probability of the fault during the functional cycle. In Equation (3), $Pd_i^{func,k}$ is the detection probability of fault $i$ in the functional cycle with $k$ capture cycles per scan, and $Pd_i^j$ is the detection probability of fault $i$ at the $j$-th capture cycle. $Pd_i^j$ is computed using Equation (1) or (2). By neglecting the detection of faults in the scan cycle, $Pd_i^{func,k}$ approximates the detection probability of fault $i$ in one test cycle.
Given two different k's, $k_1$ and $k_2$, we can easily compute $Pd_i^{func,k_1}$ and $Pd_i^{func,k_2}$ for each fault $i$. However, it would not be appropriate to compare them directly when deciding the optimal k value for fault $i$. This is because different k's mean there are different numbers of clock cycles in one test cycle. Therefore, we normalize $Pd_i^{func,k}$ by dividing it by the number of clock cycles in one test cycle.
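The comparison across different k's can be sketched as follows. Two assumptions are made here that the text does not confirm: the per-cycle detection events are treated as independent, so the functional-cycle detection probability is taken as $1 - \prod_{j=1}^{k}(1 - Pd_i^j)$, and one test cycle is assumed to take (scan length + k) clock cycles. All function names and the input values are hypothetical.

```python
from math import prod


def functional_detection_probability(per_cycle_pd):
    """Probability of detection in at least one capture cycle.

    Assumption: per-cycle detections are independent events.
    """
    return 1.0 - prod(1.0 - p for p in per_cycle_pd)


def normalized_metric(per_cycle_pd, scan_length):
    """Detection probability per clock cycle of one test cycle.

    Assumption: a test cycle lasts scan_length + k clock cycles.
    """
    k = len(per_cycle_pd)
    return functional_detection_probability(per_cycle_pd) / (scan_length + k)


def best_k(pd_by_k, scan_length):
    """Return the k whose normalized metric is largest for one fault."""
    return max(pd_by_k, key=lambda k: normalized_metric(pd_by_k[k], scan_length))


# Hypothetical per-capture-cycle detection probabilities of one fault
# for k = 1 and k = 2, with a scan chain of length 10.
pd_by_k = {1: [0.0625], 2: [0.0078125, 0.1171875]}
print(best_k(pd_by_k, scan_length=10))  # 2
```

In this made-up case the extra clock cycle spent on a second capture is more than repaid by the higher detection probability, so k = 2 is preferred for this fault.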
Determining test scheme
With this metric, we propose a procedure to find the optimal number of test sessions and the corresponding number of capture cycles per scan in each session for a circuit, as shown in Fig. 6. Note that the metric for determining the optimal k for a fault is derived by neglecting the chance of detecting it in the scan cycle. Thus, only faults that are hard to detect during the scan cycle should be considered in this procedure. To identify these faults, we first compute the COP measures of the circuit in the scan cycle. Faults associated with signals that have zero observability can never be detected in the scan cycle and thus must be targeted in the functional cycle. In addition, we also include in the fault list faults whose detection probabilities in the scan cycle fall below a small threshold. Finally, we record the total number of considered faults (f).
After that, the procedure becomes an iterative process. At each iteration, k is first incremented (initially set to 0) and then the corresponding $Pd_i^{func,k}$ is computed. … set the threshold to 50%, i.e. n/f > 0.5. Once the iterative process stops, the best k value of each fault is determined, and we say a fault is covered by $k_1$ if its best k value is $k_1$. After that, a small set of k's that covers a pre-determined percentage of the considered faults is selected. This pre-determined percentage should cover most of the considered faults. In this work, we set it to 90%. Choosing a higher percentage may not be cost-effective because the increased area overhead for the extra test sessions is devoted to only a small number of faults (less than 10% in this case). Finally, the number of test sessions is equal to the number of k's selected; the corresponding numbers of capture cycles are these k's.
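The final selection step can be sketched as below. The text only says that a small set of k's covering a pre-determined percentage of faults is selected; picking the k's in decreasing order of the number of faults they cover is one simple way to do that and is an assumption of this sketch, as are the function name and the example data.

```python
from collections import Counter


def select_sessions(best_k_per_fault, coverage_target=0.9):
    """Pick k values until the covered fraction of faults reaches the target.

    best_k_per_fault maps a fault name to its best k. Returns the selected
    k values (sorted) and the fraction of faults they cover.
    """
    total = len(best_k_per_fault)
    if total == 0:
        return [], 0.0
    counts = Counter(best_k_per_fault.values())
    selected, covered = [], 0
    # Assumption: take the k's that cover the most faults first.
    for k, n in counts.most_common():
        if covered / total >= coverage_target:
            break
        selected.append(k)
        covered += n
    return sorted(selected), covered / total


# Hypothetical best-k assignment for ten considered faults.
example = {"f1": 1, "f2": 1, "f3": 1, "f4": 1, "f5": 1,
           "f6": 3, "f7": 3, "f8": 3, "f9": 3, "f10": 5}
print(select_sessions(example))  # ([1, 3], 0.9)
```

Here two test sessions (k = 1 and k = 3) already cover 90% of the faults, so the session with k = 5, which would serve a single fault, is not added.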
Experimental Results
We have implemented the algorithm and conducted experiments on ISCAS89 benchmark circuits. An industrial tool, psb2, is used to automatically insert the BIST circuitry. While adding the BIST capability, we set the length of the longest scan chain to 10 and use a 21-stage LFSR as the test pattern generator (a 19-stage LFSR for s953). The scanned circuits are fault simulated for 500K clock cycles using both the single test session (STS) scheme, with one capture cycle per scan, and the multiple test sessions (MTS) scheme. Note that fault simulation is done by issuing commands to the BIST controller so that the circuit tests itself. Therefore, the fault list contains faults not only in the CUT but also in the added BIST circuitry. To ensure a fair fault coverage comparison, we keep the fault lists identical for both STS and MTS schemes by excluding faults associated with the extra circuitry for MTS in the BIST controller.
We first determine the number of test sessions and the corresponding number of capture cycles per scan using our algorithm. The results are shown in Table 1. The fault coverage and area results are shown in Table 2. The 2nd and 3rd columns are the fault coverages using the STS and MTS schemes, respectively. Both schemes run the same number of clock cycles (500K) for BIST. Although the fault coverage improvement varies, using MTS gives higher fault coverage in all cases. We obtained an average fault coverage of 94.95% with the MTS scheme, as opposed to 93.63% with the STS scheme.
The area information in Table 2 does not include routing area. The circuit size without BIST capability is in Column 4, in units of grids, using Lucent's 0.5µm CMOS standard cell library. The area overhead is divided into scan and other BIST circuitry overheads. Both are normalized by dividing them by the original circuit size. Scan overhead in Column 5 is the cost of converting flip-flops into scan flip-flops. This part is identical for both test schemes. Other BIST circuitry overhead includes the BIST controller, TPG, ODC and multiplexers. Because STS and MTS differ only in the BIST controller, this part is slightly different for the STS and MTS schemes and is shown in Columns 6 and 7, respectively. The last 2 columns show the overall area overhead. The extra area overhead of MTS depends only on the number of test sessions and the maximum number of capture cycles per scan. For larger circuits, the overall area overhead of MTS is similar to that of STS.
The fault coverage curves of some circuits are shown in Fig. 7. Unlike the relatively smooth curves using STS, using MTS creates "jumps" at the beginning of a new test session. The magnitude of the jump is significant at the beginning of the 2nd test session and diminishes afterwards. This diminishing phenomenon may be caused by the greedy nature of determining the k value for each fault. Given two k values, $k_1$ and $k_2$, we say $k_1$ is better than $k_2$ for fault $i$ if the metric with $k_1$ is greater than that with $k_2$. If both are included in the final test application scheme and $k_1$ is applied before $k_2$, then faults that are intended to be targeted in the session with $k_2$ capture cycles per scan are also likely to be detected in the session with $k_1$ capture cycles per scan.
Table 3 shows a comparison of test length using STS and MTS. Here we set the target fault coverage to be the final fault coverage of STS. We then record the earliest clock cycle at which the target fault coverage is reached for both STS and MTS. Columns 2 and 3 show the final fault coverages and corresponding number of clock cycles for STS; Columns 4 and 5 are the results of MTS. With comparable fault coverages, the test length reduction (in %) using MTS is calculated and shown in the last column. The reductions are very significant for many circuits. The average test length reduction is approximately 62.57%. Note that we do not obtain any improvement for s1423 and s35932. This is because the first test session for both circuits uses one capture cycle per scan, which is the same as STS. Moreover, the target …
… Table 4 validate this point. MTS always obtains higher fault coverage than STS does, with fewer test points. The difference in the number of test points becomes much more significant for larger circuits. This is because when the circuit reaches a high fault coverage level, the undetected faults tend to be scattered over the entire circuit.
Thus, it usually requires a separate test point to target each one of them. If we can detect extra faults with the proposed test application scheme, then the number of required test points can be drastically reduced. Furthermore, it reduces the chance of causing performance degradation due to the test points.
Conclusion
A general test application scheme is proposed to improve the test quality of scan-based BIST. Instead of capturing the responses into the scan chain once every scan cycle, capturing the responses multiple times can increase the chance of detecting a subset of faults. We introduce a testability computation model for finding the detection probability of a fault in the functional cycle. A metric is developed to find the optimal number of capture cycles per scan for each fault. We propose the use of multiple test sessions with multiple capture cycles per scan for better test quality. A procedure is used to determine the best test scheme for a given circuit. Experimental results show significant improvement in both fault coverage and test length. The experimental results also indicate that the proposed test application scheme and test point insertion are complementary: a higher fault coverage is achieved with fewer test points. The difference in the number of test points is more significant for larger circuits.
The timing of switching test sessions greatly influences the test length and this issue is currently under investigation.
