Abstract
Introduction
As the scale and speed of integrated circuits continue to increase in the nanometer era, built-in self-test (BIST) approaches that provide high fault coverage with low area overhead and low performance penalty are needed. Various BIST techniques yielding high fault coverage have been proposed in the literature. Most of them embed the pre-computed test data T_D in an on-chip BIST circuit. Since T_D is generated by an ATPG (Automatic Test Pattern Generator) tool, the fault coverage is deterministic, which is why such techniques are called deterministic BIST. The circuits that regenerate T_D can be broadly classified into two categories: ROM-based (Read Only Memory) and PRPG-based (Pseudo Random Pattern Generator).
A straightforward way to use ROM is to store T_D in it directly. However, this is impractical when the test data volume is huge, because the area overhead of the ROM becomes intolerable. The approach in [1] first compresses T_D with Huffman coding or Comma coding and then stores the encoded test data T_E in ROM. During self-test, T_E is decoded by a finite-state machine (FSM). Because the data volume of T_E is much smaller than that of T_D, the combined area overhead of the ROM and the FSM is less than that of storing T_D directly in ROM.
PRPG-based deterministic BIST is more popular than ROM-based deterministic BIST. Among the PRPG-based deterministic BIST techniques, there are two subcategories, according to whether the PRPG is additional circuitry or not.
When additional circuitry such as an LFSR (Linear Feedback Shift Register), a counter, or a CA (Cellular Automaton) is used as the PRPG, extra control circuitry is added to alter the pseudo random test data into deterministic test data. Approaches using an LFSR include LFSR-reseeding [2], Bit-flipping [3], Bit-fixing [4], deterministic BIST with Reconfigurable Interconnection Network [5], EDT [6] and DBIST [7], while [8] uses both a folding counter and an LFSR. To address the memory consumption and computing time of Bit-flipping and Bit-fixing, which are usually cubic or even exponential, [9] proposed a BDD-based (Binary Decision Diagram) algorithm to alter the pseudo random test data. To reduce the number of seeds loaded into the LFSR or CA, [10] optimized the reseeding procedure by ordering and encoding seeds. In all of the above schemes the test stimulus (test response) must be serially shifted into (out of) the scan chains, so they are test-per-scan BIST.
CBIST (Circular BIST) [11] and CSTP (Circular Self Test Path) [12] belong to the other subcategory, which uses on-chip memory elements such as flip-flops and latches to generate test data. As shown in Figure 1(a), the memory elements, primary inputs and primary outputs are first replaced by CSTP scan cells [12] or CBIST scan cells [11], shown in Figure 1(b), and the scan cells are then connected to form one long circular chain. In BIST mode, a scan cell is fed by the XOR (exclusive-OR) of its normal functional input and the output of the scan cell that precedes it in the chain. The dashed line in Figure 1(a) shows the circular chain. In CBIST and CSTP, each test stimulus is generated from the previous test response. Because a test stimulus is generated every test cycle, CBIST and CSTP are test-per-clock BIST. The circular chain is a feedback shift register that acts as both the response compactor and the random pattern generator. [13] proposed a state skipping technique for CBIST, which breaks out of limit cycles, breaks test pattern correlations, and transforms the present test response into a deterministic test stimulus. Since the state skipping logic is inserted in a hill-climbing manner, CBIST with state skipping can systematically provide high fault coverage.
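As a minimal sketch of the circular-chain update described above, the following Python snippet computes one BIST-mode clock of a circular chain: each cell captures the XOR of its functional input and the output of the preceding cell, and the first cell is fed by the last. The function name `cstp_step` and the list-based representation are illustrative assumptions, not taken from [11] or [12].

```python
def cstp_step(q, z):
    """One BIST-mode clock of a circular self-test path (illustrative).

    q: current cell outputs, each 0 or 1
    z: functional (combinational logic) inputs of the cells, same length
    Cell i captures z[i] XOR q[i-1]; cell 0 is fed by the last cell.
    """
    if len(q) != len(z):
        raise ValueError("q and z must have the same length")
    n = len(q)
    return [z[i] ^ q[(i - 1) % n] for i in range(n)]


# With all functional inputs at 0 the chain simply rotates by one position.
state = cstp_step([1, 0, 0, 0], [0, 0, 0, 0])
print(state)  # [0, 1, 0, 0]
```

In a real circuit the functional inputs `z` are themselves produced by the combinational logic from the current chain state, which is what makes the chain act as both pattern generator and response compactor.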
Although the experimental results in [13] show a low area overhead for the state skipping logic, the overhead deteriorates as the scale of the circuit-under-test increases. In this paper, we propose a deterministic CSTP (DCSTP) structure that obtains high fault coverage with low area overhead. With the proposed structure, both the fault coverage and the area overhead are predictable. Moreover, the procedure to implement DCSTP is not time-consuming, so it can be applied to large circuits. This paper is organized as follows. Section 2 presents preliminary concepts of CBIST and state skipping, and theoretically analyzes the area overhead of the technique proposed in [13]. Section 3 describes the proposed DCSTP structure along with the state skipping insertion procedure, and analyzes its area overhead and computing complexity. Experimental results on ISCAS'89 benchmark circuits are reported in Section 4. Finally, Section 5 concludes the paper.
Preliminary
CBIST and CSTP have the advantages of shorter test time, lower area overhead, and simpler control logic than test-per-scan BIST. However, three structural constraints limit the fault coverage of CBIST and CSTP [14]:
1) Limit cycling
2) Test pattern correlation
3) Random-pattern-resistant faults
Approaches such as flip-flop reordering [14] and initial state selection [15] [16] can target one or two of the three constraints, but the remaining constraints still limit the fault coverage. To reliably yield high fault coverage, [13] proposed a state skipping design for CBIST that overcomes the three constraints systematically.
Figure 2 Circular chain with state skipping logic
CBIST with State Skipping
The main idea of [13] is to add a small amount of "state skipping logic" that causes the circular chain to skip from a state that cannot detect any new fault to a state that can. The state skipping logic can also break the correlation between test patterns and jump to states that detect random-pattern-resistant faults. Figure 2 shows an example of a circular chain with state skipping logic. The grayed XOR gates are inserted for state skipping, and the AND gate with the two inputs Q_3 and Q_4 is the decoding logic that activates the state skipping XORs. In normal CBIST without state skipping logic, when the circular chain reaches state 1101, its next state is 0101. However, in order to break limit cycling, break test pattern correlation, or reach a state that detects new random-pattern-resistant faults, the desired next state is 0011. Therefore, the two-input AND gate and the two grayed XOR gates are inserted. When the circular chain reaches state 1101, the output of the AND gate is 1, which makes 0011 the skip-to state; 0101 is the skipped state.
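The 1101 example can be checked with a short Python sketch. It assumes the state is written as Q_4 Q_3 Q_2 Q_1 (most significant bit first), which makes Q_3 = Q_4 = 1 in state 1101; the bit ordering, the mask constant and the function names are assumptions made for illustration, since Figure 2 is not reproduced here. The mask 0110 follows from the text alone, because the skipped state 0101 and the skip-to state 0011 differ in exactly those two positions.

```python
SKIP_MASK = 0b0110  # the two bit positions where 0101 and 0011 differ


def decode(state):
    """Two-input AND of Q4 and Q3 (bits 3 and 2 under the assumed ordering)."""
    q4 = (state >> 3) & 1
    q3 = (state >> 2) & 1
    return q4 & q3


def next_state_with_skip(state, normal_next):
    """Flip the masked bits of the normal next state when the decoder fires."""
    return normal_next ^ (SKIP_MASK if decode(state) else 0)


result = next_state_with_skip(0b1101, 0b0101)
print(format(result, "04b"))  # 0011
```

A two-literal decoder like this one also fires in states 1100, 1110 and 1111, so it is only usable if none of those states occurs before 1101 in the sequence.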
To find the longest state sequence that precedes the skipped state, [13] describes the relationship between the preceding states and the skipped state with a Boolean conflict matrix. The inputs of the AND gate are then obtained by solving a minimum column cover problem. Unfortunately, the minimum column cover problem is NP-complete, so [13] uses a heuristic technique to achieve high fault coverage. However, the resulting area overhead is relatively high. We theoretically analyze the area overhead of this technique in the following subsection.
Area Overhead Analysis
Assume that there are N flip-flops in the circuit-under-test and that state skipping happens M times. The skipped state is denoted as S, and its direct preceding state as P. Reaching state P requires running BIST for L+1 cycles.
The area overhead comes from the decoding logic and the inserted XOR gates. The decoding logic consists of multiple-input AND gates. As each multiple-input AND gate must output 1 at state P but 0 at every state preceding P, the AND gate is the product of α literals, where
A literal is a Boolean variable or its complement. For M state skips, a total of M AND gates is needed.
As for the inserted XORs, each state skip requires one XOR gate for every flip-flop whose state is to be changed, so β XOR gates are inserted in the circular chain, where β∈[1,N]. Therefore, over the whole test procedure, a total of γ XORs is inserted,
Therefore, the area overhead can be evaluated as Mα+γ, which is in the interval of,
Deterministic CSTP
Deterministic CSTP Structure
Although CBIST with state skipping [13] can achieve high fault coverage, its area overhead is relatively high. Hence we propose DCSTP to reduce the area overhead while keeping the high fault coverage. Like other deterministic BIST techniques, DCSTP embeds the pre-computed test data in the BIST logic. DCSTP generates pseudo random test patterns with the DCSTP chain. When no new faults can be detected, the state of the DCSTP chain jumps from the pseudo random state to a deterministic state. We design a decoding logic for this state skipping. Figure 3 illustrates the DCSTP structure.
In Figure 3, the decoding logic consists of a counter and a state decoder. An example illustrates how DCSTP works. The circuit is shown in Figure 3, and the state sequences of CSTP and DCSTP are shown in Figure 4. The first column is the index of the state sequence, and the second column is the state sequence produced by CSTP. The CSTP circuit runs into a cycle at the fourth pattern and cannot detect any more faults, whereas DCSTP can activate the state skipping logic to jump out of the cycle and detect more faults. At the third state, the output of the counter is 011, so the decoding logic outputs 11 to make the circuit jump out of the cycle and detect new faults. Figure 5 shows an implementation of the DCSTP cell with the decoding logic. The operation modes of the DCSTP cell with decoding logic are described in Table 1. An "X" entry means that the value of the control signal is a don't-care. In DCSTP mode, when the decoding logic asserts Sel_i, the value of D_i is Q_{i-1} ⊕ Z_i, hence the state sequence is altered by the decoding logic.
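The counter-plus-decoder idea can be sketched in Python as follows. Each skip address is matched by one AND of v literals over the counter bits, and the select pattern of a matching address is driven onto the Sel lines. The table layout, the OR-combination of patterns and all names are assumptions made for illustration; only the values 011 and 11 come from the example above.

```python
def and_of_literals(count, address, v):
    """Product of v literals: 1 only when the v-bit count equals address."""
    out = 1
    for bit in range(v):
        c = (count >> bit) & 1
        a = (address >> bit) & 1
        out &= c if a else 1 - c  # true literal or complemented literal
    return out


def state_decoder(count, skip_table, v):
    """OR together the Sel patterns of every address that matches count."""
    sel = 0
    for address, pattern in skip_table.items():
        if and_of_literals(count, address, v):
            sel |= pattern
    return sel


# Hypothetical table: at counter value 011 assert both Sel lines.
skip_table = {0b011: 0b11}
for count in range(8):
    sel = state_decoder(count, skip_table, 3)
    print(format(count, "03b"), format(sel, "02b"))
```

Only the line for counter value 011 prints 11; every other counter value leaves the Sel lines at 00, so the chain runs as an ordinary CSTP in those cycles.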
The DCSTP chain is tested by shifting a 0011 sequence through it in Shift mode. The counter is used only in the Normal DCSTP and Transitional DCSTP modes, not in System mode, so it does not influence the function of the circuit. It therefore suffices to ensure that the counter works correctly during BIST operation, which includes both the Normal DCSTP and Transitional DCSTP modes. If the counter does not function correctly, the State Decoder decodes the state wrongly and changes the state sequence of the DCSTP chain. The State Decoder and the OR gates fed by its outputs are tested implicitly during BIST operation: a stuck-at fault at an input or output of the State Decoder or of the OR gates changes the normal state sequence of the DCSTP chain, so the fault is detected after the final state of the DCSTP chain is shifted out.
Compared with conventional CSTP or circular BIST, DCSTP does not add any delay to the functional path, because the state skipping logic is added in the DCSTP chain, not in the functional path.
Next, we show how the decoding logic design of DCSTP reduces the area overhead.
Area Overhead Analysis
Let U denote the total number of states the circuit traverses in the DCSTP procedure. To record these U states, we use a v-bit counter, where v = ⌈log2 U⌉. To decode each state skipping address given by the counter, a multiple-input AND gate is needed, which is the product of v literals. Therefore, the area overhead can be evaluated as c+Mv+η+μ, which is in the interval of,
where c is the area overhead of the v-bit counter.
Comparing formula (1) with (2), we can see that the interval of (2) is tighter than that of (1). Therefore, for a given circuit design, the area overhead of DCSTP is easier to predict than that of CBIST with state skipping. Note that (1) has a smaller lower bound than (2), which means that CBIST with state skipping can achieve less area overhead than DCSTP in some cases. For larger circuits, however, the upper bound of (1) is greater than that of (2), which means that DCSTP predictably results in less area overhead as the circuit scale increases. Figure 6 illustrates an example of the area overhead intervals given in (1) and (2). Assume that N is less than L in formula (1) and that state skipping happens 40 times. The counter is 16-bit, which is equivalent to 112 literals when synthesized with a 0.18 µm CMOS library. As N increases, the upper bound of the DCSTP area overhead becomes much smaller than that of CBIST with state skipping.
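As a worked illustration of the Figure 6 settings, the Python sketch below evaluates only the counter and decoder terms c+Mv of the DCSTP estimate, with c = 112 literals, M = 40 and v = 16. The remaining terms η and μ are left out because their values are not given in the text, and the helper names are illustrative assumptions. The counter width is taken as the smallest v with 2^v ≥ U.

```python
def counter_bits(num_states):
    """Smallest v such that 2**v >= num_states (at least one bit)."""
    return max(1, (num_states - 1).bit_length())


def counter_and_decoder_cost(counter_cost, num_skips, v):
    """c + M*v in literals; the other terms of the estimate are omitted."""
    return counter_cost + num_skips * v


cost = counter_and_decoder_cost(counter_cost=112, num_skips=40, v=16)
print(cost)                   # 752
print(counter_bits(2 ** 16))  # 16
```

Neither of these two terms involves the number of flip-flops N directly, which is consistent with the observation that the DCSTP estimate grows slowly with circuit size.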
Procedure of state selection
From the above theoretical analysis, the values of M and v determine the area overhead. Both M and v are decided by the selection of the skipped state and the skip-to state. This subsection presents the procedure of state selection.
The main idea is to perform fault simulation until w consecutive states fail to detect any new fault, and then to choose one of these w states from which to skip to a new state that can detect new faults. The decoding logic and an XOR gate are added after the fault simulation. The procedure is described below in detail.
1) Initialize the circuit. First, the circuit-under-test is converted to CSTP form. Then the CSTP cells are initialized by shifting an initial seed into the CSTP chain. The counter is also initialized to 0. Go to 2).
2)

The CPU time to implement DCSTP is the sum of the time for fault simulation in BIST operation and the time to generate the state decoding logic. Both DCSTP and [13] conduct one-pass fault simulation in BIST operation, so they consume approximately the same time for fault simulation. However, DCSTP needs less time to generate the state decoding logic, because generating the decoding logic of DCSTP has linear complexity, whereas generating the decoding logic of [13] is NP-complete.
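The main idea of the state selection (fault-simulate until w consecutive states detect nothing new, then skip) can be sketched in Python under several loudly stated assumptions: the fault simulator is abstracted as a function `detects`, the fault-free chain as a function `next_state`, the skip is taken from the last of the w idle states, and the skip-to state is chosen greedily from a list of candidate deterministic states. None of these choices is confirmed by the text; they only make the loop concrete.

```python
def select_skips(next_state, detects, candidates, start, all_faults, w,
                 max_cycles=10_000):
    """Return (skips, undetected faults) for a toy state-selection run.

    next_state: state -> next fault-free chain state (assumed model)
    detects:    state -> set of faults detected when that state is applied
    candidates: deterministic states usable as skip-to targets (assumed)
    Each skip is (counter value at which it is taken, XOR mask applied).
    """
    remaining = set(all_faults)
    state, idle, skips = start, 0, []
    for count in range(max_cycles):
        newly = detects(state) & remaining
        remaining -= newly
        if not remaining:
            break
        idle = 0 if newly else idle + 1
        nxt = next_state(state)
        if idle >= w:
            # Greedy choice: candidate detecting the most remaining faults.
            target = max(candidates,
                         key=lambda s: len(detects(s) & remaining),
                         default=None)
            if target is not None and detects(target) & remaining:
                skips.append((count, nxt ^ target))
                nxt, idle = target, 0
        state = nxt
    return skips, remaining


# Toy model: the chain always falls back to state 0 (a limit cycle).
table = {0: {"f0"}, 5: {"f1"}, 6: {"f2"}}
skips, remaining = select_skips(
    next_state=lambda s: 0,
    detects=lambda s: table.get(s, set()),
    candidates=[5, 6],
    start=0,
    all_faults={"f0", "f1", "f2"},
    w=2,
)
print(skips, remaining)  # [(2, 5), (5, 6)] set()
```

In this toy run two skips suffice, so M = 2, and the largest counter value used is 5, so a 3-bit counter would be enough.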

Experimental Results
We conducted experiments on several ISCAS'89 benchmark circuits. The circuits were synthesized with a 0.18 µm digital CMOS standard cell library. Area overhead was estimated with Design Compiler from Synopsys. The pre-computed test data for stuck-at faults was generated by ATALANTA [17]. All circuits were initialized with a random pattern, and the same number of patterns was applied to the CSTP-designed and DCSTP-designed circuits. After the desired fault coverage was achieved, the DCSTP chain scanned out the compacted response, which was then compared with the response of the fault-free circuit. Table 2 compares our DCSTP with the approach in [13]. The first column lists the names of the circuits. The "Pattern length" columns give the number of BIST patterns for each test technique. The "Coverage (%)" columns give the fault coverage of each test technique, measured over detectable faults. Both CBIST with state skipping and DCSTP achieved 100% fault coverage. For most of the smaller circuits, the pattern length of CBIST with state skipping was shorter than that of DCSTP, but for larger circuits, from s526 to s13207, DCSTP needs fewer patterns to achieve 100% fault coverage.
The "Overhead (%)" column under DCSTP shows the additional area of DCSTP relative to CSTP. Since the comparison is DCSTP versus CSTP, the additional overhead of DCSTP is caused by the decoding logic. For small circuits, the area overhead of DCSTP was higher than that of CBIST with state skipping. This shows that our approach is not well suited to small circuits, because the overhead of the counter in the decoding logic is relatively high there. However, from s5378 to s13207, the area overhead of DCSTP was less than or equal to that of CBIST with state skipping. Moreover, for the larger ISCAS'89 circuits such as s13207, s15850, s35932, s38417 and s38584, DCSTP steadily required less area overhead.
Conclusions
In this paper, we presented a deterministic CSTP structure that provides high fault coverage with reasonable area overhead. A new decoding logic was proposed to reduce the area overhead. As the procedure to implement DCSTP is not time-consuming, DCSTP can be applied to large circuits. Experimental results on the ISCAS'89 benchmark circuits showed that the approach provides high fault coverage with reasonable area overhead.
Note to Table 2: "__" indicates that the data is not provided in [13].
Acknowledgement
