I. INTRODUCTION
TEST application time, test power, and test data volume of scan testing can be very large. It is essential to develop an effective method that reduces test application time, test data volume, and test power consumption simultaneously. Test power reduction can be achieved by test-compression techniques [1], [3], [12], test sequence or scan flip-flop ordering [2], new scan architectures [2], [7], [10], [13], [14], transition propagation reduction techniques [6], [13], and low-power built-in self-test (BIST) generator design.
Techniques were presented to minimize test power by maximizing clock disables. References [6], [13] inserted extra logic into the circuit to block transition propagation from the scan chain into the combinational part. These methods constrained test power inside the scan chain. Test energy is proportional to the total number of transitions in the whole process of test application.
Test energy in a circuit (except the clock tree) to apply a test vector includes four separate parts: 1) scan-in; 2) scan-out; 3) capturing; and 4) transferring. The transferring part arises in moving from one test vector to the next test vector, which occurs at the root of a scan tree or the first scan flip-flop of a scan chain. Test energy for the scan forest [14] is still similar to that of scan design with a scan chain, where a transition at a scan flip-flop can be propagated globally into all scan flip-flops in a scan tree and the whole combinational circuit. We would like to limit transition propagation to a very small part of the scan flip-flops during scan shifts, which effectively blocks propagation of transitions at the scan flip-flops to the combinational part of the circuit during scan shift cycles. The test data stored in the scan flip-flops in the first stage are then easily propagated to the scan flip-flops in the second stage.
II. TWO-STAGE SCAN ARCHITECTURE
A two-stage scan architecture is proposed in this section to achieve a scan testing scheme with very low test energy, test application cost, and test data volume. The proposed scan architecture consists of two separate parts: the multiple scan chains in the first stage, and the scan flip-flop groups in the second stage, which are driven by the scan flip-flops of the multiple scan chains. Each scan flip-flop in the multiple scan chains and the scan flip-flop group driven by it are in the same group. No two scan flip-flops in the same group have a common successor in the combinational part of the circuit. We say two scan flip-flops a and b have a common successor if there exists a gate g such that there exist combinational paths from a to g and from b to g, respectively. No new reconvergent fanout is introduced if no pair of scan flip-flops in the same group has a common combinational successor as stated above.
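The grouping condition above can be checked with a plain graph traversal. The following is a minimal sketch, not the authors' implementation: the netlist representation (a dictionary `fanout` mapping each scan flip-flop output or gate to the combinational gates it drives, with edges into flip-flop data inputs omitted), the node names, and the helper names are all assumptions made for illustration.

```python
from collections import deque


def fanout_cone(start, fanout):
    """Return every combinational gate reachable from `start`.

    `fanout` is a hypothetical netlist map: node -> list of gates it drives.
    Edges into flip-flop data inputs are assumed to be omitted, so the
    traversal never crosses a sequential element.
    """
    seen = set()
    queue = deque(fanout.get(start, ()))
    while queue:
        gate = queue.popleft()
        if gate in seen:
            continue
        seen.add(gate)
        queue.extend(fanout.get(gate, ()))
    return seen


def have_common_successor(ff_a, ff_b, fanout):
    """True if some gate is reachable from both scan flip-flops."""
    return not fanout_cone(ff_a, fanout).isdisjoint(fanout_cone(ff_b, fanout))


def is_valid_group(group, fanout):
    """True if no pair of scan flip-flops in `group` shares a successor."""
    cones = [fanout_cone(ff, fanout) for ff in group]
    for i in range(len(cones)):
        for j in range(i + 1, len(cones)):
            if not cones[i].isdisjoint(cones[j]):
                return False
    return True


# Hypothetical toy netlist: ff2 and ff3 both reach gate g2.
fanout = {
    "ff1": ["g1"],
    "ff2": ["g2"],
    "ff3": ["g2", "g3"],
    "g1": ["g4"],
    "g2": ["g5"],
}

print(is_valid_group(["ff1", "ff2"], fanout))
print(is_valid_group(["ff1", "ff2", "ff3"], fanout))
```

In this toy netlist, `ff1` and `ff2` could share a group, while adding `ff3` would violate the condition because `ff2` and `ff3` both reach `g2`.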
Each scan flip-flop in the multiple scan chains and the scan flip-flops driven by it are assigned the same values for all test vectors. All scan-in signals of the multiple scan chains are driven by primary inputs. Two scan-out signals can be connected to an XOR gate if the following condition is met. Let C1 = (a1, a2, ..., al) and C2 = (b1, b2, ..., bl) be two scan chains. The scan-out signals of both scan chains can be connected with an XOR gate without generating any aliasing faults if each pair of scan flip-flops a1 and b1, a2 and b2, ..., and al and bl do not have any common predecessor in the combinational part of the circuit, respectively.
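The XOR condition above compares scan flip-flops position by position. A minimal sketch of such a check follows, under stated assumptions: the map `fanin` (node -> list of combinational gates or primary inputs driving it, where a flip-flop key stands for its data input), the equal-length requirement, and all names are hypothetical and chosen only for illustration.

```python
def fanin_cone(node, fanin):
    """Return every gate or primary input with a combinational path to `node`.

    `fanin` is a hypothetical netlist map: node -> list of its drivers.
    Flip-flop outputs are assumed not to appear as drivers, so the
    traversal stays inside the combinational part.
    """
    seen = set()
    stack = list(fanin.get(node, ()))
    while stack:
        driver = stack.pop()
        if driver in seen:
            continue
        seen.add(driver)
        stack.extend(fanin.get(driver, ()))
    return seen


def can_share_xor(chain_a, chain_b, fanin):
    """True if flip-flops at the same position never share a predecessor."""
    if len(chain_a) != len(chain_b):
        # Assumption of this sketch: the two chains are aligned and equal in length.
        raise ValueError("chains must have the same length")
    return all(
        fanin_cone(a, fanin).isdisjoint(fanin_cone(b, fanin))
        for a, b in zip(chain_a, chain_b)
    )


# Hypothetical toy netlist: a2 and b2 are both driven through gate g2.
fanin = {
    "a1": ["g1"], "a2": ["g2"],
    "b1": ["g3"], "b2": ["g2"],
    "c2": ["g4"],
    "g1": ["pi1"], "g2": ["pi2"], "g3": ["pi3"], "g4": ["pi4"],
}

print(can_share_xor(["a1", "a2"], ["b1", "b2"], fanin))
print(can_share_xor(["a1", "a2"], ["b1", "c2"], fanin))
```

The first pairing fails at the second position because `a2` and `b2` share `g2`; replacing `b2` with `c2` satisfies the condition.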
There is considerable flexibility when multiple scan chains are connected to the same XOR gate. For each test vector, each scan flip-flop in the multiple scan chains and the scan flip-flop group driven by it are assigned the same value. Therefore, any one of the scan flip-flops in the scan flip-flop group can be exchanged with the one that drives it in order to satisfy the condition. Each scan flip-flop group in the second stage and the scan flip-flop in the first stage that drives it share the same pseudo primary input in the test circuit. The XOR trees are added into the test circuit for fault simulation using HOPE [10] after the test vectors have been generated.
The two-stage scan architecture is given in Fig. 1. The boxes shown in Fig
III. TEST APPLICATION SCHEME
The test application scheme is also partitioned into two steps.
1) A test vector is shifted into the multiple scan chains.
2) The vector is propagated to the scan flip-flops in the second stage, which is followed by a functional cycle to receive test responses.
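The data movement in the two steps above can be illustrated with a small simulation. This is only a sketch under simplifying assumptions: a single first-stage chain modeled as a list of bits, second-stage flip-flops that hold their values during shifting, and hypothetical names throughout (`shift_in`, `propagate`, `groups`). The functional capture cycle and the scan-out path are not modeled.

```python
def shift_in(chain_state, bits):
    """Step 1: shift `bits` serially into a first-stage scan chain.

    Returns the final chain state and the number of flip-flop toggles
    that occurred inside the chain while shifting.
    """
    state = list(chain_state)
    toggles = 0
    for bit in bits:
        next_state = [bit] + state[:-1]  # every stored bit moves one position
        toggles += sum(old != new for old, new in zip(state, next_state))
        state = next_state
    return state, toggles


def propagate(chain_state, groups, second_stage):
    """Step 2: copy each first-stage value into the group it drives.

    `groups[i]` lists the (hypothetical) second-stage flip-flops driven by
    position i of the chain. Returns the new second-stage state and the
    number of second-stage toggles caused by this single update.
    """
    updated = dict(second_stage)
    toggles = 0
    for value, group in zip(chain_state, groups):
        for ff in group:
            if updated[ff] != value:
                toggles += 1
            updated[ff] = value
    return updated, toggles


chain_state, shift_toggles = shift_in([0, 0, 0], [1, 0, 1])
groups = [["s0a", "s0b"], ["s1a"], []]
second_stage, propagate_toggles = propagate(
    chain_state, groups, {"s0a": 0, "s0b": 0, "s1a": 0}
)
print(chain_state, shift_toggles, second_stage, propagate_toggles)
```

In this model the second-stage flip-flops change at most once per test vector, while all shifting activity stays inside the short first-stage chain.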
In (1), the two parameters are the length of the multiple scan chains and the number of test vectors for the circuit with the two-stage scan architecture. The second term in (1) represents the number of clock cycles required to shift out the test responses of the last test vector. The proposed two-stage scan architecture can control clock tree test energy very well. For full scan design with a single scan chain, the number of transitions at the clock tree is
In (2), the two parameters are the number of scan flip-flops and the number of test vectors for the fully scanned circuit with a single scan chain. The second term on the right-hand side of (2)
where the two parameters in (4) are the height of the scan forest and the number of test vectors for the scan-forest-designed circuit, respectively.
IV. EXPERIMENTAL RESULTS
The test generator ATALANTA [9] is used to generate tests, and the HOPE [9] fault simulator is used for fault simulation. All results are obtained on a Sun Blade 2000 workstation using the C language in the UNIX environment. The test application cost reduction ratio (TA) and the clock tree test energy reduction ratio (CTE) are obtained as follows:
where the four quantities are obtained by (1), (2), (3), and (4), respectively. The remaining symbols denote the number of scan flip-flops in the circuit, the number of test vectors for the fully scanned circuit with a single scan chain, and the CTE of the two-stage scan with respect to the original scan forest. The node transition count (NTC) is reported as a quantitative measure of test energy in this brief. Let the load capacitance of each unit be its number of fanouts. The node transition count in a scan flip-flop is either 2 or 6, depending on the type of transition. A transition occurs at a scan flip-flop when the scan flip-flop held a value 0 (or 1) at the previous cycle and gets a value 1 (or 0) at the current cycle. The transition is propagated to the combinational successors until the transition disappears or a scan flip-flop is reached.
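A fanout-weighted transition count of the kind described above can be sketched as follows. This is an illustrative assumption about how such a measure is commonly computed (each toggle at a node weighted by that node's number of fanouts), not the exact NTC formula of this brief; the flip-flop weights of 2 and 6 are not modeled, and the names `waveforms` and `fanouts` are hypothetical.

```python
def node_transition_count(waveforms, fanouts):
    """Sum, over all nodes, of (number of toggles) x (number of fanouts).

    `waveforms` maps a node name to its list of logic values, one per
    clock cycle; `fanouts` maps a node name to its fanout count, used
    here as a stand-in for load capacitance.
    """
    total = 0
    for node, values in waveforms.items():
        # A toggle is any cycle whose value differs from the previous cycle.
        toggles = sum(prev != cur for prev, cur in zip(values, values[1:]))
        total += toggles * fanouts[node]
    return total


# Hypothetical waveforms over four cycles for two gates.
waveforms = {"g1": [0, 1, 1, 0], "g2": [1, 1, 0, 0]}
fanouts = {"g1": 2, "g2": 3}
ntc = node_transition_count(waveforms, fanouts)
print(ntc)  # g1: 2 toggles x 2 fanouts, g2: 1 toggle x 3 fanouts -> 7
```

Weighting by fanout makes a toggle at a heavily loaded node count more than a toggle at a lightly loaded one, which is the intent of using fanout as the load capacitance.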
The two-stage scan architecture has been implemented. Results and a comparison with multiple clock disabling (MCD) [5], which also uses a clock disabling scheme, are presented. The proposed method is also compared with multiple scan chains, where the number of scan chains is equal to the number of primary inputs in the circuit. The TA and the test energy reduction ratio (TE) of the proposed two-stage scan architecture with respect to the multiple scan chains are also presented in Table I. The columns of Table I report the test application cost (TA), the TE with respect to full scan with a single scan chain, the TE with respect to the original scan forest, the CPU time to generate tests, the CPU time to construct the two-stage scan architecture, and the area overhead, respectively. The number of scan chains in the first stage is equal to the number of primary inputs in the circuit. The area overhead is calculated based on the cell library class.lib in the Synopsys system. As shown in Table I, all circuits designed with the two-stage scan architecture and with full scan with a single scan chain obtain complete fault coverage. Table II presents a comparison of the proposed method with three recent methods, BO [1], RBO [11], and FDR [3], on test data compression and TAs. Table III

