In recent VLSIs, small-delay defects, which are hard to detect by traditional delay fault testing, can bring about serious issues such as short lifetime. To detect small-delay defects, on-chip delay measurement which measures the delay time of paths in the circuit under test (CUT) was proposed. However, this approach incurs high test cost because it uses scan design, which brings about long test application time due to scan shift operation. Our solution is a test application time reduction method for testing using the on-chip path delay measurement. The testing with on-chip path delay measurement does not require capture operations, unlike the conventional delay testing. Specifically, FFs keep the transition pattern of the test pattern pair sensitizing a path under measurement (PUM) (denoted as p) even after the measurement of p. The proposed method uses this characteristic. The proposed method reduces scan shift time and test data volume using test pattern merging. Evaluation results on ISCAS89 benchmark circuits indicate that the proposed method reduces the test application time by 6.89∼62.67% and test data volume by 46.39∼74.86%.
Introduction
In recent years, semiconductor device scaling has greatly improved performance and circuit integration density. With the increasing speed of integrated circuits, violations of the performance specifications are becoming a major factor affecting the product-quality level [1] . Timing related defects created by manufacturing process-related problems, such as resistive opens and shorts, metal mouse bites, via voids, will become more common [2] . Delay defects that degrade performance and cause timing related failures are emerging as a major problem in nanometer technologies [3] . Several delay fault models and delay test methods have been proposed. Transition fault and path delay fault are two prevalent fault models.
A small-delay defect is a delay defect whose size is not large enough to cause a timing failure under the system clock cycle. It is usually caused by a resistive short, resistive open, or resistive via. Since such defects might escape detection during traditional pass/fail delay fault testing with the functional clock, small-delay defects have become a significant problem, and it is essential to detect them during manufacturing tests [4], [5]. There are two reasons. First, a timing failure might be triggered during functional operation when a small additional delay occurs on a path with a small timing slack [6]. Second, small-delay defects can become a reliability issue because the defect might worsen during subsequent aging in the field and cause a failure of the device [7]. In addition, in order to improve the yield and reduce the time-to-market of VLSIs, design-related failures and performance limiters need to be identified and rectified during the first silicon debug [8]. However, the cost of testing and debugging delay defects in modern high-performance chips using external high-speed automatic test equipment (ATE) is very high. Moreover, this approach is inherently limited by the accuracy of the test clock frequency provided by the external ATE, which can be affected by factors such as parasitic capacitance, probe resistance, and tester skew [9]. On-chip path delay time measurement is one of the best alternatives for solving these problems. By measuring the delay time of the path under measurement (PUM), not only can gross and small-delay faults be detected, but the amount of timing violation in the failing paths can also be obtained under given environmental conditions [10], [11]. However, on-chip delay measurement incurs high test cost because it uses scan design, which leads to long test application time due to the scan shift operation. Thus, a method for reducing test application time is strongly required.
In on-chip path delay measurement, unlike conventional delay testing, the capture operation is unnecessary. Thus, the FFs keep the transition pattern (denoted as $v_{m,1}$) of the test pattern pair sensitizing a PUM $p$ even after the measurement of $p$. If $v_{m,1}$ can be used as the initial pattern (denoted as $v_{n,0}$) of another test pattern pair ($v_n$, which sensitizes another path $p'$), we can sensitize $p'$ by shifting just 1 bit of the transition pattern (under the launch off shift (LOS) test). The proposed method exploits this characteristic. This paper presents a method that reduces scan shift time and test data volume by using scan-based test pattern merging. We can also reduce the switching activity induced by the launch pulse. As a result, the method also reduces excessive IR-drop in scan testing, avoiding test-induced yield loss.
The rest of the paper is organized as follows. Section 2 introduces some related works for small-delay testing. Section 3 describes some terminologies related to scan-based delay testing and the on-chip delay measurement. Section 4 explains methods to reduce test application time and test data volume. Section 5 evaluates the introduced method. Finally, Sect. 6 concludes this paper.
Copyright © 2014 The Institute of Electronics, Information and Communication Engineers

Related Work
Several on-chip methods have been proposed for delay testing and debugging. Methods using a fast on-chip clock were proposed to detect delay faults [12], [13]. However, a small-delay defect occurring on a short path with a larger slack could escape detection. To detect small-delay faults, delay fault testing methods using a ring oscillator have been proposed [14]-[16]. In these methods, the PUM is made part of a ring oscillator, so that the delay of the target path is translated into the oscillation period. However, the timing resolution is low. Some time-to-voltage converter (TVC) based schemes have been proposed [17]-[19]. The delay of the PUM is converted to a voltage, and the delay of the PUM is obtained by comparing the converted voltage with a reference voltage. These techniques give good timing resolution. However, calibration is difficult.
Some on-chip path delay time measurement methods using embedded delay measurement have been proposed [20]-[26]. In these methods, the delay times of paths are measured. To screen small-delay faults, small-delay defect screening with criteria based on statistical analysis is used [5]. In this technique, small-delay defects are detected as outliers. The delay distribution of each path is generated in manufacturing tests. If a path delay time is beyond a specified limit such as the three-sigma limit (users can set the limit freely, taking into consideration the trade-off between yield and dependability), we regard the path as faulty even if its delay is not beyond the system clock cycle. Datta et al. proposed a modified vernier delay line (VDL) technique for path delay measurement [20]. High-resolution delay measurement capability can be achieved by using this method. The paper [21] presented modified boundary scan cells in which a time-to-digital converter (TDC) is embedded. Tsai et al. proposed a built-in delay measurement (BIDM) circuit consisting of coarse and fine blocks, which is an extension of the modified VDL technique. A built-in self delay testing methodology based on BIDM and self-calibration methods can be developed [22]. The measurement used in [23] utilizes an area-efficient variant of the modified VDL. The feature of this method is the delay range of each stage: the delay ranges increase gradually by a factor of two, which reduces the number of required stages. Thus, without decreasing delay measurement resolution, this method expands the delay measurement range much more easily and with significantly less hardware overhead. The authors' group proposed a measurement system, different from the VDL method, to improve the accuracy of the measured value [24]. This method measures two paths: a path that includes the PUM, and another path whose length is almost equal to that of the redundant line of the previously measured path.
The measured delay of the former path minus that of the latter path gives the delay of the PUM. This method is able to give a precise measurement. A method with smaller execution time and circuit area has been proposed [25] . In addition, a fault coverage improvement method has been proposed in [26] . However, testing using on-chip delay measurements incurs high test cost because it uses scan design, which brings about long test application time due to scan shift operation. Thus, a method reducing test application time is strongly required.
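The two measurement ideas mentioned in this section, subtracting a reference path to isolate the PUM delay and screening outliers against a three-sigma limit, can be illustrated with a minimal Python sketch. The function names, the use of the sample standard deviation, and the numeric values are assumptions made for illustration only; they are not taken from [5] or [24].

```python
from statistics import mean, stdev

def pum_delay(delay_with_pum: float, delay_reference: float) -> float:
    """Delay of the PUM as the difference of two measured path delays."""
    return delay_with_pum - delay_reference

def is_outlier(delay: float, population: list, k: float = 3.0) -> bool:
    """Flag a path delay beyond mean + k*sigma of the delays measured for
    the same path on other chips (k = 3 gives the three-sigma limit).
    Assumption: the sample standard deviation is used."""
    limit = mean(population) + k * stdev(population)
    return delay > limit

# Hypothetical delays (in ns) of one path measured on five chips
population = [1.00, 1.02, 0.98, 1.01, 0.99]
print(is_outlier(1.10, population))  # True: beyond the three-sigma limit
print(is_outlier(1.03, population))  # False: within the limit
```

Choosing a smaller `k` screens more aggressively at the cost of yield, which is the trade-off between yield and dependability noted above.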
Preliminaries
This section introduces some terminologies related to scan-based delay testing and the on-chip delay measurement.
(2) Transition fault and path delay fault
Transition fault and path delay fault are two prevalent fault models. The transition fault model targets each gate output in the design for a slow-to-rise and slow-to-fall delay fault while the path delay fault model targets the cumulative delay through the entire list of gates in a pre-defined path. Transition fault model is more widely used than path delay fault because it tests for at-speed failures at all nets in the design and the total fault list is equal to twice the number of nets. The transition fault is detected if a transition occurs at the fault site and if a sensitized path extends from the fault site to a primary output.
In this paper, we aim to detect increases in gate and line delays caused by resistive faults in order to reduce early-life failures; therefore, the transition fault model is adopted. The small-delay fault coverage is equal to the transition fault coverage.
(3) LOS and LOC
For circuits using scan, there are two approaches to testing delay faults: launch off shift (LOS), also referred to as skewed-load, and launch off capture (LOC), also referred to as broadside. In the LOS method, the transition pattern is generated by a one-bit shift of the initial pattern. In the LOC method, the transition pattern is obtained from the circuit response to the initial pattern. Note that in this paper we focus only on the LOS test.
On-Chip Delay Measurement Method
Figure 1 shows the architecture of the on-chip path delay measurement. The on-chip delay measurement system measures the delay of each path including a PUM, whose input and output are start and stop, respectively. The delay measurement system consists of a delay value measurement circuit (DVMC), a stop signal generator (SSG, which is an N-to-1 multiplexer), and the circuit under test (CUT).
The clock line is directly connected to start of the DVMC; the DVMC starts the measurement when a positive transition is sent to start. By setting the corresponding control data of the SSG, the SSG detects the transition on the input of a designated flip-flop (FF) and sends the transition to stop of the DVMC. The input line $clk$ is the clock signal of the CUT. The line $clk_i$ is the clock line of $FF_i$. The input of $FF_i$ is connected to $ssg_{out}$ through $ssg_{in_i}$ and the SSG. The system measures a PUM including one clock line $clk_j$, a path $p_i$, and some redundant lines $ssg_{in_i}$ and $ssg_{out}$. For example, after the measurement of the path $p$ = $clk_j$ - $p_i$ - $ssg_{in_i}$ - $ssg_{out}$, small-delay defects on $clk_j$ and $p_i$ can be detected by comparing the measured delay time with the expected delay time. In this paper, we insert one DVMC circuit in one CUT. Thus, only one path is selected to be measured for each test.
The architecture of the embedded delay measurement circuit DVMC is shown in Fig. 2. The DVMC is a ring-oscillator-based TDC that measures the time difference between the transition signals sent to the start line and the stop line. The transition of start triggers the measurement. The TRC (an n-bit up counter) counts the round cycles of the oscillation. Synchronized with the transition of stop, the FFs capture the states on the outputs of the corresponding NOT gates and the TRC. From these values, the delay value is calculated.
In one circuit, the set of paths under measurement is denoted by $P$ (includes paths $p_0$, $p_1$, . . . [...] and the SSG; thus, the DVMC stops the measurement.
4. We get the measurement result through scan-out of the DVMC. Consequently, the path delay of the PUM is calculated from the read-out values.
5. Delete $p_i$ from $P$. If $P = \emptyset$, stop the test; else go to Step 1.
The Proposed Method to Reduce Test Application Time and Test Data Volume for On-Chip Delay Measurement
This section proposes a method that reduces the test application time and test data volume of the on-chip delay measurement by using scan-based test pattern merging. The LOS operation of the on-chip delay measurement is introduced first.
LOS Operation of the On-Chip Delay Measurement
Figure 3 shows the LOS operation of the on-chip delay measurement. When using the on-chip delay measurement to detect small-delay defects on path $p$ (from $FF_i$ to $FF_j$), we set the SSG to detect the transition on the input of $FF_j$. At the moment the transition reaches the input D of $FF_j$, the transition is sent to stop of the DVMC through the SSG. Then, the DVMC stops the measurement. In this process, unlike the conventional LOS operation, the capture operation of the LOS test is unnecessary. Thus, the FFs keep the transition pattern $v_{m,1}$ of the test pattern pair sensitizing $p$ even after the measurement of $p$. If $v_{m,1}$ can be used as the initial pattern $v_{n,0}$ of another test pattern pair ($v_n$, which sensitizes another path $p'$), we can sensitize $p'$ by shifting just 1 bit of the transition pattern. Therefore, we can reduce the test application time. We can also reduce the switching activity induced by the capture pulse. Generally, an effective approach for avoiding test-induced malfunction and reducing IR-drop is to reduce the switching activity induced by the capture pulse [4]. As a result, the proposed method also reduces excessive IR-drop in scan testing, avoiding test-induced yield loss.
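The saving from reusing the pattern left in the FFs can be illustrated with a minimal Python sketch. It assumes patterns are strings over 0, 1, and X, that index 0 is the scan-in end, and that each shift moves bits toward higher indices; the function names are hypothetical and the bit ordering of the actual scan chain may differ.

```python
from typing import Tuple

def los_pair(initial: str, scan_in_bit: str) -> Tuple[str, str]:
    """Return (initial, transition) for an LOS test: the transition
    pattern is the initial pattern shifted by one bit."""
    # Assumption: index 0 is the scan-in end; the last bit is shifted out.
    transition = scan_in_bit + initial[:-1]
    return initial, transition

def shift_cycles(chain_length: int, reuse_previous: bool) -> int:
    """Shift clocks for one pattern pair, including the one-bit launch shift.
    If the pattern left in the FFs already serves as the initial pattern,
    no loading shifts are needed."""
    load = 0 if reuse_previous else chain_length
    return load + 1

v0, v1 = los_pair("1011001", "0")
print(v1)                      # 0101100
print(shift_cycles(7, False))  # 8: full load plus launch shift
print(shift_cycles(7, True))   # 1: launch shift only
```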
Scan-Based Test Pattern Merging
The scan-based test pattern merging technique is based on merging compatible patterns using the scan shift operation. Two [...] As shown in Fig. 4 (a), if $v_{m,1}$ and $v_{n,0}$ are compatible, we say that the two pattern pairs are compatible. As shown in Fig. 4 (b), if $v_{m,1}$ and $v_{n,0}$ are not compatible without shifting but can be made compatible by shifting $v_{m,1}$ by $r$ bits, then we say that the two pattern pairs are compatible with an $r$-bit shift (in Fig. 4 (b), $r = 2$). If $v_{m,1}$ and $v_{n,0}$ are compatible, we do not need to scan in all bits of $v_{n,0}$; we need to scan in only the last one bit in $v_{n,1}$ for $v_n$. If $v_{m,1}$ and $v_{n,0}$ are compatible with an $r$-bit shift, we need to scan in the $r$ bits of $v_{n,0}$ as well as the last one. In sum, we can reduce the test data for $v_n$ to 1 or $(r+1)$ bits. Besides the 1 or $(r+1)$ bits, we need the control data of shift times (denoted as $S$, which includes $s_0, s_1, \ldots, s_{m-1}$ for controlling the shift times of test pattern pairs $v_0, v_1, \ldots, v_{m-1}$).
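The compatibility check with an $r$-bit shift can be sketched in Python as follows. This is only an illustration under stated assumptions: patterns are strings over 0, 1, and X (don't care), two bits conflict only when both are specified and differ, index 0 is the scan-in end, and freshly scanned-in positions are treated as free. The function names are hypothetical.

```python
def compatible(a: str, b: str) -> bool:
    """Two equal-length patterns are compatible if no position holds
    two different specified values (X matches anything)."""
    return len(a) == len(b) and all(
        x == y or "X" in (x, y) for x, y in zip(a, b)
    )

def min_shift(state: str, target: str) -> int:
    """Smallest r such that the pattern held in the FFs (state), after an
    r-bit shift, is compatible with the next initial pattern (target)."""
    n = len(state)
    for r in range(n + 1):
        # After r shifts, positions 0..r-1 hold new (free) scan-in bits
        # and the old bits have moved to positions r..n-1.
        shifted = "X" * r + state[: n - r]
        if compatible(shifted, target):
            return r
    raise ValueError("patterns must have the same length")

def scan_in_bits(r: int) -> int:
    """Bits scanned in for the next pair: r shift bits plus the launch bit."""
    return r + 1

print(min_shift("1101XXX", "11X1X0X"))  # 0: compatible without shifting
print(min_shift("1101XXX", "X011011"))  # 2: compatible with a 2-bit shift
```

With `r = n` every position is free, so a full reload is always the fallback and the loop always returns a value for equal-length patterns.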
Procedure for Test Application Time and Test Data Volume Reduction
In this subsection, we describe the procedure for test application time and test data volume reduction. Specifically, we introduce the generation of the test data, the corresponding control data of the SSG, and the control data of shift times. For illustration, an example of test compaction is shown in Fig. 5. We assume that the CUT contains three PUMs: $p_0$, $p_1$, $p_2$. Paths $p_0$ and $p_1$ end in $FF_i$ and $p_2$ ends in [...] At first, we need to decide the first path to be sensitized; here we choose $p_0$. Because $p_0$ ends in $FF_i$, the control data of SSG $d_0$ is set to 0 (00). The control data of shift time $s_0$ is 7 (111), which equals the length of the scan chain. After the sensitization of $p_0$, the data stored in the FFs are $v_{0,1}$ = 1101XXX. Next, we sensitize $p_2$. We select $p_2$ rather than $p_1$ in order to reduce the test application time (in a greedy way): $v_0$ and $v_2$ are compatible, whereas $v_0$ and $v_1$ are compatible with a 3-bit shift. This means that sensitizing $p_1$ requires shifting 4 = 3 + 1 bits while sensitizing $p_2$ requires only a 1-bit shift. Here, the control data of SSG $d_1$ is set to 1 (01), and the control data of shift time $s_1$ is set to 0 (000). Finally, we sensitize $p_1$, for which the control data of SSG $d_2$ is set to 0 (00) and the control data of shift time $s_2$ is set to 2 (010) ($v_2$ and $v_1$ are compatible with a 2-bit shift). After all the steps, we get the compacted test data $V$ = (X11011X011X1), the corresponding $D$ (the control data of SSG), and $S$ (the control data of shift times). The procedure for reducing test application time and test data volume is given as follows.
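As a rough illustration of the greedy selection used in the example above, the following Python sketch orders pattern pairs so that the next pair is always the one needing the fewest shifts. It is a sketch under stated assumptions, not the authors' procedure: patterns are strings over 0, 1, and X, index 0 is the scan-in end, each pair is given as a hypothetical `(initial_pattern, launch_bit)` tuple, and X bits in the FFs take the value required by the pattern they are merged with.

```python
def compatible(a, b):
    """No position holds two different specified values."""
    return all(x == y or "X" in (x, y) for x, y in zip(a, b))

def min_shift(state, target):
    """Smallest r making state (shifted by r bits) compatible with target."""
    n = len(state)
    for r in range(n + 1):
        if compatible("X" * r + state[: n - r], target):
            return r

def merge(a, b):
    """Resolve X bits of a with the corresponding bits of b."""
    return "".join(y if x == "X" else x for x, y in zip(a, b))

def greedy_order(pairs, first=0):
    """Return [(pair_index, shift_count), ...] in application order."""
    n = len(pairs[0][0])
    v0, bit = pairs[first]
    state = bit + v0[:-1]          # FF contents after the launch shift
    order = [(first, n)]           # the first pair needs a full scan-in
    remaining = set(range(len(pairs))) - {first}
    while remaining:
        # Greedy choice: the pair with the fewest required shifts.
        best = min(sorted(remaining),
                   key=lambda i: min_shift(state, pairs[i][0]))
        r = min_shift(state, pairs[best][0])
        merged = merge("X" * r + state[: n - r], pairs[best][0])
        state = pairs[best][1] + merged[:-1]
        order.append((best, r))
        remaining.remove(best)
    return order

# Hypothetical 4-bit example: three pairs as (initial pattern, launch bit)
pairs = [("1100", "0"), ("X011", "1"), ("0110", "1")]
print(greedy_order(pairs))  # [(0, 4), (2, 0), (1, 0)]
```

In this small example the greedy order applies pair 2 before pair 1 because pair 2 is compatible with the FF contents without any shift.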
Test Application Time and Test Data Volume
The test application time $T$ is the sum of the scan shift time of test data $T_S$ and the measurement result read out time $T_R$.
Here, we use time normalized as clock cycles. By considering the implementation of LOS test, we have the scan shift time of test data:
where $n$ is the number of the test pattern pairs. Let $T_D$ be the read out time of the DVMC, and the measurement result read out time appears as:
Therefore, the test application time is:
The test data volume is the sum of the data volumes of $V$ ($V_V$), $S$ ($V_S$), and $D$ ($V_D$). By considering the implementation of LOS test, we have the data volume of test data:
When we implement the scan shift of LOS, the maximum shift number is equal to the length of scan chain. Thus, the data volume of the shift time and the data volume of the control data of SSG are:
where N is the length of the scan chain. Therefore, the test data volume is:
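The accounting described in this subsection can be illustrated with a small Python sketch. It is not a restatement of formulas (1) to (6); it rests on assumptions that are labeled in the comments: each pattern pair costs $s_i$ shift cycles plus one launch shift, each pair is followed by one read-out of $T_D$ cycles, and the widths of the shift-time and SSG control fields are passed in as parameters (3 and 2 bits below, as in the example of Fig. 5). All function names are hypothetical.

```python
def application_time(shifts, t_d):
    """T = T_S + T_R in clock cycles."""
    # Assumption: pair i costs s_i shift cycles plus one launch shift.
    t_s = sum(s + 1 for s in shifts)
    # Assumption: one DVMC read-out of t_d cycles per pattern pair.
    t_r = len(shifts) * t_d
    return t_s + t_r

def data_volume(shifts, shift_field_bits, ssg_field_bits):
    """V = V_V + V_S + V_D in bits."""
    n = len(shifts)
    v_v = sum(s + 1 for s in shifts)   # assumption: s_i + 1 bits per pair
    v_s = n * shift_field_bits         # one shift-time field per pair
    v_d = n * ssg_field_bits           # one SSG control field per pair
    return v_v + v_s + v_d

def reduction_percent(conventional, proposed):
    """Percentage reduction relative to the conventional value."""
    return 100.0 * (conventional - proposed) / conventional

# Shift times from the three-path example (s_0 = 7, s_1 = 0, s_2 = 2),
# with the 14-cycle DVMC read-out used in the evaluation.
merged = application_time([7, 0, 2], t_d=14)    # 12 + 42 = 54
# Assumption: without merging, every pair reloads the full 7-bit chain.
baseline = application_time([7, 7, 7], t_d=14)  # 24 + 42 = 66
print(merged, baseline, round(reduction_percent(baseline, merged), 2))
print(data_volume([7, 0, 2], shift_field_bits=3, ssg_field_bits=2))  # 27
```

The 12 test data bits obtained for the example agree with the length of the compacted test data $V$ = (X11011X011X1) given earlier, which is the main reason for the per-pair cost assumed here.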
Experimental Result
In this section, we provide experimental results of the proposed test compaction method. In this evaluation, we use ISCAS89 benchmark circuits. The initial test sets are constructed from the LOS test sets of [26]. The test set detects all the detectable transition faults under the single-path sensitization condition. A register is inserted at each primary input, and arbitrary values can be assigned to each register with the scan-in operation. We use the ring-oscillator-based DVMC, which has 14-bit registers. Thus, we need 14 clock cycles to read out the result of the DVMC. First, we evaluate the test application time reduction achieved by the proposed procedure. Next, we evaluate its data volume compaction effect. In the proposed procedure, we get a test pattern pair $v_m$ from the original test set (the LOS test set of [26]) [...] to the new test set and delete it from the original test set. We repeat the above steps until the original test set is empty. This paper proposes only a test compaction method that optimizes the test patterns; note that the proposed method does not change the area overhead of the conventional on-chip delay measurement. The area overhead of the on-chip delay measurement compared to conventional scan design is 12∼20% for some large ISCAS89 circuits [25], [26]. Table 1 shows the test application time of ISCAS89 benchmark circuits by using the conventional method [26] and the proposed method. Table 2 compares the test application time of methods with/without reordering test patterns. Here, test application time is calculated by using formula (3). In these tables, the column circuit shows the circuit name. The column CNV shows the results of the conventional method in $10^4$ clock cycles. The column PRO shows the results of the method using the proposed Procedure 1 (in $10^4$ clock cycles). The column COM shows the results of the method using scan-based pattern merging without reordering test patterns (in $10^4$ clock cycles).
The columns $T_S$ and $T_R$ show the scan shift time of test data and the measurement result read out time, respectively. The column $T$ shows the test application time. The column $T_{RED}$ shows the percentage of test application time reduction of each circuit using our method. From Table 1, we observe that the proposed method reduces the test application time in most of the benchmark circuits. Specifically, the test application time is reduced by 6.89∼62.67% after the proposed compaction procedure. From Table 2, we notice that scan-based pattern merging without reordering test patterns reduces the test application time by 2.14∼49.01%. The proposed procedure is therefore more effective in reducing test application time than the method using only scan-based pattern merging without reordering test patterns.
Table 3 shows the test data volume of ISCAS89 benchmark circuits by using the conventional method [26] and the proposed method. Table 4 compares the test data volume for methods with/without reordering test patterns. Here, test data volume is calculated by using formula (6). In these tables, the column circuit shows the circuit name. The column CNV shows the results of the conventional method in $10^4$ bits. The column PRO shows the results of the method using the proposed Procedure 1 (in $10^4$ bits). The column COM shows the results of the method using scan-based pattern merging without reordering test patterns (in $10^4$ bits). The columns $V_S$, $V_D$, and $V_V$ show the data volume of the shift time, the data volume of the control data of SSG, and the data volume of test patterns $V$, respectively. The column $V_{total}$ shows the test data volume of each circuit. The column $V_{RED}$ shows the percentage of test data volume reduction of each circuit by using our method. From Table 3, we observe that the proposed method reduces the test data volume in most of the benchmark circuits. Specifically, the test data volume is reduced by 46.39∼74.86% after the proposed compaction procedure. Table 4 shows that scan-based pattern merging without reordering test patterns reduces the test data volume by 42.18∼68.65%. The proposed procedure is therefore more effective in test data volume compaction than the method using only scan-based pattern merging without reordering test patterns.
From the results in these tables, we note that the proposed method effectively reduces the test application time and test data volume in most of the benchmark circuits. Moreover, it is more effective on large circuits. The reason is that large circuits have more test patterns, which gives more choices for minimizing the shift times and therefore a better compaction effect.
Conclusion
This paper proposed a test compaction method for on-chip delay measurement. To reduce the test application time and test data volume of the on-chip delay measurement, this paper presented a method that uses scan-based test pattern merging. Experimental results on ISCAS89 benchmark circuits showed that the proposed method reduced the test application time by 6.89∼62.67% and the test data volume by 46.39∼74.86%. In this work, test application time and test data volume were reduced by using only scan-based test pattern merging. By analyzing the results in Table 1, we noticed that the measurement result read out time occupies a considerable part of the total test time. In our future work, we will consider a new method to reduce the measurement result read out time. In addition, this work used an in-house ATPG; as future work, we will also try to use a commercial ATPG for efficient test generation.
