# A Novel Capture-Safety Checking Method for Multi-Clock Designs and Accuracy Evaluation with Delay Capture Circuits

Kohei Miyase<sup>1</sup>, Masao Aso<sup>2</sup>, Ryou Ootsuka<sup>2</sup>, Xiaoqing Wen<sup>1</sup>, Hiroshi Furukawa<sup>2</sup>, Yuta Yamato<sup>3</sup>, Kazunari Enokimoto<sup>1</sup>, and Seiji Kajihara<sup>1</sup>

<sup>1</sup> Kyushu Institute of Technology, Iizuka 820-8502, Japan
<sup>2</sup> Renesas Micro Systems Co., Ltd., Kamimashiki 861-2202, Japan
<sup>3</sup> Fukuoka Industry Science Technology Foundation, Fukuoka 814-0001, Japan

Abstract- Excessive capture power in at-speed scan testing may cause yield loss due to timing failures. Although reducing the number of clock domains that capture test responses simultaneously is a practical and scalable solution for reducing capture power, no available capture-safety checking metric can assess its effect in an accurate-enough manner, especially when multiple clock domains capture test responses in a short period of time. This paper proposes a novel CLEAR (CLock-Edge-Arrival-Relation-based) capture-safety checking method that, for the first time, takes clock edge arrival times for different clock domains into consideration. The accuracy and usefulness of the proposed method have been clearly demonstrated by simulation-based evaluation with the largest ITC'99 benchmark circuit as well as real-chip-based evaluation with an industrial chip embedded with on-chip delay measurement circuitry.

# **I. INTRODUCTION**

With ever-shrinking feature sizes and ever-increasing clocking frequencies, timing-related defects have become dominant. This has made at-speed scan testing mandatory, and it is usually performed by using the *LOC (Launch-On-Capture)* [1] clocking scheme. As shown in Fig. 1, shift mode loads test stimuli and unloads test responses through scan chains, while capture mode uses two clock cycles: one for launching test transitions (switching activity) and one for capturing test responses to detect delay defects.



Figure 1. Capture-Safety Checking for LOC Scheme.

Despite its importance, at-speed scan testing suffers from various problems due to the gap between functional power and test power [1]. It is well-known that shift power and capture power during at-speed scan testing are much higher than functional power due to high fault / block parallelism and non-functional clocking used during testing for higher test efficiency.

Since a large number of shift cycles are used in shift mode, the accumulative impact of high shift power often manifests itself as excessive heat, causing damage to packages or dies. Fortunately, several effective shift power reducing solutions (notably scan segmentation [2]) have been proposed and successfully applied in industry for solving the shift power problem.

On the other hand, the accumulative impact of capture power is negligible. However, its instantaneous impact in at-speed testing may cause timing failures [1,3]. As illustrated in Fig. 1, if excessive switching activity occurs in launch cycle $C_1$, excessive IR-drop may occur, and unexpected test responses may be captured in $C_2$ (even though the circuit-under-test is defect-free and functionally operational).

Particularly in the testing of high-speed devices, even a slightly increased delay due to excessive IR-drop (*launch-induced IR-drop*) may cause *capture malfunction*, resulting in test-induced yield loss [1]. In order to avoid capture malfunction, various techniques have been proposed for reducing capture power, based on such approaches as DFT, ATPG, and test vector modification [4-13]. However, even after capture power reduction, some test vectors may still cause capture malfunction.

Once the low-capture-power techniques have been applied, it is necessary to check whether a test vector may cause capture malfunction. Any test vector causing capture malfunction must be discarded rather than included in the test set. Furthermore, adequate checking of fault-detection-oriented test vectors helps to generate a small number of test vectors with shorter computing time [7]. Therefore, *capture-safety checking* in at-speed testing needs to be conducted regardless of fault model.

Several methods for capture-safety checking [8-16] have previously been proposed. So far, only the capture-safety checking method proposed in [13] takes transition time relations among logic gates into consideration: earlier transitions of logic gates cause excessive IR-drop at other logic gates that have later transitions. Although the method in [13] can achieve more accurate results than the other methods, it does not consider transition time relations across more than one clock domain. In practice, multiple (if not all) clock domains are tested simultaneously in order to minimize test application time. It has been reported that the impact of excessive IR-drop can be suppressed by reducing the number of clock domains capturing test responses simultaneously [12, 17]. However, more than one clock domain must be simultaneously activated because of the functional dependency among domains. Furthermore, less parallelism causes a significant increase in test application time.

When multiple clock domains are simultaneously tested, there are certain differences in the clock edge arrival times of flip-flops in different domains [18]. Therefore, if more than one domain captures the responses in a short period of time, transition time relations must be considered among different domains, in addition to the transition time relations within a single domain that are considered in [13]. Obviously, the impact of IR-drop varies depending on the clock edge arrival time at each clock domain.

In this paper, we propose the *CLEAR* (*CLock-Edge-Arrival-Relation-based*) capture-safety checking method. The CLEAR method considers the clock edge arrival time at flip-flops in different clock domains. Based on certain differences in the clock edge arrival times, the proposed method represents a practical and accurate method of capture-safety checking. In the experiments, we evaluate the proposed method with the largest ITC'99 benchmark circuit as well as a manufactured chip embedded with an on-chip delay capture circuit that can capture the delay increase caused by capture malfunction.

The rest of the paper is organized as follows: Section II describes the background; the proposed CLEAR (Clock-Edge-Arrival-Relation-Based) capture-safety checking is described in Section III; the on-chip delay capture circuit design, which is also proposed in this paper, is described in Section IV; the experimental results are shown in Section V; and our conclusions are summarized in Section VI.

# **II. BACKGROUND**

#### A. Previous Capture-Safety Checking Methods

Previous capture-safety checking methods can be classified as considering capture power from spatial and temporal perspectives.

Within the spatial perspective, capture-safety checking methods can be further classified as (S1) *global* (*in which the switching activity of the whole circuit is checked*) [9-11], (S2) *regional* (*in which the switching activity in specific regions is checked*) [8], (S3) *structural-long-path-based* (*in which the switching activity around structurally long paths is checked*) [14], and (S4) *sensitized-long-path-based* (*in which the switching activity around sensitized long paths is checked*) [15].

From the temporal perspective, capture-safety checking methods can be further classified as (T1) *total* (*in which the switching activity of the entire launch cycle is checked*) [9-11], (T2) *instantaneous* (*in which the peak or instantaneous switching activity is checked*) [8], (T3) *transition-window-based* (*in which switching activity in the transition window is checked*) [16], and (T4) *transition-time-relation-based* (*in which switching activity related to transition time is checked*) [13].

In capture-safety checking, a test vector causing capture malfunction is called *capture-risky*, and a test vector not causing capture malfunction is called *capture-safe*. Accurate capture-safety checking methods help to avoid testing with capture-risky vectors, which should be discarded, and to generate fewer test vectors in a shorter amount of computing time than low-capture-power ATPG without capture-safety checking [7].

#### B. TTR Capture-Safety Checking Method

The capture-safety checking method in [13] first identifies sensitized long paths (i.e., paths whose length exceeds a preset threshold) and their neighboring node (logic gate) sets, whose nodes are located close together and share a power supply net. Then, it assesses the delay increase of each sensitized long path using the *TTR* (*transition-time-relation-based*) capture-safety checking method, which considers only those neighboring transitions that occur earlier than the transition at each on-path node.

In Fig. 2, when the TTR capture-safety checking method evaluates the IR-drop impact at gate $g_5$, only early transitions on neighboring gates $g_1$, $g_2$, $g_4$, $g_7$, and $g_8$ are considered. In fact, transition times depend on the length of the sensitized path that has neighboring gates.
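The filtering step in this example can be sketched as follows. This is a minimal illustration, not the implementation from [13]: the gate names follow the text, but the transition times, the extra neighbors `g3` and `g6`, and the helper `early_neighbor_transitions` are hypothetical, chosen only so that the five gates named above switch before `g5`.

```python
from typing import Dict, List


def early_neighbor_transitions(
    target: str,
    neighbors: List[str],
    transition_time: Dict[str, float],
) -> List[str]:
    """Return the neighbors whose transition occurs before the target's.

    Neighbors without an entry in transition_time are treated as having
    no transition for this test vector and are skipped.
    """
    t_target = transition_time[target]
    return sorted(
        g for g in neighbors
        if g in transition_time and transition_time[g] < t_target
    )


# Hypothetical transition times in ns (assumed values, not from Fig. 2).
times = {"g1": 0.2, "g2": 0.3, "g3": 1.4, "g4": 0.5,
         "g5": 0.9, "g6": 1.6, "g7": 0.6, "g8": 0.8}
neighbors_of_g5 = ["g1", "g2", "g3", "g4", "g6", "g7", "g8"]

early = early_neighbor_transitions("g5", neighbors_of_g5, times)
print(early)
```

With these assumed times, `g3` and `g6` switch after `g5` and are therefore excluded from the IR-drop assessment at `g5`.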



Figure 2. An Example of Transition Time Relation.

Although capture-safety checking [13] considers the transition time relation in a single clock domain, multiple (if not all) clock domains are tested simultaneously in order to minimize test application time in practice. In this paper, we propose a capture-safety checking method for multi-clock designs. Experimental results show that the transition time relation among different clock domains should be considered in capture-safety checking.

# **III. CLEAR CAPTURE-SAFETY CHECKING**

In this section, we present a novel capture-safety checking method called *CLEAR* (*CLock-Edge-Arrival-Relation-based*) capture-safety checking. It takes into account the IR-drop impact of transition time relations among multiple clock domains.

#### A. Clock Edge Arrival Relation for Multiple Domains

A circuit usually has multiple clock domains, in which clock edges may reach flip-flops of different clock domains at different times [18]. Once CTS (clock tree synthesis) is finished, relatively accurate clock edge arrival times can be obtained by STA (static timing analysis) tools.

The proposed CLEAR capture-safety checking utilizes the fact that different clock domains have different clock arrival times, in order to efficiently take the transition time relations into consideration for capture-safety checking. Fig. 3 shows the concept of clock edge arrival relations in each clock domain. We define two types of domains as follows:

**Definition 1:** A domain including a long sensitized path for a test vector is called a *victim domain*, denoted by **Domain**<sub>vic</sub>. Note that the long sensitized path is the suspicious path where the excessive delay occurs, resulting in capture malfunction.

**Definition 2:** A domain causing the impact of IR-drop on the path in a victim domain is called an *aggressor domain*, denoted by **Domain**<sub>agg</sub>.

An example is shown in Fig. 3, where three domains are demonstrated. From the perspective of clock edge arrival relations, the aggressor domain is $Domain_1$, since the earlier transitions (the transitions in $Domain_1$) affect the IR-drop caused by the later transitions (the transitions in $Domain_2$) [13]. The proposed method considers the excessive IR-drop impact of $Domain_1$, since this IR-drop can cause an excessive delay increase in $Domain_2$.



Figure 3. Concept of Clock Edge Arrival Relation.

#### B. IR-Drop Impact for Multiple Clock Domains

Here, we describe the impact of clock edge arrival relations. Capture malfunction is usually caused by excessive delay along a long sensitized path. The longer the period of excessive IR-drop, the larger the delay increase.

Fig. 4 shows a typical waveform of excessive IR-drop. As shown in Fig. 4, the period of excessive IR-drop affects the delay of the path from $FF_L$ to $FF_C$. For a design with multiple clock domains, there are differences in clock arrival times among the clock domains. Depending on the clock arrival relations among clock domains, the period of excessive IR-drop varies.



Figure 4. Typical IR-Drop Waveform.

Suppose that the clock edge arrival time for $Domain_{vic}$ is $t_{vic}$ and that for $Domain_1$ is $t_1$. If $t_1 < t_{vic}$, $Domain_1$ is identified as an aggressor domain ($Domain_{agg}$). Fig. 5 shows the IR-drop for $Domain_1$ and $Domain_{vic}$. The IR-drop induced by a launch cycle is much larger than that induced by a capture cycle ($C_1$ for the launch cycle, $C_2$ for the capture cycle, as shown in Fig. 1). Therefore, the IR-drop of $Domain_1$ has a significant impact on the IR-drop for $Domain_{vic}$, resulting in a long period of excessive IR-drop. If the launch cycle occurs in more than one domain within a relatively short period, the IR-drop impact described above becomes significant. Obviously, such impacts must be included in capture-safety checking.
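The aggressor identification rule above ($t_1 < t_{vic}$) can be sketched as follows. This is a minimal illustration under stated assumptions: the function name `find_aggressor_domains`, the domain labels, and the arrival times are hypothetical, and in practice the arrival times would come from an STA run after CTS.

```python
from typing import Dict, List


def find_aggressor_domains(
    arrival_time_ns: Dict[str, float], victim: str
) -> List[str]:
    """Return domains whose clock edge arrives strictly before the victim's."""
    t_vic = arrival_time_ns[victim]
    return [
        domain
        for domain, t in arrival_time_ns.items()
        if domain != victim and t < t_vic
    ]


# Hypothetical clock edge arrival times in ns (assumed values).
arrival = {"Domain1": 0.8, "Domain2": 1.5, "Domain3": 2.1}

aggressors = find_aggressor_domains(arrival, victim="Domain2")
print(aggressors)
```

With `Domain2` as the victim, only `Domain1` has an earlier clock edge, which mirrors the three-domain situation described for Fig. 3.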



Figure 5. Excessive IR-Drop of Aggressor Domain.

#### C. CLEAR Capture-Safety Checking

First, the proposed CLEAR capture-safety checking method assesses the switching activity around sensitized long paths so as to identify definitely capture-safe vectors. Next, it analyzes the excessive delay caused by clock edge arrival relations. Although the proposed method does not analyze the detailed power supply network, it can achieve effective capture-safety checking. The proposed method consists of (1) *critical region extraction*, (2) *region-based transition calculation*, and (3) *delay calculation with clock edge arrival relation*.

### (1) Critical Region Extraction

The switching activity of neighboring gates around a sensitized long path has a strong impact on the delay increase of the path [15]. Therefore, we first extract a region around the sensitized long path, called the *critical region*. In this work, we partition logic gates (standard cells) based on clock domains. Since the latest place and route tools support module-based placement, standard cells in a clock domain are placed close together.

Fig. 6 shows the sensitized longest path and the critical region. Place and route were conducted with IC Compiler<sup>®</sup> by Synopsys for the largest ITC'99 circuit, b19. The circuit was duplicated, and each copy was fed by its own clock, so that the two copies form two different clock domains in this design. Since *Domain*<sub>1</sub> includes the path, *Domain*<sub>1</sub> is the critical region.

In general, excessive IR-drop occurs when a circuit has abnormally high switching activity (transitions). The degree of IR-drop also depends on the power supply network consisting of power rings, power straps, and power rails. In capture-safety checking, the power supply network also needs to be considered [19]. Since we focus on the importance of clock edge arrival relations among multiple clock domains, we do not incorporate region partitioning based on the power supply network. Nevertheless, we can achieve decent results from this clock-domain-based partitioning, even though detailed power network analysis is not conducted.



Figure 6. Critical Region and Sensitized Longest Path for b19.

### (2) Region Based Transition Calculation

Using the results of the critical region extraction, the proposed method calculates the transition ratio at flip-flops in the critical region. With this calculation, we can identify definitely capture-safe test vectors. Although the transition ratio at flip-flops alone is not sufficient for accurate capture-safety checking, a test vector with an extremely low transition ratio is not capture-risky. After the capture-safe vectors have been identified in this step, the proposed method finally checks capture-safety for the remaining test vectors.
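The screening step can be sketched as follows, assuming that the transition ratio is the fraction of flip-flops in the critical region whose value toggles at launch. The function names, the flip-flop labels, and the cutoff `safe_ratio` are hypothetical; the text only states that an extremely low ratio implies a capture-safe vector.

```python
from typing import Dict, Iterable


def transition_ratio(
    before: Dict[str, int],
    after: Dict[str, int],
    region_ffs: Iterable[str],
) -> float:
    """Fraction of flip-flops in the region whose value changes at launch."""
    ffs = list(region_ffs)
    if not ffs:
        return 0.0
    toggled = sum(1 for ff in ffs if before[ff] != after[ff])
    return toggled / len(ffs)


def is_definitely_capture_safe(ratio: float, safe_ratio: float) -> bool:
    """Screen out vectors whose transition ratio is below the cutoff."""
    return ratio < safe_ratio


# Hypothetical flip-flop states before and after the launch edge.
before = {"ff0": 0, "ff1": 1, "ff2": 0, "ff3": 1}
after = {"ff0": 1, "ff1": 1, "ff2": 0, "ff3": 1}

ratio = transition_ratio(before, after, ["ff0", "ff1", "ff2", "ff3"])
screened = is_definitely_capture_safe(ratio, safe_ratio=0.05)  # assumed cutoff
print(ratio, screened)
```

Here one of four flip-flops toggles, so the vector is not screened out and would proceed to the delay calculation step.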

### (3) Delay Calculation with Clock Edge Arrival Relation

The actual delay of manufactured chips depends on many factors, for example, wire length, parasitic capacitance, and parasitic resistance, which are affected by process rules and the manufacturing environment, as well as the power supply network. Therefore, it is very difficult to accurately estimate such delay with simple techniques. In particular, estimating the delay increase due to clock edge arrival relations on the order of nanoseconds is extremely difficult.

The proposed method selects the test vectors not identified as capture-safe, and then calculates the worst delay caused by aggressor domains ($Domain_{agg}$) with earlier clock edge arrival times by running a delay analysis tool that can calculate the increased delay due to IR-drop. Because capture-safe test vectors have already been identified with the transition ratio in the critical region, we save the computation time of running delay analysis tools on them.

Although a SPICE simulator calculates the increased delay most accurately, it requires a significantly long computation time. However, even if we use a less accurate tool than SPICE, we can achieve good correlation with the results from a real chip. After the calculation, the test vector is identified as capture-risky if the obtained delay value is larger than a given threshold.

Note that the threshold depends on the process rules and the manufacturing environment. Therefore, the threshold must be determined by the user.
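Steps (2) and (3) together can be sketched as the following flow. This is an illustrative outline under stated assumptions rather than the authors' implementation: `clear_check`, `fake_delay_tool`, and all numeric values are hypothetical, and the delay analysis tool is represented by a user-supplied callable.

```python
from typing import Callable, Dict, List


def clear_check(
    transition_ratio: Dict[str, float],
    safe_ratio: float,
    worst_delay_ns: Callable[[str], float],
    delay_threshold_ns: float,
) -> Dict[str, str]:
    """Classify each test vector as capture-safe or capture-risky."""
    verdict: Dict[str, str] = {}
    for vector, ratio in transition_ratio.items():
        if ratio < safe_ratio:
            # Step (2): screened out, so no delay analysis is run.
            verdict[vector] = "capture-safe"
        elif worst_delay_ns(vector) > delay_threshold_ns:
            # Step (3): path delay under aggressor IR-drop exceeds threshold.
            verdict[vector] = "capture-risky"
        else:
            verdict[vector] = "capture-safe"
    return verdict


# Hypothetical inputs for illustration only.
ratios = {"v1": 0.02, "v2": 0.12, "v3": 0.16}
delays = {"v2": 9.5, "v3": 10.4}  # stand-in for delay analysis results
analyzed: List[str] = []


def fake_delay_tool(vector: str) -> float:
    analyzed.append(vector)  # record which vectors needed delay analysis
    return delays[vector]


result = clear_check(ratios, safe_ratio=0.05,
                     worst_delay_ns=fake_delay_tool, delay_threshold_ns=10.0)
print(result, analyzed)
```

The callable is invoked only for `v2` and `v3`, which reflects the computation-time saving from screening by transition ratio first.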

# **IV. ON-CHIP DELAY CAPTURE CIRCUIT**

The fairest evaluation of a capture-safety checking method is to compare its results with the increased delay on a fabricated chip. If a vector identified as capture-risky induces excessive delay on a fabricated chip, the capture-safety checking is accurate. In order to capture excessive delay on a chip, we present an on-chip delay capture circuit design, which captures the delay increase caused by capture malfunction. We use this circuit to evaluate our proposed CLEAR method.

The on-chip delay capture circuit consists of a delay adjuster circuit, flip-flops and several buffers as shown in Fig. 7. From the flip-flop as the start point of a path, the path goes through the delay adjuster circuit, and then the path is branched off to an individual flip-flop capturing a test response. The buffers are inserted on the path in order to propagate transitions to the individual flip-flop at desired timing.

When a flip-flop captures an unexpected response, it means that the flip-flop cannot capture the response within a clock period. We refer to the flip-flop as a *fail flip-flop*. The buffers in the delay adjuster and on the path are adequately selected. With the number of fail flip-flops in the delay capture circuit, we can observe the excessive delay increased by a test vector.

Fig. 7 shows the on-chip delay capture circuit. The circuit consists of a delay adjuster circuit, buffers $B_1, B_2, \ldots, B_n$, and flip-flops $Q_0, Q_1, Q_2, \ldots, Q_n$. The delay adjuster circuit has a delay selector input (*DSEL*). Fig. 8 shows an example of the delay adjuster circuit, which has a 3-bit delay selector. The value of *DSEL* determines the number of buffers inserted on the path. From the output of the delay adjuster circuit, there are *i* buffers to flip-flop $Q_i$. The number of fail flip-flops capturing unexpected responses varies depending on the delay on the path; it increases roughly in proportion to the increased delay.
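The behavior of the delay capture circuit can be approximated with a simple behavioral model. The sketch below rests on several assumptions that are not taken from the design: every buffer has the same delay, the adjuster inserts exactly *DSEL* buffers, setup time is ignored, and all numeric values and the name `count_fail_flip_flops` are hypothetical.

```python
def count_fail_flip_flops(
    launch_to_adjuster_ns: float,
    dsel: int,
    n: int,
    buffer_delay_ns: float,
    clock_period_ns: float,
    extra_delay_ns: float = 0.0,
) -> int:
    """Count flip-flops Q_0..Q_n whose data arrives after the clock period.

    extra_delay_ns models the delay increase induced by a test vector.
    """
    if not 0 <= dsel <= 7:
        raise ValueError("DSEL is modeled as a 3-bit value (0..7)")
    # Delay up to the output of the delay adjuster (assumed: DSEL buffers).
    adjuster_out = launch_to_adjuster_ns + dsel * buffer_delay_ns + extra_delay_ns
    # Q_i sits behind i additional buffers after the adjuster output.
    return sum(
        1
        for i in range(n + 1)
        if adjuster_out + i * buffer_delay_ns > clock_period_ns
    )


# Hypothetical parameters: 0.25 ns buffers, 4 ns clock period, Q_0..Q_8.
nominal = count_fail_flip_flops(2.0, dsel=4, n=8,
                                buffer_delay_ns=0.25, clock_period_ns=4.0)
slowed = count_fail_flip_flops(2.0, dsel=4, n=8,
                               buffer_delay_ns=0.25, clock_period_ns=4.0,
                               extra_delay_ns=0.5)
print(nominal, slowed)
```

In this model, each additional buffer delay of extra path delay turns one more flip-flop into a fail flip-flop, which is consistent with the roughly proportional relation described above.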



Figure 7. On-Chip Delay Measurement Circuit Design.



Figure 8. Delay Adjuster Circuit Design.

# **V. EXPERIMENTAL RESULTS**

We conducted experiments with a modified b19 with two clock domains, shown in Fig. 6. Circuit b19 is the largest ITC'99 benchmark circuit (129,130 gates / 6,130 FFs). The circuit was duplicated, and each copy was fed by its own clock, so that the two copies form two different clock domains in this design.

We synthesized the above design using Design Compiler<sup>®</sup>, and then placed and routed it using IC Compiler<sup>®</sup> with the SAED\_EDK90nm library. 3,060 transition fault test vectors with 81.4% test coverage were generated using TetraMAX<sup>®</sup>. We extracted a testable longest path using STA and ATPG tools (PrimeTime<sup>®</sup> and TetraMAX<sup>®</sup>) as the sensitized longest path. We regard the clock domain including the testable longest path as the victim domain.

Fig. 9 shows the transition (switching activity) ratio in the critical region described in Section III and the delay increase due to IR-drop. We randomly picked 50 test vectors. We conducted IR-drop analysis with PrimeRail<sup>®</sup> and obtained the delay of the path by using PrimeTime<sup>®</sup>. The transition ratio is roughly proportional to the increased delay on the target path. As described in Section III, a low transition ratio causes only a small delay increase. With a given threshold, we can identify definitely capture-safe vectors.



Figure 9. Transition Ratio and Delay for b19.

Fig. 10 shows the delay increase caused by differences in the clock edge arrival times between the aggressor domain and the victim domain for three different test vectors. The test vectors have transition ratios of 15.9%, 11.7%, and 6.8% in the critical region.

It can be seen that the transition ratio is more dominant than the difference in clock edge arrival times between the two domains. However, if the delay threshold separating capture-safe from capture-risky is 11.55 ns, the test vector with the 15.9% transition ratio can be either capture-safe or capture-risky, depending on the clock edge arrival relation. Therefore, taking CLEAR (*CLock Edge Arrival Relation*) into consideration is significantly important.

Next, we evaluate the delay increase caused by the differences in clock edge arrival times on a fabricated chip. The design has 4.1M gates, 123K flip-flops, and 16 clock domains. The chip is fabricated in a 40 nm process, and the frequency is 250 MHz. The proposed delay capture circuit described in Section IV was embedded in the chip.

In the evaluation with the fabricated chip, we controlled the clock edge arrival time of the aggressor domains. Fig. 11 shows the number of fail flip-flops in the delay capture circuit when we changed the clock edge arrival time of flip-flops in the aggressor clock domains from -4 ns to 4 ns. We obtained results similar to those shown in Fig. 10. The experimental results clearly demonstrate the importance of clock edge arrival relations in capture-safety checking.
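As a quick check on the time scale involved, the clock period implied by the stated 250 MHz frequency is computed below. Relating it to the sweep range assumes that the at-speed capture clock runs at this frequency, which is not stated explicitly here.

$$
T = \frac{1}{f} = \frac{1}{250 \times 10^{6}\ \text{Hz}} = 4\ \text{ns}
$$

Under that assumption, the sweep from -4 ns to 4 ns covers one clock period on either side of zero offset.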

In general, the place and route phase already includes approximate power analysis based on a power budget, which is estimated from the probabilistic transition ratio in the circuit. Therefore, a low transition ratio rarely leads to power-related problems, and test vectors causing a low transition ratio can be reliably regarded as capture-safe. Combining the CLEAR capture-safety checking and the previous capture-safety checking methods for the victim domain [8-16] may achieve more scalable and detailed results, although there must be a trade-off between computational cost and accuracy. We leave the improvement obtained by combining the CLEAR and previous capture-safety checking methods as future work.





Figure 11. Results of On-Chip Evaluation.

# **VI. CONCLUSIONS**

This paper presented the CLEAR (*CLock-Edge-Arrival-Relation-based*) capture-safety checking method, which takes clock edge arrival relations of multiple clock domains into consideration. None of the previous capture-safety checking methods considers clock edge arrival relations among different clock domains, even though they are important for accurate capture-safety checking. We also presented a delay capture circuit to evaluate the accuracy of the proposed capture-safety checking method. The experimental results demonstrated that clock edge arrival relations have a relatively high impact on the excessive delay. Therefore, the CLEAR capture-safety checking method improves state-of-the-art capture-safety checking, resulting in accurate test vector sign-off for practical scan testing. Future work includes (1) a more detailed analysis considering power supply noise, ground bounce, and inductive phenomena in order to prove that the CLEAR method is useful; and (2) formulation of equations that express the increased delay.

# ACKNOWLEDGMENT

This work was partly supported by JSPS KAKENHI Grant-in-Aid for Scientific Research (B) 22300017.

#### REFERENCES

- [1] P. Girard, N. Nicolici, X. Wen (Editors), *Power-Aware Testing and Test Strategies for Low Power Devices*. Springer, New York, Oct. 2009.
- [2] L. Whetsel, "Adapting Scan Architectures for Low Power Operation," Proc. Int'l. Test Conf., pp. 863-872, 2000.
- [3] J. Wang, et al., "Power Supply Noise in Delay Testing," Proc. Int'l Test Conf., Paper 17.3, 2006.
- [4] S. Ravi, "Power-Aware Test: Challenges and Solutions," Proc. Int'l Test Conf., Lecture 2.2, 2007.
- [5] P. Girard, et al., Low-Power Testing (Chapter 7) in Advanced SOC Test Architectures - Towards Nanometer Designs, Morgan Kaufmann, 2007.
- [6] K. Noda, et al., "Power and Noise Aware Test Using Preliminary Estimation," Proc. VLSI-Int'l Symp. on VLSI Design, Automation and Test, pp. 323-326, 2009.
- [7] V. R. Devanathan, et al., "Glitch-Aware Pattern Generation and Optimization Framework for Power-Safe Scan Test," *Proc. VLSI Test Symp.*, pp. 167-172, 2007.
- [8] V. R. Devanathan, et al., "A Stochastic Pattern Generation and Optimization Framework for Variation-Tolerant, Power-Safe Scan Test," *Proc. Intl. Test Conf.*, Paper 13.1, 2007.
- [9] X. Wen, et al., "On Low-Capture-Power Test Generation for Scan Testing," Proc. VLSI Test Symp., pp. 265-270, 2005.
- [10] S. Remersaro, et al., "Low Shift and Capture Power Scan Tests," Proc. VLSI Design, pp. 793-798, 2007.
- [11] K. Miyase, et al., "A Novel Post-ATPG IR-Drop Reduction Scheme for At-Speed Scan Testing in Broadcast-Scan-Based Test Compression Environment," *Proc. Int'l Conf. on CAD*, pp. 97-104, Nov. 2009.
- [12] L.-T. Wang, C.-W. Wu, and X. Wen, (Editors), VLSI Test Principles and Architectures: Design for Testability, San Francisco: Elsevier, 2006.
- [13] K. Miyase, et al., "Transition-Time-Relation Based Capture-Safety Checking for At-Speed Scan Test Generation," *Proc. Design Automation and Test in Europe Conf.*, pp. 895-898, 2011.
- [14] X. Wang, et al., "A Novel Architecture for On-Chip Path Delay Measurement," Proc. Int'l Test Conf., Paper 12.1, 2009.
- [15] X. Wen, K. Miyase, et al., "Critical-Path-Aware X-Filling for Effective IR-Drop Reduction in At-Speed Scan Testing," *Proc. Design Automation Conference*, pp. 527-532, 2007.
- [16] N. Ahmed, et al., "Transition Delay Fault Test Pattern Generation Considering Supply Voltage Noise in a SOC Design," *Proc. Design Automation Conf.*, pp. 533-538, 2007.
- [17] T. Yoshida and M. Watari, "A New Approach for Low Power Scan Testing," Proc. Intl. Test Conf., pp. 480-487, 2003.
- [18] S. Bahl, et al., "State of the Art Low Capture Power Methodology," Proc. Int'l Test Conf., Paper 12.3, 2011.
- [19] W.-W. Hsieh, et al., "A Physical-Location-Aware X-filling Method for IR-Drop Reduction in At-Speed Scan Test," *Proc. Design Automation and Test in Europe Conf.*, pp. 1234-1237, 2009.