# LCTI-SS: Low-Clock-Tree-Impact Scan Segmentation for Avoiding Shift Timing Failures in Scan Testing

Yuta Yamato, Xiaoqing Wen, Michael A. Kochte, Kohei Miyase, Seiji Kajihara, and Laung-Terng Wang

*Abstract:* Moving further into the deep-submicron era, the problem of test-induced yield loss due to high power consumption has increasingly worsened. One of the major causes of this problem is shift timing failure, which arises from excessive switching activity in the proximities of clock paths that tends to introduce severe clock skew due to IR-drop-induced delay increase on a portion of the clock tree. This paper proposes a novel layout-aware scan segmentation design scheme called LCTI-SS (Low-Clock-Tree-Impact Scan Segmentation) for avoiding shift timing failures. The proposed scheme searches for an optimal combination of scan segments for simultaneous clocking so as to reduce the switching activity in the proximities of clock trees while maintaining the average power reduction effect of the conventional scan segmentation. Experimental results on benchmark circuits have demonstrated the advantage of the LCTI-SS scheme.

*Key-words:* scan testing, shift power reduction, scan segmentation, switching activity, clock tree, clock skew.

# I. INTRODUCTION

Continuous shrinking of process feature sizes has ushered in an era of high-speed and low-supply-voltage VLSI designs. At the same time, *deep-submicron* (DSM) process technology has posed serious design and test challenges. One major issue is test-induced yield loss due to excessive power consumption in scan testing. In recent years, *at-speed scan testing* has become crucial for DSM VLSI circuits in guaranteeing sufficient circuit quality levels. This is because timing-related defects have become dominant in such circuits [1].

In practice, a *launch-on-capture* (LOC) clocking scheme has been widely used in scan testing because its scan enable (*SE*) signal has lower physical design complexity than other clocking schemes. A basic LOC clocking scheme is shown in Fig. 1. In *shift* mode (*SE* = 1), scan chains are operated as shift registers with multiple clock pulses ($S_1$ to $S_L$) for loading a new test vector and unloading the test response to the previous test vector. Then, in *capture* mode (*SE* = 0), a first capture pulse $C_1$ is applied for launching transitions and subsequently, a second capture pulse $C_2$ is applied at system clock cycle *T* for capturing the response to the launched transitions. After that, *SE* is set to 1 again for unloading the test response and loading the next test vector.

This article is an extension of the paper "A Novel Scan Segmentation Design Method for Avoiding Shift Timing Failures in Scan Testing" presented at the International Test Conference, held at Anaheim, California, USA, in September 2011.



Fig. 1. Test power safety issues.

# A. Test Power Safety in At-Speed Scan Testing

At-speed scan testing is indispensable for DSM VLSI circuits. However, despite its importance, scan testing is facing a serious challenge of test-induced yield loss due to excessive power consumption [2].

*Test power*, i.e., the power dissipation caused by switching activity in scan testing, is known to be much higher than *functional power* because of the need to test in the shortest time possible. High test power may cause various problems, threatening *test power safety*. Fig. 1 illustrates the test power safety issues in at-speed scan testing.

There are two types of issues in scan testing: thermal overheating and timing failures. The thermal issue is closely related to average power since it is the accumulative impact of excessive *shift switching activity* (SSA), as most of the test application time is spent in shift mode. This may result in overheating of the die or chip package, leading to yield loss or performance degradation. On the other hand, timing failures can occur in both shift mode and capture mode since they are caused by the instantaneous impact of excessive switching activity at individual clock cycles. Excessive switching activity causes a large switching current to flow through the power and ground network, resulting in IR-drop that reduces the switching speed of each affected gate. As a result, timing failures may occur due to IR-drop-induced delay increase, and thus yield loss occurs [2].

While IR-drop-induced delay increase is the main cause, the mechanism of timing failures differs between scan test modes. In shift mode, IR-drop-induced delay increase along clock paths may lead to severe clock skew, and shift timing failures may occur due to hold time violations. In contrast, in capture mode, excessive *launch switching activity* (LSA), as caused by $C_1$ in Fig. 1, results in IR-drop-induced delay increase along sensitized paths. Capture timing failures may thus occur in the capture cycle due to setup time violations. Therefore, test power safety (the combination of both shift safety and launch safety) must be guaranteed for at-speed scan testing to avoid chip and package damage, reliability and performance degradation, and undue yield loss.

The thermal issue has been addressed in the past years and various techniques have been proposed to reduce average SSA. Typical techniques include scan clock gating [3], scan chain disabling [4], toggle suppression [5], scan cell ordering [6], and scan segmentation [7]. For timing failures, effective techniques exist for reducing LSA [9, 10], which are helpful in achieving launch safety. However, these techniques do not target shift timing failures and thus cannot guarantee shift safety.

This paper addresses this shift safety problem caused by excessive SSA around clock paths with a novel layout-aware scheme based on scan segmentation, called LCTI-SS (Low-Clock-Tree-Impact Scan Segmentation). The basic idea is to optimize the combination of scan segments for simultaneous clocking since SSA depends on which segments are simultaneously clocked. LCTI-SS deals with the real cause of excessive-SSA-induced yield loss by reducing SSA in the proximities of active clock paths while preserving the benefits of conventional scan segmentation in reducing average whole-circuit shift power without performance degradation. A segment regrouping algorithm is proposed to directly reduce SSA in impact areas by optimally grouping scan segments for simultaneous clocking. LCTI-SS improves the shift safety since the reduction of instantaneous SSA is directly focused on impact areas to significantly reduce IR-drop-induced shift timing failures.

# II. BACKGROUND

# A. Conventional Scan Segmentation

The basic concept of scan segmentation [7] is to split a scan chain into multiple segments, and shift just one segment of the scan chain at a time while keeping all other segments deactivated. Fig. 2 shows an example of a scan segmentation design for a circuit with 3 scan chains. The original scan chains with length L (Fig. 2a) are split into 3 shorter segments with length L/3, resulting in a total of 9 segments $S_{11}$ to $S_{33}$ (Fig. 2b). Three gated clocks $GCLK_1$, $GCLK_2$, and $GCLK_3$ are connected to all scan FFs in 3 segment groups $G_1 = \{S_{11}, S_{21}, S_{31}\}$, $G_2 = \{S_{12}, S_{22}, S_{32}\}$, and $G_3 = \{S_{13}, S_{23}, S_{33}\}$, respectively. The shift operation is conducted for $G_1$, $G_2$, and $G_3$, one at a time. As shown in Fig. 2c, gated clocks $GCLK_1$, $GCLK_2$, and $GCLK_3$ are exclusively applied during a shift operation. The test response to a test vector is captured by applying all gated clock signals after a test vector has been shifted into all segments. Since the number of simultaneously-switching FFs becomes smaller, global average SSA is effectively reduced. Note that no modification is required on functional paths, thus avoiding any performance degradation. In addition, test application time remains the same as that of the standard scan architecture. It has been reported in [7] that the average shift power reduction ratio is approximately 50% for a 2-segment configuration and 66% for a 3-segment configuration.
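The reported ratios agree with a simple first-order estimate. As a back-of-the-envelope model (an assumption made here for illustration, not the analysis of [7]), suppose that the scan FFs are evenly distributed over $k$ segment groups and that only one group is clocked in each shift cycle. Then roughly $1/k$ of the scan FFs can toggle per cycle, so the average shift switching activity drops by about

$$1 - \frac{1}{k}, \qquad \text{i.e., } 1 - \tfrac{1}{2} = 50\% \text{ for } k = 2 \text{ and } 1 - \tfrac{1}{3} \approx 66.7\% \text{ for } k = 3.$$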




(a) Basic scan architecture with 3 scan chains









Fig. 2. Conventional scan segmentation.

# B. Shift Timing Failures

Conventional scan segmentation can effectively and predictably address the *accumulative* impact of excessive SSA, thus solving the overheat problem caused by high average SSA. However, it is unable to mitigate the *instantaneous* impact of excessive SSA. As a result, IR-drop-induced delay increase may still occur along clock paths from a clock pin to scan FFs, which may cause clock skew and shift timing failures and severely reduce shift safety and test yield.

Circuit-level experiments show that even a local IR-drop affecting only a single clock buffer may already increase the propagation delay of the driven clock paths by an amount on the order of the designed maximum clock skew.

In consequence, excessive SSA around clock paths threatens shift safety by causing shift timing failures at scan FFs, resulting in undue yield loss. With regard to scan segmentation, shift safety is not guaranteed by reducing only global average SSA. There is a strong need for effectively reducing local SSA around clock paths as well.
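The link between clock skew and hold time violations can be made explicit with the standard hold-time condition for two adjacent scan FFs. The symbols below are introduced here for illustration only: $t_{cq}$ is the clock-to-output delay of the launching FF, $t_{path}$ is the delay of the scan path between the two FFs, $t_{hold}$ is the hold time of the capturing FF, and $t_{skew}$ is the clock arrival time at the capturing FF minus that at the launching FF. Shifting is safe only if

$$t_{cq} + t_{path} \geq t_{hold} + t_{skew}.$$

Since scan paths usually connect FFs directly, $t_{path}$ is small and the margin is thin. An IR-drop-induced delay increase on the clock path of the capturing FF enlarges $t_{skew}$ and can therefore violate this condition.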

# III. THE LCTI-SS SCHEME

This section describes *Low-Clock-Tree-Impact Scan Segmentation* (LCTI-SS), which reduces the instantaneous *shift switching activity* (SSA) in the proximities of clock trees in order to lower the risk of timing failures in scan chains. Together with the intrinsic benefit of scan segmentation in reducing global average SSA to mitigate the overheat problem, the proposed LCTI-SS significantly improves the overall shift safety.

# A. Basics

In conventional scan segmentation, once a scan segment configuration is fixed, the same groups of segments are always shifted simultaneously. However, there may be a better combination of segments for simultaneous clocking with low SSA in the proximities of clock paths. The LCTI-SS scheme tries to find such a combination. Fig. 3a shows the general flow of the proposed LCTI-SS scheme. It consists of two major steps: *impact area identification* (①) and *segment regrouping* (②), as described below:

Given a circuit netlist N with standard full-scan design, conventional scan segmentation (as illustrated in Fig. 2b) is first designed. The result is a new netlist N', for which place-and-route is conducted to produce a layout design L and a clock tree design C. Based on these two types of information, *impact area identification* (①) is conducted to identify nodes (gates and FFs) whose transitions have significant impact on IR-drop-induced delay increase on clock paths. After that, *segment regrouping* (②) is conducted to minimize the number of nodes in impact areas which may affect active clock paths.

To illustrate the LCTI-SS scheme, let us revisit the case shown in Fig. 2b. Here, the initial segment groups provided by conventional scan segmentation are $G_1 = \{S_{11}, S_{21}, S_{31}\}$, $G_2 = \{S_{12}, S_{22}, S_{32}\}$, and $G_3 = \{S_{13}, S_{23}, S_{33}\}$. By applying the LCTI-SS scheme, scan segments are regrouped, for example, into $G_1' = \{S_{13}, S_{22}, S_{31}\}$, $G_2' = \{S_{11}, S_{21}, S_{33}\}$, and $G_3' = \{S_{12}, S_{23}, S_{32}\}$, as shown in Fig. 3a.

# B. Reconfigurable Scan Segmentation Architecture

Since clock trees are timing-critical and have to be perfectly balanced, it is undesirable to regroup scan segments by modifying the clock trees after physical design and timing closure. Doing so may lead to clock tree re-synthesis, which in turn may change other parts of the layout. As a result, the impact areas may also change. To realize scan segment regrouping without changing the layout, a programmable clock control is preferable. Fig. 3c shows an architecture of a reconfigurable scan segmentation scheme with programmable clock control. It consists of clock control logic for scan chains, address registers, and shadow registers. The inputs of the clock control logic are CLK, SE, and the address representing which segment to activate. Each output is fed into the clock tree of the corresponding segment.

At the beginning of scan testing, address data for the first segment group is loaded into the shadow registers. When all the control data is loaded, the load clock of the address registers is applied and SE is set to 1 for scan shift. This selects the AND-gated clock paths according to the mask in the address registers. Because of the one-hot decoder, only the segments of the first group are activated. Address data for the next segment group is loaded into the shadow registers while shifting. After the first segment group has been shifted, the load clock is applied again to switch the active segment group. This is repeated until shifting of the last segment group is completed. Note that, while shifting the last segment group, address data for the first segment group is loaded into the shadow registers. SE is then set to 0 for launching transitions and capturing the response. In capture mode, the paths from the original clock are selected to activate all segments at the same time. After that, SE is set to 1 again for shifting out the response and shifting in the next test vector.

This way, any grouping of scan segments can be chosen for simultaneous clocking for each scan chain without layout modification. This flexibility requires a slight increase in test data volume.
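The clocking behavior described above can be illustrated with a small behavioral model of the control logic of one scan chain. This is only a sketch under stated assumptions: the class name, method names, and 0-based segment addresses are hypothetical, and timing as well as the serial loading of the shadow registers are abstracted away.

```python
class ChainClockControl:
    """Behavioral model of the per-chain clock control (hypothetical names)."""

    def __init__(self, num_segments):
        self.m = num_segments
        self.shadow = 0    # shadow register: address of the next segment
        self.address = 0   # address register: address of the active segment

    def load_shadow(self, next_address):
        """Load the next address without disturbing the active segment."""
        if not 0 <= next_address < self.m:
            raise ValueError("address out of range")
        self.shadow = next_address

    def pulse_load_clock(self):
        """Copy the shadow register into the address register."""
        self.address = self.shadow

    def gated_clocks(self, clk, se):
        """Return the m segment clocks for the given CLK and SE values."""
        if se == 0:
            # Capture mode: the original clock reaches every segment.
            return [clk] * self.m
        # Shift mode: one-hot decode the address and AND it with CLK.
        return [clk & int(i == self.address) for i in range(self.m)]


ctrl = ChainClockControl(num_segments=3)
ctrl.load_shadow(2)                      # prepare the next segment while shifting
during_shift = ctrl.gated_clocks(1, 1)   # segment 0 is still the active one
ctrl.pulse_load_clock()                  # switch the active segment
after_switch = ctrl.gated_clocks(1, 1)
in_capture = ctrl.gated_clocks(1, 0)     # all segments are clocked
```

Keeping one such address per scan chain is what allows an arbitrary segment of each chain to join the active segment group.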

For each scan chain with *m* segments, the additional control circuit needs *m* 2-to-1 MUXs, *m* AND gates, one $\lceil \log_2 m \rceil$-to-*m* decoder, and $2\lceil \log_2 m \rceil$ registers. Thus, *n* of these control circuits are needed for *n* scan chains in a circuit. Let us assume that the number of segments is at most 4, since the average power reduction effect diminishes beyond 4 segments, as shown in [8]. Then, the overhead per scan chain is approximately 50 gates in 2-input NAND gate equivalents when using the SAED90nm EDK Digital Standard Cell Library. This is sufficiently small even for a circuit with a large number of scan chains.
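The component counts above can be tallied with a few lines of code. The function below is a hypothetical helper written for illustration. It only counts components and does not convert them into NAND-gate equivalents, since that conversion depends on cell library data not given here.

```python
def control_overhead(m, n):
    """Count control components for n scan chains with m segments each."""
    addr_bits = (m - 1).bit_length()  # equals ceil(log2(m)) for m >= 1
    per_chain = {
        "mux2": m,                    # m 2-to-1 MUXs
        "and2": m,                    # m AND gates
        "decoder": 1,                 # one address decoder
        "registers": 2 * addr_bits,   # address registers plus shadow registers
    }
    return {name: count * n for name, count in per_chain.items()}


# Example: 10 scan chains with 4 segments each.
overhead = control_overhead(m=4, n=10)
```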

# C. Impact Area Identification

To identify the nodes whose transitions have significant impact on clock skew, the LCTI-SS scheme uses circuit layout information, e.g., a *design exchange format* (DEF) file. The *clock aggressors*, defined as the nodes (gates and FFs) placed near a clock buffer and sharing power rails with the clock buffer, are extracted from the layout using clock tree information. Then, the elements of the *impact areas*, i.e., the sets of nodes that can potentially cause transitions in the proximity of active clock paths, are computed based on the following definitions. This information is necessary in the subsequent segment regrouping step.

**Definition 1:** Let $CA(B)$ be the set of clock aggressors of a clock buffer *B*, let *P* be a path consisting of all clock buffers $\{B_1, B_2, ..., B_m\}$ from a gated clock pin to the clock input of a scan FF, and let *S* be the set of all clock paths $\{P_1, P_2, ..., P_n\}$ to all FFs in a scan segment. The set of clock aggressors of a path *P*, called the **path aggressor set** and denoted by $PA(P)$, and the set of clock aggressors of a segment *S*, called the **segment aggressor set** and denoted by $SA(S)$, are defined as follows.

$$PA(P) = \bigcup_{i=1}^{m} CA(B_i)$$

$$SA(S) = \bigcup_{i=1}^{n} PA(P_i)$$

An example is shown in Fig. 4a, where two scan FFs,  $FF_1$  and  $FF_2$ , are assumed to form the scan segment  $S_{11}$ . Here,  $PA(P_1) = CA(B_1) \cup CA(B_2) \cup CA(B_3)$ ,  $PA(P_2) = CA(B_1) \cup CA(B_2) \cup CA(B_4)$ . As a result,  $SA(S_{11}) = PA(P_1) \cup PA(P_2)$ .
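Definition 1 maps directly onto set unions. The following minimal sketch mirrors the structure of the example (two clock paths sharing the buffers $B_1$ and $B_2$). The contents of the `ca` dictionary are made-up placeholders, since the actual clock aggressors come from the layout, and the function names are hypothetical.

```python
def path_aggressors(path, ca):
    """PA(P): union of CA(B) over all clock buffers B on the path."""
    result = set()
    for buf in path:
        result |= ca.get(buf, set())
    return result


def segment_aggressors(paths, ca):
    """SA(S): union of PA(P) over all clock paths P of the segment."""
    result = set()
    for path in paths:
        result |= path_aggressors(path, ca)
    return result


# Placeholder clock aggressor sets per clock buffer (illustrative only).
ca = {"B1": {"N1"}, "B2": {"N2", "N3"}, "B3": {"N4"}, "B4": {"N5"}}
p1 = ["B1", "B2", "B3"]   # clock path to FF1
p2 = ["B1", "B2", "B4"]   # clock path to FF2

pa_p1 = path_aggressors(p1, ca)
sa_s11 = segment_aggressors([p1, p2], ca)
```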

Path aggressor sets and segment aggressor sets can be statically identified based on the physical locations of circuit nodes; thus, each segment is assigned a fixed set of clock aggressors. However, not all clock aggressors in a segment aggressor set necessarily affect the propagation delay of clock paths. This is because only a part of the segments is simultaneously activated in scan segmentation. Even if a clock aggressor belongs to the segment aggressor set of an active segment, transitions can occur at it only when it is structurally reachable from an active segment. Otherwise, the clock aggressor has no impact on IR-drop-induced delay increase on the active clock paths and need not be considered any further. A clock aggressor impacting active clock buffers, called an *impact aggressor*, satisfies the following two conditions:

*Condition* **A**: The node belongs to at least one segment aggressor set of active segments.

*Condition* **B**: The node is structurally reachable from at least one scan FF in active segments.

**Definition 2**: Let $RA(S)$ be the set of clock aggressors structurally reachable from at least one FF in a segment *S*, and let *G* be a segment group composed of segments $S_1$, $S_2$, ..., and $S_n$ to be clocked simultaneously. The *impact area* of *G*, denoted by $IA(G)$, is defined as

$$IA(G) = \bigcup_{i=1}^{n} (SA(S_i)) \cap \bigcup_{i=1}^{n} (RA(S_i))$$

Thus, the impact area of *G* contains only impact aggressors that may affect active clock paths, i.e., clock aggressors satisfying both Condition A and Condition B. An example is shown in Fig. 4b. Here, two scan segments $S_{11}$ and $S_{21}$ are assumed to belong to $G_1$, with $SA(S_{11}) = \{N_1, N_2, N_3, N_5, N_7, N_6\}$, $SA(S_{21}) = \{N_4, N_5, N_6, N_7, N_8, N_9\}$, $RA(S_{11}) = \{N_1, N_2, N_3, N_5, N_7\}$, and $RA(S_{21}) = \{N_3, N_5, N_6, N_8\}$. In this case, $IA(G_1) = (SA(S_{11}) \cup SA(S_{21})) \cap (RA(S_{11}) \cup RA(S_{21})) = \{N_1, N_2, N_3, N_5, N_6, N_8\}$.

From the above definitions, the impact area of a segment group with an arbitrary combination of scan segments can be derived. This information is used to estimate the risk of shift timing failures.
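Definition 2 can likewise be sketched as two unions followed by an intersection. The segment names (`Sa`, `Sb`) and node names (`x1` to `x9`) below are hypothetical and deliberately differ from those in Fig. 4b. They are chosen to show that an aggressor of one active segment counts as soon as it is reachable from any active segment.

```python
def impact_area(group, sa, ra):
    """IA(G): (union of SA(S_i)) intersected with (union of RA(S_i))."""
    all_sa = set().union(*(sa[s] for s in group))  # Condition A
    all_ra = set().union(*(ra[s] for s in group))  # Condition B
    return all_sa & all_ra


# Hypothetical segment aggressor sets and reachable aggressor sets.
sa = {"Sa": {"x1", "x2", "x3"}, "Sb": {"x3", "x4"}}
ra = {"Sa": {"x1", "x4", "x9"}, "Sb": {"x2"}}

ia_both = impact_area(["Sa", "Sb"], sa, ra)
ia_single = impact_area(["Sa"], sa, ra)
```

Here `x4` is an aggressor of `Sb` but is reachable only from `Sa`, so it enters the impact area only when both segments are clocked together. `x3` is never reachable and `x9` is not an aggressor of any active segment, so both are excluded.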













Fig. 4. Impact area identification.

# D. Segment Regrouping

Generally, the number of impact aggressors depends on the combination of segments to be simultaneously clocked. The smaller the number of impact aggressors, the lower the probability of simultaneous transitions at impact aggressors. This indicates that it is possible to regroup segments optimally so that each segment group has a smaller number of impact aggressors. This section presents an effective algorithm for segment regrouping, which is another critical step in the LCTI-SS scheme.

The proposed algorithm for segment regrouping uses the *weighted switching activity* (WSA) metric for SSA estimation since this metric has good correlation with power dissipation and IR-drop at low computational effort [12].

**Definition 3:** The weighted impact of an impact area *IA*, denoted by *WI(IA)*, is defined as

$$WI(IA) = \sum_{i=1}^{n} w_i$$

where *n* is the number of impact aggressors in *IA*, and  $w_i$  is the weight of node *i* (*i* = 1, 2, ..., *n*), which can be approximated by the number of its fanout branches.
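As a small illustration of Definition 3, the weighted impact is a plain sum of node weights over the impact area. The node names and fanout counts below are hypothetical values chosen for the example.

```python
def weighted_impact(impact_aggressors, fanout):
    """WI(IA): sum of the weights of all impact aggressors in IA.

    Each weight is approximated by the node's number of fanout branches.
    """
    return sum(fanout[node] for node in impact_aggressors)


# Hypothetical fanout counts for three impact aggressors.
fanout = {"x1": 2, "x2": 1, "x4": 3}
wi = weighted_impact({"x1", "x2", "x4"}, fanout)
```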

To find an optimal combination of segments for simultaneous clocking with low *WI*, we formalized the problem of segment regrouping as follows:

Segment Regrouping Problem: Given a scan segmentation design with *m* scan chains and *n* segments for each scan chain, find *n* segment groups  $G_1, G_2, ..., G_n$  such that the weighted impact of the impact area for each segment group  $G_i$  (i = 1, 2, ..., n), namely  $WI(IA(G_i))$ , is minimized.

Theoretically, the total number of segment group combinations can be expressed by the following theorem:

**Theorem 1**: For a scan segmentation design with *m* scan chains and *n* segments for each scan chain, the total number of segment group combinations is  $(n!)^m$ .

**Proof**: For the first segment group, any of the *n* segments can be selected from each of the *m* scan chains, which results in $n^m$ possible combinations. Repeating this selection for the remaining groups results in $(n-1)^m$ possible combinations for the second segment group, $(n-2)^m$ possible combinations for the third segment group, ..., and one combination for the *n*-th segment group. Therefore, the total number of segment group combinations is as follows:

$$\prod_{k=0}^{n-1} (n-k)^m = (n!)^m$$
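The growth of this count can be checked numerically. The sketch below evaluates the closed form and, for small cases, confirms it by brute-force enumeration. It treats the groups as labelled, as the proof does, so each scan chain independently assigns its *n* segments to the *n* groups. The function names are hypothetical.

```python
from itertools import permutations, product
from math import factorial


def count_groupings(n, m):
    """Closed form from Theorem 1: (n!)^m."""
    return factorial(n) ** m


def enumerate_groupings(n, m):
    """Brute-force count: every chain picks one permutation of its segments."""
    per_chain = list(permutations(range(n)))
    return sum(1 for _ in product(per_chain, repeat=m))


small_case = count_groupings(3, 3)     # the 3-chain, 3-segment case of Fig. 2
larger_case = count_groupings(4, 10)   # 10 chains with 4 segments each
```

Even the modest second configuration already yields more than $10^{13}$ combinations, which illustrates why exhaustive search does not scale.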

Theorem 1 indicates that it is impractical to check all possible segment group combinations to find the best one for large industrial circuits with a large number of scan chains. Therefore, we propose a heuristic two-phase algorithm to efficiently find a good segment group combination with low SSA at clock aggressors.

The proposed segment regrouping algorithm is shown in Fig. 5. In Phase 1, a segment group $G_{tmp}$ with the maximum weighted impact is identified. The segments in $G_{tmp}$ are placed into separate groups $G_1, G_2, ..., G_n$ so that the segments of this worst-case segment group are never clocked simultaneously. Then, in Phase 2, for each group $G_i$ in turn, a segment $S_{min}$ is selected such that the union $G_i \cup \{S_{min}\}$ has the minimum weighted impact, and $S_{min}$ is added to $G_i$. This process is repeated until all segments are selected. This algorithm tries to reduce SSA at clock aggressors by minimizing the weighted impact for each segment group. In this way, the number of impact aggressors of each segment group can be reduced.

As shown in Fig. 5, in Phase 1 and Phase 2 of the algorithm, segments are selected one at a time and added to a particular segment group. In Phase 1, the segment which maximizes the weighted impact of impact aggressors *IA* for group  $G_{tmp}$  is selected. In Phase 2, the segment which results in the minimum weighted impact of *IA* of a particular group *G* is selected for addition to *G*.

To find and select the segment with minimum or maximum WI, we compute the resulting WI for the considered group and all yet-unselected segments. Each segment is selected exactly once and before the selection, WI is computed with respect to each yet-unselected segment. Thus, the number of WI computations is

$$\sum_{i=1}^{NS} i = \frac{NS(NS+1)}{2}$$


where NS is the total number of segments. To compute WI, we use optimized set operations (union, intersection) on the pre-computed sets of segment aggressors SA and reachable aggressors RA to reduce runtime.

```
Algorithm: Segment_Regrouping {
  INPUT: netlist, clock aggressors, initial segment groups
  OUTPUT: updated segment groups
  n = the number of groups;
  for (i = 1 to n) {
    G_i = \emptyset;
  }
  // Phase 1: place the worst-case segments into separate groups
  G_{tmp} = \emptyset;
  for (i = 1 to n) {
    foreach ( unselected segment S ) {
      compute WI(IA(G_{tmp} \cup \{S\}));
    }
    S_{max} = the segment with the maximum WI(IA(G_{tmp} \cup \{S\}));
    // Select S_{max}
    G_i = G_i \cup \{S_{max}\};
    G_{tmp} = G_{tmp} \cup \{S_{max}\};
  }
  // Phase 2: fill each group with the lowest-impact segment
  while ( not all segments are selected yet ) {
    for (i = 1 to n) {
      foreach ( unselected segment S ) {
        if ( S shares the same scan chain
             with at least one segment in G_i ) {
          continue;
        } else {
          compute WI(IA(G_i \cup \{S\}));
        }
      }
      S_{min} = the segment with the minimum WI(IA(G_i \cup \{S\}));
      // Select S_{min}
      G_i = G_i \cup \{S_{min}\};
    }
  }
  return \{G_1, G_2, ..., G_n\};
}
```

Fig. 5. Segment regrouping algorithm.
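For readers who prefer executable code, the two phases of Fig. 5 can be sketched in Python as follows. This is an illustrative reading of the pseudocode and not the authors' C implementation. The data layout (dictionaries `sa`, `ra`, `weight`, and `chain_of`), the function names, and the first-found tie-breaking are assumptions, and the toy circuit with two chains, two segments per chain, and three aggressors is made up.

```python
def weighted_impact(group, sa, ra, weight):
    """WI(IA(G)) for a list of segment names."""
    if not group:
        return 0
    all_sa = set().union(*(sa[s] for s in group))
    all_ra = set().union(*(ra[s] for s in group))
    return sum(weight[v] for v in all_sa & all_ra)


def regroup(segments, chain_of, sa, ra, weight, n):
    """Two-phase greedy regrouping into n segment groups."""
    unselected = list(segments)
    groups = [[] for _ in range(n)]

    # Phase 1: grow a worst-case group G_tmp and give each of its
    # segments to a different group.
    g_tmp = []
    for group in groups:
        s_max = max(unselected,
                    key=lambda s: weighted_impact(g_tmp + [s], sa, ra, weight))
        group.append(s_max)
        g_tmp.append(s_max)
        unselected.remove(s_max)

    # Phase 2: each group in turn takes the unselected segment, from a
    # scan chain it does not cover yet, that minimizes its WI.
    while unselected:
        placed = False
        for group in groups:
            used = {chain_of[s] for s in group}
            candidates = [s for s in unselected if chain_of[s] not in used]
            if not candidates:
                continue
            s_min = min(candidates,
                        key=lambda s: weighted_impact(group + [s], sa, ra, weight))
            group.append(s_min)
            unselected.remove(s_min)
            placed = True
        if not placed:
            raise ValueError("remaining segments cannot be placed")
    return groups


# Toy data (hypothetical): chains A and B with two segments each.
segments = ["A1", "A2", "B1", "B2"]
chain_of = {"A1": "A", "A2": "A", "B1": "B", "B2": "B"}
sa = {"A1": {"x"}, "A2": {"y"}, "B1": {"z"}, "B2": set()}
ra = {"A1": {"z"}, "A2": set(), "B1": {"x"}, "B2": {"y"}}
weight = {"x": 3, "y": 1, "z": 2}

conventional_wi = weighted_impact(["A1", "B1"], sa, ra, weight)
groups = regroup(segments, chain_of, sa, ra, weight, n=2)
```

In this toy case, the conventional group of `A1` and `B1` has a weighted impact of 5 because each segment reaches the other's aggressor. Phase 1 separates the two segments, and Phase 2 completes the groups so that neither contains an impact aggressor.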

# IV. EXPERIMENTAL RESULTS

The proposed LCTI-SS scheme was implemented in C for evaluation. The six largest ITC'99 benchmark circuits (b17 to b22) and one industrial circuit (ck1) were used in the experiments. The layout was designed using the SAED90nm EDK Digital Standard Cell Library with a 1.2 V power supply voltage under typical operating conditions. Transition delay fault test sets were generated to evaluate SSA at impact areas. The profiles of the circuits and the corresponding test sets are shown in Table I.

TABLE I PROFILE OF CIRCUITS AND TEST SETS

| Circuit | #Gates (2NAND equiv.) | #FFs  | #Clock aggressors | #Test vectors | Fault cov. (%) |
|---------|-----------------------|-------|-------------------|---------------|----------------|
| b17     | 40k                   | 1317  | 5643              | 1175          | 85.1           |
| b18     | 106k                  | 3020  | 14691             | 1485          | 80.2           |
| b19     | 202k                  | 6042  | 32940             | 1888          | 78.3           |
| b20     | 37k                   | 430   | 2471              | 2559          | 93.9           |
| b21     | 35k                   | 430   | 2420              | 2462          | 93.8           |
| b22     | 56k                   | 613   | 3929              | 3075          | 93.7           |
| ck1     | 1.4M                  | 99815 | 101630            | 2267          | 98.8           |

For each circuit, various scan configurations with different numbers of scan chains and segments were prepared according to the size of the circuit. For b17, b20, b21, and b22, configurations with 3, 4, and 5 scan chains were used. For b18 and b19, configurations with 10, 30, and 50 scan chains were used. For ck1, configurations with 100, 200, and 300 scan chains were used. Conventional scan segmentation with 3, 4, and 5 segments was applied, and LCTI-SS was then performed for each configuration.

Since our objective is to reduce IR-drop on clock paths by reducing switching activity at impact areas, ideally, dynamic IR-drop analysis for all individual shift cycles should be performed for exact evaluation. However, this is computationally too expensive due to the large number of shift cycles in the entire test sequence. Therefore, we computed WSA at impact areas for every shift cycle and then picked the 100 cycles with the highest WSA at an impact area for dynamic IR-drop analysis using a commercial tool. For each cycle, the average IR-drop at clock buffers along clock paths to an active segment group was extracted. We compared the proposed LCTI-SS scheme with conventional scan segmentation in terms of the weighted impact WI, WSA at impact areas, and average IR-drop at active clock buffers.
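The cycle selection step can be sketched as follows. This is a minimal illustration under stated assumptions: per-cycle toggle sets are taken as given (e.g., from logic simulation), the node weights are hypothetical fanout counts, and the function names are not from the original implementation.

```python
import heapq


def wsa_at_impact_area(toggled, impact_area, weight):
    """Weighted switching activity of one cycle, restricted to the impact area."""
    return sum(weight[v] for v in toggled if v in impact_area)


def worst_cycles(wsa_per_cycle, k=100):
    """Indices of the k shift cycles with the highest WSA, highest first."""
    return heapq.nlargest(k, range(len(wsa_per_cycle)),
                          key=lambda c: wsa_per_cycle[c])


# Hypothetical data: four shift cycles and an impact area of two nodes.
weight = {"x1": 2, "x2": 1, "x3": 4}
impact_area = {"x1", "x3"}
toggles = [{"x1"}, {"x2", "x3"}, set(), {"x1", "x3"}]

wsa = [wsa_at_impact_area(t, impact_area, weight) for t in toggles]
selected = worst_cycles(wsa, k=2)
```

Only the selected cycles would then be passed on to the far more expensive dynamic IR-drop analysis.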

Table II summarizes the experimental results. The reduction ratios of the maximum and the average weighted impact ("WI"), the maximum and the average WSA at impact areas ("WSA at IA"), and the maximum and the average of the average IR-drop at active clock buffers ("Avg. IR-drop at clock buffers") among segment groups are shown in columns 4 to 9. The CPU runtime for segment regrouping ("CPU (s)") is shown in column 10.

As can be seen in the table, the weighted impact WI, our direct objective for reduction in segment regrouping, was reduced on average by 6.5% and 3.1% for maximum WI and average WI, respectively. This indicates that the probability of signal transitions on the nodes in impact areas is effectively reduced. The WSA at impact areas was also reduced, on average by 6.1% for maximum WSA and 1.1% for average WSA. The correlation coefficient between WI and WSA, which takes a value between -1 and +1 inclusive, was computed as a measure of the strength and direction of their linear relationship. The result was 0.6, which shows a good correlation. As to the average IR-drop at active clock buffers for the 100 cycles with the highest WSA at impact areas, while the correlation with WI was very low, both the maximum and the average of the average IR-drop among cycles were effectively reduced. The maximum reduction exceeded 36% in the case of b18 with 10 scan chains and 4 segment groups. In addition, effective IR-drop reductions at clock buffers can be seen for the largest circuit, ck1. Moreover, the runtime of the proposed segment regrouping algorithm was relatively short even for the large industrial circuit with a high number of scan chains and FFs. This indicates that the algorithm is applicable to large industrial designs with millions of gates.
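The text does not state which correlation coefficient was used. Assuming the common Pearson coefficient, the computation can be sketched as below. The input lists are synthetic values chosen only to exercise the function and are not data from the experiments.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Synthetic sanity checks: a perfectly increasing and a perfectly
# decreasing linear relationship.
r_up = pearson([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
r_down = pearson([1.0, 2.0, 3.0], [3.0, 2.0, 1.0])
```

A value near +1 or -1 indicates a strong linear relationship, and a value near 0 indicates a weak one.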

|         | #Chains | #Seg-  | Reduction (%) |            |            |      |                 |        |            |
|---------|---------|--------|---------------|------------|------------|------|-----------------|--------|------------|
| Circuit |         |        | WI            |            | WSA at IAS |      | Avg. IR-drop at |        | CPU<br>(s) |
|         |         | incino | Max.          | Avg.       | Max.       | Avg. | Max.            | Avg.   | (3)        |
| b17     | 3       | 3      | 7.8           | 0.7        | 0.0        | 0.7  | 0.0             | 0.0    | 0.0        |
|         |         | 4      | 14.4          | 0.6        | 1.0        | 0.5  | -2.7            | 10.5   | 0.0        |
|         |         | 5      | 23.9          | -0.5       | 18.1       | 0.0  | 15.7            | 8.4    | 0.0        |
|         | 5       | 3      | 3.6           | 0.1        | -0.6       | 0.0  | -1.3            | -1.2   | 0.0        |
|         |         | 4      | 6.4           | 2.2        | 4.6        | 0.3  | 16.5            | 3.9    | 0.0        |
|         |         | 5      | 11.3          | 3.6        | -1.1       | 0.0  | -10.1           | -2.5   | 0.0        |
|         | 10      | 3      | 4.8           | 4.4        | -4.1       | -0.6 | 26.8            | 4.3    | 0.0        |
|         |         | 4      | 6.2           | 3.8        | -0.2       | -1.6 | 33.5            | -1.7   | 0.1        |
|         |         | 3      | 3.2           | 4.0        | -1.1       | -0.8 | -5.4            | -0.0   | 0.1        |
|         | 10      | 4      | 0.1           | 0.2        | -2.4       | -0.7 | 36.6            | 2.4    | 0.2        |
|         |         | 5      | 5.6           | 2.1        | 6.9        | 0.5  | -1.3            | -3.9   | 0.2        |
|         |         | 3      | 7.3           | 2.1        | 12.6       | 0.5  | 3.2             | -0.4   | 0.9        |
| b18     | 30      | 4      | 4.2           | 2.3        | 10.5       | 0.0  | 7.1             | -6.5   | 1.4        |
|         |         | 5      | 12.2          | 3.7        | 14.0       | 0.6  | 2.3             | 8.0    | 1.8        |
|         |         | 3      | 1.7           | 1.6        | 0.6        | -0.1 | -0.1            | -0.7   | 2.9        |
|         | 50      | 4      | 0.4           | 1.6        | 3.0        | 0.2  | -0.2            | -2.7   | 3.8        |
|         |         | 5      | 5.4           | 3.1        | 1.0        | 1.2  | -0.4            | -1.1   | 5.4        |
|         |         | 3      | 2.4           | 1.4        | 0.7        | -1.2 | 2.2             | 1.0    | 0.2        |
|         | 10      | 4      | 5.3           | 0.9        | 1.8        | -0.5 | 11.1:           | 0.1    | 0.3        |
|         |         | 5      | 7.3:          | 1.2        | 2.9:       | -1.0 | 23.0:           | 0.3    | 0.4        |
| L10     | 20      | 3      | 7.5           | 0.7        | 8.8        | 0.2  | 2.2             | 2.1    | 1.5        |
| 019     | 50      | 5      | 13.1          | 2.0        | 7.4        | -0.0 | 12.0            | 4.4    | 2.5        |
|         |         | 3      | 3.8           | 3.1        | 4.6        | -0.1 | -2.5            | 5.6    | 4.5        |
|         | 50      | 4      | 6.4           | 4.0        | 0.2        | 0.3  | 3.9             | -1.6   | 6.8        |
|         |         | 5      | 2.3           | 4.1        | 1.9        | 0.7  | 8.9             | 4.2    | 9.9        |
|         |         | 3      | 1.7           | 2.5        | -2.7       | 6.5  | -7.6            | 0.7    | 0.0        |
|         | 3       | 4      | 8.3           | 6.2        | 14.2       | 6.1  | 4.3             | 9.1    | 0.0        |
|         |         | 5      | 5.9           | 7.0        | 2.6        | 11.4 | 0.3             | -2.3   | 0.0        |
|         | 5       | 3      | 4.5           | -1.4       | -4.6       | -1.0 | -3.0            | -5.2   | 0.0        |
| b20     |         | 4      | -2.6          | 1.6        | -4.6       | 0.0  | -4.2            | -6.7   | 0.0        |
|         |         | 5      | 6.4           | 2.4        | -12.7      | 2.1  | -9.9            | -0.4   | 0.0        |
|         | 10      | 3      | 3.1           | 0.7        | 4.7        | -1.7 | 2.9             | -0.1   | 0.0        |
|         | 10      | 5      | 14.2          | 5.5        | 14.1       | -2.2 | 1.0             | 15.0   | 0.0        |
|         |         | 3      | 3.9           | 8.1        | -6.6       | 3.4  | -8.4            | 1.9    | 0.0        |
|         | 3       | 4      | 4.7           | 6.1        | 3.1        | 1.8  | 10.7            | 2.4    | 0.0        |
|         |         | 5      | 6.0           | 8.2        | -1.4       | 3.6  | 0.8             | 5.4    | 0.0        |
|         |         | 3      | 4.9           | 2.4        | 7.2        | 6.4  | 1.8             | 2.2    | 0.0        |
| b21     | 5       | 4      | 3.7           | 2.5        | 13.3       | 4.9  | -2.5            | -1.4   | 0.0        |
|         |         | 5      | 9.5           | 3.8        | 2.8        | 2.2  | 1.3             | -2.1   | 0.0        |
|         |         | 3      | 5.5           | 2.4        | 17.2       | -0.1 | 5.8             | 8.5    | 0.0        |
|         | 10      | 4      | 18.5          | 7.5        | 22.7       | -1.2 | 29.6            | 24.2   | 0.0        |
|         |         | 5      | 1.9           | 4.3        | 22.8       | -0.8 | 25.1            | 31.0   | 0.0        |
|         | 3       | 3      | 20.2          | -2.8       | 25.1       | 1.6  | 8.5             | 21.4   | 0.0        |
|         |         | 4      | 18.5          | -2.4       | 23.8       | 2.4  | 1.6             | 18.7   | 0.0        |
|         |         | 3      | 19.0          | -0.5       | 11.0       | 3.0  | -2.8            | -2.2.2 | 0.0        |
| b22     | 5       | 4      | 8.3           | 6.6        | 7.0        | 2.0  | 17.9            | 15.3   | 0.0        |
|         |         | 5      | 4.4           | 4.6        | -2.4       | 1.4  | -1.3            | -1.5   | 0.0        |
|         | 10      | 3      | 2.7           | 3.2        | 7.9        | 1.0  | -0.8            | 1.2    | 0.0        |
|         |         | 4      | 7.6           | 2.9        | -0.1       | 3.6  | 7.9             | -4.0   | 0.1        |
|         |         | 5      | 8.0           | -1.0       | 0.6        | 2.7  | -2.5            | 0.1    | 0.1        |
|         |         | 3      | 2.9           | 0.6        | 5.9        | 0.3  | 14.8            | 0.4    | 147.8      |
|         | 100     | 4      | 0.6           | 6.1        | 1.7        | 3.4  | 0.6             | -0.5   | 238.3      |
| ck1     |         | 5      | 11.3          | 9.8        | 0.7        | 4.0  | -0.4            | 1.4    | 346.0      |
|         | 200     | 3      | 1.7           | 2.8        | 1.2        | 1.3  | 0.7             | 1.6    | 838.8      |
|         |         | 4      | 1.7           | 6.3        | 6.1        | -2.7 | 1.0             | 2.3    | 1369.9     |
|         |         | 5      | 7.6           | 2.3        | 12.2       | -1.3 | 11.9            | 0.7    | 2513.2     |
|         | 200     | 5      | -1.3          | 4.1        | 16.2       | 3.4  | 11.7            | 0.9    | 3003.2     |
|         | 300     | 4      | -1.4          | 5.5<br>5.6 | 25.0       | -0.2 | 15.0            | -1.4   | 4033.0     |
| Auc     |         | , ·    | 4.3           | 2.0        | 23.0       | -5.1 | 10.8            | -1.4   | 7251.1     |
| Avg.    |         | /      | 0.5           | 3.1        | 0.1        | 1.1  | 5.5             | 3.8    | -          |

#### TABLE II Experimental Results

# V. CONCLUSION

This paper proposed a novel layout-aware scan segmentation design scheme called LCTI-SS to address an emerging problem, IR-drop-induced shift clock skew, which can severely damage test power safety by causing scan shift failures. The LCTI-SS scheme identifies an optimal or near-optimal combination of scan segments for simultaneous clocking so that shift switching activity in the proximities of active clock paths is reduced. As demonstrated by experimental results, the proposed scheme can effectively reduce instantaneous IR-drop at active clock buffers. This helps to reduce IR-drop-induced shift clock skew, thus improving shift safety in scan testing. We are currently conducting additional experiments to evaluate the delay difference at adjacent scan FFs. In addition, to further improve shift safety, we are developing an accurate metric for the segment regrouping algorithm that correlates well with IR-drop at active clock buffers.
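The core idea summarized above, choosing which scan segments to clock together so that little switching lands near the clock buffers that are active at that moment, can be illustrated with a minimal sketch. It is not the regrouping algorithm or the metric used in this paper: the impact function, the exhaustive search, and all names and numbers (`active_buffers`, `activity`, `group_impact`, `best_grouping`, segments `A` to `D`, buffers `b1` and `b2`) are hypothetical and chosen only for illustration. The sketch assumes the number of segments is divisible by the group size.

```python
from itertools import combinations


def group_impact(group, active_buffers, activity):
    """Toy impact metric (an assumption, not the paper's metric): total
    switching activity caused by the group's segments near the clock
    buffers that are active while this group is being clocked."""
    active = set().union(*(active_buffers[s] for s in group))
    return sum(activity[s].get(b, 0.0) for s in group for b in active)


def balanced_partitions(segments, size):
    """Yield every split of `segments` into unordered groups of `size`."""
    if not segments:
        yield []
        return
    first, rest = segments[0], segments[1:]
    for mates in combinations(rest, size - 1):
        group = (first,) + mates
        remaining = [s for s in rest if s not in mates]
        for tail in balanced_partitions(remaining, size):
            yield [group] + tail


def best_grouping(active_buffers, activity, size):
    """Exhaustively pick the grouping whose worst group has the lowest
    impact. Only feasible for a handful of segments."""
    segments = sorted(active_buffers)
    return min(
        balanced_partitions(segments, size),
        key=lambda p: max(group_impact(g, active_buffers, activity) for g in p),
    )


# Hypothetical data: clock buffers on each segment's clock path, and the
# switching activity each segment causes near each buffer.
active_buffers = {"A": {"b1"}, "B": {"b1"}, "C": {"b2"}, "D": {"b2"}}
activity = {
    "A": {"b1": 1.0, "b2": 5.0},
    "B": {"b1": 2.0, "b2": 4.0},
    "C": {"b1": 6.0, "b2": 1.0},
    "D": {"b1": 3.0, "b2": 2.0},
}

result = best_grouping(active_buffers, activity, size=2)
print(result)  # [('A', 'B'), ('C', 'D')]
```

In this toy data, clocking `A` with `B` keeps only `b1` active, so the heavy switching those segments cause near `b2` does not count against the group; any grouping that mixes the two clock paths activates both buffers and scores worse.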

## ACKNOWLEDGMENT

This work was partly supported by JSPS KAKENHI Grant-in-Aid for Scientific Research (B) 22300017 and Challenging Exploratory Research 24650022. M. Kochte was a Visiting Researcher at Kyushu Institute of Technology in 2010, supported by the German Academic Exchange Service (DAAD).

#### REFERENCES

- [1] L.-T. Wang, C.-W. Wu, and X. Wen, Editors, *VLSI Test Principles and Architectures: Design for Testability*, San Francisco: Morgan Kaufmann, 2006.
- [2] P. Girard, N. Nicolici, and X. Wen, Editors, *Power-Aware Testing and Test Strategies for Low Power Devices*, Springer, 2009.
- [3] J. Saxena, et al., "A Case Study of IR-Drop in Structured At-Speed Testing," *Proc. IEEE Intl. Test Conf.*, pp. 1098-1104, 2003.
- [4] S. Gerstendorfer and H.-J. Wunderlich, "Minimized Power Consumption for Scan-Based BIST," *Proc. IEEE Intl. Test Conf.*, pp. 77-84, 1999.
- [5] R. Sankaralingam and N. A. Touba, "Reducing Test Power During Test Using Programmable Scan Chain Disable," *Proc. Intl. Workshop on Electronic Design, Test and Applications*, pp. 159-163, 2002.
- [6] M. E. Imhof, et al., "Scan Test Planning for Power Reduction," *Proc. Design Automation Conf.*, pp. 521-526, 2007.
- [7] Y. Bonhomme, et al., "Efficient Scan Chain Design for Power Minimization during Scan Testing under Routing Constraint," *Proc. IEEE Intl. Test Conf.*, pp. 488-493, 2003.
- [8] L. Whetsel, "Adapting Scan Architectures for Low Power Operation," *Proc. IEEE Intl. Test Conf.*, pp. 863-872, 2000.
- [9] X. Wen, et al., "On Low-Capture-Power Test Generation for Scan Testing," *Proc. IEEE VLSI Test Symp.*, pp. 265-270, 2005.
- [10] S. Remersaro, et al., "Preferred Fill: A Scalable Method to Reduce Capture Power for Scan Based Designs," *Proc. IEEE Intl. Test Conf.*, Paper 32.2, 2006.
- [11] A. Al-Yamani, E. Chmelar, and G. Grinchuck, "Segmented Addressable Scan Architecture," *Proc. IEEE VLSI Test Symp.*, pp. 405-411, 2005.
- [12] K. Noda, et al., "Power and Noise Aware Test Using Preliminary Estimation," *Proc. VLSI Design, Automation and Test*, pp. 323-326, 2009.

# AUTHOR BIOS

**Yuta Yamato** received his Ph.D. degree from Kyushu Institute of Technology, Japan, in 2010. He is currently a researcher at the Nara Institute of Science and Technology. His research interests include low power test, fault diagnosis, and dependable systems. He is an IEEE member.

**Xiaoqing Wen** is a professor and chairman of the Department of Creative Informatics at Kyushu Institute of Technology. His research interests include power-aware testing, design for testability, and fault diagnosis of VLSI circuits. He holds a Ph.D. degree in applied physics from Osaka University. He is a Fellow of IEEE.

**Michael A. Kochte** is a research assistant at the Institute for Computer Architecture and Computer Engineering, University of Stuttgart, Germany. He holds a Diploma degree in computer science from the University of Stuttgart. His research interests include test generation, fault simulation, and fault tolerance. He is an IEEE student member.

**Kohei Miyase** received his Ph.D. degree from Kyushu Institute of Technology, Japan, in 2005. Since 2007, he has been with Kyushu Institute of Technology, Japan, where he is currently an Assistant Professor. His research interests include design for testability, low power test, and fault diagnosis. He is a member of the IEEE.

**Seiji Kajihara** received the Ph.D. degree from Osaka University, Japan, in 1992. Since 1996, he has been with Kyushu Institute of Technology, where he is currently a Professor. His research interests include test generation, delay testing, and design for testability. He is a member of IEEE, IEICE, and IPSJ.

**Laung-Terng Wang** is the founder and CEO of SynTest Technologies. His research interests include DFT, ATPG, logic and memory BIST, scan compression, fault diagnosis, and soft-error resilience. He has a Ph.D. in electrical engineering from Stanford University. He is an IEEE Fellow.

# CONTACT INFORMATION

Yuta Yamato, Nara Institute of Science and Technology, 8916-5, Takayama, Nara 630-0192, Japan. Email: yamato@is.naist.jp. Phone: +81-743-72-5224.

Xiaoqing Wen, Department of Creative Informatics, Kyushu Institute of Technology, 680-4, Kawazu, Iizuka, Fukuoka 820-8502, Japan. Email: wen@cse.kyutech.ac.jp. Phone: +81-948-29-7891. Fax: +81-948-29-7651.

Michael A. Kochte, ITI, Uni Stuttgart, Pfaffenwaldring 47, 70569 Stuttgart, Germany. Phone: +49-(0)711-685-88-361. Email: kochte@iti.uni-stuttgart.de.

Kohei Miyase, Kyushu Institute of Technology, 680-4, Kawazu, Iizuka, Fukuoka 820-8502, Japan. Email: k\_miyase@cse.kyutech.ac.jp. Phone/Fax: +81-948-29-7685.

Seiji Kajihara, Kyushu Institute of Technology, 680-4, Kawazu, Iizuka, Fukuoka 820-8502, Japan. Email: kajihara@cse.kyutech.ac.jp. Phone/Fax: +81-948-29-7665.

Laung-Terng Wang, SynTest Technologies, Inc., 505 S. Pastoria Avenue, Suite 101, Sunnyvale, CA 94086. Email: wang@syntest.com. Phone: 408-720-9956 x 200. Fax: 408-720-9960.