A diagnosis technique is presented to locate at least one fault in a scan chain with multiple timing faults. This diagnosis technique applies Single Excitation (SE) patterns, in which only one bit can be flipped even in the presence of multiple faults. By applying the SE patterns, the problem of simulations with unknown values is eliminated. The diagnosis result is therefore deterministic, not probabilistic. Experiments on the ISCAS benchmark circuits show that the average diagnosis resolution is less than ten scan cells.
Introduction
Commercial tools nowadays perform Single Stuck-at Fault (SSF) diagnosis for combinational logic very efficiently. By contrast, the scan chains, which account for 30% of the total silicon area [1], cannot be diagnosed by most commercial tools. This paper presents a technique to diagnose scan chains with multiple timing faults. The timing faults considered in this paper are Slow-To-Rise (STR), Slow-To-Fall (STF), Fast-To-Rise (FTR), and Fast-To-Fall (FTF) faults. Traditionally, the single stuck-at fault model is assumed by most Automatic Test Pattern Generators (ATPG). However, experiments have shown that the single stuck-at fault model is very effective for testing but may not be very precise for diagnosis [2], [3]. One possible explanation for the occurrence of multiple faults is that defects tend to be clustered together rather than scattered uniformly on the wafer [4]. Another reason for multiple faults is that, in nanometer technologies, the timing of scan cells can be susceptible to systematic problems, which can be better modeled by multiple timing faults than by single stuck-at faults. Cases of multiple timing faults in scan chains have been reported in several recent industry chips [5], [6]. Although the faults in a scan chain may not cause the chips to fail in normal operation, they can lower the yield in volume production. Therefore, diagnosis of multiple timing faults in scan chains is very useful both in silicon debug and in reducing the time to volume production.
Past research in this area can be classified into two major categories: the hardware Design For Diagnosis (DFD) solutions, and the software solutions. In the first category, Schafer proposed to add extra routings from one scan chain to another scan chain, the partner scan chain [7]. The output of every scan cell is connected to the input of scan cells of the partner scan chain. By doing so, the contents of the scan chain under diagnosis can be observed through the partner scan chain. Edirisooriya proposed to insert XOR gates into the scan chain so that the contents of the scan cells can be flipped before shifting into the next scan cell [8]. Narayanan and Wu proposed to flip the contents of each scan cell by modifying the scan cell design [9], [10]. These hardware solutions require either custom scan cells or extra hardware. Besides, for those chips that are already designed and manufactured, the above hardware solutions are not applicable.
† The author is with the Electrical Engineering Department/GIEE, National Taiwan University (NTU), Taiwan.
* This work was supported by the National Science Council (Grant NSC 93-2220-E-002-012).
a) E-mail: cmli@cc.ee.ntu.edu.tw DOI: 10.1093/ietfec/e88-a.4.1024
In the software category, Kundu [1] proposed to use the sequential ATPG to generate diagnosis patterns. For every scan cell i in the chain, a sequence of initialization patterns that control cell i to a desired value is applied. Kundu's idea is good for single stuck-at faults only, not for multiple timing faults. Guo proposed a three-step diagnosis procedure [6]. In the first step, the faulty chain and the fault type are determined by fixed scan-in and scan-out patterns. In the second step, the upper bound and the lower bound of the faulty cell are determined by logic simulations with unknown values in all cells of the faulty chain. In the third step, fault simulations are performed to obtain the expected faulty outputs. The final diagnosed result is a list of faults sorted by their scores. The score of a fault represents the degree of similarity of its expected faulty outputs to the circuit's actual outputs. Their technique is applicable to single stuck-at faults as well as timing faults. It requires, however, some modifications to handle multiple faults. Huang proposed a probabilistic model for intermittent timing faults in scan chains [5]. Their technique handles multiple faults by ranking the probability of a group of candidate faults. Neither Huang's nor Guo's technique produces deterministic diagnosis results in the presence of multiple timing faults in one single scan chain.
This paper presents a technique to diagnose scan chains with multiple timing faults. The technique consists of two parts. In the first part, the fault type and the number of faults are determined. In the second part, the location of the faulty cell closest to the scan input is diagnosed. This technique deals with multiple faults by applying single excitation (SE) patterns, of which only one bit can be flipped by the faults. By using the SE patterns, there is no need for simulations with unknown values, so the final diagnosis result is deterministic, not probabilistic. It is a pure software solution; no DFD is required. This makes it well suited to silicon debug.
Copyright © 2005 The Institute of Electronics, Information and Communication Engineers
One critical assumption of this technique should be mentioned here. The idea of the single excitation pattern is applicable when there is only one type of fault present at a time. That is, no two different types of faults are present in the same chain simultaneously. In the case of mixed faults, however, the presented idea can still be applied with some modifications. Please see the discussion section for more details.
The organization of this paper is as follows. The second section introduces some basic terms used in this paper. The third section describes the diagnosis technique in detail. The fourth section shows experimental data on ISCAS circuits. The fifth section discusses some issues related to this technique. Finally, the sixth section summarizes the paper.
Definitions
In this paper, the scan cells are indexed in a descending order, from the scan input (SI) to the scan output (SO). The length of the scan chain (L) is the total number of scan cells in the chain. For a given scan cell i, the cells that are indexed higher than i are called the upstream cells of cell i. The cells that are indexed lower than i are called the downstream cells of cell i. This definition follows the preceding research such as [6] . The proposed technique gives the upper bound and the lower bound of the interval of consecutive cells that contains the most upstream faulty cell. The most upstream faulty cell is the faulty cell that has the highest index. The upper bound is the highest index of a group of consecutive scan cells. The lower bound is the lowest index of a group of consecutive scan cells. Diagnosis resolution is defined as the difference between the upper bound and the lower bound. The smaller the diagnosis resolution is, the more precise the diagnosis result is.
Let n denote the clock cycle number and let e(i, n) represent the expected fault-free value of scan cell i at cycle n. In the shift mode, the content of scan cell i is updated by its immediate upstream cell every cycle. The shift operation can therefore be modeled by the equation e(i + 1, n) = e(i, n + 1). Let a(i, n) denote the actual content of scan cell i at cycle n. For a faulty chain, the actual content of a faulty cell differs from its expected content when certain excitation conditions are met. Table 1 shows the excitation conditions of the four types of faults considered in this research. This table assumes cell i to be faulty. A Slow-To-Rise (STR) fault in cell i is excited when cell i is expected to have a rising transition at the current cycle; that is, e(i, n − 1) = 0 and e(i, n) = 1. The effect of the STR fault is that cell i remains zero instead of rising to one. The rising transition is therefore missing at cycle n. A Slow-To-Fall (STF) fault does the opposite: the cell remains one when it is expected to have a falling transition. A Fast-To-Rise (FTR) fault in cell i is excited when cell i is expected to rise in the next cycle; that is, e(i, n + 1) = 1 and e(i, n) = 0. The effect of the FTR fault is that cell i rises one cycle earlier than expected.
Table 1 Excitation conditions of four types of timing faults (n = current cycle; n − 1 = previous cycle; n + 1 = next cycle).
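The excitation conditions above can be illustrated with a small behavioral sketch. It is not the paper's simulator: the function name `pass_through_faulty_cell` is hypothetical, the stream is assumed to be the expected values e(i, n) of one faulty cell in time order, and no transition is assumed before the first bit or after the last bit. The STF condition follows the "opposite of STR" description, and the FTF condition is assumed by symmetry with FTR.

```python
from typing import List

FAULT_TYPES = ("STR", "STF", "FTR", "FTF")


def pass_through_faulty_cell(stream: List[int], fault: str) -> List[int]:
    """Return the actual values a(i, n) of one faulty cell.

    stream[n] is the expected value e(i, n) at cycle n (time order).
    Assumption: no transition before the first or after the last bit.
    """
    if fault not in FAULT_TYPES:
        raise ValueError(f"unknown fault type: {fault}")
    actual = list(stream)
    for n, bit in enumerate(stream):
        prev_bit = stream[n - 1] if n > 0 else bit
        next_bit = stream[n + 1] if n + 1 < len(stream) else bit
        if fault == "STR" and prev_bit == 0 and bit == 1:
            actual[n] = 0  # rising transition is missed at cycle n
        elif fault == "STF" and prev_bit == 1 and bit == 0:
            actual[n] = 1  # falling transition is missed at cycle n
        elif fault == "FTR" and bit == 0 and next_bit == 1:
            actual[n] = 1  # cell rises one cycle early
        elif fault == "FTF" and bit == 1 and next_bit == 0:
            actual[n] = 0  # cell falls one cycle early (assumed by symmetry)
    return actual
```

Under this model a slow fault delays an edge by one cycle and a fast fault advances it by one cycle, which is the behavior the text describes for STR and FTR.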
Excitation patterns for a given type of fault are scan input patterns that, when shifted into the scan chain, cause excitation in one or more faulty scan cells. For example, the pattern {10110} is an excitation pattern for the STR fault. (In this paper, the rightmost bit of a pattern is shifted into the chain first.) The bits that are flipped after the fault excitation, which are underlined in the patterns, are called the sensitive bits. The scan input patterns that cause no excitation for a given type of fault are the non-excitation patterns. For example, the pattern {00000} is a non-excitation pattern for the STR fault. By definition, there is no sensitive bit in a non-excitation pattern. Single Excitation (SE) patterns are excitation patterns that, after flipping of the only sensitive bit, become non-excitation patterns. In other words, there is at most one bit of difference between the expected scan cell contents and the actual scan cell contents, even in the presence of multiple faults. For example, {01000} is a single excitation pattern for the STR fault because, after fault excitation, it becomes {00000}, which is a non-excitation pattern. It follows from the definition that there is one and only one sensitive bit in a single excitation pattern.
In this research, SE patterns of length 2L are applied for diagnosis purposes. Table 2 lists SE patterns of length 2L for the four types of timing faults. The SE patterns in Table 2 can be divided into two portions: the head portion and the tail portion. Each portion is of equal length L. The '0...0' in the table means one or more consecutive zeros. Similarly, '1...1' means one or more consecutive ones. The rightmost bit of the head portion is shifted into the scan chain first, followed by the second bit from the right. The rightmost bit of the tail portion follows the leftmost bit of the head portion. Every fault type has two rows. In total, there are L − 2 possible SE patterns for the STR and STF faults, and L − 1 possible SE patterns for the FTR and FTF faults. The sensitive bit is underlined. For our diagnosis purpose, the sensitive bits are always placed in the leftmost position of the head portion.
It can be verified that the SE patterns become non-excitation patterns when the sensitive bits are flipped. Take the first row of STR for example: the pattern {00000 10111} is an SE pattern. After the fault excitation, it becomes {00000 00111}, which is a non-excitation pattern for the STR fault. For clarity of illustration, this paper assumes that no inverter is inserted between scan cells. If there is any inversion, the discussions in this paper still hold true as long as the contents of negative polarity scan cells are complemented.
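The SE property can be checked mechanically. The sketch below is an illustration under stated assumptions, not the paper's tool: the helper names (`to_stream`, `flip_excited`, `is_single_excitation`) are hypothetical, patterns are written as in the text with the rightmost bit shifted first, the FTF and STF conditions are assumed by symmetry with FTR and STR, and no transition is assumed before the first shifted bit or after the last one.

```python
from typing import List

# (previous, current) pair that excites a slow fault,
# (current, next) pair that excites a fast fault.
SLOW = {"STR": (0, 1), "STF": (1, 0)}
FAST = {"FTR": (0, 1), "FTF": (1, 0)}


def to_stream(pattern: str) -> List[int]:
    """Convert a written pattern (rightmost bit first) to time order."""
    return [int(b) for b in reversed(pattern.replace(" ", ""))]


def to_pattern(stream: List[int]) -> str:
    """Convert a time-ordered stream back to the written form."""
    return "".join(str(b) for b in reversed(stream))


def flip_excited(stream: List[int], fault: str) -> List[int]:
    """Flip every bit of the stream that excites the given fault type."""
    out = list(stream)
    for n in range(len(stream)):
        if fault in SLOW and n > 0 and (stream[n - 1], stream[n]) == SLOW[fault]:
            out[n] = stream[n - 1]  # transition missed
        elif (fault in FAST and n + 1 < len(stream)
              and (stream[n], stream[n + 1]) == FAST[fault]):
            out[n] = stream[n + 1]  # transition one cycle early
    return out


def count_sensitive_bits(pattern: str, fault: str) -> int:
    stream = to_stream(pattern)
    return sum(a != b for a, b in zip(stream, flip_excited(stream, fault)))


def is_single_excitation(pattern: str, fault: str) -> bool:
    """True if exactly one bit flips and the result excites nothing further."""
    stream = to_stream(pattern)
    flipped = flip_excited(stream, fault)
    one_bit = sum(a != b for a, b in zip(stream, flipped)) == 1
    return one_bit and flip_excited(flipped, fault) == flipped
```

With these helpers, {01000} and {00000 10111} pass the SE check for STR, {00000} has no sensitive bit, and {10110} is an excitation pattern with more than one sensitive bit, so it is not an SE pattern.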
Proposed Diagnosis Technique

Determining Fault Type and Number of Faults
To determine the fault type, [6] proposed to use three test patterns. Their proposed test patterns cannot determine the number of faults in the case of multiple faults. This paper proposes to apply two test patterns to determine the fault type and the number of faults in a scan chain. The proposed test patterns are listed in Table 3. Each pattern is of length L. The scan chain cells are divided into halves. In the first pattern, the downstream half is all zeros and the upstream half is all ones. The second pattern is the bitwise complement of the first pattern. On the tester, the test procedure is to scan in the first pattern, followed by an immediate scan out without any system clock. The scan-out patterns of the faulty chain are recorded by the tester without any pass/fail decision made on the fly. The same procedure is then repeated for the second pattern.
In Table 3, the scan outputs of the four types of faults are listed (mismatched bits in bold). The table assumes two faults for demonstration. If there exist f STR faults, there will be f more zeros than expected in the scan out of pattern 1. If there exist f FTR faults, there will be f more ones than expected in the scan out of pattern 1. Similarly, STF faults and FTF faults are detected in pattern 2 by f more ones or f more zeros than expected, respectively. The proposed two test patterns are thus able to distinguish the four types of timing faults. The number of faults equals the number of bits flipped in the scan outputs. The maximum number of faults that can be counted is half the length of the scan chain.
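The decision rule described above amounts to counting surplus ones or zeros in the two scan-out responses. The following is a minimal sketch of that rule, assuming a single fault type per chain as the paper does; the function name `classify_fault` and the choice to raise an error when both patterns mismatch are illustrative assumptions, not part of the original procedure.

```python
from typing import Optional, Sequence, Tuple


def classify_fault(
    expected1: Sequence[int],
    observed1: Sequence[int],
    expected2: Sequence[int],
    observed2: Sequence[int],
) -> Tuple[Optional[str], int]:
    """Return (fault type, number of faults) from the two scan-out responses.

    Pattern 1 holds zeros in the downstream half and ones in the upstream
    half; pattern 2 is its bitwise complement.
    """
    extra_ones_1 = sum(observed1) - sum(expected1)
    extra_ones_2 = sum(observed2) - sum(expected2)
    if extra_ones_1 == 0 and extra_ones_2 == 0:
        return (None, 0)  # no timing fault observed
    if extra_ones_1 != 0 and extra_ones_2 != 0:
        # Outside the single-fault-type assumption (illustrative choice).
        raise ValueError("both patterns mismatch: mixed fault types?")
    if extra_ones_1 < 0:
        return ("STR", -extra_ones_1)  # f more zeros in pattern 1
    if extra_ones_1 > 0:
        return ("FTR", extra_ones_1)   # f more ones in pattern 1
    if extra_ones_2 > 0:
        return ("STF", extra_ones_2)   # f more ones in pattern 2
    return ("FTF", -extra_ones_2)      # f more zeros in pattern 2
```

Because only the counts of ones are compared, the bit order of the responses does not matter here, as long as each observed response is paired with its own expected response.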
Determining Fault Location
The flow chart to determine fault location is depicted in Fig. 1. Given a netlist, a fault type, and an SE pattern, the scan in (SI) and primary input (PI) patterns are generated by the Automatic Diagnosis Pattern Generators (ADPG). There are two ADPG methods: the combinational ADPG and the sequential ADPG. The associated expected good primary outputs (PO_g) and expected good scan outputs (SO_g) are also generated by ADPG. The Circuit Under Diagnosis (CUD) is then tested on the Automatic Test Equipment (ATE) and the actually observed primary outputs (PO_a) and scan outputs (SO_a) are logged into a file. Finally, the PO_a and SO_a are matched with PO_g and SO_g to obtain the final diagnosis results, the upper bound and the lower bound of the most upstream faulty scan cell.
Table 3 Determine fault type and number of faults.
Combinational ADPG (C-ADPG)
This paper proposes a Combinational Diagnosis Procedure (CDP), which is different from the regular test procedure in that the former has no system clock. Instead, the CDP applies PI patterns and observes PO at every cycle during the scan chain shifting. The advantage of the CDP is that it avoids unknown values in the scan chain. The proposed CDP is described as follows.
1. Shift in the head portion of the SE pattern.
2. Apply a primary input pattern.
3. Observe primary outputs (PO_a) and save to files.
4. Shift in one bit of SE pattern. Observe the scan out bit (SO_a) and record its value in a file.
5. Repeat steps 2 to 4 until the end of the SE pattern.
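The five CDP steps can be sketched as a simple loop. This is an illustration only: `ToyChain` is a hypothetical fault-free shift register standing in for the circuit on the tester, `observe_po` is a caller-supplied stand-in for the combinational logic between scan cells and primary outputs, and the head and tail portions are assumed to be given in shift (time) order.

```python
from typing import Callable, List, Sequence, Tuple


class ToyChain:
    """Hypothetical fault-free scan chain; cells[0] is next to the scan input."""

    def __init__(self, length: int) -> None:
        self.cells: List[int] = [0] * length

    def shift(self, bit: int) -> int:
        """Shift one bit in at the scan input and return the scan-out bit."""
        scan_out = self.cells[-1]
        self.cells = [bit] + self.cells[:-1]
        return scan_out


def run_cdp(
    chain: ToyChain,
    head: Sequence[int],
    tail: Sequence[int],
    pi_patterns: Sequence[Sequence[int]],
    observe_po: Callable[[List[int], Sequence[int]], List[int]],
) -> List[Tuple[List[int], int]]:
    """Run the CDP loop and return the logged (PO_a, SO_a) pairs."""
    for bit in head:                      # step 1: shift in the head portion
        chain.shift(bit)
    log = []
    for bit, pi in zip(tail, pi_patterns):
        po = observe_po(chain.cells, pi)  # steps 2 and 3: apply PI, observe PO
        so = chain.shift(bit)             # step 4: shift one bit, observe SO
        log.append((po, so))              # step 5: repeat until the pattern ends
    return log
```

No system clock appears anywhere in the loop, which is the point of the CDP: the scan cell contents are always fully specified by the SE pattern.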
As the SE patterns are shifted into the faulty scan chain, the sensitive bit can be flipped by the faults. To locate the most upstream fault is to find out, as early as possible, the position where the sensitive bit is flipped. Therefore, the primary input patterns that detect a flipped sensitive bit have to be generated. To detect a flipped sensitive bit means to detect a single stuck-at x fault, where x is the opposite value to the expected good value of the sensitive bit. Since the contents of the scan cells are fully specified, the C-ADPG can be implemented by a combinational SSF ATPG. The only difference between a C-ADPG and a combinational SSF ATPG is that the former allows observation points in primary outputs only, not scan outputs. The steps of C-ADPG for a given SE pattern can be described as follows. For a given SE pattern, a C-observable cell is a cell for which primary input patterns exist to detect a flipped sensitive bit at that cell. A C-unobservable cell is a cell for which no primary input pattern exists to detect a flipped sensitive bit at that cell. For example, scan cells that do not fan out to any primary output are C-unobservable. For a set of SE patterns, a scan cell is C-observable if it is C-observable to any pattern in the set. A scan cell is C-unobservable if it is C-unobservable to all SE patterns in the set.
Sequential ADPG (S-ADPG)
The diagnosis resolution of C-ADPG patterns alone may not be satisfactory because of C-unobservable cells. To enhance the diagnosis resolution, the C-unobservable cells have to be diagnosed by the Sequential Diagnosis Procedure (SDP). The SDP is different from the CDP in that the former has at least one system clock. The SDP is applied to one single scan cell every time an SE pattern is loaded. That particular scan cell is called the scan cell under diagnosis, which is one of the C-unobservable scan cells. The SDP is described as follows.
1. Keep shifting in the SE pattern until the sensitive bit reaches the scan cell under diagnosis.
2. Apply a primary input pattern.
3. Pulse a system clock and observe primary outputs (PO_a). Record the PO_a in a file.
4. Repeat steps 2 and 3 for a certain number of times.
5. Scan out and observe the contents of good scan chains. Mask the outputs of faulty scan chains.
The S-ADPG generates a sequence of primary input patterns that detect a flipped sensitive bit. The S-ADPG is very similar to sequential SSF ATPG. There is, however, one major difference between the sequential SSF ATPG and the S-ADPG: the time frame that contains the fault. The sequential SSF ATPG assumes that the fault is present in every time frame. The S-ADPG assumes that the fault is present only in the first time frame, not in any later time frame. This is because we assume that the scan chain faults are present only in scan chain shifting mode, not in normal operation mode. From the above discussion, the S-ADPG patterns can be generated by existing SSF sequential ATPG tools with some modifications. The following steps describe how the S-ADPG can be done for a given SE pattern.
1. Shift the SE pattern until the sensitive bit reaches the position of the cell under diagnosis. Force the shifted SE pattern at the scan input of every scan cell.
2. Inject a stuck-at X fault at the scan input of the cell under diagnosis. X is the opposite value of the sensitive bit.
3. Hold the scan enable signal to one (scan operation) and pulse one clock. By doing so, all the scan cells are initialized to SE patterns and the fault is injected into the cell under diagnosis.
4. Force the scan enable to zero (normal operation) and run sequential ATPG for a certain number of system clocks to detect the injected fault. Allow observation in primary outputs only, not scan cells.
5. After the last system clock, force the scan enable to one (shift operation). Allow observation in both primary outputs and scan cells of good scan chains.
6. If ATPG successfully generates a sequence of primary input patterns for the injected fault, the cell under diagnosis is marked as S-observable.
For a set of SE patterns, a cell is S-observable if it is S-observable to any SE pattern in the set. A cell is S-unobservable if it is S-unobservable to all SE patterns in the set.
Diagnosis Resolution
A faulty cell can be identified if it is detected by either the combinational or the sequential diagnosis procedure. Hence, a cell is overall observable if it is either C-observable or S-observable. Figure 2 shows an example chain under diagnosis.
Matching
After testing the CUD on the ATE, the actually observed primary outputs (PO_a) are recorded in a file. The PO_a are then compared off line with PO_g. The first mismatch cell is the most upstream cell in which a mismatch between PO_g and PO_a occurs. The first mismatch cell is obtained from either the CDP or the SDP, whichever is the most upstream. The upper bound and the lower bound of the most upstream faulty cell are the DUB and DLB of the first mismatch cell, respectively. The final diagnosis resolution is the DR of the first mismatch cell. Take the scan chain in Fig. 2 for example. If the first mismatch cell is cell 3, the upper bound of the diagnosis result is DUB(3) = 4 and the lower bound of the diagnosis result is DLB(3) = 3. The final diagnosis resolution is one, which means the most upstream fault is exactly cell 3.
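The matching step can be sketched as a lookup once the per-cell responses are available. The sketch below makes several assumptions that are not in the text: the function name `match` is hypothetical, the expected and observed responses are assumed to be keyed by the scan cell that held the sensitive bit when they were captured, and the DUB and DLB values are assumed to be precomputed tables supplied by the caller (their construction is not reproduced here). The resolution is computed as the upper bound minus the lower bound, following the definition of diagnosis resolution given earlier.

```python
from typing import Dict, Optional, Tuple

Response = Tuple[int, ...]


def match(
    po_good: Dict[int, Response],
    po_actual: Dict[int, Response],
    dub: Dict[int, int],
    dlb: Dict[int, int],
) -> Optional[Tuple[int, int, int]]:
    """Return (upper bound, lower bound, DR) of the most upstream faulty cell.

    Cells are indexed in descending order from scan input to scan output,
    so the most upstream mismatch is the one with the highest index.
    Returns None when no mismatch is observed.
    """
    mismatches = [cell for cell in po_good if po_actual[cell] != po_good[cell]]
    if not mismatches:
        return None
    first = max(mismatches)  # first mismatch cell = most upstream mismatch
    upper, lower = dub[first], dlb[first]
    return (upper, lower, upper - lower)
```

For the example in the text, a first mismatch at cell 3 with DUB(3) = 4 and DLB(3) = 3 yields a diagnosis resolution of one.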
Experimental Results
To demonstrate the effectiveness of the proposed technique, experiments are performed on ISCAS'89 benchmark circuits of various sizes. A commercial tool that supports both combinational and sequential ATPG is used as the pattern generation engine. Table 4 shows the diagnosis resolutions for the four types of timing faults. Every CUD has two rows of numbers. The first row shows the diagnosis resolutions obtained from the C-ADPG only; the second row shows the diagnosis resolutions obtained from the C-ADPG plus the S-ADPG; that is, a cell is observable if it is either C-observable or S-observable. In each cell of this table, two DR numbers are separated by a slash: the average DR and the worst DR. The average DR is obtained by taking the average of DR(i) among all scan cells. The worst DR, which is the maximum value among all DR(i), is the length of the longest diagnosis interval. The number of combinational logic gates and the length of the scan chain (L) are shown below each CUD name. All scan cells of a CUD are chained into one single scan chain because the numbers of scan cells are small. Three system clocks are applied in the S-ADPG. All possible SE patterns are exhaustively tried except for the last two large CUDs, due to the S-ADPG time limitation. From this table, it can be seen that the average DR is less than ten in all cases and the worst DR is less than ten in most cases. For s15850, the worst-case diagnosis resolution is 7, which is only 1.3% of the scan chain length L. For s38584, the worst diagnosis resolution is less than 1% of the chain length. These diagnosis resolutions are adequate for failure analysis purposes. In spite of the good results, there are some cases in which the diagnosis resolution is not ideal. For s5378, the worst DR of STR is 13, which is 7% of the total 179 cells. For s9234, the worst DR of STR is 19, which is 9% of the total 211 cells. There are two main reasons for unsatisfactory diagnosis resolutions. The first is the constraints of the SE patterns. The second is the structure of the circuits. S9234 has quite a few redundant faults that are not testable [11]. A further analysis finds that about 6% of the faults in s9234 are redundant.
To visualize the distribution of the diagnosis resolutions, Figure 3 shows a histogram of DR. This plot is obtained from CUD s9234, STF fault. The X-axis is the DR value and the Y-axis is the number of cells of the corresponding DR. There are a total of 211 scan cells in this circuit. The light color bars are the DR of the C-ADPG, and the dark color bars are the DR of the C-ADPG plus the S-ADPG. A significant decrease in DR is observed with the addition of the S-ADPG.
For the S-ADPG, one key factor that affects the diagnosis resolution is the number of system clocks (SCK). Table 5 shows the diagnosis resolution of 1, 2 and 3 system clocks. For large CUDs, increasing the number of system clocks greatly improves the diagnosis resolution. This table shows the numbers for the STF fault only. The numbers for the other fault types show similar trends and therefore are not listed. In this table, the C-ADPG is assumed to be performed before the S-ADPG. Table 6 compares the CPU time for the C-ADPG and the S-ADPG of different numbers of system clocks (STF fault). The C-ADPG is performed on all faults in the chain. The S-ADPG is performed only on the C-unobservable faults. The CPU time needed for the S-ADPG is significantly longer than that of the C-ADPG. As the number of system clocks increases, the CPU time grows rapidly. No experiment with more than three clocks is performed because the CPU time can be too high for practical use. The CPU time shown here is obtained for one SE pattern. If more than one SE pattern is used, the total CPU time needed is proportional to the number of SE patterns.
Besides the number of system clocks, another important factor that affects the diagnosis resolution is the number of SE patterns. The more SE patterns are used in ADPG, the lower the DR becomes. An experiment is conducted to evaluate the impact of the number of SE patterns on the DR. Table 7 shows the diagnosis resolution (STF fault) versus three chosen SE patterns in S-ADPG. The first SE pattern chosen is of the same form as the second row of STR in Table 2. The second SE pattern chosen is different from the first SE pattern in half of the scan cells. The third pattern is different from the second pattern in another half of the scan cells. By maximizing the difference between these three SE patterns, the chance of getting different observable cells is maximized. The first column in Table 7 shows the DR of using only the first SE pattern. The second column shows the DR of using both the first and the second SE patterns. The third column shows the DR of using three SE patterns. The last column shows the DR of using all SE patterns exhaustively. It can be seen from this table that the diagnosis resolution of the three chosen SE patterns is very close to that of all SE patterns. Table 8 compares the diagnosis resolutions of the proposed technique with a previous technique. The CUD used in [6] is an industrial design. The chain under diagnosis has 410 scan cells. The worst DR and the average DR of STF and FTR are calculated from the figures of the original paper. (The data of STR and FTF are not available.) For the proposed technique, three CUDs are shown: s9234 (L=211), s15850 (L=534), and s38584 (L=1,426). The proposed technique has lower DR than that of [6]. Although the CUDs in this comparison are not the same, the results indirectly suggest that the proposed multiple fault diagnosis technique can achieve better diagnosis resolution than a single fault diagnosis technique.
Discussions
Mixed Types of Faults
The presented technique can be extended to diagnose mixed types of faults in one scan chain. Consider two types of mixed faults: STRF and FTRF. An STRF fault is a combination of an STR fault and an STF fault. The faulty cell is excited when it is expected to make a rising or falling transition. The fault effect is that the faulty cell rises or falls one clock cycle later than expected. An FTRF fault is a combination of an FTF and an FTR fault. The fault effect is that the faulty cell rises or falls one clock cycle earlier than expected. Table 9 shows the SE patterns for the mixed types of faults. The tail portion consists of all zeros or all ones. The head portion is a stream of all zeros or a stream of all ones followed by a sensitive bit. The sensitive bit is the same as the tail portion for the STRF fault. The sensitive bit is the same as the head portion for the FTRF fault. The sensitive bit is flipped as soon as it passes the faulty cell. To diagnose the faulty cell is to find out, as soon as possible, where the sensitive bit is flipped. With the SE patterns in Table 9, our diagnosis technique can be extended to the mixed types of faults. Table 10 shows the simulation results on the ISCAS circuits with mixed faults. The average diagnosis resolutions and worst diagnosis resolutions are separated by slashes. It can be seen that the diagnosis resolutions of the mixed faults are comparable to those in Table 4. These numbers show that our technique is still applicable even in the presence of mixed types of faults. The number of system clocks used in the experiment is three.
One final remark on the preceding technique: Table 9 assumes that there is only one fault in the chain at a time. If there is more than one mixed fault in a chain, more than one bit is flipped and the single excitation assumption no longer holds.
Limited Accessibility to Primary Outputs
Due to the recent high density packaging, the primary outputs sometimes are not easily accessible. In the case of limited accessibility to primary outputs, an alternative solution is to add boundary scan (or JTAG) architecture to the circuits under test. For our diagnosis technique to work correctly, one important point should be addressed when designing the boundary scan architecture. It is required that the boundary scan chain is able to shift independently of the internal scan chains. In this way, the primary outputs can be observed in the CDP or the SDP without affecting the contents of the internal scan chains.
Conclusion
This paper presents a technique that diagnoses the timing faults in scan chains with multiple faults. The proposed technique can handle scan chains with multiple faults because of the single excitation patterns. In the single excitation patterns, only one bit can be flipped even in the presence of multiple faults. This diagnosis technique provides deterministic results and avoids simulations with unknown values. For the ISCAS circuits in our experiment, the presented technique achieves an average diagnosis resolution of less than ten scan cells.
