We propose a low-overhead scan design methodology which employs a new test point insertion technique to establish scan paths through the functional logic. The technique re-uses the existing functional logic; as a result, the design-for-testability (DFT) overhead on area or timing can be minimized. In this paper we present an algorithm which uses test point insertion to reduce the area overhead of full scan design. We also discuss its application to timing-driven partial scan design.
I. INTRODUCTION
Automatic test pattern generation (ATPG) for sequential circuits is a difficult problem because of the lack of direct controllability of the present state lines and direct observability of the next state lines. To enhance testability, design-for-testability (DFT) techniques aiming at improving controllability and observability of the state lines have been proposed, such as full scan [1, 2, 3] and partial scan [4, 5]. Both scan techniques facilitate testing of a sequential circuit by interconnecting selected flip-flops into a shift register during the test mode to directly control and observe the state lines. The complexity of ATPG is therefore reduced. However, the area and delay overheads imposed by conventional scan can be significant due to the extra scan multiplexers (MUXs) in the scan flip-flops (assuming that MUXed D flip-flops are used) and the extra routing area for the scan chains.
To alleviate the above DFT penalty, we propose a low-overhead scan design methodology which employs test point insertion to establish the scan paths through the existing combinational logic. These test points are established by appropriately inserting a two-input AND gate or a two-input OR gate with a common test input. The essential idea is illustrated in Figure 1. In this example, we establish a partial scan chain involving three flip-flops using the functional logic, and the area overhead is one two-input AND gate, while conventional scan design would require two multiplexers.
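The effect of a single test point on a functional gate can be illustrated with a minimal Python sketch. It assumes that the test input T is 1 in normal mode and 0 in test mode, and that an OR-type test point is driven by the complement of T; the function names are hypothetical and the polarity convention is an assumption made for illustration only.

```python
def and_test_point(signal: int, t: int) -> int:
    """AND-type test point: transparent when t == 1, forces 0 when t == 0."""
    return signal & t


def or_test_point(signal: int, t: int) -> int:
    """OR-type test point (assumed to use the complement of t): forces 1 when t == 0."""
    return signal | (1 - t)


def or_gate_on_scan_path(scan_data: int, side_input: int, t: int) -> int:
    """Functional OR gate whose side-input goes through an AND test point.

    In test mode the side-input is forced to 0, the sensitizing value of an
    OR gate, so the gate simply passes scan_data.
    """
    return scan_data | and_test_point(side_input, t)


def and_gate_on_scan_path(scan_data: int, side_input: int, t: int) -> int:
    """Functional AND gate whose side-input goes through an OR test point."""
    return scan_data & or_test_point(side_input, t)
```

In test mode (t = 0) both gates behave as wires for the scan data, while in normal mode (t = 1) the original function is unchanged.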
In our method, the cost of inserting a test point is one AND (OR) gate and a connection from the test input T, while converting a flip-flop into a MUXed scan flip-flop requires a multiplexer, a connection from another flip-flop, and a connection from the test input T. Inserting test points is advantageous in terms of area if inserting k test points can successfully establish k more scan paths. A scan path here is defined as a physical path between two flip-flops that can be fully sensitized in the test mode. Moreover, the method of inserting test points can be applied to timing-driven scan design. For example, we can add test points away from the critical paths while still being able to establish scan paths through critical nets.
* This work was conducted when the author was at the University of California, Santa Barbara.
33rd Design Automation Conference. Permission to make digital/hard copy of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage, the copyright notice, the title of the publication and its date appear, and notice is given that copying is by permission of ACM, Inc. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
In this paper, we discuss two applications of the test point insertion technique for scan design. First, we consider the full scan design environment, where the objective is to establish as many scan paths and use as few test points as possible. The advantage of our technique in this application is the reduction of area overhead. Next, we consider the partial scan design environment. The objective is to break cycles without degrading the performance of the design. In partial scan design, flip-flops are selected sequentially by the cycle-breaking algorithms [4, 6, 7], and if timing constraints are not met for converting a selected flip-flop into a MUXed scan flip-flop, the test point insertion technique can be applied to avoid adding test circuitry onto the critical path and thus eliminate timing degradation. Due to the space limitation, some details are omitted. Please refer to [8].
II. REVIEW AND TERMINOLOGY
To improve the ATPG efficiency, a test point insertion technique [9] inserted a set of test-cells into a circuit to improve the observability and controllability of some selected internal signals. The size of a test-cell may be large and the compound effect of adding such cells may result in significant area overhead (a test-cell requires at least one flip-flop and two multiplexers). Our test point is simply a two-input AND or a two-input OR gate, and the purpose of inserting test points is to establish a scan chain, which in turn makes scanned flip-flops fully observable and controllable.
The work in [10, 11, 12] presented algorithms to reduce scan overhead by attempting to merge scan MUXs into the combinational logic during logic synthesis. In [13], a scan design methodology called free-scan was proposed. By setting appropriate values at primary inputs during the test mode, some combinational paths between flip-flops can be sensitized and thus a portion of the scan chain can be established without any DFT overhead. In [14], the concept of embedded scan was proposed and attempts were made to embed the scan multiplexers into the logic immediately preceding the scan flip-flops.
We define some terminology used in the following discussion. A connection is specified by a pair of gates [...] Establishing a scan path between two flip-flops may require more than one test point. The number of side-inputs along a selected combinational path is an upper bound on the number of test points required to establish a scan path through the selected path.
In general, assigning a constant value at a connection (by inserting a test point) may potentially disable more than one side-input because the connection may have multiple fanouts. To efficiently utilize this methodology, we should analyze the circuit's topology and determine the global effect of inserting a particular test point. The objective is to decide at which connections test points should be inserted and what constant values they should carry, so that we can establish as many scan paths as possible with as few test points as possible.
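The global effect of one constant can be seen with a small forward-implication sketch. The netlist encoding (a dictionary from output signal to gate type and input list), the three-valued logic with None for unknown, and all names below are assumptions made for illustration; the sketch handles only the primitive gates AND, OR, NAND and NOR.

```python
from typing import Dict, List, Optional, Tuple

Gate = Tuple[str, List[str]]  # (gate type, input signal names)

CONTROLLING = {"AND": 0, "NAND": 0, "OR": 1, "NOR": 1}
INVERTING = {"AND": False, "NAND": True, "OR": False, "NOR": True}


def eval_gate(gtype: str, values: List[Optional[int]]) -> Optional[int]:
    """Three-valued evaluation: returns 0, 1, or None when still unknown."""
    c = CONTROLLING[gtype]
    if c in values:                       # one controlling input decides the output
        out = c
    elif all(v is not None for v in values):
        out = 1 - c                       # every input is non-controlling
    else:
        return None
    return 1 - out if INVERTING[gtype] else out


def forward_imply(gates: Dict[str, Gate], assigned: Dict[str, int]) -> Dict[str, int]:
    """Propagate assigned constants through the netlist until nothing changes."""
    values = dict(assigned)
    changed = True
    while changed:
        changed = False
        for out, (gtype, ins) in gates.items():
            if out in values:
                continue
            v = eval_gate(gtype, [values.get(i) for i in ins])
            if v is not None:
                values[out] = v
                changed = True
    return values


# Signal n fans out to two AND gates; forcing n = 0 fixes both outputs at once.
gates = {
    "g1": ("AND", ["n", "x"]),
    "g2": ("AND", ["n", "y"]),
    "g3": ("NOR", ["g1", "g2"]),
}
implied = forward_imply(gates, {"n": 0})
```

Here a single test point at n sets g1 and g2 to 0, so any OR-type gate that uses g1 or g2 as a side-input is sensitized by the same insertion.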
A. Test point insertion for full scan design
For a full scan design, the goal is to use the test point insertion technique to establish as many scan paths (through functional logic) with as few test points as possible, and then use the conventional scan conversion (MUX insertion) for the missing scan paths in order to have a connected scan chain. We developed an algorithm, called TPGREED, for this purpose. TPGREED examines the combinational paths between flip-flops in the circuit and then, in a greedy way, sequentially inserts the test points with appropriate values. During the insertion, all the possible candidate locations are sorted according to their potential contribution in establishing scan paths. The details of the algorithm are as follows. Given a sequential circuit, we first build a sparse matrix A, where the entry A_ij represents a set of combinational paths from flip-flop F_i to F_j. Since there might exist a large number of paths in the circuit and in general it is more costly to establish a scan path through a combinational path with a large number of side-inputs, we heuristically limit the number of paths for consideration and record only those paths with a number of side-inputs smaller than a user-specified upper bound Kbound to save computation time.
Given a combinational path p_k in A_ij, let |p_k| denote the number of side-inputs along this path. During the iteration of test point insertion and the forward implication of the assigned constants, side-inputs of p_k may have either sensitizing, controlling or unknown values. If there exists a side-input which has a controlling value, it will be impossible to build a scan path through p_k. We call such a path a nullified path and remove it from A_ij. On the other hand, if there is no side-input with a controlling value, we use w_k to denote the number of side-inputs which have an unknown value. The gain of setting one of the side-inputs to a sensitizing value is 1/w_k. Notice that, for each path p_k, the number |p_k| does not change while w_k decreases during the process. When w_k is reduced to zero, the path p_k successfully becomes a scan path.
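The construction of the path matrix can be sketched as a bounded depth-first enumeration. The data layout below (gates keyed by output signal, flip-flops described by their Q and D signal names) and the function name build_path_matrix are assumptions for illustration, not the data structures of TPGREED.

```python
from collections import defaultdict


def build_path_matrix(gates, ff_q, ff_d, k_bound):
    """Collect combinational paths between flip-flops.

    gates: output signal -> (gate type, list of input signals)
    ff_q:  flip-flop name -> signal driven by its Q output
    ff_d:  flip-flop name -> signal feeding its D input
    Only paths with fewer than k_bound side-inputs are recorded.
    Returns {(Fi, Fj): [(gates on path, side-input signals), ...]}.
    """
    fanout = defaultdict(list)
    for out, (_, ins) in gates.items():
        for s in ins:
            fanout[s].append(out)
    d_to_ff = {d: f for f, d in ff_d.items()}
    matrix = defaultdict(list)

    def dfs(signal, path, sides, src):
        if path and signal in d_to_ff:
            matrix[(src, d_to_ff[signal])].append((tuple(path), tuple(sides)))
        for g in fanout[signal]:
            new_sides = sides + [s for s in gates[g][1] if s != signal]
            if len(new_sides) < k_bound:   # prune paths with too many side-inputs
                dfs(g, path + [g], new_sides, src)

    for f, q in ff_q.items():
        dfs(q, [], [], f)
    return matrix


# F1 drives q1; the path q1 -> g1 -> g2 reaches the D input of F2.
gates = {"g1": ("AND", ["q1", "a"]), "g2": ("OR", ["g1", "b"])}
ff_q = {"F1": "q1"}
ff_d = {"F2": "g2"}
A = build_path_matrix(gates, ff_q, ff_d, k_bound=3)
```

Because the combinational logic between flip-flops is acyclic, the recursion terminates; the bound keeps the number of recorded paths manageable at the price of missing long paths.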
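The bookkeeping for a single path can be written down directly from these definitions. The sketch below assumes each side-input is stored together with the type of the gate it enters, and that unknown signals are simply absent from the value map; the names path_status and path_gain are hypothetical.

```python
CONTROLLING = {"AND": 0, "NAND": 0, "OR": 1, "NOR": 1}


def path_status(side_inputs, values):
    """Classify a path given the constants known so far.

    side_inputs: list of (signal, type of the gate the signal enters)
    values:      signal -> 0 or 1 for signals with a known constant
    Returns ("nullified", None), ("scan_path", 0) or ("open", w_k).
    """
    unknown = 0
    for signal, gtype in side_inputs:
        v = values.get(signal)
        if v is None:
            unknown += 1
        elif v == CONTROLLING[gtype]:
            return "nullified", None      # a controlling value blocks the path
    if unknown == 0:
        return "scan_path", 0             # every side-input is sensitizing
    return "open", unknown


def path_gain(side_inputs, values):
    """Gain 1/w_k of sensitizing one more side-input of an open path."""
    status, w = path_status(side_inputs, values)
    return 1.0 / w if status == "open" else 0.0


sides = [("a", "AND"), ("b", "OR")]
```

For the two side-inputs above, the gain is 0.5 while both are unknown and 1.0 once a is set to 1; setting a to 0 instead nullifies the path.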
Given 
and use one test point to achieve another desired constant. In general, an optimization algorithm is required to decide the optimal input assignment to maximize the number of signals with desired values without inserting test points. We adopt the algorithm described in [13] for this purpose.
C. Overall algorithm
Besides the circuit, users should provide two extra parameters, Kbound and gainbound. The parameter Kbound is used to limit the number of side-inputs for paths considered for establishing scan paths. The parameter gainbound is used to terminate the algorithm when the highest gain computed by Equation 1 for all candidate connections is smaller than gainbound. During the iteration, some scan paths may be established. Besides adding them as a portion of the scan chain, we also have to make sure that the subsequent insertions will not destroy the established scan paths. In our current implementation, after a test point is inserted, we re-compute the gain of inserting a test point at each connection before inserting the next one. This can cause high computation time. One possible solution is an incremental algorithm which only re-computes the gain of the affected connections. We also apply a procedure which determines the values for the primary inputs to reduce the number of required test points (as discussed in Section B).
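The shape of the greedy loop can be sketched as follows. Equation 1 is not reproduced in this text, so the gain used here is a stand-in: the sum of 1/w_k over the open paths that a candidate assignment helps to sensitize. The sketch also places test points only directly at side-inputs and performs no forward implication; the function greedy_insert and its data layout are assumptions, not the TPGREED implementation.

```python
CONTROLLING = {"AND": 0, "NAND": 0, "OR": 1, "NOR": 1}


def greedy_insert(paths, gain_bound):
    """paths: path name -> list of (side-input signal, gate type).

    Returns (inserted test points as (signal, constant), established paths).
    """
    values = {}
    test_points = []
    while True:
        gains = {}
        for sides in paths.values():
            blocked = any(values.get(s) == CONTROLLING[g] for s, g in sides)
            unknown = [(s, g) for s, g in sides if s not in values]
            if blocked or not unknown:
                continue                  # nullified or already a scan path
            for s, g in unknown:
                cand = (s, 1 - CONTROLLING[g])   # sensitizing constant
                gains[cand] = gains.get(cand, 0.0) + 1.0 / len(unknown)
        if not gains:
            break
        best = max(gains, key=gains.get)
        if gains[best] < gain_bound:      # stop when the best gain is too small
            break
        values[best[0]] = best[1]
        test_points.append(best)
    established = [
        name for name, sides in paths.items()
        if all(values.get(s) == 1 - CONTROLLING[g] for s, g in sides)
    ]
    return test_points, established


paths = {
    "p1": [("a", "AND")],
    "p2": [("a", "AND"), ("b", "OR")],
    "p3": [("a", "OR"), ("c", "AND"), ("d", "AND")],
}
points, scan_paths = greedy_insert(paths, gain_bound=0.5)
```

In this toy instance the first insertion sets a to 1, which completes p1, halves the remaining work on p2, and nullifies p3; the second insertion sets b to 0. Since only unassigned signals are ever chosen, established scan paths cannot be destroyed by later insertions in this simplified setting.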
D. Experimental results
We tested the proposed test point insertion method on a number of ISCAS89 and MCNC91 sequential benchmarks. All circuits are optimized by the SIS script.algebraic script and mapped using the technology libraries nand-nor.genlib and mcnc-latch.genlib for minimal area. In the current implementation, we can only handle primitive gates, including AND, OR, NAND, and NOR gates.
The results of test point insertion are shown in Table I. We report the number of flip-flops in the circuit (A), the number of test points inserted (B), the number of test points whose values can be set up freely by primary inputs (C), and the number of scan paths established (D). The CPU time is measured on a SUN SPARC 5 with 128 megabytes of memory. In our experiments, the parameters Kbound and gainbound are set to 10 and 0.5. For example, we inserted 137 test points in circuit s15850 to establish 244 scan paths. Among the 137 test points, we can use primary inputs to set up two of them, so the actual number of required test points is 135. Assuming that the area costs of inserting a multiplexer and a test point are 2 and 1, respectively, the reduction of area overhead will be [2A - ((B - C) + 2(A - D))] / 2A. If we use MUXed D flip-flops, the area overhead can be approximated as 2A. For our method, the term (B - C) represents the number of test points inserted and the term (A - D) represents the number of remaining flip-flops, each of which requires a multiplexer.
The amount of the reduction depends on a circuit's structure, the logic synthesis algorithm, and our test point insertion algorithm. In the case of s35932, as much as 83% reduction in the area overhead can be achieved. The computation time for s38584 is quite high. This is because the number of paths considered in our algorithm is huge (270463). Possible ways to reduce the computation time are to use a smaller Kbound, or to use an incremental algorithm for re-computing the gains as discussed in Section C.
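Under the stated cost model (a multiplexer costs 2 and a test point costs 1), the overhead comparison follows from the quantities A, B, C and D. The sketch below is a worked example with made-up toy numbers, since Table I is not reproduced here; the function name is hypothetical.

```python
def area_overhead_reduction(a, b, c, d, mux_cost=2.0, tp_cost=1.0):
    """Fractional reduction in scan area overhead relative to conventional scan.

    a: flip-flops, b: test points inserted, c: test points replaced by
    primary-input assignments, d: scan paths established through logic.
    """
    conventional = mux_cost * a                        # one MUX per flip-flop
    proposed = tp_cost * (b - c) + mux_cost * (a - d)  # test points + leftover MUXs
    return (conventional - proposed) / conventional


# Toy numbers only: 10 flip-flops, 4 test points (1 set by primary inputs),
# 6 scan paths established through functional logic.
reduction = area_overhead_reduction(a=10, b=4, c=1, d=6)
```

With these toy values the conventional overhead is 20, the proposed overhead is 3 + 8 = 11, and the reduction is 45%.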
IV. TIMING-DRIVEN SCAN PATH DESIGN BY TEST POINT INSERTION
Although partial scan has a lower overhead in terms of area, this may not hold when we consider timing issues. In [7], a timing-driven partial scan flip-flop selection algorithm was proposed. There, a flip-flop with a slack time less than the gate delay of a multiplexer is not allowed for selection, even if it has a high gain for breaking cycles. As a result, the number of selected flip-flops for breaking cycles is usually larger than in the case in which timing issues are not considered. Moreover, there are circuits that have no cycle-breaking solutions without degrading the performance. Here, we enhance the timing-driven partial scan design methodology [7] by combining the cycle-breaking algorithm and the test point insertion method.
If we scan a flip-flop by converting it to a MUXed scan flip-flop when the slack time of the flip-flop is less than the gate delay of a multiplexer, the conversion will result in timing degradation. However, by incorporating the test point insertion technique, we may scan the flip-flop without any timing penalty. Figure 3(a) shows a portion of a sequential circuit, where the bold lines denote a critical path. Scanning the flip-flop F2 by inserting a MUX directly behind F2 will increase the critical delay and result in timing degradation, as shown in Figure 3(b). However, there exists a combinational path from F1 to F2. To make this combinational path F1 → g1 → g2 → F2 a scan path, all the side-inputs, a and c, must have sensitizing values in the test mode. To achieve this, we insert a test point (OR gate) at a. However, we cannot insert a test point at c without degrading the performance, since c is on a critical path. Instead, we can insert a test point (AND gate) at b, which in turn will induce a sensitizing value 0 at c. The insertion of test points at a and b causes no timing violation and establishes a scan path from F1 to F2. The result is shown in Figure 3(c).
The above transformation has one disadvantage. Since the scan path is from F1 to F2, we have to scan F1 too in order to have a connected scan chain. In the partial scan environment, scanning F1 might not help break cycles. Also, there is no guarantee that we can scan F1 without timing degradation. To overcome this problem, we can consider insertion of MUXs as well. The MUXs need not be placed immediately behind the scan flip-flops. We only insert them at connections with enough slack time. If necessary, we may also insert test points at the corresponding side-inputs to sensitize the scan path. For example, in Figure 4(a), we can insert a multiplexer at a and a test point at b to establish a scan path from F1 to F2 (Figure 4(b)).
Notice that, using the above transformation, the predecessor of F2 in the scan chain need not be F1 and could be any other flip-flop.
A. Topological feasibility analysis
Given a flip-flop selected by the cycle-breaking algorithm for scan, we derive the formula in Figure 5 to check if we can scan it without timing degradation. For simplicity, we assume that a gate has one of the following five types: [...] To scan a flip-flop, some connection in its fanin cone has to carry the signal from the scan chain. Such a signal is denoted as scan_in. Also, some connections have to be set to 1 or 0. For example, to convert the circuit in Figure 4(a) to Figure 4(b), we assign a constant value 0 at b and assign a as the scan_in. We define cost(c_i, value) as the area cost of assigning a connection c_i as scan_in, 1 or 0, where value is scan_in, 1 or 0. In Equation 2 of Figure 5, if the slack time of c_i is greater than the gate delay of a multiplexer, we simply insert a multiplexer and the cost is the area of a multiplexer. Otherwise, we recursively check if we can use c_j (one of c_i's fanins) as part of the scan path (assigning it to scan_in) and make the other fanins c_k (k ≠ j) have sensitizing values (assigning them to 1 or 0). Since there may exist multiple solutions, we choose the one with [...] If cost([...], scan_in) is not ∞, we can scan this flip-flop without timing degradation.
The selection of scan flip-flops and the insertion of test points are done sequentially. It is important to keep track of the created scan paths and make sure that the subsequent insertions will not destroy the previous efforts. That is, there are some connections which have constant values associated with them due to the previous insertions. We classify them into two categories: desired constants and side-effect constants. For example, in Figure 6(a), to make the connection d be 0 in the test mode, we can insert an AND gate at c (set c to 0), insert an AND gate at b (set b to 0), or insert an OR gate at a (set a to 1). Assume that the slack times of b and c do not satisfy the requirement, while the slack time of a does. A test point can be inserted at a (Figure 6 [...]) [...] while e = 1 is a side-effect constant. To preserve the efforts of this insertion, the desired constants should not be changed by subsequent test point insertions. On the other hand, we are free to change the constant value of side-effect constants.
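The recursive cost evaluation described earlier in this subsection can be sketched on a tree-shaped fanin region. Figure 5 and Equations 2 to 4 are not reproduced in this text, so the recurrences below are an assumed reading of the prose: insert a MUX or test point where the slack allows it, otherwise recurse into the fanins. The delay and area constants, the function names, and the decision to ignore signal inversion along the scan path are all illustrative assumptions.

```python
import math

INF = math.inf
CONTROLLING = {"AND": 0, "NAND": 0, "OR": 1, "NOR": 1}
INVERTING = {"AND": False, "NAND": True, "OR": False, "NOR": True}
MUX_DELAY, MUX_AREA = 2.2, 2.0    # assumed values
TP_DELAY, TP_AREA = 1.2, 1.0      # assumed values


def cost_const(sig, value, gates, slack):
    """Area cost of forcing sig to a constant in test mode without timing loss."""
    if slack[sig] > TP_DELAY:
        return TP_AREA                    # enough slack: put a test point here
    if sig not in gates:
        return INF                        # primary input or flip-flop output
    gtype, ins = gates[sig]
    cv = CONTROLLING[gtype]
    core = value ^ 1 if INVERTING[gtype] else value
    if core == cv:                        # one controlling fanin is sufficient
        return min(cost_const(i, cv, gates, slack) for i in ins)
    return sum(cost_const(i, 1 - cv, gates, slack) for i in ins)


def cost_scan(sig, gates, slack):
    """Area cost of making sig carry the scan-in signal without timing loss."""
    if slack[sig] > MUX_DELAY:
        return MUX_AREA                   # enough slack: put the MUX here
    if sig not in gates:
        return INF
    gtype, ins = gates[sig]
    sens = 1 - CONTROLLING[gtype]         # sensitizing value for side-inputs
    best = INF
    for j in ins:                         # fanin j carries the scan signal
        others = sum(cost_const(k, sens, gates, slack) for k in ins if k != j)
        best = min(best, cost_scan(j, gates, slack) + others)
    return best


# n2 feeds the flip-flop and is critical (slack 0); a has ample slack.
gates = {"n2": ("AND", ["a", "b"])}
feasible = cost_scan("n2", gates, {"n2": 0.0, "a": 5.0, "b": 1.5})
infeasible = cost_scan("n2", gates, {"n2": 0.0, "a": 5.0, "b": 0.5})
```

In the first case a MUX at a (cost 2) plus an OR-type test point at b (cost 1) gives a total cost of 3; in the second case b has too little slack for any insertion, so the cost is infinite and the flip-flop cannot be scanned without timing degradation in this region.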
There is a problem in using the recursive operations defined above. When a test point is inserted at a connection c, the slack times of gates in c's fanin or fanout cone may be affected. Consequently, the function slack(c_i) is not a constant value but depends on the decisions made in the previous recursions. Taking such updates into account would result in a very complicated recursive process. To simplify this problem, we restrict the application of recursion to the non-reconvergent fanin regions as defined below. With this restriction, we do not have to update the slack times during the recursion and the result is guaranteed to be correct. Notice that since we restrict our solution space to the non-reconvergent fanin region, the obtained solution might be sub-optimal.
Definition 1 Given a connection c, we define its non-reconvergent fanin region to be a set of connections in its fanin cone, so that each connection has exactly one path to c.
See the circuit in Figure 7 for illustration. The dotted region is the non-reconvergent fanin region of the connection c. Although the gate g1 has two fanouts, a and e, there is only one path from g1 to c, passing through a. As a result, the connections a, b and d are in the non-reconvergent fanin region of c. On the other hand, since the gate g3 has two paths to c, the connections j and k are not in the region. The non-reconvergent fanin region of a connection c can be constructed in time linear in its size by using a breadth-first traversal from the connection c toward the inputs.
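Definition 1 can be checked with a simple path-counting sketch. It treats every signal as a node, which is a simplification of the gate-pair connections used in the paper, and it counts paths over the whole fanin cone rather than using the linear-time breadth-first construction mentioned above. The example netlist is invented and only loosely mirrors the situation described for Figure 7.

```python
def non_reconvergent_region(fanins, c):
    """Signals in the fanin cone of c that have exactly one path to c.

    fanins: signal -> list of signals that drive it (missing key = input).
    """
    # Collect the fanin cone of c.
    cone, stack = set(), [c]
    while stack:
        n = stack.pop()
        if n in cone:
            continue
        cone.add(n)
        stack.extend(fanins.get(n, []))

    # Fanout edges restricted to the cone.
    fanouts = {n: [] for n in cone}
    for n in cone:
        for f in fanins.get(n, []):
            fanouts[f].append(n)

    # Count distinct paths from each signal to c (memoized).
    paths = {c: 1}

    def count(n):
        if n not in paths:
            paths[n] = sum(count(m) for m in fanouts[n])
        return paths[n]

    return {n for n in cone if n != c and count(n) == 1}


# g3 reaches c through both m and n, so g3 and its fanins j, k are excluded.
fanins = {
    "c": ["a", "d"],
    "a": ["b"],
    "d": ["m", "n"],
    "m": ["g3"],
    "n": ["g3"],
    "g3": ["j", "k"],
}
region = non_reconvergent_region(fanins, "c")
```

Because every signal in the returned set has a single path to c, a decision made at one of them cannot interact with another branch of the recursion, which is what allows the slack values to be treated as constants.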
B. Timing-driven partial scan algorithm
The overall algorithm integrates a conventional cycle-breaking algorithm [6] and our test point insertion algorithm. The cycle-breaking algorithm used here is originally from [6] and was then modified by [7]. It consists of two major steps: (1) graph reduction and (2) heuristic selection. In the graph reduction step there are five operations. The first three (source operation, sink operation, self-loop operation) are exactly the same as the ones given in [6], while the last two reduction operations (unit-in operation and unit-out operation) are modified to take into account the slack times of the flip-flops. In the heuristic selection step, the algorithm chooses the flip-flop with the maximal sum of fanins and fanouts. For more details, please refer to [6, 7].
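The two steps can be illustrated with a reduced sketch on a flip-flop connectivity graph. It implements only the source and sink operations and the degree-based heuristic selection; the self-loop, unit-in and unit-out operations and all slack-related modifications of [7] are omitted, so this is not the algorithm of [6] or [7], only an illustration of its structure. Function names are hypothetical.

```python
def reduce_graph(succ):
    """Repeatedly delete sources and sinks; they cannot lie on any cycle.

    succ: node -> set of successor nodes (every node must appear as a key).
    """
    g = {u: set(vs) for u, vs in succ.items()}
    changed = True
    while changed:
        changed = False
        indeg = {u: 0 for u in g}
        for vs in g.values():
            for v in vs:
                indeg[v] += 1
        for u in list(g):
            if indeg[u] == 0 or not g[u]:      # source or sink operation
                del g[u]
                for vs in g.values():
                    vs.discard(u)
                changed = True
                break                          # recompute in-degrees
    return g


def break_cycles(succ):
    """Select flip-flops to scan until the reduced graph has no cycles."""
    selected = []
    g = reduce_graph(succ)
    while g:
        indeg = {u: 0 for u in g}
        for vs in g.values():
            for v in vs:
                indeg[v] += 1
        # Heuristic selection: maximal number of fanins plus fanouts.
        best = max(sorted(g), key=lambda u: indeg[u] + len(g[u]))
        selected.append(best)
        del g[best]
        for vs in g.values():
            vs.discard(best)
        g = reduce_graph(g)
    return selected


graph = {"F1": {"F2"}, "F2": {"F3"}, "F3": {"F1", "F4"}, "F4": set()}
chosen = break_cycles(graph)
```

In this example F4 is removed as a sink, the remaining three flip-flops form one cycle, and scanning a single flip-flop breaks it.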
In our algorithm, we examine the topological structure of the given circuit and build the flip-flop connectivity graph G, excluding self-loops. Given a flip-flop ff selected by the cycle-breaking procedure for scan, Equations 2, 3 and 4 are evaluated to find a zero-performance-degradation solution to scan ff in the non-reconvergent fanin region of ff. If such a solution exists, we always find it and return the set of test points. The algorithm then inserts the appropriate MUX, AND or OR gates into the circuit and performs an incremental static timing analysis for the next run. If there exists no zero-performance-degradation solution, it returns NULL, and the algorithm marks this flip-flop and instructs the cycle-breaking procedure to choose another one. It continues until no cycles are left in the resulting graph or all remaining flip-flops have been marked. If there still exist cycles in G, we know there is no zero-performance-degradation solution for this circuit. The algorithm then iteratively selects a flip-flop with minimal timing degradation using equations similar to Equations 2, 3 and 4.
C. Experimental results
We have implemented a prototype system, named TPTIME, based on the SIS-1.2 [15] package. The experimental results for a number of ISCAS89 and MCNC91 sequential benchmarks and the experimental setup are described as follows.
All the circuits are first optimized by the SIS script.delay script and then mapped for minimal delay. Since we target timing-driven partial scan design, it is more reasonable to optimize the original circuits for minimal delay. The longest delay of the optimized circuit is used as the circuit timing constraint. The technology libraries used for mapping are based on nand-nor.genlib and mcnc-latch.genlib from the SIS-1.2 package. We choose nand-nor.genlib because the current implementation can only handle primitive gates. To facilitate test point insertion (adding AND, OR and MUX gates into the circuit), we appended three entries to the technology library in order to perform static timing analysis in SIS-1.2. Each library cell's drive(g) is set to 0.2 and the input capacitive load is set to 1. For example, inserting a multiplexer at a connection will decrease its slack time by 2.2, since its block delay is 2.0 and the extra 0.2 is due to the fanout of the multiplexer. The statistics of the SIS-1.2 optimized circuits are shown in Table II. Notice that the test input T might have many fanouts and consequently its large capacitive load would cause timing problems. Fortunately, in the mission (normal) mode, since the value of T is fixed to 1, the paths from T to test points or MUXs are false paths. Therefore, we should disable the paths originating from T during the static timing analysis.
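The slack arithmetic in this setup can be made explicit with a short worked example. It assumes the simple model implied by the text, in which the delay added by an inserted gate is its block delay plus drive times fanout load; the function names are hypothetical, and only the multiplexer's block delay of 2.0 is taken from the text.

```python
def inserted_gate_delay(block_delay, drive=0.2, fanout_load=1.0):
    """Delay added at a connection by inserting one gate driving one load."""
    return block_delay + drive * fanout_load


def can_insert(slack, block_delay, drive=0.2, fanout_load=1.0):
    """True if the connection's slack exceeds the delay of the inserted gate."""
    return slack > inserted_gate_delay(block_delay, drive, fanout_load)


mux_penalty = inserted_gate_delay(block_delay=2.0)   # 2.0 + 0.2 * 1
```

A connection with slack 3.0 can therefore take a multiplexer and is left with a slack of about 0.8, while a connection with slack 1.0 cannot.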
Three different experiments were performed for each optimized circuit. First, we ran the Lee-Reddy [6] cycle-breaking algorithm (CB), which does not take timing into account. Second, we ran the timing-driven cycle-breaking (TD-CB) algorithm shown in [7]. Third, we ran our program (TPTIME). The results are shown in Table III. For each experiment, we report the number of selected flip-flops, and the area and delay of the resulting circuit. As we can see, without taking timing into account, the first method (CB) selected fewer flip-flops and had smaller area overhead, but all the tested circuits have timing degradation ranging from 2.2% to 16.4%. On the other hand, the timing-driven cycle-breaking algorithm (TD-CB) selected more flip-flops and had a larger area overhead, but the timing degradations for the tested circuits are smaller, ranging from 0.0% to 16.4%.
Our method (TPTIME) incorporates the test point insertion technique to scan timing-critical flip-flops. Compared to CB, TPTIME has a larger area overhead due to the extra AND or OR gates. However, compared to TD-CB, since the number of selected flip-flops is smaller, the area overhead may be less. In terms of timing degradation, our method TPTIME obtains the best results among the three methods. In most cases, there is no timing degradation at all.
In this paper, we propose a low-overhead scan design methodology which employs the test point insertion technique to establish scan paths through the functional logic. Applications for reducing either area or timing overhead are addressed and the experimental results demonstrate its usefulness.
Since the scan path is a part of the combinational logic, it is necessary to test the scan path prior to testing the entire circuit. This can be accomplished by scanning in a sequence of alternating 0's and 1's and scanning them out [13]. If there is any discrepancy between the scan-in and scan-out data, we know that the circuit is faulty. Moreover, by examining the scan-out data, certain faults (in the combinational logic) which affect the correctness of the scan chain can be tested before the application of scan tests.
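The alternating-sequence check can be simulated with a minimal shift-register model. The sketch assumes a non-inverting scan chain and models a fault as one stage stuck at a constant; both the model and the function name shift_through are illustrative assumptions rather than the test procedure of [13].

```python
def shift_through(chain_len, pattern, stuck_at=None):
    """Shift pattern through a scan chain and return the scanned-out bits.

    stuck_at: optional (stage index, value) modelling a stage whose content
    is forced to a constant, e.g. by a fault in the logic on the scan path.
    """
    state = [0] * chain_len
    out = []
    for bit in list(pattern) + [0] * chain_len:   # extra clocks flush the chain
        out.append(state[-1])                     # observe the last stage
        state = [bit] + state[:-1]                # shift by one position
        if stuck_at is not None:
            idx, val = stuck_at
            state[idx] = val
    return out[chain_len:]                        # drop the initial contents


pattern = [0, 1] * 4                              # alternating 0's and 1's
good = shift_through(3, pattern)
bad = shift_through(3, pattern, stuck_at=(1, 0))
```

A fault-free chain returns the pattern unchanged after chain_len clocks, whereas the stuck stage yields a constant stream, so the mismatch reveals the fault before any scan test is applied.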
