Abstract - This paper presents a non-scan design-for-testability (DFT) technique applicable to register-transfer (RT) level data path circuits, which are usually very hard to test due to the presence of complex loop structures. We develop a new testability measure and utilize the RT-level structure of the data path for cost-effective re-design of the circuit to make it easily testable, without having to either scan any flip-flop or break loops directly. The non-scan DFT technique was applied to several data path circuits. Experimental results demonstrate the feasibility of producing non-scan testable data paths, which can be tested at-speed. The hardware overhead and the test application time required for the non-scan designs are significantly lower than those of the corresponding partial scan designs.
I. INTRODUCTION
Several high-level synthesis for testability approaches have been proposed to generate easily testable data paths for both the Built-In Self-Test (BIST) methodology [1, 2, 3] and Automatic Test Pattern Generation (ATPG) methods [4, 5, 6, 7]. However, almost all BIST-based approaches assume a scan design methodology, since random testing is not well suited for sequential circuits. Likewise, almost all ATPG-based high-level synthesis for testability approaches assume the use of scan registers to make the data paths testable. Like the high-level techniques, all existing Register-Transfer (RT) level techniques [8, 9, 10] are scan-based and cannot generate testable data paths without the use of scan.
Scan-based techniques have the disadvantage that the test application time is very large compared to non-scan designs, since the test vectors have to be shifted through the scan chain. Reduction of test application time has been addressed in several ways, such as arranging scan flip-flops in parallel scan chains [11] and reconfiguring scan chains [12]. While the former method is limited by the number of primary inputs and primary outputs of the circuit, the latter approach is limited by the ability of the circuit to be decomposed into a set of kernels. Non-scan DFT techniques, on the other hand, do not scan any flip-flops (FFs), thus eliminating the need to shift test vectors through scan chains and greatly reducing the test application time.
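The test-time gap can be made concrete with a common first-order estimate, used here only as an illustrative assumption rather than a formula from this paper: with one scan chain of L flip-flops, each of N vectors is shifted in (L cycles) and captured (1 cycle), and shifting out each response overlaps the next shift-in, giving N(L+1)+L cycles. Splitting the FFs into c balanced parallel chains, as in [11], shortens each shift to ceil(L/c) cycles. A non-scan design that applies one vector per clock needs about N cycles. The sketch ignores the extra functional cycles a partial-scan test sequence may need; the function names and example numbers are hypothetical.

```python
def scan_test_cycles(num_vectors: int, num_scan_ffs: int, num_chains: int = 1) -> int:
    """First-order clock-cycle estimate for applying num_vectors through scan.

    The scan FFs are assumed to be split into num_chains balanced chains, so
    each shift takes ceil(num_scan_ffs / num_chains) cycles. Every vector is
    shifted in and then captured in one functional cycle; shifting out a
    response overlaps with shifting in the next vector, so only the last
    response needs a separate shift-out.
    """
    chain_length = -(-num_scan_ffs // num_chains)  # ceiling division
    return num_vectors * (chain_length + 1) + chain_length


def nonscan_test_cycles(num_vectors: int) -> int:
    """Non-scan, at-speed test: one vector is applied per clock cycle."""
    return num_vectors


if __name__ == "__main__":
    # Hypothetical numbers, chosen only to show the arithmetic.
    print(scan_test_cycles(1000, 64))                # 65064
    print(scan_test_cycles(1000, 64, num_chains=4))  # 17016
    print(nonscan_test_cycles(1000))                 # 1000
```

Even with four parallel chains, the scan estimate stays well above the one-cycle-per-vector cost of a non-scan test in this hypothetical case.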
However, the biggest disadvantage of scan-based DFT techniques is that scan designs cannot be tested at-speed. This is significant in light of recent studies showing that a stuck-at test set applied at-speed identifies more defective chips than a test set with the same fault coverage applied at a lower speed [13]. These studies motivated researchers to investigate non-scan DFT techniques that make sequential circuits testable by introducing controllability and observability points [14]. The main advantage of non-scan designs is that the test vectors can be applied at-speed.
This paper presents a non-scan DFT technique to make RT-level data paths easily testable, without using scan registers. We assume that the control signals to the data path can be made fully controllable by loading the FFs of the controller with primary input signals, using the technique in [14]. Knowledge of the RT-level structure, as well as of the functions of the RT-level components, is utilized in the proposed DFT approach. Instead of the conventional technique of selecting flip-flops (FFs) to make controllable/observable, outputs of execution units (EXUs) are selected using the EXU S-graph introduced in this paper. We introduce a new testability measure, k-level controllable and observable loops, and demonstrate that it suffices to make all the loops k-level controllable/observable, k > 0, to achieve very high test efficiency. The new testability measure eliminates the need, imposed by traditional DFT techniques, to make all loops directly (0-level) controllable/observable, significantly reducing the required hardware overhead and making the non-scan DFT approach feasible and effective. We introduce the use of dual points, which make one loop controllable while making another loop observable. We present efficient algorithms that add the minimal hardware possible to make all loops in the data path k-level controllable/observable, without the use of scan FFs.
We applied our non-scan DFT technique to several moderately sized data paths. The experimental results demonstrate the effectiveness of the k-level heuristic and of our non-scan DFT technique in designing highly testable data paths with nominal hardware overhead. Besides the main advantage of at-speed testing, the experimental results also demonstrate that the hardware overhead and the test application time required for the non-scan designs are significantly lower than those of the partial scan designs.
Consider the data path of the 4th order IIR cascade filter shown in Figure 1(a), synthesized from a behavioral description using the HYPER high-level synthesis system [15]. The corresponding register S-graph [16, 17, 6] in Figure 1(b) shows the dependencies between the registers of the data path. The register S-graph reveals the existence of several loops involving the registers. As expected, sequential ATPG is very difficult for this data path, as indicated by row Orig in Table 2.
SCAN AND NON-SCAN DFT
The testability of the data path can be improved using partial scan techniques [17], as shown by rows Opus and LR in Table 2. The sequential ATPG program HITEC [19] can achieve 100% test efficiency on the scan designs. Besides their high area overhead, the scan designs have a high test application time, indicated by column Tappl. Also, the scan designs cannot be tested at-speed.
A. EXU S-Graph
Unlike a partial scan DFT technique, a non-scan DFT technique need not restrict the choice of nodes to break (that is, to make controllable/observable) to registers. Also, instead of making the same point both controllable and observable, it may be more cost-effective to make some points controllable and some other points observable. In this section, we introduce the EXU S-graph. We show that in a data path, the outputs of EXUs are better choices for controllable/observable points than registers. Each node in the EXU S-graph represents an EXU in the data path. There exists a directed edge from node u to node v, labeled i, and denoted u -
B. Non-Scan DFT of the IIR Cascade Filter
Next, we proceed with the non-scan DFT of the data path in Figure 1(a). Since A1 and A2 form the minimum feedback vertex set (MFVS) of the EXU S-graph, making the outputs of A1 and A2 controllable/observable would break all the loops directly, that is, make all the loops 0-level controllable/observable: any value at the outputs of A1 and A2 can be controlled and observed in 1 clock cycle (time frame). Compared to the register S-graph solution, which requires making three registers controllable/observable, this solution seems better. However, we show that much less expensive non-scan DFT techniques suffice to make the data path testable.
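A minimum feedback vertex set (MFVS) is a smallest set of nodes whose removal leaves the S-graph acyclic. Finding one is NP-hard in general, but for graphs with only a few EXUs an exhaustive search is adequate. The Python sketch below assumes the S-graph is given as a node set and a list of directed edges; the example graph and its node names (add1, add2, mul1, mul2) are hypothetical and are not the EXU S-graph of Figure 1(c).

```python
from itertools import combinations


def is_acyclic(nodes, edges):
    """Kahn's algorithm: True if the subgraph induced by `nodes` has no cycle."""
    nodes = set(nodes)
    indeg = {v: 0 for v in nodes}
    succ = {v: [] for v in nodes}
    for u, v in edges:
        if u in nodes and v in nodes:  # ignore edges touching removed nodes
            succ[u].append(v)
            indeg[v] += 1
    ready = [v for v in nodes if indeg[v] == 0]
    seen = 0
    while ready:
        u = ready.pop()
        seen += 1
        for v in succ[u]:
            indeg[v] -= 1
            if indeg[v] == 0:
                ready.append(v)
    return seen == len(nodes)  # every node was peeled off only if no cycle remains


def minimum_feedback_vertex_set(nodes, edges):
    """Exhaustive search: smallest node set whose removal breaks every loop."""
    for size in range(len(nodes) + 1):
        for cut in combinations(sorted(nodes), size):
            if is_acyclic(set(nodes) - set(cut), edges):
                return set(cut)


# Hypothetical EXU S-graph: loops add1->add1, add1->mul1->add2->add1, add2<->mul2.
EXAMPLE_NODES = {"add1", "add2", "mul1", "mul2"}
EXAMPLE_EDGES = [
    ("add1", "add1"),
    ("add1", "mul1"), ("mul1", "add2"), ("add2", "add1"),
    ("add2", "mul2"), ("mul2", "add2"),
]

print(sorted(minimum_feedback_vertex_set(EXAMPLE_NODES, EXAMPLE_EDGES)))  # ['add1', 'add2']
```

In this toy graph every loop passes through add1 or add2, so making just those two outputs controllable/observable would be the direct (0-level) solution in the sense used above.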
The EXU S-graph in Figure 1(c) reveals that all loops through A2 are observable, since A2 goes directly to the PO Out. Hence, we need to add only a controllability point to the output of A2, while adding both a controllability and an observability point to the output of A1. Figure 2(a) shows the modified data path of Figure 1(a), with test hardware added (shown in bold) to insert one controllability point at the output of each of A1 and A2, and one observability point from the output of A1. The output of A1 is made controllable by multiplexing it with the PI In. The multiplexor is controlled by the test pin ntest, which is set to "0" during normal operation of the data path and can be set to any value required during the test mode. Similarly, the output of A1 is made observable by multiplexing it with the PO Out, as shown in Figure 2(a). A test efficiency of 100% is achieved on the resulting data path, as shown by row 0-lev in Table 2. The test hardware overhead of the modified data path is 429 cells (5.7% of the original data path), which is less than the overhead of 665 cells needed for the scan designs (rows Opus and LR in Table 2). Besides the main advantage of at-speed testing, the non-scan design requires far fewer clock cycles for test application (column Tappl) than the scan designs.
It is not necessary to make the loops of the data path directly (0-level) controllable/observable. Figure 2(b) shows an alternate testable design, with the non-scan test hardware shown in bold. Instead of adding a controllability point to the output of A2, only a constant ("0", the identity element of the adder) is added to the right register file (RA2) of A2. Any value at the output of A2 can still be justified in at most two time frames. For example, if a value of 9 needs to be justified at the output of A2, in one time frame the registers LA2 and RA2 can be set to 9 and 0, respectively, and one time frame earlier the values of LA2 and RA2 can be justified by In and the constant. Adding the constant requires much less hardware overhead than adding a controllability point at the output of A2, since the multiplexor logic associated with constant signals can be pruned. The loops through A2 are now 1-level controllable. The resulting (1-level controllable/observable) data path shown in Figure 2(b) has less hardware overhead than the 0-level solution shown in Figure 2(a). Also, a high test efficiency of 99% is achieved on the resulting data path, as shown by row 1-lev in Table 2.
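The two-frame justification through A2 can be checked with a small cycle-level simulation. This is a sketch under stated assumptions, not a model of the full Figure 2(b) data path: a 16-bit word, LA2 loaded directly from In, RA2 loaded from the added constant 0 in test mode, and unknown register contents at start-up modeled as None.

```python
from typing import List, Optional

WORD_MASK = (1 << 16) - 1  # assumed 16-bit data path word


def add(a: Optional[int], b: Optional[int]) -> Optional[int]:
    """Combinational adder; None models an unknown (X) operand."""
    if a is None or b is None:
        return None
    return (a + b) & WORD_MASK


def simulate_a2(in_values: List[int]) -> List[Optional[int]]:
    """Output of A2 in each time frame, in test mode.

    Assumed wiring: LA2 loads In directly and RA2 loads the added constant 0.
    """
    la2: Optional[int] = None  # register contents are unknown at start-up
    ra2: Optional[int] = None
    outputs: List[Optional[int]] = []
    for value in in_values:
        outputs.append(add(la2, ra2))  # A2 output during this time frame
        la2, ra2 = value, 0            # clock edge: LA2 <- In, RA2 <- constant 0
    return outputs


print(simulate_a2([9, 0]))  # [None, 9]
```

Applying In = 9 in the first frame produces 9 at the output of A2 in the second frame, so under these assumptions any value reaches the output of A2 within two time frames.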
The data path in Figure 2(c) demonstrates more effectively the benefits of non-scan DFT at the RT level and the new notion of k-level controllable/observable loops. The data path shows the addition of just two constants to the right registers, RA1 and RA2, of the EXUs A1 and A2, respectively. As explained in section III, all the loops in Table 2), as compared to an overhead of 665 cells for the scan designs, 429 cells for the 0-level non-scan design, and 349 cells for the 1-level non-scan design. Nevertheless, the 2-level testable design still has a very high test efficiency of 98%, comparable with the test efficiency achieved by the more expensive scan designs and the 0-level and 1-level non-scan designs.
The non-scan designs and their high test efficiency results demonstrate the feasibility of non-scan DFT schemes, and establish the technique of making loops k-level controllable/observable as an efficient alternative to the traditional DFT technique of breaking all loops directly. The new testability metric is explained in the next section.
III. K-LEVEL CONTROLLABLE/OBSERVABLE LOOPS: A COST-EFFECTIVE DFT APPROACH
We first define k-level controllable/observable nodes. We consider EXUs (their output buses) as the only possible nodes. We introduce non-scan DFT techniques to make nodes k-level controllable/observable.
Next, we define k-level controllable/observable loops in terms of k-level controllable/observable nodes.
Definition 1 An EXU M is k-level controllable/observable if any value on the output of M can be justified/propagated in at most k+1 clock cycles (time frames). Equivalently, for any value that needs to be justified at the output of M, there exists at least one vector sequence of length at most k+1 that justifies the value.
Consider the data path shown in Figure 2(c). The output of A1 is 2-level controllable, as explained below. For example, to justify a value of 15 at the output of A1, in the first time frame LA1 can be set to 15 and RA1 to 0. In the second time frame, the value of RA1 can be immediately justified by the constant. To justify the value of LA1, which is the output of A2, the input registers of A2, LA2 and RA2, are set to 15 and 0, respectively. In the third time frame, RA2 can be justified because of the presence of the constant to RA2. Assuming the constant K4 to M3 is 1, LA2 can be justified by setting In to 15. Similarly, any value at the output of A1 can be justified in 3 time frames, making A1 2-level controllable. Note that without the addition of the constants, the output of A1 is not controllable, as is the case in the original data path of Figure 1(a). It can be similarly shown that the output of A1 is 2-level observable, since any value at the output of A1 can be propagated out in 3 clock cycles.
The output of an EXU, Z, can be made k-level controllable/observable either by a direct scheme or by a register-file based scheme. In the direct scheme, the EXU output is directly multiplexed with a k-level controllable node to make Z k-level controllable. The EXU output is made k-level observable by directly multiplexing it with another node that is k-level observable. Consider the EXU shown in Figure 3(a). Figure 3(b) shows how ALU1 is made k-level controllable and observable using the direct scheme.
In the register-file based scheme, an EXU (output) is k-level controllable if at least one register of each register file of the EXU has a (k-1)-level controllable input, as shown in Figure 3(c). An EXU is k-level observable if it has an interconnect to a register file of another EXU that is (k-1)-level observable and whose other register file has a 1-level controllable input. Figure 3(d) shows how ALU1 is made k-level observable.
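The register-file based rule lends itself to a fixed-point computation: if primary inputs and constants count as 0-level controllable sources, an EXU's level is one more than the worst (over its register files) of the best (over each file's sources) source level. The Python sketch below implements that reading of the controllability rule only. It tracks levels, not data values, so it assumes an identity constant (such as 0 for an adder) is available wherever a register is fed by CONST. The netlist encoding and the names FIG_2C_SKETCH and ORIGINAL_SKETCH are hypothetical, and the example is only loosely modeled on Figure 2(c): the feedback sources of RA1 and RA2 are assumed, and In is treated as reaching LA2 directly (as when K4 = 1).

```python
import math

# Sources that are directly (0-level) controllable: a primary input or a test constant.
DIRECT_SOURCES = {"PI", "CONST"}


def controllability_levels(exus, direct_cp=()):
    """Return the smallest k for which each EXU is k-level controllable.

    exus maps an EXU name to its register files; each register file is a list
    of sources that can load at least one of its registers. A source is "PI",
    "CONST", or the name of another EXU. EXUs in direct_cp have their output
    multiplexed with a primary input (direct scheme), so they are 0-level.
    EXUs that can never be justified keep the level math.inf.
    """
    level = {name: (0 if name in direct_cp else math.inf) for name in exus}
    changed = True
    while changed:  # levels only decrease, so this terminates
        changed = False
        for name, register_files in exus.items():
            # Each register file needs one (k-1)-level source; the slowest file decides.
            slowest = 0
            for sources in register_files:
                best = min(
                    (0 if s in DIRECT_SOURCES else level[s] for s in sources),
                    default=math.inf,
                )
                slowest = max(slowest, best)
            candidate = slowest + 1
            if candidate < level[name]:
                level[name] = candidate
                changed = True
    return level


# [left register file sources, right register file sources] per EXU (assumed wiring).
FIG_2C_SKETCH = {
    "A1": [["A2"], ["A1", "CONST"]],  # LA1 <- A2; RA1 <- A1 or constant 0
    "A2": [["PI"], ["A1", "CONST"]],  # LA2 <- In; RA2 <- A1 or constant 0
}
ORIGINAL_SKETCH = {"A1": [["A2"], ["A1"]], "A2": [["PI"], ["A1"]]}

print(controllability_levels(FIG_2C_SKETCH))    # {'A1': 2, 'A2': 1}
print(controllability_levels(ORIGINAL_SKETCH))  # {'A1': inf, 'A2': inf}
```

With the two constants, the sketch reproduces the levels stated in the text (A2 1-level, A1 2-level); without them, both EXUs stay uncontrollable. Passing direct_cp models the direct scheme, where an EXU output is multiplexed with a primary input.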
Definition 2 A loop is k-level controllable if there is at least one node in the loop which is k-level controllable. A loop is k-level observable if there is at least one node in the loop which is k-level observable.
Definition 3 A data path is k-level testable if all loops in the data path are k-level (or lower) controllable and observable.
Consider the data path shown in Figure 2(c). It has been derived from the data path in Figure 1(a) by adding two constants ("0") to the right registers, RA2 and RA1, of the EXUs A2 and A1, respectively. All loops going through A1 are 2-level controllable and 2-level observable, since A1 is 2-level controllable/observable, as shown before. Similarly, all loops going through A2 are 1-level controllable/observable. Hence, the data path shown in Figure 2(c) is 2-level testable.
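Definitions 2 and 3 can be checked without enumerating loops: every loop contains a node of level at most k exactly when deleting all such nodes leaves an acyclic graph. The Python sketch below applies Kahn's algorithm to that reduced graph. The edge set is an assumption (an edge from u to v whenever the output of u can load an input register of v), and the example levels are the ones stated above for Figure 2(c); the helper names are hypothetical.

```python
def every_loop_hits(nodes, edges, covering):
    """True if every directed loop contains at least one node of `covering`.

    Equivalently, deleting `covering` leaves an acyclic graph, which is
    checked with Kahn's algorithm, so loops never have to be enumerated.
    """
    remaining = set(nodes) - set(covering)
    indeg = {v: 0 for v in remaining}
    succ = {v: [] for v in remaining}
    for u, v in edges:
        if u in remaining and v in remaining:
            succ[u].append(v)
            indeg[v] += 1
    ready = [v for v in remaining if indeg[v] == 0]
    removed = 0
    while ready:
        u = ready.pop()
        removed += 1
        for v in succ[u]:
            indeg[v] -= 1
            if indeg[v] == 0:
                ready.append(v)
    return removed == len(remaining)


def is_k_level_testable(nodes, edges, ctrl_level, obs_level, k):
    """Definition 3: every loop has a node that is at most k-level controllable
    and a node (possibly a different one) that is at most k-level observable."""
    ctrl_ok = {v for v in nodes if ctrl_level[v] <= k}
    obs_ok = {v for v in nodes if obs_level[v] <= k}
    return every_loop_hits(nodes, edges, ctrl_ok) and every_loop_hits(nodes, edges, obs_ok)


NODES = {"A1", "A2"}
EDGES = [("A2", "A1"), ("A1", "A1"), ("A1", "A2")]  # assumed EXU S-graph edges
CTRL = {"A1": 2, "A2": 1}
OBS = {"A1": 2, "A2": 1}

print(is_k_level_testable(NODES, EDGES, CTRL, OBS, k=2))  # True
print(is_k_level_testable(NODES, EDGES, CTRL, OBS, k=1))  # False
```

Under the assumed edges, the self-loop through A1 contains no node of level at most 1, so the data path is 2-level but not 1-level testable, in line with the conclusion above.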
hardware overhead for a dual point is the same as that of a controllability or an observability point. Hence, the dual point solution is less expensive than the 0-level solution shown in Figure 4(b), which employs controllability and observability points. In fact, the hardware overhead of the dual point solution (row 3-lev) is 40% less than that of the 0-level solution, as shown in Table 5. Also, the dual point solution has a very high test efficiency of 99%, shown by row 3-lev in Table 5.
V. ALGORITHMS TO ADD TEST HARDWARE FOR K-LEVEL TESTABLE DATA PATHS
We briefly describe the algorithm that adds the minimal hardware possible to make all loops in the data path k-level controllable and k-level observable, for a user-specified value of k. Since the addition of a controllability point (cp) or an observability point (op) requires a new interconnect and a multiplexor, we consider it always preferable to add constants, rather than a cp or an op, as a means of enhancing controllability and observability. Since the number of loops in the EXU S-graph can be exponential, it is not possible to enumerate them individually. Instead, at each step of the algorithm, we count the number of nodes in all loops (strongly connected components) that have either a controllability level or an observability level higher than required. Note that all nodes in the EXU S-graph have to be considered for the addition of a cp or op, not only the nodes in strongly connected components, as is the case when a minimum feedback vertex set has to be found.
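The node count described above can be computed with strongly connected components (SCCs): a loop has a level greater than k exactly when none of its nodes has level at most k, so the nodes on such loops are the nodes of nontrivial SCCs (or self-loops) that remain after deleting every node of level at most k. The Python sketch below uses Tarjan's algorithm for this; it is an illustration under that reading, not the authors' implementation, and the graph, levels, and function names are hypothetical. The same function serves for controllability or observability levels.

```python
def strongly_connected_components(nodes, edges):
    """Tarjan's algorithm; returns a list of node sets."""
    succ = {v: [] for v in nodes}
    for u, v in edges:
        if u in succ and v in succ:
            succ[u].append(v)
    index, low = {}, {}
    stack, on_stack, components = [], set(), []
    counter = [0]  # mutable so the nested function can update it

    def visit(v):
        index[v] = low[v] = counter[0]
        counter[0] += 1
        stack.append(v)
        on_stack.add(v)
        for w in succ[v]:
            if w not in index:
                visit(w)
                low[v] = min(low[v], low[w])
            elif w in on_stack:
                low[v] = min(low[v], index[w])
        if low[v] == index[v]:  # v is the root of an SCC
            component = set()
            while True:
                w = stack.pop()
                on_stack.discard(w)
                component.add(w)
                if w == v:
                    break
            components.append(component)

    for v in nodes:
        if v not in index:
            visit(v)
    return components


def loop_cost(nodes, edges, level, k):
    """Number of nodes lying on loops none of whose nodes has level <= k."""
    remaining = {v for v in nodes if level[v] > k}
    sub_edges = [(u, v) for u, v in edges if u in remaining and v in remaining]
    self_loops = {u for u, v in sub_edges if u == v}
    cost = 0
    for component in strongly_connected_components(remaining, sub_edges):
        if len(component) > 1 or component & self_loops:
            cost += len(component)
    return cost


INF = float("inf")
NODES = {"add1", "add2", "mul1", "mul2"}
EDGES = [("add1", "add1"), ("add1", "mul1"), ("mul1", "add2"),
         ("add2", "add1"), ("add2", "mul2"), ("mul2", "add2")]

print(loop_cost(NODES, EDGES, {v: INF for v in NODES}, k=1))  # 4
print(loop_cost(NODES, EDGES, {"add1": INF, "add2": 1, "mul1": INF, "mul2": INF}, k=1))  # 1
```

In the second call, making add2 1-level leaves only the self-loop on add1 above the target level, so a candidate that lowers add1's level would be the natural next choice for a greedy step.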
The input to the algorithm is the target data path and the maximum number of allowed cps or ops, specified by the user. The following pseudo code summarizes the heuristic algorithm used. A test point, p, refers to either a controllability point or an observability point.
add-test-points()
  while ∃ a loop whose controllability/observability level > k {
    ...
    add best test point;
    ...
    for each vertex
      ...
    add best constant;
    ...
    update_the_number_of_nodes_in_remaining_SCC();
  }
Both test points and constants are evaluated according to the objective function E(p), where p is the test point or the constant being ( p ) ). The LCM (loop controllability measure) cost is equal to the number of nodes that are in loops whose controllability level is greater than k. Similarly, the LOM (loop observability measure) cost is equal to the number of nodes that are in loops whose observability level is greater than k. Δ denotes the change in the LCM and LOM costs due to the insertion of the candidate test point or constant. Details of the algorithm, including calculating the controllability/observability levels of loops and using dual points, can be found in
Consider the data path of the 4th order IIR parallel filter shown in Figure 4(a). The original data path is highly untestable, as shown by the results of running HITEC (row Orig) in Table 5. A non-scan 0-level testable design, using three controllability points and two observability points, is shown in Figure 4(b). The test hardware added is shown in bold. The non-scan design has a very high test efficiency, as shown by row 0-lev in Table 5.
However, using dual points reduces the test hardware requirement. Adding a constant to the left register of 1+ makes all loops through 1+ 1-level controllable. A dual point added from 1+ to the left register of 3+, and another dual point added from 3+ to the right register of 6+ (with a constant added to the left register of 6+), make the loops through 3+ 2-level controllable and 2-level observable, the loops through 1+ 3-level observable, and the loops through 6+ 3-level controllable. The resulting data path, shown in Figure 4
VI. EXPERIMENTAL RESULTS
We applied the non-scan design-for-testability algorithm to several data path circuits, synthesized from behavioral descriptions [21] using the high-level synthesis system HYPER [15]. In this section, we report the results obtained on the following data paths: (1) 4th order IIR cascade filter (4IIRcas), (2) 5th order elliptical wave digital filter (EWF), (3) 5th order elliptical wave digital filter synthesized using high hardware sharing (EWFhigh), and (4) 4th order IIR parallel filter synthesized using no hardware sharing. Table 1 shows various parameters of the circuits: the word size of the designs (Bits) and the number of adders (Add), multipliers (Mult), registers (Reg), multiplexers (Mux), and interconnects (Inter). The number of cells needed for the final technology-mapped circuit, using the SIS technology mapper [22] and the lib2.genlib standard cell SCMOS 2.0 library [23], is reported in column Area.
Table 1
Circuit              Bits  Add  Mult  Reg  Mux  Inter  Area (Cells)
4IIRcas              20    2    3     12   12   20     7486
EWF                  16    3    3     23   29   20     6968
EWFhigh              20    1    1     18   23   6      1538
4th order IIR par.   20    6    6     23   0    23     9757
The results of applying partial scan and non-scan DFT techniques to the data paths in Table 1 are reported in Tables 2, 3 The experimental results also validate the effectiveness of the k-level controllable/observable loops measure introduced in this paper. The results show that it is not necessary to make all the loops directly (0-level) controllable/observable to achieve high test efficiency, as evidenced by the very high test efficiency reported for the k-level testable data paths, k > 0. Most significantly, the experimental results demonstrate the feasibility of producing highly testable data paths that can be tested at-speed, without the use of scan.
VII. CONCLUSIONS
