Zhiqiang
Introduction
Non-scan built-in self-test (BIST) is a promising approach that can realize at-speed testing with a short application time. However, existing BIST schemes have excessive hardware overheads. Moreover, the excessive power dissipation during these BIST schemes constitutes a considerable problem in some applications.
The techniques in [1], [2] propose a test synthesis and scheduling algorithm under power constraints for BISTed register-transfer level (RTL) data paths. These techniques, which use adjacent non-scan BIST [3], may exhibit high hardware overheads due to the use of an excessive number of reconfigured registers.
Masuzawa et al. [4] propose a BIST methodology for RTL data paths that uses a boundary non-scan BIST scheme. The approaches in [5], [6] improve the method in [4] by introducing concurrent testing, which exploits time division between existing test pattern generators (TPGs) so that two different input ports of a module can share the same TPG. However, these previous works did not consider the problem of power dissipation during test. More specifically, when these methods try to excite a single module under test (MUT) and observe its test response, multiple modules and registers that are not adjacent to the data path of the MUT also dissipate power. As a result, the accumulated power dissipation is quite high. For some applications, this high power dissipation is unacceptable. Furthermore, the hardware overheads of these methods are high. As we show in this paper, lower hardware overheads are achievable while still limiting the power dissipation during test. In [3], TPGs and response analyzers (RAs) are placed not only at the chip boundary, but also inside the data path itself. We continue to use this approach in this paper.
†† The author is with Nara National College of Technology, Yamatokoriyama-shi, 639-1080 Japan.
††† The author is with New Jersey Institute of Technology (NJIT), USA.
a) E-mail: you-z@is.naist.jp b) E-mail: yamaguti@info.nara-k.ac.jp c) E-mail: kounoe@is.naist.jp d) E-mail: savir@njit.edu e) E-mail: fujiwara@is.naist.jp
DOI: 10.1093/ietisy/e88-d.8.1940
In this paper, we introduce two power-constrained DFT algorithms. The first uses a boundary non-scan BIST scheme that focuses on achieving a low hardware overhead (referred to in this paper as "problem 1"). This scheme is therefore more efficient in reducing the hardware overhead than previously described methods. The second algorithm is based upon a general non-scan BIST scheme that explores possible trade-offs between hardware overhead and test application time under power constraints (referred to in this paper as "problem 2"), rather than considering only one such factor, as previously published power-constrained methods do.
This paper is organized as follows. Section 2 introduces some basic concepts, such as the data path digraph, and outlines the problems to be solved. Section 3 addresses the power constraints for problem 1 and shows algorithms for performing the test while meeting the given constraints. Section 4 addresses the same issues for problem 2. Section 5 reports experimental results using our proposed schemes. Section 6 concludes with a brief summary.
Preliminaries

The Data Path Digraph
A data path [4] consists of hardware elements and lines. Hardware elements, in this context, include primary inputs (PIs), primary outputs (POs), registers (Rs), multiplexers (MUXes), and functional modules (Ms) that have any number of input ports and one output port. Since the multiplexing function can be embedded within an M, we will use the term M in this wider sense of its capability (including multiplexing). Input patterns enter the circuit through the PIs, and exit through the POs. Input values enter into a hardware element through its input ports, and exit through its output port. For any given data path, we assume that every non-constant input port of any M has at least one path from some PI, and every output port of any M has at least one path to some PO.
Copyright © 2005 The Institute of Electronics, Information and Communication Engineers
Similar to the definition in [4], we define a data path digraph $G = (V, A)$, with $V = V_H \cup V_{IN} \cup V_{OUT}$, as follows.
- $V_H = V_M \cup V_R \cup V_{OTH}$ is the set of nodes that correspond to all hardware elements in the data path, where $V_M$, $V_R$ and $V_{OTH}$ are the sets of nodes that represent modules, registers and other hardware elements, respectively.
- $V_{IN}$ is the set of nodes that correspond to all input ports in the data path.
- $V_{OUT}$ is the set of nodes that correspond to all output ports in the data path.
Note that in a digraph, each PI or PO corresponds to a pair of nodes, and not to a single node. For example, Fig. 1 shows a data path fragment with its associated digraph.
An input port $i_j \in V_{IN}$ is an input port of a node $u_M \in V_M$, such that they are connected together by an arc in $A_2$. We denote the arc outgoing from node $u_M$ by $e_M$, and the head node of an arc $e$ by $h_e$. The sequential depth of a path is the number of register elements along the path.
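The notion of sequential depth can be illustrated with a minimal Python sketch. The node names and the `KIND` table are hypothetical, and port nodes are omitted for brevity, so this is only an illustration of the counting rule and not the data structure used in the paper.

```python
from typing import Dict, List

# Hypothetical hardware-element nodes and their kinds (illustrative only).
KIND: Dict[str, str] = {
    "PI1": "other",
    "R1": "register",
    "M1": "module",
    "R2": "register",
    "PO1": "other",
}


def sequential_depth(path: List[str], kind: Dict[str, str]) -> int:
    """Return the number of register elements along a path of nodes."""
    return sum(1 for node in path if kind[node] == "register")


# A path PI1 -> R1 -> M1 -> R2 -> PO1 crosses two registers.
path = ["PI1", "R1", "M1", "R2", "PO1"]
depth = sequential_depth(path, KIND)
```

Here `depth` counts `R1` and `R2`; a path that crosses no register has sequential depth zero.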
Definitions
We define the following two concepts.
Definition 1:
A data path is boundary non-scan BIST-able if each module M in the data path can be tested as follows.
There exists a TPG for each input port of M, and an RA (response analyzer) for the output port of M, such that:
(I-i). TPGs and RAs are placed only at PIs and POs, respectively.
(I-ii). There are paths that propagate test patterns generated by the TPGs to the input ports of M, and test responses of M to the corresponding input ports of the RA, concurrently, without any conflict of control signals.
(I-iii). For any two input ports of any M, test patterns can be propagated to them either from two different TPGs, or from the same TPG, provided that the paths leading to these two ports have different sequential depths.
Notice that we allow test patterns to be propagated through a module M using its thru input function, if such a function exists. Thus, a module with a thru input can be operated in a transparent mode to pass test patterns generated upstream to other components downstream.
In Definition 1, the control signals include select signals for MUXes, hold inputs for registers, and thru inputs for functional modules.
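Condition (I-iii) can be checked mechanically once each input port of a module has been assigned a TPG and a control path. The following Python sketch assumes a hypothetical `assignment` mapping from port name to a `(tpg, sequential_depth)` pair; the function name and data layout are illustrative and do not come from the paper.

```python
from typing import Dict, Tuple


def satisfies_sharing_rule(assignment: Dict[str, Tuple[str, int]]) -> bool:
    """Check condition (I-iii) for one module.

    `assignment` maps each input port to (tpg, sequential_depth).
    Two ports may share a TPG only if their depths differ, so no
    (tpg, depth) pair may occur twice.
    """
    seen = set()
    for tpg_and_depth in assignment.values():
        if tpg_and_depth in seen:
            return False
        seen.add(tpg_and_depth)
    return True


different_tpgs = satisfies_sharing_rule({"in1": ("TPG1", 1), "in2": ("TPG2", 1)})
shared_ok = satisfies_sharing_rule({"in1": ("TPG1", 1), "in2": ("TPG1", 2)})
shared_bad = satisfies_sharing_rule({"in1": ("TPG1", 1), "in2": ("TPG1", 1)})
```

The first two assignments are acceptable (different TPGs, or one TPG reached at different depths); the third would feed both ports the same pattern in the same cycle and is rejected.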
Definition 2:
A data path is non-scan BIST-able if each module M in the data path can be tested as follows.
There exists a TPG for each input port of M, and an RA for the output port of M, such that property (II-i) below and properties (I-ii) and (I-iii) of Definition 1 hold.
(II-i). TPGs and RAs can be placed at PIs and POs respectively, and any register inside the data path can be a candidate for augmentation into a TPG or an RA.
In boundary non-scan BIST, and non-scan BIST schemes, we categorize the different types of control paths that propagate test patterns from TPGs to the inputs of a module under test. We distinguish, therefore, between the following cases:
Type 1: A control pattern can be chosen such that no two input ports of M share a TPG.
Type 2: Some input ports share a TPG through paths of different sequential depths.
Type 3: Some input ports share a TPG, and the control path for one of these input ports passes through another input port and the output port of the same module (see Fig. 2).
An observation path propagates test responses from the output port of a module to an RA. In the sequel, we will refer to both control paths and observation paths simply as test paths.
Problem Description
Two problems have been formulated in [3] and are repeated here. Let $f_H(\mathrm{HOH}, \mathrm{TAT})$ be a test overhead cost function such that $f_H(h_1, t_1) < f_H(h_2, t_2)$ if $h_1 < h_2$, or if $h_1 = h_2$ and $t_1 < t_2$. The "hardware" argument reflects hardware overhead (HOH), and the "time" argument reflects test application time (TAT).
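The ordering induced by this cost function is lexicographic: hardware overhead is compared first, and test application time only breaks ties. A minimal Python sketch follows; the function name `f_h_less` and the sample values are illustrative assumptions.

```python
def f_h_less(h1: float, t1: float, h2: float, t2: float) -> bool:
    """Return True if f_H(h1, t1) < f_H(h2, t2) under the stated ordering."""
    return h1 < h2 or (h1 == h2 and t1 < t2)


# Lower hardware overhead wins even when the test time is much longer.
lower_hw_wins = f_h_less(10, 500, 12, 100)
# With equal hardware overhead, the shorter test time wins.
tie_broken_by_time = f_h_less(10, 100, 10, 200)
# The ordering coincides with Python's built-in tuple comparison.
same_as_tuple = f_h_less(10, 100, 10, 200) == ((10, 100) < (10, 200))
```

Because the ordering matches tuple comparison, candidate designs could be ranked simply by sorting `(HOH, TAT)` pairs.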
Problem 1:
Minimize the hardware overhead of a given data path under a boundary non-scan BIST scheme, and derive a test schedule, subject to a given power constraint. Stating it more formally,
Given:
• Input: a data path and a peak power dissipation limit $P_{max}$.
Task:
• Output: a boundary non-scan BIST-able data path and a test schedule that satisfies $P_{max}$ and that achieves the objective below.
• Objective: minimization of $f_H(\mathrm{HOH}, \mathrm{TAT})$, i.e., minimize hardware overhead.
In order to achieve this task we are allowed to add DFT elements, such as linear feedback shift registers (LFSRs), multiple-input signature registers (MISRs), test MUXes (T MUXes), hold functions for registers, and thru-functions for functional modules.
Problem 2: Given a design parameter α, design a non-scan BIST-able data path and a test schedule under a given power constraint. More formally,
Given:
• Input: a data path, a co-optimization ratio α (0 ≤ α ≤ 1), and a peak power dissipation limit $P_{max}$.
Task:
• Output: a non-scan BIST-able data path and a test schedule that satisfies $P_{max}$ and that achieves the objective below.
• Objective: minimization of a cost that weighs hardware overhead against test application time according to α.
In order to achieve this task, we are allowed to add DFT elements, such as built-in logic block observers (BILBOs) [7], concurrent BILBOs (CBILBOs) [8], LFSRs, MISRs, T MUXes, hold functions for registers, and thru-functions for functional modules.
Power Constrained DFT Algorithm for Problem 1
Algorithm Description
This algorithm consists of the following three phases.
Phase 1.
Convert the given data path to a boundary non-scan BIST-able one using the following steps.
If $e$ is a critical arc of $u_M$, we say that $u_M$ is dominated by $e$.
The hardware area of a T MUX is usually higher than that of a module-embedded thru-function. There are, however, instances where only T MUXes can be used to establish the desired testability. These instances occur when there is a need to eliminate critical arcs. We, therefore, consider adding a minimum number of T MUXes into the data path only when it is necessary.
Theorem 1:
If all modules have thru-functions for their input ports, a data path is boundary non-scan BIST-able if and only if (iff) there does not exist a critical arc in its associated digraph.
If more than one module is dominated by a critical arc, the order in which we handle these modules plays a key role in reducing the overall hardware overhead. To determine this order, we introduce notions that reflect the relationship between two dominated modules, called a downstream module (DSM) and an upstream module (USM). From the above definition, the following theorem follows.
Theorem 2:
If M is the USM of M′, the critical arcs of both M and M′ can be eliminated by introducing a T MUX to add a path from one PI to some other input port of M. Similarly, if M is a DSM of M′, the critical arcs of both M and M′ can be eliminated by introducing a T MUX to add a path from the output port of M to some PO.
Figure 3 illustrates how to eliminate a critical arc. From Definition 3 and the original data path digraph (Fig. 3 (a)), we find that both modules, $M_2$ and $M_3$, have one critical arc $e$ in Fig. 3 (a). $M_2$ is the predecessor of $M_3$; in other words, $M_2$ is the USM of $M_3$. Therefore, according to Theorem 2, the addition of a T MUX ($M_4$ in Fig. 3 (b)) to establish a path from $PI_1$ to one input port of $M_2$ eliminates the critical arc $e$ for both modules. The data path digraph after adding the T MUX for $e$ is shown in Fig. 3 (b).
The problem of adding a minimum number of T MUXes to eliminate critical arcs is equivalent to the minimum prime-implicant covering problem, which is known to be NP-hard. We therefore use a greedy algorithm: we select the dominated module for which adding an extra path eliminates the critical arc(s) of the maximum number of dominated modules. We repeat this selection until all the critical arcs in the data path are eliminated.
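The greedy selection described above can be sketched as a standard greedy set cover. In this Python sketch the input `covers` is a hypothetical precomputed table: `covers[m]` is the set of dominated modules whose critical arcs would be eliminated by adding an extra path (a T MUX) at module `m`. How that table is derived from Theorem 2 is not modeled here, and ties are broken alphabetically only to keep the result deterministic.

```python
from typing import Dict, List, Set


def greedy_tmux_selection(covers: Dict[str, Set[str]]) -> List[str]:
    """Pick modules at which to add a T MUX until every dominated
    module has its critical arc eliminated (greedy set cover)."""
    uncovered: Set[str] = set().union(*covers.values())
    chosen: List[str] = []
    while uncovered:
        # Candidate that fixes the most still-dominated modules.
        best = max(sorted(covers), key=lambda m: len(covers[m] & uncovered))
        gained = covers[best] & uncovered
        if not gained:
            break  # nothing left that any candidate can fix
        chosen.append(best)
        uncovered -= gained
    return chosen


# Hypothetical example: a T MUX at M2 also fixes its downstream module M3.
covers = {"M2": {"M2", "M3"}, "M3": {"M3"}, "M5": {"M5"}}
chosen = greedy_tmux_selection(covers)
```

In this example two T MUXes suffice, because the one added at `M2` also resolves `M3`. As with any greedy cover, the result is not guaranteed to be minimum.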
Thru-Function Addition
After adding the necessary T MUXes, we consider adding a minimum number of thru-functions, whose hardware overhead is usually lower than that of a T MUX, in order to achieve boundary non-scan BIST-ability. First, we add some necessary thru-functions as described in the following theorem.
Theorem 3: If there exists a module M that is an immediate successor of another functional module M′, then a thru-function must be added to M′ in order to test M.
After adding the necessary thru-functions, the data path in question may still not be boundary non-scan BIST-able. We may therefore need to add some more thru-functions. In Fig. 4 there is no critical arc. However, a thru-function from Q to PO needs to be added in order to facilitate vector propagation through module $M_2$.
Control Paths and Observation Paths Determination
After the thru-function addition, the data path is boundary non-scan BIST-able. We now determine the control paths and the observation path for each module as shortest paths under power weights.
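A shortest power-weighted path can be found with Dijkstra's algorithm, as in the following Python sketch. This is an assumption about how such a search could be done, not the paper's stated procedure: the graph layout, the node names, and the choice to attach the power weight to each arc (for example, the power of the element the arc enters) are all hypothetical.

```python
import heapq
from typing import Dict, List, Tuple

# graph[u] = list of (v, power_weight); weights here are hypothetical.
Graph = Dict[str, List[Tuple[str, float]]]


def min_power_path(graph: Graph, source: str, target: str) -> Tuple[float, List[str]]:
    """Dijkstra's algorithm: return (total power, node list) of the
    least-power path, or (inf, []) if the target is unreachable."""
    dist: Dict[str, float] = {source: 0.0}
    prev: Dict[str, str] = {}
    heap: List[Tuple[float, str]] = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == target:
            break
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        for v, w in graph.get(u, []):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))
    if target not in dist:
        return float("inf"), []
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    path.reverse()
    return dist[target], path


graph: Graph = {
    "PI1": [("R1", 2.0), ("M1", 5.0)],
    "R1": [("M2", 1.0)],
    "M1": [("M2", 1.0)],
}
cost, path = min_power_path(graph, "PI1", "M2")
```

In the example, the route through the register `R1` is preferred over the route through the more power-hungry module `M1`.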
Bypassing Overly Power Consuming Paths
In a boundary non-scan BIST scheme, TPGs and RAs are placed only at PI and PO sites, respectively. Therefore, some modules may end up having long test paths, thus dissipating an excessive amount of power. If some modules have long test paths that dissipate more power than $P_{max}$, we try to bypass some of them by inserting T MUXes. In this case, if two or more modules share a portion of their test paths (sub-paths), these modules might be able to share the added bypass as well. In this stage, we search for a minimum number of common sub-paths such that, once they are bypassed, the underlying modules satisfy the given power constraints. This problem is also equivalent to the minimum prime-implicant covering problem. We therefore use a greedy algorithm, where we always select the common sub-path that, if bypassed, yields the maximum reduction in the sum of powers for the modules involved. Finally, we add the T MUXes needed to bypass the identified sub-paths.
Test Scheduling
We proceed to obtain the test incompatibility graph, defined similarly to that given in [9]. Since modules can share TPGs and parts of control paths, the power dissipated in these LFSRs and in the shared parts of these control paths need not be accounted for repeatedly when considering all modules under test. We therefore introduce the following concept.
Definition 6: Essential power dissipation is:
i. the power consumed by the module itself and its associated observation path, if the test path of the module is either of type 1 or of type 2.
ii. the power dissipated in the tested module, its associated observation path, and the feed-around portion of its control path, if the test path of the module is of type 3.
For example, the hardware elements on the bold lines of Fig. 5 (line feeding the RA and the feedback line) dissipate essential power for the module and its type 3 path.
After bypassing the overly power-consuming sub-paths, we create the incompatibility graph. In this graph, the nodes are the tested modules, and edges exist only between incompatible modules. We extend the scheduling algorithm from [10] for concurrent testing of multiple modules. In [10] the power is evaluated as the sum of the powers consumed by the individual logic blocks. In our extended algorithm, two important features come to light:
a. By sharing control paths among different tested modules, we decrease the total consumed power.
b. If two modules activate secondary paths off their main test paths, and these paths reach different ports of the same MUX, the activity at the MUX cannot be stopped, so the total power consumed is larger than the sum of the powers of their individual stand-alone paths.
The approach in [10] schedules blocks based on the "necessary" power dissipation. Here we consider "unnecessary" power dissipation, as well as essential power dissipation.
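The effect of feature (a) on scheduling can be sketched in Python as a greedy packing of modules into test sessions, where the power of shared resources is counted once per session. This is a simplified illustration under stated assumptions and not the extended algorithm of [10]: the function names, the input tables, and the largest-first ordering are hypothetical, and the extra MUX activity of feature (b) is not modeled.

```python
from typing import Dict, FrozenSet, List, Set


def session_power(session: List[str], essential: Dict[str, float],
                  shared: Dict[str, Set[str]],
                  resource_power: Dict[str, float]) -> float:
    """Essential power of every module plus each shared resource once."""
    used: Set[str] = set()
    for m in session:
        used |= shared[m]
    return sum(essential[m] for m in session) + sum(resource_power[r] for r in used)


def schedule(essential: Dict[str, float], shared: Dict[str, Set[str]],
             resource_power: Dict[str, float],
             incompatible: Set[FrozenSet[str]], p_max: float) -> List[List[str]]:
    """Greedily place each module in the first compatible session
    whose power stays within p_max; open a new session otherwise."""
    sessions: List[List[str]] = []
    for m in sorted(essential, key=lambda x: (-essential[x], x)):
        for s in sessions:
            if any(frozenset((m, other)) in incompatible for other in s):
                continue
            if session_power(s + [m], essential, shared, resource_power) <= p_max:
                s.append(m)
                break
        else:
            if session_power([m], essential, shared, resource_power) > p_max:
                raise ValueError(f"module {m} alone exceeds the power limit")
            sessions.append([m])
    return sessions


essential = {"M1": 20.0, "M2": 15.0, "M3": 10.0}
shared = {"M1": {"TPG1"}, "M2": {"TPG1"}, "M3": {"TPG2"}}
resource_power = {"TPG1": 10.0, "TPG2": 10.0}
incompatible = {frozenset(("M1", "M3"))}
sessions = schedule(essential, shared, resource_power, incompatible, p_max=50.0)
```

In this example `M1` and `M2` fit into one session only because the shared `TPG1` is counted once (45 units instead of 55), while `M3` is placed in a second session because it is incompatible with `M1`.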
Power Constrained DFT Algorithm (Tabu Search-Based) for Problem 2
Figure 6 summarizes the tabu search-based algorithm [11]. Line 1 starts with an initial solution, taken as the solution for Problem 1. Lines 3-19 are the heart of the optimization process. For every register and functional module, we try every possible move† that is not in the tabu list (lines 4-5). After a move, if the data path $D_i$ is non-scan BIST-able, we proceed to schedule the test ($S_i$). If it meets the power constraints, we compute the test application time ($T_i$) and hardware overhead ($H_i$) (lines 6-9). Here, we treat the internal test registers as either PIs or POs, depending on whether they are used to generate values or capture responses. We then search for a solution†† $S_k$ that minimizes the value of the cost function $\alpha \cdot H_i + (1-\alpha) \cdot T_i$, and set $S_{current} = S_k$. This move is then recorded in the tabu list (line 15). If this solution turns out to be the best one so far, we set $S_{best} = S_k$. The algorithm ends when either the maximum number of iterations ($N_{itr1}$) is reached, or the number of iterations since the last obtained best solution exceeds some predetermined value ($N_{itr2}$).
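The overall control flow of such a search can be sketched in Python as follows. This is a generic tabu search skeleton on a toy problem, not a transcription of Fig. 6: the function names, the fixed-length tabu list, and the toy `evaluate` function are assumptions, and the BIST-ability and power checks are represented by a caller-supplied `is_feasible` predicate.

```python
from collections import deque
from typing import Callable, Deque, Iterable, Tuple

Solution = Tuple[int, ...]


def tabu_search(initial: Solution,
                moves: Callable[[Solution], Iterable[int]],
                apply_move: Callable[[Solution, int], Solution],
                is_feasible: Callable[[Solution], bool],
                evaluate: Callable[[Solution], Tuple[float, float]],
                alpha: float, n_itr1: int, n_itr2: int,
                tabu_len: int = 2) -> Solution:
    """Minimize alpha*H + (1-alpha)*T over feasible solutions."""
    def cost(s: Solution) -> float:
        h, t = evaluate(s)
        return alpha * h + (1 - alpha) * t

    current = best = initial
    tabu: Deque[int] = deque(maxlen=tabu_len)  # recently used moves
    since_best = 0
    for _ in range(n_itr1):
        candidates = []
        for mv in moves(current):
            if mv in tabu:
                continue
            nxt = apply_move(current, mv)
            if is_feasible(nxt):
                candidates.append((cost(nxt), mv))
        if not candidates:
            break
        _, mv = min(candidates)  # best non-tabu neighbor, even if worse
        current = apply_move(current, mv)
        tabu.append(mv)
        if cost(current) < cost(best):
            best, since_best = current, 0
        else:
            since_best += 1
            if since_best > n_itr2:
                break
    return best


# Toy problem: bit i says whether hypothetical register i is reconfigured.
def flip(s: Solution, i: int) -> Solution:
    return s[:i] + (1 - s[i],) + s[i + 1:]

def all_moves(s: Solution) -> Iterable[int]:
    return range(len(s))

def toy_evaluate(s: Solution) -> Tuple[float, float]:
    return 10.0 * sum(s), 100.0 - 30.0 * sum(s)  # (H, T), made-up numbers

best_hw = tabu_search((1, 1, 1), all_moves, flip, lambda s: True,
                      toy_evaluate, alpha=1.0, n_itr1=20, n_itr2=5)
best_time = tabu_search((0, 0, 0), all_moves, flip, lambda s: True,
                        toy_evaluate, alpha=0.0, n_itr1=20, n_itr2=5)
```

With `alpha=1.0` the toy search removes all test hardware, and with `alpha=0.0` it adds all of it to shorten the test time, which mirrors the role of α as a co-optimization ratio.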
Experimental Results
We have conducted experiments on the data paths of LWF [4], Paulin [12] and Tseng [13]. Table 1 shows the characteristics of these data paths. Columns #Pi, #Po, #R, #Mux, and #M denote the number of PIs, POs, registers, MUXes and functional modules, respectively. Columns "Bit" and "Area" denote the bit-width and the equivalent area as synthesized and reported by the Synopsys Design Compiler.
We first treat modules of type 1 test paths. Let $T_M$ be …
† A move is a general term for adding or removing thru-functions in a module, reconfiguring a register into a BILBO or CBILBO, adding a hold function to a register, or removing some previously added hardware.
†† A solution is a complete test schedule with established values for TAT, HOH, and the resulting power.
The test application times of a module with a test path of type 2 or type 3 are assumed to be $T_{type2} = 1.5\,T_{type1}$ and $T_{type3} = 2\,T_{type1}$, respectively. Let $P_u$ be a standard unit of power. Using the technique in [14], we further assume that the power dissipations for MUX ($P_M$), AND gate ($P_{\&}$), OR gate ($P_{|}$), register ($P_{Reg}$), adder ($P_{+}$), subtractor ($P_{-}$), multiplier ($P_{*}$), constant-input multiplier ($P_{*}$), BILBO ($P_{BIL}$), and CBILBO ($P_{CBIL}$) are …
… the power-driven optimization TCSC (PTCSC) methods. TCSC is our previous methodology [6]. We have extended it here mainly in order to save power by assigning fixed values to unused control signals. Columns α, $P_{max}$, Pow, HOH and TAT are the co-optimization ratio, peak power dissipation limit, actual peak power dissipation, hardware overhead, and test application time, respectively.
Notice that for a fixed $P_{max}$, the hardware overhead decreases as α increases. By the same token, the test application time increases as α increases. There is, therefore, a trade-off between HOH and TAT. Notice also that when $P_{max}$ increases, the hardware overhead and test application time both decrease due to a potentially higher test activity. If we relax the peak power dissipation limit, we can use this relaxation to schedule more modules in a given test session or, equivalently, we may need less hardware to test the modules in a given test session.
In Table 4, for the case of α=1 and $P_{max}$=60, notice that PCTSP2 has a lower hardware overhead than PCTSP1. This is because in the non-scan BIST scheme we can add more kinds of DFT elements, which makes the approach more hardware-efficient. For cases other than α=1, the results are nearly the same.
In Tables 2-4, when $P_{max}$ is large enough, the hardware overheads of PCTSP1 and PCTSP2 (for α=1) are lower than that of PTCSC. This shows that our methodology is more efficient, even when there are no power constraints.
Conclusions
This paper proposed two power-constrained DFT algorithms for two non-scan BIST schemes for RTL data paths. The first proposed algorithm is for a boundary non-scan BIST scheme. Experimental results have shown that this method is efficient in achieving a low hardware overhead. The second algorithm is for a generic non-scan BIST scheme. We use a tabu search algorithm to explore the solution space. Experimental results presented here show that it can co-optimize the hardware overhead, test application time, and power dissipation. A chip designer may utilize these trade-offs to prioritize one such parameter over the rest.
