A timing-driven pseudo-exhaustive testing of VLSI circuits by Chang, S. C.
A Timing-Driven Pseudo-Exhaustive Testing 
of VLSI Circuits 
S. C. Chang and J. C. Rau 
Department of Computer Science and Information Engineering 
National Chung-Cheng University 
Chiayi, Taiwan, R. 0. C. 
Abstract 
The object of this paper is to reduce the delay penalty of 
bsc insertion for pseudo-exhaustive testing. We first propose a 
tight delay lower bound algorithm which estimates the mini- 
mum circuit delay for each node after bsc insertion. By 
understanding how the lower bound algorithm lose optimal- 
ity, we can propose a bsc insertion heuristic which tries to 
insert bscs so that the final delay is as close to the lower 
bound as possible. Our experiments show that the results of 
our heuristic are either optimal because they are the same as 
the delay lower bounds or they are very close to the optimal 
solutions. 
1 Introduction 
Exhaustive testing, one method of Built-In Self-Test 
(BIST) [1][2][3], becomes very attractive because that the 
method can guarantee the detection of all non-redundant 
combinational faults. Unfortunately, the test time of exhaus- 
tively testing a combinational circuit increases exponentially 
to the number of primary inputs (PIS) of the circuit, which 
makes this test method unpractical for circuits with large PIS. 
To pseudo-exhaustively test a circuit using 2k pattems, some 
bypass storage cells (bscs) may be inserted so that the depen- 
dency of each node is less than or equal to the pre-determined 
value k called the dependency constraint. A bsc is transparent 
in the normal mode and acts as both pesudo-input and 
pseudo-output in the test mode. The bsc structure is presented 
in Fig. 1. During the normal mode, the signal, test is set to 0 
and all bscs become transparent; that is, they do not affect the 
circuit’s function. In each cycle of the test mode (test = I ) ,  a 
test pattern is serially shifted to the flip-flops of bscs, and the 
output responses is directed to the output response analyzer 
(ORA) which is analyzed later. 
 to ORA 
to next gate 
to next bsc 
normal data 
test pattern 
(from previous bsc) 
tkst 
Fig. 1 : The bsc structure. 
However, adding bscs can increase the circuit size. Most 
previous work [5][  1111 151 [ 161 attempts to reduce the number 
of bscs under a given dependency constraint. Basically, based 
on the area consideration, the bsc insertion methods, for 
pseudo-exhaustive testing can be divided into two categories. 
One uses the constrained partitioning strategy which parti- 
tions a circuit into several disjoint sub-circuits [5][15][16]. 
The other uses the un-constrained partitioning strategy which 
makes no such constraint [ 111. In this paper, our strategy of 
bsc insertion is based on the first one. 
In addition to the area overhead of inserted bscs, the 
inserted bscs may be placed on critical paths and therefore 
can worsen the circuit delay. Consider the circuit of Fig. 2(a) 
and assume the path a->g->j->k->l is critical. Fig. 2(b) 
shows a solution for pseudo-exhaustively testing the circuit. 
Despite the inserted bscs do not affect circuit’s original func- 
tion in the normal mode, the circuit delay is increased clue to 
the placement of two bscs (bl and b2) on the critical path. 
Instead, if we insert bscs as in Fig. 2(c), where only one bsc 
(bl)  is placed on the critical path, the critical path delay ks less 
than the delay in Fig. 2(b). Note that the maximum depen- 
dency size of both circuits is 3. :b 0 
0 
0 
Fig. 2 : Bsc insertion under the dependency constraint 3.  
The purpose of this paper is to minimize the delay penalty 
resulted from inserting bscs under a given dependency con- 
straint. Our algorithm consists of two phases. In the first 
phase, we discuss an algorithm which finds a lower bound of 
the minimum delay of a node after inserting bscs. With the 
lower bound information and the understanding of how the 
optimality may lose, we then develop a heuristic to guide bsc 
insertion so that the delay result (after inserting bscr;) is as 
close to the lower bound as possible. Our experimental results 
show that, for many benchmark circuits, our heuristic 
11-668 
Authorized licensed use limited to: Tamkang University. Downloaded on March 23,2010 at 23:29:54 EDT from IEEE Xplore.  Restrictions apply. 
achieves the same delay results as the lower bounds; i.e., opti- 
mal solutions. 
This paper is organized as follows: Section 2 describes the 
delay model with the associated delay computation. Section 3 
describes an algorithm compting a delay lower bound for 
each node under the dependency constraint. The approxima- 
tion algorithm to guide the bsc insertion process is developed 
in Section 4. Finally, the experimental results and conclusions 
are given in Sections 5 and 6. 
2 Delay Model and Delay Computation 
In this section, we describe the delay model of bsc inser- 
tion. We said edge e (v-> U) is an (immediate)fanin edge of U 
and node v is an (immediate) fanin node of U. In addition, 
node w or edge e is a transitive fanin of node U if there exists 
a path from w or e to U. The set of nodes which are transitive 
fanins of node v is referred to as the input cone of node v, 
Cone(v). When a bsc is inserted on edge e, our delay model 
assumes that one constant delay penalty, d(bsc), is added to 
the delay of edge e. Consider again the circuit in Fig. 2(a). 
Assuming that edge delay d(e) and bsc delay penalty d(bsc) 
both are I ,  and we can obtain the delay of each node shown 
outside the circle. In Fig. 2(a), without inserting any bsc, the 
delay of node k is 3 because the longest path from PIS to k is 
a->g->j->k which contains 3 edge delays. In Fig. 2(c), after 
inserting bscs bl, bZ, and b,, the delay of node k becomes 4 
because of adding bsc delay. 
3 Delay Lower Bound Computation 
The proposed algorithm consists of two phases: the label- 
ing phase and the bsc insertion phase. We first give some def- 
initions used in our discussion. For simplicity, we assume that 
a bsc has been inserted on the fanout stem of each PI and let 
the dependency set of node n, denoted dep(n), be the set of 
bscs on which node n depends and Idep(n)l be the size of 
dep(n). The bscs of dep(n) can be viewed as a “cut set” which 
separates node n from the PIS. If Idep(n)l I k, the set of nodes 
on the fanout stems of which the bscs of dep(n) are inserted 
are called a k-cutset of node n. For example, in Fig. 2(c), the 
bsc set { bl, b2, b3} is node 1’s dependency set and the node 
set ci, h, i} is node 1’s 3-cutset. 
The label of node n,  l(n), is the (estimated) minimum delay 
of node n under dependency constraint k. Initially, all PIS’ 
labels are set to 0 and then we compute the label of each node 
in the topological order from PIS to POs. Hence, before com- 
puting the label of node n, all the labels of nodes in node n’s 
input cone have been already done. Note that there are many 
possible k-cutsets for node n. To determine the minimum 
delay of node n, i.e., the label, we need to find an “optimal” k- 
cutset for node n. In the following, we discuss an important 
property: the possible range of a node’s label can be deter- 
mined from the labels of its (immediate) fanin nodes. 
Definition 1: The best-label, bl(n), of node n is defined as 
bl(n) = MAX{ l(ni)+d(ei) I where ei (ni-> n) is a fanin edge of 
Lemma 1: The label l(n) of node n has the property: 
bl(n) I l(n) I bl(n)+d(bsc). (EQ 1) 
Proof. omitted. 
We now use an example to illustrate Lemma 1. Consider 
the circuit of Fig. 3, where the number outside a node is the 
corresponding node’s label which we are about to compute. 
Again, let the delay of an edge, d(e), and the bsc delay pen- 
alty, d(bsc), be both 1. Here, we assume the dependency con- 
straint is 4 .  Initially, the labels of all the PIS are set to 0. Our 
labeling algorithm will start from PIS to POs. For node h, it is 
easy to see that there is no need to insert bscs for node h so 
the label of h is equal to 1. Similarly, for nodes i, j ,  k ,  and 1, 
we have the labels l(i) = I ,  10) = 1, l(k) = 2 ,  and l(1) = 3. For 
node m which has 7 PIS in its transitive fanins, we do need to 
insert bscs for m. Since m’s immediate fanin nodes are 1 and j ,  
from Definition 1, we have bl(m) = MAX{ l(l)+d(e), l(j)+d(e)} 
= MAX{S+I, I + I }  = 4.  Then, by EQ 1, we can find out that 
the range of m’s label is 4 = bl(m) I l(m) 5 bl(m)+d(bsc) = 5. 
Hence, there are only two choices for m’s label, i.e., either 
l(m) = 4 or l (m) = 5. In later discussion, we will show that 
there does not exist a solution which makes the delay of m to 
be 4 under the dependency constraint 4,  so m’s IabeI must be 
c 
0 
0 
0 
0 
0 
0 
The dependency constraint = 4 
ow Fig. 3 : The tirmng feasible nodes for bl(m) = 4. 
From Lemma 1, we can obtain a possible range of node 
n’s label, I(n), whose best value is bl(n) and worst value is 
bl(n)+d(bsc). Without losing generality, let us assume that the 
delay penalty d(bsc) and the delay d(e) of each edge e are 
integers. If the delay penalty d(bsc) is I ,  according to EQ 1, 
we can have only two choices for node n’s label l(n): one is 
bl(n) and the other is bl(n)+l. Therefore, the problem of find- 
ing node n’s label has been changed to the yeslno problem of 
determining whether bl(n) is achievable. Note that bl(n)+l is 
always achievable by inserting bscs on the fanin edges of 
node n. Luckily, this yes/no problem has an approximate 
solution using the maximal-flow minimal-cut (MFMC) algo- 
rithm. Our algorithm can be easily extended when d(bsc) and 
d(e) are not integers. 
Let d(c->n) be the longest path delay from c to n without 
involving any bsc delay penalty. If node c is selected into an 
optimal k-cutset of node n, we have l(c)+d(c->n)+d(bsc) I 
l(n) in Fig. 4. Therefore, to achieve the best-label bl(n) of 
node n, if node c (other than PIS) is selected into the k-cutset, 
its label, l(c), must be less than or equal to the value of bl(n)- 
d(c->n)-d(bsc). In addition, because a PI can be considered as 
a bsc, we do not need to insert bsc at the fanout of a PI. We 
11-667 
Authorized licensed use limited to: Tamkang University. Downloaded on March 23,2010 at 23:29:54 EDT from IEEE Xplore.  Restrictions apply. 
An optimal k-cutset Cut for node n 
Fig. 4 : Label dete&nation. 
define a node whose label satisties EQ 2 and EQ 3 to be a tim- 
ing feasible node for bl(n). 
(EQ 2 )  
1@i) I bl(n)-d@i->n) (EQ 3) 
In order to achieve l(n) = bl(n), each node in the k-cutset must 
be a timing feasible node for bl(n). Note that, for each node c 
in the input cone of node a, the values of l(c), d(c->n), d(bsc) 
and bl(n) can be pre-computed. Hence, we can easily deter- 
mine whether node c is timing feasible for bl(n). 
Retum to Fig. 3 where nodes h, i, j ,  k, and 1 have been 
labeled and label of m is under consideration. Since 10’) = 1 
and 1(1) = 3, according to Definition 1 and EQ 1, we have 
bZ(m) = 4 and 4 I I(m) I 5. In order to achieve l(m) = bl(m) = 
4 ,  we can find that node j is a timing feasible node because 
10) I bZ(m)-d(i->m)-d(bsc), i.e., 1 14-1-1. Similarly, node i is 
also a timing feasible node for bl(m) = 4. On the other hand, 
node 1 is not a timing feasible node for bZ(m) = 4 because 1(1)> 
bl(m)-d(l->m>-d(bsc), i.e., 3 > 4-1-1. Also, all PIS which are 
considered as bscs are timing feasible from EQ 3. In Fig. 3, 
all timing feasible nodes for l(m)=4 are marked. 
Now, let us check the feasibility of bl(n) for node n’s 
label. Assume all the labels of nodes in n’s input cone 
Cone(n), except n, are already computed. Also, by applying 
EQ 2 and EQ 3, we can find all timing feasible nodes in 
Cone(n). With these information, we are ready to use the 
h4FMC algorithm to resolve the yesho problem, i.e., to 
decide whether there is a k-cutset containing only timing fea- 
sible nodes for bl(n). To apply the MFMC algorithm, we 
transform Cone(n) into a weighted graph as follows. We first 
add two nodes, the source node S and the destination node D. 
We also add the edges connecting S to each PI and the edges 
connecting node n to D. Then, the weight of each timing fea- 
sible node for bl(n) is set to 1 and others to infinite. When 
applying the MFMC algorithm on this transformed graph, the 
MFMC algorithm tries to select a cutset whose total weight is 
minimum. The weights which we assign on the transformed 
graph cause the MFMC algorithm to select (cut) only timing 
feasible nodes because the weights of timing feasible nodes 
are much smaller than others. In addition, the MFMC algo- 
rithm will try to find.the minimum number of possible timing 
feasible nodes to form a cutset. If the size of the selected cut- 
set is greater than k, it means that there does not exist a k-cut- 
set which contains only timing feasible nodes, i .e.,  l(n) = bl(n) 
is not feasible. On the other hand, if the cutset size is not 
greater than k, though in many cases, I(n) = bl(n) is feasible, 
I(c) I bl(n)-d(c->n)-d(bsc) if c is not a PI 
there still exists some case, it is NOT feasible. We detail this 
discussion in Section 4.1. 
4 Bsc Insertion 
The bsc insertion process is iteratively performed from POs 
to PIS. Initially, we put all POs into a processing list. In leach 
iteration, we select a node n with the maximal label from the 
list. After finding a k-cutset for node n,  we insert bscs on the 
fanout stems of nodes in the k-cutset. Then, node n is 
removed from the processing list and the nodes in the k-cutset 
are added into the list. This process continues until there is no 
node in the processing list. 
In the following subsections, we discuss two effects, the 
bsc chain effect and the interference effect, which may cause 
some node’s delay to be larger than its label after bsc inser- 
tion. Also, these two effects are explained in details and the 
heuristics to alleviate these two effects are also discussecl. 
4.1 The Bsc Chain Effect 
If the cut size retumed from MFMC is greater than k ,  we 
set I(n) = bl(n)+l. On the other hand, if the cut size from 
MFMC is not greater than k, we set l(n) to bZ(n). However, the 
assignment of Z(n) = bl(n) for the later case (cut size I k) may 
be too optimistic. The reason is as follows. Even though each 
node in the k-cutset suggested by h4FMC is timing feasible, 
all together they may not be timing feasible for l(n) = bl(n). 
For example, assume all labels have been obtained ancl con- 
sider to insert bscs for node 1 in Fig. 5(a) where the depen- 
dency constraint is 3. The MFMC algorithm may return the 3- 
cutset g, j )  for 1 as shown in Fig. 5(b). Since the cutset size 
is equal to the dependency constraint, 3, the labeling algo- 
rithm sets 1(1) to 5. Note that, in the configuration of this 3- 
cutset, node j is in the fanout of nodef so there are two bscs 
in the critical path a->f->j->i->l. In this special case, after 
inserting bscs for this 3-cutset, the delay of node 1 is 6 (= 
2*d(bsc)+4*d(e)) which is larger than its label, 5. In this 
example, the 3-cutset has the configuration that f is in the 
fanin of j .  If the configuration of the k-cutset for some node n 
contains more than one node in a path as in the example, then 
the labeling algorithm may wrongly estimate the delay of 
node n. We call such situation the bsc chain effect. However, 
there may exist other k-cutsets which can cause n’s delay to 
be the same as its label. 
Fig. 5 : An example to illustrate the bsc insertion algorithm. 
Lemma 2: Suppose the labels of all the nodes in n’s input 
cone are the same as the optimal delays under the dependency 
constraint, k. If the k-cutset of node n suggested by the 
MFMC algorithm has no bsc chain effect, then n’s label l(n) 
set by the labeling algorithm is optimal. 
11-666 
Authorized licensed use limited to: Tamkang University. Downloaded on March 23,2010 at 23:29:54 EDT from IEEE Xplore.  Restrictions apply. 
ISCAS 2000 - IEEE International Symposium on Circuits and Systems, May 28-31, 2000, Geneva, Switzerland 
Proof. omitted. 
4.2 The Interference Effect 
In this section, we discuss the other effect, the interference 
effect, which may also cause some node’s delay solution from 
the MCMF algorithm to be larger than its label. Basically, the 
interference effect comes from ttie sequential processing of 
nodes’ labels. During the labeling phase, we evaluate a node’s 
label only based on the labels of nodes in its input cone. How- 
ever, a node’s label may be interfered by the k-cutsets of 
nodes outside its input cone. Consider the two input cones of 
nodes nl and n2 in Fig. 6(a). Suppose nl is processed first and 
we find a k-cutset Cl for node n l .  Then, we process node n2 
and also find a k-cutset C2 for n2. Because of additional bscs 
from C2 are inserted inside nl’s input cone, the delay of n1 
may be increased accidentally. Therefore, during the labeling 
phase, ignoring the effect of the k-cutsets of nodes outside 
nl’s input cone may cause nl’s label to be too optimistic. Fig. 
6(b) is another case which also causes such effect. Similar to 
the bsc chain effect, it is possible that there exist other k-cut- 
sets which do not have the interference effect such as Fig. 6(c) 
and Fig. 6(d). 
Fig. 6 : Interference effect between different cutsets. 
5 Experimental Results 
In this experiment, we assume that the delays of all wires 
and bscs are 1; i.e., d(e) = d(bsc) = I .  Our experimental 
results are shown in Table 1 with the dependency constraint 
of 20. Column one gives the name of each circuit. Column 
two shows the critical path delay without inserting any bscs. 
Column three shows the largest label among all POs. The 
largest label of a circuit is a lower bound of the critical path 
delay. After inserting bscs by the algorithm, column four 
shows the circuit delay and column five shows the number of 
bscs inserted. We also re-implement the algorithm [I61 and 
column six shows the results of delay and column seven 
shows the results of bscs needed. For example, in Table 1, the 
critical path delay of (331.5 is originally 49 without inserting 
bscs. Under the dependency constraint of 20, the largest label 
is 50. Our heuristic requires 68 bscs to achieve the delay of 50 
which must be an optimal solution. We also highlight our 
results which are the same as the lower bounds, i.e. optimal 
solutions. 
6 Conclusions 
In this paper, we first presented an estimation formula to 
estimate each node’s label, i.e., the lower bound of its delay. 
Since the bsc chain effect and the interference effect may 
occur during the bsc insertion process, the delay of some 
nodes may not achieve their labels by the formula. Then we 
also explored the reasons which make the two effects occur. 
To alleviate these effects, we further proposed some heuristics 
on which a bsc insertion algorithm was developed based. 
Finally, the experimental results of ISCAS85 benchmark cir- 
cuits show that our heuristic algorithm can achieve or be very 
close to the labels (optimal solution). 
Table 1: Dependency constraint = 20 
-0 
7 References 
[ 11 M. Abramovici, M. A. Breuer and A. D. Friedman, Digital Sys- 
tem Testing and Testable Design, Computer Science Press, New 
York, 1990. 
[2] V. D. Agrawal and S .  C. Seth, Test Generation for V U 1  Chips, 
IEEE Computer Society, 1988. 
[3] P. H. Bardell, W. H. McAnney and J. Savir, Built-In Test for 
V a l :  Pseudo-random Techniques, John Wiley and Sons, New 
York, 1987. 
[4] Z.  Barzilai, J. Savir, G. Markowsky and M. G. Smith, “The 
Weighted Syndrome Sums Approach to VLSI Testing”, IEEE 
Trans. on Computers, vol. C-30, no. 12, pp.996-1000, Dec. 
1981. 
[5] S. N. Bhatt, F. R. K. Chung and A. L. Rosenburg, “Partitioning 
Circuits for Improved Testability”, Proc. of 4th MIT Conj on 
Advanced Research in VLSI, pp. 91-106, Apr. 1986. 
[6] J. Cong and Y. Ding, FlowMap: An Optimal Technology Map- 
ping Algorithm for Delay Optimization in Lookup-Table Based 
P G A  Designs, IEEE Trans. on CAD of Integrated Circuits and 
Systems, vol. 13, no. 1, Jan. 1994. 
[7] T. Cormen, C. Leiserson, and R. Rivest, Algorithms, Cambridge, 
MA: MIT press, 1990. 
[SI N. Deo, Graph Theory with Applications to Engineering and 
Computer Science, Prentice Hall, Inc., 1974. 
[9] S. Devadas, A. Ghosh and K. Keutzer, h g i c  Synthesis, McGraw- 
Hill, Inc., 1994. 
[lo] L. R. Ford and D. R. Fulkerson, Flows in Networks, Princeton, 
NJ: Princeton Univ. Press, 1962. 
[l 11 W. B. Jone and C. A. Papachristou, “A Coordinate Circuit Parti- 
tioning and Test Generation Method for Pseudo-Exhaustive 
Testing of VLSI Circuits”, IEEE Trans. on CAD, vol. 14, no. 3, 
[12] E. J.  McCluskey, “Verification Testing - A Pseudoexhaustive 
Test Technique”, IEEE Trans. on Computers, vol. C-33, no. 6, 
pp. 541-546, Jun. 1984. 
[13] E. J. McCluskey and S. Bozorgui-Nesbat, ‘‘ Design for Autono- 
mous Test”, ZEEE Trans. on Computers, vol. C-30, no. 11, pp. 
[ 141 G. D. Micheli, Synthesis and Optimization of Digital Circuits, 
McGraw-Hill, Inc., 1994. 
[ 151 M. W. Roberts and P. K. Lala, “An Algorithm for the Partition- 
ing of Logic Circuits”, IEE Proc., vol. 131, pt. E, no. 4, Jul. 
1984. 
[ 161 R. Srinivasan, S. K. Gupta and M. A. Breuer, “An Efficient Par- 
titioning Strategy for Pseudo-Exhaustive Testing”, Proc. of 
Design Automation Conf, pp. 242-248, Jun. 1993. 
pp. 374-384, Mar. 1995. 
866-875, NOV. 198 1. 
0-7803-5482-6/99/$10.00 92000 IEEE 
11-665 
Authorized licensed use limited to: Tamkang University. Downloaded on March 23,2010 at 23:29:54 EDT from IEEE Xplore.  Restrictions apply. 
