Efficient Wrapper/TAM Co-Optimization for Large SOCs by Vikram Iyengar et al.
Vikram Iyengar (1), Krishnendu Chakrabarty (1) and Erik Jan Marinissen (2)
(1) Department of Electrical & Computer Engineering, Duke University, Durham, NC 27708, USA
    {vik,krish}@ee.duke.edu
(2) Philips Research Laboratories, 5656 AA Eindhoven, The Netherlands
    erik.jan.marinissen@philips.com
Abstract
Core test wrappers and test access mechanisms (TAMs) are important components of a system-on-chip (SOC) test architecture. Wrapper/TAM co-optimization is necessary to minimize the SOC testing time. Most prior research in wrapper/TAM design has addressed wrapper design and TAM optimization as separate problems, thereby leading to sub-optimal results. We present a fast heuristic technique for wrapper/TAM co-optimization, and demonstrate its scalability for several industrial SOCs. This extends recent work on exact methods for wrapper/TAM co-optimization based on integer linear programming and exhaustive enumeration. We show that the SOC testing times obtained using the new heuristic algorithm are comparable to the testing times obtained using exact methods. Moreover, the CPU time is reduced by more than two orders of magnitude compared to exact methods. Furthermore, we are now able to design efficient test access architectures with a larger number of TAMs.
1 Introduction
The general problem of system-on-chip (SOC) test integration in-
cludes the design and optimization of wrapper/TAM architectures
and test scheduling. Test wrappers form the interface between cores
and TAMs, and TAMs transport test data between SOC pins and test
wrappers [15]. Test scheduling determines the order in which tests
are applied. We focus here on wrapper/TAM co-design to minimize
testing time under TAM width constraints. Wrapper/TAM design is
challenging because (i) wrapper and TAM optimization must be car-
ried out in conjunction [8], (ii) TAMs must be designed to minimize
testing time under the constraint of limited chip I/Os available for
testing, and (iii) wrapper/TAM co-optimization techniques must be
scalable for industrial SOCs containing not only a large number of
cores with hundreds of I/O terminals and scan chains, but also a large
number of TAMs.
Most prior research has either studied wrapper design and TAM
optimization as independent problems [1, 4, 5, 12], or not addressed
the issue of sizing the TAMs to minimize SOC testing time [14].
Alternative approaches that combine TAM design with test schedul-
ing [9, 13] do not address the problem of wrapper design and its re-
lationship to TAM optimization. New techniques for wrapper/TAM
co-optimization are therefore needed to minimize testing time under
TAM width constraints. Such techniques should be scalable for SOCs
that employ a large number of TAMs.
The first integrated method for wrapper/TAM co-optimization was proposed in [8]. TAM optimization was carried out by enumerating over the different partitions of TAM width as well as over the number of TAMs on the SOC. Integer linear programming (ILP) was used to calculate the optimal core assignment and resulting testing time for each partition. A drawback of this approach is that the wrapper/TAM designs considered in [8] are limited to a small number of TAMs in order to maintain feasible compute times. However, if the total number of TAM wires on the SOC is large, the testing time can often be reduced by increasing the number of TAMs, for two reasons. First, when there are multiple TAMs of different widths, a larger number of cores can be assigned to TAMs whose widths match the cores' own test data requirements; thus the number of unnecessary (idle) TAM wires assigned to cores is reduced. Second, multiple TAMs provide greater test parallelism, thereby decreasing total testing time. The methods in [8] are therefore inadequate for large industrial SOCs.

(Footnote: This research was supported in part by the National Science Foundation under grant number CCR-9875324 and by an IBM Graduate Fellowship.)
In [8], four problems structured in order of increasing complexity were formulated, such that they serve as stepping stones to the problem of wrapper/TAM co-optimization for SOCs. We first review these four problems.
1. $P_W$: Design a wrapper for a given core, such that the core testing time is minimized, and the TAM width required for the core is minimized.
2. $P_{AW}$: Determine (i) an assignment of cores to TAMs of given widths, and (ii) a wrapper design for each core such that SOC testing time is minimized. (Item (ii) corresponds to $P_W$.)
3. $P_{PAW}$: Determine (i) a partition of the total TAM width among the given number of TAMs, (ii) an assignment of cores to the TAMs, and (iii) a wrapper design for each core such that SOC testing time is minimized. (Items (ii) and (iii) together correspond to $P_{AW}$.)
These three problems lead up to $P_{NPAW}$, the more general problem, described as follows.
4. $P_{NPAW}$: Determine (i) the number of TAMs for the SOC, (ii) a partition of the total TAM width among the TAMs, (iii) an assignment of cores to TAMs, and (iv) a wrapper design for each core, such that SOC testing time is minimized. (Items (ii), (iii) and (iv) together correspond to $P_{PAW}$.)
The above four problems are all NP-hard [8]. Therefore, efficient heuristics are needed for large problem instances.
In this paper, we present heuristics to effectively solve the wrapper/TAM co-optimization problems reviewed above for large SOCs containing multiple TAMs. As in [8], we use the test bus model for TAMs. To solve $P_W$, we use the Design_wrapper algorithm presented in [8]. We then describe a new algorithm to solve problem $P_{AW}$ efficiently. The solution to $P_{AW}$ thus obtained is a good approximation of the optimal solution for a given TAM width partition. An efficient technique for TAM partition enumeration using solution-space pruning is used to obtain the TAM width partition and number of TAMs with the lowest testing time (problems $P_{PAW}$ and $P_{NPAW}$). This yields an intermediate solution to $P_{NPAW}$. In the final step, an exact mathematical programming model is used to optimize the final assignment of cores and the SOC testing time. This two-step approach allows us to apply our methods to design effective wrapper/TAM architectures for large industrial SOCs. We show that while the SOC testing times obtained using the new heuristic algorithm are comparable to the testing times obtained using exact methods, the CPU time needed is reduced by over two orders of magnitude. This is especially important since in some cases, the minimum SOC testing time is obtained for a larger number of TAMs, which we could not compute earlier in a reasonable amount of execution time.
2 New algorithm for core assignment
The first problem $P_W$ is that of designing an optimal wrapper for the I/O terminals and internal scan chains of a core, such that the core testing time is minimized. To solve $P_W$, we use an algorithm based on the Best Fit Decreasing (BFD) heuristic for the Bin Packing problem [6]. Our Design_wrapper algorithm (proposed earlier in [8]) has two priorities: (i) minimizing core testing time, and (ii) minimizing the TAM width required for the test wrapper. These priorities are achieved by balancing the lengths of the wrapper scan chains designed, and identifying the number of wrapper scan chains that actually need to be created to minimize testing time. Priority (ii) is addressed by the algorithm since it has a built-in reluctance to create a new wrapper scan chain, while assigning core-internal scan chains to the existing wrapper scan chains.

The second problem $P_{AW}$ is that of assigning cores to TAMs of given widths. An ILP model was developed to solve $P_{AW}$ exactly in [8]. The CPU time for this ILP model was reasonably short for a single execution and optimal solutions for the core assignment problem were easily obtained. However, $P_{AW}$ is an NP-hard problem, and execution times can get high for problem instances larger than those encountered in [8]. Furthermore, solutions to the problems $P_{PAW}$ and $P_{NPAW}$ were obtained by enumerating optimal ILP solutions to $P_{AW}$ for each TAM width partition on the SOC. As a result, CPU times for $P_{PAW}$ and $P_{NPAW}$ were found to be prohibitively large, especially for an industrial SOC. Moreover, during execution of the ILP model of $P_{AW}$, values of SOC testing time $\mathcal{T}$ are calculated only at the end of each ILP iteration. Therefore, execution of the ILP model cannot be halted prematurely if it is discovered that the testing time of a TAM has already exceeded the best-known value $\mathcal{T}$ calculated earlier. A faster algorithm for $P_{AW}$ that produces efficient results in polynomial time is therefore clearly needed.

Here, we present a new heuristic algorithm for $P_{AW}$ based on the relationship between TAM width and core testing times calculated for $P_W$. The testing time of Core $i$ when connected to a TAM $j$ of width $w_j$ is denoted by $T_i(w_j)$. We design the heuristic for $P_{AW}$ based on an approximation algorithm for the problem of scheduling $N$ independent jobs (tests) on $B$ parallel, equal processors (TAMs) [3]. The pseudocode for our heuristic Core_assign is presented in Figure 1. Intuitively, in each iteration the algorithm calculates the summed testing time on each TAM by adding up the testing times of all the cores assigned to that TAM. Then the core with the largest testing time (among all unassigned cores) is assigned to the TAM with the shortest current summed testing time. Furthermore, during core assignment, if the time $\mathcal{T}_j$ on any TAM $j$ exceeds the best-known value $\mathcal{T}$ computed earlier, the algorithm returns $\mathcal{T}$ and halts. This plays a significant role in reducing computation when Core_assign is executed a large number of times, as will be shown in Section 3.
Procedure Core_assign(B, C, $\mathcal{T}$)
1   Let C be the set of cores;
2   Let B be the set of TAMs;
3   Let $\mathcal{T}$ be the best-known testing time for (B, C);
4   For each core $i \in$ C {
5     For each TAM $j \in$ B {
6       Find $T_i(w_j)$ using Design_wrapper;
      }
    }
7   For each TAM $j \in$ B {
8     Set testing time $\mathcal{T}_j$ on TAM $j$ to 0;
    }
9   While C $\neq \emptyset$ {
10    Select TAM $j \in$ B, such that $\mathcal{T}_j$ is minimum;
11    If there are two or more such TAMs {
12      Select TAM $j$, such that $w_j$ is maximum;
      }
13    Select Core $i \in$ C, such that $T_i(w_j)$ is maximum;
14    If there are two or more such cores {
15      Select TAM $k \in$ B, such that ($w_k < w_j$ AND $w_k$ is maximum);
16      Select Core $i$, such that $T_i(w_k)$ is maximum;
      }
17    Assign Core $i$ to TAM $j$;
18    Determine TAM $m \in$ B, such that $\mathcal{T}_m$ is maximum;
19    If $\mathcal{T}_m > \mathcal{T}$ {
20      Return SOC testing time $\mathcal{T}$;
      }
21    C = C - {$i$};
    }
22  Return SOC testing time $\mathcal{T}_m$;
Figure 1. New algorithm for core assignment.

(a) Testing time (cycles)
         TAM 1     TAM 2     TAM 3
Core     32 bits   16 bits   8 bits
1        50        100       200
2        75        95        200
3        90        100       150
4        60        75        80
5        120       120       125

(b) Final assignment
Core     TAM     Testing time (cycles)
1        2       100
2        3       200
3        2       100
4        1       60
5        1       120
Figure 2. Core testing times for (a) the SOC used to illustrate Core_assign, and (b) the final assignment.

We illustrate the Core_assign algorithm using an example SOC containing five cores and three TAMs. The testing times for the five cores when assigned to the TAMs of widths 32, 16, and 8 are shown in Figure 2(a). Initially, the testing time on all TAMs is 0 cycles; therefore TAM 1 of width 32, being the widest, is considered first. Core 5 has the highest testing time on TAM 1, therefore Core 5 is assigned to TAM 1. TAMs 2 and 3 are now both minimally loaded, so TAM 2 of width 16, the wider of the two, is considered next. There is a choice between Cores 1 and 3 to be assigned to TAM 2, since both have a testing time of 100 cycles on it. We choose to assign Core 1 to TAM 2 here because the testing time for Core 1 on TAM 3 is higher than the testing time for Core 3 on TAM 3 (Lines 14 to 16 of Core_assign). Next, Core 2 is assigned to TAM 3. TAM 2 is now the minimally loaded TAM; therefore, Core 3 is assigned to TAM 2. Finally, Core 4 is assigned to TAM 1. Figure 2(b) presents the final assignment of cores to TAMs. The testing times on TAMs 1, 2, and 3 are 180, 200, and 200 clock cycles, respectively. The complexity of Core_assign is $O(N^2)$, where $N$ is the number of cores in the SOC. Core_assign executes two orders of magnitude faster than the ILP model in [8]; hence, a significantly larger number of Core_assign iterations can be executed in the time taken to execute the ILP model.
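The walk-through of the five-core example can be reproduced with a short Python sketch. It is only an illustration of the steps of Figure 1 under stated assumptions: the names `core_assign`, `TIMES`, and `WIDTHS` and the zero-based indices are introduced here for illustration, the per-core testing times are copied from Figure 2(a) rather than computed by Design_wrapper, the early exit uses a strict "exceeds" test, and ties not covered by the figure are broken by lowest core index.

```python
import math

# Testing times in cycles from Figure 2(a): TIMES[core][tam], zero-based.
WIDTHS = [32, 16, 8]
TIMES = [
    [50, 100, 200],
    [75, 95, 200],
    [90, 100, 150],
    [60, 75, 80],
    [120, 120, 125],
]

def core_assign(times, widths, best_known=math.inf):
    """Greedy core-to-TAM assignment; returns (SOC testing time, assignment)."""
    tams = range(len(widths))
    load = [0] * len(widths)              # summed testing time per TAM
    unassigned = list(range(len(times)))
    assignment = {}
    while unassigned:
        # Least-loaded TAM; the widest TAM wins ties.
        j = min(tams, key=lambda t: (load[t], -widths[t]))
        # Unassigned core(s) with the largest testing time on TAM j.
        longest = max(times[i][j] for i in unassigned)
        tied = [i for i in unassigned if times[i][j] == longest]
        core = tied[0]
        narrower = [t for t in tams if widths[t] < widths[j]]
        if len(tied) > 1 and narrower:
            # Tie-break on the widest TAM that is narrower than TAM j.
            k = max(narrower, key=lambda t: widths[t])
            core = max(tied, key=lambda i: times[i][k])
        assignment[core] = j
        load[j] += times[core][j]
        # Early exit: this partition cannot beat the best-known time.
        if max(load) > best_known:
            return best_known, None
        unassigned.remove(core)
    return max(load), assignment

soc_time, assignment = core_assign(TIMES, WIDTHS)
print(soc_time, assignment)
```

With the Figure 2(a) data the sketch ends with Cores 4 and 5 on TAM 1, Cores 1 and 3 on TAM 2, and Core 2 on TAM 3, which is the assignment of Figure 2(b). Passing a finite `best_known` shows the early exit used for pruning.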
3 TAM width partitioning
In this section, we describe how the Core_assign heuristic is used to develop an algorithm to quickly reach an intermediate solution to $P_{PAW}$ and $P_{NPAW}$. We also demonstrate how, once an approximate solution for $P_{NPAW}$ has been reached, an exact mathematical programming model for $P_{AW}$ can be executed to perform a final optimization of the core assignment, thus achieving near-optimal results with little computation time.
3.1 Fast algorithm for $P_{NPAW}$
In [8], it was shown that the $P_{AW}$ ILP model takes a relatively small time to execute for problem instances of reasonable size. This can be exploited to execute the model for each unique TAM width partition and record the partition and core assignment with the best testing time. Solutions to $P_{PAW}$ and $P_{NPAW}$ can thus be obtained. This method is applicable because the number of unique partitions of TAM width is relatively small for a small number of TAMs. The number of unique partitions $p_B(W)$ for a given total TAM width $W$ and a given number $B$ of TAMs can be estimated using partition theory in combinatorial mathematics [10]. In [10], $p_B(W)$ is shown to be approximately $W^{B-1}/(B!\,(B-1)!)$ for $W \to \infty$. For $B = 2$, $p_2(W) = \lfloor W/2 \rfloor$. For $B = 3$, the number of partitions can be computed exactly using the summation given in [8]; it grows only quadratically with $W$. Therefore, the execution time for SOCs having three TAMs is reasonable, even for large $W$. The challenge to effective partition enumeration lies in the fact that for $B > 3$, there is no simple and systematic method to enumerate only the unique partitions. In fact, no exact formula is available to calculate the total number of unique partitions for a given value of $W$ and $B$. The value of $p_B(W)$ can thus only be approximated assuming $W \to \infty$. One way to ensure that only unique partitions are evaluated is to discard, prior to evaluation, each new partition that appears to be a cyclical isomorphism of a previously handled partition. However, the memory requirements for this method and the number of partition comparisons required to be performed grow exponentially with $B$ and severely limit the scalability for large $W$. Furthermore, as $B$ increases, the time required to enumerate unique partitions and evaluate them using ILP increases significantly. This method is therefore inadequate for industrial SOCs having multiple TAMs.
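To make the notion of "unique partitions" concrete, the following Python sketch counts them exactly for small instances. It is an illustration only: the function name `count_partitions` is introduced here, and the recurrence used is the standard one for partitions of an integer into a fixed number of positive parts, not a method taken from [8] or [10].

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def count_partitions(total_width, num_tams):
    """Number of ways to split total_width into exactly num_tams positive
    widths when the order of the TAMs is ignored."""
    if num_tams == 0:
        return 1 if total_width == 0 else 0
    if total_width < num_tams:
        return 0
    # Either the narrowest TAM has width 1 (drop it), or every TAM has
    # width >= 2 (shrink every TAM by one wire).
    return (count_partitions(total_width - 1, num_tams - 1)
            + count_partitions(total_width - num_tams, num_tams))

# Two TAMs: floor(W/2) unique partitions.
print(count_partitions(64, 2))
# Three TAMs: the count grows quadratically with W.
print(count_partitions(64, 3))
# Four TAMs sharing 8 wires: (1,1,1,5), (1,1,2,4), (1,1,3,3), (1,2,2,3), (2,2,2,2).
print(count_partitions(8, 4))
```

Counting is cheap; the difficulty described in the text is enumerating and evaluating each unique partition once, which is what the pruning in Partition_evaluate targets.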
In this subsection, we use Core_assign to develop a fast method to evaluate width partitions; this effectively addresses the problems inherent to the ILP model and the "enumeration-comparison" method described above. The new heuristic employs extensive solution-space pruning, and is thus applicable to wrapper/TAM design for industrial SOCs having a large number of TAMs. In our experiments with industrial SOCs, we were able to evaluate width partitions and testing times for wrapper/TAM architectures having up to ten TAMs within a few minutes. Test access architectures having more than ten TAMs could also be evaluated, but were found to be less useful for testing time minimization because testing time increases significantly once the width of each TAM falls below a threshold.
The new algorithm Partition_evaluate for problems $P_{PAW}$ and $P_{NPAW}$ is presented in Figure 3. This algorithm employs three levels of solution-space pruning. Firstly, the number of partitions enumerated is significantly limited by the restriction in Line 1 of the recursive function Increment. To enumerate partitions of $W$ over $B$ TAMs, Increment dynamically creates $B$ nested loops with loop variables $w_1, \ldots, w_B$. Without the restriction in Line 1, enumeration would be as follows: $(w_1, \ldots, w_B) = (1, \ldots, 1, W-B+1), (1, \ldots, 2, W-B), \ldots, (W-B+1, 1, \ldots, 1)$. However, a sizeable number of repeated partitions is prevented by establishing an upper bound $\lfloor (W - \sum_{k=1}^{j-1} w_k)/(B-j+1) \rfloor$ on each variable $w_j$ during enumeration. For example, for $W = 8$ and $B = 4$, the first three partitions enumerated are (1,1,1,5), (1,1,2,4), and (1,1,3,3), respectively. If the restriction of Line 1 were not present, the repeated partition (1,3,1,3) would also subsequently be enumerated.
Procedure Partition_evaluate($W$)
1   Let $W$ = total TAM width;
2   Let C = set of cores;
3   Let $B_{max}$ = upper limit on number of TAMs;
4   For $B$ = 1 to $B_{max}$ {
5     Let set of TAMs B = {1, 2, ..., $B$};
6     Set SOC testing time $\mathcal{T}$ = $\infty$;
7     For TAM $j$ = 1 to ($B - 2$), set $w_j$ = 1;
8     Set $w_{B-1}$ = 0; Set flag = 0;
9     While flag $\neq$ 1 {
10      Increment(B, $B - 1$, $W$);
11      New SOC testing time $\mathcal{T}_{new}$ = Core_assign(B, C, $\mathcal{T}$);
12      If $\mathcal{T}_{new} < \mathcal{T}$ {
13        Set $\mathcal{T}$ = $\mathcal{T}_{new}$; Set B_best = B;
        }
      }
    }
14  Output B_best, $\mathcal{T}$;

Procedure Increment(B, $j$, $W$)
1   If $w_j < \lfloor (W - \sum_{k=1}^{j-1} w_k)/(B-j+1) \rfloor$ {
2     Set $w_j$ = $w_j$ + 1;
3     Set $w_B$ = $W - \sum_{k=1}^{B-1} w_k$;
4     Return;
    }
5   Else {
6     If $j$ = 1 {
7       Set flag = 1; Return;
      }
8     Else Increment(B, $j - 1$, $W$);
    }
Figure 3. Fast algorithm for partition evaluation.
However, Line 1 establishes an upper bound of 2 on $w_2$; thus, partition (1,3,1,3) is not enumerated. Secondly, Lines 18 to 20 in Core_assign terminate the evaluation of any partition for which the testing time $\mathcal{T}_j$ of any TAM $j$ has already exceeded the best-known value of testing time $\mathcal{T}$ calculated previously. The number of such partitions, whose evaluation can be terminated, was found to be large in our experiments; therefore the execution time of Partition_evaluate is reduced significantly. Finally, Partition_evaluate uses heuristic algorithm Core_assign to evaluate partitions. This $O(N^2)$ algorithm significantly reduces the computation performed.
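The effect of the upper bound on each loop variable can be seen in a small Python sketch. It is an illustration under stated assumptions: the name `bounded_partitions` is introduced here, the bound on the $j$-th width is taken to be the remaining width divided by the number of TAMs still to be sized (rounded down), which matches the bound of 2 on $w_2$ in the example, and a recursive generator stands in for the Increment procedure.

```python
from math import comb

def bounded_partitions(total_width, num_tams):
    """Yield width tuples (w_1, ..., w_B) where each of the first B-1 widths
    is capped at floor(remaining width / TAMs still to be sized)."""
    def extend(prefix, remaining):
        slots_left = num_tams - len(prefix)
        if slots_left == 1:
            # The last TAM receives whatever width is left over.
            yield tuple(prefix) + (remaining,)
            return
        for width in range(1, remaining // slots_left + 1):
            yield from extend(prefix + [width], remaining - width)
    yield from extend([], total_width)

enumerated = list(bounded_partitions(8, 4))
unrestricted = comb(8 - 1, 4 - 1)   # ordered splits of 8 wires over 4 TAMs
print(enumerated[:3])
print(len(enumerated), "of", unrestricted, "ordered splits are enumerated")
```

For $W = 8$ and $B = 4$ the sketch starts with (1,1,1,5), (1,1,2,4), (1,1,3,3) and never produces (1,3,1,3). Some repeats such as (1,2,1,4) remain, which is consistent with the text's claim that the bound prevents a sizeable number of repeated partitions rather than all of them.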
Table 1 presents some experimental data on the efficiency of Partition_evaluate for partition-space pruning. The number of possible unique partitions $p_B(W)$ (estimated using $p_B(W) \approx W^{B-1}/(B!\,(B-1)!)$) is presented for several values of $W$ and $B$. We present results for $W \geq 44$, because the formula $p_B(W) \approx W^{B-1}/(B!\,(B-1)!)$ is accurate only for larger values of $W$. The number of partitions $p_{eval}$ that are actually evaluated to completion (after pruning by Line 1 of the Increment function and by Lines 18 to 20 of Core_assign) is presented for an example SOC p21241 from Philips. (The relevant details of this SOC are presented in Section 4.) Finally, the efficiency $E$ of our heuristics is calculated using $E = p_{eval}/p_B(W)$. Here, $E = 0.01$ implies that approximately 1% of the number of unique partitions are evaluated to completion by Partition_evaluate. We choose p21241 to illustrate the efficiency of our heuristics, because the exhaustive method [8] was found to be inadequate for wrapper/TAM co-design for p21241; the method did not complete even for $B = 3$. From Table 1, it can be seen that Partition_evaluate evaluates on average only 2% of the unique partitions. Thus there is a significant reduction in the execution time using this heuristic compared to the exhaustive method.

         $B = 6$                          $B = 8$
$W$      $p_B(W)$   $p_{eval}$   $E$      $p_B(W)$   $p_{eval}$   $E$
44       1909       46           0.02     1571       170          0.1
48       2949       46           0.02     2889       48           0.02
52       4401       65           0.01     5059       100          0.02
56       6374       111          0.02     8499       110          0.01
60       9000       278          0.03     13776      172          0.01
64       12428      708          0.06     21643      256          0.01
Table 1. Efficiency of the Partition_evaluate heuristic.
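The estimated partition counts in Table 1 follow directly from the asymptotic formula, as the following Python sketch illustrates. The function names and the small `EVALUATED` dictionary are introduced here for illustration; the evaluated-partition counts are copied from Table 1, and the pairing of the two column groups with six and eight TAMs is an inference from the formula reproducing the tabulated estimates.

```python
from math import factorial

def estimated_partitions(total_width, num_tams):
    """Asymptotic estimate W^(B-1) / (B! (B-1)!) of the unique partitions."""
    return total_width ** (num_tams - 1) / (
        factorial(num_tams) * factorial(num_tams - 1))

# Partitions evaluated to completion, copied from Table 1: (W, B) -> count.
EVALUATED = {(44, 6): 46, (64, 6): 708, (44, 8): 170, (64, 8): 256}

def efficiency(total_width, num_tams):
    """Fraction of the estimated unique partitions evaluated to completion."""
    return EVALUATED[(total_width, num_tams)] / estimated_partitions(
        total_width, num_tams)

for (w, b) in EVALUATED:
    print(w, b, round(estimated_partitions(w, b)), round(efficiency(w, b), 2))
```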
3.2 Final optimization step
Partition_evaluate provides a fast approximation of the optimal values of TAM width partition and testing time. We further improve on this result by performing a final optimization step using the ILP model for $P_{AW}$ [8]. Since this final step is performed only once, and since the execution time for a single iteration of the ILP model for $P_{AW}$ is relatively small, this results in a near-optimal solution to $P_{NPAW}$ in a short execution time. Here, we repeat the ILP model for $P_{AW}$ from [8] for reasons of completeness, and to comment on its complexity.

To model $P_{AW}$, consider an SOC consisting of $N$ cores and $B$ TAMs of widths $w_1, w_2, \ldots, w_B$. The time taken to test Core $i$ assigned to TAM $j$, given by $T_i(w_j)$ clock cycles, is calculated using Design_wrapper. We introduce binary variables $x_{ij}$ (where $1 \leq i \leq N$ and $1 \leq j \leq B$), which are used to determine the assignment of cores to TAMs in the SOC. Let $x_{ij}$ be a 0-1 variable defined as follows:
    $x_{ij} = 1$, if Core $i$ is assigned to TAM $j$
    $x_{ij} = 0$, otherwise
The time needed to test all cores on TAM $j$ is given by $\sum_{i=1}^{N} T_i(w_j) \cdot x_{ij}$. Since all the TAMs can be used simultaneously for testing, the system testing time equals $\max_{1 \leq j \leq B} \sum_{i=1}^{N} T_i(w_j) \cdot x_{ij}$. The ILP model for $P_{AW}$ can be formulated as follows.
Objective: Minimize testing time $\mathcal{T}$, subject to
1. $\mathcal{T} \geq \sum_{i=1}^{N} T_i(w_j) \cdot x_{ij}$, $1 \leq j \leq B$, i.e., $\mathcal{T}$ is the maximum testing time on any TAM
2. $\sum_{j=1}^{B} x_{ij} = 1$, $1 \leq i \leq N$, i.e., every core is assigned to exactly one TAM
The number of variables and constraints for this model (a measure of its complexity) is given by $N \cdot B + 1$ (the $N \cdot B$ binary variables $x_{ij}$ plus $\mathcal{T}$), which is $O(N^2)$ for $B \leq N$, and $N + B$, which is $O(N)$, respectively. This ILP model uses the best width partition obtained from Partition_evaluate to optimize the core assignment and obtain a near-optimal wrapper/TAM architecture in the final step of our co-optimization methodology.
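For a very small instance, the optimum that the ILP model describes can be found by simply trying every assignment. The Python sketch below does this for the five-core example of Figure 2; it is an illustration only, it swaps in exhaustive enumeration for an ILP solver, and the names `optimal_assignment` and `TIMES` are introduced here. Each candidate tuple assigns every core to exactly one TAM (constraint 2), and the objective is the maximum summed time over the TAMs (constraint 1).

```python
from itertools import product

# Testing times in cycles from Figure 2(a): TIMES[core][tam], zero-based.
TIMES = [
    [50, 100, 200],
    [75, 95, 200],
    [90, 100, 150],
    [60, 75, 80],
    [120, 120, 125],
]

def optimal_assignment(times, num_tams):
    """Try all num_tams**N assignments and return (best time, assignment)."""
    best_time, best = None, None
    for candidate in product(range(num_tams), repeat=len(times)):
        load = [0] * num_tams
        for core, tam in enumerate(candidate):
            load[tam] += times[core][tam]
        soc_time = max(load)          # all TAMs are used in parallel
        if best_time is None or soc_time < best_time:
            best_time, best = soc_time, candidate
    return best_time, best

best_time, best = optimal_assignment(TIMES, 3)
print(best_time, best)
```

On this example the exact optimum is 170 cycles, compared with the 200 cycles reached by the heuristic walk-through, which illustrates why a final exact optimization of the core assignment can still improve on the heuristic result. Exhaustive search is only practical here because there are just 243 assignments.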
4 Experimental results
In this section, we present experimental results on our wrap-
per/TAM co-optimization methodology for four example SOCs. The
ﬁrst, d695, is an academic benchmark SOC from Duke University.
The other three SOCs p93791, p21241, and p31108 are from Philips.
The number (e.g., 93791) in each SOC name is a measure of its test
complexity. We calculate the SOC test complexity number using the
formula presented in [8].
The experimental results presented in this paper were obtained at
Duke University using a Sun Ultra 10 with a 333 MHz processor and
256 MB memory. The results in [8] were obtained at Philips Re-
search Laboratories using a Sun Ultra 80 with a 450 MHz processor
and 4096 MB memory. For the problems in this paper, we found that
the Sun Ultra 80 leads to ﬁve times faster execution compared to the
Sun Ultra 10. Therefore, the CPU times reported in [8] have been
multiplied by a factor of ﬁve to facilitate a comparison with the CPU
times reported here. Note that we achieve an order of magnitude improvement in CPU time over [8] even without the 5× adjustment factor. Note also that all SOC testing times in this section are expressed in clock cycles.
4.1 Results for SOC d695
In this subsection, we present experimental results for SOC d695. SOC d695 consists of two ISCAS'85 and eight ISCAS'89 benchmark circuits [8].
(a) Results in [8] for $B = 2$
$W$   $w_1+w_2$   Core assignment          Testing time $\mathcal{T}_{[8]}$ (cycles)   Exec. time $t_{[8]}$ (sec)
16    6+10        (1,2,1,1,2,2,1,1,2,1)    45055    5
24    6+18        (2,1,1,1,2,2,1,1,1,2)    29501    5
32    11+21       (1,2,1,1,2,2,2,1,1,1)    25442    5
40    8+32        (2,1,1,1,2,2,1,1,2,2)    21359    10
48    16+32       (2,1,1,1,2,1,2,2,2,2)    19938    10
56    19+37       (1,2,1,1,2,1,2,2,1,2)    18434    10
64    20+44       (1,2,1,1,2,1,2,2,1,2)    18205    15

(b) New co-optimization method for $B = 2$
$W$   $w_1+w_2$   Core assignment          $\mathcal{T}_{new}$ (cycles)   $t_{new}$ (sec)   $\Delta\mathcal{T}$ (%)   Ratio $t_{new}/t_{[8]}$
16    8+8         (2,1,2,1,1,2,1,2,1,2)    45055    1    +0.00     0.2
24    12+12       (2,2,1,1,1,2,1,1,2,2)    34455    1    +16.79    0.2
32    16+16       (2,1,2,1,2,1,1,2,1,2)    25828    1    +1.52     0.2
40    20+20       (2,1,1,1,1,2,1,2,1,2)    22848    1    +6.97     0.1
48    20+28       (1,2,1,2,1,1,2,2,1,2)    22804    1    +14.37    0.1
56    23+33       (2,2,2,2,2,1,1,1,2,2)    18940    1    +2.74     0.1
64    32+32       (1,1,1,2,1,1,2,2,1,2)    18869    1    +3.65     0.08

(c) Results in [8] for $B = 3$
$W$   $w_1+w_2+w_3$   Core assignment          $\mathcal{T}_{[8]}$ (cycles)   $t_{[8]}$ (sec)
16    3+5+8           (2,2,1,1,2,3,1,1,3,3)    42568    20
24    2+5+17          (2,2,2,1,3,3,3,1,3,2)    28292    50
32    4+10+18         (1,2,2,1,2,3,3,1,1,3)    21566    80
40    4+17+19         (2,1,2,1,2,3,2,1,2,3)    17901    120
48    4+19+25         (3,3,3,2,3,2,3,1,3,1)    16975    200
56    5+18+33         (3,2,1,1,3,2,3,1,2,3)    13207    265
64    5+17+42         (2,2,2,1,3,2,3,1,2,3)    12941    420

(d) New co-optimization method for $B = 3$
$W$   $w_1+w_2+w_3$   Core assignment          $\mathcal{T}_{new}$ (cycles)   $t_{new}$ (sec)   $\Delta\mathcal{T}$ (%)   Ratio $t_{new}/t_{[8]}$
16    5+5+6           (1,1,1,2,1,3,3,2,2,2)    42952    1    +0.9      0.06
24    8+8+8           (3,1,3,2,3,1,2,3,1,2)    30032    1    +6.15     0.03
32    8+12+12         (2,2,2,2,2,1,3,3,3,3)    24851    1    +15.23    0.02
40    7+16+17         (2,2,2,1,2,3,2,1,1,3)    18448    1    +3.06     0.01
48    16+16+16        (3,2,3,2,3,1,3,1,2,2)    17581    1    +3.57     0.01
56    17+19+20        (2,2,2,1,3,2,2,3,1,1)    15510    1    +17.44    0.004
64    18+20+26        (2,2,1,1,2,3,2,3,1,1)    15442    1    +19.33    0.003

Table 2. Results for d695 (Problem $P_{PAW}$).
Table 2 compares the results obtained in [8] with the results of the new wrapper/TAM co-optimization method for d695 for $B = 2$ and $B = 3$ (Problem $P_{PAW}$). The testing times $\mathcal{T}_{new}$ of the new method are comparable to the testing times $\mathcal{T}_{[8]}$ in [8]. However, the CPU times $t_{new}$ of the new method are at least an order of magnitude less than the CPU times $t_{[8]}$ in [8] for larger values of $W$. The core assignment vector follows the notation introduced in [5] and further used in [8]. Each position in the vector refers to the core number and the entry in each position refers to the TAM to which the corresponding core is assigned. The percentage change in testing time using the new method is calculated using the formula $\Delta\mathcal{T}\,(\%) = \frac{\mathcal{T}_{new} - \mathcal{T}_{[8]}}{\mathcal{T}_{[8]}} \times 100$.

New co-optimization method
$W$   $B$   TAM partition      Core assignment          $\mathcal{T}_{new}$ (cycles)   $t_{new}$ (sec)   $\Delta\mathcal{T}$ (%)   Ratio $t_{new}/t_{[8]}$
16    4     3+3+5+5            (3,3,4,1,3,4,1,4,1,2)    42644    1    +0.18    0.06
24    3     8+8+8              (3,1,3,2,3,1,2,3,1,2)    30032    1    +6.15    0.03
32    4     7+6+9+10           (3,4,3,2,4,3,2,2,1,1)    22268    1    +3.26    0.02
40    3     7+16+17            (2,2,2,1,2,3,2,1,1,3)    18448    1    +3.06    0.01
48    5     5+2+8+16+17        (3,4,3,2,4,5,5,1,4,3)    15300    2    -9.86    0.01
56    5     5+8+11+16+16       (5,3,5,1,5,4,2,1,2,3)    12941    2    -2.01    0.009
64    6     5+3+8+12+18+18     (4,3,4,1,6,5,3,1,2,4)    12941    7    +0.00    0.02
Table 3. New results for d695 ($P_{NPAW}$).
While the methods of [8] are limited to $B \leq 3$ due to high computation cost, the heuristic proposed in this paper can evaluate more efficient TAM designs with higher values of $B$. Table 3 presents the results obtained with the new wrapper/TAM co-optimization methodology for d695 over a larger number of TAMs (Problem $P_{NPAW}$, $B \leq 10$). The testing times and CPU times in [8] have already been presented in Table 2. The best results in [8] were obtained for $B = 3$. The testing times obtained using the new co-optimization technique are better than or equal to the best testing times in [8] for larger values of $W$. For $W \leq 40$, the new testing times are on average only 3% larger than those reported in [8]. We improve upon the testing times compared to [8] for larger values of $W$ because the exhaustive approach of [8] did not terminate within a reasonable CPU time for these values of $W$ for larger $B$. Only the best results for $B \leq 3$ were reported in [8]. Since the exhaustive method did not terminate within a reasonable CPU time for $B > 3$, the ratio $t_{new}/t_{[8]}$ reported in Table 3 is obtained using the value of $t_{[8]}$ for $B = 3$. The CPU times improve by between one and two orders of magnitude in all cases. The new technique is therefore scalable for industrial SOCs having multiple TAMs, as illustrated in the following subsections.
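The percentage change reported in Tables 2 and 3 is plain arithmetic on the two testing times. As a quick check, the following Python sketch recomputes a few entries of Table 2(b) from the corresponding testing times in Table 2(a); the function name `percent_change` is introduced here for illustration.

```python
def percent_change(new_time, reference_time):
    """Percentage change of the new testing time relative to the reference."""
    return (new_time - reference_time) / reference_time * 100

# (new, reference) testing times in cycles for d695 with two TAMs,
# taken from Table 2(b) and Table 2(a) for W = 16, 24, and 32.
pairs = [(45055, 45055), (34455, 29501), (25828, 25442)]
changes = [round(percent_change(new, ref), 2) for new, ref in pairs]
print(changes)
```

A positive value means the heuristic testing time is larger than the reference; a negative value, as in several rows of Table 3, means the new method found a shorter test.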
4.2 Results for SOC p21241
SOC p21241 contains 28 cores. Of these, 6 are memory cores
and 22 are scan-testable logic cores. Table 4 presents a summary
of the data for the 28 cores. Test data for this SOC has not been
published before.
Circuit         Test patterns   Functional I/Os   Scan chains   Scan chain lengths
(core)          (range)         (range)           (range)       Min      Max
Logic cores     1–785           37–1197           1–31          1        400
Memory cores    222–12324       52–148            0             –        –
Table 4. Ranges in test data for the 28 cores in p21241.
Tables 5 and 6 present experimental results for p21241 for $B = 2$. The exhaustive method [8] did not run to completion for $B = 3$, even after two days of execution. The results for $P_{NPAW}$ over a larger number of TAMs ($1 \leq B \leq 10$) using the new co-optimization method are presented in Table 7. For $W \geq 24$, the testing times obtained using the new co-optimization technique are on average 25% lower than those obtained using the Exhaustive method in [8]. This is because using Partition_evaluate, we were able to partition $W$ among a larger number of TAMs than was possible using Exhaustive. The values of $t_{[8]}$ in Table 7 are for $B = 2$. The new CPU times are comparable to the CPU times of the Exhaustive method. The final optimization step took between 10 and 20 seconds to complete in all cases. This was because the ILP models for this SOC were particularly intractable. This also explains why the Exhaustive method from [8] did not terminate within a reasonable CPU time for $B > 2$ for p21241.

Exhaustive method
$W$   $w_1+w_2$   Core assignment   Testing time $\mathcal{T}_{[8]}$ (cycles)   Exec. time $t_{[8]}$ (sec)
16    6+10     (2,1,2,2,2,1,1,1,1,1,1,1,1,1,1,2,1,1,1,1,1,1,1,1,2,2,2,1)    462210    11
24    8+16     (2,2,2,1,1,2,1,1,1,1,1,1,2,1,1,1,1,1,1,1,1,1,1,1,2,2,2,1)    361571    24
32    10+22    (1,2,2,2,1,2,2,2,2,1,1,2,2,2,1,1,2,1,2,1,1,2,2,1,2,2,1,2)    312659    49
40    10+30    (1,2,2,2,2,2,2,2,2,1,1,2,1,2,1,1,1,1,2,1,1,1,1,1,2,2,2,1)    278359    60
48    10+38    (1,2,2,2,2,2,2,2,2,2,2,2,1,1,1,1,1,1,2,1,1,1,1,2,2,2,2,2)    268472    84
56    10+46    (1,2,2,2,2,2,2,2,2,2,2,2,2,1,1,1,1,1,1,1,1,2,1,1,2,2,1,2)    266800    80
64    10+54    (1,2,2,2,2,2,2,2,2,1,1,2,1,2,2,1,2,1,2,1,1,2,2,2,2,2,2,1)    260638    122
Table 5. Exhaustive results for p21241 for $B = 2$ ($P_{PAW}$).

New co-optimization method
$W$   $w_1+w_2$   Core assignment   $\mathcal{T}_{new}$ (cycles)   $t_{new}$ (sec)   $\Delta\mathcal{T}$ (%)   Ratio $t_{new}/t_{[8]}$
16    6+10     (2,1,2,2,2,1,1,1,1,1,1,1,1,1,1,2,1,1,1,1,1,1,1,1,2,2,2,1)      462210    1    +0.00    0.1
24    10+14    (1,2,1,1,2,2,2,2,2,1,1,2,2,1,1,1,1,1,1,1,1,1,1,1,1,2,2,1)      365947    2    +1.21    0.08
32    10+22    (1,2,2,2,1,2,2,2,2,1,1,2,2,2,1,1,2,1,2,1,1,2,2,1,2,2,1,2)      312659    9    +0.00    0.18
40    12+28    (1,2,2,2,1,2,2,2,2,1,1,1,2,1,1,1,2,2,1,1,1,1,1,2,2,2,1,1)      290644    2    +4.41    0.03
48    20+28    (1,2,1,2,1,2,2,1,2,1,1,1,1,2,1,1,2,1,1,1,2,2,2,2,1,2,2,2)      290644    1    +8.25    0.01
56    28+28    (1,2,1,1,1,1,1,2,1,1,1,1,1,1,2,1,2,1,2,1,2,1,1,1,2,2,1,1,1)    290644    1    +8.97    0.01
64    29+35    (1,2,1,1,1,1,1,2,1,1,1,1,1,1,2,1,x,1,2,1,2,1,1,1,2,2,1,1)      271330    6    +4.10    0.05
Table 6. New results for p21241 for $B = 2$ ($P_{PAW}$).
For $P_{NPAW}$ and $W = 16$, the Partition_evaluate heuristic returns
a partition of 1+1+4+10. This yields a testing time of 468011
cycles after the final optimization step (Table 7). However, a lower
testing time of 462210 cycles is actually achievable using only 2
TAMs (Table 6). This is because the testing time for four TAMs
returned by Partition_evaluate is lower than that for two TAMs before
the final optimization step. Similarly, for $W = 64$, Partition_evaluate
obtains a partition of 13+10+10+10+21 (five TAMs), which gives
a lower (heuristic) testing time than does the partition obtained for
$W = 56$: 5+4+10+10+10+17 (six TAMs). However, after the final
optimization step, the partition for $W = 56$ is able to achieve the
same testing time as that for $W = 64$. This anomalous behavior of
our algorithm is due to the fact that Partition_evaluate uses heuristics
to quickly approximate the final result. Therefore, the partition that
actually provides the lowest testing time after final (exact)
optimization might not be returned by Partition_evaluate.
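The percentage differences listed in the result tables can be reproduced directly from the testing times. The following is a minimal sketch, assuming the percentage column is the relative increase of a testing time over a reference testing time; the function name `percent_increase` is illustrative and does not come from the paper. It uses the two testing times quoted above for p21241 at a total TAM width of 16.

```python
def percent_increase(t_new: int, t_ref: int) -> float:
    """Percentage by which t_new exceeds t_ref (negative if t_new is lower)."""
    return 100.0 * (t_new - t_ref) / t_ref


# Testing times for p21241 at a total TAM width of 16, as quoted in the text.
t_four_tams = 468011  # partition 1+1+4+10 after the final optimization step
t_two_tams = 462210   # best result using only two TAMs

delta = percent_increase(t_four_tams, t_two_tams)
print(f"four TAMs vs. two TAMs: {delta:+.3f}%")
```

The four-TAM result is a little over 1.25% above the two-TAM result, which is the size of the anomaly discussed above.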
4.3 Results for SOC p31108
SOC p31108 contains 19 cores. Of these, 15 are memory cores
and 4 are scan-testable logic cores. Table 8 presents the data for
New co-optimization method
$W$ | $B$ | TAM partition | Core assignment | Testing time (cycles) | Exec. time (sec) | $\Delta T$ (%) | Exec. time ratio (new/exhaustive)
16 4 1+1+ (4,3,4,4,4,4,3,2,3,2, 468011 10 +1.25 0.9
4+10 2,1,4,2,4,2,2,1,1,
2,1,1,3,1,1,1,4,1)
24 3 4+10+ (2,2,2,1,3,3,3,1,2,1, 313607 41 -13.27 1.71
10 1,2,1,1,1,1,1,1,1,
1,1,1,1,1,2,3,2,1)
32 4 1+10+ (4,4,3,2,2,2,2,2,2,1, 246332 192 -21.21 3.92
10+11 1,3,4,2,1,1,2,2,1,
3,4,1,2,4,2,3,1,1)
40 5 5+5+ (4,4,3,5,3,1,1,1,3,1, 232049 60 -16.64 1.00
10+10+ 1,5,3,2,1,3,5,3,1
10 4,3,1,1,4,4,5,3,5)
48 6 5+3+ (5,5,2,2,3,2,3,2,3,2, 232049 15 -13.57 0.18
10+10+ 2,2,2,2,2,2,2,3,2,
10+10 2,4,2,2,2,3,6,2,1)
56 6 5+4+ (5,5,2,2,3,2,3,2,3,2, 153990 69 -42.28 0.86
10+10+ 2,2,2,2,2,2,2,3,2,
10+17 2,4,2,2,2,3,6,2,1)
64 5 13+10+ (6,4,4,3,2,4,4,5,4,2, 153990 138 -40.92 1.31
10+10+ 4,6,5,4,3,5,4,4,3,
21 5,6,4,5,1,6,5,5,2)
Table 7. New results for p21241 ($P_{NPAW}$).
the 19 cores. Test data for this SOC has not been published before.
Circuit (core) | Test patterns | Functional I/Os | Scan chains | Scan chain length (min) | Scan chain length (max)
Logic cores | 210–745 | 109–428 | 1–29 | 8 | 806
Memory cores | 128–12236 | 11–87 | 0 | – | –
Table 8. Ranges in test data for the 19 cores in p31108.
Exhaustive method
$W$ | $w_1+w_2$ | Core assignment | Testing time (cycles) | Exec. time (sec)
16 8+8 (1,2,2,2,2,1,1,1,1,2, 1080940 5
1,1,2,2,1,1,1,1,1)
24 9+15 (1,2,2,2,2,2,1,2,1,2, 820870 7
2,1,1,2,1,1,2,1,2)
32 11+21 (1,2,2,2,2,1,1,2,2,2, 733394 9
2,2,2,2,1,1,2,1,2)
40 15+25 (1,2,2,2,2,1,2,2,2,2, 721564 10
2,2,2,1,2,1,1,1,2)
48 16+32 (1,1,2,2,2,2,1,2,2,1, 709262 14
2,1,1,2,2,1,1,2,2)
56 16+40 (1,1,2,2,2,2,1,2,2,1, 704659 18
2,2,2,1,2,1,1,2,2)
64 16+48 (1,1,2,2,2,2,2,2,2,1 700939 18
2,2,2,2,2,2,2,2,2)
Table 9. Exhaustive results for p31108 for $B = 2$ ($P_{PAW}$).
Tables 9, 10, 11, and 12 compare the results obtained by the
exhaustive method with the results obtained by the new co-optimization
method for p31108 for $B = 2$ and $B = 3$ (Problem $P_{PAW}$). For
$B \ge 4$, the exhaustive method of [8] did not provide a solution even
after two days of CPU time. Table 13 presents the new experimental
results for p31108 (Problem $P_{NPAW}$). For $W \le 32$, the testing
times obtained using the new co-optimization technique are on
average 15% higher than those obtained using the Exhaustive method.
For $W \ge 40$, we reach the optimum testing time of 544579 cycles.
The testing time of this SOC does not decrease below 544579 cycles
as $W$ is increased beyond 40 and $B$ is increased beyond 3. This
is because the testing time for Core 18 in p31108 reaches a minimum
value of 544579 cycles when the width of the TAM to which
it is assigned reaches 10 bits. Note that in Tables 11, 12 and 13,
for $W \ge 40$, Core 18 is always assigned to a TAM whose width
New co-optimization method
$W$ | $w_1+w_2$ | Core assignment | Testing time (cycles) | Exec. time (sec) | $\Delta T$ (%) | Exec. time ratio (new/exhaustive)
16 8+8 (1,2,2,2,2,1,1,1,1,2, 1080940 1 +0.00 0.2
1,1,2,2,1,1,1,1,1)
24 10+14 (2,1,2,2,2,1,1,2,1,1, 928782 1 +13.15 0.14
2,1,1,2,1,1,1,2,2)
32 16+16 (1,2,1,2,2,2,1,2,1,2, 750490 1 +2.33 0.11
2,1,1,1,1,2,1,1,2)
40 16+24 (1,2,2,2,2,2,2,2,2,2, 721566 1 +0.0002 0.1
2,2,1,2,2,2,1,1,2)
48 16+32 (1,1,2,2,2,2,1,2,2,1, 709262 1 +0.00 0.07
2,1,1,2,2,1,1,2,2)
56 16+40 (1,1,2,2,2,2,1,2,2,1, 704659 1 +0.00 0.06
2,2,2,1,2,1,1,2,2)
64 16+48 (1,1,2,2,2,2,2,2,2,1 700939 1 +0.00 0.06
2,2,2,2,2,2,2,2,2)
Table 10. New results for p31108 for $B = 2$ ($P_{PAW}$).
Exhaustive method
$W$ | $w_1+w_2+w_3$ | Core assignment | Testing time (cycles) | Exec. time (sec)
16 1+7+8 (1,3,2,3,2,1,1,2,3,3, 998733 222
3,1,2,2,3,1,2,2,1)
24 9+7+8 (2,3,2,2,1,2,2,3,2,2, 720858 325
3,3,3,2,3,3,2,1,2)
32 17+5+ (2,1,3,3,2,2,2,1,2,1, 591027 1576
10 2,2,1,2,1,2,2,3,2)
40 9+10+ (1,3,1,1,3,3,1,3,1,1, 544579 1081
21 3,1,1,1,3,1,3,2,3)
48 9+10+ (1,3,1,1,3,3,1,3,1,1, 544579 6198
29 3,1,1,1,3,1,3,2,3)
56 9+10+ (1,3,1,1,3,3,1,3,1,1, 544579 11331
37 3,1,1,1,3,1,3,2,3)
64 17+15+ (3,3,1,1,3,1,3,3,3,1, 544579 1125
32 1,1,1,3,3,3,3,2,1)
Table 11. Exhaustive results for p31108: $B = 3$ ($P_{PAW}$).
is always 10 bits or more and which does not have any other cores
assigned to it; thus our method achieves the theoretical lower bound
on testing time for this SOC. For $W = 56$ and $W = 64$, TAM 1 is
not used since the algorithm is able to assign the cores to the
remaining TAMs, while achieving the lower bound of 544579 cycles.
Results are nevertheless shown for six TAMs, since the Partition_evaluate
heuristic obtains a lower testing time for six TAMs than for five TAMs
before the final optimization step. The percentage differences in
testing time ($\Delta T$) in Table 13 are for $B = 3$. The new CPU
times are on average between 1 and 2 orders of magnitude less than
the CPU times of the exhaustive method.
This is because the individual $P_{AW}$ Exhaustive models for p31108
took particularly long to solve. This significantly affected the CPU
time of the exhaustive method for $P_{PAW}$ and $P_{NPAW}$.
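The lower-bound argument made above for Core 18 can be sketched in a few lines. This is only an illustration under stated assumptions: the function name `soc_lower_bound` and the per-core tables of testing time versus TAM width are hypothetical, and the only number taken from the text is the 544579-cycle minimum that Core 18 reaches at a TAM width of 10 bits.

```python
from typing import Mapping


def soc_lower_bound(core_times: Mapping[str, Mapping[int, int]]) -> int:
    """Lower bound on SOC testing time.

    core_times[name][w] is the testing time of a core on a TAM of width w.
    Every core must be tested, so the SOC cannot finish sooner than the
    core whose best achievable testing time is the largest.
    """
    return max(min(by_width.values()) for by_width in core_times.values())


# Only the 544579-cycle minimum of Core 18 at width 10 comes from the text;
# every other number below is a made-up placeholder.
example = {
    "core_18": {8: 680000, 10: 544579, 16: 544579},
    "core_a": {8: 300000, 10: 250000, 16: 160000},
    "core_b": {8: 120000, 10: 100000, 16: 70000},
}

print(soc_lower_bound(example))  # 544579
```

Once a core with this property sits alone on a TAM that is wide enough, adding more TAM width or more TAMs elsewhere cannot reduce the SOC testing time any further.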
New co-optimization method
$W$ | $w_1+w_2+w_3$ | Core assignment | Testing time (cycles) | Exec. time (sec) | $\Delta T$ (%) | Exec. time ratio (new/exhaustive)
16 4+6+ (3,2,1,3,1,1,1,1,1,1, 1174710 10 +17.62 0.04
6 3,3,1,3,3,3,1,3,1)
24 6+9+ (1,2,1,2,3,2,2,1,3,1, 729872 10 +1.25 0.03
9 1,2,3,1,1,3,2,3,2)
32 6+12+ (1,3,1,1,3,3,3,2,3,1, 680591 13 +15.15 0.008
14 1,3,1,3,3,3,3,2,2)
40 9+15+ (1,3,3,3,3,3,1,3,3,1, 544579 12 +0.00 0.01
16 1,1,1,3,1,1,3,2,3)
48 9+16+ (1,3,3,3,3,3,1,3,3,1, 544579 12 +0.00 0.002
23 1,1,1,3,1,1,3,2,3)
56 9+16+ (1,3,3,3,3,3,1,3,3,1, 544579 12 +0.00 0.001
31 1,1,1,3,1,1,3,2,3)
64 9+16+ (1,3,3,3,3,3,1,3,3,1, 544579 11 +0.00 0.01
39 1,1,1,3,1,1,3,2,3)
Table 12. New results for p31108 for $B = 3$ ($P_{PAW}$).
New co-optimization method
$W$ | $B$ | TAM partition | Core assignment | Testing time (cycles) | Exec. time (sec) | $\Delta T$ (%) | Exec. time ratio (new/exhaustive)
16 4 3+3+ (1,3,3,3,2,2,3,1,3,2, 1033210 27 +3.45 0.12
5+5 1,3,1,2,1,3,3,4,1)
24 4 5+5+ (2,3,2,1,4,2,1,4,3,2, 882182 27 +22.38 0.08
6+8 4,3,1,4,2,1,3,4,2)
32 5 5+4+ (3,5,3,5,2,2,3,3,5,3, 663193 51 +12.21 0.03
6+8+9 5,3,1,3,2,3,5,4,1)
40 4 3+7+ (2,3,1,1,3,1,2,3,1,2, 544579 39 +0.00 0.04
15+15 1,1,2,2,1,1,1,4,1)
48 5 5+3+8+ (5,5,2,3,5,5,5,2,5,3, 544579 205 +0.00 0.03
15+17 1,2,2,2,2,5,5,4,2)
56 6 5+3+5+ (6,6,6,6,4,6,6,6,6,4, 544579 109 +0.00 0.01
8+15+20 3,4,6,2,3,4,6,5,4)
64 6 5+3+5+ (6,6,6,6,4,6,6,6,6,4, 544579 288 +0.00 0.26
8+15+28 3,4,6,2,3,4,6,5,4)
Table 13. New results for p31108 ($P_{NPAW}$).
4.4 Results for SOC p93791
Circuit (core) | Test patterns | Functional I/Os | Scan chains | Scan chain length (min) | Scan chain length (max)
Logic cores | 11–6127 | 109–813 | 11–46 | 1 | 521
Memory cores | 42–3085 | 21–396 | 0 | – | –
Table 14. Ranges in test data for the 32 cores in p93791.
Exhaustive method
$W$ | $w_1+w_2$ | Core assignment | Testing time (cycles) | Exec. time (sec)
16 4+12 (2,2,1,2,1,2,2,2,2,1,1,2, 1798740 6
2,2,2,2,1,2,2,1,2,2,2,1,
1,1,2,2,2,2,2,1)
24 1+23 (2,1,1,1,2,2,1,1,1,2,1,2, 1211740 24
2,2,1,2,2,1,2,2,1,1,2,1,
1,1,2,2,2,1)
32 9+23 (1,2,1,2,1,2,1,1,1,2,1,2, 894342 33
2,2,1,1,1,1,2,2,1,2,2,1,
1,1,2,1,1,1,1,1)
40 17+23 (1,2,2,1,1,2,1,1,2,1,1,1, 747378 30
2,2,2,2,2,1,1,1,1,2,2,1,
1,1,2,2,2,2,1,1)
48 2+46 (2,1,1,2,2,2,1,1,1,2,1,2, 622199 85
2,2,1,2,2,2,2,2,2,2,2,1,
1,1,2,2,2,1,1,1)
56 10+46 (2,1,1,1,1,2,1,1,1,1,1,2, 524203 66
2,2,1,1,2,1,1,2,1,1,2,1,
1,1,2,1,1,1,1,1)
64 18+46 (1,2,1,2,1,2,1,1,2,2,1,2, 467424 71
2,2,1,2,2,2,1,2,2,2,2,1
1,2,2,1,1,2,2,1)
Table 15. Exhaustive results for p93791: $B = 2$ ($P_{PAW}$).
SOC p93791 contains 32 cores [8]. Of these 32 cores, 18 are
memory cores and 14 are scan-testable logic cores. A summary of
the 32 cores is presented in Table 14.
Tables 15, 16, 17, and 18 compare the exhaustive results for
p93791 with the new results for $B = 2$ and $B = 3$ (Problem $P_{PAW}$).
Note that for both $B = 2$ and $B = 3$, the new co-optimization
method results in optimum testing times for several values of $W$.
Here too, we did not achieve a solution with the exhaustive method
after two days of execution for $B \ge 4$. Table 19 presents results for
$P_{NPAW}$. The testing times obtained using the new co-optimization
technique are comparable to the best testing times using the
exhaustive method for all values of $W$, and equal to the results in [8] for
$W = 48$ and 56. The testing times are on average 3% larger than
those reported in [8]. The CPU times of the new Partition_evaluate
New co-optimization method
$W$ | $w_1+w_2$ | Core assignment | Testing time (cycles) | Exec. time (sec) | $\Delta T$ (%) | Exec. time ratio (new/exhaustive)
16 1+15 (2,1,1,1,2,2,1,1,1,1,1, 1952800 1 +8.56 0.17
2,2,2,1,1,2,1,2,2,1,1,
2,1,1,1,2,1,1,1,2,2)
24 8+16 (2,1,1,1,1,2,1,1,1,2,1, 1217980 1 +0.51 0.04
2,1,2,1,1,1,1,2,1,1,1,
2,1,1,1,2,2,1,1,1,2)
32 9+23 (1,2,1,2,1,2,1,1,1,2,1, 894342 1 +0.00 0.03
2,2,2,1,1,1,1,2,2,1,2,
2,1,1,1,2,1,1,1,1,1)
40 16+24 (1,2,2,2,2,2,1,1,2,1,1, 750311 1 +0.39 0.033
2,1,1,2,2,2,2,2,1,2,2,
2,2,1,2,2,2,2,2,2,1)
48 23+25 (1,1,2,2,2,2,1,1,1,2,2, 632474 1 +1.65 0.01
2,2,1,1,2,2,1,2,1,2,2,
1,2,2,2,1,2,1,1,2,2)
56 10+46 (2,1,1,1,1,2,1,1,1,1,1, 524203 1 +0.00 0.02
2,2,2,1,1,2,1,1,2,1,1,
2,1,1,1,2,1,1,1,1,1)
64 18+46 (1,2,1,2,1,2,1,1,2,2,1, 467424 1 +0.00 0.01
2,2,2,1,2,2,2,1,2,2,2,
2,2,1,2,2,1,1,2,2,1)
Table 16. New results for p93791 for $B = 2$ ($P_{PAW}$).
Exhaustive method
$W$ | $w_1+w_2+w_3$ | Core assignment | Testing time (cycles) | Exec. time (sec)
16 5+3+8 (3,1,1,3,1,2,1,1,3,3,2, 1771720 25
3,3,2,1,3,1,1,1,3,1,2,
3,2,1,1,3,2,1,3,2,1)
24 7+8+9 (3,1,2,1,1,2,2,2,3,3,1, 1187990 50
1,2,2,1,1,3,2,1,3,2,1,
3,3,2,1,1,1,1,1,2,3)
32 4+5+23 (1,1,1,1,2,3,1,2,1,2,1, 887751 85
3,3,3,1,1,3,1,2,3,1,1,
3,1,1,1,3,2,2,1,1,2)
40 6+12+23 (3,1,1,1,2,3,1,1,1,2,1, 698583 130
3,2,2,1,1,1,1,2,3,2,1,
1,1,1,1,3,1,2,1,1,2)
48 9+16+23 (3,2,2,2,2,3,2,2,1,2,1, 599373 210
2,1,2,2,2,2,2,2,3,2,2,
3,2,2,1,1,2,2,2,2,3)
56 10+23+23 (1,3,2,3,2,3,3,3,3,3,1, 514688 270
2,2,2,2,3,2,1,2,3,3,3,
2,1,1,3,3,3,2,2,2,2)
64 18+23+23 (3,2,3,2,2,3,2,2,2,3,3, 460328 440
2,2,2,2,3,2,1,1,1,3,2,
3,1,1,2,1,3,2,2,3,2)
Table 17. Exhaustive results for p93791: $B = 3$ ($P_{PAW}$).
algorithm are between two and three orders of magnitude smaller
than the CPU times of the exhaustive method. This is because
Partition_evaluate is able to effectively prune the solution space by
halting the evaluation of any partition as soon as the testing time of
one of its TAMs exceeds the previous minimum value for the SOC. For
example, of the 341 unique partitions for $W = 64$ and $B = 3$, only 23
were evaluated to completion. Furthermore, the new heuristic
algorithm Core_assign makes it possible to evaluate wrapper/TAM
architectures for industrial SOCs having multiple TAMs, which was not
feasible using the methods in [8].
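The pruning idea described above can be illustrated with a small sketch. It is not the authors' Partition_evaluate or Core_assign: the greedy largest-first assignment in `evaluate` and the testing-time model in `core_time` (test data volume divided by TAM width) are simplifying assumptions, and all function names and the toy core volumes are hypothetical. What the sketch does reproduce is the enumeration of unique width partitions and the early exit once one TAM already reaches the best SOC testing time found so far.

```python
import math
from typing import Iterator, List, Optional, Sequence, Tuple


def unique_partitions(total: int, parts: int, smallest: int = 1) -> Iterator[Tuple[int, ...]]:
    """Yield each split of `total` into `parts` positive widths, in non-decreasing order."""
    if parts == 1:
        if total >= smallest:
            yield (total,)
        return
    # Keeping the parts sorted avoids generating permutations of one partition.
    for first in range(smallest, total // parts + 1):
        for rest in unique_partitions(total - first, parts - 1, first):
            yield (first,) + rest


def core_time(volume: int, width: int) -> int:
    """Toy testing-time model (an assumption): data volume divided by TAM width."""
    return math.ceil(volume / width)


def evaluate(widths: Sequence[int], volumes: Sequence[int], cutoff: float) -> Optional[int]:
    """Greedy core assignment with early exit.

    Returns the SOC testing time, or None as soon as any TAM's time reaches
    `cutoff`, because such a partition cannot beat the best one seen so far.
    """
    tam_time: List[int] = [0] * len(widths)
    for volume in sorted(volumes, reverse=True):
        # Place the core on the TAM that would finish earliest with it.
        j = min(range(len(widths)),
                key=lambda k: tam_time[k] + core_time(volume, widths[k]))
        tam_time[j] += core_time(volume, widths[j])
        if tam_time[j] >= cutoff:
            return None  # prune: stop evaluating this partition
    return max(tam_time)


def search(total_width: int, n_tams: int, volumes: Sequence[int]):
    """Return (best partition, best time, partitions evaluated to completion)."""
    best_partition, best_time, completed = None, math.inf, 0
    for widths in unique_partitions(total_width, n_tams):
        t = evaluate(widths, volumes, best_time)
        if t is None:
            continue
        completed += 1
        best_partition, best_time = widths, t  # t < best_time holds here
    return best_partition, best_time, completed


if __name__ == "__main__":
    print(sum(1 for _ in unique_partitions(64, 3)))  # 341
    toy_volumes = [9000, 7500, 6200, 4100, 3300, 2500, 1800, 900]  # hypothetical
    print(search(64, 3, toy_volumes))
```

The enumeration confirms the count of 341 unique partitions of a total width of 64 into three TAMs. Because TAM times only grow as cores are added, abandoning a partition at the cutoff never discards one that could have improved on the current best.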
5 Conclusion
We have presented a new efficient technique for co-optimization
of the wrapper/TAM architecture for industrial SOCs. The general
wrapper/TAM co-optimization problem has been formulated as
a progression of four problems. The first problem, $P_W$, relating to
wrapper design, is solved using an efficient algorithm presented
earlier. For the second problem, $P_{AW}$, relating to core assignment
among TAMs of fixed widths, we have presented an efficient
procedure called Core_assign that executes significantly faster than an
New co-optimization method
$W$ | $w_1+w_2+w_3$ | Core assignment | Testing time (cycles) | Exec. time (sec) | $\Delta T$ (%) | Exec. time ratio (new/exhaustive)
16 5+5+6 (2,1,2,1,3,3,1,1, 1786200 2 +0.82 0.08
1,2,1,2,3,3,3,1,
1,1,2,2,1,3,2,1,
1,1,3,3,1,3,1,1)
24 8+8+8 (2,1,1,3,1,3,1,1, 1209420 3 +1.80 0.06
2,1,1,2,1,1,2,1,
1,1,1,3,3,1,2,1,
1,3,2,1,1,2,3,3)
32 4+5+23 (1,1,1,1,2,3,1,2, 887751 2 +0.00 0.02
1,2,1,3,3,3,1,1,
3,1,2,3,1,1,3,1,
1,1,3,2,2,1,1,2)
40 6+10+24 (3,1,1,1,2,3,1,2, 741965 1 +4.60 0.01
2,1,1,3,1,1,2,2,
2,1,1,3,1,1,2,2,
1,2,3,1,1,1,1,3)
48 9+16+23 (3,2,2,2,2,3,2,2, 599373 3 +0.00 0.01
1,2,1,2,1,2,2,2,
2,2,2,3,2,2,3,2,
2,1,1,2,2,2,2,3)
56 10+23+23 (1,3,2,3,2,3,3,3, 514688 3 +0.00 0.01
3,3,1,1,2,2,2,3,
2,1,2,3,3,3,2,1,
1,3,3,3,2,2,2,2)
64 15+23+26 (2,1,2,1,3,3,1,1, 473997 2 +2.96 0.004
1,3,1,3,1,1,3,2,
2,2,1,2,1,2,2,1,
1,3,3,2,2,1,1,1)
Table 18. New results for p93791 for $B = 3$ ($P_{PAW}$).
ILP model for the same problem presented earlier. The third and
fourth problems in the progression, $P_{PAW}$ and $P_{NPAW}$, relate to
determining a partition of TAM width and an effective number of
TAMs for the SOC, such that testing time is minimized. These two
problems have been solved using a new heuristic procedure called
Partition_evaluate that quickly reaches within the neighborhood of
the optimal solution to $P_{PAW}$ and $P_{NPAW}$. Partition_evaluate uses
extensive solution-space pruning to identify an effective TAM
partition for the SOC. Finally, the existing ILP model for $P_{AW}$ is used
to optimize the core assignment and testing time for the width
partition produced by Partition_evaluate. Experimental results for several
industrial SOCs demonstrate that wrapper/TAM co-optimization can
be effectively carried out in over an order of magnitude less time than
exact methods based on ILP and exhaustive enumeration presented
earlier.
The drawback of the heuristic methods presented in this paper is
that they exhibit anomalous behavior at times. The width partition
and number of TAMs returned by Partition_evaluate do not always
provide the lowest testing time after the final (exact) optimization
step is performed.
Acknowledgements
We thank Henk Hollman for his help with partition theory. We are
grateful to Graeme Francis, Harry van Herten and Erwin Waterlander
for their help with providing data for the Philips SOCs, and Bart Ver-
meulen, Harald Vranken and Graeme Francis for their comments on
an earlier version of this paper.
References
[1] J. Aerts and E.J. Marinissen. Scan chain design for test time reduction
in core-based ICs. Proc. Int. Test Conf., pp. 448-457, 1998.
[2] M. Berkelaar. lp_solve 3.0, Eindhoven University of Technology,
Eindhoven, The Netherlands. ftp://ftp.ics.ele.tue.nl/pub/lp_solve
[3] P. Brucker. Scheduling Algorithms. 3rd ed., Springer, Berlin, Germany,
2001.
New co-optimization method
$W$ | $B$ | TAM partition | Core assignment | Testing time (cycles) | Exec. time (sec) | $\Delta T$ (%) | Exec. time ratio (new/exhaustive)
16 3 5+5+ (2,1,2,1,3,3,1,1, 1786200 2 +0.82 0.08
6 1,2,1,2,3,3,3,1,
1,1,2,2,1,3,2,1,
1,1,3,3,1,3,1,1)
24 3 8+8+ (2,1,1,3,1,3,1,1, 1209420 3 +1.80 0.06
8 2,1,1,2,1,1,2,1,
1,1,1,3,3,1,2,1,
1,3,2,1,1,2,3,3)
32 2 9+23 (1,2,1,2,1,2,1,1, 894342 1 +0.74 0.03
1,2,1,2,2,2,1,1,
1,1,2,2,1,2,2,1,
1,1,2,1,1,1,1,1)
40 3 6+10+24 (3,1,1,1,2,3,1,2, 741965 1 +4.60 0.01
2,1,1,3,1,1,2,2,
2,1,1,3,1,1,2,2,
1,2,3,1,1,1,1,3)
48 3 9+16+ (3,2,2,2,2,3,2,2, 599373 3 +0.00 0.01
23 1,2,1,2,1,2,2,2,
2,2,2,3,2,2,3,2
2,1,1,2,2,2,2,3)
56 3 10+23+ (1,3,2,3,2,3,3,3, 514688 3 +0.00 0.01
23 3,3,1,1,2,2,2,3,
2,1,2,3,3,3,2,1,
1,3,3,3,2,2,2,2)
64 3 15+23+ (2,1,2,1,3,3,1,1, 473997 2 +2.96 0.004
26 1,3,1,3,1,1,3,2,
2,2,1,2,1,2,2,1,
1,3,3,2,2,1,1,1)
Table 19. New results for p93791 ($P_{NPAW}$).
[4] K. Chakrabarty. Design of system-on-a-chip test access architectures
under place-and-route and power constraints. Proc. Design Automation
Conf., pp. 432-437, 2000.
[5] K. Chakrabarty. Optimal test access architectures for system-on-a-chip.
ACM Trans. Design Automation of Electronic Systems, vol. 6, pp. 26–
49, January 2001.
[6] M.R. Garey and D.S. Johnson. Computers and Intractability: A Guide
to the Theory of NP-Completeness. W.H. Freeman and Co., San Fran-
cisco, CA, 1979.
[7] J. Hromkovic. Algorithmics for Hard Problems. Springer, Berlin, Ger-
many, 2001.
[8] V. Iyengar, K. Chakrabarty, and E.J. Marinissen. Test wrapper and test
access mechanism co-optimization for system-on-chip. J. Electronic
Testing: Theory and Applications, vol. 18, March 2002, in print.
[9] E. Larsson and Z. Peng. An integrated system-on-chip test framework.
Proc. Design, Automation, and Test in Europe (DATE), pp. 138-144,
2001.
[10] J.H. van Lint and R.M. Wilson. A Course in Combinatorics. Cambridge
University Press, 1992.
[11] E.J. Marinissen et al. A structured and scalable mechanism for test ac-
cess to embedded reusable cores. Proc. Int. Test Conf., pp. 284-293,
1998.
[12] E.J. Marinissen, S.K. Goel and M. Lousberg. Wrapper design for em-
bedded core test. Proc. Int. Test Conf., pp. 911–920, 2000.
[13] M. Nourani and C. Papachristou. An ILP formulation to optimize test
access mechanism in system-on-chip testing. Proc. Int. Test Conf., pp.
902–910, 2000.
[14] P. Varma and S. Bhatia. A structured test re-use methodology for core-
based system chips. Proc. Int. Test Conf., pp. 294–302, 1998.
[15] Y. Zorian, E.J. Marinissen and S. Dey. Testing embedded-core-based
system chips. IEEE Computer, vol. 32, pp. 52–60, June 1999.