University of Nebraska - Lincoln

DigitalCommons@University of Nebraska - Lincoln
CSE Conference and Workshop Papers

Computer Science and Engineering, Department of

1995

Parallel Test Generation With Low Communication Overhead
Sivaramakrishnan Venkatraman
LSI Logic Corporation

Sharad C. Seth
University of Nebraska-Lincoln, seth@cse.unl.edu

Prathima Agrawal
AT&T Bell Laboratories, Murray Hill, NJ


Venkatraman, Sivaramakrishnan; Seth, Sharad C.; and Agrawal, Prathima, "Parallel Test Generation With
Low Communication Overhead" (1995). CSE Conference and Workshop Papers. 49.
https://digitalcommons.unl.edu/cseconfwork/49


Proceedings of the 8th International Conference on VLSI Design, 1995. doi: 10.1109/ICVD.1995.512088

Parallel Test Generation With Low Communication Overhead
Sivaramakrishnan Venkatraman
LSI Logic Corporation
Milpitas, CA 95035

Sharad Seth
University of Nebraska-Lincoln
Lincoln, NE 68588-0115

Prathima Agrawal
AT&T Bell Laboratories
Murray Hill, NJ 07974

Support of this research by AT&T Bell Laboratories and a Collaborative Research grant from NATO is gratefully acknowledged.

1063-9667/95 $4.00 © 1995 IEEE

Abstract

In this paper we present a method of parallelizing test generation for combinational logic using boolean satisfiability. We propose a dynamic search-space allocation strategy to split work between the available processors. This strategy is easy to implement with a greedy heuristic and is economical in its demand for inter-processor communication. We derive an analytical model to predict the performance of the parallel versus sequential implementations. The effectiveness of our method and analysis is demonstrated by an implementation on a Sequent (shared memory) multiprocessor. The experimental data show significant performance improvement in the parallel implementation, validate our analytical model, and allow predictions of performance for a range of time-out limits and degrees of parallelism.

1

Introduction

In this paper we present a parallel implementation of the boolean satisfiability algorithm in which a test for a given fault (if it exists) is found cooperatively by the available processors. The results of test generation for the hard-to-detect faults in the ISCAS-85 benchmark circuits show that the parallel test generation scheme results in significantly improved fault coverage and CPU time. We also propose a probabilistic model which accurately predicts the performance of the parallel versus the sequential implementation.
Our contribution differs from the earlier work in several important ways:

(a) Processor Scheduling and Scalability: In our scheme each processor can schedule another idle processor and distribute work. There is no centralized scheduler that can become a bottleneck with a large number of processors. Once scheduled, a processor can proceed independently. We also try to minimize the communication overhead of scheduling a new processor.
(b) Search Space Partitioning: We use a greedy heuristic for search space partitioning which tends to break the search space evenly amongst the available processors and is easy to implement. Our heuristic is much simpler than that used by Patil and Banerjee [1], yet the results show it to be very effective.
(c) Multiple Heuristics: Both our sequential and parallel algorithms incorporate four different heuristics in order; one is tried only if all the preceding ones have timed out. The effectiveness of orthogonal heuristics has been shown for sequential algorithms [2, 3] and suggested for parallel ones [4] but not evaluated on actual implementations. Our results show that, in the context of good search-space partitioning, multiple heuristics are not as effective for the parallel case as they are for the sequential case.
(d) Performance Analysis: We believe our use of a long-tail distribution [5] to model the detection times of HTD faults accurately captures the intractability of test generation for a small fraction of faults using a given algorithm. The single parameter of the distribution is a measure of the algorithm-specific testability of a circuit for HTD faults. The model is characterized and validated by experimental data. It can be used to predict performance for a range of time-out and fault-coverage limits.

Due to space limitations, we assume the reader is familiar with the boolean satisfiability formulation of the test generation problem and the concepts related to its solution, as described by Larrabee [3]. In particular, familiarity will be assumed with the following concepts: the implication graph constructed from the binary clauses in the boolean formula, the base level system for finding a satisfiable solution, and the heuristics added to improve the base level system.
Amongst the various suggested heuristics (nonlocal implications, active clauses, unique sensitization, etc.) we retain only the active clauses in our sequential implementation and add the following four to define a static ordering of the variables:
(F1) Forward Cone with one: Associate with each variable the number of nodes in the implication graph that are forced to be true when this variable is assigned the true value. Then sort the variables in the decreasing order of this number. The 2SAT solutions to the boolean formula are iterated in the descending order of variable assignments, that is, first the true value is tried for a variable and then, if backtracking makes it necessary, the false value is tried.
(B0) Backward Cone with zero: This is similar to the above except that the sorting key for variables is the number of nodes in the implication graph that are forced to be false when the false value is assigned to the variable. Further, the 2SAT solutions are iterated in the ascending order of variable assignments.
(F0) Forward Cone with zero: The variables are sorted as in F1 but assigned in the ascending order.
(B1) Backward Cone with one: The variables are sorted as in B0 but assigned in the descending order.
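The four static orderings can be sketched in a few lines. This is only an illustration under stated assumptions, not the authors' implementation: the implication graph is simplified to edges between variables (an edge a -> b meaning "a true forces b true"), the forward cone of a variable is taken to be the nodes it implies, and the backward cone is taken to be the nodes that imply it (and are therefore forced false when it is false). The names `cone_size`, `reversed_edges`, `static_order`, and the toy graph are hypothetical.

```python
from collections import defaultdict

def cone_size(edges, start):
    """Number of nodes reachable from `start` along `edges`, excluding `start`."""
    seen, stack = {start}, [start]
    while stack:
        node = stack.pop()
        for nxt in edges.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return len(seen) - 1

def reversed_edges(edges):
    """Flip every edge so that the ancestors of a node become reachable from it."""
    rev = defaultdict(list)
    for src, dsts in edges.items():
        for dst in dsts:
            rev[dst].append(src)
    return rev

def static_order(variables, implies, heuristic):
    """Return (variable order, first value to try) for 'F1', 'B0', 'F0' or 'B1'."""
    # Assumption: 'F' sorts by forward cone size, 'B' by backward cone size.
    graph = implies if heuristic[0] == "F" else reversed_edges(implies)
    order = sorted(variables, key=lambda v: cone_size(graph, v), reverse=True)
    # The trailing digit selects which value is tried first (1 = true, 0 = false).
    return order, heuristic[1] == "1"

# Toy graph (hypothetical): a -> b, a -> c, b -> c
implies = {"a": ["b", "c"], "b": ["c"], "c": []}
f1_order, f1_first = static_order(["a", "b", "c"], implies, "F1")
b0_order, b0_first = static_order(["a", "b", "c"], implies, "B0")
```

In the toy graph, F1 places `a` first because it has the largest forward cone, while B0 places `c` first because it has the largest backward cone.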

2

Parallel Test Generation

Three broad alternatives are available for parallelizing a test generation algorithm: algorithmic partitioning, fault partitioning, and search-space partitioning. We evaluated each alternative for parallelizing the boolean satisfiability algorithm and chose the last alternative. A detailed justification for this choice can be found elsewhere [6].
In search-space partitioning multiple processors divide the space of input vectors amongst themselves in order to find a test for a given fault [7, 1]. An important design issue in search-space partitioning is the choice of the target fault set. Because of the overhead involved in allocating disjoint parts of the search space to available processors and coordinating their results, search-space partitioning is not cost-effective for the easy-to-detect (ETD) faults. This is particularly true for the boolean satisfiability algorithm, which was proposed for only hard-to-detect (HTD) faults even on a uniprocessor [3]. For this reason, we report the results of our parallel implementation only for the HTD faults.
Another important design issue is whether to use a static or a dynamic partitioning of the search space. We implemented a static partitioning algorithm on the Sequent and found that (a) the processor utilization was quite low, and (b) the parallel implementation showed only marginal performance improvement over the sequential implementation [6]. These results led us to consider a dynamic partitioning of the search space.

2.1

Dynamic Partitioning

In the parallel test generation phase, each processor is given a complete copy of the implication graph and the set of SCNF clauses so that it can proceed independently to find a solution. When one processor finds a solution it stops all the other processors. When all the processors exhaust their search space without a solution, the fault is declared to be redundant.
Amongst the many possible alternatives for dynamic allocation of search subspaces to processors, we chose a greedy distributed algorithm that tries to minimize communication during scheduling of a new process to an idle processor. All processors work completely asynchronously and no processor works as a centralized scheduler. Whenever a processor becomes free, it gets work dynamically from one of the busy processors. The background information in the next paragraph is essential to understanding the details of how this is accomplished.
In the test generation algorithm, a binary direction variable (dir) can be either “Forward” or “Backward”. In the Forward direction the algorithm guarantees that the current variable assignments are consistent with the SCNF clauses and the next yet-to-be-assigned variable is chosen for assignment. The Backward direction, on the other hand, results from falsification of the formula by the current assignments; the next step of the algorithm is to backtrack and undo the most recent variable binding. A choice point is a variable that has been bound to a value in the forward direction and whose alternate value is yet to be tried.
In our parallel implementation, when a processor is in the forward direction and finds another processor free, it splits work at the first (most recently) available choice point. It passes the partial assignment list (up to the choice point) to the new processor and readjusts its search space to reflect this reduced work (see Figure 2). This greedy work splitting is chosen in order to keep the message transferred between the processors minimal. The free processor, on receiving the work, initializes its implication graph with the partial assignment list and assigns the opposite (alternate) value to the choice point. Then the free processor carries out the implications of these assignments to recreate the starting state of the implication graph. Since


A processor is not allowed to split its work if it
has less than certain threshold number of nodes
to be assigned.

All processors work independently, giving and taking
work from each other until a processor comes up with
a test or all of them exhaust their search space.

3

[Figure 2: The dynamic sharing of work between processors. Legend: subtree assigned to another processor; subtree exhausted by processor P; split node.]
there is no backtracking involved in this initialization,
the time required is minimal: in the worst case it is
proportional to the product of the number of assignments and the size of the implication graph. After
initialization, a processor starts exploring its search
space looking for a test.
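The work-splitting step described above can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes the donor keeps its bindings on an ordered trail, reads "first (most recently) available choice point" as the most recent binding whose alternate value is untried, and omits the implication step the receiver performs afterwards. `Binding`, `split_work`, and the example trail are hypothetical names.

```python
from dataclasses import dataclass

@dataclass
class Binding:
    var: str
    value: bool
    alt_untried: bool  # True while this binding is still a choice point

def split_work(trail):
    """Split the donor's search space at its most recent choice point.

    Returns the partial assignment list for the idle processor (bindings up to
    the choice point, followed by the choice point with its alternate value),
    or None if the donor has no choice point left. The donor's trail is updated
    in place so that it never explores the donated subtree itself.
    """
    for i in range(len(trail) - 1, -1, -1):
        if trail[i].alt_untried:
            trail[i].alt_untried = False  # donor gives up the alternate branch
            prefix = [(b.var, b.value) for b in trail[:i]]
            return prefix + [(trail[i].var, not trail[i].value)]
    return None

# Example donor state (hypothetical): a and c are choice points.
trail = [
    Binding("a", True, True),
    Binding("b", True, False),
    Binding("c", True, True),
    Binding("d", False, False),
]
work = split_work(trail)  # message sent to the idle processor
```

Only the short assignment list travels between processors, which is the point of the greedy choice; the receiver rebuilds its implication-graph state locally from that list.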
Our scheme differs from that proposed by Patil and Banerjee [8] in three major ways:
(a) Their scheme relies on a centralized scheduling of work compared with the distributed scheduling used here. When a large number of processors are involved, the scheduler can become a bottleneck in centralized scheduling.
(b) The search space is split in their scheme between two processors by assignment of alternate choice points in the search tree to each processor. Thus a much more complex data structure needs to be maintained for the search tree by each processor.
(c) In their scheme individual processors do not have any control over when the search space is split between processors. The scheduler does this whenever there is an idle processor available. We use distributed control; hence it is easy to incorporate strategies to minimize the overhead of starting a new process.
The current implementation includes the following two optimizations in addition to the basic search-space splitting scheme outlined above:

PERFORMANCE ANALYSIS

It is assumed that test generation for a fault is aborted if it takes longer than a predefined timeout limit. We consider speedup and fault coverage as the measures of performance. Our analysis uses a probabilistic model to predict the performance of a parallel test generation algorithm on the HTD faults. It is based on the assumption that each HTD fault is independently targeted for test generation (which agrees with our implementation). However, it can be adapted to the case when faults are dropped by fault simulation.

3.1

Uniprocessor Test Generation Time

We assume a long-tail distribution for the detection times of HTD faults in a circuit for a given test generation algorithm. It is characterized by values so removed from the mean, the median, and other 'typical indicators of location' that they do not seem to be generated by the same mechanism as the values near the median [5]. Those with experience in running test generation programs will recognize this to be commonly the case with detection times of HTD faults.
Let $F_1(t)$ represent the probability of a randomly chosen HTD fault having a detection time less than $t$. In other words, $F_1(t)$ can be considered as the fraction of the HTD faults having a detection time less than $t$. According to the long-tail distribution:

$$F_1(t) = 1 - \frac{1}{(1+t)^{\alpha_s}}, \qquad t > 0,\ \alpha_s > 0 \tag{1}$$

Here, the parameter $\alpha_s$ may be regarded as an algorithm-specific testability measure of the circuit. The density function $f_1(t)$ is given by the derivative of the distribution function $F_1(t)$:

$$f_1(t) = \frac{dF_1(t)}{dt} = \frac{\alpha_s}{(1+t)^{\alpha_s+1}} \tag{2}$$
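The distribution and its density are simple to evaluate numerically. The sketch below is illustrative only: the function names are hypothetical, the density is the derivative of $F_1(t)$ worked out by hand, and the value 0.5 used for the testability parameter is an arbitrary example rather than a measured one.

```python
def long_tail_cdf(t, alpha):
    """F_1(t): fraction of HTD faults detected within time t."""
    return 1.0 - (1.0 + t) ** (-alpha)

def long_tail_pdf(t, alpha):
    """f_1(t) = dF_1/dt = alpha / (1 + t)^(alpha + 1)."""
    return alpha * (1.0 + t) ** (-(alpha + 1.0))

alpha_example = 0.5                                   # arbitrary illustrative value
half_detected = long_tail_cdf(3.0, alpha_example)     # 1 - 4^(-0.5)
still_open = 1.0 - long_tail_cdf(99.0, alpha_example) # tail mass beyond t = 99
```

With this example parameter, half of the faults are detected by t = 3 while a tenth are still undetected at t = 99, which illustrates the long tail the model is meant to capture.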

A processor is not allowed to split its work until it has expended a certain threshold value of
processing time between work splits.

Let $X$ be the timeout value. Now, the average test generation time per fault comprises two components:

1. the weighted sum of the time spent on the faults detected, $\int_0^X t\, f_1(t)\,dt$, and

For a given fault coverage FC, the timeout value required in the sequential case, $X_1$, can be derived from Equations (6) and (7). Then, substituting these values of timeouts in Equations (3) and (5), we can get an equation for speedup for a given fault coverage.
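The calculation for a fixed fault coverage can be sketched as follows. This is an illustration under assumptions, not the authors' procedure: the coverage equation is inverted in closed form, the same inversion is applied to the parallel case to obtain its own timeout, both testability parameters are assumed to differ from 1, and the parameter values in the example are arbitrary. All function names are hypothetical.

```python
def coverage(timeout, alpha):
    """Fault coverage reached with the given timeout (form of Equations 6 and 7)."""
    return 1.0 - (1.0 + timeout) ** (-alpha)

def avg_time(timeout, alpha):
    """Average test generation time per fault (form of Equations 3 and 5).

    Assumes alpha != 1, where the closed form is undefined.
    """
    g = (1.0 + timeout) ** (alpha - 1.0)
    return (1.0 - g) / ((1.0 - alpha) * g)

def timeout_for_coverage(fc, alpha):
    """Invert the coverage equation: the timeout X at which coverage equals fc."""
    return (1.0 - fc) ** (-1.0 / alpha) - 1.0

def speedup_at_coverage(fc, alpha_s, alpha_p):
    """Ratio of sequential to parallel average time at equal fault coverage."""
    x_seq = timeout_for_coverage(fc, alpha_s)
    x_par = timeout_for_coverage(fc, alpha_p)
    return avg_time(x_seq, alpha_s) / avg_time(x_par, alpha_p)

# Arbitrary illustrative parameters, not taken from the paper's measurements.
x_seq = timeout_for_coverage(0.75, 0.5)
x_par = timeout_for_coverage(0.75, 2.0)
example_speedup = speedup_at_coverage(0.75, 0.5, 2.0)
```

In this example the sequential run needs a timeout of 15 time units to reach 75% coverage while the parallel run needs only 1, and the ratio of average times is 12.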

2. the weighted sum of the time spent on the faults aborted after the timeout value $X$, $X \left( \int_X^{\infty} f_1(t)\,dt \right) = X \left( 1 - \int_0^X f_1(t)\,dt \right)$.

4

Hence, the average test generation time $T_1(X)$, after simplification, is given by

$$T_1(X) = \frac{1 - (1+X)^{\alpha_s-1}}{(1-\alpha_s)\,(1+X)^{\alpha_s-1}} \tag{3}$$

We implemented the parallel algorithm and evaluated its performance on the ISCAS-85 benchmark circuits. In this section we present uniprocessor and 6-processor results of test generation for HTD faults based on our implementation. The HTD faults were obtained by running Podem on a uniprocessor with a small timeout limit. For the circuit C880, all the faults were detected in this preprocessing step; hence this circuit was not included in our evaluation.
All results were obtained on the Sequent Symmetry system with the Intel 80386 processor running at 16 MHz. First, a sequential version of the boolean satisfiability algorithm was implemented to obtain data for the uniprocessor performance. To calibrate the performance of this implementation we compared it against the Nemesis data [3] and found that, after accounting for the machine differences, our timings were quite comparable to those of the Nemesis system.
We further improved the performance of our sequential version before comparing it against the parallel implementation. This was achieved by trying the four heuristics mentioned in Section 1 serially, each with a timeout value of 1 sec. The heuristics were tried in the following order: F1, B1, B0, and F0. As expected, this implementation was able to achieve a higher fault coverage at the expense of longer CPU times.
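The serial use of the four heuristics can be sketched as a simple driver loop. This is a hypothetical illustration: `solve` stands in for a test generator that returns a result or `None` when it times out, and the stub solver below exists only to exercise the loop. A real driver would also stop when a heuristic proves the fault redundant, which is not modelled here.

```python
def run_composite(solve, heuristics=("F1", "B1", "B0", "F0"), timeout=1.0):
    """Try each heuristic in order; a later one runs only if all earlier ones timed out.

    `solve(name, timeout)` is a hypothetical callable returning a result,
    or None if the heuristic `name` hit the timeout.
    """
    for name in heuristics:
        outcome = solve(name, timeout)
        if outcome is not None:
            return name, outcome
    return None, None  # every heuristic timed out: the fault is aborted

# Stub solver (hypothetical): only B0 finds a test within the timeout.
calls = []

def fake_solve(name, timeout):
    calls.append(name)
    return "test-vector" if name == "B0" else None

winner, outcome = run_composite(fake_solve)
```

The worst-case time per fault is the sum of the individual timeouts, which is consistent with the higher coverage coming at the expense of longer CPU times.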
The results of the sequential and parallel implementations with the composite heuristics are shown in Table 1. We report only the times taken for satisfying the CNF formula; the times for parsing the circuit and extracting the boolean formula are not reported since they are incurred only once for each circuit.
For the parallel implementation, each processor ran the composite heuristics with the same timeout value as the sequential implementation. Perfect fault coverage was achieved in all cases for the parallel implementation and the speedup values ranged from 1.29 to 12.34. As is evident from the table, the sequential implementation needed to rely on the multiple heuristics a lot more often and could not achieve perfect fault coverage in some cases. This fact is essentially responsible for the superlinear speedup achieved in some cases.

Equation (3) gives the average test generation time per fault on a uniprocessor for a timeout value X.
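For readers who want the intermediate steps behind Equation (3), the simplification can be sketched as follows. The derivation assumes $\alpha_s \neq 1$ and uses integration by parts on the first component; it is a reconstruction from the definitions in this section, not a quotation of the authors' working.

```latex
\begin{align*}
T_1(X) &= \int_0^X t\, f_1(t)\,dt + X\bigl(1 - F_1(X)\bigr) \\
       &= \Bigl[\, t\,F_1(t) \Bigr]_0^X - \int_0^X F_1(t)\,dt + X - X\,F_1(X) \\
       &= \int_0^X \bigl(1 - F_1(t)\bigr)\,dt
        = \int_0^X (1+t)^{-\alpha_s}\,dt \\
       &= \frac{(1+X)^{1-\alpha_s} - 1}{1-\alpha_s}
        = \frac{1 - (1+X)^{\alpha_s-1}}{(1-\alpha_s)\,(1+X)^{\alpha_s-1}}
\end{align*}
```

The last step multiplies numerator and denominator by $(1+X)^{\alpha_s-1}$ to reach the form used in Equation (3).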

3.2

N-Processor Test Generation Time

If we have N independent processors working on disjoint subspaces, then we could expect the probability of a random HTD fault having a detection time greater than $t$ to be $\frac{1}{(1+t)^{N\alpha_s}}$. But since the processors are not completely independent and there are overheads involved in parallelizing, the actual measure of parallelism is not N but a fraction of N. The fraction of HTD faults having a detection time less than $t$ with N processors is given by

$$F_N(t) = 1 - \frac{1}{(1+t)^{\alpha_p}} \tag{4}$$

where $\alpha_p$ is the algorithm-dependent testability measure similar to $\alpha_s$. Following a derivation similar to that of Equation (3), we get

$$T_N(X) = \frac{1 - (1+X)^{\alpha_p-1}}{(1-\alpha_p)\,(1+X)^{\alpha_p-1}} \tag{5}$$

where $T_N(X)$ is the average test generation time per fault on the N-processor system with a timeout value of $X$.

3.3

Performance measures

For a given value of timeout $X$, the speedup can be computed immediately as the ratio of $T_1(X)$ (Equation 3) and $T_N(X)$ (Equation 5).
The fault coverages in the two cases are obtained from Equations (1) and (4).

$$F_1(X) = 1 - \frac{1}{(1+X)^{\alpha_s}} \tag{6}$$

$$F_N(X) = 1 - \frac{1}{(1+X)^{\alpha_p}} \tag{7}$$

RESULTS


Table 1: Sequential/Parallel Performance on Sequent with Composite Heuristics

action on Computer-Aided Design, vol. 9, pp. 313322, March 1990.

Additional experiments were carried out to validate our performance analysis model in Section 3. Because of space limitations we omit the details here and only briefly describe the results below (the interested reader may refer to the report [6]).
The unknown parameters of our model, $\alpha_s$ and $\alpha_p$, relate to the testability of the circuit for the sequential and parallel implementations respectively. These were estimated by nonlinear regression analysis on data obtained by running test generation on a sample of HTD faults. The fault coverage and the speedup projections (for a fixed timeout) from the model thus characterized were found to be close to the experimental values. The model was also used to estimate speedup for a fixed fault coverage.
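One simple way to estimate such a parameter from measured data is sketched below. Note the substitution: instead of the nonlinear regression used in the paper, this sketch fits a straight line through the origin to the log-transformed relation $\ln(1 - F(t)) = -\alpha \ln(1 + t)$, which is easier to show with the standard library alone. The function name and the data are hypothetical; the data are generated synthetically from the model, not taken from the paper's experiments.

```python
import math

def estimate_alpha(times, fractions):
    """Least-squares fit through the origin of ln(1 - F) = -alpha * ln(1 + t).

    `times` are timeout values and `fractions` the fraction of sampled HTD
    faults detected within each timeout (each strictly below 1).
    """
    xs = [math.log1p(t) for t in times]
    ys = [math.log(1.0 - f) for f in fractions]
    return -sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)

# Synthetic data generated from the model with alpha = 0.5 (illustrative only).
sample_times = [1.0, 3.0, 7.0, 15.0]
sample_fractions = [1.0 - (1.0 + t) ** -0.5 for t in sample_times]
alpha_hat = estimate_alpha(sample_times, sample_fractions)
```

On noise-free synthetic data the fit recovers the generating parameter; with real measurements the log transform weights the tail differently from a direct nonlinear fit, so the two estimates need not coincide.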

5

[2] H. B. Min and W. A. Rogers, “Search strategy switching: An alternative to increased backtracking,” in Proc. International Test Conference, pp. 803-811, August 1989.

[3] T. Larrabee, “Test pattern generation using boolean satisfiability,” IEEE Transactions on Computer-Aided Design, vol. 11, pp. 4-15, January 1992.
[4] S. J. Chandra and J. H. Patel, “Test generation in a parallel processing environment,” in Proc. International Conference on Computer Design: VLSI in Computers and Processors, pp. 11-14, IEEE CS Press, 1988.

CONCLUSION

[5] B. Mandelbrot, Mathematical Explorations in Behavioral Sciences, ch. The class of long-tailed probability distributions and the empirical distributions of city sizes, pp. 322-332. Homewood, Ill:
Richard D. Irwin Inc. and the Dorsey Press, 1965.

Our processor scheduling algorithm involves no central processor and requires a minimal amount of interprocessor communication. For this reason, we believe, the results reported here should apply equally well to loosely coupled scalable MIMD architectures that are increasingly becoming available. Since our parallel scheme is concerned only with search space partitioning and processor scheduling, it requires only an incremental effort over that required for implementing the boolean satisfiability algorithm on a standard workstation.

[6] S. Venkatraman, S. Seth, and P. Agrawal, “Parallel test pattern generation using boolean satisfiability,” Technical Report UNL-CSE-92-024, CSE
Department, Univ. of Nebraska-Lincoln, 1992.

[7] A. Motohara, K. Nishimura, H. Fujiwara, and I. Shirakawa, “A parallel scheme for test pattern generation,” in International Conference on Computer Aided Design, pp. 156-159, November 1986.
[8] S. Patil and P. Banerjee, “A parallel branch and
bound approach to test generation,” in Proc. 26th
Design Automation Conference, pp. 339-345, June
1989.

References
[1] S. Patil and P. Banerjee, “A parallel branch and
bound algorithm for test generation,” IEEE Trans-


