Exploring linear speedup in parallel ATPG through special topology by Shi, Zhimin




Exploring Linear Speedup
in Parallel ATPG
Through Special Topology
By
Zhimin Shi, BSc
A thesis submitted to the School of Graduate
Studies in partial fulfillment of the
requirements for the degree of
Master of Science
Department of Computer Science
Memorial University of Newfoundland
March 12, 1994
St. John's Newfoundland Canada
ISBN 0-315-91620-6
Abstract
In digital system design, test pattern generation requires a considerable amount
of computing time. Using a level-sensitive scan design, test pattern generation can
be confined to the combinational circuits. It has been shown that the problem
of test pattern generation for combinational circuits is NP-complete. Although
many excellent algorithms have been developed to generate test patterns, they
still do not keep pace with VLSI technology. Research is ongoing in the development
of parallel processing techniques for test pattern generation, but there has
been little research into what kind of topology has the greatest potential to speed
up test pattern generation.
In this work, simulation software was developed for measurement of the speedup,
and three topologies are proposed to explore the parallelism for automatic test pattern
generation. These topologies are: modified complete binary tree (MCBTA),
autonomous modified complete binary tree (AMCBTA), and square array structure
(SQARRAY). The empirical results for these topologies show that a special
topology has the potential capability to speed up test pattern generation and
super-linear speedup can often result if an autonomous structure is adopted.
Acknowledgements
I wish to express my thanks to my supervisor Dr. Paul Gillard for his guidance,
interest, constructive criticism and enthusiasm. Without his contribution, it would
be impossible to give this thesis its current quality.
I would like to thank the systems support staff for providing help and assistance
while I conducted this research.
I am also very grateful to the Administrative staff who have helped in one way
or another in the preparation of this thesis.
In addition, I would like to acknowledge the financial support received from
the Department of Computer Science and the School of Graduate Studies.
Special thanks are due to my fellow graduate students and good friends, and in
particular to Zhengqi Lu, Xu He, Sun Yongmei, and Chen Hao for their valuable
comments and useful suggestions. I would also like to thank Dr. Siwei Lu, Patricia
Murphy and Elaine Boone for their help and assistance.
l'''i.~ /",si.~ i.~ rI, </i,.,lI, d I"
UIfII 11I.'1 11III"f1l1..;
I"r/h,il".';IIIJ/ml'lt/lIIltl"'''"I'1t.lltl/l,l/llhr'''I!/h",,1
II" "'''II"M' fr! III!! uItH'uli"lI
Contents
I Introduction
1 Introduction
2 Conventional ATPG Algorithms
2.1 Stuck-at Fault Model & Testing Problem
2.1.1 Stuck-at Fault Model
2.1.2 Testing Problems
2.2 D-algorithm
2.3 PODEM Algorithm
2.4 FAN Algorithm
3 Taxonomy of Parallel ATPG Algorithms
3.1 Fault Partitioning
3.2 Heuristic Parallelization
3.3 Search-space Partitioning
3.4 Functional (algorithmic) Partitioning
3.5 Topological Partitioning
4 Hard-to-detect Faults
4.1 Data from Experiments
4.2 Inference from Experiments
II Simulation Environment
5 Model for Measurement
5.1 Model for Measurement
5.2 Parallel Speedup in Test Pattern Generation
6 Algorithms
6.1 Parser Construction
6.1.1 The Grammar Rules
6.2 Compiler Driven Simulation
6.3 Checking Trial Test Patterns
6.4 Heuristics
6.5 Expansion of Trial Test Patterns
6.6 Detection of Redundant Faults
III ATPG and 4 Connected Architecture
7 Four Connected Topology
7.1 4 Connected Structure and Examples
7.2 Characteristics of 4 Connected Topology
7.2.1 4CS naturally supports parallel ATPG
7.2.2 Isomorphic 4CS systems
8 ATPG Using MCBTA
8.1 MCBTA Architecture and Parallel Algorithm
8.1.1 Architecture and Algorithm
8.1.2 Empirical Results and Analysis
8.2 Autonomous MCBTA Architecture
8.2.1 Architecture
8.2.2 A Parallel Algorithm
8.2.3 Empirical Results and Analysis
9 ATPG Using Square Array
9.1 Square Array and Its Parallel Algorithm
9.1.1 Square Array Architecture
9.1.2 Completeness of Square Array
9.1.3 A Parallel Algorithm
9.1.4 Empirical Results and Analysis
IV Conclusion and Discussion
List of Tables
2.1 Test Cubes for Justification First
2.2 Test Cubes for Propagation First
3.1 Summary of Fault Partitioning
3.2 Summary of Heuristic Parallelization
3.3 Summary of Search Space Partitioning
3.4 Summary of Algorithmic Partitioning
3.5 Summary of Topological Partitioning
4.1 Statistical Data for Circuits
5.1 A Process of the Proof of a Redundant Fault
5.2 A Process of the Proof of a Redundant Fault
List of Figures
2.1 A Circuit for D-algorithm
2.2 A Podem Example
2.3 Podem Search Space Diagram
2.4 A Fan Example
5.1 Virtual 2 Phase Clock
5.2 One Processor Searches 16 Elements
5.3 4 Processors Search 16 Elements
5.4 A Circuit with One Fault
6.1 Ineffective Logic Gates
6.2 Diagram for a Component
7.1 The Diagram for Graph in Example 1
7.2 The Diagram for Graph in Example 2
7.3 Two Isomorphic Graphs
7.4 A Non-planar 4-connected Graph
7.5 A Subdivision of K5
8.1 Complete Binary Tree Architecture
8.2 Modified Complete Binary Tree Architecture
8.3 Layout of MCBTA Using H-tree
8.4 Speedup for 4 Redundant Faults in MCBTA
8.5 Speedup for an Irredundant Hard-to-detect Fault in MCBTA
8.6 A Pure MCBTA
8.7 Autonomous MCBTA
8.8 Speedup for 4 Redundant Faults in AMCBTA
8.9 Speedup for an Irredundant Hard-to-detect Fault in AMCBTA
8.10 Scaled Diagram for Figure 8.9
8.11 Experiment With 1 Module
8.12 Experiment With 7 Modules
8.13 Experiment With 15 Modules
9.1 The Symbol for a Processor in SQARRAY
9.2 SQARRAY with 9 Processors
9.3 Speedup for 4 Redundant Faults in SQARRAY
9.4 Speedup for an Irredundant Fault in SQARRAY
9.5 Scaled Speedup for the Irredundant Fault in SQARRAY
9.6 Speedup in Complementary SQARRAY (4 redundant faults)
9.7 Speedup in Complementary SQARRAY (an irredundant fault)
Part I
Introduction
Chapter 1
Introduction
Generating test patterns for testing digital circuits is a very important aspect of
VLSI design. It often consumes a significant portion of the design time. Owing to
techniques such as the widely used level-sensitive scan design[5], the problem of
test pattern generation is reduced to the problem of generating test patterns for
combinational circuits. Even this problem has been shown to be NP-complete[17].
There are two basic approaches to solve the automatic test pattern generation
(ATPG) problem: algorithmic test pattern generation and statistical, or pseudo-random,
test pattern generation. In the algorithmic approach, a specific ATPG
algorithm is used to generate a test for each fault in the circuit. Most of these
algorithms can be proved to be complete; that is, they are guaranteed to find a
test for a fault - as long as a test exists. However, this may involve searching
the entire solution space, which is computationally expensive.
Statistical test pattern generation, on the other hand, selects test patterns
at random, or by using some heuristic, and uses fault simulation to determine
the faults detected by the pattern. Test patterns are selected and added to the
test set if they detect any previously undetected faults, until some required fault
coverage measure or computation time limit is reached. This method finds tests
for the easy-to-detect faults quickly but becomes less and less efficient as the
easy-to-detect faults are removed from the fault list and only the hard-to-detect faults
remain. In many cases, the required fault coverage cannot be achieved without
excessive computation times.
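
The statistical loop described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the three-gate circuit, the names `GATES`, `simulate` and `random_tpg`, and the stopping parameters are all hypothetical and are not taken from the thesis software.

```python
import random

# Hypothetical circuit (an assumption for illustration): y = (a AND b) OR (NOT c).
# Gates are listed in topological order as (output, type, inputs).
INPUTS = ("a", "b", "c")
OUTPUTS = ("y",)
GATES = [
    ("n1", "AND", ("a", "b")),
    ("n2", "NOT", ("c",)),
    ("y", "OR", ("n1", "n2")),
]

def simulate(pattern, fault=None):
    """Evaluate the circuit; fault is (line, stuck_value) or None for fault-free."""
    values = {}

    def put(line, value):
        # A stuck-at fault overrides whatever value the line would carry.
        values[line] = fault[1] if fault and fault[0] == line else value

    for name, bit in zip(INPUTS, pattern):
        put(name, bit)
    for out, kind, ins in GATES:
        args = [values[i] for i in ins]
        if kind == "AND":
            put(out, int(all(args)))
        elif kind == "OR":
            put(out, int(any(args)))
        else:  # NOT
            put(out, 1 - args[0])
    return tuple(values[o] for o in OUTPUTS)

def all_faults():
    """Every line stuck-at-0 and stuck-at-1."""
    lines = list(INPUTS) + [g[0] for g in GATES]
    return [(line, v) for line in lines for v in (0, 1)]

def random_tpg(target_coverage=1.0, max_patterns=200, seed=0):
    """Keep a random pattern only if it detects a previously undetected fault."""
    rng = random.Random(seed)
    remaining = set(all_faults())
    total = len(remaining)
    test_set = []
    for _ in range(max_patterns):
        if 1 - len(remaining) / total >= target_coverage:
            break
        pattern = tuple(rng.randint(0, 1) for _ in INPUTS)
        good = simulate(pattern)
        detected = {f for f in remaining if simulate(pattern, f) != good}
        if detected:
            test_set.append(pattern)
            remaining -= detected
    return test_set, 1 - len(remaining) / total
```

On a circuit this small every fault is easy to detect; the slowdown the text describes only shows up when a few faults need one specific pattern out of a very large input space.
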
An efficient combined method for solving the ATPG problem uses statistical
methods to find tests for the easy-to-detect faults on the fault list and switches
to an algorithmic method to find tests for the remaining hard-to-detect faults.
In either the combined or the purely algorithmic method, a significant portion of
the computation time will be spent generating tests for the hard-to-detect faults
algorithmically. Therefore, finding a method to speed up this process should
reduce the overall computation time considerably.
Much research has gone into increasing the efficiency of algorithms for ATPG.
However, the overall gains achieved through these improvements have not kept
pace with increasing circuit size, and computation time is still excessive. Another
approach to reducing computation time is simply to use a faster machine.
Parallel-processing machines are becoming available for general use and are helping to solve
other problems in computer-aided design.
Much research has been done to parallelize the test pattern generation problem.
Most of this work concentrates on how to use existing multi-processor systems,
such as the Intel iPSC/2, a network of Sun workstations, or the Links-1 Z8000
based systems, to effectively generate test patterns.
Parallel techniques for the ATPG problem can be classified into five major categories[8]:
1. fault partitioning,
2. heuristic parallelization,
3. search-space partitioning,
4. functional (algorithmic) partitioning,
5. and topological partitioning.
Although some promising results have been shown, much work still remains.
As the development of microelectronics technology progresses, massively powerful
processors will be used to form special parallel architectures to generate
test patterns. New architectures for the interconnection of processors have to be
studied so as to design a very efficient multi-processor system for ATPG. Therefore,
it is natural to investigate a good interconnection network to speed up the
ATPG process, when many processors are available. This thesis discusses this
problem by proposing several special parallel architectures, and examining the
empirical results through simulation. These special architectures are the modified
complete binary tree (MCBTA), the autonomous modified complete binary tree
(AMCBTA), and the square array architecture. With these special architectures,
the parallel algorithms discussed in this work were found to achieve linear, and
sometimes superlinear speedup. The empirical results also show that AMCBTA
is the best one among these special architectures.
This report is arranged as follows: Chapter 2 and Chapter 3 give a survey
of automatic test pattern generation. Chapter 4 shows our experimental results
about the faults. Chapter 5 discusses the model for measuring the performance
of a parallel automatic test pattern generation system. Chapter 6 describes all
algorithms used in our simulation software to simulate parallel automatic test
pattern generation systems and evaluate their performance. Chapter 7 briefly
discusses 4 connected structures and isomorphism of two graphs. Chapters 8 and
9 demonstrate our three architectures designed to solve the test pattern generation
problem in parallel. In the conclusion, the results are summarized and some future
work is discussed.
Chapter 2
Conventional ATPG Algorithms
In this chapter, we will review 3 widely used algorithms, the D-algorithm, the
Podem algorithm, and the Fan algorithm. Before this review, the stuck-at model
and the testing problem are described.
2.1 Stuck-at Fault Model & Testing Problem
This section first introduces the stuck-at fault model followed by a discussion of
the testing problem.
2.1.1 Stuck-at Fault Model
Logic gates are realized by transistors, normally either bipolar transistors or metal
oxide semiconductor field-effect transistors (MOSFET, or simply MOS). The technology
families based on bipolar transistors are transistor-transistor logic (TTL),
emitter-coupled logic (ECL), and so forth. Some logic families based on MOSFET
are p-channel MOSFET (p-MOS), n-channel MOSFET (n-MOS), and complementary
MOSFET (CMOS). Although ECL and TTL are important for high-speed
applications, their integration sizes are limited by the heat generated by their
heavy power consumption and by large gate sizes. In contrast, the MOS logic
families are well suited for LSI or VLSI, because higher integration can be obtained
than with bipolar logic families. Most LSI and VLSI circuits of today are
implemented with MOS.
A fault in a circuit is a model at the logic level of the effect of a physical defect
of one or more of its components. Faults can be classified as logical or parametric.
A logical fault is a defect that causes the logic function of a circuit element or an
input signal to be changed to some other logic function; a parametric fault alters
the magnitude of a circuit parameter, causing a change in some factor such as
circuit speed, current, or voltage.
Circuit malfunctions associated with timing are due mainly to circuit delays.
Those faults that relate to circuit delays such as slow gates are called delay faults.
Usually, delay faults only affect the timing operation of the circuit, which may
cause hazards or critical races.
Faults that are present in some intervals of time and absent in others are
intermittent faults. Faults that are always present and do not appear, disappear,
or change their nature during testing are called permanent faults or solid faults.
Although many intermittent faults eventually become solid, the early detection
of intermittent faults is very important to the reliable operation of a circuit.
However, there are no reliable means of detecting their occurrence, since such a
fault may disappear when a test is applied. In this thesis, we will consider mainly
logical and solid faults.
When an input or output of a logic gate is always a fixed voltage, either high
or low, it is said to have a stuck-at fault. For positive logic, if a node is always low, it is
said to be stuck-at 0; when it is always a high voltage, it is said to be stuck-at 1.
The most popular fault model used in gate level simulation is the stuck-at
fault. The stuck-at-fault model was originally used as a means of describing
faults in early electromagnetic relay computers. However, the model was also
found to be applicable to diode transistor logic (DTL); this led to its use in small
scale integration (SSI) and medium scale integration (MSI) fault modeling. Thus
the model became a standard widely used in the integrated circuit industry[34].
When Roth [32] developed the D-algorithm in 1966, to automatically generate test
sets based on the stuck-at-fault model, its continuation was assured. However, as
failure modes in modern VLSI circuits are better understood, its applicability
and usefulness are being challenged [34]. The CMOS stuck-open fault model is
an example.
2.1.2 Testing Problems
To ensure the proper operation of a system, we must be able to detect a fault when
one has occurred and to locate it or isolate it to a specific component - preferably
an easily replaceable one. The former procedure is called fault detection, and the
latter is called fault location, fault isolation, or fault diagnosis. These tasks are
accomplished with tests. A test is a procedure to detect and/or locate faults.
Tests are categorized as fault-detection tests or fault diagnostic tests. A
fault-detection test tells only whether a circuit is faulty or fault-free; it tells nothing
about the identity of a fault if one is present. A fault diagnostic test provides the
location and the type of a fault and other information. The quantity of information
provided is called the diagnostic resolution of the test; a fault-detection test is a
fault diagnostic test of zero diagnostic resolution. If a test not only detects a fault
but also locates the fault, it is a fault diagnostic test of high diagnostic resolution.
Logic circuits are tested by applying a sequence of input patterns that produce
erroneous responses when faults are present and then comparing the responses
with the correct (expected) ones. Such an input pattern used in testing is called
a test pattern. In general, a test for a logic circuit consists of many test patterns.
They are referred to as a test set or test sequence. The latter term, which means
a series of test patterns, is used if the test patterns must be applied in a specific
order. Test patterns, together with the output responses, are sometimes called
test data.
If there exists only one fault in a circuit, it is said to exhibit a single fault. If
there exist two or more faults at the same time, then the circuit exhibits multiple
faults. Here, we are only concerned with a single fault in a circuit. For a circuit
with n lines, there are at most 2n possible single stuck-at faults since each line
has at most 2 possible faults: stuck-at-0 and stuck-at-1.
The testing of logic circuits is performed in two main stages: generating test
patterns for a circuit under test (the test generation stage) and applying the test
patterns to the circuit (the test application stage). Thus, the generation of test
patterns is important; however, it is very difficult for large circuits, so most of
the effort of the past 20 years in this field went into research and development of
efficient and economical test generation procedures.
The quality of a test (a set or a sequence of test patterns) depends much on
the fault coverage as well as the size or length of the test. The fault coverage (or
test coverage) of a test is the fraction of faults that can be detected or located
within the circuit under test. The fault coverage of a given test is determined by
a process called fault simulation, in which every given test pattern is applied to a
fault-free circuit and to each of the given faulty circuits, each circuit behavior is
simulated, and each circuit response is analyzed to find what faults are detected
by the test pattern. Fault simulation is also used to produce fault dictionaries,
in which the information needed to identify a faulty element or component is
gathered.
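
As a rough illustration of fault simulation, fault coverage and a fault dictionary, the Python sketch below works on a single hypothetical 2-input AND gate with lines a, b and y. The function names are assumptions made for this example, and the dictionary here simply maps each test pattern to the faults it detects.

```python
def and_gate(a, b, fault=None):
    """2-input AND with lines a, b, y; fault is (line, stuck_value) or None."""
    def stick(line, value):
        # A stuck-at fault forces the named line to its stuck value.
        return fault[1] if fault and fault[0] == line else value

    a, b = stick("a", a), stick("b", b)
    return stick("y", a & b)

# n = 3 lines give 2n = 6 single stuck-at faults.
FAULTS = [(line, v) for line in ("a", "b", "y") for v in (0, 1)]

def fault_dictionary(test_set):
    """Map each test pattern to the list of faults whose response differs."""
    table = {}
    for pattern in test_set:
        good = and_gate(*pattern)
        table[pattern] = [f for f in FAULTS if and_gate(*pattern, fault=f) != good]
    return table

def fault_coverage(test_set):
    """Fraction of the fault list detected by at least one pattern."""
    detected = {f for faults in fault_dictionary(test_set).values() for f in faults}
    return len(detected) / len(FAULTS)
```

For this gate the patterns (1, 1) and (0, 1) together detect five of the six faults; adding (1, 0) picks up the remaining b stuck-at-1 fault.
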
2.2 D-algorithm
The first algorithm for ATPG that was proved complete is the D-algorithm
introduced by Roth in 1966[4]. The D-algorithm includes a notation and a calculus
with which a single stuck-at fault can be detected at a node in the circuit and
propagated to a primary output of the circuit. This algorithm uses a five-valued
logic, which consists of the logic values 0 and 1, an unknown value X, and two
additional values D and D̄. A D value signifies a value of 1 in the good circuit and
0 in the faulty circuit, and a D̄ value represents a value of 0 in the good circuit
and 1 in the faulty circuit.
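
One convenient way to think about the five-valued logic is to treat each value as a pair (good-circuit bit, faulty-circuit bit) and apply ordinary Boolean operations to each component. The Python sketch below follows that reading; the string encoding (with `D'` standing for D̄) and the function names are assumptions made for illustration.

```python
ZERO, ONE, D, DBAR, X = "0", "1", "D", "D'", "X"

# (value in good circuit, value in faulty circuit)
PAIRS = {ZERO: (0, 0), ONE: (1, 1), D: (1, 0), DBAR: (0, 1)}
NAMES = {pair: name for name, pair in PAIRS.items()}

def v_not(a):
    if a == X:
        return X
    good, faulty = PAIRS[a]
    return NAMES[(1 - good, 1 - faulty)]

def v_and(a, b):
    if a == ZERO or b == ZERO:
        return ZERO                      # a controlling 0 decides the output even against X
    if a == X or b == X:
        return X
    (ga, fa), (gb, fb) = PAIRS[a], PAIRS[b]
    return NAMES[(ga & gb, fa & fb)]     # AND applied to good and faulty parts separately

def v_or(a, b):
    # De Morgan: a OR b = NOT(NOT a AND NOT b)
    return v_not(v_and(v_not(a), v_not(b)))
```

For example, `v_and(D, ONE)` gives `D` (the error passes through), while `v_and(D, DBAR)` gives `0` (the two error signals cancel).
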
Each gate in the circuit has two D-cubes associated with it, the primitive
D-cube of a fault (pdcf) and a propagation D-cube (pdc). A pdcf is the set of inputs
that produces an error signal on the output of that gate if it contains any fault
of the particular type. A pdc specifies the input values necessary to propagate an
error signal on an input of a gate to the output.
The D-algorithm's basic operation is the repeated intersection of the D-cubes
necessary to perform the tasks required to test for a specific fault. These tasks
consist of three processes: fault sensitization, fault propagation, and justification.
Fault sensitization is the process by which the circuit node presumed to exhibit
the fault is made to produce an erroneous value as a result of the fault.
Sensitization is accomplished by specifying an input combination for the circuit element
containing the fault, using the pdcf's, such that the node presumed to exhibit the
fault holds the complement of the fault value.
The list of circuit elements closest to the primary outputs that have a D or
D̄ on the output is called the D frontier. The objective of fault propagation is to
advance the D frontier to the primary outputs. This process sensitizes all possible
paths from the fault site to the primary outputs. This multiple-path sensitization
is necessary for the D-algorithm to guarantee completeness.
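
Taking the definition above literally, the D frontier can be computed from a partial value assignment as the set of lines that carry D or D̄ but do not yet feed any line that also carries one. The Python sketch below does this for a hypothetical netlist; the netlist format, the `D'` spelling of D̄ and the function name are assumptions for illustration.

```python
def d_frontier(values, gates):
    """Lines carrying D or D' whose successors carry no error signal yet.

    values: dict line -> one of "0", "1", "X", "D", "D'"
    gates:  list of (output_line, gate_type, input_lines)
    """
    errors = {line for line, v in values.items() if v in ("D", "D'")}
    frontier = []
    for line in sorted(errors):
        successors = [out for out, _, ins in gates if line in ins]
        if not any(s in errors for s in successors):
            frontier.append(line)
    return frontier

# Hypothetical example: n1 = AND(a, b), n2 = OR(n1, c), y = AND(n2, d)
GATES = [("n1", "AND", ("a", "b")), ("n2", "OR", ("n1", "c")), ("y", "AND", ("n2", "d"))]
VALUES = {"a": "D", "b": "1", "c": "X", "d": "X", "n1": "D", "n2": "X", "y": "X"}
```

With these values the frontier is `["n1"]`: the error has reached n1 but not yet n2, so the next propagation step would set c to 0.
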
During fault sensitization and fault propagation, certain circuit nodes are
required to take on specific values. Establishing this value, or goal, on the node by
placing values on the primary inputs is called justification. The primary inputs
that can be used to justify a goal are usually determined by backtracking through
the circuit topology from the node in question to the primary inputs. A value
is chosen for one of these inputs, and a forward simulation-like process, called
forward implication, is performed to see if this input assignment is consistent
with satisfying the goal. If it is not, a different value is chosen and the process
is repeated. A test is finally generated when the fault is sensitized, a path for the
fault to be observed at the primary outputs is sensitized, and all of the goals are
justified.
As an example of the D-algorithm, consider the circuit under test shown in
Figure 2.1. Assume that a test is being generated for a stuck-at 1 fault on node J.
The first step is to fill in an initial test cube with a D̄ on node J, as shown in test
cube 0 in Table 2.1. This value is then sensitized using a pdcf for the NOR gate
(test cube 1). Next, all values implied on other circuit nodes by the previous step
are filled in (test cube 2). The next step is to advance the D frontier by setting
node K to 0. This implies values on nodes C and I (test cube 3). The 0 value on
node I in turn implies 0 values on nodes F and H (test cube 4). The final step is
to justify the 0 value on node H by setting input E to a 0 value (test cube 5).
Figure 2.1: A Circuit for D-algorithm
Table 2.1: Test Cubes for Justification First
If the values shown in test cube 1b, Table 2.2, were chosen when selecting the
pdcf for the initial fault, the implications of that choice would have caused a test
to be impossible, as test cube 2b shows. This problem would have caused the
algorithm to backtrack to the last point a choice was made, pick the alternate
choice, and proceed from there. In the D-algorithm, choices are available at many
internal nodes in the circuit, and more than two choices can be present if there
are gates in the circuit with more than two inputs. This fact greatly increases the
size of the algorithm's search space and makes backtracking more complex. The
D-algorithm can be implemented as a recursive routine that pushes or pops test
cubes off a test cube stack as required for forward progress or backtracking.
Table 2.2: Test Cubes for Propagation First
Note that justification of two separate node assignments cannot be undertaken
simultaneously, because if an inconsistency occurs, it will not be possible to
determine which unique assignment caused it. Also, the original D-algorithm does
not specify which process - fault sensitization or fault propagation - is to be
undertaken first or whether justification is to be done in intermediate steps or
deferred until the process ends. These details are left to the implementation.
Unfortunately, the efficiency with which a test can be generated for a specific fault
depends heavily on the order of these operations, and the most efficient order is
determined by the circuit topology. For example, if in generating a test for J
stuck-at-1 in the circuit of Figure 2.1, fault propagation is done before selecting
a pdcf, the unique value of 0 required on node I will be discovered and the pdcf
for the faulty gate will be fixed. Test generation can then proceed without the
possibility of backtracking. Finding the most efficient order for operations and
detecting inconsistencies early in the process is the focus of most subsequently
developed algorithms.
Figure 2.2: A Podem Example
2.3 PODEM Algorithm
The Podem (Path-Oriented DEcision-Making) algorithm is an attempt to reduce
the size of the solution space that must be searched. Recall that the D-algorithm
tries to assign a value to each circuit node. Conflicts can arise when values assigned
to different nodes cannot all be justified. Podem tries to eliminate these hidden
conflicts by assigning values only to the primary inputs.
The algorithm begins by trying to justify the D or D̄ at the node under test,
similar to the D-algorithm. This justification is done by assigning values to
primary inputs that affect the node in question. These primary inputs are again
found by backtracking through the circuit topology. When an input assignment
is made, a simulation-like process, called forward implication, is run to find all
of the node values implied by the assignment. If this new input assignment is
incompatible with the goal, the complementary value is tried. If the complementary
value assignment also conflicts, the algorithm backtracks efficiently to the
previous input assignment. This process results in an orderly search methodology
that will implicitly search the entire input space.
This search methodology can be represented by a binary search tree, as Figure
2.3 shows. After the value at the faulty node is justified, subsequent objectives are
set up to propagate the D frontier along a path or paths to some primary output.
The exact order in which this process occurs is again implementation dependent.
The important point is that this strategy of assigning values only to primary
inputs orders the search space. This procedure lets the search methodology prune
the search tree implicitly and increase efficiency.
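
The idea of branching only on primary inputs, running forward implication after each assignment and pruning as soon as a conflict is visible can be sketched as follows. This is a simplified Podem-style search written for illustration, not the published Podem procedure: it picks inputs in a fixed order instead of backtracing from an objective, and the two-gate circuit and all names are assumptions.

```python
INPUTS = ("a", "b")
OUTPUTS = ("y",)
# Hypothetical circuit with reconvergent fan-out: y = a OR (a AND b)
GATES = [("n1", "AND", ("a", "b")), ("y", "OR", ("a", "n1"))]

def eval_gate(kind, args):
    """Three-valued AND/OR evaluation; None stands for the unknown value X."""
    ctrl = 0 if kind == "AND" else 1      # controlling input value
    if ctrl in args:
        return ctrl
    return None if None in args else 1 - ctrl

def simulate(assign, fault=None):
    """Forward implication from a partial primary-input assignment."""
    values = {}

    def put(line, value):
        values[line] = fault[1] if fault and fault[0] == line else value

    for pi in INPUTS:
        put(pi, assign.get(pi))
    for out, kind, ins in GATES:
        put(out, eval_gate(kind, [values[i] for i in ins]))
    return values

def podem(fault, assign=None, depth=0):
    """Return a partial input assignment detecting fault, or None if none exists."""
    assign = assign or {}
    good, bad = simulate(assign), simulate(assign, fault)
    line, stuck = fault
    known = [good[o] is not None and bad[o] is not None for o in OUTPUTS]
    if any(k and good[o] != bad[o] for k, o in zip(known, OUTPUTS)):
        return assign                     # an output already shows the error
    if good[line] == stuck:
        return None                       # fault can no longer be activated: prune
    if all(known) or depth == len(INPUTS):
        return None                       # outputs fixed and identical: prune
    pi = INPUTS[depth]
    for bit in (1, 0):                    # simple heuristic: try 1 first
        result = podem(fault, {**assign, pi: bit}, depth + 1)
        if result is not None:
            return result
    return None                           # both values failed: backtrack
```

In this circuit `podem(("n1", 1))` returns `{"a": 0}` after pruning the `a = 1` branch, and `podem(("n1", 0))` returns `None` because that fault is redundant: whenever n1 should be 1, input a already forces y to 1.
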
Consider, for example, Figure 2.3, a representation of the binary search
space for a J stuck-at-0 fault in the circuit under test. This search space was
constructed using the simple heuristic of always first trying the logic value 1 on a
primary input. Since assignment of the value 1 for node B in the left-hand subtree
is inconsistent, all solutions that live below B in that part of the solution space
can be pruned from the search tree. This ordering of the search space also allows
it to be divided into disjoint sections so that work on the different sections can
proceed simultaneously. Note that the processor must have access to the entire
circuit topology and that only one goal may be justified at a time, as with the
D-algorithm.
Figure 2.3: Podem Search Space Diagram
2.4 FAN Algorithm
The Fan algorithm is similar to Podem but includes improvements to increase
its efficiency. The major goal of Fan is to reduce the number of backtracks in
the search tree. This is accomplished using several techniques, including the
consideration of fan-out branches in the circuit as a special case, hence the name
Fan.
To examine this concept, we must define several terms. A free line is a circuit
node that has no predecessors that are part of a fan-out loop. As such, freelines
may have a uniquely assigned value. In Figure 2.1, lines A through I are examples
of freelines. A bound line is the opposite of a freeline. Nodes J and K are bound
lines and cannot have unique (independent) values assigned to them. Headlines
are freelines that drive a gate that is part of a reconvergent fan-out loop. Node
I in the figure is a headline. By definition, headlines can also be assigned values
arbitrarily because they are freelines and can always be independently justified.
They can therefore be treated as primary inputs in the justification process.
Figure 2.4: A Fan Example
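
The line classification can be illustrated on a netlist with a short Python sketch. It uses the commonly stated form of the Fan definitions, which is an assumption about how the wording above maps onto a netlist: a line is bound if it is reachable from a fan-out point, free otherwise, and a head line is a free line that directly drives a bound line. Fan-out branches are not modeled as separate lines, and the circuit and names are hypothetical.

```python
def classify_lines(inputs, gates):
    """Split lines into free, bound and head lines.

    gates: list of (output_line, gate_type, input_lines) in topological order.
    """
    fanout = {}
    for _, _, ins in gates:
        for line in ins:
            fanout[line] = fanout.get(line, 0) + 1

    bound = set()
    for out, _, ins in gates:
        # A gate output is bound if any input is a fan-out stem or already bound.
        if any(fanout.get(i, 0) > 1 or i in bound for i in ins):
            bound.add(out)

    lines = list(inputs) + [out for out, _, _ in gates]
    free = {line for line in lines if line not in bound}
    head = {line for line in free
            if any(out in bound and line in ins for out, _, ins in gates)}
    return free, bound, head

# Hypothetical circuit: s fans out to p and q, which reconverge at y.
INPUTS = ("a", "b", "c", "d")
GATES = [
    ("s", "AND", ("a", "b")),
    ("p", "OR", ("s", "c")),
    ("q", "AND", ("s", "d")),
    ("y", "OR", ("p", "q")),
]
```

Here s is a head line: a test generator in the style of Fan could stop its backtrace at s and justify a and b only after a test has been found.
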
Identification of these nodes makes reconvergent fan-out loops much easier to
handle. Once a test is found by treating headlines as primary inputs, the values
on them can be justified at the end of the test generation process. Fan also uses
a multiple-backtrace procedure for reconvergent fan-out branches buried in the
circuit to reduce the number of backtracks that must be made in the search. For
example, if a certain value is necessary at node L in the figure, and this circuit
is part of some larger circuit, a single backtrace could be made along the path
L -> J -> G -> A, B. Values for inputs A and B could be chosen so that the goal
is satisfied with a unique value on nodes I and K. Then if the value on K cannot
be achieved with the value chosen for I, a significant amount of backtracking in
the search tree can result. First, with a multiple backtrace both the L -> J -> I
and L -> K -> I paths can be used to determine the value needed at I to satisfy
the goal. This value would then be set as a requirement for the justification of
the value at node L. This process can increase the Fan algorithm's efficiency
significantly in a circuit with numerous buried reconvergent fan-out loops.
Three conventional automatic test pattern generation algorithms were discussed.
They can often be used to generate test patterns for very hard-to-detect
faults. Parallelization is one of the methods used to speed up this procedure.
Chapter 3
Taxonomy of Parallel ATPG
Algorithms
As mentioned earlier, techniques to parallelize ATPG can be classified into five
major categories according to Robert's contribution in [8]: 1) fault partitioning; 2)
heuristic parallelization; 3) search-space partitioning; 4) functional (algorithmic)
partitioning; and 5) topological partitioning. Tables 3.1, 3.2, 3.3, 3.4 and 3.5 in
the following sections are taken from this reference. The following sections
give an overview of each category and present its advantages and disadvantages,
the type of parallel machine it has been implemented on, and a brief summary of
the reported results.
3.1 Fault Partitioning
Fault partitioning is the simplest way to parallelize the ATPG problem. It first
divides the fault set $F$ into several subsets $F_i$, $i = 1, \ldots, m$, such that
$F = \bigcup_{i=1}^{m} F_i$ and $F_i \cap F_j = \emptyset$ where $i \neq j$. Every
processor $P_i$ is assigned to generate test patterns for the fault set $F_i$.
This scheme is called static fault partitioning.
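
The two conditions on the subsets (they cover the whole fault set and are
pairwise disjoint) can be illustrated with a short Python sketch. This is only a
sketch under stated assumptions, not the implementation used in this thesis: the
round-robin assignment rule, the function name `static_fault_partition`, and the
fault names are all hypothetical.

```python
def static_fault_partition(faults, num_processors):
    """Split the fault list into num_processors disjoint subsets.

    Round-robin assignment is an assumption made for illustration; any rule
    that yields disjoint subsets covering the fault set fits the definition.
    """
    subsets = [[] for _ in range(num_processors)]
    for index, fault in enumerate(faults):
        subsets[index % num_processors].append(fault)
    return subsets


faults = [f"f{j}" for j in range(10)]  # hypothetical fault names
subsets = static_fault_partition(faults, 3)

# The union of the subsets is the fault set, and no fault appears twice.
all_assigned = [fault for subset in subsets for fault in subset]
covers_fault_set = sorted(all_assigned) == sorted(faults)
pairwise_disjoint = len(all_assigned) == len(set(all_assigned))
```

A round-robin split balances the number of faults per processor, but not
necessarily the amount of work, since faults differ in difficulty.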
Static fault partitioning results in each processor having a completely separate
Table 3.1: Summary of Fault Partitioning

Researchers | Parallel Machine | Scalability | Major Results
Srinivas Patil, Prith Banerjee | Intel iPSC/2 | Nearly linear speedup for up to 8 processors; speedup falls off after that. | Demonstrated that ATPG with fault simulation is more efficient than ATPG alone, even in a parallel environment. Fault partitioning shows good speedup for up to 8 processors.
Hideo Fujiwara, Tomoo Inoue | Network of Sun 3/50 workstations | Nearly linear speedup for up to 5 processors. | Verified analysis of optimal grain size for fault-partitioning system with experimental results.
Susheel J. Chandra, Janak H. Patel | Network of Sun 3/50 workstations | Uniform partitioning: nearly linear speedup for 5 processors. | Introduced concept of heuristic parallelization and developed two methods: uniform and concurrent heuristics. Demonstrated uniform partitioning produces better speedup.
task in that it performs the entire test generation procedure on its own. If the fault
set is divided carefully, each processor will have roughly the same amount of work
and will finish in about the same time. If this is the case, the communication cost
is low. In practice, it is very difficult to get such a partition prior to executing the
ATPG algorithms, so dynamic scheduling is used. In dynamic scheduling, each
processor requests a new fault from a master scheduler when it is idle. Dynamic
scheduling increases the communications overhead because of requests from idle
processors for new faults.
In ATPG, one test pattern can test several faults at the same time. This
implies that the time to generate test patterns for those other faults can be saved
if a test pattern is found for one fault and at the same time the test pattern can
detect those other faults. In static fault partitioning, if one test pattern for fault
$f_j \in F_i$ is found and, after fault simulation, $f_{j_1}, \ldots, f_{j_k}$ are found
to be detectable by the same test pattern, $f_{j_1}, \ldots, f_{j_k}$ should be removed
from $F$. If $f_{j_t} \in F_i$, it is easy to remove it, since the processor itself can
do so without any communication with other processors. If $f_{j_t} \in F_l$
($l \neq i$), communication between $P_i$ and $P_l$ is necessary to remove $f_{j_t}$
from $F_l$. This communication increases the parallel system's communication
overhead and reduces the possible speedup.
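
The difference between local and remote fault removal can be sketched as
follows. This is an illustrative sketch only: the dictionary layout, the function
name `drop_detected_faults`, and the fault names are assumptions, and a real
system would exchange messages between processors rather than touch a shared
dictionary.

```python
def drop_detected_faults(partition, finder, detected):
    """Remove detected faults from every subset and count remote removals.

    partition maps a processor index to its set of remaining faults.
    finder is the processor whose test pattern detected the faults.
    """
    messages = 0
    for fault in detected:
        for processor, remaining in partition.items():
            if fault in remaining:
                remaining.discard(fault)
                if processor != finder:
                    messages += 1  # the finder must notify another processor
    return messages


partition = {0: {"f1", "f2", "f3"}, 1: {"f4", "f5"}}  # hypothetical faults
messages = drop_detected_faults(partition, finder=0, detected=["f2", "f4"])
```

Here the removal of `f2` is local to processor 0, while the removal of `f4`
costs one message, which is the overhead described above.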
Table 3.1 summarizes the current research work using fault partitioning. The
best scalability to date is 8 processors.
3.2 Heuristic Parallelization
As we know, heuristics can be used to guide the algorithm to generate test patterns.
Research has indicated that many heuristics will produce a test for a given fault
within some computation time limit when other heuristics have failed to do so [10].
We can use complementary heuristics to speed up the ATPG in multiprocessor
systems.
Suppose there are k different heuristics. k processors are used to generate test
patterns, and each processor uses a different heuristic. All the processors compute
a test for the same fault. Once a processor succeeds in generating a test for the
fault, it sends a message to the other processors to notify them to stop working.
Then all processors begin to work on a new fault.
Heuristic parallelization has the potential to achieve greater speedups than
the uniform-partitioning method because of possible anomalies in the ordering of
the heuristics for different faults. For example, suppose the time limit for each of
five heuristics in the uniform-partitioning method is 10 seconds and only the last
heuristic on the list can generate a test for a specific fault within the time limit,
say in 5 seconds. Then the processing time for the uniform-partitioning method
will be 45 seconds. However, the concurrent heuristic method will find a test for
the same fault in only 5 seconds.
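
The arithmetic of this example can be checked with a small Python sketch. The
function names and the convention that a failing heuristic is represented by
`None` are assumptions made for illustration; the sketch models only the timing,
not the heuristics themselves.

```python
def uniform_time(time_limit, run_times):
    """Time for one processor that tries the heuristics one after another.

    run_times[i] is the time heuristic i needs, or None if it cannot
    produce a test within the limit.
    """
    elapsed = 0
    for needed in run_times:
        if needed is not None and needed <= time_limit:
            return elapsed + needed
        elapsed += time_limit  # heuristic abandoned at the time limit
    return elapsed


def concurrent_time(time_limit, run_times):
    """Time when every heuristic runs on its own processor simultaneously."""
    successes = [t for t in run_times if t is not None and t <= time_limit]
    return min(successes) if successes else time_limit


run_times = [None, None, None, None, 5]  # only the fifth heuristic succeeds
sequential = uniform_time(10, run_times)   # four failures of 10 s, then 5 s
concurrent = concurrent_time(10, run_times)
```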
In the heuristic parallelization method, there is no way to ensure that the
search space of each processor is disjoint. That is, even though the heuristics used
by the processors differ, they might all lead the ATPG algorithm down similar
paths to a non-solution, and a test may not be found in the allotted time, even
though one exists. This means that the heuristic techniques cannot be guaranteed
to make all processors work efficiently together to find a test for a single
hard-to-detect fault which takes a large amount of computation time.
Table 3.2 gives a summary of heuristic parallelization. It shows that the
speedup is less than linear speedup, for up to at least 5 processors.

Table 3.2: Summary of Heuristic Parallelization

Researchers | Parallel Machine | Scalability | Major Results
Susheel J. Chandra, Janak H. Patel | Network of Sun 3/50 workstations | Concurrent heuristics: less than linear speedup for only 5 processors. | Introduced concept of heuristic parallelization and developed two methods: uniform and concurrent heuristics. Demonstrated uniform partitioning produces better speedup.
3.3 Search-space Partitioning
Search-space partitioning is a technique to make all processors work efficiently
together to find a test pattern for a single fault.
The search space is divided into sub-search spaces. Given a circuit with $N_{pi}$
primary inputs, there are $2^{N_{pi}}$ possible input patterns. The search space is the
set which contains all these patterns. A sub-search space is a subset of the search
space. A processor searches one of the sub-search spaces. The sub-search spaces
for the processors are disjoint and are spread as far as possible across the solution
space to maximize the area of the current search. This organization increases the
chances of finding a valid solution quickly.
The following is an example which shows one way to partition the whole search
space. Suppose there are $2^k$ processors and the number of primary inputs for a
given circuit is $N_{pi}$ ($N_{pi} \geq k$). Then the whole search space contains
$2^{N_{pi}}$ patterns. We can divide it into $2^k$ sub-search spaces if every
processor has an identifier $i$ ($0 \leq i \leq 2^k - 1$).
From the $N_{pi}$ inputs, k inputs are selected. These k inputs can work as processor
identifiers, or identifiers of sub-search spaces, since $2^k$ different values can be
used to represent $2^k$ processors. The whole search space is divided by these $2^k$
values. Without losing generality, the selected k bits are the first k bits in the
$N_{pi}$ bits:

$$u_0 u_1 \cdots u_{k-1} \underbrace{x \cdots x}_{N_{pi}-k}$$

where $u_i \in \{0, 1\}$, $i = 0, \ldots, k-1$, and $x$ is an unspecified value,
forming a sub-search space. This space is assigned to processor $P_s$, where
$s = \sum_{i=0}^{k-1} u_i 2^i$. For each pair of processors $P_{s_i}$ and $P_{s_j}$
with $s_i \neq s_j$, the two sub-search spaces share no pattern. Otherwise, a common
pattern would have the same first k bits in both spaces, and we would have
$s_i = s_j$.
This contradicts that $s_i \neq s_j$. Therefore the sub-search spaces are disjoint.
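
The prefix-based partition can be sketched in Python for a small case. This is
a hedged illustration rather than the thesis's code: the function names
`processor_id` and `sub_search_space`, the bit ordering (first selected bit as
the least significant bit of the identifier), and the sizes used are assumptions.

```python
from itertools import product


def processor_id(pattern_bits, k):
    """Identifier s = sum of u_i * 2**i over the first k bits of a pattern."""
    return sum(bit << i for i, bit in enumerate(pattern_bits[:k]))


def sub_search_space(s, k, num_inputs):
    """All patterns whose first k bits encode the identifier s."""
    prefix = tuple((s >> i) & 1 for i in range(k))
    free_bits = product((0, 1), repeat=num_inputs - k)
    return [prefix + rest for rest in free_bits]


num_inputs, k = 4, 2  # 16 patterns shared by 2**k = 4 processors
spaces = [sub_search_space(s, k, num_inputs) for s in range(2 ** k)]
all_patterns = [pattern for space in spaces for pattern in space]
```

Because the identifier is recomputed from the first k bits of a pattern, each
pattern belongs to exactly one sub-search space, which mirrors the disjointness
argument above.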
It is impossible to search the whole space within limited time for large
problems, because the search space increases exponentially, so a backtrack limit
still must be specified. When the number of backtracks exceeds the limit, the
algorithms will give up the search and consider this fault as a hard-to-detect fault.
[13] makes the following observations: First, increasing the backtrack limit on
the uniprocessor implementation does not yield better results on hard-to-detect
faults, and the parallel algorithm yields better results for an equal number of
backtracks. The results are better because the parallel algorithm searches a larger
portion of the solution space. Second, the parallel algorithm runs much faster than
the uniprocessor implementation and exhibits nearly linear speedup in most cases
for up to 16 processors.
Table 3.3 shows the current research for search space partitioning. One result
shows that linear speedup can be had for up to 16 processors. Another shows that
linear speedup can be had for up to 50 processors. These results are much better
than the results in previous subsections.

Table 3.3: Summary of Search Space Partitioning

Researchers | Parallel Machine | Scalability | Major Results
Srinivas Patil, Prith Banerjee | Intel iPSC/2 | Nearly linear speedup for up to 16 processors. Superlinear speedup in some cases. | Introduced efficient search space partitioning using Podem algorithm.
Akira Motohara, Kenji Nishimura, Hideo Fujiwara, Isao Shirakawa | Links-1 Z8000-based system | Averaged linear speedup for up to 50 processors during search space phase. | Demonstrated good speedup is possible for large numbers of processors using search space partitioning.
3.4 Functional (algorithmic) Partitioning
An algorithm can be divided into independent subtasks that can then be executed
on separate processors in parallel. This method is referred to as functional
partitioning.
Motohara [11] uses a type of functional partitioning to remove the easy-to-detect
faults from the fault list. This procedure is done before the parallel method
for hard-to-detect faults presented in the previous section is run. The method
begins by dividing the fault list into groups of related faults. Typical related faults
include those along the same path between a fault site and a primary output. After
the fault list is divided into groups, each group is sent to a cluster of processors that
Table 3.4: Summary of Algorithmic Partitioning

Researchers | Parallel Machine | Scalability | Major Results
Srinivas Patil, Prith Banerjee | Intel iPSC/2 | Nearly linear speedup for up to 8 processors; speedup falls off after that. | Demonstrated that ATPG with fault simulation is more efficient than ATPG alone, even in a parallel environment. Fault partitioning shows good speedup for up to 8 processors.
Akira Motohara, Kenji Nishimura, Hideo Fujiwara, Isao Shirakawa | Links-1 Z8000-based system | Linear speedup for up to 10 processors during algorithmic phase. | Introduced combination of algorithmic and search space partitioning systems.
includes a test generator and a fault simulator. The test generator takes the first
fault and generates a test for it using a Podem algorithm with a limited number
of backtracks. If a test for a fault is not generated within the backtrack limit, it
is considered a hard-to-detect fault and is processed later. If a test is found, it
is sent to a fault simulator node. This node runs a version of a concurrent fault
simulator [33] to determine which other faults the test pattern detects. These
faults are then removed from the fault list.
So far, most serial ATPG algorithms developed are difficult to parallelize
functionally. In order to efficiently use functional partitioning, a new algorithm for
ATPG must be designed.
Table 3.4 shows that the current algorithmic partitioning systems can reach
linear speedup for up to 10 processors.
Table 3.5: Summary of Topological Partitioning

Researchers | Parallel Machine | Scalability | Major Results
Fumiyasu Hirose, Koichiro Takayama, Nobuaki Kawato | Special-purpose simulation | No results available; speedup falls off after that. | Demonstrated topological partitioning for simulation portion of ATPG process.
Glenn A. Kramer | Connection Machine | Linear speedup for circuits with up to 15 - 18 inputs. Speedup falls off rapidly after that. | Employs topological partitioning by mapping one circuit element to each Connection Machine processor. Only current algorithm demonstrated on massively parallel machine.
3.5 Topological Partitioning
All parallel algorithms discussed so far require each processor to have access to
the entire circuit database. This may be a problem for large circuits because each
processor may not have enough memory to hold the entire circuit database.
Topological partitioning tries to divide a circuit into separate partitions and
instantiate each partition on a different processor. Each processor only processes
a partition of the circuit; therefore less memory is needed. Since it is a difficult
task to partition circuits so as to parallelize the ATPG algorithm, no ideal method
has been reported so far. Further work is needed.
Table 3.5 shows the summary of current research systems using topological
partitioning. It is clear that the results are not satisfactory.
So far, most of the techniques for parallel processing ATPG use one of the
commercially available networks to provide communication between processors.
Tables 3.1 to 3.5 summarize the previous research work in parallel processing
ATPG. For example, [14] and [13] used the Intel iPSC/2; a network of Sun 3/50
workstations was used by [10] and [12]. As the number of processors increases,
it is unavoidable that the network communication load becomes heavier and
heavier. Because the communication network has limited capacity, traffic
jams appear, the computation time stops decreasing and the network saturates.
Therefore, in order to permit the computing time to decrease linearly, ATPG
requires a communication network which can avoid communication conflicts.
Chapter 4
Hard-to-detect Faults
Since massively parallel machines with hundreds or thousands of processors will
be available in the future, can these machines be used to efficiently solve the
automatic test pattern generation problem? What is the problem which should
be concentrated on? Some experiments may give us some hints.
4.1 Data from Experiments
Let us first do several experiments.
If we use the PODEM algorithm (implemented by ourselves) with heuristics,
which will be discussed later, we can try to discover the relationship
between backtrack limits and the percentage of faults that are solved faults. Here, a
solved fault is a fault for which a test pattern is found or its redundancy is proved.
It is reasonable to take the number of primary inputs as a unit of backtrack
limit. For example, C432 [23] has 36 primary inputs, so the backtrack limit is
assigned as 36, 2 × 36, etc. The following explains why it is reasonable.
1. Different circuits have different sizes. A test set for a circuit with only 10
logic gates can be generated within a constant backtrack limit k. For a
Table 4.1: Statistical Data for Circuits

Circuit | PI | Gates | Faults | Solved faults, Step1 | Solved faults, Step2 | Step1 (%) | Step2 (%)
C432 | 36 | 160 | 524 | 471 | 471 + 45 | 89.88 | 98.47
C499 | 41 | 202 | 758 | 254 | 254 + 496 | 33.50 | 98.94
C880 | 60 | 383 | 942 | 872 | 872 + 44 | 92.56 | 97.23
C1355 | 41 | 546 | 1574 | 350 | 350 + 1152 | 22.23 | 95.42
C1908 | 33 | 880 | 1879 | 1369 | 1369 + 459 | 72.85 | 97.28
C2670 | 233 | 1193 | 2747 | 2678 | 2678 + 0 | 97.48 | 97.48
C3540 | 50 | 1669 | 3428 | 3180 | 3180 + 60 | 92.76 | 94.51
C5315 | 178 | 2307 | 5350 | 5259 | 5259 + 8 | 98.29 | 98.44
C6288 | 32 | 2416 | 7744 | 6274 | 6274 + 1457 | 81.01 | 99.83
C7552 | 207 | 3512 | 7550 | 7073 | 7073 + 12 | 93.68 | 93.84
circuit with 1000 logic gates, within the same backtrack limit k, no test
pattern is likely to be found even for one fault. Hence, it is not a good
idea to take a constant value as a backtrack limit for all circuits, without
considering the difference between their sizes.
2. In combinational circuits, the size of a circuit is strongly related to the
number of primary inputs. In other words, to some extent, the number of
primary inputs represents the size of the circuit.
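
The unit-based backtrack limit and the solved-fault percentage can be written
down as two small helper functions. This is a minimal sketch: the function names
are hypothetical, and the fault counts in the usage lines are made-up numbers,
not experimental data.

```python
def backtrack_limits(num_primary_inputs, max_units=2):
    """Backtrack limits of 1, 2, ... units; one unit = number of primary inputs."""
    return [units * num_primary_inputs for units in range(1, max_units + 1)]


def solved_percentage(solved_faults, total_faults):
    """Share of faults that are solved (test found or redundancy proved)."""
    return 100.0 * solved_faults / total_faults


limits_36_inputs = backtrack_limits(36)     # a circuit with 36 primary inputs
example_share = solved_percentage(90, 120)  # hypothetical counts
```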
We assign the backtrack limit as the number of primary inputs (one unit) and
double the number of primary inputs (two units), respectively, and observe the
number of solved faults and the percentage of faults. For 10 typical circuits [23], we
obtain the following 10 groups of data, which are represented by Table 4.1.
In the table, every unit is the number of primary inputs of the circuit under
test. For example, circuit C499 has 41 primary inputs. Its unit is 41. Here, 499
represents the number of connecting lines in the circuit. The number of faults is a
reduced equivalent fault set based on equivalent fault collapsing [17]. The number
of faults to be tested can be reduced by combining, for example, indistinguishable
faults into a single set [30]. "Indistinguishable faults" are faults such that there
is no test to distinguish between them. Therefore, when generating a test for an
n-input AND (OR) gate, only (n + 2) rather than (2n + 2) faults of the gate need
to be tested. A systematic approach that reduces the number of faults that have
to be tested is based on the idea of fault equivalence classes, i.e. such faults that
are covered by a single test set [30].
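
The (n + 2) versus (2n + 2) count for a single gate can be reproduced with a
short sketch. The function name is hypothetical, and the reasoning in the
comments is the standard equivalence argument for an AND gate (for an OR gate
the roles of stuck-at-0 and stuck-at-1 are swapped), stated here as an
assumption about what the count refers to.

```python
def gate_fault_counts(num_inputs):
    """Return (uncollapsed, collapsed) stuck-at fault counts for one gate.

    The gate has num_inputs input lines and one output line, and each line
    can be stuck-at-0 or stuck-at-1.
    """
    lines = num_inputs + 1
    uncollapsed = 2 * lines  # 2n + 2 single stuck-at faults
    # For an AND gate, every input stuck-at-0 is indistinguishable from the
    # output stuck-at-0, so these n + 1 faults collapse into a single class.
    collapsed = uncollapsed - num_inputs  # n + 2 faults remain
    return uncollapsed, collapsed


two_input = gate_fault_counts(2)
four_input = gate_fault_counts(4)
```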
4.2 Inference from Experiments
In the previous section, some experimental results about automatic test pattern
generation were discussed. Now, we will see what kind of conclusion we can
deduce.
Our data tell us that over 93% of the faults are solved faults if the backtrack limit
is double the number of primary inputs. For the circuit C6288, over 99% of the
faults are solved faults. This fact implies that, in a system which contains a
massive number of processors, fault partitioning can work very efficiently to
generate test patterns for most faults, because most faults are easy to detect. The
next stage is to concentrate on how to coordinate all processors to solve the
remaining faults, which are called hard-to-detect faults.
If there is a system which consists of a massive number of processors, the fault
partitioning method can be used to efficiently solve the test pattern generation
problem for most of the faults (93% or over), namely, the easy-to-detect faults.
There is no communication problem among processors after each processor is
assigned a subset of faults. The backtrack limit is double the number of primary
inputs, which is a small integer. This implies that there is no big difference
between the amounts of work done by each processor. The test set may, however,
contain many redundant test patterns.
After this first step, the remaining faults are hard-to-detect faults. Any one
of them may require several hours, days or even weeks if one conventional ATPG
algorithm and one processor are used. It was also shown that current parallel
systems are still unsatisfactory for the solution of the automatic test pattern
generation problem.
Therefore, it is time to think about designing a special structure to interconnect
many processors in order to make a group of processors solve the automatic test
pattern generation problem more effectively for those hard-to-detect faults.

Part II
Simulation Environment
Chapter 5
Model for Measurement
Some kind of measurement model is important to measure the quality of a
system, or to allow meaningful comparisons of different systems. The measurement
model must encapsulate the essential functionality of the system in a quantifiable
way.
5.1 Model for Measurement
If there is a system which consists of lots of processors and has a specific topology
to connect these processors, can the quality of this system be measured? Here
quality means the quality of the system for parallel processing of automatic test
pattern generation.
One criterion for evaluating the quality of a parallel solution to a problem
is how well it scales. There are two aspects which play important roles in the
quality of a parallel solution. One is the algorithm, the other is the topology of a
multi-processor system.
So far, most research has concentrated on the design of parallel algorithms,
which can be executed by a specific multi-processor system. The measure of the
quality of a parallel solution is determined by how well the algorithm scales. An
algorithm scales well if the computation time decreases linearly, or nearly so, with
an increase in the number of processors in the system. The speedup of a given
parallel algorithm is defined as the ratio of the time taken by the fastest sequential
algorithm running in an equivalent uniprocessor to the time taken by the parallel
algorithm on the parallel machine. The goal is to have the algorithm's speedup
scale linearly with the number of processors.

Figure 5.1: Virtual 2 Phase Clock
Since our goal is to design a multi-processor system which has a high
performance when generating test patterns, the quality of a parallel solution is how
well the system scales. Analogous to the case for a parallel algorithm, a parallel
system scales well if the computation time decreases linearly, or nearly so, with
an increase in the number of processors in the system. The speedup of a system
is defined as the ratio of the time taken by the algorithm running in one processor
to the time taken by the algorithm on the multi-processor machine. The goal is
to have the system's speedup scale linearly with the number of processors.
A multi-processor system is a tuple $(P, C, p_i)$, where P is a set of processors,
and every processor is identical. C is a set which contains interconnection
information for the processors. All processors execute their own algorithms
synchronously under the control of a virtual two phase clock (2PC). Figure 5.1
shows the diagram of 2PC. It is also required that these checking algorithms have
the same time complexity. This is very important since all processors work in
synchronization. The period of 2PC depends on the lowest speed processor. Since
all processors in P are the same, the greater the time complexity, the lower the
speed. A low speed will cause other processors to wait.
$p_i \in P$ is the master processor of the system. The master processor is in
charge of coordinating the system. For example, it receives tasks from outside,
and is the first processor to begin to generate test patterns. It decides whether
it is time to stop working because one test pattern has been found to detect a
given fault, or time has run out, or a redundant fault is found. It also should
report the result to the outside. C determines the topology of a multi-processor
system since a topology depends on the connections among processors. If there is
a connection between two processors, it means that there is a wire between them
on the physical level. Limitations are needed for C because it is impractical to
have many wires to input or output data for each processor. Later, we will show
that all our multi-processor systems have a 4-connected structure, two for inputs
and two for outputs. This results in a simple and natural layout.
In order to measure the quality of a system, the number of two phase clock
(2PC) steps is used. The number of 2PC steps (N2PC) is counted to record how
many 2PC steps are used to find a test pattern, or a redundant fault, or a
hard-to-detect fault. If a multi-processor system can generate all possible $2^k$
values for any given integer k, a test pattern can be found eventually, as long as
it exists, or a conclusion of redundancy can be reached. Hard-to-detect faults are
those whose test pattern has not been found within the specified time limit.
With an increasing number of processors, that is $|P| \to \infty$, every N2PC is
recorded. These data are analyzed to determine whether a multi-processor system
scales well, which represents the quality of the system.
5.2 Parallel Speedup in Test Pattern Generation
In 1864, the philosopher Charles Babbage said:
It is impossible to construct machinery occupying unlimited space;
but it is possible to construct finite machinery, and to use it through
unlimited time. It is this substitution of the infinity of time for the
infinity of space which I have made use of to limit the size of the engine
and yet to retain its unlimited power.
We may call this Babbage's thesis. This thesis states that time and space
complexity are related and can be traded for one another. As hardware technology
develops, we can employ the converse of Babbage's thesis: use a very large number
of processors to solve the test pattern generation problem. That is to say, we can
use space to gain invaluable time.
The central issue in parallel and concurrent processing using a large number of
processors is the design of multi-processor systems and parallel algorithms whose
performance can be somehow related to the time complexity of the single-processor
sequential algorithm, $T_1$. Ideally, we require that a parallel algorithm which takes
a problem and uses N processors in time $T_N$ is related to $T_1$ by the relation
$T_N = T_1/N$. In other words, we hope that a multi-processor system with N
independent processors should be able to compute the solution of a problem N
times faster than a single processor. This is called ideal speed-up. However, in
practice, this speed-up ratio $T_1/T_N$ often turns out to be far less than N for the
following reasons:
1. Processors competing for the same communication paths with other
processors or to a shared memory can slow down because of the non-availability
of paths.
2. Since simultaneous reading and writing from a file can cause conflicts, the
processors are forced to wait for mutual exclusion.
3. Processors need to be conditionally synchronized when different tasks are to
be coordinated.
4. The sequential component in an algorithm limits the speed of the total
process; in other words, if $T_s$ and $T_p$ are respectively the time spent on
serial and parallel components of an algorithm in a single processor, then
the maximum speed-up $S_N$ that can be achieved using N processors in
parallel for the parallel component is given by:

$$S_N \leq \frac{T_s + T_p}{T_s + T_p/N} = \frac{1}{f + (1 - f)/N}$$

where $f = T_s/(T_s + T_p)$ and $0 \leq f \leq 1$.
We can find that $f$ is the fraction of computations performed sequentially. For
example, if $f = 1/k$, where $k > 1$, then $S_N \leq k$, even if N is very large;
obviously, for $f = 0$, $S_N = N$. This is called Amdahl's law.
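
The bound can be evaluated numerically with a few lines of Python. This is a
small sketch of Amdahl's law as stated above; the function name and the sample
values are chosen only for illustration.

```python
def amdahl_speedup(sequential_fraction, num_processors):
    """Upper bound on speedup when a fraction of the work is sequential."""
    parallel_fraction = 1.0 - sequential_fraction
    return 1.0 / (sequential_fraction + parallel_fraction / num_processors)


no_serial_part = amdahl_speedup(0.0, 8)           # ideal case: speedup equals N
quarter_serial = amdahl_speedup(0.25, 1_000_000)  # stays below k = 4
half_serial = amdahl_speedup(0.5, 2)
```

Even with a million processors, a sequential fraction of one quarter keeps the
speedup below 4.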
Figure 5.2: One Processor Searches 16 Elements (15 steps)
However, the test pattern generation problem is anomalous. To find a test
pattern for a given fault can be considered equivalent to searching a space. N
cooperating processors may reach the goal much faster than one processor, even
more than N times faster.
Suppose there is a space which has 16 elements. One processor $P_0$ searches
the space according to some heuristics. The goal element can be reached after 15
steps, as shown in Figure 5.2. If four processors, $P_0$, $P_1$, $P_2$ and $P_3$, take
part in the search according to the same heuristics, each will search a sub-space, as
shown in Figure 5.3. The goal element is found by $P_3$ after one step. $T_1/T_4$ is
15, which is greater than 4. This means that there is greater than linear speedup,
called superlinear speedup.
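
The effect can be reproduced with a toy search in Python. This is only a sketch
under assumptions that differ from the figures: the visiting order (elements in
index order), the contiguous split into sub-spaces, and the goal position
(index 12) are all hypothetical, so the step counts are 13 and 1 rather than the
15 and 1 of Figures 5.2 and 5.3.

```python
def steps_to_find(order, goal):
    """Steps needed by one processor that visits `order` one element per step."""
    return order.index(goal) + 1 if goal in order else None


def parallel_steps(space, num_processors, goal):
    """Steps until the first processor finds the goal in its own sub-space."""
    chunk = len(space) // num_processors
    subspaces = [space[p * chunk:(p + 1) * chunk] for p in range(num_processors)]
    found = [steps_to_find(subspace, goal) for subspace in subspaces]
    return min(steps for steps in found if steps is not None)


space = list(range(16))
goal = 12  # hypothetical goal: the first element of the last sub-space
t1 = steps_to_find(space, goal)
t4 = parallel_steps(space, 4, goal)
speedup = t1 / t4
```

With four processors the speedup in this toy case is 13, which again exceeds the
number of processors.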
This anomaly can also be seen from another point of view. In general, a
problem contains n sub-problems. In order to solve this problem, the processing
elements in a parallel system have to cooperate with each other to solve all of
these n sub-problems. But for test pattern generation, if a test pattern is
found, all of the remaining computation can be omitted. Therefore, not all of the
sub-problems must be done.

Figure 5.3: 4 Processors Search 16 Elements
If a circuit has $N_{pi}$ primary inputs, the test pattern generation problem for
this circuit can be divided into $2^{N_{pi}}$ sub-problems according to the
representation of its primary inputs. Every sub-task is to solve one sub-problem,
which is to check whether the given pattern in primary inputs can detect the given
fault. If one of the sub-tasks is done and a test pattern is found, all of the
remaining unfinished sub-tasks can be ignored. Again, consider the examples in
Figure 5.2 and Figure 5.3. In Figure 5.2, when a test pattern is found, 15 sub-tasks
have been done, which occupies 93% of all sub-tasks. But in Figure 5.3, only 25% of
all sub-tasks
Figure 5.4: A Circuit with One Fault

Step | $I_0$ | $I_1$ | $I_2$ | Status
1 | X | X | X | potential
2 | 0 | X | X | cannot
3 | 1 | X | X | potential
4 | 1 | 0 | X | cannot
5 | 1 | 1 | X | potential
6 | 1 | 1 | 0 | cannot
7 | 1 | 1 | 1 | cannot

Table 5.1: A Process of the Proof of a Redundant Fault
are done when a test pattern is found. Although only part of the sub-tasks are
done, the test pattern generation problem is solved. This is one of the anomalous
characteristics of the problem.
How about the redundant fault? Should all sub-tasks be done in order to prove
its redundancy? The answer is "No", again.

Table 5.2: A Process of the Proof of a Redundant Fault
For example, consider the circuit in Figure 5.4. Table 5.1 and Table 5.2 are
two examples which show the process to prove the redundancy of the fault. They
clearly show that it is possible for the proof of a redundant fault to do only part
of the sub-tasks, if an unknown value X is used as one of the primary inputs'
values. These two tables also show that a different order of assigning values to
primary inputs may cause a different number of sub-tasks to be done. Table 5.1
arranges $I_0$, $I_1$, $I_2$ as the order of assignment. Seven sub-tasks are done to
prove the redundancy. But Table 5.2 selects $I_2$ as the first primary input to have
its value assigned. It requires only 3 sub-tasks to prove the redundancy. In general,
this order of primary inputs is dependent on what heuristics are adopted. Hence,
it is not necessary for the proof of a redundant fault to do all of the sub-tasks.
Moreover, the heuristics play an important role in deciding how many sub-tasks
should be done.
So far, there is no standard method to measure the speedup of parallel systems
for test pattern generation. $T_1/T_N$ is the method widely used [13][14].
For the automatic test pattern generation problem, we may expect a
multi-processor system to generate a test pattern for a given fault very quickly if
the test pattern exists and there are enough processors which are interconnected.
For example, there is a circuit with N primary inputs. To detect a possible stuck-at
fault, there are $2^N$ different patterns which can be fed to the circuit. These $2^N$
different patterns form a pattern set. This pattern set is called the search-space,
because ATPG algorithms always try to search this set so as to find a pattern
which can detect the given fault. If there is a multi-processor system which
contains $2^{N+1} - 1$ processors, which are connected to form a complete binary
tree, then, within N steps, a test pattern can be found if it exists, or a redundant
fault can be proved if there is no test pattern for the given fault.
In practice, it is impossible to have such a system since $2^N$ is a huge number
when $N$ is a little bit large, say $N > 20$. Therefore, the problem is how to
use limited resources, or processors, to find a test pattern in the search-space
as quickly as possible. More exactly, the problem of exploring linear speedup is
to design a topology to connect $N$ given processors and a protocol to make these
$N$ processors communicate with each other so as to find a test pattern or prove
its redundancy $N$ times faster than when one processor is used.
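The quantities in this discussion can be checked with a few lines of arithmetic.
The following Python sketch computes the size of the search-space, the number of
processors in the complete binary tree, and the speedup ratio $T_1/T_N$. The
function names and the 5% tolerance used to call a speedup "linear" are
illustrative assumptions and do not come from the thesis.

```python
def search_space_size(n_inputs):
    """Number of distinct input patterns for a circuit with n_inputs primary inputs."""
    return 2 ** n_inputs


def tree_processors(n_inputs):
    """Processors in a complete binary tree deep enough to cover the whole search-space."""
    return 2 ** (n_inputs + 1) - 1


def speedup(t_one, t_many):
    """Speedup ratio T_1 / T_N: single-processor time over multi-processor time."""
    return t_one / t_many


def is_linear(t_one, t_many, n_processors, tolerance=0.05):
    """True when the measured speedup is within `tolerance` of n_processors (assumed threshold)."""
    return abs(speedup(t_one, t_many) - n_processors) <= tolerance * n_processors
```

For 20 primary inputs the tree already needs more than two million processors,
which is why a limited number of processors has to be used instead.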
Chapter 6
Algorithms
This chapter introduces all the algorithms used in our simulation software. Their
time complexities are also discussed.
6.1 Parser Construction
To generate test patterns, first of all, circuits have to be analyzed. The
description of circuits is written in a netlist format [23]. This section
discusses the grammar rules of the netlist format; the format is described in
detail in Bryan [23]. Based on the grammar rules, a parser can be developed
directly.
6.1.1 The Grammar Rules
The description of the netlist format from Bryan [23] is a list of descriptions of
logic gates. The description of a logic gate is called a node since each gate, or
primary input, or fanout branch is considered as a node. Using the form of
YACC [36], we can use the following grammar rules to represent these:
    circuit   : node_list
    node_list : node
              | node_list node
A circuit is a list of nodes, denoted by node_list. A node_list is described in a
recursive way. A node forms a node_list. A node_list followed by a node also forms
a node_list.
From the netlist format, nodes can be classified into three types: primary
inputs, fanout branches, and logic gates. They have different formats.
Primary inputs have the format:
    address name INPT fanout ZERO faults
Fanout branches have the format:
    address name FROM name faults
Logic gates have the format:
    address name type fanout fanin faults fanin_line
Here, type represents the type of the gate, for example, AND, NAND, OR, etc.
Using grammar rules, a node can be written as:
    node : address name INPT fanout ZERO faults
         | address name FROM name faults
         | address name type fanout fanin faults fanin_line
Since we adopted the reduced equivalent fault set, which is based on equivalence
fault collapsing [23], there may be some nodes labeled no fault, some labeled
stuck_at_0, some labeled stuck_at_1, and some labeled both. Therefore, the grammar
rules of faults can be described as
    faults :
           | S_A_0
           | S_A_1
           | S_A_0 S_A_1
           | S_A_1 S_A_0
The complete grammar rules can be listed as follows:
    circuit      : node_list
    node_list    : node
                 | node_list node
    node         : address name INPT fanout ZERO faults
                 | address name FROM name faults
                 | address name type fanout fanin faults fanin_line
    address      : integer
    name         : STREAM
                 | N_ZERO
                 | ZERO
    type         : AND
                 | NAND
                 | OR
                 | NOR
                 | XOR
                 | NXOR
                 | BUFF
                 | NOT
    fanout       : integer
    fanin        : integer
    faults       :
                 | S_A_0
                 | S_A_1
                 | S_A_0 S_A_1
                 | S_A_1 S_A_0
    fanin_line   : address_list
    address_list : address
                 | address_list address
    integer      : ZERO
                 | N_ZERO
Here, the address should be an integer. The integer is a zero or a nonzero
value. The name can be a zero, or non_zero value, or a string of characters. The
fanin and the fanout are integers. The fanin cannot be zero since each logic gate
must have at least one input. A logic gate has several inputs, whose addresses are
put in the fanin_line field.
With each grammar rule, actions may be associated to be performed each time
the grammar rule is recognized in the input process [36]. Then the netlist format
can be analyzed, and the needed data structure can be constructed.
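The grammar above can also be recognized by a small hand-written parser. The
following Python sketch is only an illustration under stated assumptions, not the
YACC parser used in the thesis: it assumes the tokens are separated by white
space, that S_A_0 and S_A_1 are spelled ">sa0" and ">sa1" in the input, and that
the fanin_line holds exactly fanin addresses. The `Node` record and
`parse_netlist` are hypothetical names.

```python
from dataclasses import dataclass, field

GATE_TYPES = {"AND", "NAND", "OR", "NOR", "XOR", "NXOR", "BUFF", "NOT"}
FAULT_TOKENS = {">sa0", ">sa1"}  # assumed spellings of S_A_0 and S_A_1


@dataclass
class Node:
    address: int
    name: str
    kind: str                                   # INPT, FROM, or a gate type
    fanout: int = 0
    fanin: int = 0
    faults: list = field(default_factory=list)
    source: str = ""                            # stem name, used by FROM nodes only
    inputs: list = field(default_factory=list)  # addresses from the fanin_line


def parse_netlist(text):
    """Parse a netlist into Node records, one per grammar rule `node`."""
    tokens = text.split()
    pos = 0
    nodes = []

    def take():
        nonlocal pos
        token = tokens[pos]
        pos += 1
        return token

    def take_faults():
        found = []
        while pos < len(tokens) and tokens[pos].lower() in FAULT_TOKENS:
            found.append(take().lower())
        return found

    while pos < len(tokens):
        address, name, kind = int(take()), take(), take().upper()
        if kind == "INPT":
            fanout, zero = int(take()), int(take())
            if zero != 0:
                raise ValueError(f"primary input {name} must have fanin ZERO")
            nodes.append(Node(address, name, kind, fanout=fanout, faults=take_faults()))
        elif kind == "FROM":
            source = take()
            nodes.append(Node(address, name, kind, source=source, faults=take_faults()))
        elif kind in GATE_TYPES:
            fanout, fanin = int(take()), int(take())
            if fanin == 0:
                raise ValueError(f"gate {name} must have at least one input")
            faults = take_faults()
            inputs = [int(take()) for _ in range(fanin)]
            nodes.append(Node(address, name, kind, fanout, fanin, faults, inputs=inputs))
        else:
            raise ValueError(f"unknown node type {kind}")
    return nodes
```

Each branch of the loop corresponds to one alternative of the `node` rule, and the
list that is returned plays the role of the data structure built by the grammar
actions.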
6.2 Compiler Driven Simulation
In a multi-processor system, there are a lot of processing elements. We call them
checking processing elements (CPE), which simultaneously do the same task,
checking whether the given trial test pattern can, or cannot, or is possible to
detect a given fault. In order to do this job, every CPE does its work in two steps:
1. simulate the logic circuit
2. check the result
In simulation, there are two basic classes of simulators, compiler driven and
table-driven event-directed. The earliest simulators were of the former type, but
most modern ones are of the latter type since they allow for more versatility in
handling delays as well as a reduction in simulation time. In our case, the compiler
driven method is adopted. In our systems, all processors work synchronously
under the control of the virtual 2 phase clock (2PC). For the worst case, the
table-driven event-directed method has to simulate all logic gates in the circuit
since every gate is active. This situation has to be considered when we calculate
the time period of the virtual 2 phase clock. In a synchronous system, the period
of 2PC is the time for the worst case. If the table-driven event-directed method is
used, processors in the best cases have to wait for processors in the worst cases.
In other words, the saved time is wasted because the faster processors are idle
in order to wait. If the compiler driven method is adopted, it is much easier to
estimate the time period of 2PC since every processor runs the same executable
code. Therefore, the compiler driven method is more suitable for our case.
Compiler driven simulation first translates the description of the circuit into a
list of logic gates, which is called the machine executable gate list and is
arranged according to the machine executable order. This machine executable order
guarantees that the circuit simulation can be done by simulating each logic gate
in the list one by one according to their order in the list. This subsection first
introduces the circuit levelizing algorithm which translates the description of
the circuit into a machine executable ordered list. Then the simulation algorithm
will be discussed.
Forming machine executable gate list
In any logic circuit, each logic gate can be assigned a level value. The level value
decides when the logic gate can be simulated. For example, there are two logic
gates $G_i$ and $G_j$, which have level values $k_{G_i}$ and $k_{G_j}$,
respectively. The order of simulation for $G_i$ and $G_j$ satisfies the following
restrictions:
1. If $k_{G_i} < k_{G_j}$, $G_i$ must be simulated before $G_j$.
2. If $k_{G_i} = k_{G_j}$, $G_i$ and $G_j$ can be simulated in any order.
3. If $k_{G_i} > k_{G_j}$, $G_i$ must be simulated after $G_j$.
It is clear that the level value of each logic gate imposes a partial order on the
simulation.
The following circuit levelizing algorithm assigns each logic gate a level value:
1. Assign all primary input lines $x$ and feedback lines $y$ the level value 0.
2. For any element not yet assigned a level value, assign this element and its
output lines a level value as defined by
$$k_e = 1 + \max(k_{i_1}, k_{i_2}, \ldots, k_{i_n})$$
where $k_i$ is the level value of element $i$, and element $e$ has inputs from
elements $i_1, i_2, \ldots, i_n$.
To implement this levelizing algorithm, the following algorithm was designed. For
convenience, each primary input is considered as a special logic gate.
1. Find all primary inputs, assign a level value 0, and put them into an assigned
   queue;
2. While (the assigned queue is not empty)
   (a) Get one logic gate $G$ from the assigned queue;
   (b) For (each logic gate $G_u$ driven by $G$) do
       i. If ($G_u$ already has been assigned one level value) Skip;
       ii. If (at least one of the logic gates which drive $G_u$ has not been
           assigned a level value) Skip;
       iii. Otherwise, all logic gates which drive $G_u$ have been assigned a level
            value
            A. Assign $k_{G_u} = 1 + \max(k_{i_1}, k_{i_2}, \ldots, k_{i_n})$;
            B. Put $G_u$ into the assigned queue;
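The queue-based steps above can be sketched in Python as follows. This is a
minimal illustration, assuming the circuit is given as a dictionary `drivers`
that maps every gate name to the list of gates feeding it, with primary inputs
mapped to an empty list; the names `levelize` and `machine_executable_list` are
hypothetical. Python's built-in sort stands in for the sorting step that follows
levelization.

```python
from collections import deque


def levelize(drivers):
    """Assign a level value to every gate reachable from the primary inputs."""
    # Invert the fanin lists to get, for each gate, the gates it drives.
    driven = {gate: [] for gate in drivers}
    for gate, sources in drivers.items():
        for src in sources:
            driven[src].append(gate)

    # Step 1: primary inputs (no drivers) get level 0 and enter the queue.
    level = {gate: 0 for gate, sources in drivers.items() if not sources}
    queue = deque(level)

    # Step 2: process the assigned queue.
    while queue:
        gate = queue.popleft()
        for nxt in driven[gate]:
            if nxt in level:                                   # step i
                continue
            if any(src not in level for src in drivers[nxt]):  # step ii
                continue
            level[nxt] = 1 + max(level[src] for src in drivers[nxt])  # step A
            queue.append(nxt)                                  # step B
    return level


def machine_executable_list(drivers):
    """Gates ordered by level value, so each gate follows all of its drivers."""
    level = levelize(drivers)
    return sorted(level, key=level.get)
```

For a circuit with `g1` fed by inputs `a` and `b`, and `g2` fed by `g1` and `b`,
the levels come out as 0 for the inputs, 1 for `g1`, and 2 for `g2`.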
Suppose the number of logic gates in a circuit is $m$, the maximum number of
fanin is $f_i$, and the maximum number of fanout is $f_o$. Now the complexity of
the algorithm can be analyzed.
step 1: In the worst case, after $m$ steps, all the primary inputs can be found.
Therefore, the time complexity is $O(m)$.
step 2: Each logic gate will be entered, only once, into the assigned queue, so
this while statement will be executed $m$ times.
step (a): Constant time is required for it.
step (b): The for statement will run $f_o$ times.
step i: It needs constant time.
step ii: $f_i$ time is required to do this judgment.
step iii: It is clear that it is constant time.
step A: $O(f_i)$ time is needed.
step B: It is constant time.
The total time needed is:
$$O(m) + m\,(C + f_o\,(C + f_i + C + f_i + C)) = O(m) + O(f_i f_o m) = O(f_i f_o m)$$
For all circuits, each logic gate has a limited number of fanin and fanout. It can
be assumed that they are less than a constant $C$. Hence, the time complexity is
$O(m)$.
After levelization, any sorting program can be used to put all the logic gates
in a special order. That is, if $k_{G_i} < k_{G_j}$, $G_i$ is before $G_j$.
In VLSI circuits, there are many logic gates. It is worth using an economic
sorting method to rearrange these gates. Quicksort is one of the widely used
methods since it has a best-case time $O(n \log n)$ [35]. It also has a worst-case
time $O(n^2)$ [35]. Mergesort can sort these data within time $O(n \log n)$ [35],
but it needs a little bit more space. Either of them can serve the purpose.
After sorting, an ordered gate list is formed. This list is called a machine
executable gate list.
Simulation Algorithm
Based on the machine executable gate list, the circuit simulation becomes easy
and direct.
From head to tail, for each logic gate in the machine executable gate list, do
1. Get all its input values;
2. Simulate the logic gate based on the five value logic $\{0, 1, X, D, \bar{D}\}$.
Suppose the maximum fanin among all logic gates is $k$. To simulate each logic
gate, $k$ steps may be needed to fetch input signals. Since the machine executable
gate list contains $m$ logic gates, the time complexity of the simulation algorithm
is $O(km)$. In practice, the fanin of a logic gate is a limited value, say 4 or a
little more.
Figure 6.1: Ineffective Logic Gates
The maximum fanin can be considered as a constant value. Therefore, the
time complexity of the simulation algorithm is $O(m)$.
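The two simulation steps can be illustrated with a short Python sketch. It is
written under assumptions that are not taken from the thesis: each five-valued
signal is encoded as a pair (fault-free bit, faulty bit) with `None` for an
unknown bit, $\bar{D}$ is spelled `"D'"`, and a stuck-at fault is injected by
forcing the faulty bit at the fault site. All names are hypothetical.

```python
ZERO, ONE, X, D, DBAR = "0", "1", "X", "D", "D'"
# (fault-free bit, faulty bit); None marks an unknown bit.
PAIR = {ZERO: (0, 0), ONE: (1, 1), D: (1, 0), DBAR: (0, 1), X: (None, None)}
NAME = {pair: name for name, pair in PAIR.items()}


def and3(bits):
    if 0 in bits:
        return 0
    return None if None in bits else 1


def or3(bits):
    if 1 in bits:
        return 1
    return None if None in bits else 0


def xor3(bits):
    return None if None in bits else sum(bits) % 2


def not3(bit):
    return None if bit is None else 1 - bit


OPS = {
    "AND": and3, "OR": or3, "XOR": xor3,
    "NAND": lambda b: not3(and3(b)), "NOR": lambda b: not3(or3(b)),
    "NXOR": lambda b: not3(xor3(b)),
    "BUFF": lambda b: b[0], "NOT": lambda b: not3(b[0]),
}


def to_five(good, faulty):
    """Collapse a bit pair to one of the five values; any unknown bit gives X."""
    return X if good is None or faulty is None else NAME[(good, faulty)]


def simulate(gate_list, inputs, fault=None):
    """Simulate gates in machine executable order.

    gate_list: (name, type, input names) tuples; inputs: primary input values;
    fault: optional (line name, stuck value) pair.
    """
    values = {}

    def place(name, value):
        good, faulty = PAIR[value]
        if fault and fault[0] == name:
            faulty = fault[1]          # the faulty circuit sees the stuck value
        values[name] = to_five(good, faulty)

    for name, value in inputs.items():
        place(name, value)
    for name, kind, ins in gate_list:
        pairs = [PAIR[values[i]] for i in ins]          # step 1: get input values
        good = OPS[kind]([p[0] for p in pairs])         # step 2: simulate the gate
        faulty = OPS[kind]([p[1] for p in pairs])
        place(name, to_five(good, faulty))
    return values
```

Each gate is visited once and touches at most its fanin inputs, which matches the
$O(km)$ bound given above.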
This machine executable gate list can be made smaller if a fault, which should
be detected, is given. In a circuit, some logic gates do not affect the test pattern
generation for a given fault.
For example, the logic gate K in Figure 6.1 does not affect the test pattern
generation for the fault shown in the figure. Therefore, gate K can be deleted
from the machine executable gate list, making the list smaller. Hence, simulation
time can be saved.
To reduce this list, another algorithm is needed. Here, we describe it in natural
language; it is trivial to code it in a programming language. First, two concepts
are defined.
A fault transferring gate is a logic gate such that, if at least one of its inputs
carries a faulty signal, then its output is also a faulty signal.
A useful gate is a logic gate which drives at least one useful signal and whose
inputs are all useful signals.
At the faulty point, the faulty signal must be propagated forward and useful
signals propagated backward.
Based on these definitions, the algorithm can be described. Beginning from
the faulty point, a fault transferring signal can be propagated forward. All fault
transferring gates can be labeled. Also beginning from the faulty point, a useful
signal can be propagated backward. All useful gates can be found. Those gates
which are neither fault transferring gates nor useful gates can be deleted from the
machine executable gate list since they do not affect the test pattern generation
for the given fault.
The penalty is that changing the fault causes the executable gate list to change.
As we know, our goal is to solve the problem for hard-to-detect faults. This may
necessitate that the simulation be executed many times. Therefore, it is still
worth doing so because simulation becomes quicker.
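The reduction can be sketched as two graph traversals. The Python sketch below is
an interpretation rather than the thesis code: it labels as fault transferring
every gate reachable forward from the faulty point, and it assumes that the
backward trace of useful signals starts not only at the faulty point but also at
every fault transferring gate, so that their side inputs are kept for simulation.
The names `reachable` and `reduce_gate_list` are hypothetical.

```python
def reachable(start, edges):
    """All nodes reachable from `start` by following `edges` (start included)."""
    seen = {start}
    stack = [start]
    while stack:
        for nxt in edges.get(stack.pop(), []):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def reduce_gate_list(gate_list, drivers, fault_site):
    """Drop gates that are neither fault transferring nor useful for this fault."""
    # Invert the fanin lists so that faulty signals can be traced forward.
    driven = {}
    for gate, sources in drivers.items():
        for src in sources:
            driven.setdefault(src, []).append(gate)

    transferring = reachable(fault_site, driven)      # forward from the fault
    useful = set()
    for gate in transferring:                         # backward (assumed start set)
        useful |= reachable(gate, drivers)

    keep = transferring | useful
    return [gate for gate in gate_list if gate in keep]   # order is preserved
```

Because the filter keeps the original order, the reduced list is still a machine
executable gate list.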
6.3 Checking Trial Test Patterns
After the simulation, we use the following checking algorithm to check whether
the trial test pattern can detect the fault.
1. If all primary outputs are 0 or 1, it cannot detect the given fault.
2. If at least one primary output is $D$ or $\bar{D}$, a test pattern for the given
   fault is found.
3. Else, there is neither $D$ nor $\bar{D}$, and there is at least one X in the
   primary outputs. Do
   (a) Set an empty set which is used to contain any logic gate which is found
       to be able to propagate the fault to primary outputs;
   (b) If the logic gate which has a stuck-at fault has the value $D$, $\bar{D}$
       or X, put it into the set;
   (c) While the set is not empty, Do
       i. Get one logic gate from the set
       ii. If the logic gate is a primary output gate with value X, return a
           POTENTIAL_TEST_PATTERN flag, which means that the input
           pattern has the possibility to detect the fault. As discussed before,
           the input pattern is a potential test pattern.
       iii. If the logic gate is not a primary output gate, put into the set all
            those logic gates which are driven by the logic gate, have value $D$,
            $\bar{D}$ or X, and have not been in the set before.
   (d) If the set is empty, return a NOT_TEST_PATTERN flag, which means
       that this input pattern is impossible to detect the fault.
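The checking steps can be sketched in Python as follows. The sketch assumes
that simulation has already produced a dictionary `values` of five-valued
signals (with $\bar{D}$ spelled `"D'"`) and that `driven` maps each gate to the
gates it drives. The TEST_PATTERN flag name is an assumption added for the
detected case; the other two flag names follow the text.

```python
FAULTY = ("D", "D'")


def check_pattern(values, driven, primary_outputs, fault_site):
    """Classify a simulated trial test pattern for the fault at `fault_site`."""
    outs = [values[o] for o in primary_outputs]

    # Step 2: a D or D' at any primary output means the fault is detected.
    if any(v in FAULTY for v in outs):
        return "TEST_PATTERN"
    # Step 1: all outputs fixed at 0 or 1, so the fault cannot be detected.
    if all(v in ("0", "1") for v in outs):
        return "NOT_TEST_PATTERN"

    # Step 3: look for a path of D, D' or X values from the fault to an X output.
    pending = [fault_site] if values[fault_site] in FAULTY + ("X",) else []
    seen = set(pending)
    while pending:
        gate = pending.pop()
        if gate in primary_outputs:
            if values[gate] == "X":
                return "POTENTIAL_TEST_PATTERN"
            continue
        for nxt in driven.get(gate, []):
            if nxt not in seen and values[nxt] in FAULTY + ("X",):
                seen.add(nxt)
                pending.append(nxt)
    return "NOT_TEST_PATTERN"
```

The `seen` set guarantees that each gate enters the working set at most once,
which is the property the complexity analysis relies on.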
Suppose the maximum fanout in the circuit is $k$, and a circuit contains $m$ logic
gates. We can analyze the time complexity of the checking algorithm.
step 1: In the worst case, all gates are primary outputs, and $m$ gates need to
be checked. Therefore, the time complexity is $O(m)$.
step 2: To check the result, $m$ gates may have to be examined. The time
complexity is still $O(m)$.
step 3: After step 1 and step 2, the result is obvious. It can be thought of as
constant time.
step (a): Constant time is required for it.
step (b): It is still constant time.
step (c): Since every gate may be put in the test set, the While statement may
be executed $m$ times.
step i: It needs constant time.
step ii: Clearly, it is constant time.
step iii: $k$ steps are needed because the logic gate may drive $k$ gates.
step (d): Constant time.
According to the structure of the algorithm, the total time is:
$$O(m) + O(m) + C + C + C + m\,(C + C + k) + C = O(km)$$
In practice, the fanout of a logic gate is a limited value. Therefore, we can
take $k$ as a constant. Hence, the algorithm has $O(m)$ time complexity.
6.4 Heuristics
Heuristics are very useful to speed up the test generation since the test pattern
generation problem is NP-complete in general. Here, we use the testability
measure of Stephenson and Grason [17] as the heuristics to generate a test pattern
for a given fault.
Figure 6.2: Diagram for a Component
A register transfer level circuit can be assumed to be a network of components
(e.g., adders, registers, multiplexors, controllers) interconnected by unidirectional
links. In general, a link may be many conductors carrying more than one bit of
information; however, to simplify our discussion we assume here that every link
has a single conductor. A link is a signal line carrying logic values 0 and 1.
A controllability value $CV(s)$ and observability value $OV(s)$ ranging from 0
to 1 are assigned to each signal line $s$.
Consider the component illustrated in Figure 6.2, with inputs $x_1, \ldots, x_n$
and output $z$. The expression used to calculate $CV$ for output $z$ is
$$CV(z) = CTF \times \frac{1}{n} \sum_{i=1}^{n} CV(x_i)$$
where CTF is the controllability transfer factor of the component and $n$ is the
number of inputs of the component.
The concept of the CTF is used to account for the potential diminishing of
control information as it is propagated through the circuit. The CTF of a
component must represent the ability to control the output of the component by
applying input values. It is defined by the following equation, depending only on
the input-output relation of the component:
$$CTF = 1 - \frac{|N_z(0) - N_z(1)|}{2^n}$$
where $N_z(0)$ and $N_z(1)$ are the numbers of input values for which output $z$
has output value 0 and 1, respectively. The CTF of a component ranges between 0
and 1. It takes the maximum value 1 when the component has a uniform input-
output relation, and decreases to 0 as the degree of uniformity decreases. For
example, the CTFs for a NOT gate and an XOR gate are 1, since $N(0)$ and $N(1)$
are equal. On the other hand, the CTF of an $n$-input NAND gate is $1/2^{n-1}$.
Consider again the component diagrammed in Figure 6.2. The expression used
to calculate $OV$ for each input $x_i$ is
$$OV(x_i) = OTF \times OV(z)$$
where OTF is the observability transfer factor of the component. Note that each
input observability is assigned the same value.
The OTF of a component must represent the ease of propagating a fault value
through the component. It is expressed as
$$OTF = \frac{1}{n} \sum_{i=1}^{n} \frac{NS_i}{2^n}$$
where $NS_i$ is the number of input values for which changing the value of input
$x_i$ changes the output. $NS_i$ also means the number of input values that
can sensitize a path from $x_i$ to the output of the component. The OTF measures
the probability that a faulty value at any input of the component will propagate
to its output. The values of OTFs also vary between 0 and 1.
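Both transfer factors depend only on the truth table of a component, so they can
be computed by enumeration. The following Python sketch does this for a
single-output component given as a function of a tuple of input bits; the
function names `ctf` and `otf` and the gate helpers are illustrative assumptions.

```python
from itertools import product


def ctf(gate, n):
    """Controllability transfer factor: 1 - |N(0) - N(1)| / 2^n."""
    outputs = [gate(bits) for bits in product((0, 1), repeat=n)]
    n_one = sum(outputs)
    n_zero = len(outputs) - n_one
    return 1 - abs(n_zero - n_one) / 2 ** n


def otf(gate, n):
    """Observability transfer factor: average over the inputs of NS_i / 2^n."""
    total = 0.0
    for i in range(n):
        ns_i = 0  # input values for which flipping x_i changes the output
        for bits in product((0, 1), repeat=n):
            flipped = bits[:i] + (1 - bits[i],) + bits[i + 1:]
            if gate(bits) != gate(flipped):
                ns_i += 1
        total += ns_i / 2 ** n
    return total / n


def nand(bits):
    return 0 if all(bits) else 1


def xor(bits):
    return sum(bits) % 2
```

With these helpers, `ctf(nand, 3)` evaluates to 0.25, which agrees with the value
$1/2^{n-1}$ for $n = 3$, and `ctf(xor, 2)` evaluates to 1.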
As discussed before, the controllability transfer factor (CTF) is used to account
for the potential diminishing of control information as it is propagated through
the circuit. The CTF of a component represents the ability to control the output
of the component by applying input values. The observability transfer factor
of the component represents the ease of propagating a fault value through the
component. It also measures the probability that a faulty value at any input of
the component will propagate to its output.
So to forward a faulty value to an output, we can use the observability measure
to get the "most potential" logic gate, which makes the fault propagate to a
primary output as early as possible. In the backward propagation of supporting
values, the controllability measure is used to obtain the "most potential" primary
input, which makes the test pattern generation terminate as early as possible.
From the discussion of testability, we can find that it is suitable to select a logic
gate which has the greatest observability in the $D$ frontier as the element with the
"most potential", and to select a primary input located on the most controllable
path from the selected logic gate. From a given logic gate, we always select a logic
gate which has the greatest controllability among its input gates. All these gates
form a path. This path will terminate at a primary input. This path is called the
most controllable path.
From another point of view, the selection of primary inputs can also depend
on the worst controllable path. Similar to the most controllable path, the worst
controllable path begins from a logic gate and ends at a primary input. Each gate
on the path is chosen because it has the least controllability among all logic gates
which drive a gate which is already in the path.
The worst controllable path is the complement of the most controllable
path. Since there is no way to guarantee the most controllable path is really
the best or the quickest path to generate a test or prove a redundant fault, the
complementary method may be better for some situations. It may rule out
some fruitless search space quickly. This quickness can also accelerate the test
pattern generation. The selection of this least controllability becomes the
complementary heuristics. Later, these complementary heuristics will be used in our
autonomous architecture to generate test patterns.
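Both paths can be traced with the same loop, differing only in whether the driver
with the greatest or the least controllability is chosen. The Python sketch below
assumes a dictionary `drivers` from each gate to its input gates (primary inputs
are absent or map to an empty list) and a dictionary `cv` of controllability
values; `controllable_path` is a hypothetical name.

```python
def controllable_path(start, drivers, cv, most=True):
    """Trace a path from `start` back to a primary input.

    With most=True the driver with the greatest controllability is chosen at
    every step (most controllable path); with most=False the least is chosen
    (worst controllable path, the complementary heuristic).
    """
    pick = max if most else min
    path = [start]
    while drivers.get(path[-1]):
        path.append(pick(drivers[path[-1]], key=lambda gate: cv[gate]))
    return path
```

The last element of the returned list is the primary input selected by the
heuristic.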
6.5 Expansion of Trial Test Patterns
To generate a test pattern for a given fault, first all primary inputs are assigned
the unknown value X, which means that the value may be 0 or 1. This pattern
works as the first trial test pattern. A recursive method can be used to define a
trial test pattern:
Suppose a circuit has $N$ primary inputs $I_1, I_2, \ldots, I_N$.
1. $X_1 X_2 \ldots X_N$ is a trial test pattern, where $X_i$ means that the
   primary input $I_i$ has the unknown value X.
2. Suppose $a_1 a_2 \ldots a_{j-1} X_j a_{j+1} \ldots a_N$ is a trial test
   pattern, where $a_i$ means that the primary input $I_i$ has a value $a$ and
   $a \in \{0, 1, X\}$. After the simulation and checking algorithms, it is found
   that this pattern is a potential test pattern. Then, we can say that
   $a_1 a_2 \ldots a_{j-1} 0_j a_{j+1} \ldots a_N$
   and
   $a_1 a_2 \ldots a_{j-1} 1_j a_{j+1} \ldots a_N$
   are two trial test patterns.
A potential test pattern $a_1 a_2 \ldots a_N$ may contain several X values. The
task of our expansion algorithm is to select one and assign it 0 and 1, so that two
trial test patterns are generated.
1. Set a temporary set empty;
2. If the logic gate which has a stuck-at fault has the value $D$ or $\bar{D}$, put
   it into the set; otherwise, it is X according to the conclusion of the checking
   algorithm. We take this gate as the most potential forward node MPFN.
3. While the temporary set is not empty, Do
   (a) Get one logic gate from the temporary set
   (b) For all of the logic gates driven by it,
       i. If the logic gate outputs $D$ or $\bar{D}$, put it into the set if it has
          not been there before.
       ii. If the logic gate outputs 0 or 1, skip.
       iii. If the logic gate outputs X, compare it with the gate labeled by
            MPFN.
            A. If it is more suitable according to the observability value, change
               the MPFN flag to this gate.
            B. Otherwise, keep the MPFN flag unchanged.
4. Beginning from the gate labeled MPFN, DO
   (a) If the gate is a primary input, this input is assigned 0 and 1 separately.
       Therefore, two trial test patterns are formed.
   (b) If it is not a primary input, select the most suitable gate among all the
       gates which drive it according to their controllability value. Goto step 4.(a).
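The expansion steps can be sketched in Python. The sketch is an interpretation
under stated assumptions: `values` holds the five-valued simulation result (with
$\bar{D}$ spelled `"D'"`), `ov` and `cv` are observability and controllability
dictionaries, "more suitable" is read as "larger observability", and the backward
trace of step 4 is restricted to drivers whose value is still X so that it ends
at an unassigned primary input. The function name `expand` is hypothetical.

```python
def expand(pattern, values, driven, drivers, ov, cv, fault_site):
    """Split a potential test pattern into two trial test patterns."""
    # Step 2: the fault site is either on the D path or is itself the MPFN.
    mpfn = fault_site if values[fault_site] == "X" else None
    pending = [] if mpfn is not None else [fault_site]
    seen = set(pending)

    # Step 3: walk the D / D' gates and remember the most observable X gate.
    while pending:
        gate = pending.pop()
        for nxt in driven.get(gate, []):
            value = values[nxt]
            if value in ("D", "D'"):
                if nxt not in seen:
                    seen.add(nxt)
                    pending.append(nxt)
            elif value == "X":
                if mpfn is None or ov[nxt] > ov[mpfn]:
                    mpfn = nxt
    if mpfn is None:
        return []

    # Step 4: follow the most controllable X-valued driver down to a primary input.
    gate = mpfn
    while drivers.get(gate):
        candidates = [g for g in drivers[gate] if values[g] == "X"]
        gate = max(candidates, key=cv.get)
    return [dict(pattern, **{gate: "0"}), dict(pattern, **{gate: "1"})]
```

Replacing `max` with `min` in step 4 gives the complementary heuristic based on
the worst controllable path.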
It is not difficult to show that the time complexity of this algorithm is $O(m)$ if
the maximum fanin and the maximum fanout are considered as constants. Step
3 may be executed $m$ times. Step 3.(b) may be run $M_{fanout}$ times. Steps 4.(a)
and 4.(b) may be executed $M_{fanin}$ times as well. Therefore, the time complexity
of this algorithm is $O(m)$.
6.6 Detection of Redundant Faults
If a fault is redundant, this can be detected after the whole search space is
searched. In a multiprocessor system, a fault is proven redundant if all processors
are idle in the same clock phase. If in any clock phase, some processors are idle
and some are busy, then some subsearch space is being searched. Therefore, it is
possible that there are some test patterns. If all processors are idle during the
same clock phase, each processor has no potential test pattern in its local memory,
nor in its input ports; there are no potential test patterns to be generated.
Therefore, all processors are idle during the next clock phase. This implies that
the full space has been searched.
To detect whether all processors are idle during one clock phase, a special
processor is designated. When this processor is idle, it sends an idle detection
signal to its children. The idle detection signal contains the time when it is
sent and a flag space for the processors. Suppose there are $n$ processors; the
flag space contains $n$ flags. Each processor corresponds to one flag in the flag
space.
When a processor receives an idle detection signal, it checks whether it has
been idle since the time specified in the idle detection signal. If it is idle, it sets
the flag in the flag space. If it is not idle, it resets the flag. Then it sends this idle
detection signal to its children.
After a period of time, this idle detection signal is received by the designated
processor. If all the flags in the signal are set, it knows that all processors have
been idle since that time, and consequently the fault is found redundant. This is
the protocol we use to prove the redundancy of a fault.
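The protocol can be illustrated with a small sequential simulation. The Python
sketch below makes several simplifying assumptions that are not part of the
thesis: the signal visits the processors along a given `route` that ends back at
the designated processor, each processor records the last clock phase in which it
was busy, and "idle since time t" is read as "last busy phase earlier than t".
The name `fault_is_redundant` is hypothetical.

```python
def fault_is_redundant(sent_time, last_busy, route):
    """Circulate an idle detection signal and report whether every flag is set.

    sent_time: clock phase at which the designated processor sent the signal.
    last_busy: processor -> last clock phase in which it was busy.
    route: order in which the signal visits the processors (assumed).
    """
    flags = {proc: False for proc in last_busy}      # the flag space, one per processor
    for proc in route:
        # Set the flag if the processor has been idle since sent_time, else reset it.
        flags[proc] = last_busy[proc] < sent_time
    # Back at the designated processor: redundant only if all flags are set.
    return all(flags.values())
```

A processor that the signal never reaches keeps a cleared flag, so the sketch errs
on the side of not declaring redundancy.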
This chapter discussed all of the algorithms relating to the simulation in detail.
These algorithms have been used in the code for the simulations described in the
later chapters. They lay the foundation for the methods described in the following
chapters.
Part III
ATPG and 4 Connected
Architecture
Chapter 7
Four Connected Topology
This chapter defines the 4 connected structure, and discusses some of its
characteristics.
7.1 4 Connected Structure and Examples
A 4 connected structure is a graph. If a node is denoted by $v$, its four ports are
represented by $v_0$, $v_1$, $v_2$, and $v_3$. $v_0$ and $v_1$ are two input ports
and $v_2$ and $v_3$ are two output ports. Each input port is fed from one output
port of a node, and each output port is connected to only one input port of a node.
A 4 connected structure can be defined formally. In the following definition, we
use $V$ to represent the node set, $V'$ the port set, and $E$ the connection set. If
$v \in V$, $v_i$ denotes one port of the node $v$. Sometimes, there are several
nodes, for example, $x$, $y$, $z$. In order to distinguish their ports, we use
$x_{i_x}$, $y_{i_y}$, and $z_{i_z}$ to denote one of $x$'s ports, one of $y$'s, and
one of $z$'s.
Definition Given a finite node set $V$, a 4 connected structure (4CS) is a directed
graph derived from $V$, $G = (V', E)$, where $V'$ and $E$ are derived from the
finite node set $V$.
$$V' = \{v_0, v_1, v_2, v_3 \mid v \in V\}$$
$E$ satisfies the following conditions:
1. If $v \in V$, then there exist four nodes $x, y, z, w \in V$ (some of them may
   be the same node). The ports of $v$ have the following relations:
   $(v_2, x_{i_x}) \in E$, $(v_3, y_{i_y}) \in E$, $(z_{i_z}, v_0) \in E$, and
   $(w_{i_w}, v_1) \in E$. Here $i_x, i_y \in \{0, 1\}$, and $i_z, i_w \in \{2, 3\}$.
2. If $(v_i, x_{i_x}) \in E$ and $(v_i, y_{i_y}) \in E$, then $x_{i_x} = y_{i_y}$.
3. If $(z_{i_z}, v_i) \in E$ and $(w_{i_w}, v_i) \in E$, then $z_{i_z} = w_{i_w}$.
Condition 1 says that each $v \in V$ has 2 inputs, $v_0$, $v_1$, and 2 outputs,
$v_2$, $v_3$. Each input port, $v_0$ or $v_1$, is fed by one and only one output
port, and each output port, $v_2$ or $v_3$, feeds one input port. Condition 2
guarantees that each output is connected to only one input. Condition 3 ensures
that each input is fed by only one output. The following are two examples of four
connected structures.
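The three conditions can be checked mechanically. In the Python sketch below a
port is represented as a pair (node, port index) and $E$ as a collection of
(output port, input port) pairs; this representation, the name `is_4cs`, and the
single-node connection set used in the usage note are assumptions made for
illustration and are not the sets given in the thesis.

```python
def is_4cs(nodes, edges):
    """Check whether `edges` makes `nodes` a 4 connected structure.

    Ports 0 and 1 are inputs, ports 2 and 3 are outputs. Every edge must run
    from an output port to an input port, every output port must feed exactly
    one input port, and every input port must be fed by exactly one output port.
    """
    outputs = {(v, i) for v in nodes for i in (2, 3)}
    inputs = {(v, i) for v in nodes for i in (0, 1)}
    sources = [src for src, _ in edges]
    targets = [dst for _, dst in edges]
    if not set(sources) <= outputs or not set(targets) <= inputs:
        return False
    # Each port appears exactly once on its side of the connection set.
    return sorted(sources) == sorted(outputs) and sorted(targets) == sorted(inputs)
```

For instance, a single node whose two output ports are wired back to its own two
input ports, `is_4cs({0}, [((0, 2), (0, 0)), ((0, 3), (0, 1))])`, satisfies all
three conditions.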
Example 1 A directed graph $G_1 = (V', E)$ is given by the sets
$V = \{0\}$
It is clear that $G_1$ is a 4 connected structure (4CS). Figure 7.1 is a diagram
which represents the graph $G_1$.
Figure 7.1: The Diagram for Graph in Example 1
Figure 7.2: The Diagram for Graph in Example 2
Example 2 A graph $G_2 = (V', E)$ is given by the sets
$V = \{0, 1, 2, 3\}$
It is not difficult to verify that $G_2$ is a 4CS. Later, we will find that $G_2$
forms a square array architecture. The diagram of $G_2$ is shown in Figure 7.2.
7.2 Characteristics of 4 Connected Topology
7.2.1 4CS naturally supports parallel ATPG
As discussed before, to generate a test pattern for a given fault, we first check
whether a trial test pattern can detect the fault. If it can detect the fault, a test
pattern is found. If it is shown that the test pattern cannot detect the fault,
it will lose its status as a potential test for the fault and be abandoned by our
ATPG parallel algorithms. If we cannot decide whether this trial test pattern
can or cannot detect the given fault, it becomes a potential test pattern. We
will expand this potential test pattern to two new trial test patterns by guessing
the best potential primary input and setting the input to 0 and 1, respectively, as
discussed in the previous chapter. Thus, during one virtual 2 phase clock period,
each processing element may accept one trial test pattern and generate two trial
test patterns. The two output ports of a processing element provide the throughways
for these patterns. In this model, these two potential test patterns can flow out
of the node without any delay or traffic jam. Since a traffic jam is avoided, no
situation arises where some processors cannot send potential test patterns out to
other idle processors because the bus which delivers these patterns is busy. A
network with this property is said to be saturation-free.
Here, we note another characteristic. Each processor has two input ports,
but in each cycle, a node can only process one input. Therefore, the other input
has to be stored somewhere. Every node has its own memory to store these
unprocessed inputs. This memory provides a self-balance characteristic. In our
algorithm, a trial test pattern may be aborted because it is impossible to generate a
test pattern after applying the checking algorithm. Therefore, there is no potential
test pattern sent out. This may make some of the PEs in some subtree idle. The
topology itself has several ways to make them busy again; it can
1. fetch trial test patterns from the memory of the PE
2. receive trial test patterns from PEs connected to its input ports.
These characteristics are very useful. All our designed architectures are 4
connected structures. Therefore, they have the properties of saturation-free and
self-balance. These properties will be shown in later chapters.
7.2.2 Isomorphic 4CS systems
In drawing a diagram for 4 connected multi-processor systems, we have complete
freedom to draw them in arbitrary positions or shapes. There are no restrictions
on the size of the vertices or on the length or even the shape of the edges. These
drawings, although constrained by the connectivity of the nodes, are very much
free-form. We are also free to choose an entirely different representation for the
graph. But this freedom presents us with some other difficulties. If a graph
is presented in different ways, how can we determine if the presentations really
represent the same graph, or the same topology in our 4 connected multi-processor
system?
Figure 7.3: Two Isomorphic Graphs
Mathematicians use the term isomorphism to mean the "fundamental equality"
of two objects or systems. That is, the objects really have the same mathematical
structure, and only nonessential features like object names might be
different. For graphs, "fundamentally equal" means the graphs have essentially
the same adjacencies and nonadjacencies. To formalize this concept further, we
use the following definition to define when two graphs $G_1$ and $G_2$ are isomorphic:
Definition Two graphs $G_1 = (V_1, E_1)$ and $G_2 = (V_2, E_2)$ are isomorphic
when there is a bijection $\alpha : V_1 \to V_2$, such that
$(\alpha(x), \alpha(y)) \in E_2$ if and only if $(x, y) \in E_1$. The bijection
$\alpha$ is said to be an isomorphism.
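The definition can be applied directly by trying every bijection between the two
vertex sets. The Python sketch below does exactly that; it is exponential in the
number of vertices and is meant only to make the definition concrete for small
graphs. The name `find_isomorphism` is hypothetical.

```python
from itertools import permutations


def find_isomorphism(v1, e1, v2, e2):
    """Return a bijection alpha from v1 to v2 that preserves edges, or None.

    Edges are ordered pairs. Because alpha is a bijection and both edge sets
    have the same size, mapping every edge of e1 into e2 is equivalent to the
    "if and only if" condition of the definition.
    """
    v1, v2 = list(v1), list(v2)
    e1, e2 = set(e1), set(e2)
    if len(v1) != len(v2) or len(e1) != len(e2):
        return None
    for image in permutations(v2):
        alpha = dict(zip(v1, image))
        if all((alpha[x], alpha[y]) in e2 for x, y in e1):
            return alpha
    return None
```

A directed triangle on the vertices a, b, c is isomorphic to a directed triangle
on 1, 2, 3, but not to the graph with edges (1, 2), (2, 3), (1, 3), because no
relabeling turns a cycle into that acyclic graph.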
For example, consider the two graphs shown in Figure 7.3. An isomorphism
between $G_1$ and $G_2$ is determined by a function
$\alpha : V(G_1) \to V(G_2)$ that assigns to each of the five vertices of $G_1$ a
distinct vertex of $G_2$.
It is clear that $\alpha$ is a one-to-one and onto function. An isomorphism from
$G_2$ to $G_1$ is given by $\alpha^{-1}$, the inverse of $\alpha$.
Figure 7.4: A Non-planar 4-connected Graph
Unfortunately, the graph isomorphism problem, that is: given two graphs
$G_1 = (V_1, E_1)$ and $G_2 = (V_2, E_2)$, are $G_1$ and $G_2$ isomorphic, i.e., is
there a one-to-one and onto function $f : V_1 \to V_2$ such that
$\{u, v\} \in E_1$ if and only if $\{f(u), f(v)\} \in E_2$, is an open problem [27].
It means that, so far, no efficient algorithm is known to solve the graph
isomorphism problem. This problem remains open even if $G_1$ and $G_2$ are
restricted to regular graphs, bipartite graphs, line graphs, comparability graphs,
chordal graphs, or undirected path graphs [27]. However, it is solvable in
polynomial time for planar graphs [29]. Here it is worth mentioning that NOT all 4
connected systems have a planar structure. For example, Figure 7.4 shows an array
structured 4 connected multi-processor system. A subgraph of this structure can be
drawn as a subdivision of a Kuratowski graph ($K_5$ or $K_{3,3}$), as shown in
Figure 7.5. According to Kuratowski's Theorem [28], a graph $G$ is planar if and
only if it contains no subgraph that is a subdivision of $K_5$ or $K_{3,3}$.
Therefore, this array structured 4 connected multi-processor is not planar.
Kuratowski's theorem also implies that a planar graph and a nonplanar graph are
not isomorphic.
Figure 7.5: A Subdivision of a Kuratowski Graph
Therefore, so far, we have no efficient algorithm to detect whether two 4-connected
multi-processor systems are isomorphic or not. When we design a parallel
system, we should pay attention to this problem so as to avoid designing isomorphic
configurations.
The concept of a 4-connected structure and some characteristics have been discussed.
Their properties will be helpful in the design of special multi-processor
systems to solve the automatic test pattern generation problem. From the next
chapter, we will begin to discuss several special structures designed for automatic
test pattern generation.
Chapter 8
ATPG Using MCBTA
In this chapter, we first introduce the Modified Complete Binary Tree Architecture
and its parallel algorithm. The data from experiments with this architecture,
showing its speedup, are presented. After that, an autonomous MCBTA and
the results of some experiments with it are also discussed.
8.1 MCBTA Architecture and Parallel Algorithm
8.1.1 Architecture and Algorithm
CBTA and Parallel Algorithm
In a complete binary tree with height $k$, every internal node has exactly two
children, a left subtree and a right subtree. The distance from the root node to
any leaf node is $k$. Figure 8.1 shows a 3 complete binary tree, denoted 3 CBT.
We construct a processing architecture, called the complete binary tree architecture
(CBTA). Suppose every node in a $k$ CBT is a processing element (PE), which
is also often called a processor, and every edge is a connection between two processors,
from one output line to the input of another. Every PE has two output lines and
one input line, which are denoted as $O_0$, $O_1$ and $I$, respectively. All PEs work
synchronously. Data from each PE are sent to both its left and right subtrees.
Parallel algorithm 1 can generate all possible $k$ bit values in a $k$ CBTA within $k$
steps.
Figure 8.1: Complete Binary Tree Architecture
Parallel Algorithm 1:
set root PE store vector X1 X2 ... Xk;
set all other non_root PEs to contain no vectors;
EVERY PE DOES SIMULTANEOUSLY
EVERY STEP DO
    input vector = vector from I;
    if ( input vector is non_data ) do nothing in this step;
    /* otherwise input vector has the form a1 ... ai X(i+1) ... Xk */
    if ( i == k ) a1 ... ak is one possible value;
    else
        send a1 ... ai 0 X(i+2) ... Xk to PE connected to O0;
        send a1 ... ai 1 X(i+2) ... Xk to PE connected to O1;
It is obvious that this parallel algorithm, which runs on a $k$ CBTA, can generate
all possible $k$ bit values within $k$ steps since each step determines one bit.
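The behaviour of parallel algorithm 1 can be reproduced with a short sequential simulation. The Python sketch below is an illustration under stated assumptions, not the thesis implementation: the function name `run_cbta` is hypothetical, PEs are numbered heap-style (PE i feeds PEs 2i+1 and 2i+2), undetermined inputs are written as "X", and one pass of the `while` loop stands for one synchronous step of all PEs.

```python
def run_cbta(k):
    """Simulate parallel algorithm 1 on a k CBTA; return (values, sending steps)."""
    inbox = {0: ["X"] * k}          # the root PE starts with X1 ... Xk
    results, steps = [], 0
    while inbox:
        outbox = {}
        for node, vec in inbox.items():     # every PE acts in the same step
            if "X" not in vec:              # i == k: all bits are decided
                results.append("".join(vec))
                continue
            i = vec.index("X")              # first undetermined position
            outbox[2 * node + 1] = vec[:i] + ["0"] + vec[i + 1:]  # via O0
            outbox[2 * node + 2] = vec[:i] + ["1"] + vec[i + 1:]  # via O1
        if outbox:
            steps += 1                      # count only steps that send vectors
        inbox = outbox
    return results, steps

values, steps = run_cbta(3)
```

With `k = 3` the simulation uses the 15 PEs of a 3 CBTA, the eight leaf PEs each end up holding one distinct 3 bit value, and three sending steps are counted, one per decided bit.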
Modified CBT (MCBTA) and Parallel Algorithm
It is impractical for a circuit with many inputs, say $m$ inputs, to use an $m$ CBTA to
generate all possible values since $2^{m+1} - 1$ PEs have to be used. It is necessary to
use a fixed height $k$ CBTA to generate all possible values for $m$ bits ($m > k$). We
modify CBTA to get a modified CBTA (MCBTA) with these properties. First,
every PE is expanded to have a local memory and two inputs, $I_0$ and $I_1$. Second,
the two output lines of every leaf processor are connected to one internal processor and
itself; we designate such output lines as feedback lines or feedback connections.
For example, leaf node 11 in Figure 8.2 is connected to internal node 2 and itself.
In a later section, an algorithm is discussed to generate such connected MCBTA
structures. Figure 8.2 shows the MCBTA for the CBTA shown in Figure 8.1. This
network is 4-connected, and is obviously planar, making it an excellent candidate
for VLSI layout. Figure 8.3 is its layout using the H-tree algorithm [6].
To generate test patterns using this architecture, parallel algorithm 1 must be
modified slightly; we designate this as parallel algorithm 2.
Figure 8.2: Modified Complete Binary Tree Architecture
Figure 8.3: Layout of MCBTA using H-tree
Parallel Algorithm 2:
set root PE store vector X1 X2 ... Xm;
set other non_root PEs empty;
EVERY PE DOES SIMULTANEOUSLY
EVERY STEP DO
    receive vectors from I0 and I1;
    put them into memory according to some strategy¹;
    input vector = get one vector from its memory;
    if ( input vector is non_data ) do nothing in this step;
    /* otherwise input vector has the form a1 ... ai X(i+1) ... Xm */
    if ( i == m ) a1 ... am is one possible value;
    else
        send a1 ... ai 0 X(i+2) ... Xm to the processor connected to O0;
        send a1 ... ai 1 X(i+2) ... Xm to the processor connected to O1;
One of the drawbacks of using the MCBTA configuration is that the output
can be a bottleneck for the whole system. That is, when several PEs generate
their possible values at the same step, how can these data be output? In general,
one PE is specified to take charge of communicating with the outside. Therefore,
all results have to be sent to this PE. Then, this PE will send them outside step
by step. This PE forms a bottleneck of the system. Fortunately, for automatic
test pattern generation, this is not the case since one suitable pattern is sufficient
for ATPG.
¹ For every PE, some strategy has to be adopted to store data. Since there are two input
lines, $I_0$ and $I_1$, two inputs may possibly be fed at the same time. Since at each step every
PE can only process one input vector, one vector has to be stored in memory to be processed
later. The strategy used to store these vectors decides the size of the memory. If a queue
structure is used, MCBTA is similar to using a breadth-first algorithm to generate all possible
values. Consequently, a large (possibly huge) amount of memory is needed for every PE to
store potential values (test vectors with undetermined values, X). If a stack structure is used,
MCBTA is similar to using the depth-first algorithm, and memory can be greatly saved. From
this consideration, it is most preferable to adopt the stack structure.
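The memory argument for the stack strategy can be illustrated with a small Python sketch. It rests on simplifying assumptions that are not in the thesis: a single PE expands every partial vector itself, each pending vector is represented only by its number of decided bits, and the names `peak_pending`, `m` and `use_stack` are hypothetical.

```python
from collections import deque

def peak_pending(m, use_stack):
    """Largest number of stored partial vectors while enumerating m-bit values."""
    pending = deque([0])                 # one vector with 0 decided bits
    peak = 1
    while pending:
        # Stack: take the newest vector (depth-first).
        # Queue: take the oldest vector (breadth-first).
        decided = pending.pop() if use_stack else pending.popleft()
        if decided < m:                  # expand into the 0-branch and the 1-branch
            pending.append(decided + 1)
            pending.append(decided + 1)
        peak = max(peak, len(pending))
    return peak

queue_peak = peak_pending(10, use_stack=False)
stack_peak = peak_pending(10, use_stack=True)
```

In this model the queue holds up to $2^m$ vectors at once while the stack never holds more than $m + 1$, which is the behaviour the footnote describes.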
An Algorithm to Generate MCBTA Structure
To design an algorithm to generate the topology of a MCBTA, each node needs
an identifier, called a label. Each node is labeled based on the following rules:
1. Each node is located on a special layer $l$ which is defined as the distance
from the root node to the node.
2. The root node is labeled as 0.
3. The left-most node on layer $l$ is labeled as $i + 1$ if the right-most node on
layer $l - 1$ has label $i$.
4. Each node except the left-most node on layer $l$ is labeled as $i + 1$ if its left
node has label $i$.
According to these rules and the characteristics of the modified complete binary tree,
a node $i$ located at the $j$th position of layer $l$ has the relationship
$i = 2^l - 1 + j$ $(j = 0, 1, \ldots, 2^l - 1)$
In order to connect these nodes, the following rules can be used:
1. If node $i = 2^l - 1 + j$ is not a leaf node, its two outputs are connected to
node $k$ and node $k + 1$, where $k = 2^{l+1} - 1 + 2j$.
2. The left-most leaf node and the right-most leaf node have labels $2^{L-1} - 1$ and
$2^L - 2$, where $L$ is the number of layers in the MCBTA. Both of them connect
one of their output ports to one of their own input ports, and connect their
other output port to the root node.
3. For any leaf node which is neither the left-most nor the right-most leaf
node, one of its output ports is connected to one of its own input ports.
Its other output port is connected to one node according to the rule: for
$l = 1, 2, \ldots, L - 1$ and $j = 0, 2, 4, \ldots, 2^l - 2$, node $2^l - 1 + j$ is fed by leaf
node $2^{L-1} - 1 + 2^{L-l} - 1 + (j - 1) 2^{L-l-1}$, and node $2^l - 1 + j + 1$ is fed by leaf node
$2^{L-1} - 1 + 2^{L-l} - 1 + (j - 1) 2^{L-l-1} + 1$.
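The connection rules can be turned into a short generator. The Python sketch below is a hedged reading of the rules, not the author's program: the function name `mcbta_edges` is hypothetical, and the leaf feedback formula is an assumption, chosen so that it agrees with the labeling rule and with the statement that leaf node 11 feeds internal node 2 in the 15-processor MCBTA. Layers $l = 1, \ldots, L - 2$ are used for the feedback pairs because the leaf layer only yields the self-connections, which are added separately.

```python
def mcbta_edges(L):
    """Directed connections (source, destination) of an MCBTA with L >= 2 layers."""
    first_leaf, last_leaf = 2 ** (L - 1) - 1, 2 ** L - 2
    edges = []
    # Rule 1: a non-leaf node i = 2^l - 1 + j feeds nodes k and k + 1.
    for l in range(L - 1):
        for j in range(2 ** l):
            i = 2 ** l - 1 + j
            k = 2 ** (l + 1) - 1 + 2 * j
            edges += [(i, k), (i, k + 1)]
    # Rules 2 and 3: every leaf feeds one of its own input ports.
    for leaf in range(first_leaf, last_leaf + 1):
        edges.append((leaf, leaf))
    # Rule 2: the left-most and right-most leaves feed the root.
    edges += [(first_leaf, 0), (last_leaf, 0)]
    # Rule 3 (assumed formula): the other leaves feed internal nodes in pairs.
    for l in range(1, L - 1):
        for j in range(0, 2 ** l, 2):
            x = first_leaf + 2 ** (L - l) - 1 + (j - 1) * 2 ** (L - l - 1)
            edges += [(x, 2 ** l - 1 + j), (x + 1, 2 ** l + j)]
    return edges

edges_15 = mcbta_edges(4)   # the 15-processor case of Figure 8.2
```

Under this reading every PE has exactly two incoming and two outgoing lines, which matches the description of the MCBTA as a 4-connected network.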
ATPG Algorithm using MCBTA
In this section, MCBTA is used to generate a test pattern. We modify algorithm
2 to parallel algorithm 3. The time complexity of each step depends on the
algorithm for circuit simulation, the algorithm for checking whether the trial test
pattern can detect a given fault, and the algorithm for expanding a potential test
pattern. As discussed in chapter 6, all of these algorithms have time complexity
of $O(N)$, where $N$ is the number of gates in a circuit. Hence, the time complexity
of this parallel algorithm in each step is $O(N)$.
Parallel Algorithm 3:
set root PE store vector X1 X2 ... Xm with undecided-flag;
set other non_root PEs empty;
EVERY PE DOES SIMULTANEOUSLY
EVERY STEP DO
    receive vectors from I0 and I1;
    if ( any of them has a detected-flag )
        // assume this vector has the form a1 ... ai X(i+1) ... Xm
        set detected-flag; // the test pattern is already found.
        send a1 ... ai X(i+1) ... Xm to processors connected by O0 and O1;
    put them into memory according to some strategy²;
    input vector = get one vector from its memory;
    if ( input vector is non_data ) do nothing in this step; /* idle status */
    /* otherwise input vector has the form a1 ... ai X(i+1) ... Xm */
    simulate the faulty circuit with the input vector³;
    check whether the fault can be detected by a1 ... ai X(i+1) ... Xm⁴;
    switch ( result of checking )
        case DETECTED:
            set detected-flag;
            send a1 ... ai X(i+1) ... Xm to processors connected by O0 and O1;
            break;
        case UNDECIDABLE:
            if ( i == m ) send nothing to O0 and O1⁵;
            else
                use heuristics to select the primary input
                with the most potential ability⁶;
                /* suppose X(i+1) is this primary input */
                send a1 ... ai 0 X(i+2) ... Xm to the processor connected by O0;
                send a1 ... ai 1 X(i+2) ... Xm to the processor connected by O1;
            break;
        case CAN_NOT_DETECTED:
            send nothing to O0 and O1;
            break;
² Refer to the explanation in the parallel algorithm 2.
³ Refer to the simulation algorithm in chapter 6.
⁴ Refer to the section about checking trial test patterns in chapter 6.
⁵ In general, there should not be the UNDECIDABLE case since all primary inputs have a
specified value 0 or 1. For program completeness, it is still considered a possibility.
⁶ Refer to the expanding algorithm in chapter 6.
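The three-way decision made in each step can be seen on a toy example. The Python sketch below is an illustration under explicit assumptions and not the thesis implementation: the circuit y = (a AND b) OR c, the fault "a stuck-at-0", the names `and3`, `or3`, `circuit`, `check` and `generate_test` are all hypothetical; the check is a plain three-valued simulation standing in for the chapter 6 algorithms; the heuristic is replaced by "take the first undecided input"; and a single stack replaces the network of PEs.

```python
X = None  # undetermined input value

def and3(a, b):
    if a == 0 or b == 0:
        return 0
    return X if (a is X or b is X) else 1

def or3(a, b):
    if a == 1 or b == 1:
        return 1
    return X if (a is X or b is X) else 0

def circuit(vec, faulty):
    """Toy circuit y = (a AND b) OR c; the illustrative fault is a stuck-at-0."""
    a, b, c = vec
    if faulty:
        a = 0
    return or3(and3(a, b), c)

def check(vec):
    good, bad = circuit(vec, False), circuit(vec, True)
    if good is X or bad is X:
        return "UNDECIDABLE"
    return "DETECTED" if good != bad else "CAN_NOT_DETECTED"

def generate_test(m=3):
    stack = [(X,) * m]                      # root vector X1 ... Xm
    while stack:
        vec = stack.pop()
        result = check(vec)
        if result == "DETECTED":
            return vec
        if result == "UNDECIDABLE" and X in vec:
            i = vec.index(X)                # stand-in for the heuristic choice
            stack.append(vec[:i] + (1,) + vec[i + 1:])
            stack.append(vec[:i] + (0,) + vec[i + 1:])
        # CAN_NOT_DETECTED: the vector is dropped, nothing is sent on
    return None                             # search space exhausted

pattern = generate_test()
```

For this toy fault the only detecting pattern is a = 1, b = 1, c = 0. A return value of `None` would correspond to the case in which the whole search space is exhausted without a test, that is, a redundant fault.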
In order to output the result of ATPG, one output port should be defined for
the whole architecture, which can report whether the MCBTA has generated a
test pattern, whether it is generating a test pattern, or whether it has found that
the fault is undetectable. We accomplish this by expanding the function of the
root node, denoted by I/O PE. The I/O PE has the following functions; it:
1. receives commands from outside the network.
2. sends commands to internal PEs.
3. outputs computing results to outside.
These duties make it a special PE.
MCBTA is a 4-connected structure. Following from the discussion in chapter
7, this structure has the self-balance property and is saturation-free.
We can also claim that MCBTA has $O(\log^2 n)$ output time delay.
Suppose the number of processors is $n$, then the height of the MCBTA is $\log n$.
If one PE finds that the fault can be detected, it will send the result to the root
node, which can then output the result. In the worst case, $O(\log^2 n)$ steps are
needed to propagate this test pattern. This can be shown as follows:
Before the proof, we introduce the concept of level. Every processor can be
labeled with a value which is called its level. This value is defined by the following
rules:
1. The root processor has level 0.
2. A processor has level $l + 1$ if its parent processor in the complete binary
tree has level $l$.
The longest path from a processor to the root processor determines the output
time delay. MCBTA itself has one important characteristic: every path from an
inner processor to the root processor contains one of two feedback connecting
lines:
1. the left-most leaf processor → the root processor
2. the right-most leaf processor → the root processor
The longest path in the architecture begins at a leaf processor. Each time to reach
a lower level inner processor, $h$ steps have to be taken, where $h$ is the height of
the subtree rooted at this lower level inner processor. For example, in Figure
8.2, one of the longest paths is
$PE_9 \to PE_4 \to PE_1 \to PE_0$
The length of this path is
$6 = 1 + 2 + 3$
from $PE_9$ to $PE_4$, 1 step; from $PE_4$ to $PE_1$, 2 steps; from $PE_1$ to $PE_0$, 3 steps.
In general, the length of the longest path in a MCBTA from one processor to
the root processor has
$1 + 2 + \cdots + k = k(k+1)/2$
edges, where $k = \log n$ is the height of the MCBTA. Each edge causes one time
delay. Here, it is worth mentioning that
parallel algorithm 3 first checks whether the patterns in the two input ports contain a test
pattern. If there is a test pattern, it will be sent directly to the two output ports $O_0$
and $O_1$. This guarantees that each edge causes only one time delay and the test
pattern can reach the I/O PE as quickly as possible. Therefore, MCBTA has
$O(\log^2 n)$ output time delay.
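The delay bound can be checked numerically on small instances. The Python sketch below is an illustration built on assumptions: the names `mcbta_successors` and `output_delay` are hypothetical, the feedback wiring is the same assumed reading of the connection rules in which leaf 11 feeds node 2 of the 15-processor MCBTA, and the delay is measured as the largest number of steps any PE needs to reach the root when a detected pattern is forwarded on both output lines.

```python
from collections import deque

def mcbta_successors(L):
    """Assumed MCBTA wiring for L >= 2 layers: PE label -> list of fed PEs."""
    first_leaf, last_leaf = 2 ** (L - 1) - 1, 2 ** L - 2
    succ = {i: [2 * i + 1, 2 * i + 2] for i in range(first_leaf)}
    for leaf in range(first_leaf, last_leaf + 1):
        succ[leaf] = [leaf]                      # self feedback line
    succ[first_leaf].append(0)                   # left-most leaf -> root
    succ[last_leaf].append(0)                    # right-most leaf -> root
    for l in range(1, L - 1):
        for j in range(0, 2 ** l, 2):
            x = first_leaf + 2 ** (L - l) - 1 + (j - 1) * 2 ** (L - l - 1)
            succ[x].append(2 ** l - 1 + j)
            succ[x + 1].append(2 ** l + j)
    return succ

def output_delay(L):
    """Worst-case number of steps for a pattern to reach the root PE."""
    succ = mcbta_successors(L)
    pred = {v: [] for v in succ}
    for u, targets in succ.items():
        for v in targets:
            pred[v].append(u)
    dist, queue = {0: 0}, deque([0])
    while queue:                                 # breadth-first search toward leaves
        v = queue.popleft()
        for u in pred[v]:
            if u not in dist:
                dist[u] = dist[v] + 1
                queue.append(u)
    return max(dist.values())

delays = {L - 1: output_delay(L) for L in range(2, 6)}   # height k -> delay
```

For heights 1 to 4 this wiring gives delays of 1, 3, 6 and 10 steps, that is, $k(k+1)/2$, in line with the $6 = 1 + 2 + 3$ example for the 15-processor case.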
8.1.2 Empirical Results and Analysis
To evaluate the performance of MCBTA, every processor uses a parallel algorithm
with the same heuristics as discussed before. At first, MCBTA contains only one
processor. The speed of this MCBTA forms a basis for speed comparison. As
more and more processors are put into MCBTA, the ratio of this basis speed and
the current MCBTA's speed determines the speedup of the current MCBTA.
MCBTA is a complete binary tree. The number of processors in MCBTA is
$2^{k+1} - 1$, where $k$ is the height of the tree. For $k$ = 0, 1, 2, 3, and 4, we can construct
5 MCBTA multi-processor systems.
Five very hard-to-detect faults in the circuit C432 were submitted to each
system. Empirical results were gained and are shown in Figure 8.4 and Figure
8.5.
Four of these very hard-to-detect faults are proved to be redundant by these
multi-processor systems⁷. Their speedup curves are shown in Figure 8.4.
One of the 5 very hard-to-detect faults was found to be detectable, and its
test pattern was generated. Figure 8.5 shows the speedup curve for the fault. It
is very interesting to note that there are super-linear speedups, that is, speedup
is greater than the number of processors, when the number of processors is 3, 15,
and 31, respectively.
⁷ The method to prove the redundancy was discussed in Chapter 6, section 6.6.
Figure 8.4: Speedup for 4 Redundant Faults in MCBTA (speedup versus number of processors)
Figure 8.5: Speedup for an Irredundant Hard-to-detect Fault in MCBTA (speedup versus number of processors)
8.2 Autonomous MCBTA Architecture
All processors in MCBTA use the same heuristics to guess the "best" undecided
input so as to find a test as early as possible. We call such a MCBTA a pure
MCBTA. Testability is one of the most widely used heuristics since it is considered
to be an inherent property of a circuit, and is determined entirely by its structure
[17]. This allows estimation of circuit testability before test generation. Because
of the approximate nature of the analysis, most testability analysis results have
poor accuracy.
If some complementary heuristics are also used by some processors, a mixed
heuristic MCBTA can be formed. It is quite possible that a test pattern can be
found much faster than in a pure MCBTA.
8.2.1 Architecture
If several processors form a pure MCBTA, one $I_0$, one $I_1$, one $O_0$, and one $O_1$ are
opened to outside. We call such a MCBTA a pure MCBTA module, or autonomous
MCBTA module. Figure 8.6 shows one pure MCBTA module which consists of 3
processors.
Figure 8.6: A Pure MCBTA
It is called an autonomous module because it has two characteristics. First, one
module has only one policy to select the most potential undecided input. Second,
there are cycles within the module. This property has the potential ability to find
a test pattern early, or to get a stop conclusion quickly, in case the selecting policy
is the most suitable one for a given fault.
Figure 8.7 is an autonomous MCBTA (AMCBTA). From the overview, it has
the same topology as a pure MCBTA. The difference is that it consists of autonomous
modules instead of processors and each module uses one of the two
heuristics as discussed in section 6.5.
8.2.2 A Parallel Algorithm
The parallel algorithm for AMCBTA is exactly the same as the parallel algorithm
for MCBTA discussed in the last section, parallel algorithm 3 in section 8.1.1.
The output time delay of AMCBTA is $O(\log^2 n)$. Suppose $n$ is the number of
processors in the system. Since every module contains three processors, we can
use a $k$ AMCBTA ($k = \log n$) to generate test patterns. If one PE finds that
the fault can be detected, it will send the result to the host module, which can
inform the outside. In the worst case, $O(k^2)$ steps are needed to propagate this
test pattern. The explanation is similar to that for MCBTA discussed previously.
Figure 8.7: Autonomous MCBTA
8.2.3 Empirical Results and Analysis
To evaluate the performance of an autonomous MCBTA, we use the same simulating
method as for MCBTA. At first, AMCBTA contains only one module.
The speed of MCBTA with only one processor is still the basis speed. As more
and more modules are put into AMCBTA, the ratio of the basis speed to the current
AMCBTA's speed determines the speedup of the current AMCBTA.
AMCBTA is a complete binary tree of modules. The number of processors in AMCBTA is
$3(2^{k+1} - 1)$, where $k$ is the height of the tree. For $k$ = 0, 1, 2, and 3, we can construct
4 AMCBTA multi-processor systems, which contain 3, 9, 21, and 45 processors,
respectively.
Five very hard-to-detect faults in the circuit C432 were submitted to each
system. Empirical results were gained and are shown in Figures 8.8, 8.9, and
8.10.
Figure 8.8: Speedup for 4 Redundant Faults in AMCBTA (speedup versus number of processors)
Again, four of these very hard-to-detect faults are proved to be redundant by
these multi-processor systems⁸. Their speedup curves are shown in Figure 8.8.
The anomalous phenomenon⁹ of the automatic test pattern generation problem
appears on only one fault.
One of the 5 very hard-to-detect faults is found to be detectable, and its test
pattern is generated. Figures 8.9 and 8.10 are the speedup curves for the fault. It is
amazing that there are super-linear speedups too, when the number of processors
is 9, 21, and 45, respectively. And they are even better than the results in Figure
8.5. Since complementary heuristics¹⁰ are used, they eliminate the fruitless search
space quickly. The redundancy is proved much more quickly than in MCBTA.
⁸ Refer to Chapter 6, section 6.6.
⁹ Refer to Chapter 5.
¹⁰ Refer to Chapter 6, section 6.4.
Figure 8.9: Speedup for an Irredundant Hard-to-detect Fault in AMCBTA (speedup versus number of processors)
Figure 8.10: Scaled Diagram for Figure 8.9
Figure 8.11: Experiment With 1 Module
Figure 8.12: Experiment With 7 Modules
Figure 8.13: Experiment With 15 Modules
Figures 8.11, 8.12 and 8.13 show the mapping between modules and heuristics
used in our experiments. These four notations are defined:
BB -- Best controllability and Best observability
BW -- Best controllability and Worst observability
WB -- Worst controllability and Best observability
WW -- Worst controllability and Worst observability
Each module is assigned one heuristic at random.
Comparing experimental results between MCBTA and AMCBTA, we can conclude
that AMCBTA has better linear speedup than MCBTA, and it also has
greater super-linear speedup than MCBTA has. Therefore, at least for this circuit,
AMCBTA is much more attractive than MCBTA.
Chapter 9
ATPG Using Square Array
This chapter introduces another system: the square array architecture. Experimental
results will also be presented for this architecture.
9.1 Square Array and Its Parallel Algorithm
9.1.1 Square Array Architecture
A square array system consists of $n^2$ processors, called an $n^2$ SQARRAY. In an $n^2$
SQARRAY, each row or column contains $n$ processors. Each processor has two
input ports ($I_0$ and $I_1$), and two output ports ($O_0$ and $O_1$).
Figure 9.1 shows the symbol for one processor. Every output $O_0$ is connected
to its right neighbor's input port $I_0$. The right-most processor will connect its $O_0$
to the $I_0$ of the left-most processor on the same row. Every output $O_1$ is connected
to the input port $I_1$ of its neighbor below. The lowest processor in a column will
connect its output $O_1$ to the input $I_1$ of the processor at the top of the column.
Figure 9.2 is a square array system, which consists of $3^2$ processors.
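The wiring just described can be written down directly. The Python sketch below is a minimal illustration; the function name `sqarray_edges` and the row-major numbering of the PEs (the PE in row r and column c gets index r * n + c) are assumptions made for the example.

```python
def sqarray_edges(n):
    """Directed connections (source, destination) of an n x n SQARRAY."""
    edges = []
    for r in range(n):
        for c in range(n):
            src = r * n + c
            right = r * n + (c + 1) % n        # O0 -> I0 of the right neighbor, wraps in the row
            below = ((r + 1) % n) * n + c      # O1 -> I1 of the neighbor below, wraps in the column
            edges += [(src, right), (src, below)]
    return edges

edges_9 = sqarray_edges(3)   # the 9-processor array of Figure 9.2
```

Each PE ends up with two outgoing and two incoming lines, and the wrap-around connections close every row and every column into a ring.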
Here, we may ask whether SQARRAY and MCBTA are isomorphic. We say
they are not. As we discussed in chapter 7 section 7.2.2, a SQARRAY is a non-planar
4-connected graph. It is clear that MCBTA is planar since it can be easily
floorplanned on a plane without any intersection. A non-planar graph and a planar
graph are not isomorphic¹.
Figure 9.1: The Symbol for a Processor in SQARRAY
Figure 9.2: SQARRAY with 9 Processors
9.1.2 Completeness of Square Array
Theorem. Given any integer $n$, an $n^2$ square array system can generate all $2^k$ possible
values, where $k$ is an integer.
Proof. From the topology of a square array system, if there is a binary
value $a_{k-1}, \ldots, a_{k-i}, X_{k-i-1}, \ldots, X_0$ at $PE_i$, then
$a_{k-1}, \ldots, a_{k-i}, 0, X_{k-i-2}, \ldots, X_0$ can be generated at the $PE_j$ reached
through a horizontal connecting line (row line). If the connecting line is a
vertical line, $a_{k-1}, \ldots, a_{k-i}, 1, X_{k-i-2}, \ldots, X_0$
can be generated at the $PE_j$.
For convenience, in the same row, we call the left-most processor the neighbor
of the right-most processor. In the same column, the top-most processor is
considered as the neighbor of the lowest processor. Each processor is therefore
connected directly to all of its nearest neighbors.
For any $a$, $0 \le a < 2^k$, $a$ can be represented by a $k$ bit binary number $a_{k-1}$,
$a_{k-2}, \ldots, a_0$. Then, we can find a path from $PE_0$ to $PE_i$ ($0 \le i < n^2$). We say
the path is $PE_0, PE_{a_{k-1}}, \ldots, PE_{a_0}$, where $PE_{a_{k-1}}$ is the PE to the right of $PE_0$
if $a_{k-1} = 0$. Otherwise, $PE_{a_{k-1}}$ is
the PE below $PE_0$ if $a_{k-1} = 1$. For $PE_{a_{k-j}}$ ($j = 2, \ldots, k$), if $a_{k-j} = 0$, $PE_{a_{k-j}}$ is
the PE to the right of $PE_{a_{k-j+1}}$; if $a_{k-j} = 1$, $PE_{a_{k-j}}$ is the PE below $PE_{a_{k-j+1}}$.
The last processor on this path is $PE_{a_0}$, which generates $a$.
This theorem guarantees that SQARRAY will find a test pattern for a given
fault if it exists.
¹ Refer to chapter 7, section 7.2.2.
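The path used in the proof can be traced in a few lines of Python. This is only a sketch under assumptions: the function name `pe_for_value` is hypothetical, PEs are numbered row by row starting from the I/O corner, bits are consumed from the most significant one, and a 0 bit is taken to mean "move to the right neighbor" while a 1 bit means "move to the neighbor below", both with wrap-around.

```python
def pe_for_value(a, k, n):
    """Index of the PE of an n x n SQARRAY at which the k-bit value a is completed."""
    r = c = 0                                  # start at PE 0, the top-left corner
    for j in range(1, k + 1):
        bit = (a >> (k - j)) & 1               # bit a_(k-j), most significant first
        if bit == 0:
            c = (c + 1) % n                    # horizontal line: right neighbor
        else:
            r = (r + 1) % n                    # vertical line: neighbor below
    return r * n + c

# Every 4-bit value is completed at some PE of a 3 x 3 array.
placement = {a: pe_for_value(a, k=4, n=3) for a in range(2 ** 4)}
```

Because the row and column indices wrap around, the walk never leaves the array, so every one of the $2^k$ values has a well-defined path even when $k$ is larger than $n$.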
9.1.3 A Parallel Algorithm
In this section, SQARRAY uses the same parallel algorithm as MCBTA and AMCBTA,
as discussed in chapter 8. The time complexity of this parallel algorithm
in one step is $O(N)$, since the checking algorithm, the simulation algorithm, and
the expanding algorithm all have $O(N)$ time complexity.
In SQARRAY, the output time delay is $O(\sqrt{n})$. Suppose $n$ processors are
used; we construct a $(\sqrt{n})^2$ SQARRAY. If one PE finds that the fault can be
detected, it will send the result to the host processor, which can tell the outside.
If any PE finds a test pattern, this pattern can be propagated to the left-most
processor within $\sqrt{n}$ steps since there is a path to connect all processors on the
same row. Similarly, this pattern can also reach the top-most processor within
$\sqrt{n}$ steps. Therefore, within $\sqrt{n} + \sqrt{n}$ time steps, the test pattern can arrive at
the corner of left and top, which is the processor in charge of communicating
with outside. Hence, we say that $O(\sqrt{n})$ steps are needed to propagate this test
pattern.
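The bound on the output time delay can be checked by a breadth-first search over the array. The Python sketch below is illustrative only: the function name `steps_to_corner` is hypothetical, `n` here denotes the side length of the array (so the array has `n * n` processors), and the host processor is assumed to sit at row 0, column 0.

```python
from collections import deque

def steps_to_corner(n):
    """Fewest steps from every PE (row, column) to the host PE at (0, 0)."""
    dist = {(0, 0): 0}
    queue = deque([(0, 0)])
    while queue:
        r, c = queue.popleft()
        # A PE receives from its left neighbor (on I0) and its upper neighbor (on I1),
        # both taken with wrap-around.
        for pred in ((r, (c - 1) % n), ((r - 1) % n, c)):
            if pred not in dist:
                dist[pred] = dist[(r, c)] + 1
                queue.append(pred)
    return dist

worst = {n: max(steps_to_corner(n).values()) for n in range(1, 8)}
```

For side lengths 1 to 7 the worst case is $2(n - 1)$ steps, which stays below the $\sqrt{N} + \sqrt{N}$ bound for an array of $N = n^2$ processors.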
9.1.4 Empirical Results and Analysis
To evaluate the performance of a square array system (SQARRAY), we first let
the system contain only one processor. The speed of this SQARRAY forms a
basis speed. As more and more processors are put into SQARRAY, the ratio of
basis speed and current SQARRAY's speed determines the speedup of the current
SQARRAY.
With square arrays of increasing size, up to 25 ($5^2$), 36 ($6^2$), and 49 ($7^2$)
processors, we got the speedup curves shown in Figure 9.3.
Again, five very hard-to-detect faults in the circuit C432 were submitted to
each system. Empirical results were gained and are shown in the following curves.
Again, four of these very hard-to-detect faults were proven to be redundant
by these multi-processor systems. Their speedup curves are shown in Figure 9.3.
They are quite close to linear speedup.
Figure 9.3: Speedup for 4 Redundant Faults in SQARRAY (speedup versus number of processors)
One of the 5 very hard-to-detect faults is found to be detectable, and its test
pattern was generated. Figure 9.4 is the speedup curve for the fault. There
are super-linear speedups at four of the array sizes, including 16 and 49 processors.
Compared with Figure 8.5 and Figure 8.9, Figure 9.4 is better than
the curve in Figure 8.5 but is inferior to the curve in Figure 8.9.
Figure 9.4: Speedup for an Irredundant Fault in SQARRAY (speedup versus number of processors)
Figure 9.5: Scaled Speedup for the Irredundant Fault in SQARRAY (speedup versus number of processors)
Figure 9.6: Speedup in Complementary SQARRAY (4 redundant faults) (speedup versus number of processors)
If we compare the results with MCBTA, we can find that SQARRAY has better
speedup than MCBTA, and it also has superior super-linear speedup relative to
MCBTA, at least.
For this example, if we use complementary heuristics for each processor, can
the performance be improved?
Curves in Figure 9.6 are obtained from the simulation.
These curves tell us that complementary heuristics can speed up the processing
for some faults, and they can also produce extreme super-linear speedup relative
to the use of a single heuristic.
Figure 9.7: Speedup in Complementary SQARRAY (an irredundant fault) (speedup versus number of processors)
Part IV
Conclusion and Discussion
A major barrier to the full exploitation of the capabilities offered by VLSI is
the problem of the increased cost of testing the complex devices immediately after
fabrication. The need to test devices results from imperfections in the fabrication
process producing a wide range of defects in the devices, for example, pin-holes
in the gate oxide, shorted or open interconnect lines (polysilicon, diffusion and
metal), contact hole defects, crystalline defects on the wafer, etc. There may also
be some design faults, such as a gate output having insufficient drive capability
for its output capacitance, which may not be identified by the simulator unless a
post layout simulation is performed; although simulation may have been used extensively,
many design faults may go undetected since simulation is an incomplete
process based on an abstract model.
Attempts to reduce the costs of testing have been made by developing more
sophisticated gate level test generation algorithms, by performing test generation
at higher levels of abstraction, by exploring parallel processing techniques, etc.
The results of this work still show that developments in the solution of the
testing problem do not keep up with the pace of the development of VLSI devices.
This implies that much more work should be done in this field.
One method is to study the types of architecture which can powerfully support
ATPG algorithms, or study what kind of connection among processors can most
efficiently speed up automatic test pattern generation.
In this report, we proposed three interconnection methods to speed up test
pattern generation. The experimental results show that for a redundant fault, the
square array structure has more linear speedup than MCBTA if the same heuristics
are used. For an irredundant fault, SQARRAY more likely reaches super-linear
speedup than MCBTA does. If autonomous MCBTA is used, even for redundant
faults, super-linear speedup can occur. For an irredundant fault, the speedup
reaches incredible values. For example, when the number of processors is 21, the
speedup is greater than 3500, a factor of about 170 above linear speedup. These
results are even better than SQARRAY, showing that autonomous methods are
more attractive than pure methods.
Of course, we have no way to guarantee super-linear speedup for every fault.
Complementary heuristics can often enhance super-linear speedup. In an architecture,
what kind of combination of heuristic information most likely reaches
super-linear speedup? This is one of the interesting problems which will be investigated
in the future. Is it possible to implement these parallel processing systems
on a single VLSI chip? If so, how? What are the bounds for area, time, and area
time squared? These are exciting problems to be researched. The key to these
problems is how to design a very elegant checking algorithm so as to implement
it on a small area, and how to solve the storage problem for circuit description
since a VLSI circuit contains so many logic gates. Once these problems are solved,
automatic test pattern generation will be much less expensive than nowadays. As
well, similar algorithms can be developed for many other NP-complete problems.




