Bit-driven logic: a style of digital logic for VLSI design by Gerpheide, George E.
BIT-DRIVEN LOGIC:
A STYLE OF DIGITAL LOGIC FOR VLSI DESIGN
George E. Gerpheide 
VLSI Research Group Memo No. 1 
22 April 1980
Department of Computer Science
University of Utah
Salt Lake City, Utah 84112
This research was supported in part by the National Science Foundation
Grant MCS78-04853.
1 Introduction of Bit-Driven Logic Concept
2 G-Nets - A Graph Model for Bit-Driven Logic
2.1 Definition of G-Nets
2.2 Using G-Nets for BDL
3 N-Bit Ripple-Carry Adder Demonstrating Pipeline Effect
4 Performance of RTL Versus BDL Operators
5 Throughput of Acyclic G-Nets
5.1 Basic Concepts
5.2 G-Net Module Characterization
5.3 Parallel Combination of G-Net Modules
5.4 Serial Composition of G-Nets
6 VLSI Implementation of G-Nets
6.1 Mapping G-Nets into Standard SLA Programs
6.2 The G-SLA
6.3 Mapping G-Nets into G-SLA Programs
6.4 G-SLA Program for an N-Bit Ripple-Carry Adder
7 Array-Organized Multiplier Module
7.1 Basic Operation
7.2 G-Net Description of Multiplier




This memo describes a new style of low-level digital logic design called
Bit-Driven Logic (BDL) which may prove attractive for the design of VLSI
chips. BDL is an application of speed-independent, data-flow ideas to a very
low level. It has the advantages of good locality, clockless operation, and
inherent pipelining leading to high throughput. The G-Net, a graph model
similar to the Petri Net, is presented to represent BDL circuits and the
throughput of acyclic G-Nets is investigated. The concepts of skew and
thickness, and the use of shims as well as drawings in Time Normal Form, are
presented to enable acyclic G-Nets to be designed which support maximum
throughput. The suitability of uniform-array concepts for VLSI
implementation, particularly the SLA, is shown by means of examples, the most
involved of which is an iterative array multiplier.

1 Introduction of Bit-Driven Logic Concept

This paper presents a new and rather unusual style of low-level digital
logic design called Bit-Driven Logic (BDL) which seems to offer some
significant advantages over the currently predominant style, Register-Transfer
Logic (RTL). In BDL, every operator, even one as simple as a single
conventional gate, is data-driven. That is, an operator is quiescent until
required data becomes valid at its inputs; then it removes the data from its
inputs and creates result data at its outputs. In contrast, the combinatorial
logic operators of RTL continuously produce results based on the input values
with no regard to data "validity". Propagation time assumptions are used to
design clock timing which attempts to ensure that only "valid" data is clocked
into registers.

Most conventional thinking in logic design segregates storage from logic,
as exemplified by the synchronous state machine and the register-ALU
organization of most computers. With BDL, in contrast, storage and logic are
highly integrated; storage is associated with each gate-level operator. The
result is inherent pipelining at an atomic level which gives BDL the potential
for very-high-throughput circuits.

BDL represents an extreme position in the data-driven (or data-flow)
philosophy which many researchers are currently applying to issues of computer
architecture and programming [Davis 78, Dennis 74, Keller 79, Misunas 77,
Misunas 78]. Most of this data-driven research has dealt with modules of
fairly high complexity, such as adders, multipliers or list-processors, and
usually the internal organization of these modules has employed the
conventional RTL style. BDL simply carries the data-driven philosophy to a
lower level.

It has usually been felt that applying data-driven ideas at as low a level
as BDL does is hopelessly costly in components and therefore of only
theoretical interest. The ready availability of numerous MSI integrated
circuits oriented toward RTL, coupled with the lack of even the simplest
integrated circuits expressly suited for BDL, has strongly encouraged such
thinking. However, now that we are in the position of thinking about the
design of an entire system on a VLSI chip, the component cost difference is no
longer clear. To be sure, the BDL implementation of a simple operator will
almost always require more silicon area than a similar RTL implementation, but
it will be shown later that (given the right types of computation problems)
BDL utilizes inherent pipelining with very fine granularity to achieve much
higher throughput than is possible with RTL of the same switching speed and
fan in/out. Thus it might turn out that a BDL system will actually require
less silicon area than an RTL system of equal performance (i.e., one that can
perform the same computation in the same amount of time).

Perhaps the most important feature of the BDL style of design is the high
degree of locality which is rather naturally achieved; a locality both of
design conceptualization and of space in the physical realization. Conceptual
locality is important to simplify the design process, and to make verification
of correctness easier. Spatial locality becomes very important for VLSI
design as smaller geometries and high speeds are attempted [Seitz 79]. Good
locality is closely tied to modularity and results from most data-flow
approaches. By carrying the data-flow philosophy to a lower level, and by
eliminating clocks and their attendant timing problems completely, BDL is
expected to achieve very good locality.

In addition, the BDL style of design appears to be very compatible with the
concept of uniform arrays such as the SLA [Patil 79], which should make VLSI
design easier in two important ways. The first is shared by any design style
using uniform arrays. The time-consuming and costly process of IC layout is
replaced by an array-programming process allowing the design of VLSI chips
independent of transistor-level device characteristics or fabrication methods.
Much like a higher-level software language, the SLA may lend portability to a
chip design.

The second advantage of the BDL-SLA combination, which would not be shared
by an RTL-SLA combination, is transferring circuit timing considerations from
the logic designer's job to the initial (one-time) design of the SLA internal
cell structure. A clocked RTL system requires consideration of operator delay
times in the design of the clock timing. Without such knowledge, i.e.,
without knowledge of the delays of the component SLA cells, an RTL system
cannot be guaranteed to be functionally determinate. A BDL system, on the
other hand, can be so guaranteed from structural considerations alone. Thus, a
BDL system is truly portable from one SLA fabrication technology to another
while an RTL system, with clock timing based on the cell delay
characteristics, is not.[1]

As an added benefit, the locality of BDL may facilitate a simpler layout
level design for the SLA cell than if the SLA were to be used for RTL, since
greater locality on a chip allows stronger internal timing constraints to be
assumed.

[1] For certain types of systems, in which a "real-time" interface with the
external world places time constraints on the computations to be performed, a
BDL system will not be entirely portable. It is hard to imagine any design
style which would give an entirely portable system in such cases.

2 G-Nets - A Graph Model for Bit-Driven Logic

2.1 Definition of G-Nets

Briefly, a G-Net (Figure 1 for example) resembles a Petri Net with colored
tokens of two colors, called "0" and "1". Special labels on the input arcs
control the colors of input tokens required to enable a transition, and labels
on the output arcs determine the colors of tokens generated. An initial
marking has at most one token per place and a strict firing rule prevents more
than one from ever occupying a single place. As with Petri nets in full
generality, conflicts are allowed.

Figure 1: Example G-Net. A, B, and C are input places. D is an output place.

In somewhat more detail, a G-Net is a directed graph with an initial
marking. There are two types of vertices, called places and transitions, and
two types of arcs. Input arcs originate at places and terminate at
transitions, and output arcs are in the opposite direction. No other arcs are
permitted, nor are multiple arcs (i.e., those with a common origin and
destination). A marking of a G-Net associates either no token, a single "0"
token, or a single "1" token with each place of the net. At some marking, a
place with no token is said to be empty and one with a token is full.
Graphically, a G-Net uses circles to represent places and bars for
transitions. The tokens of a marking are represented by writing nothing, "0"
or "1" inside each place.

With respect to some transition, a place connected to that transition by an
input arc is called an input place and one connected by an output arc is
called an output place.[2] With respect to some G-Net, an input place is one
which is an input place to some transition in the net, but an output place for
none. Similarly, an output place is an output for at least one transition but
an input for none. All other places in the net are called internal places.

A simulation of a G-Net is a sequence of firings of transitions of the net;
each firing is an integral event which occurs some time after a transition
becomes enabled. The simulation assumes an input process which places tokens
in input places of the net as they become empty, and an output process which
removes tokens from full output places; these processes represent the
interface between the net and the "outside world".

When firing, a transition assumes a temporary color based on its inputs,
removes all tokens from its input places and creates tokens at each of its
output places with colors based on its temporary color and markings on the
output arcs. If two or more transitions with a common input or output place
become simultaneously enabled, a conflict is said to occur. In case of a
conflict, only one transition actually fires.

A transition is enabled when all the input places (A1, A2, B, C, and D of
Figure 2) contain specified tokens and all the output places (E, F, G and H)
are empty. Which colors of tokens are required at the input places is
specified by markings on the input arcs. A "0" next to the arc means that a 0
token is required; similarly a "1" means a 1 is required. An unmarked input
may be either a 0 or a 1, but all unmarked inputs of one transition must be
the same color. An input arc with a small inverting circle requires a token
which is the complement of that on any unmarked inputs; if there are no
unmarked input arcs, the inverting input may be either 0 or 1. In any case,
all inverting inputs of one transition must be the same. Thus, for the
transition of Figure 2 to be enabled, C must be 0, D must be 1, and either A1
and A2 are both 0 while B is 1 or A1 and A2 are both 1 while B is 0.

Figure 2: Example transition

[2] A place which is both an input and output for a single transition is
allowed, but due to the strict firing rule, it will prevent the transition
from ever firing.

The temporary color assumed by a transition during firing is determined as
follows: If there are any unmarked inputs the transition takes the color of
that input, otherwise if there are inverting inputs the complement of these
inputs is used, and if neither then the transition is given the color 1. Each
output place is given a token colored as follows: A "0" or a "1" means that a
0 or 1 token is to be output, an unmarked arc means that a token with the
transition's color is to be output and an arc marked with an inverting circle
means that the complement of the transition's color is to be output. For
Figure 2, the color assumed by the transition will be the same as the token on
the A1 or A2 input. E will receive a token with the transition's color, F the
complement of that color, G a 0 and H a 1.
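
The enabling and coloring rules above can be sketched in a few lines of
Python. This is only an illustration under stated assumptions: the arc-label
codes ("0", "1", "U" for unmarked, "I" for inverting), the dictionary
representation of a marking, and the function names are inventions of this
sketch, not notation from the memo. The arc labels used for Figure 2 are
inferred from the description in the text.

```python
from typing import Dict, Optional

# A marking maps each place name to 0, 1, or None (empty place).
Marking = Dict[str, Optional[int]]


def transition_color(inputs: Dict[str, str], marking: Marking) -> Optional[int]:
    """Return the temporary color if the input arcs are satisfied, else None."""
    tokens = {place: marking[place] for place in inputs}
    if any(tok is None for tok in tokens.values()):
        return None  # some input place is empty
    # Arcs marked "0" or "1" require exactly that color.
    for place, label in inputs.items():
        if label in ("0", "1") and tokens[place] != int(label):
            return None
    unmarked = {tokens[p] for p, lab in inputs.items() if lab == "U"}
    inverting = {tokens[p] for p, lab in inputs.items() if lab == "I"}
    if len(unmarked) > 1 or len(inverting) > 1:
        return None  # unmarked inputs must agree; inverting inputs must agree
    if unmarked:
        color = next(iter(unmarked))
        if inverting and next(iter(inverting)) != 1 - color:
            return None  # inverting inputs must be the complement
        return color
    if inverting:
        return 1 - next(iter(inverting))
    return 1  # neither unmarked nor inverting inputs present


def fire(inputs: Dict[str, str], outputs: Dict[str, str],
         marking: Marking) -> Optional[Marking]:
    """Fire the transition if enabled; return the new marking, else None."""
    color = transition_color(inputs, marking)
    if color is None or any(marking[p] is not None for p in outputs):
        return None  # strict firing rule: every output place must be empty
    new = dict(marking)
    for place in inputs:
        new[place] = None
    for place, label in outputs.items():
        if label in ("0", "1"):
            new[place] = int(label)
        else:
            new[place] = color if label == "U" else 1 - color
    return new


# Arc labels for the transition of Figure 2, as described in the text.
FIG2_INPUTS = {"A1": "U", "A2": "U", "B": "I", "C": "0", "D": "1"}
FIG2_OUTPUTS = {"E": "U", "F": "I", "G": "0", "H": "1"}
```

With A1 = A2 = 1, B = 0, C = 0, D = 1 and empty outputs, `fire` empties the
inputs and leaves E = 1, F = 0, G = 0, H = 1, which matches the description of
Figure 2.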

2.2 Using G-Nets for BDL

At this point, the relation between G-Nets and BDL should be considered.
Transitions represent data-driven operations with complexity similar to single
gates of conventional logic. Places represent data links from one operator to
the next, and each provides storage for a single bit. Transitions are
conjunctive, as each requires a full tuple (or one of two tuples if unmarked
or inverting inputs are present) with a specific value of colored tokens
present at the inputs to be enabled, and produces tokens at all the outputs
when firing. Places are disjunctive since they can receive/distribute tokens
from/to any one of several transitions.

Boolean logic operators can be implemented in a manner similar to
conventional "AND-OR" 2-level logic by using one transition for each "AND"
term and a place to "OR" the transitions together. For example, Figure 3(a)
illustrates one possible half-adder circuit in G-Net notation. Note that no
action will occur until tokens are present at both input places. Then one of
the four transitions, depending on the input values, will fire and deliver the
result to the output places. Also, Figure 3(b) depicts a 3-bit gate unit;
tokens at the input places cannot pass to the output places until a token is
delivered to the control place.

Figure 3: Example G-Nets: half-adder (a) and 3-bit gate (b)
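
The "one transition per AND term" idea can be illustrated with a small Python
sketch of a half-adder. It assumes, for illustration only, that each of the
four transitions has fully marked input and output arcs; the actual arc
labels of Figure 3(a) are not recoverable from the text, and the place names
A, B, S and C are hypothetical.

```python
from typing import Dict, Optional

Marking = Dict[str, Optional[int]]

# One entry per transition ("AND" term):
# (required A token, required B token, sum token, carry token).
HALF_ADDER_TRANSITIONS = [
    (0, 0, 0, 0),
    (0, 1, 1, 0),
    (1, 0, 1, 0),
    (1, 1, 0, 1),
]


def half_adder_step(marking: Marking) -> Optional[Marking]:
    """Fire the single enabled transition, if any, and return the new marking."""
    a, b = marking["A"], marking["B"]
    if a is None or b is None:
        return None  # no action until both input places hold tokens
    if marking["S"] is not None or marking["C"] is not None:
        return None  # strict firing rule: output places must be empty
    for need_a, need_b, s, c in HALF_ADDER_TRANSITIONS:
        if (a, b) == (need_a, need_b):
            # The output places S and C act as the "OR" of the four terms.
            return {"A": None, "B": None, "S": s, "C": c}
    return None
```

Exactly one of the four tuples matches any pair of input tokens, so no
conflict arises even though all four transitions share the same places.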

It is often convenient to use some G-Net as a building block in the
construction of a more complex G-Net. For this purpose, the G-Net module
notation exists. A module is similar to a simple transition: it has input and
output arcs and places, its actions are enabled by tokens which are accepted
from (a subset of) its input places, and it creates tokens at (a subset of)
its output places. The module is drawn as a box with double lines on the
sides and single lines on top and bottom. Inside the box is a name which
identifies the G-Net which the module, together with its input and output
places, replaces. The input and output places are drawn external to the
module with single arcs to or from the module. Thus, several modules or
transitions can share input or output places. The number and functionality of
input and output places of the module are the same as those of the G-Net for
which it substitutes. Each input and output arc of the module should be
labelled to identify it with a specific arc of the reference G-Net; but if
many modules are used in a single net, it is usually assumed that all have the
same arrangement of arcs and only one module needs to be so labelled. Figure
4 is an example of a G-Net and the corresponding module representation.

Figure 4: Example module and corresponding G-Net

A special type of module which satisfies certain restrictions is called a
macro transition. It is drawn with only single lines on the sides but
otherwise identically to the module notation. The firing properties of a
macro transition mimic those of a simple transition: When a macro transition
fires, a token is removed from every input place and one is created at every
output place. There are no internal places and each transition of the
reference G-Net has arcs to each of the input and output places. Figure 5
gives an example of a macro transition and the net it represents; note that
the module of Figure 4, in contrast, cannot be a macro transition.
Figure 5: Example macro transition

3 N-Bit Ripple-Carry Adder Demonstrating Pipeline Effect

In this section, a concrete example is presented which performs useful
computation and demonstrates the bit-level pipelining which occurs
automatically, leading to very high throughput. Figure 6 illustrates a G-Net
which adds two N-bit operands, A and B, to produce an N+1 bit result, Z.
Note that ripple-carry organization is employed; therefore the fan-in and
fan-out required of the elements is low, which may be an important practical
consideration.

To analyze the adder's operation, let us first consider that all places in
the net are initially empty. Assume that an infinite stream of operand pairs
is to be added and that ideal operand sources are connected to the A and B
input places; as soon as any input place becomes empty the appropriate bit
from the next operand to be used is put in that place. Similarly, an ideal
operand sink is connected to the Z output places which consumes tokens of the
result as soon as they are produced at Z.

Figure 6: G-Net for N-bit adder

No operation occurs until the A0 and B0 places receive tokens of the first
pair of operand words. Then the 1/2-ADD transition T0 fires, consuming the
tokens at A0 and B0 and producing result tokens at Z0 and the carry, C0.
Since A0 and B0 are now empty, the least significant bit tokens of the second
pair of operands are placed in A0 and B0 and the pipe begins to fill.
Concurrently, since a token has arrived at C0 and the ideal operand sources
will have filled the A1 and B1 input places, the ADD transition T1 fires,
consuming A1, B1 and C0 and producing C1 and Z1. Also, the operand sink
consumes Z0 which was produced earlier.

Now C0 and Z0 are empty and T0 can again fire, consuming the A0 and B0
tokens of the second operand pair and producing Z0 and a carry C0. Two other
operations proceed at this same time: A1 and B1 receive tokens from the
second operand pair and T2 fires to consume A2, B2 and C1, producing Z2 and
propagating the carry, C2, of the first operand pair. With A0 and B0 again
empty, the A0 and B0 tokens of the third operand pair enter the adder pipe.
So far, Z2, Z1 and Z0 outputs have been produced for the first operand pair
and Z0 for the second pair. As operation continues, carries propagate to the
left, new pairs of operands enter the adder pipeline
least-significant-bits-first and results for successive pairs leave the pipe
(also least-significant-bits-first).

When the carry from the first operand pair reaches T_{N-1}, the pipe will
have been filled. N/2 transitions at a time will be firing, odd-numbered ones
alternating with even. With each volley, N/2 new output bits will be
produced; therefore the throughput will be 1/(2τ) adds per second where τ is
the firing time for a transition. Note especially that this result is
independent of word length, N!
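
The walkthrough above can be checked with a small simulation. The following
Python sketch is a simplified model under explicit assumptions, not the
memo's own formalism: time advances in whole firing times τ, every enabled
transition fires in the same volley, the ideal sources refill A and B places
at the start of each volley, and the ideal sink empties Z places (and the
final carry, treated as result bit N) at the end of each volley. The function
name and data layout are hypothetical.

```python
from typing import List, Optional, Tuple


def simulate_adder(n_bits: int,
                   operand_pairs: List[Tuple[int, int]]) -> Tuple[List[int], int]:
    """Return (sums, volleys) for a stream of operand pairs through the adder net."""
    k = len(operand_pairs)
    a: List[Optional[int]] = [None] * n_bits   # A_i places
    b: List[Optional[int]] = [None] * n_bits   # B_i places
    c: List[Optional[int]] = [None] * n_bits   # C_i places; C_{N-1} is result bit N
    z: List[Optional[int]] = [None] * n_bits   # Z_i places
    next_in = [0] * n_bits                     # next operand pair to feed bit i
    taken = [0] * (n_bits + 1)                 # tokens the sink has taken per bit
    sums = [0] * k
    collected = 0
    volleys = 0
    while collected < k * (n_bits + 1):
        # Ideal sources: refill any empty A_i/B_i from the next operand pair.
        for i in range(n_bits):
            if a[i] is None and next_in[i] < k:
                x, y = operand_pairs[next_in[i]]
                a[i], b[i] = (x >> i) & 1, (y >> i) & 1
                next_in[i] += 1
        # A transition T_i is enabled when its inputs are full and outputs empty.
        enabled = [i for i in range(n_bits)
                   if a[i] is not None
                   and (i == 0 or c[i - 1] is not None)
                   and z[i] is None and c[i] is None]
        for i in enabled:
            carry_in = 0 if i == 0 else c[i - 1]
            total = a[i] + b[i] + carry_in
            a[i] = b[i] = None
            if i > 0:
                c[i - 1] = None
            z[i], c[i] = total & 1, total >> 1
        volleys += 1
        # Ideal sink: remove result tokens as soon as they are produced.
        for i in range(n_bits):
            if z[i] is not None:
                sums[taken[i]] |= z[i] << i
                taken[i] += 1
                z[i] = None
                collected += 1
        if c[n_bits - 1] is not None:
            sums[taken[n_bits]] |= c[n_bits - 1] << n_bits
            taken[n_bits] += 1
            c[n_bits - 1] = None
            collected += 1
    return sums, volleys
```

Under these assumptions, adjacent transitions can never fire in the same
volley (T_i needs C_i empty while T_{i+1} needs it full), so each additional
operand pair costs two firing times whatever the word length, in line with the
1/(2τ) figure derived above.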

4 Performance of RTL Versus BDL Operators

Given some computation which an operator is to perform, two important
performance measures can be defined. The first is delay time, the time from
arrival of (the first part of) the input to output of (the last part of) the
result. The second is throughput, the rate at which the computation can be
repeated given a stream of input values.[3] It should be remembered that BDL
exhibits pipelining to a rather extreme degree; its performance is related
directly to this pipelining. Delay time for a BDL operator is similar to that
for an RTL operator, assuming equivalent complexity and switching times for
the elements. BDL gains its advantage in throughput, as discussed below.

Throughput for an RTL combinatorial-logic operator is just the reciprocal
of the maximum delay time, since the result from one input operand set must be
computed and clocked into an output register before it is safe to change the
input to the next operand set.[4] Since all bits of an operand word are
clocked simultaneously, the maximum delay is the longest path from any input
bit to any output bit, which may be much longer than any path from input to
output for a single bit position. Considering an N-bit ripple-carry adder,
the longest path follows the carry from the LSB (least significant bit) input
to the MSB (most significant bit) output. If the delay of each full-adder
element is τ, then the overall operator delay is Nτ and the throughput is
1/(Nτ). Since N will often be fairly large, the throughput of the BDL N-bit
ripple-carry adder (1/(2τ) as derived in section 3) can be much better than
that of the equivalent RTL operator (by a factor of 32 for N = 64-bit words).

In current RTL practice, carry-lookahead rather than ripple-carry schemes
can be used which utilize higher fan in/out elements to decrease the number of
stages through which the carry must propagate and thereby significantly
decrease the operator delay. Unfortunately, elements become correspondingly
less local which might be undesirable from the standpoint of VLSI
implementation.

[3] These measures are also known as latency and bandwidth, respectively.

[4] Sometimes a minimum "storage time" as well as maximum delay time for an
operator is known, in which case the throughput can be increased somewhat. It
is difficult in practice to improve the throughput by more than a factor of
about two in this way.
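
The ripple-carry comparison in this section reduces to simple arithmetic,
sketched below in Python. The function names and the example element delay of
10 ns are illustrative assumptions, not figures from the memo.

```python
def rtl_ripple_throughput(n_bits: int, tau: float) -> float:
    """Adds per second for a clocked RTL ripple-carry adder: 1 / (N * tau)."""
    return 1.0 / (n_bits * tau)


def bdl_ripple_throughput(tau: float) -> float:
    """Adds per second for the BDL ripple-carry adder: 1 / (2 * tau)."""
    return 1.0 / (2.0 * tau)


def bdl_speedup(n_bits: int) -> float:
    """Ratio of BDL to RTL throughput; tau cancels, leaving N / 2."""
    return n_bits / 2.0


if __name__ == "__main__":
    tau = 10e-9  # hypothetical element delay of 10 ns
    print(rtl_ripple_throughput(64, tau), bdl_ripple_throughput(tau))
    print(bdl_speedup(64))
```

Because τ cancels in the ratio, the factor of 32 quoted for 64-bit words
holds for any element delay, provided both styles use elements of equal
speed.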

Examining the cause of this low throughput of the ripple-carry adder is
helpful. Define meaningful switching as that operation between the arrival of
"valid" data at all inputs of an element (gate, full-adder, etc.) and output
of a valid result. For the RTL ripple-carry adder, the LSB full-adder element
will first do some meaningful switching, then the next and so on, rippling
along with the carry to the MSB. Thus, at any instant, only 1 of N elements
will be doing meaningful switching while the others either have invalid inputs
or are merely maintaining a result. It would seem more efficient if each
element were spending nearly all its time doing meaningful switching. The key
to achieving this end seems to be allowing different bits of the data words to
arrive at different times. This idea of pipelining each bit independently has
been explored in a synchronous clocked array-multiplier context [Agrawal 75].
BDL achieves this naturally since each bit carries its own timing information.

5 Throughput of Acyclic G-Nets

For the N-bit adder of the preceding section, it is fairly clear that,
under the right conditions, a high throughput can be obtained. But it is not
at all obvious how (or if) similarly high throughput can also be obtained at a
system level. In this section, concepts pertinent to optimization of
throughput in G-Nets will be developed. Particular emphasis will be given to
interfacing between G-Net modules, and techniques will be developed to design
entire systems with optimum throughput. Although admittedly requiring rather
large G-Nets, such systems can be designed which carry out complex
calculations with the same high throughput (independent of word length) as the
N-bit adder.
13
F i r s t ,  a c l a r i f i c a t i o n  o f  e x a c t l y  what i s  meant by throughput o f  a G-Net i s  
in o rd e r .  Although in general  one might t a lk  about the throughput o f  e n t i r e  
N -b i t  operands at a p a r t i c u l a r  p lace  in complex s t r u c t u r e s ,  o n ly  the ra te  at 
which s i n g l e - b i t  tokens pass through very  simple n o n - c y c l i c  G-Nets w i l l  be 
cons idered  here .  Throughout the s e c t i o n ,  throughput w i l l  be considered  in the 
steady s t a te  assuming that appropr ia te  input and output proc esses  are 
o perat ing  to  maintain that  steady  s t a t e .  A d d i t i o n a l l y ,  the time f o r  any 
t r a n s i t i o n  to f i r e  a f t e r  being enabled i s  assumed to  be constant  and uni form, 
designated by t .
Since tokens are conveyed through a net by the f i r i n g  o f  t r a n s i t i o n s ,  the 
g rea ter  the t r a n s i t i o n  f i r i n g  r a t e ,  the higher the throughput.  And the o v e r a l l  
f i r i n g  ra te  proves to  be e a s i e r  to  study than the d e t a i l e d  f l ow  o f  tokens  
through the n e t .  Using the f i r i n g  r a t e ,  an ex press ion  i s  obtained r e l a t i n g  an 
upper bound on the maximum throughput which a simple l in e a r  chain ( s e e  Figure
7 f o r  example) o f  N p la c e s  (and N-1 t r a n s i t i o n s )  can susta in  to  the nunber o f  
tokens in the cha in .  Note that  each t r a n s i t i o n  has e x a c t l y  one input and one 
output ,  t h e r e f o r e  throughput i s  the same at a l l  p o in t s  in the net and the 
number o f  tokens  in the net remains co n s ta n t .
Figure  7 :  Simple l in e a r  chain o f  7 p l a c e s .
A c t ive  f r a c t i o n  at t h i s  marking i s  2/7
Let p be the a c t i v e  f r a c t i o n ; that  expected f r a c t i o n  o f  ( th e  t o t a l  N) 
p la ces  from which tokens are being removed (by t r a n s i t i o n  f i r i n g )  at any 
i n s t a n t .  The corresponding  ra te  o f  token f l ow  advances pN tokens  one p lace  
per -r in the N-place cha in .  To maintain the steady  s t a t e ,  p tokens per t must 
be enter ing  (and leav in g )  the cha in .  The throughput i s  thus p / t . In the 
remainder o f  t h i s  memo, p alone w i l l  be used fo r  the normalized throughput 
(expressed in un i t s  o f  tokens  per f i r i n g  t im e ) .
In the case o f  the above simple l in e a r  c h a in ,  an upper bound, p can
easily be placed on the active fraction and hence on the throughput by 
considering the fraction of places which contain tokens. Consider that to 
enable a transition, its input place must contain a token while its output 
place is empty. Thus, if N/2 of the N places contain tokens, at most N/2 
transitions can fire and Pu = 1/2. On the other hand, if j < N/2 tokens are 
present, then at most j transitions can fire and Pu = j/N; or if j > N/2 are 
present (N-j empty places) then at most N-j can fire and Pu = (N-j)/N. More 
concisely, for the simple linear chain of N places of which j are kept filled,
    Pu = min[j/N, (N-j)/N]                                            (1)
which has a maximum value of 1/2 when j = N/2.
Note that Pu is only an upper bound and might not be achievable. For 
example, suppose that the first N/2 places are filled and the rest empty; only 
a single transition can fire yet Pu = 1/2. On the other hand, if every other 
place is filled then it is fairly easy to see that the bound of 1/2 will be 
achieved.
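As a rough check on equation (1), the following Python sketch evaluates the bound and counts the transitions that are enabled under a given marking. The function names and the list-of-0/1 representation of a marking are illustrative assumptions, not notation from this memo.

```python
from fractions import Fraction


def upper_bound(n_places: int, n_tokens: int) -> Fraction:
    """Equation (1): Pu = min[j/N, (N-j)/N] for a simple linear chain."""
    return min(Fraction(n_tokens, n_places),
               Fraction(n_places - n_tokens, n_places))


def enabled_transitions(marking: list[int]) -> int:
    """Count transitions whose input place is full and output place is empty.

    marking[k] is 1 if place k holds a token and 0 if it is empty.
    Transition k is assumed to sit between place k and place k+1.
    """
    return sum(1 for k in range(len(marking) - 1)
               if marking[k] == 1 and marking[k + 1] == 0)


if __name__ == "__main__":
    packed = [1, 1, 1, 1, 0, 0, 0, 0]       # first N/2 places filled
    alternating = [1, 0, 1, 0, 1, 0, 1, 0]  # every other place filled
    print("bound:", upper_bound(8, 4))
    print("enabled (packed):", enabled_transitions(packed))
    print("enabled (alternating):", enabled_transitions(alternating))
```

Both markings hold N/2 tokens and so share the same bound, but the packed marking enables a single transition while the alternating marking enables N/2 of them.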
Now consider a slightly more complex G-Net: two parallel linear chains, one 
of M places and the other of N >= M places, synchronized at the start and 
end.
Figure 8: Two parallel linear chains 
Assume that the net is initially empty. Note that whenever a token is 
introduced into one path by T0 it is also introduced into the other path. 
Similarly, T1 removes tokens equally from the two paths. Thus, the number of 
tokens present in one path is always the same as the number present in the
other. Further, notice that this same synchronizing action requires that the 
throughput of the whole net be the same as that of each of the two paths. Now 
we can simultaneously plot the upper bound which each of the two paths places 
on its throughput (and hence the overall throughput) for a given number of 
tokens, j, in each path. See Figure 9.
Figure 9: Plot of upper bounds for parallel paths 
The point at which the two bounds intersect gives the maximum bound for the 
overall structure:
    Pu = M/(M+N)                                                      (2)
which occurs when there are j = MN/(M+N) tokens in each chain. If the two 
paths are balanced, i.e., M = N, then Pu = 1/2 and it is easy to see that this 
bound can in fact be attained. Unbalanced paths have smaller values of Pu, 
decreasing asymptotically to a value of M/N for very unbalanced paths. 
Clearly, balanced paths are desired for optimum throughput.
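The intersection argument can be checked numerically. The sketch below, with hypothetical function names, applies the linear-chain bound of equation (1) to each path and searches the integer token counts for the largest combined bound.

```python
from fractions import Fraction


def chain_bound(places: int, tokens: int) -> Fraction:
    """Equation (1) for one linear chain."""
    return min(Fraction(tokens, places), Fraction(places - tokens, places))


def parallel_bound(m: int, n: int, tokens: int) -> Fraction:
    """Both chains hold the same number of tokens; the tighter bound governs."""
    return min(chain_bound(m, tokens), chain_bound(n, tokens))


def best_token_count(m: int, n: int) -> tuple[int, Fraction]:
    """Search integer token counts 0..min(m, n) for the largest bound."""
    best_j = max(range(min(m, n) + 1),
                 key=lambda j: parallel_bound(m, n, j))
    return best_j, parallel_bound(m, n, best_j)


if __name__ == "__main__":
    print(best_token_count(3, 6))  # compare with j = MN/(M+N), Pu = M/(M+N)
    print(best_token_count(4, 4))  # balanced paths
```

For M = 3 and N = 6 the search lands on j = 2 with a bound of 1/3, which agrees with equation (2). When MN/(M+N) is not an integer, the integer search necessarily returns a bound below M/(M+N).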
5.2 G-Net Module Characterization
To obtain optimal system throughput, it is useful to characterize modules 
using two properties: skew and thickness. Also, by drawing G-Nets in Time 
Normal Form (TNF) it is easier to appreciate these properties. These concepts 
are made manageable and useful by the assumption throughout the following that 
all transition firings take nearly the same amount of time. Furthermore, time 
will be expressed in units of transition firing-times.
Input skew is a measure of the time difference between the arrivals of two 
tokens of one operation set at a single (if the two tokens are carried on the
same serial data path) or two different input places such that minimum waiting 
occurs. If no confusion will result, the word "input" is omitted. For 
example, in the N-bit adder of Figure 6, SKEW(a0,a1) = -SKEW(a1,a0) = +1, 
meaning that a token should arrive at place a1 one firing-time after one 
arrives at a0 to give minimum (in this case no) waiting. In many cases, 
including the N-bit adder of Figure 6, skews between any two successive bits 
of an operand are the same. Then, for brevity, we refer to the skew of the 
entire operand word; e.g., SKEW(A) = 1.
Output skew is a measure of the time difference between production of two 
tokens of one operation set at a single or two different output places 
assuming that input tokens are received with minimum waiting. As with input 
skews, the word output is often omitted and when appropriate the term is 
applied to an entire result word. A very similar measure, internal skew, 
applies when one or both of the places under consideration is an internal 
place.
Thickness measures the time delay from arrival of an input token to 
production of an output token, again assuming that input tokens are received 
with minimum waiting. In simple cases, the thickness from an input place to 
an output place is just the number of transitions in the longest path between 
them. Thickness, like skew, is often applied to entire words when 
appropriate.
Using the above properties, the N-bit adder of Figure 6 can be 
characterized as follows:
1. SKEW(A) = SKEW(B) = SKEW(Z) = 1
2. SKEW(A,B) = 0
3. THICKNESS(A,Z) = THICKNESS(B,Z) = 1
Notice that 2. is actually implied by 3. and therefore redundant. Skew and 
thickness are sometimes not defined, or are data dependent as in Figure 10. 
But in most cases, they are quite useful measures.
Figure 10: G-Nets with ill-defined skews and thicknesses
Time Normal Form (TNF) is a style of drawing acyclic G-Nets which 
illuminates skew and thickness. As an example, the G-Net of Figure 6 is drawn 
in TNF. To draw a TNF G-Net, the drawing surface is imagined to be divided 
into levels, numbered starting with 0 at the top and increasing toward the 
bottom. The net is then drawn according to the following rules:
1. Arcs of the net always point downwards, from a smaller to a greater 
level.
2. Places are always drawn directly on the levels, transitions between 
levels.
3. Arcs will traverse as few levels as possible subject to 1. and 2.
4. G-Nets for operators dealing with N-bit parallel data words should 
be drawn with the least significant bit position at the right and 
increasing significance toward the left.
Inspection of a TNF G-Net illuminates the throughput which that net can 
sustain. If rule 3. is obeyed, any arc traversing more than one level must 
be part of the longer of two synchronized parallel paths. The throughput of 
this parallel structure is less than optimum, as expressed by equation (2). 
Such a bottleneck can be corrected by inserting extra places and identity 
transitions into the shorter path until the two paths are balanced. TNF 
G-Nets thus point out possible design deficiencies.
In addition, thickness and skew of a TNF G-Net are readily apparent: the 
skew between tokens arriving at two places is just the difference in levels of 
the places. Skew of an entire operand is just the slope of the line passing 
through all the places which hold that operand, where slope from a to b is 
defined as
    [Level(b)-Level(a)]/[BitPosition(b)-BitPosition(a)].
Thickness is simply the level of the output place minus the level of the 
corresponding input place.
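These readings can be expressed as a short Python sketch. The level assignment used for the adder (place a_i on level i and place z_i on level i+1) is an assumption chosen to agree with SKEW(A) = SKEW(Z) = 1 and THICKNESS(A,Z) = 1; the function names are illustrative.

```python
from fractions import Fraction


def slope(level_a: int, bit_a: int, level_b: int, bit_b: int) -> Fraction:
    """Skew of an operand: [Level(b)-Level(a)]/[BitPosition(b)-BitPosition(a)]."""
    return Fraction(level_b - level_a, bit_b - bit_a)


def thickness(input_level: int, output_level: int) -> int:
    """Level of the output place minus level of the corresponding input place."""
    return output_level - input_level


N_BITS = 4
# Assumed TNF layout of the adder: a_i on level i, z_i on level i+1.
a_level = {i: i for i in range(N_BITS)}
z_level = {i: i + 1 for i in range(N_BITS)}

if __name__ == "__main__":
    print("SKEW(A) =", slope(a_level[0], 0, a_level[3], 3))
    print("SKEW(Z) =", slope(z_level[0], 0, z_level[3], 3))
    print("THICKNESS(A,Z) =", thickness(a_level[2], z_level[2]))
```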
Block diagrams can also utilize the advantages of TNF. A block diagram is 
essentially a G-Net which contains modules, and for convenience, lines between 
modules are assumed to implicitly contain places. The shape of the module's 
box conveys information: its height corresponds to the height of the operator 
(some scale being assumed) and the slopes of the top and bottom edges (inputs 
and outputs respectively) give the skews. Figure 11 gives such a TNF block 
diagram which adds three numbers using the N-bit adder of Figure 6 as a 
building block and Figure 12 gives the TNF black-box version of this 3-operand 
adder.
5.3 Parallel combination of G-Net Modules
Using the above concepts, the efficient combination of G-Net modules can be 
studied. Consider first the effect of two operators with different thickness 
in parallel as in Figure 13(A). If the thickness of A is M and of B is N, 
then expression (2) gives Pu = M/(M+N) as an upper bound on the throughput 
compared to Pu = 1/2 which could be obtained if the operators were equal in 
thickness.
The rather obvious solution is to insert a flat shim of thickness N-M in 
series with operator A as shown in Figure 13(B). The flat shim is simply an 
identity operator of the specified thickness, implemented with parallel chains 
of equal length. The chains are uncoupled, so a flat shim may be used to 
increase the thickness of an operator with a skewed output (input) as well as 
one with no skew. Thus, operators of any thickness may be efficiently 
paralleled.
Figure 13: Parallel operators: unbalanced (A) and balanced with shim (B)
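A minimal sketch of the balancing step, assuming thickness can be used directly as the path length in expression (2). The function names are hypothetical.

```python
from fractions import Fraction


def parallel_upper_bound(m: int, n: int) -> Fraction:
    """Expression (2), written so that either operator may be the thinner one."""
    return Fraction(min(m, n), m + n)


def flat_shim_thickness(m: int, n: int) -> int:
    """Thickness of the flat shim placed in series with the thinner operator."""
    return abs(n - m)


def balanced_upper_bound(m: int, n: int) -> Fraction:
    """Bound after the thinner operator has been shimmed to the thicker one."""
    thick = max(m, n)
    return parallel_upper_bound(thick, thick)


if __name__ == "__main__":
    m, n = 2, 6
    print("before:", parallel_upper_bound(m, n))
    print("shim thickness:", flat_shim_thickness(m, n))
    print("after:", balanced_upper_bound(m, n))
```

With thicknesses 2 and 6 the unbalanced bound is 1/4; a shim of thickness 4 restores the bound of 1/2.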
5.4 Serial composition of G-Nets
The harder case is the series composition of operators with different 
skews. Consider the output of an N-bit operator A with output skew of 1 and 
thickness of 1 connected to an operator B with input skew of 2 and thickness 
of 1 as shown in time normal form in Figure 14. From place a0 to place z_i 
(other than z0), there are several parallel paths. The shortest is almost 
entirely within operator A, and contains i+1 places (not counting a0 or z_i). 
The longest is contained almost entirely within B and has 2i+1 places. (The 
shortest and longest paths for i = 3 are enhanced in Figure 14.) The greatest 
unbalance occurs when i = N-1, i.e., the most significant bit, in which case 
the short path is N and the long path 2N-1 places. By equation (2), the 
throughput is less than optimal: Pu = N/(3N-1). In the limit as the word 
length becomes very long, Pu = 1/3.
Figure 14: Cascading operators with different skews
Using the same approach, upper bounds on the throughput for the composition 
of two arbitrarily skewed operators can be obtained. Again assume that the
output of A has skew a and is connected to the input of B which has skew b, 
and each has a thickness of 1. There are three basic cases:
1. If a and b have the same sign, and are both non-zero, then 
Pu = [(N-1)min(|a|,|b|)+1]/[|a+b|(N-1)+2], which in the limit for 
long words (large N) is Pu = min(|a|,|b|)/|a+b|.
2. If a or b but not both is zero, then Pu = 1/(N|a+b|).
3. If a and b have opposite signs, both non-zero, then 
Pu = 1/[(|a|+|b|)(N-1)+2], which in the limit is Pu = 1/[(|a|+|b|)N].
The results for case 1. are not too bad; with a and b both small integers, 
Pu is typically 1/3 - 1/5 which might be tolerable in some cases. But since N 
is typically fairly large (16 - 64), cases 2. and 3. rarely give acceptable 
throughput. Fortunately, in all of these cases full throughput of 1/2 can be 
realized by inserting a skewed shim with skew b-a between the two operators. 
A skewed shim, like a flat shim, is an identity operator but its thickness 
varies linearly with bit position rather than being constant. A skewed shim 
with a skew of s, s > 0, has thickness of s*i at bit position i, and if s < 0 
the thickness is |s|(N-1-i) at bit i. In both cases, the minimum thickness will 
be 0 and the maximum |s|(N-1). Like a flat shim, the bit positions are 
uncoupled. Thus a skewed shim with a skew of s can be used to convert an 
operand with any skew a into one with skew a+s. Figure 15 illustrates the use 
of a skewed shim.
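The following Python sketch illustrates both ideas. The bound function generalizes the worked example (a short path of N places and a long path of 2N-1 places for skews 1 and 2) by assuming that the short and long paths grow with the smaller and larger skew magnitude respectively; that generalization and the function names are assumptions. The shim function tabulates the per-bit thickness of a skewed shim.

```python
from fractions import Fraction


def cascade_bound_same_sign(a: int, b: int, n_bits: int) -> Fraction:
    """Upper bound from equation (2) for non-zero skews of the same sign.

    Assumes the shortest path to the most significant bit holds
    min(|a|,|b|)*(N-1)+1 places and the longest max(|a|,|b|)*(N-1)+1.
    """
    if a == 0 or b == 0 or (a > 0) != (b > 0):
        raise ValueError("skews must be non-zero and of the same sign")
    span = n_bits - 1
    short_path = min(abs(a), abs(b)) * span + 1
    long_path = max(abs(a), abs(b)) * span + 1
    return Fraction(short_path, short_path + long_path)


def skewed_shim_thickness(s: int, n_bits: int) -> list[int]:
    """Thickness at each bit position i = 0..N-1 for a shim of skew s."""
    if s >= 0:
        return [s * i for i in range(n_bits)]
    return [-s * (n_bits - 1 - i) for i in range(n_bits)]


if __name__ == "__main__":
    print(cascade_bound_same_sign(1, 2, 16))  # N/(3N-1) with N = 16
    print(skewed_shim_thickness(1, 4))
    print(skewed_shim_thickness(-2, 4))
```

In both shim cases the thinnest bit position has thickness 0 and the thickest has |s|(N-1).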
Note that several cases have been presented in which throughput can be 
improved by the introduction of shims, which delay the flow of data. This 
apparently counter-intuitive result becomes clearer if one realizes that shims 
have storage as well as delay, and it is the increased storage for 
intermediate results which allows the throughput to rise.
5 It is interesting though not precisely correct to consider RTL in this 
light. A ripple carry adder, for example, has a non-zero "skew" while the 
registers that feed and receive the data have a "skew" of 0.
Figure 15: Use of skewed shim to give optimum throughput
In this section, concepts were presented which allow non-cyclic systems 
composed of serial and parallel combinations of G-Net operator modules to be 
designed which achieve optimum throughput. By drawing the G-Net operators in 
time-normal form and inserting extra places into unbalanced parallel paths, 
optimum throughput of the operator can be ensured and the operator can be 
characterized using the properties of skew and thickness. At the system level, 
one can deal with modules so characterized and, by inserting flat or skewed 
shims as required, ensure that the resulting system has optimum throughput.
6 VLSI Implementation of G-Nets
From the material so far one might view BDL as a theoretically interesting 
but hard-to-implement concept. Fortunately, however, it is rather well suited 
for implementation using uniform-array VLSI circuits such as the SLA. The SLA 
contains storage elements (columns) and combinational operators (rows) mixed 
with a very fine level of granularity throughout the array. Such a mixture is 
at the very heart of the BDL concept, as exemplified by the places and 
transitions of the G-Net representation. In this context, BDL provides an 
elegant low-level basis for efficient usage and design of VLSI chips.
This section presents a general mapping from G-Nets into standard SLA 
programs and also a more efficient mapping into programs for a 
specially-suited variation of the basic SLA, the G-SLA. Also a G-SLA program 
for the N-bit adder is presented as an example. To understand this section, 
the reader should be familiar with the SLA operation and programming language, 
as presented in [Patil 79].
6.1 Mapping G-Nets into Standard SLA Programs
Basically, places of the G-Net map into one or two columns of the SLA and 
transitions into one or two rows. Each column has two stable states, 0 and 1. 
But a place can be in one of three stable states, E (empty), 0 or 1. 
Generally therefore, two columns, p0 and p1, are required for each place P: 
both being 0 represents P being empty, p0 being 1 represents P containing a 0, 
p1 being 1 represents P containing a 1 and both being 1 should never occur. 
If it is known that a place will only contain a single type of token (perhaps 
because it performs a control function) then only one column is needed. In 
general, each transition can take on either a value of 0 or of 1 when it 
fires; accordingly one SLA row is used for each event. Again, if it is known 
that the transition can only take one of these values, then only one row is 
required.
The procedure for translating from a G-Net to a standard SLA program is as 
follows:
- For each place P in the net, one or both of columns p0 and p1 are 
created in the SLA as above.
- For each transition T which can assume a 1 value when firing, a row 
t1 is created which becomes activated for such a firing. Row t1 
should contain an entry at columns p1_i and p0_i corresponding to each 
input place P_i of T as follows:
* If P_i is unmarked or marked with a 1, enter a 1R at column p1_i.
* If P_i is marked with a complementing circle or a 0, enter a 1R 
at column p0_i.
- Also, at each column p1_o and p0_o corresponding to an output place P_o 
of T, make an entry in row t1 as follows:
* If P_o is unmarked or marked with a 1, enter a 0S at column p1_o 
and a 0 at p0_o.
* If P_o is marked with a complementing circle or a 0, enter a 0 
at column p1_o and a 0S at p0_o.
- For each transition T which can assume a 0 value when firing, a row 
t0 is created which becomes activated for such a firing and contains 
entries at columns p1_i and p0_i corresponding to each input place P_i 
of T as follows:
* If P_i is unmarked or marked with a 0, enter a 1R at column p0_i.
* If P_i is marked with a complementing circle or a 1, enter a 1R
at column p1_i.
- Also, at each column p1_o and p0_o corresponding to an output place P_o 
of T, make an entry in row t0 as follows:
* If P_o is unmarked or marked with a 0, enter a 0 at column p1_o 
and a 0S at p0_o.
* If P_o is marked with a complementing circle or a 1, enter a 0S
at column p1_o and a 0 at column p0_o.
Figure 16 gives some simple examples of the translation process. In
addition to this straightforward procedure, consideration must be given in
complex nets to physical arrangement of places and columns in the SLA if 
efficient use of split rows and columns is to be made.
Figure 16: Simple G-Nets translated into standard SLA programs
(rows annotated {copies a 1} and {copies a 0}; panel (b): Switch)
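The translation rules of this section can be sketched in Python as follows. The function names, the use of a dictionary from column name to cell entry, and the encoding of arc markings (None for unmarked, 0 or 1 for a marked value, "c" for a complementing circle) are all assumptions made for illustration. Which values a transition can assume depends on its function and is not modeled here.

```python
def token_value(mark, firing_value: int) -> int:
    """Token value an arc carries when the transition fires with firing_value.

    mark is None (unmarked), 0 or 1 (explicit value), or "c" (complement).
    """
    if mark is None:
        return firing_value
    if mark == "c":
        return 1 - firing_value
    return mark


def standard_sla_row(firing_value: int, inputs, outputs) -> dict:
    """Build row t1 (firing_value=1) or t0 (firing_value=0) for one transition.

    inputs and outputs are lists of (place_name, mark) pairs. Place "a" is
    assumed to own the two columns "a0" and "a1".
    """
    row = {}
    for place, mark in inputs:
        value = token_value(mark, firing_value)
        row[f"{place}{value}"] = "1R"       # test for 1, then reset
    for place, mark in outputs:
        value = token_value(mark, firing_value)
        row[f"{place}{value}"] = "0S"       # test for 0, then set
        row[f"{place}{1 - value}"] = "0"    # other column must also be 0
    return row


if __name__ == "__main__":
    # A copy transition from place a to place b, both arcs unmarked.
    print(standard_sla_row(1, [("a", None)], [("b", None)]))
    print(standard_sla_row(0, [("a", None)], [("b", None)]))
```

For the copy transition, the two rows produced contain the entries 1R, 0S, 0 and 1R, 0, 0S, one row for copying a 1 and one for copying a 0.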
6.2 The G-SLA
Standard SLAs, while functionally adequate, are somewhat cumbersome for 
implementing BDL circuits. The G-SLA presented here is a simple variation of 
the standard SLA which is tailored for the implementation of G-Nets.
Like the standard SLA, the G-SLA is a rectangular array of cells. Each 
column of cells stores one "state" variable while each row performs tests 
and/or actions on the columns which it intersects. The basic difference 
between the G-SLA and the standard SLA is that the G-SLA uses tristable, 
rather than bistable, flip-flops for the columns. The three stable column 
states are called E (empty), 0 and 1, and testing and setting of states is 
implemented in the same manner as in the standard SLA.
Each cell, which is at the intersection of some particular row and column 
of the G-SLA, may contain either a test-action character pair, an input 
character, an output character, or nothing. If the cell is empty, 
the row does not interact directly with the column. Otherwise, the characters 
in the cell specify a test which the column must meet for the row to be 
enabled, and an action to be performed on that column should the row be 
enabled. The tests in a row are conjunctive, i.e., tests of all cells in the 
row must be met for the row to be enabled.
The first character of a test-action character pair specifies the test. 
"E", "0" or "1" mean that the column must be in the "E", "0" or "1" state for 
the row to be enabled; a fourth test character means that the state of the 
column is irrelevant. The second character is the action character. "x", "r" 
or "s" mean that the column is to be put into the "E", "0" or "1" state if the 
row is enabled; a fourth action character means no action is to be performed.
The input and output characters are shorthand versions of test-action pairs 
which are especially suited for G-Net implementation. The input characters 
"0" and "1" are exactly equivalent to the test-action pairs "Ox" and "1x" 
respectively, while the output characters "r" and "s" are equivalent to "Er" 
and "Es". Note that test-action characters always occur in pairs while input 
or output characters occur singly within cells.
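The cell semantics can be illustrated with a small Python sketch that decides whether a row is enabled and applies its actions. The dictionary representation, the function names, and the use of "-" as a stand-in for the "no test" and "no action" character are assumptions made for this example.

```python
# Single input/output characters are shorthand for test-action pairs.
SHORTHAND = {"0": "0x", "1": "1x", "r": "Er", "s": "Es"}
# Action characters and the column state each one produces.
ACTION_RESULT = {"x": "E", "r": "0", "s": "1"}
DONT_CARE = "-"  # assumed stand-in for "no test" / "no action"


def expand(cell: str) -> str:
    """Return the two-character test-action pair for a cell entry."""
    return SHORTHAND.get(cell, cell)


def row_enabled(row: dict, state: dict) -> bool:
    """A row is enabled only if every cell's test matches its column state."""
    for column, cell in row.items():
        test = expand(cell)[0]
        if test != DONT_CARE and state[column] != test:
            return False
    return True


def fire(row: dict, state: dict) -> dict:
    """Return the column states after the row fires (unchanged if disabled)."""
    if not row_enabled(row, state):
        return dict(state)
    new_state = dict(state)
    for column, cell in row.items():
        action = expand(cell)[1]
        if action != DONT_CARE:
            new_state[column] = ACTION_RESULT[action]
    return new_state


if __name__ == "__main__":
    copy_a_one = {"a": "1", "b": "s"}  # input character 1, output character s
    print(fire(copy_a_one, {"a": "1", "b": "E"}))
    print(fire(copy_a_one, {"a": "1", "b": "0"}))  # b is occupied: no change
```

The first call moves the token from column a to column b; the second leaves the state untouched because the output column is not empty.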
It is assumed that rows and columns of the G-SLA may be split at any point. 
One tristable flip-flop is required for each section of column; this flip-flop 
is assumed to require an area one column wide and two rows tall, and may be 
located anywhere along the column section. The flip-flop area can not be 
crossed by any row and is indicated in the G-SLA program by a shaded region. 
The flip-flop will normally be initialized to the "E" state on power-up; 
writing a "0" or a "1" in a circle within the shaded flip-flop region will 
cause it to be initialized to the "0" or "1" state.
6.3 Mapping G-Nets into G-SLA Programs
The procedure for translating a G-Net into a G-SLA program is very similar 
to that for the standard SLA, but only a single column is needed for each 
place. The resulting, slightly simpler, procedure follows:
- For each place P in the net, a column p of the SLA is created.
- For each transition T which can assume a 1 value when firing, a row 
t1 is created which becomes activated for such a firing. Row t1 
should contain an entry at each column p_i corresponding to an input 
place P_i of T as follows:
* If P_i is unmarked or marked with a 1, enter a 1 at column p_i.
* If P_i is marked with a complementing circle or a 0, enter a 0 
at column p_i.
- Also, at each column p_o corresponding to an output place P_o of T, 
make an entry in row t1 as follows:
* If P_o is unmarked or marked with a 1, enter an s at column p_o.
* If P_o is marked with a complementing circle or a 0, enter an r
at column p_o.
- For each transition T which can assume a 0 value when firing, a row 
t0 is created which becomes activated for such a firing and contains 
an entry at each column p_i corresponding to an input place P_i of T 
as follows:
* If P_i is unmarked or marked with a 0, enter a 0 at column p_i.
* If P_i is marked with a complementing circle or a 1, enter a 1 
at column p_i.
- Also, at each column p_o corresponding to an output place P_o of T, 
make an entry in row t0 as follows:
* If P_o is unmarked or marked with a 0, enter an r at column p_o.
* If P_o is marked with a complementing circle or a 1, enter an s
at column p_o.
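A Python sketch of this G-SLA translation, under the same illustrative assumptions as before: arc markings are encoded as None (unmarked), 0 or 1 (explicit value) or "c" (complementing circle), a row is a dictionary from column name to cell entry, and the function names are hypothetical.

```python
def token_value(mark, firing_value: int) -> int:
    """Token value an arc carries when the transition fires with firing_value."""
    if mark is None:
        return firing_value
    if mark == "c":
        return 1 - firing_value
    return mark


def g_sla_row(firing_value: int, inputs, outputs) -> dict:
    """Build row t1 (firing_value=1) or t0 (firing_value=0) for one transition.

    inputs and outputs are lists of (place_name, mark) pairs; each place
    maps to a single tristable column of the same name.
    """
    row = {}
    for place, mark in inputs:
        # Input character: "1" or "0", the token value that must be present.
        row[place] = str(token_value(mark, firing_value))
    for place, mark in outputs:
        # Output character: "s" deposits a 1, "r" deposits a 0.
        row[place] = "s" if token_value(mark, firing_value) == 1 else "r"
    return row


if __name__ == "__main__":
    print(g_sla_row(1, [("a", None)], [("b", None)]))
    print(g_sla_row(0, [("a", None)], [("b", None)]))
    print(g_sla_row(1, [("a", None)], [("b", "c")]))  # complemented output
```

Each place needs only one column here, so a copy transition reduces to the two rows (1, s) and (0, r).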
6.4 G-SLA Program for an N-Bit Ripple-Carry Adder
Using the above translating procedure, G-SLA programs for the N-bit adder 
of Figure 5 are given in Figure 17 and Figure 18. Each program is composed of 
N nearly identical sections, each section handling one bit position, but in 
one program the sections are stacked vertically while in the other they are 
arranged horizontally. Such flexibility of form factor in G-SLA programs 
should prove extremely useful.
Basic add module for even bits (two output entries followed by three input 
entries per row):
( r r 0 0 0 )
( s r 0 0 1 )
( s r 0 1 0 )
( r s 0 1 1 )
( s r 1 0 0 )
( r s 1 0 1 )
( r s 1 1 0 )
( s s 1 1 1 )
Basic add module for odd bits
Figure 18: Horizontally organized adder G-SLA program
Notice the extensive use of split rows and columns in both programs, which 
highlights the fact that interconnections for this N-bit adder are highly 
local. That is, the maximum length of any row or column is some small fixed 
value (in these programs, 28 and 18) of elements independent of the word 
length, N. This locality is a direct result of the ripple-carry structure, 
which is feasible only because of the bit-level pipelining obtained with BDL. 
Such locality is an important beneficial feature for the design of fast, long 
word-length VLSI circuits.
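The rows of one add section follow the full-adder truth table: each combination of the three input tokens yields a sum token and a carry token, written with the output characters s (deposit a 1) and r (deposit a 0). The Python sketch below regenerates such rows. The column order (sum, carry, then the three inputs) and the function name are assumptions for illustration.

```python
from itertools import product


def add_module_rows() -> list[tuple]:
    """One row per input combination: (sum, carry, in1, in2, in3)."""
    rows = []
    for in1, in2, in3 in product((0, 1), repeat=3):
        total = in1 + in2 + in3
        sum_char = "s" if total % 2 == 1 else "r"
        carry_char = "s" if total >= 2 else "r"
        rows.append((sum_char, carry_char, in1, in2, in3))
    return rows


if __name__ == "__main__":
    for row in add_module_rows():
        print("(", *row, ")")
```

The eight rows are independent of the word length, which is why every section of the program can be a near copy of its neighbours.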
Figure 17: Vertically organized adder G-SLA program
7 Array-Organized Multiplier Module
The previous sections have introduced the BDL concept and the associated 
issues of notation, inherent pipelining, throughput, and implementation using 
the G-SLA. This section ties all of these issues together by presenting a 
start-to-finish design example of a non-trivial high-throughput multiplier 
module. The design will be presented in a structured, top-down manner.
7.1 Basic Operation
The left-shift-and-add algorithm (Figure 19) is used as a basis for this 
multiplier.
Figure 19: Basic multiplier Algorithm 
At each level the A operand is multiplied by the appropriate bit of the B 
operand and added to a partial result. The new partial result along with a 
shifted copy of the A operand is passed down to the next level. The LSB of 
the partial result at each level passes directly to become part of Z^, the 
lower order 4 bits of the result.
7.2 G-Net Description of Multiplier
Each level of computation can be performed by an "A.b" module which is an 
enhanced adder. Figure 20 gives a block diagram for the multiplier using 
these modules. Note that the first level A.b module does not use its P inputs 
and the last level does not produce any A' outputs, as indicated by the "X" 
mark on those lines.
Each A.b module can be characterized as follows:
- InputSkew(A) = InputSkew(P) = OutputSkew(A') = OutputSkew(P') = 1.
- InputSkew(a0,b0) = InputSkew(A,P) = 0.
- OutputSkew(P',a') = 1.
- Thickness(P,P') = 1.
The entire multiplier can also be characterized, as follows:
- InputSkew(A) = 1, InputSkew(B) = 2, InputSkew(a0,b0) = 0.
- OutputSkew(Z^) = 2, OutputSkew(Z^) = OutputSkew(z^,z^) = 1.
- Thickness(a0,z0) = 1.
The A.b module can be decomposed using "a.b" macro transitions as shown in 
Figure 21. Each a.b macro transition accepts p, a, b and c as inputs and
produces p', a', b' and c' as outputs. Outputs a' and b' are simply copies of 
the a and b inputs. Outputs p' and c' are the sum and carry, respectively, 
resulting from the expression p+c+ab. The G-Net decomposition for the a.b 
macro transition is not given since it is fairly tedious and the above 
provides a complete description.
Note that the most significant macro transition has no b' output and the 
least significant has no c input. Also, recall that the first and last level 
A.b modules are special. In the last level, a.b transitions should have no a' 
output. In the first level they should have no p input, hence no addition is 
performed and no c input or c' output is needed.
Figure 21: TNF G-net for A.b module 
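The behavior of a single a.b macro transition can be written down directly from its description. The Python sketch below treats each token as a bit and is only a functional model: it says nothing about the G-Net decomposition or timing, and the function name is an assumption.

```python
from itertools import product


def a_dot_b(p: int, a: int, b: int, c: int) -> tuple[int, int, int, int]:
    """Return (p', a', b', c') for one a.b macro transition.

    p' and c' are the sum and carry of p + c + a*b; a' and b' are copies
    of the a and b inputs.
    """
    total = p + c + (a & b)
    return total & 1, a, b, total >> 1


if __name__ == "__main__":
    for p, a, b, c in product((0, 1), repeat=4):
        print((p, a, b, c), "->", a_dot_b(p, a, b, c))
```

Since p + c + ab is at most 3, one sum bit and one carry bit always suffice.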
7.3 G-SLA implementation
The major consideration in the design of a G-SLA program for the above 
multiplier is the choosing of a physical layout which leads to efficient 
spatial arrangement of the modules. The layout presented, which might seem 
obvious in retrospect, required several attempts to evolve. Figure 22 gives a 
block diagram of the program showing how a.b macro transitions are 
concatenated to form A.b modules, A.b modules are concatenated to form the 
entire multiplier, and the interconnecting data paths. Each rectangular block 
represents an a.b macro transition G-SLA program; those along the periphery 
are special types and are labelled "a.bOL", "a.bL" and so on. The program for 




there are ways in which they could be made smaller but the form presented is 
felt to be most easily understood.
Again note the locality and regularity of interconnections in this program. 
It can be easily expanded to handle any given word length and will operate 
with the same throughput while requiring no greater fan in or fan out of the 
constituent elements.
8 Summary
Bit-Driven Logic, a rather unorthodox concept which may have considerable 
advantages for VLSI chip design, has been presented. Strong points of 
circuits using the BDL style of design include:
- Good locality, which aids design and implementation.
- A finely-grained mix of storage and logic, which automatically 
promotes extreme pipelining leading to high throughput given 
suitable computation problems.
- The asynchronous, data-driven philosophy at the heart of BDL 
eliminates clocks and the complex timing-design problems that 
accompany them.
The G-Net, a graph model similar to the Petri Net, was developed for 
drawing and modelling BDL circuits. Using G-Nets, throughput in acyclic BDL 
circuits was investigated and some basic results developed:
- In a simple linear chain (like a FIFO), best throughput can be 
achieved if half the places are full.
- The throughput of a parallel combination of two linear chains, 
synchronized at beginning and end, is best when the chains are of 
equal length.
- Using flat and skewed shims, a designer can interconnect BDL modules 
characterized by the properties of skew and thickness to achieve 
optimum throughput in an acyclic system.
- Time Normal Form is a very useful way of drawing G-Nets to 
illuminate throughput-limiting design deficiencies as well as skew 
and thickness properties of the net.
The SLA concept provides an elegant implementation vehicle for BDL. It is 
well-suited for implementation of BDL circuits for at least two reasons:
- Both concepts feature finely-grained mixtures of storage and logic.
(The rows and columns of the SLA, and the transitions and places of 
the G-Net, respectively.)
- The locality of BDL is beneficial to the SLA concept. Not only 
might the SLA cell’s internal design be simplified, but better 
opportunity to use split rows and columns is created.
To exploit this suitability, algorithms mapping G-Nets into standard SLA and
specially-tailored G-SLA programs were developed.
A complete design for an array-organized multiplier circuit demonstrates 











REFERENCES
Agrawal, D. P.
Optimum Array-Like Structures for High-Speed Arithmetic.
In 3rd Symposium on Computer Arithmetic, pages 208-219. IEEE 
Computer Society, November, 1975.
Davis, A. L.
Data Driven Nets: A Maximally Concurrent Procedural, Parallel 
Process Representation for Distributed Control Systems.
Technical Report UUCS-78-108, University of Utah, July, 1978.
Dennis, J. B. and Misunas, D. P.
A Computer Architecture for Highly Parallel Signal Processing.
In Proceedings of the ACM 1974 National Conference, pages 
402-409. ACM, November, 1974.
Keller, R. M., Lindstrom, G. and Patil, S. S.
A Loosely-Coupled Applicative Multi-Processing System.
In AFIPS Proceedings 48, pages 861-870. AFIPS, June, 1979.
Misunas, D. P.
Report on the Workshop on Data-Flow Computer and Program 
Organization.
Technical Memo MIT/LCS/TM-92, MIT Laboratory for Computer 
Science, November, 1977.
Contains a large bibliography of the field.
Misunas, D. P.
A Computer Architecture for Data-Flow Computation.
Technical Memo MIT/LCS/TM-100, MIT Laboratory for Computer 
Science, March, 1978.
Patil, S. S. and Welch, T.
A Programmable Logic Approach to VLSI.
In Seitz, C. L., editor, Proceedings of Caltech Conference on 
Very Large Scale Integration, pages 345-356. Caltech 
Computer Science Department, January, 1979.
