Static analysis techniques for the synthesis of efficient asynchronous circuits by Gopalakrishnan, Ganesh & Akella, Venkatesh





Venkatesh Akella
Department of Computer Science
University of Utah
Salt Lake City, UT 84112, USA
Ganesh Gopalakrishnan
Department of Computer Science 
University of Utah 
Salt Lake City, UT 84112, USA
October 7, 1991
Abstract
In the context of deriving asynchronous circuits from high-level descriptions, determining whether two actions are potentially concurrent (overlapped execution) or serial (non-overlapped execution) has several advantages. This knowledge can be used to implement shared variables efficiently, to support speculative guard evaluation, and to optimize resources (circuitry) by sharing. In a distributed environment with several concurrent processes, determining automatically whether two actions are potentially concurrent is often difficult to formulate and computationally expensive. In this paper, we present techniques to overcome these problems. First, we present a tool called parComp, which infers the composite behavior of a collection of modules; we then present an algorithm called conCur, which analyzes the inferred behavior to detect the seriality of two actions. Simple heuristics are presented for abstracting the inferred behavioral descriptions and improving the efficiency of conCur. The algorithms parComp and conCur are illustrated in the hopCP framework and implemented in Standard ML of New Jersey. Execution times of the algorithms are reported on a variety of examples. The results are quite encouraging.
¹Keywords: Performance-Directed Synthesis, Asynchronous Circuits, Static Analysis, Parallel Composition, Petri-Nets
²Supported in part by a University of Utah Graduate Research Fellowship
³Supported in part by NSF Award MIP-8902558
1 Introduction
Considerable work has been done in performance-directed synthesis of synchronous circuits from high-level descriptions [CW91]. Flow analysis techniques have been studied in the context of synchronous circuits for area-time optimizations in resource allocation and scheduling. Recently, there has been growing interest in the synthesis of asynchronous circuits [Mar89, BS89, Chu87, Ebe89]. Many of the popular asynchronous compilation approaches are based on a communicating sequential processes paradigm. Flow-analysis-based optimization is conspicuously absent in these approaches. In this paper, we identify one simple criterion, namely,
the detection of whether two actions a and b are serial or concurrent in the execution of a hardware module M which consists of submodules M1 ... Mn.
Actions a and b could occur in the sort (set of actions) of any of the submodules M1 ... Mn. Actions could denote communication actions as in CSP, like p?x and q!exp, computational aspects like x + y and (effective_addr op1 op2), or low-level register-transfer operations like ld_reg.Y, ALU_1.add, etc. We investigate the significance of this criterion in the performance optimization of asynchronous circuits. Then we present algorithms to statically detect this criterion in the context of a communicating sequential processes paradigm used in the specification of asynchronous circuits.
We illustrate our ideas within the hopCP design environment, which is currently being developed [AG91b]. hopCP is a simple language for the specification, simulation and synthesis of synchronous and asynchronous circuits. We restrict ourselves to asynchronous circuits in this paper. hopCP is best viewed as a first-order functional language augmented with features to support synchronous and asynchronous styles of value communication explicitly. A hopCP behavioral specification describes a concurrent state-transition system called HFG, or hopCP Flow Graph. The actions in the HFG denote communication (e.g. p?x, q!e) and computation (e.g. (x + y)) aspects of behavior. The control aspects are captured by sequencing (analogous to sequential composition in CSP) and choice (analogous to the alternate command in CSP) operators. In addition, we have a parallel operator which derives a composite HFG for a hardware module M from the HFGs of its submodules M1 ... Mn.
A distinctive feature of hopCP is the support for asynchronous communication through 
a restricted form of shared variables called asynchronous ports. In general, asynchronous
communication is considered hazardous in a distributed environment due to its propensity to 
cause deadlocks and metastable behavior. The static analysis scheme proposed in this paper 
can issue a priori warnings for the unsafe usage of asynchronous ports and also help implement 
them efficiently if the accesses to the asynchronous ports are indeed serial.
Determining whether two actions are serial or concurrent leads to the following optimizations in area and time of the resultant circuits: (i) mapping logical channels to physical channels, (ii) concurrent and speculative guard evaluation in the alternate command, (iii) detecting sharing to optimize resources, and (iv) efficient implementation of shared variables used to support asynchronous communication.
In general, it is difficult to determine statically whether two actions are serial or concurrent, because all possible interactions of the submodules have to be taken into account. We achieve this through behavioral inference, and present a tool called parComp which infers the composite behavior of a collection of hopCP modules. A formal definition of parComp and strategies to contain its complexity are discussed in this paper.
Reachability analysis of the inferred behavior with respect to the pair of actions in question reveals whether they are serial or concurrent. Naive reachability analysis can lead to two undesirable scenarios: (i) combinatorial explosion in the size of the reachability graph, and (ii) potentially infinitely many states in the reachability tree if the input flow graphs are not bounded (i.e. k-safe for some fixed k).
In this paper we present techniques to address both of the above problems. First, we prove that hopCP flow graphs are 1-safe. This results in a reachability graph with finitely many states (actually, finitely represented symbolic states). Then we present an abstraction mechanism to eliminate unnecessary states in the inferred behavior. This results in a contraction of the reachability graph (abstraction through the replacement of subgraphs containing more than one transition with a single transition). Thereafter, algorithm conCur performs efficient reachability analysis of the contracted inferred behavior and detects whether two actions are serial or concurrent.
Organization of the Paper
In the next section we present scenarios in asynchronous circuit compilation where the proposed strategy can be used to optimize the area or speed of the resultant circuits. Then, we briefly introduce the language hopCP and the HFG notation. This is followed by a formal description of the parComp and conCur algorithms. The implementation of the algorithms and their execution times on a variety of examples are presented next. Finally, we conclude with a summary of the significant contributions of the proposed work and directions for future work.
2 Application Scenarios
We identify the following optimization scenarios that can result in efficient asynchronous circuits, and present a unified suite of static analysis algorithms for addressing these optimizations.
Mapping From Logical to Physical Channels
In a CSP-based program, a pair of channels (p!, p?) is used for transmitting data between processes based on the rendezvous paradigm. A channel is typically implemented by a set of control wires to carry the signaling information and a set of data wires to carry the encodings of the data values. In this framework, two pairs of channels (p!, p?) and (q!, q?) are semantically unrelated: a communication on (p!, p?) does not affect one on (q!, q?), and vice versa. Most existing approaches to hardware compilation generate two distinct channel implementations in hardware to support (p!, p?) and (q!, q?). However, this may not always be necessary. Suppose that, in a certain context, it is guaranteed that communications through the p and q channels are serial; then the hardware resources supporting these channels can be shared. The question now is how to determine whether p and q are used serially.
In hopCP, we address this problem by assuming that all the channel names in the initial 
hopCP specification denote logical channels. A logical channel may or may not have a direct 
manifestation in the final circuit. The channels in the final circuit are called physical channels. 
More than one logical channel can be mapped into a single physical channel if it is guaranteed 
that the corresponding actions are serial. This optimization has two significant consequences: firstly, it makes the specifications more abstract, and secondly, it facilitates sharing of buses as in standard synchronous circuit synthesis (albeit at the cost of extra multiplexors and control logic).
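As an illustration only, mapping logical channels onto physical channels can be phrased as grouping the channels so that no two channels in the same group are ever used concurrently. The Python sketch below assumes that the outcome of the seriality analysis is available as a set of concurrent channel pairs; the function name `map_channels`, this input format, and the greedy first-fit grouping are our assumptions rather than part of hopCP, and the greedy strategy does not guarantee the minimum number of physical channels.

```python
from typing import Dict, FrozenSet, Iterable, List, Set


def map_channels(channels: Iterable[str],
                 concurrent_pairs: Set[FrozenSet[str]]) -> Dict[str, int]:
    """Greedy first-fit assignment of logical channels to physical channels.

    Two logical channels may share a physical channel only if they are serial,
    i.e. their pair does not appear in `concurrent_pairs` (a hypothetical input
    produced by a seriality analysis).
    """
    groups: List[List[str]] = []      # groups[k] = logical channels on physical channel k
    assignment: Dict[str, int] = {}
    for ch in channels:
        for k, group in enumerate(groups):
            # Reuse physical channel k only if ch is serial with every member.
            if all(frozenset((ch, other)) not in concurrent_pairs for other in group):
                group.append(ch)
                assignment[ch] = k
                break
        else:
            # No compatible group: open a new physical channel.
            groups.append([ch])
            assignment[ch] = len(groups) - 1
    return assignment


assignment = map_channels(["p", "q", "r"], {frozenset(("p", "r"))})
print(assignment)  # {'p': 0, 'q': 0, 'r': 1}
```

With p and r concurrent but q serial with both, p and q share physical channel 0, while r needs its own physical channel 1.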
Concurrent and Speculative Guard Evaluation
Consider an instance of the alternate command in the description of a module M  in a CSP-like 
language,
( . . . p?x -> Q [] q?y -> R )
In the above expression it is not apparent whether the context (environment) of M would generate p! and q! concurrently or not. Existing asynchronous compilation methodologies (e.g. [BS89]) typically synthesize a circuit that checks for the arrival of the guard communications in a round-robin fashion and then picks the first one that succeeds. (For fairness, one could remember the last guard that succeeded and start the round-robin search from there.) Another technique typically employed involves arbiters. However, arbiters are overkill if the guards are indeed serial; if they are not serial, then of course an arbiter or a round-robin-based mechanism is essential. Sequential guard evaluation such as the above has a potential disadvantage: it could incur a penalty that is linear in the number of arms of the alternate command. An alternative approach is to try all the guards concurrently and proceed with the guard which succeeds. But concurrent guard evaluation has a caveat: one should be able to determine statically that the actions in question, p! and q!, are serial, and one should have a mechanism in hardware to expunge the partial evaluation of the unsuccessful guards. One could adopt the strategy suggested in [Ebe89], where a CAL component is used to wait concurrently for the communication actions. (A CAL component is also known as a decision-wait element in the literature and is discussed by Molnar et al. in [CEM85].) Concurrent guard evaluation could avoid the cost of round-robin checking.
Consider a more complex alternate command which has a mixture of boolean expressions 
and communication actions for guards.
( E1, p?x -> P [] E2, q?y -> Q )
Let the evaluation of expression E1 require a more complex computation (take more time) than that of E2. Further, assume that in one scenario q? is going to synchronize. In that scenario, the evaluation of E1 followed by the check for the arrival of p! (which is not going to succeed anyway in this scenario) is wasted work. A better approach would be to start the evaluation of E1 and E2 concurrently, to wait for the respective communication actions as each evaluation finishes, and then, when they occur (exactly one is guaranteed to occur by the seriality check), to fire the CAL component at its appropriate input. This would allow P or Q (whichever matters in the given scenario) to be started the earliest.
In the example scenario presented above, the evaluation of E1 was wasted work because only q! was guaranteed to occur. This did not slow down the evaluation of E2, because it is assumed that dedicated hardware exists for the computation of E1 and E2. In the functional programming literature, this kind of evaluation is referred to as speculative evaluation. Speculative evaluation is not a novel idea in hardware design either; for example, the familiar carry-select adder is based on it. What we propose in this paper is a mechanism to support speculative evaluation in an asynchronous compilation framework via static analysis of the behavioral descriptions.
Sharing Resources
Sharing resources by detecting seriality constraints is a well-known idea in high-level synthesis 
of synchronous circuits. We propose to integrate it into an asynchronous compilation methodology by appealing to the same static analysis tools as for the other optimizations discussed in this section. Briefly, sharing of resources by detecting seriality constraints can be explained as follows. Consider two invocations of the “+” operation in a flowgraph representation used in high-level synthesis. These two invocations can be supported by one physical adder if it can be guaranteed that they are serial in all possible circumstances (for all input data values, system states and system environments).
Efficient Implementation of Shared Variables
In hopCP we support a restricted form of asynchronous communication by using shared port 
variables called asynchronous ports. In general, CSP-like languages do not allow shared variables. Occam allows shared variables but does so in a limited way: a variable can be written only in serial threads; then, when the serial thread splits into concurrent threads, the variable may only be read in the concurrent threads. The usage of shared variables allowed by hopCP is more general, with the proviso that exactly one hopCP module owns an asynchronous port A (i.e. it can write into it), and all other modules use A in a read-only manner. Asynchronous communication via shared variables enhances the expressive power of hopCP. It enables us to model common hardware scenarios like busy-waiting (which involves polling for a certain condition) and status signals (which are written once and could be read several times by different processes) very elegantly.
Reliable communication through asynchronous ports can be guaranteed in only one of two ways:
• Guarantee that the writes occur before any reads.
• Use an arbiter-based circuit such as the ATS module described in [Kel74]. If the inputs to the ATS module are signaled simultaneously, the module acts as if one input occurred and then the other. In other words, it serializes the inputs in hardware, at run time.
Of the two alternatives, the first (guaranteeing the ordering statically) is cheaper, as it does not involve the use of an arbiter. Hence, the check for seriality is once again central to the efficient implementation of asynchronous ports in hopCP.
Section Summary
The common basis for all the four optimizations suggested in this section is the ability to detect 
statically from the behavioral description of the module, whether two given actions are serial 
or concurrent. Naive approaches to the detection of seriality can either lead to combinatorial explosion or miss many opportunities to detect serial usage. Combinatorial explosion can result because many of the techniques to detect seriality are centered on the reachability analysis paradigm. This is tackled in the hopCP framework by restricting hopCP flow graphs to be one-safe and by employing heuristic-based pruning of the composite hopCP flow graphs. The details are presented later in this paper. The second, and more serious, problem underlying the feasibility of the above optimizations is that, unless the context (environment) of a module is known, it is not possible to tell whether two actions within the module definition are serial or not. To do this properly, we need a tool to analyze the combined executions of the collection of processes that constitute the system description, perhaps even including a process that models the abstracted environment. We have developed and implemented such a tool, called parComp. Briefly, parComp deduces the composite flow graph of a parallel composition of several hopCP modules. It is analogous in effect to the expansion rule in CCS, which statically composes a set of CCS agents. Our algorithm differs from the expansion rule in that it handles compound actions, value communication, multiway rendezvous and asynchronous communication, which are salient features of hopCP. Details of parComp are presented in Section 4. In the rest of the paper we present the definition, implementation, and performance results of the static analysis tools in the hopCP framework.
3 Introducing hopCP
hopCP is a notation to describe a concurrent state-transition system called hopCP Flow 
Graphs, augmented with features to model computation in a purely functional style, and 
mechanisms to support synchronous and asynchronous communication. In hopCP, hardware 
is modeled by a structural entity called a module which contains two parts: (i) a behavioral 
entity which captures the state-transition system describing the hardware in question, and
(ii) a set of communication ports with which the hardware interacts with its environment. 
A hopCP specification has six sections. The MODULE section introduces the name of the module being described, the TYPES section introduces the data types of the communication ports used, and the SYNCPORT section declares all the synchronous communication ports used in the specification. A synchronous port allows rendezvous-style communication as in CSP. The ASYNCPORT section declares all the asynchronous communication ports used. An asynchronous port allows value communication between two modules without explicit synchronization (note that synchronization with the resources implementing the communication may still be necessary). The FUNCTION section contains the user-defined functions used in the specification. The functions are written in a first-order functional language, using the syntax of Standard ML of New Jersey. The BEHAVIOR section describes the state-transition system which captures the behavior of the hardware system being specified. This state-transition system is called an HFG and is discussed in detail in the following subsection.
Informal Semantics of HFG
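A hopCP flow graph is formally defined as a record with two fields:
$$
\begin{aligned}
\mathit{HFG} &= \{\mathit{istate} : \mathcal{P}(\mathit{State}),\ \mathit{trel} : \mathcal{P}(\mathit{Transition})\} \\
\mathit{Transition} &= \mathcal{P}^{+}(\mathit{State}) \times (\mathit{CompoundAction} \oplus \mathit{Guard}) \times \mathcal{P}^{*}(\mathit{State}) \\
\mathit{State} &= \mathit{ProcName} \times \mathit{LocalStore}
\end{aligned}
$$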




byte vector 8 of bit
SYNCPORT
a?, b! : byte
FUNCTION
(* index(a,0) extracts the bit at position 0 in the word a;
   update(b,2,0) sticks a 0 in at bit position 2 in the word b
*)




<= a?y -> b!(f x y) -> P [y])
Figure 1: hopCP Specification of a Simple Pipeline Stage
where $\mathit{Guard}$ and $\mathit{CompoundAction}$ are syntactic objects, and $\mathcal{P}$ and $\oplus$ denote the power-set and disjoint-sum operations on sets, respectively. Guards in hopCP include input communication actions and Boolean expressions. $\mathit{istate}$ denotes the set of initial states of the specification, and $\mathit{trel}$ denotes the set of transitions in the HFG. A transition $tr \in \mathit{Transition}$ is a triple $(\mathit{pre}(tr), \mathit{act}(tr), \mathit{post}(tr))$, where $\mathit{pre}(tr)$ denotes a set of states called the precondition of the transition, $\mathit{post}(tr)$ denotes a set of states called the postcondition of the transition, and $\mathit{act}(tr)$ denotes the action of the transition. The execution semantics of an HFG are similar to those of a Petri net. Let $tr \in \mathit{Transition}$; if $tr$ is enabled (i.e. execution reaches $\mathit{pre}(tr)$), then the system performs the actions $\mathit{act}(tr)$ and execution reaches $\mathit{post}(tr)$. Note that no notion of clocks or time is associated with the performance of the actions $\mathit{act}(tr)$. Also note that if more than one $tr \in \mathit{Transition}$ is enabled, they can perform their respective actions concurrently.
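The execution semantics just described can be sketched in Python as a Petri-net-style firing rule. This is a hypothetical encoding, not the hopCP implementation: a configuration is modelled as a frozenset of control-state names (local stores are omitted), a transition is enabled when its precondition is contained in the configuration, and firing replaces pre(tr) by post(tr). The example is the control skeleton of the pipeline stage in Figure 1.

```python
from dataclasses import dataclass
from typing import FrozenSet, List


@dataclass(frozen=True)
class Transition:
    pre: FrozenSet[str]    # precondition: a set of control states
    act: str               # action label, kept symbolic (e.g. "a?y")
    post: FrozenSet[str]   # postcondition: a set of control states


def enabled(config: FrozenSet[str], trel: List[Transition]) -> List[Transition]:
    """Transitions whose precondition is contained in the current configuration."""
    return [tr for tr in trel if tr.pre <= config]


def fire(config: FrozenSet[str], tr: Transition) -> FrozenSet[str]:
    """Perform act(tr): leave pre(tr) and reach post(tr). No notion of time is modelled."""
    if not tr.pre <= config:
        raise ValueError(f"{tr.act} is not enabled")
    return (config - tr.pre) | tr.post


# Control skeleton of the pipeline stage: P --a?y--> Q --b!(f x y)--> P
trel = [Transition(frozenset({"P"}), "a?y", frozenset({"Q"})),
        Transition(frozenset({"Q"}), "b!(f x y)", frozenset({"P"}))]

config = frozenset({"P"})
trace = []
for _ in range(2):
    (tr,) = enabled(config, trel)   # exactly one transition is enabled in this example
    trace.append(tr.act)
    config = fire(config, tr)
print(trace, sorted(config))  # ['a?y', 'b!(f x y)'] ['P']
```

With several independent components in the configuration, more than one transition can be enabled at once, which is where the concurrency mentioned above comes from.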
Figure 2: hopCP Flow Graph (HFG) of hopCP Specification in Figure 1
We illustrate the features of hopCP using the example of a pipeline stage. Figure 1 shows a complete hopCP specification of the pipeline stage. It does not have an ASYNCPORT section.
It declares an input synchronous channel a and an output synchronous channel b, both of type byte. Figure 2 shows the HFG corresponding to the hopCP specification in Figure 1; it is described textually as follows:
$$
hfg_1 = \{\mathit{istate} = \{(P, [x])\},\ \mathit{trel} = \{((P, [x]),\ a?y,\ (Q, [x, y])),\ ((Q, [x, y]),\ b!(f\ x\ y),\ (P, [y]))\}\}
$$
It is more convenient to draw pictures to denote HFGs, where circles denote the control state names (ProcName) and “bars” denote the actions. The HFG is interpreted (read) as follows. Module ex1 is initially in a state (P, [x]), where P is the control state (known as ProcName), which is analogous to the program counter in conventional computer architecture terminology, while x is the datapath state (also known as LocalStore), which is a snapshot of its relevant internal state. In the state (P, [x]) the system can engage in a communication action a?y, which will henceforth be referred to as a data query, and go to a state denoted by (Q, [x, y]). The data query a?y denotes a synchronous communication action: read from the synchronous input channel a. Note that by performing the action a?y the internal state of the system is modified to include the value received on channel a, which is reflected by the presence of the variable y in the state (Q, [x, y]). We could also have a synchronous communication action without value communication, i.e. merely a?, which is referred to as an input control action. We have just described the execution of the system via the transition ((P, [x]), a?y, (Q, [x, y])). As a consequence of this execution, the transition ((Q, [x, y]), b!(f x y), (P, [y])) is enabled. In the state (Q, [x, y]), the module can perform the communication action b!(f x y) and proceed to the state denoted by (P, [y]). b!expr, where expr ∈ EXPR (the domain of expressions allowed by hopCP syntax), is said to be a data assertion and represents the synchronous communication action of outputting the value denoted by the expression expr on the channel b. A data assertion without value communication, for example just b!, is called an output control action. In the example, expr is the application of the user-defined function f to the arguments x (the original internal state) and y (received as a consequence of the action a?y). The function f could involve arbitrary computation and is expressed in a purely first-order functional language. hopCP has a wide repertoire of bit-level manipulation routines commonly used in hardware systems, like lshift, rshift, exor, subvector, index_vector, update_vector, parity, etc. (P, [y]) denotes the fact that the system goes back to the same control state P (as the initial state), but the datapath state is now y instead of x. In a programming language sense, this could be viewed as invoking a function P with the actual parameter y for the formal parameter x.
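Read as recursion, the behavior P[x] <= a?y -> b!(f x y) -> P[y] passes the received value along as the new datapath state. The Python sketch below mimics this for a finite input sequence on channel a, with the tail call unrolled into a loop. The channel is modelled as a plain list, and both the initial value of x and the body of f are hypothetical: f is taken to be bitwise exclusive-or purely for illustration.

```python
from typing import Callable, List


def run_stage(f: Callable[[int, int], int], x: int, inputs_a: List[int]) -> List[int]:
    """Simulate P[x] <= a?y -> b!(f x y) -> P[y] over a finite input stream.

    Each loop iteration corresponds to one 'call' of P: a data query on a,
    a data assertion on b, and a tail call P[y] that makes y the new state.
    """
    outputs_b: List[int] = []
    for y in inputs_a:              # a?y: receive the next value on channel a
        outputs_b.append(f(x, y))   # b!(f x y): send f applied to old and new values
        x = y                       # P[y]: y becomes the datapath state
    return outputs_b


# Hypothetical f: bitwise exclusive-or of the previous and current bytes.
print(run_stage(lambda x, y: x ^ y, 0, [0b0011, 0b0101, 0b0101]))  # [3, 6, 0]
```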
Salient Features of hopCP
The other significant features of hopCP which could not be illustrated in the above example 
are:
Compound Actions: A compound action ca in hopCP is a set of primitive actions $a_1, a_2, \ldots, a_m$, with the restriction that all $a_i, a_j \in ca$ should be non-interfering, i.e. no two $a_i$ and $a_j$ should use the same channel or try to update the same datapath variable. The underlying semantics of a compound action is that of a fork-join construct.
Multiway Rendezvous: Multiway rendezvous is said to occur when more than two processes (modules) wait for each other (synchronize) and communicate. Multiway rendezvous is a powerful notion which facilitates the specification of a wide variety of concurrent algorithms very naturally [Cha87]. It subsumes the broadcast style of communication (point-to-multipoint communication), which is very natural in hardware.
Asynchronous Communication: This is facilitated by special channels called asynchronous ports. Their scope and implementation were discussed in the previous section.
Functional Sublanguage: A significant feature of hopCP is the facility to specify computational aspects of hardware behavior in a functional language. This allows us to obtain maximally parallel implementations of datapaths, and also facilitates formal reasoning about hopCP specifications.
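The non-interference restriction on compound actions can be checked mechanically. The Python sketch below uses a hypothetical encoding: each primitive action is summarized by the channel it uses (if any) and the set of datapath variables it updates, and a compound action is accepted when no two constituents share a channel or an updated variable.

```python
from dataclasses import dataclass
from itertools import combinations
from typing import FrozenSet, Iterable, Optional


@dataclass(frozen=True)
class Prim:
    label: str                               # e.g. "a?y" or "b!x"
    channel: Optional[str] = None            # channel used, if any
    updates: FrozenSet[str] = frozenset()    # datapath variables written


def non_interfering(ca: Iterable[Prim]) -> bool:
    """True iff no two primitive actions of the compound action interfere."""
    for p, q in combinations(list(ca), 2):
        if p.channel is not None and p.channel == q.channel:
            return False                     # two actions on the same channel
        if p.updates & q.updates:
            return False                     # two updates of the same variable
    return True


ok = [Prim("a?y", "a", frozenset({"y"})), Prim("b!x", "b")]
clash = [Prim("a?y", "a", frozenset({"y"})), Prim("c?y", "c", frozenset({"y"}))]
print(non_interfering(ok), non_interfering(clash))  # True False
```

Because the check is pairwise, a compound action that passes it can be executed as a fork-join of its constituents without ordering them.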
4 Parallel Composition
The “||” operator specifies concurrent behavior in hopCP. It defines the interaction of independently specified hopCP specifications. In this section we define the interaction of hopCP specifications by describing a tool called parComp. hopCP modules interact via communication actions. The interaction could be synchronous (via handshake or rendezvous) or asynchronous (via a global store). Synchronous interaction is possible when modules are willing to perform complementary actions (query/assertion) on the same channel. When exactly one module is willing to perform a query for a given assertion, we have a two-way rendezvous (as in CCS, for example); when more than one module is willing to perform a query for a given assertion, we have a multiway rendezvous. Multiway rendezvous is a powerful notion for expressing several hardware-oriented algorithms very naturally. Next we provide the formal definition of parComp and illustrate it with an example.
Formal Definition of parComp
First we define an auxiliary function conjugate which checks whether two primitive actions can synchronize. Two primitive actions synchronize when they are complementary and both use the same channel. For example, conjugate(a?x, a!(p+1)) and conjugate(a?, a!) yield true, while conjugate(a?x, b!(p+1)), conjugate(a?x, a?y) and conjugate(a!x, a!(p+1)) yield false. A formal definition of conjugate is omitted to conserve space. In hopCP, parallel composition is complicated by the presence of compound actions. This is handled by the following definitions, which determine whether a pair of compound actions a and b is synchronous or asynchronous.
$$
\begin{aligned}
\mathit{synchronous}(a, b) &= \bigl(\forall x \in a.\ \exists y \in b.\ \mathit{conjugate}(x, y)\bigr) \land \bigl(\forall x \in b.\ \exists y \in a.\ \mathit{conjugate}(x, y)\bigr) \\
\mathit{asynchronous}(a, b) &= \neg\bigl(\exists x \in a.\ \exists y \in b.\ \mathit{conjugate}(x, y)\bigr)
\end{aligned}
$$
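To make the three predicates concrete, the Python sketch below encodes a primitive action as a (channel, direction, payload) triple, where direction is "?" or "!" and payload is an optional variable or expression string. Since the paper omits its formal definition of conjugate, this encoding is an assumption chosen to reproduce the conjugate examples above; it additionally assumes that a data action only pairs with a data action and a control action only with a control action. The usage shows that a pair of compound actions can be neither synchronous nor asynchronous, which is the partial-synchronization case.

```python
from typing import FrozenSet, NamedTuple, Optional


class Act(NamedTuple):
    channel: str
    direction: str                  # "?" for queries, "!" for assertions
    payload: Optional[str] = None   # variable or expression text; None for control actions


def conjugate(x: Act, y: Act) -> bool:
    """Complementary actions on the same channel (data pairs with data, control with control)."""
    return (x.channel == y.channel
            and {x.direction, y.direction} == {"?", "!"}
            and (x.payload is None) == (y.payload is None))


def synchronous(a: FrozenSet[Act], b: FrozenSet[Act]) -> bool:
    """Every constituent of a has a conjugate in b, and vice versa."""
    return (all(any(conjugate(x, y) for y in b) for x in a)
            and all(any(conjugate(x, y) for y in a) for x in b))


def asynchronous(a: FrozenSet[Act], b: FrozenSet[Act]) -> bool:
    """No constituent of a has a conjugate in b."""
    return not any(conjugate(x, y) for x in a for y in b)


a = frozenset({Act("a", "?", "x"), Act("c", "!", "z")})
b = frozenset({Act("a", "!", "p+1")})
print(synchronous(a, b), asynchronous(a, b))  # False False: partial synchronization
```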
parComp is a function which composes two concurrent state-transition systems (HFGs). It uses the auxiliary functions ValueComm, to perform value communication, and RetainAsOutput, to facilitate multiway rendezvous. The auxiliary functions are defined as follows, where (isDquery x) and (isDassert x) are predicates which check whether the action x is a data query or a data assertion, respectively.
$$
\mathit{RetainAsOutput}(a, b) = \{\, x \mid (x \in a \land \mathit{isDassert}(x) \land \exists y \in b.\ \mathit{conjugate}(x, y)) \lor (x \in b \land \mathit{isDassert}(x) \land \exists y \in a.\ \mathit{conjugate}(x, y)) \,\}
$$
ValueComm takes a pair of states $(s_1, s_2)$ denoting the preconditions of the transitions being composed, a pair of conjugate actions $(a, b)$, and a pair of states $(s_1', s_2')$ denoting the postconditions of the transitions being composed, and updates the local store of $s_1'$ or $s_2'$ depending on the actions $a$ and $b$. It is defined recursively as follows:
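$$
\begin{array}{l}
\mathit{ValueComm}(s_1, s_2, a, b, s_1', s_2') = \\
\quad \mathbf{let} \\
\qquad a = \{a_1, \ldots, a_n\}, \quad b = \{b_1, \ldots, b_m\} \\
\qquad s_1 = (P_1, \sigma_1), \quad s_2 = (P_2, \sigma_2), \quad s_1' = (P_1', \sigma_1'), \quad s_2' = (P_2', \sigma_2') \\
\qquad a_i = c_i?x_i, \quad b_i = c_i!e_i, \quad a_j = d_j!e_j, \quad b_j = d_j?x_j \\
\quad \mathbf{in} \\
\qquad \mathbf{if}\ (a_i \in a) \land \mathit{isDquery}(a_i) \land (\exists b_i \in b.\ \mathit{conjugate}(a_i, b_i)) \land ((FD_2, \sigma_2, \sigma_{g2}, e_i) \Rightarrow v_i) \\
\qquad \mathbf{then}\ \mathit{ValueComm}(s_1, s_2, a \setminus \{a_i\}, b \setminus \{b_i\}, (P_1', \sigma_1'[v_i/x_i]), s_2') \\
\qquad \mathbf{else\ if}\ (a_j \in a) \land \mathit{isDassert}(a_j) \land (\exists b_j \in b.\ \mathit{conjugate}(a_j, b_j)) \land ((FD_1, \sigma_1, \sigma_{g1}, e_j) \Rightarrow v_j) \\
\qquad \mathbf{then}\ \mathit{ValueComm}(s_1, s_2, a \setminus \{a_j\}, b \setminus \{b_j\}, s_1', (P_2', \sigma_2'[v_j/x_j])) \\
\qquad \mathbf{else}\ (s_1', s_2') \\
\quad \mathbf{end\ if}
\end{array}
$$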
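The recursion in ValueComm can be sketched in Python under simplifying assumptions: a state is a (ProcName, LocalStore) pair with the store as a dictionary, an assertion carries its expression as a Python function of the store, only data actions are handled, and the function definitions FD and the global store are not modelled. The names `value_comm` and `retain_as_output` are hypothetical; `retain_as_output` follows the prose description of RetainAsOutput (it keeps the output counterparts of the synchronized actions). The usage replays the b-channel handshake of a two-stage pipeline, with f assumed to be addition.

```python
from typing import Callable, Dict, List, NamedTuple, Tuple, Union

Store = Dict[str, int]
State = Tuple[str, Store]                        # (ProcName, LocalStore)


class Act(NamedTuple):
    channel: str
    direction: str                               # "?" data query, "!" data assertion
    payload: Union[str, Callable[[Store], int]]  # variable name for "?", expression for "!"


def conjugate(x: Act, y: Act) -> bool:
    return x.channel == y.channel and {x.direction, y.direction} == {"?", "!"}


def value_comm(s1: State, s2: State, a: List[Act], b: List[Act],
               s1p: State, s2p: State) -> Tuple[State, State]:
    """Bind each received value in the receiver's postcondition store.

    Expressions are evaluated in the sender's precondition store.
    """
    for ai in a:
        for bi in b:
            if not conjugate(ai, bi):
                continue
            rest_a = [x for x in a if x is not ai]
            rest_b = [y for y in b if y is not bi]
            if ai.direction == "?":              # a_i = c?x_i, b_i = c!e_i
                v = bi.payload(s2[1])            # e_i evaluated in sigma_2
                s1p = (s1p[0], {**s1p[1], ai.payload: v})
            else:                                # a_j = d!e_j, b_j = d?x_j
                v = ai.payload(s1[1])            # e_j evaluated in sigma_1
                s2p = (s2p[0], {**s2p[1], bi.payload: v})
            return value_comm(s1, s2, rest_a, rest_b, s1p, s2p)
    return s1p, s2p                              # no conjugate pair left


def retain_as_output(a: List[Act], b: List[Act]) -> List[Act]:
    """Keep the assertions that found a conjugate partner."""
    return ([x for x in a if x.direction == "!" and any(conjugate(x, y) for y in b)]
            + [x for x in b if x.direction == "!" and any(conjugate(x, y) for y in a)])


# Channel b between two pipeline stages; f is assumed to be addition here.
send = Act("b", "!", lambda st: st["x1"] + st["y1"])
recv = Act("b", "?", "y2")
post1, post2 = value_comm(("Q1", {"x1": 2, "y1": 5}), ("P2", {"x2": 0}),
                          [send], [recv], ("P1", {"x1": 5}), ("Q2", {"x2": 0}))
print(post2)  # ('Q2', {'x2': 0, 'y2': 7})
```

Retaining the assertion `send` in the composite action is what lets a further module with a query on b join the same rendezvous when the composite is composed again.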
Using the auxiliary functions defined above, $(\mathit{parComp}\ hfg_1\ hfg_2) = hfg_3$, where
$$
\begin{aligned}
hfg_1 &= \{\mathit{istate} = is_1,\ \mathit{trel} = tr_1\} \\
hfg_2 &= \{\mathit{istate} = is_2,\ \mathit{trel} = tr_2\} \\
hfg_3 &= \{\mathit{istate} = is_1 \cup is_2,\ \mathit{trel} = tr_3\}
\end{aligned}
$$
In defining $tr_3$, we also build two “temporary” sets of transitions, $tr_1'$ and $tr_2'$. All three sets ($tr_1'$, $tr_2'$, and $tr_3$) are inductively defined by the following rules.
(i) One rule for building $tr_1'$:
$$\frac{t \in tr_1}{t \in tr_1'}$$
(ii) Another rule for building $tr_2'$:
$$\frac{t \in tr_2}{t \in tr_2'}$$
(iii) One rule for building $tr_3$: Case “Total Synchronization”
$$\frac{(s_1, a, s_1') \in tr_1' \land (s_2, b, s_2') \in tr_2' \land \mathit{synchronous}(a, b)}{(s_1 \cup s_2,\ \mathit{RetainAsOutput}(a, b),\ \mathit{ValueComm}(s_1, s_2, a, b, s_1', s_2')) \in tr_3}$$
This rule is applied when the compound actions a and b synchronize completely. ValueComm performs the value communication across the HFGs, while RetainAsOutput retains the output counterparts of the synchronized actions to facilitate multiway rendezvous.
(iv) The other rule for building $tr_3$: Case “No Synchronization”
$$\frac{(s_1, a, s_1') \in tr_1' \land (s_2, b, s_2') \in tr_2' \land \mathit{asynchronous}(a, b)}{\{(s_1, a, s_1'),\ (s_2, b, s_2')\} \subseteq tr_3}$$
This rule is applied when none of the constituents of a and b can synchronize. It reflects the fact that both transitions can be performed concurrently. Note that we are not interleaving the transitions. This rule plays a key role in keeping the space and time complexity of parallel composition linear in the number of states. For example, if HFG $h_1$ has $m$ states and HFG $h_2$ has $n$ states, and all the actions of $h_1$ and $h_2$ are different, then the total number of states in $\mathit{parComp}(h_1, h_2)$ is $O(m + n)$, as opposed to the $O(m \times n)$ that an interleaving rule would give.
(v) The last rule for building $tr_1'$ and $tr_2'$: Case “Partial Synchronization”
Let $a = a_1 \cup a_2$ and $b = b_1 \cup b_2$. Then,
$$\frac{(s_1, a, s_1') \in tr_1' \land (s_2, b, s_2') \in tr_2' \land \mathit{synchronous}(a_1, b_1) \land \mathit{asynchronous}(a_2, b_2)}{\mathit{prefine}(s_1, a, a_1, a_2, s_1') \subseteq tr_1' \land \mathit{prefine}(s_2, b, b_1, b_2, s_2') \subseteq tr_2'}$$
where
$$
\begin{aligned}
\mathit{prefine}(s_1, a, a_1, a_2, s_1') &= \{(s_1, a', \{s_{a_1}, s_{a_2}\}),\ (s_{a_1}, a_1, \ldots),\ \ldots\} \\
\mathit{prefine}(s_1, a, \emptyset, a_2, s_1') &= \{(s_1, a, s_1')\} \\
\mathit{prefine}(s_1, a, a_1, \emptyset, s_1') &= \{(s_1, a, s_1')\}
\end{aligned}
$$
This rule handles the remaining case, not covered by (iii) and (iv), in which the compound actions a and b synchronize partially. The definition is based on the interpretation of compound actions as sets of primitive actions and on pattern matching over the structure of the compound actions. We partition a and b to extract the components (themselves compound actions) which synchronize and which do not, and recursively invoke rules (iii) and (iv). The partitioning is done by appealing to the refinement of a compound action into its corresponding HFG (describing its fork-join structure). The intermediate states and actions introduced in the prefine definition have significance in the synthesis of circuits from hopCP specifications [AG91a].
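The complexity remark under rule (iv) can be checked on a small case. The hypothetical Python sketch below builds two cyclic control skeletons with m and n states and pairwise distinct actions. Under rule (iv) the composite simply keeps both transition sets side by side, so it mentions m + n control states; an interleaving composition instead enumerates pairs of states, and a breadth-first search finds m × n reachable product states. Local stores and value communication are ignored.

```python
from collections import deque
from typing import List, Set, Tuple

Trans = Tuple[str, str, str]   # (pre state, action, post state); single states for simplicity


def ring(prefix: str, k: int) -> List[Trans]:
    """Cyclic skeleton s0 -> s1 -> ... -> s(k-1) -> s0 with distinct actions."""
    return [(f"{prefix}{i}", f"{prefix}_act{i}", f"{prefix}{(i + 1) % k}") for i in range(k)]


def states_rule_iv(t1: List[Trans], t2: List[Trans]) -> Set[str]:
    """Rule (iv): nothing synchronizes, so tr3 = tr1 + tr2 and the states are unioned."""
    return {s for (pre, _, post) in t1 + t2 for s in (pre, post)}


def states_interleaved(t1: List[Trans], t2: List[Trans],
                       init: Tuple[str, str]) -> Set[Tuple[str, str]]:
    """Interleaving product: explore pairs (s, u), moving one component at a time."""
    seen = {init}
    work = deque([init])
    while work:
        s, u = work.popleft()
        succs = [(post, u) for (pre, _, post) in t1 if pre == s]
        succs += [(s, post) for (pre, _, post) in t2 if pre == u]
        for nxt in succs:
            if nxt not in seen:
                seen.add(nxt)
                work.append(nxt)
    return seen


m, n = 4, 5
h1, h2 = ring("p", m), ring("q", n)
print(len(states_rule_iv(h1, h2)), len(states_interleaved(h1, h2, ("p0", "q0"))))  # 9 20
```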
The following example illustrates parComp. Consider the specification of a two-stage pipeline obtained by composing two copies of the one-stage pipeline illustrated in Figures 1 and 2. The composite HFG of the two-stage pipeline is obtained by applying parComp to the HFGs of the individual stages, which have two transitions each. The resultant HFG is shown in Figure 4. Transitions involving a?y1 and c!(g x2 y2) are retained as they are, because they do not interact (synchronize) with any other actions. This is an application of rule (iv) in the definition of parComp. The actions b?y2 and b!(f x1 y1) do interact, so they are merged into a single transition involving synchronization and value communication. This illustrates the application of rule (iii) in the definition of parComp.
MODULE ex3
SYNCPORTS a?, b!, b?, c! byte
FUNCTION
fun f a
fun g a
b = if (index(a,0)=1) then update(b,2,0) else b;
b = if (index(a,0)=0) then update(b,2,0) else a;
BEHAVIOR
...
<= b?y2 -> c!(g x2 y2) -> Q[y2])
Figure 3: hopCP Specification Illustrating Parallel Behavior
Figure 4: Inferred Behavior of 2-Stage Pipeline Unit
5 Detection of Seriality
Given two actions a and b and the composite HFG h obtained through parComp (which
denotes the inferred behavior), algorithm conCur determines whether a and b are serial or concurrent.
conCur proceeds in two phases. In the first phase, all the reachable configurations of h are
deduced by the procedure RC; in the second phase, we check whether there is any configuration
in the set of reachable configurations of h in which actions a and b are simultaneously enabled.
This is accomplished by the procedure Concurrent. Before presenting the formal definitions of
RC and Concurrent based on HFGs, we digress briefly to show that the HFGs generated
by hopCP syntax are one-safe.
Relationship with Petri Nets and One-Safety
There is a close similarity between Petri nets and hopCP flow graphs. The basic hopCP
flow graphs, defined without the || operator, correspond to finite-state machines in
Petri-net terminology [Pet81]. The only difference is the presence of compound actions, which
introduce local fork-join structures and support a restricted form of non-interfering parallelism.
With the || operator, one can specify HFGs that are more general than finite-state machines,
because they can represent synchronization, value communication, and concurrency.
By one-safeness we mean the standard definition from Petri-net theory, applied to HFGs
interpreted as Petri nets: control states are interpreted as places and actions as transitions.
To show that HFGs are one-safe, one can appeal to this Petri-net analogy.
The basic HFGs (without the || operator) are one-safe because they are finite-state
machines. The local fork-join structures introduced by compound actions still preserve
one-safeness. parComp either retains unsynchronized transitions or merges transitions that
synchronize, and both operations preserve one-safeness. Hence, the HFGs corresponding to the
inferred behavior that are presented to the conCur tool are guaranteed to be one-safe.
Reachable Configurations
A configuration is defined as a set of control states of an HFG. To keep the analysis simple, we
do not consider data-dependent behavior: we assume that every data-related guard can
evaluate to true, which makes our analysis slightly pessimistic. Let $h \in HFG$, let $h.\mathit{istate}$
and $h.\mathit{trel}$ denote the set of initial states and the set of transitions of $h$, and let $RC_h$
denote the set of all configurations reachable from $h.\mathit{istate}$. $RC_h$ is defined inductively as follows:
$$\frac{\mathit{true}}{h.\mathit{istate} \in RC_h} \qquad \frac{(s_1, a, s'_1) \in h.\mathit{trel}, \quad (\exists y \in RC_h .\ s_1 \subseteq y)}{s'_1 \cup (y \setminus s_1) \in RC_h}$$
The first rule is the basis case, while the second rule computes the set of reachable configura­
tions recursively by checking for all possible enabled transitions in a given configuration and 
incorporating them in the set of reachable configurations. One-safeness of the HFGs
corresponding to the inferred behavior ensures that the set of reachable configurations is finite.
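The inductive definition of $RC_h$ is a least fixed point, so it can be computed by an ordinary worklist search. The Python sketch below is one way to do this under our own encoding assumptions: control states are strings, a transition is a `(pre-set, action, post-set)` triple of frozensets, and data guards are ignored, as in the text. The function name `reachable_configs` is hypothetical, and the example is a small fork-join graph rather than one of the paper's circuits.

```python
from typing import FrozenSet, Iterable, List, Set, Tuple

Place = str
Config = FrozenSet[Place]
Transition = Tuple[FrozenSet[Place], str, FrozenSet[Place]]  # (pre-set, action, post-set)


def reachable_configs(istate: Iterable[Place], trel: Iterable[Transition]) -> Set[Config]:
    """Least set closed under the two rules defining RC_h (explicit-state search)."""
    transitions = list(trel)
    start: Config = frozenset(istate)
    rc: Set[Config] = {start}            # basis rule: h.istate is in RC_h
    worklist: List[Config] = [start]
    while worklist:
        y = worklist.pop()
        for pre, _action, post in transitions:
            if pre <= y:                 # s1 is a subset of y: transition enabled in y
                nxt = post | (y - pre)   # s1' union (y minus s1)
                if nxt not in rc:
                    rc.add(nxt)
                    worklist.append(nxt)
    return rc


if __name__ == "__main__":
    P = frozenset
    fork_join = [
        (P({"p0"}), "f", P({"p1", "p2"})),   # fork
        (P({"p1"}), "a", P({"p3"})),
        (P({"p2"}), "b", P({"p4"})),
        (P({"p3", "p4"}), "j", P({"p0"})),   # join
    ]
    print(len(reachable_configs({"p0"}, fork_join)))  # 5
```

Because the HFGs are one-safe, each configuration is just a set of places, so a plain set of frozensets is enough to record what has been visited.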
Using the definition of $RC_h$, we define a predicate Concurrent, which takes two actions
a and b and an HFG h and checks whether a and b are concurrent in h:
$$\mathit{Concurrent}\ a\ b\ h = \big(\{(s_1, a, s'_1), (s_2, b, s'_2)\} \subseteq h.\mathit{trel}\big) \wedge (s_1 \cap s_2 = \emptyset) \wedge \big(\exists y \in RC_h .\ (s_1 \cup s_2) \subseteq y\big)$$
The predicate Concurrent first checks that the two actions a and b belong to the sort of
the module in question and that they are not mutually exclusive (their preconditions $s_1$ and
$s_2$ share no control state); it then scans the set of reachable configurations of h for a
configuration in which a and b are simultaneously enabled.
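The predicate translates almost directly into code. The Python sketch below assumes our own encoding (transitions as `(pre-set, action, post-set)` triples of frozensets) and takes $RC_h$ as an already-computed set; the names `concurrent`, `TREL` and `RC` are hypothetical, and the example is a small fork-join graph, not one of the paper's circuits.

```python
from typing import FrozenSet, Iterable, Set, Tuple

Place = str
Config = FrozenSet[Place]
Transition = Tuple[FrozenSet[Place], str, FrozenSet[Place]]  # (pre-set, action, post-set)


def concurrent(a: str, b: str, trel: Iterable[Transition], rc: Set[Config]) -> bool:
    """True if some a-transition and b-transition with disjoint pre-sets
    are both enabled in a single reachable configuration."""
    transitions = list(trel)
    for s1, x, _ in transitions:
        if x != a:
            continue
        for s2, z, _ in transitions:
            if z != b or (s1 & s2):              # pre-sets must not overlap
                continue
            if any((s1 | s2) <= y for y in rc):  # both enabled in one configuration
                return True
    return False


P = frozenset
TREL = [
    (P({"p0"}), "f", P({"p1", "p2"})),   # fork
    (P({"p1"}), "a", P({"p3"})),
    (P({"p2"}), "b", P({"p4"})),
    (P({"p3", "p4"}), "j", P({"p0"})),   # join
]
# Reachable configurations of TREL from {p0}, listed by hand.
RC = {P({"p0"}), P({"p1", "p2"}), P({"p2", "p3"}), P({"p1", "p4"}), P({"p3", "p4"})}

if __name__ == "__main__":
    print(concurrent("a", "b", TREL, RC))  # True: both enabled in {p1, p2}
    print(concurrent("f", "a", TREL, RC))  # False: f and a are serial
```

Actions a and b lie on the two branches of the fork, so they are reported as concurrent, whereas f always precedes a.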
The major bottleneck in the strategy suggested thus far is the explicit generation of the
set of reachable configurations. For a realistic circuit with over a hundred states in the inferred
behavior, the computation of $RC_h$ could be expensive in time and space. To circumvent this
problem, we next suggest a heuristic.
Heuristics for Pruning the Inferred Behavior HFG
Motivation
Consider the HFG shown in Figure 5a. It denotes the inferred behavior of hopCP modules
$M_1$ and $M_2$, whose behaviors are given by A[] <= (a? -> b? -> A[]) | (c? ->
(d!, e?) -> A[]) and B[] <= f! -> (g? -> B[]) | (d? -> h! -> B[]). Let
us assume we want to find out whether e? and f! are serial. Note that
the inferred behavior has 10 states (denoted by circles). To determine whether e? and f! are serial,
it is not necessary to consider the possible executions of the system through sequential
states such as $s_3$. The only interesting states that can influence the causality of actions are
those that involve synchronization or choice. The pruning heuristic is based on this
observation: it eliminates (or abstracts) states that are not relevant
with respect to the actions in question. For example, employing our heuristic (described
shortly), one obtains the contracted HFG shown in Figure 5b, in which 3 states have been
eliminated. (Note: we have so far analysed and implemented only a subset of the
possible heuristics for optimizing the inferred behaviors.)
Algorithm for Pruning
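Figure 5: Illustrating Pruning of the hopCP Flow Graphs
The transformation implemented by the pruning heuristic on an inferred behavior $h \in HFG$
is captured by the rewriting relation $\to^{*}$ defined by the following rules.
(i)
$$\frac{}{h.\mathit{trel} \to^{*} h.\mathit{trel}} \qquad \frac{h.t_1 \to^{*} h.t_2, \quad h.t_2 \to^{*} h.t_3}{h.t_1 \to^{*} h.t_3}$$
This rule says that $\to^{*}$ is reflexive and transitive.
(ii)
$$\frac{(s_1, a_1, s'_1) \in h.\mathit{trel},\ (s, b_1, s_2) \in h.\mathit{trel},\ |s'_1| = 1,\ s \subseteq s'_1,\ b_1 \notin \{a, b\},\ h.\mathit{istate} \cap s = \emptyset}{h.\mathit{trel} \to^{*} \big(h.\mathit{trel} \setminus \{(s_1, a_1, s'_1), (s, b_1, s_2)\}\big) \cup \{(s_1, a_1, (s'_1 \setminus s) \cup s_2)\}}$$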
This rule captures the elimination of state s in HFG h. We first check that the transitions
$(s_1, a_1, s'_1)$ and $(s, b_1, s_2)$ are sequential, and then make sure that s does not enable either
action a or b and is not an initial state. If these conditions are satisfied, we replace the sequential
transition pair by a single transition whose postcondition is modified to $(s'_1 \setminus s) \cup s_2$.
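Rule (ii) can be applied repeatedly until it no longer fires, which yields the normal form used by conCur. The Python sketch below is our illustration, not the hopCP implementation. It encodes transitions as `(pre-set, action, post-set)` triples of frozensets and adds one guard that is an assumption of ours, not part of the rule as stated: the eliminated state must have exactly one incoming and one outgoing transition, so that choice states, which the motivation above singles out as relevant, are never removed. The function name `prune` is hypothetical.

```python
from typing import FrozenSet, Iterable, Set, Tuple

Place = str
Transition = Tuple[FrozenSet[Place], str, FrozenSet[Place]]  # (pre-set, action, post-set)


def prune(trel: Iterable[Transition], istate: FrozenSet[Place], a: str, b: str) -> Set[Transition]:
    """Apply rule (ii) until no rewrite is possible (a normal form of ->*)."""
    result = set(trel)
    changed = True
    while changed:
        changed = False
        for t1 in list(result):
            s1, a1, post1 = t1
            if len(post1) != 1:                  # |s1'| = 1
                continue
            (p,) = post1
            if p in istate:                      # s must not be an initial state
                continue
            # Extra guard (our assumption): p has exactly one consumer and one
            # producer, so choice and merge states are kept.
            consumers = [t for t in result if p in t[0]]
            producers = [t for t in result if p in t[2]]
            if len(consumers) != 1 or len(producers) != 1:
                continue
            t2 = consumers[0]
            s, b1, s2 = t2
            # s must be a subset of s1' (rules out synchronizations), and s must
            # not enable a or b.
            if t2 == t1 or not s <= post1 or b1 in (a, b):
                continue
            result -= {t1, t2}
            result.add((s1, a1, (post1 - s) | s2))  # merged transition
            changed = True
            break                                # rescan after each rewrite
    return result


if __name__ == "__main__":
    P = frozenset
    chain = [(P({"p0"}), "a", P({"p1"})), (P({"p1"}), "x", P({"p2"})), (P({"p2"}), "b", P({"p0"}))]
    print(len(prune(chain, P({"p0"}), "a", "b")))  # 2: p1 is abstracted away
```

Every rewrite removes one transition, so the loop terminates. In the example, the sequential state p1 disappears, while p2 is kept because it enables b.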
Algorithm conCur
Here we summarize the overall scheme for detecting whether two actions a and b are serial
in the executions of a hopCP module M that contains submodules $M_1, M_2, \ldots, M_n$.
Algorithm conCur has four major steps:
STEP 1 Derive the composite HFG h by applying parComp to the collection of HFGs
$h_1, h_2, \ldots, h_n$, i.e. $h = h_1 \parallel h_2 \parallel \cdots \parallel h_n$.
STEP 2 Apply the pruning heuristic to h with respect to actions a and b to derive $h_p$,
i.e. $h \to^{*} h_p$ ($h_p$ is the normal form of the transformation process via $\to^{*}$).
STEP 3 Compute $RC_{h_p}$, the set of reachable configurations of $h_p$.
STEP 4 Invoke the procedure (Concurrent a b $h_p$).
6 Implementation, Results and Discussion
All the algorithms discussed in this paper have been implemented in the hopCP design environment
and tested on a wide suite of examples. The implementation is in Standard ML of
New Jersey (version 0.66). The hopCP descriptions are parsed using SML-Lex/Yacc and
converted into HFGs following the operational semantics discussed in [AG91b]. A
compiled-code concurrent functional simulator called CFSIM supports functional simulation
of hopCP specifications. Once the specifications are simulated satisfactorily, they are
ready to be compiled into asynchronous circuits using a technique called action-refinement,
details of which are reported in [AG91a]. The techniques discussed in this paper
are designed to make action-refinement more efficient.
Figures 6, 7 and 8 show the performance of the algorithms developed in this paper on a few
examples. 2-Stage Pipeline Unit is the specification discussed in Section 4, Mutex is the
specification of a simple mutual-exclusion protocol in hopCP, and UsartMain, UsartRcvr, UsartXmit
    Circuit                                   # States   # Transitions   Time (secs)
1   2-Stage Pipeline Unit                            4               3          0.05
2   Mutex                                            9               6          0.19
3   UsartMain                                       33              34             -
4   UsartXmit                                       43              52             -
5   UsartRcvr                                       41              47             -
6   UsartMain || UsartXmit || UsartRcvr            136             136         79.38
7   CpuMain || ExtRcvr || ExtXmit                   15              15          0.33
8   UsartMain || UsartXmit || UsartRcvr ||
    CpuMain || ExtRcvr || ExtXmit                  166             154        141.42
Figure 6: Performance of parComp
    Circuit                                  # Transitions    # Transitions   Time for
                                             Before Pruning   After Pruning   Pruning (secs)
1   2-Stage Pipeline Unit                                 3               3   0.0001
2   Mutex                                                 6               6   0.01
3   UsartMain                                            34              17   0.13
4   UsartXmit                                            52              38   0.23
5   UsartRcvr                                            47              38   0.19
6   UsartMain || UsartXmit || UsartRcvr                 136             107   4.19
7   CpuMain || ExtRcvr || ExtXmit                        15               4   0.02
8   UsartMain || UsartXmit || UsartRcvr ||
    CpuMain || ExtRcvr || ExtXmit                       154             133   7.29
Figure 7: Illustration of Maximal Pruning
    Circuit                                  Typical Running Time (secs) with Optimization,
                                             for a Pair of Randomly Chosen Actions
6   UsartMain || UsartXmit || UsartRcvr      12.09
7   CpuMain || ExtRcvr || ExtXmit            0.04
Figure 8: Typical Performance of the Seriality Detection Procedure
are the hopCP specifications of the three main components of the Intel 8251 USART, a
commercial VLSI chip. Details are presented in [AG91c]. CpuMain is the CPU interface to the
USART, and ExtRcvr and ExtXmit are the external serial devices communicating with the USART.
Figure 6 shows the performance of algorithm parComp. Note that the timings for UsartXmit,
UsartRcvr and UsartMain are omitted because they do not involve the "||" operator. Also
note the absence of a combinatorial explosion of states and transitions in the composite behaviors 6
and 8; this is a consequence of the non-interleaving semantics of the || operator in hopCP.
Figure 7 shows the effect of the pruning heuristic. The results reflect maximal pruning, which
gives an upper bound on the number of transitions that can be eliminated by our
heuristic. Note that the pruning algorithm has no effect on circuits 1 and 2, because they
contain no nodes that satisfy the conditions outlined for pruning. There is a considerable
saving in the number of transitions for the other circuits. Figure 8 shows the typical
performance of the overall algorithm for a pair of randomly chosen actions. Note that even for
the USART example (which is fairly large), the time consumed is only about 12 seconds.
Timing measurements were conducted on an implementation in SML (version 0.66) running
on a Sparc-IPC.
7 Conclusion and Future Work
We observe that statically determining whether two high-level actions are potentially
concurrent could lead to a variety of area-time optimizations in the high-level synthesis of
asynchronous circuits. We presented a formal technique to conduct this analysis within the
hopCP framework, and we discussed the implementation of the algorithms and their
performance on a few realistic examples. The techniques developed in this paper are fairly general
and could be applied to the circuits generated by other asynchronous compilation approaches,
such as [BS89]. These techniques were employed successfully to discover many useful facts
about the USART specification, including the ability to map several logical channels to a
single physical channel, unsafe concurrent accesses to asynchronous ports, and the determinacy
of guards. This information was exploited to improve the specification and will be used in the
action-refinement based synthesis scheme. We are currently incorporating these
algorithms into the action-refinement compilation strategy centered around hopCP.
References
[AG91a] Venkatesh Akella and Ganesh Gopalakrishnan. Hierarchical Action Refinement: A
Methodology for Compiling Asynchronous Circuits from a Concurrent HDL. In
Proceedings of the Tenth International Symposium on Computer Hardware Description
Languages and their Applications, Marseille, France, April 1991.
[AG91b] Venkatesh Akella and Ganesh Gopalakrishnan. hopCP: A Concurrent Hardware
Description Language. Technical report, Department of Computer Science,
University of Utah, 1991. Under revision; latest version available upon request from the
authors.
[AG91c] Venkatesh Akella and Ganesh Gopalakrishnan. Specification and Validation of a
USART in hopCP. Technical report, Department of Computer Science, University
of Utah, 1991. In preparation; available upon request from the authors.
[BS89] Erik Brunvand and Robert F. Sproull. Translating Concurrent Communicating
Programs into Delay-Insensitive Circuits. In International Conference on
Computer-Aided Design, ICCAD 89, April 1989.
[CEM85] C. E. Molnar, T. P. Fang, and F. U. Rosenberger. Synthesis of Delay-Insensitive Modules.
[Cha87] Arthur Charlesworth. The Multiway Rendezvous. ACM Transactions on
Programming Languages and Systems, 9(3):350-366, July 1987.
Tam-Anh Chu. Synthesis of Self-timed VLSI Circuits from Graph Theoretic
Specifications. PhD thesis, Department of EECS, Massachusetts Institute of Technology,
September 1987.
[CW91] Raul Camposano and Wayne Wolf. High-Level VLSI Synthesis. Kluwer Academic
Publishers, 1991. ISBN 0-7923-9159-4.
Jo C. Ebergen. Translating Programs into Delay Insensitive Circuits. Centre for
Mathematics and Computer Science, Amsterdam, 1989. CWI Tract 56.
Robert M. Keller. Towards a theory of universal speed-independent modules. IEEE
Transactions on Computers, C-23(1):21-33, January 1974.
Alain J. Martin. Programming in VLSI: From Communicating Processes to
Delay-Insensitive Circuits. Technical Report Caltech-CS-TR-89-1, Department of
Computer Science, California Institute of Technology, 1989.
[Pet81] James L. Peterson. Petri Net Theory and the Modeling of Systems. Prentice-Hall,
1981.
