Opportunities and Challenges in Process-algebraic Verification of
Asynchronous Circuit Designs
X. Wang M. Kwiatkowska G. Theodoropoulos Q. Zhang 1
School of Computer Science, University of Birmingham
Edgbaston, Birmingham B15 2TT, UK
Abstract
This paper reports our experiences of applying process algebras and associated tools (esp.
CSP/FDR2) to verify asynchronous circuit designs developed in the Balsa environment. Balsa
is an asynchronous logic synthesis system which uses syntax-directed compilation to generate gate-
level implementations from high-level descriptions in a parallel programming language (also called
Balsa). Previously, we have proposed a unifying approach to compositionally verifying Balsa de-
signs across several abstraction levels. This paper continues our eﬀort by applying and testing our
approach on several large-scale real-life case studies. We describe the outcome of veriﬁcation for
the case studies, and also analyse the strengths and limitations of our method.
Keywords: Asynchronous hardware, Hierarchical veriﬁcation, CSP, Model checking, Levels of
abstraction.
1 Introduction
Asynchronous and GALS (Globally Asynchronous Locally Synchronous) [17,8]
design techniques are important alternatives to synchronous, global-clock-based
design techniques. They achieve synchronization by means of localized
handshake protocols between the communicating subsystems.
The removal of the global clock can result in highly concurrent,
nondeterministic systems, which render simulation alone inadequate as a testing
methodology.
1 {X.Wang,M.Z.Kwiatkowska,G.K.Theodoropoulos,Q.Zhang}@cs.bham.ac.uk
Electronic Notes in Theoretical Computer Science 146 (2006) 189–206
1571-0661 © 2006 Elsevier B.V. 
www.elsevier.com/locate/entcs
doi:10.1016/j.entcs.2005.05.042
Open access under CC BY-NC-ND license.
In a previous paper [19] we proposed a formal verification approach for
asynchronous hardware systems. As background we used Balsa, the CSP-based [9]
specification and synthesis system developed by the AMULET group at the
University of Manchester [17,7]. Balsa employs syntax-directed compilation
to generate gate-level implementations from high-level descriptions in a paral-
lel programming language (also called Balsa), via an intermediate level called
handshake networks [2]. Balsa is endowed with simulation, but not veriﬁ-
cation, tools. Using the unifying formalism of process algebra CSP [15], our
proposed approach can compositionally verify Balsa designs across all abstrac-
tion levels (ﬁgure 1). Based on some small case studies, we demonstrated how
Balsa programs, handshake networks (a.k.a. handshake circuits) and asyn-
chronous gate circuits can be translated into CSP, which in turn enables the
use of FDR2 [15], the mature model checker for CSP, to serve as the back-end
veriﬁcation tool.
This paper continues our effort by applying and testing our approach on
several large-scale real-life case studies. We describe the outcome of
verification for the case studies, which, if encoded and checked naively,
immediately cause state space explosion. We show how to solve these problems
within process-algebraic theories. In particular, we apply compositional
reasoning, data independence theory, and the compression functions of the
CSP/FDR2 framework. Overall, we summarize our experience and analyse the
strengths and limitations of our process-algebraic approach. The rest of the
paper is structured as follows. After a brief summary of our hierarchical
(compositional) verification approach, we study in detail two large-scale
verification problems at different abstraction levels. The higher-level one is
based on synchronous communication, while the lower-level one is based on
asynchronous communication, so they need two different verification
techniques. Both cases encounter state space explosion: one mainly hits the
space limits of FDR, while the other is mainly time consuming. We solve one of
the problems using native compression functions implemented in FDR, and the
other using compositional reasoning and data independence theory.
2 A unifying hierarchical veriﬁcation approach
Within an asynchronous logic synthesis framework such as Balsa, a CSP-based
parallel programming language is usually used to give a high-level algorithmic
description of the design. From such a description, syntax-directed
compilation creates a network (composition) of handshake components, where each
language construct in the program is mapped to a corresponding handshake
implementation. Handshake components are usually pre-designed and stored
in a library in the form of gate-level circuit fragments.
Fig. 1. The hierarchy of abstraction. [Diagram: syntax-directed translation
takes a Balsa program and its behavioural spec to a handshake network and its
protocol, and asynchronous logic synthesis takes that to a gate-level circuit
and its protocol. Each level is translated into CSP (CSP program and CSP spec,
CSP network and CSP protocol, CSP circuit and CSP protocol), and adjacent CSP
levels are linked by RAI.]
Based on the silicon compilation framework, we designed a hierarchical
approach to verifying asynchronous hardware designs. The approach centres
around the two hierarchies in silicon compilation: the hierarchy of abstraction
levels and the hierarchy of component composition. For the abstraction
hierarchy, CSP is appropriate for describing all three levels of system
description. The top level describes a synchronous system as in standard CSP;
it utilizes fine-grained parallelism (the parallel operator nested within
sequential composition), which is rarely supported by other model checkers. The
two lower levels describe asynchronous systems, which pose challenges to the
expressive power of standard CSP. Based on our novel idea of a scheduler, we
find that asynchronous systems can not only be modelled in CSP in a direct and
intuitive way, but also be simplified to use just the traces model. Since we
employ CSP at all levels of abstraction, it is possible for us to establish a
formal semantic link between the levels, based on techniques such as behaviour
Refinement, Action refinement and abstract Interpretation (RAI).
For the component hierarchy, it is possible, at all three levels, to reduce a
verification problem to smaller ones along the component composition hierarchy.
In the synchronous case, it is well known that CSP refinement supports
compositional verification. That is, the specification of a component is
separated from its implementation; once the refinement has been checked, the
specification can be used instead of the implementation when composing
components. In this paper, we call this specification-based refinement
checking. The first case study, verification of a distributed coloring
algorithm, is based on it.
In the asynchronous case of the lower levels, the systems are input/output
systems consisting of a collection of asynchronous components connected by
channels. Each component is associated with a protocol dictating all the legal
sequences of input and output events on the connected channels. A channel is
associated with a delay. Components communicate by sending and receiving
signals with non-blocking semantics. Although CSP usually synchronises the
input and output on the same channel into a single event, we choose to model
the two operations separately and use a scheduler to explicitly schedule the
delay of input and output. The scheduler first nondeterministically selects an
enabled output (one of those in the initials of a CSP process) of a sending
component and then forces it onto the corresponding input of the receiving
component. This nondeterminism of the scheduler can simulate all the possible
delay scenarios of the system. If a system runs correctly with a scheduler, it
therefore runs correctly in all delay scenarios. That is, the system is
delay-insensitive.
An asynchronous input/output system can be an open-circuit system or a
closed-circuit system. An open-circuit system models a composite component
and is often associated with a protocol. When verifying a closed-circuit system,
we run the system in parallel with the scheduler. If it deadlocks when the
scheduler forces an output onto an input (a.k.a. a choke), we say the system is
incorrect, in the sense that the system does not constitute a good environment
in which all the protocols of the subcomponents are obeyed. Because the
scheduler participates in the occurrence of every event, any deadlock of the
scheduler will be global, and hence easy to check in FDR. Before verifying
an open-circuit system, we need to use its protocol as the environment of the
system to close the circuit. The deadlock freedom of the new system then implies
the conformance of the open-circuit system with its protocol, hence the name
protocol-based conformance checking. The detailed theoretical foundation of
this method and its formal relation to other asynchronous circuit verification
theories are described in a separate paper of ours [20].
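The scheduler idea can be illustrated outside CSP by a minimal Python sketch. It is not the authors' CSPm encoding: the component representation, the names `find_choke`, `producer`, `consumer` and `eager`, and the two-channel example are all assumptions made for illustration. Each scheduler step is taken atomically (pick an enabled output, force it onto the receiver's input), and the search reports a choke when the receiver cannot accept the forced input.

```python
from collections import deque

def find_choke(components, wiring):
    """Breadth-first search for a choke in a closed-circuit system.

    components: name -> (initial_state, transitions), where
        transitions[state] maps ("out" | "in", channel) to the next state.
    wiring: channel -> (sender_name, receiver_name).
    Returns the sequence of channels leading to a choke, or None.
    """
    names = sorted(components)
    index = {name: pos for pos, name in enumerate(names)}
    start = tuple(components[name][0] for name in names)
    seen = {start}
    queue = deque([(start, ())])
    while queue:
        state, trace = queue.popleft()
        for ch, (snd, rcv) in wiring.items():
            snd_moves = components[snd][1].get(state[index[snd]], {})
            snd_next = snd_moves.get(("out", ch))
            if snd_next is None:
                continue  # this output is not enabled, nothing to schedule
            rcv_moves = components[rcv][1].get(state[index[rcv]], {})
            rcv_next = rcv_moves.get(("in", ch))
            if rcv_next is None:
                return trace + (ch,)  # receiver cannot take the input: choke
            nxt = list(state)
            nxt[index[snd]] = snd_next
            nxt[index[rcv]] = rcv_next
            nxt = tuple(nxt)
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, trace + (ch,)))
    return None

# Hypothetical example: a request/acknowledge loop between two components.
producer = (0, {0: {("out", "req"): 1}, 1: {("in", "ack"): 0}})
consumer = (0, {0: {("in", "req"): 1}, 1: {("out", "ack"): 0}})
# A faulty producer that keeps requesting without waiting for ack.
eager = (0, {0: {("out", "req"): 0}})
wiring = {"req": ("producer", "consumer"), "ack": ("consumer", "producer")}

good = find_choke({"producer": producer, "consumer": consumer}, wiring)
bad = find_choke({"producer": eager, "consumer": consumer}, wiring)
print(good, bad)
```

The sketch only looks for chokes; a state in which no output is enabled at all is not reported, whereas the FDR check described above treats any deadlock of the scheduler as an error.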
In the rest of the paper, we concentrate on two large-scale studies that
test the scalability of our approach, one at the top level using
specification-based refinement checking and the other at the bottom level using
protocol-based conformance checking.
3 A distributed coloring algorithm for control hazards
in asynchronous pipeline
Unlike in traditional synchronous pipelined systems operating in lock-step,
control hazards (e.g. a branch or a jump) in asynchronous pipelines are much
harder to handle, since the depth of prefetching, namely the number of
instructions that have entered the processor and thus must be discarded, is
usually unpredictable due to the autonomous and decoupled operation of
different stages. Therefore, a special algorithm needs to be designed to solve
the problem, e.g. the one for the AMULET1 processor by the AMULET
group at the University of Manchester. Their technique uses a single bit to
“colour” the state of the processor at any particular moment. Each instruction
address issued to memory carries the current operating colour of the
processor, which is used to mark the corresponding fetched instruction.
When a control hazard occurs, the colour of the processor changes, causing a
change in the colour of instructions subsequently fetched from the new target
address. The colour bit of an instruction that arrives at the datapath for
execution is compared with the current colour of the processor. If they match,
the instruction belongs to the current valid instruction stream and is
executed; otherwise it is discarded.
However, one important limitation of the above mechanism is that control
hazards are not allowed to occur at more than one stage in a distributed,
non-deterministic fashion. This excludes its application to systems like MIPS,
where control hazards may potentially occur in more than one stage (e.g.
where conditional branches may be taken in the EXE stage while unconditional
jumps are executed in the ID stage) [13].
To overcome the problem, two of the authors proposed in [18] a generic
“colour” mechanism and its distributed algorithm for handling concurrent
control hazards in asynchronous MIPS-based pipelined processor designs.
The algorithm is based on two fundamental observations:
• The state of the system is distributed.
• Stages that are deeper in the pipeline have higher priority than stages before
them. In other words, a control transfer event that occurs at a pipeline stage
renders other events that may occur in earlier pipeline stages
irrelevant and invalid, even if the latter precede the former in time.
Based on the above two observations, in the proposed scheme the colour
state of the processor at any particular moment is defined as a vector
$c = (c_1, c_2, \ldots, c_n)$ in the set $C^n$, where $C = \{0, 1\}$ is the set
of colours, $n$ is the number of stages in the pipeline and $c_i$ is the colour
of stage $i$. The priority of $c_i$ is higher than that of $c_j$ whenever
$i > j$. The detailed original algorithm is described in Balsa [18], i.e. at
the top abstraction level. For brevity, below we will only use its CSP
translation (i.e. CSPm, the script language used by FDR2 [15]) for explanation.
For an introduction to the Balsa language and its translation to CSP, readers
are referred to [19].
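Fig. 2. The pipeline. [Diagram: the PC (with a + unit) sends PCAdd to the AAU;
the AAU sends InsAdd to Memory; instructions flow from Memory through stages
S1 to S4 on Ins0 to Ins4; each stage holds a colour vector (C1C2C3C4) and can
send a target address NTAdd1 to NTAdd4 back to the AAU.]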
stg = 4 -- number of stages
sz = 5 -- memory size
nametype STG = {1..stg}
nametype COL = {0,1}
CV(0) = {<>} -- <> is the empty sequence
CV(i) = {<cl>^cv | cl <- COL, cv <- CV(i-1)}
-- operator ^ is sequence catenation.
nametype COLVEC = CV(stg) -- the color vector data type
zeroseq(0) = <>
zeroseq(i) = <0>^zeroseq(i-1)
zeroCV = zeroseq(stg)
-- the color vector with all elements being 0.
eq(cv,cv',i) -- testing equality on i-th bits of vectors
  = if i == 1
    then head(cv) == head(cv')
    else eq(tail(cv),tail(cv'), i-1)
change(cv,i) -- change value at i-th bit of a vector
  = if i == 1
    then <1-head(cv)>^tail(cv)
    else <head(cv)>^change(tail(cv), i-1)
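For readers less familiar with CSPm sequences, the colour-vector helpers can be mirrored in Python as a rough sketch. Tuples stand in for CSPm sequences and positions are 1-based as in the script; the names `colour_vectors`, `zero_cv` and `STG` are chosen here for illustration and do not come from the original scripts.

```python
from itertools import product

STG = 4  # number of pipeline stages, same value as stg in the script

def colour_vectors(n):
    """All colour vectors of length n over COL = {0, 1}, as tuples."""
    return [tuple(bits) for bits in product((0, 1), repeat=n)]

def eq(cv, cv2, i):
    """True when the i-th bits (1-based) of the two vectors agree."""
    return cv[i - 1] == cv2[i - 1]

def change(cv, i):
    """Copy of cv with its i-th bit (1-based) flipped."""
    return cv[:i - 1] + (1 - cv[i - 1],) + cv[i:]

zero_cv = (0,) * STG  # counterpart of zeroCV

flipped = change(zero_cv, 2)
print(flipped, eq(zero_cv, flipped, 2), eq(zero_cv, flipped, 3))
```

Flipping the same bit twice restores the original vector, which is why a single bit per stage suffices to distinguish the instruction stream before a hazard from the one after it.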
The algorithm is described on the basis of the micro-architecture in figure 2.
The instruction set is abstracted into stg + 1 types of instructions. stg
of them cause hazards at the respective stage of the pipeline. The last one
models non-hazard instructions. An instruction contains two fields: one
for the type information and the other for an address.
nametype INSCODE = {0..stg}
-- 1..stg for instruction causing hazard at that stage
-- 0 for non-hazard instruction.
nametype ADDR = {0..sz-1}
nametype INS = (INSCODE, ADDR)
haz((0,y)) = false -- if an instruction is hazardous
haz((x,y)) = true
addr(z) = let (x,y) = z within y
-- get the address field of an instruction.
icode(z) = let (x,y) = z within x
-- get the type code field.
channel Ins : {0..stg}.INS.COLVEC
-- channel with 3 data fields; the first acts as index.
channel NTAdd : STG.ADDR.COLVEC
channel InsAdd : ADDR.COLVEC
channel PCAdd : ADDR
Each pipeline stage SGi where control hazards may occur maintains a copy
of the vector of the colour state:
channel Read, Write : COLVEC
ccell(cv) = Read!cv -> ccell(cv)
            [] Write?cv' -> ccell(cv')
-- operator [] is external choice
-- operator -> is prefix
SG(i) = ( SGCtr(i) [|{|Read, Write|}|] ccell(zeroCV)
        ) \ {|Read, Write|}
-- operator '[|{|Read, Write|}|]' is interface parallel
-- on Read and Write channels.
-- operator '\ {|Read, Write|}' is hiding the two channels.
However, each stage is in charge of managing only the element that corresponds
to it (figure 2). For each new instruction that arrives for execution at a
stage SGi, its colour state vector is compared against the state vector of the
stage. If any higher priority colour bit (cj where j > i) in the instruction
differs from the corresponding colour bit of the stage, the instruction is
the first one fetched from a transfer address as a result of a control hazard
that has taken place deeper in the pipeline.
-- compare the colour bits of two vectors above i
cmpr(cv,cv',i)
  = if i == 0
    then if cv == <>
         then true
         else head(cv) == head(cv') and cmpr(tail(cv),tail(cv'), i)
    else cmpr(tail(cv),tail(cv'), i-1)
Thus, the stage lets the instruction through and the state vector of the
instruction becomes its own.
If the stage's own colour bit (ci) differs from the corresponding bit in
the instruction vector, then this instruction is one of the instructions
following an instruction that has already caused a control hazard in the stage,
and therefore the instruction is rejected. Otherwise, the instruction is
executed and the state vector of the instruction becomes the stage's own.
SGCtr(i) = Ins.i-1?ins?cv ->                         -- input instruction
           Read?cv_r ->                  -- read the local colour vector
           if (not cmpr(cv_r, cv, i)) or eq(cv_r, cv, i)
           then if icode(ins) == i              -- hazardous instruction
                then let cv' = change(cv,i) within      -- change color
                     Write!cv' -> NTAdd.i!addr(ins)!cv'  -- sent to AAU
                     -> Ins.i!ins!cv' -> SGCtr(i)        -- go through
                else Write!cv -> Ins.i!ins!cv -> SGCtr(i) -- go through
           else SGCtr(i)                                 -- discarded
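The accept/reject decision taken by a stage can be replayed step by step in a short Python sketch. This is a functional paraphrase of SGCtr under stated assumptions, not a CSP model: `stage_step` is a name introduced here, channel communication is replaced by a return value, and tuples with 1-based positions stand in for colour vectors.

```python
def eq(cv, cv2, i):
    """True when the i-th bits (1-based) agree."""
    return cv[i - 1] == cv2[i - 1]

def change(cv, i):
    """Copy of cv with its i-th bit (1-based) flipped."""
    return cv[:i - 1] + (1 - cv[i - 1],) + cv[i:]

def cmpr(cv, cv2, i):
    """True when all bits above position i agree (higher priority bits)."""
    return cv[i:] == cv2[i:]

def stage_step(i, local_cv, icode, ins_cv):
    """One round of stage i.

    Returns (accepted, new_local_cv, target_sent), where target_sent says
    whether a new target address would be sent to the AAU.
    """
    accepted = (not cmpr(local_cv, ins_cv, i)) or eq(local_cv, ins_cv, i)
    if not accepted:
        return (False, local_cv, False)        # discarded
    if icode == i:                             # instruction is hazardous here
        return (True, change(ins_cv, i), True)
    return (True, ins_cv, False)               # plain pass-through

# Stage 2 of a 4-stage pipeline, starting from the all-zero colour vector.
zero = (0, 0, 0, 0)
first = stage_step(2, zero, 2, zero)           # hazard raised at stage 2
stale = stage_step(2, first[1], 0, zero)       # prefetched with the old colour
fresh = stage_step(2, first[1], 0, first[1])   # fetched from the new target
deeper = stage_step(2, first[1], 0, (0, 0, 1, 0))  # hazard deeper in the pipe
print(first, stale, fresh, deeper)
```

In this run the stale instruction is rejected because only the stage's own bit differs, while the instruction carrying a changed higher priority bit is let through and overwrites the local vector.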
Since target addresses (NTAdd) may be generated at any time by any
pipeline stage, this scheme assumes the existence of an arbitration unit,
referred to as the Address Arbitration Unit (AAU) in the figure. The Address
Arbitration Unit issues all instruction address information to memory, namely
sequential instruction addresses as they arrive from the PC (normal operation)
or target addresses from the pipeline stages (when a control hazard occurs).
The PC needs to monitor the addresses issued by the AAU and update itself to
keep pace with the newest control hazards.
next(x) = (x + 1) % sz
PC(x) = PCAdd!x -> InsAdd?x'?cv -> Ins.0?y.cv -> PC(next(x'))
        [] InsAdd?x'?cv -> Ins.0?y.cv -> PC(next(x'))
In the case of control hazards, the role of the AAU is to let through to memory
instruction addresses that are the result of high priority control hazards, while
blocking any subsequent lower priority target addresses from reaching memory
and thus interrupting the high priority instruction stream. To achieve this,
the AAU keeps a record of the colour state of the processor, which it updates
based on the colour vectors of the instruction addresses arriving at it.
Hazard = ( NTAdd?i?x?cv -> Read?cv_r ->           -- hazard received
           if cmpr(cv_r, cv, i)
           then (InsAdd!x!cv -> Write!cv -> AAUCtr)
                                      -- high priority go through
                [] NTAdd?j?x'?cv' ->
                                      -- concurrent hazard received
                   if j > i           -- high priority wins
                   then InsAdd!x'!cv' -> Write!cv' -> AAUCtr
                   else InsAdd!x!cv -> Write!cv -> AAUCtr
           else AAUCtr )              -- low priority discarded
AAUCtr = Hazard
         [] PCAdd?x -> Read?cv_r ->   -- next instruction
            ( (InsAdd!x!cv_r -> AAUCtr) [] Hazard )
AAU = ( AAUCtr [|{|Read, Write|}|] ccell(zeroCV)
      ) \ {|Read, Write|}
This completes the CSP description of the distributed coloring algorithm.
3.1 Veriﬁcation in FDR2
To verify the above algorithm using FDR2, other components of the system,
e.g. the memory (storing the program) and a reference processor (which serves
as the correctness criterion), need to be modelled in CSP as well.
mcell(x,i) = InsAdd.x?k -> Ins.0!i!k -> mcell(x,i)
-- a memory cell storing one instruction.
serializer = InsAdd?x?cv -> Ins.0?i?cv -> serializer
-- to serialize the accesses to memory.
mem = (||| x : ADDR @ |~| i : INS @ mcell(x,i))
      [|{|InsAdd, Ins.0|}|] serializer
-- an array of non-deterministically initialized cells.
-- operator '||| x : ADDR @' is indexed interleaving over ADDR.
-- operator '|~| i : INS @' is indexed internal choice over INS.
seq(x) = InsAdd?y?cv -> Ins.0?ins.cv ->
         if y == x
         then Ins.stg!ins?cv'
              -> if haz(ins) then seq(addr(ins))
                 else seq(next(x))
         else seq(x)
sys =
  ( mem [|{|InsAdd, Ins.0|}|]
    ( AAU [|{|NTAdd, PCAdd, InsAdd|}|]
      ( PC(0) [|{|Ins.0|}|]
        (|| i:STG @ [{|Ins.i,Ins.i-1,NTAdd.i|}] SG(i))
      )
    )
  ) [|{|Ins.stg, Ins.0, InsAdd|}|] seq(0)
-- operator '|| i:STG @ [{|Ins.i,Ins.i-1,NTAdd.i|}]'
-- is indexed alphabetized parallel over STG.
assert sys :[deadlock free [F] ]
The reference processor monitors the input of the pipeline (i.e. Ins.0 from
memory), executes the instruction stream sequentially, and compares its output
with that of the pipeline by synchronizing on a common output channel (i.e.
Ins.stg). If a mismatch happens, the whole system deadlocks.
Thus deadlock freedom implies that the pipeline processor correctly implements
the reference processor, and hence that the distributed coloring algorithm is
correct.
After supplying specific values for parameters such as the number of stages
(stg) and the size of the memory (sz), we start FDR2 to check the system
(see footnote 2). With stg = 4 and sz <= 4, we successfully verified the
deadlock freedom using a reasonable amount of time and space. However, when sz
reaches 5, state space explosion immediately happens. After running for several
days and consuming almost 15G of memory and hard disk space, the check finally
had to be aborted.
Obviously, the memory is the main culprit of the state explosion. To solve
the problem, we need to do some compositional reasoning in CSP. mem is
replaced by one of its abstractions, mem', i.e. mem' ⊑ mem in CSP terminology.
mem' = InsAdd?x?cv -> |~| i : INS @ Ins.0!i!cv -> mem'
In mem' and mem, if type INS is decomposed into INSCODE and ADDR, there
are three types: ADDR, COLVEC and INSCODE. ADDR is parameterized by sz
while the latter two are parameterized by stg. Interestingly, while COLVEC and
INSCODE are used data-independently (Theorem 15.2.1 in [15]) in mem and mem',
ADDR can be reduced by a simple data independent induction [5] as well.
Therefore the generic problem of mem' ⊑ mem for all possible stg and sz can be
reduced to the problem of mem' ⊑ mem with stg and sz of small fixed values,
which can be trivially discharged using FDR2.
2 Details and full scripts in CSPm can be found at [22].
Since mem' ⊑ mem holds generically, replacing mem by mem' in sys can only
increase the chance of deadlock, due to the monotonicity of CSP operators [15].
Thus, deadlock freedom of the new system implies deadlock freedom of sys,
although the converse may not be true.
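The direction of the abstraction can be made concrete with a bounded trace comparison in Python. This sketch is only an illustration under strong assumptions: it compares finite traces up to a fixed number of rounds rather than performing the failures refinement check that FDR2 does, and the tiny value sets `ADDR`, `INS` and `COL` as well as all function names are hypothetical stand-ins.

```python
from itertools import product

ADDR = (0, 1)            # two addresses, i.e. sz = 2
INS = ("nop", "jmp")     # hypothetical stand-in for the INS type
COL = (0, 1)             # hypothetical stand-in for COLVEC

def traces(rounds, depth):
    """Traces of a process that repeats one (request, response) round
    chosen from `rounds`, cut off after `depth` rounds."""
    result = {()}
    frontier = {()}
    for _ in range(depth):
        extended = set()
        for t in frontier:
            for request, response in rounds:
                result.add(t + (request,))
                extended.add(t + (request, response))
        result |= extended
        frontier = extended
    return result

def mem_traces(depth):
    """Concrete memory: union over every fixed initial content."""
    all_traces = set()
    for contents in product(INS, repeat=len(ADDR)):
        rounds = [(("InsAdd", x, cv), ("Ins.0", contents[x], cv))
                  for x in ADDR for cv in COL]
        all_traces |= traces(rounds, depth)
    return all_traces

def mem_abs_traces(depth):
    """Abstract memory: any instruction may be returned for any address."""
    rounds = [(("InsAdd", x, cv), ("Ins.0", i, cv))
              for x in ADDR for cv in COL for i in INS]
    return traces(rounds, depth)

concrete = mem_traces(2)
abstract = mem_abs_traces(2)
print(concrete <= abstract, concrete == abstract)
```

The inclusion is strict: the abstract memory may answer two reads of the same address with different instructions, which no fixed memory content can do. This extra behaviour is what can make the converse implication fail.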
Since mem' has just two states, the new system has a dramatically reduced
state space. Even more intriguingly, in the new system the type ADDR becomes
weakly data independent [11]. According to Theorem 5.1.2 in [11], it can be
reduced to a data type of size 1. For the types related to stg, some
non-trivial operations are defined on them, which puts them beyond the scope of
data independence. Thus, using the model reduced on sz, we have verified the
deadlock freedom for up to 10 stages in FDR2. This should be enough for most
applications of the distributed coloring algorithm.
4 The tree arbiter circuit
In the above case study, one of the crucial components is the Address
Arbitration Unit (AAU), which needs to do (stg+1)-way arbitration between the
PC unit and the stg stages of the pipeline. To implement it, the Balsa system
utilises a tree of two-way arbiters compiled into gate-level circuits. The tree
arbiter is a classical example in asynchronous circuit verification and is
known to suffer from the state explosion problem if advanced state space
reduction techniques are not applied: for example, BDDs alone cannot solve it.
Many different techniques have been used to solve the problem, ranging from BDD
plus PN (Petri Net) [14] and BDD plus POR (Partial Order Reduction) [1] to PN
unfolding [12]. Thus it makes a good scalability test for our verification
approach (i.e. protocol conformance and scheduler) and the FDR2 tool.
A tree arbiter is a binary tree of arbiter cells with a buﬀer-like element as
its root.
nametype TXS = {up, down} -- up and down transitions on a wire
channel r, a : TXS
Buf = r.up -> a.up -> r.down -> a.down -> Buf
An arbiter cell arbitrates between its two children. Once a child makes a
request (e.g. r1.up), the cell propagates the request to its parent (r.up).
At the same time, the other child can make a request concurrently. After the
parent cell gives the grant (a.up), the cell checks how many children have
made requests. If both have, it non-deterministically selects
one and propagates the grant to it (e.g. a1.up). After the child finishes its
work, it returns the grant to the cell (r1.down) and the cell informs
its parent (r.down). After the parent agrees to take back the grant (a.down),
the cell can reply so to the child (a1.down). Note, however, that if the other
child is waiting for the grant at the same time, the cell can reply to the first
child (a1.down) and propagate the request on behalf of the second child (r.up)
concurrently.
Fig. 3. An arbiter cell and its protocol in STG. [Diagram: a cell with child
ports r1, a1 and r2, a2 and parent ports r, a, together with an STG over the
transitions r1.up, a1.up, r1.down, a1.down, r2.up, a2.up, r2.down, a2.down,
r.up, a.up, r.down and a.down.]
STGs (Signal Transition Graphs) [4] can give probably the most concise
description of an arbiter cell’s protocol. It can be translated into CSP (a little
more involved) as follows:
channel r1, a1, r2, a2 : TXS
ArbL = r1.up -> a1.up -> r1.down -> r.down
-> a.down -> a1.down -> ArbL
ArbR = r2.up -> a2.up -> r2.down -> r.down
-> a.down -> a2.down -> ArbR
SEQ = r.up -> a.up -> (a1.down -> SEQ [] a2.down -> SEQ)
ME1 = r1.up -> r.up -> a.up -> (a1.up -> ME1 [] a2.up -> ME2)
ME2 = r2.up -> r.up -> a.up -> (a1.up -> ME1 [] a2.up -> ME2)
ME = (ME1 ||| ME2) [|{|r,a|}|] SEQ
Arbcell = (ArbL ||| ArbR) [|{|r1.up,r2.up,a1,a2|}|] ME
Using arbiter cells, a (balanced) binary arbiter tree (of k levels) can be
built level by level as follows:
k = 8 -- number of levels
power(0) = 1
power(n) = 2 * power(n-1)
nametype K = {0..k+1}
nametype X = {1..power(k)}
channel ur, ua, sr, sa : K.X.TXS
ArbNode(v,s)
= Arbcell[[r1 <- ur.v.(2*s -1), a1 <- ua.v.(2*s -1),
r2 <- ur.v.(2*s), a2 <- ua.v.(2*s),
r <- sr.v.s, a <- sa.v.s]]
-- In a two dimension space of K x X,
-- each node in the tree has a position (v, s).
-- operator ’[[r1 <- ur.v.(2*s -1), ..]]’
-- is renaming, e.g. r1 renamed to ur.v.(2*s -1).
ArbTree(0) = Buf [[r <- ur.0.1, a <- ua.0.1]]
-- A tree of 0 level consists of only Buf element.
ArbTree(n) = let l = power(n-1) within
(||| x: {1..l} @ ArbNode(n,x)) ||| ArbTree(n-1)
-- A tree of n level consists of a tree of n-1 levels
-- and an 2^(n-1) array of arbiter cells at level n.
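The (v, s) indexing of the tree can be sanity-checked with a small Python sketch. It only reproduces the position arithmetic of ArbNode and the level shift performed by the scheduler in the next subsection (sr at level v is forwarded to ur at level v-1); the function names are introduced here for illustration and are not part of the CSPm script.

```python
def arbiter_cells(k):
    """Positions (v, s) of the arbiter cells in a k-level tree."""
    return [(v, s) for v in range(1, k + 1)
            for s in range(1, 2 ** (v - 1) + 1)]

def child_ports(v, s):
    """Indices of the ur/ua ports that ArbNode(v, s) offers to its children."""
    return [(v, 2 * s - 1), (v, 2 * s)]

def parent_port(v, s):
    """Index of the ur/ua port that sr.v.s / sa.v.s is connected to."""
    return (v - 1, s)

def check_wiring(k):
    """Each level's parent ports must match the child ports one level up,
    with the single level-1 cell attached to the Buf port (0, 1)."""
    for v in range(1, k + 1):
        if v == 1:
            uppers = {(0, 1)}
        else:
            uppers = {p for s in range(1, 2 ** (v - 2) + 1)
                      for p in child_ports(v - 1, s)}
        lowers = {parent_port(v, s) for s in range(1, 2 ** (v - 1) + 1)}
        if uppers != lowers:
            return False
    return True

k = 3
leaves = [p for s in range(1, 2 ** (k - 1) + 1) for p in child_ports(k, s)]
print(len(arbiter_cells(k)), len(leaves), check_wiring(k))
```

For a k-level tree this yields 2^k - 1 cells and 2^k free child ports at level k, which are the ports left for the environment.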
The arbiter tree of k levels implements a 2^k-way arbiter whose protocol
should be:
channel req, ack : X.TXS
Arb(z) = req.z.up -> ack.z.up -> req.z.down
-> ack.z.down -> Arb(z)
-- behaviour on one way
Mex = ack?x.up -> req.x.down -> Mex
-- to ensure at one time only one way can get through
ArbSpec = (||| x: X @ Arb(x))
[|{ack.x.up, req.x.down | x <- X}|] Mex
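The intent of ArbSpec can be checked on a small instance by explicit state exploration in Python. This is a hand-written sketch, not a translation produced by any tool: each way is represented by its phase in the four-phase cycle of Arb(z), Mex by the index of the way currently holding the grant, and the names `successors` and `reachable` are assumptions of this example.

```python
def successors(state, ways):
    """Successor states of (phases, holder).

    phases[z]: 0 idle, 1 after req.z.up, 2 after ack.z.up (granted),
               3 after req.z.down.
    holder: the way Mex has granted, or None when Mex is idle.
    """
    phases, holder = state
    result = []
    for z in range(ways):
        p = phases[z]
        new_holder = holder
        if p == 1:                    # ack.z.up synchronises with Mex
            if holder is not None:
                continue
            new_holder = z
        elif p == 2:                  # req.z.down synchronises with Mex
            if holder != z:
                continue
            new_holder = None
        new_phases = phases[:z] + ((p + 1) % 4,) + phases[z + 1:]
        result.append((new_phases, new_holder))
    return result

def reachable(ways):
    """All states reachable from the initial state by depth-first search."""
    start = ((0,) * ways, None)
    seen = {start}
    stack = [start]
    while stack:
        state = stack.pop()
        for nxt in successors(state, ways):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen

WAYS = 3
states = reachable(WAYS)
mutual_exclusion = all(sum(p == 2 for p in phases) <= 1
                       for phases, _ in states)
deadlock_free = all(successors(s, WAYS) for s in states)
print(mutual_exclusion, deadlock_free)
```

Both checks are specific to this small instance; they illustrate the property the protocol is meant to capture, namely that at most one way holds the grant at any time.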
4.1 FDR2 veriﬁcation
We use our scheduler approach to verify asynchronous circuits. Basically, we
have:
Scheduler = sr?x:{1..k+1}?y?z -> ur!x-1!y!z -> Scheduler
[] ua?x:{0..k}?y?z -> sa!x+1!y!z -> Scheduler
-- propagate events on sr at level n to ur at level n-1
-- and propagate events on ua at level n to sa at level n+1
Sys = (ArbSpec[[req <- sr.k+1, ack <- sa.k+1]] ||| ArbTree(k))
[|{|sr, sa, ur, ua|}|] Scheduler
-- ArbSpec is at level k+1
assert Sys :[deadlock free[F]]
But this naive implementation immediately causes state explosion, as
is well known in the literature. To solve the problem, the key is to realize
that Sys is deterministic and that all the events, except the up transitions
on ua and sa, can be hidden without introducing any non-determinism.
external chase
--a non-semantics-preserving FDR2 compression function
transparent normal
-- a semantics preserving FDR2 compression function
Arbcell = normal((ArbL ||| ArbR) [|{|r1.up,r2.up,a1,a2|}|] ME)
Sys = ( (ArbSpec[[req <- sr.k+1, ack <- sa.k+1]] ||| ArbTree(k))
[|{|sr, sa, ur, ua|}|] Scheduler
) \ {|sr, ur|} \ {ua.x.y.down, sa.x.y.down | x :K, y :X}
assert chase(Sys) :[deadlock free[F]]
Fig. 4. The verification time. [Plot: compiling time and checking time in
seconds (logarithmic axis, 10^0 to 10^7) against the size of the n-way binary
arbiter tree (n = 16, 32, 64, 128, 256, 512, 1024).]
The fact that the new Sys is a deterministic process can be deduced from
the following proposition. For a deterministic process, the chase function can
be safely applied to reduce the state space. The chase function essentially
forces FDR2 to prioritise τ-transitions in a depth-first style of exploration
and not to backtrack on them [15].
Proposition 4.1 In the LTS of Sys, for any two paths (starting at the initial
state), p1 and p2, which end at stable states s1 and s2 respectively, if p1 and
p2 have the same trace, then s1 and s2 have the same set of enabled events.
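The effect of chasing τ-transitions can be sketched on a toy labelled transition system in Python. This is an illustration of the idea only, not FDR2's implementation: the dictionary encoding of the LTS, the label `"tau"` and the function names are assumptions, and the sketch presumes the system has no τ-cycles (otherwise `chase` would not terminate).

```python
TAU = "tau"

def chase(lts, state):
    """Follow tau transitions, always taking the first one listed,
    until a stable state (one without tau transitions) is reached."""
    while True:
        taus = [target for label, target in lts.get(state, []) if label == TAU]
        if not taus:
            return state
        state = taus[0]

def chased_states(lts, start):
    """Stable states visited when every tau run is resolved by chase."""
    first = chase(lts, start)
    seen = {first}
    stack = [first]
    while stack:
        state = stack.pop()
        for _label, target in lts.get(state, []):
            stable = chase(lts, target)
            if stable not in seen:
                seen.add(stable)
                stack.append(stable)
    return seen

# Hypothetical example: two hidden events may interleave in either order
# (a diamond of tau transitions) before the visible event "a" is offered.
diamond = {
    "s0": [(TAU, "s1"), (TAU, "s2")],
    "s1": [(TAU, "s3")],
    "s2": [(TAU, "s3")],
    "s3": [("a", "s0")],
}
visited = chased_states(diamond, "s0")
print(visited)
```

In the diamond, either interleaving ends in the same stable state, so exploring only one of them loses nothing. When different τ choices could lead to stable states with different enabled events, discarding the alternatives would not be sound, which is why the proposition matters.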
After loading it into FDR2 (see footnote 3), we successfully checked the tree
arbiter up to k = 10 levels, i.e. 2^10 ways. The time used is shown in figure 4.
It is almost linear, surpassing all the results we know so far from the
literature [1,12,14]. This is surprising considering that FDR2 uses explicit
state checking and does not exploit partial order semantics. Even more
intriguingly, the memory used is negligible (below 100M). This, we believe, is
largely due to the fact that, for a large network of small parallel processes,
FDR2 does not construct the overall product state space; rather, it constructs
the explicit state space only for the individual processes and then binds them
together using supercombinators [15]. Asynchronous circuits are just this type
of system, with a great number of small components running in parallel [16].
3 Details and full scripts in CSPm can be found at [22].
5 Conclusion and future work
We hope the above case studies have shown that:
• CSP formalism and veriﬁcation theory is versatile. Both synchronous and
asynchronous systems can be handled quite nicely in it.
• CSP notation is expressive, natural, and especially good at describing large
systems 4 .
• Compositional theories and data independence theory are very important
in verifying large and parameterized systems.
• With the above theories and human guidance, FDR2 scales well for many
asynchronous circuit problems.
Similar work can also be realized in other process algebras like CCS/CWB
and LOTOS/CADP [3]. Overall, we believe process algebra provides a
practically viable approach to verifying large-scale asynchronous circuits, even
though it can sometimes be challenging and expert knowledge of the theories
and tools is required.
An incomplete survey of related work has been given in [20]. In summary,
process algebra is superior in that:
• Unlike Petri nets and graphical notations, process algebra is compositional
and scalable.
• Very few other approaches have full-fledged tools like those for standard
process algebra theories.
• Process algebra can shed some new light on extending existing verification
theories like [6,10].
However, our work is still at a preliminary stage. In the future we hope
some of the expert knowledge involved in verification can be automated. For
instance, our recent results have shown that for a system consisting solely of
deterministic asynchronous components (see footnote 5), chase can be applied
directly after hiding all the events.
4 Certainly, there is room for improvement like the protocol of arbiter cells.
5 Note that the deﬁnitions of determinism in asynchronous systems and in synchronous
systems are diﬀerent.
Acknowledgement
We would like to thank Doug Edwards, Andrew Bardsley and Luis Plana for
their invaluable advice and help. The research is funded by EPSRC projects
GR/S11091/01 & GR/S11084/01.
References
[1] R. Alur, R. K. Brayton, T. A. Henzinger, S. Qadeer and S. K. Rajamani. Partial-Order
Reduction in Symbolic State Space Exploration. CAV 1997: 340-351.
[2] K. van Berkel. Handshake circuits - an Asynchronous Architecture for VLSI Programming.
Cambridge University Press, 1993.
[3] G. Birtwistle. Control state in asynchronous micropipelines. In AINT 2000, pages 45-55, TU
Delft, 2000.
[4] T. A. Chu. Synthesis of Self-Timed VLSI Circuits from Graph-Theoretic Speciﬁcations. PhD
thesis, MIT, 1987.
[5] S. J. Creese and A. W. Roscoe. Verifying an inﬁnite family of inductions simultaneously using
data independence and FDR. FORTE/PSTV’99, 1999.
[6] D. L. Dill. Trace Theory for Automatic Hierarchical Veriﬁcation of Speed-Independent Circuits.
ACM Distinguished Dissertations. MIT Press, 1993.
[7] D. Edwards and A. Bardsley. Balsa: An Asynchronous Hardware Synthesis Language. The
Computer Journal, vol 45, no 1, pages 12-18, Jan 2002.
[8] S. Hassoun and D. Marculescu. Towards GALS Design Methodologies. In FMGALS’03, pages
1-10, Italy, Sep. 2003.
[9] C. A. R. Hoare. Communicating Sequential Processes. Communications of ACM 21(8): 666-
677 (1978)
[10] M. B. Josephs and J. T. Udding. An algebra for delay-insensitive circuits. In CAV’90, LNCS
531, pages 343-352. Springer-Verlag, 1990.
[11] R. Lazić. A Semantic Study of Data Independence with Applications to Model Checking. PhD
thesis, Oxford University Computing Laboratory, 1999.
[12] K. L. McMillan. Trace Theoretic Veriﬁcation of Asynchronous Circuits Using Unfoldings. CAV
1995: 180-195.
[13] D. A. Patterson and J. L. Hennessy. Computer Organization & Design, second edition. Morgan
Kaufmann, 1997.
[14] O. Roig, J. Cortadella and E. Pastor. Veriﬁcation of Asynchronous Circuits by BDD-based
Model Checking of Petri Nets. Application and Theory of Petri Nets 1995: 374-391.
[15] A. Roscoe. The Theory and Practice of Concurrency. Prentice-Hall, 1998.
[16] A. Roscoe. Compiling shared variable programs into CSP. Proceedings of PROGRESS 2001
Workshop. http://web.comlab.ox.ac.uk/oucl/research/areas/concurrency, 2000.
[17] J. Sparso and S. Furber. Principles of Asynchronous Circuit Design: A Systems Perspective.
Kluwer Academic Publishers, 2001.
[18] G. Theodoropoulos and Q. Zhang. A Distributed Colouring Algorithm for Control Hazards in
Asynchronous Pipelines. Proceedings of I-SPAN ’04, pages 266-272, IEEE Computer Society
Press, 2004.
[19] X. Wang, M. Kwiatkowska, G. Theodoropoulos and Q. Zhang. Towards a unifying CSP
approach for hierarchical veriﬁcation of asynchronous hardware. To appear in Electronic Notes
in Theoretical Computer Science, AVOCS 2004, Sept. 2004.
[20] X. Wang and M. Kwiatkowska. On process-algebraic veriﬁcation of asynchronous circuits.
Submitted to MEMOCODE 2005.
[21] Q. Zhang and G. Theodoropoulos. Towards an Asynchronous MIPS R3000 Processor.
Proceedings of ACSAC’03, LNCS 2823, pages 137-150, Springer, 2003.
[22] Balsa Veriﬁcation Examples Page. http://www.cs.bham.ac.uk/research/parlard/examples.
