Towards a Unifying CSP approach to Hierarchical Verification of Asynchronous Hardware  by Wang, X. et al.
X. Wang, M. Kwiatkowska, G. Theodoropoulos, Q. Zhang 1
School of Computer Science, University of Birmingham
Edgbaston, Birmingham B15 2TT, UK
Abstract
Formal veriﬁcation is increasingly important in asynchronous circuit design, since the lack of a
global synchronizing clock makes errors due to concurrency (e.g., deadlocks) virtually impossible
to detect by means of conventional methods such as simulation. This paper presents a hierarchical
approach to asynchronous systems veriﬁcation using CSP and its model checker FDR. The approach
reﬂects the hierarchical nature of asynchronous hardware synthesis frameworks, for example the
Balsa system, and enables the veriﬁcation of the system at diﬀerent levels of abstraction against
properties such as deadlock, delay insensitivity, conformance and reﬁnement. We demonstrate the
feasibility of our approach by automatically detecting errors due to delay sensitivity and deadlock
in simple asynchronous hardware components.
Keywords: Asynchronous hardware, Hierarchical veriﬁcation, CSP, Model checking, Levels of
abstraction.
1 Introduction
Asynchronous and GALS (Globally Asynchronous Locally Synchronous) [17,9]
design techniques are important alternatives to synchronous design techniques
based on a global clock. They achieve synchronization by means of localized
handshake protocols between the communicating subsystems. The removal of the
global clock can result in highly concurrent, nondeterministic systems 2 ,
which render simulation alone inadequate as a testing methodology.
1 {X.Wang,M.Z.Kwiatkowska,G.K.Theodoropoulos,Q.Zhang}@cs.bham.ac.uk
Electronic Notes in Theoretical Computer Science 128 (2005) 231–246
1571-0661 © 2005 Elsevier B.V. Open access under CC BY-NC-ND license.
www.elsevier.com/locate/entcs
doi:10.1016/j.entcs.2005.04.014
In this paper we propose a formal veriﬁcation approach for asynchronous
hardware systems using Balsa, the CSP-based [10] speciﬁcation and synthe-
sis system developed by the AMULET group at the University of Manch-
ester [17,7]. Balsa is endowed with simulation, but not veriﬁcation, tools. We
demonstrate how Balsa programs, handshake networks (a.k.a. handshake cir-
cuits) and asynchronous gate circuits can be translated into CSP [15], which
in turn enables the use of FDR [15], the mature model checker for CSP, to
serve as the back-end veriﬁcation tool. Data independence can be employed
to tackle the datapath reduction problem. The proposed approach can be
implemented as an add-on to existing Balsa design and synthesis processes.
The paper is structured as follows. We ﬁrst outline the VLSI compilation
framework for asynchronous hardware design. Then we propose a hierarchical
veriﬁcation approach, as an extension of the framework, based on the use of
CSP as the unifying formalism. Next, we illustrate the approach with the help
of a Balsa program fragment, handshake networks, asynchronous logic circuits
and its synthesis process. For each level (three in all), we give a translational
semantics of the Balsa components in CSP and describe the outcome of veriﬁ-
cation experiments. Finally, we conclude the paper by discussing related work
and future plans.
2 High-level asynchronous circuit compilation
Of the various CSP-based approaches that have been used (e.g., [18,14,11]), a
particularly promising one employs silicon compilation to automatically gen-
erate gate-level implementations from high-level speciﬁcations; most notable
examples include Brunvand’s [3] work, Tangram [1] and Balsa [17,7].
Within this asynchronous logic synthesis framework (cf. Figure 1), a CSP-based
parallel programming language is typically used to give a high-level
algorithmic description of the design. From such a description, syntax-directed
compilation creates a network (composition) of handshake components, where
each language construct in the program is mapped to a corresponding handshake
implementation. Handshake components are usually pre-designed and stored in a
library in the form of gate-level circuit fragments.
2 Note that highly concurrent, nondeterministic systems can have sequential and determin-
istic blackbox behaviours and that the deﬁnitions of determinism in asynchronous systems
and in synchronous systems are diﬀerent.
3 A hierarchical approach
Based on the silicon compilation framework, we propose a hierarchical approach
to verifying asynchronous hardware designs, which utilises FDR, the model
checker for CSP. The approach centres around the two hierarchies in silicon
compilation: the hierarchy of abstraction levels (cf. Figure 1) and the
hierarchy of component composition.
The key observation is that CSP is appropriate for describing all three
levels of system description. The top level describes a synchronous system, as
in standard CSP; it utilizes fine-grained parallelism (the parallel operator
within sequential composition) that is rarely supported by other model
checkers. The two lower levels describe asynchronous systems, which pose
challenges to the expressive power of standard CSP. Based on our novel idea of
a scheduler, we find that asynchronous systems can not only be modelled in CSP
in a direct and intuitive way, but can also be simplified to use just the
traces model. In some sense, our work is similar to Dill’s trace theory of
asynchronous circuits [5]. At all three levels, the systems will be highly
concurrent; past experience has indicated that FDR works well with a high
degree of parallelism [16].
Another important insight is that it is possible, at all three levels, to
reduce a verification problem across the component composition hierarchy.
At the synchronous level we employ traditional specification-based refinement
checking. At the asynchronous levels we employ novel protocol-based
closed-circuit testing.
3.1 Hierarchy across abstraction levels
Abstraction hierarchy is an important tool for managing the complexity in
asynchronous circuit design, both for human designers and for verification
tools. At the programming language level, we use abstract constructs such as
synchronous broadcast communication, shared variables, and sequential and
parallel composition to describe a high-level algorithmic view and structural
design of the hardware system. At the handshake level, simple asynchronous
handshake signals and a basic set of handshake components (Fork, Split,
Variable, Loop, Concur, BinaryFunc, etc.) implement synchronous broadcast,
interleaved variable accesses, control sequentiality and parallelism. At
the basic gate level, handshake signals are mapped to transitions on the wires.
The function of basic handshake components is synthesized from basic logic
gates.
Given a large asynchronous hardware design, it is often infeasible to verify
the whole design at the lowest level. By utilising the abstraction hierarchy, the
overall veriﬁcation problem can be decomposed into smaller, more tractable
problems at diﬀerent levels. Since we employ CSP at all levels of abstraction,
it is possible for us to establish a formal semantic link between the levels,
based on various techniques such as behaviour Reﬁnement, Action reﬁnement,
abstract Interpretation (RAI), etc. (In Section 6.3, an example will be given
on how to handshake expand, i.e. RAI reﬁne, a Balsa speciﬁcation in CSP into
several diﬀerent handshake protocols in CSP.) In a systematic approach, we
expect to utilize this link in the future to validate Balsa’s compilation functions
and synthesis algorithms and prove appropriate correctness-by-construction
results for the system.
3.2 Hierarchy across component grain-sizes
Within an abstraction level, small components are often combined to form
more complex components. Flattening the composition hierarchy to perform
veriﬁcation is undesirable.
In the synchronous case, it is well known that CSP reﬁnement supports
compositional veriﬁcation, that is, separate speciﬁcation of a component from
its implementation. After checking the reﬁnement, the speciﬁcation can be
used instead of the implementation when composing components.
An asynchronous hardware system, at lower levels, is an input/output sys-
tem which consists of a collection of asynchronous components connected by
channels. A channel is associated with delay. Components communicate by
sending/receiving signals with non-blocking semantics. Although CSP usually
enforces synchronisation of input and output on the same channel into a single
event, we choose to model the two operations separately and use a scheduler
to explicitly schedule the delay of input and output. A scheduler will ﬁrst
nondeterministically select an enabled output (those in the initials of a CSP
process) of a sending component and then force it onto the corresponding in-
put of the receiving component. This nondeterminism of the scheduler can
simulate all the possible delay scenarios of the system. If a system runs cor-
rectly with a scheduler, this implies that the system runs correctly in all delay
scenarios. That is, the system is delay-insensitive. If there is an assumption
on channel delay, such as isochronic fork, the scheduler can be modiﬁed to
reﬂect the assumption.
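The scheduler idea can be illustrated outside CSP with a small state-space
search. The following Python sketch is not the authors' CSP model: the function
name find_choke, the dictionary encoding of components and the toy
sender/receiver pair are all assumptions made for illustration. Each component
is a finite-state machine with separate output and input events, and the search
tries every order in which enabled outputs can be delivered, reporting a choke
when a receiver is not ready for the input forced onto it.

```python
from collections import deque

def find_choke(components, wires, start):
    """Search every delivery order; return the shortest choke trace or None.

    components: list of FSMs, each a dict  state -> {event: next_state}
    wires:      dict  output_event -> (receiver_index, input_event)
    start:      tuple holding the initial state of every component
    Assumes a component never sends to itself.
    """
    queue, seen = deque([(start, [])]), {start}
    while queue:
        state, trace = queue.popleft()
        for i, fsm in enumerate(components):
            for event, nxt in fsm[state[i]].items():
                if event not in wires:
                    continue  # inputs only happen when the scheduler forces them
                j, inp = wires[event]
                receiver_moves = components[j][state[j]]
                step = trace + [(event, inp)]
                if inp not in receiver_moves:
                    return step  # output forced onto an unready input: choke
                new = list(state)
                new[i], new[j] = nxt, receiver_moves[inp]
                new = tuple(new)
                if new not in seen:
                    seen.add(new)
                    queue.append((new, step))
    return None

# Hypothetical example: a sender raises o1 and o2 in either order.
sender = {0: {"o1": 1, "o2": 2}, 1: {"o2": 3}, 2: {"o1": 3}, 3: {}}
# A delay-sensitive receiver insists on in1 before in2 ...
strict = {0: {"in1": 1}, 1: {"in2": 2}, 2: {}}
# ... while a delay-insensitive receiver accepts both orders.
tolerant = {0: {"in1": 1, "in2": 2}, 1: {"in2": 3}, 2: {"in1": 3}, 3: {}}
wires = {"o1": (1, "in1"), "o2": (1, "in2")}

print(find_choke([sender, strict], wires, (0, 0)))
print(find_choke([sender, tolerant], wires, (0, 0)))
```

In this toy setting the strict receiver chokes as soon as o2 is delivered
first, whereas no delivery order chokes the tolerant one. A timing assumption
such as an isochronic fork would correspond to removing some delivery orders
from the search.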
An asynchronous input/output system can be an open-circuit system or
a closed-circuit system. An open-circuit system models a component and is
often associated with a protocol, dictating all the legal sequences of input and
output events at its interface.
When verifying a closed-circuit system, we run the system in parallel with
the scheduler. If it deadlocks when the scheduler forces the output onto the
input (a.k.a. choke), we say the system is incorrect, in the sense that the
system does not constitute a good environment in which all the protocols
of the subcomponents are obeyed. Because the scheduler participates in the
occurrence of every event, any deadlock of the scheduler will be global, and
hence easy to check in FDR.
Before verifying an open-circuit system, we need to use its protocol as the
environment of the system to turn it into a closed circuit. A complete example
of open-circuit verification will be shown in Section 7.
4 Balsa
Balsa is both an asynchronous hardware synthesis framework and the CSP-like
language for describing such systems. Balsa generates purely asynchronous
macromodular circuits similar to those of Philips’ Tangram [1]. (One ma-
jor diﬀerence is that Balsa extends Tangram with more handshake enclo-
sures [6,7].) Balsa is technology independent (e.g. channel connections can
be implemented using speed-independent or delay-insensitive schemes) and it
targets standard cell and FPGA technologies for producing gate-level netlists.
Three levels of simulation are supported: behavioural at the Balsa level, and
functional and timing (using native simulators of the supported commercial
CAD tools) at the basic gate and layout levels (Figure 2). No veriﬁcation tool
is available.
Fainter lines in Figure 2 denote manual processes. It is obvious from the
ﬁgure that most validation work in Balsa is done manually.
[Figure 1. The hierarchy of abstraction. Levels: Balsa program & its
behavioural spec; handshake network & its protocol; gate level circuit & its
protocol. Syntax-directed translation and asynchronous logic synthesis connect
successive levels. Each level has a CSP counterpart obtained by translation
(CSP program & CSP spec; CSP network & CSP protocol; CSP circuit & CSP
protocol), and the CSP levels are linked by RAI.]
[Figure 2. Balsa System.]
5 Asynchronous hardware programming
Hardware programming enables system designers to approach the design of
complex asynchronous VLSI circuits at a high level of abstraction.
5.1 A translational semantics of Balsa in CSP
Balsa does not have a formal semantics, though Tangram has one based on
handshake processes [1]. But handshake processes are asynchronous and the
semantics is essentially at the handshake level. It is much less abstract than
at the top level. In this paper we give a CSP translational semantics to Balsa
programs directly at the top level. The variant of CSP language we use is
similar to the ones in [14] and [8], which have an imperative ﬂavour but can
give concise descriptions of asynchronous circuits 3 :
Syntax of CSP
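\begin{align*}
CMD ::={} & c?x : b \mid c!e \mid Skip \mid CMD \mathbin{\Box} CMD' \mid CMD \sqcap CMD' \\
\mid{} & CMD ; CMD' \mid CMD \mathbin{|[\, chans \,]|} CMD' \mid CMD \setminus chans \\
\mid{} & l(e_1, \ldots, e_n) \mid \textbf{if } b \textbf{ then } CMD \textbf{ else } CMD' \mid \{ l(z_1, \ldots, z_n) = CMD, \ldots \}
\end{align*}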
where x, y and z are variables 4 , chans is a set of channel names (e.g.,
{c1, ..., cn}), b is a boolean expression and e is a value expression. c?x : b
is a selective input command; an input x on c is accepted iff it satisfies b.
|[ chans ]| is an interface parallel operator; it synchronizes its two
subprocesses only on the events in chans. Sometimes the ||| and ‖ operators are
also used: ||| is the interleaving operator, while ‖ is shorthand for
alphabetized parallel when the alphabets are self-evident from context.
Sequential composition binds more tightly than the choice operators, which in
turn bind more tightly than parallel composition.
As a running example, we will use the Balsa code fragment for an arbiter:
import[type] -- importing some types and library procedures
procedure Arb -- procedure defines a component
(input NTarget1, NTarget2:InsAdd; -- input of the component
output NTarget : InsAdd) -- output of the component
is local variable C: bit -- internal variable
begin loop arbitrate -- arbitrate is two-way choice
NTarget1 then -- first one waits on NTarget1
if NTarget1.c = C
then NTarget <- NTarget1 end
| NTarget2 then -- second one waits on NTarget2
C := NTarget2.c || NTarget <- NTarget2
end end end -- input from NTarget2 outputted on NTarget
The arbiter Balsa program can be translated to the CSP process arb in a
straightforward way shown below:
3 The conversion to CSPm of FDR is trivial, as can be found in our CSPm scripts [23] of
the examples in the paper.
4 In the rest of paper, all the variables are assumed to have been declared at the beginning
of each speciﬁcation, so that we can omit them for brevity.
channel ntarget, ntarget1, ntarget2 : ADDR.COLOR
channel nt1_end, nt2_end
channel read, write : COLOR
\begin{align*}
arb_c &= (ntarget2?x?c ; (write!c \parallel ntarget!x!c) ; nt2\_end ; arb_c) \\
&\quad \mathbin{\Box} (ntarget1?x?c ; read?c' ; \textbf{if } c = c' \textbf{ then } ntarget!x!c \textbf{ else } Skip ; nt1\_end ; arb_c) \\
lcv(c) &= (read!c ; lcv(c)) \mathbin{\Box} (write?c' ; lcv(c')) \\
arb &= (arb_c \parallel lcv(0)) \setminus \{read, write\}
\end{align*}
In the above, note the use of sequential composition instead of action
prefix (cf. [14,8]), which is in close correspondence with the syntactic form
of asynchronous hardware description languages.
Balsa supports several features not present in declarative CSP [15]. Firstly,
it has variables and assignment; that is, Balsa programs are imperative. To
translate them into declarative CSP, one can model a variable as a process and
read/write operations as communications on its channels [15], which has been
shown to be an efficient technique for FDR [16].
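As a rough illustration of the variable-as-process idea, the lcv process above
can be mimicked by a step function over its stored value. The Python names
lcv_step and run_lcv are hypothetical, and the sketch only captures which read
and write events the process would accept; it says nothing about how FDR
represents the process internally.

```python
def lcv_step(value, event):
    """One step of lcv(value).

    Events are ("read", v) or ("write", v). Returns the next stored value,
    or None if the process refuses the event.
    """
    kind, v = event
    if kind == "read":
        # read!value: only the currently stored value can be read
        return value if v == value else None
    if kind == "write":
        # write?v: any value is accepted and becomes the new contents
        return v
    return None

def run_lcv(initial, events):
    """Feed a sequence of events to lcv(initial); None means a refusal."""
    value = initial
    for event in events:
        value = lcv_step(value, event)
        if value is None:
            return None
    return value

# Reading 0, overwriting with 1, then reading 1 is a valid history of lcv(0).
print(run_lcv(0, [("read", 0), ("write", 1), ("read", 1)]))
```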
Another special feature of Balsa is guard enclosures (a form of handshake
enclosure), which are mostly associated with the Select and Arbitrate
commands. Semantically, an enclosure involves a long-lasting event enclosing a
collection of shorter events. In CSP, we model it by a pair of events (which we
call a duration pair), one representing the start of the ‘duration’ and the
other the end. For example, the input and output events on channel ntarget1 are
guard enclosures and are modelled by the duration pairs (ntarget1?x?c, nt1_end)
and (ntarget1!x!c, nt1_end).
5.2 Veriﬁcation of Balsa programs in CSP
For a system I implemented in Balsa, where I needs to satisfy the specification
S, we can translate I and S into CSP and obtain SYS and SPEC. If SYS and
SPEC involve large data types, data independence reduction can be applied,
yielding the reduced sys and spec. We can then check in FDR that sys refines
spec, which establishes that I indeed satisfies S.
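To make the refinement check concrete, the sketch below tests trace inclusion
between two small deterministic state machines in Python. It is a simplified
stand-in for what a refinement checker does (FDR also handles nondeterministic
processes and the failures and failures-divergences models); trace_refines and
the buffer examples are illustrative names and are not taken from the paper's
scripts.

```python
def trace_refines(spec, impl, spec0, impl0):
    """True iff every trace of impl is also a trace of spec.

    Both machines are deterministic: dict  state -> {event: next_state}.
    The search walks the product of the two machines from the start states.
    """
    stack, seen = [(spec0, impl0)], {(spec0, impl0)}
    while stack:
        s, i = stack.pop()
        for event, i_next in impl[i].items():
            if event not in spec[s]:
                return False  # impl performs an event the spec forbids here
            pair = (spec[s][event], i_next)
            if pair not in seen:
                seen.add(pair)
                stack.append(pair)
    return True

# Specification: a one-place buffer that alternates "in" and "out".
spec = {0: {"in": 1}, 1: {"out": 0}}
# An implementation with more states but the same traces.
good = {0: {"in": 1}, 1: {"out": 2}, 2: {"in": 3}, 3: {"out": 0}}
# An implementation that can accept two inputs in a row.
bad = {0: {"in": 1}, 1: {"in": 0}}

print(trace_refines(spec, good, 0, 0), trace_refines(spec, bad, 0, 0))
```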
As an example, Figure 3 illustrates an asynchronous circuit which has
been taken from SAMIPS [21], an asynchronous implementation of the MIPS
processor. The ﬁgure shows an abstraction of the ﬁrst two pipeline stages of
SAMIPS, namely Instruction Fetch (IF ) and Instruction Decode (ID).
In the IF stage the physical address — either the current program counter
(PC ) incremented by four (ADD4) or a new target (NTarget2) address from
datapath (ID), if a control hazard occurs there — is calculated and then sent
to PC and the main memory, through an arbitration unit (AAU ).
[Figure 3. The instruction fetching circuit. The IF stage contains PC, ADD4
(+4), AAU and MEM; the ID stage contains DeCode. Labelled connections: PCvalue,
NTarget, NTarget1, NTarget2, BaseAddr, Instruction.]
To stop prefetching invalid instructions (via NTarget1, and discard those
that have been prefetched) in SAMIPS, a colouring mechanism has been de-
veloped [19], whereby both the units of the processor and the instructions
are “coloured”. Instructions are executed only if their colour matches that of
the processor, which changes every time a control hazard occurs and is pig-
gybacked on NTarget2 to colour the new instruction stream. To simplify our
example, we use two colours (0 and 1), and one type of control hazard, namely
the execution of a jump instruction in the ID stage.
The CSP equivalent of the arbiter unit (AAU ) has been shown above. The
PC unit is a buﬀer:
\begin{align*}
pc(x, c) &= pcvalue!x!c ; ntarget?x'?c' ; pc(x', c')
\end{align*}
The add4 process below is an abstraction of the ADD4 and MEM units. We
use the nondeterministic choice ‘$\sqcap\, y : INS \bullet$’ to abstract the
instruction fetching operation:
datatype INS = jump | non_jump
channel baseaddr : ADDR.COLOR.INS
\begin{align*}
add4 &= pcvalue?x?c ; (ntarget1!(x + 4)!c ; nt1\_end \parallel \sqcap\, y : INS \bullet baseaddr!x!c!y) ; add4
\end{align*}
The DECODE unit accepts input from ADD4.
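\begin{align*}
decode(c) &= baseaddr?x?c'?y ; \\
&\quad \textbf{if } c = c' \land y = jump \\
&\quad \textbf{then } \sqcap\, x' : ADDR \bullet ntarget2!x'!(1 - c) ; nt2\_end ; decode(1 - c) \\
&\quad \textbf{else } decode(c) \qquad \text{-- discard or execute } y
\end{align*}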
If the instruction is a jump and the colour is correct, the DECODE unit changes
the current colour and sends it with the jump destination to AAU. The jump
destination is nondeterministically selected (‘$\sqcap\, x' : ADDR \bullet$’);
this is an abstraction. If the instruction is not a jump and the colour is
correct, the instruction will be sent to later stages of the pipeline for
execution.
This completes the CSP translation of the system. However, the system
does not have a speciﬁcation. It is a closed system, since we have chosen to
abstract away the remaining parts of the pipeline. Therefore, we only need to
check deadlock freedom of the system below:
system = decode(0) ‖ add4 ‖ arb ‖ pc(0, 0)
In system, ADDR can be a very large data type and may blow up the state
space dramatically for FDR. By applying data independence theory, ADDR
is shown to be weakly data independent [13]. According to Theorem 5.1.2
in [13], it can be reduced to a data type of size 1. Using the reduced model
with only three addresses in ADDR (for illustration purposes), we have found
a deadlock trace for the instruction fetching system with a buffer-less
arbiter:
⟨pcvalue.0.0, baseaddr.0.0.jump, ntarget1.4.0, ntarget.4.0, nt1_end,
pcvalue.4.0, ntarget1.8.0, ntarget.8.0, nt1_end, ntarget2.8.1⟩
We can correct the system by adding a buﬀer to the arbiter, thus breaking
the loop. Using FDR we have shown that the corrected system is deadlock-
free 5 .
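The deadlock search can be mimicked with a small explicit-state exploration.
The Python sketch below is not the CSPm script used with FDR and makes several
simplifying assumptions: the arbiter keeps its colour variable in its own state
(as in the specification spec(c) of Section 6.3) instead of running a separate
lcv process, addresses are taken from {0, 4, 8} with x + 4 wrapping modulo 12,
internal choice is modelled by offering every alternative, and events are
encoded as tuples. All function and constant names are hypothetical.

```python
from collections import deque

ADDR, COLOR, INS = (0, 4, 8), (0, 1), ("jump", "non_jump")

# Every process maps its local state to {event: next_state}. An input is
# offered for every possible value; events are tuples (channel, values...).

def decode(s):
    kind, c = s
    if kind == "wait":    # baseaddr?x?c'?y
        return {("baseaddr", x, c2, y):
                (("send", c) if c2 == c and y == "jump" else ("wait", c))
                for x in ADDR for c2 in COLOR for y in INS}
    if kind == "send":    # ntarget2!x'!(1-c) for some destination x'
        return {("ntarget2", x, 1 - c): ("end", c) for x in ADDR}
    return {("nt2_end",): ("wait", 1 - c)}

def add4(s):
    if s == ("idle",):    # pcvalue?x?c
        return {("pcvalue", x, c): ("busy", x, c, 0, 0)
                for x in ADDR for c in COLOR}
    _, x, c, t1, t2 = s   # t1: ntarget1/nt1_end thread, t2: baseaddr thread
    def after(t1, t2):
        return ("idle",) if (t1, t2) == (2, 1) else ("busy", x, c, t1, t2)
    moves = {}
    if t1 == 0:
        moves[("ntarget1", (x + 4) % 12, c)] = after(1, t2)
    elif t1 == 1:
        moves[("nt1_end",)] = after(2, t2)
    if t2 == 0:
        for y in INS:
            moves[("baseaddr", x, c, y)] = after(t1, 1)
    return moves

def arb(s):
    if s[0] == "idle":    # s[1] is the stored colour
        c = s[1]
        moves = {("ntarget2", x, c2): ("fwd", x, c2, "nt2_end", c2)
                 for x in ADDR for c2 in COLOR}
        for x in ADDR:
            for c2 in COLOR:
                moves[("ntarget1", x, c2)] = (
                    ("fwd", x, c2, "nt1_end", c) if c2 == c
                    else ("end", "nt1_end", c))
        return moves
    if s[0] == "fwd":     # ntarget!x!c'
        _, x, c2, end, c = s
        return {("ntarget", x, c2): ("end", end, c)}
    _, end, c = s         # closing event of the enclosure
    return {(end,): ("idle", c)}

def pc(s):
    if s[0] == "out":     # pcvalue!x!c
        return {("pcvalue", s[1], s[2]): ("in",)}
    return {("ntarget", x, c): ("out", x, c) for x in ADDR for c in COLOR}

PROCS = (decode, add4, arb, pc)
# Which two processes synchronise on each channel (indices into PROCS).
SYNC = {"baseaddr": (0, 1), "ntarget1": (1, 2), "nt1_end": (1, 2),
        "ntarget2": (0, 2), "nt2_end": (0, 2), "ntarget": (2, 3),
        "pcvalue": (1, 3)}
INIT = (("wait", 0), ("idle",), ("idle", 0), ("out", 0, 0))

def successors(state):
    """Yield (event, next_state) for every event both partners offer."""
    offers = [p(s) for p, s in zip(PROCS, state)]
    for i, offer in enumerate(offers):
        for event, nxt in offer.items():
            a, b = SYNC[event[0]]
            if i == a and event in offers[b]:
                new = list(state)
                new[a], new[b] = nxt, offers[b][event]
                yield event, tuple(new)

def find_deadlock():
    """Breadth-first search for a reachable state without successors."""
    queue, seen = deque([(INIT, [])]), {INIT}
    while queue:
        state, trace = queue.popleft()
        succ = list(successors(state))
        if not succ:
            return trace
        for event, new in succ:
            if new not in seen:
                seen.add(new)
                queue.append((new, trace + [event]))
    return None

def replay(trace):
    """Follow a trace from INIT; return the final state or None."""
    state = INIT
    for event in trace:
        state = dict(successors(state)).get(event)
        if state is None:
            return None
    return state

print(find_deadlock())
```

Under these assumptions, replaying the trace reported above ends in a state
where the arbiter wants to forward the new target on ntarget, the PC buffer
wants to emit pcvalue, and add4 is still waiting to deliver on baseaddr, so no
event is possible.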
6 Handshake networks
After a system has been programmed in Balsa, the Balsa compiler will auto-
matically translate the program into a network of handshake components and
we enter the world of asynchronous non-blocking communication.
6.1 Handshake components
A handshake component connects with the environment via a number of handshake
channels. Each communication consists of a pair of non-blocking events,
req and ack. This is called handshake expansion; it implements the transition
from synchrony to asynchrony. Depending on which side initiates the
communication (i.e., by sending req), the ports on a channel are divided into
active ports and passive ports. Each communication is either a control
synchronization (i.e., representing the control path of the circuit) or a data
transfer (i.e., representing the datapath of the circuit) between two
components. To connect multiple components, some special handshake components
are needed to do the plumbing (e.g., forking and merging) to create multi-way
passages.
5 Detailed Balsa and CSPm scripts can be found at [23].
For example, the component in Figure 4 is a FalseVariable (FV ) handshake
component and its protocol in STG [4] is given in that ﬁgure as well.
The FV handshake component resembles a normal Variable, with one
passive (denoted by an open circle) write port WD and one passive read
port RD 6 . It diﬀers, however, in the presence of an active (denoted by a
ﬁlled circle) probe port S . The component is named FalseVariable because
it does not store data. The FV component is usually used to implement
arbitrate/select commands in Balsa.
[Figure 4. FalseVariable and its protocol. Ports: WD, RD, S; signals: WDr, WDa,
Sr, Sa, RDr, RDa.]
[Figure 5. Arbiter handshake network. Components include DW, FV, clr, ‘(>’ and
‘|’; channels: NTarget, NTarget1, NTarget2.]
6.2 Syntax-directed compilation and handshake component network
By compiling the arbiter program written in Balsa 7 , we can obtain the hand-
shake component network in Figure 5. The edges with arrows represent the
datapath while the edges without arrows represent the control path.
The central component named clr is the local colour variable of the arbiter
program. On its left, components ‘DW’ and ‘(>’ implement the arbitrate
itself. Below it, we see the subnetwork implementing the first branch of
arbitrate, and above it, the subnetwork for the second branch. FV components
are used to accept data input from channels NTarget1 and NTarget2. The ‘|’
component is used to multiplex the data flow from the two branches and output
it to the NTarget channel.
6 This is a simpliﬁcation; usually we will have multiple read ports.
7 Note that, although the distinction between Select and Arbitrate is not reﬂected in the
CSP semantics of Balsa, it is utilised in the compilation process to optimise the resulting
network.
6.3 Veriﬁcation of handshake networks in CSP
Let us assume that a Balsa program B , whose speciﬁcation in CSP is P , is
compiled into a handshake network N , and n is a handshake component in N ,
whose protocol in CSP is p. In order to verify that N correctly implements
B , we need to ﬁrst handshake-expand (RAI reﬁne) P in order to get the
handshake level protocol PRT for the network.
Then, for each n in N , its protocol p is used as its behaviour speciﬁcation.
Composing up these ps (as they are connected in N ) gives us a CSP translation
of the network, SYS .
Putting SYS and PRT in parallel with a scheduler, we can check for dead-
lock in FDR to prove that SYS conforms to PRT , that is, N is a correct
implementation of B .
In the instruction fetching example above, the CSP speciﬁcation of the
arbiter unit arb is:
\begin{align*}
spec(c) &= (ntarget2?x?c' ; ntarget!x!c' ; nt2\_end ; spec(c')) \\
&\quad \mathbin{\Box} (ntarget1?x?c' ; \textbf{if } c = c' \textbf{ then } ntarget!x!c \textbf{ else } Skip ; nt1\_end ; spec(c))
\end{align*}
Since NTarget1 and NTarget2 are both passive ports, while NTarget is an
active port, the protocol above can be handshake-expanded into:
\begin{align*}
prot(c) &= serialise \parallel (NT2.r?x?c' ; NT.r!x!c' ; NT.a ; NT2.a ; prot(c') \\
&\quad \mathbin{|||} NT1.r?x?c' ; \textbf{if } c = c' \textbf{ then } NT.r!x!c' ; NT.a \textbf{ else } Skip ; NT1.a ; prot(c)) \\
serialise &= NT.r?x?c' ; (NT1.a \mathbin{\Box} NT2.a) ; serialise
\end{align*}
where, in the above, prot is the protocol for the (arbitrate) handshake network
of Figure 5, while prot ′ is for the optimised (select) handshake network when
there is no interference on choice activation.
Due to space limitations, we will not show the veriﬁcation at the handshake
level. Instead, a full example using protocol-based closed-circuit testing will
be shown at the gate level.
7 Basic gate circuits
After the basic set of handshake components (more than 40 for Balsa) is
identified and defined, each component can be synthesized into a gate-level
circuit, manually or automatically, based on some encoding scheme. An encoding
scheme decides how to implement the abstract req/ack and data signals of the
handshake level using voltage transitions on the wires of the gate-level
circuit.
7.1 Asynchronous logic synthesis
Given a handshake component, the initial input to the synthesis process should
be its handshake protocol. The synthesis process then concretizes (RAI
refines) the protocol according to the encoding scheme, yielding a new
gate-level protocol. This is a design process; the new protocol must consider
the implications it has on the speed, cost, safeness, etc., of the synthesized
circuits.
For example, the FV component has recently been re-designed by the
Manchester AMULET group for the dual-rail level-signalling scheme. The
new reﬁned protocol is shown in Figure 6(a).
[Figure 6. The gate-level protocol of the FV (a) and T (b) elements]
Because of level-signalling, useful transitions will usually be upward (+).
The downward transitions (−) are needed just to return the voltage to zero
in order to prepare for the next round of upward transitions.
[Figure 7. FV implementation.]
[Figure 8. Abstract FV implementation. Elements: C, T, READ PORT, ANDn, fork,
fork′. Wires: WDr0, WDr, WDa, Sr, Sa, RDr, RDa, Ir, Or, Oa, Ia, Oand, Iand1,
IandN2, Oc, Ic1, Ic2, In1, In2, Out, I, O1, O2, I′, O1′, O2′.]
Based on the new protocol, their proposed implementation is shown in
Figure 7 (details can be found at [23]). The behaviour of the T element is
specified by the protocol of Figure 6(b) 8 .
8 The input/output wiring deﬁnition can be found in Figure 8.
In Figure 6(a), WDr0+ denotes the arrival of the first bit on the incoming
bus, while WDr[1,n−1]+ denotes the arrival of all bits. The detection of the
arrival of all bits is implemented by the CD (Completion Detection) element
in Figure 7.
7.2 CSP veriﬁcation
Imagine that a handshake component H is implemented in a gate-level circuit
G , whose protocol is captured by PRT in CSP. Let g be a gate-level element
in G , and let its protocol in CSP be p. Then, for each g in G , its protocol
p is used as the behaviour speciﬁcation. Composing up these ps (as they are
connected in G) yields a CSP translation of the network, SYS .
Putting SYS and PRT in parallel with a scheduler, we prove that SYS
conforms to PRT by checking for deadlock freedom in FDR. Sometimes an
element g may itself be implemented by even more basic elements in circuit
t (e.g., the T elements in the FV circuit). Then the protocol of g will be
the protocol of t . By translating t into CSP, we can similarly prove that t
implements g .
For the FV example, after abstracting the data bus and completion de-
tection, the gate-level circuit implementation is shown in Figure 8.
Translating the STG in Figure 6(a), we obtain the CSP protocol for the
circuit as:
\begin{align*}
protocolFV &= (writer \parallel reader) ; WDa.down ; protocolFV \\
writer &= (WDr.up \parallel RDr.down) ; WDa.up ; WDr.down \\
reader &= (WDr0.up ; Sr.up ; RDr.up \parallel WDr.up) \\
&\quad ; RDa.up ; Sa.up ; Sr.down ; RDr.down ; RDa.down ; Sa.down
\end{align*}
Similarly, we can get the behaviour speciﬁcation of the T element as:
\begin{align*}
protocolT &= Ir.up ; Or.up ; Oa.up ; (Or.down ; Oa.down \parallel Ia.up ; Ir.down) \\
&\quad ; Ia.down ; protocolT
\end{align*}
There are two forks in the circuit; their behaviour speciﬁcation is:
\begin{align*}
fork0 &= I.up ; (O1.up \parallel O2.up) ; fork1 \\
fork1 &= I.down ; (O1.down \parallel O2.down) ; fork0
\end{align*}
The Muller-C element has the behaviour:
\begin{align*}
protocolC &= (Ic1.up \parallel Ic2.up) ; O.up ; protocolC' \\
protocolC' &= (Ic1.down \parallel Ic2.down) ; O.down ; protocolC
\end{align*}
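The behaviour of protocolC can be checked against concrete event sequences
with a few lines of Python. This is only an illustrative sketch: the function
name follows_protocol_c and the encoding of events as (wire, edge) pairs are
assumptions, and the wire names are those used in the definition above.

```python
def follows_protocol_c(trace):
    """Check a sequence of (wire, edge) events against protocolC.

    A round is: Ic1 and Ic2 change (in either order), then O changes the same
    way; rounds alternate between "up" and "down". Prefixes of valid
    behaviours are accepted.
    """
    phase, pending = "up", {"Ic1", "Ic2"}
    for wire, edge in trace:
        if edge != phase:
            return False          # wrong direction for the current round
        if pending:
            if wire not in pending:
                return False      # output too early, or an input repeated
            pending.remove(wire)
        elif wire == "O":
            # both inputs have arrived: the output fires and the round flips
            phase = "down" if phase == "up" else "up"
            pending = {"Ic1", "Ic2"}
        else:
            return False
    return True

full_cycle = [("Ic2", "up"), ("Ic1", "up"), ("O", "up"),
              ("Ic1", "down"), ("Ic2", "down"), ("O", "down")]
print(follows_protocol_c(full_cycle))
print(follows_protocol_c([("Ic1", "up"), ("O", "up")]))
```

The second call is rejected because the output rises before both inputs have
risen, which is exactly the behaviour a Muller-C element must not exhibit.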
The READPORT element, after abstraction of the data bus, functions like
an AND gate:
\begin{align*}
readport &= (In1.up \parallel In2.up) ; Out.up ; readport' \\
readport' &= (\, In1.down ; (Out.down \parallel In2.down) \\
&\quad \mathbin{\Box} In2.down ; (Out.down \parallel In1.down) \,) ; readport
\end{align*}
The behaviour of the one-input negated AND gate could be similarly spec-
iﬁed. But because it is used in this particular circuit in a limited way, its pro-
tocol is rather diﬀerent from the above. This is a good example of veriﬁcation
using protocols instead of the full speciﬁcation of elements.
\begin{align*}
andN &= IandN2.up ; (Iand1.up \parallel IandN2.down) ; Oand.up ; andN' \\
andN' &= Iand1.down ; Oand.down ; andN
\end{align*}
One important observation we can make of the above speciﬁcation is that
no element shares any event. It is due to our principle of separating input
from output so that we can use the scheduler to link and synchronize them.
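The specification of the scheduler is given below:
\begin{align*}
signalling(x, y) &= x?z ; y!z \\
scheduler &= (\, signalling(WDr0, Ir) \mathbin{\Box} signalling(Oc, WDa) \mathbin{\Box} signalling(WDr, I) \\
&\quad \mathbin{\Box} signalling(O1, Ic2) \mathbin{\Box} signalling(O2, In2) \mathbin{\Box} signalling(O1', IandN2) \\
&\quad \mathbin{\Box} signalling(O2', In1) \mathbin{\Box} signalling(Oand, Ic1) \mathbin{\Box} signalling(Ia, Iand1) \\
&\quad \mathbin{\Box} signalling(Or, Sr) \mathbin{\Box} signalling(Sa, Oa) \mathbin{\Box} signalling(RDr, I') \\
&\quad \mathbin{\Box} signalling(Out, RDa) \,) ; scheduler
\end{align*}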
where signalling(x , y) connects the output channel x of one element to the
input channel y of another element. Whenever an output 9 is made on x , the
scheduler will force it onto y .
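The scheduler's wiring can also be read as a lookup table from output wires to
input wires. The Python sketch below is only an illustration: WIRES, force and
consistent are hypothetical names, primes are written with an ASCII
apostrophe, and the entry for Sa follows the pairing (Sa.up, Oa.up) that
appears in the deadlock trace reported for the test system, where Oa is the
input on which the T element receives its acknowledgement.

```python
# Output wire -> input wire, one entry per signalling(x, y) in the scheduler.
WIRES = {
    "WDr0": "Ir", "Oc": "WDa", "WDr": "I",
    "O1": "Ic2", "O2": "In2", "O1'": "IandN2",
    "O2'": "In1", "Oand": "Ic1", "Ia": "Iand1",
    "Or": "Sr", "Sa": "Oa", "RDr": "I'",
    "Out": "RDa",
}

def force(output_event):
    """Return the input event the scheduler forces for an output event,
    e.g. "WDr.up" -> "I.up"."""
    wire, edge = output_event.rsplit(".", 1)
    return f"{WIRES[wire]}.{edge}"

def consistent(trace):
    """True if every (output, input) pair in the trace matches the wiring."""
    return all(force(out) == inp for out, inp in trace)

# The paired events of the deadlock trace found for the FV test system.
pairs = [("WDr.up", "I.up"), ("O2.up", "In2.up"), ("WDr0.up", "Ir.up"),
         ("Or.up", "Sr.up"), ("RDr.up", "I'.up"), ("O2'.up", "In1.up"),
         ("Out.up", "RDa.up"), ("Sa.up", "Oa.up")]
print(consistent(pairs))
```

Because no element shares an event with another, every interaction in the
circuit passes through exactly one entry of this table.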
Putting all the elements in parallel with the scheduler and the protocol
protocolFV , we ﬁnally obtain our testing system below.
\begin{align*}
test\_system &= scheduler \parallel protocolFV \parallel protocolC \parallel fork0 \parallel readport \\
&\quad \parallel fork0' \parallel andN \parallel protocolT
\end{align*}
Checking the test system with FDR, we ﬁnd it deadlocks. One of the
deadlock traces is:
⟨(WDr.up, I.up), (O2.up, In2.up), (WDr0.up, Ir.up), (Or.up, Sr.up),
(RDr.up, I′.up), (O2′.up, In1.up), (Out.up, RDa.up), (Sa.up, Oa.up), Ia.up⟩
However, by adding an isochronic fork constraint to fork ′, the arrival of
RDr .up on ANDn will overtake Ia.up, and so block the Oand .up in advance.
This is veriﬁed by FDR. Actually, with another minor constraint on timing,
we prove with FDR that the above implementation is correct 10 .
8 Conclusion and future work
We have proposed a hierarchical framework for an integrated approach to the
design, simulation and verification of asynchronous hardware in the Balsa
system. The main advantage of our approach is that it naturally exploits
the different levels of abstraction used by circuit designers to manage
complexity in order to divide and reduce verification problems. Bringing all
three levels of abstraction into the unified formalism of CSP gives us the
opportunity to connect them semantically, and to use the mature CSP model
checker FDR as the back-end tool for verification to prove or disprove
important asynchronous circuit properties such as deadlock, delay
insensitivity, equivalence and refinement. We have demonstrated the
feasibility of our approach by translating and verifying a component of an
asynchronous processor, discovering a genuine, previously unknown bug in the
FalseVariable circuit design caused by delay sensitivity.
9 For this example, it is an up transition or a down transition.
10 Details and full scripts in CSPm can be found at [23].
Compared with related work, our scheduler approach at the lower levels can
check asynchronous circuits not only for safety conditions as in [5] but also
for progress conditions [8]. The merit of introducing a scheduler explicitly
is that it enables us to use standard CSP theory, rather than specialised
asynchronous theories [5,12], which makes ‘asynchrony’ much easier to
understand and verify. The formal semantic link to the asynchronous theories,
and a rigorous comparison with them, are still under investigation.
Certainly, more work needs to be done to fully realize our approach. So far,
we have developed CSP specifications (or, more accurately, protocols)
for all 35 major handshake components in the Balsa system [20]. Based on
these specifications we can, on the one hand, verify their implementation at
the gate level, as illustrated in Section 7, and, on the other hand, verify
the compilation function that translates Balsa programs into handshake
component networks. Our research partners have previously experienced some
incompatibility problems when composing handshake components into handshake
networks. At the same time, we are working on a larger case study, similar to
the one in [2], based on an asynchronous MIPS processor core design
done in collaboration with the Manchester AMULET group [21]. Other future
work includes automating the translation at the different levels and
completing our theory of modelling asynchronous hardware in standard CSP.
Acknowledgement
We would like to thank Doug Edwards, Andrew Bardsley and Luis Plana for
their invaluable advice and help. The research is funded by EPSRC projects
GR/S11091/01 & GR/S11084/01 [22].
References
[1] K. van Berkel. Handshake circuits - an Asynchronous Architecture for VLSI Programming.
Cambridge University Press, 1993.
[2] G. Birtwistle. Control state in asynchronous micropipelines. In AINT 2000, pages 45-55, TU
Delft, 2000.
[3] E. Brunvand and M. Starkey. An Integrated Environment for the Design and Simulation of
Self Timed Systems. In IFIP VLSI’91, pages 137-146, North-Holland, 1991.
[4] T. A. Chu. Synthesis of Self-Timed VLSI Circuits from Graph-Theoretic Specifications. PhD
thesis, MIT, 1987.
[5] D. L. Dill. Trace Theory for Automatic Hierarchical Verification of Speed-Independent Circuits.
ACM Distinguished Dissertations. MIT Press, 1993.
[6] D. Edwards and A. Bardsley. Balsa: An Asynchronous Hardware System. In J. Sparsø and
S. Furber, editors, Principles of Asynchronous Circuit Design, Part II, Dec 2001.
[7] D. Edwards and A. Bardsley. Balsa: An Asynchronous Hardware Synthesis Language. The
Computer Journal, vol 45, no 1, pages 12-18, Jan 2002.
[8] J. C. Ebergen. Translating Programs into Delay-Insensitive Circuits. Dissertation, Eindhoven
University of Technology, Department of Computing Science. October 1987.
[9] S. Hassoun and D. Marculescu. Towards GALS Design Methodologies. In FMGALS’03, pages
1-10, Italy, Sep. 2003.
[10] C. A. R. Hoare. Communicating Sequential Processes. Communications of the ACM,
21(8):666-677, 1978.
[11] H. Hulgaard and S. M. Burns. Bounded Delay Timing Analysis of a Class of CSP Programs
with Choice. In ASYNC’94, IEEE Computer Society Press, 1994.
[12] M. B. Josephs and J. T. Udding. An algebra for delay-insensitive circuits. In CAV’90, LNCS
531, pages 343-352. Springer-Verlag, 1990.
[13] R. Lazić. A Semantic Study of Data Independence with Applications to Model Checking. PhD
thesis, Oxford University Computing Laboratory, 1999.
[14] A. J. Martin. Synthesis of Asynchronous VLSI Circuits. In J. Staunstrup, editor, Formal
Methods for VLSI Design, North-Holland, 1990.
[15] A. Roscoe. The Theory and Practice of Concurrency. Prentice-Hall, 1998.
[16] A. Roscoe. Compiling shared variable programs into CSP. Proceedings of PROGRESS 2001
Workshop. http://web.comlab.ox.ac.uk/oucl/research/areas/concurrency, 2000.
[17] J. Sparsø and S. Furber. Principles of Asynchronous Circuit Design: A Systems Perspective.
Kluwer Academic Publishers, 2001.
[18] G. Theodoropoulos and J. V. Woods. Occam: An Asynchronous Hardware Description
Language? In IEEE Euromicro’97, pages 529-534, IEEE Computer Society Press, 1997.
[19] G. Theodoropoulos and Q. Zhang. A Distributed Colouring Algorithm for Control Hazards in
Asynchronous Pipelines. Proceedings of I-SPAN ’04, pages 266-272, IEEE Computer Society
Press, 2004.
[20] X. Wang. A CSP translation of Balsa handshake components. Tech. Report (CSR-04-11),
School of Computer Science, University of Birmingham, 2004.
[21] Q. Zhang and G. Theodoropoulos. Towards an Asynchronous MIPS R3000 Processor.
Proceedings of ACSAC’03, LNCS 2823, pages 137-150, Springer, 2003.
[22] An Integrated Framework for Distributed Simulation and Formal Verification of Asynchronous
Hardware. http://www.cs.bham.ac.uk/research/parlard.
[23] Balsa Verification Examples Page. http://www.cs.bham.ac.uk/research/parlard/examples.
