Partitioning program into hardware and software by Qin, Shengchao & He, Jifeng
Partitioning Program into Hardware and Software
Shengchao Qin
Department of Informatics
School of Mathematical Sciences
Peking University, Beijing, 100871
qinshc@pubms.pku.edu.cn
Jifeng He†
International Institute for Software Technology
The United Nations University
P.O.Box 3058, Macau
jifeng@iist.unu.edu
Abstract
Hardware/software co-design is a design technique that delivers
computer systems comprising hardware and software components. A
critical phase of the co-design process is to decompose a program into
hardware and software. This paper proposes an algebraic partitioning
method whose correctness is verified in the algebra of programs. We
introduce a program analysis phase before program partitioning and
develop a collection of syntax-based splitting rules: the former
provides the information for moving operations from software to
hardware and reducing the interaction between components, and the
latter supports a compositional approach to program partitioning.
1. Introduction
The design of a complex software product like a nuclear reactor
control system is ideally decomposed into a progression of related
phases. It starts with an investigation of the properties and
behaviours of the process evolving within its environment, and an
analysis of the requirements for its safety performance. From these is
derived a specification of the electronic or program-centered
components of the system. The project may then go through a series of
design phases, ending in a program expressed in a high-level language.
After translation into the machine code of the chosen computer, it is
executed at high speed by electronic circuitry. In order to achieve
the time performance required by the customer, additional
application-specific hardware devices may be needed to embed the
computer into the system which it controls.
With chip size reaching one million transistors, the complexity of
VLSI algorithms is approaching that of software algorithms. However,
the design methods for circuits resemble low-level machine language
programming methods. Selecting individual gates and registers in a
circuit is like selecting individual machine instructions in a
program. State transition diagrams are like flowcharts. These methods
may have been adequate for small circuit design when they were
introduced, but they are not adequate for circuits that perform
complicated algorithms. Industry interest in the formal verification
of embedded systems is gaining ground since an error in a widely used
hardware device can have significant repercussions on the stock value
of the company concerned. In principle, proof of correctness of a
digital device can always be achieved by comparing the behavioral
description of the circuit with its specification. But for a large
system this would be impossibly laborious. What we need is a useful
collection of proven equations and other theorems, which can be used
to calculate, manipulate and transform the specification formulae to
the product.
Partially supported by NNSFC No. 69873003
† On leave from East China Normal University
Hardware/software co-design is a design technique which delivers
computer systems comprising hardware and software components. A
critical phase of the co-design process is to partition a program into
hardware and software. This paper proposes a partitioning method whose
correctness is verified using the algebraic laws developed for the
high-level programming language. To meet performance goals and reduce
the communication between components, our approach combines program
analysis techniques with syntax-based splitting rules to move
heavy-weight operations from software to hardware. The allocation of
variables is also based on the data flow analysis of the source
program. One of the advantages of our method is the integration of the
splitting phase with the joining phase of the partitioning process. It
optimizes the underlying target architecture, and facilitates the
reuse of hardware devices.
The algebraic approach advocated in this paper to verify the
correctness of the partitioning process has been successfully employed
in the ProCoS project on “Provably Correct Systems”. The original
ProCoS project [6] concentrated almost exclusively on the verification
of a standard compiler
Proceedings of the Eighth Asia-Pacific Software Engineering Conference (APSEC01) 
1530-1362/01 $17.00 © 2001 IEEE 
of a high-level programming language based on Occam down to a
microprocessor based on the Transputer [5]. Sampaio showed how to
reduce the compiler design task to one of program transformation; his
formal framework is also a procedural language and its algebraic laws
[14]. Towards the end of the first phase of the project, Ian Page et
al. made rapid advances in the development of hardware compilation
techniques using an Occam-like language targeted towards Field
Programmable Gate Arrays [11], and He Jifeng et al. provided a formal
verification of the hardware compilation scheme within the algebra of
Occam programs [4].
Recently, several works have suggested the use of formal methods for
the partitioning process [1, 2, 15]. Balboni et al. adopt Occam as an
internal model for the system exploration and partitioning strategy.
Cheung pursues structural transformation and verification within the
functional programming framework. However, neither provides a formal
proof of the correctness of the partitioning process. In [15], Silva
et al. provide a formal strategy for carrying out the splitting phase
automatically, and present an algebraic proof of its correctness.
However, the splitting phase delivers a large number of simple
processes, and leaves the hard task of clustering these processes into
hardware and software components to the clustering phase and the
joining phase. Furthermore, the additional channels and local
variables introduced in the splitting phase to accommodate a huge
number of parallel processes actually increase the data flow between
the hardware and software components.
The remainder of this paper is organized as follows. Section 2
describes the splitting strategy. Section 3 introduces the programming
language we adopt and explores its algebraic laws. Section 4 presents
the static analysis that we perform on the source program. Section 5
investigates the underlying target architecture of hardware/software
components. Section 6 provides the syntax-based hardware/software
splitting rules in both bottom-up and top-down styles.
2. Splitting Strategy
This section describes our partitioning strategy. A sequential source
program in a communication language is generated from the customer’s
requirements. A static analysis [10] is performed on the source
program to provide the programmer with statistical data, such as the
structural complexity of expressions, their occurrence frequencies,
and distributive information about the variables occurring in
expressions. Based on the result of the analysis, the programmer marks
those parts of the program that are worth implementing in hardware and
leaves the others to software, and also divides the interface of the
program into two disjoint parts.
The implementation-oriented program marking and interface (variable)
partitioning follow these guidelines:
• For security or other special reasons, some specific blocks are
  predetermined to be implemented in hardware or in software.
• In general, procedures that are frequently invoked and blocks that
  occur frequently should be marked for implementation in hardware, to
  gain high performance.
• Procedures/blocks involving very complicated computation (e.g.,
  containing intricate expressions) should be marked and implemented
  in hardware, to improve timing performance.
• Busy variables should be allocated to hardware, to make high-speed
  access available, whereas the remaining variables and large-scale
  data structures, such as large arrays, should be left to software,
  to achieve lower costs.
• The number of interactions between software and hardware should be
  minimized since they incur high costs.
• In addition, the customer’s demands concerning performance and cost
  should also be taken into account.
We take such a marked source program as input of our
hardware/software splitting algorithm that generates as out-
put a program comprising two concurrent processes repre-
senting software and hardware components respectively.
3. Preliminaries
The language we select to perform hardware/software
partitioning is a subset of Occam which was designed for
constructing communicating systems.
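1. Sequential Process:
\[
\begin{array}{rcll}
S & ::= & PC & \text{(primitive command)} \\
  & \mid & S ; S & \text{(sequential composition)} \\
  & \mid & \mathbf{if}\ b\ S\ \mathbf{else}\ S & \text{(conditional)} \\
  & \mid & S \sqcap S & \text{(non-deterministic choice)} \\
  & \mid & b * S & \text{(iteration)} \\
  & \mid & (g \rightarrow S)\ []\ (g \rightarrow S) & \text{(guarded choice)} \\
  & \mid & declaration \bullet S & \text{(local declaration)}
\end{array}
\]
where
\[
\begin{array}{rcll}
PC & ::= & (x := e) \mid \mathbf{skip} \mid \mathbf{chaos} \mid c\,!\,e \mid c\,?\,x & \\
   & \mid & procn(e, v) & \text{(procedure invocation)} \\
   & \mid & \langle S \rangle & \text{(annotated block)}
\end{array}
\]
and $g$ is $\mathbf{skip}$ or a communication event $c\,!\,e$ or $d\,?\,x$.
2. Parallel Program:
\[
P ::= S \mid P \parallel P \mid declaration \bullet P
\]
where
\[
\begin{array}{rcl}
declaration & ::= & var\ dec \mid chan\ dec \mid proc\ dec \\
var\ dec & ::= & \mathbf{var}\ v : type(v) \\
chan\ dec & ::= & \mathbf{chan}\ c : type(c) \\
proc\ dec & ::= & \mathbf{procedure}\ procn(\mathbf{in}\ u : type(u);\ \mathbf{out}\ v : type(v))\ \mathbf{begin}\ S\ \mathbf{end}
\end{array}
\]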
In the later discussion, we use $Var(P)$ and $Chan(P)$ to denote the
sets of variables and channels employed by $P$. Moreover, we omit the
type information of a variable in a declaration when it is obvious.
As a subset of Occam, the language enjoys a rich set of algebraic laws
presented in [13, 3, 7, 9, 8]. Here we only explore those algebraic
laws which will be employed within the proofs in the following
sections.
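Successive assignments to the same variable can be combined into one
assignment.
(L1) $x := e;\ x := f \;=\; x := f[e/x]$
Sequential composition is associative, and has left zero
$\mathbf{chaos}$ and unit $\mathbf{skip}$. It distributes backward
over internal and external choices and the conditional.
(L2) $(P; Q); R = P; (Q; R)$
(L3) $\mathbf{chaos}; P = \mathbf{chaos}$
(L4) $\mathbf{skip}; P = P; \mathbf{skip} = P$
(L5) $(P \sqcap Q); R = (P; R) \sqcap (Q; R)$
(L6) $((g \rightarrow P)\ []\ (h \rightarrow Q)); R = (g \rightarrow (P; R))\ []\ (h \rightarrow (Q; R))$
(L7) $(\mathbf{if}\ b\ P\ \mathbf{else}\ Q); R = \mathbf{if}\ b\ (P; R)\ \mathbf{else}\ (Q; R)$
Assignment distributes forward over the conditional.
(L8) $v := e;\ (\mathbf{if}\ b(v)\ P\ \mathbf{else}\ Q) = \mathbf{if}\ b(e)\ (v := e; P)\ \mathbf{else}\ (v := e; Q)$
The input and output events can be renamed as follows.
(L9) $c\,?\,x = \mathbf{var}\ lx \bullet (c\,?\,lx;\ x := lx)$
(L10) $c\,!\,e = \mathbf{var}\ lx \bullet (lx := e;\ c\,!\,lx)$
Iteration is subject to the fixed point theorem.
(L11) $b * P = \mathbf{if}\ b\ (P;\ b * P)\ \mathbf{else}\ \mathbf{skip}$
The parallel operator is symmetric and associative, and has
$\mathbf{chaos}$ as zero.
(L12) $P \parallel Q = Q \parallel P$
(L13) $P \parallel (Q \parallel R) = (P \parallel Q) \parallel R$
(L14) $\mathbf{chaos} \parallel P = \mathbf{chaos}$
The parallel operator also distributes over the conditional, and it is
disjunctive.
(L15) $(\mathbf{if}\ b\ P\ \mathbf{else}\ Q) \parallel R = \mathbf{if}\ b\ (P \parallel R)\ \mathbf{else}\ (Q \parallel R)$,
provided $Var(b) \cap Var(R) = \emptyset$.
(L16) $(P \sqcap Q) \parallel R = (P \parallel R) \sqcap (Q \parallel R)$
Local variable declaration enjoys the following laws.
(L17) $\mathbf{var}\ x \bullet (x := e) = \mathbf{skip}$, provided $x$ does not occur in $e$.
(L18) $\mathbf{var}\ x \bullet (\mathbf{if}\ b\ P\ \mathbf{else}\ Q) = \mathbf{if}\ b\ (\mathbf{var}\ x \bullet P)\ \mathbf{else}\ (\mathbf{var}\ x \bullet Q)$,
provided $x$ is not free in $b$.
(L19) $\mathbf{var}\ x \bullet (P; Q) = P;\ \mathbf{var}\ x \bullet Q$, provided $x$ is not free in $P$.
(L20) $\mathbf{var}\ x \bullet (P; Q) = (\mathbf{var}\ x \bullet P);\ Q$, provided $x$ is not free in $Q$.
The following law deals with assignment expansion.
(L21) $(x := e;\ S) \parallel T = x := e;\ (S \parallel T)$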
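As an informal sanity check of law (L1), the following minimal sketch evaluates both sides on a toy state. The tuple encoding of expressions and the helper names `evaluate`, `subst` and `assign` are assumptions made for this illustration only; they are not part of the paper's formalism, and a test on one state is of course no substitute for the algebraic proof.

```python
# Expressions: an int constant, a variable name (str), or (op, left, right)
# with op in {"+", "*"}. This encoding is hypothetical.

def evaluate(expr, state):
    """Evaluate an expression tree in a state (dict: variable -> int)."""
    if isinstance(expr, int):
        return expr
    if isinstance(expr, str):
        return state[expr]
    op, left, right = expr
    a, b = evaluate(left, state), evaluate(right, state)
    return a + b if op == "+" else a * b

def subst(f, e, x):
    """Return f[e/x]: replace every occurrence of variable x in f by e."""
    if isinstance(f, int):
        return f
    if isinstance(f, str):
        return e if f == x else f
    op, left, right = f
    return (op, subst(left, e, x), subst(right, e, x))

def assign(x, expr, state):
    """Execute x := expr and return the updated state."""
    new_state = dict(state)
    new_state[x] = evaluate(expr, state)
    return new_state

e = ("+", "x", 1)       # x + 1
f = ("*", "x", "y")     # x * y
s0 = {"x": 3, "y": 5}

lhs = assign("x", f, assign("x", e, s0))   # x := e; x := f
rhs = assign("x", subst(f, e, "x"), s0)    # x := f[e/x]
print(lhs == rhs)
```

Both sides end in the same state here, which is what (L1) predicts for any choice of `e`, `f` and initial state.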
The following law is one of the general expansion laws of Occam [13],
which deals with the case where two parallel processes are guarded
choice constructs.
(L22) Let $P = []_{i=1}^{n} (g_i \rightarrow P_i)$ and
$Q = []_{j=1}^{m} (h_j \rightarrow Q_j)$, where each $g_i$, $h_j$ has
one of the forms $c\,!\,e$, $c\,?\,x$ or $\mathbf{skip}$. Then
\[
P \parallel Q = []_{r=1}^{N} (k_r \rightarrow R_r),
\]
where the pairs $\langle k_r, R_r \rangle$ are precisely all
possibilities from the following:
(i) $R_r = P_i \parallel Q$ and ($k_r = g_i = \mathbf{skip}$ or
$k_r = g_i = c\,!\,e$ or $k_r = g_i = c\,?\,x$), where
$c \notin Chan(Q)$;
(ii) $R_r = P \parallel Q_j$ and ($k_r = h_j = \mathbf{skip}$ or
$k_r = h_j = c\,!\,e$ or $k_r = h_j = c\,?\,x$), where
$c \notin Chan(P)$;
(iii) $R_r = x := e;\ (P_i \parallel Q_j)$ and $k_r = \mathbf{skip}$
and (($g_i = c\,!\,e$ and $h_j = c\,?\,x$) or ($g_i = c\,?\,x$ and
$h_j = c\,!\,e$)).
Corollary 3.1
(C1) $(c\,!\,e;\ P) \parallel (c\,?\,x;\ Q) = x := e;\ (P \parallel Q)$
(C2) Let $P = (c\,!\,e;\ P_1)$ and $Q = (Q_1;\ Q_2)$, where
$c \in Chan(Q)$, but no channel in $Chan(P) \cap Chan(Q)$ occurs in
$Q_1$. Then $P \parallel Q = Q_1;\ (P \parallel Q_2)$.
(C3) Let $P = (S_1;\ c\,?\,x;\ S_2)$ and $Q = (T_1;\ c\,!\,e;\ T_2)$,
where neither $S_1$ nor $T_1$ mentions channels in
$Chan(P) \cap Chan(Q)$. Then
\[
P \parallel Q = (S_1 \parallel T_1);\ ((c\,?\,x;\ S_2) \parallel (c\,!\,e;\ T_2)).
\]
The proof is presented in [12].
We derive two further algebraic laws from the basic ones.
The test of a conditional should be evaluated first.
(DL1) $\mathbf{if}\ b\ P\ \mathbf{else}\ Q = \mathbf{var}\ lb \bullet (lb := b;\ \mathbf{if}\ lb\ P\ \mathbf{else}\ Q)$
Proof
$RHS$    {(L8)}
$= \mathbf{var}\ lb \bullet (\mathbf{if}\ b\ (lb := b; P)\ \mathbf{else}\ (lb := b; Q))$    {(L18)}
$= \mathbf{if}\ b\ (\mathbf{var}\ lb \bullet (lb := b; P))\ \mathbf{else}\ (\mathbf{var}\ lb \bullet (lb := b; Q))$    {(L20)}
$= \mathbf{if}\ b\ (\mathbf{var}\ lb \bullet (lb := b);\ P)\ \mathbf{else}\ (\mathbf{var}\ lb \bullet (lb := b);\ Q)$    {(L17)}
$= LHS$  □
The condition of an iteration is evaluated at the beginning of every
loop.
(DL2) $b * P = \mathbf{var}\ lb \bullet (lb := b;\ lb * (P;\ lb := b))$
The proof is omitted here because of the page limit. It can be found
in [12].
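The intent of (DL2) can be illustrated operationally. The sketch below runs a loop once in the direct form `b * P` and once in the rewritten form with a local flag `lb`, and compares the final states. Modelling `b` and `P` as Python functions on a dictionary state is an assumption of this illustration; the names `loop_direct` and `loop_with_flag` are hypothetical.

```python
def loop_direct(b, body, state):
    """b * P: test b on the current state before every iteration."""
    while b(state):
        state = body(state)
    return state

def loop_with_flag(b, body, state):
    """var lb . (lb := b; lb * (P; lb := b)): test a local flag instead."""
    lb = b(state)              # lb := b
    while lb:                  # lb * ( ... )
        state = body(state)    # P
        lb = b(state)          # lb := b
    return state

def cond(s):
    return s["n"] > 0

def step(s):
    return {"n": s["n"] - 1, "acc": s["acc"] + s["n"]}

start = {"n": 4, "acc": 0}
print(loop_direct(cond, step, start) == loop_with_flag(cond, step, start))
```

The flag form is what makes the later splitting rules applicable when the loop condition mentions variables held by the hardware component: only the assignment `lb := b` then needs to be split.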
Before further discussion, we introduce an ordering relation between
programs.
Definition 3.3 (Refinement)
Given programs $P$, $Q$, we say $Q$ is a refinement of $P$, denoted
$P \sqsubseteq Q$, if $P \sqcap Q = P$ is algebraically provable.
4. The Static Analysis
This section describes the static analysis of the source program,
which gives the programmer the information needed for appropriate
implementation-oriented program marking and interface partitioning of
the source program, with the aim of gaining higher performance and
achieving lower cost.
The static analysis comprises two parts: the subprogram/expression
analysis and the variable analysis. The output of the
subprogram/expression analysis consists of three kinds of information,
which will be presented in three tables, respectively.
• Structural complexity of non-trivial expressions in the program and
  numbers of their occurrences
• Numbers of invocations of procedures, the complexity of their
  parameters and their structures
• Complexity of those implementation-undetermined blocks
The complexity of expressions is specified by the function $complex$
as follows.
Definition 4.1 Let $EXP$ be the set of expressions occurring in the
source program. The function $complex : EXP \rightarrow \mathbb{N}$ is
inductively defined on the structure of expressions:
\[
\begin{array}{rcl}
complex(v) & =_{df} & w(type(v)), \text{ for any variable } v; \\
complex(c) & =_{df} & 1, \text{ for any constant } c; \text{ and} \\
complex(op(e_1, \ldots, e_n)) & =_{df} & \sum_{i=1}^{n} complex(e_i) + complex(op),
\end{array}
\]
where $op$ is any operator used to construct expressions in the source
language, and $complex(op)$ is defined by the programmer in accordance
with the complexity of $op$. The function
$w : TYPE \rightarrow \mathbb{N}$ associates a number with each type
of variables and channels in the program to measure their complexity.
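Definition 4.1 translates almost directly into a recursive function. The following is a minimal sketch, assuming expressions are encoded as nested Python tuples and using the weights that Example 4.3 gives for `int`, `Bool`, `+`, `<`, conjunction and negation; the encoding and the names `complex_`, `W` and `OP` are hypothetical.

```python
# Weights taken from Example 4.3; the tuple encoding of expressions is an
# assumption of this sketch.
W = {"int": 8, "Bool": 1}                     # w : TYPE -> N
OP = {"+": 10, "<": 10, "and": 5, "not": 5}   # complex(op)

def complex_(expr, types):
    """Structural complexity of an expression (Definition 4.1)."""
    if isinstance(expr, (bool, int)):         # constant
        return 1
    if isinstance(expr, str):                 # variable
        return W[types[expr]]
    op, *args = expr                          # op(e1, ..., en)
    return sum(complex_(a, types) for a in args) + OP[op]

types = {"RP": "int", "ok.HPP": "Bool", "ok.HPM": "Bool", "ok.LPM": "Bool"}

rp_plus_1 = complex_(("+", "RP", 1), types)
row3 = complex_(
    ("and", ("and", "ok.HPP", ("not", "ok.HPM")), "ok.LPM"), types)
row4 = complex_(
    ("and", ("and", "ok.HPP", ("not", "ok.HPM")), ("not", "ok.LPM")), types)
print(rp_plus_1, row3, row4)
```

With conjunctions treated as nested binary operators, the three values agree with the entries 19, 18 and 23 listed for the corresponding expressions in Example 4.3.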
By scanning the program, we obtain the occurrence frequency of
expressions, which can be regarded as another criterion for the
busyness of expressions.
By scanning the program, we also obtain the number of invocations of
procedures. By analysing the declarations of those procedures in the
program, we get the complexity of their parameters. Suppose
$v_1 : T_1, \ldots, v_k : T_k$ is the list of parameters of some
procedure; then the complexity of its parameters is
$\sum_{i=1}^{k} w(T_i)$.
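As a small worked instance of the parameter-complexity sum, the sketch below applies it to the parameter lists of the procedures GCRA and UPT from the appendix, using the type weights of Example 4.3. The function name `param_complexity` and the list encoding are assumptions of this illustration.

```python
W = {"int": 8, "Bool": 1}   # type weights from Example 4.3

def param_complexity(param_types):
    """Sum of w(T_i) over the declared parameter types of a procedure."""
    return sum(W[t] for t in param_types)

# GCRA(in X, LCT, at, I, L: int, out ok: Bool, nLCT, nX: int)
GCRA_PARAMS = ["int"] * 5 + ["Bool", "int", "int"]
# UPT(in tt: Bool, p, m, r, a, cl: int,
#     out send: Bool, nRP, nRM, nRC, nAC, nCLP: int)
UPT_PARAMS = ["Bool"] + ["int"] * 5 + ["Bool"] + ["int"] * 5

print(param_complexity(GCRA_PARAMS), param_complexity(UPT_PARAMS))
```

The two sums match the values 57 and 82 reported in the procedure table of Example 4.3.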
It is also possible to define the complexity of procedures
or blocks that do not contain iterations. If the number of
loops can be predicted or estimated, the complexity of
those which contain iterations can also be calculated.
Definition 4.2 The complexity of subprograms (procedures/blocks) can
be evaluated as follows.
\[
\begin{array}{rcl}
com(v := e) & =_{df} & w(type(v)) + w(:=) + complex(e); \\
com(c\,!\,e) & =_{df} & w(type(c)) + complex(e); \\
com(c\,?\,x) & =_{df} & w(type(c)) + complex(x); \\
com(S_1; S_2) & =_{df} & com(S_1) + com(S_2); \\
com(\mathbf{if}\ b\ S_1\ \mathbf{else}\ S_2) & =_{df} & com(b) + \max(com(S_1), com(S_2)); \\
com((g_1; S_1)\ []\ (g_2; S_2)) & =_{df} & \max(com(g_1; S_1), com(g_2; S_2)); \\
com(\mathbf{while}\ b\ S) & =_{df} &
  \begin{cases}
    (com(b) + com(S)) \times maxN, & \text{if the number of loops is } maxN; \\
    \infty, & \text{otherwise,}
  \end{cases}
\end{array}
\]
where $w(:=)$ is defined by the programmer.
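Definition 4.2 can be sketched in the same style. The code below is a minimal illustration that covers assignment, sequencing, conditional and bounded iteration; it assumes a tuple encoding of statements, reads `com(b)` for a condition `b` as `complex(b)`, and uses the weights of Example 4.3. All names (`com`, `complex_`, `W_ASSIGN`) are hypothetical.

```python
import math

W = {"int": 8, "Bool": 1}   # type weights from Example 4.3
W_ASSIGN = 2                # w(:=) from Example 4.3
OP = {"+": 10, "<": 10}     # operator weights from Example 4.3

def complex_(expr, types):
    """Expression complexity as in Definition 4.1."""
    if isinstance(expr, (bool, int)):
        return 1
    if isinstance(expr, str):
        return W[types[expr]]
    op, *args = expr
    return sum(complex_(a, types) for a in args) + OP[op]

def com(stmt, types):
    """Subprogram complexity; com(b) is read as complex(b) (assumption)."""
    kind = stmt[0]
    if kind == "assign":                       # v := e
        _, v, e = stmt
        return W[types[v]] + W_ASSIGN + complex_(e, types)
    if kind == "seq":                          # S1; S2; ...
        return sum(com(s, types) for s in stmt[1:])
    if kind == "if":                           # if b S1 else S2
        _, b, s1, s2 = stmt
        return complex_(b, types) + max(com(s1, types), com(s2, types))
    if kind == "while":                        # while b S, max_n loops
        _, b, s, max_n = stmt
        if max_n is None:                      # no bound known
            return math.inf
        return (complex_(b, types) + com(s, types)) * max_n
    raise ValueError(f"unsupported statement: {kind}")

# Body of procedure UPT from the appendix.
types = {"send": "Bool", "tt": "Bool"}
types.update({v: "int" for v in
              ["nRP", "nRM", "nRC", "nAC", "nCLP", "p", "m", "r", "a", "cl"]})
upt_body = ("seq",
            ("assign", "send", "tt"), ("assign", "nRP", "p"),
            ("assign", "nRM", "m"), ("assign", "nRC", "r"),
            ("assign", "nAC", "a"), ("assign", "nCLP", "cl"))
print(com(upt_body, types))
```

For the UPT body this yields the procedure complexity listed for UPT in Example 4.3; an iteration without a known bound is reported as infinite, following the last clause of the definition.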
Based on the three tables the analysis generates, the programmer can
identify those parts that should be implemented in hardware, in
accordance with the guidelines listed in Section 2.
The second step of the analysis provides the following information
about variables.
• the structural complexity of each variable
• the occurrence frequency of each variable
• the distributive information of each variable
We illustrate the analysis with an industrial example.
Example 4.3 The source program is concerned with the design of an ATM
switch. The code is given in the appendix.
Provided $complex(+) = complex(-) = complex(\le) = complex(<) = 10$,
$complex(\wedge) = complex(\neg) = 5$, $w(int) = 8$, $w(Bool) = 1$,
$w(:=) = 2$, the results of the analysis are listed below.

expression                          complex   num
RP + 1                              19        2
RC + 1                              19        5
AC + 1                              19        4
RM + 1                              19        3
1. ¬ok.LPP                          6         1
2. ok.HPP ∧ ok.HPM                  7         1
3. ok.HPP ∧ ¬ok.HPM ∧ ok.LPM        18        1
4. ok.HPP ∧ ¬ok.HPM ∧ ¬ok.LPM       23        1
5. ¬ok.HPP ∧ ok.LPP ∧ ok.LPM        18        1
6. ¬ok.HPP ∧ ¬ok.LPP                17        1
7. ¬ok.HPP ∧ ¬ok.LPP ∧ ¬ok.LPM      28        1

procedure   num   complex. of para.   complex. of proc.
GCRA        4     57                  141
UPT         9     82                  94

variable              num.   complex.   distribution
VPI,VCI               2      8          input
GFC,PT,HEC,Pl,QoS     4      8          input/output
aT                    5      8          input
X,L,LCT               5      32         GCRA, input
I                     4      32         GCRA, input
nX                    8      32         GCRA, output
nLCT                  7      32         GCRA, output
ok                    22     4          GCRA, 1,2,3,4,5,6,7
CLP                   6      4          UPT, input
nCLP                  12     4          UPT, input/output
RP,RM,RC,AC           10     8          UPT, input
nRP,nRM,nRC,nAC       9      8          UPT
send                  10     1          UPT
nVPI,nVCI,pN          6      24         input/output
The criterion for interface partitioning is that a variable should be
allocated to hardware if its structure is not complicated and it
occurs more often in the procedures/blocks assigned to hardware than
in those left to software.
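The allocation criterion can be written down as a simple predicate. The sketch below is only one possible reading: the threshold `max_complexity` that decides when a structure counts as "not complicated", the occurrence counts, and the function name `allocate` are all assumptions of this illustration and do not come from the paper.

```python
def allocate(stats, max_complexity):
    """Split variables into (hardware, software) sets.

    stats maps a variable name to a tuple
    (structural complexity, occurrences in hardware-marked blocks,
     occurrences in software blocks).
    """
    hardware, software = set(), set()
    for name, (cx, hw_uses, sw_uses) in stats.items():
        if cx <= max_complexity and hw_uses > sw_uses:
            hardware.add(name)
        else:
            software.add(name)
    return hardware, software

# Hypothetical occurrence counts, for illustration only.
stats = {
    "ok": (4, 20, 2),
    "nVPI": (24, 0, 6),
    "aT": (8, 4, 1),
    "send": (1, 1, 9),
}
hardware, software = allocate(stats, max_complexity=8)
print(sorted(hardware), sorted(software))
```

In practice the threshold would be chosen by the programmer together with the customer's cost and performance demands, as the guidelines of Section 2 suggest.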
5. The Hardware/software Target Architecture
This section describes the target architecture of our par-
titioning approach which confines hardware and software
components to specially chosen forms. To synchronize their
activities, we introduce a simple handshaking protocol to
streamline communications between them.
Suppose $B = \{r_j, a_j \mid j \in I\}$ is a set of channels,
we define CP (B) as the set of communicating processes C
with Chan(C)  B and one of the following forms.
(1). a communicating process which does not use any channel in $B$.
(2). $r_j\,!\,e;\ C;\ a_j\,?\,x$, where $C$ is a member of $CP(B)$ not
interacting via channels in $B$.
(3). $C_1; C_2$, or $C_1 \sqcap C_2$, or
$\mathbf{if}\ b\ C_1\ \mathbf{else}\ C_2$, or
$(g_1 \rightarrow C_1)\ []\ (g_2 \rightarrow C_2)$, where both $g_i$
and $C_i$ lie in $CP(B)$, for $i = 1, 2$.
(4). $b * C$, where $C$ is a member of $CP(B)$.
To simplify the interface design, we confine the interactions between
the hardware and software components to communications along the
channels in the set $B$. Our partitioning rules will select the
software components from the set $CP(B)$, and organise the hardware
component in the form
\[
D = \mu X \bullet \big( []_{j \in I}\, (r_j\,?\,x_j;\ M_j;\ a_j\,!\,y_j;\ X)\ []\ \mathbf{skip} \big)
\]
where none of the $M_j$ mentions channels in $B$. The communicating
process $D$ represents a digital device which offers a set of services
to its environment. Each service responds to a request from the
environment on an input channel $r_j$ by running the corresponding
program $M_j$ and then delivering the result on the output channel
$a_j$. The translation from such a hardware specification to netlists
will be tackled using hardware compilation techniques [11].
We denote by $H(B)$ the set of processes that have the same form as
$D$.
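The request/acknowledge protocol behind this normal form can be simulated informally. The sketch below models the device as a thread that repeatedly accepts a request, runs the selected service and acknowledges with the result. It rests on several simplifying assumptions: Python's buffered `queue.Queue` stands in for synchronous Occam channels, one shared inbox stands in for the external choice over the request channels, a `None` message stands in for the terminating skip branch, and the services `sq` and `inc` are invented for the example.

```python
import queue
import threading

def hardware(services, inbox, acks):
    """Loop: accept a request (r_j ? x_j), run M_j, acknowledge (a_j ! y_j)."""
    while True:
        msg = inbox.get()          # wait for a request on some r_j
        if msg is None:            # stands in for the skip branch
            return
        j, x = msg                 # r_j ? x_j
        y = services[j](x)         # M_j
        acks[j].put(y)             # a_j ! y_j, then recurse (loop again)

def call(j, e, inbox, acks):
    """Software side of form (2) with an empty C: r_j ! e; a_j ? x."""
    inbox.put((j, e))
    return acks[j].get()

# Hypothetical services offered by the device.
services = {"sq": lambda x: x * x, "inc": lambda x: x + 1}
inbox = queue.Queue()
acks = {j: queue.Queue() for j in services}

device = threading.Thread(target=hardware, args=(services, inbox, acks))
device.start()
results = [call("sq", 7, inbox, acks), call("inc", 41, inbox, acks)]
inbox.put(None)                    # let the device terminate
device.join()
print(results)
```

Because the software side always waits for the acknowledgement before issuing the next request, the device serves one request at a time, which mirrors the strict alternation of request and acknowledgement in the forms of $CP(B)$.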
Theorem 5.1 $(C_1; C_2) \parallel D = (C_1 \parallel D);\ (C_2 \parallel D)$,
for any $C_1$, $C_2$ in $CP(B)$.
Proof By structural induction on $C_1$.
(1). No channels in $B$ appear in $C_1$.
$LHS$    {Co3.1(C2)}
$= C_1;\ (C_2 \parallel D)$    {Co3.1(C2)}
$= RHS$
(2). $C_1 = r_j\,!\,e;\ C;\ a_j\,?\,x$, for some $r_j, a_j \in B$,
$C \in CP(B)$, and no channel in $B$ occurs in $C$.
$LHS$    {Co3.1(C1)}
$= x_j := e;\ ((C;\ a_j\,?\,x;\ C_2) \parallel (M_j;\ a_j\,!\,y_j;\ D))$    {Co3.1(C3)}
$= x_j := e;\ (C \parallel M_j);\ ((a_j\,?\,x;\ C_2) \parallel (a_j\,!\,y_j;\ D))$    {Co3.1(C1)}
$= x_j := e;\ (C \parallel M_j);\ x := y_j;\ (C_2 \parallel D)$    {Co3.1(C1, C3)}
$= RHS$
(3). $C_1 = C'_1; C'_2$, where $C'_1, C'_2 \in CP(B)$.
We know $C'_2; C_2 \in CP(B)$ from the definition of $CP(B)$. Then
$LHS$    {hypothesis}
$= (C'_1 \parallel D);\ ((C'_2; C_2) \parallel D)$    {hypothesis}
$= (C'_1 \parallel D);\ (C'_2 \parallel D);\ (C_2 \parallel D)$    {hypothesis}
$= RHS$
(4). $C_1$ is one of the cases: (i)
$\mathbf{if}\ b\ C'_1\ \mathbf{else}\ C'_2$, (ii) $C'_1 \sqcap C'_2$,
(iii) $(g_1 \rightarrow C'_1)\ []\ (g_2 \rightarrow C'_2)$, where
$C'_1, C'_2 \in CP(B)$. We demonstrate the first case here; the others
are similar ([12]).
$LHS$    {(L7)}
$= (\mathbf{if}\ b\ (C'_1; C_2)\ \mathbf{else}\ (C'_2; C_2)) \parallel D$    {(L15)}
$= \mathbf{if}\ b\ ((C'_1; C_2) \parallel D)\ \mathbf{else}\ ((C'_2; C_2) \parallel D)$    {hypothesis}
$= \mathbf{if}\ b\ ((C'_1 \parallel D); (C_2 \parallel D))\ \mathbf{else}\ ((C'_2 \parallel D); (C_2 \parallel D))$    {(L7)}
$= (\mathbf{if}\ b\ (C'_1 \parallel D)\ \mathbf{else}\ (C'_2 \parallel D));\ (C_2 \parallel D)$    {(L15)}
$= RHS$
(5). $C_1 = b * C'$, where $C' \in CP(B)$.
We define $F(X) =_{df} \mathbf{if}\ b\ (C'; X)\ \mathbf{else}\ \mathbf{skip}$,
and $\{F^n(\mathbf{chaos}),\ n \ge 0\}$ by
$F^0(\mathbf{chaos}) =_{df} \mathbf{chaos}$ and
$F^{n+1}(\mathbf{chaos}) =_{df} F(F^n(\mathbf{chaos}))$, for $n \ge 0$.
Then $C_1 = \mu X \bullet F(X) = \bigsqcup_{n \ge 0} F^n(\mathbf{chaos})$,
and $F^n(\mathbf{chaos}) \in CP(B)$, for $n \ge 0$.
$LHS$
$= ((\bigsqcup_{n \ge 0} F^n(\mathbf{chaos}));\ C_2) \parallel D$    {continuity of $\parallel$, ;}
$= \bigsqcup_{n \ge 0} ((F^n(\mathbf{chaos});\ C_2) \parallel D)$    {hypothesis}
$= \bigsqcup_{n \ge 0} ((F^n(\mathbf{chaos}) \parallel D);\ (C_2 \parallel D))$    {continuity of $\parallel$, ;}
$= RHS$  □
Corollary 5.2 If $C \in CP(B)$, then
$(b * C) \parallel D = b * (C \parallel D)$.
The proof is presented in [12].
6. Syntax-based Splitting Rules
This section discusses program splitting rules. First we
show how the static analysis affects the partition of prim-
itive commands into hardware and software components.
Secondly we demonstrate how to construct hardware and
software parts of a construct from those of its constituents.
We establish the correctness of those rules by using the al-
gebraic laws given in Section 3.
We introduce a predicate Split , which will be of great
help in formalising the decomposition rules.
Definition 6.1 (Split)
Let $B = \{r_j, a_j \mid j \in I\}$. Given a sequential process $S$,
its hardware/software partition $(C, D)$ is specified by the following
predicate:
\[
\begin{array}{rcl}
Split_B(S, C, D) & =_{df} & S \sqsubseteq (C \parallel D)\ \wedge\ C \in CP(B)\ \wedge\ D \in H(B)\ \wedge \\
 & & Var(C) \cap Var(D) = \emptyset\ \wedge\ Chan(C) \cap Chan(D) = B\ \wedge \\
 & & InputChan(C) \cap InputChan(D) = \emptyset\ \wedge \\
 & & OutputChan(C) \cap OutputChan(D) = \emptyset
\end{array}
\]
where $InputChan(C)$ is the set of channels employed by $C$ and only
used for input tasks; $OutputChan(C)$ is defined similarly.
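The set-theoretic conjuncts of the Split predicate are easy to check mechanically. The sketch below covers only those side conditions; it does not attempt to check the refinement or the membership of the components in $CP(B)$ and $H(B)$. Representing each component by explicit sets of variable and channel names is an assumption of the illustration, as is the function name `split_side_conditions`.

```python
def split_side_conditions(B, var_c, var_d, in_c, out_c, in_d, out_d):
    """Check the variable and channel conjuncts of Split_B(S, C, D).

    in_x / out_x are the channels used only for input / output by each
    component; refinement and membership in CP(B), H(B) are NOT checked.
    """
    chan_c = in_c | out_c
    chan_d = in_d | out_d
    return (var_c.isdisjoint(var_d)          # Var(C) and Var(D) disjoint
            and (chan_c & chan_d) == B       # shared channels are exactly B
            and in_c.isdisjoint(in_d)        # no channel read by both
            and out_c.isdisjoint(out_d))     # no channel written by both

B = {"r1", "a1"}
ok = split_side_conditions(
    B,
    var_c={"u"}, var_d={"v", "x", "y"},
    in_c={"a1"}, out_c={"r1"},               # software: r1 ! ...; a1 ? ...
    in_d={"r1"}, out_d={"a1"},               # hardware: r1 ? ...; a1 ! ...
)
shared_var = split_side_conditions(
    B,
    var_c={"u", "v"}, var_d={"v", "x", "y"},
    in_c={"a1"}, out_c={"r1"},
    in_d={"r1"}, out_d={"a1"},
)
print(ok, shared_var)
```

The second call fails because `v` would be held by both components, which the disjointness conjunct forbids.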
6.1. The Bottom-up Splitting Approach
The bottom-up approach builds the hardware component
from a program directly from the static analysis in one step,
i.e., the hardware device is to provide all the services fre-
quently used by the program. However, it constructs the
software component from those of its constituents using the
following rules.
Bottom-up Rule for Sequential Composition
\[
\frac{\begin{array}{c}
Split_B(S_i, C_i, D),\ i = 1, 2 \\
Var(S_1) = Var(S_2),\quad Chan(C_1) = Chan(C_2)
\end{array}}
{Split_B(S_1; S_2,\ C_1; C_2,\ D)}
\]
Proof
$S_1; S_2$    {; is monotonic}
$\sqsubseteq (C_1 \parallel D);\ (C_2 \parallel D)$    {Th. 5.1}
$= (C_1; C_2) \parallel D$  □
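Bottom-up Rule for Conditional
\[
\frac{\begin{array}{c}
Split_B(S_i, C_i, D),\ i = 1, 2 \\
Var(S_1) = Var(S_2),\quad Chan(C_1) = Chan(C_2) \\
Var(b) \subseteq Var(C_1)
\end{array}}
{Split_B(\mathbf{if}\ b\ S_1\ \mathbf{else}\ S_2,\ \mathbf{if}\ b\ C_1\ \mathbf{else}\ C_2,\ D)}
\]
Proof
$\mathbf{if}\ b\ S_1\ \mathbf{else}\ S_2$    {cond is monotonic}
$\sqsubseteq \mathbf{if}\ b\ (C_1 \parallel D)\ \mathbf{else}\ (C_2 \parallel D)$    {(L15)}
$= (\mathbf{if}\ b\ C_1\ \mathbf{else}\ C_2) \parallel D$  □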
Bottom-up Rule for Iteration
\[
\frac{\begin{array}{c}
Split_B(S, C, D) \\
Var(b) \subseteq Var(C)
\end{array}}
{Split_B(b * S,\ b * C,\ D)}
\]
When $Var(b) \cap Var(D) \neq \emptyset$, we introduce a local
variable $lb$, and rewrite the conditional and the iteration into the
forms
$\mathbf{var}\ lb \bullet (lb := b;\ \mathbf{if}\ lb\ S_1\ \mathbf{else}\ S_2)$ and
$\mathbf{var}\ lb \bullet (lb := b;\ lb * (S;\ lb := b))$
respectively, by laws DL1 and DL2. The partitioning rule for
$lb := b$ will be discussed later.
The non-deterministic choice can be regarded as a spe-
cial case of guarded choice when all the guards are skip . We
present the partitioning rule for guarded choice constructs
as follows and omit the rule for non-deterministic choice.
Bottom-up Rule for Guarded Choice
\[
\frac{\begin{array}{c}
Split_B(S_i, C_i, D),\ i = 1, 2 \\
Var(S_1) = Var(S_2),\quad Chan(C_1) = Chan(C_2) \\
Var(g_i) \subseteq Var(C_1),\ i = 1, 2 \\
Chan(g_i) \subseteq Chan(C_1),\ i = 1, 2
\end{array}}
{Split_B((g_1 \rightarrow S_1)\ []\ (g_2 \rightarrow S_2),\ (g_1 \rightarrow C_1)\ []\ (g_2 \rightarrow C_2),\ D)}
\]
The proofs for the last two rules are straightforward; because of the
page limit they are presented in [12].
6.2. The Top-down Splitting Approach
In this approach, both the hardware and software com-
ponents of the source program are assembled from those of
its constituents.
Before presenting a set of top-down splitting rules, we
introduce the notion of interface-consistency on hardware
components.
Definition 6.2 For $k = 1, 2$, let
\[
D_k =_{df} \mu X \bullet \big( []_{i \in I_k}\, (r_i\,?\,x_i;\ M_i;\ a_i\,!\,y_i;\ X)\ []\ \mathbf{skip} \big).
\]
$D_1$ and $D_2$ are said to be interface-consistent, denoted by
$Consist(D_1, D_2)$, if $Var(D_1) = Var(D_2)$ and
\[
Chan(D_1) \setminus B_1 = Chan(D_2) \setminus B_2,
\]
where $B_k =_{df} \{r_j, a_j \mid j \in I_k\}$, for $k = 1, 2$.
In such a case, we define
\[
D = union(D_1, D_2) =_{df} \mu X \bullet \big( []_{i \in I_1 \cup I_2}\, (r_i\,?\,x_i;\ M_i;\ a_i\,!\,y_i;\ X)\ []\ \mathbf{skip} \big).
\]
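Interface consistency and the union of two devices can be sketched over a simple record representation. In the code below a device is a dictionary holding its variable set, its channel set and a table from service index to service body; the channel names `r<j>` and `a<j>`, the record layout and the function names are assumptions of this illustration. Where both devices define the same index, the sketch simply lets the second definition win, which presumes that the two bodies agree.

```python
def interface(dev):
    """B_k: the request and acknowledge channels of the device's services."""
    return {f"{prefix}{j}" for j in dev["services"] for prefix in ("r", "a")}

def consist(d1, d2):
    """Same variables, and same channels once the interfaces are removed."""
    return (d1["vars"] == d2["vars"]
            and d1["chans"] - interface(d1) == d2["chans"] - interface(d2))

def union(d1, d2):
    """Device offering the services of both d1 and d2."""
    if not consist(d1, d2):
        raise ValueError("devices are not interface-consistent")
    return {
        "vars": set(d1["vars"]),
        "chans": d1["chans"] | d2["chans"],
        "services": {**d1["services"], **d2["services"]},
    }

# Hypothetical devices; service bodies are kept as plain text.
d1 = {"vars": {"v", "x", "y"}, "chans": {"r1", "a1"},
      "services": {1: "y := v + x"}}
d2 = {"vars": {"v", "x", "y"}, "chans": {"r2", "a2"},
      "services": {2: "y := v * x"}}
d3 = {"vars": {"w"}, "chans": {"r3", "a3"}, "services": {3: "y := w"}}

d = union(d1, d2)
print(sorted(d["services"]), sorted(d["chans"]), consist(d1, d3))
```

The combined device offers both services over the union of the two interfaces, which is the shape the top-down rules rely on.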
We first present a basic rule. Combined with the bottom-up rules, it
yields the corresponding top-down rule straightforwardly in each case.
Rule for Hardware Augmentation
\[
\frac{\begin{array}{c}
Split_{B_1}(S, C, D_1) \\
Consist(D_1, D_2),\quad Chan(C) \cap B_2 \subseteq B_1
\end{array}}
{Split_{B_1 \cup B_2}(S, C, D)}
\]
where $D = union(D_1, D_2)$ as in Definition 6.2.
Top-down Rule for Sequential Composition
\[
\frac{\begin{array}{c}
Split_{B_i}(S_i, C_i, D_i),\ i = 1, 2 \\
Var(S_1) = Var(S_2),\quad Chan(S_1) = Chan(S_2) \\
Consist(D_1, D_2)
\end{array}}
{Split_{B_1 \cup B_2}(S_1; S_2,\ C_1; C_2,\ D)}
\]
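Top-down Rule for Conditional
\[
\frac{\begin{array}{c}
Split_{B_i}(S_i, C_i, D_i),\ i = 1, 2 \\
Var(S_1) = Var(S_2),\quad Chan(S_1) = Chan(S_2) \\
Consist(D_1, D_2),\quad Var(b) \subseteq Var(C_1)
\end{array}}
{Split_{B_1 \cup B_2}(\mathbf{if}\ b\ S_1\ \mathbf{else}\ S_2,\ \mathbf{if}\ b\ C_1\ \mathbf{else}\ C_2,\ D)}
\]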
Top-down Rule for Guarded Choice
\[
\frac{\begin{array}{c}
Split_{B_i}(S_i, C_i, D_i),\ i = 1, 2 \\
Var(S_1) = Var(S_2),\quad Chan(S_1) = Chan(S_2) \\
Consist(D_1, D_2) \\
Var(g_i) \subseteq Var(C_1),\quad Chan(g_i) \subseteq Chan(C_1),\ i = 1, 2
\end{array}}
{Split_{B_1 \cup B_2}((g_1 \rightarrow S_1)\ []\ (g_2 \rightarrow S_2),\ (g_1 \rightarrow C_1)\ []\ (g_2 \rightarrow C_2),\ D)}
\]
6.3. Splitting Primitive Commands
This section deals with the splitting of primitive commands. We only
investigate the following nontrivial cases: the assignment, the
invocation of a procedure, and the annotated block.
1. An assignment $u := e(v)$
We focus on the cases where both hardware and software participate in
the evaluation of $e(v)$ and the update of $u$.
Case 1: $e(v)$ is a “busy” expression, and the variable $v$ has been
allocated to the hardware component.
$Split_B(u := e(v),\ C,\ D)$, where
\[
\begin{array}{rcl}
C & =_{df} & (r_j\,!\,1;\ a_j\,?\,u), \text{ and} \\
D & =_{df} & \mu X \bullet ((r_j\,?\,x;\ y := e(v);\ a_j\,!\,y;\ X)\ []\ \mathbf{skip})
\end{array}
\]
Case 2: $e(v)$ is a “busy” expression; however, $v$ has been allocated
to the software component.
$Split_B(u := e(v),\ C,\ D)$, where
\[
\begin{array}{rcl}
C & =_{df} & (r_j\,!\,v;\ a_j\,?\,u), \text{ and} \\
D & =_{df} & \mu X \bullet ((r_j\,?\,x;\ y := e(x);\ a_j\,!\,y;\ X)\ []\ \mathbf{skip})
\end{array}
\]
Case 3: $e(v)$ is not a “busy” expression, but $v$ is allocated to the
hardware component.
$Split_B(u := e(v),\ C,\ D)$, where
\[
\begin{array}{rcl}
C & =_{df} & (\mathbf{var}\ lv \bullet (r_j\,!\,1;\ a_j\,?\,lv;\ u := e(lv))), \text{ and} \\
D & =_{df} & \mu X \bullet ((r_j\,?\,x;\ y := v;\ a_j\,!\,y;\ X)\ []\ \mathbf{skip})
\end{array}
\]
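The three cases for an assignment can be read as a small code generator. The sketch below produces the software part and the hardware part as plain strings, writing `mu X .` and `var lv .` in ASCII for the recursion and declaration binders. It is an illustration only: the function name `split_assignment`, the textual output format and the treatment of the remaining case (a non-busy expression over a software variable, returned unsplit) are assumptions rather than part of the paper.

```python
def split_assignment(u, v, e, j, busy, v_in_hardware):
    """Return (C, D) as text for u := e(v) over the channel pair r_j, a_j.

    e is a function mapping an argument name to the text of e(argument).
    """
    r, a = f"r{j}", f"a{j}"
    if busy and v_in_hardware:                 # Case 1
        c = f"{r} ! 1; {a} ? {u}"
        service = f"y := {e(v)}"
    elif busy:                                 # Case 2
        c = f"{r} ! {v}; {a} ? {u}"
        service = f"y := {e('x')}"
    elif v_in_hardware:                        # Case 3
        c = f"var lv . ({r} ! 1; {a} ? lv; {u} := {e('lv')})"
        service = f"y := {v}"
    else:                                      # stays in software (assumption)
        return f"{u} := {e(v)}", None
    d = f"mu X . (({r} ? x; {service}; {a} ! y; X) [] skip)"
    return c, d

def expr(arg):
    """Hypothetical busy expression f applied to its argument."""
    return f"f({arg})"

c2, d2 = split_assignment("u", "v", expr, 1, busy=True, v_in_hardware=False)
c3, d3 = split_assignment("u", "v", expr, 1, busy=False, v_in_hardware=True)
print(c2)
print(d2)
```

In Case 2 the operand travels to the device with the request, while in Case 3 only the stored value comes back and the cheap expression is evaluated in software.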
A more intricate assignment $u := e(v, w)$, where $v$ and $w$ have
been allocated to the software component and the hardware component
respectively, is converted into several successive assignments of the
forms dealt with above, using the algebraic laws for assignment.
2. A procedure invocation
Without loss of generality, we investigate the invocation
$proc(e_S, e_H, v_S, v_H)$, where $e_S$ is supplied by software, $e_H$
is evaluated by hardware, and $v_S$ and $v_H$ are allocated to
software and hardware, respectively. We are interested in the case
where the procedure is implemented by hardware.
$Split_B(proc(e_S, e_H, v_S, v_H),\ C,\ D)$, where
\[
\begin{array}{rcl}
C & =_{df} & (cr_j\,!\,e_S;\ ca_j\,?\,v_S), \text{ and} \\
D & =_{df} & \mu X \bullet ((cr_j\,?\,x;\ proc(x, e_H, y, v_H);\ ca_j\,!\,y;\ X)\ []\ \mathbf{skip})
\end{array}
\]
3. An annotated block
We concentrate on the case where the block
$\langle B(v_S, v_H) \rangle$ is predetermined to be implemented by
hardware, and the variables $v_S$ and $v_H$ that occur in the block
are allocated to software and hardware, respectively. We need to
arrange the data flow between software and hardware.
$Split_B(\langle B(v_S, v_H) \rangle,\ C,\ D)$, where
\[
\begin{array}{rcl}
C & =_{df} & (cr_j\,!\,v_S;\ ca_j\,?\,v_S), \text{ and} \\
D & =_{df} & \mu X \bullet ((cr_j\,?\,x;\ y := x;\ B(y, v_H);\ ca_j\,!\,y;\ X)\ []\ \mathbf{skip})
\end{array}
\]
7. Conclusion
This paper shows how the hardware/software partition-
ing problem can be tackled in the algebra of programs.
The partitioning task consists of the static program anal-
ysis phase and the splitting phase, where the former pro-
vides the information for moving operations from software
to hardware and reducing the communication between com-
ponents, and the latter supports a compositional approach
to the program partitioning. To synchronize software and
hardware components, and reduce the complexity of their
interface, we introduce a simple handshaking protocol, and
propose a normal form for the hardware components. The
correctness of the splitting process is verified using the al-
gebraic laws of the source language. To deal with co-design
of embedded systems, we shall introduce timing constraints
into our source program, which will result in timed hard-
ware and software components.
References
[1] A. Balboni et al, “Partitioning and Exploration Strategies in
the TOSCA Design Flow”, In Proceedings of Fourth Inter-
national Workshop on Hardware/Software Codesign, 62–69,
IEEE Computer Society Press, (1996).
[2] T. Cheung, “A Multi-level Transformation Approach to Hard-
ware/Software Co-design”, In Proceedings of Fourth Inter-
national Workshop on Hardware/Software Codesign, 10–17,
(1996).
[3] He Jifeng, Provably Correct Systems: Modelling of Com-
munication Languages and Design of Optimised Compilers,
McGraw-Hill Publisher, 1994.
[4] He Jifeng, I. Page and J. Bowen, “A Provable Hardware Im-
plementation of Occam”, Lecture Notes in Computer Science
711, 693–703, (1993).
[5] He Jifeng and J. Bowen, “Specification, Verification and
Prototyping of an Optimised Compiler”, Formal Aspects of Computing 6,
643–658, (1994).
[6] He Jifeng et al, “Provably Correct Systems”, Lecture Notes in
Computer Science 863, 288–335, (1994).
[7] C.A.R. Hoare, Communicating Sequential Processes, Pren-
tice Hall, 1985.
[8] C.A.R. Hoare and He Jifeng, Unifying Theories of Program-
ming, Prentice Hall, 1998.
[9] C.A.R. Hoare et al, “Laws of Programming”, Communica-
tions of the ACM, Vol 30(8): 672-686, 1987.
[10] Flemming Nielson, Hanne Riis Nielson, and Chris Hankin,
Principles of Program Analysis, Springer-Verlag, 1999.
[11] Ian Page and Wayne Luk, “Compiling Occam into FPGAs”, in FPGAs,
eds., Will Moore and Wayne Luk, 271-283, Abingdon EE&CS books, 1991.
[12] Qin Shengchao and He Jifeng, “An Algebraic Approach
to Hardware/software Partitioning”, UNU/IIST Report 206,
Macau, June, 2000.
[13] A.W. Roscoe and C.A.R. Hoare, “Laws of Occam Programming”,
Theoretical Computer Science, Vol 60: 177-229, 1988.
[14] Augusto Sampaio, “An Algebraic Approach to Compiler De-
sign”, World Scientific, (1997).
[15] L. Silva, A. Sampaio and E. Barros, “A Normal Form Re-
duction Strategy for Hardware/software Partitioning”, Formal
Methods Europe (FME) 97, Lecture Notes in Computer Sci-
ence, 1313, (1997) 624-643.
8. Appendix
The source code in Example 4.3.
-----------------------------------------------------------------
-- ATM switch
-- Variables
--   GFC, VPI, VCI, PT, CLP, HEC, Pl - ATM fields
--   aT - arrival time of the cell
--   nVPI, nVCI - new id of the cell
--   QoS - quality of service
--   pN - Port number
--   HPP - variables of high priority peak policy
--   LPP - variables of low priority peak policy
--   HPM - variables of high priority mean policy
--   LPM - variables of low priority mean policy
--   RP/nRP - current/new value of rejected cells due to peak
--   RM/nRM - current/new value of rejected cells due to mean
--   RC/nRC - current/new value of rejected cells
--   AC/nAC - current/new value of accepted cells
--   send - boolean variable which decides whether the cell
--          must be sent or not.
-- Communication with the environment (channels)
--   chCell - receives the cell
--   chOut - sends the cell
--   ch1ReadTable - request data from the table
--   ch2ReadTable - receive the fields of the table
--   chRouteTable - receive the new identifiers and output port
--   chWTable - update the table
-----------------------------------------------------------------
-- Generic Cell Rate Algorithm (Leaky Bucket)
--   X = bucket level
--   LCT = Last Conformance Time
--   at = arrival time
--   I = increment
--   L = cell delay variation tolerance
-----------------------------------------------------------------
procedure GCRA(in X, LCT, at, I, L: int,
               out ok: Bool, nLCT, nX: int)
begin
  var Xtmp: int • Xtmp := X - (at - LCT);
  if (Xtmp < 0) nX := I; nLCT := at; ok := true;
  else if (Xtmp ≤ L) nX := Xtmp + I; nLCT := at; ok := true;
  else nX := X; nLCT := LCT; ok := false;
end
procedure UPT(in tt: Bool, p,m,r,a,cl: int,
              out send: Bool, nRP, nRM, nRC, nAC, nCLP: int)
begin
  send := tt; nRP := p; nRM := m;
  nRC := r; nAC := a; nCLP := cl;
end
-----------------------------------------------------------------
var GFC, VPI, VCI, PT, HEC, Pl, aT, QoS: int,
    X, L, I, LCT, nX, nLCT: record of HPP, LPP, HPM, LPM: int end,
    ok, CLP, nCLP: record of HPP, LPP, HPM, LPM: Bool end,
    RP, nRP, RM, nRM, RC, nRC, AC, nAC: int, send: Bool,
    nVPI[3], nVCI[3], pN[3]: array of int;
-- Read the cell and the table
chCell ? (GFC, VPI, VCI, PT, CLP, HEC, Pl, aT);
ch1ReadTable ! (VPI, VCI);
ch2ReadTable ? (QoS, X, L, I, LCT, RP, RM, RC, AC);
chRouteTable ? (nVPI[0], nVCI[0], pN[0]);
chRouteTable ? (nVPI[1], nVCI[1], pN[1]);
chRouteTable ? (nVPI[2], nVCI[2], pN[2]);
GCRA(X.HPP, LCT.HPP, aT, I.HPP, L.HPP,
     ok.HPP, nLCT.HPP, nX.HPP);
GCRA(X.HPM, LCT.HPM, aT, I.HPM, L.HPM,
     ok.HPM, nLCT.HPM, nX.HPM);
GCRA(X.LPP, LCT.LPP, aT, I.LPP, L.LPP,
     ok.LPP, nLCT.LPP, nX.LPP);
GCRA(X.LPM, LCT.LPM, aT, I.LPM, L.LPM,
     ok.LPM, nLCT.LPM, nX.LPM);
if CLP
  if ¬ok.LPP UPT(false,RP+1,RM,RC+1,AC,CLP,send,
                 nRP,nRM,nRC,nAC,nCLP);
  else if ok.LPM UPT(true,RP,RM,RC,AC+1,CLP,
                     send,nRP,nRM,nRC,nAC,nCLP);
  else UPT(false,RP,RM+1,RC+1,AC,CLP,
           send,nRP,nRM,nRC,nAC,nCLP);
else if ok.HPP ∧ ok.HPM UPT(true,RP,RM,RC,AC+1,CLP,
                            send,nRP,nRM,nRC,nAC,nCLP);
else if ok.HPP ∧ ¬ok.HPM ∧ ok.LPM
  UPT(true,RP,RM,RC,AC+1,1,send,
      nRP,nRM,nRC,nAC,nCLP);
else if ok.HPP ∧ ¬ok.HPM ∧ ¬ok.LPM
  UPT(false,RP,RM+1,RC+1,AC,1,send,
      nRP,nRM,nRC,nAC,nCLP);
else if ¬ok.HPP ∧ ok.LPP ∧ ok.LPM
  UPT(true,RP,RM,RC,AC+1,1,send,
      nRP,nRM,nRC,nAC,nCLP);
else if ¬ok.HPP ∧ ¬ok.LPP
  UPT(false,RP+1,RM,RC+1,AC,1,send,
      nRP,nRM,nRC,nAC,nCLP);
else if ¬ok.HPP ∧ ¬ok.LPP ∧ ¬ok.LPM
  UPT(false,RP,RM+1,RC+1,AC,1,send,
      nRP,nRM,nRC,nAC,nCLP);
else skip;
-- Send the cell
if send
  chOut ! (pN[0],QoS,GFC,nVPI[0],nVCI[0],PT,nCLP,HEC,Pl);
  chOut ! (pN[1],QoS,GFC,nVPI[1],nVCI[1],PT,nCLP,HEC,Pl);
  chOut ! (pN[2],QoS,GFC,nVPI[2],nVCI[2],PT,nCLP,HEC,Pl);
else skip;
-- Update the table
chWTable ! (nX.HPP,nLCT.HPP,nX.HPM,
            nLCT.HPM,nX.LPP,nLCT.LPP);
