Abstract
Hardware/software co-design is a design technique which delivers computer systems comprising hardware and software components. A critical phase of the co-design process is to partition a program into hardware and software. This paper proposes a partitioning method whose correctness is verified using the algebraic laws developed for the high-level programming language. To meet performance goals and reduce the communication between components, our approach combines program analysis with syntax-based splitting rules to move heavy-weight operations from software to hardware. The allocation of variables is also based on data flow analysis of the source program. One advantage of our method is the integration of the splitting phase with the joining phase of the partitioning process. It optimizes the underlying target architecture and facilitates the reuse of hardware devices.
*Partially supported by NNSFC No. 69873003. †On leave from East China Normal University. The United Nations University, P.O. Box 3058, Macau; jifeng@iist.unu.edu
Introduction
The design of a complex software product like a nuclear reactor control system is ideally decomposed into a progression of related phases. It starts with an investigation of the properties and behaviours of the process evolving within its environment, and an analysis of the requirements for its safety performance. From these is derived a specification of the electronic or program-centered components of the system. The project then may go through a series of design phases, ending in a program expressed in a high-level language. After translation into the machine code of the chosen computer, it is executed at high speed by electronic circuitry. In order to achieve the time performance required by the customer, additional application-specific hardware devices may be needed to embed the computer into the system which it controls.
With chip size reaching one million transistors, the complexity of VLSI algorithms is approaching that of software algorithms. However, the design methods for circuits resemble low-level machine language programming methods. Selecting individual gates and registers in a circuit is like selecting individual machine instructions in a program; state transition diagrams are like flowcharts. These methods may have been adequate for small circuit design when they were introduced, but they are not adequate for circuits that perform complicated algorithms. Industry interest in the formal verification of embedded systems is gaining ground, since an error in a widely used hardware device can have significant repercussions on the stock value of the company concerned. In principle, proof of correctness of a digital device can always be achieved by comparing the behavioral description of the circuit with its specification. But for a large system this would be impossibly laborious. What we need is a useful collection of proven equations and other theorems, which can be used to calculate, manipulate and transform the specification formulae into the product.
The algebraic approach advocated in this paper to verify the correctness of the partitioning process has been successfully employed in the ProCoS project on "Provably Correct Systems". The original ProCoS project [6] concentrated almost exclusively on the verification of a standard compiler from a high-level programming language based on Occam down to a microprocessor based on the Transputer [5]. Sampaio showed how to reduce the compiler design task to one of program transformation; his formal framework is also a procedural language and its algebraic laws [14]. Towards the end of the first phase of the project, Ian Page et al. made rapid advances in the development of hardware compilation techniques using an Occam-like language targeted towards Field Programmable Gate Arrays [11], and He Jifeng et al. provided a formal verification of the hardware compilation scheme within the algebra of Occam programs [4].
Recently, some works have suggested the use of formal methods for the partitioning process [1, 2, 15]. Balboni et al. adopt Occam as an internal model for the system exploration and partitioning strategy. Cheung pursues structural transformation and verification within the functional programming framework. However, neither has provided a formal proof of the correctness of the partitioning process. In [15], Silva et al. provide a formal strategy for carrying out the splitting phase automatically, and present an algebraic proof of its correctness. However, the splitting phase delivers a large number of simple processes, and leaves the hard task of clustering these processes into hardware and software components to the clustering and joining phases. Furthermore, the additional channels and local variables introduced in the splitting phase to accommodate the huge number of parallel processes actually increase the data flow between the hardware and software components.
The remainder of this paper is organized as follows. Section 2 describes the splitting strategy. Section 3 introduces the programming language we adopt and explores its algebraic laws. Section 4 presents the static analysis that we perform on the source program. Section 5 investigates the underlying target architecture of the hardware/software components. Section 6 provides the syntax-based hardware/software splitting rules in both bottom-up and top-down styles.
Splitting Strategy
This section describes our partitioning strategy. A sequential source program in a language for communicating processes is generated from the customer's requirements. A static analysis [10] is performed on the source program in order to provide the programmer with statistical data, such as the structural complexities of expressions and their occurrence frequencies, and distribution information about the variables occurring in expressions. Based on the result of the analysis, the programmer marks those parts of the program that are worth implementing in hardware, leaves the others to software, and also divides the interface of the program into two disjoint parts.
The implementation-oriented program marking and interface (variable) partitioning are conducted according to the following guidelines:
• For reasons of security or other special concerns, some specific blocks will be predetermined to be implemented in hardware or software.
• In general, procedures that are frequently invoked and blocks that occur frequently should be marked for hardware implementation, to gain high performance.
• Procedures/blocks involving very complicated computation (e.g., containing intricate expressions) should be marked for hardware implementation, to improve timing performance.
• Busy variables should be allocated to hardware to make high-speed access available, whereas the remaining variables and large data structures, such as large arrays, should be left to software to achieve lower costs.
• The number of interactions between software and hardware should be minimized, since they incur high costs.
• In addition, the customer's demands concerning performance and cost should also be taken into account.
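The guidelines leave the final decision to the programmer. As a rough illustration of how the statistics could feed that decision, the sketch below applies a hypothetical scoring rule: a predetermined block keeps its side, and otherwise a block is marked for hardware when its frequency multiplied by its complexity reaches a programmer-chosen threshold. The names `BlockStats`, `mark` and `threshold`, and the product score itself, are assumptions for illustration, not the paper's procedure.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

@dataclass
class BlockStats:
    name: str
    frequency: int                       # invocations/occurrences counted by the static analysis
    complexity: int                      # e.g. summed complexity of the block's expressions
    predetermined: Optional[str] = None  # "hw" or "sw" when fixed in advance (e.g. security)

def mark(blocks: List[BlockStats], threshold: int) -> Dict[str, str]:
    """Hypothetical marking rule: predetermined blocks keep their side; otherwise a
    block goes to hardware when frequency * complexity reaches the threshold."""
    marking = {}
    for b in blocks:
        if b.predetermined is not None:
            marking[b.name] = b.predetermined
        elif b.frequency * b.complexity >= threshold:
            marking[b.name] = "hw"
        else:
            marking[b.name] = "sw"
    return marking

blocks = [
    BlockStats("crc", frequency=120, complexity=9),
    BlockStats("log", frequency=3, complexity=2),
    BlockStats("auth", frequency=1, complexity=50, predetermined="hw"),
]
print(mark(blocks, threshold=100))  # {'crc': 'hw', 'log': 'sw', 'auth': 'hw'}
```

A product score favours blocks that are both busy and expensive; a real marking would also weigh the interaction cost and the customer's demands, which this sketch ignores.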
We take such a marked source program as the input of our hardware/software splitting algorithm, which generates as output a program comprising two concurrent processes representing the software and hardware components, respectively.
Preliminaries
The language we select to perform hardware/software partitioning is a subset of Occam, which was designed for constructing communicating systems.
1. Sequential process:
$S ::= PC$ (primitive command)
In the later discussion, we use Var(P) and Chan(P) to denote the sets of variables and channels employed by P. Moreover, we will omit the type information of a variable in a declaration when it is obvious.
As a subset of Occam, the language enjoys a rich set of algebraic laws, presented in [13, 3, 7, 9, 8]. Here we only list those algebraic laws which will be employed in the proofs in the following sections.
Successive assignments to the same variable can be combined into one assignment.
L1  $x := e;\ x := f \;=\; x := f[e/x]$
Sequential composition is associative, and has left zero chaos and unit skip. It distributes backward over internal and external choices and conditional.
L2  $(P;Q);R = P;(Q;R)$
L3  $\mathbf{chaos};P = \mathbf{chaos}$
L4  $\mathbf{skip};P = P;\mathbf{skip} = P$
L5  $(P \sqcap Q);R = (P;R) \sqcap (Q;R)$
L6  $((g\ P) \,\square\, (h\ Q));R = (g\ (P;R)) \,\square\, (h\ (Q;R))$
L7  $(\mathbf{if}\ b\ P\ \mathbf{else}\ Q);R = \mathbf{if}\ b\ (P;R)\ \mathbf{else}\ (Q;R)$
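The law L1 for combining successive assignments can be spot-checked in a toy model. This is a sketch under stated assumptions: integer expressions built from variables, constants, `+` and `*`, with assignment and sequential composition modelled as deterministic state transformers on dictionaries. The helper names (`evaluate`, `subst`, `assign`, `seq`) are illustrative only, and testing a few states illustrates the law rather than proving it.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Union

State = Dict[str, int]

@dataclass(frozen=True)
class Var:
    name: str

@dataclass(frozen=True)
class Const:
    value: int

@dataclass(frozen=True)
class Op:
    symbol: str   # "+" or "*" in this sketch
    left: "Expr"
    right: "Expr"

Expr = Union[Var, Const, Op]

def evaluate(e: Expr, s: State) -> int:
    if isinstance(e, Var):
        return s[e.name]
    if isinstance(e, Const):
        return e.value
    l, r = evaluate(e.left, s), evaluate(e.right, s)
    if e.symbol == "+":
        return l + r
    if e.symbol == "*":
        return l * r
    raise ValueError(f"unsupported operator {e.symbol!r}")

def subst(f: Expr, x: str, e: Expr) -> Expr:
    """f[e/x]: replace every occurrence of variable x in f by e."""
    if isinstance(f, Var):
        return e if f.name == x else f
    if isinstance(f, Const):
        return f
    return Op(f.symbol, subst(f.left, x, e), subst(f.right, x, e))

Command = Callable[[State], State]

def assign(x: str, e: Expr) -> Command:
    return lambda s: {**s, x: evaluate(e, s)}

def seq(p: Command, q: Command) -> Command:
    return lambda s: q(p(s))

e = Op("+", Var("x"), Var("y"))
f = Op("*", Var("x"), Const(2))
lhs = seq(assign("x", e), assign("x", f))   # x := e; x := f
rhs = assign("x", subst(f, "x", e))         # x := f[e/x]
for s in [{"x": 3, "y": 4}, {"x": -1, "y": 0}, {"x": 0, "y": 5}]:
    assert lhs(s) == rhs(s)
print(lhs({"x": 3, "y": 4}))  # {'x': 14, 'y': 4}
```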
Assignment distributes forward over conditional.
The input and output events can be renamed as follows.
L9  $c?x = \mathbf{var}\ l_x \bullet (c?l_x;\ x := l_x)$
L10  $c!e = \mathbf{var}\ l_x \bullet (l_x := e;\ c!l_x)$
Iteration is subject to the fixed point theorem.
The parallel operator is symmetric and associative. It distributes over internal choice:
L16  $(P \sqcap Q) \,\|\, R = (P \,\|\, R) \sqcap (Q \,\|\, R)$
The parallel operator also distributes over conditional.
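Local variable declaration enjoys the following laws.
L17  … in $e$.
L18  $\mathbf{var}\ x \bullet (\mathbf{if}\ b\ P\ \mathbf{else}\ Q) = \mathbf{if}\ b\ (\mathbf{var}\ x \bullet P)\ \mathbf{else}\ (\mathbf{var}\ x \bullet Q)$, provided $x$ is not free in $b$.
L19  … free in $P$.
L20  … free in $Q$.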
The following law deals with assignment expansion.
L21
The following law is one of the general expansion laws of Occam [13], which deals with the case where two parallel processes are guarded choice constructs. The proof is presented in [12].
We exhibit two derived algebraic laws that follow from those basic ones. The proof is omitted here because of the page limit; it can be found in [12].
The test of a conditional should be evaluated first.
Before further discussion, we introduce an ordering relation between programs.
Definition 3.3 (Refinement)
Given programs P and Q, we say Q is a refinement of P, denoted $P \sqsubseteq Q$, if $P \sqcap Q = P$.
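To give intuition for refinement, the sketch below assumes the standard algebraic definition $P \sqsubseteq Q$ iff $P \sqcap Q = P$, and a deliberately simplified model in which a program denotes the set of outcomes it may produce, so that internal choice is set union. This toy model (ignoring divergence, deadlock and communication) is an assumption for illustration; it is not the semantics of Occam used in the paper.

```python
def choice(p: frozenset, q: frozenset) -> frozenset:
    """Internal choice P ⊓ Q in the toy model: either program's outcomes may occur."""
    return p | q

def refines(p: frozenset, q: frozenset) -> bool:
    """P ⊑ Q (Q refines P) iff P ⊓ Q = P."""
    return choice(p, q) == p

P = frozenset({1, 2, 3})   # loose specification: any of three outcomes is acceptable
Q = frozenset({2})         # a more deterministic implementation
print(refines(P, Q), refines(Q, P))  # True False
```

In this model refinement reduces to $Q \subseteq P$: an implementation may resolve nondeterminism but never introduce new outcomes.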
The Static Analysis
This section illustrates the static analysis of the source program, which provides the programmer with the information needed for appropriate implementation-oriented program marking and interface partitioning, with the aim of achieving higher performance at lower cost.
The static analysis comprises two parts: the subprogram/expression analysis and the variable analysis. The output of the subprogram/expression analysis consists of three kinds of information, which will be presented in three tables, respectively. 
The complexity of expressions is specified by the function complex as follows:
$$\mathit{complex}(op(e_1, \dots, e_n)) = \sum_{i=1}^{n} \mathit{complex}(e_i) + \mathit{complex}(op),$$
where op is any operator used to construct expressions in the source language, complex(op) is defined by the programmer in accordance with the complexity of op, and the function $w : \mathit{TYPE} \to \mathbb{N}$ associates a number with each type of variables and channels in the program to measure their complexity.
By scanning the program, we obtain the occurrence frequency of expressions, which serves as another criterion for the busyness of expressions.
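The two expression statistics, complexity and occurrence frequency, can be computed in one pass over the expressions met while scanning the program. The sketch below is illustrative: expressions are hypothetical tuples, the operator weights `OP_COMPLEXITY` and type weights `TYPE_WEIGHT` (standing in for complex(op) and w) are made-up programmer choices, and the base cases (a variable of type T costs w(T), a constant costs 0) are assumptions, since only the operator case of complex is given above.

```python
from collections import Counter

# Hypothetical programmer-supplied weights; the paper leaves these to the programmer.
OP_COMPLEXITY = {"+": 1, "-": 1, "*": 4, "/": 8}   # complex(op)
TYPE_WEIGHT = {"bool": 1, "int": 1, "real": 2}     # w : TYPE -> N

def complexity(expr, types):
    """complex(op(e1, ..., en)) = sum of complex(ei) + complex(op).

    Expressions are tuples: ("var", name), ("const", value) or ("op", symbol, e1, ..., en).
    Base cases are assumptions: a variable of type T costs w(T); a constant costs 0.
    """
    kind = expr[0]
    if kind == "var":
        return TYPE_WEIGHT[types[expr[1]]]
    if kind == "const":
        return 0
    symbol, args = expr[1], expr[2:]
    return sum(complexity(a, types) for a in args) + OP_COMPLEXITY[symbol]

types = {"x": "int", "y": "real"}
e1 = ("op", "*", ("var", "x"), ("op", "+", ("var", "y"), ("const", 3)))
occurrences = [e1, ("var", "x"), e1]               # expressions met while scanning
freq = Counter(occurrences)                        # occurrence frequency table
print(complexity(e1, types), freq[e1])             # 8 2
```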
By scanning the program, we also obtain the number of invocations of each procedure. By analysing the declarations of those procedures, we get the complexity of their parameters. Suppose $v_1 : T_1, \dots, v_k : T_k$ is the list of parameters of some procedure; then the complexity of its parameters is $\sum_{i=1}^{k} w(T_i)$.
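The procedure table pairs each procedure's invocation count with its parameter complexity $\sum_{i=1}^{k} w(T_i)$. The following is a small sketch of that computation; the declarations, call list and the weights in `TYPE_WEIGHT` are hypothetical inputs standing in for the results of scanning a real program.

```python
from collections import Counter
from typing import Dict, List, Tuple

TYPE_WEIGHT = {"int": 1, "real": 2, "array": 10}   # hypothetical w : TYPE -> N

def parameter_complexity(params: List[Tuple[str, str]]) -> int:
    """Sum of w(T_i) over a parameter list v1 : T1, ..., vk : Tk."""
    return sum(TYPE_WEIGHT[t] for _, t in params)

def procedure_table(decls: Dict[str, List[Tuple[str, str]]], calls: List[str]):
    """Map each declared procedure to (number of invocations, parameter complexity)."""
    counts = Counter(calls)
    return {p: (counts[p], parameter_complexity(ps)) for p, ps in decls.items()}

decls = {"filter": [("buf", "array"), ("n", "int")],
         "scale": [("a", "real"), ("k", "int")]}
calls = ["filter", "scale", "filter", "filter"]    # invocations found by scanning
print(procedure_table(decls, calls))   # {'filter': (3, 11), 'scale': (1, 3)}
```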
It is also possible to define the complexity of procedures or blocks that do not contain iterations. If the number of loop iterations can be predicted or estimated, the complexity of those which contain iterations can also be calculated. Based on the three tables generated by the analysis, the programmer can determine which parts should be implemented in hardware, in accordance with the guidelines listed in Section 2.
The second step of the analysis provides the following information about variables. The criterion for interface partitioning is that a variable should be allocated to hardware if its structure is not complicated and it occurs more often in the procedures/blocks assigned to hardware than in those left to software.
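The variable-allocation criterion can be sketched directly from the per-block occurrence counts. In this illustration, "structure is not complicated" is approximated by membership in an assumed set of scalar types, and the inputs (`var_types`, `uses`, `marking`) are hypothetical stand-ins for the variable table and the programmer's marking.

```python
from collections import Counter

def allocate_variables(var_types, uses, marking,
                       simple_types=frozenset({"bool", "int", "real"})):
    """Return variable -> 'hw' | 'sw'.

    var_types: variable -> type name
    uses:      list of (block_name, variable) pairs, one per occurrence
    marking:   block_name -> 'hw' | 'sw' (the programmer's marking)
    Assumption: a 'simple' type stands in for 'structure not complicated'.
    """
    hw = Counter(v for b, v in uses if marking[b] == "hw")
    sw = Counter(v for b, v in uses if marking[b] == "sw")
    return {v: "hw" if t in simple_types and hw[v] > sw[v] else "sw"
            for v, t in var_types.items()}

var_types = {"acc": "int", "table": "array", "flag": "bool"}
marking = {"loop_body": "hw", "report": "sw"}
uses = [("loop_body", "acc"), ("loop_body", "acc"), ("report", "acc"),
        ("loop_body", "table"), ("loop_body", "table"),
        ("report", "flag"), ("loop_body", "flag")]
print(allocate_variables(var_types, uses, marking))
# {'acc': 'hw', 'table': 'sw', 'flag': 'sw'}
```

Here `table` stays in software despite being busy in hardware, matching the guideline that large data structures go to software; `flag` is a tie and so is not moved.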

The Hardware/Software Target Architecture
This section describes the target architecture of our partitioning approach which confines hardware and software components to specially chosen forms. To synchronize their activities, we introduce a simple handshaking protocol to streamline communications between them.
Suppose $B = \{r_j, a_j \mid j \in I\}$ is a set of channels. We define CP(B) as the set of communicating processes C with $\mathit{Chan}(C) \supseteq B$ that take one of the following forms.
(1) A communicating process which does not use any channel in B.
(2) $r_j!e;\ C;\ a_j?x$, where C is a member of CP(B) not interacting via channels in B.
(4) $b * C$, where C is a member of CP(B).
To simplify the interface design, we confine the interactions between the hardware and software components to communications along the channels in B. Our partitioning rules will select the software components from the set CP(B), and organise the hardware component in the form of a process D, where none of the $M_j$ mentions channels in B. The communicating process D represents a digital device which offers a set of services to its environment: each service responds to a request from the environment on an input channel $r_j$ by running the corresponding program $M_j$ and afterwards delivering the result on the output channel $a_j$. The translation from such a hardware specification to netlists will be tackled using hardware compilation techniques [11].
We denote by H(B) the set of processes which have the same form as D.
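The request/answer handshake between a software process of form (2), $r_j!e;\ C;\ a_j?x$, and a device D can be simulated with threads and queues. This is a sketch under two assumptions: the request channels $r_j$ are multiplexed onto one queue of `(j, value)` pairs so that a single hardware thread can wait on all of them (approximating D's choice among services), and `queue.Queue` is buffered whereas Occam channels are synchronous, so only the blocking wait on $a_j$ enforces synchronization here. The services `M_0` and `M_1` are hypothetical.

```python
import queue
import threading

def hardware(requests, answers, services):
    """Sketch of D: wait for a request on some r_j, run M_j on the received
    value, and deliver the result on a_j. None on the request queue stops D."""
    while True:
        msg = requests.get()
        if msg is None:
            break
        j, value = msg
        answers[j].put(services[j](value))

def call(requests, answers, j, e):
    """Software side of the handshake r_j!e; C; a_j?x (C omitted here)."""
    requests.put((j, e))        # r_j ! e
    # Independent software work (the C of form (2)) could run here.
    return answers[j].get()     # a_j ? x

services = {0: lambda v: v * v, 1: lambda v: v + 1}   # hypothetical M_0, M_1
requests = queue.Queue()
answers = {j: queue.Queue() for j in services}
device = threading.Thread(target=hardware, args=(requests, answers, services))
device.start()
x = call(requests, answers, 0, 7)
y = call(requests, answers, 1, x)
requests.put(None)
device.join()
print(x, y)  # 49 50
```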
Syntax-based Splitting Rules
This section discusses program splitting rules. First we show how the static analysis affects the partition of primitive commands into hardware and software components.
Secondly we demonstrate how to construct hardware and software parts of a construct from those of its constituents.
We establish the correctness of those rules by using the algebraic laws given in Section 3.
We introduce a predicate Split, which will be of great help in formalising the decomposition rules.
Definition 6.1 (Split)
Let $B = \{r_j, a_j \mid j \in I\}$. Given a sequential process S, its hardware/software partition (C, D) is specified by the following predicate:
where InputChan(C) is the set of channels employed by C only for input, and OutputChan(C) is defined similarly.
The Bottom-up Splitting Approach
The bottom-up approach builds the hardware component of a program directly from the static analysis in one step, i.e., the hardware device provides all the services frequently used by the program. The software component, however, is constructed from those of its constituents using the following rules.
Bottom-up Rule for Sequential Composition
$$\frac{\mathrm{Split}_B(S_i, C_i, D),\quad i = 1, 2}{\mathrm{Split}_B(S_1;S_2,\ C_1;C_2,\ D)}$$
Bottom-up Rule for Iteration
$$\mathrm{Split}_B(S, C, D)$$
The non-deterministic choice can be regarded as a special case of guarded choice in which all the guards are skip. We present the partitioning rule for guarded choice constructs as follows and omit the rule for non-deterministic choice.
Bottom-up Rule for Guarded Choice
$$\mathrm{Split}_B(S_i, C_i, D),\quad i = 1, 2$$
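The bottom-up style can be sketched as two structural recursions over a hypothetical tuple syntax: `services_of` builds the device D in one step by collecting every hardware-marked block, while `software` builds C compositionally, keeping each construct and replacing a marked block by a handshake with D. Assumptions: the syntax and the `"call"` node are illustrative; the data flow carried on $r_j$ and $a_j$ is omitted; and the iteration case assumes the rule's conclusion maps $b * S$ to $b * C$, in line with form (4) of CP(B).

```python
def services_of(s, acc=None):
    """D is built in one step: one service per hardware-marked block."""
    acc = {} if acc is None else acc
    tag = s[0]
    if tag == "hw":
        acc[s[1]] = s[2]
    elif tag == "seq":
        services_of(s[1], acc)
        services_of(s[2], acc)
    elif tag == "loop":
        services_of(s[2], acc)
    elif tag == "choice":
        for _, branch in s[1:]:
            services_of(branch, acc)
    return acc

def software(s):
    """C is built compositionally from the software parts of the constituents."""
    tag = s[0]
    if tag == "prim":
        return s
    if tag == "hw":        # marked block j becomes the handshake r_j!e; a_j?x
        return ("call", s[1])
    if tag == "seq":       # Split(S1, C1, D), Split(S2, C2, D) give C1; C2
        return ("seq", software(s[1]), software(s[2]))
    if tag == "loop":      # assumed: b * S gives b * C
        return ("loop", s[1], software(s[2]))
    if tag == "choice":    # guards kept, each branch split with the same D
        return ("choice",) + tuple((g, software(b)) for g, b in s[1:])
    raise ValueError(f"unknown construct {tag!r}")

prog = ("seq", ("prim", "read x"),
        ("loop", "x > 0", ("seq", ("hw", 0, "fft"), ("prim", "x := x - 1"))))
print(software(prog))
print(services_of(prog))  # {0: 'fft'}
```

Because D is fixed before the recursion starts, every rule shares the same hardware component, which is what the premises $\mathrm{Split}_B(S_i, C_i, D)$ with a common D express.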
Splitting Primitive Commands
This section deals with the splitting of primitive commands.
We only investigate the following nontrivial cases: the assignment, the invocation of a procedure, and the annotated blocks.
1. An assignment $U := e(v)$
We focus on the cases where both hardware and software participate in the evaluation of $e(v)$, and … and $w$ have been allocated to the software component and the hardware component, respectively. … will be converted into several successive assignments of the form dealt with above, by the algebraic laws for assignments.
2. A procedure invocation
Without loss of generality, we investigate the invocation $proc(e_S, e_H, v_S, v_H)$, where $e_S$ is supplied by software, $e_H$ is evaluated by hardware, and $v_S$ and $v_H$ are allocated to software and hardware, respectively. We are interested in the case where the procedure is implemented by hardware.
$\mathrm{Split}_B(proc(e_S, e_H, v_S, v_H), \dots)$
3. An annotated block
… is predetermined to be implemented by hardware, and the variables $v_S$ and $v_H$ that occur in the block are allocated to software and hardware, respectively. We need to arrange the data flow between software and hardware.
$\mathrm{Split}_B(\dots)$
Conclusion
This paper shows how the hardware/software partitioning problem can be tackled in the algebra of programs.
The partitioning task consists of the static program analysis phase and the splitting phase, where the former provides the information for moving operations from software to hardware and reducing the communication between components, and the latter supports a compositional approach to the program partitioning. To synchronize software and hardware components, and reduce the complexity of their interface, we introduce a simple handshaking protocol, and propose a normal form for the hardware components. The correctness of the splitting process is verified using the algebraic laws of the source language. To deal with co-design of embedded systems, we shall introduce timing constraints into our source program, which will result in timed hardware and software components.
