Abstract. This paper introduces a general scheme for formally embedding high-level synthesis by formulating its basic steps as transformations within higher-order logic. A functional representation of a data flow graph is successively refined by means of generic logical transformations. Algorithms that are based on logical transformations guarantee "correctness by design": they not only construct an implementation but also derive the proof of its formal correctness on the fly. An extra post-synthesis verification step becomes obsolete. The logical transformations presented in this paper form a framework for formally embedding existing high-level synthesis procedures.
Introduction
Guaranteeing functional correctness in hardware synthesis is an essential but demanding task, owing to the complexity of synthesis tools and the underlying synthesis algorithms. Hence various formal verification techniques are employed to prove the correctness of the implementations resulting from the synthesis process [Melh93, ScKK93, Gupt92]. However, the applicability of formal verification tools within the synthesis context is limited, since the proof of the goal "implementation $\Rightarrow$ specification" is very complex.
Post-synthesis verification is an exacting goal. Full automation can only be achieved for small circuits at lower levels of abstraction. For large circuits, verification algorithms either run into space/time hurdles or the user has to interact and perform some proofs by hand.
Conventional synthesis algorithms just determine the implementation; the information on how the specification was refined into an implementation gets lost. The loss of this information is a major bottleneck for verification: the verification process gets just two logical formulae corresponding to the specification and the implementation. On the other hand, synthesis is split up into a set of well-defined steps, namely scheduling, allocation and binding, and there exists a vast body of knowledge for solving these steps in an effective manner [CaWo91, Paul91, RoKr91]. We therefore propose a technique for "formal synthesis" which closely adheres to the steps of conventional synthesis and additionally exploits the knowledge available.
The idea of formal synthesis is in itself not new. One of the early attempts dealt with the conversion of regular expressions into hardware circuits [John84]. Later, a number of techniques were proposed for interactively refining specifications into implementations [Lars94, HaLD89, JoWB89, AHL92, MaFo91, FoMa90]. All these techniques have one common drawback: they do not exploit the knowledge of the algorithms which abound in synthesis. The novelty of our current approach is that no new synthesis algorithms (either formal or informal) are proposed; instead, a general scheme for logically embedding various existing synthesis algorithms within a formal set-up is presented.
The outline of this paper is as follows: we first briefly examine the synthesis problem and define the notations and scope of our work. Then we describe the formal techniques for scheduling, allocation and binding, respectively.
Basics of the "Formal Synthesis" Scenario
Starting from an algorithmic description, which does not incorporate timing explicitly, the overall aim of high level synthesis is to extract the data path and the controller. The major steps in synthesis are:
1. scheduling under restrained/unrestrained resource constraints
2. allocation
3. binding
4. determination of the RT-level implementation
Our Starting Point
The approach given in this paper deals with synthesis based on data flow graph representations only. We represent the given data flow graph by a typed function g, and proceed with the various steps of synthesis. The sequential circuit corresponding to g will then repeatedly determine g(x) for the various values of x. Since we use typed functions, a single input x is sufficient to represent any number of inputs of any type, since they can all be bundled together into a single x. The type definition uniquely determines the set of inputs and outputs. We will clarify this notion shortly.
An Example for g
Throughout this paper we shall illustrate the various steps of synthesis via an example named myg. myg maps a triple (a, b, c) onto the pair (x, y) as defined by the pseudo-procedural description in figure 1. Assuming that all the variables used are of the type of natural numbers, num, the overall type of the function is $num \times num \times num \rightarrow num \times num$. As basic operations there are the binary operations $+$, $-$ and $\times$ and the unary operation inc. The operator inc maps some x to x + 1. Since the intermediate results are used within the succeeding expressions, they will be named explicitly.
In our example they are named p, q, r, s and t. The data flow diagram which corresponds to the procedural code is given in figure 2.
On comparing the pseudo-procedural code in figure 1 with the definition in equation (1), a direct one-to-one correspondence can be noticed: a little formal syntactic sugaring of the code yields the definition. This holds whenever the pseudo-procedural code consists purely of basic blocks.
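The correspondence between a basic block and a single function can be sketched in Python. This is only an illustration under stated assumptions: figure 1 and equation (1) are not reproduced here, so the block `TOY_BLOCK`, the helper `block_to_function` and the concrete operations are hypothetical. They merely reuse the variable names p, q, r, s, t, x, y and the four operators mentioned in the text, and they model subtraction on natural numbers as truncated subtraction.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical basic operations on natural numbers.
# Assumption: "-" truncates at 0 so that results stay natural numbers.
OPS: Dict[str, Callable[..., int]] = {
    "+": lambda a, b: a + b,
    "-": lambda a, b: max(a - b, 0),
    "*": lambda a, b: a * b,
    "inc": lambda a: a + 1,
}

# Hypothetical basic block (NOT the paper's figure 1):
# each entry is (result name, operator, operand names), in program order.
TOY_BLOCK: List[Tuple[str, str, Tuple[str, ...]]] = [
    ("s", "+", ("a", "b")),
    ("p", "*", ("a", "b")),
    ("q", "inc", ("c",)),
    ("r", "*", ("p", "s")),
    ("t", "-", ("s", "q")),
    ("x", "*", ("r", "t")),
    ("y", "+", ("r", "t")),
]


def block_to_function(block, inputs, outputs):
    """Turn a basic block into one function from an input tuple to an output tuple."""
    def g(args):
        env = dict(zip(inputs, args))          # bind the bundled input x
        for name, op, operands in block:       # one "let" per assignment
            env[name] = OPS[op](*(env[v] for v in operands))
        return tuple(env[v] for v in outputs)  # bundle the outputs
    return g


toy_g = block_to_function(TOY_BLOCK, ("a", "b", "c"), ("x", "y"))
```

Each assignment of the block plays the role of one let-binding in the functional definition, which is why the translation is purely syntactic for straight-line code.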
The Formal Synthesis Scheme
Having defined the basics, we will now give the gist of the overall formal synthesis scheme:
1. Convert the initial data flow graph into a functional representation.
2. Use an algorithm for scheduling, allocation or binding which performs the respective task on the data flow graph and gives us a schedule, allocation or binding, respectively.
3. Apply the pre-proven generic transformations for each task on the data flow graph along with the results of the algorithm.
4. Obtain a transformed function which is equivalent (in the logical sense) to the original description.
5. Derive the RT-level implementation from the transformed function.
Step 3, the heart of the overall strategy, has been made possible by meticulous proofs of the generic transformations. They take a function and the results of a specific synthesis step, and produce a new function which represents the end product of that synthesis step. Thus we are able to exploit all the optimizations that are offered by a particular algorithm; additionally, these transformations are automated and not time-consuming. In the following sections, we shall show the transformed functions corresponding to the steps scheduling, allocation and binding. The entire synthesis scheme will be implemented using the HOL theorem prover [GoMe93].
Scheduling
Scheduling determines the number of control steps (c-steps) for each calculation period and assigns each operation to one particular c-step $0, \ldots, n$. Given a specific $n$, the basic idea of the "schedule transformation" is to break up the original function $g$ into a sequence of functions $g_0, g_1, \ldots, g_n$ whose composition yields the original function, i.e. $g = g_n \circ g_{n-1} \circ \cdots \circ g_0$.
In our example, there are 7 operations, whose outputs are the auxiliary variables p, q, r, s, t, x and y. We will also use the names of the auxiliary variables to denote the operation that produces this variable. Operation s, for example, is the first + operation and the auxiliary variable s is its output.
Let us assume that only two circuits are to be used: one multiplier and one multi-purpose unit for adding ($+$), subtracting ($-$) and incrementing (inc). Under this hardware constraint several schedules are possible. Any arbitrary algorithm may determine the schedule.
In our example we will use the schedule sketched in figure 3: in c-step 0, s is processed; in c-step 1, p and q are processed; in c-step 2, r and t are processed; and in c-step 3, x and y are processed. In this schedule n becomes 3. There are also schedules for myg with $n \neq 3$, but under the given restrictions, $n = 3$ is the minimum. The scheduled function myg will be described by means of a composition of four functions, $myg = g_3 \circ g_2 \circ g_1 \circ g_0$, where the functions $g_0$, $g_1$, $g_2$ and $g_3$ perform the computations of c-steps 0, 1, 2 and 3, respectively. The formal representation of the transformed function (theorem (2)) is derived by applying the definition of the composition operator $\circ$, by expanding let-expressions and by $\beta$-reductions. It is easy to see that this transformation does not depend upon the scheduling algorithm itself, nor do we place any undue demands on the algorithm, except that it returns a schedule that obeys the data dependencies.
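The schedule transformation can be illustrated with a small Python sketch. The function `toy_g` and the stage functions `g0` to `g3` are assumptions made for illustration, not the paper's myg or its theorem (2); they only follow the schedule stated above (s in c-step 0, p and q in c-step 1, r and t in c-step 2, x and y in c-step 3). Note also that the sketch merely tests the equality on sample inputs, whereas the approach described here derives it as a theorem.

```python
from functools import reduce


def toy_g(abc):
    """Hypothetical unscheduled function (not the paper's myg)."""
    a, b, c = abc
    s = a + b
    p = a * b
    q = c + 1              # inc
    r = p * s
    t = max(s - q, 0)      # assumed truncated subtraction on naturals
    return (r * t, r + t)  # (x, y)


def g0(v):                 # c-step 0: compute s, pass on a, b, c
    a, b, c = v
    return (a, b, a + b, c)


def g1(v):                 # c-step 1: compute p and q, pass on s
    a, b, s, c = v
    return (a * b, s, c + 1)


def g2(v):                 # c-step 2: compute r and t
    p, s, q = v
    return (p * s, max(s - q, 0))


def g3(v):                 # c-step 3: compute x and y
    r, t = v
    return (r * t, r + t)


def compose(*fs):
    """compose(f, g, h)(v) == f(g(h(v))): the rightmost function runs first."""
    return lambda v: reduce(lambda acc, f: f(acc), reversed(fs), v)


scheduled_g = compose(g3, g2, g1, g0)
```

Every stage passes along exactly the values that later stages still need, which is what makes the composition equal to the original function.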
$\vdash myg = \ldots$
... so that the overall number is $m$. These variables can carry any arbitrary values since they are never used. In our example four variables have to be buffered after c-step 0, three after c-step 1 and two after c-step 2. Therefore, we add one auxiliary variable $z_1$ after c-step 1 and two auxiliary variables $z_2$ and $z_3$ after c-step 2 (see figure 4). The formal representation is given in theorem (3).
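The effect of adding unused auxiliary variables can be sketched as follows. The stage functions in `STAGES` are hypothetical stand-ins (the paper's theorem (3) is not reproduced here), chosen only so that four, three and two values are live after c-steps 0, 1 and 2 as in the text; `pad_stages` and `run` are illustrative helper names.

```python
# Hypothetical stage functions with 4, 3 and 2 live values after c-steps 0, 1, 2.
STAGES = [
    lambda v: (v[0], v[1], v[0] + v[1], v[2]),     # c-step 0: a, b, s, c
    lambda v: (v[0] * v[1], v[2], v[3] + 1),       # c-step 1: p, s, q
    lambda v: (v[0] * v[1], max(v[1] - v[2], 0)),  # c-step 2: r, t
    lambda v: (v[0] * v[1], v[0] + v[1]),          # c-step 3: x, y
]
WIDTHS = [4, 3, 2, 2]  # number of values produced by each stage


def pad_stages(stages, widths, filler=0):
    """Pad every intermediate tuple to m = max intermediate width.

    The filler values stand for the auxiliary variables z1, z2, ...;
    they are dropped again before the next stage reads its inputs.
    The last stage produces the final result and is left unpadded.
    """
    m = max(widths[:-1])
    padded = []
    for k, stage in enumerate(stages):
        n_in = None if k == 0 else widths[k - 1]
        last = k == len(stages) - 1

        def wrapped(v, stage=stage, n_in=n_in, n_out=widths[k], last=last):
            live = v if n_in is None else v[:n_in]  # ignore dummy inputs
            out = stage(live)
            return out if last else out + (filler,) * (m - n_out)

        padded.append(wrapped)
    return padded, m


def run(stages, v):
    """Apply the stages in order and record every intermediate tuple."""
    trace = []
    for stage in stages:
        v = stage(v)
        trace.append(v)
    return v, trace
```

Because the dummy positions are never read, the final result is the same for every choice of `filler`, while all buffered tuples now have the common width m.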
Binding of Registers
During register binding, variables are tied to specific registers. The register binding is represented as the ordering of the variables within the tuples, i.e. register binding is formally expressed by giving the variables a specific order.
In our example four registers are needed. They are named $r_1$, $r_2$, $r_3$ and $r_4$. After c-step 0 they are used for storing a, b, s and c, after c-step 1 they are used for storing p, s, q and $z_1$, and after c-step 2 they are used for storing r, t, $z_2$ and $z_3$. The mapping between the variables and the registers has to be optimized in order to avoid unnecessary variable transfers between registers. Such optimizations can be done by conventional synthesis algorithms outside the logic, and then be integrated within our formal synthesis environment.
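How an external binding algorithm might score a candidate ordering can be sketched with a deliberately crude cost function. Everything here is an assumption for illustration: `count_transfers` is a hypothetical metric that only counts variables which survive a c-step boundary in a different tuple position, it ignores multiplexer and wiring cost, and it assumes that the order in which the text lists the variables corresponds to registers $r_1$ to $r_4$.

```python
def count_transfers(bindings):
    """Count variables that stay live across a c-step boundary but change register.

    bindings[k][j] is the name of the variable held in register j after c-step k.
    """
    moves = 0
    for before, after in zip(bindings, bindings[1:]):
        for var in set(before) & set(after):
            if before.index(var) != after.index(var):
                moves += 1
    return moves


# Orderings read off the text, assuming listing order = register order.
listed_order = [
    ("a", "b", "s", "c"),
    ("p", "s", "q", "z1"),
    ("r", "t", "z2", "z3"),
]

# A hypothetical alternative ordering in which s keeps its position.
alternative_order = [
    ("a", "b", "s", "c"),
    ("p", "q", "s", "z1"),
    ("r", "t", "z2", "z3"),
]
```

Since the binding is expressed purely as the order of variables within the tuples, comparing two candidate bindings amounts to comparing two permutations; the logical transformation itself is unaffected by which one is chosen.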
The determination of the register binding is again performed outside the logic.
In this step of the algorithm, we construct a compound functional unit FU providing the operators for implementing the operations of each c-step (allocation), and we use the compound functional unit FU to implement the operations of the data flow graph (binding). As mentioned earlier, only two operation units are needed in our example: one multiplier (named multiplier) and one multi-purpose unit (named multipurpose) for adding, subtracting and incrementing. Their formal specifications are given below. Such descriptions are assumed to be given in a library which defines the abstract RT-level components. The correctness of such components is beyond the scope of this paper and can be established using conventional verification techniques. In general there may be several operations of each type, and optimizations in the binding between operations and functional units may reduce communication costs. In our small example the binding is unambiguous, since in each c-step there is never more than one operation of each type.
Remark: In c-step 0, the multiplier unit remains unused. Arbitrary values $z_5$ and $z_6$ are its inputs, and its output (named $z_7$) is not connected to the output of $g_0$. Since only one operand is needed for the inc operation in c-step 1, one of the data inputs becomes redundant. An arbitrary value $z_8$ is assigned to this input.
Theorem (5) is derived by applying the definitions of FU and the specifications of the multiplier and the multipurpose component.
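A compound functional unit of this kind can be sketched in Python. The component specifications and theorem (5) are not reproduced here, so `multiplier`, `multipurpose`, `fu` and the operation codes "add", "sub" and "inc" are hypothetical stand-ins, as are the concrete operations assigned to c-steps 0 and 1. The sketch only shows how idle inputs receive arbitrary values ($z_5$, $z_6$, $z_8$) without affecting the result.

```python
def multiplier(a, b):
    """Assumed specification of the multiplier component."""
    return a * b


def multipurpose(op, a, b):
    """Assumed specification of the multi-purpose component."""
    if op == "add":
        return a + b
    if op == "sub":
        return max(a - b, 0)  # assumed truncated subtraction on naturals
    if op == "inc":
        return a + 1          # second operand is ignored
    raise ValueError(f"unknown operation: {op}")


def fu(op, m1, m2, u1, u2):
    """Compound functional unit: both components evaluate in the same c-step."""
    return multiplier(m1, m2), multipurpose(op, u1, u2)


def g0_bound(v, z5=0, z6=0):
    """Hypothetical c-step 0 expressed via FU; the multiplier is idle."""
    a, b, c = v
    _z7, s = fu("add", z5, z6, a, b)  # z7 is computed but not connected
    return (a, b, s, c)


def g1_bound(v, z8=0):
    """Hypothetical c-step 1 expressed via FU; inc needs only one operand."""
    a, b, s, c = v
    p, q = fu("inc", a, b, c, z8)     # z8 feeds the redundant input
    return (p, s, q)
```

After this step every c-step is a call of the same unit `fu` with different operands and a different operation code, which is the shape the RT-level derivation relies on.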
Derivation of the RT-Level Implementation
This section describes how the preprocessed algorithmic description is converted into an RT-level description. Before this step can be performed, we must describe the temporal relationships between the algorithmic level and the RT level, i.e. we must describe how the circuit evaluating g interfaces with its environment. We call these relations communication schemes.
Communication Schemes
The datapath-oriented synthesis algorithm described so far is an adequate basis for deriving implementations for different kinds of simple communication schemes. All hardware descriptions that may be implemented by our approach must have a fixed number of c-steps $0, \ldots, n$ for each evaluation cycle. In c-step 0, the circuit reads x from the input i, in the succeeding c-steps it calculates g(x), and at c-step n it assigns g(x) to the output o. Remark: g is the function to be evaluated, n is the number of c-steps, i is the input and o is the output. Given such a formalization, the overall specification can be written as:
$\exists n.\ \mathit{specA}(g, n, i, o)$
Specification B: event-driven evaluation
At time 0, the circuit starts in a non-busy state. The circuit begins a computation cycle whenever it is not busy and gets a specific stimulus from an input signal start. Requirements:
- the circuit is not busy at time 0
- if at time t the circuit is not busy and there is no start signal at time t, then the circuit will not be busy at time t + 1
- if at time t the circuit is not busy and there is a start signal at time t, then the circuit will be busy during [t + 1, t + n], produce the required output at time t + n and be ready for new input at time t + n + 1
Formalization:
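$$
\mathit{specB}(g, n, i, o) = \exists n.\ \left(
\begin{array}{l}
\neg \mathit{busy}(0) \ \wedge \\
\forall t.\ \neg \mathit{busy}(t) \wedge \neg \mathit{start}(t) \Rightarrow \neg \mathit{busy}(t+1) \ \wedge \\
\forall t.\ \neg \mathit{busy}(t) \wedge \mathit{start}(t) \Rightarrow \big( (\forall m.\ t+1 \le m \le t+n \Rightarrow \mathit{busy}(m)) \wedge o(t+n) = g(i(t)) \wedge \neg \mathit{busy}(t+n+1) \big)
\end{array}
\right)
$$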
The overall specification for such circuits is: $\exists n.\ \mathit{specB}(g, n, i, o)$
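The three requirements of Specification B can be checked on finite simulation traces with a short Python sketch. This is an illustration only: `check_spec_b` is a hypothetical helper, the signals are modelled as equally long, non-empty Python lists indexed by time, n is passed in as a fixed value, and obligations that refer to a time beyond the end of the trace are skipped. A result of True therefore only means that no violation was observed on the trace, not that the specification holds for all time.

```python
def check_spec_b(g, n, i, o, start, busy):
    """Check the three requirements of Specification B on finite traces.

    g      -- function to be evaluated
    n      -- number of time steps a computation takes
    i, o   -- input and output traces (lists indexed by time)
    start  -- trace of the start signal (list of bool)
    busy   -- trace of the busy flag (list of bool)
    """
    horizon = len(busy)
    if busy[0]:                                   # requirement 1
        return False
    for t in range(horizon):
        if busy[t]:
            continue
        if not start[t]:                          # requirement 2
            if t + 1 < horizon and busy[t + 1]:
                return False
        else:                                     # requirement 3
            for m in range(t + 1, min(t + n, horizon - 1) + 1):
                if not busy[m]:
                    return False
            if t + n < horizon and o[t + n] != g(i[t]):
                return False
            if t + n + 1 < horizon and busy[t + n + 1]:
                return False
    return True


def double(x):
    """Hypothetical function g used for the example traces."""
    return 2 * x


# Example traces for n = 2: one computation is started at time 1.
example_i = [0, 5, 0, 0, 0, 0]
example_start = [False, True, False, False, False, False]
example_busy = [False, False, True, True, False, False]
example_o = [0, 0, 0, 10, 0, 0]
```

In the example the circuit is idle at time 0, accepts the input 5 at time 1, is busy at times 2 and 3, and delivers double(5) = 10 at time 3, so `check_spec_b(double, 2, example_i, example_o, example_start, example_busy)` reports no violation.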
Implementation Templates
For a given function g and a communication scheme such as $\mathit{specA}(g, n, i, o)$ or $\mathit{specB}(g, n, i, o)$, an RT-level implementation is to be derived. It is assumed that g has already been preprocessed according to the synthesis steps described in sections 3 through 6, so that g has the form given in (5).
It is not our intention to find an implementation and prove its correctness anew whenever we have successfully processed synthesis steps 3 through 6. Instead we use generic implementation descriptions for given communication schemes and prove a theorem stating that the generic implementation description fulfills the communication scheme. During synthesis, this theorem is just instantiated, and thereby the correct implementation is derived from the specification.
For lack of space, we cannot give a complete description of how the implementation is described within higher-order logic and how the correctness proof is performed. Figure 6 gives a sketch of what a general implementation of the specification $\mathit{specA}(g, n, i, o)$ looks like. It is assumed that g has the shape given in (5). The controller is a simple modulo-n counter, with n being the number of c-steps. In the middle there is the functional unit as described in section 6. The MUX circuits on the left and right of the functional unit determine the data flow between input, output, registers and functional unit according to the c-step the circuit is in. Since there may also be multi-purpose units within the functional unit, the operations to be performed may also depend on the c-step and are therefore steered by the controller.
Conclusion
We have described how high-level synthesis can be performed by a sequence of logical transformations. Starting from the functional descriptions of data flow graphs, we have been able to refine them into an RT-level hardware description. The novelty of our approach lies in the exploitation of the existing knowledge in synthesis in a logically correct manner. This style of formal synthesis should be acceptable to most users, since they can proceed with their designs in a customary manner and yet obtain correctness without getting into the hardship of logic. In the post-synthesis verification approach, by contrast, the proof has to be "guessed" rather than constructed by derivation.
We have shown that formal synthesis is an appropriate approach in high-level synthesis and that it is possible to formally embed existing synthesis algorithms. We believe that formal synthesis can also be a good alternative to the classical synthesis/post-synthesis-verification approach in other areas of hardware design. So far, however, our approach of formally embedding the synthesis process has only been applied to particular synthesis algorithms in order to demonstrate its applicability. It is our intention to provide a formal synthesis toolbox containing formally based synthesis procedures that cover the entire synthesis process from the algorithmic level down to the logical level. We still have a long way to go, but we believe that we have an interesting starting point.
