Abstract: Formal synthesis has become an interesting alternative to post-synthesis verification. Formal synthesis means integrating formal validation within the synthesis process by performing synthesis via rule applications. The practical applicability of formal synthesis depends very much on the efficiency of the underlying rules. This paper gives a case study on the complexity of formal synthesis programs. Experiments with two realistic-sized benchmark circuits were performed using the formal synthesis system HASH (Higher order logic Applied to Synthesis of Hardware). HASH provides means for representing and transforming circuits in a secure and logically sound manner. Furthermore, arbitrary synthesis procedures can be invoked to achieve high quality of designs. In this paper, the implementation of a formal scheduling step is used to illustrate efficiency considerations related to formal synthesis.
I. Introduction
With the enormous size of today's digital circuits and with the increasing complexity of the synthesis process applied for deriving them, guaranteeing the correctness of hardware synthesis is becoming an important matter. By correctness we mean that the synthesis result (implementation) satisfies the synthesis input (specification) in a formal mathematical sense. Since most synthesis steps are nowadays performed automatically, the correctness of hardware implementations strongly depends on the correctness of the synthesis tools involved. In general, bugs in the programs of those synthesis tools lead to faulty implementations. The "correctness by construction" paradigm that has been propagated in the synthesis domain is questionable, since an automated synthesis process is not correct by definition. Due to their complexity, the programs can only be tested partially and it is almost impossible to formally verify them. In practice, designers do not trust the synthesis tools and validate the synthesis results by simulation. However, with the increasing complexity of circuits, exhaustive simulation of the implementation is not possible. The solution is to apply formal methods to ensure that the implementation satisfies the specification.
A. Definition and Delimitation of Formal Synthesis
The correctness of the synthesis process can be guaranteed at three different phases: before, during or after the synthesis process. These approaches towards formal correctness of synthesis are called pre-synthesis verification, formal synthesis and post-synthesis verification, respectively [1]. All approaches use some kind of logic for proving the correctness of the implementation with respect to the specification. However, in pre-synthesis verification the proof is derived even before synthesis happens, in formal synthesis the proof is formally derived during synthesis, and in post-synthesis verification the proof is derived after each synthesis run.
Institute for Circuit Design and Fault Tolerance, University of Karlsruhe, Germany. E-mail: {blumen,eisen,schmid}@ira.uka.de. © IEEE. This work has been partly financed by the Deutsche Forschungsgemeinschaft, Project SCHM 623/6-1.
Pre-synthesis verification would be favorable, since it means that the correctness is proven once and for all by means of software verification. Since this is extremely tedious, especially for large programs such as synthesis tools, only few activities have been started in this area [2], [3].
At the moment, post-synthesis verification approaches [4], [5] like model checking and theorem proving are the most frequently used formal methods. First, synthesis is performed in a conventional, non-formal manner. Afterwards, the correctness of the synthesis output is proven with respect to the synthesis input. During verification, the information on how the synthesis process was performed is no longer available. This is a significant drawback of this method. Since implementations of real-world specifications often comprise a large state space, verification of large circuits is tedious or even impossible. When performing model checking [6], an increase in design size results in a combinatorial explosion in the number of global states. Although research in the model checking area has gone very far and there exist large circuits that can be verified efficiently, there are several important classes of circuits for which no efficient model checking techniques exist (counters, data paths with a large bitwidth, complex units like multipliers). Theorem proving, on the other hand, is not limited by the design size, but requires a lot of interaction. This is unacceptable for circuit designers, since performing complex correctness proofs by hand requires strong logical skills and a profound knowledge of the theorem prover.
In formal synthesis, the synthesis process and the logical validation are strongly interwoven. Unlike conventional synthesis, where hardware is represented by arbitrary data structures, formal synthesis requires a mathematical hardware representation. And unlike conventional synthesis, where the specification is refined or optimized by arbitrary procedures, formal synthesis is restricted to the application of a small core of basic mathematical rules within a theorem prover. The correctness of a formal synthesis process only depends on the correct implementation of these basic transformation rules. Therefore, formal synthesis programs are an extremely safe alternative to conventional synthesis programs.
B. State of the Art
In recent years, formal synthesis has become a new research topic and several systems have been introduced, such as T-Ruby [7], Lambda/Dialog [8], Veritas [9], DDD [10] and HASH [11]. A survey of different approaches in the formal synthesis domain is given in [1].
There is also a broad range of scientific work on deriving implementations in a transformational design style. Synthesis is performed by applying a fixed set of basic circuit transformations that are described in a more or less mathematical manner. Paper-and-pencil proofs are performed to prove the correctness of the circuit transformations [12]. However, the correctness of the implementation of the circuit transformations is not considered. In the CAMAD system [13] the algorithmic description is given in a Pascal-like notation. For transforming the program, it is first translated to a timed Petri-net representation. Both the transformations from Pascal to Petri-nets and the transformations within the Petri-nets are pieces of software that are both complex and crucial to soundness and correctness. However, there is no explicit proof for the correctness of the implementation of these critical parts. The TRADES system [14], [15] uses an approach where the elementary transformations are performed by graph-rewriting. This representation style is based on a formal semantics, but there is no explicit proof for the correctness of their implementation.
In formal synthesis, the implementation of the theorem prover's core is not formally verified either. However, this implementation only consists of a few hundred lines of code. This core, which is the only part crucial for correctness, is fixed. No matter how big and complex the formal synthesis program may become, the size of the core remains fixed.
C. Objective of the Paper
In the following section we briefly introduce our formal synthesis system HASH. In the rest of the paper, we illustrate efficiency considerations for one specific step of high-level synthesis: scheduling.
The efficiency of formal synthesis approaches depends on how efficiently hardware can be represented and on how fast circuit transformations can be realized based on the given set of logical transformations. In general, there are various ways to realize such transformations. However, the complexity very much depends on the basic set of logical transformations being used and on the order in which the logical transformations are applied. This paper investigates efficiency aspects of implementations of formal synthesis transformations. Although there is a broad range of formal synthesis implementations, there are some very general efficiency considerations. The paper discusses the implementation of the scheduling step in the HOL theorem proving environment [16]. This leads to a general discussion about the efficiency of the implementation of formal synthesis transformations (section III).
Besides optimizing the circuit representation style and the order of the rule applications, it is also possible to modify the implementation of the underlying calculus (section IV). We will introduce a modification of the HOL theorem prover and discuss its impact on soundness and efficiency. In section V, we will discuss the extra costs of formal synthesis as compared to conventional synthesis.
II. The Formal Synthesis Approach HASH
In order to cope with the complexity of large formal synthesis programs, we believe that it is absolutely necessary to make a split between the basic circuit transformation steps and the design space exploration steps. Due to this split, the heuristics for exploring the design space can be performed outside the logic and the results imported into the design transformation. Therefore, standard synthesis algorithms that abound in the literature [17], [18] can be exploited. It turns out that this split can be successfully applied to many synthesis steps and that for each synthesis step, e.g. scheduling, state minimization, retiming, etc., there must exist a unique transformation which performs this synthesis step. Note that these transformations are specific to a synthesis step but independent of the design space exploration technique. The logical transformations are implemented as a sequence of basic logical rule applications. Given an input circuit description (specification), some synthesis heuristic is started that calculates the control information. Then the output circuit description (implementation) is generated according to the control information, and a theorem is derived stating that the implementation implies or is equivalent to the specification. This basic concept is shown in figure 1.
// Insert figure 1 here //
Two important points are met independently with this strategy: quality and correctness of the implementation. The quality only depends on the algorithm that calculates the control information, whereas the correctness aspect is guaranteed due to the transformation being based on the HOL system.
The HOL system [16] is a higher order logic theorem prover environment. The core of HOL consists of five axioms and eight primitive inference rules. The only way to derive new theorems from existing ones is via these rules. False theorems cannot be proven. Since the entire formal synthesis process is nothing but a conversion in HOL, correctness is guaranteed implicitly. Faulty implementations cannot be produced, no matter how the design space exploration step was implemented. If the control information produced by the heuristic is flawed, HOL will at some point raise an exception, but it will never derive a false result. Our formal synthesis program either leads to a correct implementation or to no implementation but an exception. In case of an exception, a message is produced telling the user in which synthesis step the error occurred. In conventional synthesis programs, however, such bugs might lead to faulty implementations.
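The following minimal Python sketch illustrates the principle described above: theorems can only be produced by a small set of trusted rules, and a flawed derivation ends in an exception rather than in a false theorem. It is an illustration under stated assumptions, not the HOL kernel; the class `Theorem` and the rules `refl`, `sym`, `trans` and `commute_add` are hypothetical names chosen for this example and are not claimed to be the eight primitive rules of HOL. Python cannot enforce the abstraction as strictly as an ML abstract data type does.

```python
class Theorem:
    """An equation lhs = rhs that was derived by one of the rules below."""
    _kernel_key = object()  # capability intended to be used only by the rules

    def __init__(self, key, lhs, rhs):
        if key is not Theorem._kernel_key:
            raise PermissionError("theorems can only be created by kernel rules")
        self.lhs, self.rhs = lhs, rhs


def refl(term):
    """Rule: derive term = term."""
    return Theorem(Theorem._kernel_key, term, term)


def sym(thm):
    """Rule: from a = b derive b = a."""
    return Theorem(Theorem._kernel_key, thm.rhs, thm.lhs)


def trans(thm1, thm2):
    """Rule: from a = b and b = c derive a = c; fails if the middle terms differ."""
    if thm1.rhs != thm2.lhs:
        raise ValueError("trans: middle terms do not match")
    return Theorem(Theorem._kernel_key, thm1.lhs, thm2.rhs)


def commute_add(a, b):
    """Rule (hypothetical): derive add(a, b) = add(b, a)."""
    return Theorem(Theorem._kernel_key, ("add", a, b), ("add", b, a))


# Untrusted code may combine the rules in any order. A wrong combination
# raises an exception; it cannot yield a theorem that was not derived.
th = commute_add("x", "y")
roundtrip = trans(th, sym(th))          # add(x, y) = add(x, y)
try:
    trans(th, th)                        # middle terms differ
except ValueError as err:
    print("rejected:", err)
```

In this sketch the untrusted part (the last lines) plays the role of the synthesis program, and the four rules play the role of the fixed core.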
In our approach, circuit transformations guarantee that the functional behavior is preserved. Besides functional correctness, there may be further requirements for the implementation such as timing and area constraints. However, checking whether such constraints are fulfilled can easily be done by non-formal methods. Therefore, it is not necessary to formally represent and verify such constraints in logic.
III. Scheduling Transformation
As mentioned before, our formal synthesis methodology can be applied to many synthesis steps. In this paper we want to restrict ourselves to the scheduling task within high-level synthesis to demonstrate the applicability of our approach.
High-level synthesis converts an algorithmic circuit description into a structure at the Register-Transfer (RT) level. The major steps in high-level synthesis are scheduling, allocation of storage, functional and interconnection units, binding the allocated hardware onto some library components and interface synthesis.
For a better understanding, the starting point for scheduling in this paper is a basic block of some algorithmic description. It represents a pure data flow graph. However, this is no restriction of our approach. The handling of mixed control/data flow graphs is described in [19].
The scheduling task assigns a control step (c-step) to each operation in the algorithmic specification. There are various heuristic scheduling algorithms trying to minimize the number of control steps and the hardware requirements [18], [20]. Figure 2 gives an example of a scheduling step. The input data flow graph represented by the function $g$ is split into a sequence of functions $g_4 \circ g_3 \circ g_2 \circ g_1$, where $g_i$ corresponds to c-step $i$. During scheduling, the number of c-steps has to be determined and each basic operation within $g$ has to be assigned to one of the c-steps.
// Insert figure 2 here //
Our approach allows pipelining and, to some extent, chaining. Chaining is not possible if it leads to combinatorial cycles, which cannot be expressed by our representation style (see section III-A). Furthermore, our approach currently does not support multi-cycle operations. However, if those operations can be performed by pipelining, i.e. they can be expressed by a sequence of partial operations, it is possible to integrate them. On the other hand, we currently have no solution for multi-cycle operations that are not pipelined and are performed by units with an internal controller and registers.
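As an illustration of what a schedule table contains, the sketch below assigns c-steps with a plain ASAP (as soon as possible) strategy and groups the operations by c-step, one group per function $g_i$. ASAP is used here only because it is short; it is not one of the heuristics named in this paper, and it assumes one operation per c-step along each path (no chaining). The function names and the small example graph are hypothetical.

```python
def asap_schedule(deps):
    """Assign each operation the earliest control step allowed by its inputs.

    deps maps an operation name to the list of operations whose results it
    consumes; operations fed only by primary inputs have an empty list.
    """
    steps = {}

    def step_of(op):
        if op not in steps:
            # One c-step after the latest producer, or c-step 1 if there is none.
            steps[op] = 1 + max((step_of(p) for p in deps[op]), default=0)
        return steps[op]

    for op in deps:
        step_of(op)
    return steps


def group_by_step(steps):
    """Turn the schedule table into one list of operations per c-step."""
    k = max(steps.values())
    return [sorted(op for op, s in steps.items() if s == i) for i in range(1, k + 1)]


# Hypothetical data flow graph with five operations.
deps = {"m1": [], "m2": [], "m3": [], "a1": ["m1", "m2"], "s1": ["a1"]}
table = asap_schedule(deps)
print(table)
print(group_by_step(table))
```

The resulting table is exactly the kind of control information that is handed to the logical transformation; the transformation itself does not depend on how the table was computed.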
A. Formalizing Data Flow Graphs
In general, there are various ways to formalize hardware descriptions in logic. The representation style may have a significant impact on the efficiency of the hardware transformations. However, there are common problems. When describing circuit structures, one needs (local) variables to indicate interconnections. This paper will discuss efficiency aspects when transforming such structures in logic.
In our approach, data flow graphs are represented by means of functions that are nothing but simple compositions of basic operations. They are formalized using λ-expressions [21]. The following Backus-Naur form shows the syntactical structure of data flow graphs:
vblock := variable | "(" vblock "," vblock ")"
expr := variable | "(" expr "," expr ")" | operator "(" expr ")"
DFG-term := "λ" vblock "." "let" vblock "=" expr "in" expr
The two terms illustrated in figure 3 show how the original and the scheduled data flow graph in figure 2 are represented in HOL.
// Insert figure 3 here //
The expressions in figure 3 describe input/output functions in terms of their basic operations. The functions map some input tuple to an output tuple. Each let-term describes the connectivity of one operation, i.e. a node of the data flow graph. Since these terms represent pure data flow graphs, i.e. there are no cycles, a partial ordering on the set of nodes is induced. This partial order corresponds to the fact that some operation A must be executed before B if the output of A happens to be an input to B.
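The let-based representation described above can be mimicked outside the logic by a small data structure. The sketch below is only an analogy in Python, assuming that a data flow graph is given as an input tuple, a list of let-bindings (one per node) and an output tuple; the class names, the operator table `OPS` and the example graph are hypothetical and not taken from figure 3.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Let:
    out: str      # variable bound by this let-term
    op: str       # name of the basic operation
    args: tuple   # variables consumed by the operation


@dataclass(frozen=True)
class DFG:
    inputs: tuple   # variables of the lambda-bound input tuple
    lets: tuple     # one Let per node of the data flow graph
    outputs: tuple  # variables forming the output tuple


OPS = {"add": lambda a, b: a + b,
       "sub": lambda a, b: a - b,
       "mul": lambda a, b: a * b}


def respects_dependencies(g):
    """Check that every let only uses variables defined before it."""
    defined = set(g.inputs)
    for let in g.lets:
        if not all(a in defined for a in let.args):
            return False
        defined.add(let.out)
    return all(o in defined for o in g.outputs)


def evaluate(g, values):
    """Evaluate the graph as a function from an input tuple to an output tuple."""
    env = dict(zip(g.inputs, values))
    for let in g.lets:
        env[let.out] = OPS[let.op](*(env[a] for a in let.args))
    return tuple(env[o] for o in g.outputs)


# \(a, b, c). let x = mul(a, b) in let y = add(x, c) in let z = sub(y, x) in z
g = DFG(inputs=("a", "b", "c"),
        lets=(Let("x", "mul", ("a", "b")),
              Let("y", "add", ("x", "c")),
              Let("z", "sub", ("y", "x"))),
        outputs=("z",))
print(respects_dependencies(g), evaluate(g, (2, 3, 4)))
```

Any ordering of the let list that passes `respects_dependencies` is a linearization of the partial order induced by the data dependencies.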
This partially ordered data flow graph is represented as an arbitrarily ordered list in which the data dependencies between the nodes are respected. This section describes how the scheduling process shown in figure 2 is implemented as a conversion in HOL. Our conversion is steered by external control information, the schedule table. The schedule table indicates which operation has to be performed in which control step.
The approach is based on a conversion for normalizing functions. We will first describe this conversion and then describe how scheduling can be realized based on this conversion.
C. Function Normalization
Both the input and the output of the scheduling process are nothing but simple compositions of the same basic operations (see figure 3). Normalizing such representations is straightforward. The general algorithm looks as follows:
We will now introduce a simple conversion which is based on this normalization scheme. Given some data flow graph representation $g$ and some schedule table, the following steps have to be performed:
1. produce $g' = g_k \circ \dots \circ g_2 \circ g_1$ according to the schedule table
2. derive $\vdash g = \hat{g}$ and $\vdash g' = \hat{g}$ via normalization
3. combine the equations $\vdash g = \hat{g}$ and $\vdash g' = \hat{g}$ to $\vdash g = g'$
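The three steps above can be mirrored by a small Python sketch that compares normal forms. It is only an analogy: it does not derive theorems, and it assumes a simplified representation in which a graph is a triple of input names, a list of `(output, operator, arguments)` bindings and output names. All names and the example graph are hypothetical.

```python
def normalize(inputs, lets, outputs, actuals=None):
    """Expand every let so that the outputs are pure operator terms.

    If actuals is given, the input variables are replaced by these terms,
    which is how the stages of a composition are chained.
    """
    env = dict(zip(inputs, actuals if actuals is not None else inputs))
    for out, op, args in lets:
        env[out] = (op,) + tuple(env[a] for a in args)
    return tuple(env[o] for o in outputs)


def normalize_composition(stages):
    """Normal form of g_k o ... o g_1; stages are listed from g_1 to g_k."""
    terms = stages[0][0]
    for inputs, lets, outputs in stages:
        terms = normalize(inputs, lets, outputs, actuals=terms)
    return terms


def same_normal_form(g, stages):
    """Steps 2 and 3: both sides must normalize to the same term."""
    return normalize(*g) == normalize_composition(stages)


g = (("a", "b", "c"),
     [("x", "mul", ("a", "b")), ("y", "add", ("x", "c")), ("z", "sub", ("y", "x"))],
     ("z",))
# Step 1: one stage per control step; live values are passed on explicitly.
stages = [
    (("a", "b", "c"), [("x", "mul", ("a", "b"))], ("x", "c")),
    (("x", "c"), [("y", "add", ("x", "c"))], ("y", "x")),
    (("y", "x"), [("z", "sub", ("y", "x"))], ("z",)),
]
print(same_normal_form(g, stages))
```

If the staged version passes on the wrong values, the two normal forms differ and the comparison fails, which corresponds to the logical transformation raising an exception.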
The major drawback of this universal conversion is the complexity of step 2 when dealing with data flow graphs with a large depth, i.e. maximum number of operations on a path from some input to some output. Data flow graphs whose intermediate nodes have larger fanouts, i.e. the output of a node is used by many successor nodes as inputs, lead to a number of duplications during β-reduction. Since β-reduction has to traverse the entire term and since such β-redexes can be nested, the term size and time consumption in step 2 may grow exponentially with the depth.
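A tiny experiment makes the exponential growth concrete. The sketch below builds a hypothetical chain of let-bindings in which every node uses its predecessor twice (nested fanout of two), expands all bindings and counts the nodes of the resulting term. The let form grows linearly with the depth, whereas the expanded term has $2^{d+1} - 1$ nodes for depth $d$.

```python
def chain_with_fanout(depth):
    """Let-bindings v1..v_depth where each node consumes its predecessor twice."""
    return [(f"v{i}", "add", (f"v{i - 1}", f"v{i - 1}")) for i in range(1, depth + 1)]


def expand(lets, out):
    """Replace every let-bound variable by its definition (full expansion)."""
    env = {"v0": "v0"}
    for name, op, args in lets:
        env[name] = (op,) + tuple(env[a] for a in args)
    return env[out]


def tree_size(term):
    """Number of nodes when the term is written out as a tree."""
    if isinstance(term, str):
        return 1
    return 1 + sum(tree_size(t) for t in term[1:])


for depth in (1, 2, 5, 10):
    lets = chain_with_fanout(depth)
    print(depth, len(lets), tree_size(expand(lets, f"v{depth}")))
```

Python shares the duplicated subterms in memory, so only the tree size is measured here; a term representation without sharing has to pay for every copy.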
E. An Advanced Conversion
The universal conversion does not exploit any knowledge about how the synthesis step was performed. The advanced conversion exploits this knowledge. In the advanced scheduling conversion the data flow graph is split step by step rather than all at once. Let $k$ be the number of control steps. The scheduling transformation is performed by a sequence of $k - 1$ transformations, each of them splitting apart one function $g_i$. Each step transformation is similar to the universal conversion except that step 2 is modified: β-reduction is only applied to those variables whose corresponding nodes have been assigned to the considered control step. Although this modified normalization step does not eliminate all β-redexes, the terms achieved after each step will be equal. Hence, for data flow graphs with larger fanouts the exponential complexity associated with step 2 is avoided and the overall cost is reduced. Since the advanced conversion performs step 2 several times, it is to be expected that it is slower for small data flow graphs, but faster for larger ones.
In the following section we present some experimental results that demonstrate the differences between the simple and the advanced conversion.
F. Experimental Results
We consider two scalable data flow graphs. As a first example, we use a data flow graph that realizes the discrete cosine transform (DCT), which is frequently used for image compression. The DCT of an image with pixels x(n, m) [...]
The data flow graph consists of $p + q$ subtracters, $p(q + 1)$ multipliers and $q(p - 1)$ adders. The critical path has a length of $3q + 2$ nodes. The structures of the two data flow graphs differ both in the depth and in the number of reused intermediate results.
In contrast to the DCT, the PD data flow graphs comprise nodes with larger fanouts, which are additionally nested. Figure 4 shows an example in which these nodes are marked.
// Insert figure 4 here //
Figure 5 shows the runtime for transforming the PD data flow graph with the simple and the advanced conversion.
We set $p$ to 25 and increased $q$, thus varying the size of the benchmark. For this data flow graph, the use of the advanced conversion leads to far better results. Due to the exponential memory consumption of the simple conversion, the computer's capacity of 1.2 GB was exceeded at about 600 nodes.
// Insert figure 5 here //
When applying the same conversions to the DCT example, however, the simple conversion appears to be superior (see figure 6). This is due to the fact that both the length of the critical path and the fanouts are smaller than in the PD example. However, for even larger data flow graphs, it is to be expected that the runtime of the simple conversion grows faster and finally exceeds that of the advanced conversion.
In this section we present an optimized HOL theorem proving system named HOL'. HOL' differs from HOL in two points: the term representation is changed and two basic rules are added.
A. Changing the Term Representation
In the HOL theorem prover, terms are implemented in a so-called de Bruijn style [21], which means that free and bound variables are represented differently. A variable x is said to be a bound variable iff it is within the scope of a λ-abstraction whose identifier is x; otherwise x is called free.
In the de Bruijn representation, free variables are stored with their name and type, whereas bound variables are only represented by a number linking to the λ-abstraction they refer to. The number describes the distance between the bound variable and the λ-abstraction in terms of λ-abstractions. Example: Consider the term λx.(λy. x + y + z). In the subterm x + y + z, the two bound variables x and y are internally represented by 1 and 0, respectively. One advantage of the de Bruijn representation is that checking the α-equivalence of two terms (only the names of bound variables are changed) can be performed in linear time.
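The example above can be reproduced with a short Python sketch that converts a name-carrying term into a de Bruijn term. The tuple encoding of terms and the helper names are assumptions made for this illustration; types are omitted and the operator + is simply treated as a free name.

```python
def var(name):
    return ("var", name)


def lam(name, body):
    return ("lam", name, body)


def app(fun, arg):
    return ("app", fun, arg)


def plus(a, b):
    """a + b, written as the application of the free name '+' to a and b."""
    return app(app(var("+"), a), b)


def to_de_bruijn(term, binders=()):
    """Replace bound variables by their distance to the binding abstraction.

    binders lists the enclosing abstractions, innermost first.
    """
    kind = term[0]
    if kind == "var":
        name = term[1]
        if name in binders:
            return ("bound", binders.index(name))
        return ("free", name)
    if kind == "lam":
        return ("lam", to_de_bruijn(term[2], (term[1],) + binders))
    return ("app", to_de_bruijn(term[1], binders), to_de_bruijn(term[2], binders))


def alpha_equivalent(s, t):
    """Terms are alpha-equivalent iff their de Bruijn forms are identical."""
    return to_de_bruijn(s) == to_de_bruijn(t)


t1 = lam("x", lam("y", plus(plus(var("x"), var("y")), var("z"))))
t2 = lam("a", lam("b", plus(plus(var("a"), var("b")), var("z"))))
print(to_de_bruijn(t1))
print(alpha_equivalent(t1, t2))
```

Once both terms are in de Bruijn form, the α-equivalence check is a plain structural comparison, which is where the linear-time bound comes from; the conversion itself is only needed in this sketch because the terms start out in name-carrying form.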
However
Therefore, in HOL' a so-called name-carrying term representation was implemented. Here, the bound variables are also stored with their name and type. This has the advantage that many terms, such as circuit descriptions, can be handled in a far more efficient manner. However, some basic logical operations, such as the α-equivalence check, become less efficient. Furthermore, this approach also increases memory consumption, since bound variables are no longer represented by a single number but by their name and type.
B. Introduction of More Efficient Functions
The original HOL system only allows one single β-reduction at a time. One can increase the efficiency of the scheduling transformation by performing several β-reductions in a single term traversal step. By adding this conversion to the core, we improved the efficiency of the theorem prover. This advanced β-conversion is equipped with a filter, i.e. a characteristic function, for selecting the bound variables that are to be expanded. This is useful for our advanced conversion, which expands only those variables that occur within the considered control step (see section III-E).
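The idea of a filtered β-conversion that contracts several redexes in one traversal can be sketched as follows. This is a Python illustration under simplifying assumptions, not the rule added to HOL': terms are untyped tuples, all bound names are assumed to be distinct (so no renaming is needed to avoid variable capture), and redexes that only arise after substitution are not contracted. The name `select` stands for the filter function.

```python
def beta_filtered(term, select, env=None):
    """Contract every redex (\\v. body) arg with select(v) in one traversal.

    env carries the pending substitutions for the selected variables.
    Assumption: bound names are pairwise distinct, so no capture can occur.
    """
    env = env or {}
    kind = term[0]
    if kind == "var":
        return env.get(term[1], term)
    if kind == "lam":
        _, name, body = term
        return ("lam", name, beta_filtered(body, select, env))
    _, fun, arg = term  # application
    new_arg = beta_filtered(arg, select, env)
    if fun[0] == "lam" and select(fun[1]):
        # Selected redex: continue in the body with one more substitution.
        return beta_filtered(fun[2], select, {**env, fun[1]: new_arg})
    return ("app", beta_filtered(fun, select, env), new_arg)


def let(name, value, body):
    """let name = value in body, encoded as the redex (\\name. body) value."""
    return ("app", ("lam", name, body), value)


v = lambda n: ("var", n)
body = ("app", ("app", v("f"), v("x")), v("y"))
term = let("x", v("a"), let("y", v("b"), body))

# Expand only x, e.g. because only x belongs to the considered control step.
print(beta_filtered(term, lambda n: n == "x"))
# Expand both variables in the same single traversal.
print(beta_filtered(term, lambda n: True))
```

With the filter restricted to x, the binding of y is left in place, which mirrors the partial normalization used in the advanced conversion.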
In the original HOL system, a paired β-reduction with an n-tuple needs n term traversal steps. However, based on the advanced β-conversion, paired β-reduction can be performed within two traversals. For the implementation of the advanced paired β-conversion we added a second conversion that first turns a paired β-redex into an equivalent unpaired representation. Example: (λ(x, y). t) (a, b) is turned into (λx. (λy. t)) a b. In a second traversal, the advanced β-conversion is applied at the locations indicated by the filter function. These two extra conversions are not only tailored to our application, but are of general interest for other users of the HOL system. However, it has to be noted that enlarging the core is critical with respect to soundness and correctness, since faults within the implementations of the modified rules may violate the consistency of the calculus. Therefore, one has to be extra careful with the implementation and one should be very restrictive with the size of the core.
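The switch from a paired to an unpaired redex described above can be sketched in a few lines of Python. The sketch assumes a tuple encoding in which a pattern is either a variable name or a pair of patterns and an actual argument is either an arbitrary term or `("pair", left, right)`; the function names are hypothetical and the code is not the conversion implemented in HOL'.

```python
def flatten_pattern(pattern):
    """List the variables of a (possibly nested) pair pattern from left to right."""
    if isinstance(pattern, str):
        return [pattern]
    left, right = pattern
    return flatten_pattern(left) + flatten_pattern(right)


def split_actual(pattern, actual):
    """Split the actual argument along the shape of the pattern."""
    if isinstance(pattern, str):
        return [actual]
    if actual[0] != "pair":
        raise ValueError("actual argument does not match the pair pattern")
    left, right = pattern
    return split_actual(left, actual[1]) + split_actual(right, actual[2])


def unpair_redex(pattern, body, actual):
    """Turn (\\pattern. body) actual into (\\x1. ... (\\xn. body)) a1 ... an."""
    term = body
    for name in reversed(flatten_pattern(pattern)):
        term = ("lam", name, term)
    for arg in split_actual(pattern, actual):
        term = ("app", term, arg)
    return term


# (\(x, y). t) (a, b)  becomes  (\x. (\y. t)) a b
print(unpair_redex(("x", "y"), ("var", "t"),
                   ("pair", ("var", "a"), ("var", "b"))))
```

The result contains only ordinary abstractions and applications, so a β-conversion for unpaired redexes can process it in a single further traversal.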
Figures 7 and 8 compare runtimes for scheduling conversions within HOL and HOL'. They show that both the simple and the advanced conversion run much faster with HOL'. Again, due to nested β-redexes in the PD data flow graphs, the simple conversion ran into memory problems for larger data flow graphs.
// Insert figure 7 here //
// Insert figure 8 here //
V. The Formal Synthesis Scenario Applied to Scheduling
Figure 9 demonstrates how the scheduling step is performed, using the small example in figure 2. Given a data flow graph, some scheduling heuristic (here: force-directed [23]) is started. The heuristic returns a schedule table assigning each operation in the data flow graph to a control step. This heuristic step has nothing to do with logic. The schedule table is then handed to the logical transformation, which maps the input data flow graph to the output data flow graph according to the schedule table. Additionally, a correctness theorem is derived.
// Insert figure 9 here //
Figure 10 shows a HOL session performing this scheduling step. The HOL conversion SCHEDULING_CONV accomplishes the scheduling transformation according to the schedule which is determined by the scheduling heuristic. SCHEDULING_CONV gets the scheduling heuristic as a parameter. In this example, we applied the force-directed scheduling heuristic. Any other scheduling heuristic can be embedded as well.
// Insert figure 10 here //
Our formal synthesis system HASH can be combined with every synthesis tool, provided that the synthesis tool can deliver the control information, such as the schedule table, to the HASH transformation. The circuit designer can then select the synthesis heuristic and HASH performs the transformation according to the result of the heuristic. This can even be done in the background while the designer is busy with the next synthesis step.
The overall cost of performing a single formal synthesis step is the sum of the cost of performing the synthesis step conventionally by some heuristic and the cost of the logical transformation. Therefore, the ratio between these parts is of interest. Figures 11 and 12 show the runtimes for both the design space exploration and the transformational part of the formal synthesis step for the DCT data flow graph and the PD data flow graph, respectively. For determining the schedule table, we applied both the force-directed (FD) and the ALAP program. The circuit transformation was performed in HOL' (the modified HOL system) by applying the simple conversion for the DCT and the advanced conversion for the PD.
// Insert figure 11 here //
As can be seen, the runtime for sophisticated design space exploration techniques such as force-directed scheduling exceeds the runtime for the transformational part by far. Even for simple design space exploration techniques, the ratio between design space exploration and circuit transformation seems reasonable. Additionally, the results show that even large data flow graphs can be synthesized with our method and that the runtime for the transformations depends on the structure of the data flow graph but is independent of the design space exploration technique involved.
// Insert figure 12 here //
Of course, there may be circuits and synthesis steps where the extra cost for formal synthesis is much higher than the cost for conventional synthesis. However, one has to be aware of the fact that the alternative to formal synthesis is a conventional synthesis with subsequent exhaustive simulation or post-synthesis verification, which is in general extremely costly.
In this paper, we have restricted ourselves to the scheduling problem. Formal synthesis steps for binding, allocation and interface synthesis have also been realized, but are not discussed here. Thus we are able to perform a complete formal high-level synthesis for data flow graphs.
VI. Conclusions
In this paper we have presented a technique to derive correct implementations in a secure manner. This technique is not restricted to a certain abstraction level (see [11] for performing formal synthesis at the RT-level).
For the scheduling task, we have illustrated the efficiency problem that arises when performing the synthesis step by means of a logical transformation. It turns out that it is very important to be aware of the complexity of the basic logical transformations when constructing formal synthesis programs. On the other hand, it may be worthwhile to modify the basic logical transformations themselves.
Experimental results have demonstrated that formalizing the synthesis process as a sequence of logical transformations may not be a significant performance factor. Our experiences with formal synthesis programs at the algorithmic level and the RT-level lead us to claim that efficient formal synthesis implementations are applicable even for large circuits and that the extra costs as compared to conventional synthesis are reasonable, or at least lower than the extra costs for post-synthesis verification.
Christian Blumenröhr studied electrical engineering at the Universität Karlsruhe and received his diploma in 1995. Since 1995 he has been a research assistant at the Institute for Computer Design and Fault Tolerance at the same university. There he was first a member of the machine learning group. Since 1996 he has been with the formal synthesis group and now works on a research project called formal
