Implementation issues about the embedding of existing high level synthesis algorithms in HOL by Eisenbiegler, Dirk et al.
Implementation Issues about the Embedding of Existing High Level Synthesis Algorithms in HOL
Dirk Eisenbiegler (1), Christian Blumenröhr (1), and Ramayya Kumar (2)
(1) Institute for Circuit Design and Fault Tolerance (Prof. Dr.-Ing. D. Schmid), University of Karlsruhe, Germany
(2) Forschungszentrum Informatik (Prof. Dr.-Ing. D. Schmid), Karlsruhe, Germany
e-mail: {eisen, blumen}@ira.uka.de, kumar@fzi.de
Abstract This article describes the embedding of high level synthesis
algorithms in HOL For given standard synthesis steps we describe how
its data can be mapped to terms in HOL and the synthesis process be ex
pressed by means of a logical derivation In contrast to postsynthesis ver
i
cation techniques our approach is constructive in a sense that the proof
is derived during synthesis rather than guessed afterwards Therefore
one does not get into the hardship of NPcompleteness or undecidability
Our approach ensures correctness based on the HOL system and is also
performed fully automatically
1 Introduction
During the hardware design process of digital circuits, more and more complex tools are involved. Due to their complexity, guaranteeing the correctness of synthesis software is crucial: bugs in the software may lead to incorrect hardware implementations.
One approach towards proving the correctness of implementations is post-synthesis verification. An excellent overview of verification techniques is given in [Gupt, Melh]. However, full automation is only achievable for comparatively small circuits at lower levels of abstraction. For large circuits, verification algorithms either run into space/time hurdles or the user has to interact and perform some proofs by hand.
Formal synthesis is another approach towards hardware correctness. We consider formal synthesis as a derivation of the implementation from the specification by logical refinements.
We are developing a formal synthesis toolbox called HASH (Higher order logic Applied to Synthesis of Hardware), which exploits standard synthesis algorithms and is applicable to different abstraction levels. It is based on the HOL system, i.e. hardware is represented by means of HOL terms and only rule applications are used to transform hardware descriptions. As opposed to conventional synthesis tools, where there is no restriction on how to compute the implementation, our approach can only produce correct hardware implementations. The reliability of our synthesis conversions depends only on the correctness of the implementation of the HOL core and is independent of the complexity of the conversions. In this article, we will present the high level synthesis component of HASH.
Footnote: This work has been partly financed by the Deutsche Forschungsgemeinschaft, Project SCHM.
Other approaches in the area of formal synthesis are [Lars, AHL, HaLD, John, JoSh]. All these techniques have one common drawback, namely that they do not exploit the knowledge of the algorithms which abound in synthesis. Additionally, the interactions to be performed during synthesis are at the schematic level or from a logician's point of view. The novelty of our current approach is that no new synthesis algorithms, either formal or informal, are proposed; instead, a general scheme for logically embedding various existing synthesis algorithms within a formal setup is presented.
The outline of this paper is as follows. We will first describe the high level synthesis procedure in an informal manner (section 2). Then the logical representations and the logical transformations corresponding to the synthesis process are introduced in sections 3, 4 and 5. Afterwards, we will present some experimental results (section 6) and finally discuss the embedding of existing high level synthesis techniques (section 7).
2 Our High Level Synthesis Process
The starting point of our approach is a so-called basic block. Basic blocks are data flow graphs describing the input/output relation by a composition of atomic operations. The timing of the atomic operations is static in the sense that they can be executed in fixed time (see figure 1). The functional relation represents a pure algorithmic description without any timing information.
The result of high level synthesis is a structure at the RT-level. Our synthesis process consists of the following steps: scheduling, register allocation and binding, and allocation and binding of functional units. Interface synthesis will not be considered in this paper.
Our implementation does not yet allow pipelining. Instead, all hardware resources (functional units as well as registers) will be reused during different clock ticks of one evaluation period.
Also, the synthesis approach currently does not support any control flow. For more details on high level synthesis, see [GDWL, CaWo].
Scheduling
Scheduling determines the number of control steps k needed for the evaluation of the algorithm and assigns each operation to one particular control step 1, ..., k (see figure 1). There are mainly two costs that have to be considered: the number of control steps k and the hardware resources required for implementing the operations. During scheduling, a trade-off has to be found between the number of control steps k (speed of the implementation) and the hardware requirements (size of the implementation).

Fig. 1. High Level Synthesis Process
There are mainly two kinds of scheduling algorithms: ones with pregiven hardware constraints for the operation units and others with pregiven timing constraints. However, the implementation at the RT-level consists not only of operation units but also of communication units. The cost of these units can only be roughly estimated during the scheduling process. There are also advanced synthesis algorithms whose cost functions cover timing aspects as well as different hardware constraints. Such algorithms can also handle sophisticated synthesis tasks. A scheduling algorithm that is suitable for control flow paths is, e.g., path-based scheduling [Camp], whereas [PaKn] introduces a scheduling technique named force-directed scheduling, which is only applicable to data flow graphs.
Register Allocation and Binding
Register allocation determines the number of registers needed for storing intermediate results between two control steps. Register binding determines, for every control step, a mapping between registers and auxiliary variables (intermediate results).
If there is only one single data type for all auxiliary variables, register allocation becomes trivial: the number of registers needed equals the maximum number of auxiliary variables between two control steps. In general, there may be auxiliary variables with different types, and registers of different sizes will be needed to store them. This makes register allocation more complex.
Register allocation and binding have an impact on the size of the communication parts between function units and registers. Good register bindings and allocations avoid additional hardware.
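For the single-type case, the register count can be sketched in a few lines of Python. This is an assumption-laden illustration, not the HASH implementation: each auxiliary variable is described by the control step in which it is computed (`defined`) and the last step in which it is read (`last_use`), and graph inputs are ignored. All names are hypothetical.

```python
def registers_needed(defined, last_use):
    """Largest number of auxiliary variables that are alive across the
    boundary between two consecutive control steps."""
    k = max(last_use.values())
    best = 0
    for boundary in range(1, k):          # boundary between step c and c + 1
        alive = sum(1 for v in defined
                    if defined[v] <= boundary < last_use[v])
        best = max(best, alive)
    return best

# Hypothetical four-step schedule with five auxiliary variables.
defined = {"s": 1, "p": 2, "q": 2, "r": 3, "t": 3}
last_use = {"s": 3, "p": 3, "q": 3, "r": 4, "t": 4}
print(registers_needed(defined, last_use))
```

In this example the boundary between steps 2 and 3 is the widest one, since s, p and q all have to be stored there.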
Function Unit Allocation and Binding
In this step we construct a compound functional unit FU providing the operators
for implementing the operations of each control step allocation and we use
the compound functional unit FU to implement the operations of the data ow
graph binding The function units are assumed to be given in a library The
library describes the mapping between its components and the operations they
can perform There may be function units that are implementations of single
operations as well as multipurpose units with control input signals for selecting
dierent operations In our example the function units consists of a multiplier
implementing the  operation and a multipurpose unit implementing the  and
 operation where the operation is selected by a control signal having one of
the values  and  respectively Besides the functional aspects the library also
contains cost information such as area and power consumption
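Binding the operations of one control step to library units can be sketched as follows. The greedy strategy, the library layout and the names `bind_step`, `MUL` and `ADDSUB` are assumptions made for this illustration; a real binder would also consult the cost information of the library.

```python
def bind_step(step_ops, library):
    """Greedily bind each operation of one control step to a free unit.

    step_ops: list of (operation name, operator kind)
    library:  unit name -> set of operator kinds the unit implements
    """
    free = dict(library)                  # units not yet used in this step
    binding = {}
    for name, kind in step_ops:
        unit = next((u for u, kinds in free.items() if kind in kinds), None)
        if unit is None:
            raise ValueError(f"no free unit for {name} ({kind})")
        binding[name] = unit
        del free[unit]
    return binding

# One multiplier and one multi-purpose add/subtract unit, as in the text.
library = {"MUL": {"*"}, "ADDSUB": {"+", "-"}}
print(bind_step([("r", "*"), ("t", "-")], library))
```

A step containing both an addition and a subtraction cannot be bound with this library, because the multi-purpose unit performs only one of its operations per control step; `bind_step` raises `ValueError` in that case.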
3 Formal Representation of Data Flow Graphs
The eciency of software strongly depends on the underlying data structures
In synthesis tools suitable hardware representations have to be found This also
holds for our formal synthesis approach where hardware is represented by means
of HOL terms In our approach data ow graphs are represented as follows
x      xm
let houtvars i  op hinvars i in
let houtvarsi  ophinvarsi in

let houtvarsli  oplhinvarsli in
y      yn
The above structure describes its inputoutput function in terms of its basic
operations x  x     xm are the inputs y  y     yn the outputs and op 
op   opl the operations of the data ow graph letterms are only used for a
better readability of redices Each letterm describes the connectivity of one
operation For all i hinvarsii and houtvarsii denote the inputs and outputs of
operation opi respectively The inputs and outputs of operations are tuples
with each operation having a specic arity of its input and output tuple
Since these terms represent pure data ow graphs ie no cycles are present
a partial ordering on the set of nodes is induced This partial order corresponds
to the fact that some operation A must be executed before B if the output
of A happens to be an input to B This partially ordered data ow graph is
represented as an arbitrarily ordered list whereby the data dependency between
the nodes is respected
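The ordering requirement on the node list can be checked mechanically. The Python sketch below assumes a node is a triple (outvars, operator, invars); this layout and the name `respects_dependencies` are illustrative assumptions and not the ML data type used in HASH.

```python
def respects_dependencies(inputs, nodes):
    """True if every node reads only graph inputs or outputs of earlier nodes."""
    known = set(inputs)
    for outvars, _op, invars in nodes:
        if not set(invars) <= known:
            return False
        known |= set(outvars)
    return True

# Hypothetical three-node graph over the inputs a, b, c.
good = [(("s",), "+", ("b", "c")),
        (("p",), "*", ("a", "b")),
        (("t",), "-", ("p", "s"))]
bad = [good[2], good[0], good[1]]      # t is listed before p and s exist

print(respects_dependencies(("a", "b", "c"), good))
print(respects_dependencies(("a", "b", "c"), bad))
```

Any list order that passes this check is an admissible representation of the same graph, which is why the order of independent nodes is arbitrary.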
The following term gives an example of a data flow graph representation in HOL. The synthesis state I in figure 1 is formally represented as follows:

  λ(a, b, c).
    let p = a * b in
    let s = b + c in
    let q = s + c in
    let r = p * q in
    let t = p − s in
    let x = r + t in
    let y = r * t in
    (x, y)
A constructor function named mk_dfg and a destructor function dest_dfg have been implemented. In ML, the nodes of a data flow graph are described by a list of type

  (operator: term, invars: term list, outvars: term list) list

mk_dfg maps ML terms of type dfg to the corresponding HOL term; dest_dfg is the inverse function.
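The role of such a constructor/destructor pair can be illustrated with a purely textual analogue in Python. The sketch renders a node list as a nested let-term in HOL-like ASCII syntax and parses it back; `render_dfg` and `parse_dfg` are hypothetical stand-ins and do not reproduce the actual mk_dfg and dest_dfg, which operate on HOL terms rather than strings.

```python
import re

def render_dfg(inputs, nodes, outputs):
    """Render (outvars, operator, invars) nodes as a nested let-term."""
    lines = ["\\(" + ",".join(inputs) + ")."]
    for outvars, op, invars in nodes:
        lines.append(f"  let ({','.join(outvars)}) = {op} ({','.join(invars)}) in")
    lines.append("  (" + ",".join(outputs) + ")")
    return "\n".join(lines)

def parse_dfg(text):
    """Inverse of render_dfg for the exact layout produced above."""
    lines = text.split("\n")
    inputs = lines[0][2:-2].split(",")            # strip "\(" and ")."
    nodes = []
    for line in lines[1:-1]:
        match = re.fullmatch(r"  let \((.*)\) = (\S+) \((.*)\) in", line)
        outs, op, ins = match.groups()
        nodes.append((outs.split(","), op, ins.split(",")))
    outputs = lines[-1].strip()[1:-1].split(",")
    return inputs, nodes, outputs

dfg = (["a", "b"], [(["p"], "MUL", ["a", "b"])], ["p"])
text = render_dfg(*dfg)
print(text)
```

Parsing the rendered text returns the original triple, which mirrors the statement that the destructor is the inverse of the constructor.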
During scheduling the function g is split into a concatenation of functions
g  g     gk with g  gk      g  g  and each function again represents a data
ow graph The synthesis states described in gures II and III are formally
represented as follows
hdfgki      hdfgi  hdfg i
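The splitting of g into g_k ∘ ... ∘ g_1 can be made concrete with a small Python sketch. It cuts a node list into one stage per control step, passes only the variables that are still needed from one stage to the next, and evaluates the stages in sequence. The graph, the operator assignment, the schedule and all function names are assumptions chosen for illustration; the computation happens on numbers here, not on HOL terms.

```python
import operator

OPS = {"*": operator.mul, "+": operator.add, "-": operator.sub}

def evaluate(nodes, env):
    """Run nodes (out, op, (left, right)) in list order on an environment."""
    env = dict(env)
    for out, op, (left, right) in nodes:
        env[out] = OPS[op](env[left], env[right])
    return env

def split(inputs, nodes, outputs, schedule):
    """Cut the graph into one stage (live_in, nodes, live_out) per control step."""
    stages, live_in = [], tuple(inputs)
    for step in range(1, max(schedule.values()) + 1):
        here = [n for n in nodes if schedule[n[0]] == step]
        later = [n for n in nodes if schedule[n[0]] > step]
        needed = set(outputs) | {v for _, _, ins in later for v in ins}
        produced = live_in + tuple(n[0] for n in here)
        live_out = tuple(v for v in produced if v in needed)
        stages.append((live_in, here, live_out))
        live_in = live_out
    return stages

def run_stages(stages, values):
    """Apply the stages one after another, i.e. g_k o ... o g_1."""
    for live_in, here, live_out in stages:
        env = evaluate(here, dict(zip(live_in, values)))
        values = tuple(env[v] for v in live_out)
    return values

inputs, outputs = ("a", "b", "c"), ("x", "y")
nodes = [("p", "*", ("a", "b")), ("s", "+", ("b", "c")), ("q", "+", ("s", "c")),
         ("r", "*", ("p", "q")), ("t", "-", ("p", "s")),
         ("x", "+", ("r", "t")), ("y", "*", ("r", "t"))]
schedule = {"s": 1, "p": 2, "q": 2, "r": 3, "t": 3, "x": 4, "y": 4}
stages = split(inputs, nodes, outputs, schedule)
print([live_out for _, _, live_out in stages])
print(run_stages(stages, (2, 3, 4)))
```

The interface tuples printed first correspond to the variables that would have to be stored in registers between control steps; the staged evaluation yields the same outputs as evaluating the unsplit graph.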
During the allocation and binding of the function units, a compound function unit FU is introduced as an abbreviation. This abbreviation is described by means of a β-redex. The synthesis state IV in figure 1 is represented as follows:

  let FU = ⟨dfg⟩ in
    ⟨dfg_k⟩ ∘ ... ∘ ⟨dfg_2⟩ ∘ ⟨dfg_1⟩
  end

In this representation, each data flow graph ⟨dfg_i⟩ consists of a single FU operator.
4 Transforming the Data Flow Graphs within HOL
This section describes how the synthesis process described in gure  is imple
mented as a conversion in HOL Our high level synthesis conversion is steered
by external control information the schedule the registerallocation table etc
In this section we will only describe the logical aspects of formally deriving the
synthesis result from the input data ow graph The computation of the control
information and invocation of the external heuristics will be discussed in section

The approach is based on a conversion for normalizing functions We will rst
describe this conversion and then describe how the synthesis steps are realized
using this conversion
Function Normalization
All HOL representations corresponding to figure 1 are nothing but simple compositions of the same basic functions. In principle, normalizing such representations is straightforward. The general algorithm looks as follows:
1. The original term g is converted to λ(x_1, x_2, ..., x_m). g(x_1, x_2, ..., x_m) by applying a paired η-reduction in the inverse direction.
2. The ∘ operations are expanded by rewriting with the definition of ∘ (if there are any), and the function unit abbreviation is expanded (provided there is one).
3. β-reductions and paired β-reductions are performed wherever possible.
In all cases, the result looks as follows:

  λ(x_1, x_2, ..., x_m). v(x_1, x_2, ..., x_m)

In v(x_1, x_2, ..., x_m), there are no β-redices left and there is nothing but pure function applications.
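The effect of the β-reduction step can be imitated on a toy term language in Python. The sketch assumes that terms are variable names, applications `(op, arg, ...)` or let-terms `("let", var, expr, body)`, and that all bound names are distinct, so that no variable capture can occur. It illustrates the idea only and is unrelated to the actual HOL conversion.

```python
def subst(term, var, value):
    """Replace the free occurrences of var in term by value."""
    if isinstance(term, str):
        return value if term == var else term
    if term[0] == "let":
        _, v, expr, body = term
        new_body = body if v == var else subst(body, var, value)
        return ("let", v, subst(expr, var, value), new_body)
    return (term[0],) + tuple(subst(t, var, value) for t in term[1:])

def normalize(term):
    """Eliminate all let-terms, leaving nothing but function applications."""
    if isinstance(term, str):
        return term
    if term[0] == "let":
        _, v, expr, body = term
        return normalize(subst(body, v, normalize(expr)))
    return (term[0],) + tuple(normalize(t) for t in term[1:])

# let p = a * b in let x = p + p in x
term = ("let", "p", ("*", "a", "b"), ("let", "x", ("+", "p", "p"), "x"))
print(normalize(term))
```

Note that the product a * b appears twice in the result because p is used twice; this duplication is what makes normalization expensive for graphs with large fanout.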
A Universal Conversion
We will now introduce a simple conversion which is applicable to all synthesis steps (figure 2).

Fig. 2. Universal Conversion for all Synthesis Steps

1. The HOL term representation t is switched to its ML representation z. This is performed by applying some dest-function which is based on dest_dfg (see section 3).
2. For the next step, some external control information s (schedule, register allocation table, etc.) is required, which is produced by some arbitrary heuristic. According to s, z is then mapped to some new ML data structure z′ corresponding to the result of the synthesis step under consideration. Step 2 is performed completely outside the logic.
3. The data structure z′ is translated back to its HOL representation t′. This is performed by applying some mk-function which is based on mk_dfg (see section 3).
4. Both t and t′ are normalized by applying a normalization conversion. The results should be the same normal form n: ⊢ t = n and ⊢ t′ = n.
5. The equations ⊢ t = n and ⊢ t′ = n are combined to ⊢ t = t′ (symmetry and transitivity of equivalence).
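The five steps can be mimicked outside a theorem prover by comparing symbolic normal forms. In the Python sketch below, a transformed description is a list of stages (live_in, nodes, live_out), normalization is symbolic evaluation, and a rejected transformation raises an exception. Unlike the HOL conversion, this sketch produces no theorem; the names `normal_form` and `universal_check` and the data layout are assumptions.

```python
def normal_form(inputs, stages):
    """Symbolically evaluate a composition of stages.

    Each stage is (live_in, nodes, live_out); nodes are (out, op, invars)."""
    values = tuple(inputs)
    for live_in, nodes, live_out in stages:
        env = dict(zip(live_in, values))
        for out, op, invars in nodes:
            env[out] = (op,) + tuple(env[v] for v in invars)
        values = tuple(env[v] for v in live_out)
    return values

def universal_check(inputs, outputs, original, transformed):
    """Accept transformed only if it has the same normal form as original."""
    before = normal_form(inputs, [(tuple(inputs), original, tuple(outputs))])
    try:
        after = normal_form(inputs, transformed)
    except KeyError as missing:
        raise ValueError(f"variable {missing} is used before it is computed")
    if before != after:
        raise ValueError("normal forms differ")
    return before

original = [("s", "+", ("b", "c")), ("p", "*", ("a", "s"))]
good = [(("a", "b", "c"), [original[0]], ("a", "s")),
        (("a", "s"), [original[1]], ("p",))]
bad = [(("a", "b", "c"), [original[1]], ("a", "p")),     # p before s: flawed
       (("a", "p"), [original[0]], ("p",))]

print(universal_check(("a", "b", "c"), ("p",), original, good))
```

Passing `bad` instead of `good` raises `ValueError`, because the flawed schedule tries to compute p before s is available. The check needs no knowledge of how the stages were produced, which reflects the fact that step 2 lies outside the logic.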
The major drawback of this universal conversion is the complexity of step 4 when dealing with dfgs of great depth (i.e. maximum number of operations on a path from some input to some output). Data flow graphs whose intermediate nodes have larger fanouts (i.e. the output of a node is used by many successor nodes as input) lead to a number of duplications during β-reduction. Since such β-redices can be nested, the term size and time consumption in step 4 may grow exponentially with the depth.
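The exponential growth is easy to reproduce on an artificial example. The Python sketch below assumes a chain of nodes v_{i+1} = v_i + v_i, i.e. every intermediate result has fanout two, and measures the size of the fully inlined term. The chain and the function names are hypothetical and chosen only to expose the effect.

```python
def inline_chain(depth):
    """Fully inlined term of the chain v[i+1] = v[i] + v[i], starting at x."""
    term = "x"
    for _ in range(depth):
        term = ("+", term, term)
    return term

def size(term):
    """Number of variable and operator occurrences in a term."""
    if isinstance(term, str):
        return 1
    return 1 + sum(size(t) for t in term[1:])

for depth in (1, 2, 4, 8):
    print(depth, size(inline_chain(depth)))
```

The let-form of this chain has only `depth` nodes, whereas the inlined term has 2^(depth+1) − 1 symbols.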
The universal conversion works not only for a single synthesis step; it is also possible to combine several of our synthesis steps within step 2 of the conversion. Applying the universal conversion mechanism to the entire synthesis process reduces the time consumption, since step 4 has to be performed only once rather than thrice (scheduling, register allocation and binding, and FU allocation and binding).
5 An Advanced Conversion
The universal conversion is comparable to post-synthesis verification and does not exploit any knowledge about how the synthesis step was performed. In this section, we will describe an advanced conversion where synthesis is performed by a sequence of conversions, each optimized for a specific synthesis step. Thereby, one can exploit the knowledge corresponding to this specific synthesis step. In principle, each of these conversions is similar to the universal conversion, except that steps 2 and 4 are tuned towards a specific synthesis transformation. Although the advanced conversion is performed in several small parts, and the technique described in section 4 therefore has to be applied more often, the overall cost is reduced due to the remarkably lower cost of step 4 within each part.
The Scheduling Conversion
The idea of our scheduling conversion is to split the data flow graph step by step rather than all at once as in the universal synthesis conversion: β-reduction is only applied to those variables whose corresponding nodes have been assigned to the current control step. Although some β-redices will remain, the terms achieved after normalization will be equal.
Unlike the universal synthesis conversion, which applies one single conversion, k − 1 conversions (k = number of control steps) have to be applied successively. Hence, the exponential complexity associated with step 4 is avoided.
Figure 3 shows a HOL session performing the scheduling step applied to the example of figure 1. The HOL conversion SCHEDULING_CONV accomplishes the scheduling transformation according to the schedule determined by the scheduling heuristic, which SCHEDULING_CONV gets as a parameter. In this example, we applied the force-directed scheduling heuristic. Any other scheduling heuristic can be embedded as well (see section 7). For the sake of readability, we used let-expressions rather than β-redices. EXPAND_LETS_CONV and ABBREVIATE_LETS_CONV have been applied to convert between let-expressions and β-redices.
  \(a,b,c).
    let p = a*b in
    let s = b+c in
    let q = s+c in
    let r = p*q in
    let t = p-s in
    let x = r+t in
    let y = r*t in
    (x,y)

  val it =
    |- (\(a,b,c).
          let p = a * b in
          let s = b + c in
          let q = s + c in
          let r = p * q in
          let t = p - s in let x = r + t in let y = r * t in (x,y)) =
       (\(r,t). let x = r + t in let y = r * t in (x,y)) o
       (\(p,q,s). let r = p * q in let t = p - s in (r,t)) o
       (\(a,b,c,s). let p = a * b in let q = s + c in (p,q,s)) o
       (\(a,b,c). let s = b + c in (a,b,c,s)) : thm

Fig. 3. HOL session performing a scheduling step
The Register Allocation and Binding Conversion
Register allocation and binding have one thing in common: they only have an effect on the interfaces between the slices. In our register allocation and binding conversion, the interfaces are changed step by step rather than all at once. The interface between ⟨dfg_i⟩ and ⟨dfg_{i−1}⟩ is changed by applying the universal synthesis conversion to ⟨dfg_i⟩ ∘ ⟨dfg_{i−1}⟩. Therefore, in each step, our universal synthesis conversion only has to be applied to a small subterm; the rest of the term remains unchanged. Again, k − 1 applications are needed to do the job, but it pays off since the data flow graph considered is significantly smaller.
To be able to apply the interface changing conversion to all subterms ⟨dfg_i⟩ ∘ ⟨dfg_{i−1}⟩, the associative law of the ∘ operation has to be applied. The number of applications of the associative law needed in our implementation grows only linearly with k (roughly 2k).
The Function Allocation and Binding Conversion
Function allocation and binding only convert slices to equivalent ones, and the FU abbreviation is introduced. Therefore, apart from expanding the FU abbreviation, one can apply our general synthesis conversion scheme to each slice separately. k steps are needed rather than one, but again the data flow graphs considered have a smaller depth.
6 Experimental Results
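We used a scalable data flow graph as a benchmark. It realizes the division of two polynomials: a dividend of degree p + q with coefficients a_0, ..., a_{p+q} is divided by a divisor of degree p with coefficients b_0, ..., b_p. The coefficients α_i of the quotient and β_j of the remainder should be computed. To facilitate the calculation, we assume that the divisor is normalized with respect to p, i.e. b_p = 1. After a few algebraic transformations, we get the following two formulas for the demanded coefficients:

\[ \alpha_i = a_{i+p} - \sum_{k=i+1}^{\min\{i+p,\,q\}} b_{i+p-k}\,\alpha_k, \qquad i = 0, \ldots, q \]

\[ \beta_j = a_j - \sum_{k=0}^{\min\{j,\,q\}} b_{j-k}\,\alpha_k, \qquad j = 0, \ldots, p-1 \]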
Using these formulas, the data flow graph can be constructed very quickly. To illustrate the underlying structure, a data flow graph with p = 3 and q = 4 is shown in figure 4.
The data flow graph consists of p + q subtractors, p(q + 1) multipliers and q(p − 1) adders, so there is a total of 2pq + 2p nodes. The critical path has a length of 3q + 2 nodes. In simplified terms, q controls the depth of the data flow graph, whereas p determines the width.
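The node counts can be checked with a direct implementation of polynomial division by a normalized divisor. The Python sketch below is an illustration under stated assumptions: `a` holds the dividend coefficients (degree p + q), `b` the divisor coefficients with `b[p] == 1`, and every sum with n terms is counted as n multipliers and n − 1 adders. The function name `divide` and the coefficient names are hypothetical.

```python
def divide(a, b, p, q):
    """Quotient alpha (q + 1 coefficients) and remainder beta (p coefficients)
    of a divided by b, assuming b[p] == 1; also counts arithmetic nodes."""
    count = {"sub": 0, "mul": 0, "add": 0}

    def dot(terms):
        count["mul"] += len(terms)
        count["add"] += max(len(terms) - 1, 0)
        return sum(x * y for x, y in terms)

    alpha = [0] * (q + 1)
    for i in range(q, -1, -1):            # highest quotient coefficient first
        terms = [(b[i + p - k], alpha[k])
                 for k in range(i + 1, min(i + p, q) + 1)]
        if terms:
            count["sub"] += 1
            alpha[i] = a[i + p] - dot(terms)
        else:
            alpha[i] = a[i + p]           # empty sum: no operation needed
    beta = []
    for j in range(p):
        terms = [(b[j - k], alpha[k]) for k in range(0, min(j, q) + 1)]
        count["sub"] += 1
        beta.append(a[j] - dot(terms))
    return alpha, beta, count

p, q = 3, 4
b = [2, -1, 3, 1]                         # normalized divisor, b[3] == 1
a = [6, 3, -3, 5, 8, -4, 8, 3]            # arbitrary dividend of degree p + q
alpha, beta, count = divide(a, b, p, q)
print(alpha, beta, count)
```

For p = 3 and q = 4 the counters give 7 subtractors, 15 multipliers and 8 adders, i.e. 30 nodes, in agreement with p + q, p(q + 1), q(p − 1) and 2pq + 2p.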
We applied both the simple conversion presented in section 4 and the advanced conversion described in section 5. The runtimes for the conversions are displayed in figure 5.
Footnote: All experiments have been run on a SUN ULTRA SPARC.

Fig. 4. A data flow graph with p = 3 and q = 4

Fig. 5. Comparison of the simple and the advanced conversion (time in s; fixed p, varying q)

The comparison shows that it pays off to interleave synthesis and logical derivation, thereby exploiting the knowledge of how the implementation was derived, i.e. which synthesis steps have been applied and how they have been performed. The idea behind the technique of the simple conversion is pretty close to what one could do when performing post-synthesis verification. As can be seen in figure 4, some intermediate results (the α coefficients) are used more often, which leads to an exponential growth of β-redices in the universal conversion, as shown in section 4. During the conversion, this results in an exponential consumption of both time and memory. Therefore, the simple conversion is applicable only to very small circuits. In our example, the execution failed for bigger data flow graphs due to a lack of memory. The advanced conversion, however, did not run into space hurdles and could therefore also be applied to considerably bigger data flow graphs (see figure 6).

Fig. 6. Advanced conversion applied to DFGs with different p and q (time in s over q, one curve per value of p)
7 Embedding Existing High Level Synthesis Algorithms
The conversions described in sections 4 and 5 are our basis for implementing synthesis tools in HOL. They are controlled by parameters telling them how to perform the synthesis step (the schedule, the mapping between registers and variables, etc.). Arbitrary heuristics can be invoked to compute this control information.
The heuristics invoked in section 6 have all been very primitive. For scheduling, a simple ASAP algorithm was used. Since the operands and results of all operations are of the same logical type, register allocation became trivial. The register binding was generated randomly; optimization aspects were not considered.
However we also invoked more sophisticated synthesis heuristics Table 
shows dierent schedules achieved by dierent scheduling techniques The sched
ules describe how the nodes as numbered in the DFG in gure  are mapped
to control steps There are mainly two optimization goals for these algorithms
the number of control steps required and the number of operation units needed
for the implementation
In general implementations with a big number of control steps can be re
alized with a small number of operation units whereas being restricted to a
small number of control steps leads to a big number of operation units There
are mainly two kinds of scheduling algorithms ones with hardware constraints
and others with timing constraints For a given restriction on the number of
operation units scheduling algorithms with hardware constraints try to nd a
schedule with a minimal number of control steps Scheduling algorithms with
timing constraints are the other way around for a given limitation on the num
ber of control steps the algorithm tries to nd a schedule with a minimal number
of hardware requirements
The ASAPALAP algorithm as soonlate as possible assigns the nodes to
the earliestlatest control step according to the restrictions given by the data
dependencies The force directed heuristic PaKn tries to minimize the hard
ware by distributing it uniformly over the control steps The heuristic is modeled
after the calculation of the equilibrium for a set of springs and weights which
obey the Hookes law The ASAP the ALAP and the force directed scheduling
algorithm do not place any restriction on the hardware and produce a sched
ule with a minimal number of controlsteps The static list scheduling heuristic
JMSW has a given restriction on the hardware consumption and tries to min
imize the number of control steps needed according to a precalculated priority
list
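ASAP and ALAP scheduling are simple enough to sketch in Python. The version below assumes that nodes are given as (name, predecessors) pairs in an order that respects the data dependencies, that every operation takes exactly one control step, and that predecessors which are not nodes themselves are graph inputs. The graph and the function names are hypothetical.

```python
def asap(nodes):
    """Earliest control step of every node."""
    step = {}
    for name, preds in nodes:
        step[name] = 1 + max((step[p] for p in preds if p in step), default=0)
    return step

def alap(nodes, k):
    """Latest control step of every node for a schedule of k steps."""
    succs = {name: [] for name, _ in nodes}
    for name, preds in nodes:
        for p in preds:
            if p in succs:
                succs[p].append(name)
    step = {}
    for name, _ in reversed(nodes):
        step[name] = min((step[s] for s in succs[name]), default=k + 1) - 1
    return step

nodes = [("p", ["a", "b"]), ("s", ["b", "c"]), ("q", ["s", "c"]),
         ("r", ["p", "q"]), ("t", ["p", "s"]),
         ("x", ["r", "t"]), ("y", ["r", "t"])]
early = asap(nodes)
late = alap(nodes, max(early.values()))
print(early)
print(late)
```

Nodes whose ASAP and ALAP steps coincide lie on a critical path; the others (here p and t) have freedom that heuristics such as force-directed scheduling exploit.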
In our example the ASAP produced a schedule with a total of  operation
units 
 multipliers 	 adders and 	 subtractors the result of the ALAP required
 operation units 
 multipliers 	 adders and 
 subtractors and the force
directed algorithm required  operation units 	 multipliers 	 adders and 

subtractors For the list scheduling algorithm the number of multipliers was
limited to  the number of subtractors was limited to 	 and the number of adders
was also limited to 	 However it required two extra control steps compared to
the other techniques According to our experiments the time for the logical
transformation is independent from the synthesis algorithm invoked s for
the ASAP 	s for the ALAP s for the forcedirected and s for the list
scheduling algorithm
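A resource-constrained list scheduler can be sketched as follows. This Python version is a simplified illustration, not the heuristic of [JMSW]: the priority list is simply the order of `nodes`, every operation takes one control step, and `limits` gives the number of available units per operator kind. All names and example graphs are assumptions.

```python
def list_schedule(nodes, limits):
    """nodes: (name, kind, preds) in priority order; limits: kind -> units."""
    names = {name for name, _, _ in nodes}
    step, remaining, current = {}, list(nodes), 0
    while remaining:
        current += 1
        used = {}
        for name, kind, preds in remaining:
            # A predecessor must have finished in an earlier control step.
            ready = all(p not in names or step.get(p, current) < current
                        for p in preds)
            if ready and used.get(kind, 0) < limits[kind]:
                step[name] = current
                used[kind] = used.get(kind, 0) + 1
        if not used:
            raise ValueError("no operation could be scheduled")
        remaining = [n for n in remaining if n[0] not in step]
    return step

products = [("m1", "*", ["a", "b"]), ("m2", "*", ["a", "c"]),
            ("m3", "*", ["b", "c"]), ("m4", "*", ["c", "c"])]
print(list_schedule(products, {"*": 2}))
print(list_schedule(products, {"*": 1}))
```

Tightening the limit from two multipliers to one stretches the schedule from two to four control steps, which is the same kind of effect as the extra control steps observed for the list scheduling algorithm above.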
In our approach a synthesis step can be divided into two parts computa
tion of the control information and execution of the transformation within the
logic gure  Two important points are met independently with this strategy
quality and correctness of the implementation The quality only depends on the
algorithm that calculates the control information whereas the correctness aspect
is guaranteed due to the transformation being based on the HOL system
Since the entire synthesis process is nothing but a HOL conversion correct
ness is guaranteed implicitly Faulty implementations cannot be achieved even
if the control information produced by the external program is awed such as a
schedule where the data dependencies are disregarded In such cases the trans
formation cannot be performed within the logic and an exception is raised In
conventional synthesis programs such bugs could lead to faulty implementa
tions Our formal synthesis program either leads to correct implementations or
to no implementation but an exception In case of an exception an information
is produced telling the user in which synthesis step the error occurred
The optimization tasks corresponding to high level synthesis steps are very complex and mutually depend on one another. Thus, heuristics have to be involved. The major advantage of our approach is that we can exploit the existing techniques. Our synthesis conversions offer the interface for embedding arbitrary conventional high level synthesis algorithms dedicated to the corresponding synthesis task. As a result, in contrast to most formal synthesis approaches, we do not have to invent new synthesis algorithms.

Table 1. Schedules produced by the heuristics ASAP, ALAP, Force-Directed and List Scheduling (one row per control step, CStep)

Fig. 7. The concept of our high level synthesis process

Although the conversions described in section 5 have to be performed in the given order, there is no restriction on how to compute the corresponding control information. It is possible to determine it step by step, as sketched on the left side of figure 8, and one can as well determine it all at once, as on the right side of figure 8. What really matters is that the control information is delivered to the conversions in the given order; the order in which it is computed is arbitrary. Therefore, it is possible to embed arbitrary external synthesis algorithms. This aspect is of great importance since there is no limit as to the synthesis algorithms that can be embedded.

Fig. 8. Possible schemes for using our synthesis conversions
8 Conclusion
We have described how high level synthesis can be performed by a sequence of logical transformations. The novelty of our approach lies in the exploitation of the existing knowledge in synthesis in a logically correct manner. As in conventional synthesis programs, finding suitable hardware representations and corresponding algorithms is essential for efficiency. We have shown that it is possible to map algorithms and data of standard synthesis tools to logical conversions and representations in HOL.
Due to the expressiveness of HOL, general verification is an exacting goal. In our approach, however, the proof is constructed rather than "guessed" as in post-synthesis verification. Since our approach does not lead to NP-complete or undecidable problems, we believe that formal synthesis is a well-suited application for the HOL system.
Our recent work shows that formal synthesis can also be a good alternative to the classical synthesis/post-synthesis verification approach at other abstraction levels of hardware design [EiKu]. It is our intention to provide a formal synthesis toolbox called HASH containing formally based synthesis steps that cover the entire synthesis from the algorithmic level down to the logic level.
For the hardware designer, there is no difference between using synthesis tools based on HASH and conventional synthesis tools. However, formal synthesis guarantees correctness implicitly. This style of formal synthesis will be acceptable to most users, since they can proceed with their designs in a customary manner and yet have correctness without getting into the hardship of logic.
References
AHL AHL Lambda Reference Manual 
Camp R Camposano Pathbased scheduling for synthesis IEEE Transactions
on Computer Aided Design  January 
CaWo R Camposano and W Wolf HighLevel VLSI Synthesis Kluwer Boston

EiKu D Eisenbiegler and R Kumar An automata theory dedicated towards for
mal circuit synthesis In Higher Order Logic Theorem Proving and Its Ap
plications Aspen Grove Utah USA September  Springer
GDWL D Gajski N Dutt A Wu and S Lin HighLevel Synthesis Introduction
to Chip and System Design Kluwer Academic Publishers 
Gupt A Gupta Formal hardware veri
cation Formal Methods in System De
sign  
HaLD FK Hanna M Longley and N Daeche Formal synthesis of digital sys
tems In IMECIFIP Workshop on Applied Formal Methods for Correct
VLSI Design pages  LeuvenBelgium  Elsevier Science Pub
lishers BV
JMSW R Jain A Mujumdar A Sharma and H Wang Empirical evaluation of
some highlevel synthesis scheduling heuristics In DAC  pages 

John S Johnson Synthesis of Digital Designs from Recursion Equations MIT
Press 
JoSh G Jones and M Sheeran Circuit design in Ruby In J Staunstrup editor
Formal Methods for VLSI Design pages  NorthHolland 
Lars M Larsson An engineering approach to formal system design In
Thomas F Melham and Juanito Camilleri editors Higher Order Logic The
orem Proving and Its Applications pages  Valetta Malta Septem
ber  Springer
Melh T Melham Higher Order Logic and Hardware Verication Cambridge
University Press 
PaKn Pierre G Paulin and John P Knight Forcedirected scheduling for the be
havioral synthesis of asics IEEE Transactions on Computer Aided Design
 June 