Applicability of formal synthesis illustrated via scheduling by Blumenroehr, Christian et al.
Applicability of Formal Synthesis Illustrated via Scheduling

Christian Blumenröhr, Dirk Eisenbiegler
Institute for Circuit Design and Fault Tolerance
(Prof. Dr.-Ing. D. Schmid)
University of Karlsruhe, Germany
{blumen,eisen}@ira.uka.de

Ramayya Kumar
Forschungszentrum Informatik
(Prof. Dr.-Ing. D. Schmid)
Karlsruhe, Germany
kumar@fzi.de

http://goethe.ira.uka.de/fsynth/
Abstract
This paper describes a novel technique for formal synthesis and exemplifies the main
ideas using the high-level synthesis task of scheduling. The novelty of the approach
is that arbitrary scheduling algorithms can be embedded within a formal framework to
automatically achieve guaranteed-correct implementations. Two realistic examples are
used to emphasize its applicability, and it can be seen that the additional costs for
formal synthesis are almost negligible in practice. We achieve the same quality for the
implementations as conventional synthesis, plus the proof of their correctness.
1 Introduction
Although high-level synthesis is based on a sequence of algorithms which conform to the
"correctness by construction" paradigm, its implementation may be error-prone. This is due
to the complexity of the programs¹ which implement these algorithms.
One approach towards proving the correctness of implementations is post-synthesis
verification. An excellent overview of verification techniques is given in [Gupt, Melh].
One of the important correctness criteria is to show that the implementation implies the
specification. Two of the most important reasons for the complexity of these proofs are
1. the existence of the major gap between the abstraction levels of the specification and
   the implementation, and
2. the unavailability of the information used in refining a specification into an
   implementation.
Therefore, full automation can only be achieved for comparatively small circuits at lower
levels of abstraction. For large circuits, hardware verification specialists are mandatory.
They have to either provide appropriate structuring and abstraction of the proofs while using
automatable logics, or interact with the underlying theorem prover while using complex
logics.
Formal synthesis is a complementary approach to hardware verification, since the formal
proof of correctness is an integral part of the synthesis process. However, it is a
specialized technique which is tailored only towards proving the correctness of synthesized
implementations. Verification is nevertheless needed for validating specifications, which
can be achieved by checking properties such as safety and liveness.
We are developing a formal synthesis toolbox called HASH (Higher order logic Applied
to Synthesis of Hardware), which is applicable to different abstraction levels. It contains
one universal transformation per synthesis step (e.g. scheduling, allocation, retiming, state
minimization, etc.). Each transformation is guided by the results of corresponding standard
synthesis algorithms that abound in the literature [TLWN, GDWL]. Hence, no new synthesis
algorithms, either formal or informal, are proposed; rather, a general scheme for logically
embedding various existing synthesis algorithms within a formal setup is presented [EiKub].
In contrast to conventional synthesis approaches, only correct hardware implementations can
be produced; no implementation is derived when the results of the synthesis algorithms are
faulty. The quality, with respect to costs, of the fully automatically generated
implementations is dictated only by the conventional synthesis algorithms. From an overall
perspective, the implementations therefore have a higher quality than those of conventional
synthesis, since they are proven to be correct. This concept will be elaborated with respect
to the scheduling task in the sections to follow.

* This work has been partly financed by the Deutsche Forschungsgemeinschaft, Project SCHM.
¹ The programs implementing the synthesis algorithms are mostly imperative in nature, and the
  correctness of large imperative programs is nearly impossible to prove.

There are also other approaches in the formal synthesis domain; an overview is given in
[KBES]. However, none of these techniques exploits the results of the sophisticated
algorithms which abound in synthesis [FoMa, HaLD, JoBB, ShRa]. Therefore, the quality
of their implementations is normally worse than that of conventional synthesis algorithms.
In contrast to HASH, which supports fully automated synthesis, all other approaches need
interaction, either at the schematic level or from a logician's point of view.
The major contributions of this paper are twofold:
1. formal synthesis within HASH is applicable to realistic circuits, and
2. the additional costs for formal synthesis are reasonable.
The above-mentioned contributions are exemplified via the scheduling task in high-level
synthesis.
The outline of this paper is as follows: in the next section we briefly introduce our
approach and define the notations and scope of our work. In section 3 we show the
results with two realistic examples, and section 4 concludes the paper.
2 Our Formal Synthesis Approach towards Scheduling
In this paper we concentrate on the transformation in HASH for performing the scheduling
task within high-level synthesis. High-level synthesis converts an algorithmic description of
the circuit into a structure at the Register-Transfer (RT) level. The major steps in
high-level synthesis are scheduling, allocation of storage, functional and interconnection
units, binding the allocated hardware onto some library components, and interface synthesis.
The scheduling task assigns a control step (c-step) to each operation in the algorithmic
specification. There exist various heuristic algorithms for solving this task [CaWo, GDWL].
A large number of them start from data flow graphs that correspond to the basic blocks in
the algorithmic description. Although certain scheduling algorithms start from control/data
flow graphs, we shall restrict ourselves to pure data flow graphs in this paper.
The underlying idea behind the scheduling transformation in HASH is illustrated in figure 1.
Given a data flow graph, some scheduling heuristic is started. This heuristic step has
nothing to do with logic. The heuristic returns a schedule table which maps each operation
in the data flow graph onto a c-step. This schedule table is then used by the formal
logical transformation in HASH to produce a scheduled data flow graph. The split between
design space exploration (i.e. different schedule tables for different heuristics) and the
logical transformation is the core idea in HASH. This core idea is applicable to most of the
synthesis steps, e.g. allocation (number of resources available), retiming (split in the
combinational logic), etc.
[Diagram: data flow graph -> scheduling heuristic (ASAP, ALAP, force-directed, ...) ->
schedule table; data flow graph and schedule table -> scheduling transformation ->
scheduled data flow graph and theorem]
Figure 1: The concept of HASH as applied to scheduling
All the logical transformations in HASH have been implemented within the HOL theorem
prover [GoMe]. Each transformation takes the current design state and the result of some
synthesis heuristic, and returns the new design state along with the correctness theorem
stating that the old design state is equivalent to, or implied by, the new design state.
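Returning to the scheduling task, the formalization of the current design state, i.e. the
data flow graph, is achieved by using λ-expressions [Davi]. The data flow graphs are
represented as follows:
\[
\begin{array}{l}
\lambda (x_1, x_2, \ldots, x_m). \\
\quad \mathrm{let}\ \langle \mathit{outvars}_1 \rangle = \mathit{op}_1\, \langle \mathit{invars}_1 \rangle\ \mathrm{in} \\
\quad \mathrm{let}\ \langle \mathit{outvars}_2 \rangle = \mathit{op}_2\, \langle \mathit{invars}_2 \rangle\ \mathrm{in} \\
\quad \quad \vdots \\
\quad \mathrm{let}\ \langle \mathit{outvars}_l \rangle = \mathit{op}_l\, \langle \mathit{invars}_l \rangle\ \mathrm{in} \\
\quad (y_1, y_2, \ldots, y_n)
\end{array}
\]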
The above structure describes the input/output function in terms of the basic operations
in the data flow graph: $x_1, x_2, \ldots, x_m$ are the inputs, $y_1, y_2, \ldots, y_n$ the
outputs, and $op_1, op_2, \ldots, op_l$ the operations of the data flow graph. Each let-term
describes the connectivity of one operation. For all $i$, $\langle invars_i \rangle$ and
$\langle outvars_i \rangle$ denote the inputs and outputs of operation $op_i$, respectively.
The inputs and outputs of operations are tuples, each operation having its specific input
and output arity. This formal representation is, however, not unique, since the ordering of
the operations is ambiguous. Nevertheless, the data dependencies between the operations must
be respected.
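The let-structure can be mimicked outside of logic to make the role of the ordering concrete. The following is only an illustrative Python sketch, not the HOL representation used in HASH: the names `evaluate`, `add`, `mul` and the three-operation example graph are assumptions made here for illustration. It shows that two orderings respecting the data dependencies denote the same function, while an ordering that violates them cannot be evaluated.

```python
from typing import Callable, List, Sequence, Tuple

# One let-term: (output variables, operation, input variables).
# Hypothetical encoding chosen for this sketch only.
LetTerm = Tuple[Tuple[str, ...], Callable[..., tuple], Tuple[str, ...]]


def evaluate(inputs: Sequence[str], lets: List[LetTerm],
             outputs: Sequence[str], values: Sequence[int]) -> tuple:
    """Evaluate the let-terms in the given order on concrete input values."""
    env = dict(zip(inputs, values))
    for outvars, op, invars in lets:
        # A KeyError here means a data dependency is violated by the ordering.
        results = op(*(env[v] for v in invars))
        env.update(zip(outvars, results))
    return tuple(env[v] for v in outputs)


# Operations return tuples, since outputs of operations are tuples.
add = lambda a, b: (a + b,)
mul = lambda a, b: (a * b,)

inputs, outputs = ("a", "b", "c"), ("y",)
lets = [(("t1",), add, ("a", "b")),    # t1 = a + b
        (("t2",), mul, ("b", "c")),    # t2 = b * c
        (("y",), add, ("t1", "t2"))]   # y  = t1 + t2

# The first two operations are independent, so they may be swapped.
swapped = [lets[1], lets[0], lets[2]]
```

Evaluating `lets` and `swapped` on the same inputs gives the same result, which is the sense in which the representation is not unique.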
The scheduling transformation in HASH takes the formalized data flow graph $g$ and the
schedule table and produces $g'$, which is a composition of functions $g_1, g_2, \ldots, g_k$
such that $g' = g_k \circ \cdots \circ g_2 \circ g_1$, where $k$ is the number of c-steps.
Each $g_i$ ($i = 1, \ldots, k$) represents a slice of the original data flow graph $g$ and
corresponds to those operations that are executed in the $i$-th c-step. Additionally, the
transformation produces the correctness proof stating the equivalence between $g$ and $g'$.
If the heuristic produces a false result, e.g. a schedule table where the data dependencies
are violated or some operations are unscheduled, then the transformation fails and returns
constructive feedback to the user which reflects the cause of the failure.
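The slicing and the failure cases can be sketched as follows. This is a hedged Python illustration under stated assumptions, not the HASH implementation: `slice_by_schedule` and `run_slices` are hypothetical names, the schedule table is assumed to map operation indices to c-steps numbered from 1, and operations within one c-step may only use values from earlier c-steps (no chaining). Unlike the transformation in HOL, the sketch proves nothing; it only builds the slices and evaluates their composition.

```python
from typing import Callable, Dict, List, Sequence, Tuple

LetTerm = Tuple[Tuple[str, ...], Callable[..., tuple], Tuple[str, ...]]


def slice_by_schedule(inputs: Sequence[str], lets: List[LetTerm],
                      schedule: Dict[int, int]) -> List[List[LetTerm]]:
    """Split the let-terms into slices g_1..g_k or raise ValueError with a reason."""
    if set(schedule) != set(range(len(lets))):
        raise ValueError("schedule table does not cover exactly the operations")
    if any(step < 1 for step in schedule.values()):
        raise ValueError("c-steps must be numbered from 1")
    k = max(schedule.values(), default=0)
    available = set(inputs)              # values computed in earlier c-steps
    slices = []
    for step in range(1, k + 1):
        ops = [i for i in range(len(lets)) if schedule[i] == step]
        for i in ops:
            missing = [v for v in lets[i][2] if v not in available]
            if missing:
                raise ValueError(
                    f"g_{step} cannot be built: operation {i} needs {missing}")
        for i in ops:                    # outputs become visible after the c-step
            available.update(lets[i][0])
        slices.append([lets[i] for i in ops])
    return slices


def run_slices(inputs, slices, outputs, values):
    """Evaluate g' = g_k o ... o g_1 on concrete input values."""
    env = dict(zip(inputs, values))
    for g_i in slices:
        results = {}
        for outvars, op, invars in g_i:
            results.update(zip(outvars, op(*(env[v] for v in invars))))
        env.update(results)
    return tuple(env[v] for v in outputs)


add = lambda a, b: (a + b,)
mul = lambda a, b: (a * b,)
inputs, outputs = ("a", "b", "c"), ("y",)
lets = [(("t1",), add, ("a", "b")),
        (("t2",), mul, ("b", "c")),
        (("y",), add, ("t1", "t2"))]

good = slice_by_schedule(inputs, lets, {0: 1, 1: 1, 2: 2})
result = run_slices(inputs, good, outputs, (1, 2, 3))

try:                                     # operation 2 scheduled too early
    slice_by_schedule(inputs, lets, {0: 1, 1: 2, 2: 1})
    feedback = None
except ValueError as err:
    feedback = str(err)
```

The second call fails with a message naming the slice that cannot be built, which loosely mirrors the constructive feedback described above.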
In figure 2, a simple example is shown which illustrates the invocation of the scheduling
transformation in HASH. In this example, a well-known heuristic called force-directed
scheduling has been applied [PaKn]. For better readability, the data flow graphs are shown
in a schematic manner and not by their formal representation. If, in this example, the
heuristic were to schedule an operation before another operation on which it depends, an
exception would be raised during the transformation, giving the constructive feedback that
the corresponding slice (figure 2) cannot be built with this schedule table.
It is also possible to combine several synthesis steps into one complex step. Then the
corresponding logical transformations have to be performed one after another. The cost for
this complex logical transformation is just the sum of the costs of the individual
transformations (see [EiBK] for more details about the transformations).
3 Experimental Results
In this section we demonstrate that our formal synthesis scenario works with realistic
examples. We therefore consider two scalable data flow graphs and compare the runtimes for
calculating the schedule using various algorithms with the runtimes for the transformations
which produce a correct implementation. We cannot compare our work with any other
verification results since, to our knowledge, no one has formally verified the scheduling
task.
The scheduling algorithms we applied are ASAP (As Soon As Possible), ALAP (As Late
As Possible), list scheduling, and two versions of force-directed scheduling (without and
with look-ahead).
ASAP ALAP and the two versions of forcedirected scheduling do not enforce any con
straints on the number of resources used However they always produce the shortest possible
schedule Listscheduling on the other hand works with a constrained number of resources
but produces a schedule which is usually slower than those of the former approaches The
main idea behind the forcedirected heuristic is to use the slack between the ASAP and ALAP
schedules so as to distribute the operations in a better manner so that the resource utilization
is also minimized in addition to the number of csteps PaKn	
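The slack between the ASAP and ALAP schedules can be made concrete with a small sketch. The Python code below is an illustration under assumptions, not the implementation used in the experiments: the graph encoding (each node mapped to its predecessors), the function names and the five-node example are hypothetical, and every operation is assumed to take exactly one c-step.

```python
from typing import Dict, Hashable, List

Graph = Dict[Hashable, List[Hashable]]   # node -> list of predecessor nodes


def asap(preds: Graph) -> Dict[Hashable, int]:
    """Earliest c-step (numbered from 1) for every node of an acyclic graph."""
    step: Dict[Hashable, int] = {}

    def visit(node):
        if node not in step:
            step[node] = 1 + max((visit(p) for p in preds[node]), default=0)
        return step[node]

    for node in preds:
        visit(node)
    return step


def alap(preds: Graph, length: int) -> Dict[Hashable, int]:
    """Latest c-step for every node such that the schedule has `length` c-steps."""
    succs: Graph = {node: [] for node in preds}
    for node, ps in preds.items():
        for p in ps:
            succs[p].append(node)
    step: Dict[Hashable, int] = {}

    def visit(node):
        if node not in step:
            step[node] = min((visit(s) for s in succs[node]),
                             default=length + 1) - 1
        return step[node]

    for node in preds:
        visit(node)
    return step


# Hypothetical example graph: 1 -> 3 -> 4 and 2 -> 5.
preds = {1: [], 2: [], 3: [1], 4: [3], 5: [2]}
early = asap(preds)
late = alap(preds, max(early.values()))
slack = {node: late[node] - early[node] for node in preds}
fixed = sorted(node for node in preds if slack[node] == 0)
```

Nodes with zero slack (here those on the longest chain) have no freedom left; only the remaining nodes offer choices to a heuristic such as force-directed scheduling.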
[Diagram: a data flow graph g with inputs a, b, c, operations 1 to 7 and outputs x, y; the
force-directed heuristic yields a schedule table with four c-steps; the scheduling
transformation in HASH produces the scheduled graph g' with slices g1 to g4 and the
theorem g' = g]
Figure 2: A simple example for the scheduling transformation in HASH
3.1 Division of two Polynomials
As a first example we used a scalable data flow graph which realizes the division of two
polynomials with the given coefficients $\alpha_i$ and $\beta_i$:
\[
\frac{\sum_{i=0}^{p+q} \alpha_i x^i}{\sum_{i=0}^{p} \beta_i x^i}
  = \sum_{i=0}^{q} \gamma_i x^i
  + \frac{\sum_{i=0}^{p-1} \delta_i x^i}{\sum_{i=0}^{p} \beta_i x^i}
\]
The coefficients $\gamma_i$ and $\delta_i$ are to be computed. To facilitate the calculation,
we assume that the divisor is normalized with respect to $\beta_p$, i.e. $\beta_p = 1$. After
a few algebraic transformations we get the following two formulas for the required
coefficients:
\[
\gamma_i = \alpha_{i+p} - \sum_{k=i+1}^{\min\{i+p,\,q\}} \beta_{i+p-k}\,\gamma_k,
  \qquad i = 0, \ldots, q
\]
\[
\delta_j = \alpha_j - \sum_{k=0}^{\min\{j,\,q\}} \beta_{j-k}\,\gamma_k,
  \qquad j = 0, \ldots, p-1
\]
Using these formulas the data ow graph can be realized very quickly To illustrate the
underlying structure a data ow graph with p   and q   is shown in gure 
The data ow graph consists of p q subtractors pq multipliers and qp  adders
so there is a total of 
pq  
p nodes The critical path has a length of q  
 nodes
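The two coefficient formulas can be checked numerically. The sketch below is an illustrative Python version under assumptions: `divide` and `count_nodes` are hypothetical names, coefficients are stored lowest degree first, the divisor is assumed normalized (its leading coefficient is 1), and `count_nodes` assumes one node per arithmetic operation with each sum built from two-input adders.

```python
from typing import List, Tuple


def divide(alpha: List[int], beta: List[int], p: int, q: int) -> Tuple[List[int], List[int]]:
    """Quotient (gamma) and remainder (delta) coefficients, lowest degree first."""
    assert len(alpha) == p + q + 1 and len(beta) == p + 1 and beta[p] == 1
    gamma = [0] * (q + 1)
    for i in range(q, -1, -1):           # gamma_q first, since gamma_i needs gamma_k, k > i
        s = sum(beta[i + p - k] * gamma[k]
                for k in range(i + 1, min(i + p, q) + 1))
        gamma[i] = alpha[i + p] - s
    delta = [alpha[j] - sum(beta[j - k] * gamma[k]
                            for k in range(0, min(j, q) + 1))
             for j in range(p)]
    return gamma, delta


def count_nodes(p: int, q: int) -> Tuple[int, int, int]:
    """(subtractors, multipliers, adders) when each formula is one subtraction of a sum."""
    terms = [min(p, q - i) for i in range(q + 1)] + [min(j, q) + 1 for j in range(p)]
    multipliers = sum(terms)
    adders = sum(max(t - 1, 0) for t in terms)
    subtractors = sum(1 for t in terms if t > 0)
    return subtractors, multipliers, adders


# (x^2 + 2x + 3) * (x^2 + 2x + 1) + (5x + 4) = x^4 + 4x^3 + 8x^2 + 13x + 7
gamma, delta = divide([7, 13, 8, 4, 1], [3, 2, 1], p=2, q=2)
nodes = count_nodes(3, 4)
```

For the worked example the quotient and remainder come back as the factors used to build the dividend, and `count_nodes(3, 4)` gives 30 nodes in total.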
The runtimes² for the heuristics are shown in figure 4. The parameter p was kept fixed
while q was varied. The two FD columns correspond to the two versions of the
force-directed algorithm, and LS stands for list scheduling.
Irrespective of the variations in q, ASAP always needed the same number of adders,
multipliers and subtractors. ALAP likewise always required the same number of adders and
subtractors, but its number of multipliers varied. The two versions of the force-directed
algorithm each delivered one of two combinations of adders, multipliers and subtractors.

² All experiments were run on a SUN ULTRA CREATOR.

[Diagram: data flow graph with inputs α0 to α7 and β0 to β2, outputs γ0 to γ4 and δ0 to δ2,
built from multipliers, adders and subtractors]
Figure 3: A data flow graph with p = 3 and q = 4
[Table: runtimes of the heuristics; columns: Nodes, ASAP, ALAP, FD (two versions), LS]
Figure 4: Time for the heuristics
Although forcedirected scheduling is a complicated algorithm which usually requires a lesser
number of resources than ASAP or ALAP it does not perform better in this example This is
because there is no better schedule if the number of csteps are minimized On closer exami
nation one can detect that one always needs p  adders and either p  multipliers and p
subtractors or viceversa cf from gure  The listscheduling algorithm was restricted to
 adders  multipliers and  subtractors The number of resulting csteps is shown as sum
of the csteps for unconstrained scheduling and the additional csteps for listscheduling
In gure  the runtimes for the transformations after the heuristics can be seen The
most interesting fact is that the runtime for the forcedirected heuristic grows exponentially
whereas the runtime for its transformation does not instead it grows in a polynomial fash
ion Furthermore the transformation is even faster than the heuristic for higher number of
nodes and the intersection lies at about  nodes So it can be seen that the additional
costs for formal synthesis can be negligible for large data ow graphs when compared with
sophisticated heuristics Additionally it turns out that the runtime for the transformation is
almost independent of the heuristic used The only thing that matters is how the heuristic
distributed the nodes in the csteps not how long it took for that
[Table: runtimes of the transformations; columns: Nodes, ASAP, ALAP, FD (two versions), LS]
Figure 5: Time for the transformations
3.2 Discrete Cosine Transform (DCT)
Another scalable data flow graph is realized in our second example. It calculates the
discrete cosine transform, which is widely used for image compression. The DCT of an image
with pixels $x(n,m)$ is defined by
\[
X(u,v) = \frac{2}{\sqrt{N M}}\; c(u)\, c(v)
  \sum_{n=0}^{N-1} \sum_{m=0}^{M-1} x(n,m)\,
  \cos\!\left(\frac{\pi u}{2N}(2n+1)\right)
  \cos\!\left(\frac{\pi v}{2M}(2m+1)\right)
\]
with
\[
c(u), c(v) =
\begin{cases}
  \frac{1}{\sqrt{2}}, & u, v = 0 \\
  1, & \text{otherwise}
\end{cases}
\]
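As a plain numerical reading of this definition, the following Python sketch evaluates the double sum directly. It is only an illustration, not the data flow graph used in the experiments: the name `dct2` is hypothetical, the image is assumed to be a list of N rows with M pixels each, and no intermediate results are shared.

```python
import math
from typing import List


def dct2(x: List[List[float]]) -> List[List[float]]:
    """Direct evaluation of the two-dimensional DCT of an N x M image."""
    N, M = len(x), len(x[0])

    def c(w: int) -> float:
        return 1.0 / math.sqrt(2.0) if w == 0 else 1.0

    X = [[0.0] * M for _ in range(N)]
    for u in range(N):
        for v in range(M):
            total = sum(x[n][m]
                        * math.cos(math.pi * u * (2 * n + 1) / (2 * N))
                        * math.cos(math.pi * v * (2 * m + 1) / (2 * M))
                        for n in range(N) for m in range(M))
            X[u][v] = 2.0 / math.sqrt(N * M) * c(u) * c(v) * total
    return X


# A constant 2 x 2 image has only a DC component.
X = dct2([[1.0, 1.0], [1.0, 1.0]])
```

For N = M = 2 the only cosine arguments other than 0 are π/4 and 3π/4, and the leading factor 2/√(N·M) equals 1.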
In most cases N  M   is used The data ow graphs are built as follows The N M
pixels of the image are used as inputs Furthermore in order to ease the data ow graph
the cosine  terms are considered as additional inputs due to the complexity of the cosine 
operation In order to minimize the number of these additional inputs one can exploit the
periodicity of the cosine function So the arguments can be restricted to the interval   A
restriction to the interval 


 would also be possible but then additional inverters will be
necessary If N  M  the following formula for the additional inputs due to cosine functions
can be given as
fN 







  N  
  N  
N  f
N

  N  
    

N    N      
If N ≠ M, a formula cannot be given in a general manner. An additional reduction could
be achieved if one of the cosine terms were omitted, but then the data flow graph could no
longer be built in a regular manner.
Due to the definition of the DCT there are still two factors to consider: $2/\sqrt{N M}$
and $1/\sqrt{2}$. The latter can be regarded as $\cos(\pi/4)$, so if N is even this
coefficient is already introduced as an input. All in all, one has
$N^2 + 1 + (N \bmod 2) + f(N)$ inputs for the data flow graph if N = M. The number of
outputs is $N^2$ ($N \cdot M$ if N ≠ M).
To achieve a compact representation of the data flow graph, as many intermediate results
as possible were reused. This leads to a total number of
N

N

N N

MNM



M
M  additions and 
N

N 
 N

MNM

M
 multiplications
So there is a total of N

N

 
N 
 
N

M N
M

 M  M 
 nodes
The length of the critical path is 
N   N M  
To give a better idea of the structure, the data flow graph for N = M = 2 is shown in
figure 6.
[Diagram: data flow graph with inputs x(0,0), x(0,1), x(1,0), x(1,1), cos(π/4), cos(3π/4)
and 1, outputs X(0,0), X(0,1), X(1,0), X(1,1), built from multipliers and adders]
Figure 6: A data flow graph with N = 2, M = 2
In figure 7, the runtimes and required resources for the different heuristics are displayed.
It should be noted that in this example the number of resources required for force-directed
scheduling is always better than that of ASAP or ALAP. For the list-scheduling algorithm we
restricted the number of adders and multipliers. The number of resulting c-steps is shown as
the sum of the c-steps for unconstrained scheduling and the additional c-steps for list
scheduling.
[Table: runtimes and required resources per heuristic; columns: Nodes, ASAP, ALAP,
FD (two versions), each with time and resources, and LS with time and c-steps]
Figure 7: Time and resources for the heuristics
We investigated several data flow graphs by setting N = M and varying N. One can see that
the force-directed heuristic does not show exponential behaviour as in the previous
example. This can be explained by a closer look at the data flow graphs. If we compare,
e.g., a DCT graph and a polynomial-division graph with nearly the same number of nodes, one
can see that a large share of the nodes in the DCT is placed immediately, since there is no
difference between their ASAP and ALAP c-steps (cf. the brief description of force-directed
scheduling in the introduction to section 3). In the polynomial division, only a much
smaller share is placed immediately. Furthermore, both the average and the maximal mobility
of the remaining nodes are smaller for the DCT than for the polynomial division. So it can
be concluded that the operations in the division have more choices, and the scheduling
algorithm therefore takes much longer.
In figure 8, the runtimes of the scheduling transformation for the different heuristics are
shown. The transformations for ALAP and the two FD versions are of the same magnitude. A
special case is the transformation for the ASAP algorithm. Due to the nature of the data
flow graph, many operations can be scheduled in the first c-steps by ASAP, which can also
be seen from the extremely high number of required resources in figure 7. This special
constellation is very disadvantageous for the transformation algorithm: for one of the data
flow graphs, the transformation was not possible due to space problems. But in most cases,
especially when ingenious algorithms are used, the operations are better distributed in the
schedule. Generally, one can see again that the runtime for the transformation is fairly
independent of the heuristic if the number of c-steps is equal. For list scheduling the
transformation takes longer due to the larger number of c-steps required.
4 Conclusions and Future Work
We have shown that formal synthesis is not simply an academic dream but can also be applied
to realistic circuits. Additionally, the costs for formal synthesis are acceptable and are
almost independent of the heuristics involved. In certain cases the design space
exploration part can take much longer than performing the actual logical transformation,
which in turn yields not only an implementation but also the proof of its correctness.
The novelty of HASH rests on the fact that, in contrast to post-synthesis verification or
other approaches for formal synthesis, we exploit the abundance of knowledge within the
synthesis domain. The quality of the synthesis results in terms of area, timing and
power is the same as that of conventional approaches; the correctness proof is an
added quality. Yet another plus point of HASH is that, although a theorem prover is used in
the background, the entire procedure is automatic and no formal background is required on
the part of the designer.
The major consequences that can be drawn from this work are that immense amounts of
simulation/verification time can be saved and hence verification can be restricted to
property checking. The time required for formal synthesis can be reduced even further if the
transformations are run either in the background or as a batch process while the circuit
designer concentrates on the actual job: the task of design exploration.
We have just discovered the tip of the iceberg and still have a long way to go. In
the future we shall concentrate on finding transformations for control-flow based scheduling
algorithms, chaining of operations, pipelining, memory mapping, etc. We shall also provide
links between the different levels of abstraction for the design of hardware (see [EiKu] for
an application of HASH at RT-level).

[Table: runtimes of the transformations; columns: Nodes, ASAP, ALAP, FD (two versions), LS]
Figure 8: Time for the transformations
References
[CaWo] R. Camposano and W. Wolf. High-Level VLSI Synthesis. Kluwer, Boston.
[Davi] R. E. Davis. Truth, Deduction, and Computation: Logic and Semantics for Computer
  Science. Computer Science Press, New York.
[EiBK] D. Eisenbiegler, C. Blumenröhr, and R. Kumar. Implementation issues about the
  embedding of existing high level synthesis algorithms in HOL. In Joakim von Wright, Jim
  Grundy, and John Harrison, editors, Theorem Proving in Higher Order Logics: International
  Conference (TPHOLs), Lecture Notes in Computer Science, Turku, Finland, August.
  Springer-Verlag.
[EiKu] D. Eisenbiegler and R. Kumar. An automata theory dedicated towards formal circuit
  synthesis. In E. T. Schubert, P. J. Windley, and J. Alves-Foss, editors, International
  Workshop on Higher Order Logic Theorem Proving and its Applications, Lecture Notes in
  Computer Science, Aspen Grove, Utah, USA, September. Springer-Verlag.
[EiKub] D. Eisenbiegler and R. Kumar. Formally embedding existing high level synthesis
  algorithms. In Paolo E. Camurati and Hans Eveking, editors, Correct Hardware Design and
  Verification Methods, Lecture Notes in Computer Science, Frankfurt/Main, Germany, October.
  IFIP Advanced Research Working Conference, Springer-Verlag.
[FoMa] M. P. Fourman and E. M. Mayger. Formally Based System Design: Interactive hardware
  scheduling. In G. Musgrave and U. Lauther, editors, Very Large Scale Integration, Munich,
  Federal Republic of Germany, August. IFIP International Conference, North-Holland.
[GDWL] D. Gajski, N. Dutt, A. Wu, and S. Lin. High-Level Synthesis: Introduction to Chip and
  System Design. Kluwer Academic Publishers.
[GoMe] M. J. C. Gordon and T. F. Melham. Introduction to HOL: A Theorem Proving Environment
  for Higher Order Logic. Cambridge University Press.
[Gupt] A. Gupta. Formal hardware verification methods: A survey. Formal Methods in System
  Design.
[HaLD] F. K. Hanna, M. Longley, and N. Daeche. Formal synthesis of digital systems. In Luc
  J. M. Claesen, editor, Applied Formal Methods For Correct VLSI Design. IMEC-IFIP, Elsevier
  Science Publishers.
[JoBB] S. D. Johnson, B. Bose, and C. D. Boyer. A tactical framework for digital design. In
  G. Birtwistle and P. Subrahmanyam, editors, VLSI Specification, Verification and Synthesis,
  Boston. Kluwer Academic Publishers.
[KBES] R. Kumar, C. Blumenröhr, D. Eisenbiegler, and D. Schmid. Formal synthesis in circuit
  design: A classification and survey. In Formal Methods in Computer-Aided Design (FMCAD),
  Palo Alto, USA.
[Melh] T. Melham. Higher Order Logic and Hardware Verification. Cambridge University Press.
[PaKn] P. G. Paulin and J. P. Knight. Force-directed scheduling for the behavioral synthesis
  of ASICs. IEEE Transactions on Computer-Aided Design, June.
[ShRa] R. Sharp and O. Rasmussen. The T-Ruby design system. In CHDL.
[TLWN] D. E. Thomas, E. D. Lagnese, R. A. Walker, J. A. Nestor, J. V. Rajan, and R. L.
  Blackburn. Algorithmic and Register-Transfer Level Synthesis: The System Architect's
  Workbench. Kluwer Academic Publishers.