Multiple objective optimisation of data and control paths in a behavioural silicon compiler by Baker, Keith Richard
University of Southampton Research Repository
ePrints Soton
Copyright © and Moral Rights for this thesis are retained by the author and/or other 
copyright owners. A copy can be downloaded for personal non-commercial 
research or study, without prior permission or charge. This thesis cannot be 
reproduced or quoted extensively from without first obtaining permission in writing 
from the copyright holder/s. The content must not be changed in any way or sold 
commercially in any format or medium without the formal permission of the 
copyright holders.
  
 When referring to this work, full bibliographic details including the author, title, 
awarding institution and date of the thesis must be given e.g.
AUTHOR (year of submission) "Full thesis title", University of Southampton, name 
of the University School or Department, PhD Thesis, pagination
http://eprints.soton.ac.uk
Figure 6.12 Illustration of how design points migrate from the initial point to the optimised points in the simulated annealing algorithm 141
Figure 6.13 AT graphs showing the effect of scaling the change in energy ΔE by its corresponding priority in the cost function 142
Figure 6.14 Automatic exploration of a three dimensional design space consisting of area, delay and power for the FRISC2 design 144
Figure 6.15 Automatic exploration of a three dimensional design space consisting of area, delay and number of nets for the FRISC2 design 145
Figure 6.16 Design spaces illustrating the effects of module expansion 145
Table 6.1 Annealing schedules for the benchmarks 124
Table 6.2 Initial implementation data for the benchmarks 125
Table 6.3 Synthesis results of benchmark designs using the comprehensive cell library 126
Table 6.4 Continuation of Table 6.3. Synthesis results of benchmark designs using the comprehensive cell library 127
Table 6.5 Comparison of benchmarks synthesized by Scholyzer and MOODS 132
Table 6.6 Comparison of systems for the PARKER benchmark 137
Table 6.7 Comparison of systems for the ELLIP benchmark 138
Table 6.8 Comparison of systems for the TSENG benchmark 139
K R Baker: 1992 1. Introduction 11
range from the highest, most abstract level, the functional (behavioural) specification, to 
the lowest, most specific level, the layout. Between the functional and layout levels are, 
in decreasing abstractness, the architectural, register-transfer, logic, and circuit levels. In 
general high-level refers to the functional through to register-transfer levels and
low-level refers to the logic through to layout levels. The earlier definition of a silicon
compiler encompasses a wide variety of tools, such as layout, logic synthesis and
high-level synthesis tools.
High-level silicon compilation comprises the following issues:
a. definition of an input and output specification, 
b. definition of an architectural model, 
c. definition of an internal representation, 
d. high-level synthesis, 
e. hardware synthesis and layout, and 
f. design space exploration. 
The remainder of this chapter introduces and discusses the issues of high-level silicon
compilation and of high-level synthesis, in particular behavioural synthesis, the main
topic of this research. The last section describes the project objectives. Further
information on the general issues of silicon compilation can be found in references
[1,6,7,8,9,10]; however, the publication dates should be borne in mind as some theories
and views may be outmoded.
The rest of this thesis is organised as follows. Chapter 2 is a literature survey of
previous high-level synthesis systems and shows the current research status in the area
of silicon compilation. It describes and compares the systems, illustrates their
drawbacks and shows how these have been overcome in this synthesis system, the MOODS
Silicon Compiler. Chapter 3 describes the input, output and architectural models chosen
and the optimisation techniques used in this system. Chapters 4 and 5 detail the
transformations and optimisation algorithms, and Chapter 6 describes the results and
compares MOODS with existing systems. Chapter 7 sums up what has been achieved
and gives suggestions for further work.
K R Baker: 1992 1. Introduction 17
scheduling whose priority function uses urgency measures based on freedoms and the 
possibilities of sharing operators. Other techniques include iterative scheduling and 
control graph partitioning using algorithmic methods such as clique partitioning as in 
Facet [26]. 
Translation changes part of a design to another more useful or efficient one at the same 
level of abstraction. For example, if a multiplier has been specified but it does not exist 
in the cell library then it may be translated into an implementable form, such as 
cascaded adders. A translation may result in an improved design by allowing for better 
scheduling or the possibility to share data path units. 
The allocation, scheduling and translation sub-problems are all interdependent. For
example, two operations which use a similar operator could share it given that they do
not occur concurrently, whereas if they do occur concurrently then the operator must be
duplicated. The sub-problems may be done in any order or simultaneously; the method
chosen will depend on the optimisation strategy adopted by the system.
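The dependence of allocation on scheduling can be illustrated with a minimal Python sketch. It assumes a hypothetical schedule that maps each operation to an operator type and a control step (the names and data layout are illustrative and are not taken from MOODS), and it ignores mutual exclusion. Operations of one type placed in the same control step cannot share a unit, so the peak concurrency per type gives the number of units required.

```python
from collections import Counter

def units_required(schedule):
    """Return the number of units needed per operator type.

    `schedule` maps an operation name to (operator_type, control_step).
    Operations of one type in the same control step run concurrently,
    so each needs its own unit; otherwise a unit can be shared.
    """
    usage = Counter()  # (operator_type, control_step) -> concurrent operations
    for op_type, step in schedule.values():
        usage[(op_type, step)] += 1
    required = {}
    for (op_type, _step), count in usage.items():
        required[op_type] = max(required.get(op_type, 0), count)
    return required

# Hypothetical example: the two additions are first concurrent, then serialised.
concurrent = {"a1": ("add", 0), "a2": ("add", 0), "m1": ("mul", 1)}
serial = {"a1": ("add", 0), "a2": ("add", 1), "m1": ("mul", 1)}
```

Rescheduling `a2` into a later control step removes the need for a second adder, at the cost of a longer schedule; this is the kind of area-speed trade-off the sub-problems have to resolve together.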
Binding fixes the result of a synthesis process. In allocation a data path unit is bound to 
a physical unit, while in scheduling the operations are bound to specific times. Binding 
can occur during or after a process. For example, if allocation is performed one unit at a
time and once only then the binding may occur with the allocation; however, if a method
such as linear integer programming is used the binding cannot occur until afterwards. Where
binding is done in the language (as in the case of many structural specifications) the 
appropriate synthesis process is not performed; this makes for a simple compiler but 
restricts the compiler's optimisation opportunities. Language bindings give the user the 
ability to perform area-speed trade-offs. An example of this is the explicit specification 
of parallelism in MacPitts [11]. 
To generate an architecture, elements must be bound to structural components and
operations to control states. The bindings performed are:
a. Operations to control states,
b. Operators to functional units,
c. Variables to storage elements, and
d. Nets to interconnects.
K R Baker: 1992 2. Literature Survey of High-Level Synthesis Systems 30
The EMUCS tool minimises one aspect of a design, its cost, by performing one 
synthesis task; allocation. The following systems also minimise one aspect of a design, 
either area or delay but in addition use limited heuristics to improve a second aspect of 
the design. 
Facet [26] (1983) is another allocator tool, again part of the CMU-DA suite of tools. 
Facet applies clique partitioning to the synthesis tasks to minimise either storage,
interconnects or operators by forming special ALU groups. The clique partitioning
algorithm uses the common neighbourhood property to produce near minimal cliques. A
graph node is a neighbour to another if an arc exists between them. If a third node is
connected to two neighbouring nodes then it is a common neighbour of the pair. A VT
input can be compacted to form an ASAP schedule by moving operations to the point
where their inputs are defined. Graphs are formulated for each synthesis problem from
the VT and clique partitioning applied. In register minimisation, for example, a
compatibility graph is constructed using lifetime analysis, where nodes represent
registers and arcs join mergeable pairs. Register minimisation gives priority to combining
registers with pure data transfers between them as this increases speed and also reduces
area. VT compaction is repeated after register clustering. The synthesis tasks are
applied to the design sequentially in a fixed order with no communication between them;
therefore the design space is not explored.
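The register minimisation step can be sketched in Python under simplifying assumptions. Two registers are taken to be mergeable when their lifetimes, given here as hypothetical (first step, last step) intervals, do not overlap. The grouping below is a plain first-fit heuristic and not Facet's common-neighbour procedure; it only shows that each group must form a clique of the compatibility graph.

```python
def compatible(a, b):
    """Two lifetimes (first_step, last_step) are mergeable if they do not overlap."""
    return a[1] < b[0] or b[1] < a[0]

def partition_registers(lifetimes):
    """Greedily group registers into cliques of the compatibility graph.

    `lifetimes` maps a register name to its (first_step, last_step) interval.
    Each returned group can be implemented as one physical register.
    """
    groups = []
    for name in sorted(lifetimes, key=lambda n: lifetimes[n]):
        for group in groups:
            # Joining requires compatibility with every member, so the
            # group remains a clique.
            if all(compatible(lifetimes[name], lifetimes[m]) for m in group):
                group.append(name)
                break
        else:
            groups.append([name])
    return groups

# Hypothetical lifetimes for four registers.
lifetimes = {"a": (0, 2), "b": (1, 3), "c": (3, 5), "d": (4, 6)}
groups = partition_registers(lifetimes)
```

For these lifetimes the four registers collapse into two groups. The sketch does not model the priority Facet gives to registers connected by pure data transfers.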
The Emerald system [35,46] (1984,6) allows the user to perform initial VT code 
compaction by either serialising operation pairs or moving an operation to another VT 
block. Other VT changes involve transformations such as converting instances of a 
counter to an adder or local incrementer. In this way alternative data paths can be 
achieved. The Facet tool is used by Emerald to synthesize the data path, where design 
costs are represented in terms of component counts, bits or gate counts. 
The Silc [4] (1985) system is similar to MacPitts. Silc performs placement and routing 
and has storage and FSM states (scheduling and two level parallelism) bound in the 
description. The Silc chip is a collection of FSMs communicating by an asynchronous 
protocol; the first level of parallelism. The second level of parallelism occurs through 
each state controlling a set of operations. Heuristics are used to improve the FSM logic 
in the sum of products form. As with MacPitts, Silc shares functional units that occur on 
different FSM states and takes mutual exclusion into account. However, in addition Silc
K R Baker: 1992 2. Literature Survey of High-Level Synthesis Systems 34
interconnects to contain an additional module. Module generators build technology 
dependent modules specified in DAA output. 
The Caddy system [57,58] (1989,90) (Carlsruhe Digital Design System) also uses rules
which form local optimisations; these use commutativity type axioms. Global
optimisations used involve folding operators, folding variables based on lifetime analysis
and loop unrolling. The optimisations are applied to data and control flow graphs which
are directly compiled from DSL, a Pascal-like language [9], and from which unnecessary
registers and data transfers have been removed. Some explicit control and parallelism is
bound in the language; however, additional parallelism is extracted. The optimisations are
applied in order to minimise area subject to timing constraints. Scheduling is performed by
list scheduling using resource limits and freedom, taking into account mutually exclusive
instructions. Register minimisation is done separately using a graph colouring approach.
A graph colouring algorithm finds the minimum number of colours such that no two
adjacent nodes share a colour. After optimisation a one to one
mapping using parameterized structure generators is performed. The output is a 
hierarchical netlist which provides an interface to other Caddy tools. 
The S(p)licer tools [27] (1986) also use a resource constraint, which is used to guide 
scheduling. Slicer creates a preliminary ASAP schedule and determines critical paths 
and operator freedom using unit times based on the fastest units. An optimised schedule 
is created a state at a time by binding operations to functional units from each ASAP 
state starting at the first. The operations in an ASAP state are ordered on increasing 
freedom (critical path first) and each assigned to the new state until the resource limit is 
reached for that state; a new state is then created. The Splicer tool uses a greedy 
algorithm to assign structural components, sharing where possible. A depth first branch 
and bound method is used to find a fair solution quickly and subsequent best solutions 
are retained. 
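The state-by-state scheduling described above can be sketched in Python as a simple list scheduler. The sketch assumes unit-time operations, a single resource limit per state rather than a limit per unit type, an acyclic dependency graph and a limit of at least one; the function and argument names are hypothetical and the code is not a reconstruction of Slicer itself.

```python
def list_schedule(deps, freedom, limit):
    """Assign operations to control states, at most `limit` per state.

    `deps` maps each operation to the operations it depends on.
    `freedom` maps each operation to its scheduling freedom (0 = critical).
    Ready operations with the least freedom are placed first.
    """
    state_of = {}
    state = 0
    while len(state_of) < len(deps):
        # An operation is ready when all predecessors sit in earlier states.
        ready = [op for op in deps
                 if op not in state_of
                 and all(p in state_of and state_of[p] < state for p in deps[op])]
        ready.sort(key=lambda op: (freedom[op], op))
        for op in ready[:limit]:
            state_of[op] = state
        state += 1  # resource limit reached or nothing left: open a new state
    return state_of

# Hypothetical data flow: d needs a and b, e needs c.
deps = {"a": [], "b": [], "c": [], "d": ["a", "b"], "e": ["c"]}
freedom = {"a": 0, "b": 0, "c": 1, "d": 0, "e": 1}
schedule = list_schedule(deps, freedom, limit=2)
```

With a limit of two operations per state, the non-critical operation `c` is deferred to the second state so that the critical operations `a` and `b` are placed first.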
Raj [59] (1986) describes a system which uses operator similarities. Operations are 
assigned to micro-instructions (conceptually similar to control steps) to form an ASAP 
schedule. Operations are delayed where similar functional units could be shared, thus 
scheduling the operations in different time steps. A hardware allocator then uses a 
greedy method which allocates operations to hardware one at a time, sharing where
possible or creating additional hardware if sharing is not possible. The allocation
K R Baker: 1992 2. Literature Survey of High-Level Synthesis Systems 42
iteratively applies transformations using a stochastic process to simultaneously 
perform the allocation and scheduling tasks. Therefore no fixed design model is 
used. 
5. Local rather than global minimum is found (EMUCS) - MOODS avoids local minima 
by using reversible transformations allowing the design to be temporarily 
degraded and thus climb out of local minima. The global minimum (or near 
minimum) is found with the aid of a global cost function. Due to their limited 
design model many systems do not have (or require) a cost function as trade-offs 
are pre-programmed in the optimisation process. 
6. Goals limited to one or two aspects or minimised aspects (HAL) - Again this is due 
to the limited design model and the lack of design evaluation through a cost 
function. By using a global cost function and stochastic process, which contains 
no pre-programmed trade-offs, MOODS can synthesise to any design aspect. The
complex interaction between design aspects and their resulting trade-offs are 
another reason why more than two goals are rarely optimised using tailored 
heuristics. Encapsulating the trade-offs in an algorithm results in a complex set 
of heuristics as for example in the Camad system. 
7. Design decisions made too early which require re-design to correct (S(p)licer) -
MOODS uses reversible transformations therefore it is never too late to correct 
an inappropriate design decision. 
8. Inaccurate evaluation or estimation of the design, that is, design goals are based on 
unit or bit counts rather than real quantities and do not consider control or 
interconnect factors (Spaid) - Although MOODS does not currently take
interconnects into account there would be no algorithm changes to do so as the 
interconnects would be part of the cost function. Real design quantities are used 
to constrain the design, for example, area in microns rather than bits and delay in 
seconds rather than control states. The design quantities in MOODS are produced 
by feeding up technology dependent information to the design evaluation 
procedures. The use of real quantities gives the designer realistic information; it 
is unlikely that he is concerned with the type of resources utilised but would like 
to know if the design will fit the chip die. 
9. Restricted control model, that is, the optimisation process is bound to one controller 
type which again pre-defines the trade-offs (Silc) - The costs associated with the 
MOODS controller implementation are included in the cost function therefore 
changes to the controller style are reflected in it and thus taken into account
during optimisation. No restrictions on parallelism are made, such as limited 
depth, therefore other controller styles can be used. 
10. Limited design space exploration - Due to fixed optimisation strategies and trade-off 
assumptions most systems can only provide design space exploration through 
manual intervention; for example, by changing the design description or available 
resources. The MOODS system can automatically explore the design space and 
provide a varied set of implementations from one description; a feature possessed
by no other system. 
The closest systems to MOODS are those described by Devadas and Newton, and Safir
and Zavidovique; however, their cost functions are fixed, whereas the MOODS
multiple objective cost function is specified by the designer. Devadas and Newton
estimate the cost of a design and Safir and Zavidovique approximate speed to the
number of time steps, both of which introduce opportunities for errors. MOODS,
however, uses technology dependent information fed up from a cell library. The use of
technology dependent information means that variations in trade-offs caused by
technology variations are also taken into account. MOODS also provides a wider range
of transformations similar to those in Camad; however, they are not applied using
pre-programmed trade-offs as in Camad.
K R Baker: 1992 3. Development of the MOODS Silicon Compiler 56
executed in concurrent sections of the graph are not contentious. Additional information 
is extracted from the control graph after its creation and before performing any 
optimisations. This information does not change during the application of 
transformations and therefore a significant reduction in computational effort can be 
made by generating and retaining the information instead of generating it each time it is 
required. There are two sets of information that can be extracted, firstly, the minimum 
feedback arc set (MFBAS) which when removed from the control graph renders it 
acyclic and secondly, mutual exclusion between instructions. 
The MFBAS is generated in two stages, firstly, taking the permanent of the adjacency 
matrix by recursive expansion and secondly, the construction of a boolean function 
which when manipulated yields the required arc set [74,75,76,77]. For large general 
graphs the computation is "hard". However, in a control graph there are few feedback 
arcs, which in addition to matrix reduction methods result in a matrix where in many 
cases the MFBAS can be directly obtained without any further algebraic manipulation. 
The reduction methods involve removing and noting self arcs which by their definition 
are feedback arcs and removing single input control nodes, where the input arc can 
never be a feedback arc as the node would be inaccessible in the acyclic graph. 
Mutual exclusion occurs between a pair of instructions that can never be executed 
concurrently, therefore the instructions may share hardware even when executed in the 
same control state as they are not executed together. For example, in Figure 3.5 
instructions i7 and i8 are mutually exclusive owing to the preceding conditional node. 
Mutual exclusion is determined for all instructions in conditional branches of the control
graph by recursively analyzing conditional nodes. For a given node, each instruction in
one branch with branch condition $S_i$ is mutually exclusive to all instructions in all other
branches with branch condition $S_j$, $j \neq i$.
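The recursive analysis of conditional nodes can be sketched in Python. The nested structure below is a hypothetical stand-in for the control graph, not a reproduction of Figure 3.5: a string is an instruction, a list is a sequence, and a tuple `("cond", [branch, ...])` is a conditional node whose branches are sequences. Concurrent (fork) sections are not modelled.

```python
from itertools import combinations

def instructions(node):
    """Return the set of instruction names contained in a node."""
    if isinstance(node, str):
        return {node}
    children = node[1] if isinstance(node, tuple) else node
    return set().union(*(instructions(c) for c in children))

def mutual_exclusions(node, pairs=None):
    """Collect unordered pairs of mutually exclusive instructions."""
    if pairs is None:
        pairs = set()
    if isinstance(node, str):
        return pairs
    if isinstance(node, tuple):
        branches = node[1]
        # Every instruction in one branch excludes every instruction
        # in each of the other branches of the same conditional node.
        for left, right in combinations(branches, 2):
            for i in instructions(left):
                for j in instructions(right):
                    pairs.add(frozenset((i, j)))
        for branch in branches:
            mutual_exclusions(branch, pairs)  # nested conditionals
    else:
        for child in node:
            mutual_exclusions(child, pairs)
    return pairs

# Hypothetical graph: i7 and i8 follow a conditional node; i9 and i10 are nested.
graph = ["i6", ("cond", [["i7"], ["i8", ("cond", [["i9"], ["i10"]])]]), "i11"]
pairs = mutual_exclusions(graph)
```

Because the result does not change while transformations are applied, it can be computed once and retained, as the text describes.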
After optimisation the control graph is likely to take on a different appearance. Many 
instructions may occur in a control state and arc conditions and conditions for firing will 
have changed.
K R Baker: 1992 3. Development of the MOODS Silicon Compiler 62
the design. The second consideration is the ability to explore the design space quickly. 
This can be done by re-synthesizing the design. The re-synthesis computation time can 
be reduced by synthesizing from the current control and data path graphs rather than the 
initial control and data path graphs. This is based on the assumption that most practical 
optimal solutions in the design space will be closer to the current design space position 
than the initial position, therefore requiring less computation to reach them from the 
current position. A third consideration is to allow the designer to manually adjust the 
implementation to include his quirks or refine the implementation. 
Algorithmic approaches using linear programming have been reported as having a 
computational explosion for even the smallest of practical designs. For multiple 
objectives integer goal programming [78] must be used. Linear programming is a special 
case of goal programming, therefore the use of goal programming would be impractical 
from the computational point of view. Other algorithmic approaches, such as clique 
partitioning, were considered and although they are applied to individual synthesis tasks, 
they can take constraints into account which allow for better results from subsequent 
synthesis tasks. However, the problem arises whereby the constraints may not be the 
best ones to achieve a particular goal in subsequent processes or there may be too many 
or too few of them. This is due to the lack of feedback from later synthesis processes to 
earlier ones and because of this a global optimum cannot be achieved. In addition, the 
constraints are not in terms of real circuit parameters such as area and delay and would 
therefore be meaningless to the user. 
Design space exploration in the algorithmic methods must be achieved by
re-synthesizing the design, normally from the initial point in the design space. In a
very few circumstances a partially synthesized design between synthesis tasks can be
used, which avoids performing the preceding tasks.
An iterative optimisation strategy is used in MOODS as it overcomes the problems 
described above. Iterative optimisation is achieved by breaking the synthesis tasks into a 
number of local transformations, some associated with allocation and others with 
scheduling and translation. This allows the simultaneous consideration of the synthesis 
tasks which is recognised as being extremely difficult [79] and can result in complex 
algorithms or simplified design models. However any implementation can be obtained
K R Baker: 1992 5. Design Optimisation 95
must be achieved. An achievement function $a_1$ is considered better than $a_2$ if the first
non-zero component of $a_1 - a_2$ is negative, given that all components of $a_1$ and $a_2$ are
non-negative.
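This comparison is a lexicographic ordering and can be sketched in a few lines of Python. The function name is hypothetical; the vectors are assumed to be ordered from the highest priority to the lowest and to have equal length.

```python
def is_better(a1, a2):
    """Return True if achievement vector a1 is better than a2.

    Components are ordered from highest to lowest priority and are assumed
    non-negative. a1 is better when the first non-zero component of
    a1 - a2 is negative.
    """
    for x, y in zip(a1, a2):
        if x != y:
            return x < y
    return False  # identical vectors: neither is better
```

A lower-priority component only matters when all higher-priority components are equal, so `is_better([0, 5, 9], [0, 7, 1])` holds even though the last component is worse. Python tuples compare in the same lexicographic way, so `tuple(a1) < tuple(a2)` gives an equivalent test.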
The MOODS cost function is a variation of the goal programming achievement function
and allows any objective to be associated with a priority. As objectives measured in
different units are not directly comparable, the function of deviation variables, $G_k(n,p)$, is
replaced by a vector of $g(n,p)$ functions for the objectives associated with priority $k$;
therefore, $G_k(n,p)$ becomes:
$$G_k(n,p) = \bigl(g_{k1}(n_1,p_1), \ldots, g_{km}(n_m,p_m)\bigr)$$
where each $g_{ki}(n_i,p_i)$ represents an objective given by equation (5.2) above.
In synthesis not only is it required to know whether one cost function is better than
another but also by how much. Typically the two cost functions are $cf_{next}$ and $cf_{pres}$,
representing the cost functions for the next and present designs respectively. The
difference between the two cost functions, $\Delta E$, represents the change in energy between
the functions and thus the design. $\Delta E$ is determined by constructing a third vector, $E$,
the energy change vector, whose elements represent the change in energy at each
priority, thus:
$$E = \bigl(E_1(g_1), \ldots, E_k(g_k), \ldots\bigr) \qquad (5.6)$$
where $E_k(g_k)$ is the combined objective change for priority $k$. For each $g_{ki}$ in $cf_{next}$ and
$cf_{pres}$ the change between them, $\Delta c_{ki}$, is evaluated and for each priority level, $k$, their
average is taken, thus:
$$E_k(g_k) = \frac{\sum_{i=1}^{m} \Delta c_{ki}}{m} \qquad (5.7)$$
where $m$ is the number of objectives at priority $k$. The function used to determine $\Delta c_{ki}$
may take into account other objective factors such as initial or target values; however,
their difference was initially taken, thus:
$$\Delta c_{ki} = g_{ki}(n_i,p_i)_{next} - g_{ki}(n_i,p_i)_{pres} \qquad (5.8)$$
K R Baker: 1992 5. Design Optimisation 101
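The construction of the energy change vector in equations (5.6) to (5.8) can be sketched in Python. The sketch assumes that the change for each objective is its value in the next design minus its value in the present design, and that the element for each priority is the mean of those changes; the data layout and the objective values are hypothetical, and every priority level is assumed to hold at least one objective.

```python
def energy_change(cf_next, cf_pres):
    """Return the energy change vector E, one element per priority level.

    Each cost function is a list indexed by priority; each entry is the list
    of objective values at that priority.
    """
    E = []
    for g_next, g_pres in zip(cf_next, cf_pres):
        deltas = [n - p for n, p in zip(g_next, g_pres)]  # assumed form of (5.8)
        E.append(sum(deltas) / len(deltas))               # average, as in (5.7)
    return E

# Hypothetical values: two objectives at priority 1, one at priority 2.
cf_pres = [[8.0, 4.0], [10.0]]
cf_next = [[6.0, 5.0], [12.0]]
E = energy_change(cf_next, cf_pres)
```

Here the priority 1 objectives improve on average while the priority 2 objective degrades. How the elements of E are combined into a single change in energy is not covered by this sketch.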
It is worth noting that the length of the critical path can be found by performing stage 1 
of the critical path analysis and is given by the maximum end time of the end nodes in 
the control graph. All four stages are performed as the slack information may be helpful 
during optimisation in selecting the transformation data. 
In the MOODS synthesis system the design may be optimised either manually or
automatically using an iterative optimisation algorithm. In the manual method the
user applies transformations by hand to improve the design. The user has access to
the cost function and evaluation routines as a guide; however, it is essential that the
user can visualise the design as it is being optimised. This requires the user to draw the
initial implementation and update it as transformations are successfully applied, which
is a laborious process when optimising a complete design. Nevertheless, the manual
option is essential for making adjustments to an already optimised design, either to
improve it or to include some of the designer's quirks.
Iterative optimisation consists of selecting transformations and applying them to the 
design in such a way that the user's criteria are met. The method of selecting and 
applying transformations constitutes the optimisation algorithm. There are two 
approaches to iterative optimisation, namely tailored and adaptive heuristics [81]. In the 
tailored heuristic approach the cost function is analyzed and a transformation chosen and 
applied depending on the current position of the design within the design space relative 
to the required position set by the user's objectives. For example, if the user has set an 
area objective which has not yet been met then a transformation which performs an area 
reduction is selected. This approach is used in Camad [23] and Chippe [69]. The 
adaptive heuristic method arbitrarily selects a transformation and its effect on the design 
is estimated. The transformation is applied depending upon the analysis of the 
estimation with respect to the user's objectives. This approach is used by Devadas and 
Newton [42]. 
There are advantages to both approaches. For example, the tailored heuristic approach
guarantees an improvement with each iteration; when no improvement occurs the
optimisation ends. However, this leads to local minima. The adaptive heuristic approach
K R Baker: 1992 6. Results 117
The other annealing parameters ($T_{step}$ and $I_{step}$) are difficult to determine; however, an
estimate of the total number of iterations, given by equation (6.3), can be made by
examining the cost of improvements measure.
$$I_{total} = \left(1 + \frac{T_{start} - T_{end}}{T_{step}}\right) I_{step} \qquad (6.3)$$
The set of temperature-cost graphs in Figure 6.3 show how the cost of improvements
(the lower curve) and cost of degradations (the upper curve) vary with parameters $T_{step}$
and $I_{total}$ for the FRISC1 benchmark design. The values chosen for $T_{start}$ and $T_{end}$
were 200 and 0 respectively. $T_{start}$ was chosen to include the freezing point which was
previously found to be 100. The values for $T_{step}$ were 2, 10 and 20 for the left, middle
and right columns of graphs respectively and $I_{step}$ was chosen such that the total number
of iterations, $I_{total}$, applied to the design was approximately 10000, 15000 and 20000 for
the top, middle and bottom rows of graphs respectively. The resulting costs shown were
summed over a range of temperature steps equal to the largest step used thereby giving
comparable graphs. The cost curves were then smoothed as shown by the solid curves.
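The arithmetic behind the annealing schedule can be checked with a short Python sketch. It assumes that equation (6.3) multiplies the number of temperature points visited, one plus the temperature range divided by the step, by the number of iterations per temperature step. This form is an assumption, but it is consistent with the values recoverable from Figure 6.3, where a step of 20 with 1364 iterations per step gives about 15000 iterations in total. The function and parameter names are hypothetical.

```python
def total_iterations(t_start, t_end, t_step, i_step):
    """Assumed form of equation (6.3): temperature points times iterations per point."""
    return (1 + (t_start - t_end) / t_step) * i_step

def iterations_per_step(t_start, t_end, t_step, i_total):
    """The same relation rearranged for the iterations per temperature step."""
    return i_total / (1 + (t_start - t_end) / t_step)

# Values quoted in the text: start temperature 200, end temperature 0.
fifteen_thousand = total_iterations(200, 0, 20, 1364)
twenty_thousand = total_iterations(200, 0, 20, 1810)
```

With a step of 2 the range from 200 to 0 is visited at 101 temperature points, so a budget of 10000 iterations allows roughly 99 iterations at each point.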
In optimising a design to a particular cost function the design cost will be reduced by a 
particular value equal to the distance between the initial and optimised design points in 
the design space. The costs at temperature point T=0 are a good aid in determining 
whether sufficient iterations have been performed in order to optimise the design. If 
transformations were applied to the design at this point (only improvements are applied 
at T=0) then the design may not be optimal. On the other hand if sufficient iterations 
have been performed then few improvements will be possible at T=0. This is 
demonstrated in Figure 6.3 where in the graphs of the top row insufficient iterations 
have been performed thus the design was still being improved at T=0. As the total 
number of iterations is increased the bulk of the improvements to the design (shown by 
a bulge between the smoothed curves) occur at a higher temperature, with improvements 
at lower temperatures being applied only to counteract the effect of degradations. The 
value of $I_{total}$ where the number of improvements applied at T=0 becomes a minimum is
the minimum total number of iterations required to optimise the design. This value
appeared to be independent of the size of $T_{step}$; however, the quality of the design was
not. Using this method the minimum value of $I_{total}$ was determined for each benchmark
and entered into the table of annealing schedules, Table 6.1 on page 124. Given $I_{total}$ and
a value for $T_{step}$, $I_{step}$ was calculated using equation (6.3).
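The role of the iterations applied at T=0 can be illustrated with a minimal annealing loop in Python. The sketch is an illustration under stated assumptions and not the MOODS algorithm: it uses the standard Metropolis acceptance rule, a stepwise linear temperature reduction, and a toy one-dimensional cost in place of a design cost function. All names are hypothetical. The loop records, for each temperature, how many improving moves were applied.

```python
import math
import random

def anneal(cost, neighbour, state, t_start, t_end, t_step, i_step, seed=0):
    """Stepwise-cooled annealing; returns the final state and, per temperature,
    the number of improving moves that were applied."""
    rng = random.Random(seed)
    improvements = {}
    t = t_start
    while t >= t_end:
        applied = 0
        for _ in range(i_step):
            candidate = neighbour(state, rng)
            delta = cost(candidate) - cost(state)
            if delta < 0:
                state = candidate
                applied += 1
            elif t > 0 and rng.random() < math.exp(-delta / t):
                state = candidate  # degradation or neutral move accepted
        improvements[t] = applied
        t -= t_step
    return state, improvements

def cost(x):
    """Toy cost with its minimum at x = 7 (assumption for illustration)."""
    return abs(x - 7)

def neighbour(x, rng):
    """Toy transformation: move one unit in a random direction."""
    return x + rng.choice((-1, 1))

final, improvements = anneal(cost, neighbour, 50, 200, 0, 20, 200)
```

At T=0 the acceptance test admits improvements only, so a large value of `improvements[0]` suggests that the design was still being improved when the schedule ended and that more iterations are needed.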
[Figure 6.3: grid of temperature-cost graphs, each with temperature on the horizontal axis. Recoverable panel titles: $T_{step}$ = 20, $I_{step}$ = 1364 and $T_{step}$ = 20, $I_{step}$ = 1810.]
Figure 6.3 Variation in cost curves with different step and iteration values.
Total number of iterations are 10000, 15000 and 20000 for the top, middle and bottom rows respectively.
K R Baker: 1992 6. Results 129
thought that due to the similar reduction in registers caused by bypassing and sharing, 
register optimisation can be performed as a separate synthesis task; however this is not 
the case as the optimisation opportunities for registers are highly dependent on the 
scheduling of instructions. 
The number of multiplexers used in an implementation has increased in the optimised 
implementations due to the increase in unit sharing shown by the total number of 
functional units. In comparing the area of units used in a design the total number of bits 
is a good representation of the area used. For multiplexers the number of inputs is also 
an important figure. The units which make up each design have been selected in the 
implementations such that a greater proportion of fast cells are used in the delay 
optimised design than in the area optimised design. 
An attempt has been made to determine how good a design is without reference to the 
user's objectives. The measures of goodness (MOG) give a guide as to how
well the clock period, registers and units are utilised and are similar to other utilization
measures [28,70]. For the clock period this consists of analyzing the slack time in each
control state. The slack time is given by the difference between the end time of the 
maximum instruction graph and the clock period. The register and unit usages are 
determined by calculating the ratio of the number of control steps during which they are 
in use, to the critical path length. The unit usages are scaled by the probability of 
execution given by the conditional branch probabilities to allow for the possibility of 
mutually exclusive instructions sharing functional units. As a consequence of this the 
unit usage figures are small. In all designs the clock period is better utilised in the delay 
optimised implementations as would be required in achieving a fast design; similarly the 
unit usage should reflect the optimisation of the area, however this is not apparent due 
to the small unit usage figures. In all except the Kalman filter design the register usage 
is better for the area optimised implementations. This is not due to register sharing as 
indicated by similar register figures for both implementations but due to the increased 
critical path length which would increase the register active times compared to their 
inactive times. 
To determine the effect of inline expansion two design descriptions which use modules,
FRISC1 and KALMANIO, were transformed such that the modules were expanded
inline resulting in the FRISC2 and KALMANIO2 descriptions. The modules in the
K R Baker: 1992 6. Results 144
[Figure 6.14: scatter plots of the explored design points. Recoverable axis names: delay and power.]
Figure 6.14 Automatic exploration of a three dimensional design space consisting of
area, delay and power for the FRISC2 design.
power trade-offs thus indicating that the method of calculation has a greater influence on 
the correlation of criteria than the cells. Another three dimensional design space was 
generated using the area, delay and number of nets criteria. Again, a high correlation 
between the area and the number of nets was expected as the number of nets is closely 
related to the number of registers and shared units in the design. The resulting design 
space is shown in Figure 6.15. 
As well as showing the range of designs and trade-offs between design aspects, the 
exploration of the design space can also be used to illustrate the effect of changes to the 
design description. As an example, module inline expansion was shown by the results of 
Table 6.3 to improve the design; this improvement can also be illustrated graphically 
using the design space. Figure 6.16 shows the design spaces for the descriptions without 
their modules expanded inline (left) and with them expanded inline (right). 
The KALMANIO design has modules which are only called once; therefore, 
expanding the modules creates no additional units. However, removing the 
module boundaries generates further optimisation opportunities, since optimisations 
between modules cannot occur while the boundaries are in place. The lack of additional 
units and the increased optimisation opportunities caused by module expansion are shown 
in the KALMANIO2 design space by the similar spread of design points, which have been 
shifted towards the origin. In contrast to the KALMANIO design, the FRISC design calls 
its modules more than once; their expansion therefore creates additional units which can be 
used in further optimisations. This is shown by the increased range of implementations in 
the FRISC2 design space. The design points are closer to the origin, as in the 
KALMANIO2 design space, due to the removal of the module boundaries. 
K R Baker: 1992 7. Conclusions and Future Work 149
opportunities [44]. Experiments in Section 6.1 have shown that iterations applied at T=0 
are an important fine refinement process and that the temperature reduction method is 
critical to the design quality. 
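The role of the T=0 iterations and of the temperature reduction method can be illustrated with a minimal sketch. It is not the MOODS implementation: the geometric schedule, the parameter names (t_start, alpha, moves_per_temp, zero_temp_moves) and the toy cost function are assumptions chosen only for illustration.

```python
import math
import random


def anneal(cost, neighbour, start, t_start=10.0, alpha=0.9,
           moves_per_temp=50, t_min=1e-3, zero_temp_moves=200, seed=0):
    """Simulated annealing with an assumed geometric schedule and a T=0 phase."""
    rng = random.Random(seed)
    current, current_cost = start, cost(start)
    best, best_cost = current, current_cost
    t = t_start
    while t > t_min:
        for _ in range(moves_per_temp):
            candidate = neighbour(current, rng)
            delta = cost(candidate) - current_cost
            # Metropolis criterion: improvements are always accepted,
            # degradations with probability exp(-delta / t).
            if delta <= 0 or rng.random() < math.exp(-delta / t):
                current, current_cost = candidate, current_cost + delta
                if current_cost < best_cost:
                    best, best_cost = current, current_cost
        t *= alpha  # temperature reduction step (geometric, by assumption)
    # T=0 phase: only improving moves are accepted (fine refinement).
    current, current_cost = best, best_cost
    for _ in range(zero_temp_moves):
        candidate = neighbour(current, rng)
        candidate_cost = cost(candidate)
        if candidate_cost < current_cost:
            current, current_cost = candidate, candidate_cost
    return current, current_cost


# Toy problem (hypothetical): minimise (x - 7)^2 over the integers.
x, c = anneal(cost=lambda v: (v - 7) ** 2,
              neighbour=lambda v, rng: v + rng.choice((-1, 1)),
              start=50)
print(x, c)
```

Changing alpha or setting zero_temp_moves to 0 in this sketch gives a feel for how the schedule and the final refinement affect the result on a given cost function.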
The comparison of the results with those generated by other systems illustrates the 
importance of knowing all the design data in order to do a proper comparison; for 
example, a short clock period does not necessarily mean a fast implementation. The 
conditions for synthesis are also important in the comparison of systems; for example, 
some systems require the user to select the hardware from which to build an 
implementation, whereas other systems, including MOODS, select their own hardware. 
This should be taken into account when comparing systems as by specifying hardware 
the user is biasing, or in some cases binding, the hardware used in the final 
implementation. 
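As a worked illustration of the first point, assume that the execution time of an implementation is approximately the clock period multiplied by the number of clock cycles on the critical path. The figures below are hypothetical and are not taken from the benchmark results.

```python
def execution_time_ns(clock_period_ns, critical_path_cycles):
    """Approximate execution time as clock period times cycles (an assumption)."""
    return clock_period_ns * critical_path_cycles


# Hypothetical implementation A: short clock period, long critical path.
time_a = execution_time_ns(clock_period_ns=20, critical_path_cycles=100)

# Hypothetical implementation B: longer clock period, short critical path.
time_b = execution_time_ns(clock_period_ns=50, critical_path_cycles=30)

print(time_a, time_b)  # 2000 1500
```

Implementation B has the longer clock period yet finishes sooner, which is why the clock period alone is not sufficient for a comparison.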
Although the execution time of MOODS is longer than that of other systems, the 
improved variety and quality of the resulting implementations are considered more 
important. A design time of one minute or one hour is still far shorter than that of a hand 
crafted design. 
The results of Section 6.4 show that design space exploration is an important aspect of 
designing by high-level synthesis and in the development of synthesis systems. The 
MOODS synthesis system includes an efficient method for automated design space 
exploration. It allows the designer to obtain a perspicuous characterization of the design 
space for a design and thus allows him to investigate alternative designs and determine 
whether a design can satisfy a variety of simultaneous constraints. The design spaces 
show that there are many near optimal implementations for a given description. A 
characterized design space can be used, as shown in Section 6.4, to investigate the effect 
of changes to the synthesis system. A design space shows graphically the impact of 
system changes, giving a better overall view of their effect than is obtained by 
individually synthesising designs. 
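One way to summarise a characterized design space is to keep only the non-dominated points and then test them against simultaneous constraints. The sketch below assumes that every criterion is to be minimised; the (area, delay) values and the constraint limits are hypothetical, and this is not the exploration method used by MOODS.

```python
def dominates(a, b):
    """True if point a is no worse than b in every criterion and better in one."""
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))


def pareto_front(points):
    """Return the points that are not dominated by any other point."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical (area, delay) design points.
designs = [(10, 90), (12, 70), (15, 72), (20, 40), (25, 41), (30, 35)]

front = pareto_front(designs)
print(front)  # [(10, 90), (12, 70), (20, 40), (30, 35)]

# Check simultaneous constraints: area <= 22 and delay <= 75 (assumed limits).
feasible = [p for p in front if p[0] <= 22 and p[1] <= 75]
print(feasible)  # [(12, 70), (20, 40)]
```

Points close to the front, such as (15, 72) and (25, 41) here, correspond to the near optimal implementations mentioned above.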
The exploration of design spaces illustrates that there is a high correlation between some 
criteria, in particular area, power and the number of nets. This fact could be put to use 
in some systems to optimise criteria not explicitly optimised by the system; however, it 
would require an experienced designer to know how one criterion is related to another. 
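The degree to which two criteria move together across a set of design points can be quantified with the Pearson correlation coefficient. The sketch below is illustrative only: the area, nets and delay values are hypothetical, and the thesis does not state that this particular measure was used.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Hypothetical criteria values for five implementations of one design.
area = [100, 120, 150, 180, 210]
nets = [40, 47, 61, 70, 85]
delay = [90, 70, 65, 50, 45]

r_area_nets = pearson(area, nets)    # close to +1: criteria rise together
r_area_delay = pearson(area, delay)  # negative: a trade-off between criteria
print(r_area_nets, r_area_delay)
```

A coefficient near +1 suggests that optimising one criterion also tends to optimise the other, whereas a negative value indicates a trade-off.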
Extra tools which would aid the designer in producing an implementation could include 
automated test structure and test pattern generation [95] and automated methods for 
finding the annealing schedule or other adaptive heuristic parameters. Additional 
flexibility would be achieved if the designer could define operators to be used in the 
description that could subsequently be optimised by the system. At present user defined 
modules can be described; however, their instances are not optimised. 
Currently the user interface for the MOODS system is a textual one. Once initiated the 
system creates an initial implementation and displays the MOODS prompt. The user 
issues commands at the MOODS prompt and data scrolls up the screen. When satisfied 
with the implementation the user exits the system whereupon output files are created. 
An improvement could be made in the presentation of the implementation to the 
designer. At present the data and control paths of the implementation are described in a 
number of output files and their nodes can be examined from the MOODS prompt. 
Alternative methods of design representation could be either textual, by back-annotation 
of the implementation to the original behavioural description, or graphical. 
Textual back-annotation would be useful as the designer could see how the original 
description had been altered by the system, and it could also be useful in the development 
of new algorithms. Graphical output would also be valuable in order for the designer to 
visualise the implementation. By using platforms with graphical interfaces the user could 
watch the design being optimised and interact with it to produce the required 
implementation. 
K R Baker: 1992 Appendix B: ELLA Simulation Example 168
optimised design the number of functional units is 31, twice that in the area optimised 
design (1-2 operations per functional unit). The reduction in sharing is accompanied by 
(traded off against) a decrease in critical path length to 95. 
K R Baker: 1992 References 181
1 Goldberg, A V - Hirschhorn, S S - Lieberherr, K J, "Approaches Toward Silicon 
Compilation.", IEEE Circuits and Devices, May 1985, pp. 29-39. 
2 VLSI Design, Staff, "Silicon Compilers. Part 1: Drawing a Blank.", VLSI 
Design. September 1984. 
3 Allen, Jonathan, "Performance-Directed Synthesis of VLSI Systems.", Proc. 
IEEE, Vol. 78, No. 2. February 1990. pp 336-355. 
4 Blackman, Timothy - Fox, Jeffrey - Rosebrugh, C, "The SILC Silicon Compiler: 
Language and Features.", Proc. 22nd DAC. 1985 IEEE. Paper 17.1, pp. 232-237. 
5 Hartley, Richard I - Jasica, Jeffrey R, "Behavioural to Structural Translation in a 
Bit-Serial Silicon Compiler.", IEEE Trans. on CAD. Vol 7. No 8. August 1988. 
pp. 877-886. 
6 Parker, Alice C, "Automated Synthesis of Digital Systems.", IEEE Design & 
Test, November 1984. pp. 75-81. 
7 Werner, Jerry [editor], "Progress Toward the 'Ideal' Silicon Compiler. Part 1: 
The Front End.", VLSI Design. September 1983. 
8 Werner, Jerry [editor], "Progress Toward the 'Ideal' Silicon Compiler. Part 2: 
The Layout Problem.", VLSI Design. October 1983. 
9 Camposano, Raul, "Synthesis Techniques for Digital Systems Design.", Proc. 
22nd DAC, 1985 IEEE. pp. 475-481. 
10 Gajski, Daniel D - Dutt, Nikil D - Pangrle, Barry M, "Logic Design and Silicon 
Compilation for VLSI Design. Silicon Compilation (A Tutorial).", Proc. IEEE 
1986 Custom IC Conf. NY. May 1986, pp. 102-110. 
11 Southard, Jay R, "MacPitts: An Approach to Silicon Compilation.", IEEE 
Computer, December 1983, pp. 74-82. 
12 Bergmann, Neil, "A Case Study of the F.I.R.S.T Silicon Compiler.", Third 
Caltech Conf. VLSI, March 1983, pp. 413-430. 
13 Walker, Robert A - Thomas, Donald E, "Behavioural Level Transformations in 
the CMU-DA System.", Proc. of the 20th DAC, ACM/IEEE, Miami, FL, 
June 1983. 
14 Thomas, Donald E - Blackburn, Robert L - Rajan, J, "Linking the Behavioural 
and Structural Domains of Representation for Digital Systems Design.", IEEE 
Trans. CAD, Vol. 6, No. 1, 1987. pp. 103-110. 
15 Walker, Robert A - Thomas, Donald E, "Design Representation and 
Transformation in the System Architect's Workbench.", Proc. Int. Conf. 
Computer Design (ICCD) 1987. pp. 166-9. 
16 Werner, Jerry [editor], "The Silicon Compiler: Panacea, Wishful Thinking, or 
Old Hat?", VLSI Design, Vol. 3, No. 5, Sept/Oct 1982, pp. 46-52. 
17 Hauge, Peter S - Nair, Ravi - Yoffa, Ellen J, "Circuit Placement For Predictable 
Performance.", Proc. Int. Conf. Computer Design (ICCD) 1987. pp. 88-91. 
18 McFarland, Michael C, "On Proving the Correctness of Optimising 
Transformations in a Digital Design Automation System.", Proc. 18th DAC, 
IEEE Comp. Soc. DATC, June 1981, pp. 90-97. 
19 Haroun, Baher S - Elmasry, Mohamed I, "Architectural Synthesis for DSP 
Silicon Compilers.", IEEE Trans. on CAD. Vol 8. No 4. April 1989, 
pp. 431-447. 
20 Bergamaschi, Reinaldo A, "The Development of a High Level Synthesis System 
for Concurrent VLSI Systems.", PhD Thesis. Southampton University. 
December 1988. 
21 Peng, Zebo, "Synthesis of VLSI Systems with the CAMAD Design Aid.", Proc. 
23rd DAC, 1986 IEEE. pp. 278-284. 
22 Peng, Zebo, "A Formal Approach to the Synthesis of VLSI Systems From Their 
Behavioural Descriptions.", Proc. 19th Hawaii Int. Conf on Sys. Sci, January 
1986. pp. 160-7. 
23 Peng, Zebo, "A Formal Methodology for Automated Synthesis of VLSI 
Systems.", PhD Thesis. Linkoping University, 1987. 
24 Parker, Alice C - Mlinar, Mitch - Pizarro, Jorge, "MAHA: A Program for 
Datapath Synthesis.", Proc. 23rd DAC, Las Vegas, July 1986, pp. 461-466. 
25 Paulin, P G - Knight, J P - Girczyc, E F, "HAL: A Multi-Paradigm Approach to 
Automatic Data Path Synthesis.", Proc. 23rd DAC, July 1986, pp. 263-270. 
26 Tseng, Chia-Jeng - Siewiorek, Daniel P, "Facet: A Procedure for the Automated 
Synthesis of Digital Systems.", Proc. 20th DAC, 1983 IEEE. pp. 490-496. 
27 Pangrle, Barry M - Gajski, Daniel D, "State Synthesis and Connectivity Binding 
for Microarchitecture Compilation.", IEEE 1986. pp. 210-213. 
28 Zimmermann, G, "The MIMOLA Design System: A Computer Aided Digital 
Processor Design Method.", Proc. 16th Design Automation Conf., June 1979, 
pp. 53-58. 
29 Girczyc, E F - Knight, J P, "An ADA to Standard Cell Hardware Compiler 
Based on Graph Grammars and Scheduling.", Proc. ICCD, 1984. October 1984. 
30 Knapp, David W - Parker, Alice C, "The ADAM Design Planning Engine.", 
IEEE Trans. on CAD Vol. 10 No. 7, July 1991. pp. 829-846. 
31 McFarland, Michael C, "Reevaluating the Design Space for Register-Transfer 
Hardware Synthesis.", Proc. Int. Conf. Computer Design (ICCD) 1987. 
pp. 262-265. 
32 Park, Nohbyung - Parker, Alice C, "Sehwa: A Software Package for Synthesis of 
Pipelines from Behavioral Specifications.", IEEE Trans. on CAD. Vol 7. No 3. 
March 1988. pp. 356-370. 
33 Silvar Lisco, "CAL-MP 10 - General Overview.", Document No: MO 14-3, 
March 1984. 
34 Jain, Rajiv - Mlinar, Mitchell J - Parker, Alice, "Area-Time Model for Synthesis 
of Non-Pipelined Designs.", Proc. Int. Conf. Computer Design (ICCD) 1988. 
pp. 48-51. 
35 Tseng, Chia Jeng - Siewiorek, Daniel P, "Emerald: A Bus Style Designer.", Proc. 
21st Design Automation Conf., June 1984. 
36 Johannsen, D L - McElvain, K - Tsubota, S K, "Intelligent Compilation.", VLSI 
Syst. Design. Vol. 8, No. 4, pp. 40-46. April 1987. 
37 Johannsen, D, "Bristle Blocks: A Silicon Compiler.", Proc. 16th DAC June 1979, 
pp. 310-313. 
38 Siskind, J M - Southard, J R - Crouch, K W, "Generating Custom High 
Performance VLSI Designs from Succinct Algorithmic Descriptions.", Proc. 
Conf. Advanced Research in VLSI, January 1982, pp. 28-40. 
39 Claesen, L - Catthoor, F - Goossens, G - et al, "Automatic Synthesis of Signal 
Processing Benchmark using the CATHEDRAL Silicon Compilers.", Draft 
version 22/1/88, Proc. IEEE 1988 CICC. 
40 Jamier, R - Jerraya, A A, "APOLLON, A Data-Path Silicon Compiler.", IEEE 
Circuits and Devices Magazine, May 1985. 
41 Choi, Y H, "Synthesis of pipelined data paths.", CAD Butterworth. Vol. 24, 
No. 1, January 1992. pp. 36-40. 
42 Devadas, Srinivas - Newton, A. Richard, "Algorithms for Hardware Allocation in 
Data Path Synthesis.", IEEE Trans. on CAD. Vol. 8, No. 7. July 1989. 
pp. 768-81. 
43 Hitchcock, Charles Y - Thomas, Donald E, "A Method of Automatic Data Path 
Synthesis.", Proc. 20th DAC. 1983 IEEE. pp. 484-489. 
44 Hafer, Louis J - Parker, Alice C, "Automated Synthesis of Digital Hardware.", 
IEEE Trans. on Computers, Vol. 31, No. 2, February 1982. p93. 
45 Parker, Alice C - Hafer, Lou, "The Application of a Hardware Description 
Language for Design Automation.", January 1978, pp. 349-355. 
46 Tseng, Chia Jeng - Siewiorek, Daniel P, "Automated Synthesis of Data Paths in 
Digital Systems.", IEEE Trans. CAD, Vol. 5, pp. 379-395, July 1986. 
47 Allerton, D J - Batt, D A - Currie, A J, "Second Progress Report: Silicon 
Compiler Project.", University of Southampton, Dept. of Electronics. April 1983. 
48 Jong, Ivan C. C., "SCHOLAR User Manual (v1.0).", Southampton University, 
Dept. Electronics and Comp. Sci. July 1988. 
49 Camposano, Raul - van Eijndhoven, J T J, "Combined Synthesis of Control 
Logic And Data Path.", Proc. Int Conf Computer Design (ICCD) 1987. 
pp. 327-329. 
50 Bendas, J B, "Design Through Transformation.", Proc. 20th DAC, Miami, FL. 
June 1983. 
51 Hong, Youn Sik - Park, Kyu Ho - Kim, Myunghwan, "Automatic Synthesis of 
Data Paths based on the Path-Search Algorithm.", Proc. Int Conf Computer 
Design (ICCD) 1987. pp. 270-273. 
52 Nagle, Andrew W - Cloutier, Richard - Parker, Alice C, "Synthesis of Hardware 
for the Control of Digital Systems.", Trans. CAD of ICs and Systems, No. 4, 
October 82, pp. 201-12. 
53 Nagle, Andrew W - Parker, Alice C, "Algorithms for Multiple-Criterion Design 
of Micro- Programmed Control Hardware.", Proc. 18th DAC, (Nashville, TN), 
June 1981. pp. 486-493. 
54 Girczyc, E F - Buhr, R J A - Knight, J P, "Applicability of a Subset of Ada as 
an Algorithmic Hardware Design Language for Graph-Based Hardware 
Compilation.", IEEE Trans on CAD, Vol. CAD-4, No. 2, April 1985. 
55 Kowalski, T J - Thomas, D E, "The VLSI Design Automation Assistant: What's 
in a Knowledge Base.", DAC, 1985, pp. 252-258. 
56 Kowalski, T J - Geiger, D J - Wolf, W - Fichtner W, "The VLSI Design 
Automation Assistant: From Algorithms to Silicon.", IEEE Design & Test, 
August 1985, pp. 33-43. 
57 Camposano, Raul - Rosenstiel, Wolfgang, "Synthesizing Circuits From 
Behavioural Descriptions.", IEEE Trans. on CAD. Vol 8. No 2. February 1989. 
pp. 171-180. 
58 Kramer, Heinrich - Rosenstiel, Wolfgang, "System Synthesis using Behavioural 
Descriptions.", IEEE Proc. of EDAC, 25-28 February 1990, pp. 277-282. 
59 Raj, Vijay K, "Another Automated Data Path Designer.", IEEE 1986. 
60 Paulin, P G - Knight, J P, "Force-Directed Scheduling in Automated Data Path 
Synthesis.", Proc. 24th ACM/IEEE DAC 1987, pp. 195-202. 
61 Paulin, P G - Knight, J P, "Extended Design-Space Exploration in Automatic 
Data Path Synthesis.", Proc. Canadian Conf. on VLSI, October 1986, 
pp. 221-226. 
62 Camposano, Raul - Bergamaschi, Reinaldo A, "Redesign Using State Splitting.", 
EDAC 1990, Glasgow, March. IBM, Research Report. 
63 Camposano, Raul, "Path-Based Scheduling for Synthesis.", IEEE Trans on CAD, 
Vol 10, No 1, January 1991. pp. 85-94. 
64 Balakrishnan, M - Majumdar, A - Banerji, D - et al, "Allocation of Multiport 
Memories in Data Path Synthesis.", IEEE Trans. on CAD. Vol 7. No 4. April 
1988. pp. 536-540. 
65 Tseng, Chia Jeng - Siewiorek, Daniel P, "The Modeling and Synthesis of Bus 
Systems.", Proc. 18th DAC, June 1981 IEEE. pp. 471-478. 
66 Rajan, Jayanth V - Thomas, Donald E, "Synthesis By Delayed Binding Of 
Decisions.", 22nd Design Automation Conf. IEEE. 1985. pp. 367-373. 
67 Lagnese, E D - Thomas, D E, "Architectural Partitioning for System Level 
Synthesis of Integrated Circuits.", IEEE Trans on CAD, Vol. 10, No 7, July 
1991. pp. 847-860. 
68 Bushnell, M L - Director, S W, "ULYSSES: An expert-system based VLSI 
design environment.", in Proc. ISCAS 85, 1985. 
69 Brewer, Forrest - Gajski, Daniel, "Chippe: A System for Constraint Driven 
Behavioral Synthesis.", IEEE Trans. on CAD. Vol 9. No 7. July 1990. 
pp. 681-695. 
70 Safri, A - Zavidovique, B, "Towards a Global Solution to High Level Synthesis 
Problems.", IEEE Proc. of EDAC, 25-28 February 1990, pp. 283-288. 
71 Morison, J D - Peeling, N E - Thorp, T L, "ELLA: A Hardware Description 
Language.", IEEE conf. on Circuits and Computers, September 1982. 
72 Morison, J D - Peeling, N E - Whiting, E V, "Sequential Programming 
Extensions to ELLA, with Automatic Transformation to Structure.", Proc. ICCD 
1987, pp. 571-576, Rye Brook, NY, October 1987. 
73 Baker, Keith R, "The ELLA to ICODE Interface.", Southampton University, 
Dept. Electronics & Comp. Sci. June 1990. 
74 Younger, D H, "Minimum Feedback Arc Sets for a Directed Graph.", IEEE 
Trans. Circuit Theory. June 1963. pp. 238-245. 
75 Lempel, A - Cederbaum, I, "Minimum Feedback Arc and Vertex Sets of a 
Directed Graph.", IEEE Trans. Circuit Theory, December 1966. pp. 399-403. 
76 Yau, S S, "Generation of all Hamiltonian Circuits, Paths and Centers of a Graph, 
and Related Problems.", IEEE Trans. Circuit Theory, March 1967. pp 79-80. 
77 Divieti, L - Grasselli, A, "On the Determination of Minimum Feedback Arc and 
Vertex Sets.", IEEE Trans. Circuit Theory, March 1968. pp. 87-89. 
78 Ignizio, James P, "Goal Programming and Extensions.", Lexington Books. 1976. 
79 McFarland, Michael C - Parker, Alice C - Camposano, Raul, "Tutorial on 
High-Level Synthesis.", Proc. IEEE 25th DA Conf., 1988, pp. 330-336. 
80 Koelmans, A M - Burns, F P - Kinniment, D J, "Use of a Theorem Prover for 
Transformational Synthesis.", Computing and Control Division, 21st January 
1991. No: 1991/014. 
81 Nahar, Surendra - Sahni, Sartaj - Shragowitz, E, "Simulated Annealing and 
Combinatorial Optimization.", IEEE, 23rd DAC 1986. Paper 16.1, p. 293. 
82 Carre, Bernard, "Graphs and Networks.", Oxford University Press 1979. 
83 Ullman, J D - Aho, A V - Sethi, R, "Compilers: Principles, Techniques and 
Tools.", Addison-Wesley, Mass, 1986. 
84 Moder, Joseph J - Phillips, Cecil R - Davis, E W, "Project Management with 
CPM, PERT and Precedence Diagramming. 3rd edition.", Van Nostrand 
Reinhold Company. 
85 Rutenbar, Rob A, "Simulated Annealing Algorithms: An Overview.", IEEE 
Circuits and Devices Mag. January 1989. pp. 19-26. 
86 Kirkpatrick, Scott - Gelatt Jr., C D - Vecchi, M P, "Optimization by Simulated 
Annealing.", Science. 13 May 1983, Volume 220, No. 4598. pp. 671-680. 
87 Kirkpatrick, Scott, "Optimization by Simulated Annealing: Quantitative Studies.", 
Jrnl of Statistical Phys, Vol 34, Nos 5/6, 1984. pp. 975-986. 
88 Metropolis, N - Rosenbluth, A - Teller A & E, "Equation of State Calculations 
by Fast Computing Machines.", Jr. Chem. Phys., Vol 21. p. 1087. 1953. 
89 Nahar, S - Sahni, S - Shragowitz, E, "Experiments with Simulated Annealing.", 
22nd Design Automation Conference, 1985, pp. 748-752. 
90 Microelectronics Centre of Northern California, "High-Level Synthesis Workshop 
Benchmarks.", 1989, 1991. 
91 Baker, Keith R, "The MOODS Synthesis System - User Manual V2.0.", 
Southampton University, Dept. Electronics & Comp. Sci. October 1992. 
92 Leive, G - Thomas, D, "A Technology Relative Logic Synthesis and Module 
Selection System.", Proc. 18th DAC. IEEE Comp Soc DATC, June 1981, 
pp. 479-85. 
93 Knapp, David W, "Datapath Optimization Using Feedback.", IEEE Proc. of 
EDAC, 12-15 March 1991, pp. 129-134. 
94 Hands, J P, "What is VHDL?", CAD, Butterworth. Vol 22. No 4. May 1990. 
pp. 246-9. 