Layout-driven allocation for high level synthesis by Wu, Allen C.H. & Gajski, Daniel D.
UC Irvine
ICS Technical Reports
Permalink
https://escholarship.org/uc/item/4zc6k948
Authors
Wu, Allen C.H.
Gajski, Daniel D.
Publication Date
1991-04-09
 
Peer reviewed
Layout-Driven Allocation
for High Level Synthesis
Allen C-H. Wu
Daniel D. Gajski
Technical Report #91-30 
April 9, 1991 
Dept. of Information and Computer Science 
University of California, Irvine 
Irvine, CA 92717 
(714) 856-8059 
Abstract 
We propose a hypergraph model and a new algorithm for hardware allocation. The
use of a hypergraph model facilitates the identification of sharable resources and the
calculation of interconnect costs. Using the hypergraph model, the algorithm performs
interconnect optimization by taking into account the interdependent relationships
among three allocation subtasks (register, operation, and interconnect allocation)
simultaneously. Previous algorithms considered these three tasks serially. Another
novel contribution of our algorithm is the exploration of the design space by trading off
storage units and interconnects. We also demonstrate that traditional cost functions
using the number of registers and the number of mux inputs cannot guarantee
minimal area. To rectify the problem, we introduce a new layout area cost function
and compare it to the traditional cost functions. Our experiments show that
our algorithm is superior to previously published algorithms under traditional cost
functions.

TABLE OF CONTENTS
1. Introduction
2. The allocation problem
2.1 Hypergraph formation
2.2 Layout-area cost function
2.3 Interchange optimization
2.3.1 Hyperedge merging by node relocation
2.3.2 Hyperedge merging by node swapping
2.3.3 Interchange under global considerations
2.4 The overall algorithm
3. Results and Discussions
4. Conclusions
5. Acknowledgements
6. References

LIST OF FIGURES
Figure 1. Hypergraph formation: (a) Data flow graph and schedule, (b) Variable
assignments, (c) Hypergraph of (a), (d) Hypergraph after hyperedge merging.
Figure 2. Hyperedge merging.
Figure 3. Hyperedge merging by node relocation (a) before, (b) after.
Figure 4. Group relocation.
Figure 5. Node swapping.
Figure 6. The data transfer model.
Figure 7. Interchange based on FU-REG-FU data path.
Figure 8. The 17-step Elliptic Filter example: (a) Schedule, (b) Structure, and
(c) Variable and operation assignments.
Figure 9. The 19-step Elliptic Filter example: (a) Schedule, (b) Structure, and
(c) Variable and operation assignments.
Figure 10. The layout of a 16-bit 21-step Elliptic Filter example.
Figure 11. The relationships between area costs and actual areas.
LIST OF TABLES
Table 1. The area results of the Elliptic Filter example.
Table 2. Allocation results of the Elliptic Filter example.

1. Introduction 
The purpose of high-level synthesis is to transform a behavioral description into a 
multistage register transfer design. Commonly, high-level synthesis consists of two phases: (i) 
scheduling and (ii) allocation. In the scheduling phase, operations are scheduled and assigned 
to the control steps to satisfy timing and resource constraints. In the allocation phase, 
operations are assigned to functional units, variables are assigned to storage elements, and 
required data transfers in each control step are assigned to interconnect units between 
functional units and storage elements. In this paper, we focus on the allocation problem. 
Typically, unit allocation is divided into three subtasks: (i) register allocation, (ii) operator 
assignment, and (iii) connection allocation. In the register allocation phase, all variables are 
bound to a set of registers. Variables with nonoverlapping lifetimes can share the same 
register. In the operation assignment phase, operations are assigned to functional units such 
that no more than one operation within the same control step is assigned to the same 
functional unit. In the connection allocation phase, the communication paths (busses and 
multiplexers) are chosen so that the functional units and registers are connected to perform the 
required data transfers. The primary objective of unit allocation is to minimize the total 
hardware cost and to satisfy the design goal.
Most systems perform the three allocation subtasks separately [2,3,4,5,7,8,9,10]. Since these
three subtasks are interdependent, the result of one subtask may prevent other subtasks from
finding an optimal solution. Hence, some other systems [1,13,15,16,17] allocate functional
units, registers, and interconnect units simultaneously. Traditionally, the design quality is
evaluated using the number of registers and selector (mux) inputs. However, since the 
relationships between the structural design and the physical design have not been established, 
the traditional design quality measurement may not reflect the real physical design. The 
approach described in this paper is different from previous approaches in three ways. First, we 
use a hypergraph model of design that facilitates the identification of sharable components and 
the calculation of interconnect costs. Second, using the hypergraph model, we formulate 
allocation as a partitioning problem. The partitioning problem is solved by using an 
interchange optimization technique which performs the three interdependent subtasks
simultaneously. Third, we use a new layout-based cost function [20] to predict the real
physical design. Moreover, this approach allows trading off storage against interconnections.
The remainder of this paper is organized as follows: Section 2 discusses the hypergraph
model, the interconnect minimization technique, and the allocation algorithm. Section 3 
presents our experimental results. Finally, Section 4 summarizes our approach. 
2. The allocation problem 
The objective of allocation is to assign operations to functional units and to assign 
variables to storage elements such that the storage and interconnect costs are minimized for a 
given set of functional units and storage elements. In this section, we first describe the 
hypergraph formation and the layout-area cost function. Then, we describe the hyperedge 
merging technique for interconnect minimization. Finally, we present the complete allocation 
algorithm. 
2.1. Hypergraph formation 
Let $G=(V,E)$ denote a data flow graph (DFG), where $V=\{v_i \mid i=1..n\}$ is a set of operation
nodes, and $E=\{e_{jm} \mid j=1..n, m=1..n\}$ is a set of dependency edges. $Z=\{z_k \mid k=1..q\}$ is a set of
variables in the DFG, where each variable corresponds to a set of edges as
$z_k=\{e_{jm} \mid j=1..n, m=1..n\}$.
Let $H=(V,E)$ denote a hypergraph with $V=V_{io} \cup V_{reg} \cup V_{op}$, in which there are three types of
hypernodes: (i) input/output, (ii) register, and (iii) operation. $V_{io}=\{v_p \mid p=1..t\}$ is a set of i/o
hypernodes which denote the input and output ports in the DFG. $V_{reg}=\{v_g \mid g=1..r\}$ denotes a
set of register hypernodes, with $v_g=\{z_k \mid k=1..q\}$. Each register hypernode denotes a register
which contains a set of variables. $V_{op}=\{v_c \mid c=1..s\}$ denotes a set of operation hypernodes, with
$v_c=\{v_i \mid i=1..n\}$. Each operation hypernode denotes a single functional unit or a
multi-functional unit such as an adder/subtracter, multiplier, shifter, or ALU. Each operation
hypernode contains a set of operation nodes in the DFG.
Let $E=\{e_{gc} \mid v_g, v_c \in V\}$ denote a set of hyperedges, and let $w(e_{gc})$ be the weight of $e_{gc}$. Each
hyperedge denotes the physical connection between two hypernodes, which can be two
functional units, a functional unit and a register, or two registers. The weight of a hyperedge is
the number of dependency edges between two hypernodes, which can also be viewed as the
number of variables (signals) communicated between the two hypernodes. In addition, the
hyperedge direction depends on the data flow between hypernodes. For example, a
hyperedge $e_{12}$ is connected from $v_1$ to $v_2$, so $e_{12}$ is an outgoing hyperedge of $v_1$ and
an incoming hyperedge of $v_2$. Since certain hypernode inputs are non-commutable, each
hyperedge uses a flag to indicate the input position of the connecting hypernode.
For example, three registers, one functional unit +, and one functional unit * are given to
perform data transfer for the data flow graph and its schedule shown in Figure 1(a). The
variable assignment is shown in Figure 1(b). Operations mult1 and mult2 are assigned to the
functional unit * and operations add1, add2, and add3 are assigned to the functional unit +.
Figure 1(c) shows an example of hypergraph formulation from the data flow graph shown in
Figure 1(a). There is a set of input/output hypernodes $V_{io}=\{v_1,v_2,v_3,v_9,v_{10}\}$, and a set of three
register hypernodes $V_{reg}=\{v_4,v_5,v_6\}$ where $v_4$ and $v_5$ consist of three variables, and $v_6$ consists of
two variables: $v_4=\{a,d,h\}$, $v_5=\{b,e,g\}$, and $v_6=\{c,f\}$. In addition, there are two operation
hypernodes $V_{op}=\{v_7,v_8\}$ where $v_7=\{mult1,mult2\}$, $v_8=\{add1,add2,add3\}$, type($v_7$)=multiplier, and
type($v_8$)=adder.

Figure 1. Hypergraph formation: (a) Data flow graph and schedule,
(b) Variable assignments, (c) Hypergraph of (a), (d) Hypergraph after
hyperedge merging.
Variable assignments in (b), given as variable (birth time - death time):
R1: a (0-1), d (1-3), h (3-4)
R2: b (0-2), e (2-3), g (3-4)
R3: c (0-2), f (2-3)
Hyperedge labels in (c): e47_1=(d,mult1,left), e47_2=(d,mult2,left),
e58_1=(b,add1,right), e58_2=(b,add2,left), e58_3=(e,add3,left).

At the time of hypergraph formation, each dependency edge is mapped to a pair of
hyperedges. The mapping of a dependency edge to hyperedges consists of two parts: (i)
from the source node to a register and (ii) from the register to the destination node. For example, $e_{11}$ in
Figure 1(a) is mapped to: (i) $e_{14}$ from $v_1$ to $v_4$ and (ii) $e_{48}$ from $v_4$ to $v_8$ in Figure 1(c).
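
The two-part mapping can be illustrated with a short Python sketch. The record types
DepEdge and Hyperedge, the helper map_edge, the port label "in" for a register input, and
the assumption that the edge carries variable a into the left input of add1 are all
introduced here for illustration; only the hypernode names v1, v4, and v8 come from the
example above.

```python
from typing import NamedTuple

class DepEdge(NamedTuple):
    src: str    # producing operation or input port in the DFG
    var: str    # variable carried by the dependency edge
    dst: str    # consuming operation in the DFG
    port: str   # input position at the consumer: "left" or "right"

class Hyperedge(NamedTuple):
    src: str    # source hypernode
    dst: str    # destination hypernode
    var: str    # variable carried by the hyperedge
    port: str   # input position at the destination hypernode

def map_edge(edge, reg_of, unit_of):
    """Split one dependency edge into (source -> register, register -> destination)."""
    reg = reg_of[edge.var]
    to_register = Hyperedge(unit_of[edge.src], reg, edge.var, "in")
    from_register = Hyperedge(reg, unit_of[edge.dst], edge.var, edge.port)
    return to_register, from_register

# Assumed bindings: variable a is stored in register hypernode v4, it arrives
# from the input hypernode v1, and add1 is bound to the adder hypernode v8.
reg_of = {"a": "v4"}
unit_of = {"in_a": "v1", "add1": "v8"}

e14, e48 = map_edge(DepEdge("in_a", "a", "add1", "left"), reg_of, unit_of)
print(e14)  # hyperedge from v1 to v4
print(e48)  # hyperedge from v4 to v8
```

Keeping the carried variable and the input position on each hyperedge is a design choice
of this sketch; it makes the later merging test a simple comparison of fields.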
After forming the hypergraph, the algorithm performs hyperedge merging to reduce 
interconnect cost. Hyperedge merging is important because it contributes to interconnect 
sharing and selector reduction, which will be described in the next section. Before merging, 
each hyperedge is labeled as a right data input or a left data input as shown in Figure 2. 
Hyperedge merging consists of two cases: (i) merging hyperedges on hypernodes with 
commutable inputs and (ii) merging hyperedges on hypernodes with non-commutable inputs. 
In both cases, two hyperedges can be merged if and only if: (i) they have the same source and
destination hypernodes, and (ii) they enter the same input of the destination hypernode.
Consider Figure 2(a). If the dependency edges $e_1$(op1_right) and $e_2$(op2_right) are connected
from $z_3$ and $z_4$ of the node reg3 to the right inputs of op1 and op2 respectively, then
$e_1$(op1_right) is mapped to the hyperedge $e_1$, and $e_2$(op2_right) is mapped to the hyperedge $e_2$.
Since $e_1$ and $e_2$ are both connected to the right input of FU, they can
be merged. In Figure 2(b), the two hyperedges $e_1$(op1_right) and $e_2$(op2_left) enter different
inputs of the operation nodes ($e_1$ enters the right input of op1 and $e_2$ enters the left input of
op2). If the operation hypernode inputs are commutable, these hyperedges can
be commuted first and then merged into one hyperedge. On the other hand, in Figure 2(c),
$e_1$(op1_right) enters the right input of op1 and $e_2$(op2_left) enters the left input of op2. If the
operation hypernode inputs are not commutable, then these two hyperedges cannot be
merged. Figure 2(d) shows that, since each register hypernode has only one input, the incoming
hyperedges of a register hypernode that come from the same operation hypernode can be merged.

Figure 2. Hyperedge merging.

By applying hyperedge merging to the example of Figure 1(c), two hyperedges
e47_1=(d,mult1,left) and e47_2=(d,mult2,left) can be merged into $e_{47}$ with $w(e_{47})=2$ as shown in
Figure 1(d). Since operation hypernode $v_8$ is an adder with commutable inputs, three
hyperedges e58_1, e58_2, and e58_3 can be merged into $e_{58}$ with $w(e_{58})=3$.
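
As a rough illustration of the merging rule, the following Python sketch groups
hyperedges by source, destination, and input position, and ignores the input position
when the destination has commutable inputs. The function name merge_hyperedges and the
tuple encoding are assumptions of this sketch. It is also a simplification: it does not
track the fact that commuting an operation exchanges both of its operands.

```python
from collections import Counter

def merge_hyperedges(edges, commutable_units):
    """Merge hyperedges given as (source, destination, input position).

    Returns a dict mapping each merged hyperedge to its weight, i.e. the
    number of original hyperedges folded into it.
    """
    weights = Counter()
    for src, dst, port in edges:
        # Commutable inputs can be exchanged, so the input position no
        # longer separates hyperedges that share source and destination.
        key_port = "any" if dst in commutable_units else port
        weights[(src, dst, key_port)] += 1
    return dict(weights)

# Hyperedges of the Figure 1(c) example that are named in the text.
edges = [
    ("v4", "v7", "left"),   # e47_1 = (d, mult1, left)
    ("v4", "v7", "left"),   # e47_2 = (d, mult2, left)
    ("v5", "v8", "right"),  # e58_1 = (b, add1, right)
    ("v5", "v8", "left"),   # e58_2 = (b, add2, left)
    ("v5", "v8", "left"),   # e58_3 = (e, add3, left)
]
merged = merge_hyperedges(edges, commutable_units={"v8"})
print(merged)
```

With only v8 marked as commutable, the sketch yields one hyperedge from v4 to v7 of
weight 2 and one from v5 to v8 of weight 3, in line with the weights quoted above.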
2.2. Layout-area cost function 
Using the hypergraph model, we first describe the interconnect cost for each hypernode in
terms of the number of selectors and the number of inputs of selectors. We use a single-level
interconnect model. If a hypernode has more than one incoming hyperedge on one of its input
ports, then this input port needs a selector to select one input from several sources. This
selector can be implemented with a mux or a bus with tri-state devices. For example, the left
input port of FU in Figure 2(b) has two incoming hyperedges from two registers reg1 and reg3.
Thus, a two-input selector is needed for this input port. On the other hand, if an input port
has only one incoming hyperedge, then a selector is not needed. Such a hyperedge is called the
minimal-cost hyperedge. For example, the right input port of FU in Figure 2(b) has a
minimal-cost incoming hyperedge from reg2, so this input port does not need a selector.
Furthermore, when an input port needs a selector, the number of inputs of this selector is equal
to the number of incoming hyperedges on this input port.
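
The selector rule of the single-level interconnect model can be sketched in Python as
follows. The function name selector_inputs and the tuple encoding of hyperedges are
assumptions of this sketch; the example data restate the Figure 2(b) situation described
above.

```python
from collections import Counter

def selector_inputs(hyperedges):
    """Return {(hypernode, input port): selector inputs} for ports needing a selector.

    hyperedges is an iterable of (source, destination, input port) tuples,
    taken after hyperedge merging. A port with a single incoming hyperedge
    needs no selector and is left out of the result.
    """
    fan_in = Counter((dst, port) for _, dst, port in hyperedges)
    return {port: n for port, n in fan_in.items() if n > 1}

# Figure 2(b) as described in the text: the left input of FU is fed by reg1
# and reg3, the right input only by reg2.
edges = [("reg1", "FU", "left"), ("reg3", "FU", "left"), ("reg2", "FU", "right")]
selectors = selector_inputs(edges)
print(selectors)
```

Only the left input port appears in the result, with two selector inputs; the right
port carries a minimal-cost hyperedge and therefore no selector.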
To take into account the physical design effects, the area cost function is based on a
bit-sliced stack architecture [11,18,19], which uses abutment to connect different bit slices, and
over-the-cell routing for connecting different units inside one bit slice. The stack grows
horizontally when the bit-width increases, and grows vertically when the number of units
increases. Using this layout architecture, the total area cost is the sum of four parts [20]:
(1) functional unit area, (2) register area, (3) interconnect unit area, and (4) wiring area. We
use transistor counts as a measure of area consumption. The transistor counts of the
functional units and registers can be estimated by examining the component library [18]. The
number of transistors in a selector is proportional to the number of inputs of the selector, which
can also be obtained from the component library. We use the sliced layout architecture [11],
which has 13 over-the-cell routing tracks for each bit slice. If fewer than 13 routing tracks
are required, then no wiring area is needed. The overall area cost is calculated as follows:
$$A_{total} = c_1 \left( \sum_{k=1}^{n} trs(FU_k) + \sum_{j=1}^{m} trs(Reg_j) + \sum_{i=1}^{p} trs(Sel_i) \right) + A_{wire}$$
where
$trs(FU_k)$ is the number of transistors in functional unit $k$;
$trs(Reg_j)$ is the number of transistors in register $j$;
$trs(Sel_i)$ is the number of transistors in selector $i$;
$c_1$ is the transistor area coefficient (area per transistor), which correlates to the layout
technology and the layout system;
$A_{wire}$ is the wiring area.
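
The cost function translates directly into code. The Python sketch below is only an
illustration: the function name total_area is an assumption, the wiring area is passed
in as a precomputed value because its model is defined in [20] rather than here, and all
transistor counts and the coefficient are made-up numbers, not library data.

```python
def total_area(fu_trs, reg_trs, sel_trs, c1, wire_area=0.0):
    """Evaluate A_total = c1 * (sum trs(FU) + sum trs(Reg) + sum trs(Sel)) + A_wire.

    fu_trs, reg_trs, sel_trs: transistor counts of the functional units,
    registers, and selectors; c1: area per transistor; wire_area: A_wire,
    which is zero when the over-the-cell tracks suffice.
    """
    return c1 * (sum(fu_trs) + sum(reg_trs) + sum(sel_trs)) + wire_area

# Hypothetical design: one adder, one multiplier, three registers, one selector.
area = total_area(
    fu_trs=[400, 1200],     # hypothetical transistor counts
    reg_trs=[160, 160, 160],
    sel_trs=[64],
    c1=1.5,                 # hypothetical area per transistor
)
print(area)
```

Because the selector term enters the same sum as the register term, adding a register is
worthwhile under this cost whenever it removes more selector transistors than it adds
register transistors and wiring.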
2.3. Interchange optimization 
In the interchange optimization phase, the algorithm minimizes the interconnect cost by 
merging hyperedges. In the following sections, we first describe two possible ways to merge the 
hyperedges by interchanging variables and operations: (i) relocation and (ii) swapping. Then, 
we discuss the interchange technique under a global consideration. 
2.3.1. Hyperedge merging by node relocation
The first possible way for hyperedge merging is to relocate variables among register
hypernodes or operations among operation hypernodes. A variable $z_k$ can be relocated from a
source register hypernode $v_{reg\_source}$ to a destination register hypernode $v_{reg\_dest}$ if and only if
$v_{reg\_dest}$ is free during the lifetime of $z_k$. An operation $v_i$ can be relocated from a source
operation hypernode $v_{op\_source}$ to a destination hypernode $v_{op\_dest}$ if and only if: (i) $v_{op\_dest}$ can
perform the same function as $v_i$, and (ii) there does not exist an operation $v_j$ in $v_{op\_dest}$ such
that $v_j$ is assigned to the same control step as $v_i$. We term the above conditions relocation
preconditions. A node relocation can be performed if and only if the relocation preconditions
are satisfied; such a relocation is called a feasible relocation.

Figure 3. Hyperedge merging by node relocation (a) before, (b) after.

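
The relocation preconditions can be expressed as two small predicates. In the Python
sketch below, the function names, the (birth, death) encoding of lifetimes, and the
control steps assigned to the operations are assumptions; the lifetimes themselves are
those listed in Figure 1(b). A variable that dies in the step where another is born is
treated as compatible, as a and d are in that figure, and multi-cycle operations are
not modelled.

```python
def can_relocate_variable(var, dest_vars, lifetime):
    """True if the destination register is free during the lifetime of var."""
    birth, death = lifetime[var]
    return all(
        death <= lifetime[other][0] or lifetime[other][1] <= birth
        for other in dest_vars
    )

def can_relocate_operation(op, dest_ops, dest_functions, op_type, step):
    """True if the destination unit (i) implements the function of op and
    (ii) holds no operation scheduled in the same control step as op."""
    return op_type[op] in dest_functions and all(
        step[other] != step[op] for other in dest_ops
    )

# Lifetimes from Figure 1(b); v6 holds the variables c and f.
lifetime = {"b": (0, 2), "c": (0, 2), "f": (2, 3), "h": (3, 4)}
print(can_relocate_variable("h", {"c", "f"}, lifetime))  # h starts after f ends
print(can_relocate_variable("b", {"c", "f"}, lifetime))  # b overlaps c

# Hypothetical operation types and control steps.
op_type = {"add1": "add", "add2": "add", "add3": "add", "mult1": "mul"}
step = {"add1": 1, "add2": 2, "add3": 1, "mult1": 1}
print(can_relocate_operation("add1", {"add2"}, {"add"}, op_type, step))
print(can_relocate_operation("add1", {"add3"}, {"add"}, op_type, step))
```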
For a hypernode, the interconnect cost can be reduced by merging its
incoming hyperedges. For example, consider the register hypernode $v_3$ in Figure 3. $v_3$ has two
incoming hyperedges $e_{13}$ and $e_{23}$ from $v_1$ and $v_2$ respectively. Since $v_3$ has to select one input
from two sources $v_1$ and $v_2$, a 2-input selector is required for $v_3$ (Figure 3(a)). If node b of $v_2$
can be moved to $v_1$, then $e_{13}$ and $e_{23}$ can be merged into $e_{123}$ (Figure 3(b)). As a result, $v_3$ does
not need a selector for its input. The hyperedge merging in this case results in interconnect
reduction.
The node relocation also allows group relocation, which relocates more than one node
simultaneously. For example, in Figure 4(a), $e_{13}$ can be merged with $e_{14}$ by relocating a group
of nodes $v_3$ and $v_4$ from $v_3$ to $v_4$ (Figure 4(b)), or by splitting the group and relocating its
nodes to different hypernodes as shown in Figure 4(c).

Figure 4. Group relocation.

2.3.2. Hyperedge merging by node swapping
The second possible way for hyperedge merging is to swap variables between
register hypernodes or to swap operations between operation hypernodes. Node swapping
can be viewed as a two-way node relocation problem. In node relocation, nodes are
relocated from the same source hypernode to one or more
destination hypernodes if there exists a feasible relocation. Node swapping, on the other hand,
is performed when a one-way feasible relocation from a source hypernode to a destination
hypernode cannot be found, but a feasible relocation can be created by rearranging the nodes
in the destination hypernode. For example, in Figure 5(a), assuming $v_2$ and $v_3$ are the
operation hypernodes, $e_{12}$ and $e_{13}$ can be merged by relocating $v_5$ from $v_3$ to $v_2$. If $v_3$ in $v_2$ is
assigned to the same control step as $v_5$, then $v_5$ cannot be relocated from $v_3$ to $v_2$. However,
$e_{12}$ and $e_{13}$ can be merged by swapping $v_3$ and $v_5$ as shown in Figure 5(b).
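
A swap can be checked as two relocations, each evaluated with the partner node removed
from its unit. The Python sketch below illustrates this for operations; the function
names fits and can_swap, the operation names, and the control steps are hypothetical and
merely mimic the situation of Figure 5, where the one-way move is blocked by a control
step conflict.

```python
def fits(op, unit_ops, unit_functions, op_type, step):
    """True if op can be placed in a unit holding unit_ops."""
    return op_type[op] in unit_functions and all(
        step[other] != step[op] for other in unit_ops
    )

def can_swap(op_a, ops_a, fn_a, op_b, ops_b, fn_b, op_type, step):
    """True if op_a (in unit A) and op_b (in unit B) can trade places.

    Each operation is tested against the other unit with its swap partner
    taken out, which is what makes a blocked one-way move feasible.
    """
    return fits(op_a, ops_b - {op_b}, fn_b, op_type, step) and fits(
        op_b, ops_a - {op_a}, fn_a, op_type, step
    )

# Hypothetical data: o3 sits in unit U2 and o5 in unit U3, both in step 2.
op_type = {"o1": "add", "o3": "add", "o5": "add", "o6": "add"}
step = {"o1": 1, "o3": 2, "o5": 2, "o6": 3}
ops_u2, ops_u3 = {"o1", "o3"}, {"o5", "o6"}
adders = {"add"}

one_way = fits("o5", ops_u2, adders, op_type, step)
swap = can_swap("o5", ops_u3, adders, "o3", ops_u2, adders, op_type, step)
print(one_way, swap)
```

Here the one-way relocation of o5 into U2 fails because o3 occupies the same control
step, while exchanging o3 and o5 is feasible.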
Figure 5. Node swapping.

2.3.3. Interchange under global considerations
In general, a data path can be viewed as follows: a functional unit fetches data from a set
of storage elements via a set of interconnect units, performs the data computations, then stores
the data back to the storage elements via a set of interconnect units as shown in Figure 6(a).
Thus, a data path forms a closed loop relationship among the storage elements, functional units, and
interconnect units. Because of this closed loop relationship, reducing the interconnect cost for
a register or a functional unit by rearranging the variables or operations might increase the
interconnect cost for other functional units or registers so that the total interconnect cost is
unchanged or even increased.
To take into account the interdependent relationships between operation and register
assignments, our interchange algorithm evaluates the node rearrangements by cutting the data
path loop into two basic forms: (i) register-functional unit-register (Figure 6(c)) and (ii)
functional unit-register-functional unit (Figure 6(b)). In case (i), when the algorithm tries to
reduce the interconnect cost of a functional unit node by rearranging the operations, the
algorithm will take into account the register interconnect cost which will be affected by the
node rearrangement. In case (ii), when the algorithm tries to reduce the interconnect cost of a
register node by rearranging the variables, the algorithm will take into account the functional
unit interconnect cost, which also will be affected by the node rearrangement.

Figure 6. The data transfer model.

Based on this global consideration, the algorithm tries to rearrange variables in the
registers and operations in the functional units from a global scope so that the hyperedges can
be merged, which results in a reduction of selector inputs. An example is shown in Figure 7(a).
$e_{36}$ and $e_{46}$ can be merged by relocating variable $z_2$ from $v_3$ to $v_4$ so that the number of selector
inputs of $v_6$ is reduced by 1. However, after relocating $z_2$ from $v_3$ to $v_4$, $e_{13}$ has to be split into
two hyperedges $e_{13}$ and $e_{14}$ as shown in Figure 7(b), so that the number of selector inputs of $v_4$
is increased by 1. Thus, the total number of selector inputs is unchanged. However, if there
is a feasible relocation moving operation $v_2$ from $v_1$ to $v_2$, and this node relocation will not increase the
overall selector inputs, then the algorithm finds a solution to achieve overall interconnect
reduction. As a result, the algorithm relocates $z_2$ and $v_2$ to $v_4$ and $v_2$ respectively so that the
overall interconnect cost is reduced (Figure 7(c)).

Figure 7. Interchange based on FU-REG-FU data path.

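
The effect of judging a move globally rather than locally can be shown with a small
Python sketch. The edge sets below are a hypothetical reading of Figure 7, since only
the hyperedges named in the text are known; the function name total_selector_inputs, the
extra hyperedge from v3 to v5, and the port labels are assumptions.

```python
from collections import defaultdict

def total_selector_inputs(edges):
    """Sum the selector inputs over every input port of the hypergraph.

    edges is a set of (source, destination, input port) tuples. A port with
    a single source needs no selector and contributes nothing.
    """
    sources = defaultdict(set)
    for src, dst, port in edges:
        sources[(dst, port)].add(src)
    return sum(len(s) for s in sources.values() if len(s) > 1)

# (a) Initial state: v6 selects between the registers v3 and v4.
before = {("v1", "v3", "in"), ("v2", "v4", "in"),
          ("v3", "v6", "left"), ("v4", "v6", "left"), ("v3", "v5", "left")}
# (b) z2 moved from v3 to v4: e36 merges into e46, but e13 splits into e13, e14.
variable_only = {("v1", "v3", "in"), ("v1", "v4", "in"), ("v2", "v4", "in"),
                 ("v4", "v6", "left"), ("v3", "v5", "left")}
# (c) The producing operation also moved from unit v1 to unit v2.
variable_and_operation = {("v1", "v3", "in"), ("v2", "v4", "in"),
                          ("v4", "v6", "left"), ("v3", "v5", "left")}

costs = [total_selector_inputs(e)
         for e in (before, variable_only, variable_and_operation)]
print(costs)
```

Under these assumptions the variable relocation alone leaves the total unchanged, and
only the combined variable and operation relocation lowers it, which is the behaviour
the global evaluation is meant to detect.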
2.4. The overall algorithm 
We assume that allocation is performed after scheduling with the available functional 
units and control time steps given. The algorithm consists of three phases: (i) initial register 
and operation assignments, (ii) hypergraph formation, and (iii) interchange optimization. 
In the first phase, the algorithm determines the lifetime for all variables and sorts them in
descending order according to their life span. The algorithm then determines the lower bound
of required registers by using a left edge algorithm [6]. Starting with the minimal number of
required registers, the algorithm assigns operations to the given functional units randomly such 
that no more than one operation within the same control step will be assigned to the same 
functional unit. In the second phase, the algorithm transforms the data flow graph into a 
hypergraph. 
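
For reference, a textbook form of the left edge algorithm is sketched below in Python.
It sorts variables by birth time, which is the usual formulation and not necessarily the
exact ordering used in our implementation; the function name left_edge and the
(birth, death) encoding are assumptions. The lifetimes are those of Figure 1(b).

```python
def left_edge(lifetimes):
    """Pack variables into registers with the classic left edge algorithm.

    lifetimes maps each variable to (birth, death). A variable may reuse a
    register in the step in which the previous occupant dies. Returns a list
    of registers, each a list of variable names.
    """
    pending = sorted(lifetimes, key=lambda v: (lifetimes[v][0], v))
    registers = []
    while pending:
        register, leftover = [], []
        last_death = None
        for var in pending:
            birth, death = lifetimes[var]
            if last_death is None or birth >= last_death:
                register.append(var)   # fits after the current occupant
                last_death = death
            else:
                leftover.append(var)   # retry in the next register
        registers.append(register)
        pending = leftover
    return registers

lifetimes = {"a": (0, 1), "d": (1, 3), "h": (3, 4),
             "b": (0, 2), "e": (2, 3), "g": (3, 4),
             "c": (0, 2), "f": (2, 3)}
registers = left_edge(lifetimes)
print(registers)
```

The sketch needs three registers for this example, matching the three register
hypernodes of Figure 1, although the grouping of variables inside the registers may
differ from the assignment shown there.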
In the final phase, the algorithm minimizes the interconnect cost by interchanging the 
operations and the variables. The algorithm minimizes the incoming hyperedges of register 
hypernodes by relocating variables between register hypernodes. In addition, the algorithm 
minimizes the incoming hyperedges of operation hypernodes by relocating operations between 
operation hypernodes. A hypernode that has more than one incoming hyperedge is called a
feasibly mergeable hypernode, and the incoming hyperedges of this hypernode are called 
feasibly mergeable hyperedges. For a feasibly mergeable hyperedge, the algorithm locates a set 
of variables or operations associated with this hyperedge, rearranges the nodes, and evaluates 
the area cost using the cost function described in Section 2.2. If a lower area cost is
obtained, then a feasible merging solution has been found. The algorithm performs the
allocation iteratively, incrementing the number of registers in each run to trade off register and
interconnect costs. Within each allocation run, the interchange optimization repeats until no
further improvement can be found.
Algorithm 1 Hardware Allocation
Let
    reg_count denote a given arbitrary number;
    P={e_h | h=1..w} denote a set of feasible merging hyperedges;
    Z={z_k | k=1..q} denote a set of variables;
    F denote a set of given functional units;
    R={r_a | a=1..b} denote a set of registers;
    T denote a set of control steps for {v_i | i=1..n} in the data flow graph;
Hardware_Allocation(G,F,Z,T,reg_count){
    /* determine the lower bound of required registers */
    R = left_edge_alg(Z);
    while (reg_count > 0){
        /* initial register and operation assignments */
        init_reg_op_assignment(G,T,F,R);
        /* build hypergraph */
        H = build_hypergraph(G,F,R);
        /* interchange optimization */
        no_more_improve = FALSE;
        while (no_more_improve == FALSE){
            P = locate_feasible_merging_hyperedge(H);
            a_gain_merging = FALSE;
            for (h = 1 to w){
                /* relocate nodes associated with hyperedge e_h and evaluate the area cost */
                gain = relocate_node(e_h);
                if (gain == TRUE){
                    rearrange_node(H,e_h);
                    a_gain_merging = TRUE;
                }
            }
            if (a_gain_merging == FALSE)
                no_more_improve = TRUE;
        }
        Output netlist and total area cost;
        /* add one register for the next allocation run */
        reg_count = reg_count - 1;
        if (reg_count > 0)
            R = R U {r};
    }
}
Complexity analysis. Since the algorithm performs register and selector trade-offs in several
runs (outer while loop), we consider only one allocation run, which consists of three parts:
(1) Using the left edge algorithm, it takes O(m log m) time to sort variables, and it takes O(m)
time to assign variables, where m is the number of variables in the data flow graph.
Therefore, it takes O(m log m) time to determine the lower bound of registers required for
performing data transfer.
(2) The initial variable assignment takes O(m) time. The initial operation assignment takes
O(n) time. In addition, it takes O(2m+q) time to build the hypergraph, where n is the
number of operations, m is the number of edges in the data flow graph, and q is the
number of hypernodes.
(3) In the interchange optimization procedure, it takes O(pq) time to locate feasible merging
hyperedges, where p is the number of hyperedges. For each feasible merging hyperedge, it
takes O(r) time to locate a set of nodes, rearranging nodes takes O(rq) time, and area
estimation takes constant time, where r is the average number of variables or operations
associated with the feasible merging hyperedge. Thus, each interchange optimization loop
takes O(prq) time. In our experience, the locally optimal (no_more_improve) state is
reached in fewer than 20 iterations of the interchange optimization while loop.
3. Results and Discussions 
We have implemented the previously described algorithm using the C programming
language on SUN4 workstations under the UNIX operating system. We have tested our
algorithm on the elliptic filter benchmark with different control steps (19-step with 2-adder and
1-piped multiplier, 21-step with 2-adder and 1-multiplier, 19-step with 2-adder and
2-multiplier, and 17-step with 3-adder and 2-piped-multiplier) collected from the literature [2,10,11]
in order to compare our results to previously published results. Due to the paper length
limitation, we only show the schedule, variable and operation assignment, and structure results
for one 17-step and one 19-step example, which are shown in Figures 8 and 9
respectively. The layouts were generated by [12]. Figure 10 shows a 16-bit Elliptic Filter
example with 21 steps and 10 registers.
We use the single-level multiplexer model described in Section 2.2. Moreover, we did not
apply multiplexer optimization procedures, such as multiplexer merging. Table 2 (a), (b), (c),
and (d) show the area results for 17-step, 19-step, 21-step, and 19-step with 2-adder and
1-piped multiplier designs respectively. Since the areas of multipliers for each design are the same,
Alg/system  c-step  #OP     #Reg. #Mux   #Reg. #Mux   #Reg. #Mux   #Reg. #Mux   Time(s)
                                  i/ps         i/ps         i/ps         i/ps
Ours        17      2*p,3+  10    34     11    33     12    31     13    33     5.1
HAL         17      2*p,3+  10    n/a    11    n/a    12    31     13    n/a    120-480
CATREE      17      2*p,3+  10    n/a    11    n/a    12    38     13    n/a    n/a
REAL        17      2*p,3+  10    50     11    n/a    12    n/a    13    n/a    n/a
ELF         17      2*p,3+  10    n/a    11    28     12    n/a    13    n/a    n/a
ASYL        17      2*p,3+  10    38     11    25     12    24     13    n/a    n/a
Ours        19      2*,2+   10    30     11    28     12    28     13    29     1.1
HAL         19      2*,2+   10    n/a    11    n/a    12    29     13    n/a    120-480
SAW         19      2*,2+   10    n/a    11    n/a    12    32     13    n/a    n/a
MIMOLA      19      2*,2+   10    n/a    11    30     12    n/a    13    n/a    n/a
REAL        19      2*,2+   10    39     11    n/a    12    n/a    13    n/a    n/a
ELF         19      2*,2+   10    n/a    11    30     12    n/a    13    n/a    n/a
ASYL        19      2*,2+   10    33     11    31     12    28     13    n/a    n/a
Ours        19      1*p,2+  10    36     11    28     12    26     13    23     2.0
HAL         19      1*p,2+  10    n/a    11    n/a    12    26     13    n/a    120-480
REAL        19      1*p,2+  10    35     11    n/a    12    n/a    13    n/a    n/a
ELF         19      1*p,2+  10    n/a    11    30     12    n/a    13    n/a    n/a
ASYL        19      1*p,2+  10    30     11    28     12    26     13    n/a    n/a
Ours        21      1*,2+   10    30     11    27     12    28     13    31     2.6
HAL         21      1*,2+   10    n/a    11    n/a    12    31     13    n/a    120-480
SPLICER     21      1*,2+   10    n/a    11    n/a    12    n/a    16    35     >50
ARYL-LYRA   21      1*,2+   10    30     11    n/a    12    n/a    13    n/a    0.65
ELF         21      1*,2+   10    n/a    11    24     12    n/a    13    n/a    n/a
ASYL        21      1*,2+   10    22     11    21     12    n/a    13    n/a    n/a
*: multiplier, *p: piped multiplier, +: adder, #Mux i/ps: number of mux inputs.
Table 2. Allocation results of the Elliptic Filter example.
6. References
[1] S. Devadas and A. R. Newton, "Algorithms for Hardware Allocation in Data Path
Synthesis," IEEE Trans. on Computer-Aided Design, vol. CAD-8, no. 7, pp. 768-781,
1989.
[2] E. Dirkes Lagnese and D. E. Thomas, "Architectural Partitioning for System Level
Design," Proc. 26th DAC, pp. 62-67, 1989.
[3] B. S. Haroun and M. I. Elmasry, "Architectural Synthesis for DSP Silicon Compiler,"
IEEE Trans. on Computer-Aided Design, vol. 8, no. 4, pp. 431-447, April 1989.
[4] C. Y. Huang, Y. S. Chen, et al., "Data Path Allocation Based on Bipartite Weighted
Matching," Proc. 27th DAC, pp. 499-504, June 1990.
[5] B. Pangrle and D. Gajski, "Design Tools for Intelligent Silicon Compilation," IEEE
Trans. on Computer-Aided Design, vol. CAD-6, no. 6, Nov. 1987.
[6] F. J. Kurdahi and A. C. Parker, "REAL: A Program for REgister ALlocation," Proc.
24th DAC, pp. 210-215, 1987.
[7] A. C. Parker, J. Pizarro and M. Mlinar, "MAHA: A Program for Datapath Synthesis,"
Proc. 23rd DAC, pp. 461-466, 1986.
[8] P. G. Paulin, J. P. Knight and E. F. Girczyc, "HAL: A Multi-Paradigm Approach to
Automatic Data Path Synthesis," Proc. 23rd DAC, pp. 263-270, 1986.
[9] P. G. Paulin and J. Knight, "Force-Directed Scheduling for the Behavioral Synthesis
of ASICs," IEEE Trans. on Computer-Aided Design, vol. CAD-8, no. 6, June 1989.
[10] C. J. Tseng and D. P. Siewiorek, "Automated Synthesis of Data Path in Digital
Systems," IEEE Trans. on Computer-Aided Design, vol. CAD-5, no. 3, pp. 379-395,
1986.
[11] Lawrence L. Larmore, D. D. Gajski, and Allen C-H Wu, "Layout Placement for Sliced
Architecture," IEEE Trans. on Computer-Aided Design, to appear.
[12] Allen C-H Wu, G. D. Chen, and D. D. Gajski, "Silicon Compilation from
Register-transfer Schematics," Proc. ISCAS, 1990.
[13] M. Balakrishnan and P. Marwedel, "Integrated Scheduling and Binding: A Synthesis
Approach for Design Space Exploration," Proc. 26th DAC, pp. 68-74, 1989.
[14] T. A. Ly, W. L. Elwood, and E. F. Girczyc, "A Generalized Interconnect Model for
Data Path Synthesis," Proc. 27th DAC, pp. 168-173, 1990.
[15] A. Mignotte and G. Saucier, "A Generalized Model for Resource Assignment," Fifth
International Workshop on High-Level Synthesis, pp. 37-43, 1991.
[16] K. Kucukcakar and A. C. Parker, "Data Path Tradeoffs Using MABEL," 4th
International Workshop on High-Level Synthesis, 1990.
[17] C. Hitchcock and D. Thomas, "A Method of Automatic Data Path Synthesis," Proc.
20th DAC, pp. 484-489, 1983.
[18] "Data Path Library," VLSI Technology, Inc., 1988.
[19] R. Jamier and A. Jeraya, "APPLON: A Datapath Compiler," Proc. ICCD, 1985.
[20] Allen C-H Wu, Viraphol Chaiyakul and D. D. Gajski, "Layout Models for High-Level
Synthesis," Tech. Rpt. #91-31, ICS Dept., UC Irvine, 1991.
[21] C. H. Gebotys and M. I. Elmasry, "VLSI Design Synthesis with Testability," Proc. 25th
DAC, pp. 16-21, 1988.
