Floorplanning and Topology Generation for Application-Specific
  Network-on-Chip by Yu, Bei et al.
Floorplanning and Topology Generation for Application-Specific
Network-on-Chip
Bei Yu, Sheqin Dong
Department of Computer Science & Technology
TNList∗
Tsinghua University, Beijing, China
Song Chen, Satoshi GOTO
Graduate School of IPS
Waseda University, Kitakyushu, Japan
Abstract— Network-on-Chip (NoC) architectures have been proposed
as a promising alternative to classical bus-based communication
architectures. In this paper, we propose a two-phase framework to
solve the application-specific NoC topology generation problem. In
the floorplanning phase, we carry out partition-driven
floorplanning. In the post-floorplanning phase, a heuristic method
and a min-cost max-flow algorithm are used to insert switches and
network interfaces, respectively. Finally, we allocate paths to
minimize power consumption. The experimental results show that our
algorithm is effective for power saving.
I. Introduction
Network-on-Chip (NoC) architectures have been proposed as a
promising alternative to classical bus-based and point-to-point
communication architectures as CMOS technology has entered the
nanometer era [1, 2, 3]. In NoCs, the communication among cores is
achieved by on-chip micro-network components (such as switches and
network interfaces) instead of traditional non-scalable buses.
Compared with bus-based architectures, NoCs have better modularity
and design predictability. Besides, the NoC approach offers lower
power consumption and greater scalability.
NoCs can be designed as regular or application-specific network
topologies. For regular NoC topology design, some existing
solutions assume a mesh-based NoC architecture [4, 5] and focus on
the mapping problem. For application-specific topology design, the
design challenges are different in terms of irregular core sizes,
various core locations, and different communication flow
requirements [6, 7, 8, 9, 10]. Most SoCs are composed of
heterogeneous cores, and the core sizes are highly non-uniform. An
application-specific NoC architecture with structured wiring that
satisfies the design objectives and constraints is therefore more
appropriate. In this paper, we focus on the synthesis problem of
application-specific NoC architectures.
∗Tsinghua National Laboratory for Information Science and
Technology
Network components, such as switches and network interfaces (NIs),
consume area and power. The area consumption of these network
components should be considered during topology generation.
Besides, power efficiency is one of the most important concerns in
NoC architecture design. Several characteristics influence NoC
power consumption: total wirelength, communication flow
distribution, and path selection. In this paper, we propose a
methodology to design a topology that minimizes the power
consumption of interconnects and network components.
A number of works address NoC topology generation. In [6], novel
NoC topology generation algorithms were presented; however, their
solutions only consider topologies based on a slicing structure
where switch locations are restricted to corners of cores. In [7],
Murali et al. proposed a two-step topology generation procedure
using a min-cut partitioner to cluster highly communicating cores
on the same switch and a path allocation algorithm to connect the
clusters together. In [9], Chan et al. presented an iterative
refinement strategy to generate an optimized NoC topology that
supports both packet-switched networks and point-to-point
connections.
In most previous works, a system-level floorplanning tool is used
only to estimate the area and the wire lengths. Partitioning is
carried out before floorplanning, so physical information such as
the distances among cores cannot be taken into account. Besides,
the area of switches and network interfaces is not considered
during topology generation.
In this paper, we integrate partitioning into floorplanning to make
use of physical information such as the length of interconnects
among cores. In post-floorplanning optimization, a heuristic method
is used to insert switches and a min-cost max-flow algorithm is
used to insert network interfaces. Finally, we allocate paths to
minimize power consumption.
The remainder of this paper is organized as follows. Section II
defines the partition-driven floorplanning problem. Section III
presents our algorithm flow. Section IV reports our experimental
results. Finally, Section V concludes this paper.
arXiv:1402.2462v1 [cs.AR] 11 Feb 2014
Fig. 1. CCG and SCG examples. (a) A simple CCG with cores v1 to v6.
(b) The CCG is partitioned into clusters p1, p2 and p3 based on
communication requirements and relative positions.
(c) Corresponding SCG.
II. Problem Formulation
Definition 1 (Core Communication Graph (CCG))
The core communication graph is a directed graph
$\bar{G} = (\bar{V}, \bar{E})$, with each vertex $v_i \in \bar{V}$
representing a core and each edge $e_{ij}$ representing the
communication requirement between cores $v_i$ and $v_j$. The weight
of edge $e_{ij}$ is denoted as $w_{ij}$.
Definition 2 (Switch Communication Graph (SCG))
The switch communication graph is a directed graph $G = (V, E)$,
with each vertex $v_i \in V$ representing a switch; the directed
edge $e_{ij} = (v_i, v_j) \in E$ denotes a communication trace from
$v_i$ to $v_j$.
A simple CCG with six cores is shown in Fig. 1(a). After
partitioning, the corresponding SCG with three switches is
generated, as shown in Fig. 1(c).
Definition 3 (Cluster Bounding Resource) The
cluster bounding resource of a cluster is evaluated by the
half perimeter wirelength of the minimal bounding box
enclosing the cluster.
Problem 1 (NoC Topology Generation) The topology generation
problem can be defined as follows: given a set of n cores
$C = \{c_1, c_2, \ldots, c_n\}$, a switch-count constraint m, a core
communication graph (CCG) and a power model of the network
components, find an NoC topology that satisfies two objectives:
minimize the area consumption of cores and network components (m
switches and n network interfaces), and minimize the communication
energy.
Cores with more communication requirements tend to be assigned to
the same cluster to minimize communication energy. The relative
positions of cores should be considered during partitioning to
minimize area consumption. Besides, the positions of network
components, such as switches and network interfaces, should be
taken into account to minimize interconnect length. Finally, the
actual physical connections between switches are established by
finding paths that minimize the energy of the traffic flows across
the switches.
[Flow chart; boxes: Core Size, CCG, Generate new floorplan,
Partition, Evaluation, Stop? (Yes/No), Optimized Floorplan,
Switches Insertion, Network Interfaces Insertion, Path Allocation;
grouped into the Floorplanning and Post-Floorplanning phases.]
Fig. 2. Overall flow of the topology synthesis algorithm.
III. Topology Synthesis Algorithm
As shown in Fig. 2, the algorithm flow consists of two phases:
(I) partition-driven floorplanning and (II) post-floorplanning
optimization.
In Phase I, we integrate partitioning into floorplanning. Whenever
a new packing is generated, we carry out partitioning to assign
each core to one cluster. Partitioning should consider not only the
communication requirements among cores but also the physical
information of the cores.
In Phase II, during switch insertion, a heuristic method is adopted
to calculate the position of every switch in white space. During
network interface insertion, we present a min-cost max-flow based
method to insert each NI in white space. Finally, an effective
incremental path allocation method is proposed to minimize power
consumption.
A. Partition Driven Floorplanning
Traditionally, a floorplanning tool is used only to evaluate the
wire lengths between cores and switches, and partitioning is
carried out before floorplanning, so physical information such as
the distances among modules cannot be taken into account during
partitioning.
In this paper, we integrate partitioning into the floorplanning
phase. During floorplanning, after generating a new chip floorplan,
we can estimate the interconnect length between module i and module
j, denoted as $len_{ij}$. Given the core communication graph (CCG)
and the switch-count constraint m, partitioning assigns the cores
to m min-cut clusters. Cores with larger communication requirements
and smaller distances are assigned to the same cluster and hence
use the same switch for communication. On the one hand, cores with
larger communication requirements tend to be clustered together to
minimize interconnect power consumption. On the other hand, cores
that are close to each other should be clustered together to
minimize the cluster bounding resource. The partitioning is done in
such a way that the edges of the graph that are cut between the
partitions have lower weights than the edges within a partition,
and the number of vertices assigned to each partition is almost the
same. For partitioning, we define a new edge weight $w'_{ij}$ in
the CCG:
$$ w'_{ij} = \alpha_w \times \frac{w_{ij}}{\mathit{max\_w}} + \alpha_d \times \frac{\mathit{mean\_dis}}{dis_{ij}} \qquad (1) $$
Fig. 3. A floorplan with four cores, in which white spaces are
divided into grids (labeled from 1 to 10). (a) Cores c1 and c3 are
partitioned into one cluster and c2 and c4 are partitioned into
another cluster. Two dots are the initial positions of the two
switches. (b) Switches are assigned to grids one by one; finally
the two switches sw1 and sw2 have decided their positions.
where $w_{ij}$ denotes the communication requirement between core i
and core j, $dis_{ij}$ denotes the distance between core i and core
j, $\mathit{max\_w}$ is the maximum communication requirement over
all flows, $\mathit{mean\_dis}$ is the average distance among
cores, and $\alpha_w$ and $\alpha_d$ weight the two terms.
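As an illustration of Eq. (1), the following minimal Python sketch computes the modified edge weights before they are handed to a partitioner. The function name, the dictionary-based inputs and the default values of `alpha_w` and `alpha_d` are assumptions made for this example; the paper does not state which coefficient values were used. Distances are assumed to be strictly positive and keyed by unordered core pairs.

```python
def partition_edge_weights(w, dis, alpha_w=1.0, alpha_d=1.0):
    """Return w'_ij of Eq. (1) for every CCG edge (i, j) in w.

    w   : {(i, j): communication requirement w_ij}
    dis : {(i, j): distance between cores i and j}, keyed with i < j,
          covering all core pairs (assumed strictly positive)
    """
    max_w = max(w.values())                      # maximum requirement over all flows
    mean_dis = sum(dis.values()) / len(dis)      # average distance among cores
    weights = {}
    for (i, j), wij in w.items():
        dij = dis[(min(i, j), max(i, j))]
        weights[(i, j)] = alpha_w * wij / max_w + alpha_d * mean_dis / dij
    return weights


# Hypothetical three-core example.
w = {(0, 1): 100, (1, 2): 50}
dis = {(0, 1): 2.0, (0, 2): 4.0, (1, 2): 6.0}
new_w = partition_edge_weights(w, dis)
# Edge (0, 1): 100/100 + 4.0/2.0 = 3.0
```

Heavily communicating and nearby cores receive a large weight, so a min-cut partitioner is discouraged from separating them.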
During floorplanning, we use CBL [12] to represent every floorplan
generated. CBL is a topological representation that dissects the
chip into rectangular rooms. The cost function in simulated
annealing is:
$$ \Phi = \lambda_A A + \lambda_F F + \lambda_R R \qquad (2) $$
where A represents the floorplan area, F represents the total
communication amount between clusters, and R represents the sum of
all cluster bounding resources. The parameters $\lambda_A$,
$\lambda_F$ and $\lambda_R$ adjust the relative weighting of the
contributing factors.
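The cost of Eq. (2) can be evaluated for a given placement and partition as in the sketch below. It is only an illustration: cores are assumed to be axis-aligned rectangles given as `(x, y, width, height)`, the floorplan area A is taken as the area of the chip bounding box, and all function names and default weights are hypothetical.

```python
def bounding_box(rects):
    """Minimal bounding box (x_min, y_min, x_max, y_max) of rectangles (x, y, w, h)."""
    return (min(x for x, _, _, _ in rects), min(y for _, y, _, _ in rects),
            max(x + w for x, _, w, _ in rects), max(y + h for _, y, _, h in rects))


def cluster_bounding_resource(rects):
    """Half perimeter of the minimal bounding box enclosing a cluster (Definition 3)."""
    x0, y0, x1, y1 = bounding_box(rects)
    return (x1 - x0) + (y1 - y0)


def floorplan_cost(cores, cluster_of, w, lam_a=1.0, lam_f=1.0, lam_r=1.0):
    """Phi = lam_a * A + lam_f * F + lam_r * R, following Eq. (2)."""
    x0, y0, x1, y1 = bounding_box(list(cores.values()))
    area = (x1 - x0) * (y1 - y0)                       # A (assumed: chip bounding box)
    inter = sum(wij for (i, j), wij in w.items()
                if cluster_of[i] != cluster_of[j])     # F: traffic between clusters
    clusters = {}
    for name, k in cluster_of.items():
        clusters.setdefault(k, []).append(cores[name])
    resource = sum(cluster_bounding_resource(r) for r in clusters.values())  # R
    return lam_a * area + lam_f * inter + lam_r * resource


# Hypothetical 2 x 2 arrangement of four equal cores.
cores = {"c1": (0, 0, 2, 2), "c2": (2, 0, 2, 2), "c3": (0, 2, 2, 2), "c4": (2, 2, 2, 2)}
cluster_of = {"c1": 0, "c2": 0, "c3": 1, "c4": 1}
w = {("c1", "c2"): 10, ("c1", "c3"): 3, ("c3", "c4"): 5}
cost = floorplan_cost(cores, cluster_of, w)   # A = 16, F = 3, R = 6 + 6
```

In a simulated annealing loop this function would be called once per candidate floorplan, after the partitioning step has produced `cluster_of`.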
B. Switches Insertion
Once a floorplan with m clusters $P = \{p_1, p_2, \ldots, p_m\}$ is
obtained, the next step is to find the latency and power
consumption on the wires. To do this, the positions of the switches
need to be determined. Each cluster has one switch, and
communication among clusters goes through the switches. We denote
the set of switches as $SW = \{sw_1, sw_2, \ldots, sw_m\}$, where
switch $sw_k$ belongs to cluster $p_k$. Because switches cannot be
placed on a core, each switch must be located within white space.
We partition the white space into grids, and each grid provides a
site for switch insertion. A heuristic method then inserts each
switch into one grid (as shown in Fig. 3).
The minimal bounding box enclosing cluster $p_k$ is defined as
$B_k$. For switch $sw_k$, its candidate grids are the free grids
inside $B_k$. For example, in Fig. 3(a), cluster $p_1$ includes
core c1 and core c2, and the candidate grids of switch $sw_1$ are
those labeled 1 to 4. The candidate grids of switch $sw_2$ are
those labeled 5, 6, 8 and 9. Initially, each switch $sw_k$ is
located at the center of its cluster's bounding box.
Fig. 4. A simple example of network interfaces insertion. (a) When
l is set to the width of a grid, the l-box of core c3 includes five
free grids (labeled 1, 2, 3, 6, 7). (b) Corresponding network flow
model, with source s, sink t, network interfaces ni1 to ni4 and the
free grids.
For switch $sw_k$, its communication requirement is defined as
follows:
$$ flow_k = \sum_{e_{ij} \in \bar{E},\; i \in p_k,\; j \notin p_k} w_{ij} \qquad (3) $$
where $i \in p_k$ means core i is assigned to cluster $p_k$.
We sort the switches by their communication requirements and
assign each switch to one of its candidate grids, one by one. If a
free grid (labeled g) is a candidate grid of switch $sw_k$, then
the insertion cost $Cost_{gk}$ is defined as follows:
$$ Cost_{gk} = \sum_{e_{ij} \in \bar{E},\; i \in p_k,\; j \notin p_k} w_{ij} \times (dis_{gi} + dis_{gj}) \qquad (4) $$
where $dis_{gi}$ is the distance from grid g to core i. Each switch
is inserted into the candidate grid with the least insertion cost.
As shown in Fig. 3(b), sw1 is inserted into grid 4 and sw2 is
inserted into grid 5.
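The greedy procedure built on Eqs. (3) and (4) can be sketched as follows. This is a minimal illustration under stated assumptions: switches are processed in decreasing order of $flow_k$ (the text only says they are sorted), distances are Manhattan distances between reference points, a grid holds at most one switch, and every switch is assumed to have at least one free candidate grid left. All names are hypothetical.

```python
def manhattan(p, q):
    return abs(p[0] - q[0]) + abs(p[1] - q[1])


def insert_switches(clusters, w, candidates, grid_pos, core_pos):
    """Greedy switch insertion. Returns {cluster id k: grid chosen for sw_k}."""
    def external_edges(k):
        # Edges e_ij with i in p_k and j outside p_k, as in Eqs. (3) and (4).
        return [(i, j, wij) for (i, j), wij in w.items()
                if i in clusters[k] and j not in clusters[k]]

    flow = {k: sum(wij for _, _, wij in external_edges(k)) for k in clusters}  # Eq. (3)
    placement, used = {}, set()
    # Assumption: switches with the largest communication requirement go first.
    for k in sorted(clusters, key=lambda c: flow[c], reverse=True):
        edges = external_edges(k)

        def cost(g):                                                            # Eq. (4)
            return sum(wij * (manhattan(grid_pos[g], core_pos[i]) +
                              manhattan(grid_pos[g], core_pos[j]))
                       for i, j, wij in edges)

        free = [g for g in candidates[k] if g not in used]
        placement[k] = min(free, key=cost)   # raises ValueError if no free grid is left
        used.add(placement[k])
    return placement


# Hypothetical example with two clusters and three free grids.
core_pos = {"c1": (0, 0), "c2": (4, 0), "c3": (0, 4), "c4": (4, 4)}
clusters = {1: {"c1", "c2"}, 2: {"c3", "c4"}}
w = {("c1", "c3"): 10, ("c4", "c2"): 4}
grid_pos = {1: (0, 2), 2: (4, 2), 3: (2, 2)}
candidates = {1: [1, 2, 3], 2: [1, 2, 3]}
placement = insert_switches(clusters, w, candidates, grid_pos, core_pos)
# sw_1 (flow 10) takes grid 1; sw_2 (flow 4) then takes grid 2.
```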
C. Network Interfaces Insertion
After switch insertion, every switch is assigned a grid in white
space. We then carry out minimum cost flow based network interface
insertion to assign each NI to one grid. We define the set of
network interfaces as $NI = \{ni_1, ni_2, \ldots, ni_n\}$, where n
is the number of cores. Each core $c_k$ needs one network interface
$ni_k$ to connect to a switch.
Definition 4 (l-bounding box) Given a core $c_k$ whose width is
$wid_k$ and whose height is $hei_k$, the l-bounding box of $c_k$ is
$B^l_k$, which has the same center as $c_k$. The width of $B^l_k$
is $(wid_k + 2 \times l)$ and its height is $(hei_k + 2 \times l)$
(as shown in Fig. 4(a)).
For each core $c_k$, we construct its l-bounding box. The free
grids in the l-bounding box are the candidate grids of $c_k$,
denoted as $CG_k$.
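A minimal sketch of how $CG_k$ could be collected from Definition 4 is given below. It assumes that each core is described by its center and size, that each free grid is represented by its center point, and that a grid counts as lying in the l-bounding box when its center does; the paper does not spell out this containment test, and the function names are hypothetical.

```python
def l_bounding_box(cx, cy, wid, hei, l):
    """Return (x_min, y_min, x_max, y_max) of the l-bounding box of a core.

    The box shares the core's center (cx, cy); its width is wid + 2*l and
    its height is hei + 2*l.
    """
    half_w, half_h = wid / 2 + l, hei / 2 + l
    return (cx - half_w, cy - half_h, cx + half_w, cy + half_h)


def candidate_grids(core, l, free_grids):
    """Free grids whose center lies inside the core's l-bounding box (CG_k)."""
    x0, y0, x1, y1 = l_bounding_box(*core, l)
    return [g for g, (gx, gy) in free_grids.items()
            if x0 <= gx <= x1 and y0 <= gy <= y1]


# Hypothetical core centered at (5, 5), 4 wide and 2 high, with l = 1.
core = (5, 5, 4, 2)
free_grids = {1: (2.5, 6.5), 2: (9, 5), 3: (5, 3.5), 4: (5, 8)}
cg = candidate_grids(core, 1, free_grids)   # box is (2, 3, 8, 7)
```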
TABLE I
Notation used in Path Allocation
$t_{ij}$           power consumption to connect $e_{ij}$.
$Pre(i)$           $\{v_k \mid v_k \in V,\ e_{ki} \in E\}$
$Post(i)$          $\{v_k \mid v_k \in V,\ e_{ik} \in E\}$
$dise(i, j, d)$    minimum distance from node $v_i$ to $v_d$
                   while edge $e_{ij}$ is used.
$disn(i, d)$       minimum distance from node $v_i$ to $v_d$.
$path(i, d)$       the node that $v_i$ connects to in order to reach $v_d$.
We construct a network graph $G^* = (V^*, E^*)$ and then use a
min-cost max-flow algorithm to determine which grid each network
interface belongs to. A simple example is shown in Fig. 4.
• $V^* = \{s, t\} \cup NI \cup Grids$.
• $E^* = \{(s, ni_k) \mid ni_k \in NI\} \cup \{(ni_k, g_j) \mid g_j \in CG_k\} \cup \{(g_j, t) \mid g_j \in Grids\}$.
• Capacities: $C(s, ni_k) = 1$, $C(ni_k, g_j) = 1$, $C(g_j, t) = 1$.
• Costs: $F(s, ni_k) = 0$, $F(g_j, t) = 0$, $F(ni_k, g_j) = F_{kj}$,
where $F_{kj}$ equals the distance from grid j to switch $sw_k$.
Network interface insertion can be solved efficiently by a minimum
cost flow algorithm, which runs in polynomial time [14].
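The flow network described above can be built and solved as in the following sketch. The solver is a generic successive-shortest-path min-cost max-flow routine written for this example, not the implementation used in the paper; the class and function names are hypothetical, and the costs $F_{kj}$ are passed in precomputed.

```python
from collections import deque


class MinCostFlow:
    """Successive shortest paths with a queue-based Bellman-Ford search."""

    def __init__(self, n):
        self.graph = [[] for _ in range(n)]  # edge = [to, capacity, cost, reverse index]

    def add_edge(self, u, v, cap, cost):
        self.graph[u].append([v, cap, cost, len(self.graph[v])])
        self.graph[v].append([u, 0, -cost, len(self.graph[u]) - 1])

    def solve(self, s, t):
        total_flow = total_cost = 0
        while True:
            n = len(self.graph)
            dist, prev, queued = [float("inf")] * n, [None] * n, [False] * n
            dist[s] = 0
            q = deque([s])
            while q:
                u = q.popleft()
                queued[u] = False
                for idx, (v, cap, cost, _) in enumerate(self.graph[u]):
                    if cap > 0 and dist[u] + cost < dist[v]:
                        dist[v], prev[v] = dist[u] + cost, (u, idx)
                        if not queued[v]:
                            queued[v] = True
                            q.append(v)
            if prev[t] is None:              # no augmenting path left
                return total_flow, total_cost
            v = t
            while v != s:                    # push one unit along the shortest path
                u, idx = prev[v]
                edge = self.graph[u][idx]
                edge[1] -= 1
                self.graph[v][edge[3]][1] += 1
                v = u
            total_flow += 1
            total_cost += dist[t]


def insert_network_interfaces(candidate_grids, cost):
    """candidate_grids[k] = CG_k, cost[(k, g)] = F_kg. Returns ({k: grid}, total cost)."""
    cores = list(candidate_grids)
    grids = sorted({g for cg in candidate_grids.values() for g in cg})
    s, t = 0, 1
    ni_id = {k: 2 + a for a, k in enumerate(cores)}
    grid_id = {g: 2 + len(cores) + b for b, g in enumerate(grids)}
    mcf = MinCostFlow(2 + len(cores) + len(grids))
    for k in cores:
        mcf.add_edge(s, ni_id[k], 1, 0)
        for g in candidate_grids[k]:
            mcf.add_edge(ni_id[k], grid_id[g], 1, cost[(k, g)])
    for g in grids:
        mcf.add_edge(grid_id[g], t, 1, 0)
    _, total = mcf.solve(s, t)
    id_grid = {v: g for g, v in grid_id.items()}
    assignment = {}
    for k in cores:
        for v, cap, _, _ in mcf.graph[ni_id[k]]:
            if v in id_grid and cap == 0:    # saturated ni -> grid edge carries the flow
                assignment[k] = id_grid[v]
    return assignment, total


# Hypothetical example: three cores competing for three grids.
cg = {"c1": [1, 2], "c2": [2, 3], "c3": [2]}
f = {("c1", 1): 5, ("c1", 2): 1, ("c2", 2): 2, ("c2", 3): 4, ("c3", 2): 3}
assignment, total = insert_network_interfaces(cg, f)
```

Because every capacity is 1, the problem is an assignment problem; c3 can only use grid 2, so the solver moves c1 to grid 1 even though grid 2 would be cheaper for it in isolation. A core that cannot be served is simply missing from the returned assignment.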
D. Energy Aware Path Allocation
After switch insertion, we use a dynamic programming based method
for path allocation to minimize power consumption.
We are given the switch communication graph (SCG) $G = (V, E)$,
which represents the communication requirements among switches. The
communication requirement of $e_{ij} \in E$ is denoted as
$ws_{ij}$:
$$ ws_{ij} = \sum_{a \in p_i} \sum_{b \in p_j} (w_{ab} + w_{ba}) \qquad (5) $$
where $p_i$ is cluster i and $w_{ab}$ is the communication
requirement from core $c_a$ to core $c_b$.
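Eq. (5) amounts to aggregating the CCG edge weights over cluster pairs. A minimal sketch follows; the function name and the dictionary inputs are assumptions, and each SCG edge is stored once under the key `(i, j)` with `i < j`.

```python
from collections import defaultdict


def scg_weights(w, cluster_of):
    """Aggregate CCG weights into SCG weights ws_ij, following Eq. (5).

    w          : {(a, b): w_ab} directed core-to-core requirements
    cluster_of : {core: cluster index}
    Returns {(i, j): ws_ij} with i < j, summing traffic in both directions.
    """
    ws = defaultdict(int)
    for (a, b), wab in w.items():
        i, j = cluster_of[a], cluster_of[b]
        if i != j:                              # traffic inside a cluster is not in the SCG
            ws[(min(i, j), max(i, j))] += wab
    return dict(ws)


# Hypothetical example with two clusters.
w = {("c1", "c3"): 10, ("c3", "c1"): 2, ("c2", "c4"): 4, ("c1", "c2"): 7}
cluster_of = {"c1": 1, "c2": 1, "c3": 2, "c4": 2}
ws = scg_weights(w, cluster_of)   # only edge (1, 2): 10 + 2 + 4
```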
We denote the nodes in the SCG as $v_1, v_2, \ldots, v_m$, where m
is the number of switches. We assume the SCG only contains directed
edges $e_{ij}$ with $i < j$, because $e_{ij}$ represents both the
communication from switch $sw_i$ to $sw_j$ and that from $sw_j$ to
$sw_i$.
As shown in Table I, we define the set $Pre(i)$ as the front-end
nodes of $v_i$ and $Post(i)$ as the back-end nodes of $v_i$. We
also define two kinds of distance, $dise(i, j, d)$ and
$disn(i, d)$. Besides, $path(i, d)$ denotes the node that $v_i$
should connect to in order to reach $v_d$. We compute $dise$,
$disn$ and $path$ as follows:
$$ dise(i, j, d) = \begin{cases} t_{id}, & j = d \text{ and } i \in Pre(d) \\ t_{ij} + disn(j, d), & \text{otherwise} \end{cases} \qquad (6) $$
$$ disn(i, d) = \begin{cases} 0, & i = d \\ \min_{k \in Post(i)} dise(i, k, d), & \text{otherwise} \end{cases} \qquad (7) $$
$$ path(i, d) = j, \quad \text{where } j \text{ satisfies } dise(i, j, d) = disn(i, d) \qquad (8) $$
Fig. 5. A simple example of path allocation with seven switches.
(a) Initial network; the value on each edge $e_{ij}$ is $t_{ij}$.
(b) After InitSolve(7); the value on each edge $e_{ij}$ is
$dise(i, j, 7)$ and each bold edge $e_{ij}$ means $path(i, 7) = j$.
(c) Compared with (b), $t_{67}$ decreases from 4 to 2 and some
edges are updated (shown as dotted arrows). (d) Compared with (b),
$t_{57}$ increases from 2 to 10 and some edges are updated (shown
as dotted arrows).
Algorithm 1 InitSolve(d)
  // Given d, solve all dise(i, j, d) and disn(i, d)
  Initialize all dise(i, j, d) ← M        // M: a large constant
  disn(d, d) ← 0                          // by Eq. (7)
  for all k ∈ Pre(d) do
    dise(k, d, d) ← t_kd
    disn(k, d) ← t_kd
  end for
  for i = d − 1 down to 1 do
    for all j ∈ Post(i) do
      dise(i, j, d) ← t_ij + disn(j, d)
    end for
    disn(i, d) ← min over j ∈ Post(i) of dise(i, j, d)
    path(i, d) ← the j that attains this minimum
  end for
We use a dynamic programming based method to solve
distance dise(i, j, d), disn(i, d) and path(i, d), as shown in
Algorithm 1.
Theorem 1 The required time for Algorithm InitSolve()
is at most O(|E|). The run time to solve all the nodes is
bounded by O(|V | · |E|).
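A compact Python rendering of InitSolve for a single destination d is sketched below. It assumes nodes are numbered 1 to m and that every edge (i, j) satisfies i < j, as stated for the SCG; the dictionary layout and the function name are choices made for this example. Setting disn(d, d) = 0 first makes the first case of Eq. (6) fall out of the general case, so the separate loop over Pre(d) is not needed here.

```python
import math


def init_solve(m, t, d):
    """Compute dise(i, j, d), disn(i, d) and path(i, d) for one destination d.

    m : number of switches (nodes are 1..m)
    t : {(i, j): t_ij} with i < j for every edge
    Returns (dise, disn, path) as dictionaries.
    """
    post = {i: [] for i in range(1, m + 1)}
    for i, j in t:
        post[i].append(j)

    disn = {i: math.inf for i in range(1, m + 1)}   # unreachable until proven otherwise
    disn[d] = 0                                     # Eq. (7), case i = d
    dise, path = {}, {}
    for i in range(d - 1, 0, -1):                   # nodes in decreasing index order
        for j in post[i]:
            dise[(i, j)] = t[(i, j)] + disn[j]      # Eq. (6)
        if post[i]:
            best = min(post[i], key=lambda j: dise[(i, j)])
            if dise[(i, best)] < math.inf:
                disn[i] = dise[(i, best)]           # Eq. (7)
                path[i] = best                      # Eq. (8)
    return dise, disn, path


# Hypothetical four-switch SCG, destination d = 4.
t = {(1, 2): 2, (1, 3): 5, (2, 3): 1, (2, 4): 7, (3, 4): 3}
dise, disn, path = init_solve(4, t, 4)
# disn == {1: 6, 2: 4, 3: 3, 4: 0}; path == {1: 2, 2: 3, 3: 4}
```

Each edge is visited once per destination, which matches the O(|E|) bound of Theorem 1 for a single call.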
If $t_{ij}$ changes, instead of re-solving all the $dise(i, j, d)$
and $disn(i, d)$, we can update them incrementally. If $t_{ij}$
decreases, we use Algorithm 2; otherwise we use Algorithm 3.
We consider a simple path allocation example, shown in Fig. 5. An
SCG with seven switches is shown in (a); the value on each edge
$e_{ij}$ is the initial $t_{ij}$. Running Algorithm 1 with $d = 7$
solves each $dise(i, j, 7)$ (labeled on each edge in (b)). If
$t_{67}$ decreases from 4 to 2, we use Algorithm 2 to update some
$dise(i, j, 7)$ and $disn(i, 7)$. As shown in (c), queue q pushes
edges $e_{67}, e_{36}, e_{46}, e_{13}, e_{23}$ one by one (shown as
dotted arrows); $path(3, 7)$ changes from 5 to 6 and $path(1, 7)$
changes from 2 to 3. If $t_{57}$ increases from 2 to 10, we use
Algorithm 3 to update $dise(i, j, 7)$ and $disn(i, 7)$. As shown in
(d), queue q pushes edges $e_{57}, e_{35}, e_{45}, e_{13}, e_{23}$
one by one (shown as dotted arrows), and $path(3, 7)$ changes from
5 to 6.
Algorithm 2 DecreaseUpdate(i, j, ∆t, d)
  // Update when t_ij changes to (t_ij − ∆t)
  t_ij ← t_ij − ∆t
  q.push(e_ij)                            // q is a queue
  while q is not empty do
    e_ab ← q.pop()
    dise(a, b, d) ← t_ab + disn(b, d)
    if t_ab + disn(b, d) < disn(a, d) then
      disn(a, d) ← t_ab + disn(b, d)
      path(a, d) ← b
      q.push(e_pa), ∀p ∈ Pre(a)
    end if
  end while
Algorithm 3 IncreaseUpdate(i, j, ∆t, d)
  // Update when t_ij changes to (t_ij + ∆t)
  t_ij ← t_ij + ∆t
  q.push(e_ij)                            // q is a queue
  while q is not empty do
    e_ab ← q.pop()
    dise(a, b, d) ← t_ab + disn(b, d)
    if path(a, d) = b then
      Find k ∈ Post(a) that minimizes disn(k, d) + t_ak
      disn(a, d) ← disn(k, d) + t_ak
      path(a, d) ← k
      q.push(e_pa), ∀p ∈ Pre(a)
    end if
  end while
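The two incremental updates can be sketched in Python as below. This is an illustrative reading of Algorithms 2 and 3 for one fixed destination d, not the authors' C++ code: the class name `PathTable` and its layout are hypothetical, the constructor repeats the InitSolve pass so that the block is self-contained, and dise is not stored because it can be recovered as t_ab + disn(b, d).

```python
import math
from collections import deque


class PathTable:
    """disn/path toward one destination d on a DAG whose edges (i, j) satisfy i < j."""

    def __init__(self, m, t, d):
        self.t = dict(t)
        self.pre = {i: [] for i in range(1, m + 1)}
        self.post = {i: [] for i in range(1, m + 1)}
        for i, j in t:
            self.post[i].append(j)
            self.pre[j].append(i)
        self.disn = {i: math.inf for i in range(1, m + 1)}
        self.disn[d] = 0
        self.path = {}
        for i in range(d - 1, 0, -1):        # InitSolve pass
            self._relink(i)

    def _relink(self, a):
        """Recompute disn(a, d) and path(a, d) from Post(a)."""
        best = min(self.post[a], key=lambda k: self.t[(a, k)] + self.disn[k], default=None)
        if best is None or math.isinf(self.disn[best]):
            self.disn[a] = math.inf
            self.path.pop(a, None)
        else:
            self.disn[a] = self.t[(a, best)] + self.disn[best]
            self.path[a] = best

    def decrease(self, i, j, dt):
        """Algorithm 2: t_ij becomes t_ij - dt."""
        self.t[(i, j)] -= dt
        q = deque([(i, j)])
        while q:
            a, b = q.popleft()
            cand = self.t[(a, b)] + self.disn[b]
            if cand < self.disn[a]:          # the cheaper edge improves node a
                self.disn[a] = cand
                self.path[a] = b
                q.extend((p, a) for p in self.pre[a])

    def increase(self, i, j, dt):
        """Algorithm 3: t_ij becomes t_ij + dt."""
        self.t[(i, j)] += dt
        q = deque([(i, j)])
        while q:
            a, b = q.popleft()
            if self.path.get(a) == b:        # only nodes routed over the edge are affected
                self._relink(a)
                q.extend((p, a) for p in self.pre[a])


# Hypothetical four-switch SCG, destination d = 4.
pt = PathTable(4, {(1, 2): 2, (1, 3): 5, (2, 3): 1, (2, 4): 7, (3, 4): 3}, 4)
pt.decrease(2, 4, 5)    # t_24: 7 -> 2, node 2 now routes directly to 4
pt.increase(2, 4, 10)   # t_24: 2 -> 12, node 2 falls back to routing via 3
```

Only edges whose tail node actually changes are propagated further, which is where the run-time saving over re-solving every distance comes from.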
IV. Experimental Results
We implemented our algorithm in the C++ programming language and
ran it on a Linux machine with a 3.0 GHz CPU and 1 GB of memory.
During floorplanning we use hMetis [13], an efficient hierarchical
graph partitioning tool.
A. Power Model
NoC power consumption consists of two parts: power consumed by
interconnects and power consumed by switches. For each network
link e, $P_e$ represents the bit energy on link e and the
corresponding switches: $P_e = P_l + P_s$, where $P_l$ and $P_s$
are the bit energies on interconnects and switches, respectively.
The power consumption is $P = P_e \times f$, where f represents the
communication requirements passing through the link and the
corresponding switch.
We use Orion [11] as the power simulator. Table II gives the switch
bit energy in 0.18 µm technology and Table III gives the power
model of links.
TABLE II
Power Model of Switch
Ports                 2      3      4      5      6      7      8
Bit energy (pJ/bit)   0.22   0.33   0.44   0.55   0.66   0.78   0.90
TABLE III
Power Model of Interconnects
Wire length (mm)      1      4      8      12     16
Bit energy (pJ/bit)   0.6    2.4    4.8    7.2    9.6
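The power model can be evaluated for a single link as in the following sketch. It uses the values of Tables II and III directly; Table III is linear in the wire length (0.6 pJ/bit per mm), so the wire energy is computed by scaling. The unit of f is not stated in the text, so the example assumes Mbit/s, and it charges one switch per link; both are assumptions, as is the function name.

```python
# Table II: switch bit energy (pJ/bit) indexed by number of ports.
SWITCH_PJ_PER_BIT = {2: 0.22, 3: 0.33, 4: 0.44, 5: 0.55, 6: 0.66, 7: 0.78, 8: 0.90}
# Table III: every listed wire length corresponds to 0.6 pJ/bit per mm.
WIRE_PJ_PER_BIT_PER_MM = 0.6


def link_power_mw(length_mm, ports, f_mbps):
    """P = (P_l + P_s) * f for one link and one switch, returned in mW.

    length_mm : wire length of the link
    ports     : port count of the switch the link is attached to
    f_mbps    : traffic through the link in Mbit/s (assumed unit)
    """
    p_l = WIRE_PJ_PER_BIT_PER_MM * length_mm   # interconnect bit energy, pJ/bit
    p_s = SWITCH_PJ_PER_BIT[ports]             # switch bit energy, pJ/bit
    # pJ/bit * Mbit/s = 1e-12 J * 1e6 /s = 1e-6 W, i.e. 1e-3 mW
    return (p_l + p_s) * f_mbps * 1e-3


# A 4 mm link on a 4-port switch carrying 500 Mbit/s:
p = link_power_mw(4, 4, 500)   # (2.4 + 0.44) pJ/bit * 500 Mbit/s = 1.42 mW
```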
B. Results and discussion
We applied our topology generation procedure to three sets of
benchmarks. The first set consists of several video processing
applications obtained from [2]: MPEG4, MWD and VOPD. The second
set is obtained from [6]: 263decmp3dec, 263encmp3dec and
mp3encmp3dec. The last benchmark is obtained from [8]: D_38_tvopd.
Fig. 6 shows the two floorplans generated for the 263decmp3dec and
D_38_tvopd benchmarks.
We performed experiments to evaluate our topology generation
algorithm. For comparison, we also implemented another approach,
PBF, which is similar to the min-cut based algorithm presented in
[7]. In PBF, partitioning is carried out only before floorplanning.
Table IV compares our experimental results with PBF. The column
Power gives the actual power consumption and the column Hops gives
the average number of hops. On average, our method reduces power by
41.8% and the hop count by 2.6%. For test cases with more
communication requirements, such as 263encmp3dec, our algorithm
saves much more power (power consumption drops from 58.6 mW to
19.2 mW). The column W.S gives the white space and the column Time
gives the run time. The average white space increases from 12.31%
with PBF to 13.92% with our method, and the run time is reasonable.
Since power saving is the most important concern, this
deterioration is acceptable.
We further demonstrated the effectiveness of Algorithm 2 and
Algorithm 3. To update routing when link costs change, we
implemented a contrasting approach, DSP, which re-solves all the
distances of flows by Dijkstra's shortest path algorithm [14]. We
applied another set of test cases: t_01, t_02 and t_03. For each
case, Table V reports the number of nodes V#, the number of flows
Flow# and the number of updates Update#. Our updating algorithm
saves substantial run time: 66.7% for t_01, 97.4% for t_02 and
99.6% for t_03.
TABLE IV
Comparison Between the PDF and the PBF
Benchmark      V#   E#   Part#   Power (mW)       Hops             W.S (%)          Time (s)
                                 PBF     ours     PBF     ours     PBF     ours     ours
MPEG4          12   13   3       25.9    16.0     1.17    1.0      12.25   16.43    13.86
                         4       24.3    14.1     1.25    1.041    7.63    16.43    15.07
MWD            12   12   3       3.05    3.08     1.33    1.33     12.22   11.82    13.37
                         4       3.19    3.02     1.25    1.25     12.22   12.22    15.46
VOPD           12   14   3       7.43    6.12     1.0     1.0      12.16   13.54    14.54
                         4       7.62    6.59     1.0     1.15     12.17   13.85    17.32
263decmp3dec   14   15   3       4.96    3.92     1.0     1.0      14.24   13.44    23.78
                         4       7.86    4.35     1.25    1.0      13.59   14.50    24.96
263encmp3dec   12   12   3       24.7    19.2     1.0     1.0      6.06    8.82     13.19
                         4       58.6    19.2     1.0     1.0      9.58    9.58     15.42
mp3encmp3dec   13   13   3       8.4     4.4      1.0     1.0      15.23   17.60    20.29
                         4       11.2    8.6      1.0     1.0      15.23   15.24    21.0
D_38_tvopd     38   47   3       12.7    8.2      1.33    1.33     15.1    24.5     92.7
                         4       12.3    6.8      1.44    1.4      14.7    22.60    104.0
Avg            -    -    -       15.16   8.83     1.14    1.11     12.31   13.92    28.93
Diff           -    -    -       -       -41.8%   -       -2.6%    -       -        -
TABLE V
Comparison for Fault Tolerance
Case   V#    Flow#   Update#   Run Time (s)        Diff
                               DSP      ours
t_01   20    34      20        0.024    0.008      -66.7%
t_02   100   130     30        0.604    0.016      -97.4%
t_03   300   457     50        20.35    0.08       -99.6%
Fig. 6. Experimental results of 263decmp3dec and D_38_tvopd with
four clusters.
V. Conclusions
We have proposed a two-phase framework to solve topology synthesis
for NoCs: phase one is partition-driven floorplanning; phase two
consists of switch insertion, network interface insertion and path
allocation to minimize power consumption. Experimental results
show that our framework is effective and reduces power consumption
by 41.8% on average compared with PBF.
References
[1] L. Benini and G. De Micheli, "Networks on chips: A new SoC
paradigm", IEEE Computer, 2002.
[2] D. Bertozzi et al., "NoC Synthesis Flow for Customized Domain
Specific Multiprocessor Systems-on-Chip", IEEE Transactions on
Parallel and Distributed Systems, 2005.
[3] R. Marculescu et al., "Outstanding Research Problems in NoC
Design: System, Microarchitecture, and Circuit Perspectives", IEEE
Transactions on Computer-Aided Design of Integrated Circuits and
Systems, 2009.
[4] J. Hu and R. Marculescu, "Energy-Aware Mapping for Tile-based
NoC Architectures Under Performance Constraints", ASP-DAC, 2003.
[5] S. Murali and G. De Micheli, "Bandwidth-Constrained Mapping of
Cores onto NoC Architectures", DATE, 2004.
[6] K. Srinivasan, K. S. Chatha and G. Konjevod, "Linear
programming based techniques for synthesis of network-on-chip
architectures", IEEE Transactions on VLSI, 2006.
[7] S. Murali et al., "Designing Application-Specific Networks on
Chips with Floorplan Information", ICCAD, 2006.
[8] S. Murali et al., "Synthesis of Networks on Chips for 3D
Systems on Chips", ASP-DAC, 2009.
[9] J. Chan and S. Parameswaran, "NoCOUT: NoC Topology Generation
with Mixed Packet-switched and Point-to-Point Networks", ASP-DAC,
2008.
[10] S. Yan and B. Lin, "Application-specific Network-on-Chip
architecture synthesis based on set partitions and Steiner Trees",
ASP-DAC, 2008.
[11] H. Wang, X. Zhu, L. Peh and S. Malik, "Orion: A
Power-Performance Simulator for Interconnection Networks", Int.
Symp. on Microarchitecture, 2002.
[12] X. Hong and S. Dong, "Non-slicing floorplan and placement
using corner block list topological representation", IEEE
Transaction on CAS, 51:228–233, 2004.
[13] G. Karypis, R. Aggarwal, V. Kumar and S. Shekhar, "Multilevel
Hypergraph Partitioning: Application in VLSI Domain", DAC, 1997.
[14] R. K. Ahuja, T. L. Magnanti and J. B. Orlin, Network Flows:
Theory, Algorithms, and Applications, Prentice Hall, 2005.
