Floorplacement for Partial Reconfigurable FPGA-Based Systems by A.  Montone et al.
Hindawi Publishing Corporation
International Journal of Reconﬁgurable Computing
Volume 2011, Article ID 483681, 12 pages
doi:10.1155/2011/483681
Research Article
Floorplacement for Partial Reconfigurable FPGA-Based Systems
A. Montone,1 M. D. Santambrogio,1,2 F. Redaelli,1 and D. Sciuto1
1Dipartimento di Elettronica e Informazione, Politecnico di Milano, 20133 Milano, Italy
2Computer Science and Artiﬁcial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
Correspondence should be addressed to M. D. Santambrogio, marco.santambrogio@polimi.it
Received 20 August 2010; Accepted 20 December 2010
Academic Editor: Aravind Dasu
Copyright © 2011 A. Montone et al. This is an open access article distributed under the Creative Commons Attribution License,
which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
We present a resource- and configuration-aware floorplacement framework, tailored for Xilinx Virtex 4 and 5 FPGAs, using
an objective function based on external wirelength. Our work aims at identifying groups of Reconfigurable Functional Units that
are likely to be configured in the same chip area, identifying these areas based on resource requirements, device capabilities,
and wirelength. Task graphs with few externally connected RRs show the largest decrease in external wirelength, while task
graphs with many externally connected RRs show a smaller improvement. The proposed approach results, as demonstrated in
the experimental results section, in a shorter external wirelength (an average reduction of 50%) with respect to purely area-driven
approaches and a highly increased probability of reuse of existing links (a 90% reduction can be obtained in the best case).
1. Introduction
Nowadays one of the most important design styles in VLSI
is represented by Field Programmable Gate Arrays (FPGAs).
The standard FPGA design ﬂow starts from an RT-level
description of the circuit (e.g., provided by a HDL language)
and ends with a conﬁguration ﬁle (bitstream) conﬁguring
the desired circuit on the device. Much eﬀort has been
already placed towards improving the diﬀerent stages of the
design ﬂow, from logic synthesis to placement and routing.
On one hand, these problems have been addressed directly by
the FPGA vendors [1] as much as by academic works [2]. On
the other hand, the ﬂoorplanning automation for FPGAs is
a current research topic, to face the new challenges provided
by FPGAs.
FPGAs particularly present two unique aspects with
respect to traditional VLSI designs: resource heterogeneity
and reconﬁgurability. FPGA devices are generally deﬁned
using several kinds of resources (e.g., programmable logic
cells, memories, multipliers, DSPs, and so on). Resource
heterogeneity must therefore be taken into consideration, to
allow each architectural module to be placed in an area
region containing all the needed resources. On the other
hand, reconﬁguration allows the possibility to change the
architecture or the application implemented on the FPGA
without requiring any physical action on the device. In particular,
in partial dynamic reconfigurability, an architecture
may change a subset of its modules at runtime (i.e., without disrupting
the provided functionality) in order to
modify its own behavior. Reconﬁgurability introduces time
as a new variable within the ﬂoorplanning formulation.
A ﬂoorplanner taking into account both of these aspects
has to find, for each module (in this paper we use the
term module to address an architectural component after
technology mapping), an area assignment according to the target
device’s capabilities, considering that the modules can be
replaced later on, according to the system’s needs, using the reconfiguration
capabilities of the target device. The term floorplacer
was first introduced in the context of traditional
VLSI design [3, 4] to emphasize how recently developed
algorithms for VLSI design automation face placement and
floorplanning tasks concurrently. In a similar spirit, resource
management for partially and dynamically reconfigurable
FPGAs can benefit from this paradigm. In this paper, which is
an extended version of the work presented in [5], we propose
a floorplacer for such devices, which identifies
groups of modules that are likely to be configured in the same
rectangular area and, consequently, identifies these chip areas
according to the modules’ requirements, device capabilities,
and design objectives. Particularly, in this work, we focus
on optimizing the wirelength since the base ﬂoorplacement
framework.
The paper is structured as follows. Section 2 describes
the related work. Section 3 introduces the problem and
the definitions of relevant concepts. The target FPGA architecture
is described in Section 4, the floorplacement framework is
presented in Section 5, and the experimental results are
presented in Section 6. Finally, conclusions and future work
are outlined in Section 7.
2. Related Works
The problem of resource-aware ﬂoorplanning tailored for
FPGAs has been addressed in literature [6, 7]. Montone
et al. [7] proposed a floorplanning method that is only aware
of resource requirements for the reconﬁgurable modules.
Feng and Mehta [6] proposed an approach, where each
architectural module has a list of required resources, hence
each module has to be placed in an area region containing all
the needed resources. This approach is based on a two-step
algorithm: first, the Parquet [8] floorplanner is executed
with a resource-aware cost function consisting of
a linear combination of Parquet’s objective function and the
amount of satisfied resource requirements. The result is then
reﬁned solving a ﬂow maximization with cost minimization
on a purposely built graph. While this approach is currently
the state of the art in resource-aware ﬂoorplanning for
FPGAs, in the majority of cases, the result is not compatible
with the FPGA reconﬁgurability process.
One of the earliest works dealing with reconﬁguration-
aware ﬂoorplanning is [9]. An oﬄine ﬂoorplanner is used
to decide whether a required functionality should be imple-
mented in hardware on a reconﬁgurable device or has to
be executed via software on a general purpose processor.
Diﬀerent heuristics are proposed to minimize execution
time, while reducing implemented modules’ fragmentation
across the device area.
The first definition of temporal floorplanning was
introduced in [10]. According to the author, temporal
floorplanning consists of two phases: (i) partitioning and
sequencing design modules into design configurations (also
called temporal partitioning or scheduling), and (ii) spatial
positioning of design modules and wiring within the
reconfigurable area (named reconfigurable region in the following)
of each configuration.
One of the most important contributions in three-
dimensional ﬂoorplanning is presented in [11, 12]. They
introduce a full 3D ﬂoorplanner based on simulated
annealing using a three-dimensional transitive closure graph
(TCG) and T-trees. They just evaluate the time required
by each module to communicate with RAM chips outside
the FPGA in order to store results and read input, but
they do not evaluate if a found ﬂoorplan, particularly its
communication infrastructure, is feasible on a given device
or not. Furthermore, they do not consider other resources
than logic blocks.
A more recent work [13] faces the FPGA ﬂoorplanning
problem considering partial dynamic reconﬁguration, in
terms of module reusability. They introduce a variation to
the sequence pair representation [14] in order to represent
the floorplan in different time frames. The aim of this work
is to reduce the quantity of device area reconfigured between
different time frames. Given two designs in two different time
instants, the authors present a simulated annealing-based
ﬂoorplanning able to reduce the reconﬁgured area between
the two instants by exploiting reuse of already conﬁgured
modules.
A brief comparison between previous works is provided
in Table 1. The most common limitation of prior work is
the lack of control over the feasibility of the resulting
communication infrastructure. Furthermore, most of the related
work does not treat resource awareness as a primary goal.
However, this is of utmost importance for making module
allocation decisions for modern dynamically reconfigurable
systems, which contain a variety of heterogeneous resources.
Our proposed ﬂoorplacement method aims to address both
of these two important problems, that is, enabling an
interconnect and resource-aware design framework.
To the best of our knowledge, there are no prior works
dealing with the minimization of wirelength between RFUs
and IOBs. The problem of minimizing internal wirelength
of RFUs has been considered in the past [12], but that approach is
based on an estimation of the internal wirelength related to
RFUs’ aspect ratio and cannot be easily extended
to support external wirelength. Similarly, by applying the
approach introduced in [13], a reuse of links can be obtained,
but this approach aims only at re-using RFUs implementing
the same functionality, while our approach aims at re-using
links between diﬀerent RFUs connected to the same set of
IOBs.
In the following section, we will introduce the basic
concepts and deﬁnitions related to our proposed ﬂoorplacer.
3. Problem Description and Basic Definitions
Given a set of Reconﬁgurable Functional Units (RFUs) the
resource- and conﬁguration-aware ﬂoorplacement problem
consists of ﬁnding, for each RFU, a chip area where it can
be placed and routed oﬄine and conﬁgured at runtime. The
entire process is subject to the constraint that each RFU must
be associated with an area containing all the required types of
resources for its functionality.
Let an RFU be a technologically mapped netlist implementing
a required functionality, and a Reconfigurable Region
(RR) be a rectangular FPGA area where two or more
RFUs are going to be placed and routed (at design time)
and conﬁgured (at runtime) according to the application
implemented in the reconﬁgurable system. The goals of the
ﬂoorplacer are to
(1) deﬁne the number of RRs and associate each RFU
with one and only one RR (partitioning task),
(2) find a position of RFUs inside the corresponding RRs
(Temporal Floorplacement inside Reconfigurable
Region),
(3) find an area constraint for all the RRs inside the FPGA
area (RR Floorplacement).
Table 1: Comparison among previous works.

| Authors | Comm. infrastructure | Resources aware | Reconfigurability aware | Reusability aware | Algorithm |
| --- | --- | --- | --- | --- | --- |
| Feng and Mehta [6] | No | Yes (high res. usage) | No | No | Sim. Annealing over seq. pairs + Flow Max over Flow Graph |
| Montone et al. [7] | No | Yes | Yes | For logic only | Sim. Annealing |
| Bazargan et al. [9] | No | No | Yes | No | Sim. Annealing over cubic modules (2d spatial, 1d temporal) |
| Yuh et al. [11, 12] | Limited, w/High Overhead | No | Yes | No | Sim. Annealing over cubic modules using T-trees and TCG |
| Singhal and Bozorgzadeh [13] | No | No | Yes | Yes | Sim. Annealing over seq. pairs |

The set of input RFUs is represented by a scheduled task
graph. Such a task graph is divided into time intervals
named static snapshots. Each static snapshot is characterized
by having the same set of configured and running RFUs
during its time interval. Given an RFU m, the function
TIME(m) returns the set of static snapshots containing m,
that is, static snapshots requiring that m is conﬁgured and
running. According to these deﬁnitions, the scheduled task
graph can be considered as a ﬁnite state automaton having
static snapshots as states and the reconﬁguration process as
transitions.
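
The definitions above can be illustrated with a minimal sketch. It assumes that each RFU is described by a hypothetical half-open interval (start, end) during which it is configured and running; this input format and the function names `static_snapshots` and `time_of` are illustrative assumptions, not part of the framework.

```python
def static_snapshots(schedule):
    """Split the timeline into intervals with a constant set of running RFUs.

    `schedule` maps an RFU name to a half-open interval (start, end) during
    which the RFU is configured and running (an assumed input format).
    Returns a list of (start, end, frozenset_of_rfus) tuples.
    """
    # Every start or end time is a snapshot boundary.
    cuts = sorted({t for interval in schedule.values() for t in interval})
    snapshots = []
    for begin, end in zip(cuts, cuts[1:]):
        active = frozenset(m for m, (s, e) in schedule.items()
                           if s <= begin and end <= e)
        if active:  # skip intervals where nothing is configured
            snapshots.append((begin, end, active))
    return snapshots


def time_of(m, snapshots):
    """TIME(m): indices of the static snapshots that contain RFU m."""
    return {i for i, (_, _, active) in enumerate(snapshots) if m in active}


schedule = {"A": (0, 4), "B": (2, 6), "C": (4, 6)}
snaps = static_snapshots(schedule)
print(time_of("B", snaps))
```

In this reading, consecutive snapshots are the states of the automaton and each boundary between them corresponds to a reconfiguration transition.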
Our framework is tailored for Xilinx Virtex 4 and 5
FPGAs [15] that provide access to the oﬀ-chip logic through
a set of FPGA pins dedicated for communication. The
connections between the FPGA’s internal logic and external
pins are managed through three-state buﬀers named Input
Output Blocks (IOBs).
Our proposed approach specifically deals with
interconnection optimization. For FPGAs, routing resources
are often the limiting resources; hence, achieving better
routability may in fact determine the overall
feasibility of a design.
Our ﬂoorplacement framework manages an objective
function based on external wirelength, that is, the estimated
length of the nets connecting each RFU to the corresponding
IOBs. This framework can be used in two different scenarios.
(i) The designer is implementing an application on an
FPGA belonging to an existing board with previously
assigned IOBs. In this case the framework can help
the designer to deﬁne area constraints in order to
reduce external wirelength.
(ii) The designer has to assign the IOBs and build the
board from scratch. In this case the framework can
provide feedback to the designer in order to identify
or approximate the best IOBs assignment.
In order to formulate this objective function, the distance
between RFUs and IOBs needs to be estimated. Experimental
results showed that using the Manhattan distance between the
center of the RFU and the position of the IOB provides a
good approximation. This deﬁnition can be generalized as
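the distance between one RFU and any location on the chip
area as follows:
$$d(\mathrm{RFU}, P) := \left| P_x - \left( \tfrac{1}{2} \cdot \mathrm{RFU}_w + \mathrm{RFU}_x \right) \right| + \left| P_y - \left( \tfrac{1}{2} \cdot \mathrm{RFU}_h + \mathrm{RFU}_y \right) \right|, \tag{1}$$
where subscripts x, y, w, and h for RFUs stand for the x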
coordinate, y coordinate, width and height, of the RFU.
RFUs have x and y coordinates corresponding to the
bottom-left most corner. Px and Py denote the x and y
coordinates of an arbitrary location on the FPGA. Similarly
the distance between one Reconﬁgurable Region and a
point can be deﬁned. The rationale of the wirelength-
driven ﬂoorplacement is to constrain a group of RFUs
characterized by being connected to a set of IOBs within
the same area, thereby, creating a neighborhood. Hence,
RFUs communicating with IOBs that are near each other,
are kept together and constrained within the same area. This
approach results in a shorter external wirelength and a highly
increased probability of reuse of existing links. While the first
outcome is intuitive, the second requires some elaboration.
In a dynamically reconﬁgurable device a common approach
is to allocate two main partitions on the device: the static
part and the dynamically reconﬁgured part, where RFUs
will be allocated. The communication infrastructure serving
all dynamically inserted functionality is managed by the
static part of the design, and all the communication, both
among RFUs and between RFUs and the static part, is
performed by the communication infrastructure exposed
through hardware macros. The communication between
RFUs and IOBs also has to be managed by the static part of the
design and similarly has to be exposed through hardware
macros. Let us consider N RFUs accessing an external device
through a set of M IOBs. In the worst case, the static part
has to provide one set of M hardware macros for each RFU,
hence M·N hardware macros. Instead, if the floorplacement
is aware of IOB positions, RFUs accessing the same set
of IOBs can be constrained within the same area and
hardware macros may be reused by the different RFUs.
Consequently, the number of hardware macros that need to
be provided by the static part of the design can be drastically
lowered toward the theoretical limit of just one set of M
hardware macros, that is, just one for each IOB.
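
The distance of equation (1) can be sketched in a few lines. The `Rect` type and the function names are hypothetical, and summing the per-IOB distances into a single external wirelength estimate is an assumption made here for illustration; the text above only defines the distance to a single point.

```python
from dataclasses import dataclass


@dataclass
class Rect:
    """An RFU (or RR) footprint; (x, y) is the bottom-left corner."""
    x: float
    y: float
    w: float
    h: float


def manhattan_distance(rfu, px, py):
    """Equation (1): Manhattan distance from the RFU center to point P."""
    cx = rfu.x + rfu.w / 2
    cy = rfu.y + rfu.h / 2
    return abs(px - cx) + abs(py - cy)


def external_wirelength(rfu, iobs):
    """Assumed aggregate: sum of the distances to every connected IOB."""
    return sum(manhattan_distance(rfu, px, py) for px, py in iobs)


rfu = Rect(x=2, y=4, w=6, h=2)           # center at (5, 5)
print(manhattan_distance(rfu, 0, 0))     # distance to the origin
print(external_wirelength(rfu, [(0, 0), (10, 5)]))
```

The same functions apply unchanged to a Reconfigurable Region, since it is also described by a bottom-left corner, a width, and a height.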
Each RFU of the input scheduled task graph can be
annotated with information on the position of the IOBs. In
order to simplify the management of the IOBs, for each RFU
n connected to M IOBs we define a point C, named centroid,
whose coordinates are given by the arithmetic average of the
coordinates of the IOBs:
$$C_x(n) = \frac{1}{M(n)} \sum_{m=1}^{M(n)} \mathrm{IOB}_x(n, m), \tag{2}$$
where $C_x(n)$ is the x coordinate of the centroid of RFU n, $\mathrm{IOB}_x(n, m)$ is the x
coordinate of the mth IOB of RFU n, and $M(n)$ is the number
of IOBs of RFU n. The same holds for y coordinates. The
centroid represents the ideal position where an RFU should
be positioned in order to minimize the external wirelength,
that is, ﬂoorplacing the RFU such that its own geometrical
center coincides with the centroid will result in minimizing
the external wirelength. Only RFUs connected to the IOBs
have an associated centroid.
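
Equation (2) amounts to a coordinate-wise average. The following minimal sketch assumes that the IOBs of an RFU are given as a list of (x, y) pairs; the function name `centroid` is illustrative.

```python
def centroid(iobs):
    """Equation (2): arithmetic average of the IOB coordinates of one RFU.

    `iobs` is a non-empty list of (x, y) IOB positions (assumed format).
    """
    if not iobs:
        raise ValueError("only RFUs connected to IOBs have a centroid")
    m = len(iobs)
    cx = sum(x for x, _ in iobs) / m
    cy = sum(y for _, y in iobs) / m
    return cx, cy


print(centroid([(0, 0), (4, 0), (2, 6)]))
```

Raising an error for an empty list mirrors the remark that only RFUs connected to IOBs have an associated centroid.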
In the following section we will ﬁrst give an overview
of the dynamic reconﬁguration mechanism of our target
FPGA architecture. Next, in Section 5 we will describe our
floorplacer, which can effectively manage the resources of this
reconﬁgurable architecture.
4. FPGA Reconfiguration and Target Architecture
This section introduces the reconﬁguration process from
the FPGA’s physical point of view. Section 4.1 presents
the physical limits of the reconﬁguration process, while
Section 4.2 shows the design ﬂow proposed by Xilinx for
reconﬁgurable architectures. Finally, Section 4.3 deﬁnes the
target architecture considered by this work.
4.1. Smallest Reconﬁgurable Region. The reconﬁguration
process described here relates to the latest generation of
Xilinx FPGAs, that is, Virtex 5 devices. According to their
datasheets [16] all the devices of this family share a common
structure. The entire FPGA is made of programmable logic
(commonly referred to as CLBs for this FPGA family) and
periodically distributed Block RAMs (BRAMs), while all
the other resources (such as DSP blocks) are placed along
the vertical edges of the FPGA. User logic is implemented
combining all these resources and connecting them using
channels and switchboxes. The information about device
conﬁguration is described in a binary conﬁguration ﬁle
named bitstream: logical functions implemented by CLBs
(i.e., the content of the lookup tables implementing a logical
function with 6 inputs and 1 output), BRAMs content,
routing information (i.e., logical status of switches managing
the interconnections), and so on. The smallest reconfigurable
element is 1 row high and 1 CLB wide and is referred to as a
frame; each frame is addressed by a row number (from 0 to
3) and a column number (expressed in CLBs). The bitstream
follows the device topology, hence inside a conﬁguration
bitstream one can easily identify data conﬁguring a speciﬁc
frame.
As previously mentioned, the frame is the smallest FPGA
area that can be conﬁgured independently of the others.
Given a generic module $h_{\mathrm{module}}$ rows high and $w_{\mathrm{module}}$
CLBs wide, the smallest area that can be involved in the
reconfiguration process is a rectangle $h_{\text{small-area}}$ rows
high and $w_{\text{small-area}}$ CLBs wide such that the following hold.
(i) The height in rows is the smallest integer not less than
the module’s height in rows:
$$h_{\text{small-area}} = \lceil h_{\mathrm{module}} \rceil \ [\text{rows}]. \tag{3}$$
(ii) The width in CLBs is the smallest integer not less than
the module’s width in CLBs:
$$w_{\text{small-area}} = \lceil w_{\mathrm{module}} \rceil \ [\text{CLBs}]. \tag{4}$$
For example, given a module requiring for its placement
1.5 rows in height and 30 CLBs in width, the smallest area that can
be assigned to this module, considering the reconfiguration
constraints, is a rectangle with a height of 2 rows and a width
of 30 CLBs. Configuration and reconfiguration
processes take place by writing the bitstream inside
the FPGA’s configuration memory. In the case of partial
reconfiguration the bitstream carries information addressing
the frames that are going to be replaced by the carried data.
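
The frame alignment rule of equations (3) and (4) can be checked with a short sketch. The function name is hypothetical; the only operation involved is rounding each dimension up to the next whole row or CLB.

```python
import math


def smallest_reconfigurable_area(h_module_rows, w_module_clbs):
    """Equations (3) and (4): round the module size up to whole frames.

    Returns (height in rows, width in CLBs) of the smallest rectangle that
    can be reconfigured independently for a module of the given size.
    """
    return math.ceil(h_module_rows), math.ceil(w_module_clbs)


# The example from the text: 1.5 rows by 30 CLBs.
print(smallest_reconfigurable_area(1.5, 30))
```

A dimension that is already a whole number of rows or CLBs, such as the 30 CLBs above, is left unchanged.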
4.2. Xilinx Partial Reconﬁguration Design Flow. Due to the
fact that Xilinx FPGA Virtex families are the target devices
of this work, the Xilinx Partial Reconﬁguration (PR) design
ﬂow [17] will be brieﬂy introduced here. This ﬂow is com-
patible with dynamic reconﬁguration. Let a Reconﬁgurable
Functional Unit (RFU) be a technologically mapped netlist
implementing a required functionality and a Reconﬁgurable
Region (RR) be a rectangular FPGA area where two or more
RFUs are going to be placed and routed (at design time)
and conﬁgured (at runtime) according to the application
implemented in the reconﬁgurable system. PR allows the
deﬁnition of a set of nonoverlapping RRs. All the static logic
(i.e., all logic that will always remain conﬁgured, including
the glue logic) is placed outside the reconfigurable regions,
although it is allowed to use routing resources intersecting
and even crossing RRs (the use of routing resources crossing
RRs is the most relevant case). Figure 1(a) provides an
example of a reconﬁgurable architecture developed with the
PR design ﬂow. Note how the reconﬁgurable regions are
aligned to the grid deﬁned by rows and CLBs according to
PR requirements.
According to the PR ﬂow, hardware macros can be placed
on the boundaries of RRs in order to deﬁne pins where RFUs
can hook themselves. Such macros are made with pairs of
CLBs, one side of the CLB pair is connected to an RR signal,
while the other is connected to a static logic signal. Previous
design ﬂows [18] required static logic being placed and also
routed outside RRs, while the PR relaxes this constraint.
Figure 1: Target Reconfigurable Architecture based on PR: (a) physical and (b) logical views.
[Figure 1 panels: (a) physical view, with rows 0 to 3, CLB columns, static logic, reconfigurable regions A to D, hardware macros, and static logic routing; (b) logical view, with static logic, communication infrastructure, and reconfigurable regions A to D.]
4.3. Target Architecture. The target architecture considered in
this work is based on the PR design flow and has a static part
implementing the communication infrastructure providing
a number of interfaces at least equal to the number of RRs
(each RR may provide more than one link to the commu-
nication infrastructure). Each RFU can communicate with
other conﬁgured modules or with the static part by just
hooking up to the hardware macros corresponding to the
communication infrastructure interfaces. The most general
view of the target architecture is given in Figure 1: from a
physical point of view RRs communicate with the static part
through PR stand-alone static-logic nets routing, while
from a logical point of view the static logic is responsible
for managing intermodule communication implementing
diﬀerent communication infrastructures (such as bus-based,
NoC-based, point to point, and so on). In conclusion, PR
allows a more scalable communication infrastructure for
partial dynamic reconﬁgurable architectures, since it allows
an RFU to communicate with static logic regardless of
the position of other RFUs. This approach simpliﬁes the
communication infrastructure design, while the frame size
still needs to be considered during RRs ﬂoorplacement in
order to prevent the conﬁguration of an RFU inside an RR
from interfering with other RFUs being executed on other
parts of the FPGA.
5. The Floorplacement Framework
In this section, we introduce our proposed framework for
solving the ﬂoorplacement problem targeting the dynam-
ically reconﬁgurable architecture and the reconﬁguration
technology described in the previous section.
The ﬂoorplacement framework accepts as input a sched-
uled task graph (TG) composed of a node for each RFU. A
task graph is a Directed Acyclic Graph whose nodes each represent
a single task of a given application (or part of an application).
A TG representation of an application has been chosen
according to most of the related works [9–11, 13]. The TG
can be scheduled according to diﬀerent requirements (e.g.,
timing requirements or target device). Dividing time into
time slots the concept of static snapshot can be deﬁned as
the set of TG’s nodes (i.e., tasks) that must be conﬁgured
and must be running in a given time slot. A partial dynamic
reconﬁgurable system can be seen as a ﬁnite state automaton
according to the following deﬁnition.
(i) States. There is one state for each time slot (hence one
for each static snapshot).
(ii) Transitions. The transition is a reconﬁguration pro-
cess.
After presenting the chosen scheduling technique, we
will describe three algorithms that comprise our framework
[5]. The ﬂoorplacement process starts with a Partitioning
step, where RFUs are ﬁrst grouped into RRs according
to two criteria, wirelength for external routing to IOBs
and utilization of resources. In the second step, once the
partitions have been computed, the position of each RFU
inside the corresponding RR needs to be determined. This
is performed by the Temporal ﬂoorplacement step inside RRs.
Finally, the design is completed by placing the RRs on the
FPGA using the Reconﬁgurable Regions Floorplacement step.
5.1. Static Scheduling Phase. The heuristic used to compute
the scheduled task graph has been deﬁned starting from the
Napoleon scheduler [19] and the ILP formulation proposed
in [20]. This heuristic is a reconﬁguration-aware scheduler
for dynamically partially reconﬁgurable architectures that
exploits conﬁguration prefetching, module reuse, and also
antifragmentation techniques.
In the following, nodes that have yet to be scheduled while
all their ancestors have already been scheduled will be called available
nodes. The heuristic performs a list-based scheduling using as
priority function the ALAP value of the tasks. This function
has been slightly changed: an available task can be scheduled
if (i) it has an ALAP value greater than the minimum
ALAP value of the available nodes, (ii) the possibility of
scheduling all the available tasks with an ALAP value less
than its own has been verified, and (iii) there is enough
space on the FPGA.

sLength ← 0
t ← 1
g ← readGraph()
setALAP(g)
RNs ← getRootNodes(g)
while ∃ not scheduled tasks do
    Control possibility of reuse for available tasks in RNs
    if ∃ not scheduled tasks then
        avTask ← getFirstALAPAvailableNode(RNs)
        endT ← findEndTime(avTask, t)
        while not all the available nodes in RNs have been observed do
            if ∃ a position on the FPGA for avTask then
                avTask.terminationTime ← endT
                avTask.schedulingTime ← t
                avTask.setScheduled ← true
                if sLength < endT then
                    sLength ← endT
                end if
                for all avTask child nodes chTask do
                    if All chTask parents have been scheduled then
                        RNs ← RNs + chTask
                    end if
                end for
                Control possibility of reuse for available tasks in RNs
                avTask ← getNextALAPAvailableNode(RNs)
            end if
        end while
    end if
    t ← nextControlStep
end while
Algorithm 1: Heuristic pseudocode.
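
The ALAP values used as the priority function can be computed with a standard two-pass traversal of the task graph. The sketch below is only an illustration under stated assumptions: it ignores reconfiguration times and resource limits, takes task durations and successor lists as plain dictionaries, and uses the hypothetical name `alap_times`; it is not the setALAP routine of the scheduler. It relies on `graphlib`, available in the Python standard library from version 3.9.

```python
from graphlib import TopologicalSorter


def alap_times(durations, children):
    """Latest start time of each task that keeps the critical-path length.

    `durations` maps task -> execution time; `children` maps task -> list of
    successor tasks (both are assumed input formats).
    """
    # TopologicalSorter expects a mapping node -> predecessors.
    preds = {v: set() for v in durations}
    for v, succs in children.items():
        for c in succs:
            preds[c].add(v)
    order = list(TopologicalSorter(preds).static_order())

    # Forward (ASAP) pass: earliest start times and overall schedule length.
    asap = {}
    for v in order:
        asap[v] = max((asap[p] + durations[p] for p in preds[v]), default=0)
    length = max(asap[v] + durations[v] for v in order)

    # Backward (ALAP) pass: a task must end before its earliest-ALAP child.
    alap = {}
    for v in reversed(order):
        latest_end = min((alap[c] for c in children.get(v, ())), default=length)
        alap[v] = latest_end - durations[v]
    return alap


durations = {"a": 2, "b": 3, "c": 1, "d": 2}
children = {"a": ["b", "c"], "b": ["d"], "c": ["d"]}
print(alap_times(durations, children))
```

Sorting the available nodes by increasing ALAP value then gives the order in which a list scheduler would consider them.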
Two antifragmentation techniques have been designed.
(i) Farthest Placement. When a module needs to be
reconfigured, it will be placed in the farthest position
with respect to the center of the FPGA. This is done
because, when a large module (a module that
demands many hardware resources) has to be
placed, keeping the center of the FPGA empty
could increase the probability of placing
large modules quickly. The same concept is applied to
those tasks exploiting module reuse: when more than
one module is available to be used on the FPGA, the
farthest one with respect to the center of the FPGA is
selected.
(ii) Limited Deconfiguration. The deconfiguration policy
leaves on the FPGA all modules that are not involved
directly in the cleaning process (the creation of
enough contiguous space for a new task), increasing
the possibility of reuse of those modules.
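
The farthest placement rule can be sketched as a simple selection among candidate positions. The candidate format (x, y, w, h), the function name, and the use of the Manhattan distance between rectangle center and FPGA center are assumptions made for illustration; the text does not state which metric the placer uses.

```python
def farthest_placement(candidates, fpga_width, fpga_height):
    """Pick the candidate rectangle whose center is farthest from the
    center of the FPGA.

    `candidates` is a non-empty list of (x, y, w, h) rectangles in which the
    module fits (assumed format, bottom-left corner plus size).
    """
    cx, cy = fpga_width / 2, fpga_height / 2

    def distance_from_center(rect):
        x, y, w, h = rect
        # Assumed metric: Manhattan distance between the two centers.
        return abs(x + w / 2 - cx) + abs(y + h / 2 - cy)

    return max(candidates, key=distance_from_center)


free_slots = [(40, 1, 10, 2), (0, 0, 10, 1), (80, 3, 10, 1)]
print(farthest_placement(free_slots, fpga_width=100, fpga_height=4))
```

The same selection applies to module reuse: the candidates are then the positions of the already configured modules of the required type.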
Algorithm 1 shows the pseudocode of the proposed algo-
rithm.
The most important functions used in Algorithm 1 are as
follows.
(i) ∃ a Position on the FPGA for avTask. This function
involves the placer that, using antifragmentation
techniques, tries to place the current task avTask. This
function also takes into account that, if a module
is being reconfigured, no other module can be
reconfigured onto the FPGA; in this case it
returns false. This function must not be confused
with a later phase of the proposed framework. In the
scheduling phase this function is used to make the
scheduler aware of the resources, while in the next
phase it will be used to properly manage the RFUs
inside the RRs.
(ii) Control the Possibility of Reuse for Available Tasks in
Root Nodes (RN). The pseudocode of this function is
presented in Algorithm 2; it simply considers all
the available tasks in ALAP order and verifies for each
one whether there is a module available to be reused.
(iii) nextControlStep. This function returns the next time
assignable to a task. This is done to reduce the
complexity of the algorithm by reducing the number
of iterations of the external while cycle. Not all the
time instants are available to assign a task:
(a) when a task is being reconfigured, the scheduler
cannot reconfigure any other task;
(b) when there is not enough available area on the
FPGA to place any task, the scheduler has to
wait for the termination of at least one running
task;
(c) when a module exploits the module reuse
concept and there are no available modules of
the same type on the FPGA, the scheduler has
to wait for the termination of at least one of
those modules to schedule the selected tasks.
For this reason nextControlStep assigns to the
current time t a value given by 1 plus the
minimum time between the last time in which
the reconfiguration device is used and the first
termination time of the tasks running on the
FPGA.

avTask ← getFirstALAPAvailableNode(RNs)
while ∃ an available task not yet considered do
    endT ← findEndTimeReusedTask(avTask, t)
    if ∃ a module usable by avTask then
        avTask.terminationTime ← endT
        avTask.schedulingTime ← t
        avTask.setScheduled ← true
        if sLength < endT then
            sLength ← endT
        end if
        for all avTask child nodes chTask do
            if All chTask parents have been scheduled then
                RNs ← RNs + chTask
            end if
        end for
    end if
    avTask ← getNextALAPAvailableNode(RNs)
end while
Algorithm 2: Reuse function pseudocode.
In the worst case, the algorithm assigns only one task
per time instant, so the external while is executed $O(n)$ times,
where $n$ is the number of tasks in the task graph; the control
for reused tasks takes $O(f_o)$ time, where $f_o$ is the maximum
fanout of the nodes of the task graph; and the internal while is
executed $O(f_o)$ times. The functions that return the tasks in ALAP
order can be designed by implementing the binomial search
in $O(1)$ time, but in this case the process of inserting a new
available node into RNs will take $O(\log f_o)$. Also the for used
to verify the availability of the children nodes of avTask is
executed $O(f_o)$ times. Hence the complexity of the algorithm
in the worst case is $O(n f_o^2 \log f_o)$.
5.2. Partitioning into RRs. Given N RRs, the N-RRs partitioning
problem consists of finding a surjective binding
$c_{m,n}$ of RFUs into RRs (i.e., each RFU has to be bound to
one and only one RR and each RR has to contain at least
one RFU). Algorithm 3 first aims at grouping together
externally connected RFUs (i.e., RFUs that are connected
to IOBs) having nearest centroids and keeping RFUs with
distant centroids in different RRs. We refer to this as
wirelength-driven partitioning.

Buckets B;
for all Externally connected RFU r do
    B.add(r);
end for
Wirelength-driven-partition(B);
Fix-existing-associations(B);
for all Remaining RFU q do
    B.add(q);
end for
Resource-driven-partition(B);
Algorithm 3: Partitioning into RRs.

Secondly, the remaining RFUs are partitioned to min-
imize the variance of RRs’ resource requirements along
diﬀerent static snapshots. In other words, for a given RR, the
algorithm tries to keep the amount of resources needed by
the RFUs conﬁgured and running inside the considered RR
constant as much as possible across different static snapshots.
We refer to this as the resource-driven partitioning.
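
The quantity targeted by resource-driven partitioning can be sketched as a variance over static snapshots. For simplicity, the sketch assumes a single scalar resource demand per RFU (for example, a number of CLBs), whereas the framework deals with several resource types; the function name and data layout are hypothetical.

```python
from statistics import pvariance


def rr_resource_variance(rr_rfus, demand, snapshots):
    """Variance, across static snapshots, of the resources required by the
    RFUs of one RR that are configured and running in each snapshot.

    `rr_rfus` is the set of RFUs bound to the RR, `demand` maps RFU ->
    resource amount, and `snapshots` is a list of sets of running RFUs
    (all assumed formats, with a single resource type).
    """
    usage = [sum(demand[m] for m in rr_rfus if m in active)
             for active in snapshots]
    return pvariance(usage)


demand = {"A": 10, "B": 10, "C": 4}
snapshots = [{"A", "C"}, {"B", "C"}]
print(rr_resource_variance({"A", "B"}, demand, snapshots))  # balanced RR
print(rr_resource_variance({"A", "C"}, demand, snapshots))  # unbalanced RR
```

In this toy case, binding A and B to the same RR keeps its requirement constant over time, while binding A and C together leaves the RR partly unused in the second snapshot.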
The problem of wirelength-aware partitioning of RFUs
into RRs can be reduced to the problem of clustering the
corresponding centroids in a two-dimensional space (i.e., the
chip area). Each identified cluster is associated with one and
only one RR and the RFUs are partitioned into RRs according
to the association between their corresponding centroids
and clusters (an RFU belongs to an RR if and only if its
centroid belongs to the cluster associated with the RR). Once
the wirelength-driven partitioning has been performed, the
created partition is used as an initial solution by the resource-driven
partitioner to further partition the RFUs that are not
externally connected. This means that the surjective binding
$c_{m,n}$ is no longer modified for externally connected RFUs.
While data-mining algorithms provide several tools to
solve the clustering problem (like the well-known k-means or
fuzzy-k-means algorithms), they are primarily geared towards
very large datasets. For our purposes, real-life task graphs
consist of fewer than a hundred RFUs and only a few of
them are connected to IOBs. Hence, we adopted a simulated
annealing-based approach.
Data Structure. Let us consider a bucket data structure
having one bucket $B_n$ for each Reconfigurable Region n.
A given RFU m belongs to a bucket $B_n$ (and only to that one)
if and only if m is going to be placed in the Reconfigurable
Region n at TIME(m).
Annealer’s Moves. Let the simulated annealer’s moves be the
following.
(i) Randomly Move One RFU. Move one module
between two buckets: randomly pick a module
$m \in B_n$ and move it to a bucket $B_{n'}$ where $n' \neq n$. This
move can be performed if and only if $B_n$ contains
another module $m' \neq m$.
(ii) Swap Two RFUs. Swap modules belonging to different
buckets: randomly pick two modules $m$ and $m'$
with $m \in B_n$ and $m' \in B_{n'}$ such that $B_n \neq B_{n'}$.
The move consists in swapping the modules’ buckets
so that $m \in B_{n'}$ and $m' \in B_n$.
Once the partitions have been computed, the position of each
RFU inside the corresponding RR needs to be determined.
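The bucket structure and the two moves can be sketched as follows. This is an illustrative simulated annealer, not the authors' implementation: the cost function (within-bucket scatter of the centroids), the geometric cooling schedule, and all parameter names are assumptions made for the example.

```python
import math
import random

def scatter(buckets, centroid):
    """Sum of distances between each RFU centroid and its bucket's mean point."""
    total = 0.0
    for members in buckets:
        cx = sum(centroid[m][0] for m in members) / len(members)
        cy = sum(centroid[m][1] for m in members) / len(members)
        total += sum(math.hypot(centroid[m][0] - cx, centroid[m][1] - cy)
                     for m in members)
    return total

def propose(buckets, rng):
    """Return a new bucket list obtained by one 'move' or one 'swap'."""
    new = [list(b) for b in buckets]
    src, dst = rng.sample(range(len(new)), 2)
    if rng.random() < 0.5 and len(new[src]) > 1:
        # Move: legal only if the source bucket keeps at least one other RFU.
        new[dst].append(new[src].pop(rng.randrange(len(new[src]))))
    else:
        # Swap: exchange the buckets of two RFUs taken from different buckets.
        i, j = rng.randrange(len(new[src])), rng.randrange(len(new[dst]))
        new[src][i], new[dst][j] = new[dst][j], new[src][i]
    return new

def anneal_partition(centroid, n_regions, steps=5000, t0=1.0, alpha=0.999, seed=0):
    """Partition RFUs into n_regions >= 2 non-empty buckets (assumed schedule)."""
    rng = random.Random(seed)
    rfus = list(centroid)
    rng.shuffle(rfus)
    # Round-robin start: every bucket is non-empty, so the binding is surjective.
    buckets = [rfus[i::n_regions] for i in range(n_regions)]
    cost, temp = scatter(buckets, centroid), t0
    best, best_cost = buckets, cost
    for _ in range(steps):
        cand = propose(buckets, rng)
        cand_cost = scatter(cand, centroid)
        if cand_cost <= cost or rng.random() < math.exp((cost - cand_cost) / temp):
            buckets, cost = cand, cand_cost
            if cost < best_cost:
                best, best_cost = buckets, cost
        temp *= alpha
    return best

centroid = {"a1": (0, 0), "a2": (1, 0), "a3": (0, 1),
            "b1": (10, 10), "b2": (11, 10), "b3": (10, 11)}
partition = anneal_partition(centroid, n_regions=2)
```

Both moves preserve the invariant that no bucket becomes empty, which is how the sketch keeps the binding surjective without an explicit repair step.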
5.3. Temporal Floorplacement inside Reconfigurable Regions.
The aim of the Temporal Floorplacement inside Reconfigurable
Regions (TFiRR) is to compute, for each RR, a set
of height-width pairs describing rectangular areas where all
RFUs bound to this RR can be successfully floorplaced. In
this phase the final on-chip position of the rectangular area
is not considered. For a target FPGA device that is divided into
up to k rows for reconfiguration, the goal of this algorithm is
to determine for each RR n a set of pairs
$$\Omega = \left\{ \langle n_h^1, n_w^1 \rangle, \langle n_h^2, n_w^2 \rangle, \ldots, \langle n_h^k, n_w^k \rangle, \ldots \right\} \tag{5}$$
such that an eventual actual placement of this RR on
the device, given as $A_n = \langle n_x, n_y, n_h^i, n_w^i \rangle$ for all $n_x$, $n_y$, $i$,
results in a feasible floorplacement independently of the final
position $n_x$ and $n_y$ decided for this RR. Here, h, w, x, and
y stand, respectively, for the height, width, and the two
coordinates of the bottom-leftmost corner of the rectangular
area.
Consequently, for each RR n, the set of height-width pairs
$\Omega$ can be described by providing just four elements (due
to technological constraints related to Xilinx Virtex 4 and 5
FPGAs that are divided in 4 rows)
$$\Omega = \left\{ \langle 1, n_w^1 \rangle, \langle 2, n_w^2 \rangle, \langle 3, n_w^3 \rangle, \langle 4, n_w^4 \rangle \right\}, \tag{6}$$
where $n_w^i$, for $i \in \{1, 2, 3, 4\}$, is the smallest width that
RR n, floorplaced in i rows, should have in order to feasibly
host all the associated RFUs.
The core of the TFiRR step is the computation of the pairs
$\langle i, n_w^i \rangle$. In order to find, for a given height i, the minimum
feasible width $n_w^i$, the algorithm has to check that every RFU
can be successfully floorplaced inside the area described by
the pair $\langle i, n_w^i \rangle$, that is, for each RFU the algorithm has to
provide a height, width, and a position within the RR n.
Such a problem is itself three-dimensional (i.e., two
spatial dimensions and a temporal one). In order to simplify
the problem the following assumption is introduced: all
RFUs’ heights are equal to RRs’ heights. Fixing the height
dimension of the RFUs, the problem is reduced to a
bidimensional packing problem such that the static snapshot
and the width are the only two considered coordinates.
Given an RFU m and a height i, the smallest position-independent
width required by the RFU in order to be
hosted inside an area of height equal to i rows can be
easily computed by taking into account the periodic distribution
of the FPGA’s resources. The TFiRR algorithm works as follows.
For each RR n and each possible height $i \in \{1, 2, 3, 4\}$:
(1) consider each RFU m such that $c_{m,n} = 1$ (i.e., the RFUs
belonging to RR n), let the RFU’s height be equal to the RR’s
height, then compute the minimum feasible width of
RFU m;
(2) pack all the RFUs inside the RR in order to minimize
the maximum width of RR n.
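Step (1) relies on the periodic distribution of resources across device columns. A minimal sketch of that computation is given below, assuming the device is modeled as a repeating string of column types and that each column contributes a fixed amount of its resource per row. The pattern `"CCCB"` and the per-column capacities are made-up illustration values, not actual Virtex figures.

```python
def _window_ok(pattern, per_column, need, height, start, width):
    """Check whether `width` columns starting at `start` satisfy every need."""
    have = {}
    for k in range(width):
        kind = pattern[(start + k) % len(pattern)]
        have[kind] = have.get(kind, 0) + per_column.get(kind, 0) * height
    return all(have.get(kind, 0) >= amount for kind, amount in need.items())

def min_width(pattern, per_column, need, height):
    """Smallest width that satisfies `need` at every start column of the pattern."""
    for kind, amount in need.items():
        if amount > 0 and per_column.get(kind, 0) * pattern.count(kind) <= 0:
            raise ValueError(f"resource {kind!r} is not available on this device")
    width = 1
    while True:
        # Position independence: the window must work from every offset.
        if all(_window_ok(pattern, per_column, need, height, start, width)
               for start in range(len(pattern))):
            return width
        width += 1

pattern = "CCCB"                  # hypothetical period: three CLB columns, one BRAM
per_column = {"C": 20, "B": 4}    # hypothetical resources per column per row
```

Because every start offset within one period is checked, the returned width is valid wherever the RR is eventually placed along the x axis, which is the property the TFiRR step needs.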
The packing of the RFUs can be performed with a zero
temperature simulated annealing (ZT-SA) algorithm. For
each static snapshot p an ordered list of RFUs m, such that
$p \in TIME(m)$, is kept. The RFUs are ordered from the
leftmost to the rightmost with respect to the RR’s area. The
following moves are applied.
(1) Randomly Move an RFU. Randomly pick an RFU and
move it to an integer position belonging to the interval
[0, width], where width is the current width of the
RR.
(2) Randomly Swap Two Concurrent RFUs. Randomly
pick two RFUs $m'$ and $m''$ that are concurrently
configured and running in at least one static snapshot
and swap their positions inside the RR.
In order to keep the floorplacement compact, each
step of the annealer is followed by a compression function
that computes, for each RFU, the leftmost feasible position
preventing overlaps between RFUs. The computation of
the objective function is the most expensive operation,
requiring in the worst case $\Theta(R \cdot P)$ time (where R is
the number of RFUs), but experimental data on randomly
generated partitions indicate that in practice the cost
grows as $O(R \log R)$. From a memory complexity point
of view the algorithm requires only the management of a list
for each static snapshot, hence the memory requirements are
$\Theta(R)$.
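A compression step of this kind can be sketched as below. The sketch assumes that the RFUs are processed in their left-to-right order and that each RFU is pushed just right of every earlier RFU sharing a static snapshot with it; it does not try to fill gaps, so it is a simplification of whatever the actual compression function does. The names `order`, `width`, and `time_of` are hypothetical.

```python
def compress(order, width, time_of):
    """Left-compact RFUs in the given order; return positions and the RR width."""
    x = {}
    for m in order:
        # Only RFUs alive in a common static snapshot can collide with m.
        x[m] = max((x[q] + width[q] for q in x if time_of[q] & time_of[m]),
                   default=0)
    rr_width = max((x[m] + width[m] for m in x), default=0)
    return x, rr_width

width = {"a": 3, "b": 2, "c": 4}
time_of = {"a": {0}, "b": {0, 1}, "c": {1}}
```

With these values the order a, b, c yields an RR width of 9, while the order a, c, b lets a and c share the same columns in different snapshots and yields 6, which is the kind of improvement the annealer's moves search for.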
The quality of this algorithm at first seems to be strictly
related to the quality of the partition provided by the
previous step. Our experiments showed that TFiRR applied
to nonpartitioned task graphs can reach the results of TFiRR
applied to a partitioned task graph by increasing the number
of iterations by at least two orders of magnitude. On the other
hand, the difference between the objective functions remains
fairly low: we observed degradations ranging between 1% and 5%.
At the end of this second step, each resulting RR is
annotated with a centroid whose coordinates are given by the
arithmetic average of the corresponding coordinates of the
RFUs associated with the considered RR. This identifies the
ideal position where each RR should be placed in order to
globally minimize the external wirelength of the associated
RFUs. This particular formula for computing the centroid
places more emphasis on the most heavily utilized IOBs.
For example, if three RFUs are connected only to the USB
interface, then all of them will have the same centroid $C_{USB}$
that will occur three times in the set of centroids associated
with the RR. In the third and final step, the centroids of the
RRs will be used during the final floorplacement of RRs.
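The weighting effect of repeated centroids can be seen in a small worked example. The coordinates below are invented for illustration: three RFUs attached to a USB interface at (0, 8) and one RFU attached to a second interface at (12, 0).

```python
def rr_centroid(rfu_centroids):
    """Arithmetic average of the centroids of the RFUs bound to one RR."""
    xs, ys = zip(*rfu_centroids)
    return (sum(xs) / len(xs), sum(ys) / len(ys))

c_usb, c_other = (0, 8), (12, 0)   # hypothetical IOB positions
centroid = rr_centroid([c_usb, c_usb, c_usb, c_other])
```

The result, (3.0, 6.0), lies three times closer to the USB interface than to the other one, because $C_{USB}$ appears three times in the averaged set.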
5.4. Reconfigurable Regions Floorplacement. The aim of this
step is to define, for each RR n, an area
$$A_n = \langle n_x, n_y, n_h, n_w \rangle. \tag{7}$$
The algorithm has to choose one $\langle n_w, n_h \rangle$ couple, for each RR
n, out of the set provided by the TFiRR step. Furthermore,
it has to determine the specific x and y positions on the
FPGA area. According to classical floorplanning this task
can be performed through simulated annealing. The RR
Floorplacement algorithm is divided in two steps: the first
one floorplacing the RRs connected to IOBs (wirelength-driven
RRs floorplacement) and the second one floorplacing
the remaining RRs (area-driven RRs floorplacement).
Data Structure. To represent the ﬂoorplacement, a Horizon-
tal Constraint List (HCL) is used for each row of the device.
The HCL for row r is a list containing all the RRs occupying
row r and ordered by increasing nx.
Objective Functions. The ﬁrst step is characterized by an
objective function that must take directly into account the
wirelength:
$$\Gamma = \left( \sum_{r \in RR} d(r, C(r)) \cdot \#\{RFU \in r\} \right)^{f}, \tag{8}$$
where $d(r, C(r))$ represents the distance of the RR from its
ideal position (centroid), $\#\{RFU \in r\}$ is the number of
RFUs connected to IOBs belonging to the rth RR, while
f is a positive number. If f is small, this indicates that the
floorplacement is feasible and if it is large the floorplacement
is not feasible. The goal of the floorplacement is to minimize
$\Gamma$. Note that the objective function is weighted by the
number of externally connected RFUs. This means that
RRs containing more RFUs connected to the external world
would benefit from a partitioning in the neighborhood of the
centroid. Once the wirelength-driven RR floorplacement has
been performed, the remaining RRs can be floorplaced by
a purely area-driven algorithm. This second step is guided
by an objective function involving free area and feasibility of
the final floorplacement. Given an RRs’ floorplacement, the
following quantities are defined: negative area slack (N), that
is, the area of the floorplan crossing target device boundaries,
and positive area slack (P), that is, the greatest contiguous free
area starting from the top-right corner of the device
and with nonincreasing width going bottom-ward. Figure 2
shows an example of such slacks. Given such slacks, the
RR Floorplacement objective function is defined as follows
(where M is a natural number greater than the number of frames on the
target FPGA area):
Θ = P − M · N. (9)
This objective function $\Theta$ is positive if the floorplacement
is feasible (i.e., N = 0), otherwise it is negative (because
$P < M \cdot N$). The aim of the annealer is to maximize $\Theta$ and,
consequently, to provide a feasible floorplacement maximizing the
contiguous FPGA area left free for static logic.
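The two objective functions can be sketched as follows. Several details are assumptions made for the example: the distance d is taken to be the Manhattan distance, the HCL of each row is a list of hypothetical (x, width) pairs with rows listed from the top of the device downwards, and areas are counted in column-row cells instead of frames.

```python
def manhattan(a, b):
    """Manhattan distance between two (x, y) points (assumed choice for d)."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])

def gamma(rrs, f):
    """Wirelength objective: rrs holds (position, centroid, n_external) triples."""
    return sum(manhattan(pos, c) * n_ext for pos, c, n_ext in rrs) ** f

def slacks(hcl, device_width):
    """Return (P, N) for a floorplan given as one (x, width) list per row."""
    p, n, limit = 0, 0, device_width
    for row in hcl:
        right = max((x + w for x, w in row), default=0)
        # The free strip may only shrink while going bottom-ward.
        limit = min(limit, max(0, device_width - right))
        p += limit
        # Cells of this row lying beyond the right device boundary.
        n += sum(max(0, x + w - max(x, device_width)) for x, w in row)
    return p, n

def theta(hcl, device_width, big_m):
    """Area objective: positive slack minus heavily penalized negative slack."""
    p, n = slacks(hcl, device_width)
    return p - big_m * n

feasible = [[(0, 4)], [(0, 2)], [(0, 7)], []]
infeasible = [[(0, 12)], [], [], []]
```

In the feasible example the free widths per row are 6, 8, 3, and 10, but only 6, 6, 3, and 3 count towards P, matching the rule that wider empty space below a narrower strip is excluded.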
Annealers’ Moves. Given the HCL data structure, the following
moves are defined.
(1) Randomly Swap Two RRs. Randomly choose two RRs
and swap their positions.
(2) Move an RR to a Randomly Chosen Position. Randomly
pick an RR n and two coordinates $\langle x, y \rangle$
belonging to the device area, and move n to that position.
(3) Span a Randomly Chosen RR over Rows. Randomly
choose an RR having height less than the number of
ROWS, and increase its height by 1 row.
(4) Unspan a Randomly Chosen RR. Randomly choose an
RR having height greater than 1, and decrease its
height by 1 row. It is the inverse of the span move.
It can be noticed how the floorplacement of a large number
of small functional units is easier than the floorplacement
of a small number of large functional units, as shown in
Figure 3. A bad choice of floorplacement of a large functional
unit during the early stages of the floorplacement is difficult
to correct in the later steps, particularly when the temperature
decreases rapidly and each correcting move is likely to be
rejected because it results in a worse objective function.
5.5. Identifying Optimal Number of Partitions. The three
steps described above comprise our floorplacement framework.
The overall framework relies on the concept of
partitioning, hence, some final remarks on how we control
the granularity of these partitions will be useful. In order
to identify the most suitable number of partitions let us
consider the maximum number of concurrently configured
RFUs (CC), that is, how many RFUs are present at most in one
static snapshot in any partition. We define this quantity as
follows:
$$CC = \max_{p,b} \left\{ n_{p,b} \mid n_{p,b} = \# \{ \text{RFU } r \mid p \in TIME(r) \wedge r \in B_b \} \right\}, \tag{10}$$
where $B_1, \ldots, B_{\#partitions}$ represent the different output partitions.
CC can be considered as a good metric to describe
the complexity of the entire partition. Let $\Gamma$ be defined here as
the global normalized variance in resource requirements,
used to represent the heterogeneity of the partitions (this is a
different quantity from the objective function in (8)). We
have observed experimentally that the number of partitions
minimizing the product of $\Gamma$ and CC provides (in the
resource-driven partitioning algorithm) a good tradeoff between
partition complexity and intra-partition variance.
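The CC metric and the selection rule can be sketched as follows. The function `variance` stands in for the global normalized variance and is passed in by the caller; its definition, like the names `buckets` and `time_of`, is an assumption of this example.

```python
def concurrency(buckets, time_of):
    """CC: the largest number of RFUs of one bucket alive in one static snapshot."""
    best = 0
    for members in buckets:
        count = {}
        for r in members:
            for p in time_of[r]:
                count[p] = count.get(p, 0) + 1
        best = max(best, max(count.values(), default=0))
    return best

def pick_partition(candidates, time_of, variance):
    """Return the candidate partition minimizing variance(partition) * CC."""
    return min(candidates, key=lambda b: variance(b) * concurrency(b, time_of))

time_of = {"a": {0}, "b": {0}, "c": {1}, "d": {0, 1}}
two_buckets = [["a", "b", "c"], ["d"]]
three_buckets = [["a", "c"], ["b"], ["d"]]
```

Here the two-bucket candidate has CC = 2 (a and b coexist in snapshot 0) and the three-bucket candidate has CC = 1, so with equal variance the latter is preferred.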
6. Experimental Results
Our proposed approach for wirelength and resource
management has been validated both on randomly generated
task graphs and on real-world applications from the domain of
data processing for biomedical applications (i.e., collecting
data from sensors, performing some preprocessing like FIR
filtering, computing error detection codes, and sending
data through a network). General considerations about the
performance in finding the optimal results can be found
in [21]. The results introduced by this new approach are
application dependent, therefore we will describe three
different metrics used to evaluate the results. For each metric,
we will provide the description of the task graphs. (Table 2
shows a summary of the metrics and a summary of our
comparison with an existing floorplacement method that is
only resource aware, but does not consider the wirelength
implications of resource management [7].)

Figure 2: (a) A variation of horizontal constraint graph used for floorplan representation (axes: Column, Row). (b) Negative (N) and positive (P) area slacks (the
empty space on the second row is not included in P because it has a width greater than one of the empty areas in the upper row).
Figure 3: Limits of the floorplacement of big modules (device rows 0 to 3).

Our results confirm that introducing wirelength
awareness improves the interconnect cost significantly.
We observed a reduction of 90% in external wirelength
in the best case and an average reduction of 50%. Task
graphs with few externally connected RRs lead to the biggest
wirelength reduction. On the other hand, the reduction
in external wirelength is smaller for task graphs with many externally
connected RRs with nearby centroids. In such task graphs
only a few RRs can be placed near their centroids and the other
RRs have to be placed far away. The number of hardware
macros provided by the static part can be reduced by 90% in
the best case. Task graphs containing several RFUs accessing
(in different static snapshots) the same set of IOBs (e.g.,
RFUs connected to the same external interface like USB)
would particularly benefit from our approach, while task
graphs having all the RFUs connected to distinct sets of
IOBs would not benefit from links reduction (this is the 0%
reduction case referred to in Table 2).

Table 2: Quality metrics and the summary of the variation in these
metrics compared with existing work [7].
Metric              | Variation/Value
External wirelength | Reduction in external wirelength ranging between (90, 30)% compared to existing area-driven method
Links               | Reduction in number of links required ranging between (90, 0)% compared to existing area-driven approach
Blank area          | (5, 35)% of the final floorplacement

Figure 4 shows how our proposed approach drastically
reduces the number of required links.
In this figure we also observe the most relevant drawback of our
approach, which we refer to as the blank area problem (i.e.,
the amount of area
being surrounded by RRs but not assigned to any RR). Let us
consider a set of RRs, each one being externally connected,
and apply our approach several times, each time increasing
the percentage of RRs that are considered (by the floorplacer)
as attached to IOBs. During the first iteration no RR is
considered as attached to IOBs, while during the last iteration
all RRs are considered as attached to IOBs. Figure 5 plots
the percentage of the final floorplacement that remains as
blank area with respect to the percentage of RRs considered
as attached to IOBs as a result of this experiment.
When no RR is considered as externally connected a
5% blank area is obtained (same as the purely area-driven
approach). The peak is obtained when half of the RRs are
managed by the wirelength-driven algorithm and the other
half by the area-driven one; in such a case the blank area may
reach 30–35% of the final floorplacement. On the other
hand, the blank area is generally divided in no more than
three or four areas that are wide enough to be used by
the static part of the design according to the PR design
flow. Figure 5 also shows how the first externally connected
RRs obtain a great wirelength improvement, while the last
ones cannot obtain such improvement due to the previously
introduced nonoverlapping constraints.

Figure 4: Comparison of the area and wirelength metrics among
the area-driven and wirelength-driven approaches (normalized
w.r.t. area-driven approach). Bar groups: Link, Blank area.

Therefore, we observe that our approach can yield
significant improvements in the cost of the communication
infrastructure by trading off a reasonable amount of
blank area. Using our framework, the designer can choose
between (a) considering more RRs as externally connected,
thereby decreasing external wirelength, or (b) considering
that beyond a certain point the blank area outweighs the benefits
provided by the wirelength-driven approach. Hence, the
designer may decide to floorplace in a wirelength-driven
way only the most relevant RRs, leaving the others to
be floorplaced by an area-minimizing approach. Finally, we
observe that the blank area problem is not an issue for
task graphs requiring most of the resources of the target
FPGA (because the feasibility of the floorplacement requires
as much area as possible to be used, hence blank area is
reduced as a consequence) and for task graphs having few
RFUs connected to IOBs (or many of them connected to the
same IOBs).
7. Conclusions and Future Work
In this paper, we presented a resource- and configuration-aware
floorplacement framework, tailored for Xilinx Virtex 4
and 5 FPGAs, using an objective function based on external
wirelength. The proposed approach has achieved a shorter
external wirelength and a highly increased probability of
reuse of existing communication links. The reduction in
wirelength ranges from 30% to 90% in comparison to a
purely area-minimizing approach. Task graphs with few
externally connected RRs lead to the biggest decrease, while
external wirelength in task graphs with many externally
connected RRs shows lower improvement. Future work
may consider a hybrid between the area-driven and the
wirelength-driven solutions. Furthermore we aim at directly addressing
the blank area problem by modifying the objective function
or by floorplacing all the RRs not connected to the IOBs
around the RRs connected to IOBs, trying to keep the RRs
as close as possible in order to further reduce the blank area.

Figure 5: Example of wirelength reduction (normalized) and
blank area left (percentage of the final floorplacement) for the
floorplacement of only externally connected RRs plotted with
respect to the percentage of RRs that are considered as externally
connected by the algorithm. Horizontal axis: RRs considered as externally connected (%). Series: blank area (% of whole floorplacement), external wirelength (normalized).

References
[1] ISE 9.2i Manual, Xilinx Incorporation, 2007.
[2] V. Betz and J. Rose, VPR: A New Packing, Placement and
Routing Tool for FPGA Research, Springer, London, UK, 1997.
[3] J. A. Roy, S. N. Adya, D. A. Papa, and I. L. Markov, “Min-cut
floorplacement,” IEEE Transactions on Computer-Aided Design
of Integrated Circuits and Systems, vol. 25, no. 7, pp. 1313–
1326, 2006.
[4] S. N. Adya, S. Chaturvedi, J. A. Roy, D. A. Papa, and
I. L. Markov, “Unification of partitioning, placement and
floorplanning,” in Proceedings of the IEEE/ACM International
Conference on Computer-Aided Design, Digest of Technical
Papers (ICCAD ’04), pp. 550–557, November 2004.
[5] A. Montone, M. D. Santambrogio, and D. Sciuto, “Wirelength
driven floorplacement for FPGA-based partial reconfigurable
systems,” in Proceedings of the IEEE International Symposium
on Parallel and Distributed Processing, Workshops and Phd
Forum (IPDPSW ’10), pp. 1–8, 2010.
[6] Y. Feng and D. P. Mehta, “Heterogeneous ﬂoorplanning
for FPGAs,” in Proceedings of the IEEE 19th International
Conference on VLSI Design held jointly with 5th International
Conference on Embedded Systems Design, vol. 2006, pp. 257–
262, 2006.
[7] A. Montone, F. Redaelli, M. D. Santambrogio, and S. O.
Memik, “A reconfiguration-aware floorplacer for FPGAs,” in
Proceedings of the International Conference on Reconfigurable
Computing and FPGAs (ReConFig ’08), pp. 109–114, December
2008.
[8] S. N. Adya and I. L. Markov, “Fixed-outline floorplanning:
enabling hierarchical design,” IEEE Transactions on Very Large
Scale Integration (VLSI) Systems, vol. 11, no. 6, pp. 1120–1135,
2003.
[9] K. Bazargan, R. Kastner, and M. Sarrafzadeh, “3-D floorplanning:
simulated annealing and greedy placement methods
for reconfigurable computing systems,” in Proceedings of the
10th IEEE International Workshop on Rapid System Prototyping
(RSP ’99), pp. 38–43, June 1999.
[10] M. Vasilko, “Dynasty: a temporal ﬂoorplanning based cad
framework for dynamically reconﬁgurable logic systems,”
in Proceedings of the 9th International Workshop on Field-
Programmable Logic and Applications (FPL ’99), pp. 124–133,
Springer, London, UK, 1999.
[11] P. H. Yuh, C. L. Yang, and Y. W. Chang, “Temporal ﬂoor-
planning using the T-tree formulation,” in Proceedings of
the IEEE/ACM International Conference on Computer-Aided
Design, Digest of Technical Papers (ICCAD ’04), pp. 300–305,
November 2004.
[12] P. H. Yuh, C. L. Yang, and Y. W. Chang, “Temporal ﬂoor-
planning using the three-dimensional transitive closure sub-
Graph,” ACM Transactions on Design Automation of Electronic
Systems, vol. 12, no. 4, article 37, 2007.
[13] L. Singhal and E. Bozorgzadeh, “Multi-layer ﬂoorplanning
on a sequence of reconﬁgurable designs,” in Proceedings of
the International Conference on Field Programmable Logic and
Applications (FPL ’06), pp. 1–8, 2006.
[14] H. Murata, K. Fujiyoshi, S. Nakatake, and Y. Kajitani,
“VLSI module placement based on rectangle-packing by the
sequence-pair,” IEEE Transactions on Computer-Aided Design
of Integrated Circuits and Systems, vol. 15, no. 12, pp. 1518–
1524, 1996.
[15] Virtex 5—Family Overview, Xilinx Incorporation, 2007.
[16] Xilinx Inc., “Virtex-5 user guide,” Tech. Rep. ug190,
Xilinx Inc., 2007, http://www.xilinx.com/bvdocs/userguides/
ug190.pdf.
[17] Partial Reconﬁguration User Guide, Xilinx Incorporation,
2010.
[18] Xilinx Application Note 290, Xilinx Incorporation, 2007.
[19] F. Redaelli, M. D. Santambrogio, and D. Sciuto, “Task scheduling
with configuration prefetching and anti-fragmentation
techniques on dynamically reconfigurable systems,” in Proceedings
of the Design, Automation and Test in Europe
(DATE ’08), pp. 519–522, March 2008.
[20] R. Cordone, F. Redaelli, M. A. Redaelli, M. D. Santambrogio,
and D. Sciuto, “Partitioning and scheduling of task graphs
on partially dynamically reconfigurable FPGAs,” IEEE Transactions
on Computer-Aided Design of Integrated Circuits and
Systems, vol. 28, no. 5, pp. 662–675, 2009.
[21] A. Montone, F. Redaelli, M. D. Santambrogio, and S. O.
Memik, “A reconﬁguration-aware ﬂoorplacer for FPGAs,” in
Proceedings of the International Conference on Reconﬁgurable
Computing and FPGAs (ReConFig ’08), pp. 109–114, 2008.