Tutorial on FPGA Routing by Gomez Prado, Daniel Francisco
Tutorial on FPGA Routing
Daniel Francisco Gómez Prado
Department of Electrical and Computer Engineering, University of Massachusetts, Amherst, USA
I. INTRODUCTION
The entire CAD process that is necessary to implement a circuit in an FPGA (from the RTL description of the design) consists of the following steps:
• Logic optimization. Performs two-level or multi-level minimization of the Boolean equations to optimize area, delay, or a combination of both.
• Technology mapping. Transforms the Boolean equations into a circuit of FPGA logic blocks. This step also optimizes the total number of logic blocks required (area optimization) or the number of logic blocks in time-critical paths (delay optimization).
• Placement. Selects the specific location for each logic block in the FPGA, while trying to minimize the total length of interconnect required.
• Routing. Connects the available FPGA's routing resources¹ with the logic blocks distributed inside the FPGA by the placement tool, carrying signals from where they are generated to where they are used.
To better understand this dependency between routing and the target architecture, an overview of one of the most important commercially available FPGAs is shown below.
• Xilinx
The general architecture of Xilinx FPGAs consists of a two-dimensional array of programmable blocks, called Configurable Logic Blocks (CLBs) [24], with horizontal and vertical routing channels between the rows and columns of CLBs. The routing resources available on this architecture are:
A. Connection boxes
The C boxes connect the channel wires with the input and output pins of the CLBs. A C box has two major properties that can affect the routability of a design: its flexibility, Fc, which is the number of wires that each logic block pin can connect to; and its topology, which is the pattern of switches² that make the connection (especially if Fc is low). For example, in figure 1, for a C box with Fc = 2, topology 1 cannot connect pin A with pin B, whereas topology 2 can.
Fig. 1. Connection box topology
Routing is an important step of the process, as most of the FPGA's area is devoted to the interconnect [21], and the interconnection delays are greater than the logic delays of the designed circuit. Therefore an efficient routing algorithm tries to reduce the total wiring area and the lengths of critical-path nets to improve the performance of the circuit; for this, the router needs the interconnect information of the target FPGA architecture. This means that the problem of routing is architecture dependent, and therefore the routers needed to route FPGAs are as varied as the FPGA architectures on the market.
¹ The clock net is not considered here, as it is usually routed via a dedicated routing network in commercial FPGAs.
ELECTRÓNICA - UNMSM
² The switches can be pass transistors or multiplexers.





C = Connection Block
S = Switch Block
B. Switch boxes
The S boxes allow wires to switch between vertical and horizontal wires. The flexibility of an S box, Fs, defines, for a wiring segment entering the S block, the number of other wiring segments it can be connected to. The topology of the S blocks is very important, since it is possible to choose two different topologies with the same flexibility Fs that result in very different routabilities. For example, figure 2 shows that topology 1 cannot connect wire A with wire B, whereas topology 2 can.
Fig. 2. Switch box topology
Switch boxes that connect only tracks in the same domain, e.g. 0-0, 1-1, are called planar or subset switch boxes. Switch boxes that allow connections to other domains, e.g. 0-3, 1-2, are called Wilton switch boxes, and they are broadly used as they provide greater flexibility in routing.
C. Single-length lines
They are intended for relatively short connections among CLBs, and each one spans only one CLB. See figure 3.b.
D. Double-length lines
They are similar to the single-length lines, except that each one spans two CLBs, offering lower routing delays for moderately long connections.
Fig. 3. Island-style architecture (panel c: double-length wires)
E. Long lines
They are appropriate for connections that require reaching several CLBs with low skew. See figure 3.c.
Increasing the flexibility of the switch box, the flexibility of the connection box and the number of wires per channel makes routing a trivial problem [17], as all possible interconnections become available. But increasing routing resources has the drawback of wasting area and transistors in the FPGA, as only a fraction of those resources will be used for a given design; even worse, it increases the number of interconnect transistors, which are the principal cause of delay in FPGAs.
As FPGAs have prefabricated routing resources, the
router must work within the framework of the
architecture's resources, deciding exactly which routing
resources will be used to carry the signals between
ELECTRÓNICA - UNMSM, N° 17, August 2006
III. GENERAL BACKGROUND FOR ROUTING
A. Global routing
The global router performs a coarse route to determine, for each connection, the minimum-distance path through the routing channels that it has to go through. If the net to be routed has more than two terminals, the
Fig. 4. The FPGA Model
Routing is an NP-complete problem⁴ [23] that is generally separated into two phases using the divide-and-conquer paradigm [8]: a global routing that balances the densities of all routing channels, and a detailed routing that assigns specific wiring segments to each connection [17][18][25]. These two phases avoid congestion and optimize the performance of the circuit, making sure all nets are routed while minimizing wirelength and capacitance on the path. By running both algorithms a complete routing solution can be created.
There are also a number of routing algorithms that solve the problem using mixed routing, performing global and detailed routing at the same time, based on the idea that a tighter integration of the two phases can prevent inaccurate estimation and yield a better routing result. The drawback of this approach is that, as circuit size grows, mixed routing becomes more complex and less scalable [13].
• The wire segments span only one logic block before terminating. This means that every interconnection has to pass through as many C boxes and S boxes as there are logic blocks between the two connecting points.
Commercial FPGAs have double-length and long wires to speed up this kind of connection and to avoid congesting the C and S boxes.
• Each logic block has 4 input pins and 1 output pin, and all logic blocks are alike.
Commercial FPGAs have logic blocks with different numbers of inputs, ranging from 3 to 7, and they provide two or more outputs.
• The C box is implemented with pass transistors rather than multiplexers for input connections. This allows two or more tracks to be electrically connected via the input pin by turning on individual switches in the C box. This is called an input pin dogleg.
Commercial FPGAs implement the C box via multiplexers to save area, so only one track may be connected to the input pin and no input pin doglegs are possible. See figure 4.
II. THE FPGA MODEL
Academic research has adopted as its FPGA architecture a simplified version of the island-style model from Xilinx. The main reason is that the FPGA market is divided mainly among three companies: Xilinx, with the highest share, has roughly half of the total market³; Altera has roughly one-third; and Actel has one-sixth. Of these three companies, Actel and Altera have respectively solved their routing problems by adapting channel-style ASIC routing algorithms [1] or by over-assigning routing resources [2]; so the active research area left to academia is the island-style architecture of Xilinx FPGAs. Nevertheless, this is an important architecture, as it is responsible for half of the entire FPGA production [3] on the market.
In academia the most common simplifications made to the island-style model are:
logic blocks, and making sure that no more connections are made through a region than there are resources to support them. Thus the router must consider the congestion of signals in a channel and, through multiple iterations, rip up and reroute the congested areas and wires. This search for possible connections to route the placed logic blocks is not guaranteed to succeed, and it is possible that after a given number of iterations, 40 for example, the circuit still cannot be routed and the placement has to be redone. Therefore, together with the routing algorithm, a routability detection algorithm is clearly desirable to avoid long routing iterations on designs that will eventually be determined to be unroutable.
³ FPGA market share research by www.rhk.com/rhk/research and www.icinsight.com
⁴ There is no known polynomial-time algorithm that can solve the problem.
different results. For example, if the order in which the two nets are routed in figure 5 is reversed, a better solution is found.
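The net-ordering effect of maze routing can be reproduced with a small wavefront-expansion sketch. The 7 x 3 grid, the two nets and the function names below are illustrative assumptions, not the example of figure 5; cells used by an earlier net are simply treated as obstacles for later nets.

```python
from collections import deque

def maze_route(width, height, blocked, src, dst):
    """Lee-style wavefront expansion (BFS) on a grid; returns a cell path or None."""
    prev = {src: None}
    frontier = deque([src])
    while frontier:
        cell = frontier.popleft()
        if cell == dst:
            path = []
            while cell is not None:          # walk back to the source
                path.append(cell)
                cell = prev[cell]
            return path[::-1]
        x, y = cell
        for nxt in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            inside = 0 <= nxt[0] < width and 0 <= nxt[1] < height
            if inside and nxt not in blocked and nxt not in prev:
                prev[nxt] = cell
                frontier.append(nxt)
    return None

def route_in_order(width, height, nets):
    """Route nets one after another; earlier routes block later ones."""
    used, result = set(), {}
    for name, (src, dst) in nets:
        path = maze_route(width, height, used, src, dst)
        result[name] = path
        if path is not None:
            used.update(path)
    return result

net_a = ("A", ((1, 1), (5, 1)))   # horizontal net in the middle row
net_b = ("B", ((3, 0), (3, 2)))   # vertical net crossing it
print(route_in_order(7, 3, [net_a, net_b]))   # both nets are routed
print(route_in_order(7, 3, [net_b, net_a]))   # net A cannot be routed
```

Routing A first leaves room for B to detour around the end of A, while routing B first cuts the three-row grid in two and leaves A unroutable.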
global router will break the net into a set of two-terminal connections and route each connection independently.
The global router considers, for each connection, multiple ways of routing it and chooses the one that passes through the least congested routing channels. By keeping track of the usage of each routing channel, congestion is avoided, and the principal objective of the global router, balancing the usage of the routing channels, is achieved.
Once all connections have been coarsely routed, the solution is optimized by ripping up and re-routing each connection a small number of times. After that, the final solution is passed to the detailed router.
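The channel-balancing idea can be sketched as follows. This is a simplified illustration under stated assumptions: the coarse regions are grid cells, only two L-shaped candidates are considered per connection, a multi-terminal net is split into source-to-sink connections, and the rip-up refinement pass is omitted. None of these choices are prescribed by the text.

```python
from collections import Counter

def l_routes(src, dst):
    """Two candidate coarse routes: horizontal-first and vertical-first."""
    (x0, y0), (x1, y1) = src, dst

    def span(a, b):
        step = 1 if b >= a else -1
        return list(range(a, b + step, step))

    h_first = [(x, y0) for x in span(x0, x1)] + [(x1, y) for y in span(y0, y1)[1:]]
    v_first = [(x0, y) for y in span(y0, y1)] + [(x, y1) for x in span(x0, x1)[1:]]
    return [h_first, v_first]

def split_net(terminals):
    """Break a multi-terminal net into two-terminal connections (source to each sink)."""
    source, *sinks = terminals
    return [(source, sink) for sink in sinks]

def global_route(nets):
    usage = Counter()      # number of connections crossing each coarse region
    routes = []
    for terminals in nets:
        for src, dst in split_net(terminals):
            # pick the candidate whose most congested region is least used,
            # breaking ties by the total usage along the route
            best = min(l_routes(src, dst),
                       key=lambda r: (max(usage[c] for c in r),
                                      sum(usage[c] for c in r)))
            for cell in best:
                usage[cell] += 1
            routes.append(best)
    return routes, usage

routes, usage = global_route([[(0, 0), (2, 2)], [(0, 0), (2, 2)]])
print(routes[0])
print(routes[1])
```

The two identical connections end up on different L-shaped routes, so only the shared end regions are used twice.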
B. Detail routing
The detail router determines, for each two-point connection, the specific wiring segments to use in the routing channel assigned by the global router. To do this, detail routing algorithms construct a directed graph from the routing resources to represent the available connections between wires, C blocks, S blocks and logic blocks within the FPGA.
The search performed on this directed graph is usually based on Dijkstra's algorithm to find the shortest path between two nodes. The paths are labeled according to a cost function that takes into account the usage of each wire segment and the distance between the interconnecting points. The distance is estimated by calculating the wire length in the bounding box of the interconnecting points using the Manhattan metric. Most routers relax the bounding box constraint and allow searching for possible solutions in the routing channels surrounding the bounding box. This is done to avoid subsequent iterations of ripping up and re-routing if the solution lies just outside the bounding box.
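A minimal sketch of such a search is given below. It assumes that routing resources are nodes placed at integer grid coordinates, that entering a node costs 1 plus its current usage, and that the bounding box is relaxed by a `margin` parameter; these modelling choices and all names are illustrative, not taken from a specific router.

```python
import heapq

def grid_edges(w, h):
    """Build a simple routing-resource graph: each cell connects to its 4 neighbours."""
    edges = {}
    for x in range(w):
        for y in range(h):
            edges[(x, y)] = [(x + dx, y + dy)
                             for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1))
                             if 0 <= x + dx < w and 0 <= y + dy < h]
    return edges

def in_box(p, a, b, margin):
    """True if p lies in the bounding box of a and b, grown by margin on every side."""
    lo_x, hi_x = min(a[0], b[0]) - margin, max(a[0], b[0]) + margin
    lo_y, hi_y = min(a[1], b[1]) - margin, max(a[1], b[1]) + margin
    return lo_x <= p[0] <= hi_x and lo_y <= p[1] <= hi_y

def dijkstra(edges, usage, src, dst, margin=1):
    """Cheapest path from src to dst; entering node n costs 1 + usage[n]."""
    dist, prev, heap = {src: 0}, {}, [(0, src)]
    while heap:
        d, node = heapq.heappop(heap)
        if node == dst:
            path = [node]
            while node in prev:
                node = prev[node]
                path.append(node)
            return d, path[::-1]
        if d > dist[node]:
            continue                      # stale heap entry
        for nxt in edges.get(node, ()):
            if not in_box(nxt, src, dst, margin):
                continue                  # outside the (relaxed) bounding box
            nd = d + 1 + usage.get(nxt, 0)
            if nd < dist.get(nxt, float("inf")):
                dist[nxt], prev[nxt] = nd, node
                heapq.heappush(heap, (nd, nxt))
    return None

edges = grid_edges(4, 3)
usage = {(1, 1): 5, (2, 1): 5}            # two congested segments on the direct row
print(dijkstra(edges, usage, (0, 1), (3, 1), margin=0)[0])
print(dijkstra(edges, usage, (0, 1), (3, 1), margin=1)[0])
```

With a strict bounding box the route is forced through the congested segments; relaxing the box by one channel lets the search find a much cheaper detour in the same pass.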
The most common detail routing algorithms are:
• Maze Router
The maze routing algorithm is based on a wavefront expansion technique that attempts to find the shortest path between two points while avoiding any used routing resources [4]. This algorithm is an iterative process that rips up and re-routes some of the routes to eliminate congested routing channels.
The principal drawback of maze routing is that it routes each net without taking into account that the path found can block the routing of subsequent nets. This means that the performance of the algorithm depends on the net ordering, and different orderings will yield
⁵ By breaking the multiple-output net into a set of two-net connections, the global router is (most likely) allowing dogleg pins.
Fig. 5. The maze router wavefront⁶
• A* Search Routing
Maze routing is a special case of A* routing. A* routing allows the path search to be tuned from a breadth-first search (BFS) towards a shorter depth-first search (DFS). BFS is an exhaustive search that considers all possible paths and will find the best path if one exists, but it has the drawback that it can be slow; DFS, in contrast, may not find the minimum-cost path but can be fast. See figure 6.
By weighting with a scaling factor α between 0 and 1, A* routing tunes the search from BFS to DFS. The cost function used to evaluate the directed graph for each node i is [15]:
$f_i = (1 - \alpha) \times (f_{i-1} + c_i) + \alpha \times d_i$
where $c_i$ is the node cost, which indicates the current usage of the node and is used to penalize nodes occupied by previous routes; $f_{i-1}$ is the total cost of the previous path; and $d_i$ is the estimated cost of the path from node i to the destination.
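The influence of α can be seen in a small grid search that labels nodes with the cost function above. This is a sketch under assumptions: the node cost is taken as 1 plus a `usage` value, the estimate d_i is the Manhattan distance, each node is expanded once, and the function name is hypothetical.

```python
import heapq

def directed_search(width, height, usage, src, dst, alpha):
    """Search labelled with f_i = (1 - alpha) * (f_prev + c_i) + alpha * d_i."""
    def d(cell):                       # Manhattan estimate to the destination
        return abs(cell[0] - dst[0]) + abs(cell[1] - dst[1])

    best = {src: 0.0}
    prev = {src: None}
    closed = set()                     # nodes already expanded
    heap = [(0.0, src)]
    while heap:
        f, cell = heapq.heappop(heap)
        if cell in closed:
            continue
        closed.add(cell)
        if cell == dst:
            path = []
            while cell is not None:
                path.append(cell)
                cell = prev[cell]
            return path[::-1], len(closed)
        x, y = cell
        for nxt in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            if not (0 <= nxt[0] < width and 0 <= nxt[1] < height) or nxt in closed:
                continue
            c = 1 + usage.get(nxt, 0)  # assumed node cost: base 1 plus usage
            f_next = (1 - alpha) * (f + c) + alpha * d(nxt)
            if f_next < best.get(nxt, float("inf")):
                best[nxt] = f_next
                prev[nxt] = cell
                heapq.heappush(heap, (f_next, nxt))
    return None, len(closed)

path_bfs, expanded_bfs = directed_search(7, 7, {}, (0, 3), (6, 3), alpha=0.0)
path_dfs, expanded_dfs = directed_search(7, 7, {}, (0, 3), (6, 3), alpha=1.0)
print(len(path_bfs), expanded_bfs)
print(len(path_dfs), expanded_dfs)
```

On an empty grid both settings return a path of the same length, but α = 1 expands only the nodes along the straight line, whereas α = 0 expands a whole wavefront before reaching the destination.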
Fig. 6. BFS and DFS algorithms
In FPGA architectures with planar or subset switch boxes, wires can only change domain at the output of a logic block or at an input dogleg pin; this means that the route from output to input is confined to the same track domain. Therefore in a DFS search, it is important to attempt routing first in domains that have a high probability of completion, so that subsequent DFS
⁶ Source: http://foghorn.cadlab.lafayette.edu/cadapplets/MazeRouter.html
searches in different domains will not be needed. This is the concept of domain negotiation.
Domain negotiation consists of ranking the domains, before routing, based on the usage of their wires adjacent to the output pins. The cost function is then modified to [15]:
$f_i = (1 - \alpha) \times (f_{i-1} + c_i) + \alpha \times d_i + r_d$
Domains with lower congestion will have a lower rank, $r_d$, thus promoting routing in less congested domains first.
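A sketch of the ranking step follows. How the rank is scaled is not stated in the text, so using the integer position in the sorted order (0 for the least congested domain) is an assumption, as are the function names.

```python
def rank_domains(output_wire_usage):
    """output_wire_usage maps a track domain to the number of used wires
    adjacent to the output pin. Returns domain -> rank r_d (0 = least congested)."""
    order = sorted(output_wire_usage, key=lambda dom: (output_wire_usage[dom], dom))
    return {domain: rank for rank, domain in enumerate(order)}

def negotiated_cost(f_prev, c_i, d_i, r_d, alpha):
    """Cost of a node with the domain rank added, as in the modified function above."""
    return (1 - alpha) * (f_prev + c_i) + alpha * d_i + r_d

ranks = rank_domains({0: 3, 1: 0, 2: 1, 3: 2})
print(ranks)
print(negotiated_cost(2.0, 1.0, 4.0, ranks[2], alpha=0.5))
```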
• The Pathfinder
The Pathfinder algorithm is based on the maze router, but speeds up the algorithm by routing every connection in an obstacle-free environment and allowing routing resources to be overused.
After a single iteration of the algorithm, all nets have been routed once as if each were the only connection to be routed, and the cost of using every resource is calculated according to its demand. The cost function implemented by Pathfinder is [10]:
$f_n = (1 + h_n \cdot h_{fac}) \times (1 + p_n \cdot p_{fac}) + b_{n,n+1}$
where $b_{n,n+1}$ is the penalty for bending the wire, $p_n$ is the cost of using a specific wire, $h_n$ is the history that keeps track of the usage of the wire during previous iterations, and $h_{fac}$ and $p_{fac}$ are the respective weighting factors.
Subsequent iterations rip up and re-route all nets, and the process goes on until no overuse of routing resources exists. This process of ripping up and re-routing every net allows the Pathfinder algorithm to minimize the net ordering problem of maze routing.
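The negotiation loop can be sketched on a toy graph. The sketch makes several assumptions that are not in the text: the bend penalty is omitted, the present term p_n is taken as the overuse that would result from adding one more net, the history grows by the overuse after each iteration, p_fac doubles every iteration, and every net is assumed to have at least one path. All names and parameter values are illustrative.

```python
import heapq

def shortest_path(edges, cost, src, dst):
    """Dijkstra where entering node n costs cost(n)."""
    dist, prev, heap = {src: 0.0}, {}, [(0.0, src)]
    while heap:
        d, n = heapq.heappop(heap)
        if n == dst:
            path = [n]
            while n in prev:
                n = prev[n]
                path.append(n)
            return path[::-1]
        if d > dist[n]:
            continue
        for m in edges.get(n, ()):
            nd = d + cost(m)
            if nd < dist.get(m, float("inf")):
                dist[m], prev[m] = nd, n
                heapq.heappush(heap, (nd, m))
    return None

def pathfinder(edges, nets, capacity=1, h_fac=0.5, p_fac=0.5, max_iters=30):
    occ, hist, routes = {}, {}, {}       # occupancy, history, current routes
    for iteration in range(1, max_iters + 1):
        for name, (src, dst) in nets.items():
            for n in routes.pop(name, []):           # rip up this net
                occ[n] -= 1

            def cost(n):
                p = max(0, occ.get(n, 0) + 1 - capacity)   # present overuse
                return (1 + hist.get(n, 0) * h_fac) * (1 + p * p_fac)

            path = shortest_path(edges, cost, src, dst)    # overuse is allowed
            routes[name] = path
            for n in path:
                occ[n] = occ.get(n, 0) + 1
        over = {n: o - capacity for n, o in occ.items() if o > capacity}
        if not over:
            return routes, iteration                 # legal routing found
        for n, amount in over.items():
            hist[n] = hist.get(n, 0) + amount        # remember past congestion
        p_fac *= 2
    return None, max_iters

# Two nets compete for the shared node "m"; each also has a longer private detour.
edges = {"s1": ["m", "a"], "a": ["a2"], "a2": ["t1"],
         "s2": ["m", "b"], "b": ["b2"], "b2": ["t2"],
         "m": ["t1", "t2"]}
nets = {"n1": ("s1", "t1"), "n2": ("s2", "t2")}
print(pathfinder(edges, nets))
```

In the first iteration both nets take the shared node; in the second, the accumulated history and the larger present penalty push one of them onto its detour, and the routing becomes legal.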
IV. THE STATE OF THE ART IN ROUTING
The routers described in this section represent the trend in FPGA routing research. Even though these are academic tools and they do not actually route any real FPGA, they are important because, by modifying the model architecture used, the core algorithms implemented in these tools can be effectively used to route commercial FPGAs.
A. VPR: Versatile Place and Route
The VPR router is one of the most versatile routers available in academia, as it allows the target architecture to be described. It can be used to route island-style FPGAs as well as row-based FPGAs [19]. In this
• VPR's routability-driven router
dynamically inside each iteration of the algorithm, when routing every net; and the other one is computed once at the beginning of each iteration.
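The present and historical congestion terms of the routability-driven cost model can be written as three small functions. The sketch follows the formulas as printed in this section; the function names and the numeric values in the example (capacity, occupancy, h_fac = 0.25) are assumptions chosen for illustration.

```python
def present_penalty(occupancy, capacity, p_fac):
    """p_n: recomputed while nets are being routed inside an iteration."""
    return 1 + max(0, (1 + occupancy - capacity) * p_fac)

def updated_history(h_prev, occupancy, capacity, h_fac):
    """h_n: starts at 1 and is updated once per iteration."""
    return h_prev + max(0, (1 + occupancy - capacity) * h_fac)

def node_cost(base, h, p, bend=0.0):
    """Cost_n = b_n * h_n * p_n + bend penalty."""
    return base * h * p + bend

# A channel with capacity 2 that is currently used by 3 nets.
p = present_penalty(occupancy=3, capacity=2, p_fac=0.5)
h = updated_history(h_prev=1.0, occupancy=3, capacity=2, h_fac=0.25)
print(p, h, node_cost(1.0, h, p))
print(node_cost(0.0, h, p))   # a sink has base cost 0, so it is free to enter
```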
Fig. 7. VPR wavefront expansion
• VPR's timing-driven router
The objective of this algorithm is to increase the speed of the hardware circuit. To do this, it adds an Elmore delay model to the cost function, so the routing gives preference to solutions with minimum delay. To set an upper bound on delay, this algorithm starts by routing the nets with the most distant connections first. The imposed routing order produces suboptimal track counts and faster results.
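For reference, the Elmore delay of a simple RC ladder can be computed as below. This is the textbook formula for a chain of wire segments, given here as a sketch; how VPR builds its RC model from the routing resources is not described in this section, and the unit resistances and capacitances in the example are arbitrary.

```python
def elmore_delay_chain(resistances, capacitances):
    """Elmore delay at the far end of an RC ladder.

    Segment i has series resistance R_i followed by capacitance C_i to ground;
    the delay is the sum over i of R_i times all capacitance downstream of R_i."""
    assert len(resistances) == len(capacitances)
    delay = 0.0
    downstream = sum(capacitances)
    for r, c in zip(resistances, capacitances):
        delay += r * downstream
        downstream -= c            # C_i is no longer downstream of R_(i+1)
    return delay

print(elmore_delay_chain([1.0, 1.0], [1.0, 1.0]))
print(elmore_delay_chain([1.0, 1.0, 1.0], [1.0, 1.0, 1.0]))
```

The delay grows faster than linearly with the number of identical segments, which is why a delay-aware cost function favors shorter and less segmented connections on critical paths.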
Another modification common to both approaches is that the maze wavefront expansion is modified for multiple-output nets. As mentioned before, the global router breaks an n-terminal net into n-1 two-terminal nets, and it performs n-1 iterations of the wavefront expansion to connect the net. The normal maze router empties the wavefront for each iteration, whereas the VPR router does not empty the current wavefront: it adds all the routing resource segments required to connect the reached terminal to the wavefront with a cost of 0, and it continues expanding normally. The next terminal is therefore reached much more quickly than if the entire wavefront expansion had been restarted from scratch. Figure 7 shows a) a wavefront expansion; b) a normal maze router, which empties the wavefront when it reaches a terminal of the net and restarts from the beginning; and c) the VPR router, which adds the last connection found to the wavefront with a cost of 0 and continues expanding.
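The wavefront reuse can be sketched on a uniform grid. The sketch assumes unit cost per cell, no congestion and a hypothetical function name; it only illustrates how the path to a reached sink re-enters the same priority queue at cost 0 instead of the queue being emptied.

```python
import heapq

def route_multi_terminal(width, height, source, sinks):
    """Grow a routing tree for one net; after each sink is reached, its path
    re-enters the wavefront at cost 0 and the expansion simply continues."""
    dist = {source: 0}
    prev = {source: None}
    heap = [(0, source)]
    remaining = set(sinks)
    tree = {source}
    while heap and remaining:
        d, cell = heapq.heappop(heap)
        if d > dist[cell]:
            continue                               # stale wavefront entry
        if cell in remaining:
            remaining.discard(cell)
            node = cell
            while node is not None and node not in tree:
                tree.add(node)                     # new segment joins the tree
                dist[node] = 0                     # and becomes a zero-cost source
                heapq.heappush(heap, (0, node))
                node = prev[node]
        x, y = cell
        base = dist[cell]
        for nxt in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            if 0 <= nxt[0] < width and 0 <= nxt[1] < height:
                nd = base + 1
                if nd < dist.get(nxt, float("inf")):
                    dist[nxt] = nd
                    prev[nxt] = cell
                    heapq.heappush(heap, (nd, nxt))
    return tree

tree = route_multi_terminal(9, 9, (0, 0), [(5, 0), (5, 3)])
print(sorted(tree))
```

The second sink is attached to the nearest point of the existing tree with three new cells, rather than being routed again all the way from the source.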
Other than the modifications stated above, the VPR algorithm behaves as a Pathfinder algorithm [19], routing each net by the shortest path it can find regardless of any overuse of routing resources, and ripping up and re-routing every net in the circuit and




$h_n^{i} = h_n^{i-1} + \max(0, [1 + occupancy_n - capacity_n] \cdot h_{fac})$
router, the type of switch boxes for the FPGA can be chosen to be [20] planar, Wilton or universal; different wire lengths can be defined; input dogleg pins can be allowed or disallowed; and the parameters of the cost function can be modified. The VPR router can perform a global routing or a combined global-detailed routing; the combined router is able to change the current global routing configuration when it cannot easily find a detail routing solution [6].
This router is based on a modified version of the Pathfinder algorithm, and it can run in two different flavors that target two different main objectives:
$p_n = 1 + \max(0, [1 + occupancy_n - capacity_n] \cdot p_{fac})$
$Cost_n = b_n \cdot h_n \cdot p_n + bend_{n,m}$
where $b_n$ is the base cost, usually 1 or 0.95 for most routing resources and 0 for sinks; the latter prevents the algorithm from continuing to search for possible connections once the sink can already be reached. Note that congestion at a sink is not possible, as it would mean that the design requires an input to be driven by two different sources; therefore a base cost of 0 for the sink improves the running time of the algorithm without affecting its quality. $bend_{n,m}$ penalizes bending the wire when routing, and it is only taken into account by the global router. $p_n$ is the present congestion penalty, and its value is the difference between the number of nets using a channel and the number of wires that can be placed in that channel. It is called present congestion because its value is updated within an iteration of the algorithm to avoid overusing a channel. Its cost is given by:
The primary objective of this algorithm is to route a design successfully with the minimum track count. For this, the routability-driven router incorporates a modified routing cost model, as shown below [22]:
where $h_{fac}$ is a constant value between
The present congestion and the historical
are computed similarly, but one is
with $p_{fac}$ equal to 0.5 in the first iteration and to 1.5 or 2 times its previous value in subsequent iterations. $h_n$ is the historical congestion penalty, and it keeps track of the previous cost of the resources, thus avoiding reusing a channel in subsequent iterations. It starts with a value of 1 in the first iteration and then it is:
where $adj_{T_k}(n_i)$ are the neighbors of $n_i$ that are on the track $T_k$, and $l(n_j)$ is the length of $n_j$ in terms of the track segments it occupies. The previous function can be further improved by looking ahead to the next transitions; this is done by calculating the minimum cost of going from $T_j$ to the track $T_k$ and then from $T_k$ to the track $T_l$. The final cost is given by:
B. ROAD: An Order-Impervious Optimal Detailed Router for FPGAs
The routers described so far have been based on the rip-up and reroute paradigm. The ROAD router is based instead on the bump-and-refit (B&R) paradigm.
The main idea of this paradigm is to modify the nets already routed when a new conflicting net is found. It starts by routing the nets one by one until a conflict is found. If there are other tracks that can successfully route the conflicting net, the problem is solved and the next net is routed. If all routing resources have been used and no other tracks are available, the router bumps all tracks conflicting with the resource needed, and then all those unrouted net segments are refitted. As at least one of them will not fit properly in the design, the unfitted track is routed through another channel, possibly bumping other tracks. In this way, the B&R paradigm causes net-congested areas to be depopulated.
Therefore in the B&R algorithm, when a track is bumped, the bump can propagate, producing a path with many bump searches until a vacant resource or a spare routing area is found. If all possible paths are exhausted and no solution exists, or a cycle is detected (a previous bump in the path is revisited), the search backtracks to the initial conflicting resource and another track is bumped instead.
This represents no problem for an incremental router, in which some nets are added to an existing routing in an FPGA and the goal is to route the new connections without changing the global topology of the existing nets. For a detail router, however, it is a major problem, as many more of the previously routed nets are bumped, leading to extensive and time-consuming depth-first searches.
To overcome this problem, three major enhancements are made to the search space of the B&R algorithm:
clique while routing a path φ, and the maximum number of tracks per channel available in the FPGA is t, then if bumping the net produces (k+m) > t the depth-first search is pruned, as the solution will be infeasible. The number m in the inequality above is the number of unusable tracks for the clique, that is, the m nets in the clique whose adjacent nets are ancestors in the path φ; these tracks cannot be used to route the current path, as they would cause a cyclic conflict.
o Lookahead transition cost functions: This is a cost function that measures which transition of the net $n_i$ on track $T_j$ to the track $T_k$, $n_i: T_j \to T_k$, is more likely to succeed, so that fewer searches are performed and backtracked. The two principal factors considered in the cost function are the total wirelength of the bumped nets and the flexibility of the bumped nets. Flexibility means that if there is a solution with one net of wirelength 9, and there is another solution produced by 3 smaller nets each of wirelength 3, the set of smaller nets is more likely to produce a feasible solution, as it is easier to move smaller nets than a big one. The cost function is [7]:
$\sum_{n_j \in adj_{T_k}(n_i)} l(n_j)$
o Learning-based search space pruning: This technique records the unsuccessful bumps of a net; if later, in another depth-first search, the router tries to bump the same net again, the search graphs are compared. If both search graphs are found to be isomorphic, the bumping of the net is discarded, as it will be unsuccessful, thus saving search time.
o Clique-based search space pruning: This technique dynamically determines the presence of cliques among nets, and the clique size k is used to determine the minimum number of different tracks needed to successfully route the nets in the clique. If we attempt to bump a net in the
The three enhancements do not affect the quality of the routing result, as they only prune results that are suboptimal or search spaces that do not yield any result. With these enhancements the basic B&R algorithm is sped up 604 times, which makes the algorithm fast enough to perform not only incremental but also complete detail routing.
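The clique-based pruning test can be illustrated with a small sketch. It assumes a simplified model that is not taken from the ROAD paper: nets inside one channel are closed intervals, so the largest set of mutually overlapping nets (the clique size k) equals the maximum overlap at any point. The function names are hypothetical.

```python
def max_clique_size(intervals):
    """Largest number of mutually overlapping nets in one channel,
    with each net modelled as a closed interval (lo, hi)."""
    events = []
    for lo, hi in intervals:
        events.append((lo, 0))     # 0 = start; sorts before an end at the same point
        events.append((hi, 1))     # 1 = end
    active = best = 0
    for _, kind in sorted(events):
        if kind == 0:
            active += 1
            best = max(best, active)
        else:
            active -= 1
    return best

def prune_branch(k, m, t):
    """Abandon the depth-first branch when k clique nets plus m unusable
    tracks cannot fit in a channel with t tracks, i.e. (k + m) > t."""
    return k + m > t

nets_in_channel = [(0, 3), (2, 5), (3, 7), (6, 8)]
k = max_clique_size(nets_in_channel)
print(k, prune_branch(k, m=1, t=3), prune_branch(k, m=0, t=3))
```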
The ROAD detail router based on B&R is implemented such that, if no solution exists (the circuit is unroutable), one track is added to the channel, so the router can find the minimum-track solution for the given placement and global assignment. This router is said to be independent of the order in which the nets are routed, because bumping previous routes is equivalent to reversing previous routing decisions, or changing the order in which the nets are routed.
C. Routing Approach Via Search-Based Boolean Satisfiability
This approach addresses the routing problem in a completely different way, transforming the complex interactions of the routing constraints into a Boolean function. It has two main virtues: all paths for all nets are considered simultaneously, as the routing constraints are a set of equations that need to be satisfied simultaneously; and the Boolean function is satisfiable if and only if the design is routable. The latter means that if there is no satisfying assignment for the Boolean function, it is proven that no routing solution exists for the design with the given placement and global route assignment.
The Satisfiability-Based Detailed Router (SDR) takes as input the connections assigned by a global router and produces two types of constraints [6]. The connectivity constraints ensure that a net has a continuous path between the net terminals; these constraints form a list of tracks and C boxes that can possibly form the path in a channel segment. The exclusivity constraints ensure that different nets are assigned to different tracks in a channel, so that no overlap occurs. At the intersection of horizontal and vertical channels (S boxes), the constraints of the same net are connected by the logic operation AND.
Once these constraints have been formulated, they are transformed and encoded into Boolean equations represented as Conjunctive Normal Form (CNF) clauses. The conjunction of all connectivity and exclusivity constraints for all nets forms the routing constraint Boolean function, which models the routing problem as a whole. The Boolean SAT solver takes the routability function as input and tries to find a satisfying assignment or to prove that the given layout is not satisfiable. If the layout is satisfiable, the solution is an assignment of binary values 1 or 0 to the Boolean variables that encode the track variables. This information is transformed into an assignment of actual routing resources (tracks, C boxes and corresponding S boxes) to nets, which forms the actual FPGA routing solution. If it is not satisfiable, then no legal detailed routing solution exists with the given placement and global routing topology.
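A toy version of this formulation is sketched below for a single channel segment. It is an illustration under assumptions: one Boolean variable per (net, track) pair, a connectivity clause requiring each net to take at least one track, an exclusivity clause for every pair of conflicting nets on every track, and an exhaustive search standing in for a real SAT solver. The encoding used by SDR is richer than this.

```python
from itertools import product

def build_cnf(nets, tracks, conflicts):
    """Return (variable map, clauses); a literal is +v or -v for variable v >= 1."""
    pairs = [(n, t) for n in nets for t in range(tracks)]
    var = {pair: i + 1 for i, pair in enumerate(pairs)}
    # connectivity: every net is assigned at least one track
    clauses = [[var[n, t] for t in range(tracks)] for n in nets]
    # exclusivity: conflicting nets never share a track
    for a, b in conflicts:
        clauses += [[-var[a, t], -var[b, t]] for t in range(tracks)]
    return var, clauses

def solve(num_vars, clauses):
    """Exhaustive search over all assignments (a stand-in for a SAT solver)."""
    for bits in product([False, True], repeat=num_vars):
        if all(any(bits[abs(lit) - 1] == (lit > 0) for lit in clause)
               for clause in clauses):
            return bits
    return None

def route(nets, tracks, conflicts):
    var, clauses = build_cnf(nets, tracks, conflicts)
    model = solve(len(var), clauses)
    if model is None:
        return None                    # proven unroutable with this track count
    return {n: next(t for t in range(tracks) if model[var[n, t] - 1]) for n in nets}

nets = ["A", "B", "C"]
conflicts = [("A", "B"), ("A", "C"), ("B", "C")]   # all three overlap in the channel
print(route(nets, 2, conflicts))
print(route(nets, 3, conflicts))
```

With two tracks the formula is unsatisfiable, which proves that no assignment exists; with three tracks a satisfying assignment maps directly to a track per net.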
It is important to note that the equations are written in CNF and they are not represented as a BDD⁷. A BDD satisfiability approach explicitly represents all possible satisfying assignments as paths through the BDD directed acyclic graph; any path found in this graph to 1 is said to satisfy the problem, and if such a path does not exist the problem is said to be unsatisfiable. The problem with BDDs is that, without a good variable ordering, the BDD graph can become too large for memory in intermediate computations, and finding a good variable ordering is an NP-complete problem.
Instead of BDDs, the SAT instances created from the routing constraint Boolean function are solved using GRASP [5][6][16], a generic search algorithm for the satisfiability problem. GRASP is based on search techniques that analyze conflicts on the graph, and based on this analysis it can prune large portions of the search space. The analysis yields the causes that produce conflicts, and this information is recorded to recognize similar conflicts on the graph and assignments that are necessary for a solution to be found. This means that GRASP searches to find one satisfying assignment, and must search exhaustively to conclude that no satisfying assignments exist: a trade-off of more search time for manageable memory sizes.
V. CONTRAST OF THE APPROACHES
To thoroughly understand the differences among the routers previously described, it is necessary to compare them based on the ideal objectives of any router:
• Unroutability detection
Only the SAT approach is able to prove that the layout is unroutable for a given placement and global route assignment, though reaching this conclusion can take long, as it has to determine that the SAT problem is unsatisfiable, which means that all possible search combinations have to be explored. VPR cannot determine routability; it stops searching after 30 iterations, assuming by then that the circuit is unroutable. During these iterations the VPR global-detailed router can modify the global assignment if doing so simplifies the detail routing operation. This characteristic allows VPR to find routing solutions for layouts that the SAT solver determines to be unroutable, as the SAT solver relies heavily on the global
⁷ A Binary Decision Diagram is a graph representation for Boolean functions based on the Shannon expansion.
assignment, and VPR actually modifies the global assignment if needed. The ROAD approach is not really concerned with determining the routability of the given layout, as its ultimate goal is to route the circuit with the minimum track count; if it cannot find a solution with a given track count, it adds one track to the channel and continues routing the design.
The overall unroutability detection ranking of the routers is then SAT, VPR and ROAD.
• Running time
Among the routability-driven routers, ROAD is the fastest: with the enhancements made to the basic B&R algorithm, ROAD routes on average 2 times faster than the VPR routability-driven router. To perform this comparison, VPR is run as a combined global-detailed router and as a global-only router; the difference between these running times is assumed to be the time spent in detail routing by VPR, and the comparison is made against this value.
The VPR timing-driven router is 2 to 10 times faster than the VPR routability-driven router [20], so the VPR timing-driven router is still faster than ROAD, by a factor of 1 to 5.
To establish the running time of SAT, we compare the benchmarks provided in [6] and [7] (see the table below); only two circuits can be compared, ALU2 and VDA. A straightforward comparison of these circuits is misleading: it concludes that VPR is 3 times faster than SAT and that, for those specific circuits, ROAD is 18 times faster than SAT. Such a comparison is unfair, as the ROAD and VPR benchmarks were run on a 550 MHz Pentium III and SAT was run on a 170 MHz Ultra 5 SPARC. A fairer comparison speeds up the SAT results by a factor of 3, yielding that SAT and VPR have approximately the same running time and that ROAD is 6 times faster for those specific circuits. The latter comparison only gives an idea of the running time performance of SAT, and more benchmarks need to be compared before a final conclusion on SAT running time can be made.
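The normalization above amounts to the following arithmetic, under the text's own simplifying assumption that running time scales inversely with clock frequency (the symbols $T_{SAT}$, $T_{VPR}$ and $T_{ROAD}$ for the running times are introduced here only for illustration):

```latex
\frac{550\ \text{MHz}}{170\ \text{MHz}} \approx 3.2 \approx 3, \qquad
\frac{T_{SAT}}{T_{VPR}} \approx \frac{3}{3} = 1, \qquad
\frac{T_{SAT}}{T_{ROAD}} \approx \frac{18}{3} = 6
```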
TABLE 1. COMPARISON FROM
Ckt name    VPR      ROAD
ALU2        8.54     1.41
VDA         34.13    4.99
The overall running time classification of the routers is
then VPR timing-driven, ROAD, VPR routability-driven
and SAT.
• Minimum track count
Both ROAD and the VPR routability-driven router
achieve the same minimum number of tracks
per channel on all benchmarks. The VPR timing-
driven router requires 5% more tracks than
ROAD, and the SAT router requires about 25%
more than ROAD.
These results are heavily correlated with the
cost function implemented inside the routers.
VPR has a different cost function for
each of its routers, the routability-driven
algorithm being the one concerned with achieving
minimum track count. ROAD does not really have a
cost function oriented toward minimum track count;
it is oriented toward search-based pruning. However,
the fact that ROAD looks for the clique
interconnectivity, and that no tracks are added to the
channel unless it is mandatory, makes this
router always find the minimum track count.
SAT does not have any cost function
implemented in its algorithm and its search is
completely "flat", as it only looks for a solution
regardless of its optimality.
The overall minimum track count classification
of the routers is then VPR routability-driven,
ROAD, VPR timing-driven and SAT.
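The "flat" character of the SAT search can be illustrated with a small sketch. It assumes a simplified formulation in which each net must take exactly one track and two nets that share a channel segment must take different tracks. An exhaustive enumeration stands in for a real SAT solver. The function first_routing and the net names are hypothetical.

```python
from itertools import product
from typing import Dict, List, Optional, Tuple


def first_routing(nets: List[str],
                  conflicts: List[Tuple[str, str]],
                  tracks: int) -> Optional[Dict[str, int]]:
    """Return the first track assignment satisfying every constraint, or None.

    Each net takes exactly one track; two conflicting nets (nets sharing a
    channel segment) must not take the same track.
    """
    for choice in product(range(tracks), repeat=len(nets)):
        assignment = dict(zip(nets, choice))
        if all(assignment[a] != assignment[b] for a, b in conflicts):
            return assignment      # any solution will do; none is preferred
    return None                    # search space exhausted: provably unroutable


nets = ["n1", "n2", "n3"]
conflicts = [("n1", "n2"), ("n2", "n3"), ("n1", "n3")]
print(first_routing(nets, conflicts, tracks=2))  # None
print(first_routing(nets, conflicts, tracks=3))  # {'n1': 0, 'n2': 1, 'n3': 2}
```

No cost function guides the search, so the first satisfying assignment is returned whatever its quality. An exhausted search, on the other hand, is a proof that no routing exists for that track count.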
• Memory utilization
Even though memory utilization has not been
addressed as an objective by any of the routers
described, a rough comparison can be
made by assuming a correlation between the
memory and the running time of an algorithm.
Thus the VPR timing-driven algorithm, being the
fastest, has the lowest memory requirements;
ROAD, with its search-based pruning and a
running time faster than the VPR routability-driven
router, has the second lowest memory requirements;
the VPR routability-driven router is third; and the
non-directed search of SAT, which needs to solve
all routing constraints simultaneously, has the
highest memory requirements.
The overall memory utilization classification of
the routers is then VPR timing-driven, ROAD,
VPR routability-driven and SAT.
• Circuit speed
The only router concerned with this objective is
the VPR timing-driven algorithm, so all the
other approaches will produce slower circuits
with higher delays.
It can be seen that different approaches to the same
problem inherently target different main objectives:
VPR heuristically searches for the minimum track count
by minimizing the net ordering problem, while its
modified version looks for fast running times, allowing
slightly higher track counts; B&R finds the optimal
minimum track count by overcoming the net ordering
problem; and SAT can formally determine which circuits
are unroutable.
VI. SUMMARY
It has been shown that FPGA routing is a complex
problem. Even though it has historically been
underestimated by VLSI designers, on the assumption
that fixed routing resources should make routing
easier, the opposite has turned out to be true. Fixed
routing resources make routing in an FPGA a much
harder problem, since multiple constraints must all be
satisfied to successfully route the design.
Most approaches to FPGA routing have been based
on the divide and conquer paradigm, in which
routing is split into two phases: a global router that
spreads the track requirements throughout the FPGA, and
a detailed router that does the actual assignment of
routing resources. Of these two phases, detailed
routing is the much harder problem, as it has to consider
the architecture of the FPGA in depth. For the detailed
router, the maze algorithm has been the starting point;
different modifications and improvements have been
made to the basic algorithm with the A* search and the
Pathfinder algorithm, and this approach has finally
reached its state of the art with the VPR tool.
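As a reminder of the starting point mentioned above, the basic maze algorithm can be sketched as a breadth-first wave expansion followed by a backtrace. The sketch assumes a uniform grid with blocked cells marked "#", whereas an FPGA router works on a graph of routing resources. It contains neither the directed search of A* nor the congestion negotiation of Pathfinder. The function name maze_route is hypothetical.

```python
from collections import deque
from typing import List, Optional, Tuple

Cell = Tuple[int, int]


def maze_route(grid: List[str], source: Cell, target: Cell) -> Optional[List[Cell]]:
    """Breadth-first wave expansion from source, then backtrace from target."""
    rows, cols = len(grid), len(grid[0])
    parent = {source: None}
    frontier = deque([source])
    while frontier:
        cell = frontier.popleft()
        if cell == target:
            path = []
            while cell is not None:        # walk the parent links back
                path.append(cell)
                cell = parent[cell]
            return path[::-1]
        r, c = cell
        for nxt in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            nr, nc = nxt
            if (0 <= nr < rows and 0 <= nc < cols
                    and grid[nr][nc] != "#" and nxt not in parent):
                parent[nxt] = cell
                frontier.append(nxt)
    return None                            # target unreachable


grid = [
    "..#.",
    "..#.",
    "....",
]
path = maze_route(grid, (0, 0), (0, 3))
print(len(path) - 1)  # 7 steps around the blocked column
```

Because the wave expands uniformly in every direction, the path found is a shortest one, but many cells are visited that lie away from the target. A search ordered by an estimate of the remaining distance, as in A*, reduces that exploration.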
Of course, the maze & rip-out & reroute algorithm
used by VPR has not been the only approach to the
routing problem, and two other approaches
have been shown: the ROAD router, based on the bump &
refit paradigm, and the SAT router, based on the
satisfiability of a Boolean function equivalent to the
routing problem. The performance of these three
approaches is summarized in the next table.
TABLE 2. ROUTER COMPARISON
                          VPR             VPR                  ROAD        SAT
                          timing-driven   routability-driven
Unroutability detection   Heuristic       Heuristic            None        Formal proof
Running time              Best            Good                 Very good   Bad
Minimum track count       Good            Best                 Best        Bad
Memory requirement        Best            Good                 Very good   Bad
The above comparison shows that the ROAD approach
is a nice trade-off between the two flavors of
VPR, and that more research has to be done on the SAT
approach to make it competitive, maybe by adding a
specialized cost metric to the search to reduce the
number of tracks and speed up the running time.
The approaches presented here are not the only
ones, and there are many more that can outperform the
approaches described on a particular objective. For
example, Just In Time routing [13], intended to place the
routing algorithm in hardware so that it can reroute and
reprogram other FPGAs, achieves outstanding
running times and very low memory requirements, with
the penalty of requiring more tracks; another approach
separates the so far combined delay minimization
and wirelength optimization [11], using Steiner
graphs to obtain better circuit performance; and some
others are capable of detecting routability as early as
the first iteration of the Pathfinder router using some
heuristic [9] [14].
The different approaches to routing and the different
performances obtained do not mean that research on some
trends should be pruned because they have not outperformed
any previous router. All research in the area sheds light on
the routing problem from a different perspective, thus
helping to refine or improve existing algorithms, or
even to combine some of them.
As architectures evolve and the logic inside each
logic block grows, routing resources will become
scarcer and routing more constrained; therefore
FPGA routing will remain an active topic of
research throughout the life of FPGA technology.
REFERENCES
[1] Actel Inc., Axcelerator Family FPGA, 2004.
[2] Altera Corporation, Stratix II Device Handbook,
Volume 1, Jul 2004.
[3] Electronics Weekly; ABI/INFORM Trade &
Industry, Feb 25, 2004, pg. 12.
[4] F. Mo, A. Tabbara and R. Brayton, A Force-Directed
Maze Router, Department of EECS, University of
California at Berkeley.
[5] G. Nam, K. Sakallah and R. Rutenbar, Satisfiability-
Based Layout Revisited: Detailed Routing of
Complex FPGAs Via Search-Based Boolean SAT,
ACM/SIGDA International Symp. on FPGA, 1999,
pp. 167-175.
[6] G. Nam, K. Sakallah and R. Rutenbar, A New
FPGA Detailed Routing Approach Via Search-
Based Boolean Satisfiability, IEEE Transactions
on Computer-Aided Design of Integrated Circuits
and Systems, vol. 21, No. 6, June 2002.

