Abstract. In this paper, we propose an order-independent global routing algorithm for SRAM-type FPGAs based on Mean Field Annealing (MFA). The performance of the proposed global routing algorithm is evaluated in comparison with the LocusRoute global router on ACM/SIGDA Design Automation benchmarks. Experimental results indicate that the proposed MFA heuristic performs better than LocusRoute in terms of the distribution of the channel densities.
Introduction
This paper investigates the routing problem in Static RAM (SRAM) based Field Programmable Gate Arrays (FPGAs) [7]. As routing in FPGAs is a very complex combinatorial optimization problem, the routing process can be carried out in two phases: global routing followed by detailed routing [5]. Global routing determines the course of wires through sequences of channel segments. Detailed routing determines the wire segment allocation for the channel segment routes found in the first phase, which enables feasible switch box interconnection configurations [5, 9, 10]. Global routing in FPGAs can be done by using global routing algorithms proposed for standard cells [5]. The LocusRoute global router [4] is one such router used for global routing in FPGAs. It divides the multi-pin nets into two-pin nets and considers only minimum-distance routes with two or fewer bends for these two-pin nets. The objective in LocusRoute is to distribute the connections among channels so that channel densities are balanced. In this work, we propose a new approach to the global routing problem in FPGAs using the Mean Field Annealing (MFA) technique.
MFA merges the collective computation property of Hopfield neural networks [2] and the annealing property of simulated annealing [3] to obtain a general algorithm for solving combinatorial optimization problems [1]. MFA can be used for solving a combinatorial optimization problem by choosing a representation scheme in which the final states of the spins can be decoded as a solution to the target problem. Then, an energy function is constructed whose global minimum value corresponds to the best solution of the target problem. MFA is expected to compute the best solution to the target problem, starting from a randomly chosen initial state, by minimizing this energy function. The steps of applying the MFA technique to a problem can be summarized as follows.
1) Choose a representation scheme which encodes the configuration space of the target optimization problem using spins. In order to get good performance, the number of possible configurations in the problem domain and in the spin domain must be equal, i.e., there must be a one-to-one mapping between the configurations of the spins and those of the problem.
2) Formulate the cost function of the problem in terms of spins, i.e., derive the energy function of the system. The global minimum of the energy function should correspond to the global minimum of the cost function.
3) Derive the mean field theory equations using this energy function, i.e., derive equations for updating the averages (expected values) of spins.
4) Select the energy function and the cooling schedule parameters.
The FPGA model used in this paper is given in Section 2. The proposed formulation of the MFA algorithm for the global routing problem following these steps is presented in Section 3. The performance of the proposed MFA algorithm is evaluated in comparison with the LocusRoute algorithm. Section 4 summarizes the implementation details of these two algorithms. Finally, experimental results are presented in Section 5.
Global Routing Problem in FPGAs
A commercial FPGA consists of a two-dimensional regular array of programmable logic blocks (LB's), a programmable routing network and switch boxes (SB's) [6, 13, 14]. Logic blocks provide the functionality of a circuit. The routing network makes connections between LB's and input/output pads. The routing network of an FPGA consists of wiring segments and connection blocks. Wiring segments comprise three types of routing resources in commercial SRAM based FPGAs [13]: channel segments, long lines and direct interconnections. A horizontal (vertical) channel segment consists of a number of parallel wire segments connecting two successive SB's in a horizontal (vertical) channel. The SB's allow programmed interconnection between these channel segments. Direct interconnections provide the connections between neighbor LB's. Long lines cross the routing area of the FPGA vertically and horizontally. Connection blocks provide the connectivity from the input/output pins of LB's to the wiring segments of the respective channel segments. Each pin can be connected to a limited number of wiring segments in a channel, and this is called the flexibility of the connection block [7]. In this paper, it is assumed that each LB pin can be connected to all wiring segments in the respective channels. Therefore, we can omit the connection block in our FPGA model.
Since the direct interconnections are used by neighbor LB's to provide minimum propagation delay and the long lines are used by signals which must travel long distances (e.g., the global clock), these interconnection resources are not considered in global routing. Hence, our FPGA model for global routing considers only the LB's, the SB's and the channel segments (Fig. 1).
In this work, we divide all multi-pin nets into two-pin nets using a minimum spanning tree algorithm [12], as in LocusRoute. Hence, a net refers to a two-pin net here and hereafter. Consider the possible routings for a two-pin net with a Manhattan distance of $d_h + d_v$, where $d_h$ and $d_v$ denote the horizontal and vertical distances, respectively, between the two pins of the net on the LB grid. The routing area of this net is restricted to a $(d_h+1) \times (d_v+1)$ LB grid as shown in Fig. 2.a. Then, the shortest distance routing of this net can be decomposed into three independent routings as follows. Each pin of this net has only one neighbor SB in the optimal routing area. Hence, each pin can be connected to its unique neighbor SB either through a horizontal or a vertical channel segment (Fig. 2). Meanwhile, the optimal routing area for the connection of these two unique SB's is restricted to a $d_h \times d_v$ SB grid embedded in the LB grid (Fig. 2). Hence, by exploiting this fact, we further subdivide each net into three two-pin subnets, referred to here as LS, SS and SL subnets (Fig. 2.b). Here, LS and SL subnets represent the LB-to-SB and SB-to-LB connections, respectively, and SS subnets represent the SB-to-SB connection for a particular net. Therefore, for routing the original net, we consider only two possible routings for each of the LS and SL subnets and $d_h + d_v - 2$ possible one- or two-bend routings for the SS subnet.
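The count of one- or two-bend routings for an SS subnet can be checked by enumeration. The following is a minimal sketch, not the authors' code: it assumes the two SB's sit at opposite corners of the $d_h \times d_v$ SB grid, that $d_h, d_v \geq 2$, and it uses the hypothetical helper names `ss_routes` and `_dedupe`. Routes are described by their corner points only.

```python
def _dedupe(points):
    """Drop consecutive repeated corner points and return a hashable tuple."""
    out = []
    for p in points:
        if not out or out[-1] != p:
            out.append(p)
    return tuple(out)


def ss_routes(d_h, d_v):
    """Enumerate shortest routes with at most two bends between opposite
    corners (0, 0) and (d_h - 1, d_v - 1) of a d_h x d_v SB grid.

    Assumption for illustration: each route is given by its corner points.
    """
    end = (d_h - 1, d_v - 1)
    routes = set()
    # Horizontal-vertical-horizontal: the vertical run lies in column x.
    for x in range(d_h):
        routes.add(_dedupe([(0, 0), (x, 0), (x, end[1]), end]))
    # Vertical-horizontal-vertical: the horizontal run lies in row y.
    for y in range(d_v):
        routes.add(_dedupe([(0, 0), (0, y), (end[0], y), end]))
    # The two one-bend routes are generated by both loops, so the set
    # removes the duplicates and leaves d_h + d_v - 2 distinct routes.
    return sorted(routes)


routes_3x4 = ss_routes(3, 4)
```

Each route has `len(route) - 2` bends, so the one-bend routes are the two tuples with three corner points.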
We define an FPGA graph $F(L, S, C)$ for modeling the global routing problem in FPGAs. This graph is a $P \times Q$ two-dimensional mesh where $L$, $S$ and $C$ denote the sets of LB's, SB's and channel segments, respectively. Here, $P$ and $Q$ denote the dimensions of the mesh. The global routing problem reduces to searching for the most uniform possible distribution of the routes for these subnets. A uniform distribution of the routes is expected to increase the likelihood of finding a feasible routing in the following detailed routing phase. Hence, we need to define an objective function which rewards balanced routings. We associate weights with the edges of the FPGA graph in order to simplify the computation of the balance quality of a given routing. The weight $w^h_{pq}$ ($w^v_{pq}$) of a horizontal (vertical) edge $c^h_{pq}$ ($c^v_{pq}$) denotes the density of the respective channel segment. Here, the density of a channel segment denotes the total number of nets passing through that segment for a given routing. Using this model, we can express the balance quality $B$ of a given routing $R$ as
$$B(R) = \sum_{c^h_{pq} \in C} (w^h_{pq})^2 + \sum_{c^v_{pq} \in C} (w^v_{pq})^2 \qquad (1)$$
As is seen in Eq. (1), each channel segment contributes the square of its density to the objective function, thus penalizing imbalanced routing distributions.
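The penalizing effect of the squared densities can be illustrated with a small numeric example. This is only a sketch under the assumption that Eq. (1) is the plain sum of squared channel segment densities, as the description above suggests. The function name `balance_cost` and the toy density grids are hypothetical.

```python
def balance_cost(horizontal, vertical):
    """Sum of squared densities over all horizontal and vertical
    channel segments (assumed form of the balance quality B)."""
    cost = 0
    for grid in (horizontal, vertical):
        for row in grid:
            for density in row:
                cost += density * density
    return cost


# Two toy routings that place the same four nets on two horizontal segments.
balanced = balance_cost([[2, 2]], [[0, 0]])    # nets spread evenly
imbalanced = balance_cost([[4, 0]], [[0, 0]])  # all nets on one segment
```

Both routings use the same total amount of wiring, but the evenly spread one has the lower cost, which is why minimizing this quantity favors balanced channel densities.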
MFA Formulation
The MFA algorithm is derived by analogy to the Ising and Potts models, which are used to estimate the state of a system of particles, called spins, in thermal equilibrium. In the Ising model, spins can be in one of two states represented by 0 and 1, whereas in the Potts model they can be in one of $K$ states. All LS/SL subnets are represented by Ising spins since they have only two possible routes. In the Ising spin encoding of each LS/SL subnet $m$, $u_m = 1$ ($0$) denotes that the LB-to-SB or SB-to-LB routing is achieved through a single horizontal (vertical) channel segment. Each SS subnet $n$ having $K_n \geq 2$ possible routes is represented by a $K_n$-state Potts spin. The state of a $K_n$-state Potts spin is represented using a $K_n$-dimensional vector
$$v_n = [v_{n1}, \ldots, v_{nr}, \ldots, v_{nK_n}]^t \qquad (2)$$
where "$t$" denotes the vector transpose operation. Each Potts spin $v_n$ is allowed to be equal to one of the principal unit vectors $e_1, \ldots, e_r, \ldots, e_{K_n}$, and cannot take any other value. The principal unit vector $e_r$ is defined to be the vector which has all its components equal to 0 except its $r$'th component, which is equal to 1. Potts spin $v_n$ is said to be in state $r$ if $v_n = e_r$. Hence, a $K_n$-state Potts spin $v_n$ is composed of $K_n$ two-state variables $v_{n1}, \ldots, v_{nr}, \ldots, v_{nK_n}$, where $v_{nr} \in \{0, 1\}$, with the following constraint
$$\sum_{r=1}^{K_n} v_{nr} = 1 \qquad (3)$$
If Potts spin $n$ is in state $r$ (i.e., $v_{nr} = 1$ for some $1 \leq r \leq K_n$), we say that the corresponding net $n$ is routed by using route $r$.
In the MFA algorithm, the aim is to find the spin values minimizing the energy function of the system. In order to achieve this goal, the average (expected) values of the spins are computed and iteratively updated. In order to construct an energy function, it is helpful to associate the following meaning with the values $\langle u_m \rangle$ for LS/SL subnets:
$\langle u_m \rangle$ = P(subnet $m$ is routed by using the horizontal channel segment)
$1 - \langle u_m \rangle$ = P(subnet $m$ is routed by using the vertical channel segment)
That is, $\langle u_m \rangle$ and $1 - \langle u_m \rangle$ denote the probabilities of finding Ising spin $m$ at states 1 and 0, respectively. In other words, $\langle u_m \rangle$ and $1 - \langle u_m \rangle$ denote the probabilities of routing subnet $m$ through a single horizontal and vertical channel segment, respectively. Similarly, for SS subnets represented with Potts spins,
$$\langle v_{nr} \rangle = P(\text{subnet } n \text{ is routed through route } r) \quad \text{for } 1 \leq r \leq K_n \qquad (4)$$
That is, $\langle v_{nr} \rangle$ denotes the probability of finding Potts spin $n$ at state $r$ for $1 \leq r \leq K_n$. In other words, $\langle v_{nr} \rangle$ denotes the probability of routing net $n$ through route $r$.
After the mean field equations (Eqs. (6-7)) are derived, the MFA algorithm can be summarized as follows. First, an initial high-temperature spin average is assigned to each spin, and an initial temperature $T$ is chosen. Each $u_m$ value is initialized to $0.5 + \delta_m$ and each $v_{nr}$ value is initialized to $1/K_n + \delta_{nr}$, where $\delta_m$ and $\delta_{nr}$ denote randomly selected small disturbance values. Note that $\lim_{T \to \infty} u_m = 0.5$ and $\lim_{T \to \infty} v_{nr} = 1/K_n$. In each MFA iteration, the mean field affecting a randomly selected spin is computed using either Eq. (6) or Eq. (7). Then, the average of this spin is updated using either Eq. (8) or Eq. (9). This process is repeated for a random sequence of spins until the system is stabilized for the current temperature. The system is observed after each spin update in order to detect convergence to an equilibrium state for a given temperature. If the energy function $E_B$ does not decrease in most of the successive spin updates, the system is considered stabilized for that temperature.
Then, $T$ is decreased according to a cooling schedule, and the iterative process is re-initialized. At the end of this cooling schedule, each Ising spin $m$ is set to state 1 if $u_m \geq 0.5$, or to state 0 otherwise. Similarly, the maximum element in each Potts spin vector is set to 1 and all other elements are set to 0. Then, the resulting global routing is decoded as mentioned earlier.
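The spin update and the final decoding step can be sketched as follows. The update rules below are an assumption: they use the logistic and softmax forms commonly found in the mean field annealing literature, and the update equations of this paper may differ in detail. The mean field values passed in are hypothetical inputs, with a larger value taken to mean a more favorable state. Only the decoding rules (threshold at 0.5, largest element set to 1) come directly from the text.

```python
import math


def ising_update(phi, T):
    """Assumed logistic update of an Ising spin average for mean field phi
    at temperature T, written to avoid overflow for large |phi / T|."""
    x = phi / T
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def potts_update(phis, T):
    """Assumed softmax update of a Potts spin average vector, given one
    mean field value per state."""
    m = max(phis)  # subtract the maximum for numerical stability
    exps = [math.exp((p - m) / T) for p in phis]
    total = sum(exps)
    return [e / total for e in exps]


def decode_ising(u):
    """State 1 if the average is at least 0.5, state 0 otherwise."""
    return 1 if u >= 0.5 else 0


def decode_potts(v):
    """Set the maximum element to 1 and all other elements to 0."""
    r = max(range(len(v)), key=lambda k: v[k])
    return [1 if k == r else 0 for k in range(len(v))]


hot = potts_update([1.0, 2.0, 0.5], 1e9)    # close to uniform, 1/K each
cold = potts_update([1.0, 2.0, 0.5], 0.01)  # concentrated on one state
```

Under these assumed forms, the averages approach 0.5 and $1/K_n$ at very high temperature, in agreement with the limits noted above, and they sharpen toward a single state as the temperature decreases.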
Implementation
The performance of the proposed MFA algorithm for the global routing problem is evaluated in comparison with the well-known LocusRoute algorithm [4]. The MFA global router is implemented efficiently as described in Section 3. The average of each Ising spin $m$ is initialized by randomly selecting $u^{init}_m$ in the range $0.45 \leq u_m \leq 0.55$. Similarly, the average of each Potts spin $n$ is initialized by randomly selecting $K_n$ values $v_{nr}$ in the range $0.9/K_n \leq v_{nr} \leq 1.1/K_n$ and normalizing them as $v^{init}_{nr} = v_{nr} / \sum_{k=1}^{K_n} v_{nk}$ for $r = 1, 2, \ldots, K_n$. Note that the random selections are made using a uniform distribution over the given ranges.
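The initialization described above can be sketched in a few lines. This is an illustrative sketch rather than the authors' C implementation. The function names `init_ising` and `init_potts` and the fixed seed are assumptions made for the example.

```python
import random


def init_ising(rng):
    """Initial Ising spin average drawn uniformly from [0.45, 0.55]."""
    return rng.uniform(0.45, 0.55)


def init_potts(K, rng):
    """Initial Potts spin averages: K values drawn uniformly from
    [0.9 / K, 1.1 / K], then normalized so that they sum to 1."""
    raw = [rng.uniform(0.9 / K, 1.1 / K) for _ in range(K)]
    total = sum(raw)
    return [v / total for v in raw]


rng = random.Random(0)  # fixed seed, only to make the example repeatable
u_init = init_ising(rng)
v_init = init_potts(5, rng)
```

The normalization keeps each Potts spin average vector consistent with its interpretation as a probability distribution over the $K_n$ routes.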
The initial temperature parameter used in the mean field computation is estimated using the initial spin average values. The selection of the initial temperature $T_0$ is crucial for obtaining a good routing. In previous applications of MFA, it has been experimentally observed that spin averages tend to converge at a critical temperature. Although some methods have been proposed for the estimation of the critical temperature, we prefer an experimental way of computing $T_0$, which is easy to implement and successful, as the experimental results indicate. We compute the initial average mean field from the initial spin averages and estimate $T_0$ from it.
The cooling schedule is an important factor in the performance of the MFA global router. For a particular temperature, MFA proceeds with updates of randomly selected unconverged net spins until $\Delta E < \epsilon$ for $M$ consecutive iterations, where $M = N$ initially and $\epsilon = 0.05$. Average spin values are tested for convergence after each update. For an Ising spin $m$, if either $u_m \leq 0.05$ or $u_m \geq 0.95$ is detected, then spin $m$ is assumed to have converged to state 0 or state 1, respectively. For a Potts spin $n$, if $v_{nr} \geq 0.95$ is detected for a particular $r = 1, 2, \ldots, K_n$, then spin $n$ is assumed to have converged to state $r$. The cooling process is realized in two phases, slow cooling followed by fast cooling, similar to the cooling schedules used for simulated annealing. In the slow cooling phase, the temperature is decreased by $T = \alpha T$ where $\alpha = 0.9$ until $T < T_0/1.5$. Then, in the fast cooling phase, $M$ is set to $M/2$ and $\alpha$ is set to $0.8$. The cooling schedule continues until 90% of the spins converge. At the end of this cooling process, each unconverged Ising spin $m$ is assumed to converge to state 0 or state 1 if $u_m < 0.5$ or $u_m \geq 0.5$, respectively. Similarly, each unconverged Potts spin $n$ is assumed to converge to state $r$ where $v_{nr} = \max\{v_{nk} : k = 1, 2, \ldots, K_n\}$. Then, the result is decoded as described in Section 3, and the resulting global routing is found.
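The two-phase cooling schedule and the convergence tests can be sketched as follows. This is a minimal sketch under stated assumptions: $M$ is treated as an integer that is halved once on entering the fast phase, the stopping rule is left to the caller, and all function names are hypothetical. The estimation of $T_0$ is not covered here.

```python
from itertools import islice


def cooling_schedule(T0, M0):
    """Yield (T, M) pairs: slow cooling with factor 0.9 until T < T0 / 1.5,
    then fast cooling with factor 0.8 and M halved (assumed: halved once)."""
    T, M, alpha = T0, M0, 0.9
    fast = False
    while True:
        yield T, M
        T = alpha * T
        if not fast and T < T0 / 1.5:
            fast = True
            M = M // 2
            alpha = 0.8


def ising_converged(u):
    """An Ising spin has converged if its average is <= 0.05 or >= 0.95."""
    return u <= 0.05 or u >= 0.95


def potts_converged(v):
    """A Potts spin has converged if some component is >= 0.95."""
    return max(v) >= 0.95


def converged_fraction(ising, potts):
    """Fraction of all spins that have converged; the schedule described
    above stops once this reaches 0.9."""
    done = sum(ising_converged(u) for u in ising)
    done += sum(potts_converged(v) for v in potts)
    return done / (len(ising) + len(potts))


first_steps = list(islice(cooling_schedule(1.0, 100), 6))
```

With $T_0 = 1$, the slow phase covers the temperatures 1, 0.9, 0.81 and 0.729. The next temperature, 0.6561, falls below $T_0/1.5$, so the fast phase starts there with the halved $M$.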
The LocusRoute algorithm is implemented as in [4]. As LocusRoute depends on the rip-up and reroute method, it is allowed to reroute the circuits 5 times. No bend reduction has been done as in [6]. Both algorithms are implemented in the C programming language.
Experimental Results
This section presents an experimental performance evaluation of the proposed MFA algorithm in comparison with the LocusRoute algorithm. Both algorithms are tested for the global routing of thirteen ACM SIGDA Design Automation benchmarks (MCNC) on a SUN SPARC 10. The first 4 columns of Table 1 illustrate the properties of these benchmark circuits.
These two algorithms yield the same total wiring length for global routing since a two-or-fewer-bend routing scheme is adopted in both of them. The last six columns of Table 1 illustrate the performance results of these two algorithms for the benchmark circuits. The MFA algorithm is executed 10 times for each circuit starting from different, randomly chosen initial configurations. The results given for the MFA algorithm in Table 1 are the averages of these executions. Global routing cost values of the solutions found by both algorithms are computed using Eq. (1) and then normalized with respect to those of MFA. In Table 1, maximum channel density denotes the number of routes assigned to the maximally loaded channels. That is, it denotes the minimum number of tracks required in a channel for 100% routability.
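The two reported metrics can be computed as in the short sketch below. The density grids and cost values used here are made-up illustrations, not data from Table 1, and the function names are hypothetical.

```python
def max_channel_density(horizontal, vertical):
    """Largest density over all channel segments, i.e., the minimum number
    of tracks a channel would need to carry every assigned route."""
    return max(d for grid in (horizontal, vertical) for row in grid for d in row)


def normalized_cost(cost, mfa_cost):
    """Routing cost expressed relative to the MFA cost (MFA itself is 1.0)."""
    return cost / mfa_cost


# Hypothetical densities and costs, for illustration only.
tracks_needed = max_channel_density([[1, 3], [2, 0]], [[4, 1]])
relative = normalized_cost(110.0, 100.0)
```

A normalized cost above 1.0 for another router means its routing is less balanced than the MFA routing under the cost of Eq. (1).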
As is seen in Table 1, the global routing costs of the solutions found by MFA are 3.1%-10.5% better than those of LocusRoute. As is also seen in this table, the maximum channel density requirements of the solutions found by MFA are smaller than those of LocusRoute for all circuits except alu2 and term1. Both algorithms obtain the same maximum channel density for these two circuits.
Figures 4 and 5 contain visual illustrations, as pictures (left) and histograms (right), of the channel density distributions of the solutions found by MFA and LocusRoute, respectively, for the circuit C1355. The pictures are painted such that the darkness of each channel increases with increasing channel density. The global routing solutions found by these two algorithms are tested by using the SEGA [5] detailed router for FPGAs. Figure 6 illustrates the results of the SEGA detailed router for the circuit C1355.
