The mean field annealing (MFA) algorithm, proposed for solving combinatorial optimization problems, combines the characteristics of neural networks and simulated annealing. Previous work on MFA resulted in successful mappings of the algorithm to classic optimization problems such as the traveling salesperson problem, the scheduling problem, the knapsack problem and the graph partitioning problem. In this paper, MFA is formulated for the circuit partitioning problem using the so-called net-cut model. Hence, the deficiencies of using the graph representation for electrical circuits are avoided. An efficient implementation scheme, which decreases the complexity of the proposed algorithm by asymptotical factors, is also developed. A comparative performance analysis of the proposed algorithm against two well-known heuristics, simulated annealing and Kernighan-Lin, indicates that MFA is a successful alternative heuristic for the circuit partitioning problem.
Introduction
for the circuit partitioning problem in comparison with two well-known heuristics : simulated annealing and Kernighan-Lin.
Circuit Partitioning Problem
An instance of the circuit partitioning problem consists of a circuit that is to be partitioned and an integer $K$ representing the number of partitions. A circuit can be represented by a set of components called cells, and a list of nets which defines the connection relationships among the cells. Cells may represent different electrical components such as transistors, standard cells or logic gates. Nets represent the connections among the cells that can be realized using different types of conductors depending on the application (e.g., wires, metal layers). Cells incident to a net are called the terminals of that net. Both cells and nets of a circuit have an attribute called the weight of a cell or a net. Weights of the cells may represent their areas if the partitioning is used for placement.
Nets can be weighted due to their effect on the total delay of the circuit. An example circuit with 10 cells and 5 nets is given below. A circuit $\Omega$ can be formally represented by a set of cells $C$, a set of nets $N$, a cell weight function $w_{cell} : C \to \mathbb{N}$, and a net weight function $w_{net} : N \to \mathbb{N}$, where $\mathbb{N}$ represents the set of natural numbers. Each element in the set $N$ is a subset of the set $C$, i.e., $N$ is a multiset such that $N \subseteq 2^{C}$. Given a circuit as defined above, the problem is to divide the cells of the circuit into $K$ ($K \ge 2$) evenly weighted partitions while minimizing the cost of the external connections (i.e., cut-set size) among partitions.
The difference between the net-cut and graph models is in the computation of the cost of external connections.
In the graph representation of a circuit, each cell of the circuit is represented by a vertex and each net of the circuit is represented by a clique of vertices corresponding to its terminals. Cell weight function becomes the vertex weight function of the graph. Weights of the edges are equal to the weights of the nets that they represent. The graph representation of the circuits can be restricted to simple graphs. All edges between two vertices are represented by a single edge of which weight is the summation of the weights of the edges it represents. This simplification has no effect as far as the partitioning is concerned. If an edge between two vertices is in the cut set of a partitioning then all other edges between these two vertices are also in the cut set and vice versa. Therefore, a single edge with a weight equal to the summation of the weights of these edges can represent their contribution to the cost. Figure 1(a) illustrates the graph representation of the example circuit.
Formally, a circuit $\Omega(C, N)$ is represented by a graph $G(V, E)$, where $V = C$ and $w_{vertex} = w_{cell}$. The edge set $E$ is formed using the net set $N$ as follows: $uv \in E$ if and only if there exists an $n \in N$ such that $u \in n$ and $v \in n$.
The weight function $w_{edge}$ is computed as $w_{edge}(uv) = \sum_{n \in N :\, u, v \in n} w_{net}(n)$ for all $uv \in E$. In the graph model, the connection cost is computed by simply adding the weights of the edges that have their vertices in different partitions.
In the net-cut model, the connection cost for $K$-way partitioning may be computed as follows. If the vertices incident to a hyperedge are in $\ell$ different partitions, then that hyperedge contributes $(\ell - 1) w_e$ to the connection cost, where $w_e$ is the weight of the hyperedge. There are also some other alternatives for computing the connection cost. One of them is adding $w_e$ to the connection cost if and only if $\ell \ge 2$. Another one is adding $(\ell(\ell - 1)/2)\, w_e$. Note that all choices are equivalent for bipartitioning.
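The three net-cut cost variants described above can be sketched in Python as follows. This is only an illustration: the function name `net_cut_cost`, the `variant` labels, and the data layout (nets as lists of cell ids, a dict from cell id to partition index) are assumptions made for the example, not part of the original formulation.

```python
def net_cut_cost(nets, weights, part_of, variant="l-1"):
    """Connection cost of a K-way partitioning under the net-cut model.

    nets    : list of nets, each a list of cell ids (hypothetical layout)
    weights : list of net weights w_e, aligned with `nets`
    part_of : dict mapping cell id -> partition index
    variant : "l-1"  adds (l - 1) * w_e
              "cut"  adds w_e if and only if l >= 2
              "pair" adds (l * (l - 1) / 2) * w_e
    """
    total = 0
    for cells, w in zip(nets, weights):
        # l = number of distinct partitions touched by this net
        l = len({part_of[c] for c in cells})
        if variant == "l-1":
            total += (l - 1) * w
        elif variant == "cut":
            total += w if l >= 2 else 0
        elif variant == "pair":
            total += (l * (l - 1) // 2) * w
        else:
            raise ValueError(f"unknown variant: {variant}")
    return total


# Toy data (hypothetical): 6 cells, 3 nets.
nets = [[0, 1, 2], [2, 3], [0, 3, 4, 5]]
weights = [2, 1, 3]
two_way = {0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 1}
three_way = {0: 0, 1: 0, 2: 1, 3: 1, 4: 2, 5: 2}
```

On the bipartitioning `two_way` all three variants return the same value, as the text states; on `three_way` they differ.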
The problem with the graph model is that it treats a net with $s$ terminals as $s(s - 1)/2$ two-terminal nets.
This strategy exaggerates the importance of the nets that have more than two terminals and the exaggeration grows with the square of the size of the net [13] , where the size of a net denotes the number of terminals of a net.
For example, the actual cost of a unit weight net of size 4 in the cut-set of a bipartitioning is 1 since such a net will cause a single connection between the two partitions. In the graph model, the same situation contributes a cost of 3 or 4 according to the distribution of the terminal cells of the net between the two partitions. This cost contribution in the graph model is far from the actual cost. In general, the actual cost contribution of a unit weight net across a cut of a bipartitioning is 1, but the cost contribution of a clique, which is evenly split across a cut, rises quadratically with the size of the clique [10] . This quadratic growth does not adequately reflect the costs arising in practice. In fact, heuristics using the graph model for representing circuits will try to remove all nets with large sizes from the cut-set and try to put the smaller ones. This situation can cause performance degradation if the actual cut size is minimized when the nets with large sizes are in the cut-set. Experimentation
shows that this occurs in most of the cases [13]. For example, using the net-cut model instead of the graph model increases the performance of the Kernighan-Lin heuristic drastically, reducing the connection costs by 19 to 50% [13]. The bipartitioning $P_1 = \{c_1, c_4, c_5, c_6, c_9\}$ and $P_2 = \{c_2, c_3, c_7, c_8, c_{10}\}$, with sizes $|P_1| = 14$ and $|P_2| = 12$, illustrated in Figure 1(b), is the global minimum of the net-cut model with the same restrictions on the partition sizes. Here, the size of a partition denotes the summation of the weights of the cells assigned to that partition.
As explained earlier, the actual connection cost is the cost of the cut computed using the net-cut model. Hence, to compute the actual cost of the cut in Figure 1(a) we transform it to the net-cut model and observe that the size of the cut is 4, whereas the size of the cut in Figure 1(b) is 2. Note that the global minimum solution of the graph model cuts two nets ($n_1$ and $n_4$) with smaller sizes (3 and 2, respectively) although their weights are high (2 for both of the nets). However, the global minimum solution of the net-cut model cuts two nets ($n_3$ and $n_5$) with larger sizes (4 and 5, respectively) but smaller weights (1 for both of the nets). Although both cuts give the global minimum according to the model used, min-cut bipartitioning using the graph model yields a suboptimal solution because of the incorrect representation of the problem. This demonstrates that even if one computes the global optimum using the graph model, the computed solution can be a suboptimal solution of the actual problem. It can be argued that some other representation scheme can be used to represent circuits with graphs which can give better approximations to the actual cost, but it can be shown that there is no good way of mapping a circuit instance into a graph [10].
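To make the difference between the two models concrete, the following Python sketch evaluates a single unit-weight net of size 4 under both models for a bipartitioning. The helper names and the cell labels are hypothetical; the clique cost simply counts cut edges among all terminal pairs, and the net-cut cost uses the $(\ell - 1) w_e$ rule.

```python
from itertools import combinations


def clique_cut_cost(net, weight, part_of):
    """Graph (clique) model: every pair of terminals is an edge of weight
    `weight`; add up the edges whose endpoints lie in different partitions."""
    return sum(weight for u, v in combinations(net, 2)
               if part_of[u] != part_of[v])


def net_cut_cost_single(net, weight, part_of):
    """Net-cut model: a net spanning l partitions costs (l - 1) * weight."""
    return (len({part_of[c] for c in net}) - 1) * weight


net = ["a", "b", "c", "d"]                       # one net with 4 terminals
split_1_3 = {"a": 0, "b": 1, "c": 1, "d": 1}     # 1 cell vs 3 cells
split_2_2 = {"a": 0, "b": 0, "c": 1, "d": 1}     # even split
```

For this net the clique model reports 3 for the 1-3 split and 4 for the 2-2 split, while the net-cut model reports 1 in both cases.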
Applying MFA to the Circuit Partitioning Problem
Mean field annealing (MFA) merges collective computation and annealing properties of Hopfield neural networks [6] and simulated annealing [8] , respectively, to obtain a general algorithm for solving combinatorial optimization problems. MFA can be used for solving a combinatorial optimization problem by choosing a representation scheme in which the final states of the spins (neurons) can be decoded as a solution to the target problem. Then, an energy function is constructed whose global minimum value corresponds to an optimum solution of the problem to be solved. MFA is expected to compute the optimum solution to the target problem, starting from a randomly chosen initial state, by minimizing this energy function. Steps of applying mean field annealing technique to a problem can be summarized as follows :
1) Choose a representation scheme which encodes the configuration space of the target optimization problem using spins. In order to get a good performance, number of possible configurations in the problem domain and the spin domain must be equal, i.e., there must be a one-to-one mapping between the configurations of spins and the problem.
2) Formulate the cost function of the problem in terms of spins, i.e., derive the energy function of the system.
Global minimum of the energy function should correspond to the global minimum of the cost function.
3) Derive the mean field theory equations using this energy function, i.e., derive equations for updating averages (expected values) of spins.
4) Minimize the complexity of update operations in order to get an efficient algorithm.
5) Select the energy function and the cooling schedule parameters.
The proposed formulation and implementation of the MFA algorithm for the circuit partitioning problem following these steps are presented in the following sections.
Encoding
The MFA algorithm is derived by analogy to Ising and Potts models which are used to estimate the state of a system of particles, called spins, in thermal equilibrium. 
In our encoding of the circuit partitioning problem, each spin vector corresponds to a cell in the circuit $\Omega(C, N)$. Hence, the number of spin vectors is $S = |C|$. The dimension $K$ of the spin vectors is equal to the number of partitions. If a spin is in state $k$ we say that the corresponding cell is assigned to partition $k$. Hence, $s_{ik} = 1$ means that cell $i$ is assigned to partition $k$. If this spin matrix is decoded as described above, the resulting partitioning is $P_1 = \{c_4, c_7, c_9\}$, $P_2 = \{c_2, c_5, c_{10}\}$, $P_3 = \{c_1, c_6\}$ and $P_4 = \{c_3, c_8\}$, where $P_1$, $P_2$, $P_3$, $P_4$ are the sets representing partitions. Sizes of the partitions are $|P_1| = 7$, $|P_2| = 7$, $|P_3| = 6$ and $|P_4| = 6$. The size of partition $k$ is defined to be $|P_k| = \sum_{c \in P_k} w_c$, where $w_c$ denotes the weight of cell $c$. The interconnection cost computed according to the net-cut model is 8. This encoding is similar to the encodings used for the graph partitioning problem [2, 5, 12, 16] and the bipartite subgraph problem [9, 14] in previous works. Although we have proposed a hypergraph partitioning formulation using this encoding in an earlier work [2], its energy function formulation was an approximation to the net-cut model, as is the case for the graph partitioning formulation. In the next section, we propose an energy function formulation according to the net-cut model using the encoding described above.
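The decoding rule above ($s_{ik} = 1$ means cell $i$ is in partition $k$) can be sketched in a few lines of Python. The function names and the small 4-cell example are hypothetical and chosen only to illustrate the encoding; they do not reproduce the example circuit of the paper.

```python
def decode(spins):
    """Map a 0/1 spin matrix (one row per cell, one column per partition)
    to a list of partitions, each given as a list of cell indices."""
    K = len(spins[0])
    partitions = [[] for _ in range(K)]
    for i, row in enumerate(spins):
        if sum(row) != 1:
            raise ValueError(f"row {i} must contain exactly one 1")
        partitions[row.index(1)].append(i)
    return partitions


def partition_sizes(partitions, cell_weight):
    """Size of a partition = sum of the weights of its cells."""
    return [sum(cell_weight[c] for c in p) for p in partitions]


spins = [[1, 0], [0, 1], [0, 1], [1, 0]]   # 4 cells, K = 2
cell_weight = [3, 1, 2, 2]
```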
Energy Function Formulation
In the MFA algorithm, the aim is to find the spin values minimizing the energy function of the system. In order to achieve this goal, the average (expected) value $V_i = \langle S_i \rangle$ of each spin vector $S_i$ is computed and iteratively updated until the system stabilizes at some fixed point. Hence, the spin averages $v_{ik}$ satisfy

$$\sum_{k=1}^{K} v_{ik} = 1, \quad \text{for } 1 \le i \le |C|. \qquad (2)$$

This constraint guarantees that each Potts spin $S_i$ is in one of the $K$ states at a time, and each cell is assigned to only one partition for our encoding of the circuit partitioning problem.
In order to construct an energy function it is helpful to associate the following meaning to the values $v_{ik}$: $v_{ik} = P\{\text{cell } i \text{ is in partition } k\}$ for $1 \le i \le |C|$, $1 \le k \le K$, i.e., $v_{ik}$ is the probability of finding spin $i$ at state $k$. If $v_{ik} = 1$ then spin $i$ is in state $k$ and the corresponding configuration is $S_i = V_i$. Now, we formulate the interconnection cost of the circuit partitioning problem for the circuit $\Omega(C, N)$ as an energy term ($E_C$):

$$E_C(V) = \sum_{n \in N} w_n \left( \sum_{k=1}^{K} P\{\text{one or more cells of net } n \text{ is in partition } k\} - 1 \right) \qquad (3)$$

where $V = [V_1, \ldots, V_i, \ldots, V_{|C|}]^t$ is the spin average matrix consisting of $|C|$ $K$-dimensional spin vectors. Here, $i \in n$ and $w_n$ denote a terminal cell and the weight of net $n$, respectively. In this formulation, the cost of each net is computed one by one and added to the total interconnection cost. According to the net-cut model, as discussed in Section 2, the cost contribution of a net $n$ to the total interconnection cost is $(\ell - 1) w_n$ if the net is distributed to $\ell$ different partitions. Equation (3) follows from the observation $\ell = \sum_{k=1}^{K} P\{\text{one or more cells of net } n \text{ is in partition } k\}$. The $(-1)$ term in Eq. (3) is a constant term and can be eliminated. Another observation is $P\{\text{cell } i \text{ is not in partition } k\} = (1 - v_{ik})$, which follows from the probability interpretation of the variables $v_{ik}$. Hence, $\prod_{i \in n} (1 - v_{ik})$ denotes the probability that no cell of net $n$ is in partition $k$, and $(1 - \prod_{i \in n} (1 - v_{ik}))$ denotes the probability that at least one cell of net $n$ is in partition $k$. With the constant term dropped, this gives $E_C(V) = \sum_{n \in N} w_n \sum_{k=1}^{K} \left(1 - \prod_{i \in n} (1 - v_{ik})\right)$. Note that minimization of $E_C$ corresponds to minimization of the actual interconnection cost of the circuit partitioning problem.
Another term of the energy function is the term for penalizing imbalanced partitions. We formulate this term (E B ) similar to the formulation of balance term proposed for the mapping problem [1] .
where $w_i$ and $w_j$ denote the weights of cells $i$ and $j$. This triple summation term computes the summation of the inner products of the weights of the cells assigned to individual partitions. The global minimum of this term occurs when equal amounts of cell weights are assigned to each partition. If there is an imbalance in the partitioning, the $E_B$ term increases with the square of the amount of the imbalance, penalizing imbalanced partitionings.
The total energy function $E$ can be defined in terms of $E_C$ and $E_B$ as

$$E(V) = E_C(V) + r\, E_B(V) \qquad (8)$$

where the parameter $r$ is introduced to maintain a balance between the two optimization objectives of the circuit partitioning problem. Hence, minimization of the energy function $E$ corresponds to evenly distributing cells among $K$ partitions while minimizing the interconnection cost among the partitions computed according to the net-cut model.
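The energy terms described in this section can be evaluated numerically with the Python sketch below. Two points are assumptions rather than statements of the original formulation: the balance term is taken to be $E_B = \frac{1}{2}\sum_{i}\sum_{j \ne i}\sum_{k} w_i w_j v_{ik} v_{jk}$ (one form consistent with the "triple summation" description and with the energy being linear in each $v_{ik}$), and the interconnection term keeps the $-1$ of Eq. (3) so that it equals the net-cut cost on 0/1 spin matrices. All function names are hypothetical.

```python
from math import prod


def interconnection_energy(v, nets, w_net):
    """E_C as in Eq. (3): for each net, w_n times (expected number of
    partitions containing at least one of its cells, minus 1)."""
    K = len(v[0])
    total = 0.0
    for cells, w in zip(nets, w_net):
        touched = sum(1.0 - prod(1.0 - v[i][k] for i in cells)
                      for k in range(K))
        total += w * (touched - 1.0)
    return total


def balance_energy(v, w_cell):
    """Assumed E_B: 0.5 * sum over i != j and k of w_i w_j v_ik v_jk.
    Computed per column as 0.5 * ((sum_i a_i)^2 - sum_i a_i^2), a_i = w_i v_ik."""
    K = len(v[0])
    total = 0.0
    for k in range(K):
        col = sum(w * row[k] for w, row in zip(w_cell, v))
        own = sum((w * row[k]) ** 2 for w, row in zip(w_cell, v))
        total += 0.5 * (col * col - own)
    return total


def total_energy(v, nets, w_net, w_cell, r):
    """E = E_C + r * E_B."""
    return interconnection_energy(v, nets, w_net) + r * balance_energy(v, w_cell)


# Hypothetical toy circuit: 4 unit-weight cells, 3 nets, K = 2.
nets = [[0, 1], [1, 2], [0, 2, 3]]
w_net = [2, 1, 3]
w_cell = [1, 1, 1, 1]
balanced = [[1, 0], [1, 0], [0, 1], [0, 1]]
skewed = [[1, 0], [1, 0], [1, 0], [0, 1]]
```

On the `balanced` assignment the interconnection term equals the net-cut cost of the partitioning, and the assumed balance term is smaller than on the `skewed` assignment.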
Derivation of the Mean Field Theory Equations
Mean field theory equations, needed to minimize the energy function E, can be derived as [12, 16] 
where $N_i$ is defined to be the set of nets connected to cell $i$. The quantity $\phi_{ik}$ represents the $k$'th element of the mean field vector acting on spin $i$. Using the mean field values $\phi_{ik}$, the average spin values $v_{ik}$ can be updated using the following equation [12, 16]:

$$v_{ik} = \frac{e^{\phi_{ik}/T}}{\sum_{l=1}^{K} e^{\phi_{il}/T}}, \quad \text{for } 1 \le i \le |C|,\ 1 \le k \le K \qquad (11)$$

where $T$ is the temperature parameter which is used to relax the system iteratively. Equation (11) enforces the summation of each row of the spin matrix to be unity, handling the constraint given in Eq. (2). Hence, it is guaranteed that all rows of the spin matrix will have only one spin with output value 1 when the system is stabilized.
Mean field $\phi_{ik}$ can be interpreted as the increase in the energy function $E(V)$ when spin $i$ is assigned to state $k$. Note that in Eq. (10), the first summation term represents the increase in the total interconnection cost by assigning cell $i$ to partition $k$. The second summation term represents the increase in the imbalance cost associated with partition $k$ by assigning cell $i$ to partition $k$. Hence, $-\phi_{ik}$ may be interpreted as the decrease in the overall solution quality by assigning cell $i$ to partition $k$. Then, in Eq. (11), $v_{ik}$ is updated such that the probability of assigning cell $i$ to partition $k$ increases with increasing mean field $\phi_{ik}$.
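The spin average update of Eq. (11) is a normalized exponential of the mean field values. A minimal Python sketch is given below; the function name is hypothetical, and subtracting the maximum before exponentiating is a standard numerical safeguard added here, which does not change the result.

```python
import math


def update_spin_average(phi_i, T):
    """Return the new spin average vector for one cell, following Eq. (11):
    v_ik = exp(phi_ik / T) / sum_l exp(phi_il / T)."""
    m = max(phi_i)                       # shift for numerical stability
    exps = [math.exp((p - m) / T) for p in phi_i]
    s = sum(exps)
    return [e / s for e in exps]
```

At high temperature the result approaches $1/K$ in every component, and at low temperature it approaches a vector with a single 1 at the largest mean field value.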
After the mean field theory equations (Eq. (10), Eq. (11)) are derived, the mean field annealing algorithm can be summarized as follows. First, an initial high-temperature spin average is assigned to each spin, and an initial temperature is chosen. In general, $v_{ik}$ is initialized to $1/K$ plus a disturbance term (note that $\lim_{T \to \infty} v_{ik} = 1/K$). In each iteration the mean field vector acting on a randomly selected spin is computed using Eq. (10). Then, the spin average vector is updated using Eq. (11). This process is repeated for a random sequence of spins until the system is stabilized for the current temperature. The system is observed after each spin vector update in order to detect the convergence to an equilibrium state for a given temperature. If the energy function $E$ does not decrease in most of the successive spin vector updates, this means that the system is stabilized for that temperature. Then, $T$ is decreased according to the cooling schedule, and the iterative process is re-initiated. Note that the computation of the energy difference $\Delta E$ necessitates the computation of $E$ (Eq. (8)) at each iteration. In general, the computation of the total energy (Eq. (8)) is much more expensive than the computation of the mean field vector. Hence, the computation of $E$ at each iteration drastically increases the complexity of an MFA iteration. For example, the complexity of computing the energy function $E$ is $\Theta(|N| K s_{avg} + |C|^2 K)$ for the proposed formulation (Eq. (8)). Here, $s_{avg}$ denotes the average number of cells of a net (i.e., average size of a net). We present an efficient scheme [1] which reduces the complexity of energy difference computation by asymptotical factors.
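The iteration described above can be sketched in Python as follows. This is a simplified illustration under several stated assumptions, not the algorithm of Figure 2: the mean field is taken as $\phi_{ik} = -\partial E / \partial v_{ik}$ with the assumed balance term $E_B = \frac{1}{2}\sum_{i}\sum_{j \ne i}\sum_{k} w_i w_j v_{ik} v_{jk}$; a fixed number of updates per temperature (`sweeps_per_temp`) replaces the convergence test; the mean field is recomputed from scratch rather than incrementally; and the function name and all parameters are hypothetical.

```python
import math
import random


def mfa_partition(nets, w_net, w_cell, K, r, T0, seed=0, sweeps_per_temp=20):
    """Simplified MFA sketch; returns the partition index of each cell."""
    rng = random.Random(seed)
    C = len(w_cell)
    nets_of = [[] for _ in range(C)]          # nets incident to each cell
    for n, cells in enumerate(nets):
        for i in cells:
            nets_of[i].append(n)

    # Initialise v_ik to 1/K plus a small disturbance, then renormalise rows.
    v = []
    for _ in range(C):
        row = [1.0 / K + rng.uniform(-0.01, 0.01) for _ in range(K)]
        s = sum(row)
        v.append([x / s for x in row])

    def mean_field(i):
        # Assumed form: phi_ik = -(interconnection term + r * balance term).
        phi = []
        for k in range(K):
            conn = sum(w_net[n] * math.prod(1.0 - v[j][k]
                                            for j in nets[n] if j != i)
                       for n in nets_of[i])
            bal = sum(w_cell[i] * w_cell[j] * v[j][k]
                      for j in range(C) if j != i)
            phi.append(-(conn + r * bal))
        return phi

    T = T0
    while T >= T0 / 5.0:
        for _ in range(sweeps_per_temp * C):
            i = rng.randrange(C)              # randomly selected spin
            phi = mean_field(i)
            m = max(phi)
            e = [math.exp((p - m) / T) for p in phi]
            s = sum(e)
            v[i] = [x / s for x in e]         # Eq. (11)
        T *= 0.95 if T >= T0 / 1.5 else 0.7   # slow, then fast cooling

    # Decode: the largest spin average in each row decides the partition.
    return [max(range(K), key=lambda k: v[i][k]) for i in range(C)]
```

A call such as `mfa_partition(nets, [1] * len(nets), [1] * 6, K=2, r=1.0, T0=2.0)` returns one partition index per cell; the quality of the result depends on `r` and `T0`, which this sketch leaves to the caller.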
The incremental energy change $\delta E_{ik}$ due to the incremental change $\delta v_{ik}$ in the value of $v_{ik}$ is $\delta E = \delta E_{ik} = \phi_{ik}\, \delta v_{ik}$ from Eq. (9). Since $E(V)$ is linear in $v_{ik}$ (see Eq. (8)), this equation is valid for any amount of change $\Delta v_{ik}$ in the value of $v_{ik}$, that is,

$$\Delta E = \Delta E_{ik} = \phi_{ik}\, \Delta v_{ik} \qquad (12)$$

At each iteration of the MFA algorithm, $K$ $v_{ik}$ values in the same spin average vector are updated in a synchronous manner, and Eq. (12) is valid for all updates performed in a particular iteration. Thus, the energy difference due to the spin vector update operation in a particular iteration can be computed as

$$\Delta E = \sum_{k=1}^{K} \phi_{ik}\, \Delta v_{ik} \qquad (13)$$

where $\Delta v_{ik}$ is the change in $v_{ik}$ produced by the update. The complexity of computing Eq. (13) is only $\Theta(K)$ since the mean field ($\phi_{ik}$) values are already computed for the spin updates.
The MFA algorithm derived from the proposed formulation of the circuit partitioning problem is shown in Figure 2 . The complexity analysis of one iteration of this algorithm (from step 3.1.1 to step 3.1.6 in Figure 2 ) is as follows. The complexity of computing the first summation term in Eq. (10) 
An Efficient Implementation Scheme
As mentioned earlier, the MFA algorithm proposed for the circuit partitioning problem is an iterative process.
The complexity of a single MFA iteration is mainly due to the mean field vector computation. In this section, we propose an efficient implementation scheme which reduces the complexity of the mean field computations, and hence the complexity of the MFA iteration, by asymptotical factors.
Assume that, cell i is selected at random for updating the spin average vector V i in a particular iteration.
The expression given for ik (Eq. (10) 
For the sake of clarity of the representation, the overall mean field computations involved in a single iteration can be expressed using vector representation. The computation of the initial weighted column sums can be excluded from the complexity analysis since they are computed only once at the very beginning of the algorithm. In this scheme, the computation of an individual balance term for partition $k$ using Eq. (18) is a $\Theta(1)$ operation.
Hence, the construction of the balance vector for cell $i$ becomes a $\Theta(K)$ operation (Eq. (20)). The update of an individual weighted column sum (using Eq. (19)) at the end of each iteration is a $\Theta(1)$ operation. Thus, the overall complexity of the column sum updates is $\Theta(K)$ since $K$ weighted column sums should be updated (Eq. (21)). Hence, the proposed scheme reduces the complexity of computing the balance vector for cell $i$ (needed in Eq. (17)) from $\Theta(|C| K)$ to $\Theta(K)$.
where the value maintained for net $n$ and partition $k$ is $w_n \prod_{j \in n} (1 - v_{jk})$. This value represents the probability that no cell of net $n$ is in partition $k$, multiplied by the weight of net $n$. At the beginning of the MFA algorithm, the initial vector of these $K$ values for each net can be computed using the initial spin averages. Then, these values can be updated at the end of each iteration (i.e., after the spin average vector $V_i$ is updated).
This formulation is proposed for the efficient computation of the contribution vector of an individual net $n$ to the mean field of cell $i$, which is needed in Eq. (17); in its vector form, $1 \le k \le K$ and the products are element-by-element multiplications of two column vectors. The computation of the initial per-net vectors can be excluded from the complexity analysis since they are computed only once at the very beginning of the algorithm. In this scheme, the computation of an individual contribution value for a particular net $n$ using Eq. (22) is a $\Theta(1)$ operation. Hence, the construction of a per-net contribution vector becomes a $\Theta(K)$ operation (Eq. (24)). The update of an individual maintained value for a particular net $n$ at the end of each iteration is a $\Theta(1)$ operation (Eq. (23)). Thus, the overall complexity of updating the vector of a particular net is $\Theta(K)$ (Eq. (25)). Hence, the proposed scheme reduces the complexity of computing an individual per-net contribution vector (needed in Eq. (17)) from $\Theta(s_n K)$ to $\Theta(K)$, where $s_n$ denotes the size of net $n$.
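One way to realize the incremental bookkeeping described in this section is sketched below in Python. The update rules are an assumption consistent with the definitions in the text: each weighted column sum changes by $w_i (v_{ik}^{new} - v_{ik}^{old})$, and each per-net product is rescaled by $(1 - v_{ik}^{new})/(1 - v_{ik}^{old})$, which requires $v_{ik}^{old} < 1$. The function names are hypothetical.

```python
from math import prod


def init_net_products(v, nets, w_net, K):
    """For each net n and partition k: w_n * prod_{j in n} (1 - v_jk)."""
    return [[w * prod(1.0 - v[j][k] for j in cells) for k in range(K)]
            for cells, w in zip(nets, w_net)]


def init_column_sums(v, w_cell, K):
    """Weighted column sums: sum_i w_i * v_ik for each partition k."""
    return [sum(w * row[k] for w, row in zip(w_cell, v)) for k in range(K)]


def apply_update(i, new_row, v, nets_of_i, w_cell, products, col_sums):
    """Replace v[i] by new_row and refresh the cached values.
    Costs Theta(K) for the column sums and Theta(K) per incident net."""
    old_row = v[i]
    for k, (old, new) in enumerate(zip(old_row, new_row)):
        col_sums[k] += w_cell[i] * (new - old)
        for n in nets_of_i:
            # Assumes old < 1 so that the ratio is well defined.
            products[n][k] *= (1.0 - new) / (1.0 - old)
    v[i] = list(new_row)


# Hypothetical toy data: 3 cells, 2 nets, K = 2.
v = [[0.5, 0.5], [0.3, 0.7], [0.6, 0.4]]
nets = [[0, 1], [0, 1, 2]]
w_net = [2, 1]
w_cell = [1, 2, 3]
products = init_net_products(v, nets, w_net, 2)
col_sums = init_column_sums(v, w_cell, 2)
apply_update(0, [0.2, 0.8], v, [0, 1], w_cell, products, col_sums)
```

After the update, the cached values agree with a full recomputation from the new spin averages, which is the property the scheme relies on.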
Performance of the MFA Algorithm
This section presents the performance evaluation of the proposed mean field annealing (MFA) algorithm for the circuit partitioning problem. To evaluate the performance of the proposed algorithm, two well-known circuit partitioning heuristics are used: simulated annealing (SA) and Kernighan-Lin (KL). Each algorithm is tested using randomly generated circuit partitioning problem instances. Hypergraphs representing circuits are generated using two different schemes, resulting in two families of hypergraphs referred to here as random hypergraphs and geometric hypergraphs.
Random hypergraphs are generated using the following parameters: number of cells ($|C|$), number of nets ($|N|$), maximum cell weight ($W_c$), maximum net weight ($W_n$), and maximum net size ($s_{max}$). Each net is generated by randomly selecting a net size between 2 and $s_{max}$. Then, that many cells are selected randomly from the cell set to form the net. If a newly generated net contains exactly the same cells as a net generated earlier, it is discarded and another net is generated in its place. Each cell or net is weighted randomly by choosing a number between 1 and $W_c$ or 1 and $W_n$, respectively. Geometric hypergraphs are generated using an algorithm similar to the one used for generating geometric graphs. Geometric hypergraphs may represent electrical circuits better than random hypergraphs as they exhibit clustering and local connectivity properties. Parameters used for generating geometric hypergraphs are number of cells ($|C|$), number of nets ($|N|$), maximum cell weight ($W_c$), maximum net weight ($W_n$), and average net size ($s_{avg}$). A geometric hypergraph is generated using these parameters by randomly distributing $|C|$ cells and $|N|$ nets in a unit square. Then, the nets are formed using the following rule: a cell is incident to a net if it is contained in the bounding box of that net. The bounding box of a net is a square with the net at its center. Sizes of the bounding boxes are fixed and computed using the input parameters $s_{avg}$ and $|C|$ as $b = \sqrt{s_{avg}/|C|}$, where $b$ denotes the length of the sides of the bounding box. Nets whose bounding boxes contain fewer than 2 cells are discarded. Also, as in the generation of random hypergraphs, if two or more nets have the same subset of cells, only one of them is accepted and all others are discarded. New nets are generated in place of the discarded ones.
Note that the average size of the nets in the resulting hypergraph may be slightly different from the input parameter $s_{avg}$, as the nets near the borders of the unit square will have smaller sizes than expected; however, nets having fewer than 2 terminals are discarded, which may compensate for this effect. Each cell or net is weighted randomly by choosing a number between 1 and $W_c$ or 1 and $W_n$, respectively. Figure 3 illustrates a geometric hypergraph with 10 cells and 5 nets which corresponds to the example circuit given in Section 2.
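The two generation schemes described above can be sketched in Python as follows. The function names, the `seed` argument, and the use of uniform random choices are assumptions; the text does not specify the distributions. Both loops assume that enough distinct nets exist for the requested parameters, otherwise they would not terminate.

```python
import random


def random_hypergraph(num_cells, num_nets, W_c, W_n, s_max, seed=0):
    """Random hypergraph: each net picks a size in [2, s_max] and that many
    distinct cells; duplicate nets are discarded and regenerated."""
    rng = random.Random(seed)
    seen, nets = set(), []
    while len(nets) < num_nets:
        size = rng.randint(2, s_max)
        net = frozenset(rng.sample(range(num_cells), size))
        if net in seen:
            continue
        seen.add(net)
        nets.append(sorted(net))
    w_cell = [rng.randint(1, W_c) for _ in range(num_cells)]
    w_net = [rng.randint(1, W_n) for _ in range(num_nets)]
    return nets, w_cell, w_net


def geometric_hypergraph(num_cells, num_nets, W_c, W_n, s_avg, seed=0):
    """Geometric hypergraph: cells and net centres are placed in the unit
    square; a net contains the cells inside its bounding box of side
    b = sqrt(s_avg / num_cells)."""
    rng = random.Random(seed)
    half = 0.5 * (s_avg / num_cells) ** 0.5      # half of the box side b
    cells = [(rng.random(), rng.random()) for _ in range(num_cells)]
    seen, nets = set(), []
    while len(nets) < num_nets:
        x, y = rng.random(), rng.random()        # centre of the new net
        net = frozenset(i for i, (cx, cy) in enumerate(cells)
                        if abs(cx - x) <= half and abs(cy - y) <= half)
        if len(net) < 2 or net in seen:          # too small or duplicate
            continue
        seen.add(net)
        nets.append(sorted(net))
    w_cell = [rng.randint(1, W_c) for _ in range(num_cells)]
    w_net = [rng.randint(1, W_n) for _ in range(num_nets)]
    return nets, w_cell, w_net
```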
MFA Implementation
The MFA algorithm proposed for the circuit partitioning problem is implemented efficiently as described in 
of these two terms (each sum divided by $|C| K$) using the initial $v_{ik}$ values, and compute $r$ as $r = 8 \langle \phi^{C}_{ik} \rangle / \langle \phi^{B}_{ik} \rangle$.
Our experiments show that computing r using this method is sufficient for obtaining balanced partitions.
Selection of $T_0$ is crucial for obtaining good quality solutions. In previous applications of MFA [12, 16], it is experimentally observed that spin averages tend to converge at a critical temperature. It is suitable to choose $T_0$ close to this critical temperature. Although there are some methods proposed for the estimation of the critical temperature [12, 16], we prefer an experimental way of computing $T_0$ which is easy to implement and successful, as the results of the experiments indicate. After the parameter $r$ is fixed, the average mean field $\langle \phi_{ik} \rangle = \left( \sum_{i=1}^{|C|} \sum_{k=1}^{K} \phi_{ik} \right) / (|C| K)$ is computed using the initial $v_{ik}$ values. Then, $T_0$ is computed as $T_0 = c \langle \phi_{ik} \rangle / K$. Our experiments indicate that it is suitable to choose the parameter $c$ as 100 for geometric hypergraphs, and as 60 for random hypergraphs. Note that $T_0$ is inversely proportional to the number of partitions ($K$), which is also observed for the critical temperature formulations presented in the other implementations of MFA [12, 16].
After the spin averages ($v_{ik}$ values) and the parameters $T_0$ and $r$ are initialized, the cooling schedule of the algorithm proceeds as follows. At each temperature a random sequence is generated for the spins that have not yet converged. Once spin $i$ is considered converged to state $k$, its average is not updated in future iterations. The cooling process is realized in two phases: slow cooling followed by fast cooling. In the slow cooling phase, the temperature is decreased using a cooling factor of 0.95 until $T$ is less than $T_0/1.5$. Then, in the fast cooling phase, the cooling factor is set to 0.7 and cooling is continued until $T$ is less than $T_0/5.0$. At the end of this cooling process, the maximum element in each spin average vector is set to 1 and all other spin average values are set to 0. Then, the result is decoded as described in Section 3.1, and the resulting partitioning is found. Note that all parameters used in this implementation are either constants or found automatically, except for $c$ and one further parameter; these two are constant within each family of hypergraphs but differ between random and geometric hypergraphs. Figure 4 illustrates the evolution of the interconnection and total energy terms ($E_C$ and $E$, respectively) with MFA iterations for partitioning two random hypergraphs ((a) $|C| = 200$, $|N| = 200$, (b) $|C| = 400$, $|N| = 400$) into $K = 8$ partitions. Figure 4 is constructed by computing the $E_C$ (Eq. (6)) and $E$ (Eq. (8)) terms every 10 MFA iterations for the given two circuit partitioning problem instances. The displayed energy values are normalized with respect to the initial energy values. The vertical solid lines in each curve denote the temperature changes according to the cooling schedule. As is seen in Figure 4, the interconnection energy ($E_C$) decreases monotonically as the iterations proceed. The oscillatory decrease of the total energy term ($E$) is due to the balance term ($E_B$ given in Eq. (7)) oscillations superimposed on the monotonically decreasing $E_C$ term.
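The two-phase cooling schedule described above can be sketched as a small Python generator. The function name and the decision to yield each temperature at which iterations would be run are assumptions; the factors 0.95 and 0.7 and the thresholds $T_0/1.5$ and $T_0/5$ are taken from the text.

```python
def cooling_schedule(T0, slow=0.95, fast=0.7):
    """Yield the sequence of temperatures: multiply by `slow` while
    T >= T0 / 1.5, then by `fast`, and stop once T < T0 / 5."""
    T = T0
    while T >= T0 / 5.0:
        yield T
        T *= slow if T >= T0 / 1.5 else fast


temperatures = list(cooling_schedule(1.0))
```

With these settings the schedule visits only a small number of temperatures, which is consistent with the observation that most iterations are spent at a single temperature.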
As is seen in Figure 4, the major decrease in the energy terms occurs during a single temperature, which corresponds to the critical temperature mentioned earlier. Note that the number of iterations performed at the critical temperature is substantially greater than the numbers of iterations performed at other temperatures. The decrease in the energy terms during the last two temperatures demonstrates the merits of the
Simulated Annealing Implementation
In simulated annealing, starting from a randomly chosen initial configuration, the configuration space is searched for the best solution using a probabilistic hill climbing algorithm [8]. A configuration of the circuit partitioning problem is a partitioning of the circuit into $K$ partitions. In order to search the configuration space, the neighborhood of a configuration must be defined. For the implementation in this work, the neighborhood of a configuration consists of all configurations which result from moving one cell of the circuit from the partition with maximum size to any other partition. At each iteration of the simulated annealing algorithm, one of the possible moves is chosen randomly as a candidate move. Then, the resulting change in the total interconnection cost caused by the candidate move is calculated without changing the configuration. If the candidate move decreases the interconnection cost, it is realized. If it increases the interconnection cost, then it is realized with a probability which decreases with the amount of increase in the total interconnection cost. Acceptance probabilities of the moves that increase the cost are controlled with a temperature parameter $T$ which is decreased using an annealing schedule. Hence, as the annealing proceeds, acceptance probabilities of uphill moves decrease. An automatic cooling schedule is used in the implementation of the SA algorithm [12]. Note that the SA algorithm implemented in this work implicitly achieves the balance among the sizes of the partitions by selecting the neighbor configurations as defined above. This increases the computational efficiency of the algorithm but decreases its flexibility since the amount of imbalance among the partitions cannot be controlled.
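The move selection and acceptance steps described above can be sketched in Python as follows. The text only states that the acceptance probability decreases with the cost increase; the specific rule $e^{-\Delta / T}$ used here is the common Metropolis criterion and is an assumption, as are the function names and the treatment of zero-cost moves as accepted.

```python
import math
import random


def candidate_move(part_of, w_cell, K, rng=random):
    """Pick a random cell from the partition with maximum size and a random
    different destination partition. `part_of[c]` is the partition of cell c."""
    sizes = [0] * K
    for c, p in enumerate(part_of):
        sizes[p] += w_cell[c]
    src = max(range(K), key=sizes.__getitem__)
    cell = rng.choice([c for c, p in enumerate(part_of) if p == src])
    dst = rng.choice([p for p in range(K) if p != src])
    return cell, src, dst


def accept_move(delta_cost, T, rng=random):
    """Assumed Metropolis-style rule: accept non-increasing moves always,
    accept a cost increase with probability exp(-delta_cost / T)."""
    if delta_cost <= 0:
        return True
    return rng.random() < math.exp(-delta_cost / T)
```

As `T` decreases, `accept_move` rejects uphill moves more often, which matches the behaviour described in the text.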
Kernighan-Lin Implementation
The Kernighan-Lin (KL) heuristic is implemented efficiently as described by Fiduccia and Mattheyses [3]. In order to apply the KL heuristic to $K$-way partitioning, a two-phase approach is used which consists of recursive bisection and pairwise min-cut phases. In the recursive bisection phase, the circuit is recursively partitioned into two partitions until $K$ partitions are obtained. Then, in the pairwise min-cut phase, the total interconnection cost is iteratively minimized by executing the KL heuristic between each pair of partitions until no improvement can be achieved. In the KL heuristic, balance among partitions is maintained implicitly by the algorithm. Cell moves causing intolerable imbalances are not considered. In the implementation used in this work, a move which increases or decreases the size of a partition by more than 10% of the size of a perfectly balanced partition is considered as causing intolerable imbalance.
Experimental Results
In this section, performance of the proposed MFA algorithm is experimentally evaluated in comparison with the Kernighan-Lin (KL) and the simulated annealing (SA) algorithms. These heuristics are experimented with a large number of randomly generated circuit partitioning problem instances.
Six different types of random hypergraphs and six different types of geometric hypergraphs are generated with $|C| = 200$, $|C| = 400$ and $|N| = |C|/2$, $|N| = |C|$ and $|N| = 2|C|$. For each type of hypergraph, 10 different random instances are generated, giving a total of 120 randomly generated hypergraph instances. In each hypergraph instance, the maximum cell weight ($W_c$) and the maximum net weight ($W_n$) are both set to 4. The maximum net size ($s_{\max}$) in random hypergraph instances and the average net size ($s_{\mathrm{avg}}$) in geometric hypergraph instances are set to 16 and 8, respectively. A total of $3 \times 120 = 360$ circuit partitioning problem instances are constructed from these hypergraph instances by selecting the number of partitions as $K = 4$, $K = 8$, and $K = 16$.
Tables 1-3 and Figure 5 illustrate the performance results of the MFA, KL and SA heuristics for the circuit partitioning problem instances constructed using random and geometric hypergraphs. In Tables 1-3, $|C|$ (number of cells) and $|N|$ (number of nets) determine the type of the 10 distinct hypergraph instances used to collect the data in each row. Each algorithm is executed 10 times for each problem instance, starting from different, randomly chosen initial configurations. Each entry in Tables 1-3 gives the overall average (and standard deviation) of the results of $10 \times 10 = 100$ executions of a particular algorithm for partitioning 10 different hypergraph instances of the same type into $K$ partitions. Tables 1 and 2 illustrate the quality of the solutions obtained by the MFA, KL and SA heuristics for random and geometric hypergraphs, respectively. Total interconnection cost averages (and standard deviations) of the solutions are normalized with respect to the results of the MFA heuristic. Percent imbalance ratio averages (and standard deviations) of the solutions displayed in these tables are computed using $100 \times (|P|_{\max} - |P|_{\min}) / (2|P|_{\mathrm{avg}})$, where $|P|_{\max}$, $|P|_{\min}$ and $|P|_{\mathrm{avg}}$ = ( ...
Table 3: Execution time averages (in seconds) of the MFA, KL, and SA heuristics for random and geometric hypergraphs.
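The percent imbalance ratio reported in Tables 1 and 2 can be illustrated with a short Python sketch. It assumes that $|P|$ denotes the total cell weight of a partition and that $|P|_{\mathrm{avg}}$ is the arithmetic mean of the $K$ partition weights; both are assumptions, and the function name `percent_imbalance` is hypothetical.

```python
from typing import Sequence


def percent_imbalance(partition_weights: Sequence[float]) -> float:
    """Return 100 * (max - min) / (2 * avg) over the K partition weights.

    Assumption: each entry is the total cell weight of one partition,
    and avg is the arithmetic mean of the entries.
    """
    if not partition_weights:
        raise ValueError("at least one partition is required")
    p_max = max(partition_weights)
    p_min = min(partition_weights)
    p_avg = sum(partition_weights) / len(partition_weights)
    if p_avg == 0:
        raise ValueError("average partition weight must be nonzero")
    return 100.0 * (p_max - p_min) / (2.0 * p_avg)


# A perfectly balanced 4-way partition has a ratio of 0.
balanced = percent_imbalance([10, 10, 10, 10])
# Weights 12, 8, 10, 10: max - min = 4, avg = 10, so 100 * 4 / 20.
skewed = percent_imbalance([12, 8, 10, 10])
```

Under these assumptions, a ratio of 0 corresponds to equally weighted partitions, and the ratio grows with the spread between the heaviest and lightest partitions.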
As mentioned earlier, circuit partitioning has two optimization objectives: interconnection cost and imbalance cost. Hence, the quality of a solution to a particular circuit partitioning problem instance has two components: interconnection quality and balance quality. For random hypergraphs, as seen in Figure 5(a) and Table 1, the interconnection qualities of the solutions found by the MFA and SA heuristics are comparable, and both are better than those of the solutions found by the KL heuristic. For geometric hypergraphs, as seen in Figure 5(b) and Table 2, the KL and SA heuristics produce solutions with slightly better interconnection qualities than the MFA heuristic for K = 4. However, for K = 8 and K = 16, the interconnection qualities of the solutions obtained by the MFA heuristic are better than those of the KL and SA heuristics.
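The interconnection cost behind these comparisons can be sketched as follows. This is a minimal illustration, assuming that under the net-cut model a net contributes its full weight to the cost whenever its terminals lie in more than one partition; the function name `cut_net_cost` and the data layout are hypothetical and not taken from the implementation evaluated here.

```python
from typing import Dict, Hashable, Iterable, Sequence


def cut_net_cost(
    nets: Sequence[Iterable[Hashable]],
    net_weights: Sequence[int],
    part_of: Dict[Hashable, int],
) -> int:
    """Sum the weights of all nets whose terminals span several partitions.

    Assumption: a cut net is charged its weight once, regardless of how
    many partitions it touches.
    """
    total = 0
    for terminals, weight in zip(nets, net_weights):
        # Collect the distinct partitions touched by this net's terminals.
        touched = {part_of[cell] for cell in terminals}
        if len(touched) > 1:
            total += weight
    return total


# Four cells in two partitions; only the middle net crosses the cut.
example_nets = [[0, 1], [1, 2], [2, 3]]
example_weights = [1, 2, 3]
example_assignment = {0: 0, 1: 0, 2: 1, 3: 1}
example_cost = cut_net_cost(example_nets, example_weights, example_assignment)
```

Dividing such a cost by the corresponding MFA result would give a normalized value of the kind reported in Tables 1 and 2, where the MFA entries serve as the baseline.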
As seen in Tables 1 and 2, the balance qualities of the solutions found by the MFA algorithm are comparable with those of the KL heuristic. Note that the balance qualities of the solutions found by the SA heuristic are superior to those of the MFA and KL heuristics. This is due to the implementation of the SA heuristic (explained in Section 4.2), which enforces balanced partitionings.
As seen in Figure 5 and Table 3, the MFA and KL heuristics are significantly faster than the SA heuristic.
For random hypergraphs, the MFA heuristic is always faster than the KL heuristic. For geometric hypergraphs, the execution time averages of the MFA and KL heuristics are comparable for K = 4 and K = 16, whereas for K = 8 the KL heuristic is faster than the MFA heuristic. Note that, as the number of partitions increases, both the solution quality and the speed advantage of the MFA heuristic increase in comparison with those of the KL heuristic. The relative increase in the speed of the MFA heuristic has also been observed in the literature [16] for the graph partitioning problem.
Conclusion
In this paper, a mean field annealing (MFA) algorithm is proposed for the circuit partitioning problem using the net-cut model. An efficient implementation scheme is also developed for the proposed algorithm. This scheme decreases the complexity of a single MFA iteration by asymptotic factors.
The performance of the proposed algorithm is evaluated experimentally in comparison with two well-known heuristics, simulated annealing (SA) and Kernighan-Lin (KL), on a large number of randomly generated circuit partitioning problem instances. The qualities of the solutions obtained by the MFA heuristic are comparable with those of the SA heuristic. In general, the MFA heuristic produces better solutions than the KL heuristic.
The proposed MFA algorithm is significantly faster than the SA algorithm. In general, the MFA algorithm is also faster than the KL algorithm. It is also observed that, as the number of partitions increases, the solution quality and the speed advantage of the proposed MFA heuristic increase in comparison with those of the KL heuristic.
