Abstract-In this paper, a power/ground (P/G) pin assignment method using simulated annealing (SA) for large-scale high-pin-count ball-grid-array (BGA) packages is proposed. Two objective functions describing the power integrity (PI) and signal integrity (SI) of the pinout are introduced. The SA algorithm is customized to meet the needs of the pin assignment problem. Accelerating strategies are introduced, and some special considerations for customized SA optimization are discussed. The SA method can generate a large-scale P/G pinout with any power-ground-signal pin ratio ($P_0/G_0/S_0$) in a few minutes. Large-scale BGA packages with more than 2000 pins, including the I/O, core, and differential-pair blocks, can be generated quickly by the proposed SA method, with PI and SI performance similar to that of products from Xilinx and Altera.
I. INTRODUCTION

WITH the fast development of semiconductor and packaging technology, the electronic density of ICs and packages is becoming higher and higher. The modules and functionalities integrated in a single package are becoming more numerous and more complicated. Furthermore, the trend toward high frequency, low voltage, and large current in package design places high requirements on the power integrity (PI), signal integrity (SI), and electromagnetic compatibility design of the packages [1]. In package-printed circuit board (PCB) codesign, total inductance reduction and return path optimization are effective ways to improve the PI and SI performance by reducing the crosstalk and the rail-collapse noise [2], [3].
As an important part of package design, pinout design has drawn wide attention from researchers. Some research works focus on the manufacturing technology and realization of pinout design and on the electrical modeling of the package ball grid array (BGA) pinout [4], [5]. These works address the production feasibility of BGA packages. In [6], a methodology for predicting the transient dynamic behavior of a board assembly during a drop is developed. In [7], methods for accurate prediction of the bumps and solders on board reflow are investigated and experimentally verified. In [8], a ball chart describing pin locations for a flip-chip BGA package in chipset design is presented. Some research works pay attention to global routing optimization [9]. Other works specialize in pin assignment in pinout design [10], [11]. These works consider the pin assignment of a package to be one of the important factors that influence the package's PI and SI performance. In [12], a design methodology is presented to achieve signal-ground bump patterns with minimum simultaneous switching noise. In [13], a customized genetic algorithm (GA) method is proposed to improve the PI and SI performance by optimizing the power/ground (P/G) pin assignment. In [14], an accelerating strategy is proposed to improve the efficiency of the P/G pin assignment.
P/G pin assignment is an NP-complete combinatorial problem that cannot be solved by traditional exact algorithms within an acceptable time. Simulated annealing (SA) is chosen to solve this problem because it is a metaheuristic well suited to approximating the global optimum of large-scale problems.
In this paper, a P/G pin assignment method using SA is proposed for large-scale high-pin-count BGA package design. Two objective functions are derived. The first objective function uses the number of distinct local pin patterns to quantify the return path quality (SI quality). The second objective function uses the partial inductance between P/G pins to describe the total inductance influenced by the P/G pin assignment (PI quality). A general optimization flow is presented, in which the basic concepts of SA optimization are introduced and some important considerations for package design are demonstrated. A customized SA is developed in which some basic steps are adjusted, including the definition of the neighborhood, the normalization, and the initial solution generation. The proposed SA method is verified in two ways: 1) the SA method is used to generate large-scale P/G pinouts with any power-ground-signal pin ratio ($P_0/G_0/S_0$); and 2) the SA method is used to generate the P/G pinouts of two practical packages, which are compared to the ones from Xilinx and Altera.
II. DISCUSSION ON THE OBJECTIVE FUNCTIONS
The objective functions provide the direction toward the optimal solution in evolutionary algorithms. In P/G pin assignment optimization, the objective functions describe the influence of the P/G pin assignment on the SI and PI performance. The efficiency and validity of an optimization method depend heavily on the accuracy of the objective functions. In this section, two objective functions describing SI and PI are presented, respectively.
2156-3950 © 2016 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
A. Objective Function on SI
The SI objective function is developed based on the distribution of the return P/G pins of every signal pin. In [13], the quality of the return P/G pins is described by the dispersity and the uniformity of the P/G pins. In this paper, we use the idea of entropy [15] and the number of unique P/G pin patterns to efficiently describe the SI performance.
Given an arbitrary BGA package pinout A, a smaller dynamic observation window O is used to evaluate the P/G pin assignment. The observation window moves all over the pinout plane, recording the P/G pin pattern at each position, as shown in Fig. 1. The number of P/G pins in the observation window O at each position is recorded in an array T.
Assume that the size of pinout A is M × N and that the observation window O is smaller than A. The number of unique elements in T is used to characterize the uniformity and dispersity of pinout A. The SI objective function U(A) is then defined as the number of unique elements in T.
U describes the total number of unique pin patterns in T for pinout A, which is similar to the concept of entropy in information theory [15]. While counting the pin patterns, some equivalent pin patterns should be considered the same, as shown in Fig. 2. When considering the SI performance, P pins and G pins should both be treated as reference pins, as shown in Fig. 2(a). If one pattern is a rotation of the other, the two patterns are equivalent, as demonstrated in Fig. 2(b). If one pattern is an axial reflection of the other, the two patterns are also equivalent, as shown in Fig. 2(c). However, if one pattern is a translation of the other, the two patterns are not equivalent, as shown in Fig. 2(d).
A uniform distribution of the pinout means little change from one observation window position to the next, which results in a small entropy U(A). If the P/G pins are uniformly distributed in the pin plane, the number of P/G pins in the observation window O should vary very little no matter where O moves, as shown in Fig. 3(a). Otherwise, if the P/G pins are distributed unevenly, the number of P/G pins in the observation window O changes as O moves to areas with different P/G pin densities, as shown in Fig. 3(b).
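As an illustration of how U(A) could be evaluated, the following Python sketch slides a square window over a matrix-coded pinout and counts the distinct patterns after merging rotations and mirror images. The coding (0 for a signal pin, 1 for a P pin, 2 for a G pin), the window size, and the function names are assumptions made for this sketch; recording the whole window pattern, rather than only a pin count, is also an assumption.

```python
def _variants(win):
    """Yield the 8 rotations and mirror images of a square window."""
    cur = win
    for _ in range(4):
        yield cur
        yield tuple(row[::-1] for row in cur)  # mirror image of this rotation
        cur = tuple(zip(*cur[::-1]))           # rotate 90 degrees clockwise


def canonical(win):
    """Pick one representative so that equivalent patterns compare equal."""
    return min(_variants(win))


def entropy_u(pinout, size=3):
    """Count the distinct P/G patterns seen by a size x size sliding window."""
    rows, cols = len(pinout), len(pinout[0])
    # Assumed coding: 0 = signal, 1 = P, 2 = G. P and G are both reference pins,
    # so they are collapsed to the same value before patterns are compared.
    ref = [[1 if v else 0 for v in row] for row in pinout]
    seen = set()
    for r in range(rows - size + 1):
        for c in range(cols - size + 1):
            win = tuple(tuple(ref[r + i][c:c + size]) for i in range(size))
            seen.add(canonical(win))  # translations stay distinct positions
    return len(seen)
```

In this sketch, a perfectly regular pinout produces very few distinct patterns, while an irregular one produces many, which matches the intended meaning of a small U(A).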
B. Objective Function on PI
For PI, since the self-inductance is not affected by the pin distribution, the objective function should focus on the mutual inductance. In [13], the mutual inductance between every pair of P/G pins is considered, as shown in Fig. 4(a). Given a pinout with N P/G pins, the mutual inductance must then be calculated once for every pair of P/G pins, so the number of calculations grows quadratically with N.
In order to simplify the calculation, the objective function proposed in [13] is adjusted. Because the mutual inductance between two P/G pins depends strongly on the distance between them, the mutual inductance between two widely separated P/G pins can be ignored. If only the mutual inductance between each P/G pin and its two closest P/G pins is taken into consideration, the simplified PI objective function can be described as follows:

$$L(A) = \sum_{i=1}^{N} \sum_{j=1}^{2} a_{ij} \ln\frac{d_{\max}}{d_{ij}} \tag{3}$$

where $d_{i1}$ and $d_{i2}$ are the distances between P/G pin $i$ and its two closest P/G pins, as shown in Fig. 4(b). $a_{ij}$ is the coefficient of mutual inductance: $a_{ij} = -1$ when the currents in pins $i$ and $j$ flow in opposite directions, $a_{ij} = 1$ when they flow in the same direction, and $a_{ij} = 0$ when $i = j$. $d_{\max}$ is a constant that is always larger than the denominator $d_{ij}$. In (3), the accumulated term $\ln(d_{\max}/d_{ij})$ is the mutual inductance, which is simplified from the partial mutual inductance between two parallel thin wires. The definition of the partial mutual inductance is as follows:
where l is the length of the wires and d is the distance between two wires. Since l and d are invariable in a BGA package, they can be regarded as constant. By ignoring the constant in (4), the mutual inductance can be simplified as follows:
where
With this new objective function, for a pinout with N P/G pins, only two mutual inductance terms are calculated for each P/G pin, so the number of calculations grows linearly with N.
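The simplified PI objective can be sketched in a few lines of Python. The pin representation as `(x, y, kind)` tuples, the default value of `d_max`, and the rule that a P pin and a G pin carry opposite currents are assumptions of this sketch, not details taken from the paper.

```python
import math


def inductance_l(pins, d_max=100.0):
    """Sum a_ij * ln(d_max / d_ij) over the two nearest P/G neighbors of each pin.

    pins: list of (x, y, kind) tuples with kind 'P' or 'G' (hypothetical format).
    d_max: assumed constant, chosen larger than any pin-to-pin distance.
    """
    total = 0.0
    for i, (xi, yi, ki) in enumerate(pins):
        # Distance and kind of every other P/G pin, nearest first
        others = sorted(
            (math.hypot(xi - xj, yi - yj), kj)
            for j, (xj, yj, kj) in enumerate(pins)
            if j != i
        )
        for d, kj in others[:2]:  # keep only the two closest neighbors
            # Assumption: same kind means same current direction (+1),
            # different kind means opposite direction (-1)
            a = 1.0 if kj == ki else -1.0
            total += a * math.log(d_max / d)
    return total
```

Only two inductance terms are accumulated per pin, but this sketch still finds the neighbors by sorting all other pins; a spatial index would be needed to make the neighbor search itself scale linearly.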
C. Verification of the Objective Functions
In Fig. 5, the two objective functions are verified using the GA [13]. Two examples are presented. In Fig. 5(a), the pinout is optimized under the PI objective function L(A). The P/G pins are concentrated in a small area. The total mutual inductance is minimized, while most of the signal pins are far from the return P/G pins. Therefore, this pinout has good PI performance but poor SI performance. In Fig. 5(b), the pinout is optimized under the SI objective function U(A). The P/G pins are spread uniformly throughout the whole pinout plane without considering the alternating distribution of P and G pins. Therefore, this pinout has good SI performance but poor PI performance.
These two examples show the effectiveness of the two objective functions, respectively. The combination of the two objective functions is presented in Section III-C.
III. PIN ASSIGNMENT USING SIMULATED ANNEALING
The P/G pin assignment optimization is a combinatorial optimization problem. Traditional algorithms have difficulty reaching a solution to such problems within an acceptable time. Recently, probabilistic algorithms have found wide use in combinatorial optimization problems. SA [16], a typical probabilistic algorithm, is selected to solve the P/G pin assignment problem. SA converges asymptotically. Its model is simple, and it has a wide range of applications. SA has proved to be an effective method for solving complicated nonlinear optimization problems. In this section, both the basic flow of SA and the specific issues in pin assignment are discussed.
A. Basic Flow of Customized SA
The mathematical model of SA consists of three parts: the solution space, the objective function, and the initial solution. The solution space consists of all solutions of the problem being solved, possibly including some infeasible solutions. The infeasible solutions may have an objective function value, but they are meaningless in the practical problem. In Section III-B, a mechanism is introduced that ensures that all the new solutions in the iterations of SA are feasible. The basic flow diagram of the customized SA for BGA pin assignment is shown in Fig. 6. It can be implemented in six steps.
1) Initialization: determine the initial temperature $T_0$ and the final temperature $T_F$, and generate the initial solution $S$.
2) Generate a new solution $S'$ in the neighborhood of $S$.
3) Evaluate the objective function of $S'$ and compare it with that of $S$.
4) If $S'$ is better than $S$, replace $S$ with $S'$. Otherwise, compute the acceptance probability $r$ at the current temperature $T$; if $r > \mathrm{rand}$, replace $S$ with $S'$, where rand is a random number in [0, 1).
5) If the end-of-iteration condition is met, output the most recent solution $S$ as the optimal solution.
6) Lower the temperature $T$; if $T > T_F$, return to step 2)

$$T \leftarrow \alpha T$$

where $\alpha$ is between 0 and 1. During the iterations, SA moves not only in the improving direction but occasionally also in the worsening direction, which allows it to escape from local optima.
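The flow above can be sketched as a generic SA loop in Python. This is a minimal illustration that assumes the standard Metropolis acceptance rule and geometric cooling; the function name `anneal`, its parameters, their default values, and the choice to return the best solution seen (rather than the most recent one) are assumptions of the sketch.

```python
import math
import random


def anneal(initial, fitness, neighbor, t0=1.0, tf=1e-3, alpha=0.95,
           moves_per_t=50, seed=0):
    """Minimize fitness(s), starting from initial and cooling from t0 to tf."""
    rng = random.Random(seed)
    current, f_cur = initial, fitness(initial)
    best, f_best = current, f_cur
    t = t0
    while t > tf:
        for _ in range(moves_per_t):
            cand = neighbor(current, rng)   # new solution S' near S
            f_new = fitness(cand)
            delta = f_new - f_cur
            # Assumed Metropolis rule: always accept an improvement, accept a
            # worse move only if exp(-delta / t) exceeds a random number in [0, 1)
            if delta <= 0 or math.exp(-delta / t) > rng.random():
                current, f_cur = cand, f_new
                if f_cur < f_best:
                    best, f_best = current, f_cur
        t *= alpha                          # geometric cooling, 0 < alpha < 1
    return best, f_best
```

For example, `anneal(20, lambda x: x * x, lambda x, rng: x + rng.choice((-1, 1)))` walks an integer toward the minimum of x squared. Worse moves are accepted often at high temperature and rarely near $T_F$, which is what lets the search leave local optima early on.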
In the customized SA proposed in this paper, several adjustments are applied based on the traditional SA as follows.
B. Redefining of the Neighborhood
In [13], a mathematical expression of the P/G pinout called matrix coding is proposed. A P/G pinout is represented by a 2-D matrix containing 0, 1, and 2. In the customized GA proposed in [13], none of the GA operators changes the feasibility of any solution in the population.
In the SA method proposed in this paper, this mathematical expression of the P/G pinout is still used. Some adjustments are applied on the SA operators to maintain the feasibility of the solutions in each iteration.
In order to ensure the feasibility of the final solution, two conditions must be met: 1) the initial solutions are all feasible; and 2) the solutions in the neighborhood are all feasible. Because the solutions are generated by the same method as in [13], the initial solutions are all feasible, which means that condition 1) is met. As for condition 2), the neighborhood of a solution using matrix coding is redefined in the following.
As shown in Fig. 7, the neighborhood of the solution $S$ is redefined. It includes two kinds of neighboring solutions $S'$. The first kind is obtained by exchanging a P/G pin and a signal pin of $S$. The second kind is obtained by exchanging a P/G pin and another P/G pin of $S$. In this neighborhood, all possible solutions $S'$ are feasible. In this way, both the initialization and the generation of all new solutions in SA maintain feasibility.
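A swap-based move of this kind can be sketched as follows. The coding (0 for a signal pin, 1 for a P pin, 2 for a G pin) and the function name are assumptions; the sketch also restricts the second kind of move to a P pin and a G pin, since swapping two pins of the same kind would leave the pinout unchanged.

```python
import random


def neighbor(pinout, rng):
    """Return a copy of pinout with one P/G pin swapped with a different pin.

    Assumes the pinout holds at least one P/G pin and at least two distinct codes.
    """
    rows, cols = len(pinout), len(pinout[0])
    cells = [(r, c) for r in range(rows) for c in range(cols)]
    pg_cells = [(r, c) for r, c in cells if pinout[r][c] != 0]
    r1, c1 = rng.choice(pg_cells)
    # Partner is a signal pin (first kind) or a P/G pin of the other kind (second kind)
    partners = [(r, c) for r, c in cells if pinout[r][c] != pinout[r1][c1]]
    r2, c2 = rng.choice(partners)
    new = [row[:] for row in pinout]        # leave the input solution untouched
    new[r1][c1], new[r2][c2] = new[r2][c2], new[r1][c1]
    return new
```

Because a swap never creates or destroys a pin, the numbers of P, G, and signal pins are preserved, so every neighbor of a feasible solution is feasible.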
C. Multiobjective Functions Optimization
In Section II, two objective functions $L$ and $U$ are derived to describe the PI and SI quality of a P/G pin assignment. In our customized SA method, the fitness function of a solution $A$ is defined by the following weighted sum of the two objective function values:

$$f(A) = w\,L(A) + U(A) \tag{9}$$

where $w$ is a nonnegative weight for $L$, that is, $w \ge 0$.
This fitness function changes the two-objective optimization problem into a single-objective optimization problem. The direction of convergence in the customized SA is determined by the weight value w. The value of w should be set before the SA method begins, according to the type of P/G pinout to be assigned. For example, if the P/G pinout to be assigned focuses on the return path quality, the value of w should be set smaller than one. Then, U outweighs L, so the SA pays more attention to the return path quality during the iterations. If the P/G pinout to be assigned focuses on reducing the mutual inductance, the value of w should be set larger than one. Then, U weighs less than L, so the SA pays more attention to reducing the mutual inductance during the iterations. The influence of w on the optimization results is shown in Fig. 8.
In practical application, while generating a P/G pinout in the I/O part or differential pair (DP) part of a package, the value of w is usually set to 0.6-0.8. While generating a P/G pinout in the core part of a package, the value of w is usually set to 8-10.
D. Normalizing the Objective Functions
In fact, (9) cannot be applied directly because the values of L and U are not normalized. If the absolute value of L is much larger than that of U, then L still outweighs U in f(A) even if the value of w is set smaller than one.
In the GA and particle swarm optimization, normalization is usually based on relative values. Because several solutions exist at the same time in each iteration of the GA and particle swarm optimization, the relative value of one solution is determined by its absolute value and the absolute values of the best and worst solutions. However, in SA, there is only one solution at any time during the iteration. Therefore, determining the relative value is the key to normalization in SA.
For this reason, a pretreatment is proposed to determine the relative objective function values. Before the multiobjective SA starts, four single-objective SA optimizations are processed, as shown in Fig. 9 . These four single-objective optimizations figure out the upper limit and the lower limit of two objective functions. For different amounts of P/G pins, the upper limit and the lower limit of two objective functions are different. However, once the amount of P pins and G pins is determined, the upper limit and the lower limit of two objective functions can be determined. The relative objective function value and the fitness value of a solution A are defined as follows:
where $L_{\text{worst}}$ and $L_{\text{best}}$ are the upper limit and lower limit of $L$, respectively, and $U_{\text{worst}}$ and $U_{\text{best}}$ are the upper limit and lower limit of $U$, respectively. In this way, the objective functions are normalized. By setting $w$ to a suitable value, the two objective functions can lead the SA to the intended point of convergence.
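One plausible form of this normalization is min-max scaling between the limits found by the four single-objective runs. The Python sketch below is an assumption about that form, not a reproduction of the paper's equations; the dictionary keys, the function names, and the convention that the best value maps to 0 and the worst to 1 are all hypothetical.

```python
def relative(value, best, worst):
    """Scale value so that best maps to 0 and worst maps to 1 (assumed form)."""
    return (value - best) / (worst - best)


def fitness(l_val, u_val, limits, w=1.0):
    """Weighted sum of the relative L and U values; smaller is better.

    limits: dict with keys 'L_best', 'L_worst', 'U_best', 'U_worst', taken from
    the four single-objective SA runs done before the multiobjective run.
    """
    l_rel = relative(l_val, limits["L_best"], limits["L_worst"])
    u_rel = relative(u_val, limits["U_best"], limits["U_worst"])
    return w * l_rel + u_rel
```

With both terms confined to roughly the same range, a weight such as w = 0.7 or w = 9 shifts the balance between L and U in the way the weight is meant to, regardless of the raw magnitudes of the two objectives.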
IV. ACCELERATING STRATEGIES FOR SA PIN ASSIGNMENT
In Section III, the customized SA for P/G pin assignment is proposed and the details of its implementation are discussed. In this section, some strategies are applied to the proposed SA pin assignment method to improve its performance. These rules of thumb are not part of the basic SA flow; they are customized adjustments based on the P/G pin assignment problem.
A. Initial Solution Optimization
In general SA, the initial solution is usually generated randomly over the whole solution domain. Randomness is one of the guarantees of the robustness and potency of modern intelligent optimization algorithms. On top of the totally random initial solution generation of the traditional SA, shown in Fig. 10(a), some design rules from package pinout design are added in our customized SA.
According to the PI and SI requirements mentioned in Section II, the P pins and the G pins should be placed uniformly and alternately. As shown in Fig. 10(b), while generating the initial solution, the P/G pinout is divided into several small blocks of the same size. Then, the P pins and the G pins are distributed among the blocks as evenly as possible. Within each block, the P/G pins are assigned randomly. In this way, the initial solution not only preserves the randomness needed by SA but also incorporates the rules of P/G pinout design, which can improve the convergence efficiency of SA.
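The block-based initialization can be sketched in Python as follows. The block size, the round-robin dealing of pins to blocks, the coding (0 for a signal pin, 1 for a P pin, 2 for a G pin), and the function name are assumptions made for illustration.

```python
import random


def initial_solution(rows, cols, n_p, n_g, block=4, seed=0):
    """Spread n_p P pins (1) and n_g G pins (2) evenly over square blocks."""
    if n_p + n_g > rows * cols:
        raise ValueError("more P/G pins than pin positions")
    rng = random.Random(seed)
    grid = [[0] * cols for _ in range(rows)]
    # Free positions of every block, shuffled so placement inside a block is random
    blocks = []
    for r0 in range(0, rows, block):
        for c0 in range(0, cols, block):
            cells = [(r, c)
                     for r in range(r0, min(r0 + block, rows))
                     for c in range(c0, min(c0 + block, cols))]
            rng.shuffle(cells)
            blocks.append(cells)
    # Deal the pins to the blocks in turn so every block gets a nearly equal share
    turn = 0
    for code, count in ((1, n_p), (2, n_g)):
        placed = 0
        while placed < count:
            cells = blocks[turn % len(blocks)]
            turn += 1
            if cells:  # skip blocks that are already full
                r, c = cells.pop()
                grid[r][c] = code
                placed += 1
    return grid
```

The result is random inside each block but balanced across blocks, so the SA starts from a pinout that already respects the uniformity rule.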
B. Using Static Template for Large-Scale Pin Assignment
When assigning P/G pins for an extremely large-scale package, SA takes considerable time to find the optimal solution. In this case, the idea of divide and conquer is applied to the problem. The large-scale pin plane is divided into several smaller pin blocks. In this way, the exponential explosion of the combinatorial problem is kept under control. In [14], a fast P/G pin assignment method using a static template was proposed for large-scale high-pin-count packages.
In the static template method, the large-scale pin assignment problem is replaced by a small-size pin assignment problem using a constant template. The P/G pin assignment in this template is a reproducible P/G pinout generated based on the total P/G pin ratio. By applying the static template to the SA method proposed in this paper, the efficiency of large-scale pin assignment can be improved significantly. The P/G pin assignments for four large-scale packages with different numbers of P and G pins are shown in Fig. 11. The time consumption and the sizes of the static templates are also illustrated. Large uniform pinouts with a size larger than 40 × 40 are generated within a few minutes.
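The idea of reusing a small optimized template over a large pin plane can be illustrated with a simple tiling sketch. The details of the template method in [14] are not given here, so plain periodic repetition, the function name, and the matrix coding (0 for a signal pin, 1 for a P pin, 2 for a G pin) are assumptions.

```python
def tile_template(template, rows, cols):
    """Repeat a small template periodically to fill a rows x cols pinout."""
    t_rows, t_cols = len(template), len(template[0])
    return [
        [template[r % t_rows][c % t_cols] for c in range(cols)]
        for r in range(rows)
    ]
```

Under this assumption, the SA only has to optimize the small template, and the cost of producing the full pinout grows only with the number of pin positions.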
V. EXPERIMENTAL RESULTS
In this section, two practical examples of package P/G pin assignment using SA are introduced in detail. Two recent high-performance high-pin-count package products, the Xilinx XCVU440-FLGA2892 and the Altera Stratix V GX 5SGXBB, are used as references.
A. Xilinx FPGA XCVU440-FLGA2892
The P/G pin assignment provided by Xilinx is shown in Fig. 12(a). This pinout is partitioned into three blocks in Fig. 12: the I/O block for I/O communication with strong driver power is boxed by a green line at the borders, the core block for low-voltage core power delivery is boxed by a yellow line in the center, and the DP block for high-speed serial communication is boxed by a blue line on the left. Because these blocks serve different purposes, their power-ground-signal pin ratios ($P_0/G_0/S_0$) are different. Therefore, the three blocks are generated independently by our SA method. After all three blocks have been assigned, they are combined into the final integrated pinout. The parameters of the SA optimization method differ from block to block. In the I/O block, most pins are signal pins for communication, so only a few P/G pins supply the I/O power. For the I/O block, $P_0/G_0/S_0$ is 1:1:8. In the core block, there are no signal pins. Equal numbers of P pins and G pins provide the power supply for the central control unit. Therefore, for the core block, $P_0/G_0/S_0$ is 1:1:0. In the DP block, half of the pins are signal pins, assigned next to one another for DP signals. The other half are P pins and G pins for reference and shielding. Therefore, for the DP block, $P_0/G_0/S_0$ is 1:1:2. An electromagnetic shield is built manually by adding a row of G pins between the I/O block and the DP block. The final P/G pinout from our SA method is shown in Fig. 12(b). All parameters of the SA optimization and the running times are listed in Table I.
A color map is a visual assessment method for evaluating the SI and PI performance of a P/G pin assignment [14]. An SI color map is used to estimate the SI performance of a P/G pinout. The yellow area represents the P/G pins themselves or the S pins that are close to P/G pins (return path). The orange area represents the S pins that are far away from P/G pins. The deeper the color, the worse the SI performance of the area, as shown in Fig. 13(a) and (b). A PI color map is used to estimate the PI performance (power distribution network quality) of a P/G pinout. The green area is the neutral mutual inductance area of the pinout, standing for S pins. The cooler area represents the P/G pins with negative mutual inductance (good for PI), while the warmer area represents the P/G pins with positive mutual inductance (bad for PI), as shown in Fig. 13(c) and (d).
Comparing the color maps of the Xilinx pinout with those of the SA result shows that the two pin assignments have almost the same SI and PI performance. For both the SI and PI color maps, the shape, the luminance, and the saturation of the two pinouts are highly similar.
B. Altera FPGA Stratix V GX 5SGXBB
The second example is the Altera FPGA Stratix V GX 5SGXBB; the P/G pin assignment provided by Altera is shown in Fig. 14(a). This pinout is partitioned into four blocks in Fig. 14: the I/O block in the green box, the core block in the yellow box in the center, and two DP blocks in blue boxes on the left and right. In the SA method, these four blocks are also assigned independently. The major difference between the Xilinx example and the Altera example is the $P_0/G_0/S_0$ of each block. For the I/O block, $P_0/G_0/S_0$ is 1:1.85:8.25. For the core block, $P_0/G_0/S_0$ is 1:1:0. For the outer part of the DP blocks, $P_0/G_0/S_0$ is 0:1:1. For the inner part of the DP blocks, $P_0/G_0/S_0$ is 1:1:2. The integrated pinout using the SA method is shown in Fig. 14(b). All parameters of the SA optimization and the running time are listed in Table II. Comparing the color maps of the Altera pinout shown in Fig. 15(a) and (b) with those of the SA method shown in Fig. 15(c) and (d) reveals some differences.
1) For the I/O block, the SI color map of the SA pinout is lighter overall than that of the Altera pinout, which means that the SI performance of the SA pinout is better.
2) For the I/O block, the PI color map of the SA pinout is slightly more neutral: the warm area is warmer and the cool area is cooler in the PI map of the Altera pinout. In general, the PI performance of the two pinouts is comparable.
3) For the core block and the DP block, irregular discontinuities are found in both the SI color map and the PI color map of the Altera pinout. However, these irregular discontinuities do not exist in the color maps of the SA pinout.
The reason for the third difference is some irregular assignment of P pins in the inner part of the DP block and some irregular assignment of do-not-use pins in the core block of the Altera pinout, which can be seen in Fig. 14(a). This kind of irregular assignment cannot be produced by the SA method because no objective function describes it.
In order to obtain a result more similar to the Altera pinout, some manual adjustments are applied to the SA result: 1) some P pins are added to the inner part of the two DP blocks; and 2) some P/G pins in the core block are removed. This manual adjustment is very easy to apply to the pinout of the SA method. It takes only a few seconds, so its time consumption is negligible. The pinout after the manual adjustments is shown in Fig. 14(c). Its color maps are shown in Fig. 15(e) and (f). According to the color maps, the SI and PI performance of the SA pinout in the DP blocks and the core block is almost the same as that of the Altera pinout. In the I/O block, the SI and PI performance of the SA pinout is even better than that of the Altera pinout.
VI. CONCLUSION
In this paper, a P/G pin assignment method using SA for a large-scale high-pin-count package is proposed. Two new objective functions describing the PI and SI of power and ground pinouts are introduced. The SA algorithm is customized to meet the needs of a pin assignment problem. The special issues of SA in the BGA pin assignment are discussed in detail, and some accelerating strategies are introduced for fast optimization. Using the static template, the SA method can generate a large-scale P/G pinout with any $P_0/G_0/S_0$ ratio in a few minutes.
