Abstract: A unified optimization framework is presented for simultaneous gate sizing and placement. The two processes are unified using Lagrangian multipliers, which synchronize the efforts of the gate sizing and placement subproblems. As far as we know, this is the first work that formulates and solves the simultaneous gate sizing and placement problem under area density constraints, which are handled by the quadratic penalty method. We show that this rigorous framework results in an algorithm that is faster than separate iterations of gate sizing and placement steps, and leads to more robust results for a set of benchmarks.
I. INTRODUCTION
The success of Moore's Law in the past four decades has resulted in designs with many millions of transistors, if not billions. This increase in the density of the transistors has also resulted in a tremendous increase in the power of each design. While the number of transistors doubles every 18 months, the power that each transistor consumes decreases at a much slower pace.
From a design standpoint, this presents a problem. The increased power drives the need for increasingly effective algorithms, while the increased size makes these problems more complex. Nonetheless, there has been good progress in physical design over the past decade, with multilevel paradigms in circuit placement [3] and with Lagrangian relaxation methods in circuit sizing [5]. These algorithms scale well with the size of the problem and facilitate large-scale design.
This leads to a natural question: can circuit sizing and circuit placement be unified? There are heuristic techniques that can be used, but the problem may be too difficult to obtain an exact solution. It is well-known that the general placement problem is NP-hard [12]. In addition, the gate-level delay function, with placement and gate sizes as variables, is nonconvex [6]. These difficulties create challenges for simultaneous sizing and placement.
Delay minimization using simultaneous sizing and placement has been shown to be effective. In [6] the authors begin with the simultaneous optimization problem, and reduce the problem into smaller iterations of placement and sizing for the worst paths in the circuit. They show that this results in an improvement of 15% over sizing alone for a 0.35-µm process. In [7] the authors formulate an exact algorithm for the delay optimal mapping of trees, and extend the results to general circuit topologies using Lagrangian multipliers. Their results show that this performs better than an ad hoc net-weighting methodology, but overlap or area density constraints are not considered.
The authors of [8] consider the power minimization problem. The algorithm first uses iterations of placement and sizing to improve the slack vs. power tradeoff. The slacks are then used by a $V_{th}$ assignment algorithm to minimize the leakage power of the design. They show significant improvement for a 65 nm process.
In this paper we present a new unified optimization method for simultaneous sizing and placement. This algorithm is interesting from both an algorithmic and mathematical point of view for several reasons: Lagrangian multipliers are used to join the placement and sizing problems, acting as both net weights in the placement, and delay weightings in the sizing. The overlap constraints are handled using a single penalty factor. In practice, our algorithm is not slower than a single round of sizing and placement, thus allowing the solution of large-scale problems.
Our algorithm is distinct from previously referenced algorithms because it features all of the following:
• Area density constraints
• Power minimization
• Unified framework for sizing and placement
The contributions of this paper are:
• Area-density-aware simultaneous placement and sizing
• Scalable methodology for handling area density, timing, and power
• Results that compare separate iterations of sizing and Lagrangian multiplier-based, timing-driven placement with our unified optimization method
The remainder of this paper is organized as follows. In Section II we provide an example of the unified sizing and placement optimization, which motivates us to design a different approach. In Section III we describe the notations and mathematical formulation of the simultaneous optimization problem. In Section IV we present our unified optimization framework, which uses Lagrangian multipliers to provide a consistent objective for the placement and sizing subproblems. In Section V experimental results are presented to show the runtime and solution quality of our algorithm.
II. A MOTIVATING EXAMPLE
A typical physical synthesis flow includes iterations of separate placement and gate sizing steps. Based on a wirelength-driven placement, iterations of density-aware gate sizing and timing-driven placement are performed to meet the timing constraints and minimize power consumption.
In Figure 1 we show the optimization results of iterations of gate sizing and placement on the circuit c1355 from the ISCAS-85 benchmarks [13]. The sum of wire and gate capacitances is used as an indirect measure of power consumption. The actual power value can be viewed as a weighted sum of the capacitances, and does not affect the following analysis in terms of optimization. In Figure 1 we can see that the total capacitance continues to decrease after several iterations. However, the iterations of full sizing and placement are time-consuming, and furthermore, each step of the separate sizing and placement ignores the objective of the other and slows down the convergence. Therefore, we develop a unified optimization for the simultaneous sizing and placement problem, which is faster and has a stronger mathematical foundation.
III. MATHEMATICAL FORMULATION
The power-driven simultaneous gate sizing and placement optimization problem can be written as

$$\begin{aligned} \min_{x,\,s} \quad & P_{\text{wire}}(x) + P_{\text{gate}}(s) \\ \text{subject to} \quad & D(x, s) \le T, \\ & A(x, s) \le C, \end{aligned} \tag{1}$$

where $x$ is the vector of placement variables, and $s$ is the vector of gate sizing variables. The box constraints, i.e., the fixed-outline placement constraints on $x$ and the upper and lower bounds on $s$, are omitted for simplicity. The terms

$$P_{\text{wire}}(x) = \sum_i \alpha_i \, \mathrm{WL}_i(x), \qquad P_{\text{gate}}(s) = \sum_i \beta_i s_i$$

are the weighted capacitances of wires and gates, where $\mathrm{WL}_i(x)$ is the wirelength of net $e_i$. This is a good proxy for the total power, as total capacitance is related to the power dissipation of the design. In the above, $\alpha_i$ is set as the per-unit-length capacitance of net $e_i$, and $\beta_i$ as the capacitance of gate $v_i$ at minimum size. Improved models can also be considered by adding information about the switching activity and the leakage power. In this paper we shall assume that these parameters are given and focus on solving problem (1). The nonoverlapping constraints are usually formulated as area density constraints $A(x, s) \le C$ in analytical placers [9].
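The objective of problem (1), a weighted sum of wire and gate capacitances, can be illustrated with a minimal sketch. The function name and argument layout are hypothetical, and the sketch assumes that wire capacitance scales linearly with net wirelength and gate capacitance scales linearly with gate size; the paper's exact model may differ.

```python
from typing import Sequence


def total_capacitance(
    wirelengths: Sequence[float],
    sizes: Sequence[float],
    alpha: Sequence[float],
    beta: Sequence[float],
) -> float:
    """Weighted wire capacitance plus weighted gate capacitance.

    alpha[i]: per-unit-length capacitance of net i (assumed linear model).
    beta[i]:  capacitance of gate i at minimum size (assumed linear in size).
    """
    if len(wirelengths) != len(alpha) or len(sizes) != len(beta):
        raise ValueError("each net needs an alpha and each gate needs a beta")
    wire = sum(a * wl for a, wl in zip(alpha, wirelengths))
    gate = sum(b * s for b, s in zip(beta, sizes))
    return wire + gate


# Two nets and three gates with made-up values.
example = total_capacitance(
    wirelengths=[10.0, 4.0],
    sizes=[1.0, 2.0, 1.5],
    alpha=[0.2, 0.2],
    beta=[1.0, 1.0, 2.0],
)
```

Here the wire term contributes 2.8 and the gate term 6.0, so upsizing a gate or lengthening a net both raise the proxy that the optimization tries to reduce.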
The constraint $D(x, s) \le T$ is a shorthand notation for the delay constraints. Given a circuit where the set of primary inputs is PI, the set of primary outputs is PO, and the set of fan-ins of node $j$ is FI($j$), the delay constraints can be written using the dynamic-programming formulation (2) over per-node variables $t$.
The minimum values of $t$ that satisfy the conditions in (2) are the arrival times at each node.
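The statement that the minimal feasible $t$ equals the arrival times can be illustrated with a short sketch. It assumes constraints of the common form $t_j \ge t_i + d_j$ for every $i \in \mathrm{FI}(j)$, with a single lumped delay $d_j$ charged at node $j$; this form, the function name, and the example netlist are assumptions for illustration, not the paper's exact equation (2).

```python
from collections import deque
from typing import Dict, List


def arrival_times(fanins: Dict[str, List[str]], delay: Dict[str, float]) -> Dict[str, float]:
    """Return the smallest t with t[j] >= t[i] + delay[j] for all i in FI(j).

    Every node must appear as a key of `fanins`; primary inputs map to [].
    Assumption: delay[j] lumps the gate and wire delay charged at node j.
    """
    fanouts: Dict[str, List[str]] = {j: [] for j in fanins}
    pending = {j: len(fi) for j, fi in fanins.items()}
    for j, fi in fanins.items():
        for i in fi:
            fanouts[i].append(j)

    # Primary inputs start at their own delay; other nodes are filled in below.
    t = {j: (delay[j] if not fi else float("-inf")) for j, fi in fanins.items()}
    ready = deque(j for j, n in pending.items() if n == 0)
    visited = 0
    while ready:  # Kahn's algorithm: visit nodes in topological order
        i = ready.popleft()
        visited += 1
        for j in fanouts[i]:
            t[j] = max(t[j], t[i] + delay[j])
            pending[j] -= 1
            if pending[j] == 0:
                ready.append(j)
    if visited != len(fanins):
        raise ValueError("the netlist graph contains a cycle")
    return t


fanins = {"a": [], "b": [], "g1": ["a", "b"], "g2": ["a"], "out": ["g1", "g2"]}
delay = {"a": 0.0, "b": 0.0, "g1": 2.0, "g2": 1.0, "out": 3.0}
t = arrival_times(fanins, delay)
T = 6.0
meets_timing = t["out"] <= T  # "out" is the only primary output here
```

Because each node takes the maximum over its fan-ins, any smaller value of $t$ would violate at least one constraint, which is why the minimal solution coincides with the arrival times.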
Posynomial models are used to model the size vs. delay relations, and the Elmore delays are used for wire delay [11]. In other words, we assume that the gate delay is given as a posynomial function whose coefficients $\alpha_i$, $\beta_k$ are fitted from the data. For simplicity, we use the half-perimeter wirelength (HPWL), i.e., half the perimeter of the bounding box of a net's pins, as the wirelength of each net.
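The half-perimeter wirelength used above can be computed as in the following minimal sketch. The function name and the representation of a net as a list of (x, y) pin locations are assumptions made for illustration.

```python
from typing import Iterable, Tuple


def hpwl(pins: Iterable[Tuple[float, float]]) -> float:
    """Half-perimeter of the bounding box of one net's pin locations."""
    points = list(pins)
    if len(points) < 2:
        return 0.0  # a net with fewer than two pins has no wirelength
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return (max(xs) - min(xs)) + (max(ys) - min(ys))


net = [(0.0, 0.0), (3.0, 1.0), (1.0, 4.0)]
length = hpwl(net)  # bounding box is 3 wide and 4 tall
```

Only the extreme pins in each direction affect the result, so moving an interior pin within the bounding box leaves the wirelength unchanged.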
Problem (1) does not have a convex formulation: it is not jointly convex in $x$ and $s$. However, the delay function is convex in $s$ for a fixed $x$ and is convex in $x$ for a fixed $s$.
IV. OPTIMIZATION FRAMEWORK
There are three levels to the optimization process. At the highest level, the density penalty $\mu$ is adjusted to find a density-feasible solution. The next level down updates Lagrangian multipliers $\lambda$ to find a power-minimized solution that meets the delay constraints. At the lowest level, placement and sizing subproblems are solved efficiently for a given set of Lagrangian multipliers $\lambda$.
A. Quadratic Penalty for Density Constraints
Density and overflow are major considerations in problem (1) and have been an active research topic in the past decade [9] . In our work the density and overflow constraints are handled using an analytical framework [3] with an area penalty.
The first step is to convert the area density inequality constraints in problem (1) to equality constraints by adding filler cells [4], so that $A(x, s) \le C$ becomes $A(x, s) = C$.
The area density equality constraint is achieved by penalizing the objective function with a quadratic penalty term [10]: the penalized objective adds to the objective of problem (1) a term that grows with the penalty factor $\mu$ and with the square of the violation $A(x, s) - C$.
$Q(\mu)$ is computed for different values of $\mu$. In theory, the area equality constraint will be satisfied as $\mu$ increases. This process is outlined in Algorithm 1, which solves problem (5). It is required that the penalty update factor satisfy $\gamma > 1$, and we set $\gamma$ to 1.1 in the implementation.
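The outer penalty loop described above can be sketched on a toy problem. This is a generic quadratic penalty iteration, not the paper's Algorithm 1: the function names, the stopping tolerance, the $\mu/2$ scaling of the penalty term, and the one-dimensional example are all assumptions. Only the update factor of 1.1 comes from the text.

```python
from typing import Callable, Tuple


def quadratic_penalty(
    solve_inner: Callable[[float, float], float],
    violation: Callable[[float], float],
    mu0: float,
    start: float,
    gamma: float = 1.1,
    tol: float = 1e-3,
    max_outer: int = 500,
) -> Tuple[float, float]:
    """Increase mu by a factor gamma until the equality constraint holds to tol."""
    if mu0 <= 0.0 or gamma <= 1.0:
        raise ValueError("need mu0 > 0 and gamma > 1")
    mu, point = mu0, start
    for _ in range(max_outer):
        point = solve_inner(mu, point)  # warm start from the previous minimizer
        if abs(violation(point)) <= tol:
            break
        mu *= gamma
    return point, mu


# Toy problem: minimize x**2 subject to x == 1.
# The penalized objective x**2 + (mu / 2) * (x - 1)**2 has the closed-form
# minimizer x = mu / (2 + mu), so the inner solve ignores its start point.
x_final, mu_final = quadratic_penalty(
    solve_inner=lambda mu, _start: mu / (2.0 + mu),
    violation=lambda x: x - 1.0,
    mu0=1.0,
    start=0.0,
)
```

In the toy problem the violation equals $2 / (2 + \mu)$, so it shrinks only as $\mu$ grows. A small factor such as 1.1 raises $\mu$ gradually, which keeps consecutive subproblems close to each other and makes the warm start useful.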
B. Lagrangian Method for Delay Constraints
Algorithm 1 requires the computation of $Q(\mu)$, which solves the delay-constrained problem of minimizing the penalized objective subject to $D(x, s) \le T$.
We can handle the delay constraints by using the Lagrangian method [2], where the Lagrangian function adds the delay constraints to the penalized objective, weighted by dual multipliers $\lambda$.
It was pointed out in [5] that $\lambda$ should satisfy the flow (dual feasible) condition as in equation (9); otherwise the Lagrangian function $L(x, s, t; \lambda)$ is unbounded below in the arrival-time variables $t$.
The auxiliary variables for delay constraints are eliminated when $\lambda \in \Omega$. The Lagrangian function is then simplified as in equation (10), and the Lagrangian method is listed in Algorithm 2. We use a sequence of step sizes to update the multipliers, and $\varepsilon$ is a user-defined stopping criterion. In the implementation we set $\varepsilon$ to 0.001.
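The multiplier update can be illustrated with a generic projected subgradient iteration on a one-gate toy problem. This is not the paper's Algorithm 2: it has a single delay constraint and therefore no flow condition to enforce. The delay model $d(s) = 1/s$, the step-size rule, the stopping test, and all names are assumptions; only $\varepsilon = 0.001$ is taken from the text.

```python
import math


def dual_ascent_one_gate(
    T: float = 0.5,
    s_min: float = 0.1,
    s_max: float = 10.0,
    eps: float = 1e-3,
    rho0: float = 8.0,
    max_iter: int = 10000,
):
    """Minimize size s subject to delay 1/s <= T by subgradient ascent on lam."""
    lam = 1.0
    s = s_min
    for k in range(1, max_iter + 1):
        # Inner problem: minimize s + lam * (1/s - T) over [s_min, s_max].
        # The unconstrained minimizer is sqrt(lam); clamp it to the size bounds.
        s = min(max(math.sqrt(lam), s_min), s_max)
        violation = 1.0 / s - T  # positive means the delay constraint is violated
        # Stop when nearly feasible and complementary slackness nearly holds.
        if violation <= eps and lam * abs(violation) <= eps:
            break
        # Diminishing step along the subgradient, then projection onto lam >= 0.
        lam = max(0.0, lam + (rho0 / math.sqrt(k)) * violation)
    return s, lam


s_opt, lam_opt = dual_ascent_one_gate()
```

For $T = 0.5$ the constrained optimum is $s = 2$ with multiplier $\lambda = 4$. The multiplier acts as a price on delay: it rises while the constraint is violated, which pushes the inner problem toward a larger gate, and falls once there is slack.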
V. EXPERIMENTAL RESULTS
Our unified optimization method was tested on the ISCAS-85 benchmarks [13]. They are synthesized to a 45 nm library [1], and gate delay and power for the continuous sizes are modeled with a least-squares fit. The placement region is set to be a square with 20% whitespace, and the timing constraint is set to 0.90× the delay after one iteration of wirelength-driven placement and a subsequent gate sizing for delay minimization. The benchmarks range from hundreds to thousands of gates and were run on a quad-core Xeon 2 GHz machine with 2 GB of memory. The statistics of the synthesized netlists are shown in the first three columns of Table 1.

For comparison, the separate Lagrangian multiplier-based gate sizing and timing-driven placement algorithms are implemented in a way that is similar to the unified optimization algorithm, i.e., by solving only the sizing problem (Problem (1) with $x$ fixed) or the placement problem (Problem (1) with $s$ fixed). Separate iterations of the gate-sizing algorithm, followed by timing-driven placement, were performed. In other words, a sizing is run to minimize power with timing constraints, and then a placement is run to minimize power with timing constraints. The power is measured by the total gate capacitances and wire capacitances. This is used to compare the benefits of using a unified formulation to those of a separate formulation.

[Algorithm listing: Given an initial value and a starting point; find a minimizer of the objective, starting at the given point (using Algorithm 3).]
Experimental results are shown in Table 2 for the unified optimization and iterations of separate optimizations. The total capacitance, which is the sum of the gate capacitance and wire capacitance, is listed as well as the final slack. The runtime comparison of the unified optimization and the separate iterations is shown in Table 1.
Our unified optimization method can meet almost all timing constraints (9 out of 10) with 2% lower power than the separate iterations. Moreover, the runtime of the unified optimization is shorter than a single iteration of the sizing and placement (40% shorter on average), which demonstrates a clear advantage of using the unified optimization method. The runtime benefit comes from the shared Lagrangian multipliers that enable a global view of the joint sizing and placement problem, such that the sizing and the placement subproblems are optimizing a consistent objective function during the timing satisfaction process.
VI. CONCLUSIONS AND FUTURE WORK
We presented an algorithm that joins the gate sizing and cell placement methodologies into a unified framework. This method uses a penalty to manage the density and overlap constraints, and uses Lagrangian multipliers to manage the timing constraints. The algorithm performs well, with large runtime benefits.
Future work will include applying the unified optimization methodology to the simultaneous detailed placement and discrete sizing problem. Studies will be done to apply hierarchical methods to update the multipliers for large-scale designs. 
