Abstract: A flow is presented for the automatic synthesis of an analog circuit layout based on a schematic and a list of circuit design parameter values. The flow is driven by design, placement, and routing constraints; no layout template is necessary. Every possible layout for each device in the circuit is investigated; the layouts with the best geometric features and smallest quantization error (due to manufacturing grid alignment) are kept. For circuit placement, a complete enumeration of possible circuit placements, limited only by usual constraints of symmetry, proximity, and common centroid, is performed. Out of this enumeration a final circuit placement is selected and routed. The new flow is integrated with a deterministic nonlinear optimization algorithm to perform layout-driven circuit sizing; layouts are synthesized during both gradient approximation and next step determination. Layout-driven circuit sizing was applied to two example circuits. Sizing of the first circuit example took 8x the amount of CPU time needed for traditional circuit sizing, but remained feasible at 2.1 h of wall clock time on a contemporary workstation.
I. Introduction
The analog circuit design flow consists of three steps: topology selection, circuit sizing, and layout synthesis. Post-layout electrical verification is also necessary to ensure that the layout meets the specifications placed on performances, such as gain and bandwidth.
Deterministic optimization algorithms combined with numerical simulation have been successfully employed to automate circuit sizing. In the algorithm of [1] , design parameters are mapped to performances using numerical simulation (SPICE), and a cost function based on performance specifications is minimized; the algorithm terminates successfully when all specifications are met. This can occur before a local minimum of the cost function is reached. Process parameters are also considered so as to meet yield objectives. A commercial implementation is available [2] . In [3] , the Pareto-optimal tradeoff between performances was calculated, while in [4] , a charge pump PLL was sized to meet specifications. The advantage of numerical simulation over methods that reduce circuit complexity using knowledge based equations, function fitting, or modeling is in the accuracy of calculated results. The disadvantage lies in the greater cost of numerical simulation. It is imperative to take into account the effect of layout synthesis in order to obtain a useful solution to the circuit sizing problem. Circuit performance values may change significantly due to layout-induced parasitic components. Layout area is difficult to estimate prior to layout synthesis without a large margin of error. A specification set on a performance or on area could become a critical term in the optimizer cost function, and the result of optimization could become sub-par post layout synthesis, or fail electrical verification.
Systematic and intra-die random process parameters that depend on device placement, such as the distance between symmetric devices in Pelgrom's law [5] , process gradients, and anisotropic effects cannot be accounted for accurately prelayout. This can be mitigated by the use of geometric design constraints, as well as the recognition of circuit structures affected by local mismatch and the employment of knowledge based heuristics during layout [6] . These heuristics can be formulated into placement and routing geometric constraints that can drive an automated layout flow.
The concerns described above shall be addressed in this paper by integrating layout synthesis with a deterministic algorithm in a layout-driven solution to the circuit sizing problem, as illustrated in Fig. 1 .
Several layout-driven circuit sizing methods, as well as automatic placement and routing methods, can be found in the state-of-the-art literature; they are reviewed in Section II. The novelty in the solution of this paper is listed below, and detailed in Section IV.
1) A deterministic nonlinear optimization algorithm is used. The number of iterations is small, typically under 10.
2) Multiple layouts are synthesized in each iteration. The number of layouts is proportional to the number of design parameters.
3) Each layout is synthesized from scratch by a layout synthesis procedure driven by a set of design, placement, and routing constraints. No layout template is necessary.
4) DC electrical constraints are employed to ensure correct CMOS operating region and device matching. The effect of routing on constraint satisfaction is also considered.
The proposed layout synthesis procedure is distinguished from template-based methods in the following respects.
1) For each device in the circuit, a multi-valued mapping between device design parameters (such as CMOS transistor width and length) and possible device layouts is performed. Only layouts that satisfy certain geometric and quantization error constraints are considered as possibilities for placement. This is described in Section VI.
2) For devices in common centroid configuration, the number of divisions and the interleave pattern are selected during layout synthesis for an optimal layout, as described in Section VII. Traditionally, the number of divisions is fixed at the schematic level.
3) For circuit placement, a complete enumeration of possible circuit placements, limited only by usual constraints of symmetry, proximity, and common centroid, is performed. This is described in Section VIII. In the state of the art, devices have a set location in the layout template, typically fixed by a single slicing or B* tree.
4) Out of this enumeration a final circuit placement is selected based on considerations of area, aspect ratio, electrical performance, and routing congestion. There are more criteria in selecting the final placement than in the state-of-the-art template methods.
It will be shown empirically by the examples of Section XII that the new sizing flow is feasible, produces more usable results than topology level sizing, and is a tenable alternative to template-based methods as used in the state of the art. Conclusions are drawn in Section XIII.
II. State of the Art
The state of the art in layout-driven (or layout-aware) circuit sizing is based on the use of layout templates [7] - [9] . A template specifies the spatial relation between circuit devices, as well as fixed interconnect paths for routing. The template is created for each new circuit example. In [7] , a template defined by a slicing tree is used to estimate circuit area and layout parasitics. Interconnect parasitic estimates are stored in a lookup table associated with the template, while analytical-geometric techniques are used to extract the parasitics of placed devices. A simulated annealing algorithm is used for circuit sizing, requiring several thousand iterations for convergence in the given circuit examples.
Other methods, such as [10] and [11] , are aimed at process migration, or performance retargeting. An existing circuit layout is used as a template, device dimensions are modified, and interconnects are shrunk or extended to meet design rules.
For process migration, the layout design rules may change, prohibiting a direct downscaling of a template. As an example, in some 65 and 45 nm technologies, transistor gates must be aligned on a grid and all gates must share the same orientation. It may also be difficult to avoid new routing conflicts or an increase in routing congestion with downscaling.
For performance retargeting, the aspect ratio of circuit devices may become extreme if the device parameters, such as CMOS width, change by a significant amount. This was solved in the references by the addition of geometric constraints on template devices. However, due to the fixed spatial relation between template devices, these geometric constraints must be severe. This, coupled with other template limitations (for instance, the number of CMOS fingers is fixed), will decrease the size of the optimization search space.
Some non-template-based layout-driven sizing methods exist, but rely on simplifying approximations for performance evaluation, layout construction, and the modeling of layout parasitic devices in order to perform expeditious circuit sizing. In [12], a linear regression model of the performances is used. The design space is sampled and a layout netlist is generated for each sample to define model parameters. The Pareto tradeoff between performances is then explored using a multiobjective simulated annealing algorithm. Layout parasitic devices are only roughly approximated, while geometric constraints and matching are not considered. In [13], performance sensitivity to node capacitance and device mismatch is used to direct placement using an algorithm based on slicing trees. Different shapes are considered for each device. A custom fast circuit simulator is used; however, only DC and AC performance sensitivities can be calculated.
Several constraint-driven placement and routing algorithms can be found in [14] - [17] and [18] , [19] , respectively. In [20] , the circuit graph is subdivided into hierarchical proximity and symmetry groups and layout constraints are automatically generated. The tool of [17] was then used for the placement generation of several example circuits. The results of postlayout simulation indicate that they were of high quality. These algorithms can be employed to construct a design flow to automatically synthesize compact layouts for a circuit based on a set of user defined or structurally recognized placement and routing constraints, as is described in Section IV.
III. Notation and Problem Definition
To help with the remainder of this paper, the circuit sizing problem is formulated. Let $x_d$ be the ordered vector of circuit design parameters, such as transistor widths and lengths; $|x_d|$ is the size of $x_d$. Let $c$ be the vector of DC electrical constraints, as addressed in [21]; the value of $c$ depends on the circuit DC bias point for a given value of $x_d$
$$x_d \mapsto c(x_d). \tag{1}$$
Let $f$ be the vector of circuit performances, such as power, gain, and bandwidth
$$x_d \mapsto f(x_d). \tag{2}$$
Starting with an initial value, $x_d = x_{d\text{-}initial}$, the goal of circuit sizing is to find a vector $x_d = x_{d\text{-}final}$ for which the corresponding values of $c$ and $f$ satisfy
$$c(x_d) \ge 0 \;\wedge\; f_L \le f(x_d) \le f_U. \tag{3}$$
Vectors $f_L$ and $f_U$ denote the lower and upper specifications.
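The termination condition of the sizing problem can be illustrated with a short check. This is only a sketch: the function name `specs_met` and the use of plain Python lists are assumptions made for illustration, and one-sided specifications are represented by infinite bounds.

```python
from typing import Sequence


def specs_met(
    c: Sequence[float],
    f: Sequence[float],
    f_lower: Sequence[float],
    f_upper: Sequence[float],
) -> bool:
    """Return True when c >= 0 and f_L <= f <= f_U hold component-wise."""
    constraints_ok = all(c_i >= 0.0 for c_i in c)
    performances_ok = all(
        lo <= f_i <= hi for lo, f_i, hi in zip(f_lower, f, f_upper)
    )
    return constraints_ok and performances_ok
```

A performance with only a lower specification (for example, gain) would use `float("inf")` as its upper bound.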
IV. Contribution of This Paper
The major focus of this paper is the development of an efficient layout-driven circuit sizing procedure using constraint-based layout synthesis and a deterministic optimization algorithm. As will be seen, the layout synthesis procedure has more degrees of freedom than template-based flows in placement selection, and will minimize the quantization error due to manufacturing grid alignment. The advantage over existing non-template flows is that the method here uses numerical (SPICE) simulation for performance evaluation, and is not limited to DC and AC simulation. In addition, layout parasitic capacitance is extracted by an integral equation field solver with no modeling. The effect of routing resistance on DC constraints and circuit DC bias is also considered, as is routing congestion. The placement algorithm used also considers the widest range of placement arrangements using (non-slicing) B*-trees. The deterministic algorithm detailed in [1] and [2] is used for circuit sizing. The algorithm is thoroughly explained in the references; however, the basic steps and cost are shown in Fig. 2 and briefly explained in Section V.
A flow is presented for automatic layout synthesis in Fig. 3 . The proposed flow considers all layout possibilities for each individual device. After enumerating all device layouts, all possible circuit placements are considered by a placement algorithm, and an optimal placement is selected based on aspect ratio, area, electrical performance, and routing congestion. The design flow steps are detailed in Sections VI through X.
V. Circuit Sizing Algorithm
In the first stage of each iteration, a scalar minimization subproblem is formulated based on a linearized approximation of f and c. In the second stage, the sub-problem is solved to give the next step in the design space using a modified trust-region algorithm called generalized boundary curve (GBC).
To perform layout-driven circuit sizing, a new layout is synthesized every time f has to be calculated for a new value of x d . The calculation is made by the simulation of the layout extracted netlist; this is loop 1 in Fig. 2 . In contrast, the DC electrical constraints, c, are calculated by the simulation of the schematic netlist; this is loop 2 in Fig. 2 . In Section X, a procedure is given to ensure that the DC electrical constraints remain fulfilled after layout synthesis by setting a maximum routing resistance limit.
Of interest is the total number of synthesized layouts during circuit sizing, because layout synthesis dominates the cost (time) of the layout-driven circuit sizing procedure. A formula for the number of synthesized layouts is derived here.
To formulate the sub-problem, the gradient of $f$ with respect to $x_d$ is estimated by forward finite difference. This requires $|x_d|$ performance evaluations (one for each gradient component), so that $|x_d|$ new layouts are synthesized. If $k^{(i)}$ is the number of circuit design parameter vectors at which the performances are evaluated to determine the next step by GBC in the ith iteration, then $k^{(i)}$ additional layouts need to be synthesized. The total number of layouts synthesized in the ith iteration is therefore $|x_d| + k^{(i)}$.
If $m$ optimization iterations are needed to fulfill (3), then the total number of synthesized layouts is
$$\sum_{i=1}^{m} \left( |x_d| + k^{(i)} \right) = m\,|x_d| + \sum_{i=1}^{m} k^{(i)}. \tag{4}$$
Design parameters such as bias voltage or current do not contribute to the number of generated layouts, as they do not physically alter the layout. They are subtracted from $|x_d|$ in (4). The number of circuit design parameters is also reduced by finding functional circuit blocks and applying geometric equality constraints, as explained in [21]. For example, the CMOS devices constituting a current mirror have equal length. The number of optimization iterations is reduced by selecting a good initial point, $x_{d\text{-}initial}$. This can be achieved by performing traditional analog circuit sizing with relaxed specifications prior to performing layout-driven circuit sizing.
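The layout count discussed above can be made concrete with a small sketch. It assumes that every performance evaluation corresponds to one synthesized layout, and it uses hypothetical names (`forward_difference_jacobian`, `total_layouts`, `rel_step`) that do not come from the paper or from [1], [2]. The relative step size is an arbitrary illustrative choice.

```python
from typing import Callable, List, Sequence


def forward_difference_jacobian(
    evaluate: Callable[[List[float]], Sequence[float]],
    x_d: Sequence[float],
    f_0: Sequence[float],
    rel_step: float = 0.01,
) -> List[List[float]]:
    """Estimate the sensitivities of f by forward finite differences.

    `evaluate` is called once per design parameter; in a layout-driven
    flow each call would synthesize and simulate one layout.
    Row j of the result holds the derivative of f with respect to x_d[j].
    """
    jacobian = []
    for j, x_j in enumerate(x_d):
        h = rel_step * abs(x_j) if x_j != 0 else rel_step
        x_pert = list(x_d)
        x_pert[j] = x_j + h  # perturb one parameter at a time
        f_pert = evaluate(x_pert)
        jacobian.append([(fp - f0) / h for fp, f0 in zip(f_pert, f_0)])
    return jacobian


def total_layouts(
    n_params: int, n_non_layout: int, k_per_iteration: Sequence[int]
) -> int:
    """Total layouts over all iterations, following the count in the text.

    Parameters that do not alter the layout (bias voltages or currents)
    are subtracted from the parameter count before summing.
    """
    n_layout_params = n_params - n_non_layout
    return sum(n_layout_params + k for k in k_per_iteration)
```

For example, with 10 design parameters of which 2 are bias currents, and three iterations needing 3, 2, and 4 step evaluations, `total_layouts(10, 2, [3, 2, 4])` gives 33 layouts.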
VI. Constrained Multi-Valued Mapping of Device Design Parameters to Layout Parameters
Each device in a technology library is associated with a list of layout parameters used to realize the actual geometric layout. For example, device PCELL functions are used in the Cadence framework for automated device layout from a parameter list. The list of layout parameters is denoted here by $x_l$. The space of all valid device layouts is denoted by $S_{xl}$, such that $x_l \in S_{xl}$. Also associated with each device is a set of device design parameters, denoted by $x_{dd}$. The device design space is denoted by $S_{xdd}$, such that $x_{dd} \in S_{xdd}$.
It should be noted here that the device design parameter sets of all the devices in the circuit are combined together in the circuit design parameter vector, x d . Bias voltages and currents designated as design parameters are also added to x d .
For an NMOS transistor, the lists x dd and x l and associated domains are described in Tables I and II, respectively. The typical NMOS device design parameters are the (total) transistor width and length. They are treated as continuous parameters in the circuit sizing problem solved by optimization. This is due to a limitation of the optimization algorithm. The NMOS layout parameters are the number of device fingers, as well as the transistor finger width and length. Additional layout parameters define the allowed device orientations and the location of substrate taps.
In general, multiple valid layouts can be realized for each set of device design parameter values, and the mapping between $S_{xdd}$ and $S_{xl}$ is a multi-valued map. In Fig. 4, five different layouts, denoted by parameter vectors $x_l^{(1)}$ to $x_l^{(5)}$, are valid for the same value of device design parameter vector $x_{dd}$. Constraints are applied to the mapping to reduce the quantization error due to manufacturing grid alignment. Furthermore, a preclusion of layout realizations with extreme aspect ratios will improve placement compactness and routing quality, and reduce the susceptibility to systematic process parameter variations. The constrained multi-valued mapping between design and layout parameters is described below for an NMOS device. Algorithm 1 is an equivalent given in pseudo code.
Designer preferences are handled first. For example, for circuit device NMOS-i, left substrate taps are to be used, such that the set of allowed substrate tap locations contains only the left tap. Second, equations are defined between layout and device design parameter values. In (5), the value of $L_f$ is fixed by device design parameter $L_{tot}$, while $W_f$ is written as a function of $n_f$ and $W_{tot}$. Constants $W_{step}$ and $L_{step}$ are the minimum increment steps for width and length allowed because of layout manufacturing grid alignment. This is handled in lines /6/ and /7/ of Algorithm 1.
Third, additional geometric constraints can be added to ensure compliance with the layout design rules, such as (6); or to discard skewed device geometries that will not lead to reasonable final circuit placements, such as (7). This is handled in lines /9/, /13/, and /14/ of Algorithm 1
$$As_{min} \le \frac{\text{device width}}{\text{device length}} \le As_{max}. \tag{7}$$
Fourth, a ceiling on the quantization error due to manufacturing grid alignment when mapping to layout parameters is set. For example, by (8), the amount of error in transistor width is a modulated function of $n_f$. Only values of $n_f$ that result in a small error magnitude, as given by (9), are selected. This is handled in lines /8/ and /10/ of Algorithm 1.
[Figure caption fragment: ... (7), and (9) are satisfied for $n_f$ = 4, 6, 8, 10, 12.]
Algorithm 1 Enumerate-NMOS-layouts (excerpt)
(apply the designer preferences to reduce the layout space)
(e.g., call a PCELL in the Cadence framework)
/12/ Aspect-ratio ← device width/device length
/13/ if Aspect-ratio < As_min then next iteration
/14/ if As_max < Aspect-ratio then next iteration
(Add the current layout parameters to the output set)
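The four steps of the constrained mapping can be sketched as a single enumeration loop. The sketch below is not the paper's Algorithm 1. It assumes that the finger width is the total width divided by the number of fingers and snapped to the manufacturing grid, that the design rule check is a minimum finger width, and that the quantization error is the relative deviation of the realized total width. The bounding-box model (`finger_pitch`) stands in for a PCELL call, and all names are hypothetical. Lengths are integers in nm to avoid floating-point grid errors.

```python
from typing import Dict, Iterable, List


def enumerate_nmos_layouts(
    w_tot: int,
    l_tot: int,
    w_step: int,
    l_step: int,
    n_f_candidates: Iterable[int],
    w_f_min: int,
    as_min: float,
    as_max: float,
    err_max: float,
    finger_pitch: int,
) -> List[Dict[str, float]]:
    """Enumerate finger counts whose layouts pass all constraints."""
    layouts = []
    # Snap the gate length to the manufacturing grid (assumed rounding).
    l_f = max(l_step, round(l_tot / l_step) * l_step)
    for n_f in n_f_candidates:
        # Finger width snapped to the grid (assumed form of the mapping).
        w_f = round(w_tot / (n_f * w_step)) * w_step
        if w_f <= 0 or w_f < w_f_min:
            continue  # assumed design rule: minimum finger width
        # Relative quantization error of the realized total width.
        w_error = abs(n_f * w_f - w_tot) / w_tot
        if w_error > err_max:
            continue
        # Crude bounding box standing in for a generated device layout.
        device_width = n_f * (l_f + finger_pitch)
        device_length = w_f
        aspect_ratio = device_width / device_length
        if aspect_ratio < as_min or as_max < aspect_ratio:
            continue
        layouts.append(
            {"n_f": n_f, "w_f": w_f, "l_f": l_f,
             "w_error": w_error, "aspect_ratio": aspect_ratio}
        )
    return layouts
```

Called with a total width of 10 μm on a 5 nm grid, only finger counts that divide the width without grid error and keep the bounding box within the aspect ratio window remain as candidates for placement.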
VII. Mapping of Device Design Parameters to Layout Parameters for Devices in Common Centroid Configuration
A single device in the circuit topology may be divided into a number of smaller identical devices in the layout. This is typical when devices are to be laid out in a common centroid configuration to improve matching. Dummy devices may also be used to complete the symmetry of the centroid array. The number of divisions is traditionally fixed in the circuit topology prior to circuit sizing. However, if the value of device design parameters is allowed to vary within a large range, then a fixed number of divisions may yield suboptimal layouts or lead to constraint violations. An example is given in Fig. 6. Two matched CMOS devices, A and B, are laid out in a common centroid configuration. Initially, $W_{tot}$ = 100 μm for the two devices and a multiple of M = 4 for each device is reasonable. If $W_{tot}$ is reduced to 50 μm, then a multiple M = 2 presents a better layout geometry. Furthermore, if $W_{tot}$ becomes very small, then M must be reduced so that (6) is not violated.
In order to solve the problem described above, an extension to the constrained multi-valued mapping algorithm of Section VI is given here for NMOS devices in common centroid configuration. Algorithm 2 is an equivalent given in pseudo code. Designer specifications are imposed on the layout parameters as in the previous section. This is handled in lines /2/ and /4/ of Algorithm 2. A set of divider values, denoted by $S_M$, is defined. The set $S_M$ denotes the divider values sanctioned during placement. For example, if $S_M$ = {2, 4, 10, 18}, then a device may be divided into 2, 4, 10, or 18 devices in the layout. This is handled in line /5/ of Algorithm 2. The set of possible numbers of fingers, $S_{nf}$, is also assigned a small set of values. This is handled in line /6/ of Algorithm 2. Geometric constraints and quantization error limits are defined as in the previous section; however, the divider term, M, is added to (5) and (8) to become (10) and (11), respectively.
A tradeoff exists between the number of fingers and number of device multiples. After all constraints are taken into account, the best valid combination of [$n_f$, M] is selected for the device layout. This is handled in line /22/ of Algorithm 2.
[Caption fragment: ... (6), (7), and (9) with $W_f$ and $W_{error}$ defined by (10) and (11). Of the shaded cells, [$n_f$, M] = [4, 4] has the lowest order and is selected for layout.]
Algorithm 2 (excerpt)
(define the set of finger numbers)
for each n in S_nf do
(find the largest possible number of device multiples)
... is minimized}
return
In Algorithm 2, if W tot or the ratio W tot /L tot is very small, then the device will default to a layout with a single gate and no divisions, so that [n f , M] = [1, 1] . This is achieved by the steps in lines /9/, /10/, and /11/.
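A rough sketch of the common centroid extension is given here. It is not Algorithm 2: the ranking used to pick the final pair (smallest quantization error, then smallest product of fingers and multiples) is an assumption standing in for the selection criterion of the paper, the aspect ratio check is omitted for brevity, and the function and parameter names are hypothetical. Lengths are integers in nm and `w_f_min` is assumed to be positive.

```python
from typing import Iterable, Tuple


def select_fingers_and_multiples(
    w_tot: int,
    w_step: int,
    s_nf: Iterable[int],
    s_m: Iterable[int],
    w_f_min: int,
    err_max: float,
) -> Tuple[int, int]:
    """Pick a (n_f, M) pair for a device in common centroid layout."""
    multiples = sorted(s_m, reverse=True)
    candidates = []
    for n_f in sorted(s_nf):
        # Find the largest number of multiples that is valid for this n_f.
        for m in multiples:
            # Finger width with the divider term M included, grid-snapped.
            w_f = round(w_tot / (n_f * m * w_step)) * w_step
            if w_f < w_f_min:
                continue  # assumed design rule: minimum finger width
            w_error = abs(n_f * m * w_f - w_tot) / w_tot
            if w_error <= err_max:
                candidates.append((w_error, n_f * m, n_f, m))
                break
    if not candidates:
        # Very small devices default to a single gate and no divisions.
        return (1, 1)
    _, _, n_f, m = min(candidates)  # assumed ranking, see lead-in
    return (n_f, m)
```

When the total width becomes too small for any sanctioned combination, the function falls back to [1, 1], mirroring the default behavior described above.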
VIII. Circuit Placement Generation and Routing

A. Placement Generation
The next step in the flow of Fig. 3 is constrained circuit placement and routing. The benefit of constrained placement is three-fold. First, the robustness of circuit performance toward process parameter variation, device mismatch, and operating conditions is improved. Second, properly selected placement constraints reduce the total routing wire length and parasitic effects. Wire congestion will also decrease; as a consequence, smaller margins can be set between devices and less capacitive coupling occurs between wires. Third, for differential circuit topologies, matched routing paths with balanced parasitic devices require a symmetric circuit placement.
In [20] , the circuit graph is subdivided into hierarchical proximity and symmetry groups and layout constraints are automatically generated. The placement tool of [17] is then used. The same approach will be used here for placement. The tool in [17] is based on a hierarchically bounded enumeration of devices using B*-trees. Possible device layout combinations are explored using enhanced shape functions. The outcome is a Pareto-optimal set of circuit placements that differ by circuit width and length. Fig. 7 shows the Pareto front of placements produced during a run of the placement tool.
The industrial tool [22] is used for circuit routing. The underlying routing algorithm is based on simulated annealing. Geometric routing constraints are set to meet DRC rules, as well as specify minimum wire widths, maximum wire lengths, the number of vias and corners allowed along a connection, maximum parallel and tandem wire separation and length, and so on. The maximum resistance and load capacitance of each connection between devices is also specified, as is the maximum coupling capacitance between critical nodes. For symmetrical devices, a preference for symmetrical signal routing paths is registered.
Two issues associated with placement are pin assignment and the margins between devices.
1) Pin Assignment: Circuit pin location affects internal routing, as well as the chip floor plan and global routing. Three scenarios can be handled by the flow, one of which must be manually selected by the designer before starting circuit sizing.
a) If pin assignment is performed prior to circuit design (top-down design), such as in [23], then the pin locations are fixed, as shown in Fig. 8(a).
b) If pin assignment is performed externally, but is unknown at circuit design time, then no pin assignment is performed during block layout, as shown in Fig. 8(b). An algorithm such as [24] is used for pin assignment and to complete both the circuit and the global routing.
c) If pin assignment is left to the circuit designer, then pins are placed during routing to ensure routing symmetry, improve signal isolation, or reduce crosstalk. The outcome of pin assignment would be similar to Fig. 8(c).
2) Device Margins: Minimum margins are specified for each device layout with regard to every other device in the circuit. This is to ensure compliance with DRC rules and to ensure that all device terminals are reachable (unblocked). Minimum margins are illustrated in Fig. 9 with a diagram and a table of device margins for a circuit of three devices.
In Fig. 9, characters (T, B, L, R) stand for (top, bottom, left, right). Every combination of devices, (i, j), has four distinct margins, one for each of these sides. A minimum margin table can be generated automatically for any set of circuit device layouts.
Additional components used in the layout that have no equivalent in the circuit topology, such as wells or guard rings, must be added as devices in the table. The minimum device margins may be increased to reduce routing congestion and fulfill all routing constraints, as will be discussed below.
B. Placement Selection
A placement is selected from the Pareto-optimal set of placements based on aspect ratio and area constraints, routing congestion, and the value of layout sensitive performances.
1) Aspect Ratio and Area: In a top-down design methodology, maximum circuit area and the permissible aspect ratio range are assigned during the initial chip floor planning stage. The placement must conform to
$$\text{aspect-ratio-min} \le \frac{\text{circuit width}}{\text{circuit length}} \le \text{aspect-ratio-max} \tag{12}$$
$$\text{Area} \le A_{max}. \tag{13}$$
The enumeration of device layout variants and the correct application of placement constraints typically mean that the placement algorithm will return one or more placements that satisfy (12). In the example of Fig. 7, 21 placements satisfy (12). Maximum area is set as a performance specification, to be sought by the circuit sizing procedure.
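As an illustration of this filtering step, the sketch below keeps the placements of a Pareto front that respect the aspect ratio window and, optionally, a maximum area. The representation of a placement as a (width, length) tuple and the name `feasible_placements` are assumptions made for the example.

```python
from typing import List, Optional, Tuple


def feasible_placements(
    placements: List[Tuple[float, float]],
    ar_min: float,
    ar_max: float,
    area_max: Optional[float] = None,
) -> List[Tuple[float, float]]:
    """Keep placements that meet the aspect ratio and area limits."""
    selected = []
    for width, length in placements:
        aspect_ratio = width / length
        if not (ar_min <= aspect_ratio <= ar_max):
            continue
        # The area limit is optional because it may instead be handled
        # as a performance specification by the sizing procedure.
        if area_max is not None and width * length > area_max:
            continue
        selected.append((width, length))
    return selected
```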
2) Congestion Control: If all device terminals are reachable (unblocked), and the routing constraints are reasonable and consistent, then the probable reason for the failure to fulfill all routing constraints is routing congestion [25] . Barring failure, congestion will degrade circuit performances and reduce yield.
Congestion occurs when the minimum margins between devices, illustrated in Fig. 9 , are too small for a placement. There are many methods of congestion estimation [25] , [26] . Here, the routing tool is used to estimate congestion, after which a congested placement can be adjusted.
Congestion estimation: first, the margin between each pair of adjacent devices in the placement is divided into tracks with width α (process dependent and given in μm). Fast global maze routing is then performed considering connectivity, DRC rules, wire length, and spacing only. For each margin, m i , the number of unused tracks is denoted by T i . If there are not enough tracks to complete routing, then m i is congested and T i <0. To add a safety margin, the minimum number of unused tracks is set to a small positive value, for example, T min = 2 and m i is nearly congested if T i < T min .
Placement adjustment: placements with nearly congested margins can be adjusted by increasing each margin $m_i$ to $\acute{m}_i$ according to (14), and adjusting the complete placement whilst retaining all placement constraints. Algorithm 3 details the congestion control and placement adjustment steps. Placement adjustment incurs an additional cost; therefore, adjustment is only performed if fewer than 5 of the Pareto-optimal placements meet (12) and are congestion free.
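The congestion test and the rule for triggering placement adjustment can be summarized in a few lines. This sketch covers only the classification by unused tracks and the "fewer than 5" rule; the margin update of (14) is not reproduced. The names `classify_margins` and `adjustment_needed` are hypothetical.

```python
from typing import Dict


def classify_margins(
    unused_tracks: Dict[str, int], t_min: int = 2
) -> Dict[str, str]:
    """Label each margin from its number of unused routing tracks."""
    status = {}
    for margin, t_i in unused_tracks.items():
        if t_i < 0:
            status[margin] = "congested"  # not enough tracks to route
        elif t_i < t_min:
            status[margin] = "nearly congested"  # below the safety margin
        else:
            status[margin] = "free"
    return status


def adjustment_needed(n_usable_placements: int, threshold: int = 5) -> bool:
    """Adjust placements only when too few usable placements remain."""
    return n_usable_placements < threshold
```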
3) Layout Sensitive Performances:
The value of some performances may depend on circuit placement. This is particularly apparent in mismatch sensitive performances, such as power supply rejection ratio (PSRR). In Fig. 10 , the area and PSRR for the 21 placements that fulfill the aspect ratio constraint in Fig. 7 are plotted according to increasing area. The maximum increase in area is 644 μm 2 or 6.5%, while the PSRR differs by a maximum of 6 dB amongst the placements.
Sensitive performances that change by a large amount affect the value of the objective function during optimization. It is important to consider the value of such performances when selecting from a Pareto front of placements such as Fig. 7 . Sensitive performances are considered as follows.
First, a range of points in the Pareto front is selected for which the change in area is relatively small, as given in
$$A \le (1 + \epsilon)\, A_{min\text{-}Pf}. \tag{15}$$
Here, $A_{min\text{-}Pf}$ is the area of the Pareto-optimal placement of minimum area, subject to aspect ratio and congestion constraints. This would be the area of placement 1 in the example of Fig. 10. The constant $\epsilon$ is a small fraction, e.g., $\epsilon$ = 0.02 (2%). In Fig. 10, placements 1 to 6 satisfy (15). A cost metric, $\phi$, is then defined as shown in (16), and calculated for the placements that satisfy (15)
$$\phi = \sum_i \left( e^{w_i (f_{s,L,i} - f_{s,i})} + e^{w_i (f_{s,i} - f_{s,U,i})} \right). \tag{16}$$
In Fig. 11, placement 4 has the lowest value of $\phi$ and is selected.
The complete placement and routing flow is given in Fig. 12 . 
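The selection among placements of similar area can be sketched as follows. The sketch assumes that the area band keeps placements within a fraction eps of the minimum area, and that the cost metric sums, over the sensitive performances, one exponential penalty for the lower specification and one for the upper specification, as suggested by the surviving terms of (16). Function names and the list-based data layout are hypothetical.

```python
import math
from typing import List, Optional, Sequence


def phi(
    f_s: Sequence[float],
    f_lower: Sequence[float],
    f_upper: Sequence[float],
    weights: Sequence[float],
) -> float:
    """Exponential penalty on violated or nearly violated specifications."""
    return sum(
        math.exp(w * (lo - f)) + math.exp(w * (f - hi))
        for f, lo, hi, w in zip(f_s, f_lower, f_upper, weights)
    )


def select_placement(
    areas: Sequence[float],
    performances: Sequence[Sequence[float]],
    f_lower: Sequence[float],
    f_upper: Sequence[float],
    weights: Sequence[float],
    eps: float = 0.02,
) -> Optional[int]:
    """Return the index of the lowest-cost placement in the area band."""
    a_min = min(areas)
    best_index, best_cost = None, math.inf
    for index, (area, f_s) in enumerate(zip(areas, performances)):
        if area > (1.0 + eps) * a_min:
            continue  # outside the band of nearly minimal area
        cost = phi(f_s, f_lower, f_upper, weights)
        if cost < best_cost:
            best_index, best_cost = index, cost
    return best_index
```

A one-sided specification, such as a minimum PSRR, can be expressed with an infinite upper bound, for which the second exponential evaluates to zero.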
IX. Layout Netlist Extraction
A netlist is extracted from the layout using the commercial tool in [27] . This is completed as done in traditional analog design and without any special models or approximations to speed up the synthesis flow. Circuit devices are extracted from the layout geometry (LVS extraction). Geometry dependent device capacitances, such as CMOS c db and c sb , are extracted at this point and back annotated into the device models. Gate resistance for each MOS device is also calculated. Diffusion resistance is extracted using a 2-D Laplace solver.
The routing paths (interconnects) are partitioned into segments and a resistance is calculated for each segment. Partitions are made at contacts, line intersections, vias, and devices. Long lines are fractured into smaller segments; maximum segment length is 5 μm. For each segment, a lumped parasitic coupling capacitance to each other segment and to the substrate is calculated. An RC network is formed with the segment resistances. A commercial integral equation field solver, RCX-FS, is used to extract coupling capacitance. It is based on the algorithm, Nebula, described in [28] . The error in the value of each extracted capacitor is 3% (with 95% probability).
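The fracturing of long lines can be illustrated with a minimal sketch. It assumes equal-length splitting and a simple sheet-resistance model for each segment; neither detail is stated in the text, and the function names and the sheet resistance parameter are hypothetical. Only the 5 μm maximum segment length is taken from the description above.

```python
import math
from typing import List


def fracture_wire(length_um: float, max_segment_um: float = 5.0) -> List[float]:
    """Split a wire into equal segments no longer than the maximum."""
    n_segments = max(1, math.ceil(length_um / max_segment_um))
    return [length_um / n_segments] * n_segments


def segment_resistance(
    length_um: float, width_um: float, sheet_res_ohm_sq: float
) -> float:
    """Resistance of one segment from an assumed sheet resistance."""
    return sheet_res_ohm_sq * length_um / width_um
```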
Parasitic inductance and the effect of eddy currents in the substrate are not modeled. After the RC network is extracted, it is simplified by series and parallel device combinations and the elimination of dangling and small elements (for example, resistors smaller than 0.01 Ω). RC model order reduction is performed considering a maximum frequency of 1 GHz.
The number of parasitic resistors and capacitors in the extracted netlist for each circuit example is given in Section XII. Average simulation times for schematic and layout simulation are also tabulated and discussed.
X. Satisfying DC Electrical Constraints by Limiting Routing Path Resistance
As discussed in Section V and illustrated in Fig. 2, the circuit performances, f, are evaluated by the simulation of the layout extracted netlist. In contrast, the DC electrical constraints, c, are evaluated from the simulation of the schematic netlist. There are three reasons for this distinction in evaluation methods. First, the DC electrical constraints, addressed in [21], depend on the DC bias point of internal circuit nodes; for example, the constraint to ensure NMOS device M1 is in saturation is given by
where V d , V g , and V th are the DC drain, gate, and threshold voltages of M1, respectively; V margin is a safety margin greater than zero. However, the structure of the layout netlist changes for each value of the circuit design parameters, x d . There is no one-to-one correspondence between the nodes and devices of the circuit schematic and the extracted layout netlist. Second, if any DC constraints have been violated after solving the linearized sub-problem using the GBC algorithm (in Fig. 2) , then a feasibility analysis is performed to find the closest point in the design space that satisfies all the DC constraints. Many more constraint evaluations are needed to complete this operation. In the first example of Section XII, the ratio of constraint to performance evaluations is 4 : 1.
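The saturation constraint mentioned at the start of this section can be sketched numerically. The form used below is an assumption: it is the textbook NMOS saturation condition written with the drain, gate, and threshold voltages and a safety margin, arranged so that a non-negative value means the constraint is satisfied. The function name `c_sat` follows the symbol used later in this section.

```python
def c_sat(v_d: float, v_g: float, v_th: float, v_margin: float) -> float:
    """Assumed saturation constraint; the device passes when c_sat >= 0.

    The source voltage cancels in V_ds >= V_gs - V_th, so only the
    drain, gate, and threshold voltages appear.
    """
    return v_d - (v_g - v_th) - v_margin
```

For a device with a drain at 1.0 V, a gate at 0.8 V, a threshold of 0.4 V, and a 0.1 V margin, the value is 0.5 V, so the constraint holds with room to spare.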
Finally, it is often possible to satisfy the DC constraints for a fixed value of x d by tightening the upper bound placed on routing path resistance during the routing procedure; this is what will be done in the remainder of this section.
From (3), the DC constraints, c, must satisfy c ≥ 0. First, an additional margin is added to this inequality to specifically account for the effect of routing path resistance:
$$c \ge c_{\text{routing-margin}}(\text{topology}, \text{schematic netlist})$$
Each device in the schematic netlist is replaced by several devices after placement. For instance, a CMOS device may have multiple fingers (n f > 1) and may be divided for common centroid layout (M > 1). If a constraint is applied to a device in the schematic netlist, then it should still be satisfied after layout. Therefore, if M1 has three fingers in a layout, then the constraint function c sat must hold for each of the three fingers. Let R be the vector of routing path resistances between devices in the layout. The value of R may change the circuit DC bias point after layout and, as a consequence, the DC constraints. To account for this, layout constraints are parametrized by R. Continuing the previous example, the three layout saturation constraints of M1 become functions of R, collected in the layout constraint vector ĉ(R).
To ensure (20) is true, either ĉ(R) has to be handled as a nonlinear system of inequality constraints in R by the routing algorithm, or c routing-margin must be increased and R ub decreased until (18) is a sufficient condition for the satisfaction of (20). These two methods are illustrated in Fig. 13. Instead, here Algorithm 4 will be used: the value of c routing-margin is fixed, while R ub is adjusted to satisfy (20), as shown in Fig. 13(c). First, the routing algorithm is called and a routing solution is found for an initial large value of R ub. If (20) is satisfied, then the routing solution is accepted. This is handled in lines /4/ through /6/ of Algorithm 4. If (20) is not satisfied, then R ub is adjusted to satisfy (20), such that the maximum normalized change in any resistance is minimized. This is formulated as a min-max problem in (21), and can be rewritten as the minimization in line /8/ of Algorithm 4.
The minimization in line /8/ can be calculated efficiently using sequential linear programming, as the residual of a linear approximation to ĉ(R) in the neighborhood of $R_{ub}^{(0)}$ is small in practice. The routing algorithm is then called again with R ≤ $R_{ub}^{(1)}$, as is shown in line /9/ of Algorithm 4.
If the routing algorithm fails to find a routing solution (algorithm line /11/), then c routing-margin must be increased and the feasibility analysis repeated to find a new value of x d for which the layout is easier to route and satisfy (20) .
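The control flow described above can be sketched as follows. This is a sketch under stated assumptions and not a transcription of Algorithm 4: the three callbacks (`route`, `constraints_ok`, `tighten`) are hypothetical stand-ins for the routing algorithm, the check of (20), and the min-max adjustment of R ub, and a single re-routing attempt is assumed.

```python
from typing import Callable, List, Optional, Sequence


def route_with_resistance_bound(
    route: Callable[[Sequence[float]], Optional[List[float]]],
    constraints_ok: Callable[[Sequence[float]], bool],
    tighten: Callable[[Sequence[float], Sequence[float]], List[float]],
    r_ub_initial: Sequence[float],
) -> Optional[List[float]]:
    """Return routed path resistances that satisfy the DC constraints, or None.

    route(r_ub)          -> routed resistances with R <= r_ub, or None on failure
    constraints_ok(r)    -> True if the layout DC constraints hold for r
    tighten(r_ub, r)     -> tightened upper bound (stand-in for the min-max step)
    """
    r_ub = list(r_ub_initial)

    # First attempt with a deliberately loose upper bound.
    routed = route(r_ub)
    if routed is None:
        return None
    if constraints_ok(routed):
        return routed  # accept the routing solution as is

    # Constraints violated: tighten the bound and route again.
    r_ub = tighten(r_ub, routed)
    routed = route(r_ub)
    if routed is None or not constraints_ok(routed):
        # Caller must increase the routing margin and repeat the
        # feasibility analysis to obtain a new design point.
        return None
    return routed
```

A `None` result corresponds to the failure case in the text, where c routing-margin is increased and a new value of x d is sought.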
XI. Area Estimation Without Layout Synthesis
Given a vector of circuit design parameters, x d , there is a margin of error between the circuit area estimated before layout synthesis and the actual area as measured after synthesis.
Each device in the circuit has several valid layouts for the same value of device design parameters, as explained in Sections VI and VII. Actual area utilization (layout compactness) depends on the placement constraints, acceptable layout aspect ratio, minimum device margins, and routing congestion. As explained in Section VIII-B, some performances are sensitive to placement and routing effects, and the layout selected as best will be the one that minimizes the cost metric in (16), which increases the margin of error in estimation. As a result, the difference between the upper and lower area estimates and the actual area after layout synthesis can be large.
The maximum allowed circuit area can be a critical specification in analog circuit sizing, such that the direction taken by the optimization algorithm in the design space will be affected by the area estimate. The outcome of optimization could be sub-optimal due to the error in area estimation.
In order to perform a fair comparison between traditional circuit sizing and layout-driven circuit sizing, as many considerations as possible were taken into account in area estimation. These considerations are summarized in Table IV. A minimum and a maximum circuit area are calculated from all possible combinations of individual device layouts and the minimum and maximum allowed circuit aspect ratio. Upper and lower values of area utilization are estimated by running the placement algorithm for the initial design point.
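One simple way to combine these quantities is sketched below. The formula is an assumption made for illustration (total device area divided by utilization, with aspect-ratio limits ignored); the paper's actual estimate follows the considerations of Table IV. The function and parameter names are hypothetical.

```python
from typing import Dict, List, Tuple


def estimate_area_bounds(
    device_layout_areas: Dict[str, List[float]],
    utilization_bounds: Tuple[float, float],
) -> Tuple[float, float, float]:
    """Return (lower, upper, average) circuit area estimates.

    device_layout_areas maps each device to the areas of its valid layouts.
    utilization_bounds is (lowest, highest) area utilization, each in (0, 1].
    """
    util_lo, util_hi = utilization_bounds
    if not 0.0 < util_lo <= util_hi <= 1.0:
        raise ValueError("utilization bounds must satisfy 0 < lo <= hi <= 1")

    # Over all combinations of device layouts, the smallest (largest) total
    # is reached by picking the smallest (largest) layout of every device.
    min_total = sum(min(areas) for areas in device_layout_areas.values())
    max_total = sum(max(areas) for areas in device_layout_areas.values())

    lower = min_total / util_hi  # compact devices, dense placement
    upper = max_total / util_lo  # large devices, sparse placement
    return lower, upper, 0.5 * (lower + upper)
```

The third returned value is the average of the two bounds, which is the quantity used as the area estimate in traditional sizing.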
XII. Circuit Design Examples
A. CMOS Operational Amplifier
The first example is an amplifier consisting of 19 CMOS devices, as shown in Fig. 14 .
Each device has two design parameters, x dd = [W tot, L tot], for 38 in total. The bias current, i bias, is an additional design parameter. Also shown in Fig. 14 are the functional circuit blocks. From these blocks, electrical inequality and geometrical equality constraints are specified, as explained in [21]. The device tuples (P8, P3, P6, N2, N4, N7) and (P7, P4, P5, N3, N5, N6) constitute the two branches of a differential signal path; each pair of corresponding devices must be matched, which adds to the device constraints (for example, P8 and P7 must share the same design parameter values).
Hierarchical placement constraints are defined as described in [20]; they are overlaid on the topology in Fig. 15. In total, 14 proximity, 6 common centroid, and 3 symmetry constraints are imposed. The minimum margins between each pair of devices, illustrated in Fig. 9, constitute additional placement constraints. Devices N2 to N7 and P3 to P8 are placed in grounded guard rings. Bulk taps are used to ground the remaining devices.
Two metal layers are used for routing. The geometric routing constraints, mentioned in Section VIII, are specified. Resistance and load capacitance are matched along the two branches of the differential signal path. The number of routing path resistances between connected devices (|R| for 0 ≤ R ≤ R ub) is dependent on the divider value, M, of each common centroid pair in the circuit and is determined after placement. An example of an operational amplifier layout produced by the layout synthesis flow is shown in Fig. 16.
An initial feasible point, x d-initial, is selected in the design space. The point was found using the traditional analog design flow of Fig. 1(a) with relaxed performance specifications. The traditional analog design flow of Fig. 1(a) and the flow with layout-driven circuit sizing of Fig. 1(b) are then carried out and compared with tighter specifications. The results of optimization are given in Table V. Values that fail to meet a specification are shaded gray.
It is difficult to estimate circuit area before layout synthesis. Each device can have several valid layouts, as explained in Sections VI and VII. Area utilization also depends on the placement constraints and routing congestion. As a result, the difference between the upper and lower area estimates and the actual circuit area can be large. Fig. 17 compares the estimated and actual layout area during the progression of traditional circuit sizing. Traditional circuit sizing of the operational amplifier took three iterations. For each iteration, the area is plotted for the design parameter vectors used to determine the next step by GBC. They are labeled with alphabetical letters. The initial feasible point x d-initial is labeled "initial" in the graph.
The upper, lower, and average area estimates are graphed for each design point. The areas of the placements generated by the constrained placement algorithm of Section VIII are also graphed for each design point. Finally, the layout selected for each design point by using the layout synthesis flow of this paper is graphed. It is clear from Fig. 17 that the area estimate was pessimistic in traditional sizing (average estimated area is greater than actual circuit area at each design point); however, this bias was not completely clear at the initial feasible point x d-initial, nor would it be possible to ascertain without performing each layout synthesis and plotting the trend for the actual layout area.
As explained in Section VIII-B3, some performances, such as PSRR, are sensitive to circuit placement and routing. For the operational amplifier, the difference between the value of PSRR-vss (ground node to output) obtained from the schematic netlist and post layout is 10 dB. Similar to inaccurate area estimation, this reduces the usefulness of the results of traditional circuit sizing, as performance values are overestimated or underestimated. In attempting to fulfill the area specification with a pessimistic estimate during optimization, the values of the other performances with a hard-to-meet specification (namely UGBW, PSRR-vdd, SR-rising, settling time, and THD) suffered, and the algorithm converged on a sub-optimal solution in the design space. With layout-driven sizing, the actual area is calculated after layout synthesis. It was easy to fulfill the specifications of this circuit example using optimization. Only PSRR-vss fell short of the set specification by 2 dB.
Table notes: A layout is synthesized for the result of traditional circuit sizing. † For traditional circuit sizing, the average value of the upper and lower area estimates is used, as explained in Section XI and plotted in Fig. 17. Unless otherwise labeled, cost is given by the CPU time needed for each task.
Fig. 18. B*-trees used during layout-driven sizing. There are 13 nodes; devices in common centroid configuration are represented by a single node.
A breakdown of sizing cost is given in Table VI. Simulations were performed on an 8-core Xeon workstation with 12 GB of RAM. Cost is given by the CPU time on a single core needed for completing each task. Layout synthesis, with a mean CPU time of 93.18 s, constitutes 76% of the cost of the layout-driven flow. On average, each extracted layout netlist contained 176 parasitic resistors and 816 parasitic capacitors. The slew rate and settling time testbenches depend on a transient analysis and constitute most of the remaining cost. The mean cost of a single optimization iteration is 672 s and 3214 s, respectively, for traditional and layout-driven sizing; their ratio is 1:4.8. Layout-driven sizing takes more iterations to converge (5 in contrast to 3); therefore, the complete layout-driven flow took eight times the CPU time of the traditional flow. Parallelism in optimization steps was exploited. The determination of the search direction requires the synthesis of |x d| − 1 = 22 layouts (the last design parameter is the bias current), which can be performed independently and in parallel. Further parallelism can be exploited when performing the subtasks of the layout synthesis flow.
On the 8-core workstation and with a limited number of commercial software licenses, traditional circuit sizing took 0.35 h of wall clock time, while layout-driven sizing took 2.1 h; their ratio is 1:6.
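As a consistency check on the reported costs (a worked calculation from the figures quoted above, not an additional measurement), the per-iteration cost ratio combined with the iteration counts reproduces the overall CPU-time factor, and the wall clock times give the stated ratio:

```latex
\[
\frac{3214\,\mathrm{s}}{672\,\mathrm{s}} \approx 4.8, \qquad
\frac{5 \times 3214\,\mathrm{s}}{3 \times 672\,\mathrm{s}}
  = \frac{16070}{2016} \approx 8.0, \qquad
\frac{2.1\,\mathrm{h}}{0.35\,\mathrm{h}} = 6 .
\]
```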
In order to compare results to the template-based methods in the state of the art, the B*-trees of the placements generated by the layout synthesis flow are recorded in Fig. 18. For a circuit placement, the B*-tree records the relative location of each device. The device in the lower left corner of the placement is represented by the root tree node, while the remaining devices are represented by the children of the root node, as explained in [29]. The relative location of each device is fixed in the template-based methods and does not change during circuit sizing; therefore, the corresponding B*-tree structure is also fixed. For the amplifier example, placements corresponding to six different B*-tree structures were used during optimization. Furthermore, the initial and final placements (after optimization) correspond to different B*-trees, as given in Table VII. Consequently, a template-based method would not be able to find the final solution found by the method of this paper.
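To illustrate how a B*-tree fixes the relative location of each device, the sketch below packs a tree into coordinates. It assumes the commonly used B*-tree convention (left child placed immediately to the right of its parent, right child placed above its parent at the same x coordinate), which is taken here to match [29]. The `Node` class and `pack` function are hypothetical names, and a simple quadratic overlap scan replaces the usual contour structure.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Node:
    name: str
    width: float
    height: float
    left: Optional["Node"] = None   # device placed directly to the right
    right: Optional["Node"] = None  # device placed above, at the same x


def pack(root: Node) -> Dict[str, Tuple[float, float]]:
    """Return the lower-left (x, y) corner of every device in the tree."""
    placed: List[Tuple[float, float, float]] = []  # (x_start, x_end, top)
    positions: Dict[str, Tuple[float, float]] = {}

    def place(node: Node, x: float) -> None:
        x_end = x + node.width
        # Rest the device on the highest already-placed device that
        # overlaps its horizontal span (0.0 if nothing is below it).
        y = max((top for xs, xe, top in placed if xs < x_end and xe > x),
                default=0.0)
        positions[node.name] = (x, y)
        placed.append((x, x_end, y + node.height))
        if node.left is not None:
            place(node.left, x_end)
        if node.right is not None:
            place(node.right, x)

    place(root, 0.0)  # the root sits in the lower-left corner
    return positions
```

Changing device sizes moves the coordinates but keeps the neighbor relations, whereas changing the tree structure changes which devices are adjacent; this is why a fixed template corresponds to a fixed tree.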
The flexibility to select the divider, M, of devices in common centroid configuration in the layout synthesis flow according to Algorithm 2 was utilized for the amplifier circuit; the number of divisions changed for some common centroid pairs before and after circuit sizing, as given in Table VII.
Fig. 19. Tunable OTA topology. The operational amplifier of Section XII-A (excluding the bias circuit) is used for A1 and A2.
B. Tunable Operational Transconductance Amplifier (TOTA)
The second example is the tunable operational transconductance amplifier (TOTA) consisting of 52 CMOS devices illustrated in Fig. 19 . It is based on the circuit in [30] . The operational amplifier of Section XII-A is used for A1 and A2.
The number of circuit design parameters is reduced to 42 after the application of geometric equality constraints and the matching of devices on the differential signal paths. In addition to device dimensions, the parameters include a bias current, a common-mode potential, and a minimum tuning current value.
A total of 154 electrical inequality constraints are defined, extracted as explained in [21]. Devices N3, N6, and P5 operate in the triode region, while devices P8 to P13 can operate from weak inversion to saturation. All remaining devices operate in the saturation region. Additional DC electrical constraints are added to ensure sufficiently high gain and low IOV for A1 and A2, as well as a sufficiently high gain for the common mode feedback (CMFB) circuit.
The main, bias, and CMFB circuits, as well as A1 and A2 are placed concurrently. In addition to the placement constraints of Section XII-A, corresponding devices in A1 and A2 are matched. Once more, the setup of placement constraints was guided by the work in [20] . In order to ensure electrical matching after placement and to route symmetrically, all matched devices must be placed symmetrically along the circuit axis or in common centroid, and in close layout proximity. In all, 14 symmetry constraints, 33 proximity constraints, and 7 common centroid constraints are imposed on the placement. The NMOS devices are also placed within a grounded guard ring. Routing constraints are specified in a similar manner to the first circuit example.
To be used in wide tuning range filter design, the TOTA is specified to have a max/min transconductance ratio of 40, with the condition that the minimum be at least 1 μA/V, i.e., Gm-min ≥ 1 μA/V and Gm-max ≥ 40×Gm-min. According to [30], there is a tradeoff between OTA noise, linearity, and power consumption. A linearity measure is defined as the percentage change in the value of Gm when the differential input voltage changes from 0 to 0.75 V:
$$\text{Linearity measure} = 100 \times \frac{\left| G_m(0.75\,\mathrm{V}) - G_m(0) \right|}{G_m(0)}$$
From [31] , the input referred noise power spectrum density of an OTA is given by A layout is synthesized for the result of traditional circuit sizing. † For traditional circuit sizing, the average value of the upper and lower area estimates is used, as explained in Section XI and plotted in Fig. 17 . Unless otherwise labeled, cost is given by the CPU time needed for each task.
The thermal, S t, and flicker, S f, components depend on the technology and OTA topology. As flicker noise was a considerable problem for this OTA, a specification is set on RMS input referred noise (IRN) from 1 kHz to 500 kHz. In filter design, the finite DC gain of the OTA will result in a passband loss (instead of an ideal 0 dB); therefore, a minimum OTA gain is specified. A minimum bandwidth for the unloaded TOTA is also specified at Gm-min and 40×Gm-min. The DC common-mode operating point of the outputs should follow the desired value (denoted by V cm); a specification is set to ensure this. Finally, a specification is set on maximum circuit area.
The results of optimization are given in Table VIII. For the OTA, the area estimated in traditional sizing was optimistic (average estimated area is smaller than actual circuit area at each design point). The estimate of 9793 μm² was close to the specification value of 10 000 μm², and the actual circuit area after layout synthesis came to 10 666 μm², exceeding the specification limit. The RMS IRN and bandwidth performances were sensitive to circuit placement and routing, while the linearity measure was systematically higher after layout synthesis. These three performances benefited from the use of layout-driven sizing.
For this example, the output common-mode level (V out − V cm) was very sensitive to routing symmetry along the common-mode feedback loop, and was typically much higher after layout synthesis. For this reason, the value of the output common-mode level for the result of traditional sizing rose from 0.79 mV to 2.22 mV after layout synthesis. Furthermore, it was hard to keep this performance within specification during layout-driven circuit sizing. The result of layout-driven sizing, 1.20 mV, is a clear improvement on what was possible with traditional sizing. With the insight gained from the results of layout-driven sizing, it can be said that the specification |V out − V cm| ≤ 1 mV is too tight to realize and retain after layout. An example TOTA layout is shown in Fig. 20.
Optimization costs are given in Table IX. On average, each extracted layout netlist contained 327 parasitic resistors and 1678 parasitic capacitors. For this larger example, layout synthesis took a mean time of 145.3 s and constitutes 85% of the cost of the layout-driven flow. The mean cost of a single optimization iteration is 791 s and 7980 s, respectively, for traditional and layout-driven sizing; their ratio is 1:10. Layout-driven sizing terminated after five iterations, while traditional sizing edged on with small improvements in the performances for ten iterations. Therefore, for this particular example and specification values, the complete layout-driven flow took five times the CPU time of the traditional flow.
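The same consistency check as for the first example (again a worked calculation from the quoted figures only) confirms the per-iteration ratio and the overall factor of five:

```latex
\[
\frac{7980\,\mathrm{s}}{791\,\mathrm{s}} \approx 10.1, \qquad
\frac{5 \times 7980\,\mathrm{s}}{10 \times 791\,\mathrm{s}}
  = \frac{39900}{7910} \approx 5.0 .
\]
```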
XIII. Conclusion
In this paper, a flow was described for automatic layout synthesis. Designer knowledge is supplied by geometric circuit placement and routing constraints, as well as geometric and DC electrical constraints for individual devices. A nonslicing placement algorithm was used to find the set of circuit placement possibilities. Algorithms were presented to estimate and adjust for routing congestion in placements, and to keep DC constraints from being violated due to routing resistance. Parasitic coupling capacitance is extracted directly by an integral equation field solver without any modeling or approximation. Numerical (SPICE) simulation is used to evaluate performances from the extracted netlist. Circuit area and aspect ratio, as well as performance sensitivity to the layout parasitics, are considered when selecting a final layout. This flow was combined with a deterministic nonlinear optimization algorithm to perform layout-driven circuit sizing. Sizing is completed within only a few iterations; this small iteration count compensates for the cost of layout synthesis.
A tunable OTA consisting of 52 devices, 42 design parameters, and complex layout symmetry requirements was successfully sized. Since the complexity of the proposed sizing method increases with the number of devices and placement and routing constraints (for successful layout synthesis), and with the number of circuit design parameters (for optimization cost), it is suggested that the proposed method can be applied to new circuits of similar size at a similar cost in time.
