Introduction
Several types of parameters Ô ´Ü × µ influence the behaviour of electronic circuits and have to be taken into account when optimizing appropriate performance functions ´Ôµ: design parameters Ü, manufacturing process parameters ×, and operating parameters .
For optimizing one wants to minimise a performance function ´Ôµ while also several constraints have to be satisfied. The performance function ´Ôµ and the constraint functions ´Ôµ can be costly to evaluate and are subject to noise (for instance due to numerical integration effects). For both, the dependency on Ô can be highly nonlinear. In circuit simulation, sensitivities of ´Ôµ and ´Ôµ with respect to Ô are not always provided (several model libraries do not yet support the calculation of sensitivities). When the number of parameters increases adjoint sensitivity methods become of interest. For transient integration of linear circuits this is described in [3] . Recently, in [9] a more general procedure is described that also applies to nonlinear circuits and retains efficiency by exploiting (nonlinear) techniques from Model Order Reduction. In this paper we will describe our in-house developed method for optimization and our experiences with it. We restrict ourselves to so-called derivative free methods, so we will not require sensitivities of ´Ôµ and ´Ôµ with respect to Ô from the circuit simulator. Also some new directions for further research will be described.
Constrained optimization by augmented Lagrangian
In this section we restrict our parameters Ô to the design variables Ü, which can be geometrical quantities like transistor width Ï and length Ä. The designer can adjust them during the process of optimizing the performance of an actual design.
The performance functions ´Üµ typically consist of one or more circuit characteristics that the designer would like to maximize or minimize. Examples are: maximization of gain or bandwidth, and mimization of power dissipation or area.
The constraints ´Ôµ are also related to circuit performance but these functions have an explicit target. Examples are: bandwidth 800MHz, and 49% duty cyle 51%. Formulating a proper set of performance functions and constraints is a non-trivial task. In practice, this requires trial-and-error and a fair amount of tuning. Obtaining the value of a performance function or constraint is done via one or more circuit analyses of the same or different types, e.g. DC, AC, Transient, Noise. Typically, the time required to perform an analysis is a limiting factor in the overall optimization approach. The search for the optimal values of the optimization variables (OVs) Ü can be formulated as a nonlinear constrained optimization problem in Ò variables with Ñ constraints,
where Ü denotes the -th OV. With Ü we denote the point where the minimum occurs. The values of the objective function ´Üµ and the constraining functions ´Üµ are obtained from circuit simulation. The performance and stability of the optimization algorithm is affected by the scaling of the OVs, of ´Üµ and of the ´Üµ [7] .
The Nelder-Mead algorithm can be used to minimizë´Ü
where « Ñ Ü´« ¼µ. If the minimum is taken at Ü , one has Ð Ñ ½ Ü Ü , which means that the minimization of (2) becomes more and more illconditioned. Apart from that, the Nelder-Mead algorithm, which is simple to program, is not that easy to analyse [21] . A similar conditioning problem happens when dealing with logarithmic barrier functions that provide an impassible barrier at the boundary of the feasible region [21] .
Recently pattern search methods have been studied [11, 12] . The most well-known method of this class is the method of Hooke-Jeeves [8] . The Nelder-Mead method does not belong to that class (it can also beat pattern search methods). The pattern search methods are nice in the sense that they prove rather weak conditions under which the methods do converge, f.i. for the unconstrained problem ¾ ½ . By introducing slack variables × ¼, the augmented Lagrangian penalty function can be written as [18] ,
in which Ä is the standard Lagrangian. The parameters and are Lagrange multipliers and penalty factors, respectively. Pattern search methods for (3) converge when ¾ ¾ [13] . Minimization (4) over the slack variables × at optimal value × Ñ Ü ´Üµ · ¾ ¼ yields the simplified merit function that is used in [7] ,
where ©´Ý µ Ñ Ü Ý ℄ and ©´Ý µ Ñ Ü Ý ¼℄ ¾ ¾
The optimal point´Ü µ becomes a stationary point of Ä and satisfies the Karush-Kuhn-Tucker (KKT) conditions,
Note that in (6) ¼ ©´ ´Ü µ ¾ µ ¼ and ©´ ´Ü µ ¾ µ ¼. The KKT conditions imply that the projected gradient È´Ö Ü¨ Ä ´Ü µµ ¼,
The reduction in dimension has been paid back by loss of smoothness due to the 'max' function. However, any algorithm can check for ´¾ µ and in general this does not occur at a KKT point [4] . Thus, when locally enough started, pattern search methods will also converge for (6 
The basic method used to solve the optimization problem is the Method of Multipliers (Algorithm 2.1). In step 3 a trust region approach is applied on a re- sponse surface model around Ü´ ½µ to solve the minimization problem (for details see [7] ). In step 4 we keep ´ µ Ñ Ü . This is based on the basic observation made above: in the end we have trust that only the ´ µ should be able to do the work. Note that in step 5 linear convergence of the ´ µ can be improved by applying vector extrapolation techniques. The overall method is derivative free (no gradients of and of are required). Also the method is not sensitive to noise on and of . Convergence of a related (but simpler, because no grid is used) algorithm is discussed in [18] , assuming that ¾ ¾ and that one assures ´Üµ ´¾ µ: error control is on È´Ö Ü¨ Ä ´Ü µµ and on ©´ ´Üµ ¾ µ .
In our case [7] the whole approach is applied on a grid (i.e. all Ü´ µ are projected to the nearest grid point), that subsequently is refined, like in [5] . This prevents the algorithm from going too fast to a small scale (and then gets stuck in a local minimum). However, in practice equally appreciated, is that it also allows to store and re-use expensive parts of ´Üµ and of ´Üµ during the re-building of the response surface model in the optimization process, when or are updated, and also when refining the grid. Our trust region error control is described in [7] . Because we only have approximative gradients from the response surface model approximation, for final convergence, error control is based on ´Ü´ µ µ ´Ü´ ½µ µ ´½ · ´Ü´ µ µ µ (9) Ü´ µ Ü´ ½µ Ü´½ · Ü´ µ µ
The parameters Ü are decreased when refining the grid. At initialization we apply a Uniform Design approach which is based on number theory. Actually this approach is problem independent. Here improvements can be expected by introducing additional techniques, like pattern search methods [11, 12] , that add more information from the problem itself to improve initial starts for the above method. Also Kriging techniques using the DACE regression model [2, 10, 15] can be used in this phase, which additionally offer a more global optimization aspect, when also applied later in the process. [10, 20] maximize the expected improvement for ´ N( × ¾ ). However, experience shows that only a small reduction in the number of overall function evaluations was obtained when including this approach in the start-up phase of our simulator. In this case all parameters were involved. Note that the method becomes expensive when the number of parameters increase. It may make sense to limit the Kriging approach to the parameters for which the augmented Lagrangian varies only moderately.
Discrete optimization
Considerable interest has been shown for discrete optimization methods since continuous optimization methods were well established in the late 1980s. In general, discrete optimization methods are aimed at solving the so-called mixed-discrete nonlinear programming problem, where the term mixed-discrete indicates that both discrete and continuous design variables are present. Current discrete optimization methods can be classified as either deterministic or probabilistic [1] . Probabilistic methods have been applied to solve engineering optimization problems for a long time. The most well known probabilistic methods for both continuous and discrete optimization are Genetic Algorithms [17] and Simulated Annealing [22] . One major advantage of probabilistic methods is their ability to directly deal with discrete design variables. However, these methods are extremely expensive (because of a large number of function evaluations), and thus impractical for the optimization of electronic circuits. Various approaches have been developed to apply deterministic methods for discrete and mixed-discrete optimization problems [6, 19] . The simplest and least expensive method for obtaining a discrete solution is by rounding the continuous solution to the value of the closest discrete values. However, this rounding process can easily result in stagnation and convergence problems since extra local minimas are introduced. Therefore, the applicability of the rounding methods may be limited. The branch and bound method is probably the best known and most frequently used discrete optimization method. The method is based on the sequential analysis of a discrete tree for each variable, the computational cost grows exponentially with the number of design variables. Therefore, the method is also extremely expensive. In our in-house developed method, a dynamic rounding approach was adopted. Design variables are rounded to the value of the closest discrete values at each evaluation step. The approach has the advantage that no extra local minimums are generated. Stagnation problems are avoided by the use of a component-wise trust region method when solving the quadratic problems generated by the algorithm. The component-wise trust region method allows us to select the space where the new design variable candidate should be stamped by forcing the trust radius of each discrete design variable to be larger than a desired value.
Toward Robust Design
The additional types of parameters, the manufacturing process parameters × and operating parameters , introduce additional requests.
The first ones, ×, reflect statistical process fluctuations, like Ì ÓÜ (oxide thickness), and Î Ø ¼ (threshold voltage), substrate doping, characterized by their distribution functions.
The operating parameters include supply voltage Î ×ÙÔ and temperature Ì . Here additional constraints are found that require ´Ü µ ¼ should hold for all within some interval.
The operating parameters imply simple additional inequality constraints that fit the already defined framework. The dependency with respect to ususally is quite smoothly. An interesting test function to consider the effect of operating 7 parameters is
When starting at´Ü ¼µ a local maximum at´Ü ¼µ is easily
found. An even higher value of may be preferred here as a more robust result with respect to when one likes to satisfy ´Ü µ ¼. Here Kriging techniques were of help to improve the result. Note that this is a particular form of robust design. The problem can be simplified by restricting to a discrete set ¾ ¢. In an outer loop the bounds on the constraints can be adaptively updated. New values of can be selected based on the Kriging approach to determine next interpolation points.
Most statistical parameters × appear as transistor model parameters and cannot be modified by the designer if a fully qualified production process is assumed. However, production variation must be taken into account when designing a circuit. For robust design one can think to optimize a combination of and of its (higher) derivatives (À ½ -norm optimization, say), which requires the need to evaluate these derivatives (sensitivities). Also spreads of several 's have to be taken into account. The occurence of process parameters × also makes the functions and In the worst-case analysis in [16] first linearization with respect to is done, and a Ï is derived. Next, this value is used in the linearization with respect to Ü, and a Ü Ï is determined. This process is iteratively repeated in a Gauss-Seidel-like manner. For several problems this algorithm gives reasonable results and yield estimations (as is confirmed by our own experiments). However, the algorithm appears not to be robust on the above constraint function , which gives way for further research. Also, because of linearizations, normality of distributions is an essential ingredient. As IC technologies scale down to finer feature sizes, it becomes increasingly difficult to control the relative process variations, especially when introducing new technologies. Hence large variations can be observed. Furthermore, these variations are different for each chip fabrication location. Depending on the physical design properties of the circuit, the designer can introduce additional correlations to the given distributions, targeting improved circuit performance. Note that with the introduction of the × parameters, one must add additional constraints to the problem formulation. These are called yield constraints and their targets are expressed in terms of probabilistic quantities, typically the number of standard deviations from the mean value. With fundamentally increasing variation magnitudes, non-normality of performance distributions must be taken into account. The APEX algorithm [14] addresses this point. A response surface model of is built that is quadratic in the process variations. The method relies (yet) on the explicit calculation of the stochastic moments of and on AWE-techniques (with their drawbacks) to efficiently approximate the transfer function. Here more robust model order reduction techniques may be applied. A design that is capable of achieving the performance targets under the given constraints, across all so-called environmental corner conditions can be called robust. Robustness of a design starts to play an increasingly important role in modern circuit design, as design margins are decreasing and process variability is increasing. The use of traditional nominal-design optimization techniques is falling short because they tend to give seemingly good results, but these results collapse under process variations. Therefore, automation of the robust design challenge is in high demand.
A Robust Design Example
We now describe a simplified but realistic example of robust design. The purpose of this example is to get a better idea of the intricacies and challenges of optimization applied to circuit design. The design is a high-frequency divide-by-two circuit fabricated in a state-of-theart Complementary Metal-Oxide Semiconductor (CMOS) technology. The primary target of the optimization job is to increase the maximum input frequency at which the circuit still correctly divides by two, i.e. the output frequency is half the input frequency across all specified environmental conditions. Denoting with ´Úµ freq´Úµ, the frequency of a (scalar) signal Ú Ú´Üµ, that depends on parameters Ü , the problem can in terms of (1) be formulated as follows:
Ñ Ò Ñ Þ ´Ú Ò´Ü µµ (12) ×Ù Ø ØÓ ´Ú ÓÙØ´Ü µµ ´Ú Ò´Ü µµ ¼ ¼
In this formulation, the Ü are coupled to the widths of selected transistors in the divider circuit, where typically the number of selected transistors is at least ¾Ò to enforce so-called "matched pairs" of transistors. The lengths of the transistors are kept fixed at a reasonably small value guided by design experience.
The first practical issue that comes into play is how to "measure" the frequency ´Úµ. As the circuit exhibits a non-linear large-signal behavior, we need to use a transient analysis to measure its response. This calls for a robust measurement expression that always gives a meaningful result, even if the simulation does not converge. Devising robust and accurate simulator expressions is a non-trivial task that requires design and simulation tool knowledge. Because we also want to have a robust solution that is inherently insensitive to process variations, we typically resort to using the well-known Monte Carlo (MC) analysis (placed as a loop around the former transient analysis). To limit the overall simulation and optimization time, a limit of ten MC trials per circuit evaluation with a fixed set of Ô values was enforced. Ten MC trials are not sufficient to guarantee so-called sign-off accuracy, but it appears to be sufficient for typical optimization purposes. Another important aspect of applied optimization is the fact that the original objective(s) and constraints typically need to be transformed and tuned into a set of new equations that give more directional information to the optimization algorithm; if this process is done well, most algorithms will usually give better results in fewer iterations. This is also the case for this design example. We add two more constraints to prevent getting an output voltage swing that is too small to be properly detectable by the succeeding circuit. Design-specific knowledge and feedback from the circuit behavior during initial optimization runs is normally used to refine the optimization setup further. The complete optimization problem is:
Ñ Ò Ñ Þ ´Ú Ò´Ü × µµ (13) ×Ù Ø ØÓ ´Ú ÓÙØ´Ü × µµ ´Ú Ò´Ü × µµ ¼ (14) ´Ú ÓÙØ´Ü × µµ ´Ú Ò´Ü × µµ ¼ (15) Ñ Ò Ø´ÚÓÙØ´Ü × µ´Øµµ ¼ (16) Ñ Ü Ø´ÚÓÙØ´Ü × µ´Øµµ ¼ ½ (17) Ü ½ Ò Solving this problem with the optimization algorithm described in this paper required about 200 iterations, depending on the stopping criterion and other tuning parameters. In each iteration the algorithm evaluates the circuit performance by running a 10-trial MC simulation in which a transient analysis is embedded. The overall throughput time is the number of iterations times the time required for a full circuit evaluation; approximately 15 hours on a contemporary Linux system with a single CPU. From this information one can directly derive the average CPU time per 10-trial MC simulation: about 5 minutes. Note also that the time required for an MC simulation can become prohibitive quite quickly. This has motivated recent research in this area with the goal of finding alternative methods to efficiently simulate performance variability under process parameter variability, e.g. [14] . Table 1 shows the optimization results in tabular format. The information on the optimization variables Ü is given in Table 2 . To illustrate the increase in robust- ness of the divider circuit we show the results of a 100-trial MC simulation before and after optimization. Fig. 1(a) shows the unoptimized circuit performance at the maximized input frequency of 17.1GHz (from 10.5GHz). All graphs represent histograms; in typewriter order the graphs show bin count versus output frequency, safeguarded frequency ratio (with voltage thresholds), raw frequency ratio, minimum "high" output voltage, maximum "low" output voltage, and average power dissipation. Fig. 1(b) shows the performance of the divider circuit after optimization. From these plots we can draw the following conclusions:
The performance of the divider circuit has been improved by increasing the maximum input frequency by more than 60%, with high robustness;
Statistical he optimized divider circuit can operate equally well at a maximized input frequency of 17.1GHz, with only a relatively small increase in power dissipation compared to 10.5GHz operation; Attaining increased robustness did not deteriorate any other metric.
