Abstract: This study proposes a gradient-free approach to integrated circuit sizing that takes into account the statistical variations of device parameters and ranges of operating conditions. A novel gradient-free algorithm for solving the worst-case performance problem is proposed. The proposed algorithm produces corners that are used in the optimisation loop of the circuit sizing process. The set of corners is dynamically updated during circuit sizing. The number of corners is kept low by considering only corners that are sufficiently unique. The final result is a circuit exhibiting the specified parametric yield. The proposed approach was tested on several circuits and the results were verified with Monte -Carlo analysis and worst-case distances. All resulting circuits were obtained in up to 12 h on a single processor and exhibited the specified yield. The method can easily be parallelised to an extent that can bring the runtime of the method in the range of an hour or less.
Introduction
The importance of design automation in the area of integrated circuit (IC) design is constantly increasing. The design automation for analogue circuits is still not developed to the extent it has reached in the area of digital circuits. Therefore new methods are needed to remove the bottleneck in the design of analogue and mixed signal circuits. One of the essential tasks in this area is obtaining IC designs with high parametric yield (in the remainder of the paper also referred to as the yield), that is, designs that result in a high percentage of manufactured circuits satisfying all design requirements. Parametric yield does not take into account catastrophic faults. This paper considers circuit sizing, that is, it assumes that the circuit topology is chosen in advance using designer's experience. By adjusting the design parameters of a preselected topology one attempts to achieve a satisfactory yield. The main cause for small yield are the random variations of the electrical properties of identically designed devices. The most problematic of these variations is the mismatch [1 -3] .
Several analogue design automation approaches were suggested in the past. Some of them dealt only with nominal design and neglected the issue of yield [4, 5] . More advanced techniques also automated the design of circuits with high yield [6 -12] . Older approaches assumed that statistical variations are applied to design parameters only [6] which is not the case for ICs. Methods that use response surface modelling (e.g. [8, 10] ) suffer from model accuracy problems. Often a linear model is used (e.g. [10] ) which is inadequate for describing the behaviour of many important circuit properties (like mismatch-caused offset voltage). Performance models are almost always local by nature and cannot be used with global optimisation methods without frequently rebuilding the model.
The prerequisite for a successful yield optimisation is a reliable and accurate worst-case analysis, either in the form of worst-case performance (e.g. [9] ) or worst-case distance (WCD) (e.g. [7, 10] ). The main shortcoming of approaches based on [7, 11, 12] is that the worst-case analysis must be performed for every iteration of the yield optimisation process. Owing to the complexity of the worst-case analysis, this leaves little space for anything else but a local optimisation method for sizing the circuit (usually a trust region method with a linear model like in [11, 12] ). Modelbased approaches do not analyse real circuits in the course of optimisation and so the final optimisation result must be verified with a circuit analysis to make sure it is correct.
The worst case can be obtained using global optimisation methods (e.g. [9] ). This approach is formally the most correct one. Unfortunately, global optimisation is computationally much more intensive than local optimisation. The results published in [9] show that the global approach produces a circuit with a relatively low yield (around 50%). This circuit is then used as a good starting point for a local yield optimisation approach like the one found in [10] . Owing to the method that is used simulated annealing (SA), [9] is a very time-consuming solution.
Gradient-based local algorithms are often used for solving the worst-case problem (e.g. [7, 11] ) despite the fact that most simulators are incapable of calculating sensitivities. Such approaches require the numerical evaluation of performance gradients from circuit performances which are often subject to a significant numerical error introduced by the simulator. Results on real-world circuits (e.g. [11 -13] ) have shown that a local worst-case evaluation method combined with sizing rules [14] and a local yield optimisation algorithm [15] produces good results despite the local nature of the underlying optimisation algorithms. This was explained with the fact that most circuit performances are only weakly non-linear in regions where sizing rules are satisfied.
An approach that requires no gradients or costly global methods for finding the worst-case performance is proposed in this paper. Response surface modelling is not used by the approach and so problems with model accuracy become irrelevant. The algorithm for finding the worst-case performance belongs to the family of direct search methods [16] . The proposed algorithm for finding the worst-case performance can be efficiently included in a yield optimisation process in a way similar to the one used in [9] . The resulting yield optimisation approach is tested on several IC cells and produces circuits with high yield (99%). In the process of yield optimisation a local or a global optimisation method can be used for sizing the circuit.
The remainder of the paper is structured as follows: Section 2 introduces the basic notions. Section 3 highlights the relation between WCDs [7] and yield whereas Section 4 establishes the relation between yield and worst-case performance. Sections 5 and 6 propose an approach to worst-case performance evaluation and apply it to yield optimisation. The proposed approach is evaluated on several test problems and verified in Section 7. Section 8 concludes the paper. 
Circuit performance and yield
The performance of an analogue IC can be described as a set of performance measures (e.g. current consumption, gain, phase margin, delay, etc.). In real-world circuits, these values depend on three kinds of parameters [17] : design parameters (x D ), range parameters (x R ) and statistical parameters (x S ). A performance measure f i is actually a function f i (x D , x R , x S ). Let n D , n R and n S denote the number of design, range and statistical parameters, respectively.
The circuit's performance is satisfactory if all its m performance measures satisfy their respective design requirements. A design requirement is of the form f i ≤ F i or f i ≥ F i where F i is the ith target value. For the sake of simplicity only f i ≥ F i is considered. f i ≤ F i can be transformed into f i ≥ F i by replacing f i and F i with −f i and −F i .
Range parameter values come from a bounded set R which can be expressed as x R [ R. Some of the most common range parameters that describe the operating conditions of a circuit are temperature, supply voltage, bias current, etc. Usually R is a cross-product of n R intervals where lowerand upper-interval bounds are specified by components of vectors L and H, respectively. Every interval corresponds to one range parameter. Nominal range parameter values are denoted by x N R . Statistical parameters (x S ) model the variations of the manufacturing process. They are usually described by a joint probability distribution function (JPDF) and generally come from a set that is not bounded. A very common JPDF is the multivariate normal distribution
where
S is the vector of mean statistical parameter values (also referred to as the nominal statistical parameters) and C denotes the covariance matrix. If the JPDF describing x S is continuous but not normal, it can be made normal by applying an appropriate transformation to x S . For modern ICs this transformation depends on the design parameters x D . One can always transform such JPDFs into a normal distribution with x N S = 0 and C ¼ I where I is the identity matrix. Therefore without loss of generality the latter distribution is assumed throughout the remainder of this paper. A yield partition is defined as
where h i (x D , x S ) is the indicator function of the ith performance measure (i.e. it is 1 if
. The job of a circuit designer is to choose the design parameters (x D ) in such manner that the circuit's total yield will be as large as possible.
Yield estimation and WCDs
The absolute value of the WCD b
is defined as the shortest distance between the origin (x N S ) and the contour C defined as min [7] . C represents the boundary of the acceptance region corresponding to Y i (shaded in dark grey in Fig. 1 ) which is the part of the space of statistical parameters for which the indicator function h i (x D , x S ) is equal to 1. The WCD is positive if
≥ F i holds and negative otherwise. If the acceptance region boundary is approximated with a halfspace tangent to C at the point closest to the origin, the yield partition suffers an error (acceptance region approximation error) denoted by the hashed region in Fig. 1 . This error proves to be reasonably small in IC design [17] . A yield partition can be approximated as [17] 
As all yield partitions Y i approach 1 the total yield Y also approaches 1. The yield can therefore be maximised by maximising the smallest yield partition which is equivalent to maximising the smallest WCD. Maximisation of WCDs pushes the boundaries of the acceptance region away from the origin. Owing to the nature of the underlying optimisation problem, the computation of WCD requires a gradient-based modified SQP (sequential quadratic programming) algorithm [7, 17, 18] .
4 Worst-case performance and its relation to yield
Instead of maximising the smallest WCD, let us require that WCDs be greater than some b WPM . 0. This is equivalent to requiring that the worst performance inside the sphere b ≤ b WPM satisfies the design requirements. To optimise the yield by improving worst performance measures (WPMs) an algorithm for calculating the worst-case performance is needed. Finding the WPM is a much simpler problem than the computation of the WCD. The WPM problem can be stated as
Ensuring that the worst-case performance inside the sphere b ≤ b WPM satisfies the design requirements is equivalent to ensuring that the WCD is greater than b WPM . This implies (within acceptance region approximation error) that the yield partition satisfies 
Solving the worst-case performance problem
Because all constraints in the WPM problem are simple (e.g. box constraints on the range parameters and a spherical constraint on the statistical parameters), it can be solved by adapting an unconstrained direct search method to the nature of these constraints. The proposed algorithm draws its inspiration from the Hooke-Jeeves method [19] . The method uses two types of steps. Trial steps search for a lower value of the function f in the neighbourhood of the current point. If a better point is found, the second part performs a speculative step which is supposed to speed up the search. The size of this step is determined by the a parameter.
The search steps in the space of range parameters are identical to Hooke-Jeeves steps, whereas the search steps in the space of statistical parameters are rotations and radial steps. This makes sure that the steps conform to the boundary of the search space. The step acceptance criterion in the space of statistical parameters requires sufficient descent (i.e. a sufficiently large decrease of the function that is subject to minimisation) [20] .
Let S 1 and S 2 denote the space of statistical and range parameters, respectively. The WPM problem for finding the set of statistical and range parameters at which worst-case performance occurs can be stated mathematically as
For x [ S 1 × S 2 vectors x S and x R represent the components of x corresponding to S 1 and S 2 , respectively. Vector x can be written as [x S , x R ] where x S and x R are vectors with n S and n R components, respectively. Basis vectors for S 1 and S 2 are denoted by e i and b i , respectively. All basis vectors are mutually orthogonal with unit length ( e i = 1, b i = 1). Every basis vector corresponds to one statistical or one range parameter. Fig. 3 depicts rotations (w) and radial steps (R) in S 1 and coordinate steps (F) in S 2 . Rotations enable the algorithm to move on the surface of a hypersphere defined by
e ′ x S sin w x S and e linearly independent x S ; otherwise where e ′ is obtained as
Radial steps move between hyperspheres in S 1 with different radii
The introduction of b min prevents x S from becoming 0. Radial steps that result in a point inside the hypersphere with radius b min are modified so that the resulting point lies on the surface of this hypersphere.
The following definitions simplify the description of the algorithm. The step size is defined as
where w, R and F denote the size of the rotational, radial and coordinate steps, respectively. i denotes the index assigned to an individual step. A rotational step is successful if it produces sufficient descent (g). Successful radial steps and successful coordinate steps must produce simple descent (i.e. f (x temp ) , f T ). d i denotes the amount of descent that is sufficient for ith step to be considered successful.
The actual steps (P(x, D, i)) are defined as
Constraints (7) and (8) can be violated by certain steps. For this purpose, a vector-valued function V(x) is defined which moves a point in such a manner that the constraints are satisfied again.
Let x 0 denote the initial point. The proposed algorithm for solving the WPM problem can now be represented by Fig. 4 .
The upper index of x represents different points that are calculated by the algorithm. x T and x B represent the trial point and the best point while the corresponding values of f are denoted by f T and f B , respectively. The point produced by the speculative step is denoted by x SP . The lower index of vector x denotes the sub-vector of statistical parameters (x S ) and range parameters (x R
If n S = 0 (or n R = 0), the sequence of indices traversed by the for statement is 2, 3, . . . , n R + 1 (or 1, 2, . . . , n S , n S + 1). For n R = 0 the condition in the 'if' statement that triggers the 'break' changes from i = n S + 2 to i ¼ 1.
The implementation used the following initial parameter values: Despite the fact that the algorithm is local, it is expected to be adequate because the modified SQP approach to the WCD problem in [7, 17] is also local but still produces good results on real-world IC designs. The optimisation (step A) starts with initial point x D . The cost function for optimisation is constructed in such a manner that if measure f i fails to satisfy f i ≥ F i in any of the corners from C i , a penalty proportional to the violation is added to the cost function [21] . The optimisation stops when the cost function reaches 0 (i.e. when all design requirements are fulfilled across all corresponding corners).
Direct search yield optimisation
After the optimisation is finished, the obtained design (x D ) is checked for new corners (i.e. the worst-case performance 
The last inequality must hold for all i = 1, . . . , n R . The algorithm gradually builds the set of corners corresponding to individual performance measures. It is finished when the worst-case performance satisfies all design requirements. At that point, final yield partitions satisfy (5). The WPM algorithm requires no derivatives for its operation. If a direct search optimisation algorithm is used in step A the yield optimisation algorithm does not require the evaluation of derivatives.
Examples
The proposed WPM-based yield optimisation approach was demonstrated on four different IC cells: an operational amplifier (OPAMP) [22] , a bandgap reference circuit (BGR) [23] , a beta multiplier reference (BMR) [22] and a comparator (COMPAR) [22] . The target yield was 99.87% (b WPM = 3). Two range parameters were common to all four circuits: temperature (220-808C) and supply voltage (1.6 -2.0 V). The OPAMP had an additional range parameter (bias current) ranging from 80 to 120 mA. Four parameters of every MOS transistor were affected by mismatch: threshold voltage, channel width reduction, channel length reduction and oxide thickness. The vector of these parameters for all transistors in a circuit is denoted by p and can be expressed as The first tested circuit is the BMR circuit in Fig. 6 . This is a current source that pushes the current (I REF ) through resistor R. This current is fairly stable with respect to temperature and supply voltage variations. One can mirror the reference current using gate voltages V BIASP and V BIASN . This circuit has nine design parameters (n D = 9): the resistance of R and the length and the width of all MOS transistors excluding the start-up circuit (M SU1 , M SU2 and M SU3 ). Several transistors must have matching channel dimensions (i.e. M 1 and M 2 , M 3 and M 4 , M A1 and M A2 and M A3 and M A4 ) and so MOS transistors contribute only eight design parameters instead of 16.
During the optimisation, two range parameters (n R = 2) and 32 statistical parameters (n S = 32) were considered. The range parameters were the temperature (220-808C) and the supply voltage (1.6 -2.0 V). The nominal values of range parameters were 278C and 1.8 V, respectively. The 32 statistical parameters were contributed by transistors . The goals for these measures were V ds − V th . 0 and V ds − V dsat . 0. In [14] , such measures are referred to as sizing rules. The sizing rules were not subject to WPM analysis and yield optimisation.
The remaining three examples are set up in a way similar to the BMR. The OPAMP is depicted in Fig. 7 . It has 32 statistical (four for every transistor with the exception of M N 1S and M P1S ) and three range parameters. The bandgap reference circuit on Fig. 8 is designed for low-power CMOS and is based on resistive subdivision. It has 40 statistical parameters. Transistors M 4 , M 5 and M 6 comprise the start up circuit and contribute no statistical parameters. In the COMPAR circuit (Fig. 9) , variations of statistical parameters cause a significant increase of hysteresis. The BMR circuit (see Fig. 6 ) was used as a source of reference current for the COMPAR. The complete COMPAR has 108 statistical parameters (32 because of the BMR and 76 because of the COMPAR).
The initial set of design parameters x 0 D for every circuit was obtained with the parallel simulated annealing with differential evolution (PSADE) global optimisation method by considering only the nominal corner. PSADE is a parallel version of the DESA global optimisation algorithm [27] , which performs well on various mathematical and IC optimisation problems. The initial point for all four circuits was obtained in a few minutes on a parallel computer system of five 3 GHz dual-core computers. The initial point obtained with such a global method is a good starting point for the direct search yield optimisation algorithm presented in the previous section.
Modified Box simplex algorithm [28] was used for the optimisation in step A. We used this local optimisation method because experiments have shown that with a good starting point (one that satisfies the design requirements in the nominal corner) it produces results that are as good as those obtained with the PSADE algorithm, except that the Box simplex algorithm is faster. The results of yield optimisation are listed in Tables 1 -4 . The first three columns of Table 1 The final results were verified by Monte -Carlo (MC) analysis of 10 000 random samples in the space of statistical parameters. For every sample, a WPM analysis with n S = 0 was performed resulting in worst-case performances over range parameters for the given set of statistical parameters. The WPM analysis results were used to obtain the actual yield partitions (Y i ) (last column of Table 2 ). The WCDs for the final results were also computed and are listed in the second column of Table 2 . The table lists only those circuit performance measures for which the WCD is below six. It can be seen that no WCD was below three which once again confirms the final yield partitions. The yield partition corresponding to betamultiplier temperature dependence (99.85%) was slightly lower than the target yield (99.87%). This can be attributed to the fact that the circuit optimisation algorithm was stopped as soon as all worst-case performances satisfied their respective goals. Owing to the numerical errors caused by the simulator, some of them were slightly below the goal resulting in a slightly lower yield partition than expected. Table 3 lists the number of optimisation variables considered in the yield optimisation process and in the proposed WPM algorithm. The number of optimisation variables in the yield optimisation process is equal to the number of design parameters (n D ). On the other hand, the number of optimisation variables in the proposed WPM algorithm is equal to the sum of the number of range and statistical parameters (n R + n S ).
The yield optimisation was not parallelised and was executed on a single 3 GHz Intel processor. One analysis in one corner was considered as one circuit simulation. For every set of design parameters examined by the optimiser, a performance measure contributed as many simulations as there were corners in its corresponding C i . With this in mind, the cost evaluation can be parallelised to great extent. For example, in a problem with five analyses where every analysis must be performed across three corners, a total of 15 simulations is needed. If this problem is run on a parallel system of 15 or more processors, then the time needed for performing 15 simulations is equal to the time of the longest simulation and some additional time arising from communication overhead. In the same way, the WPM evaluations of individual performances can also be run in parallel for all measures at once. This has the potential of speeding up the algorithm proportionally to the number of performance measures.
The comparison of the proposed algorithm with the approach presented in [9] is shown in Table 4 . The main difference between the two approaches is in the calculations of the WPM. In our approach, the algorithm presented in Section 5 is used, while [9] uses SA. The starting point and the initial circuit optimisation in the nominal corner were identical for both cases.
The second and fifth columns in Table 4 list the final yield obtained with the proposed method and the method presented in [9] . The final yields were obtained in the same manner as the yield partitions in Table 2 (10 000 sample MC analysis). The final yield of the circuits obtained with the proposed approach is close to the desired yield (99.87%). On the other hand the circuits obtained with the approach from [9] (with the exception of the BGR and BMR) exhibit a significantly lower yield. For these circuits a further yield optimisation step (e.g. using the LPP method [10] , as noted in [9] ) is required to reach the desired yield.
We tried to tighten the stopping criterion of SA in the hope of obtaining final yields that would be closer to the desired yield. The result was an improvement of the final yields (which unfortunately were still lower than the desired yield), but the algorithm runtimes increased by an order of magnitude.
The proposed method is also faster than [9] , both in terms of the number of simulations (n sim ) as in terms of runtime (T ). The average runtime of the algorithm per simulation is longer for SA than for the proposed algorithm. This can be attributed to the fact that because of its random nature the SA used in [9] generates a large number of circuits with extremely bad performance. Such circuits usually result in long transient analysis runtimes.
Conclusion
An approach to circuit sizing that takes yield into account was proposed. The approach is direct search based and does not require the evaluation of derivatives (sensitivities). It utilises a novel direct search approach for worst-case performance evaluation and a standard direct search optimisation algorithm (Box simplex algorithm) for circuit sizing. Algorithm runs on four examples of IC cell design have shown that the desired yield (i.e. 99.87%) can be achieved in hours on a single processor. The final yield was verified by simulating 10 000 MC samples and evaluating the final design's WCDs. Both tests confirmed the results. The algorithm can easily be parallelised. The expected maximal speed up is proportional to the number of performance measures subject to optimisation.
