In combinational blocks of static CMOS circuits, transistor sizing can be applied for delay balancing so as to guarantee synchronously arriving signal slopes at the inputs of logic gates, thereby avoiding glitches. Since the delay of a logic gate depends directly on its transistor sizes, varying them allows different path delays to be equalized without influencing the total delay of the circuit. Unfortunately, not only the delay but also the power consumption of the circuit depends on the transistor sizes. To achieve optimal results, transistor lengths have to be increased, which results in both increased gate capacitances and increased area. Splitting the long transistors counteracts this negative influence.
Introduction
Optimal sizing of MOS transistors is a widely investigated method for the design of CMOS circuits with restricted area, propagation delay or power consumption. A large number of the previously published approaches aim at area and power optimization under given delay constraints [1, 2, 4]. Optimal area utilization is still important, but with the substantial progress in deep submicron technologies, power dissipation has become the main limiting factor. The power consumption models often include only the power consumed for charging transistor gate and drain/source capacitances. The power models in [1] also include the dissipation caused by short-circuit currents that occur during a transition when both the P- and N-transistors of a CMOS stage are conducting. Delay balancing for reducing the glitching activity in combinational Wallace trees and array multipliers has been introduced in [5].
Unlike most approaches, which focus on maximizing the speed of a circuit by varying transistor widths, the method presented here also allows the transistor lengths to vary.
Reducing speed for delay balancing is only allowed for parts of the circuit that are not in the critical path. In [6] a method is presented, where all transistor widths outside the critical path are reduced in order to reduce the total capacitance of the circuit. However, delay balancing may not be possible if only the widths are variable because the limit here is the minimum feature size. Further speed reduction can then be achieved by increasing the transistor length.
In order to keep track of the conflicting design objectives like increasing transistor sizes for delay balancing, and at the same time reducing the total power consumption caused by charging capacitances, the method is formulated as a multiobjective optimization problem. Furthermore, by introduction of changes to the topology of the circuit where possible, the reduction of gate capacitances can be achieved. This makes further decrease in power dissipation possible.
In the following we consider circuits in which increasing transistor lengths is necessary.
Decreasing W to make a gate slower, which is the usual approach, results in smaller area and lower power consumption. Increasing L, by contrast, also yields slower gates but affects both area and power dissipation negatively. Thus, increasing L represents the worst-case approach to transistor sizing. Therefore, the power savings presented here reflect only the benefits of a delay-balanced circuit due to reduced glitch activity. Of course GliMATS, a program that has been implemented for automated circuit optimization, is not limited by this artificial constraint.
Delay and Power Models
The delay and power models used for the transistor sizing method presented here are defined at gate level. Although transistor level models may offer more degrees of freedom and allow individual sizing of each single transistor, it turns out to be more desirable to have a low dimensional optimization problem in order to be able to optimize larger circuits within acceptable computation time. When modeling a circuit at gate level (macromodeling), the relatively large number of local parameters that describe every single transistor is reduced to a set of scale factors for each gate. In the considered case the number of variables is reduced to one specific W and one specific L for each gate. If W and/or L are varied, all transistor widths and/or lengths within the gate are scaled by the same factor simultaneously.
Delay Model
The macromodel delay has to be described for each type of logic gate separately. In the following, the delay model is derived, as an example, for a 2-input CMOS AND gate. The generalization to any other logic gate type is straightforward. The delay of a gate at position m can be split up into two parts [3, 4]: the step response delay $\tau_{s,m}$, which is independent of the input signal form, and $\tau_{in,m}$, which is the contribution caused by the finite input signal rise and fall times. The total delay $\tau_m$ is then approximated by

$$\tau_m = \tau_{in,m} + \tau_{s,m}. \qquad (1)$$
The optimization method aims at the minimization of glitches, which necessitates equalizing all path delays. However, the step response delay $\tau_{s,m}$ depends on the input transition. For example, the step response delay of a 2-input AND gate in 0.25 µm technology with a certain load is 0.4 ns for the input transition 11 → 00 and 0.75 ns for 00 → 11. Therefore, the different paths can be balanced exactly for one particular transition only. Experiments have shown that the worst-case delay is a good choice and is easy to formulate in the model.
and C g n ;m = C g;m ; C g p ;m = C g;m :
All components can be formulated as functions of the channel length and width:
and
where $r$, $c_{d/s}$ and $c_g$ denote process-dependent constants. The load capacitance depends on the transistor sizes at position $m+1$ and the number of input transistor gates $\nu_{m+1}$ at position $m+1$, such that $C_{L,m} = \nu_{m+1}\, c_g\, W_{m+1} L_{m+1}$.
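The load-capacitance relation above can be illustrated with a minimal Python sketch. The function name, the value of the process constant and the transistor sizes are hypothetical and chosen only to show how the load scales with the length of the driven gate.

```python
def load_capacitance(n_input_gates, c_g, width, length):
    """Load seen by gate m: the number of driven transistor gates at
    position m+1 times c_g times W and L of the driven gate."""
    return n_input_gates * c_g * width * length


# Hypothetical values: c_g in fF/um^2, W and L in um.
C_G = 5.0
c_load_min = load_capacitance(2, C_G, width=1.0, length=0.25)
c_load_long = load_capacitance(2, C_G, width=1.0, length=0.50)

# Doubling L of the driven gate doubles the load of the driving gate.
print(c_load_min, c_load_long)
```

This is why lengthening a gate for delay balancing also slows down and loads its predecessor, which the optimization has to take into account.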
For the 2-input AND gate follows 
For details see [4] . The input dependent delay can then be written as: where all technology dependent parameters are merged in the constant K. With equations (1), (9) and (12) the total gate delay is given by:
All technology-dependent parameters are merged in the constants $k_1, \dots, k_6$. The total delay $\tau_\pi$ of path number $\pi$ is the sum over all gate delays in this path:
where n is the number of gates in the path. 
Power Consumption Model
With the objective function (14) only the delay can be considered in an optimization procedure so far. In order to take into account also the transistor size dependency of the short-circuit currents and the total capacitance of a circuit, an objective function for the power consumption of gate number m can be formulated as follows:
$$P_m = P_{m,cap} + P_{m,sc},$$
where $P_{m,cap}$ denotes the power consumed for charging the gate and drain/source capacitances and $P_{m,sc}$ denotes the short-circuit power consumption of gate m. For a 2-input AND, $P_{m,cap}$ can be written as: 
Factors $\alpha_1$, $\alpha_2$, and $\alpha_3$ denote the switching activities at nodes 1, 2, and 3 respectively (see the numbers in circles in Fig. 1), which are considered as constants over W and L.
The short-circuit power dissipation of a CMOS inverter according to [7] is given by: 
It can be shown experimentally that neglecting the contribution of $P_{m,sc}$ has no negative influence on the results of the path balancing method, even for complex gates. Therefore, it is reasonable to set $P_m = P_{m,cap}$. The total transistor-size-dependent power consumption $P_\pi$ in path number $\pi$ can be formulated as:
for a path with n gates.
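As a rough illustration of a capacitive path power model, the following Python sketch sums per-gate charging power over a path. It assumes the standard expression P = activity · C · V_dd² · f for dynamic power and a node capacitance proportional to W · L; the gate description, the constants and all numerical values are hypothetical and do not reproduce the gate-specific expression of this paper.

```python
def gate_cap_power(gate, c_g, vdd, freq):
    """Charging power of one gate, assuming P = activity * C * Vdd^2 * f
    with C = n_gates * c_g * W * L (simplified, hypothetical model)."""
    cap = gate["n_gates"] * c_g * gate["width"] * gate["length"]
    return gate["activity"] * cap * vdd ** 2 * freq


def path_power(gates, c_g, vdd, freq):
    """Total capacitive power of a path: sum over its gates."""
    return sum(gate_cap_power(g, c_g, vdd, freq) for g in gates)


C_G = 5e-15    # F/um^2 (hypothetical)
VDD = 2.5      # V
FREQ = 100e6   # Hz

path = [
    {"width": 1.0, "length": 0.25, "activity": 0.5, "n_gates": 2},
    {"width": 1.0, "length": 0.25, "activity": 0.3, "n_gates": 2},
]
# Same path with every channel length doubled for delay balancing.
stretched = [dict(g, length=2.0 * g["length"]) for g in path]

p_base = path_power(path, C_G, VDD, FREQ)
p_stretched = path_power(stretched, C_G, VDD, FREQ)
print(p_base, p_stretched)
```

In this simplified model the power grows linearly with L, which is the cost that the optimization has to weigh against the benefit of balanced paths.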
Multiobjective Optimization
In order to find a power-optimal solution for W and L, the designer is confronted with two conflicting design criteria: path balancing by transistor sizing, which is achieved by enlarging transistors, and low power consumption for charging capacitances, which requires small transistors. This problem can usually be solved with a non-linear programming method. A common approach is to keep one of the design criteria within upper and lower bounds and to find an optimal solution for the other one under these restrictions. The difficulty is to determine the upper and lower bounds if they are not known in advance; poorly chosen bounds may result in an unsolvable optimization problem. Therefore, instead of optimizing each single criterion while restricting all the others, the weighted sum of all design criteria is optimized [4].
In order to equalize all the path delays with respect to the critical path, every path requires 
Equations (23) and (24) describe convex optimization problems in W and L. The multiobjective optimization problem is given by:

$$\min_{W,L} S = w\,(\tau_\pi - \tau_{crit})^2 + (1 - w)\,P_\pi,$$

where $\tau_\pi$ and $P_\pi$ are the delay and the power consumption of the path $\pi$ being balanced and $\tau_{crit}$ is the critical path delay.
The weight factor w varies between 0 and 1, $w \in [0, 1]$. The results of the optimization are largely independent of the choice of w; only values extremely close to 0 or 1 influence the result.
In order to obtain a cost function that is differentiable everywhere, $|\tau_\pi - \tau_{crit}|$ is replaced by its square. The upper and lower bounds of the transistor sizes are determined by the minimum feature size of the technology used and by the user-defined limits for the maximum available area for a single transistor.
Assigning a value to w allows a solution to be chosen depending on which of the design objectives is more desired: low power consumption caused by the total capacitive load or balanced path delays. However, experiments have shown that for many circuits the best low-power solution is obtained if $|\tau_\pi - \tau_{crit}| = 0$, i.e. for optimally balanced paths. This is usually the case for $w = 0.5 \dots 1$.
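The weighted-sum formulation can be tried out on a toy problem. The Python sketch below minimizes a weighted sum of the squared delay mismatch and the power for a single gate with one free variable L. The linear delay and power models, all constants, and the use of a ternary search are assumptions made for illustration only; they stand in for the gate models and the non-linear programming method of the paper.

```python
def minimize_convex(f, lo, hi, tol=1e-9):
    """Ternary search for the minimizer of a unimodal function on [lo, hi]."""
    while hi - lo > tol:
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if f(m1) < f(m2):
            hi = m2
        else:
            lo = m1
    return 0.5 * (lo + hi)


# Hypothetical single-gate path: delay and power grow linearly with L.
TAU_0, K_TAU = 0.2, 1.0   # ns, ns/um
TAU_CRIT = 0.75           # ns, delay of the critical path
P_PER_UM = 0.01           # normalized power per um of channel length
L_MIN, L_MAX = 0.25, 2.0  # um, bounds from feature size and area limit


def cost(length, w):
    """Weighted sum of squared delay mismatch and capacitive power."""
    tau = TAU_0 + K_TAU * length
    power = P_PER_UM * length
    return w * (tau - TAU_CRIT) ** 2 + (1.0 - w) * power


def best_length(w):
    return minimize_convex(lambda length: cost(length, w), L_MIN, L_MAX)


for w in (0.1, 0.5, 0.9):
    print(w, round(best_length(w), 4))
```

In this toy model the exactly balanced length is 0.55 µm; the power term pulls the optimum slightly below that value, and the pull weakens as w approaches 1.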
Minimizing Gate Capacitances
As mentioned before, this paper considers the case in which transistors are made longer. This increases both the channel resistance of the transistor and its gate capacitance. In the following we present two alternative ways of reducing this negative influence.
"Twin-Transistors"
So far, the channel resistance as well as the gate capacitance are proportional to the channel length. The delay, in turn, is proportional to both the channel resistance and the gate capacitance. To increase the channel resistance without increasing the gate capacitance, one has to be able to change them independently of each other. This is possible if the capacitance and the resistance are no longer part of one common transistor. To achieve this, the common transistor can be split into two: the resistance is assigned to one of them, the capacitance to the other. The goal is to make the capacitance as small as possible, so the reasonable approach is to give its transistor the minimum feature size. This transistor is responsible for switching. The length of the other transistor has to be dimensioned such that the given delay constraint is satisfied. The gate capacitance of this second transistor is, of course, increased, but this no longer matters, since its gate can be hard-wired to the supply voltage. Thus, it has no influence on dynamic power dissipation. By splitting the transistor into two, both goals are achieved: despite the increased resistance, the gate capacitance can be kept minimal. The topology of "Twin-Transistors" is shown in Fig. 3.
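The effect of the split on the switched gate capacitance can be estimated with a small Python sketch. It assumes, as a first-order simplification, that the two series transistors together must provide the same channel length (and hence resistance) as the original long transistor, and that only the minimum-length switching transistor contributes switched gate capacitance. The function names and numbers are hypothetical; in practice the length of the second transistor follows from the delay constraint.

```python
def switched_gate_cap(c_g, width, length):
    """Gate capacitance that is charged on every input transition."""
    return c_g * width * length


def twin_lengths(target_length, l_min):
    """Split one long transistor into a minimum-length switching device
    and a series device whose gate is tied to the supply (assumption:
    the total channel length, and thus the resistance, is preserved)."""
    if target_length < l_min:
        raise ValueError("target length below minimum feature size")
    return l_min, target_length - l_min


C_G, WIDTH, L_MIN = 5.0, 1.0, 0.25  # fF/um^2, um, um (hypothetical)
L_TARGET = 1.0                      # um, length required for balancing

single_cap = switched_gate_cap(C_G, WIDTH, L_TARGET)
l_switch, l_series = twin_lengths(L_TARGET, L_MIN)
twin_cap = switched_gate_cap(C_G, WIDTH, l_switch)

# Ratio of switched gate capacitance: twin versus single long transistor.
print(single_cap, twin_cap, twin_cap / single_cap)
```

The sketch ignores that a smaller gate capacitance also shortens the delay of the driving gate, so the series transistor may have to be somewhat longer than computed here.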
"Merged-Transistors"
Introducing "Twin-Transistors" doubles the number of devices in the gate. Even if they can be placed in an area-saving way, the area taken, including the additional wiring, is almost doubled.
Within one block, however, the transistors responsible for the increased delay can be merged. This considerably influences the data dependency of the gate delay: the range in which the delay varies becomes smaller and moves towards the worst-case delay. This is advantageous for the purpose of optimization. The topology of "Merged-Transistors" is shown in Fig. 4. The changes made to the gate topology influence both the power savings and the area increase.
Numerous simulations have shown that in combinational blocks of static CMOS circuits 50% to 90% of the power is dissipated due to glitch activity. For further considerations we assume a mean value of 70%. In the following we estimate how much power could be saved if glitching were eliminated completely. Let us consider "Twin-Transistors" first. For all additional transistors that are not connected to the power supply, additional drain/source capacitances of about 25% have to be taken into account. This increases power dissipation by 12-17%. Thus, the power that can be saved drops to 65%. With "Merged-Transistors" there are no additional drain or source capacitances. A gate that has been modified in this manner does not dissipate more switching power than a usual one. In theory, all 70% could be saved.
The area increase is significant. A minimum-size "Twin-Transistor" itself needs about 66% more area than a usual one. This number increases with the transistor length. The resulting average area increase is about 77%. Additional wiring could require even more space. For "Merged-Transistors" the number of additional transistors is significantly lower, but the ones used can be very long. The wiring is much less costly than in the case of "Twin-Transistors" and is comparable to that of a standard gate.
Balancing of the Path Delays
In order to reduce the glitches in a combinational circuit, it must be guaranteed that the signal slopes at a gate's inputs arrive synchronously. This can be achieved by path balancing, i.e. by slowing down fast paths outside the critical path such that the total delay of the circuit is not affected.
In general the delay of a gate depends on the input transition. Therefore, exact path balancing at gate level is only possible for one special case of transition. As mentioned in Section 2.1, the delay which shall be synchronized with the method presented here is the worst case delay.
For all the other transitions that cause different delays, the paths will then be only approximately balanced. Circuit optimization for maximum speed under given constraints can be included but is not considered here. It is assumed that the circuit is already optimized to match certain delay constraints. As far as possible, the path delay is increased by reducing the widths of the transistors in the path, thereby also reducing the total gate capacitance. Of course, this is limited by the minimum feature size of the IC process. A further increase in delay is possible by increasing the transistor lengths.
To explain the path balancing algorithm, we consider the example circuit pictured in Fig. 5. As the sizes of gates 3 and 4 are fixed, only the sizes in gate 5 can be changed for this purpose. After this step it is guaranteed that the signals at the inputs of gate 3 arrive synchronously. Gate 5 is then also marked as "fixed". In the next step, path 6-7-4 is equalized to the critical path such that $\tau_{6,7,4} = \tau_{crit}$, and gates 6 and 7 are marked. Finally, the same is done with path 8-7-4.
Note that the total delay of the circuit, which is relevant for the clocking speed of the input and output registers, remains the same, namely $\tau_{crit}$. By treating gate sizes as fixed in paths that have already been balanced, the number of paths in the circuit that remain for optimization decreases rapidly. For example, for a 3 × 3 array multiplier with 107 paths, only 5 optimization steps are necessary.
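The stepwise procedure of balancing one path at a time and then freezing its gates can be sketched in Python as follows. The gate names, the delays, the assumed critical path and the rule of distributing the slack equally over the not yet fixed gates are hypothetical simplifications; in the method described here the per-gate sizes result from the optimization, not from an equal split.

```python
def balance_paths(delays, critical_path, other_paths):
    """Equalize each path to the critical path delay, one path at a time.
    Gates on already balanced paths are treated as fixed."""
    delays = dict(delays)
    fixed = set(critical_path)
    t_crit = sum(delays[g] for g in critical_path)
    for path in other_paths:
        slack = t_crit - sum(delays[g] for g in path)
        free = [g for g in path if g not in fixed]
        if free:
            # Assumption: slow down all free gates by the same amount.
            for g in free:
                delays[g] += slack / len(free)
        elif abs(slack) > 1e-12:
            raise ValueError("path %s cannot be balanced" % (path,))
        fixed.update(path)
    return delays, t_crit


# Hypothetical gate delays in ns; gates g1-g4 form the critical path.
gate_delays = {"g1": 1.0, "g2": 1.0, "g3": 1.0, "g4": 1.0,
               "g5": 0.5, "g6": 0.5, "g7": 0.5, "g8": 0.5}
critical = ["g1", "g2", "g3", "g4"]
others = [["g5", "g3", "g4"], ["g6", "g7", "g4"], ["g8", "g7", "g4"]]

balanced, t_crit = balance_paths(gate_delays, critical, others)
for path in others:
    print(path, sum(balanced[g] for g in path))
```

Because gate g7 is fixed after the second step, only g8 remains adjustable in the third step, which mirrors how the number of free variables shrinks as the procedure advances.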
The transistor sizing algorithm for the reduction of glitching activity is implemented in the program GliMATS (Glitch Minimization by Automated Transistor Sizing), which optimizes circuits automatically. GliMATS processes a netlist of the circuit. The user can set a value for the weight factor w and specify which of the transistor topologies is to be used: standard, "Twin-Transistors" or "Merged-Transistors". As output, GliMATS produces a netlist of the glitch-minimized circuit. It is assumed that the circuit is already optimized to match any given timing constraints, so the critical path must not be manipulated in order to retain the required maximum delay. The input netlist to the path balancing program is taken from this previous speed optimization.
GliMATS starts at the primary inputs and builds a node list that describes the dependencies of all nodes on their predecessors. The delay of every visited node and its predecessors is stored. After all output nodes have been reached, the critical path delay is known. Then the algorithm starts at the output nodes and traces back to the primary inputs, choosing at each visited node the predecessor with the largest delay as the next node. The delay and power functions of this path are built according to equations (14) 
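The forward pass and the backward trace described above can be sketched in Python. This is not the GliMATS implementation; the netlist representation as a predecessor dictionary, the gate names and the delays are hypothetical, and `graphlib.TopologicalSorter` from the standard library (Python 3.9 or later) is used to visit predecessors first.

```python
from graphlib import TopologicalSorter


def critical_path(preds, delay):
    """Forward pass: latest arrival time at every node. Backward trace:
    follow the predecessor with the largest arrival time."""
    arrival = {}
    for node in TopologicalSorter(preds).static_order():
        latest_input = max((arrival[p] for p in preds.get(node, ())),
                           default=0.0)
        arrival[node] = latest_input + delay[node]

    # Output nodes are those that are not a predecessor of any node.
    has_successor = {p for ps in preds.values() for p in ps}
    outputs = [n for n in arrival if n not in has_successor]

    node = max(outputs, key=arrival.get)
    t_crit = arrival[node]
    path = [node]
    while preds.get(node):
        node = max(preds[node], key=arrival.get)
        path.append(node)
    path.reverse()
    return path, t_crit


# Hypothetical netlist: each gate maps to the gates driving its inputs.
preds = {"g2": ["g1"], "g3": ["g2", "g5"],
         "g4": ["g3", "g7"], "g7": ["g6", "g8"]}
delay = {"g1": 1.0, "g2": 1.0, "g3": 1.0, "g4": 1.0,
         "g5": 0.5, "g6": 0.5, "g7": 0.5, "g8": 0.5}

path, t_crit = critical_path(preds, delay)
print(path, t_crit)
```

Repeating the backward trace while skipping already fixed gates would yield the next path to balance; that extension is left out to keep the sketch short.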
Applications and Experimental Results
The proposed path balancing method has been tested on a number of example circuits, a few of which are shown here. They include array multipliers and combinational logic blocks (ISCAS'85 benchmarks). The simulations have been performed with PowerMill before and after transistor optimization for glitch reduction. The different topologies have been tested in the optimization.
For simulation, 10000 random input vectors have been applied to each circuit. The results are summarized in Tables 1, 2 has been estimated by measuring the area increase of a single cell due to transistor sizing and projecting this to the total chip area including the wiring. For the standard topology the expected area increase is between 15% and 31%. The additional silicon space needed is even greater for the "Twin-Topology", but it is significantly lower for the "Merged-Topology".
To demonstrate the effect of path balancing by transistor sizing, a logic circuit shown in Fig. 6 has been designed. If zero delay is assumed for all gates in this circuit, the output is always 0 regardless of the inputs. In a simulation with complex transistor models only glitches due to is delayed after transistor sizing in order to "wait" on signal A.
The results show significant power savings after GliMATS has been applied. However, one must be aware that enlarging the transistor lengths to increase the delay results in slower signal slopes, which may lead to larger short-circuit power consumption (this is considered in the results presented) and to an increase of the required chip area. A good application of the method would be in combination with pipelining, where register stages work as glitch barriers.
In the combinational logic between two register stages glitching could be eliminated by the presented transistor optimization. In order to reduce design time for the different sized gates, module generators can be applied for automatic scaling of parameterized gate layouts.
Conclusion
In this work a method for transistor size optimization to achieve equal path delays in CMOS circuits has been presented. Delay and power consumption of a path can be modeled at gate level as functions of the transistor sizes W and L. With multiobjective optimization, the path delay differences and the power consumed for charging capacitances can be minimized simultaneously. The solutions for W and L are restricted by upper and lower bounds, given by the minimum feature size and area limitations. By splitting long transistors into two, a decrease in gate capacitances has been achieved. In the case of the "Twin" topology a significant area increase has to be taken into account. The topology with "Merged" transistors reduces the area increase and shows even better results in terms of power consumption. A tool, GliMATS, has been implemented that automatically reads a netlist of a circuit, builds the delay and power functions, starts the multiobjective optimization and returns the netlist of the optimized, delay-balanced circuit with the new values of W and L for each gate. GliMATS is capable of handling all three topologies. Depending on the chosen mode, GliMATS can automatically introduce different topologies, where applicable, to achieve the best power savings. By applying this method, glitching in a circuit can be reduced drastically. Experimental results show significant power savings after optimization.
