Abstract
Introduction
Bipolar ECL LSIs are the main components of high-performance systems such as mainframe computers and digital communication systems. Advanced ECL LSIs can operate at frequencies of up to a few GHz [5, 11]. ECL is the most promising and practical technology for constructing high-speed measurement systems, B-ISDN systems such as SDH* signal processing, and so on. However, the high power dissipation paid for this performance makes systems using ECL quite expensive.
Due to this high power dissipation, ECL LSIs used to be built with fewer gates than CMOS LSIs. However, recent advances in Bipolar technology have enabled the production of ECL chips with up to several thousand gates [3], which require efficient CAD systems. Although most CMOS CAD tools can be used for ECL LSI design with slight modifications, additional techniques must be developed to reduce power while maintaining speed. This requires considerations essentially different from those of CMOS technology.
In [2], a technique to minimize the delay time of an ECL gate by adjusting its parameters was proposed. However, this method can be used only within an ECL gate, not for the entire LSI. In [12] and [1], cell selection programs were proposed for speed and power improvement. The problem is that these methods require a huge cell library to ensure a wide range of improvement. In [9], a technique that decreases the power dissipation in a CMOS circuit under timing constraints was proposed, but its models were not for Bipolar and it did not consider clock skew optimization.
* SDH: Synchronous Digital Hierarchy.
In this paper, we propose a technique to minimize the power dissipation in a cell-based design approach under timing constraints after placement and routing. We inherited some ideas from [4], e.g. the idea of programmable resistors. However, in [4] only the sum of the switching currents (not the power) was minimized, under unrealistic quadratic delay functions that take into account only the interconnect capacitance. Furthermore, [4] did not consider the clock skew constraints or the false path problem.
First we present a power dissipation model for an ECL gate as a linear function of its switching current. Then we propose a delay model for an ECL gate as a nonlinear function of its switching current. It is shown in [6, 10] that the power-delay product of an ECL circuit is constant. Our gate delay model is similar and is obtained by an estimation method based on [8], using the post-layout wire capacitance and resistance.
Using the delay model, a set of timing constraints is extracted from a given circuit. To reflect practical timing, we consider several types of constraints, i.e., the maximum/minimum delay, clock skew and setup/hold constraints. A simple technique is presented to avoid creating timing constraints for false paths. The set of constraints is then given to a nonlinear programming (NLP) package. There are basically three objective functions: the clock skew time, the clock cycle time, and the power dissipation. The clock skew is optimized first, followed by clock cycle minimization under the timing constraints and the optimized skew. Then, under this minimized clock cycle, the power dissipation is minimized. As a result, the switching current for each gate is obtained, and the resistors in each cell are programmed so that each gate has its optimized switching current.
The resistors are adjusted only by sliding contacts on them. Since the contacts can be slid within cells, we can perform the optimization after the layout.
The experimental results show more than 40% power reduction for circuits including a real communication system chip, compared to the max power version. The clock cycle was maintained or even made faster due to the efficient clock skew optimization.
We present models for the power, delay and layout in section 2. In section 3, we formulate our problem as an NLP and present a procedure to obtain solutions. Section 4 presents the experimental results, and section 5 concludes the paper.
Physical models for an ECL gate
Power model
Figure 1 shows a typical Bipolar ECL gate circuit. Its power dissipation is mainly due to the switching current $I_{cs}$ and the emitter-follower current $I_{ef}$.
The switching current $I_{cs}$ is forced to flow constantly, since $V_{cs}$ and the base-emitter voltage $V_{BE}$ of a transistor are constant (eq. (1)). Note that $I_{cs}$ is constant regardless of the input signals.
It is easy to see that $I_{ef}$ is not constant, since the outputs swing depending on the inputs. We simply use its average. Since the reference voltage is set to the center of the output logic swing, the average $I_{ef}$, denoted by the same variable, is given by eq. (2). Therefore, the current values can be controlled by $R_{cs}$ and $R_{ef}$. Note that the current to a fanout gate's base is negligibly small. In our approach, we set $I_{ef}$ to be proportional to $I_{cs}$ by scaling $R_{cs}$ and $R_{ef}$ together, because it is reasonable to assume that a high-speed (i.e. high-$I_{cs}$) gate should in general have high drivability (i.e. high $I_{ef}$). We denote the ratio by $\alpha^g = I_{ef}^g / I_{cs}^g$, where the superscript $g$ stands for gate $g$.
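The power dissipation $P^g$ of gate $g$ in Fig. 1 is proportional to its switching current $I_{cs}^g$ (eq. (4)). The total power dissipation $P$ in an ECL LSI is the sum of $P^g$ over all gates $g$ (eq. (5)).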
Delay model
It is known that the power-delay product of an ECL gate is constant [6]. The power dissipation of an ECL gate is proportional to $I_{cs}$ (eq. (4)), so we assume that the current-delay product of an ECL gate is constant. A number of our simulation results support this assumption within the realistic range of $I_{cs}$. We propose a delay function $d$ from an input pin $p$ of a gate $g$ to an input pin $q$ of its fanout, given by eq. (6), where $a_{pq}^{rise}$, $b_{pq}^{rise}$ and $c_{pq}^{rise}$ are constants that depend on the gate's parameters and its load (see Fig. 2), and rise means that this is the delay time for a rising input signal. A similar formula is defined for a falling input. Figure 2 shows examples of this function. In reality, $I_{cs}^g$ has a finite range. The constants are determined with an improved PR method based on [8]; its details are omitted here due to the page limit. Its error relative to SPICE simulation is about half that of the original PR method, which is accurate enough. The computation time of the improved PR method was about the same as that of the original in our experiment. Rise (fall) time was ignored, since including it did not improve the accuracy much despite the high computational effort.
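The delay function fitting can be illustrated with a small sketch. Since eq. (6) is not reproduced above, the sketch assumes a hypothetical three-constant form d(I) = a / I + b + c * I, chosen only because its dominant term keeps the current-delay product roughly constant; the sample data, the normalization of the current to I_max = 1, and all function names are likewise assumptions, not the authors' implementation.

```python
def solve_linear(matrix, rhs):
    """Solve a small dense linear system by Gauss-Jordan elimination with partial pivoting."""
    n = len(rhs)
    aug = [list(row) + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def fit_delay(currents, delays):
    """Least-squares fit of the ASSUMED form d(I) = a / I + b + c * I."""
    rows = [(1.0 / i, 1.0, i) for i in currents]
    # Normal equations (B^T B) x = B^T d for the basis [1/I, 1, I].
    gram = [[sum(r[j] * r[k] for r in rows) for k in range(3)] for j in range(3)]
    proj = [sum(r[j] * d for r, d in zip(rows, delays)) for j in range(3)]
    return solve_linear(gram, proj)


def delay(coeffs, current):
    """Evaluate the fitted delay function at a given switching current."""
    a, b, c = coeffs
    return a / current + b + c * current


# Ten hypothetical sample points for one cell, current normalized to I_max = 1.
currents = [0.4 + 0.6 * k / 9 for k in range(10)]
samples = [50.0 / i + 20.0 + 5.0 * i for i in currents]
coeffs = fit_delay(currents, samples)
print(coeffs, delay(coeffs, 0.7))
```

Because the assumed form is linear in its constants, an ordinary least-squares fit suffices; a different form of eq. (6) would need a different basis or a nonlinear fit.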
Layout model
We assume the LSIs are composed of the rows of ECL standard cells and the channels between them.
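The delay time (eq. (6)) and the power dissipation (eq. (4)) of a gate are controlled through $R_{cs}$ and $R_{ef}$. The collector resistors $R_{c1}$ and $R_{c2}$ should also be scaled in proportion to $R_{cs}$ so that the logic swing does not vary.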
We employ a programmable model for these resistors (see Fig. 3). A programmable resistor is composed of a poly-silicon interconnect, a metal interconnect and two contact cuts. One of the contact cuts is fixed at one end, while the other is placed after the optimization described later: we simply position the contact cut so that the currents through these resistors take the optimized values, using eqs. (1) and (2). It is easy to calculate each contact cut location from each resistance value. Since this contact programming can be done within the cell, post-layout optimization is possible.
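As a rough illustration of how a contact cut location follows from a resistance value, the sketch below assumes a poly-silicon strip of uniform width and sheet resistance and neglects contact resistance, so that R = R_sheet * L / W. The function name, parameters and numbers are hypothetical; the target resistance itself would come from eqs. (1) and (2).

```python
def contact_offset_um(target_ohm, sheet_ohm_per_sq, width_um, max_len_um):
    """Distance from the fixed contact to the movable contact, in micrometres.

    Assumes R = R_sheet * L / W for a uniform poly-silicon strip and
    ignores contact resistance (both are simplifying assumptions).
    """
    squares = target_ohm / sheet_ohm_per_sq   # number of squares needed
    offset = squares * width_um               # strip length for that many squares
    if not 0 < offset <= max_len_um:
        raise ValueError("target resistance is outside the programmable range")
    return offset


# Hypothetical values: 2 kOhm target, 100 Ohm/sq poly, 2 um wide, 60 um long strip.
print(contact_offset_um(2000.0, 100.0, 2.0, 60.0))  # 40.0
```

The range check mirrors the finite range of the switching current: a resistor body of limited length can only realize a bounded interval of resistances.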
Power and timing optimization
Here we formulate a nonlinear programming (NLP) problem for power and timing optimization under timing constraints, which consist of the max/min delay, setup/hold, and clock skew constraints. In [4], only the max delay constraint is considered.
Max/min delay and setup/hold constraints
Let us denote by $d_{pq}$ the delay time from the input pin $p$ of $g_i$ to the input pin $q$ of $g_j$ in Fig. 4. We consider two types of delay time for $d_{pq}$: the maximum and the minimum delay times $d_{pq}^{max}$ and $d_{pq}^{min}$. The difference between the two is due to process variation, and they would be equal in the ideal case. Since we distinguish rise and fall delays, we consider four types of delay time, i.e., $d_{pq}^{max,rise}$, $d_{pq}^{max,fall}$, $d_{pq}^{min,rise}$ and $d_{pq}^{min,fall}$.
Note that they are functions of $I_{cs}$. Each gate pin $p$ is assigned the maximum signal arrival time (SAT) $t_p^{max}$ and the minimum SAT $t_p^{min}$.
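To make the role of the maximum and minimum SATs concrete, the following sketch propagates both through a small pin graph for fixed delays. In the NLP these relations appear as inequality constraints with current-dependent delays; here the delays are hypothetical constants, and the data layout and function name are assumptions for illustration only.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def arrival_times(edges, sources):
    """Propagate minimum and maximum signal arrival times through a DAG.

    edges:   {(p, q): (d_min, d_max)} pin-to-pin delays (hypothetical layout)
    sources: {pin: (t_min, t_max)} arrival times at the starting pins
    """
    preds = {}
    for p, q in edges:
        preds.setdefault(p, set())
        preds.setdefault(q, set()).add(p)

    t_min, t_max = {}, {}
    # static_order() yields every pin after all of its predecessors.
    for pin in TopologicalSorter(preds).static_order():
        if not preds[pin]:
            t_min[pin], t_max[pin] = sources.get(pin, (0.0, 0.0))
        else:
            t_min[pin] = min(t_min[p] + edges[(p, pin)][0] for p in preds[pin])
            t_max[pin] = max(t_max[p] + edges[(p, pin)][1] for p in preds[pin])
    return t_min, t_max


# Hypothetical delays: (minimum, maximum) per pin-to-pin arc.
edges = {
    ("a", "c"): (1.0, 1.5),
    ("b", "c"): (2.0, 2.5),
    ("c", "d"): (0.5, 0.75),
}
t_min, t_max = arrival_times(edges, {"a": (0.0, 0.0), "b": (0.0, 0.0)})
print(t_min["d"], t_max["d"])
```

The maximum SAT follows the slowest incoming arc and the minimum SAT the fastest one, which is why the two are tracked separately for setup and hold analysis.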
Figure 4: Timing constraints
In the case of exclusive OR, we have to consider four such constraints for a pair of pins. In this paper, we limit ourselves to the rise case for delay and SAT and omit this superscript for simplicity.
The minimum delay constraint (8) between two pins is needed to avoid hold violations.
We have both rise and fall for equation (8), and $T_c$ is the clock cycle time (see Fig. 4). Note that we assume the setup and hold times are functions of latch B's switching current $I_{cs}^B$, similar to (6). This is due to the safe design margin used by designers. A latch also has normal max/min delay constraints from its clock pin to its fanouts.
One important problem in making timing constraints is eliminating the false paths [7], which is a difficult problem of logic design. If we constructed delay constraints for false paths, $T_c$ could not be minimized, since the false paths might become critical and prevent the nonfalse paths from being minimized. In [4] the false paths were not considered, which is fatal for real chip design. To reduce the number of false paths, we allow users to indicate the source and sink gates of the nonfalse paths. By tracing the network from both the sources and the sinks, the constraints are created only for the gates on the user-specified paths. This also reduces the problem size to some extent.
Clock skew constraints
We have three modes for the clock skew optimization. They are called fixed skew, zero skew and free skew modes. One of them is specified by users.
In the fixed skew mode, each clock driver's switching current is set to a given constant so that it has a constant delay time, and hence the clock tree has a constant skew time. In this mode we need the max/min delay constraints (7) and (8) with constant $I_{cs}$ for the drivers.
In the zero skew mode, the clock skew is minimized.
Here we need the max/min delay constraints (7) and (8) with variable $I_{cs}$ for the clock drivers, together with clock skew constraints at the leaves of the tree. These constraints bound the clock SAT $t_B$ at each latch $B$ by two auxiliary variables.
Here, we drop the superscripts max and min for simplicity. Let us add another constraint that relates these bounding variables to the skew variable $T_{skew}$.
The clock skew time is minimized by minimizing $T_{skew}$. Alternatively, by setting $T_{skew}$ to a positive constant, the circuit is optimized within the given skew allowance $T_{skew}$.
In the free skew mode, the clock drivers have no special skew constraints other than the normal delay constraints (7) and (8) with variable $I_{cs}$. Zero skew is not necessary to obtain the minimum clock cycle time; the clock cycle time can be shortened further by adjusting the delay of the clock signal optimally. Fig. 5 shows that the clock cycle time $T_c$ can be shortened by increasing the clock delay time for latch B in Fig. 4. By allowing the clock driver cells to have variable delays, the clock signal phase is adjusted so that the clock cycle time $T_c$ is minimized.
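A small numerical sketch shows why a deliberately skewed clock can shorten the cycle. It assumes the usual latch-to-latch setup and hold inequalities (stated in the docstrings), which may differ in detail from the constraints of this paper; the three latches, their delays and the clock arrival times are hypothetical.

```python
def min_cycle_time(paths, clock_arrival, setup):
    """Smallest T_c that meets an ASSUMED setup constraint on every path:
    clock_arrival[src] + d_max + setup <= clock_arrival[dst] + T_c
    """
    return max(
        clock_arrival[src] + d_max + setup - clock_arrival[dst]
        for src, dst, d_max, _d_min in paths
    )


def hold_ok(paths, clock_arrival, hold):
    """Check an ASSUMED hold constraint on every path:
    clock_arrival[src] + d_min >= clock_arrival[dst] + hold
    """
    return all(
        clock_arrival[src] + d_min >= clock_arrival[dst] + hold
        for src, dst, _d_max, d_min in paths
    )


# Hypothetical pipeline A -> B -> C: (source, sink, max delay, min delay).
paths = [("A", "B", 10, 9), ("B", "C", 6, 5)]
zero_skew = {"A": 0, "B": 0, "C": 0}
free_skew = {"A": 0, "B": 2, "C": 0}   # clock to latch B delayed by 2

print(min_cycle_time(paths, zero_skew, setup=1))  # 11
print(min_cycle_time(paths, free_skew, setup=1))  # 9
print(hold_ok(paths, free_skew, hold=1))          # True
```

Delaying the clock at latch B lends time from the short B-to-C stage to the long A-to-B stage, so the cycle is set by the balanced value rather than by the slowest stage, as long as the hold check still passes.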
Problem formulation and its solution
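The problem is formulated as the following multi-objective nonlinear programming problem. The objective functions are the clock skew time $T_{skew}$ (13), the clock cycle time $T_c$ (14) and the total power dissipation $P$ (15). The constraints are the max/min delay and setup/hold constraints and, in the zero skew mode only, the clock skew constraints.
If the fixed or free skew mode is specified, minimize objective functions (14) and (15) in this order. If the zero skew mode is specified, minimize objective functions (13), (14) and (15) in this order.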
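The last step of this procedure, minimizing power under a fixed clock cycle, can be illustrated on a toy problem. The sketch below is not the NLP package used in the paper: it assumes a single path of gates, a simplified delay a_i / I_i per gate, power proportional to the sum of the currents, and solves the resulting convex problem by bisection on the Lagrange multiplier. All names and numbers are hypothetical.

```python
import math


def min_power_currents(a, t_limit, i_min, i_max, iters=100):
    """Minimize sum(I_i) subject to sum(a_i / I_i) <= t_limit, i_min <= I_i <= i_max.

    Stationarity of the Lagrangian gives I_i = sqrt(lam * a_i), clipped to the
    allowed range; the multiplier lam is found by bisection.
    """
    def currents(lam):
        return [min(max(math.sqrt(lam * ai), i_min), i_max) for ai in a]

    def path_delay(cur):
        return sum(ai / ci for ai, ci in zip(a, cur))

    if path_delay([i_max] * len(a)) > t_limit:
        raise ValueError("timing constraint infeasible even at maximum current")
    if path_delay([i_min] * len(a)) <= t_limit:
        return [i_min] * len(a)

    lo, hi = 0.0, max(i_max ** 2 / ai for ai in a)  # lo infeasible, hi feasible
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if path_delay(currents(mid)) > t_limit:
            lo = mid
        else:
            hi = mid
    return currents(hi)


# Two gates with hypothetical delay constants; currents normalized to I_max = 1.
a = [1.0, 4.0]
best = min_power_currents(a, t_limit=9.0, i_min=0.4, i_max=1.0)
print(best, sum(best) / (len(a) * 1.0))  # currents and power relative to max-current
```

In this toy case the lightly loaded gate drops to its minimum current while the heavily loaded gate keeps just enough current to meet the delay limit, which is the qualitative behavior expected from the full optimization.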
Experimental results
We have implemented a program called Polaris for the above procedure. We ran Polaris on the circuits in Table 1 in both the zero and free skew modes. All results are normalized by the max-current results, for which all $I_{cs}^g$ are fixed to $I_{max}$ and hence each gate is given its maximum speed. Circuits S444, S953 and S5378 are ISCAS benchmark data mapped with our resistor-programmable ECL cell library, and FCHIP is an LSI (excluding I/Os) whose max-current version is used in a real communication system in Japan. FCHIP's results are normalized by this real chip. The layout design was done with our in-house tools. Each cell has up to 12 programmable resistors. All results are verified by DRC/ERC.
From Table 1, we see that Polaris reduced the power dissipation by more than 40% relative to the max-current version while keeping the clock cycle time comparable or even smaller, due to the efficient clock skew optimization. Since the minimum achievable power is 40% of the max-current value (we set $I_{min} = 0.4 I_{max}$), the reduction is bounded by 60%, so the results are fairly good. S5378 has very few gates on critical paths, so its power reduction was large. The free skew mode results are slightly faster than the zero skew mode ones in some circuits due to the clock optimization, while the skew is much smaller in the zero skew mode. In the free skew mode, the minimum delay time for each gate is set to 90% of the maximum. The clock cycle times for FCHIP are not reduced as much, since the max-current version (the real chip) was already highly optimized with respect to the clock signal.
The CPU times were measured on a Sun Sparc 10; less than a quarter of each was spent in the delay function fitting. The delay times (10 points of $I_{cs}$ per cell) for all the cells of FCHIP were obtained in about 2 min by the improved PR method. Figure 6 shows the power versus cycle time ($T_c$) curve for FCHIP. It was obtained by running Polaris in the zero skew mode for several fixed values of $T_c$. Since the actual system requirement for the $T_c$ of FCHIP was much larger than 100%, we can reduce the power dissipation further by exploiting this timing margin.
Conclusions
Although ECL LSIs can operate at very high frequencies of up to a few GHz, their applications have been limited to very high speed systems due to their high power dissipation. In this paper, we proposed a CAD method that enables the optimization of the power and timing of ECL LSIs after the layout design. Based on accurate delay and power models extracted from the layout, an NLP problem was formulated and solved. Experimental results showed that the proposed method can reduce the power dissipation by more than 40% for circuits including a real chip, without degrading speed. Reducing the size of the NLP by eliminating false paths more efficiently is future work.
