Abstract-In this paper we show how decomposing a wide CMOS transistor into a multi-finger FET with minimum-size gates can reduce the delay and power-delay product of logic gates. This design option, which we call a minimum-split transistor (MST), seems to be largely overlooked in the literature. We compare the design to wide transistors in a 90 nm CMOS process. By exploiting the narrow-width effect, reduced parasitic capacitances from a shorter active channel, and increased gate-drain spacing, we achieve up to 75-85% higher operating speed at similar or reduced power consumption. The worst-case timing delay is reduced by 35-40% along with the nominal values. The design technique is considered valuable, in particular for time-critical paths. The paper takes the perspective of subthreshold logic design at 200 mV, but the technique is also useful above threshold. A statistical experiment also investigates how $V_{th}$ variation in MSTs changes with the number of parallel gates.
I. INTRODUCTION
This work grew primarily out of a wish to increase the reliability and effectiveness of subthreshold logic design [1], [2], which focuses on the operation of CMOS transistors in weak and moderate inversion. As the supply voltage drops towards or below the threshold voltage $V_{th}$, robustness and the handling of the sometimes extreme timing delay variations become key design concerns for subthreshold circuits. The power-delay product (PDP) [3], or the energy per operation of a digital circuit, is often found to be at an optimum when the supply voltage is close to $V_{th}$ [4], particularly when the activity factor is low. Although this work focuses on circuits operating at a supply voltage of 200 mV, positive results are also achieved at higher supply voltages.
In this paper we investigate the benefits of what we call minimum-split transistors (MSTs), which are simply multi-finger transistors composed of several minimum-width gates. With this layout we exploit an increased current per width, a narrower active channel and reduced capacitances in order to achieve large gains in timing delay at similar power consumption. To the authors' knowledge, the exploitation of this layout technique has not previously been presented in the literature.
In the following section we describe the MST layout technique. We then outline and present the results of a ring oscillator experiment and a numerical experiment concerning the statistical properties of MSTs. The results are discussed in section V.
II. MINIMUM-SPLIT TRANSISTOR
In deep sub-micron CMOS processes some transistors exhibit a peculiar and perhaps counter-intuitive narrow-width effect [5]. When the width of the device is reduced below a certain point, the nominal device current goes up. Typically this is attributed to fringing electrical fields, and it is modeled as a lowering of $V_{th}$. Often measures are taken in the process steps to counter this effect. In this paper we focus instead on the possible exploitation of this effect when it is present.
A graph of the nominal current as a function of width in a general-purpose 90 nm NMOS transistor, used later in the ring oscillator experiment, can be seen in figure 2. As we can see, the nominal device current per μm increases to almost 3.5 times the value of a wide channel. When designing a layout for a logic cell one can thus choose to use several minimum-width transistors instead of a wider gate. We call this design a minimum-split transistor (MST), and a layout can be seen in figure 1. We can see that the source-to-drain pitch is larger in the MST design, and thus the wide transistor (WT) design may be allowed a larger width for an equal-area design. In the figure, a bounding box for the two source and drain areas would be equal. When drawing layout for the MST design, minimum spacing rules for the source/drain implants and an increase in pitch result in a rather large layout area compared to wide gates. By evaluating the design rules we find that an MST with N = 2 minimum-size parallel gates is equal in area to a 0.75 μm wide gate (1.7 μm and 3.58 μm for N = 4 and N = 8 respectively). In the figure it seems beneficial to join the source/drain areas of the MST in order to place the gates closer together, but the design rules also set a lower limit on the isolation area that prohibits this.
By simulation we find that the $I_{on}$ current of an MST design with 2 parallel transistors is 11% larger than that of an equal-area 0.75 μm wide gate, even though the combined width of the MST active channels is only 34% of the WT design. For 4 or more parallel transistors the current is progressively smaller than that of a wide-gate design of equal area. The increased half-pitch is however particularly beneficial for reducing the capacitances from the gate to the source and drain areas. In the layout figure the combined area of the source implants is roughly half that of the wide transistor, whereas the total side-wall length is about 15% larger in the MST design. The supplied simulation models calculate the total drain junction capacitance of the MST design as roughly 40% of that of the WT design. Unfortunately the gate-drain capacitance is not directly obtainable.
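The figures quoted for the N = 2 case can be combined into an effective current-per-width gain. The following is a minimal sketch of that arithmetic; the function and variable names are illustrative and not part of the original experiment, and only the numbers stated above are used.

```python
# Worked arithmetic for the equal-area comparison (illustrative names only).

def current_per_width_gain(ion_ratio, width_ratio):
    """Ratio of current per unit channel width, MST relative to WT.

    ion_ratio   -- I_on(MST) / I_on(WT)
    width_ratio -- total active width of MST / width of WT
    """
    return ion_ratio / width_ratio

# N = 2: 11% more current with 34% of the active channel width.
gain_n2 = current_per_width_gain(1.11, 0.34)

# Equal-area wide-gate widths (in um) for N = 2, 4, 8 parallel gates.
equal_area_width_um = {2: 0.75, 4: 1.7, 8: 3.58}

# Wide-gate width "spent" per MST finger at equal area; it grows with N.
width_per_finger_um = {n: w / n for n, w in equal_area_width_um.items()}

print(f"current-per-width gain, N=2: {gain_n2:.2f}")
for n, w in width_per_finger_um.items():
    print(f"N={n}: {w:.4f} um of wide-gate width per finger")
```

The gain of roughly 3.3 for N = 2 is of the same order as the per-μm increase read from figure 2, and the growing width per finger illustrates why the equal-area wide gate catches up in current for larger N.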
III. RING OSCILLATOR EXPERIMENT
Using a design kit from CMP for a 90 nm triple-well CMOS process, in which both the narrow-width, wide-pitch minimum-size unit transistor and the narrow-pitch wider transistor are modelled, we use Monte Carlo simulations at the schematic level to estimate the performance of seven different 3-stage inverter ring oscillators and compare designs based on either WTs or MSTs. To compare the two designs with a cost/function metric in mind, we select a wide gate of equal area to the MST design. In the process that we use, only the NMOS transistors exhibit a significant narrow-width effect; the PMOS transistors do not. Therefore a simple width scaling scheme is adopted for the PMOS in both design types, where the width is selected to achieve the same $I_{on}$ value as the NMOS. The selected PMOS also has a slightly lower $V_{th}$ so that the PMOS width is small, ranging from 0.17
Results and sizing parameters can be seen in table I. The columns correspond to the number of minimum-size gates in the NMOS transistor (N = 1, 2, 4, 8). Median frequency and power consumption values are found from a Monte Carlo simulation with 500 runs. The minimum (worst-case) frequency $f_{min}$ is also included, along with the mean power $\bar{P}$. From $f_{min}$ a maximum value for the average stage delay $\bar{T}_d$ is computed as $\bar{T}_d = 1/(2 N f_{min})$, where $N$ here denotes the number of oscillator stages. Since the frequency distribution appeared roughly log-normal, the standard deviation is given for the $\log_{10}$ of the frequency values. The power-delay product (PDP) is computed from the nominal values for $P$ and $f$. Since the ring oscillator has an activity factor near 1, the dynamic power consumption accounts for nearly all the power. The total load capacitance was thus computed from the equation
An average capacitance per stage, $\bar{C}_S$, is then a third of the $C_L$ value.
We also include a simulation to show the oscillator behaviour with a capacitive load. A capacitor was inserted between every stage, and the capacitance value was varied from 0.1 fF to 100 fF. It was assumed that any resistance in metal wires would be insignificant, due to the low frequencies and high $R_{on}$ values in subthreshold. The simulation results appear in figure 3 and are discussed in section V.
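The derived quantities in table I can be sketched as follows. This is a minimal illustration that assumes the usual dynamic power relation $P = C_L V_{dd}^2 f$ for an activity factor of 1, and assumes that the PDP is the power multiplied by the average stage delay. The function names and the input values in the example are hypothetical and are not taken from table I.

```python
import math
import statistics

def stage_delay(freq_hz, n_stages=3):
    """Average stage delay of a ring oscillator: T_d = 1 / (2 * n_stages * f)."""
    return 1.0 / (2.0 * n_stages * freq_hz)

def load_capacitance(power_w, freq_hz, vdd_v):
    """Total switched capacitance, assuming P = C_L * Vdd^2 * f (activity ~1)."""
    return power_w / (freq_hz * vdd_v ** 2)

def power_delay_product(power_w, freq_hz, n_stages=3):
    """PDP taken as power times average stage delay (an assumption)."""
    return power_w * stage_delay(freq_hz, n_stages)

def log10_std(freqs_hz):
    """Sample standard deviation of log10(f) for a roughly log-normal sample."""
    return statistics.stdev([math.log10(f) for f in freqs_hz])

# Hypothetical operating point: 1 nW at 1 MHz with a 200 mV supply.
p, f, vdd = 1e-9, 1e6, 0.2
c_total = load_capacitance(p, f, vdd)
print(f"C_L       = {c_total * 1e15:.2f} fF")
print(f"C_S       = {c_total / 3 * 1e15:.2f} fF per stage")
print(f"T_d       = {stage_delay(f) * 1e9:.1f} ns")
print(f"PDP       = {power_delay_product(p, f):.3e} J")
```

Applying `stage_delay` to the minimum frequency gives the maximum average stage delay, and `log10_std` applied to the Monte Carlo frequency samples gives the spread measure described above.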
IV. NUMERICAL MST EXPERIMENT
This experiment shows how the variation in device current in MSTs changes with the number of gates N, compared to the variation in wide transistors of the same active channel width. An expression for the weak-inversion drain-to-source current in a transistor is [6]:
where the specific current $I_s$ can be expressed as
where $n$ is the slope factor, $\mu$ the mobility, $C_{ox}$ the gate-oxide capacitance, $U_t$ the thermal voltage, and $W_{eff}$ and $L_{eff}$ the effective channel width and length. Although variations in mobility, $C_{ox}$, $W_{eff}$ and $L_{eff}$ are of importance, it is the threshold voltage $V_{th}$ that is the most important factor when accounting for variability in device currents [7], particularly for small geometries and weak inversion. The variation in $V_{th}$ between a matched transistor pair is often modeled as following a normal distribution, with mean $\bar{V}_{th}$ and standard deviation $\sigma_{V_{th}}$. An expression for $\sigma_{V_{th}}$ is given by Stolk's equation [8]:
where $c$ is a constant dependent on the process. Typically $\sigma_{V_{th}}$ can be around 15-30 mV for a minimum-size device in modern processes (90 nm to 45 nm) [9]. Variations in the threshold voltage can thus quickly account for up to two decades of subthreshold current mismatch in a minimum-size device. For these deep-submicron processes the global, or intra-die mean variation of $V_{th}$ is normally smaller than the variation seen in a matched pair of minimum-size transistors.
Assuming that $V_{th}$ follows a normal distribution $N(\bar{V}_{th}, \sigma_{V_{th}})$, $\exp(V_{th})$ will follow a log-normal distribution with a mean of $\exp(\bar{V}_{th} + \sigma_{V_{th}}^2/2)$. If we connect several transistors in parallel, we can take advantage of this. For $n = 1.4$ and $U_t = 26$ mV (room temperature) we can achieve a nominal current increase of 6.8% and 19.8% for $\sigma_{V_{th}}$ = 15 mV and 25 mV respectively. In figure 4 we have used the same standard deviations to compare the current of N parallel gates with a width of $W_{min}$ to a single gate with a width of $N W_{min}$. For the MST gates we used a sample size of $10^6$. The wide gates only required a single sample, and the 99.9% worst-case confidence band is computed at $-3.08\sigma_{V_{th}}$, which is the tabulated value for where the 99.9% lower confidence band lies in normal and log-normal distributions. $\sigma_{V_{th}}$ is computed for the wide gate according to the regime in equation 4.
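For orientation, the quantities named above are commonly written in the following textbook forms. These are assumed forms given as a sketch, and they are not necessarily identical to the expressions in [6] and [8]; the symbols $V_{gs}$ and $V_{ds}$ (gate-source and drain-source voltage) are introduced here for illustration only.

```latex
% Assumed textbook forms; not necessarily identical to the expressions in [6] and [8].
\begin{align*}
I_{ds} &= I_s \exp\!\left(\frac{V_{gs} - V_{th}}{n U_t}\right)
          \left(1 - \exp\!\left(-\frac{V_{ds}}{U_t}\right)\right) \\
I_s &= 2 n \mu C_{ox} U_t^2 \, \frac{W_{eff}}{L_{eff}} \\
\sigma_{V_{th}} &= \frac{c}{\sqrt{W_{eff} L_{eff}}}
\end{align*}
```

Under the exponential form, one decade of current corresponds to a $V_{th}$ shift of $n U_t \ln 10$, which is about 84 mV for $n = 1.4$ and $U_t = 26$ mV. A spread of $\pm 3\sigma_{V_{th}}$ at $\sigma_{V_{th}}$ = 30 mV (180 mV in total) then spans roughly two decades, in line with the mismatch figure quoted above.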
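A minimal sketch of this comparison is given below. It assumes that the current is simply proportional to $\exp(-V_{th}/(n U_t))$, that $\sigma_{V_{th}}$ scales as $1/\sqrt{N}$ for a gate of width $N W_{min}$, and that the worst case is a $V_{th}$ shift of $+3.08\sigma$ (lowest current). The function names, the seed and the reduced sample size are illustrative choices rather than the setup of the experiment, and the output depends on the assumed model form, so it need not match the percentages or curves reported here.

```python
import math
import random

def lognormal_mean_gain(sigma_vth, n=1.4, u_t=0.026):
    """Mean of exp(-dV/(n*U_t)) for dV ~ N(0, sigma^2), relative to dV = 0."""
    s = sigma_vth / (n * u_t)
    return math.exp(0.5 * s * s)

def mst_worst_case(n_gates, sigma_min, n=1.4, u_t=0.026,
                   samples=20000, q=0.001, seed=1):
    """Empirical lower q-quantile of the current of n_gates parallel
    minimum-width gates, normalised to n_gates nominal gate currents."""
    rng = random.Random(seed)
    k = n * u_t
    totals = sorted(
        sum(math.exp(-rng.gauss(0.0, sigma_min) / k) for _ in range(n_gates))
        / n_gates
        for _ in range(samples)
    )
    return totals[int(q * samples)]

def wide_worst_case(n_gates, sigma_min, n=1.4, u_t=0.026, z=3.08):
    """Normalised current of one gate of width n_gates * W_min at a +z*sigma
    threshold shift, with sigma reduced by sqrt(n_gates) (area scaling)."""
    sigma_wide = sigma_min / math.sqrt(n_gates)
    return math.exp(-z * sigma_wide / (n * u_t))

if __name__ == "__main__":
    for sigma in (0.015, 0.025):
        print(f"sigma = {sigma * 1e3:.0f} mV, "
              f"mean gain = {lognormal_mean_gain(sigma):.3f}")
        for n_gates in (1, 2, 4, 8):
            mst = mst_worst_case(n_gates, sigma)
            wide = wide_worst_case(n_gates, sigma)
            print(f"  N={n_gates}: MST worst case {mst:.3f}, wide {wide:.3f}")
```

The narrow-width effect is deliberately left out of this sketch, so both designs have the same nominal current per unit width and only the statistical difference between summing N log-normal currents and using one wider gate is compared.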
V. DISCUSSION
We start by discussing the results of the ring oscillator experiment (see table I). As is evident, the MST design is faster by 75-86% in the three comparable cases. The power consumption is also roughly proportional to the $I_{on}$ values. As can be seen in figure 3, the larger $I_{on}$ values of the 4× and 8× wide-gate oscillators result in slightly faster operation than their MST counterparts when they are significantly loaded ($C_L$ of about 10 fF or more). We estimate the capacitance per wire length for this process to be less than 0.2 fF/μm, so 10 fF corresponds to a wire length of 50 μm or more. From the standard deviations of the log-normal frequency distribution we see that there is an increase of around 5% in the $\sigma$ value. This is not very significant, since the gains in the maximum average time delay are so large, 35-40% for the three designs. At the same time the power consumption is comparable, at values within ±14% of the wide-gate values. We see the reason for the improved behaviour in the average stage capacitance $\bar{C}_S$, where the values have been reduced by 40-50%. The power-delay products also reflect this, as the dynamic power (eq. 1) is linearly dependent on the load capacitance, as long as the capacitor is not so large that the oscillator fails to maintain a full swing.
It seems easy to conclude that the MST design may constitute a better technique for designing time-critical and power-efficient logic standard cells. This is of course highly reliant on transistor characteristics, but preliminary results indicate that the technique may have good merits in other processes. One question to ask is why we have not simply increased the source/drain pitch of the WT design. One answer is that, for the transistor model this paper is based on, the current per area would then always be larger in the WT design. Although we have not been able to model it, we attribute at least some of the reduction in parasitic capacitance to the reduced active channel width of the MST design, and the MST design would thus always have a higher nominal speed, smaller input capacitance and better load capability for this particular transistor type.
In the process we have used, an area increase of perhaps 40%, possibly more, can be estimated from the increased pitch between source and drain. The increased speed and reduced power-delay product could make up for this cost, but it will depend on the requirements of the design. As both design types could potentially be mixed within a design, finding an optimum combination would be a complex task that is probably better left to EDA tools. For time-critical paths, however, we think there are many cases where the MST technique can be a very valuable tool for achieving higher manufacturability and yield, at very small expense in power and area.
The results from the numerical MST experiment show that MSTs are as good as or better than wide gates when it comes to the nominal worst-case current, when the width of the active channel is the same. The area overhead of dividing the gates would however quickly eat up the benefits of MSTs. It is not certain that this will remain true. In the future, with increasing variation, it is possible that the MST design can be beneficial in its own right, regardless of the narrow-width effect and the attainable capacitance reduction.
Process development and new design rules may allow us to achieve MSTs at reduced areas, and the continuous increase in parameter variation we see for each technology generation might also make the technique more useful. We are also not certain to what degree the processes are designed to suppress the narrow-width effect, but it is possible that significant gains may be achieved by not suppressing it.
VI. CONCLUSION
A layout technique with multiple minimum-sized transistors has been presented. Significant reduction of parasitic capacitances can be achieved. With an equal-area design MSTs have been shown to be a possibly faster and more power efficient option than classic wide transistors.
