We propose a logic gate leakage model based on transistor stacks, which includes local transistor level process variation parameters along with global process variation parameters and supply and temperature. The stack models include both subthreshold as well as gate leakage and consider the input vector state. We examine cells from an industrial standard cell library and find that most cells can be modeled with simple stacks, which have a linear chain of transistors. However some gates like XOR, Majority or Muxes need complex stacks and we show how these can be modeled. Our experiments show that only 18 different stack models are needed to predict the leakage of all gates in this industrial library. Re-use of the same models for pass transistor logic circuits and multi-finger transistors is also demonstrated. We explicitly include voltage and temperature into the models to support joint estimation of power supply IR drops and leakage currents, as well as enable analysis for dynamic voltage scaling applications.
Voltage and Temperature Scalable Logic Cell Leakage Models Considering Local Variations Based On Transistor Stacks

I. INTRODUCTION
LEAKAGE power is approximately 50% of the total active power in the 90nm technology node [1] and is expected to remain a significant fraction of the total active power in scaled technologies [2]. Hence it is important to accurately estimate the total leakage current of a digital circuit, not only to estimate the chip's power, but also to properly design the power grid of the chip so that the performance targets are met [3]. Accurate leakage models for digital circuits have therefore become imperative.
Leakage currents depend exponentially on certain process and environment parameters, such as effective gate length $L_e$, oxide thickness $T_{ox}$, threshold voltage $V_{TH}$, supply voltage and temperature. Hence even small variations in these parameters show up as a large variation in the leakage current, with Borkar et al. [4] reporting up to 20× variations in the leakage of manufactured chips.
Process parameter variations consist of both die-to-die and within-die variations. Die-to-die (or inter-die or global) variations affect all transistors within a die in the same way and are modeled as a single random variable which takes on the same value for each gate in the chip. Within-die (or intra-die or local) variations affect different gates within the same chip differently. These can have two components: a spatially correlated component and a random component. For spatially correlated variations, gates within a small region are affected identically. This effect is modeled by breaking the die into many regions, with one random variable per region [5]. The random variable takes on the same value for all the gates within the region.
Random local variations affect each gate independently. Their origin can be traced to random fluctuations within each transistor's gate length (LER), oxide thickness (OTV) and threshold voltage due to random dopant fluctuations (RDF) [6], [7]. Atomistic simulations in [7] show that these fluctuations become quite significant in technologies below 35nm, with the standard deviation of the threshold voltage reaching 100mV for a nominally sized transistor in a 9nm process node. Hence an accurate analysis in 45nm and lower process nodes needs to incorporate the effect of these random local variations. Recent work [8], [9] has shown that lithography inaccuracies result in random non-rectangular gate (NRG) structures, which introduce further variations in the threshold voltage of the transistors. These works also suggest efficient techniques to model the effects of such atomistic variations in circuit simulators like SPICE.
Existing leakage analysis frameworks use a simple model for a logic gate's leakage, consisting of an exponential of a linear or quadratic polynomial in the parameters (length, oxide thickness, threshold voltage) [5], [10], [11]. However, these models use a single random variable per gate to model the impact of local variations, ignoring the fact that random local variations due to LER, OTV and RDF are a transistor level phenomenon and need to be considered per transistor within the gate. Lumping local variations at the gate level works well for spatially correlated local variations, but leads to significant errors in the presence of large random local variations. Hence we examine the inclusion of these per-transistor local variation parameters in the gate level leakage models. The main challenge is to handle the large number of parameters needed to model the leakage of a gate.
Leakage estimation considering power supply and temperature variations requires leakage models to include supply and temperature [12]. Existing leakage models explicitly model the dependence of leakage on either the process (global and spatially correlated local parameters) or the supply and temperature for a given set of process parameters. We propose a leakage model based on artificial neural networks (ANN) that captures the dependence of leakage on global, spatially correlated local, and random per-transistor process parameters like $L_e$, $V_{TH}$ and $T_{ox}$, and also explicitly includes the supply voltage and temperature. This will enable voltage and temperature aware statistical leakage analysis. The ability of ANNs to model large amounts of non-linearity enables one to use these models in process nodes below 65nm, where the random local variability is expected to increase significantly [7]. ANNs can also handle large ranges in supply and temperature, which enables analysis of chips that implement dynamic voltage scaling (DVS).
Accurate treatment of input vector dependence requires a separate model per input vector of the gate. However as proposed in [5] , we only need to model transistor stacks which can then be reused across many gates. We extend the work of [5] by covering a larger variety of logic gate structures which exist in a standard cell library like pass-gate based cells, multifinger transistors etc. and show how stack based models suffice to model the entire library.
We further show how transistor gate tunneling leakage currents can also be included in these stack models.
The large number of parameters introduced due to the consideration of random local variations at a per-transistor level, can be mitigated somewhat by including the parameters of only the OFF transistors of a stack along the lines of [13] . However we improve upon the accuracy of their approach by incorporating the transistor variation parameters of even weakly inverted transistors, which were considered as ON and ignored in [13] . We validate our models with SPICE and demonstrate a few applications of voltage and temperature scalable statistical leakage analysis on some benchmark ISCAS'85 circuits.
The rest of the paper is organized as follows: Section II reviews the existing leakage models. Section III describes how modeling leakage through stacks can be used for leakage characterization of different kinds of gates. Following that we explain how leakage through different gates is obtained with these stacks. We then present, in section IV, a neural network based leakage current model for stacks followed by various gate level results obtained when we tested these stack models on different gates comparing them with accurate SPICE simulations across a range of voltage, temperature and technology nodes. We then demonstrate the accuracy of our model, in section VI at a circuit level by using it to predict the mean and standard deviation of leakage of the ISCAS'85 benchmark circuits and comparing with SPICE and conclude in section VII.
II. LEAKAGE MODELS - REVIEW
The leakage ($I^{v}_{LEAK}$) of a logic gate for an input vector $v$ can be formally expressed as: [...]
[5], [10], [11], [15], [16] propose exponential polynomial models for the subthreshold leakage which include the effect of local variations at the logic gate level. For example, Rao et al. [15] express the leakage of a gate as [...]
Here, ($L_g$, $V_g$) are zero mean random variables representing the global offset in length and threshold voltage, and ($L_l$, $V_l$) represent the random local offsets at the gate level. While the [...]
Consider a NAND4 gate with its input set to 0. We ran two different SPICE simulations: one with a gate level local variation model (GLV) and the other with a transistor level local variation model (TLV). Along the lines of [7], we lumped the impact of LER, OTV and RDF into the $V_{TH}$ parameter, varied the $3\sigma/\mu$ of the local variations from 5% to 20%, and measured the mean leakage in each case. Fig 1 shows the mean leakage obtained from the two methods. As can be seen from the graph, the GLV model used in all existing statistical leakage analysis (SLA) frameworks is accurate as long as the $3\sigma/\mu$ of local variations is less than 10%. However, the mean leakage error increases as local variations increase, which will be the case with future technologies. Extrapolating from Fig 1, we see that the mean leakage can be overestimated by as much as 30-40% by the GLV model as compared to the actual TLV model. An intuitive explanation for why the GLV model fails for increased local variations is as follows. On a transistor stack, the weakest transistor determines the leakage. Hence, leakage through the stack can increase only if the threshold voltages of all transistors on the stack decrease. Due to the correlation assumption, the GLV model allows this with high probability, while in reality it occurs with very low probability (since the probabilities multiply). The error introduced by the GLV model shrinks as the number of transistors on the stack decreases, and the GLV model is identical to the TLV model for an inverter since there is only one transistor on the stack.
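The intuition above can be illustrated with a small Monte Carlo sketch. It is a toy and not the SPICE experiment of Fig 1: it assumes that each transistor's leakage scales as exp(-ΔV_TH / (n·v_T)) and that a series stack combines like conductances in series. The function names, the 35mV slope and the 30mV sigma used below are illustrative assumptions.

```python
import math
import random

def stack_leakage(vth_offsets, n_vt=0.035):
    # Toy series-stack model (an assumption, not the paper's model): each
    # transistor leaks I_i = exp(-dVth_i / n_vt), and the stack current is
    # limited like conductances in series, I = 1 / sum(1 / I_i).
    return 1.0 / sum(math.exp(d / n_vt) for d in vth_offsets)

def mean_leakage(stack_size, sigma, per_transistor, samples=20000, seed=1):
    """Mean stack leakage under TLV (per_transistor=True) or GLV (False)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(samples):
        if per_transistor:
            # TLV: every transistor gets its own independent V_TH offset.
            offsets = [rng.gauss(0.0, sigma) for _ in range(stack_size)]
        else:
            # GLV: one offset per gate, shared by all transistors on the stack.
            offsets = [rng.gauss(0.0, sigma)] * stack_size
        total += stack_leakage(offsets)
    return total / samples
```

With a 30mV sigma, comparing `mean_leakage(4, 0.03, False)` against `mean_leakage(4, 0.03, True)` shows the GLV mean exceeding the TLV mean for the four transistor stack, while the two coincide for a stack of size one, in line with the inverter remark above.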
Recently, Gu et al. [17] derived an expression for the effective threshold voltage of a parallel stack of transistors considering the effect of RDF. They consider the case of "wide" transistors implemented as multiple fingers and derive accurate expressions for the PDF of the leakage with variations introduced by RDF. While this model works for a parallel stack, extending it to a series stack is not straightforward due to the complicated nature of the stacking effect [13]. In this paper we overcome this problem by explicitly modeling the dependence of the leakage of a stack on the process parameters of each transistor. The modeling technique is explained in section IV.
Su et. al. in [12] propose iterative techniques to solve for leakage currents in the presence of IR drops. A similar iterative procedure is followed for the temperature profile as well. Such a solution requires temperature and voltage to be explicitly included in the leakage model.
The authors propose the model in Equation 3 for small deviations in voltage and temperature from the nominal. Since the leakage depends exponentially on the voltage and temperature, this equation will be valid for only very small deviations in either of them. We explore the feasibility of handling much larger ranges, simultaneously including global and random local process parameters, to support DVS applications.
As shown in Table I , the leakage current of a logic gate can change by an order of magnitude depending upon the input vector. Hence separate models are required for different input vectors. However by modeling leakage currents through transistor stacks [13] , [18] , the number of required models can be reduced drastically, by reusing the same models across different gates by appropriate scaling [5] .
However, [5] presents no details on the impact of DIBL and the $V_{TH}$ drop across the topmost NMOS transistor on the stack, or on how to handle pass-gate structures and multi-finger transistors.
Consideration of per transistor local parameter variation in the model will increase the model complexity, which can be mitigated by only considering OFF transistors in a stack as in [13]. There, the authors propose an analytical expression for subthreshold leakage through transistor stacks for the case without process variations, and reduce the model complexity by only considering the OFF transistors in the stack. They consider a transistor to be OFF only if its gate input is 0 for NMOS (and $V_{DD}$ for PMOS). However, even an NMOS trying to pass a logic high ($V_{DD}$) with its gate connected to $V_{DD}$ is weakly inverted, and hence OFF, once its source rises to $V_{DD} - V_{TH}$. Thus the local process parameter variations of such transistors also need to be explicitly considered in the model. In this paper, we adopt the approach of [13] to model only OFF transistors in the series stack, but expand the model to consider global and local parameter variations of each OFF transistor in the stack, including even the weakly inverted ones. We will first explain the concepts for simple stacks using a 4-input NAND gate as an example. We will then address complex stacks for gates like Majority gates as well as pass transistor logic. We will defer the discussion on multi-finger transistors to the appendix.
A. Simple Stacks - Sub-threshold Leakage
Consider a four input NAND gate shown in Fig 2. We will examine how its leakage can be predicted with elementary stacks across all input vectors by examining the effect of each vector in detail.
1) Input vector(0000):
In this state all four NMOS transistors are turned OFF while all four PMOS transistors are turned ON and are as good as short circuits [13]. Thus, to predict the probability density function (PDF) of the leakage of a NAND4 gate for the input vector 0000, we need to model a four transistor NMOS stack, which we will refer to as n4/0. The "0" in the model name refers to the input vector applied to the stack. In our notation, the transistor closest to the output is the LSB (least significant bit) and the one closest to $V_{DD}$/gnd is the MSB.
[...] $W_{eff}$, the scaling factor is [...]. It is well known that the leakage of a transistor is directly proportional to its effective width. With nominal process settings, if the leakage of a four transistor stack is $I^{(4)}_{nom}$ and that of a three transistor stack is $I^{(3)}_{nom}$, then the scaling factor ([...]
Note that the same n3/0 model will be used to predict the leakage of a NAND3 gate with input 000 as well, the difference being the scaling factor, which is unity.
[...] NMOS stack and the NAND4 gate, we can scale the currents predicted by the n2/0 model to predict the NAND4 leakage for these input vector combinations. Here again, note that the same n2/0 model will be used to predict the leakage for the NAND3 gate with input 100 and the NAND2 gate with input 00.
5) Input vectors (1001 / 0101 / 0011):
Just as in the 0001 case, we observe that the topmost transistor is turned off even though its gate is connected to $V_{DD}$. Hence we need a model n3/1 with appropriate scaling to predict the leakage for this combination.
6) Input vectors (0111 / 1011 / 1101):
The leakage current is dominated by a single transistor whose gate is connected to ground. We would expect the model n2/1 to predict the leakage for these input combinations. Unfortunately, the DIBL effect on the transistor whose gate is grounded is different in all three cases, and hence there is a significant deviation in the PDF predicted by the n2/1 model. As can be seen from [...] dominates the leakage, let the drain source voltage across N2 be $V_{DS2}$. Similarly, the leakage in the 1101 case is dominated by N3; let the drain source voltage across N3 be $V_{DS3}$. In the 0111 case all four transistors are OFF and the $V_{DD}$ drop is across all four transistors. In the 1011 case the bottom transistor, N1, is ON and is essentially a short circuit, and hence the $V_{DD}$ drop is across three transistors (N2, N3 and N4). Similarly, in the 1101 case the $V_{DD}$ drop is across N3 and N4 alone. Hence $V_{DS1} < V_{DS2} < V_{DS3}$, which implies that the effect of DIBL increases as the OFF transistor moves up the stack, i.e. $I_{0111} < I_{1011} < I_{1101}$, as can be seen from Table I. We therefore need a separate model for each of these three input vectors.
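The ordering above follows from the exponential dependence of sub-threshold current on the drain source voltage through DIBL. The sketch below uses the textbook sub-threshold expression rather than the stack models of this paper; the DIBL coefficient, slope factor, threshold voltage and the three drain source voltages are hypothetical numbers chosen only to show the trend.

```python
import math

def subthreshold_current(v_ds, v_gs=0.0, v_th=0.3, eta=0.1, n=1.5,
                         v_t=0.026, i0=1.0):
    # Textbook form:
    #   I = I0 * exp((Vgs - Vth + eta*Vds) / (n*vT)) * (1 - exp(-Vds/vT))
    # eta (DIBL coefficient), n, Vth and I0 are illustrative assumptions.
    barrier = (v_gs - v_th + eta * v_ds) / (n * v_t)
    return i0 * math.exp(barrier) * (1.0 - math.exp(-v_ds / v_t))

# Hypothetical drain source voltages of the grounded-gate transistor in the
# 0111, 1011 and 1101 cases, ordered as V_DS1 < V_DS2 < V_DS3.
v_ds_by_vector = {"0111": 0.7, "1011": 0.8, "1101": 0.9}
currents = {vec: subthreshold_current(v) for vec, v in v_ds_by_vector.items()}
```

Under these assumptions the computed currents increase from 0111 to 1011 to 1101, which is the ordering reported in Table I.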
7) Input vectors (1110 / 1111):
With the input set to 1110 the bottom three transistors(N1, N2 and N3) can be considered as short circuits and hence the leakage will be predicted by an n1/0 model with appropriate scaling. Note that the voltage drop across N1 -N3 can be pretty significant and can introduce some error but this is not too much of a problem and will be dealt with in greater detail in the results section. With the input set to 1111, N1-N4 are all turned ON and are static transistors. Hence the leakage is determined by the parallel PMOS stack which can be predicted with a p1/1 model.
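The case analysis above can be condensed into a first-pass lookup for the NMOS chain of an N-input NAND gate. The sketch below is one reading of the walkthrough and not code from this work: the function name is hypothetical, ON transistors below an OFF transistor are dropped as short circuits, and a run of ON transistors above the topmost OFF transistor is collapsed into a single weakly inverted transistor. When exactly one gate is grounded (0111 / 1011 / 1101), the returned n2/1 name is only a starting point, since those vectors need their own models.

```python
def nand_nmos_stack(vector):
    """Map a NAND input vector to a simple NMOS stack name.

    The leftmost bit drives the transistor nearest ground (MSB) and the
    rightmost bit drives the transistor nearest the output (LSB).
    Returns None when every NMOS is ON and the PMOS network sets the leakage.
    """
    bits = [int(b) for b in vector]
    if all(bits):
        return None
    off_count = bits.count(0)
    # An ON transistor at the top of the chain tries to pass a logic 1, so it
    # is weakly inverted and kept in the stack as one extra transistor.
    weak_on_top = 1 if bits[-1] == 1 else 0
    return f"n{off_count + weak_on_top}/{weak_on_top}"
```

For example, `nand_nmos_stack("1001")` gives `n3/1` and `nand_nmos_stack("1110")` gives `n1/0`, matching the cases discussed above.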
B. Simple Stacks - Gate Leakage
As explained above, we have identified a set of stacks needed for accurate sub-threshold leakage modeling, and we now examine how these stacks can be used for accurate gate leakage estimation. The crucial difference between sub-threshold leakage and gate leakage is that for the former the ON transistors do not leak any extra current, as they are in series with the other OFF transistors in the stack, while for the latter the ON transistors are in parallel and hence their gate leakage also has to be considered. Thus we can express the gate leakage through any logic gate as the sum of the gate leakages through the stacks needed for sub-threshold leakage estimation and the gate leakages through the other ON transistors. However, an ON transistor can have different source/drain node potentials depending on its location within the stack, and hence may have significantly different gate leakage depending on its position. We will examine a few input vector combinations of the NAND4 gate to explain this.
1) Input vector(0000/0001):
The sub-threshold stacks are n4/0 and n4/1 respectively. While generating the leakage numbers for the sub-threshold leakage current, we can also generate the gate leakage numbers for these stacks. However, we need to account for the additional gate leakage of the PMOS transistors which are completely turned ON (Fig 3(D)). In the 0000 case all four PMOS transistors contribute to the gate leakage, while in the 0001 case only three PMOS transistors contribute.
2) Input vector(1000/ 0100/ 0010):
The sub-threshold stack is n3/0. Thus the total gate leakage is the sum of three components
• The leakage of the n3/0 stack, scaled by the ratio of effective widths
• The gate leakage of the 3 PMOS transistors which are completely turned ON
• The gate leakage of the NMOS transistor which is completely turned ON.
In the 1000 case, N1 is completely turned ON and the voltage at its drain and source is very nearly 0. Also the width of N1 is four times that of the unit width transistor thus the gate leakage of N1 is obtained by scaling the gate leakage of the unit transistor in Fig 3(B) by the ratio of effective widths. In the 0100 case the first two components of gate leakage are the same as the 1000 case. However, in the 0100 case, the NMOS transistor that is turned ON is N2 and its drain/source potential is greater than 0. Apart from scaling the leakage of the unit transistor in Fig 3(B) by the ratio of effective widths we also have to scale it so as to account for the difference in the drain/source voltage also.
Yang et. al. in [18] make approximate estimates of these node voltages and use the sensitivities to the drain/source potential to obtain the leakage current accordingly. They make such estimates for each transistor on the stack. From the above discussion gate leakage estimation can be summarized as
• Identify the sub-threshold stack and estimate the gate leakage for that stack
• Other transistors in the logic gate that have significant gate leakage have to be in one of the four configurations in Fig 3.
• Scale the gate leakage of the other transistors by the necessary scaling factor to account for [...]
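The bookkeeping summarized above amounts to a sum of scaled contributions. A minimal sketch follows; the function name, the argument names and all numeric values are hypothetical, and in practice the per-transistor gate leakages would come from the characterized configurations of Fig 3.

```python
def total_gate_leakage(stack_gate_leakage, stack_width_ratio, on_transistors):
    """Add the stack's own gate leakage to that of the fully ON transistors.

    on_transistors is an iterable of (unit_leakage, width_ratio, bias_factor)
    tuples, where bias_factor corrects for a non-zero drain/source potential
    (1.0 when the drain and source sit at the rail).
    """
    total = stack_gate_leakage * stack_width_ratio
    for unit_leakage, width_ratio, bias_factor in on_transistors:
        total += unit_leakage * width_ratio * bias_factor
    return total

# NAND4 with input 1000, in arbitrary units (all numbers are made up):
# n3/0 stack, three fully ON PMOS transistors, and N1 at four times unit width.
pmos_on = [(1.0, 1.0, 1.0)] * 3
n1_on = [(0.5, 4.0, 1.0)]
example_total = total_gate_leakage(2.0, 1.5, pmos_on + n1_on)
```

For the 0100 case the same call would be used with a `bias_factor` different from 1.0 for N2, reflecting its non-zero drain/source potential.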
C. Simple Stacks in a Standard Cell Library
From the above discussions it is clear that we need to identify the suitable sub-threshold leakage stack for a gate for a given input vector. The gate leakage model for such a stack can then be easily obtained as explained above and added to the sub-threshold leakage model.
We can formulate the following rules to identify the right stack
• An NMOS/PMOS transistor trying to pass a logic 0/1 is completely turned on and hence can be treated as a short circuit [13].
• An NMOS/PMOS transistor trying to pass a logic 1/0 turns off as soon as its source potential reaches ($V_{DD} - V_{TH}$)/($V_{TH}$), and hence is turned off even though its gate is connected to $V_{DD}$/GND.
• In an N transistor NMOS/PMOS stack, when there is exactly one NMOS/PMOS transistor whose gate is connected to GND/$V_{DD}$, i.e. the gates of the other N − 1 transistors are connected to $V_{DD}$/GND, the leakage increases significantly as this transistor moves towards the output due to DIBL [19].
• When the leakage through a logic gate is predicted using a stack, the leakage obtained from the stack model has to be scaled by the ratio of effective widths of the transistors in the gate and those on the stack.
• Leakage currents due to parallel stacks simply add up.
Using the above mentioned stack rules we built models for the 18 commonly found stacks listed in the first column of Table II. These stack models are the most basic stacks and are widely used. We assume that the standard cell library does not have gates with stack size greater than 4, as the increased logical effort will affect the delay of the circuit.
The naming convention for the commonly used stacks is as follows.
All model names in Table II are of the form, {Stack type}{Stack size}/{Input to the stack}
• Stack type indicates if it is an NMOS stack or a PMOS stack -n for NMOS and p for PMOS
• Stack size is the number of transistors on the stack
• Input is the decimal value of the input vector being applied to the stack with the LSB of the input vector applied to the transistor closest to the output
Here the widths of the transistors on the stack are the unit inverter's transistor widths, which are then scaled by the stack size.
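The naming convention can be made concrete with a small parser. This is a sketch of the convention as described above; the `StackModel` type and `parse_stack_name` function are hypothetical names introduced only for illustration.

```python
import re
from typing import NamedTuple, Tuple

class StackModel(NamedTuple):
    kind: str                     # "n" for NMOS, "p" for PMOS
    size: int                     # number of transistors on the stack
    gate_inputs: Tuple[int, ...]  # index 0 = transistor closest to the output

def parse_stack_name(name):
    """Split a name such as 'n4/1' into stack type, size and gate inputs."""
    match = re.fullmatch(r"([np])(\d+)/(\d+)", name)
    if match is None:
        raise ValueError(f"not a stack model name: {name!r}")
    kind, size, value = match.group(1), int(match.group(2)), int(match.group(3))
    if value >= 2 ** size:
        raise ValueError(f"input {value} does not fit a stack of size {size}")
    # The LSB of the decimal input drives the transistor closest to the output.
    bits = tuple((value >> k) & 1 for k in range(size))
    return StackModel(kind, size, bits)
```

For instance, `parse_stack_name("n4/1")` yields a four transistor NMOS stack in which only the transistor closest to the output has its gate at logic 1.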
Simple stacks are not sufficient to model all the gates in the library. We will next consider modeling of complex stacks.
D. Complex stacks -Majority Gate
Consider the example of a three input majority gate shown in Fig 4. As will be explained below, apart from the stacks listed in Table II, we need to model four more stack configurations to handle such gates. The leakage of the majority gate in Fig 4 is determined by three different stacks, whose currents we denote as $I_1$, $I_2$ and $I_3$, and the total leakage is given by the sum of these three currents. Let us examine the input vectors that result in the use of these different stacks.
1) Input vectors (000/111):
When the input is 000/111, PMOS/NMOS transistors {P1-P5}/{N1-N5} are completely turned on. N6/P6 is also turned on as the output of the majority gate is 0/1. $I_2$ is determined by the leakage through the stack {N4, N5}/{P4, P5} and can be estimated using the {n2/0}/{p2/3} model. Similarly, $I_3$ can be estimated using the {p1/1}/{n1/0} model. $I_1$ is primarily determined by the stack {N1, N2, N3}/{P1, P2, P3}. This stack cannot be approximated by any of the stack models listed in Table II because of the parallel combination of N1/P1 and N3/P3 in series with N2/P2. Thus we need two more models.
2) Input vectors (001/110):
The only difference between the previous input vector case and this case is that the input to C is 1/0. Thus $I_2$ and $I_3$ are identical to the previous case. $I_1$ is now determined by the stack {N1, N2, N3}/{P1, P2, P3} with the input to N2/P2 being 1/0.
As mentioned earlier, N2/P2 cannot be treated as short circuits as NMOS/PMOS transistors cannot pass 1/0 fully. This again requires two more stack models to model these two states.
3) Other Input vectors: Any of the other four input vectors will result in a set of stacks listed in Table II . Similar to the explanation given earlier for the NAND4 gate we can deduce which of the models in Table II are needed for each input vector.
The above example illustrates two facts
• The stacks that need to be modeled are library dependent
• When a new CMOS gate is added in the library, even if the present set of stacks is not sufficient to capture the leakage across all inputs, we can identify the new stack pattern and still model the leakage accurately while re-using the existing models for most of the other input vectors.
Similar to the majority gate, we had to model one more PMOS stack for the two input XOR gate (implemented as a CMOS circuit) to accurately predict the leakage when the input vector is 0. For the inverters, 2-4 input NAND/NOR and AND/OR gates, we could use the stack models listed in Table II to predict the leakage across all their input states.
E. Complex Stacks for Pass transistor logic -Multiplexer
All multiplexers and flip flops are implemented with pass transistors. We look at the example of a two to one multiplexer (two data inputs and one select input) and investigate the sub-threshold leakage paths from $V_{DD}$ to gnd for each of the eight input vectors. The schematic of a 2:1 mux is shown in Fig 5. As we can see, the output is the same as the input A when the select line S is 0, and the output is B when the select line is 1.
We observe from [...] Also note that the leakage through the inverters can be estimated with the n1/0 model or the p1/1 model, depending on whether the input to the inverter is 0 or 1 respectively.
1) S=0, A=0, B=0:
We see that the pass gate (pass1) is turned on and is a short circuit.
Clearly the potential difference across pass2 is zero and hence there is no additional leakage through pass2. Thus the total leakage for this input combination is just the sum of the leakage currents through the four inverters, which can be predicted with the n1/0 and the p1/1 models.
2) S=0, A=0, B=1:
Here again pass1 is turned on and is a short circuit and [...]
However, $B_b$ is at ground potential. The potential difference across pass2 is $V_{DD}$, which will result in additional sub-threshold leakage current. The question we need to answer is which stack model will appropriately model the leakage through the pass transistor. Fig 6 shows the transistor equivalent of the multiplexer with the inputs S=0, A=0 and B=1. The inverter for the select line is not shown for brevity, but the leakage through this inverter is independent of the other circuitry. For this input combination, the transistors that are turned on completely, and hence are as good as short circuits, are circled in Fig 6. Note that the leakage current that flows through the pass transistor, pass2, has to originate at $V_{DD}$, flow through P1, then through pass1 followed by pass2, then through N2 and finally to ground. Note that at the node $Y_b$ we assume that the current coming from pass1 does not flow into the gates of transistors P3 and N3 but instead flows through pass2 completely. This is a valid assumption as the sub-threshold leakage current is the sum of the sub-threshold leakages of isolated NMOS and PMOS transistors, which is much greater than the gate leakage. With this explanation we can easily see that there are four leakage paths from $V_{DD}$ to ground for the circuit shown in Fig 6.
• Through N1 of the inverter for the input A
• Through P2 of the inverter for the input B
• Through P3 of the inverter at the output
• Through the pass transistor pass2
The leakage of the mux will also include the leakage through the select line inverter which is not shown in the figure. The leakage through the pass transistor can be further broken down into a parallel NMOS transistor leakage and a PMOS transistor leakage. Thus we see that even though we have to consider the leakage through the pass transistor we did not have to model a new stack. Thus the elementary stack models are re-used even in such pass transistor based logic gates. The above discussion can be summarized as follows
1) The voltage levels at both ends of a pass transistor in a gate are either at $V_{DD}$ or ground. This implies that both nodes of a pass transistor have low resistance paths to $V_{DD}$ or ground.
2) Sub-threshold leakage current can flow through a pass transistor only if one end is at $V_{DD}$ and the other is at ground.
Similar explanations can be offered for other input vector combinations as well.
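The two observations above can be turned into a simple check over all eight input vectors of the 2:1 mux. The sketch assumes the structure described in the text (each data input buffered by an inverter, so the internal nodes sit at full rail); the function name is hypothetical.

```python
def off_pass_gate_leaks(select, a, b):
    """True if the OFF pass gate has V_DD on one end and ground on the other."""
    # Inverted data nodes: a logic 1 input puts its internal node at ground.
    a_b, b_b = 1 - a, 1 - b
    # The ON pass gate is a short circuit, so it sets the shared node Y_b.
    y_b = a_b if select == 0 else b_b
    # The far end of the OFF pass gate is the other inverted data node.
    far_end = b_b if select == 0 else a_b
    return y_b != far_end

leaky_vectors = [
    (s, a, b)
    for s in (0, 1) for a in (0, 1) for b in (0, 1)
    if off_pass_gate_leaks(s, a, b)
]
```

Under this assumption the OFF pass gate contributes sub-threshold leakage exactly when the two data inputs differ, regardless of the select value, which covers the S=0, A=0, B=1 case worked through above.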
Among the different types of gates, we have analyzed all but one namely the multi finger transistor gates. Analyzing multi finger gates requires some empirical results and hence will be discussed in the appendix. We now look at how leakage of a logic gate can be derived in terms of the leakage through the elementary stacks.
F. Deriving leakage of a logic gate from stack leakage
All static CMOS gates can be broken down into elementary stacks, and hence characterization of all gates in the standard cell library only involves mapping their different leakage states to the corresponding stacks that cause the leakage. Thus, the leakage currents predicted by the elementary stack models are analogous to basis vectors in linear algebra. By scanning the entire library and modeling the different stacks that appear in it, the leakage current through any gate, for a given input vector, can then be written as a linear weighted sum of the currents through these stacks. If M is the total number of unique stack models present in the library, and $S^{SUB}_j$ and $S^{GATE}_j$ are the sub-threshold and gate leakages through the $j$th stack, the leakage through the $i$th gate in a circuit for a given input vector $v$ can be written as [...]
Substituting Eqn. 4 we get [...]
Where, α [...]
In the next section we investigate the feasibility of using neural networks to model the leakage through these stacks. The details are explained with sub-threshold leakage as the example. The same explanations hold for gate leakage as well.
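The basis-vector view can be sketched in a few lines. Since the weighted-sum equation is not reproduced here, the sketch below assumes one hypothetical weight per stack for each leakage component; the function name, argument names and example numbers are all illustrative.

```python
def gate_leakage_from_stacks(sub_weights, sub_leakages,
                             gate_weights, gate_leakages):
    """Leakage of one gate for one input vector as a weighted sum over stacks.

    Each list has one entry per stack model (M entries). A weight of zero
    means that stack does not appear in the gate for this input vector.
    """
    lengths = {len(sub_weights), len(sub_leakages),
               len(gate_weights), len(gate_leakages)}
    if len(lengths) != 1:
        raise ValueError("all four lists must have one entry per stack model")
    sub_total = sum(w * s for w, s in zip(sub_weights, sub_leakages))
    gate_total = sum(w * s for w, s in zip(gate_weights, gate_leakages))
    return sub_total + gate_total

# Three stack models, arbitrary units; only the first and third are used.
example_leakage = gate_leakage_from_stacks(
    [1.0, 0.0, 2.0], [4.0, 5.0, 1.0],
    [1.0, 0.0, 2.0], [0.5, 0.25, 0.5],
)
```

The weights play the role of the width scaling factors discussed earlier, so characterizing a new gate only requires filling in one weight row per input vector.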
IV. LEAKAGE MODELING -NEURAL NETWORKS
Existing leakage models [10], [11], [15], [22] capture the effect of process on leakage for a given supply voltage ($V$) and temperature ($T$). These fit the log of the leakage to a polynomial, usually a quadratic [10]. However, such an approach does not work very well when all the cross terms between the process parameters are considered. Incorporating $V$ and $T$ into the model to support large ranges in these further increases the difficulty of getting simple polynomials to work. Instead we propose to model the log of the leakage, $\log(I(\ldots, V_{DD}, T))$, using an artificial neural network (ANN). ANNs are well known to fit highly non-linear but continuous functions well [23]. For example, Beyene [24] used an ANN to model various characteristics of the received waveform at the end of a lossy interconnect wire. That work shows that the ANN is a good choice to model the amplitude of the received waveform, which is a highly non-linear function, and that simple polynomial models fail to accurately capture the dependence on the process parameters.
A. Structure of the ANN
The ANN generally has an input and an output layer along with one or more hidden layers [23]. The number of hidden units in each hidden layer is also a design choice, and we shall investigate how to choose it. It is well known from the theory of neural networks [23] that an ANN with a single hidden layer can approximate any continuous functional mapping from one finite dimensional space to another, provided the number of hidden units is sufficiently large. The structure of a feed forward ANN with a single hidden layer is shown in Fig 7. There are $N_I$ inputs and $N_H$ hidden units in the hidden layer. The output $z_j$ of each hidden unit is given by [...]
Where $w_{ij}$ is the weight from the $i$th input to the $j$th hidden unit, which has a bias $w_{0j}$. The function $\phi(\cdot)$ is called the activation function. In this paper we have chosen tanh() as the activation function. The output $y$ of the ANN is given by [...]
The choice of the number of hidden units in the hidden layer is not as straightforward and is obtained iteratively in the training process.
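The structure described above can be sketched as a forward pass. The sketch assumes a linear output unit that forms a weighted sum of the hidden outputs plus a bias, which is a common choice but is an assumption here; the function and argument names are hypothetical.

```python
import math

def ann_output(inputs, hidden_weights, hidden_biases,
               output_weights, output_bias=0.0):
    """Single hidden layer feed forward network with tanh hidden units.

    hidden_weights[j][i] is the weight from input i to hidden unit j,
    hidden_biases[j] is the bias of hidden unit j, and output_weights[j]
    is the weight from hidden unit j to the (assumed linear) output.
    """
    hidden = [
        math.tanh(bias + sum(w * x for w, x in zip(row, inputs)))
        for row, bias in zip(hidden_weights, hidden_biases)
    ]
    return output_bias + sum(a * z for a, z in zip(output_weights, hidden))
```

In the leakage setting, `inputs` would hold the normalized process parameters of the stack together with the supply voltage and temperature, and the returned value would be the normalized log of the leakage.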
B. Training and Testing the ANN
The weights $w_{ij}$ and $\alpha_j$ of the ANN are learnt by providing appropriate training samples to the ANN. It is well known that random sampling [23] of the input space is a good technique to obtain training samples. Once the network is trained we need to test it with samples that were not presented to the ANN during training. This is known as generalization. The exact algorithm to train and test an ANN to model the log of the leakage through a stack is as follows (in brackets we have indicated the values used in this work). The voltage range (0.6V - 1.2V) is divided in steps of 0.4mV. In step 3 we randomly pair these uniformly generated points in order to make sure that the training set contains as many different V, T pairs as possible. The testing set has a similar but smaller number of such points, which in our case was 500. The reason we have to consider so many points is the extreme non-linearity introduced by temperature.
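The V, T pairing described above can be sketched as follows. This is a minimal illustration, not the authors' script: the function name `make_vt_pairs`, the seed, and the temperature range used in the usage note are assumptions, since only the voltage range and step are stated in the text.

```python
import random

def make_vt_pairs(v_min, v_max, v_step, t_min, t_max, n_samples, seed=0):
    """Build uniform V and T grids of equal size, then pair them at random.

    Assumption: the temperature grid uses as many points as the voltage grid.
    """
    rng = random.Random(seed)
    n_v = int(round((v_max - v_min) / v_step)) + 1
    if n_v < 2:
        raise ValueError("need at least two grid points")
    # Uniformly spaced supply voltages, e.g. 0.6 V to 1.2 V in 0.4 mV steps
    v_grid = [v_min + k * v_step for k in range(n_v)]
    # Same number of uniformly spaced temperatures over the (assumed) range
    t_grid = [t_min + k * (t_max - t_min) / (n_v - 1) for k in range(n_v)]
    # Shuffling each grid independently pairs every V with a random T,
    # so each V value and each T value appears in at most one pair
    rng.shuffle(v_grid)
    rng.shuffle(t_grid)
    return list(zip(v_grid, t_grid))[:n_samples]
```

For instance, `make_vt_pairs(0.6, 1.2, 0.0004, 0.0, 100.0, 500)` would return 500 pairs for a testing set, where the 0 to 100 range for temperature is purely a placeholder.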
Step 6 creates the training and testing output data sets through SPICE simulations. In step 7 we normalize both the input and output data sets in order to improve the training. In step 8 we train the ANN with the Levenberg-Marquardt algorithm as given in [23]. We used the Neural Network Toolbox provided by MATLAB for this purpose. We only had to choose the initialization parameters and train the network. The standard practice in neural networks is to train the network until the mean square error of one epoch is below the chosen threshold [23]. We then validate the trained network with a disjoint testing data set in step 9 and compute the maximum percentage error between the actual SPICE value and the one predicted by our trained ANN. If the error is above the chosen threshold, which in our case was 15%, we have to increase the number of hidden nodes and re-train the network. However, we found that we did not have to do this retraining even once. With this theory of stacks and neural networks we will now look at the results in detail.
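The validate-and-grow loop of steps 8 and 9 can be sketched as below. This is an illustrative outline under stated assumptions: `train_fn` is a hypothetical callable standing in for the actual training step (Levenberg-Marquardt training in the MATLAB toolbox in this work), and the starting and maximum hidden-unit counts are placeholders.

```python
def max_percent_error(actual, predicted):
    """Largest per-sample percentage error between reference and prediction."""
    return max(abs(p - a) / abs(a) * 100.0 for a, p in zip(actual, predicted))

def fit_with_growing_hidden_layer(train_fn, test_inputs, test_actual,
                                  n_hidden_start=2, n_hidden_max=20,
                                  threshold_pct=15.0):
    """Train, validate on disjoint test data, and add hidden units if needed.

    train_fn(n_hidden) is hypothetical: it must return a callable model
    that maps one input sample to a predicted leakage value.
    """
    for n_hidden in range(n_hidden_start, n_hidden_max + 1):
        model = train_fn(n_hidden)
        predicted = [model(x) for x in test_inputs]
        err = max_percent_error(test_actual, predicted)
        # Accept the network once the worst-case test error is within threshold
        if err <= threshold_pct:
            return model, n_hidden, err
    raise RuntimeError("no network met the error threshold")
```

The acceptance test uses the maximum error over all test samples rather than an average, which matches the 15% criterion stated above.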
V. GATE LEVEL RESULTS
A. Gates with simple stacks
1) Sub-threshold leakage in 130nm:
We trained the ANN to model leakage through the simple stacks listed in column 1 of Table II . We then used them to predict the PDF of the different gates.
Results for the basic NAND-NOR gates are listed in Table III. For each gate in column 1 of Table III we have listed the mean and standard deviation error, compared to SPICE, for the input vector with minimum error and the one with maximum error. Similarly the error is maximum for the NAND3 and NAND2 gates when the n1/0 model is used to predict their leakage. Since the number of transistors in the stack progressively decreases from NAND4 to NAND2, the error drops from 20% for a NAND4 gate to 10% for a NAND2 gate. Similar results are observed for NOR gates as well. However, the average mean error and average standard deviation error for a gate, across all input vectors, are less than 2% and 5% respectively.
The results presented so far were compared with SPICE simulations with a 130nm industrial model file in which gate leakage is not very prominent. We now look at the performance with an industrial 130nm model. However we present a few sub-threshold leakage results on a 45nm PTM as well for the sake of completeness. We modeled the 2-4 transistor NMOS/PMOS stacks and used them to predict the sub-threshold leakage of NAND and NOR gates. The results are summarized in Table IV. As can be seen from the table, the mean error and standard deviation error are negligible even in 45nm technology.
3) Gate Leakage -45nm PTM:
We modeled the gate leakage of all simple stacks with the 45nm predictive technology model and used them to predict the gate leakage of a NAND4 gate across all input vectors and compared them with accurate SPICE simulations. The results are summarized in Table V . As can be seen from Table V 
B. Gates with complex stacks
Optimized transistor level implementations of complex gates lead to new stack patterns, as discussed in section III-D. There we considered two such gates, the two input XOR shown Table II . The stack models needed for the majority gate were explained in section III-D.
On similar lines we see that the XOR gate needs one new stack model. When the binary input of the XOR gate is set to 11, we see that the {P2, P3, P4, P5} stack in Fig 8 determines the leakage. Note that this configuration of PMOS transistors cannot be broken down into two separate 2 transistor PMOS stacks since the drains of P3 and P5 are shorted. Thus by modeling this complex PMOS stack separately we can estimate the leakage of the XOR gate for the input 11 accurately. 
C. Voltage and Temperature Scalability
Results in Table III 
E. Pass Transistor Logic
We used the elementary stack models in Table II to predict the PDF of the leakage of each of the eight leakage states of the 2:1 mux. The results are summarized in Table VII . We see from the table that the stack approximation works well across all inputs with the maximum error in mean being less than 0.6% and that in standard deviation being 0.61%. This explanation can be carried over to the mux based flops as well since flops are generally implemented as multiplexers with the output fed back to one of the inputs.
F. Results from an industrial standard cell library
We examined all the cells for an industrial standard cell library in the 130nm process. We needed to use just 18 Simple stacks listed in Table II 
VI. CIRCUIT LEVEL SIMULATION RESULTS
A. Benchmark Results
Having modeled gates accurately, we now see how the model performs in terms of speed and accuracy with large circuits. We tested our models with the ISCAS'85 benchmark circuits. The maximum error occurs at a temperature of 90°C, where the mean error is less than 5% and the standard deviation error is even lower. Note that the results in Table VIII are reported at a supply voltage of 0.9V while the graph in Fig 11 was generated at 1.2V. These two results together show that our model is both voltage and temperature scalable.
B. Impact Of Local Variations
In the previous section we validated our model thoroughly by comparing each result with SPICE. In this section we will use the model directly to investigate the impact of local variations. Modeling local variations introduces the problem of considering the effect of many independent random variables. Doing a worst case corner analysis based on using extreme values for these variables will yield very pessimistic leakage current numbers which will be statistically unlikely as these are independent variables. Table IX lists 
C. Neural Network Complexity
There are two kinds of complexity associated with an ANN. One is training complexity and the other is run time complexity. Training is a one time issue and hence has less impact.
We found that amongst all the models presented in this work, the maximum training time was less than 70 seconds. Run time complexity details can be inferred from Table II. Table II shows the details of the trained neural network for the common stacks. The table lists
• The $N_I$ component input vector is normalized by dividing it by the input normalization coefficients
• The input to each hidden unit is evaluated using simple matrix multiplication
• The output of each hidden unit is evaluated as tanh() of its input as per Equation (7)
• The output of the ANN is evaluated as a weighted sum of the output of each hidden unit as per Equation (8)
• The output obtained in the previous step is multiplied by the output normalization coefficient
• Since we are modeling log of the leakage we have to take an exponential of the value obtained in the previous step to get the actual leakage current
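The evaluation steps listed above can be sketched as a short routine. This is a minimal sketch, assuming a single hidden layer with no output bias and a natural logarithm for the modeled log-leakage; the function name `stack_leakage` and the argument layout (`w[j][i]` as the weight from input `i` to hidden unit `j`) are illustrative choices, not taken from the paper.

```python
import math

def stack_leakage(x, in_norm, w, w0, alpha, out_norm):
    """Evaluate a trained single-hidden-layer ANN and return leakage current.

    x        : raw input vector (process parameters, supply, temperature)
    in_norm  : input normalization coefficients, one per input
    w, w0    : hidden-layer weights w[j][i] and biases w0[j]
    alpha    : output weights, one per hidden unit
    out_norm : output normalization coefficient
    """
    # Normalize each input by its coefficient
    xn = [xi / ci for xi, ci in zip(x, in_norm)]
    # Input to each hidden unit: bias plus weighted sum (a matrix-vector product)
    a = [w0[j] + sum(w[j][i] * xn[i] for i in range(len(xn)))
         for j in range(len(w0))]
    # Hidden-unit outputs through the tanh activation
    z = [math.tanh(aj) for aj in a]
    # ANN output: weighted sum of the hidden-unit outputs
    y = sum(alpha[j] * z[j] for j in range(len(z)))
    # Undo output normalization, then exponentiate the log-leakage
    return math.exp(y * out_norm)
```

Written this way, one evaluation costs roughly $N_H (N_I + 1)$ multiply-adds, $N_H$ tanh evaluations and a single exponential.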
We will next consider stack approximations for other multi-finger gates. Consider the two transistor NMOS stack with four fingers shown in Fig A-2. We have three ways of approximating the leakage of this four finger stack.
• Neglect the fact that the drains of N1, N3,
While using neural networks to model stacks we said the model has been successfully trained if the maximum sample error is less than some threshold (15% in our case). This means that across all testing samples used, the maximum error was less than 15%. By approximating the 4 fingers into two independent 2 finger stacks we obviously cannot approximate the However, we justify the approximation as the probability of the answer being erroneous by more than the specified threshold is very low.
However, evaluating this probability for a given threshold is not an easy task since the functional form of the actual leakage is extremely complicated. Instead we look at the trend of the probability of error, which we denote as ρ, for different stack approximations through simulations to pick the right stack combination. We trained the neural network to model the 2 transistor NMOS stack with two fingers and three fingers respectively. We then considered a 7 finger NMOS stack with two transistors and measured the leakage for 5000 SPICE simulations to obtain the actual values. We then looked at different partitions of the stack into groups of 1-3 fingers to predict its leakage. As we move to the right we keep decreasing the number of n2/0 models and use more of the n2f2/0 models. As we can see clearly, all three errors (∆µ, ∆σ, ρ) decrease as the number of n2/0 models is reduced. This behavior is expected. As mentioned before, by neglecting the fact that the drains of the lower transistors are shorted we are treating the currents through each stack as independent when they actually are not. By considering more n2/0 models we are treating more currents as independent, which introduces more error as shown in the graph. Thus it is clear that we need appropriate two finger stack models.
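The three error measures can be estimated from paired samples as in the sketch below. The exact definitions are assumptions made for illustration: ∆µ and ∆σ are taken as the percentage error in the sample mean and standard deviation relative to SPICE, and ρ as the fraction of samples whose prediction is off by more than the threshold. The function name is hypothetical.

```python
import statistics

def approximation_errors(actual, predicted, threshold_pct=15.0):
    """Return (delta_mu %, delta_sigma %, rho) for one stack approximation.

    actual    : leakage values from the reference simulation (e.g. SPICE)
    predicted : leakage values from the stack models, same sample order
    """
    mu_a, mu_p = statistics.mean(actual), statistics.mean(predicted)
    sd_a, sd_p = statistics.pstdev(actual), statistics.pstdev(predicted)
    d_mu = abs(mu_p - mu_a) / abs(mu_a) * 100.0
    d_sigma = abs(sd_p - sd_a) / sd_a * 100.0
    # Count samples whose individual error exceeds the threshold
    n_bad = sum(1 for a, p in zip(actual, predicted)
                if abs(p - a) / abs(a) * 100.0 > threshold_pct)
    rho = n_bad / len(actual)
    return d_mu, d_sigma, rho
```

Calling this once per candidate partition, with the same SPICE samples as `actual`, would yield one set of bars per partition for comparison.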
Now the question is whether we need a three finger model as well.
From the graph we see that the last set of bars has the least error. This is the case when the seven finger stack is partitioned as two two-finger stacks (n2f2/0) and one three-finger stack (n2f3/0). We must note that the reduction in error compared to the previous partition (three two-finger stacks (n2f2/0) and one single-finger stack (n2/0)) is small. The mean error drops from 2.55% to 1.80%, the standard deviation error drops from 3.12% to 2.61%, and the probability of an incorrect prediction drops from 0.9% to 0.24%.
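The candidate partitions of a multi-finger stack can be enumerated mechanically, as in this sketch. It assumes group sizes of 1 to 3 fingers, corresponding to the n2/0, n2f2/0 and n2f3/0 models, and that the predicted leakage of a partition is the sum of the per-group model predictions; the names `finger_partitions`, `predict_leakage` and the `models` mapping are hypothetical.

```python
def finger_partitions(n_fingers, max_group=3):
    """All ways to split n_fingers into groups of 1..max_group (order ignored)."""
    def rec(remaining, largest):
        if remaining == 0:
            yield ()
            return
        # Use non-increasing group sizes so each partition appears once
        for g in range(min(largest, remaining), 0, -1):
            for rest in rec(remaining - g, g):
                yield (g,) + rest
    return list(rec(n_fingers, max_group))

def predict_leakage(partition, models, x):
    """Sum the per-group predictions, treating the groups as independent.

    models maps a group size (number of fingers) to a callable model.
    """
    return sum(models[g](x) for g in partition)
```

For a seven finger stack, `finger_partitions(7)` includes `(3, 2, 2)`, the partition with the least error above, and `(2, 2, 2, 1)`, the one it is compared against.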
Thus the choice we need to make is whether we want to model the three finger versions of the stacks in Table II 
