Abstract: Evolving nanometer CMOS technologies provide low power, high performance, and higher levels of integration, but suffer from increased subthreshold leakage and excessive process variation. The present work examines the 45 nm bulk and high-k technologies. We evaluate the performance of a 32-bit ripple-carry adder circuit over the entire range of supply voltages for which it functions correctly. Lowering the voltage increases delay, which reduces the maximum clock rate. We use the maximum permissible clock rate and the energy per cycle at that clock rate as two performance criteria. Operation at minimum energy per cycle occurs at a subthreshold voltage. At minimum energy, the bulk technology has very low performance (~7 MHz), whereas the high-k technology runs at a much higher 250 MHz clock. The faster clock rate reduces the leakage energy per cycle, making high-k almost twice as energy efficient as bulk. The energy per cycle versus supply voltage is a U-shaped curve whose bottom, the minimum energy point, provides a stable equilibrium against the speed and energy deviations caused by process-related parametric variations in different technologies. These deviations can be expected to be lower for high-k technology than for circuits designed in the commonly used bulk technology. They are also lower than those at the higher supply voltages in common use. Although we expect the clock rate to improve further and the energy per cycle to decrease for 32 nm and finer technologies, some projections indicate that the energy per cycle could increase with a move towards finer technologies. However, those studies were conducted on bulk technologies, and further investigation should ascertain the performance of high-k technology.
I. INTRODUCTION
There is growing concern about increased power and energy dissipation as transistors are scaled down. The total power (P_total) dissipated in a CMOS logic gate consists of static power (P_static) and dynamic power (P_dynamic), so that P_total = P_static + P_dynamic.
Although scaling down transistors reduces the dynamic energy per cycle because circuit capacitances shrink, the accompanying reduction of the threshold voltage increases the leakage current of the circuit and thus significantly increases static power dissipation. The speed of digital circuits is currently limited by energy density. Shrinking feature sizes will continue to offer a higher degree of integration, and hence lower cost, provided energy density can be kept under control. Another characteristic that will assume increasing significance is tolerance to the larger process variation of smaller features. Hence, there is strong interest in developing design techniques for power- and energy-efficient circuits in high-leakage nanometer technologies.
The supply voltage has the strongest influence on all components of power and energy in a digital CMOS circuit. In 1971, Meindl and Swanson concluded that, to obtain the greatest power saving and the smallest power-speed product, a circuit must be operated at the lowest supply voltage that the design technology practically allows [1]. Their calculation showed that CMOS transistors do not abruptly turn off below the threshold voltage but act as weak-inversion devices. They determined that the smallest theoretical supply voltage at which circuits can function is approximately 8kT/q ≈ 0.2 V at T = 300 K, where k is the Boltzmann constant, T is the absolute temperature, and q is the electron charge. One technique highlighted in their paper was boron ion implantation to adjust the turn-on voltages of both p- and n-channel transistors, which achieved operation close to their derived theoretical limit [2]. However, because of the very low performance of the technologies in use at that time, such low-voltage operation was not adopted in practical systems.
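As a quick check of the 8kT/q figure, the short Python sketch below evaluates the thermal voltage kT/q and the 8kT/q limit from the exact SI values of k and q. It is illustrative arithmetic only; the function names are hypothetical and not taken from [1].

```python
# Exact SI values (2019 redefinition of the SI base units)
K_BOLTZMANN = 1.380649e-23     # J/K
Q_ELECTRON = 1.602176634e-19   # C


def thermal_voltage(temp_k: float) -> float:
    """Return the thermal voltage kT/q in volts."""
    return K_BOLTZMANN * temp_k / Q_ELECTRON


def min_supply_8kt_q(temp_k: float = 300.0) -> float:
    """Return the 8kT/q minimum supply voltage quoted in the text, in volts."""
    return 8.0 * thermal_voltage(temp_k)


if __name__ == "__main__":
    print(f"kT/q  at 300 K = {thermal_voltage(300.0) * 1e3:.2f} mV")  # about 25.85 mV
    print(f"8kT/q at 300 K = {min_supply_8kt_q(300.0):.3f} V")         # about 0.207 V
```

The result, roughly 0.207 V, is consistent with the ≈ 0.2 V value stated above.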
Another approach has been to examine energy minimization for circuits operating in the subthreshold region. Studies have shown that subthreshold operation has a number of advantages over standard CMOS operation, namely improved gain and noise margin, and greater energy efficiency at lower frequencies [3]. The authors of [4] further examine the optimum supply voltage (V_dd) and threshold voltage (V_t) that minimize energy in subthreshold digital circuits. They show that there is a maximum achievable frequency for a given circuit operating in the subthreshold region, and conclude that current standard cell libraries also show reduced energy per operation for minimum-sized devices. Dual-voltage design in the subthreshold range has recently been studied and shown to have energy and speed advantages [5] [6]. Similarly, subthreshold operation may help extend battery lifetime in portable and mobile electronics [7].
Operation at a 330 mV supply voltage was successfully demonstrated on test chips fabricated in 90 nm technology, with energy savings on the order of 9X compared with other reduced-performance scenarios [8]. Similar work has shown that the optimum V_dd need not occur at the lowest voltage at which the circuit functions correctly [9]. This result was significant because it contradicted the conclusion drawn by Meindl and Swanson [1]; the reason was the increased leakage of submicron devices.
In this work, we simulate a 32-bit ripple-carry adder designed in 45 nm bulk and high-k metal gate technologies. Using the aggressive voltage scaling described in previous research [3] [4] [8] [9] [10], we obtain the optimum V_dd at which the minimum energy per cycle occurs and compare the results for both processes. We conclude that performance improves significantly when the process is changed from bulk to high-k technology. The circuit modeled in high-k showed an operating frequency of 250 MHz, a significant jump from bulk CMOS technology, while retaining the advantage of low energy consumption. Further, from the shape of the energy versus V_dd curve, we hypothesize that operation at subthreshold V_dd is more resilient to process variation than operation at the normal V_dd, for both high-k and bulk technologies.
II. CIRCUIT MODELING
Simulations were performed on a 32-bit ripple-carry adder. The circuit was first described in VHDL. The VHDL file was then imported into the Leonardo Spectrum tool [11] and synthesized using the TSMC 0.18 μm model. A Verilog netlist was generated with the same tool, and this file was then imported into the Design Architect tool [12], which produced the schematic of the 32-bit ripple-carry adder using the standard TSMC cell library.
The Design Architect tool internally generated a SPICE netlist, which we then modified by scaling the transistor dimensions from 0.18 μm to 45 nm while preserving each transistor's width-to-length (W/L) ratio. Instead of the TSMC libraries used by Design Architect, we used the Predictive Technology Model (PTM) for both 45 nm bulk and high-k technologies [13].
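One way to perform the netlist edit described above, assuming every MOSFET's W and L are multiplied by the same factor 45/180 = 0.25 so that W/L is unchanged, is sketched below. This is a hypothetical illustration, not the script used in this work: `scale_mosfet_line` is an invented helper, and it assumes single-line instance cards of the form `M... W=<value>u L=<value>u` using only the `u` and `n` suffixes.

```python
import re

# Only the 'u' (micro) and 'n' (nano) SPICE suffixes are handled in this sketch.
_SUFFIX = {"u": 1e-6, "n": 1e-9}
# Matches W=<number><suffix> or L=<number><suffix> as standalone parameters.
_DIM = re.compile(r"\b([WL])=([0-9]*\.?[0-9]+)([un])\b", re.IGNORECASE)


def scale_mosfet_line(line: str, factor: float) -> str:
    """Multiply W= and L= on a MOSFET instance card by `factor`.

    Both dimensions get the same factor, so every W/L ratio is preserved.
    Lines that are not MOSFET instances (first character not 'M') are returned unchanged.
    """
    if not line.lstrip().upper().startswith("M"):
        return line

    def rescale(m: re.Match) -> str:
        name, value, suffix = m.group(1), float(m.group(2)), m.group(3).lower()
        metres = value * _SUFFIX[suffix] * factor
        return f"{name}={metres * 1e9:g}n"  # re-emit the scaled value in nanometres

    return _DIM.sub(rescale, line)


if __name__ == "__main__":
    card = "M1 out in vdd vdd pmos W=0.72u L=0.18u"
    print(scale_mosfet_line(card, 45 / 180))  # M1 out in vdd vdd pmos W=180n L=45n
```

A real HSPICE netlist may also contain continuation lines (`+`) and area/perimeter parameters (AD, AS, PD, PS) that would need corresponding scaling; the sketch deliberately ignores them.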
Circuit-level simulation was conducted using HSPICE [14] to obtain timing and power data. For various supply voltages, we assessed the functional correctness of the circuit and determined its energy and delay characteristics.
III. SIMULATION RESULTS

A. Minimum Energy Point Estimation
A schematic of the ripple-carry adder is shown in Figure 1. To measure the delay at each voltage, we ensured that the critical path was activated by applying the following vectors. First, all inputs (A, B, and C_in) were initialized to 0, which sets all sum outputs and the carry-out to 0. In the second vector, all A inputs (A[0:31]) were set to 1, so all sum outputs became 1. A third vector then set C_in to 1. This activated the critical path: a carry propagated through all 32 full adders while the sum outputs returned to 0. The critical path delay determines the rate at which vectors can be applied, and this rate changes with each voltage point.
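The three-vector sequence can be checked with a bit-level functional model of the adder. The Python sketch below is a hypothetical helper, not part of the simulation flow, and has no notion of timing; it simply tracks the longest run of full-adder stages that a single carry passes through, confirming that only the third vector exercises all 32 stages.

```python
def ripple_carry_add(a: int, b: int, cin: int, width: int = 32):
    """Functional (untimed) model of a ripple-carry adder.

    Returns (sum, carry_out, longest_chain), where longest_chain is the largest
    number of consecutive full-adder stages that a single carry travels through.
    """
    carry = cin & 1
    total = 0
    chain = 0
    longest = 0
    for i in range(width):
        ai = (a >> i) & 1
        bi = (b >> i) & 1
        propagate = ai ^ bi
        generate = ai & bi
        total |= (propagate ^ carry) << i
        if carry and propagate:
            chain += 1        # incoming carry ripples through this stage
        elif generate:
            chain = 1         # a new carry is generated here
        else:
            chain = 0         # no carry leaves this stage
        longest = max(longest, chain)
        carry = generate | (propagate & carry)
    return total, carry, longest


ALL_ONES = (1 << 32) - 1
vectors = [
    (0, 0, 0),          # vector 1: reset all inputs
    (ALL_ONES, 0, 0),   # vector 2: A = all ones, every sum bit becomes 1
    (ALL_ONES, 0, 1),   # vector 3: C_in = 1 ripples a carry through all 32 stages
]
for a, b, cin in vectors:
    s, cout, chain = ripple_carry_add(a, b, cin)
    print(f"sum={s:#010x} cout={cout} longest carry chain={chain}")
```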
After finding the frequency, we applied 100 random vectors to the inputs of the 32-bit ripple-carry adder at the maximum operating frequency for that voltage point. In the HSPICE simulations, we measured the average current drawn by the circuit and multiplied it by the supply voltage to obtain the average power consumed by the operating circuit. Energy per cycle was then obtained by multiplying the average power by the critical path delay, which sets the clock period. All simulation and calculation results described above are tabulated in Tables I and II. From the tables and the graph, it is evident that high-k technology has the advantage of greater energy efficiency: comparing the minimum energy operations of the two technologies, the energy/cycle for high-k is 40% lower than that for bulk. The minimum energy point occurs at 0.3 V for both high-k and bulk technologies.
Notably, the circuit runs faster in high-k technology than in bulk technology. From Tables I and II, the frequency of operation at the optimum energy (minimum energy/cycle) point is 250 MHz (critical path delay of 4 ns) for high-k technology, whereas for bulk technology the corresponding frequency is just above 7 MHz (critical path delay of 137 ns).
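The quantities in the two preceding paragraphs reduce to simple arithmetic: f_max = 1/t_cp and E/cycle = V_dd · I_avg · T_clk, with T_clk equal to the critical path delay t_cp. The sketch below encodes these relations and checks the quoted clock rates against the quoted delays. The function names are hypothetical, and the average current in the usage example is a made-up placeholder, not a value from Tables I or II.

```python
def max_clock_hz(critical_path_delay_s: float) -> float:
    """Highest clock rate at which the critical path still settles: f_max = 1 / t_cp."""
    return 1.0 / critical_path_delay_s


def energy_per_cycle_j(vdd_v: float, avg_current_a: float, clock_period_s: float) -> float:
    """Energy per cycle = average power (Vdd * I_avg) times the clock period."""
    return vdd_v * avg_current_a * clock_period_s


if __name__ == "__main__":
    for label, t_cp in [("high-k, 0.3 V", 4e-9), ("bulk, 0.3 V", 137e-9)]:
        print(f"{label}: f_max = {max_clock_hz(t_cp) / 1e6:.1f} MHz")
    # Hypothetical average current, for illustration only (not from Tables I/II).
    i_avg = 10e-6
    e = energy_per_cycle_j(0.3, i_avg, 4e-9)
    print(f"E/cycle = {e * 1e15:.1f} fJ")
```

A 4 ns delay gives 250 MHz and a 137 ns delay gives about 7.3 MHz, matching the "just above 7 MHz" figure.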
B. Process Variation
From the graphs of Figure 2, we infer that circuits designed in 45 nm high-k technology should be more resilient to process variations than circuits designed in 45 nm bulk technology, because their energy-delay curve is lower, and that minor parameter changes should not drastically affect efficiency or performance. To obtain preliminary evidence for this hypothesis, we assigned a 5% relative variance to the threshold parameter (vth0) in the PTM files [13]. First, we investigated how this variation affects the critical path delays for the 45 nm bulk and high-k technologies. A Monte Carlo simulation of 30 circuit samples was performed, and the critical path delay of each sample was measured through HSPICE [14] simulation using a vector pair that activated the critical path. The means and standard deviations (σ) of the critical path delay for circuits operating at 0.3 V in 45 nm bulk and high-k technologies are tabulated in Table III.
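The Monte Carlo procedure can be illustrated outside HSPICE with a deliberately simple model. The Python sketch below is an assumption-laden toy, not the PTM/HSPICE flow used here: it assumes subthreshold delay scales as exp(ΔV_th / (n·kT/q)), uses a hypothetical slope factor n = 1.5 and a hypothetical nominal vth0 of 0.4 V, and interprets the "5% relative variance" as a Gaussian standard deviation equal to 5% of the nominal vth0. It draws 30 samples and reports the mean, σ, and the mean + 3σ worst-case delay.

```python
import math
import random
import statistics

K_OVER_Q = 1.380649e-23 / 1.602176634e-19  # Boltzmann constant over electron charge, V/K


def toy_subthreshold_delay(vth: float, vth_nom: float, d_nom: float,
                           n: float = 1.5, temp_k: float = 300.0) -> float:
    """Toy model: drive current ~ exp(-Vth / (n kT/q)), so delay ~ exp(+dVth / (n kT/q))."""
    return d_nom * math.exp((vth - vth_nom) / (n * K_OVER_Q * temp_k))


def monte_carlo_delay(d_nom: float, vth_nom: float = 0.4, rel_sigma: float = 0.05,
                      samples: int = 30, seed: int = 1):
    """Draw `samples` Gaussian vth0 values; return (mean, sigma, mean + 3*sigma) of delay."""
    rng = random.Random(seed)
    delays = [
        toy_subthreshold_delay(rng.gauss(vth_nom, rel_sigma * vth_nom), vth_nom, d_nom)
        for _ in range(samples)
    ]
    mean = statistics.mean(delays)
    sigma = statistics.stdev(delays)
    return mean, sigma, mean + 3.0 * sigma


if __name__ == "__main__":
    mean, sigma, worst = monte_carlo_delay(d_nom=4e-9)
    print(f"mean = {mean * 1e9:.2f} ns, sigma = {sigma * 1e9:.2f} ns, "
          f"worst case (mean + 3 sigma) = {worst * 1e9:.2f} ns")
```

Because delay depends exponentially on vth0 in this toy model, a modest threshold spread produces a large, right-skewed delay spread, which is qualitatively why the worst-case clock period grows so much at subthreshold voltages.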
The sum of the mean and 3σ gives the worst-case delay for a circuit operating at 0.3 V in each technology. This worst-case delay was used as the clock period to apply 100 random vectors to each of 30 random Monte Carlo samples of the 32-bit adder circuit, and the current drawn from V_dd was measured for each sample. The average current of each sample was multiplied by the operating voltage to obtain the power, which, multiplied by the clock period (Table III), gave the energy/cycle of each sample, as tabulated in Tables IV and V. Finally, the energy/cycle of each sample was normalized with respect to the ideal energy/cycle at that voltage (without process variation, as in Tables I and II) and plotted in Figure 3. From the tables and graphs, it is evident that a circuit designed in high-k technology is more resilient to process variation, has a smaller critical path delay, and has lower energy/cycle. The average energy/cycle deviation from the ideal (no process variation) value is 63.76% for 45 nm bulk, with a peak of more than 200%, whereas for high-k the average normalized deviation is 25.34%, with a peak of 110%.
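The normalization step can be written compactly. In the sketch below, the deviation of each sample is assumed to be 100·(E_i/E_nominal - 1), so negative values correspond to samples that use less energy than the nominal circuit; the function name is hypothetical, and the energies in the usage example are illustrative numbers, not data from Tables IV and V.

```python
def normalized_deviation_stats(sample_energies, nominal_energy):
    """Return (mean %, peak %) deviation of sample E/cycle from the nominal E/cycle.

    Deviation of sample i is taken as 100 * (E_i / E_nominal - 1); a negative value
    means that sample used less energy per cycle than the nominal circuit.
    """
    if not sample_energies:
        raise ValueError("need at least one sample")
    devs = [100.0 * (e / nominal_energy - 1.0) for e in sample_energies]
    return sum(devs) / len(devs), max(devs)


if __name__ == "__main__":
    # Illustrative energies in fJ (hypothetical, not Tables IV/V).
    mean_dev, peak_dev = normalized_deviation_stats([12.6, 15.0, 11.4, 20.4], 12.0)
    print(f"average deviation = {mean_dev:.1f}%, peak = {peak_dev:.1f}%")
```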
A deviation in the threshold parameter (vth0) changes both the drive current and the critical path delay. This usually increases the energy/cycle, because current and delay are not exactly inversely proportional to each other. However, there are rare instances (in high-k) where this relationship has caused the energy/cycle to fall below the nominal value, resulting in a circuit that runs faster.
Table VI gives the average energy/cycle and the normalized energy/cycle for 30 Monte Carlo samples of the 32-bit adder circuit designed in 45 nm high-k technology operating at 0.9 V. These values were compared with the absolute energy/cycle values of the same sample circuits operating at 0.3 V from Table V and plotted in Figure 4. Even with process variations, circuits operating at 0.3 V are clearly considerably more energy efficient than circuits operating at 0.9 V. Table VII compares the average energy/cycle and clock period, with and without process variations, for various technologies and operating voltages. Although process variations almost double the clock period at subthreshold voltages, the circuit's energy remains close to the nominal energy/cycle. Since we assigned all samples a clock period corresponding to the worst-case (mean + 3σ) delay, some circuits may be able to run faster; for those, the individual energy/cycle may come closer to, or even fall below, the nominal value.
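The last point follows from a first-order energy model in which each cycle costs a fixed switching energy plus leakage integrated over the clock period, E/cycle ≈ E_dyn + V_dd · I_leak · T_clk. This decomposition is an assumption introduced here for illustration; the simulations above measure total current and do not separate the two terms. The hypothetical sketch below shows that shortening T_clk from the worst-case period toward a sample's own delay lowers its energy/cycle; all numbers are invented.

```python
def energy_per_cycle_model(e_dyn_j: float, vdd_v: float, i_leak_a: float,
                           clock_period_s: float) -> float:
    """First-order E/cycle: switching energy plus leakage integrated over one period."""
    return e_dyn_j + vdd_v * i_leak_a * clock_period_s


if __name__ == "__main__":
    # All values hypothetical, chosen only to illustrate the trend.
    e_dyn, vdd, i_leak = 10e-15, 0.3, 1e-6
    t_worst, t_own = 8e-9, 5e-9  # worst-case (mean + 3 sigma) period vs. a sample's own delay
    for label, t in [("worst-case period", t_worst), ("own delay", t_own)]:
        print(f"{label}: {energy_per_cycle_model(e_dyn, vdd, i_leak, t) * 1e15:.2f} fJ")
```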
We cannot directly compare the normalized energy/cycle for 0.9 V and 0.3 V operation: because the energy/cycle at 0.3 V is so small, even a small absolute deviation translates into a large percentage and may give the false impression that the circuit is less reliable at lower voltages.
IV. CONCLUSION
We believe our results are accurate and portray how a device would behave if fabricated in these technologies, since the PTM models have been shown to closely follow actual fabrication trends. They have also shown better physical scalability over a wide range of process and design conditions [15].
Recent research has shown that process variation can greatly affect the functionality of logic gates [16] and can introduce uncertainty into circuit logic. Shifts in the threshold voltage V_t can drastically affect I_ON and I_OFF in the subthreshold region, causing an exponential shift in the minimum energy point [9].
Our results indicate that high-k designs at the minimum energy point will be more resilient to process variations than bulk designs, because high-k technologies provide a higher drive current in the subthreshold region along with reduced leakage for the same drive current when compared to bulk technology [17, 18]. We have also shown that, even with process variations, circuits operating at 0.3 V (a subthreshold voltage) remain more energy efficient than at 0.9 V (a normal operating voltage). To study process variations further, we plan to vary important technological parameters such as threshold voltage, effective channel length, channel width, and oxide thickness according to Gaussian distributions, and then conduct simulations to quantify the effect of process variation on the minimum energy point. The results of these studies will be published in the future.
Studies have shown that the voltage at which the minimum energy point occurs decreases with successive technology generations, reaches a minimum at 90 nm, and then increases with every further technology advance [19]. Hence, for finer technologies, the voltage at which the minimum energy point occurs should increase. However, as these studies were done only for bulk technologies, it is hard to predict how high-k models will behave. Simulations are needed to determine how the minimum energy point moves from 45 nm high-k technology to finer high-k technologies.
The ultimate minimum energy any circuit can achieve is bounded by the Landauer limit, kT ln 2, where k is the Boltzmann constant and T is the absolute temperature in kelvin. Current studies have shown that the lower bound on the energy to process one bit is about 36,000 times higher than the absolute Landauer limit [20, 21]. A shift towards high-k technology is only a small step towards achieving energy values close to that limit. More research and supporting experiments are needed to find the limits of high-k technology so that it can lead to actual implementations of digital systems such as microprocessors, graphics processors, and digital signal processors.
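For scale, the Landauer bound and the quoted 36,000x factor can be evaluated directly. The sketch below assumes T = 300 K, which the text does not specify for this bound, and the function name is hypothetical.

```python
import math

K_BOLTZMANN = 1.380649e-23    # J/K (exact SI value)
Q_ELECTRON = 1.602176634e-19  # C (exact SI value)


def landauer_limit_j(temp_k: float = 300.0) -> float:
    """Minimum energy to erase one bit of information: kT ln 2, in joules."""
    return K_BOLTZMANN * temp_k * math.log(2.0)


if __name__ == "__main__":
    e_min = landauer_limit_j(300.0)
    print(f"kT ln2 at 300 K = {e_min:.3e} J = {e_min / Q_ELECTRON * 1e3:.1f} meV")
    print(f"36,000 x kT ln2 = {36_000 * e_min:.2e} J")
```

At 300 K, kT ln 2 is about 2.87e-21 J, so the quoted practical bound is on the order of 1e-16 J per bit.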
