Abstract-The performance of a processor generally means how fast it can execute a task. For a given architecture, we can measure the size of a task as the number of clock cycles it takes to execute. The clock frequency (f) then determines the execution time. Normally, the frequency can be raised if the supply voltage Vdd is increased. This, however, increases the power and energy used. We introduce a new measure, cycle efficiency (η), expressed in cycles per joule, which gives the amount of computational work done per unit energy. Like f, η is a function of Vdd. We provide a method of characterizing a processor in terms of its f and η versus Vdd characteristics. The Intel Pentium M processor with an assumed 90nm CMOS PTM (predictive technology model) is used as an example. For a demonstration of performance and energy management, we consider a program that executes in 1.8 billion clock cycles. At the nominal operating supply of 1.2V we have f = 1.8GHz and η = 15 megacycles/joule. The program executes in 1 second and uses 120 joules. For operation at 0.6V, f = 277MHz and η = 70 megacycles/joule, resulting in a run time of 6.5 seconds and consumption of about 25 joules. We also find a subthreshold voltage extreme of 200mV, where f = 54.5MHz and η = 660 megacycles/joule. Now the program takes 33 seconds but consumes only 2.72 joules. Thus, using cycle efficiency and clock frequency, one can manage the time and energy performances according to the requirements of a computing task.
I. INTRODUCTION
Voltage scaling has been a very popular low power design methodology in the industry. Because dynamic power has a quadratic dependence on the supply voltage, lowering the voltage reduces the power considerably. However, it also degrades the overall performance of the circuit. Optimizing performance and power simultaneously requires a thorough study of the available resources and possible trade-offs. This paper looks into one of the most important areas of contemporary research in electrical and computer engineering: energy efficiency [4], [6], [16], [17]. Power and performance are two conflicting goals a designer has to achieve [11]-[13], [15]. With a number of performance oriented devices emerging that place a huge demand for power on a fixed capacity battery, using the battery wisely becomes important [9], [10]. This paper suggests a new metric, called cycle efficiency, that can be considered when deciding on the operating conditions of a processor for energy efficiency.
II. CYCLE EFFICIENCY
Performance of a processor refers to its performance in time. It is defined for a task (or program) as the inverse of the execution time [14]. Similarly, efficiency of a processor is defined [14] as the inverse of the energy consumed by the program. Thus,
$$\text{Performance} = \frac{1}{\text{Execution time}}, \qquad \text{Efficiency} = \frac{1}{\text{Energy consumed}}$$
We observe a similarity between the two measures. The performance can be called time efficiency and efficiency can be referred to as energy performance. In this paper, we will call them time performance and energy performance, respectively. If we regard the clock cycle as a unit of work that a processor performs, then a clock cycle means a time period 1/f, where f is the frequency in units of cycles per second or hertz (Hz). A clock cycle also means a certain amount of energy, or energy per cycle (EPC). We define cycle efficiency, η = 1/EPC, its unit being cycles per joule. Thus, a clock cycle means 1/f second in time and 1/η joule in energy. Consider a program being run on a processor. Suppose the program execution takes C clock cycles. Then we have,
$$\text{Execution time} = \frac{C}{f} \tag{1}$$
and
$$\text{Energy consumed} = \frac{C}{\eta} \tag{2}$$
where η is the cycle efficiency of the processor in cycles per joule. Equation 1 gives the time performance of the processor as,
$$\text{Time performance} = \frac{f}{C} \tag{3}$$
Similarly, Equation 2 gives the energy performance as,
$$\text{Energy performance} = \frac{\eta}{C} \tag{4}$$
Clearly, cycle efficiency (η) characterizes the energy performance in a similar way as frequency (f) characterizes the time performance. These two performance parameters are related to each other by the power being consumed, as follows:
$$\text{Power} = \frac{\text{Energy consumed}}{\text{Execution time}} = \frac{f}{\eta} \tag{5}$$
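The relations described in this section (execution time C/f, energy C/η, and power f/η) can be checked numerically. The following is a minimal Python sketch, not part of the original method; the function names are ours and chosen only for illustration. It uses the nominal operating point quoted in this paper (1.8 billion cycles, 1.8GHz, 15 megacycles/joule).

```python
def execution_time(cycles, f_hz):
    """Seconds needed to run `cycles` clock cycles at clock frequency f_hz."""
    return cycles / f_hz


def energy_consumed(cycles, eta):
    """Joules needed to run `cycles` clock cycles at cycle efficiency eta (cycles/joule)."""
    return cycles / eta


def power(f_hz, eta):
    """Average power in watts: (cycles/second) / (cycles/joule) = joules/second."""
    return f_hz / eta


# Nominal operating point quoted in the paper: Vdd = 1.2 V.
C = 1.8e9      # program size in clock cycles
F_NOM = 1.8e9  # clock frequency in Hz
ETA_NOM = 15e6 # cycle efficiency in cycles per joule

t_nom = execution_time(C, F_NOM)
e_nom = energy_consumed(C, ETA_NOM)
p_nom = power(F_NOM, ETA_NOM)
print(f"time = {t_nom:.2f} s, energy = {e_nom:.1f} J, power = {p_nom:.1f} W")
```

Note that the cycle count C cancels in the power expression, so power depends only on the operating point (f, η) and not on the program size.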
III. ENERGY AND DELAY FOR A TECHNOLOGY
We assume that the processor being characterized is large and that a full scale gate level or transistor level model may not be available. Even if such a model were available, a detailed simulation at various voltages would be impractical because of its high complexity. However, operational data about the processor, such as voltage, maximum clock frequency and power consumption, is available. Also, the technology of the device is specified. We, therefore, characterize the technology using known and easily analyzable circuits. Then, we scale the characterization to the processor.
For a given technology, we derive energy consumption and delay as functions of supply voltage. The procedure given here can be used for any technology for which simulation models are available. For illustration, we use 90nm CMOS PTM technology [20] and the Hspice simulator [1].
Energy per Cycle (EPC). Figure 1 shows the energy per cycle (EPC) for an eight-bit ripple carry adder circuit. This result is available from recent references [5], [8]. The circuit was synthesized using a 90nm PTM [20] CMOS technology library and simulated with random input vectors. The vector period was set close to the critical path delay, which was determined by simulating vector pairs generated to activate the critical path. Both dynamic (E_dyn) and leakage (E_leak) energy components were determined using Hspice [1]. The simulation was repeated over the voltage range 80mV to 1.2V. The EPC in Figure 1 is E_tot = E_dyn + E_leak and is shown by the solid line. Threshold voltages are 0.29V for nMOS and −0.21V for pMOS devices. We notice that EPC is minimum at 0.17V, which is a subthreshold voltage. At this voltage the leakage and dynamic energies are equal; the circuit operates extremely slowly but is most energy efficient [2], [5], [18].
Fig. 2. Delay and leakage current normalized to an inverter at Vdd = 1.2V through Hspice simulation in PTM 90nm CMOS [5], [7], [8].
Delay. For delay we used a recent result for a chain of 90nm CMOS inverters [5], [8]. This is shown in Figure 2, where the delay is normalized with respect to the delay at 1.2V. We note that as the supply voltage drops below 1.2V, the delay (t_d) increases gradually at first and then rather rapidly as we approach the subthreshold range.
IV. CHARACTERIZATION OF INTEL PENTIUM M PROCESSOR
We characterize the Intel Pentium M processor in the 90nm CMOS technology. The exact details of its technology are not known to us, so for illustrative purposes we assume the CMOS predictive technology model (PTM) [20] as analyzed in the previous section.
Processor data is obtained from a published paper [4]. For a supply voltage Vdd = 1.2V, the maximum clock rate is 1.8GHz and the chip consumes 120W. This gives an energy per cycle, EPC = 120/(1.8 × 10^9) = 66.67nJ. Scaling the graph of Figure 1 to read an EPC of 66.67nJ at Vdd = 1.2V, we get the EPC versus Vdd characteristic for the Pentium M processor as shown in Figure 3. The inverse of EPC is the cycle efficiency η, which is shown in Figure 4.
Next, we scale the delay t_d versus Vdd characteristic of Figure 2 for the Pentium M processor so that the frequency at Vdd = 1.2V is 1.8GHz. The scaled frequency (f) versus Vdd characteristic is also shown in Figure 4.
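The two scaling steps above can be illustrated with a short Python sketch. It is only an illustration under stated assumptions: the actual curves of Figures 1 and 2 are not reproduced here, so the three normalized samples below are back-computed from the f and η values quoted for Figure 4 in Section V, and the names `NORM_EPC`, `NORM_DELAY` and `characterize` are hypothetical. The sketch anchors the normalized technology curves to the processor's measured power and frequency at 1.2V.

```python
# Processor anchor point at Vdd = 1.2 V, as quoted in the text.
P_NOMINAL_W = 120.0
F_NOMINAL_HZ = 1.8e9

# Normalized technology curves, each equal to 1.0 at 1.2 V.
# ASSUMPTION: these samples are back-computed from the f and eta values
# quoted in Section V; they are NOT read from Figures 1 and 2.
NORM_EPC = {1.2: 1.0, 0.6: 15 / 70, 0.2: 15 / 660}
NORM_DELAY = {1.2: 1.0, 0.6: 1800 / 277, 0.2: 1800 / 54.5}


def characterize(norm_epc, norm_delay, p_nominal_w, f_nominal_hz):
    """Scale normalized EPC and delay curves to a processor's nominal point."""
    epc_nominal = p_nominal_w / f_nominal_hz  # joules per cycle at 1.2 V
    table = {}
    for vdd in sorted(norm_epc, reverse=True):
        epc = epc_nominal * norm_epc[vdd]    # scaled energy per cycle (J)
        f = f_nominal_hz / norm_delay[vdd]   # frequency is inverse of delay
        table[vdd] = {"epc_j": epc, "eta": 1.0 / epc, "f_hz": f}
    return table


processor = characterize(NORM_EPC, NORM_DELAY, P_NOMINAL_W, F_NOMINAL_HZ)
for vdd, row in processor.items():
    print(f"{vdd:.1f} V: EPC = {row['epc_j'] * 1e9:6.2f} nJ, "
          f"eta = {row['eta'] / 1e6:6.1f} Mcycles/J, "
          f"f = {row['f_hz'] / 1e6:7.1f} MHz")
```

With a finer grid of normalized samples taken from a technology simulation, the same function would yield the full EPC, η and f versus Vdd characteristics.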
V. MANAGING PROCESSOR POWER
Consider a program that executes in 1.8 billion clock cycles. Three scenarios are given in Table I. If we operate the processor at 1.2V, we see from Figure 4 that the clock frequency is f = 1.8GHz and the cycle efficiency is η = 15 megacycles/joule. The program executes in one second, the processor consumes 120W of power, and the total energy used is 120 joules. If we reduce the supply voltage by half, i.e., Vdd = 0.6V, Figure 4 gives f = 277MHz and η = 70 megacycles/joule. Now the power consumption is 3.96W, but the program takes 6.5 seconds and uses about 25 joules of energy. The third case uses a subthreshold voltage, Vdd = 200mV. From Figure 4, f = 54.5MHz and η = 660 megacycles/joule. The processor runs at a very low power level of 83mW, but the program takes 33 seconds and uses 2.72 joules of energy. This type of slow but highly energy efficient operation has been reported by several authors [5], [18], [19].
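The three scenarios can be recomputed from the quoted f and η values with a few lines of Python. This is a sketch for illustration only; the helper `lowest_energy_point` and the 10 second deadline are hypothetical additions that show one way the two parameters might be used to pick an operating point. Computed directly from η = 70 megacycles/joule, the 0.6V energy comes to about 25.7 joules, which the text quotes as 25 joules.

```python
CYCLES = 1.8e9  # program size in clock cycles

# (Vdd in volts, f in Hz, eta in cycles/joule), as quoted in the text.
OPERATING_POINTS = [
    (1.2, 1.8e9, 15e6),
    (0.6, 277e6, 70e6),
    (0.2, 54.5e6, 660e6),
]


def evaluate(cycles, f_hz, eta):
    """Return (execution time in s, energy in J, power in W)."""
    return cycles / f_hz, cycles / eta, f_hz / eta


def lowest_energy_point(cycles, points, deadline_s):
    """Hypothetical helper: least-energy point that meets the deadline, or None."""
    feasible = [p for p in points if cycles / p[1] <= deadline_s]
    if not feasible:
        return None
    return min(feasible, key=lambda p: cycles / p[2])


for vdd, f_hz, eta in OPERATING_POINTS:
    t, e, p = evaluate(CYCLES, f_hz, eta)
    print(f"Vdd = {vdd:.1f} V: time = {t:6.2f} s, "
          f"energy = {e:7.2f} J, power = {p:8.3f} W")

# Hypothetical requirement: finish within 10 seconds using the least energy.
choice = lowest_energy_point(CYCLES, OPERATING_POINTS, deadline_s=10.0)
print("chosen Vdd:", choice[0] if choice else "no feasible point")
```

Under the assumed 10 second deadline, the 200mV point is too slow (about 33 seconds), so the 0.6V point is selected as the most energy efficient feasible choice.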
VI. CONCLUSION
We have introduced a new parameter, cycle efficiency (η), expressed in cycles per joule, that is similar to the clock frequency f expressed in cycles per second. For a computing task, f is the rate of execution in time and η is the rate of execution in energy. By analogy with automobiles, f corresponds to speed in miles per hour (MPH) and η corresponds to miles per gallon (MPG). We have demonstrated how the operation of a processor can be characterized for f and η as functions of the supply voltage Vdd. Once we set Vdd based on the time and energy requirements, we let the processor run at the fastest clock frequency allowed by the structure (critical path delay). For modern technologies that have significant leakage, this is an efficient mode of operation [3]. Given that the underlying objective is to weigh the total energy cost against the execution time, the time and energy performances can be managed using these two parameters.
