In the design of low power systems, it is important to analyze and optimize both the hardware and the software components of the system. To evaluate the software component, a good instruction-level energy model is essential. In this paper we present a methodology for instruction level modelling of microcontrollers using gate level power estimation tools. We use the M68HC11 microcontroller to illustrate this method. We study two different implementations of the microcontroller and show that the energy consumption of each instruction differs considerably between them. Our study reveals that data correlation does not significantly affect the energy consumption of most instructions. Finally, we check the accuracy of this model by running some sample programs and showing that the predicted energy values are within 12% of the values obtained by gate level simulation.
INTRODUCTION
In order to design a system for low power and/or embedded computing applications, it is important to analyze and optimize power in all the components of the system. An ever increasing portion of the functionality of today's systems is in the form of software. Thus, along with the power cost of the hardware component, it is important to estimate the power cost of the software component. In order to systematically analyze the power cost of the software component, the power cost of the individual instructions has to be estimated. Clearly, a good instruction-level energy model is essential to evaluate software in terms of the power metric and to help search the design space for low power software implementations.
There are many advantages in developing an instruction-level power model [1]-[3]. First, it provides a way of assigning a power cost to the software component of a system and helps in verifying whether the overall system meets the specified power budget. Second, it can be used directly by compilers and code generators to generate code targeted towards low power. Third, it can guide higher level design decisions such as hardware-software partitioning. Finally, it allows a meaningful comparison of the power consumption of different processors (as opposed to a single average power consumption figure).
This work was supported by the Center for Low Power Electronics and by UDSL, Motorola Inc.
Dinesh Gaitonde
Monterey Design Systems, 894 Ross Drive, Suite 200, Sunnyvale, CA 94089-1443, gaitonde@mondes.com
The instruction level power analysis technique was first developed at Princeton University [1]-[3]. The technique is based on measuring the current drawn by the processor as it repeatedly executes certain instructions. Power models for the Intel 486DX2, the Fujitsu SPARClite 934 and the Fujitsu DSP processor have been developed using this method. The current measurement method was also used in [4] to develop a power predictor model for the TI TMS320C5x processor. That method is very accurate and uses linear regression with instruction and architectural level variables (such as the average bit switching activity in the instruction register, address buses, etc.) as predictors. A variation of the current measurement method, which used a digitizing oscilloscope to measure instantaneous power, was used in [5] to develop a power model for the JF and HD implementations of the i960 family. The study in [5] showed that the variation in power consumption across assembly instructions is of no statistical significance. This disagrees with the results in [1]-[4], which show that while similar instructions consume similar power, the variation across all instructions is significant. Our conclusion is that the result in [5] is specific to the Intel i960 family and cannot be generalized. Complex processors, in general, tend to have less variation in instruction costs than smaller DSP processors because overhead costs dominate, as pointed out in [3].
Other related work in high level power analysis assigns power costs to architectural modules such as datapath units, control units and memory units [6], [7]. The power cost is the estimated average capacitance that would switch when the given module is activated. Since the activity factors are obtained from functional simulation over typical input streams, such a technique takes a long time to evaluate the power consumption of the software component. In contrast, once the instruction level model is developed, evaluating the power consumption of even large programs is very time-efficient.
In this paper we present an instruction level power analysis technique based on gate level power estimation. While this method requires access to the gate level (or at least the RT level) description, it allows us to estimate the energy early in the design process (as opposed to after the processor has been shipped), thereby making it possible to study the design space trade-offs for low power implementation. We use a popular Motorola microcontroller, the M68HC11, to illustrate our method. We study two different implementations of the microcontroller and find that the energy consumption of each instruction is quite different for the two implementations. A study of these differences can be used to resynthesize parts of the design for low power applications. We also study the effect of data correlation on the energy consumption of the instructions. Our study shows that data correlation does not significantly affect the energy consumption of most instructions. Finally, we use the energy estimates of the instructions to predict the energy consumption of a few sample programs. The predicted values are quite close to the actual values.
PROPOSED MODEL
In this paper we present a methodology for instruction level modeling of microcontrollers using gate level power estimation. We use a popular Motorola microcontroller, the M68HC11 [8], to investigate the feasibility of using a gate level power estimation tool to characterize the instruction set of this microcontroller for power. The HC11 microcontroller is used only as an example. The techniques used in this paper can be extended to any microcontroller.
Given the behavioral description of the microcontroller design, high level synthesis tools can be used to transform it into an RT level implementation. The RT level implementation can then be synthesized using a commercial synthesis engine such as Synopsys. A gate level power estimation tool can then be used to estimate the energy consumption of each instruction. In our setup, we used the high level synthesis tool Matisse (a high level synthesis tool developed at Motorola), the synthesis tool Synopsys and the gate level estimator ASPEN (a gate level simulator developed at Motorola).
In our model, the base cost of an instruction was modelled in the following way. The base cost of simple instructions, such as the load instruction, was modelled by simply executing such instructions 1000 times and computing the average. The data values used in each instance of the instruction were either random or correlated (where the degree of correlation could be specified). The entire program resided in an on-chip memory (and thus the base cost did not model the cost of an external memory access). Most instructions could not be modelled by themselves and had to be modelled in conjunction with some other instruction such as the load instruction. Consider the case when instruction X could be modelled by itself and instruction Y had to be modelled with instruction X. In both cases, ASPEN was used to generate the average power for 1000 runs. Let $P_X$ be the average power for instruction X and $P_{X+Y}$ be the average power for instruction X followed by Y. Since we know the exact time that it takes to execute a single instruction X, $t_X$, and the time that it takes to execute instruction X followed by Y, $t_{X+Y}$, the energy of instruction Y, $E_Y$, can be calculated in a straightforward way.
The assumption while computing the base cost is that there is no overhead in executing instruction Y after instruction X. Thus $E_Y = t_{X+Y} P_{X+Y} - t_X P_X$. The inter-instruction effects have not been modelled explicitly. Experimental results were used to derive an average cost for inter-instruction effects.
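The base-cost subtraction can be illustrated with a minimal Python sketch. The function name and the numeric values below are hypothetical and chosen only to show the arithmetic; they are not measurements from ASPEN.

```python
def base_energy_paired(p_x: float, t_x: float, p_xy: float, t_xy: float) -> float:
    """Energy (J) of instruction Y when it can only be run after instruction X.

    p_x, p_xy: average power (W) for X alone and for X followed by Y.
    t_x, t_xy: execution time (s) for X alone and for X followed by Y.
    Assumes, as in the text, no overhead when Y executes after X.
    """
    return t_xy * p_xy - t_x * p_x


# Hypothetical numbers (not from the paper), used only to show the calculation.
e_y = base_energy_paired(p_x=2.0e-3, t_x=1.0e-6, p_xy=2.5e-3, t_xy=2.0e-6)
print(f"E_Y = {e_y * 1e9:.3f} nJ")
```

Because the subtraction attributes every joule beyond instruction X to instruction Y, any real inter-instruction overhead ends up folded into the base cost of Y.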
RESULTS
In this section we describe the results of applying the experimental setup to two different implementations of the HC11 microcontroller. Since the synthesizable HC11 was generated from a behavioral description of the design, one could generate several candidate synthesizable implementations of the HC11. Both of the implementations studied here were optimized for area, one more than the other.
We have categorized the instructions into the following classes: (i) Loads, Stores and Transfers, (ii) Arithmetic Operations, (iii) Multiply and Divide, (iv) Logical Operations, (v) Shifts and Rotates, (vi) Stack and Index Register Operations, (vii) Condition Code Register Instructions and (viii) Branches. The function of each instruction is described in detail in [8].
We studied the effects of data correlation on the energy consumption values. Two sets of correlated data were generated: mildly correlated data corresponds to cor(10), while medium correlated data corresponds to cor(50). The results are tabulated in Table I. We found that the average energy consumption values do not change much when the correlation of the data is increased. This implies that the power consumed by each instruction depends only weakly on the data. Consequently, it is sufficient to work with power consumption values for random data during the implementation evaluation phase. For the branch instructions, the instruction was modelled differently depending on whether the branch was successful or not.
Next we illustrate the difference in the power consumption values for the two architectures. Even though the software (core) of the two implementations is identical, the hardware realizations are significantly different. Consequently, the energy consumption values are also different. An analysis of the differences in the energy consumption values should aid future implementations of the HC11 that are targeted at low power. Table I lists the percentage difference in the energy values for random data between the two implementations. In this sample set of instructions, Implementation 2 has a lower power consumption for the following instructions: LDAA, INCA, DECA, ORAA, ASLA, ASLD, INX.
Sample Program
The accuracy of the instruction energy estimates was checked by running a few sample programs. In each case, we compared the actual energy consumption values (calculated by ASPEN) with those predicted using the estimates derived using the power model developed in Section 2. The predicted values were always within 12% of the actual values.
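How a program-level prediction might be assembled from per-instruction estimates can be sketched as follows. This is only an illustration under stated assumptions: the function name, the base costs and the way the average inter-instruction cost is applied (once per adjacent pair of executed instructions) are hypothetical, since the text does not give these details.

```python
from collections import Counter


def predict_program_energy(trace, base_cost_nj, inter_instr_nj=0.0):
    """Sum base costs over an executed instruction trace.

    trace: list of mnemonics in execution order.
    base_cost_nj: mapping from mnemonic to base energy cost in nJ.
    inter_instr_nj: assumed average inter-instruction cost in nJ, charged
        once per adjacent pair of instructions (an assumption of this sketch).
    """
    counts = Counter(trace)
    base = sum(base_cost_nj[op] * n for op, n in counts.items())
    overhead = inter_instr_nj * max(len(trace) - 1, 0)
    return base + overhead


# Hypothetical base costs in nJ; these are NOT values from Table I.
costs = {"LDAA": 0.5, "INCA": 0.25, "STAA": 0.75}
trace = ["LDAA", "INCA", "STAA", "LDAA", "INCA", "STAA"]
predicted = predict_program_energy(trace, costs, inter_instr_nj=0.125)
print(f"predicted energy: {predicted:.3f} nJ")
```

Once the per-instruction table exists, a prediction of this kind needs only the executed instruction trace, which is why evaluating large programs is fast compared with gate level simulation.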
Program 1: Computing the MAXIMUM of 5 numbers.
The data is loaded in locations 0000-0004. For input data 73, 36, 24, 82, 49 (where 73 is loaded in location 0000, 36 in 0001, etc.), the average energy over 100 runs using ASPEN is 4.066 nJ. The estimated energy using the instruction level power model is 4.017 nJ. The estimated energy is 1.2% lower than the actual energy.
Program 2: Computing the running sum $y_i = \sum_{j=0}^{2} x_{i+j}$, $0 \le i \le 4$.
Random data is loaded in locations 0000-0006. The average energy over 100 runs using ASPEN is 20.0056 nJ. The estimated energy using the proposed instruction level power model is 20.0958 nJ. The estimated value is 0.45% higher than the actual value.
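The running-sum computation, in both looped and unrolled form, can be sketched in Python. The sketch assumes a window of three consecutive values, which is what seven input locations and five outputs imply. It is a stand-in for the HC11 assembly (not shown in the text) and says nothing about energy; the function names and sample data are hypothetical.

```python
def running_sum_loop(x, window=3):
    """Running sum computed with explicit loops over i and j."""
    y = []
    for i in range(len(x) - window + 1):
        acc = 0
        for j in range(window):
            acc += x[i + j]  # indexed access, analogous to LDAA(ind,X)
        y.append(acc)
    return y


def running_sum_unrolled(x):
    """The same five outputs for seven inputs, with the outer loop unrolled."""
    return [
        x[0] + x[1] + x[2],
        x[1] + x[2] + x[3],
        x[2] + x[3] + x[4],
        x[3] + x[4] + x[5],
        x[4] + x[5] + x[6],
    ]


data = [3, 1, 4, 1, 5, 9, 2]  # hypothetical inputs, seven values
print(running_sum_loop(data))
print(running_sum_unrolled(data))
```

The unrolled form trades code size for the removal of loop counters, compares and branches.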
Program 3: Computing the running sum $y_i = \sum_{j=0}^{2} x_{i+j}$, $0 \le i \le 4$, with unrolling.
In this example, we simply unrolled Program 2 to generate this program. The average energy over 100 runs using ASPEN is 9.022 nJ. The estimated energy using the proposed model is 8.988 nJ. Thus the estimated value is 0.38% lower than the actual value. An interesting point to note is that the unrolled program (unrolling factor 5) consumes only 45% of the energy consumed by the program with the loop (Program 2). This is because of the large overhead in manipulating loops. Instructions that use index register X, such as LDAA(ind,X) and STAA(ind,X), perform extra addition operations, thereby increasing the power consumption. Thus for low power applications, loop unrolling should be done as much as possible.
Program 4: Sorting 5 numbers using bubble sort.
The numbers are loaded in locations 0000 through 0004. For input data 19, 23, 35, 57 and 89, where 19 is stored in location 0000, 23 in location 0001, etc., the average energy using ASPEN over 50 runs is 19.757 nJ. The estimated energy using the power model is 22.07 nJ. So, for this data set, the estimated energy is 11.71% higher than the actual energy. For a different set of input data, the estimated energy was closer to the actual value. If the experiment had been run on a large number of input data sets, we anticipate that the average energy would have been closer to the estimated energy.
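As a worked check, the relative errors and the unrolling ratio can be recomputed from the energies reported above. The short Python sketch below uses only those reported values; the dictionary labels and function name are ours.

```python
# (actual energy from ASPEN, estimated energy from the model), both in nJ,
# exactly as reported in the text.
reported_nj = {
    "maximum of 5 numbers": (4.066, 4.017),
    "running sum, loop": (20.0056, 20.0958),
    "running sum, unrolled": (9.022, 8.988),
    "bubble sort": (19.757, 22.07),
}


def relative_error_pct(actual, estimated):
    """Signed error of the estimate relative to the actual value, in percent."""
    return 100.0 * (estimated - actual) / actual


errors = {name: relative_error_pct(a, e) for name, (a, e) in reported_nj.items()}
for name, err in errors.items():
    print(f"{name}: {err:+.2f}%")

# Fraction of the looped program's actual energy used by the unrolled version.
unrolled_ratio = (
    reported_nj["running sum, unrolled"][0] / reported_nj["running sum, loop"][0]
)
print(f"unrolled / loop = {unrolled_ratio:.2f}")
```

A negative sign means the model underestimates the energy obtained from ASPEN; the largest magnitude comes from the bubble sort program.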
CONCLUSIONS
In this paper we have described a method based on gate level power estimation to develop an instruction level energy model for microcontrollers. We applied our technique to develop an instruction level energy model for the M68HC11 microcontroller. This model was used to successfully predict the energy consumption of a few sample programs. Our study also showed that (i) data correlation does not affect the energy consumption of most instructions and (ii) the same instruction incurs a different cost for different gate-level implementations (even though the software core is the same).
The accuracy of our model can be significantly increased by taking into account the effect of average switching in the data address bus, the program address bus and the instruction register, as in [4]. Our next step is to extend this method to develop instruction level models for more complex processors, where the effect of pipeline stalls, size of register files, cache misses, I/O accesses, etc. would be significant.
