Energy-Aware Opcode Design

Abstract

Embedded processors are required to achieve high performance while running on batteries. Thus, they must exploit every available means of reducing energy consumption without sacrificing performance. This work explores one such technique: intelligently designing the instruction opcodes of a processor based on a target workload. The optimization uses a heuristic that not only minimizes switching between adjacent instructions, but also simplifies decoding, reducing the number of latches and thereby saving dynamic energy. On average, an optimized opcode can be decoded using 40-60% fewer latches in the decoder. In addition, decoders optimized for algorithms with similar program structure, similar data types, or similar behavior are shown to exhibit consistent patterns of energy reduction. The techniques presented in this paper yield an average 10% reduction in total dynamic energy. The heuristic is also shown to achieve similar results on processors of different issue widths.

I. MOTIVATION

Embedded devices are required to perform several complex tasks that were once attempted by high-performance systems. One solution is to take a general-purpose processor and customize it for an embedded system [1]. These embedded processors are simpler than their high-performance counterparts and require significant assistance from the compiler for scheduling, branch handling, etc. However, unlike for high-performance systems, the availability of compilers, assemblers, and other utilities is limited. The first logical step in designing (or choosing) such a processor is to define the target application. This ONE target application represents the main workload of the processor. It is generally a good assumption that the target application is one of the most frequently executed applications in the system.
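The switching half of the heuristic can be made concrete with a small cost model. The sketch below is a minimal illustration under stated assumptions, not the authors' heuristic: it assumes fixed-width opcodes stored as integers, a recorded instruction trace from the target application, and a cost equal to the number of opcode bits that toggle between consecutive instructions. To lower that cost it uses simple pairwise-swap hill climbing, a stand-in technique chosen here for brevity. The function names, the mnemonics, and the 2-bit encoding are hypothetical, and the decoder-latch simplification is not modeled.

```python
from collections import Counter
from itertools import combinations


def adjacency_counts(trace):
    """Count how often each unordered pair of distinct instructions appears back to back.

    Repeats of the same instruction are skipped because they toggle no opcode bits.
    """
    counts = Counter()
    for a, b in zip(trace, trace[1:]):
        if a != b:
            counts[frozenset((a, b))] += 1
    return counts


def switching_cost(encoding, counts):
    """Total opcode bit toggles: pair frequency times Hamming distance of the two codes."""
    total = 0
    for pair, n in counts.items():
        a, b = tuple(pair)
        total += n * bin(encoding[a] ^ encoding[b]).count("1")
    return total


def improve_by_swaps(encoding, counts):
    """Hill-climb (hypothetical stand-in heuristic): swap two instructions' codes
    whenever the swap strictly lowers the switching cost; stop when no swap helps."""
    enc = dict(encoding)
    best = switching_cost(enc, counts)
    improved = True
    while improved:
        improved = False
        for x, y in combinations(list(enc), 2):
            enc[x], enc[y] = enc[y], enc[x]
            cost = switching_cost(enc, counts)
            if cost < best:
                best = cost
                improved = True
            else:
                enc[x], enc[y] = enc[y], enc[x]  # undo a swap that did not help
    return enc, best


# Hypothetical trace and 2-bit starting encoding, for illustration only.
trace = ["ld", "add", "st", "ld", "add", "st", "ld", "add", "br"]
counts = adjacency_counts(trace)
initial = {"ld": 0b00, "add": 0b11, "st": 0b01, "br": 0b10}

print(switching_cost(initial, counts))  # 11
tuned, cost = improve_by_swaps(initial, counts)
print(cost)  # 10
```

Because the cost depends only on how often each pair of instructions appears back to back, instructions that frequently follow each other tend to receive codes that differ in few bits. In this toy case, `ld`, `add`, and `st` all follow one another, so with four 2-bit codes at least one of those pairs must differ in both bits; the search reaches 10 toggles, which is the minimum for this trace.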
If this one target application can run at high performance while consuming less energy, then the overall system energy is reduced. The main focus of this work is a heuristic for the intelligent design of the instruction opcodes of an embedded processor, using one application as the target (or training) application. The new opcode configuration is created by analyzing the code generator and reducing switching among adjacent instructions that occur in the target application. The opcodes are designed so that frequently occurring instructions are easy to decode, which reduces the internal power of the decoder. Unlike previous work, which requires the superset of all benchmarks to be run on the processor to gain any power/energy reduction ([9] [29]), we show that one benchmark is enough to provide a significant energy reduction. In addition, we show that an energy-efficient opcode design can reduce energy in the decoder and in other stages of the pipelined processor. Finally, we show the effect of scaling the processor issue width on the overall power reduction achieved with this methodology.

For this work, the compiler is selected and designed before the processor. With this design approach, the constraints imposed by the compiler (as shown in section 2) are known ahead of time, and the processor can be designed accordingly.

The paper is organized as follows. Related work is discussed in section 2. Section 3 gives a brief introduction to a retargetable code generator. The experimental framework and the benchmark set are described in section 4. Section 5 explains the project methodology. Results are discussed in section 6, and the paper is concluded in section 7.

II. RELATED WORK

Several works have proposed power and energy reduction through intelligent opcode design. To our knowledge, the only work that closely resembles ours is by Benini et al. Cheng and Tyso

    Similar works