Abstract-This paper describes the design of a high-energy-efficiency 32 bit parallel processor core that uses instruction-level data gating and dynamic voltage scaling (DVS) techniques. The proposed instruction-level data gating technique controls the activation and switching activity of the function units. The proposed instruction-level DVS technique controls the supply voltage of the function units without a DC-DC converter or a voltage scheduler controlled by the operating system. Its architecture is simpler than that of the more complex DVS scheme built around a DC-DC converter and an operating-system-controlled voltage scheduler, and it is easy to implement in hardware. Nevertheless, the energy efficiency of the proposed dual-supply instruction-level DVS technique is similar to that of the more complex scheme. We ran circuit simulations of a test program using Spectra. The reduced supply voltage was set to 0.667 times the supply voltage. The energy efficiency of the proposed 32 bit parallel processor core using instruction-level data gating and DVS improves by about 88.4% compared with the same core without these techniques. The designed core can be used as a coprocessor for processing massive data at high speed.
I. INTRODUCTION
Driven by the convergence of digital technologies, digital SOCs are becoming denser, faster, and more multifunctional. The principal issues in current digital SOC design are flexibility, scalability, and high energy efficiency. Energy efficiency is obtained by dividing a processor's data-processing capability by its power consumption and can be expressed in MIPS/mW or MOPS/mW.
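As a small illustration of the metric defined above, the following Python sketch divides throughput by power to obtain MIPS/mW. The function name and all numerical values are hypothetical and are not measurements from this paper.

```python
def energy_efficiency(throughput_mips: float, power_mw: float) -> float:
    """Return energy efficiency in MIPS/mW (throughput divided by power)."""
    if power_mw <= 0:
        raise ValueError("power_mw must be positive")
    return throughput_mips / power_mw


# Hypothetical operating points, chosen only to illustrate the ratio.
baseline = energy_efficiency(throughput_mips=100.0, power_mw=50.0)
low_power = energy_efficiency(throughput_mips=100.0, power_mw=25.0)

print(f"baseline : {baseline:.2f} MIPS/mW")
print(f"low power: {low_power:.2f} MIPS/mW")
print(f"relative improvement: {(low_power / baseline - 1) * 100:.1f}%")
```

At constant throughput, halving the power doubles the energy efficiency, which is why the techniques in this paper target power consumption rather than raw speed.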
A parallel architecture is one candidate for achieving both high performance and low power consumption. Parallel architectures can be categorized into single instruction multiple data (SIMD) and multiple instruction multiple data (MIMD) architectures, both of which make use of data-level parallelism. Parallel architectures have recently become necessary in digital SOCs because they combine low power with high computing capability. The processor core, which performs the data processing, is the central circuit of a parallel processor, and a parallel processor is composed of an array of processor cores. Parallel processors based on SIMD or MIMD architectures are mainly used for multimedia data processing in portable devices, where high computational performance is required.
The most effective way to lower power consumption is to reduce the power supply voltage V DD. In general, however, reducing the supply voltage increases the delay time and degrades performance. Several low-power techniques have been reported, including clock gating, frequency gating, and multiple-supply techniques. Clustered voltage scaling (CVS), extended CVS, and the variable supply-voltage (VS) scheme use multiple power supply voltages [1]. A scalable power supply structure is preferred to a constant supply structure for high energy efficiency [1]. When the required performance of the target system is lower than its maximum performance, the DVS technique can dynamically reduce the supply voltage to the lowest level that still ensures proper operation of the system [2] [3] [4] [5] [6] [7] [8]. DVS varies the supply voltage dynamically according to the workload, switching from the power supply voltage to a reduced power supply voltage when the workload permits. Workload prediction and detection are therefore essential to DVS, and many workload prediction algorithms for high energy efficiency have been published [4, 5]. The reduced power supply V DDL is determined by the workload, the process, and the delay time. The minimum reduced power supply voltage V DDL should be scaled to approximately 0.3 times V DD, but in reported DVS-based processors V DDL is commonly 0.5 times V DD. Published DVS-based processors that use a DC-DC converter and a voltage scheduler controlled by the operating system improve energy efficiency by about 20% to 93% [2] [3] [4] [5] [6] [7] [8]. The power-delay and energy-delay products are very important factors in DVS because lowering the voltage can incur speed penalties.
To optimize delay and performance in DVS, the reduced voltage can be selected between 1.5 and 3 times the threshold voltage, and the energy-delay product reaches its minimum in the plot of energy versus power supply V DD when V DD equals 2 times the threshold voltage [9, 10]. The Intel XScale [6], the IBM PowerPC [7], and the Transmeta Crusoe [8] are commercial processors that use the DVS technique. These commercial processors rely on a DC-DC converter and a voltage scheduler controlled by the operating system.
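The trade-off between supply voltage, power, and delay discussed above can be summarized with standard first-order CMOS models. These are textbook approximations added here for illustration, not equations taken from this paper; the symbols a (switching activity), C_L (switched capacitance), f (clock frequency), V_th (threshold voltage), and k (a technology-dependent exponent, typically between 1 and 2) are assumptions.

```latex
P_{\mathrm{dyn}} = a\, C_L\, V_{DD}^{2}\, f,
\qquad
t_d \propto \frac{C_L\, V_{DD}}{\left(V_{DD} - V_{th}\right)^{k}},
\qquad
\frac{E_{\mathrm{sw}}(V_{DDL})}{E_{\mathrm{sw}}(V_{DD})} = \left(\frac{V_{DDL}}{V_{DD}}\right)^{2}
```

Under this model, a reduced supply of 0.5 times V DD cuts the switching energy to one quarter, while the delay grows as the supply approaches the threshold voltage. This is why both the power-delay and the energy-delay products matter when choosing V DDL.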
In this paper, we describe the design and circuit simulation of a high-energy-efficiency 32 bit parallel processor core.
We present instruction-level data gating and DVS techniques for the high-energy-efficiency 32 bit parallel processor core. We designed the core and ran circuit simulations of it with data gating and DVS applied at the instruction level to obtain high performance and low power consumption. The proposed instruction-level data gating technique lowers power consumption by using enable signals to select only one of the function units in the processing unit. Only the selected function unit processes the real data specified by the instruction. All inputs of the non-selected function units are forced to zero regardless of the instruction, so those units are not activated and show no switching activity. We also propose an instruction-level dual-voltage DVS technique that causes no voltage fluctuations on the power rails and requires no level converter or level shifter circuit between the V DD blocks and the V DDL blocks. Instruction-level data gating and dual-voltage DVS together reduce the power consumption of the proposed 32 bit parallel processor core, so its energy efficiency can be maximized.
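The data gating idea described above can be sketched as a behavioral model in Python. This is a minimal sketch under stated assumptions, not the circuit of this paper: the opcode names, the set of function units, and the helper names `gate_operands` and `execute` are hypothetical.

```python
from typing import Callable, Dict, Tuple

MASK32 = 0xFFFFFFFF  # keep every value within 32 bits

# Hypothetical set of function units; the opcode names are assumptions.
FUNCTION_UNITS: Dict[str, Callable[[int, int], int]] = {
    "ADD": lambda a, b: (a + b) & MASK32,
    "AND": lambda a, b: a & b,
    "XOR": lambda a, b: a ^ b,
}


def gate_operands(a: int, b: int, enable: bool) -> Tuple[int, int]:
    """Pass the operands through when enabled, otherwise force them to zero."""
    mask = MASK32 if enable else 0
    return a & mask, b & mask


def execute(opcode: str, a: int, b: int) -> Tuple[int, Dict[str, Tuple[int, int]]]:
    """Run one instruction and report the operands each unit actually saw."""
    if opcode not in FUNCTION_UNITS:
        raise ValueError(f"unknown opcode: {opcode}")
    unit_inputs: Dict[str, Tuple[int, int]] = {}
    result = 0
    for name, unit in FUNCTION_UNITS.items():
        enable = name == opcode  # one-hot enable derived from the instruction
        gated = gate_operands(a, b, enable)
        unit_inputs[name] = gated
        output = unit(*gated)
        if enable:
            result = output  # only the selected unit drives the result
    return result, unit_inputs


result, unit_inputs = execute("ADD", 0xFFFFFFFF, 2)
print(hex(result))
print(unit_inputs)
```

In this model only the selected unit sees the real operands; every other unit sees constant zeros from one instruction to the next, which is the condition under which its inputs do not toggle.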
II. DESIGN OF THE PROPOSED PROCESSOR CORE USING INSTRUCTION-LEVEL DATA GATING AND DVS TECHNIQUES
We applied the data gating and DVS techniques to the proposed 32 bit parallel processor core at the instruction level for high energy efficiency. Both techniques reduce the power consumption of the proposed core.
III. SIMULATION RESULTS
We ran power and circuit simulations of a test program using Spectra with layout-extracted data that excludes the pads. We first simulated the power consumption while varying the reduced voltage. To optimize delay and performance in DVS, the reduced voltage can be selected between 1.5 and 3 times the threshold voltage. The process used in this paper has a threshold voltage of 0.4 V. Fig. 4 shows the normalized power consumption of the designed 32 bit parallel processor core when the power supply is 1.2 V and the reduced voltage is varied. The x-axis represents the reduced voltage and the y-axis represents the normalized power consumption. As shown in Fig. 4, the normalized power consumption is at its minimum when the reduced voltage is 0.8 V, so we selected 0.8 V as the optimum reduced voltage in this paper.
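The arithmetic behind this choice can be checked with a short Python sketch. The voltages are the ones stated in the text; the function and variable names are hypothetical. The script only verifies that 0.8 V lies inside the 1.5 to 3 times threshold-voltage window; it does not reproduce the simulated curve of Fig. 4.

```python
V_TH = 0.4   # threshold voltage of the process, in volts (from the text)
V_DD = 1.2   # power supply voltage, in volts (from the text)
V_DDL = 0.8  # selected reduced supply voltage, in volts (from the text)


def reduced_voltage_window(v_th: float) -> tuple:
    """Return the (low, high) range of 1.5x to 3x the threshold voltage."""
    return 1.5 * v_th, 3.0 * v_th


low, high = reduced_voltage_window(V_TH)
in_window = low <= V_DDL <= high
ratio_to_vdd = V_DDL / V_DD
multiple_of_vth = V_DDL / V_TH

print(f"window: {low:.2f} V to {high:.2f} V")
print(f"V_DDL inside window: {in_window}")
print(f"V_DDL / V_DD = {ratio_to_vdd:.3f}")
print(f"V_DDL / V_TH = {multiple_of_vth:.1f}")
```

The selected 0.8 V equals 2 times the threshold voltage, the point at which the introduction places the minimum energy-delay product, and it corresponds to the 0.667 ratio of reduced to nominal supply quoted in the abstract.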
We ran the circuit simulation of the test program using Spectra with layout-extracted data that excludes the pads. Fig. 5 shows the circuit simulation results of the designed 32 bit parallel processor core. The power supply V DD is 1.2 V and the reduced power supply V DDL is 0.8 V. While the 32 bit adder is inactive, it is supplied with the reduced voltage V DDL of 0.8 V. While the adder is active, it is supplied with V DD of 1.2 V. The adder supply therefore switches from V DDL (0.8 V) to V DD (1.2 V) according to the ADDEREN control signal. Table 1 shows the normalized energy efficiency of the designed 32 bit parallel processor core with the data gating and DVS techniques. Case 1 represents the designed 32 bit parallel processor core with the instruction-level

Table 1. The normalized energy efficiency of the designed 32 bit parallel processor core with the data gating and DVS techniques.

The energy efficiency of the proposed 32 bit parallel processor core using instruction-level data gating and DVS techniques improves by about 88.4% compared with the 32 bit processing unit without instruction-level data gating and DVS techniques.
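The supply switching described for the adder can be sketched as a small behavioral model in Python. This is an illustration under stated assumptions rather than the actual power-switch circuit: only the adder is named in the text, so the other unit names, the function `supply_rails`, and the instruction trace are hypothetical.

```python
from typing import Dict, List

V_DD = 1.2   # power supply voltage in volts (from the text)
V_DDL = 0.8  # reduced supply voltage in volts (from the text)

# Hypothetical function-unit names; only the adder is named in the text.
UNITS = ("ADDER", "LOGIC", "SHIFTER")


def supply_rails(active_unit: str) -> Dict[str, float]:
    """Give the active unit V_DD and every other unit V_DDL."""
    if active_unit not in UNITS:
        raise ValueError(f"unknown unit: {active_unit}")
    return {unit: (V_DD if unit == active_unit else V_DDL) for unit in UNITS}


# Hypothetical sequence of active units, one per instruction.
trace: List[str] = ["ADDER", "LOGIC", "ADDER", "SHIFTER"]
adder_rail = [supply_rails(unit)["ADDER"] for unit in trace]
print(adder_rail)
```

The adder rail in this model sits at 1.2 V only for the instructions that enable the adder and at 0.8 V otherwise, mirroring the role the text assigns to the ADDEREN control signal.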
IV. CONCLUSIONS
This paper described the design and simulation of a 32 bit parallel processor core that applies instruction-level techniques for high energy efficiency. We presented an instruction-level data gating technique and a dynamic voltage scaling technique for the core. With the proposed data gating, enable signals select only one of the function units in the processing unit. Only the selected function unit processes the real data specified by the instruction. All inputs of the non-selected function units are forced to zero regardless of the instruction, so those units are not activated and show no switching activity. We proposed an instruction-level DVS technique that needs neither a DC-DC converter nor a voltage scheduler controlled by the operating system. In the proposed DVS technique, the selected function unit is powered from the supply voltage V DD and the non-selected function units are powered from the reduced supply voltage V DDL. We set the reduced supply voltage to 0.667 times the supply voltage. Using the dynamic voltage scaling power supply, the supply voltage of a function unit switches between 1.2 V and 0.8 V according to the instructions. The energy efficiency of the proposed 32 bit parallel processor core using instruction-level data gating and DVS techniques improves by about 88.4% compared with a 32 bit parallel processor core without them. The proposed instruction-level DVS technique has a simpler architecture than the more complex DVS scheme based on a DC-DC converter and a voltage scheduler controlled by the operating system, and it is easy to implement in hardware. Nevertheless, the energy efficiency of the proposed dual-supply instruction-level DVS technique is similar to that of the more complex scheme.
The designed high-energy-efficiency 32 bit parallel processor core can be used as a coprocessor for processing massive data at high speed. High-energy-efficiency multimedia chips can be applied to multimedia data processing in which high computational performance is required in the por-
