Abstract. The Very Long Instruction Word Reconfigurable Cipher Processor (VLIW-RCP) has reached a new bottleneck, in which the conflict between performance and power consumption is becoming increasingly severe. In this paper, we first analyze the instruction-level power consumption of an RCP based on the VLIW architecture. Based on this analysis, we build an energy-efficiency evaluation model for the VLIW-RCP and propose an energy-efficiency evaluation method. We then explore the relation between instructions and energy efficiency through formula derivation and simulation experiments. The results can guide the choice of improvement scheme for a new generation of Green Computing Reconfigurable Cipher Processor (GC-RCP).
Introduction
As small electronic devices, such as the sensing equipment used in the Internet of Things and smartphones, become ever more widespread, green computing has come to play an important role in the cipher processor field. The traditional VLIW-RCP, with multiple parallel Reconfigurable Cipher Units (RCUs), packs multiple parallelizable instructions together to raise instruction-level parallelism (ILP). This has become the de facto standard approach for improving the performance of an RCP. However, performance and power consumption are conflicting metrics: because each RCU consumes considerable power, the VLIW-RCP cannot keep improving performance by integrating more parallel RCUs or packing more parallel instructions. Energy efficiency is therefore becoming increasingly important in the design of a new generation of GC-RCP.
An instruction-level energy model of the Xeon Phi, the first product based on Intel's Many Integrated Core architecture, was proposed in [1]. This model accurately predicts dynamic power consumption with an average error below 5% and gives software developers opportunities to improve energy efficiency. Piguet proposed a definition of the energy efficiency (E.E.) of a reconfigurable processor: the number of operations it processes per second per 1 mW of power consumed [2]. This parameter is defined by Eq. 1:
where NOP is the number of operations computed per cycle, Fclk the operating frequency, Achip the total chip area, CN the capacitance normalized per unit area, α the average activity, and VDD the supply voltage. However, neither the energy model of [1] nor Equation (1) can be applied directly to the VLIW-RCP, and neither reflects the relation between instructions and energy efficiency.
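To make the quantities in Equation (1) concrete, the sketch below evaluates it under an assumed form: operations per second, NOP·Fclk, divided by the CMOS switching power α·(Achip·CN)·VDD²·Fclk expressed in mW. This form is our reading of the listed symbols, not a quotation of [2]; the function name, units, and example values are hypothetical.

```python
def piguet_energy_efficiency(n_op: float, f_clk_hz: float, a_chip_mm2: float,
                             c_n_f_per_mm2: float, alpha: float, vdd_v: float) -> float:
    """Operations per second per mW under an ASSUMED form of Eq. (1).

    Assumption: chip power = alpha * (a_chip * c_n) * vdd^2 * f_clk, in watts.
    """
    ops_per_second = n_op * f_clk_hz
    switched_capacitance_f = a_chip_mm2 * c_n_f_per_mm2  # total capacitance from area
    power_w = alpha * switched_capacitance_f * vdd_v ** 2 * f_clk_hz
    power_mw = power_w * 1e3
    return ops_per_second / power_mw


if __name__ == "__main__":
    # Hypothetical chip: 4 ops/cycle, 10 mm^2, 1e-10 F/mm^2, 10% activity, 1.2 V.
    for f in (100e6, 200e6):
        ee = piguet_energy_efficiency(4, f, 10.0, 1e-10, 0.1, 1.2)
        print(f"F_clk = {f / 1e6:.0f} MHz -> E.E. = {ee:.4g} ops/s per mW")
```

Under this assumed form Fclk cancels, so the result depends only on NOP, Achip, CN, α and VDD.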
In this paper, we analyse the power consumption per instruction (PCPI) of the VLIW-RCP in a simulation of AES encryption and build an energy-efficiency model. Through analytic models and simulation, we explore the relation between energy efficiency and the number of instruction lines. The results can guide the choice of improvement scheme for the next generation of GC-RCP.
PCPI Analysis
It has been reported that CISC instruction sets are more energy-efficient than RISC instruction sets for the same microarchitecture [3]. The instruction set of the VLIW-RCP, however, is different: in the VLIW-RCP, different instructions drive different source operands and therefore cause different power consumption. To explore the relation between instructions and power consumption, this section simulates AES encryption on the VLIW-RCP [4] at 100 MHz in 65 nm CMOS. As shown in Fig. 1, the power consumption changes stepwise at the rising and falling edges of each clock cycle and remains stable for the rest of the cycle: Fig. 1 (a) shows the change at the rising edge and Fig. 1 (b) the change at the falling edge. However, both the peak and the average power differ from one clock cycle to another, because a different instruction is executed in each cycle. The power consumption within a clock cycle can be roughly fitted with piecewise functions, and Equation (2) gives the average power consumption P of the clock cycle as 0.3157 W. Applying the same method to every clock cycle gives the average power consumption of each cycle, which equals the average power consumption of each instruction line, as shown in Fig. 3.

Figure 3. The average power consumption of each clock cycle.
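A cycle's average power is the time integral of P(t) over the cycle divided by the clock period. Because the trace in Fig. 1 is stair-stepped, a piecewise-constant approximation reduces this integral to a duration-weighted mean. The sketch below assumes each step is given as a (duration, power) pair; the segment values are hypothetical and are not the fitted pieces of Equation (2).

```python
from typing import Sequence, Tuple

CLOCK_PERIOD_NS = 10.0  # 100 MHz clock, as in the simulation


def average_cycle_power(segments: Sequence[Tuple[float, float]],
                        period_ns: float = CLOCK_PERIOD_NS) -> float:
    """Average power (W) over one clock cycle of a piecewise-constant trace.

    Each segment is (duration_ns, power_w). The result is the integral of
    P(t) over the cycle divided by the period, i.e. a duration-weighted mean.
    """
    total_ns = sum(d for d, _ in segments)
    if abs(total_ns - period_ns) > 1e-9:
        raise ValueError(f"segments cover {total_ns} ns, expected {period_ns} ns")
    energy_w_ns = sum(d * p for d, p in segments)
    return energy_w_ns / period_ns


if __name__ == "__main__":
    # Hypothetical steps: a spike after each clock edge, then a stable level.
    cycle = [(0.5, 0.9), (4.5, 0.25), (0.5, 0.8), (4.5, 0.25)]
    print(f"average power = {average_cycle_power(cycle):.4f} W")
```

With these hypothetical segments the result is about 0.31 W; substituting the fitted pieces of each cycle yields per-cycle averages of the kind plotted in Fig. 3.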
Fig. 3 shows that the difference in average power between different instructions is no more than 0.01 mW. The power difference between instructions can therefore be ignored in the cases above. The PCPI analysis of the VLIW-RCP thus yields a single conclusion: "Ignoring the PCPI difference".
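The decision rule behind "Ignoring the PCPI difference" can be written as a spread check over the per-cycle averages: if the largest and smallest averages differ by no more than a tolerance, the per-instruction difference is treated as negligible. The 0.01 mW default mirrors the bound reported above; the function names and sample values are illustrative.

```python
from typing import Sequence

PCPI_TOLERANCE_W = 0.01e-3  # 0.01 mW, the bound observed in Fig. 3


def pcpi_spread_w(per_cycle_avg_power_w: Sequence[float]) -> float:
    """Largest minus smallest per-cycle (per-instruction-line) average power."""
    if not per_cycle_avg_power_w:
        raise ValueError("need at least one cycle")
    return max(per_cycle_avg_power_w) - min(per_cycle_avg_power_w)


def can_ignore_pcpi_difference(per_cycle_avg_power_w: Sequence[float],
                               tolerance_w: float = PCPI_TOLERANCE_W) -> bool:
    """True if all instruction lines draw (almost) the same average power."""
    return pcpi_spread_w(per_cycle_avg_power_w) <= tolerance_w


if __name__ == "__main__":
    samples = [0.315700, 0.315704, 0.315698]  # hypothetical averages in W
    spread = pcpi_spread_w(samples)
    print(f"spread = {spread * 1e3:.4f} mW, relative = {spread / max(samples):.2e}")
    print("ignore PCPI difference:", can_ignore_pcpi_difference(samples))
```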
Energy-Efficiency Model
Processor Architecture. To explore the relation between energy efficiency and instructions, this section builds an energy-efficiency model. First, consider the architecture of the VLIW-RCP [4, 5], as shown in the corresponding figure. The VLIW-RCP can change the cipher algorithm it maps by changing the processor instructions and the configuration information of the RCUs. A VLIW-RCP cipher compute unit contains 4 RCUs, and each RCU contains a variety of cipher logic units. Because of complex data hazards, the VLIW-RCP generally uses a 3-stage pipeline. It differs from a general-purpose 3-stage pipelined processor in having more ALUs, called RCUs in the VLIW-RCP, so that it can pack more parallelizable instructions for execution in a single clock cycle.

VLIW-RCP Energy-Efficiency Model. Because the PCPI difference can be ignored, a pack of instructions can be regarded as a single instruction line, and the VLIW-RCP is then equivalent to a general-purpose 3-stage pipelined processor. To analyze energy efficiency at the instruction level, we simplify the model in the following respects:
• Simplify the processor architecture. From the analysis in Section 2, the VLIW-RCP can be regarded as a general processor with a 128-bit data path, and multiple packed instructions can be regarded as one instruction line, sent from the instruction fetch (IF) stage in one clock cycle.
• Simplify the 3-stage pipeline architecture. Because the PCPI difference can be ignored, we can also ignore differences in the power consumed by fetching, decoding, and executing different instructions at different stages. However, the effect on cycles per instruction (CPI) cannot be ignored; CPI is calculated with Equation (3):
$$CPI = \frac{N_{inst} + N_{pipeline} - 1}{N_{inst}} \quad (3)$$
where Ninst is the number of instruction lines used to map the cipher algorithm on the VLIW-RCP and Npipeline is the number of pipeline stages of the processor; here Npipeline = 3, so the numerator equals Ninst + 2.
• Simplify the complex units in the processor. Because the PCPI difference can be ignored, it does not matter which type of cipher operation logic is working.

Based on this simplified energy-efficiency model, we give the following definitions.

Definition 1: An instruction line is an instruction pack, containing one instruction or multiple packed instructions, that is sent from the instruction fetch stage of the VLIW-RCP.
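CPI for a stall-free pipeline can be computed directly if one assumes that the first instruction line needs Npipeline cycles to traverse the pipeline and each following line completes one cycle later, giving CPI = (Ninst + Npipeline − 1)/Ninst. The function name below is hypothetical; for an 8-line program (the AES round function size given in Table 1) on a 3-stage pipeline it gives CPI = 10/8 = 1.25.

```python
def cycles_per_instruction(n_inst: int, n_pipeline: int = 3) -> float:
    """CPI of a program of n_inst instruction lines on an n_pipeline-stage pipeline.

    Assumes no stalls: the first line finishes after n_pipeline cycles and every
    later line finishes one cycle after its predecessor.
    """
    if n_inst < 1 or n_pipeline < 1:
        raise ValueError("n_inst and n_pipeline must be positive")
    total_cycles = n_inst + n_pipeline - 1
    return total_cycles / n_inst


if __name__ == "__main__":
    for n in (1, 8, 64):
        print(n, cycles_per_instruction(n))
```

CPI approaches 1 as Ninst grows, so the pipeline-fill overhead matters most for short programs.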
Definition 2: The number of instruction lines for the VLIW-RCP mapping a cipher algorithm is the number of instruction packs. Let Ninst be this number, and let Ninst-min be its minimum. Table 1 gives an example for the AES round function, for which the number of instruction lines is 8 (the last instruction line is not counted).
Definition 3: The operation velocity of the VLIW-RCP for a cipher algorithm is the number of operations it completes per second. Let VOP be the operation velocity.
Definition 4: The energy efficiency of the VLIW-RCP is its operation velocity for a cipher algorithm per 1 mW of power consumed. Let E.E.VLIW-RCP be the energy efficiency of the VLIW-RCP.
Energy-Efficiency analysis
Based on Definition 4, the energy efficiency is given by Equation (4):
$$E.E._{VLIW\text{-}RCP} = \frac{N_{crypt}}{t_{crypt}\cdot P_{crypt}} \quad (4)$$
In Equation (4), Ncrypt is the number of times the VLIW-RCP performs the mapped cipher algorithm over a short period, tcrypt is the length of that working period, and Pcrypt is the power consumption of the VLIW-RCP (in mW).
Based on Definition 3, the operation velocity is given by Equation (5):
$$V_{OP} = \frac{N_{crypt}}{t_{crypt}} \quad (5)$$
The working time of the VLIW-RCP can be expressed as Equation (6):
$$t_{crypt} = \frac{N_{crypt}\cdot N_{inst}\cdot CPI}{F_{clk}} \quad (6)$$
Combining Equations (4), (5) and (6), the energy efficiency becomes
$$E.E._{VLIW\text{-}RCP} = \frac{F_{clk}}{N_{inst}\cdot CPI\cdot P_{crypt}} \quad (7)$$
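The chain from the definitions to the closed form can be cross-checked numerically: compute tcrypt from the cycle count, evaluate the definition-based ratio VOP/Pcrypt, and compare it with Fclk/(Ninst·CPI·Pcrypt). The sketch assumes a stall-free 3-stage pipeline with CPI = (Ninst + Npipeline − 1)/Ninst and tcrypt = Ncrypt·Ninst·CPI/Fclk; the 0.3157 W cycle average from Section 2 is reused only as an illustrative Pcrypt, and the function names are hypothetical.

```python
def working_time_s(n_crypt: int, n_inst: int, f_clk_hz: float, n_pipeline: int = 3) -> float:
    """t_crypt = N_crypt * N_inst * CPI / F_clk (stall-free pipeline assumed)."""
    cpi = (n_inst + n_pipeline - 1) / n_inst
    return n_crypt * n_inst * cpi / f_clk_hz


def ee_from_definition(n_crypt: int, t_crypt_s: float, p_crypt_mw: float) -> float:
    """Operations per second per mW, i.e. V_OP / P_crypt."""
    v_op = n_crypt / t_crypt_s  # operation velocity (Definition 3)
    return v_op / p_crypt_mw


def ee_closed_form(n_inst: int, f_clk_hz: float, p_crypt_mw: float, n_pipeline: int = 3) -> float:
    """Closed form F_clk / (N_inst * CPI * P_crypt)."""
    cpi = (n_inst + n_pipeline - 1) / n_inst
    return f_clk_hz / (n_inst * cpi * p_crypt_mw)


if __name__ == "__main__":
    n_crypt, n_inst, f_clk, p_mw = 1000, 8, 100e6, 315.7
    t = working_time_s(n_crypt, n_inst, f_clk)
    print(ee_from_definition(n_crypt, t, p_mw), ee_closed_form(n_inst, f_clk, p_mw))
```

Both expressions give about 3.17 × 10^4 operations per second per mW for these illustrative inputs, and Ncrypt drops out of the closed form as expected.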
Equation (7) shows the basic relation between the number of instruction lines and energy efficiency: fewer instruction lines yield higher energy efficiency. Strictly speaking, however, the relation is not an exact inverse proportion. Power consumption is the sum of dynamic power (Pd) and static power (Ps), and dynamic power consists mostly of switching power, given in Equation (8):
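$$P_d = \alpha\, C\, V_{DD}^2\, F_{clk} \quad (8)$$
where α is the activity factor, C the switched capacitance, VDD the supply voltage, and Fclk the operating frequency. Traditionally, dynamic power is far greater than static power in CMOS integrated circuits, but this no longer holds for deep sub-micron technology. With Pcrypt = Pd + Ps, the energy efficiency becomes
$$E.E._{VLIW\text{-}RCP} = \frac{F_{clk}}{N_{inst}\cdot CPI\cdot\left(\alpha C V_{DD}^2 F_{clk} + P_s\right)} = \frac{1}{N_{inst}\cdot CPI\cdot\left(\alpha C V_{DD}^2 + P_s/F_{clk}\right)} \quad (9)$$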
The activity factor depends heavily on which cipher algorithm is mapped onto the VLIW-RCP, and the capacitance and supply voltage likewise do not depend on the VLIW-RCP design. Equation (9) therefore implies that the two factors influencing energy efficiency that are under the VLIW-RCP designer's control are the number of instruction lines and the operating frequency. Increasing the frequency can only reduce the effect of static power consumption. By comparison, reducing the number of instruction lines may be the more important and necessary measure for improving the energy efficiency of the VLIW-RCP.
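The comparison above can be illustrated by evaluating the energy efficiency Fclk/(Ninst·CPI·(αCVDD²Fclk + Ps)) for a baseline and two changes: halving the number of instruction lines, or doubling the clock frequency. All parameter values (α, C, VDD, Ps) are made up for illustration, with static power chosen to be significant as in deep sub-micron technology; CPI again assumes a stall-free 3-stage pipeline, and the function names are hypothetical.

```python
ALPHA = 0.2          # hypothetical activity factor
C_FARAD = 1e-9       # hypothetical switched capacitance
VDD_V = 1.2          # hypothetical supply voltage
P_STATIC_W = 0.05    # hypothetical static (leakage) power


def energy_efficiency(n_inst: int, f_clk_hz: float, n_pipeline: int = 3) -> float:
    """Ops/s per mW with P_crypt = alpha*C*Vdd^2*F_clk + P_s."""
    cpi = (n_inst + n_pipeline - 1) / n_inst
    p_crypt_mw = (ALPHA * C_FARAD * VDD_V ** 2 * f_clk_hz + P_STATIC_W) * 1e3
    return f_clk_hz / (n_inst * cpi * p_crypt_mw)


def frequency_ceiling(n_inst: int, n_pipeline: int = 3) -> float:
    """Limit as F_clk grows without bound: the static term's share vanishes."""
    cpi = (n_inst + n_pipeline - 1) / n_inst
    return 1.0 / (n_inst * cpi * ALPHA * C_FARAD * VDD_V ** 2 * 1e3)


if __name__ == "__main__":
    base = energy_efficiency(8, 100e6)
    print("halve N_inst :", energy_efficiency(4, 100e6) / base)
    print("double F_clk :", energy_efficiency(8, 200e6) / base)
    print("F_clk ceiling:", frequency_ceiling(8) / base)
```

With these made-up values, halving Ninst from 8 to 4 raises the energy efficiency by (8+2)/(4+2) ≈ 1.67×, doubling Fclk raises it by about 1.46×, and no frequency increase can pass the ceiling set by the dynamic term. Other parameter choices shift these ratios, so the sketch shows the shape of the relation rather than a measured result.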
Computational examples and analysis
To verify the inference above, this section presents an experiment based on the VLIW-RCP architecture of [3]. The experimental scheme is as follows. Test parameters: energy efficiency and power consumption. The experimental results, shown in Fig. 6, confirm the inference of Section 4: increasing the operating frequency and reducing the number of instruction lines both improve the energy efficiency of the VLIW-RCP. They also show that reducing the number of instruction lines is more effective than increasing the operating frequency.
Summary
The proposed energy-efficiency evaluation method makes it possible to explore the relation between instructions and energy efficiency. For users of the VLIW-RCP, we conclude that energy efficiency can be improved by reducing the number of instruction lines: removing useless instruction lines, compressing sparse instructions into one or a few instruction lines when they can be executed together, and reducing the use of loop-based programs. For designers of a new generation GC-RCP, the processing capacity of each instruction should be increased while the operating frequency stays the same, together with the number of parallelizable cipher logic units and the utilization ratio of each RCU in one clock cycle.
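The recommendation to compress sparse instructions into fewer instruction lines can be illustrated with a greedy list scheduler that packs mutually independent operations into lines no wider than the number of RCUs (4 in the architecture described earlier). This is a hypothetical sketch of the idea, not the VLIW-RCP toolchain; the operation names and dependencies are made up, and real packing must also respect RCU resource types and data hazards.

```python
from typing import Dict, List, Set


def pack_instruction_lines(ops: List[str], deps: Dict[str, Set[str]],
                           width: int = 4) -> List[List[str]]:
    """Greedy list scheduling of operations into instruction lines.

    Each line holds at most `width` operations (one per RCU), and an operation
    may join a line only if all of its dependencies were issued in earlier
    lines. Ready operations are taken in program order.
    """
    issued: Set[str] = set()
    remaining = list(ops)
    lines: List[List[str]] = []
    while remaining:
        line: List[str] = []
        for op in remaining:
            if len(line) == width:
                break
            if deps.get(op, set()) <= issued:  # all producers already issued
                line.append(op)
        if not line:
            raise ValueError("dependencies cannot be satisfied (cycle or unknown op)")
        issued.update(line)
        remaining = [op for op in remaining if op not in line]
        lines.append(line)
    return lines


if __name__ == "__main__":
    program = ["a", "b", "c", "d", "e", "f", "g"]
    dependencies = {"e": {"a", "b"}, "f": {"c", "d"}, "g": {"e", "f"}}
    packed = pack_instruction_lines(program, dependencies)
    print(f"{len(program)} sequential lines -> {len(packed)} packed lines: {packed}")
```

Here seven one-operation lines collapse into three packed lines, which, by the analysis above, lowers Ninst and raises energy efficiency at the same power per line.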
