Abstract - Encryption algorithms are increasingly necessary to ensure that data is transmitted securely over insecure communication channels. IDEA NXT is a recently developed symmetric block cipher whose structure is based on the already proven IDEA (International Data Encryption Algorithm) cipher. Its top-level structure uses the Lai-Massey scheme, and the round functions used in the scheme are substitution-permutation networks (SPN). Its flexibility lies in the fact that it can be efficiently implemented in both hardware and software. The implementation of cryptographic algorithms has recently attracted researchers' attention in terms of power usage. For various applications, minimising the consumed power has become an important challenge for cryptographers. With the advent of side-channel attacks, this task has become even more difficult because a programmer must take into account countermeasures against such attacks, which often increase the amount of computation and hence the power consumption. We report some of the first results of implementing the cipher on an FPGA and on a low-power microcontroller.
I INTRODUCTION
Over the last few decades there has been rapid development in communication technology and in computer processing power, which in turn has led to a huge increase in data passing between companies, individuals and organizations. Encryption algorithms are used to provide secure transmission of this data across insecure channels, such as telephone lines and ISDN, while maintaining its integrity. Attacks on transmitted data by unwanted third parties exploit weakly designed algorithms, and these attacks can be either passive (listening to transmissions) or active (modifying the transmitted data).
In today's world of cryptographic applications, one of the most precious commodities is power. Memory is an obviously important issue for applications involving smart cards, but security applications and contact-free smart cards have recently attracted researchers' attention in terms of power consumption. Power becomes even more important when cryptographic solutions are integrated with communications: a device can only operate as long as its battery maintains power.
In communications the original message before encryption is referred to as the plaintext (P) and is usually of a fixed length. Encrypting a message requires the algorithm to operate upon the plaintext using an encryption key (Ke) to produce the ciphertext (C). After transmission the original plaintext (P) is reconstructed at the receiving side from the ciphertext using the decryption algorithm and a decryption key (Kd). Ke and Kd are pseudo-random data patterns that are generated by a separate key-scheduling algorithm. In the case of a private-key cipher, Ke and Kd are the same and are known only to the sender and receiver. For public-key ciphers, different keys are used for the encryption and decryption processes [1].
The research presented in this paper focuses on a low-power implementation in hardware and software of the new private-key block cipher IDEA NXT. Various implementation techniques are presented, and the hardware and software implementations are compared in terms of power consumption. Finally, we propose a hardware-software co-design approach for ultra-low-power implementation of such cryptographic solutions.
The paper is organized as follows. Section 2 overviews IDEA NXT's construction and discusses its high-level structure and the primitive components used in its round functions. Section 3 explains the finite field arithmetic used in the round functions. In section 4 the implementation aspects are discussed in detail for both hardware and software and finally the results are discussed in section 5.
II IDEA NXT Algorithm
IDEA NXT is a symmetric block cipher that was developed to offer a high security level and a wide range of implementation possibilities [2]. Its top-level structure is based on the Lai-Massey scheme, the same scheme used for the IDEA cipher. Previous ciphers such as DES, Triple DES and Blowfish were based on the Feistel scheme. For the Feistel scheme it was proved that, if the round functions are random, a 3-round Feistel cipher will look random to any chosen-plaintext attack. For the Lai-Massey scheme it was proved that a similar result could be obtained if an orthomorphism function was added [3]. The orthomorphism used is a Feistel scheme with an identity function as its round function, as shown in figure 4.
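As a minimal sketch of the orthomorphism described above (one Feistel round whose round function is the identity), the Python below maps a pair of halves (a, b) to (b, a XOR b). The function names and the 4-bit toy width are illustrative assumptions, chosen so that the check can be exhaustive; they are not taken from the specification. The check confirms the defining property of an orthomorphism: both x -> or(x) and x -> or(x) XOR x are permutations.

```python
from itertools import product

WIDTH = 4                  # toy half-width in bits (assumption, keeps the check exhaustive)
MASK = (1 << WIDTH) - 1


def orthomorphism(a, b):
    """One Feistel round with the identity as round function: (a, b) -> (b, a ^ b)."""
    return b, a ^ b


def inverse_orthomorphism(c, d):
    """Inverse mapping: (c, d) -> (c ^ d, c)."""
    return c ^ d, c


def is_orthomorphism():
    """Check that x -> or(x) and x -> or(x) ^ x are both permutations."""
    inputs = list(product(range(MASK + 1), repeat=2))
    images = {orthomorphism(a, b) for a, b in inputs}
    diffs = set()
    for a, b in inputs:
        c, d = orthomorphism(a, b)
        diffs.add((c ^ a, d ^ b))
    return len(images) == len(inputs) and len(diffs) == len(inputs)
```

The same two functions work unchanged on wider halves, since they use only XOR and a swap.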
Most modern ciphers are based on Claude Shannon's principles of diffusion and confusion. Confusion is the obscuring of the relationship between the plaintext and the ciphertext and can be achieved through substitution. Substitution for IDEA NXT occurs in the SPN in the form of non-linear 8-bit to 8-bit mappings using constructs called s-boxes. However, confusion alone is not enough to provide security for a cipher and must be coupled with diffusion. To achieve diffusion, patterns
There are four members of the IDEA NXT family, as shown in table 1. NXT64 and NXT128 are the generic members, while NXT64/k/r and NXT128/k/r are variants of the cipher in which the key size k can be from 0 to 256 bits in length and r, the number of encryption/decryption rounds, can be anywhere from 12 to 255. The standard number of rounds used in the generic versions is sixteen, but twelve is the minimum for acceptable security levels.
NXT64 and its variant are based on the top-level Lai-Massey scheme shown in figure 2. This scheme 
III Finite Field Arithmetic
Multiplication of the matrix used for diffusion is performed using finite field arithmetic [4]. The field used is GF(2^m). In this case m is 8, which gives a field containing 256 byte elements. Operations in a finite field are beneficial to a hardware design because the result of every calculation in the field is represented using a constant number of bits (8 bits for our implementation). Elements can be represented as polynomials of degree at most (m-1) with coefficients in GF(2).
Multiplication in GF(2^8) corresponds to the multiplication of polynomials modulo an irreducible polynomial of degree 8. In IDEA NXT the irreducible polynomial is
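A minimal sketch of multiplication in GF(2^8) using the shift-and-add method with modular reduction. The reduction polynomial used here, x^8 + x^7 + x^6 + x^5 + x^4 + x^3 + 1 (0x1F9), is an assumption based on our reading of the cipher specification and should be checked against [2]; the function name `gf_mul` is hypothetical.

```python
# ASSUMPTION: reduction polynomial x^8+x^7+x^6+x^5+x^4+x^3+1, to be verified against [2].
POLY = 0x1F9


def gf_mul(a, b, poly=POLY):
    """Multiply two elements of GF(2^8), each given as an integer in 0..255."""
    result = 0
    while b:
        if b & 1:          # add (XOR) the current multiple of a
            result ^= a
        b >>= 1
        a <<= 1            # multiply a by x
        if a & 0x100:      # degree reached 8: reduce modulo the polynomial
            a ^= poly
    return result
```

Every intermediate value stays within 8 bits after reduction, which is the property that makes the field convenient for fixed-width hardware.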
IV Architectures
a) FPGA Implementations
Hardware designs of NXT64 and NXT128 were implemented for both the Spartan-IIE and Virtex-II Pro FPGAs. A different architecture was designed for each FPGA, with the emphasis on low power consumption. One of the keys to achieving this is to reduce the amount of switching in a design. This was achieved by keeping the number of registers as low as possible and having a predominantly combinational circuit. A one-shot, purely combinational architecture was designed for the Virtex-II Pro, as shown in figure 5. This minimises power consumption as no clock is used apart from initially loading data, which reduces the switching. The Spartan-IIE design integrates block RAM to reduce the design space. One design entity was used for both the encryption and decryption processes, as both have the same structure except for differing orthomorphism functions. An encrypt/decrypt signal was used to select the required orthomorphism depending on the process. The same entity can also be used for the final round of both processes by selecting no orthomorphism function.
The block data and sixteen round keys were serially shifted into all design entities. The round keys were held in on-chip registers and provided for each round. Decryption round keys were shifted in reverse order to those for encryption. Serially shifting in data reduces the I/O pin count, resulting in lower power usage [5].
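The shared encrypt/decrypt structure described above can be sketched as follows. This is an illustrative model under stated assumptions, not the hardware design itself: `toy_round_function` is a hypothetical placeholder for the real SPN round function, the halves are taken as 32-bit words (as for a 64-bit block), and the mode names `"or"`, `"io"` and `"id"` are labels chosen here for the orthomorphism, its inverse and no orthomorphism.

```python
MASK32 = 0xFFFFFFFF


def orthomorphism(x):
    """(a, b) -> (b, a ^ b) on the two 16-bit halves of a 32-bit word."""
    a, b = x >> 16, x & 0xFFFF
    return (b << 16) | (a ^ b)


def inverse_orthomorphism(y):
    """(c, d) -> (c ^ d, c) on the two 16-bit halves of a 32-bit word."""
    c, d = y >> 16, y & 0xFFFF
    return ((c ^ d) << 16) | c


def toy_round_function(x, key):
    """PLACEHOLDER for the SPN round function; it need not be invertible."""
    return ((x * 0x9E3779B1) ^ key) & MASK32


def lai_massey_round(left, right, key, mode):
    """One Lai-Massey round; mode selects 'or', 'io' or 'id' (no orthomorphism)."""
    t = toy_round_function(left ^ right, key)
    left, right = left ^ t, right ^ t
    if mode == "or":
        left = orthomorphism(left)
    elif mode == "io":
        left = inverse_orthomorphism(left)
    return left, right


def crypt(left, right, round_keys, decrypt=False):
    """Same structure both ways: decryption reverses the keys and uses 'io'."""
    keys = list(reversed(round_keys)) if decrypt else list(round_keys)
    mode = "io" if decrypt else "or"
    for key in keys[:-1]:
        left, right = lai_massey_round(left, right, key, mode)
    return lai_massey_round(left, right, keys[-1], "id")   # final round: no orthomorphism
```

Because the XOR of the two halves is unchanged by the round-function step, the same value t is recomputed during decryption, which is why a single entity with a mode select is sufficient.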
For the Virtex-II Pro design a Mastrovito multiplier was used, which is easily implemented using a network of AND gates and XOR gates [4]. A different approach was used in the Spartan-IIE architecture, which removed all Galois field combinational logic and some sbox logic by using RAM look-up tables, as shown in figure 6. As arithmetic in GF(2^8) occurs at byte level, these look-up tables contain all 256 possible results of an 8-bit value multiplied by the constant values of the diffusion matrix [2]. The Spartan-IIE has 64k-bits of RAM that is divided into sixteen 4k-bit dual-port blocks. These blocks can be configured to specific port widths. To suit this design, a configuration using 8-bit wide address buses and 16-bit wide data buses was used, as shown in figure 7. This allowed the look-up table to be initialised with the results of multiplication in GF(2^8) by two matrix elements, where the variable multiplicand is the value on the address bus. The RAM control lines were set so that only read operations are allowed. The values for the tables were taken from the software design, which uses fully precomputed tables. Output results from the RAM blocks are given as follows,
$$
\begin{aligned}
\mathrm{result}_0 &= X_{0(8)} \cdot \alpha : X_{0(8)} \cdot c \\
\mathrm{result}_1 &= X_{1(8)} \cdot \alpha : X_{1(8)} \cdot c \\
\mathrm{result}_2 &= X_{2(8)} \cdot \alpha : X_{2(8)} \cdot c \\
\mathrm{result}_3 &= X_{3(8)} \cdot \alpha : X_{3(8)} \cdot c
\end{aligned}
\tag{2}
$$
and the 4x4 matrix multiplication results are deduced by,
The NXT128 design uses the same method for its 8x8 matrix multiplication and uses a total of 12 RAM blocks. Because of this both IDEA NXT64 and IDEA NXT128 can be implemented on the same Spartan-IIE within the constraints of RAM availability.
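A sketch of how one such 256-entry, 16-bit wide look-up table could be generated. The reduction polynomial (0x1F9), the value of the constant c (taken here as α^-1 + 1 = 0xFD) and the choice of placing the α product in the upper byte are all assumptions made for illustration and should be checked against [2]; the names `gf_mul` and `build_lut` are hypothetical.

```python
POLY = 0x1F9    # ASSUMPTION: x^8+x^7+x^6+x^5+x^4+x^3+1, verify against [2]
ALPHA = 0x02    # the field element alpha (the polynomial x)
C = 0xFD        # ASSUMPTION: c = alpha^-1 + 1 under the polynomial above


def gf_mul(a, b, poly=POLY):
    """Shift-and-add multiplication in GF(2^8)."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if a & 0x100:
            a ^= poly
    return result


def build_lut():
    """Entry x holds (x * alpha) in the upper byte and (x * c) in the lower byte."""
    return [(gf_mul(x, ALPHA) << 8) | gf_mul(x, C) for x in range(256)]


lut = build_lut()
total_bits = len(lut) * 16   # 256 entries of 16 bits each
```

With an 8-bit address and a 16-bit data word, the table occupies 256 x 16 = 4096 bits, which corresponds to one 4k-bit block of the RAM described above.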
b) Software Implementations
In this section, several issues concerning the implementation of IDEA NXT64 on low-end 8-bit architectures are discussed. The implementations were carried out on an Atmel ATmega128L microcontroller. Some of the trade-offs between memory usage and power consumption are also presented. Two implementations were produced: a memory-efficient one that minimizes memory usage and a power-efficient one that minimizes power consumption.
Various software implementation techniques were investigated for minimizing power consumption. Based on the observation that different processor instructions cost different amounts of energy, three main power-saving techniques were examined:
1. Assigning live variables to registers
2. Avoiding repetitive address computations
3. Minimizing memory accesses
Algorithm design and implementation techniques can also affect power consumption considerably. Some of the aspects taken into account when deciding on the implementation are:
1. Recursive versus iterative
2. Look-up table versus computational approach
3. Data types
Most of the software implementation architecture is the same as the hardware architecture, except for a few techniques which are mentioned below:
The most intensive computations are related to the evaluation of the sbox mapping and the implementation of the MU function (figure 3). The developers of IDEA NXT presented three implementation strategies, all of which were implemented and tested (A, B and D in table 2). Finally, a mixed strategy combining strategies B and D was implemented (strategy C in table 2), in which the sbox mapping is precomputed and the final equations for multiplication and division by α are precomputed and implemented directly, rather than storing the results of these mappings. This resulted in increased speed and decreased memory requirements while making no change in the power consumed. The four strategies, which differ in terms of precomputed data, are presented in Table 2.
| Strategy | Precomputation |
|---|---|
| A | none |
| B | sbox mapping |
| C | sbox and equations for α-mapping |
| D | sbox and alpha |

Table 2: Strategies for Implementations
Strategy A would be suitable for implementations with the lowest memory usage. It requires the sbox to be computed, which involves a lot of computation and hence consumes much more power than the memory accesses used for the sbox in Strategy B. Hence, in terms of power-efficient implementation, Strategy B is more efficient than A. Still, Strategy B requires MU computations, which involve complicated computations for multiplication and division by α. Strategy D requires memory accesses at every stage of the MU computation and hence increases power consumption by a significant amount. A strategy is therefore required in which memory accesses (Strategy D) and computations (Strategy B) are decreased simultaneously, which leads to Strategy C.
In Strategy C, for the implementation of the MU computations, we precomputed the final equations for multiplication and division by α (equation 4) so as to eliminate the Galois field arithmetic computations. Hence α-mapping can be done without memory accesses or complicated computations. This increases the computation speed and makes it possible to implement the function without actually storing any of the data, which in turn decreases the power consumption. We propose that Strategy C would be best for ultra-low-power and ultra-fast implementations of the IDEA NXT family. If x_i is the i-th bit of the multiplicand, the equations for multiplication and division by α are given by:
Most of our results are based on Strategy C, in which the sbox mapping was precomputed and stored in RAM, which decreased our computation time, device usage and power consumption. It is worth mentioning that the MU operation does not require stored data, and hence it was better to implement it without memory usage. This in turn increases the computation time, but power consumption and memory requirements are still improved.
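A table-free multiplication and division by α, in the spirit of Strategy C, can be sketched as a single shift plus a conditional XOR. This is an illustration under the assumption that the reduction polynomial is x^8 + x^7 + x^6 + x^5 + x^4 + x^3 + 1 (0x1F9), which should be verified against [2]; it is not the authors' exact set of equations, and the names `mul_alpha` and `div_alpha` are hypothetical.

```python
POLY = 0x1F9   # ASSUMPTION: reduction polynomial, verify against [2]


def mul_alpha(x):
    """Multiply an 8-bit field element by alpha without any table look-up.

    Bitwise, under the assumed polynomial: y0 = x7, y1 = x0, y2 = x1,
    and y_i = x_(i-1) XOR x7 for i = 3..7.
    """
    x <<= 1
    if x & 0x100:
        x ^= POLY
    return x


def div_alpha(x):
    """Divide an 8-bit field element by alpha (inverse of mul_alpha)."""
    if x & 1:
        x ^= POLY      # clears bit 0 and sets bit 8, so the shift is exact
    return x >> 1
```

Each operation needs no memory access and only a handful of instructions on an 8-bit core, which is the trade-off the strategy relies on.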
The implementation of the sigma/mu layer requires just these computations:
With these precomputed equations, the gate count decreased by half and the number of instructions decreased by 80%. This decreased power consumption, since every instruction in the C code consumes some power. The final gate count, taking all the serial and parallel implementations into account, was:
The hardware utilisation is much lower in the Spartan design because it implements only one round and requires sixteen clock cycles to fully encrypt/decrypt the data once the data and round keys have been loaded. The one-shot Virtex design occupies a large design space because of its sixteen combinational rounds and needs only one clock cycle to complete the process. This design would not fit on the Spartan device. Implementing the look-up tables in block RAM reduces the amount of distributed RAM (LUTs) used in the Spartan designs. The NXT128 designs are also much larger than their NXT64 counterparts, because twice as much data is being processed.
b) Software Results
The results for both the memory-efficient and the power-efficient implementations are presented. Table 5 shows that memory usage can be decreased by 45% by using memory-efficient coding. Table 6 shows that such a memory-efficient implementation increases the power consumed, due to repetitive updating of the registers. It is also clear from Table 5 and Table 6 that power consumption can be decreased by up to 80% using low-power implementation techniques.
Although software encryption is becoming more prevalent today, hardware is still the embodiment of choice for military and computationally intensive commercial applications [1]. The main advantage of using hardware is speed, as these algorithms use a lot of processing power for computation. Hardware is also more secure than software: an algorithm would need to be embedded deep in an operating system to be secure in software, whereas hardware can be manufactured in tamper-proof devices. Software has the disadvantage of being slower, but its benefits include portability and flexibility.
We propose to implement IDEA NXT as a hardware-software co-design for the best results in terms of power consumption. Most of the computational part can be done in software, while look-up tables and memory accesses can be handled in hardware. This type of implementation is suitable for cryptographic applications where power is a more important issue than speed. The memory usage will be the same for hardware, software or a co-design. For speed-based applications, the partitioning should place the computational part in hardware, while for power-based applications it should place it in software.
This means that in future cryptographic implementations partitioning is going to play the most important role and will enable application-specific implementations. This will give less flexibility to users, but more application-driven benefits can be achieved through such implementations.
VI Conclusion
Aspects which are crucial to a good cryptographic implementation are power consumption, memory requirements, performance, speed, cost and reusability. A fully parameterised hardware cryptographic design can save both time and cost when implemented correctly, while a good software implementation can save power, time and memory. In this paper different architectures are examined and implemented for the generic versions of IDEA NXT. Some preliminary results of low-power hardware and software implementations are presented. The results show that IDEA NXT can be successfully implemented for low-power cryptographic applications and that its inherent flexibility is a great advantage for ultra-low-power hardware-software co-design implementation.
Thanks are due to the National Access Program (NAP) provided by the Tyndall National Institute, and to Science Foundation Ireland (SFI) for the funding provided to help carry out this research.
