Abstract: This paper presented a simplified hardware architecture of the block cryptographic algorithm, HIGHT, for wireless applications like a RFID system. We have modified the original HIGHT algorithm that reduced the critical path in the key scheduler and dismissed redundant logics sharing encryption and decryption datapathes, and thereby yield a smaller silicon area. The proposed HIGHT supporting both encryption and decryption had 2,608 gates, 13% smaller than the original HIGHT design excluding decryption block. It consumes the average power 10.8 μW at 2.5 V for 100 kHz. It can be applicable to passive RFID tag without serious difficulty in size and power. Also, the maximum clock frequency of 125 MHz allows a data throughput rate of 235 Mbps that can support cryptography of high-speed multimedia data.
Introduction
The battery life and the privacy between users become critical issues as wireless devices become more popular. Recent information security addresses ubiquitous sensor network (USN) environment, which emphasizes communi-cation availability without limitation of time and location. The wireless system employing radio frequency identification (RFID) system is representative technique in USN [1] . There is a security risk in the data communication between a RFID tag and a reader. The block cipher, AES (advanced encryption standard) and HIGHT are employed to make a strong security [2, 3, 4] . Power consumption and die-size are major design concerns for cryptographic component integrated in the mobile system. M. Feldhofer et al. presented AES that has 3,400 gates and average power consumption is 4.5 μW with 100 kHz clock and 1.5 V [3] . There is another block cipher, HIGHT, which does not only consist of simple operations to be ultra-light but also has enough security as a good encryption algorithm [4] . It requires only 3,048 gates. This paper proposed a new hardware architecture of HIGHT, aiming at small size, high speed, and low power that will be useful for RFID systems. The proposed architecture shorten the length of the critical path and shares the common datapath for encryption and decryption processes. The implementation results of the proposed HIGHT show the fulfillment of mandatory requirements of a passive RFID tag application.
HIGHT algorithm and hardware architecture
A HIGHT has four major functional blocks of key schedule, initial transform, 32 iterative rounds, and the final transform as shown in Fig. 1a . The Key schedule is generating 128 sub-keys from SK 0 to SK 127 for one encryption process based on a master key as shown in Fig. 1b . The initial transform converts a plaintext to the input of the first round operation using four whitening-key bytes from W K 0 to W K 3 , The round transform performs 32 iterative round functions and then, the final transform is responsible for changing the 64-bit output of the last round function to the 64-bit ciphertext using the four whitening-key bytes from W K 4 to W K 7 . A whitening key is used to hide the input data of a round function. HIGHT has a simple structure to perform encryption and decryption. It uses simple word operations such as addition mod 2 8 , subtraction mod 2 8 , XOR, and left bitwise rotation.
There are some differences between an encryption and a decryption of the HIGHT although they have the same transformation processes as shown in Fig. 1c . The decryption process is an inverse operation of encryption. The final transform in encryption eliminates swaps among bytes in the 32nd round function. It transforms the output of the 32nd round into the ciphertext. The byte-swap for decryption is performed in the opposite direction to that in the encryption. The sequence of sub-keys for transform operations reverses to get a decrypted plaintext. Thus, the initial and final transform of the decryption replaces the final and initial transform of the encryption, respectively. The addition operation in the initial and final transforms of encryption replaces the subtraction in decryption. Also, the round function in decryption uses the subtraction instead of addition. These differences can be alleviated by unifying the same datapaths for encryption and decryption and controlling them alternately. It is possible to design more small HIGHT by sharing the common logics between encryption and decryption.
The proposed hardware architecture of HIGHT
There are two key design issues in our implementation. We shorten the critical path of the key scheduler. The output of the proposed key scheduler goes through 2-stages, MUX→adder, while the output of the key scheduler in the conventional HIGHT goes through 4-stages, MUX→g function→MUX→ register. Also, we achieve the area saving by replacing the bitwise permutation and g-function in a conventional HIGHT [4] as shown in Fig. 1b to bitwise rotation and addition operation as shown in Fig. 2a , respectively. In addition, we unify the round function both an encryption and a decryption by sharing the common datapahts in both of them. The proposed design can perform the encryption and decryption alternatively. The select signal, en de chooses the operation of encryption or decryption. Thus, these key design issues bring us gate size reducing and higher operation frequency comparing to the conventional HIGHT design [4] . The proposed HIGHT keeps operation the same with the conventional HIGHT. The initial transform uses four words of the master key. A 32-bit keys passes through the 4-to-1 multiplexer at a time. The proposed HIGHT performs the bitwise rotation for encryption and the reverse bitwise rotation for decryption alternately because it employs both rotation blocks. The key generator produces sub-keys as a master key without any change during the initial and the final transform. However, the sub-keys generated in the round transforms are the results of add operations with 32-bits constant, δ i obtained from the δ generator. The sub-key used in the initial and the final transform is generated by performing addition operations with '0', not 32-bits constant, δ i . It can save the addition of MUX logic by using the part of a master key as a sub-key without any changes. During the iterative round transforms, the sub-key can be generated by δ i obtained from LFSR h and 32-bit data among the 128 bits caused by the bitwise rotation of a master key. Each of them performs the addition in 8-bit at a time and it generates a 32-bit sub key by combining four 8 bits. Thus, the sub key used in the 32 round transform can be obtained from the 4 bitwise rotations.
We construct the δ generation circuit having two LFSR h circuits for both encryption and decryption separately as shown in Fig. 2a . δ 0 is fixed as 0x5a16, which becomes the initial state of 7-bit LFSR h. The output MUX can choose the output δ i among the output of encrypt LFSR h, δ en and the output of decrypt LFSR h, δ de according to the selection signal, en de. Each number in LFSR h represents the bit position of δ i . It is constructed with a simple wiring with four XOR operations. For performing addition operations within the key scheduler, we generate 32-bit constants by combining four 8-bit outputs. The LFSR h circuit for the decryption should operate reversely with that for the encryption.
The proposed architecture of the round transform is depicted in Fig. 2b . It reduces the circuit size of the frequently used round transform among the encryption and decryption algorithms. The round circuit can be applied to the initial and the final transform, too. The 64-bit input data can be divided into 8-bit blocks and represented as follows.
Here, let define D in 7 and D in 0 as MSB and LSB, respectively. After input data from D in 7 to D in 0 are shuffled alternately as shown in Fig. 2b , the initial transform will be started. And then, the output is transferred to the iterative round transform. The 3-to-1 MUXs in front of the round transform are responsible for untwisting the swap of the round function by selecting the input data to the round transform. There are two differences between encrypt and decrypt. The one is that the addition in encryption replaces subtraction in decryption. The other is that the byte-swap for decryption is performed the opposite direction to that in the encryption. The original master key bit pattern is recovered after the 32 rounds master key scheduling. The four words of the master key are sent to the output for the final transform. The total number of clock cycles for encryption and decryption is 34 clocks.
Performance analysis
The silicon implementation of the proposed HIGHT uses MagnaChips 0.35-μm CMOS ASIC process. We choose area optimization option at logic synthesis. We have used NanoSim tool from Synopsys and Spice parameters to obtain the power consumption. Note that the proposed HIGHT meets every mandatory requirement for a RFID tags. Due to the low-power restrictions, the clock frequency of the RFID tag must be divided from 13.56 MHz to 100 kHz. In addition, the circuit size should be under 5,000 gates and the current consumption is under 15 μA, and the encryption and decryption time is under 15 ms. Our implementation supporting both encryption and decryption process comprises 2,608 equivalent gates. It consumes only an average current of 4.3 μA when operated at 100 kHz and 2.5 V supply. It spends 680 μs for encrypting and decrypting 64-bit data. Thus, these results mean that our design is applicable to a passive RFID tag. Table I shows the comparison results with the other counterpart cipher chips. The AES [3] and the proposed HIGHT were fabricated using 0.35-μm technology. However, the conventional HIGHT [4] was fabricated using 0.25-μm technology. For fair comparison, we evaluate three types of block ciphers on 0.35-μm technology. Thus, we reestimate the performance of the conventional HIGHT on 0.35-μm technology using linear scaling method [5] .
In terms of circuit area, the AES [3] , the conventional HIGHT [4] , and the proposed HIGHT have 3,400, 3,048, and 2,608 gates, respectively. Our design achieves 13% area reduction comparing to the conventional HIGHT design even though it supports both encryption and decryption processes. It is obtained from the key scheduler optimization and unifying the round logics. In the conventional HIGHT, the key scheduler requires 1,648 gates and the round and others require 1,400 gates. However, the key scheduler in our implementation has 1,591 gates. In the key scheduler design, we added only one LF SR h for decryption and control logics, while we reduce the logic area by replacing four bitwise permutations and g-functions to two bitwise rotations and an addition operation. In addition, we prevent the area increase in the round logics by sharing the common datapath for encryption and decryption and by simplifying control logic.
The proposed HIGHT shows outstanding results in power and performance. The AES and the proposed HIGHT implementation have an average current consumption of 8.2 μA and 4.3 μA at the 100 kHz clock frequency, respectively. It notes that the current consumption of the proposed HIGHT design also outperforms twice than that of recently proposed low-resource hardware implementation of AES. In addition, the maximum throughput of the conventional HIGHT is 108 Mbps at the 57 MHz clock frequency. On the other hand, our implementation works flawlessly at 125 MHz clock, thereby the maximum performance of 235 Mbps is achieved. It is due to the shorten the critical path of the key scheduler. 
Conclusions
Portable RFID devices run under severe power limitation. This paper proposes an implementation of the ultra-light block cipher algorithm, HIGHT for a passive RFID tag. The proposed HIGHT design eliminates the redundant logics by sharing the datapath for encryption and decryption. Also, it gives more efficient key schedule design. The proposed design meets the mandatory requirements for RFID device such as power consumption, gate count, and throughput. Our design consumes the average power under 10.8 μW at 100 kHz clock frequency and 2.5 V supply. It has only 2,608 equivalent gates. In addition, it shows the maximum throughput of 235 Mbps. It can be applicable to other application such as multimedia data on a high speed network.
