This letter proposes a highly efficient 8-round pipelined MISTY1 architecture for wireless networks. A novel methodology is adopted for implementing the MISTY1 substitution functions: the S9 and S7 LUTs (Look-Up Tables) are optimized to minimize silicon area. In addition, the key FI function module is designed to trigger the optimized S9 LUTs on both clock edges (double edge-triggering), which substantially reduces the pipeline requirements of the proposed hardware architecture. To reduce path delay, logic modifications are made in the FI and FO functions, yielding an efficient, high-speed MISTY1 implementation. Implementation on a Xilinx Virtex-7 xc7vx690t FPGA yielded a throughput of 16.3 Gbps with an area of 1265 CLB slices.
Introduction
MISTY1 is a NESSIE-approved 64-bit block cipher developed by Mitsubishi Electric [1]. Standardized by ISO/IEC, MISTY1 falls in a third security level, called "normal-legacy", designed for small data blocks of 64 bits or less, e.g. payments with 8-byte passwords. It is proven to be secure against linear and differential cryptanalysis, with a probability value of 2^-56. MISTY1 is therefore widely used in wireless sensor networks, mobile communications, online transactions and ATMs.
The design and optimization of cryptographic algorithms have been studied in detail with regard to application requirements for low area, high speed, or a trade-off between area and speed [2, 3, 4, 5, 6, 7, 8, 9, 10, 11]. For low-area designs, commonly adopted methods include reuse of logic blocks and s-box optimization [2, 3, 4, 5, 6, 7]. Area-efficient implementation techniques have been widely applied to the Feistel-like MISTY1 structure, using a single FI/FO function for embedded applications [2, 3, 4]. Compact MISTY1 designs, however, have low throughput and are therefore unsuitable for high-speed applications. In contrast to area-efficient design schemes, high-speed implementations of encryption algorithms including AES, KASUMI, CAMELLIA and MISTY1 use pipelined architectures in which the s-boxes are realized with RAMs, LUTs or combinational logic [7, 8, 9, 10, 11]. Non-optimized high-speed architectures that implement straightforward pipelines require a large area, which reduces efficiency [7, 11]. Our study therefore focuses on an efficient implementation of MISTY1 with the following salient features:
• Area optimization of S9 and S7 s-boxes.
• Efficient implementation of the FI function by employing a double edge-triggered technique in the data path of the S9 substitution function.
• Design and implementation of the MISTY1 architecture with logic modifications in the FI and FO functions for reduction in path delay.
Optimized S9 and S7 s-boxes
A comprehensive analysis of the MISTY1 s-boxes revealed that the algebraic expressions of S9 and S7 can be decomposed into a branched LUT structure, such that each expression y_i of S9 and S7 is formulated in terms of 5-bit, 4-bit or 3-bit input LUTs, as described in Table I. The output y_i is obtained by XORing the LUT outputs, as given by eq. (1).
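The branched-LUT idea can be illustrated with a minimal software sketch. The 9-input Boolean expression below is only a stand-in chosen for illustration, not an entry taken from Table I, and the grouping of product terms into one 5-input, one 4-input and one 3-input LUT is likewise an assumption. The helper names (`build_lut`, `lookup`, `y_branched`) are hypothetical. The sketch checks that XORing the three small LUT outputs reproduces the direct evaluation for every input.

```python
def bit(x, i):
    """Return bit i of integer x."""
    return (x >> i) & 1

def y_direct(x):
    """Stand-in 9-input expression (illustrative only, not from Table I)."""
    b = [bit(x, i) for i in range(9)]
    return ((b[0] & b[4]) ^ (b[0] & b[5]) ^ (b[1] & b[5]) ^ (b[1] & b[6])
            ^ (b[2] & b[6]) ^ (b[2] & b[7]) ^ (b[3] & b[7])
            ^ (b[3] & b[8]) ^ (b[4] & b[8]) ^ 1)

def build_lut(input_bits, func):
    """Precompute a truth table indexed by the packed values of input_bits."""
    table = []
    for packed in range(1 << len(input_bits)):
        vals = {pos: (packed >> k) & 1 for k, pos in enumerate(input_bits)}
        table.append(func(vals))
    return table

def lookup(table, input_bits, x):
    """Pack the selected bits of x into an index and read the table."""
    packed = 0
    for k, pos in enumerate(input_bits):
        packed |= bit(x, pos) << k
    return table[packed]

# Assumed grouping of the product terms into 5-, 4- and 3-input LUTs.
BITS_A = (0, 1, 4, 5, 6)
BITS_B = (2, 3, 6, 7)
BITS_C = (3, 4, 8)
LUT_A = build_lut(BITS_A, lambda v: (v[0] & v[4]) ^ (v[0] & v[5])
                                    ^ (v[1] & v[5]) ^ (v[1] & v[6]))
LUT_B = build_lut(BITS_B, lambda v: (v[2] & v[6]) ^ (v[2] & v[7]) ^ (v[3] & v[7]))
LUT_C = build_lut(BITS_C, lambda v: (v[3] & v[8]) ^ (v[4] & v[8]) ^ 1)

def y_branched(x):
    """Output bit as the XOR of the three small LUT outputs."""
    return (lookup(LUT_A, BITS_A, x) ^ lookup(LUT_B, BITS_B, x)
            ^ lookup(LUT_C, BITS_C, x))

# Exhaustive check over all 2^9 inputs.
MATCHES = all(y_direct(x) == y_branched(x) for x in range(512))
```

In this toy case the three tables hold 32 + 16 + 8 = 56 entries, against 512 for a single 9-input table. The savings reported for the actual design depend on the real decomposition and on how the FPGA tools map it.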
The primary advantage of transforming the S9/S7 mathematical expressions into 3 LUTs is that it does not affect the path delay of the FI function (described in detail in Section 3). Moreover, by limiting the depth to LUTs with at most 5 input bits, the hardware area is reduced considerably. Table II shows an area reduction of 48.39% with the proposed LUTs compared with the 9-bit and 7-bit LUTs for S9 and S7, respectively, mentioned in the MISTY1 specification [1].
[...] Fig. 1a and 1b respectively. The proposed FI function is executed in a single clock cycle, triggering the adjacent (upper and lower) S9 LUTs on the positive and negative clock edges respectively. This methodology differs from earlier implementations, which consist of only positive-edge-triggered pipelines and therefore require multiple clock cycles [7, 11]. Furthermore, the KI_ijR XOR is performed after zero extension (Z) for path delay reduction [3]. To maintain logic equivalence, the KI_ijR XOR is also performed on the rightmost 7 bits after execution of the S7 LUTs. The symbol 'T' in Figs. 1a and 1b denotes truncation of the 2 MSBs. Thus, the optimized S9/S7 LUTs in combination with the KI_ijR XOR modification result in an efficient FI function implementation. The path delay of the FI function can be expressed as eq. (2).
A throughput of 16.3 Gbps was obtained with 1265 CLB slices, giving an efficiency of 12.9 Mbps/slice. These results are the outcome of the decomposed s-boxes and the FI/FO function optimizations with fine placement of pipelines. We also compared our circuit design with AES, KASUMI and CAMELLIA implementations and found that our design has the second-highest efficiency. Furthermore, we implemented ref. [7] (i.e. the MISTY1 architecture claimed as the most efficient) on our FPGA device under the same environment for a fair comparison.
The resulting area, throughput and efficiency were 2920 CLB slices, 21.9 Gbps and 7.5 Mbps/slice respectively, which shows our proposed design to be highly efficient and the third-fastest MISTY1 architecture to date.
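The logic equivalence claimed for the relocated KI_ijR XOR in the FI function can be cross-checked in software. The sketch below is one possible reading of that modification, not the authors' RTL. `fi_reference` follows the FI structure of the MISTY1 specification, while `fi_modified` folds the 9-bit KI_ijR into the zero-extended 7-bit branch before the first S9 XOR and compensates with an extra XOR on the 7-bit branch after S7. The S9/S7 tables here are random stand-in permutations rather than the real MISTY1 tables, since the equivalence does not depend on their contents. All function names are hypothetical.

```python
import random

rng = random.Random(1)
# Stand-in substitution tables: random permutations, NOT the real S9/S7 contents.
S9 = list(range(512))
rng.shuffle(S9)
S7 = list(range(128))
rng.shuffle(S7)

def fi_reference(data16, ki16):
    """FI as in the MISTY1 specification: key XOR applied after the S7 stage."""
    d9, d7 = data16 >> 7, data16 & 0x7F
    ki_l, ki_r = ki16 >> 9, ki16 & 0x1FF   # 7-bit left part, 9-bit right part
    d9 = S9[d9] ^ d7                       # d7 is implicitly zero-extended
    d7 = S7[d7] ^ (d9 & 0x7F)              # truncate d9 to 7 bits
    d7 ^= ki_l
    d9 ^= ki_r
    d9 = S9[d9] ^ d7
    return (d7 << 9) | d9

def fi_modified(data16, ki16):
    """Assumed modified FI: KI_ijR XORed right after zero extension."""
    d9, d7 = data16 >> 7, data16 & 0x7F
    ki_l, ki_r = ki16 >> 9, ki16 & 0x1FF
    ext = d7 ^ ki_r                        # computable in parallel with the S9 lookup
    d9 = S9[d9] ^ ext                      # d9 already carries KI_ijR
    d7 = S7[d7] ^ (d9 & 0x7F)              # truncation drags in the low 7 key bits
    d7 ^= ki_r & 0x7F                      # compensating XOR restores equivalence
    d7 ^= ki_l
    d9 = S9[d9] ^ d7
    return (d7 << 9) | d9

# Randomized equivalence check over 16-bit data and 16-bit subkeys.
EQUIVALENT = all(
    fi_reference(d, k) == fi_modified(d, k)
    for d, k in ((rng.randrange(1 << 16), rng.randrange(1 << 16))
                 for _ in range(10000))
)
```

Under this reading, the key XOR leaves the path between the two S9 lookups, and the price is one additional 7-bit XOR in the S7 branch. Whether this matches the exact gate placement in Fig. 1b is an assumption.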
Conclusion
This letter presents an efficient pipelined MISTY1 architecture. A double edge-triggered methodology employing optimized LUTs for S9/S7 enables single-clock-cycle operation of the FI function, thereby reducing area. The logic modifications for path delay reduction result in a high-throughput MISTY1 implementation. The resulting highly efficient MISTY1 architecture is well suited to wireless networks and mobile computing.
