I. INTRODUCTION
Low-complexity security algorithms and their hardware implementations have become essential for protecting information in WPANs (Wireless Personal Area Networks) and WBANs (Wireless Body Area Networks), which are used in a wide range of applications. The IEEE 802.15.4 standard, one of the best-known WPAN standards, provides security services based on AES-CCM (Advanced Encryption Standard in Counter with CBC-MAC mode). IEEE 802.15.6, the first international WBAN standard, also adopts AES-CCM to guarantee information security.
AES-CCM combines the CBC-MAC (Cipher Block Chaining Message Authentication Code) mode for MAC calculation with the CTR (Counter) mode for data encryption, both built on the AES (Advanced Encryption Standard) algorithm. Because the AES core is the dominant hardware element of an AES-CCM implementation, optimizing the AES core is key to reducing the complexity of the AES-CCM security engine.
In this paper, an area-optimized AES-CCM security engine is proposed for IEEE 802.15.4 / 802.15.6 systems, where only highly limited hardware resources are available for low-cost implementation. To lower the complexity of the AES core, which serves as the cryptographic primitive of AES-CCM, a folding technique is exploited to obtain a small gate count, and an S-box that requires no memory is presented. In addition, we reduce the complexity of the multi-standard AES-CCM security engine with a toggling method, in which a single AES core serves both operating modes of AES-CCM.
II. LOW-COMPLEXITY AES CORE
The AES algorithm, which is used in the AES-CCM security operation, is a symmetric-key block cipher. Interpreted over a Galois field, the AES algorithm consists of four sub-algorithms that are performed during encryption. In this section, we introduce the AES algorithm and present several methods for reducing its hardware complexity.
AES Algorithm
The AES algorithm is a symmetric block cipher that processes 128-bit data blocks arranged as a 4×4 matrix of bytes called the State. All bytes in the AES algorithm are regarded as elements of a Galois field. Galois field elements are added and multiplied using rules that differ from standard arithmetic [1]. The symbols "⊕" and "⊗" denote addition and multiplication in the Galois field, respectively. Galois field arithmetic allows the mathematical operations needed for encryption to be performed simply and efficiently. As shown in Fig. 1, the AES algorithm consists of four sub-algorithms: ShiftRows, SubBytes, MixColumns, and AddRoundKey. During encryption, every round except the final one applies all four sub-algorithms, while the final round omits MixColumns. The AES algorithm takes the initial key and performs a KeyExpansion to generate the RoundKey (Key_i) used in the i-th round. For the AES-128 used in IEEE 802.15.4, the number of rounds is denoted by Nr, where Nr = 10 [1].
In ShiftRows, the bytes in each row of the State are cyclically shifted by a different number of byte positions. SubBytes is a non-linear substitution that operates independently on each byte. MixColumns operates on the State column by column, mixing the bytes of each column together. In AddRoundKey, the State is added to the RoundKey, which is generated for each round by the KeyExpansion from the single InitialKey.
In the SubBytes step, each byte of the State is replaced by a new byte using a substitution table (S-box) constructed by composing two transformations. The S-box takes the multiplicative inverse in the Galois field GF(2^8) defined by the irreducible polynomial, followed by an affine transformation. SubBytes can be described in matrix form as Eq. (1), where M is an 8×8 binary matrix and C is the 8-bit vector {01100011}.
Most of the operations are implemented as chains of XORs, except for SubBytes, which performs a multiplicative inversion. SubBytes is the most complex operation and dominates both the hardware cost of the AES core and the performance of the AES algorithm. Therefore, optimizing SubBytes is critical to a low-complexity AES design.
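As an illustration of the Galois field arithmetic described above, the following minimal sketch implements addition and multiplication of bytes in GF(2^8). It assumes the field polynomial x^8 + x^4 + x^3 + x + 1 specified for AES in FIPS 197; the function names gf_add and gf_mul are chosen here for illustration and do not come from the paper.

```python
# Minimal sketch of GF(2^8) arithmetic, assuming the AES field polynomial
# x^8 + x^4 + x^3 + x + 1 (0x11B) from FIPS 197. Names are illustrative.
AES_POLY = 0x11B


def gf_add(a: int, b: int) -> int:
    """Addition in GF(2^8) is a bitwise XOR of the two bytes."""
    return a ^ b


def gf_mul(a: int, b: int) -> int:
    """Shift-and-add multiplication in GF(2^8), reduced modulo AES_POLY."""
    result = 0
    for _ in range(8):
        if b & 1:              # add the current multiple of a if this bit of b is set
            result ^= a
        b >>= 1
        a <<= 1                # multiply a by x
        if a & 0x100:          # reduce when the degree reaches 8
            a ^= AES_POLY
    return result


if __name__ == "__main__":
    print(hex(gf_add(0x57, 0x83)), hex(gf_mul(0x57, 0x83)))
```

Both operations need only XORs and shifts, which is why most of the AES data path maps onto small combinational logic.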
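The three linear sub-algorithms can be sketched in a few lines. This is a behavioral sketch only, assuming the column-major State layout and the MixColumns coefficients {02}, {03}, {01}, {01} of FIPS 197; the function names are illustrative and are not taken from the proposed design.

```python
# Behavioral sketch of ShiftRows, MixColumns (one column) and AddRoundKey.
# Assumes the FIPS 197 conventions: the State is 16 bytes stored column by
# column, so byte (row r, column c) sits at index r + 4*c.

def xtime(a: int) -> int:
    """Multiply a byte by x ({02}) in GF(2^8) with the AES polynomial."""
    a <<= 1
    return (a ^ 0x11B) if a & 0x100 else a


def shift_rows(state: list) -> list:
    """Row r is rotated left by r byte positions."""
    out = [0] * 16
    for r in range(4):
        for c in range(4):
            out[r + 4 * c] = state[r + 4 * ((c + r) % 4)]
    return out


def mix_column(col: list) -> list:
    """Mix one 4-byte column with the circulant matrix (02 03 01 01)."""
    a0, a1, a2, a3 = col
    return [
        xtime(a0) ^ (xtime(a1) ^ a1) ^ a2 ^ a3,
        a0 ^ xtime(a1) ^ (xtime(a2) ^ a2) ^ a3,
        a0 ^ a1 ^ xtime(a2) ^ (xtime(a3) ^ a3),
        (xtime(a0) ^ a0) ^ a1 ^ a2 ^ xtime(a3),
    ]


def add_round_key(state: list, round_key: list) -> list:
    """Galois field addition of State and RoundKey, i.e. a bytewise XOR."""
    return [s ^ k for s, k in zip(state, round_key)]
```

Every operation here reduces to XORs and one-bit shifts, in contrast to the multiplicative inversion required by SubBytes.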
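The two-step construction of the S-box can be checked with a short reference model. The sketch below assumes the AES polynomial x^8 + x^4 + x^3 + x + 1 and the affine transformation of FIPS 197 (whose constant is the vector {01100011} = 0x63 quoted above); it computes the inverse by brute-force exponentiation, which is convenient for a software check but is not how a hardware S-box would be built.

```python
# Reference model of the S-box: multiplicative inverse in GF(2^8) followed
# by the affine transformation. Assumes FIPS 197 conventions; helper names
# are illustrative.

def gf_mul(a: int, b: int) -> int:
    """Multiplication in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1."""
    result = 0
    for _ in range(8):
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
    return result


def gf_inv(a: int) -> int:
    """Inverse as a^254 (since a^255 = 1 for a != 0); 0 maps to 0."""
    result = 1
    for _ in range(254):
        result = gf_mul(result, a)
    return result


def affine(x: int) -> int:
    """Bit i of the output is x_i ^ x_(i+4) ^ x_(i+5) ^ x_(i+6) ^ x_(i+7) ^ c_i."""
    c = 0x63  # the constant vector C = {01100011}
    out = 0
    for i in range(8):
        bit = (x >> i) ^ (x >> ((i + 4) % 8)) ^ (x >> ((i + 5) % 8))
        bit ^= (x >> ((i + 6) % 8)) ^ (x >> ((i + 7) % 8)) ^ (c >> i)
        out |= (bit & 1) << i
    return out


def sbox(x: int) -> int:
    return affine(gf_inv(x))
```

A lookup-table S-box stores all 256 outputs of sbox, whereas an on-the-fly S-box evaluates the inverse and the affine step with logic.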
Proposed AES Core
To reduce hardware complexity, a folded architecture is adopted together with an S-box based on CFA (Composite Field Arithmetic). In resource-constrained systems, the traditional LUT-based S-box is unsuitable because it increases the total memory requirement, so we use an on-the-fly S-box based on CFA instead. The AES core is a strong candidate for folding, since the AES algorithm repeats the same four sub-algorithms in every round. Folding techniques, which reuse computational units, have long been exploited to reduce hardware complexity.
An S-box that can be implemented without memory was proposed in [2], and further optimization to minimize the hardware complexity of the S-box was proposed in [3]. The S-box in [4] is implemented in a composite field and shares circuitry between the S-box and the S-box^-1. These low-complexity S-boxes rely on CFA, which offers efficient implementations of operations such as multiplication and inversion. To exploit CFA, the elements of the original field GF(2^8) must be mapped to an isomorphic composite field. The field mapping is performed by an isomorphic mapping function determined by the field polynomials of GF(2^8) and its composite fields. The transformation matrix δ and its inverse δ^-1 used in the isomorphic mapping can be obtained by an exhaustive search algorithm [3]. The matrices δ and δ^-1 are shown in Eq. (2).
After isomorphic mapping, an element of the composite field can be written as s_h x + s_l, where s_h and s_l are elements of GF(2^4) and x is a root of the composite field polynomial. Using the extended Euclidean algorithm, the multiplicative inverse can be computed as in Eq. (3).
According to Eq. (3), the multiplicative inversion is composed of operations in the subfield GF(2^4), namely multiplication, addition, squaring, and multiplicative inversion, all of which can be implemented in fully combinational logic. Working in the subfield significantly reduces the hardware complexity of the S-box. Fig. 2(c) shows the low-complexity S-box using composite field arithmetic. The squarer, the multiplier, and the constant multiplier are illustrated by the x^2 block, the × block, and the ×λ block, respectively. The x^-1 block is the multiplicative inversion in GF(2^4), which is further decomposed into GF((2^2)^2).
AES is a symmetric encryption algorithm with a fixed block size of 128 bits. A typical block-wide AES structure, illustrated in Fig. 2(a), completes each round in a single clock cycle by using 16 S-boxes and 4 MixColumns units. Because the block-wide AES targets high performance, it instantiates many duplicated units simultaneously. Such block-wide structures require numerous hardware resources and are therefore not suitable for WPAN applications. To use resources efficiently, a folded AES structure that reuses hardware has been proposed in [2], as shown in Fig. 2(b). We exploit folding to reduce hardware complexity by reusing a single S-box and a single MixColumns unit; the overhead this introduces is low compared with the savings from the reduced number of sub-algorithm units. Because AES decryption is unnecessary in AES-CCM, we use an encryption-only AES core that excludes the decryption operation.
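The subfield decomposition of the inversion can be verified numerically. The sketch below is not the paper's Eq. (3), which depends on field polynomials and a constant λ that are not reproduced here; it assumes GF(2^4) built with x^4 + x + 1 and a composite field polynomial of the form x^2 + x + λ, with λ picked by search. Under those assumptions the inverse of s_h x + s_l is (s_h d^-1) x + (s_h + s_l) d^-1, where d = s_h^2 λ + s_h s_l + s_l^2, using only a squarer, multipliers, a constant multiplier by λ, and one GF(2^4) inversion.

```python
# Numerical check of composite-field inversion in GF((2^4)^2).
# Assumptions (not from the paper): GF(2^4) uses x^4 + x + 1, and the
# extension polynomial is x^2 + x + LAMBDA with LAMBDA found by search.

def gf16_mul(a: int, b: int) -> int:
    """Multiplication in GF(2^4) modulo x^4 + x + 1 (0x13)."""
    r = 0
    for _ in range(4):
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a & 0x10:
            a ^= 0x13
    return r


def gf16_inv(a: int) -> int:
    """Inverse in GF(2^4) as a^14; 0 maps to 0."""
    r = 1
    for _ in range(14):
        r = gf16_mul(r, a)
    return r


# Smallest LAMBDA for which x^2 + x + LAMBDA has no root in GF(2^4).
LAMBDA = next(l for l in range(1, 16)
              if all(gf16_mul(t, t) ^ t ^ l for t in range(16)))


def comp_mul(p: tuple, q: tuple) -> tuple:
    """(a x + b)(c x + d) reduced with x^2 = x + LAMBDA."""
    (a, b), (c, d) = p, q
    ac = gf16_mul(a, c)
    hi = ac ^ gf16_mul(a, d) ^ gf16_mul(b, c)
    lo = gf16_mul(ac, LAMBDA) ^ gf16_mul(b, d)
    return hi, lo


def comp_inv(sh: int, sl: int) -> tuple:
    """Inverse of sh*x + sl using only GF(2^4) operations."""
    d = gf16_mul(gf16_mul(sh, sh), LAMBDA) ^ gf16_mul(sh, sl) ^ gf16_mul(sl, sl)
    d_inv = gf16_inv(d)
    return gf16_mul(sh, d_inv), gf16_mul(sh ^ sl, d_inv)
```

Multiplying any nonzero element by the result of comp_inv returns the pair (0, 1), the composite-field representation of 1, which confirms that the 8-bit inversion collapses to a single 4-bit inversion plus a few 4-bit multiplications.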
III. AES-CCM

AES-CCM Algorithm
AES-CCM consists of the CBC-MAC mode for MAC generation and the CTR mode for data encryption. The CBC-MAC generates the MAC, which provides strong assurance of the authenticity of the overall payload, by applying cipher block chaining to the payload blocks. The CTR mode generates the ciphertext by encrypting the message and the MAC produced by the CBC-MAC.
In the CBC-MAC, the previous encrypted block is XORed with each successive plaintext block to form the block chain. Because each block depends on the previous block through this chain, the resulting MAC depends on the overall payload and ensures data integrity. In the CTR mode, each counter block, which consists of the nonce and a counter sequence, is encrypted by the AES encryption algorithm, and the resulting block is XORed with the message block to produce the ciphertext. Since AES-CCM decryption also uses only the AES encryption algorithm, the AES decryption process is not needed in AES-CCM. Therefore, we use encryption-only AES hardware to reduce the hardware resources.
The authentication and encryption processes of the AES-CCM algorithm are shown in Fig. 3. The IV (Initial Vector), the MSG (Message), and the AUTH (Authentication data) are formatted according to the building-block mechanisms in [5]. Both the MSG blocks and the AUTH blocks are used in the CBC-MAC to generate the MAC, whereas the CTR mode uses only the MSG blocks for encryption.
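The reason an encryption-only core suffices can be seen in a small behavioral model of the two modes. In the sketch below, toy_encrypt is a hypothetical stand-in for the AES core built from SHA-256 (it is not AES and is not secure), and the counter block layout is simplified to nonce followed by counter; both are assumptions made only so that the example runs with the standard library.

```python
# Behavioral sketch of CBC-MAC and CTR built on a forward-only block function.
# toy_encrypt is a stand-in for the AES encryption core (NOT AES, NOT secure).
import hashlib

BLOCK = 16  # block size in bytes (128 bits)


def toy_encrypt(key: bytes, block: bytes) -> bytes:
    """Hypothetical placeholder for one AES encryption of a 16-byte block."""
    return hashlib.sha256(key + block).digest()[:BLOCK]


def xor(a: bytes, b: bytes) -> bytes:
    """Bytewise XOR, truncated to the shorter operand."""
    return bytes(x ^ y for x, y in zip(a, b))


def cbc_mac(key: bytes, blocks: list) -> bytes:
    """Chain the blocks: each block is XORed with the previous output, then encrypted."""
    state = bytes(BLOCK)
    for blk in blocks:
        state = toy_encrypt(key, xor(state, blk))
    return state


def ctr_crypt(key: bytes, nonce: bytes, data: bytes) -> bytes:
    """Encrypt OR decrypt: XOR the data with encrypted counter blocks."""
    out = bytearray()
    for i in range(0, len(data), BLOCK):
        counter = (i // BLOCK + 1).to_bytes(BLOCK - len(nonce), "big")
        keystream = toy_encrypt(key, nonce + counter)  # simplified layout
        out += xor(data[i:i + BLOCK], keystream)
    return bytes(out)
```

Calling ctr_crypt twice with the same key and nonce returns the original message, so the receiver never needs the inverse cipher; cbc_mac likewise uses only the forward direction.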
Security Operations for IEEE 802.15.4 / 802.15.6 Standards
The IEEE 802.15.6 standard provides security services based on AES-CCM for message encryption and authentication. The block IV defined in [5] contains control information such as the flag, the MAC size, and the octet length, denoted as Q, of the message payload size field. In the IEEE 802.15.6 standard, the flag indicates whether the message payload is encrypted, the MAC size is four octets, denoted as MIC-32 (Message Integrity Code-32), and Q is two octets [5].
AES-CCM*, the security operation adopted by the IEEE 802.15.4 standard, is based on AES-CCM. In contrast to AES-CCM, AES-CCM* provides eight levels of security options. These options can be classified into four modes: unsecured, encryption only, authentication only, and encryption with authentication [6]. Moreover, these options offer varying levels of data authenticity, namely MIC-32, MIC-64, and MIC-128, the longer of which provide stronger data authenticity than the MIC-32 of the IEEE 802.15.6 standard.
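For orientation, the first CBC-MAC block can be sketched using the generic CCM formatting function of NIST SP 800-38C. Whether this matches the exact formatting referred to above is an assumption; in particular, the flags byte below encodes the presence of associated data, the MAC size, and Q, as in SP 800-38C, and the function name ccm_b0 is illustrative.

```python
# Sketch of the first CCM block (B0), assuming the generic formatting of
# NIST SP 800-38C: flags byte | nonce | message length in Q octets.

def ccm_b0(nonce: bytes, msg_len: int, mac_len: int, has_auth_data: bool) -> bytes:
    q = 15 - len(nonce)                    # octet length of the length field
    if not 2 <= q <= 8:
        raise ValueError("nonce must be 7 to 13 octets long")
    if mac_len not in (4, 6, 8, 10, 12, 14, 16):
        raise ValueError("unsupported MAC size for this sketch")
    flags = (0x40 if has_auth_data else 0x00)   # bit 6: associated data present
    flags |= ((mac_len - 2) // 2) << 3          # bits 5..3: encoded MAC size
    flags |= (q - 1)                            # bits 2..0: encoded Q
    return bytes([flags]) + nonce + msg_len.to_bytes(q, "big")


if __name__ == "__main__":
    # MIC-32 (4 octets) and Q = 2, the values quoted above for IEEE 802.15.6.
    print(ccm_b0(bytes(13), 20, 4, True).hex())
```

With a 13-octet nonce, Q is 2 and the block is exactly 16 octets, so it fits one AES block.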
IV. AREA-OPTIMIZED MULTI-STANDARD AES-CCM SECURITY ENGINE
There are several approaches to implementing an area-optimized AES-CCM security engine, which generates the MAC for data integrity and the ciphertext for confidentiality. The CBC-MAC and the CTR mode use the same AES core to encrypt the plaintext, so the structure of the AES-CCM security engine depends strongly on how the AES core is managed: by a sequential method, a parallel method, or a toggle method. These AES core management methods are shown in Fig. 4.
In the sequential method, the RST (response time) increases linearly with the overall payload size, which includes both the authentication data and the message, so the sequential method has a longer response time than the others [7]. In the parallel method, the security engine achieves high data throughput and a short response time by processing the CBC-MAC and the CTR mode at the same time, but an engine using the parallel method [8, 9] requires more hardware resources than one using the sequential method.
A security engine designed with the toggle method runs the CBC-MAC and the CTR mode in turns on a single AES core [10]. During the authentication phase, the security engine performs only the MAC calculation in the CBC-MAC. During the message phase, the security engine must run both the CBC-MAC and the CTR mode: each MSG data block is used for the MAC calculation in the CBC-MAC and is encrypted into ciphertext in the CTR mode, in alternation. The toggle method reduces hardware resources by using only one AES core while achieving the same response time as the parallel structure, as shown in Fig. 4.
The IEEE 802.15.4 and IEEE 802.15.6 standards, which target WPAN and WBAN respectively, provide low-cost wireless communication at moderate data rates. Since both WPAN and WBAN focus on low cost, a low-complexity design is a better solution than a high-data-rate design based on parallel processing. We therefore employ the toggle method, which provides a short response time and a low-cost design with only one AES core, for the area-optimized AES-CCM security engine.
Fig. 5 shows the structure of the area-optimized AES-CCM security engine implemented with the toggle method. The Block Generator and the Count Generator format the input data blocks, such as B_DATA, B_IV, and B_CTR, that are used in the CBC-MAC and the CTR mode. The PT (plaintext) selected according to the current state is encrypted into the encrypted data block D_EN in the AES core. D_EN is XORed with the data block B_DATA or the cipher block B_CIPH, and the result is stored in the Cipher Register or the MAC Register according to the AES-CCM operation mode. The Authentication Check unit provides the Valid (validation) signal by comparing the TAG with D_MAC in the verification state.
To lower the complexity of the security engine, we apply four techniques that reduce the hardware resource requirement. First, we exploit composite field arithmetic to reduce the hardware complexity of the S-box, which dominates the AES hardware cost. Second, the AES data path is restricted to a width of 8 bits by using folding to reuse the S-box and the MixColumns unit. Third, to eliminate unnecessary hardware, we use an encryption-only AES core that excludes the decryption operation. Finally, because the toggle method is adopted in the Cipher/MAC Generator, the security engine is implemented with only one AES encryption core. The proposed AES-CCM security engine can operate in various security modes: it performs authentication and encryption and constructs the MAC according to the selected mode. The security mode is selected through the IV (Initial Vector) to cover the various combinations of security options in the IEEE 802.15.4 and IEEE 802.15.6 standards. Optimized with these four techniques, our security engine is compatible with multiple standards by supporting their various security options.
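The toggle schedule can be modeled in software to show how one core serves both modes. This is a behavioral sketch, not the proposed hardware: SingleCore uses a SHA-256-based placeholder instead of AES, the encryption of the final MAC with a counter block is omitted, and all names are illustrative assumptions.

```python
# Behavioral model of the toggle method: one shared "AES core" alternates
# between a CBC-MAC step and a CTR step for every message block.
import hashlib

BLOCK = 16


class SingleCore:
    """Hypothetical stand-in for the single AES encryption core (NOT AES)."""

    def __init__(self, key: bytes):
        self.key = key
        self.calls = 0          # counts how often the shared core is used

    def encrypt(self, block: bytes) -> bytes:
        self.calls += 1
        return hashlib.sha256(self.key + block).digest()[:BLOCK]


def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def toggle_ccm(core, b0, auth_blocks, msg_blocks, ctr_blocks):
    """Return (mac, ciphertext blocks) using only `core` for every encryption."""
    mac = core.encrypt(b0)
    for blk in auth_blocks:                 # authentication phase: CBC-MAC only
        mac = core.encrypt(xor(mac, blk))
    cipher = []
    for blk, ctr in zip(msg_blocks, ctr_blocks):    # message phase: take turns
        mac = core.encrypt(xor(mac, blk))           # turn 1: CBC-MAC update
        cipher.append(xor(blk, core.encrypt(ctr)))  # turn 2: CTR encryption
    return mac, cipher
```

In this model the core is invoked once for the first block, once per authentication block, and twice per message block, which is the cost of sharing a single core between the two modes.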
V. IMPLEMENTATION RESULTS
The area-optimized AES-CCM security engine was described in Verilog HDL and synthesized with a 65 nm standard CMOS process. The proposed security engine runs at 6 MHz and 35 MHz for IEEE 802.15.4 and IEEE 802.15.6, respectively. It is designed to work in conjunction with an IEEE 802.15.4-compatible ZigBee modem based on O-QPSK modulation that runs at an operating frequency of 6 MHz. Tables 1 and 2 present the implementation results of the proposed designs and the conventional designs. Table 1 compares the proposed AES core with the one in [8]. In the proposed AES core, the total gate count is reduced by up to 88.1% in comparison with the 128-bit AES core in [8] by exploiting the 8-bit folded architecture and the CFA, which leads to the low-complexity S-box implementation. Table 2 compares the proposed security engine with the other designs published in [8, 9]. To summarize, [9] is a parallel structure using a 128-bit AES core; [8] is a parallel structure using an 8-bit AES core with a LUT-based S-box; and this work proposes an area-optimized AES-CCM security engine that adopts the toggle method with a low-complexity 8-bit AES encryption core.
As shown in Table 2, the gate count of the proposed security engine is reduced by up to 42.5% in comparison with [9], while the data rate of 250 kbps required by the IEEE 802.15.4 standard is still supported. Moreover, the proposed security engine can operate at a clock frequency of over 35 MHz, so the data rate of 10 Mbps required by the IEEE 802.15.6 standard is satisfied at an operating frequency of 35 MHz without additional hardware cost. Also, as shown in Fig. 5, the proposed security engine has a dedicated Authentication Check block that checks the correctness of D_MAC for the payload by comparing it with the TAG, whereas the previous works rely on a software-based authentication check, which places high performance requirements on the CPU / DSP core [8, 10]. Since WPAN and WBAN devices mainly focus on area-efficient implementation, this paper proposes an area-optimized AES-CCM security engine that is compatible with the security services of the IEEE 802.15.4 and IEEE 802.15.6 standards.
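The relation between clock frequency and required data rate can be made concrete with a rough cycle budget. The calculation below is a back-of-the-envelope sketch that assumes the engine must keep up with the data rate one 128-bit block at a time and ignores framing, key expansion, and control overhead; the cycle counts of the actual design are not stated here.

```python
# Rough clock-cycle budget per 128-bit block at the quoted operating points.
# Assumption: the engine must sustain the data rate block by block.

def cycles_per_block(clock_hz: int, data_rate_bps: int, block_bits: int = 128) -> int:
    """Clock cycles available while one block arrives at the given data rate."""
    return (clock_hz * block_bits) // data_rate_bps


if __name__ == "__main__":
    cases = [("IEEE 802.15.4", 6_000_000, 250_000),
             ("IEEE 802.15.6", 35_000_000, 10_000_000)]
    for name, clock_hz, rate_bps in cases:
        budget = cycles_per_block(clock_hz, rate_bps)
        # With one shared core, a message block needs two encryptions.
        print(name, budget, budget // 2)
```

Under these assumptions the budget is 3072 cycles per block at 6 MHz and 250 kbps, and 448 cycles per block at 35 MHz and 10 Mbps, that is, 1536 and 224 cycles per AES encryption when each message block requires two encryptions on the shared core.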
VI. CONCLUSIONS
In this paper, the design of an area-optimized AES-CCM security engine is described. The proposed security engine is compatible with the IEEE 802.15.4 and IEEE 802.15.6 standards for WPAN and WBAN. The AES-CCM security engine is organized around an 8-bit AES encryption core that exploits composite field arithmetic to reduce the hardware complexity of the S-box. We use an encryption-only AES core, omitting the AES decryption operation, which is unnecessary in AES-CCM. By exploiting the toggle method that operates the CBC-MAC mode and the CTR mode by
