Abstract. This paper proposes a compact design of the SMS4 S-box using combinational logic, suitable for implementation in area-constrained environments such as smart cards. The inversion algorithm of the proposed S-box is based on the composite field $GF(((2^2)^2)^2)$ using normal bases at all levels. We examine all possible normal basis combinations having trace equal to one at each subfield level. There are 16 such combinations, and we compare the S-box designs based on each case in terms of the number of logic gates required for implementation. The isomorphism mapping and inverse mapping bit matrices are optimized using a greedy algorithm. Compared with an SMS4 S-box design that uses the existing inversion algorithm based on polynomial basis, our best case reduces the gate count by 15% in XOR gates and 42% in AND gates.
Introduction
SMS4 is the mandatory block cipher standard for securing Wireless Local Area Network (WLAN) devices in China. The Office of State Commercial Cipher Administration of China (OSCCA) released the cipher description in January 2006 [8], and an English version of the document was published by Diffie and Ledin [9]. SMS4 is used in the WLAN Authentication and Privacy Infrastructure (WAPI) standard to provide data confidentiality. The Chinese WLAN industry widely uses WAPI, and it is supported in relevant products by many international corporations such as SONY. The efficiency of an SMS4 hardware implementation in terms of power consumption, area and throughput depends mainly on the implementation of its S-box. The S-box is the most computationally intensive structure of SMS4, as it involves nonlinear multiplicative inversion. The designers of SMS4 chose an S-box design similar to that of Rijndael, which employs an inversion-based mapping [14]. Implementing a circuit to find the multiplicative inverse in $GF(2^8)$ using the extended Euclidean algorithm or Fermat's theorem is complex and costly. Several architectures for $GF(2^8)$ inverters have been proposed over time for area-efficient implementation of S-boxes whose algebraic expressions involve inversion. An efficient way to implement the S-box is to use combinational logic, because it requires a small area. V. Rijmen [3] proposed the first hardware implementation of the AES S-box using composite field representation; the design suggested the use of an optimal normal basis for efficient inversion in $GF(2^8)$. J. Wolkerstorfer [1] and A. Rudra [5] implemented the AES S-box by representing $GF(2^8)$ as a quadratic extension of $GF(2^4)$ using polynomial basis.
In this approach, a byte in $GF(2^8)$ is first decomposed into a linear polynomial with coefficients in $GF(2^4)$, and the arithmetic operations in $GF(2^4)$ are computed using combinational logic. The inversion is then implemented in hardware with simple logic gates by further decomposing $GF(2^4)$ into $GF(2^2)$ operations. Satoh [6] and Mentens [7] further optimized the hardware implementation of the AES S-box by applying a composite field with multiple extensions of smaller degree. The tower field $GF(2^8) \rightarrow GF(((2^2)^2)^2)$ is constructed with repeated degree-2 extensions using polynomial basis. Canright [2] analyzed all possible combinations of normal and polynomial bases at the subfield levels of $GF(((2^2)^2)^2)$ and showed that using normal bases at all levels of the composite field decomposition further reduces the area of the AES S-box implementation. X. Bai [4] proposed a $GF(2^8)$ inversion algorithm for the SMS4 S-box based on a slight modification of the design in [1].
In this paper, a new combinational structure of the SMS4 S-box is proposed, with the inversion algorithm in the tower field representation $GF(2^8) \rightarrow GF(((2^2)^2)^2)$ based on normal basis. We analyze all possible combinations of normal bases with trace one at each level, starting from the field generated by the irreducible primitive polynomial of the SMS4 cipher. A comparison of our best case architecture with the S-box design based on the $GF(2^8)$ inverter proposed in [4] is also given.
The rest of the paper is organized as follows. The next section briefly describes the structure of the SMS4 block cipher, with the focus on its S-box. Section 3 explains the design of the S-box using the composite field representation with normal basis. Section 4 compares the combinational S-box designs of SMS4 obtained with different normal basis combinations at the subfield levels. Section 5 gives a comparative analysis between our proposed S-box design and the one based on the inversion algorithm presented in [4]. Conclusions and work in progress are stated in section 6.
The SMS4
The SMS4 block cipher is based on an iterative Feistel structure with input, output, and key size of 128 bits each. The data input is divided into four 32-bit words. The algorithm comprises 32 rounds, and in each round one word is modified by adding to it a keyed function of the other three words. The encryption and decryption processes have the same structure; only the key schedule is reversed. For a detailed description of the cipher one may refer to [9]. The official description of the SMS4 S-box is given as a lookup table (LUT) with 256 entries. The S-box is commonly implemented as a ROM lookup table in which the pre-computed values are stored. However, significant hardware resources are required if the lookup table is implemented with 16 × 16 entries. The SMS4 S-box is bijective and substitutes a byte input with a byte output using arithmetic computations over $GF(2^8)$. A method suitable for hardware implementation of the S-box is to first perform an affine transformation over $GF(2)$, then carry out inversion in $GF(2^8)$, followed by a second affine transformation over $GF(2)$ [13, 14]. The algebraic structure of the S-box is given by the following expression [13].
$$S(x) = A_2 \cdot (A_1 \cdot x + C_1)^{-1} + C_2 \qquad (1)$$
The row vectors are $C_1 = \text{0xCB} = (11001011)_2$ and $C_2 = \text{0xD3} = (11010011)_2$. The cyclic matrices $A_1$ and $A_2$ in the algebraic expression are as below:
The irreducible primitive polynomial in $GF(2^8)$ is

$$f(x) = x^8 + x^7 + x^6 + x^5 + x^4 + x^2 + 1. \qquad (3)$$
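The inversion step at the core of the S-box can be sketched in software as follows. This is a minimal illustration, not the hardware design of this paper: it assumes the SMS4 field polynomial is $x^8 + x^7 + x^6 + x^5 + x^4 + x^2 + 1$ (written as the constant `0x1F5`), and the function names `gf256_mul` and `gf256_inv` are hypothetical. It uses the exponentiation approach from Fermat's theorem mentioned in the introduction, which is the costly route that the composite field decomposition is meant to avoid.

```python
# Assumption: SMS4 field polynomial x^8 + x^7 + x^6 + x^5 + x^4 + x^2 + 1.
SMS4_POLY = 0x1F5


def gf256_mul(a: int, b: int, poly: int = SMS4_POLY) -> int:
    """Multiply two bytes as polynomials over GF(2), reduced modulo poly."""
    result = 0
    while b:
        if b & 1:
            result ^= a          # add the current shifted copy of a
        a <<= 1
        if a & 0x100:
            a ^= poly            # reduce as soon as the degree reaches 8
        b >>= 1
    return result


def gf256_inv(a: int) -> int:
    """Invert by exponentiation: a^(2^8 - 2) = a^254. Zero maps to zero."""
    result = 1
    power = a
    exponent = 254
    while exponent:
        if exponent & 1:
            result = gf256_mul(result, power)
        power = gf256_mul(power, power)
        exponent >>= 1
    return result
```

A software inverse of this kind is mainly useful as a reference model against which a gate-level composite field inverter can be checked.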
SMS4 S-box Design in Composite Field
In this section we describe the proposed SMS4 combinational structure based on the composite field $GF(((2^2)^2)^2)$ in normal basis, with the logical equations for inversion, multiplication, squaring and addition. An SMS4 S-box design in composite field arithmetic is more efficient than using ROM/RAM lookup tables (LUT) in area-constrained environments [4]. All finite fields of the same cardinality are isomorphic, but their arithmetic efficiency depends significantly on the choice of basis used to represent the field elements. For hardware implementation, a normal basis has a significant advantage over a polynomial basis, as mathematical operations in normal basis representation generally consist of rotation, shifting and XORing [11, 12].
$GF(2^8)$ Inversion Algorithm using Normal Basis
For an input byte $x$ to the SMS4 S-box, the inverse is computed for the expression $(A_1 \cdot x + C_1)$. The complexity of the basis conversion depends on the selected irreducible polynomials; if the polynomials are suitably chosen, the basis conversion is simple [7]. The irreducible polynomials and their corresponding normal basis representations are as follows.

$GF(2^8)/GF(2^4)$: $x^2 + \tau x + n$, normal basis $(X^{16}, X)$
$GF(2^4)/GF(2^2)$: $y^2 + T y + N$, normal basis $(Y^4, Y)$
$GF(2^2)/GF(2)$: $z^2 + z + 1$, normal basis $(Z^2, Z)$

To minimize the operations and simplify the inversion circuit in the composite field, we consider only those basis combinations which have $\tau = T = 1$. The nested structure of the $GF(2^8)$ inverter comprises different subfield operations. In the following sections, logical structures for inversion, multiplication and scaling in the composite field are given.
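Inversion in $GF(2^8)$, $GF(2^4)$ and $GF(2^2)$.
If $a \in GF(2^8)$ is represented in normal basis $(X^{16}, X)$ as $a = a_h X^{16} + a_l X$, $a_h, a_l \in GF(2^4)$, then with $\tau = 1$ the inverse of $a$ is expressed by the following relation, where $\otimes$ is multiplication and $\oplus$ is addition in $GF(2^4)$.

$$a^{-1} = \left[\left((a_h \otimes a_l) \oplus (n \otimes (a_h \oplus a_l)^2)\right)^{-1} \otimes a_l\right] X^{16} + \left[\left((a_h \otimes a_l) \oplus (n \otimes (a_h \oplus a_l)^2)\right)^{-1} \otimes a_h\right] X$$

The logical structure of the $GF(2^8)$ inverter is depicted in figure 1. Similarly, for $c = c_h Y^4 + c_l Y$, $c_h, c_l \in GF(2^2)$, with $T = 1$ and the operations taken in $GF(2^2)$:

$$c^{-1} = \left[\left((c_h \otimes c_l) \oplus (N \otimes (c_h \oplus c_l)^2)\right)^{-1} \otimes c_l\right] Y^4 + \left[\left((c_h \otimes c_l) \oplus (N \otimes (c_h \oplus c_l)^2)\right)^{-1} \otimes c_h\right] Y \qquad (8)$$

The $GF(2^4)$ inverter is depicted in figure 2. Inversion in $GF(2^2)$ is the same as squaring and is implemented without gates by swapping the bits. If $e \in GF(2^2)$ is represented in normal basis $(Z^2, Z)$ as $e = e_h Z^2 + e_l Z$, $e_h, e_l \in GF(2)$, and $f$ is the inverse of $e$ in $GF(2^2)$, then inversion in $GF(2^2)$ is:

$$f = e^{-1} = e^2 = e_l Z^2 + e_h Z$$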
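The form of the $GF(2^8)$ inverse in normal basis can be motivated by a short derivation. The following is a sketch under the assumptions $X^{16} + X = \tau = 1$ and $X^{16} \cdot X = n$, with $a = a_h X^{16} + a_l X$ and $a_h, a_l \in GF(2^4)$; plain $+$ and juxtaposition denote addition and multiplication in the field.

```latex
\begin{align*}
a^{16} &= a_h X^{256} + a_l X^{16} = a_l X^{16} + a_h X
  && (a_h^{16} = a_h,\ a_l^{16} = a_l,\ X^{256} = X) \\
a \cdot a^{16} &= (a_h^2 + a_l^2)\, X^{17} + a_h a_l\, (X^{32} + X^{2}) \\
  &= n\,(a_h + a_l)^2 + a_h a_l
  && (X^{17} = n,\ X^{32} + X^{2} = (X^{16} + X)^2 = 1) \\
a^{-1} &= (a \cdot a^{16})^{-1}\, a^{16}
  = \bigl[a_h a_l + n\,(a_h + a_l)^2\bigr]^{-1} \bigl(a_l X^{16} + a_h X\bigr)
\end{align*}
```

Because $a \cdot a^{16}$ lies in $GF(2^4)$, the 8-bit inversion reduces to one 4-bit inversion, three 4-bit multiplications, one square scaling and two additions. The same argument with $(Y^4, Y)$ and $N$ in place of $(X^{16}, X)$ and $n$ gives the $GF(2^4)$ inverter.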
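Multiplication in $GF(2^4)$ and $GF(2^2)$. The structures of the multipliers in $GF(2^4)$ and $GF(2^2)$ in normal basis are derived as below. For $c = c_h Y^4 + c_l Y$ and $d = d_h Y^4 + d_l Y$ in $GF(2^4)$, with $T = 1$:

$$c \otimes d = \left[(c_h \otimes d_h) \oplus \left(N \otimes (c_h \oplus c_l) \otimes (d_h \oplus d_l)\right)\right] Y^4 + \left[(c_l \otimes d_l) \oplus \left(N \otimes (c_h \oplus c_l) \otimes (d_h \oplus d_l)\right)\right] Y$$

For $e = e_h Z^2 + e_l Z$ and $f = f_h Z^2 + f_l Z$ in $GF(2^2)$:

$$e \otimes f = \left[(e_h \otimes f_h) \oplus \left((e_h \oplus e_l) \otimes (f_h \oplus f_l)\right)\right] Z^2 + \left[(e_l \otimes f_l) \oplus \left((e_h \oplus e_l) \otimes (f_h \oplus f_l)\right)\right] Z$$

In the $GF(2^2)$ multiplier, $\oplus$ represents bit addition (XOR) and $\otimes$ is the AND operation; the term $(e_h \oplus e_l) \otimes (f_h \oplus f_l)$ is common to both coefficients. The above structures are illustrated in figure 3 and figure 4 respectively.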
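The nested multipliers and inverters can be modelled in software to check that the hierarchy is consistent. The sketch below is illustrative only. It assumes $\tau = T = 1$, stores the coefficient of the first basis element in the high half of each word, and uses hypothetical norm constants `N_GF4` and `N_GF16` chosen only so that both extension polynomials are irreducible; they are not claimed to be the constants of any case in this paper. All function names are likewise hypothetical, and square scaling is written with generic multiplications rather than the optimized logic.

```python
# Hypothetical norm constants in tower representation (assumptions for this sketch).
N_GF4 = 0b01      # N = Z in GF(2^2), normal basis (Z^2, Z)
N_GF16 = 0b0001   # n in GF(2^4) with odd bit parity, i.e. trace one


def mul4(e: int, f: int) -> int:
    """GF(2^2) product in normal basis (Z^2, Z): 3 AND and 4 XOR gates."""
    eh, el = e >> 1, e & 1
    fh, fl = f >> 1, f & 1
    shared = (eh ^ el) & (fh ^ fl)            # term common to both output bits
    return (((eh & fh) ^ shared) << 1) | ((el & fl) ^ shared)


def inv4(e: int) -> int:
    """GF(2^2) inverse equals squaring, which swaps the two bits."""
    return ((e & 1) << 1) | (e >> 1)


def mul16(c: int, d: int) -> int:
    """GF(2^4) product in normal basis (Y^4, Y) over GF(2^2)."""
    ch, cl = c >> 2, c & 3
    dh, dl = d >> 2, d & 3
    shared = mul4(N_GF4, mul4(ch ^ cl, dh ^ dl))
    return ((mul4(ch, dh) ^ shared) << 2) | (mul4(cl, dl) ^ shared)


def inv16(c: int) -> int:
    """GF(2^4) inverse: three GF(2^2) products, one square scaling, one inversion."""
    ch, cl = c >> 2, c & 3
    s = ch ^ cl
    norm = mul4(ch, cl) ^ mul4(N_GF4, mul4(s, s))
    ninv = inv4(norm)
    return (mul4(ninv, cl) << 2) | mul4(ninv, ch)


def mul256(a: int, b: int) -> int:
    """GF(2^8) product in normal basis (X^16, X) over GF(2^4)."""
    ah, al = a >> 4, a & 15
    bh, bl = b >> 4, b & 15
    shared = mul16(N_GF16, mul16(ah ^ al, bh ^ bl))
    return ((mul16(ah, bh) ^ shared) << 4) | (mul16(al, bl) ^ shared)


def inv256(a: int) -> int:
    """GF(2^8) inverse: three GF(2^4) products, one square scaling, one inversion."""
    ah, al = a >> 4, a & 15
    s = ah ^ al
    norm = mul16(ah, al) ^ mul16(N_GF16, mul16(s, s))
    ninv = inv16(norm)
    return (mul16(ninv, al) << 4) | mul16(ninv, ah)
```

In this representation the multiplicative identity is the all-ones word (`0b11`, `0xF`, `0xFF`), since each conjugate pair sums to one, so a correct inverter satisfies `mul256(a, inv256(a)) == 0xFF` for every nonzero `a`.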
Scaling and Squaring in $GF(2^4)$ and $GF(2^2)$.
In the $GF(2^8)$ and $GF(2^4)$ inverters there are constant multiplication operations $(n \times a^2)$ and $(N \times c^2)$, and in the $GF(2^4)$ multiplier there is a constant multiplication term $(N \times c)$. Combining the squaring and scaling operations results in further optimization [2]. The computation of these terms depends on the values of $n$ in $GF(2^4)$ and $N$ in $GF(2^2)$ for the chosen normal basis. Since $N \in GF(2^2)$ and $N$ is neither zero nor one, $N$ and $N + 1$ are the roots of $z^2 + z + 1$. So, depending on the choice of basis, scaling by $N$ or $N^2$ amounts to scaling by $Z$ or $Z^2$. For a two-bit element $c = c_h Z^2 + c_l Z$, the factor $(N \times c)$ is given in two ways.

$$Z \times c = (c_h \oplus c_l) Z^2 + c_h Z, \qquad Z^2 \times c = c_l Z^2 + (c_h \oplus c_l) Z$$

Similarly, the square scaling two-bit factor $(N \times c^2)$ is represented in the following two ways, depending upon the choice of conjugate basis pair.

$$Z \times c^2 = (c_h \oplus c_l) Z^2 + c_l Z, \qquad Z^2 \times c^2 = c_h Z^2 + (c_h \oplus c_l) Z$$

The scaling operation $(n \times a^2)$ is a four-bit factor in the $GF(2^8)$ inverter, and its computation in $GF(2^2)$ depends on the normal basis types and the relation between the norms $n$ and $N$, as in [2]. For computations in $GF(2^4)$, the tables in appendix 'B' are used.
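The two-bit scaling and square scaling operations each reduce to a single XOR gate plus wiring, which can be checked against a general $GF(2^2)$ multiplier. The sketch below assumes the normal basis $(Z^2, Z)$ with the high bit holding the $Z^2$ coefficient; the function names are hypothetical and chosen for illustration only.

```python
Z, Z2 = 0b01, 0b10   # the basis elements Z and Z^2 in (Z^2, Z) coordinates


def mul4(e: int, f: int) -> int:
    """General GF(2^2) product in normal basis (Z^2, Z), used as the reference."""
    eh, el = e >> 1, e & 1
    fh, fl = f >> 1, f & 1
    shared = (eh ^ el) & (fh ^ fl)
    return (((eh & fh) ^ shared) << 1) | ((el & fl) ^ shared)


def scale_by_z(c: int) -> int:
    """Z * c = (c_h ^ c_l) Z^2 + c_h Z."""
    ch, cl = c >> 1, c & 1
    return ((ch ^ cl) << 1) | ch


def scale_by_z2(c: int) -> int:
    """Z^2 * c = c_l Z^2 + (c_h ^ c_l) Z."""
    ch, cl = c >> 1, c & 1
    return (cl << 1) | (ch ^ cl)


def square_scale_by_z(c: int) -> int:
    """Z * c^2 = (c_h ^ c_l) Z^2 + c_l Z."""
    ch, cl = c >> 1, c & 1
    return ((ch ^ cl) << 1) | cl


def square_scale_by_z2(c: int) -> int:
    """Z^2 * c^2 = c_h Z^2 + (c_h ^ c_l) Z."""
    ch, cl = c >> 1, c & 1
    return (ch << 1) | (ch ^ cl)
```

Merging the squaring into the scaling costs nothing extra here, because squaring in normal basis is only a swap of the two bits.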
Generating Isomorphic and Inverse Mapping Functions
The standard SMS4 form is defined by an 8-bit vector of coefficients of the powers of $x$, where $x$ is a root of the irreducible primitive polynomial in (3). Multiplicative inversion in the composite field is computed after a byte in $GF(2^8)$ is mapped to its composite field representation using the isomorphism function $\delta$ [6]. After the multiplicative inverse is computed in the composite field, the 8-bit result is mapped back to the equivalent standard representation in $GF(2^8)$ using the inverse isomorphism function $\delta^{-1}$. The isomorphism and its inverse are one-to-one and onto mappings, and each is represented as an 8 × 8 matrix [10]. If a byte $s$ is given in the standard polynomial basis, then it can be represented as a quadratic extension $s = a_h X^{16} + a_l X$, $a_h, a_l \in GF(2^4)$, where each 4-bit coefficient is represented as $c = c_h Y^4 + c_l Y$, $c_h, c_l \in GF(2^2)$, each of which is in turn represented as a pair of bits $e = e_h Z^2 + e_l Z$ in $GF(2^2)/GF(2)$. If the new byte is given as $t_7 t_6 t_5 t_4 t_3 t_2 t_1 t_0$, then we have the following expression [2].

$$s = t_7 Z^2 Y^4 X^{16} + t_6 Z Y^4 X^{16} + t_5 Z^2 Y X^{16} + t_4 Z Y X^{16} + t_3 Z^2 Y^4 X + t_2 Z Y^4 X + t_1 Z^2 Y X + t_0 Z Y X$$
The values of $X$, $Y$ and $Z$ are substituted from the chosen conjugate bases, and the resulting 8 hexadecimal values multiplying the coefficients $t_i$ form the columns of the 8 × 8 reverse base transformation matrix $\delta^{-1}$. The inverse of this matrix, $\delta$, is used for changing from the standard basis to the corresponding composite field representation [2]. The inverse mapping matrix $\delta^{-1}$ is combined with the affine transformation matrix $A_2$ for further optimization, as in [6]. The block diagram of the SMS4 S-box is given in figure 5.
Fig. 5. SMS4 S-box Block Diagram
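The construction of the mapping matrices can be sketched as follows. The sketch assumes that the SMS4 field polynomial is $x^8 + x^7 + x^6 + x^5 + x^4 + x^2 + 1$ (`0x1F5`) and that the conjugate pairs quoted for the most compact case in the comparative analysis are ordered as $(X^{16}, X)$, $(Y^4, Y)$ and $(Z^2, Z)$. The names `basis_columns`, `tower_to_standard`, `DELTA_INV` and `DELTA` are hypothetical, and the matrices are stored as lookup tables rather than bit matrices for brevity.

```python
SMS4_POLY = 0x1F5  # assumed field polynomial of the standard representation


def gf256_mul(a: int, b: int, poly: int = SMS4_POLY) -> int:
    """Multiply two bytes in the standard polynomial basis."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= poly
        b >>= 1
    return result


# Conjugate pairs quoted for the most compact case (ordering is an assumption).
X16, X = 0x94, 0x95
Y4, Y = 0x51, 0x50
Z2, Z = 0x5C, 0x5D


def basis_columns() -> list:
    """Standard-basis values of the eight products multiplying t7 down to t0."""
    cols = []
    for x in (X16, X):
        for y in (Y4, Y):
            for z in (Z2, Z):
                cols.append(gf256_mul(gf256_mul(z, y), x))
    return cols


def tower_to_standard(t: int, cols: list) -> int:
    """Apply delta^-1: XOR together the columns selected by the set bits of t."""
    s = 0
    for i, col in enumerate(cols):
        if (t >> (7 - i)) & 1:   # cols[0] belongs to t7, cols[7] to t0
            s ^= col
    return s


COLS = basis_columns()
DELTA_INV = [tower_to_standard(t, COLS) for t in range(256)]

# delta is the inverse mapping; here it is obtained by inverting the table.
DELTA = [0] * 256
for t, s in enumerate(DELTA_INV):
    DELTA[s] = t
```

Since each conjugate pair sums to one, the all-ones tower byte maps to the standard element 1, which gives a quick sanity check on the column ordering.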

Results
For the possible choices of norms in $GF(2^4)$ and $GF(2^2)$, along with the normal basis at each subfield level satisfying $\tau = T = 1$, we have 16 possible cases, as shown in appendix 'A'. The SMS4 S-box design based on each case is fully tested and simulated. The most compact case is the one which requires the least number of XOR gates for implementation. It can be observed from the results in table 1 that choosing a different normal basis combination results in a small difference in the number of XOR gates. These small differences are due to the different mapping matrices and slight differences in the inverter architectures. The matrix operations for mapping, inverse mapping and affine transformation are optimized using a greedy algorithm [10]. The greedy algorithm operates iteratively on these matrices, counting the occurrences of all possible repeating pairs in the outputs. The repeating pairs are pre-computed to reduce the number of XOR gates. Our best case S-box design (case 5, table 1) saves 35 XOR gates through application of the greedy algorithm. The $GF(2^8)$ inverter in normal basis comprises one $GF(2^4)$ inverter, three $GF(2^4)$ multipliers, one square scaling and two additions in $GF(2^4)$, as shown in figure 1. One $GF(2^4)$ inversion is computed using three multipliers, one inversion, one square scaling and two additions in $GF(2^2)$, as depicted in figure 2, and one $GF(2^4)$ multiplier comprises three multipliers, four additions and a scaling operation in $GF(2^2)$, as in figure 3. Thus the total number of logic gates in the hierarchical structure of the inverter for our best case S-box is 91 XOR and 36 AND. The structures of the multipliers in figure 3 and figure 4 show that each requires the sum of the high and low halves of each input factor. If the same factor is shared by two different multipliers, then the shared factor can save one subfield addition [2].
Thus, a four-bit common factor in one $GF(2^4)$ multiplier can save five XOR gates and a two-bit common factor in $GF(2^2)$
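The pair-sharing idea behind the greedy optimization of the bit matrices can be sketched as follows. This is a generic illustration of greedy common-pair extraction and is not claimed to reproduce the exact procedure of [10]; the function names and the tie-breaking rule (most frequent pair first, then lowest signal indices) are assumptions made for the sketch.

```python
from collections import Counter
from itertools import combinations


def naive_xors(matrix):
    """XOR gates needed when every output row is summed independently."""
    return sum(max(sum(row) - 1, 0) for row in matrix)


def greedy_share(matrix):
    """Repeatedly extract the input pair that occurs in the most output rows.

    Returns (gates, rows): gates is a list of (new_signal, a, b) shared XORs,
    rows holds the remaining signal sets still to be summed for each output.
    """
    n_inputs = len(matrix[0])
    rows = [{j for j, bit in enumerate(row) if bit} for row in matrix]
    gates = []
    next_id = n_inputs
    while True:
        pair_freq = Counter()
        for r in rows:
            pair_freq.update(combinations(sorted(r), 2))
        if not pair_freq:
            break
        (a, b), freq = max(
            pair_freq.items(),
            key=lambda kv: (kv[1], -kv[0][0], -kv[0][1]),
        )
        if freq < 2:
            break                      # no pair is shared by two rows any more
        for r in rows:
            if a in r and b in r:
                r -= {a, b}            # replace the pair by the shared signal
                r.add(next_id)
        gates.append((next_id, a, b))
        next_id += 1
    return gates, rows


def total_xors(gates, rows):
    """Shared XOR gates plus the XORs left in each output row."""
    return len(gates) + sum(max(len(r) - 1, 0) for r in rows)


# Small hypothetical 3 x 4 bit matrix (not one of the matrices of this paper).
EXAMPLE = [
    [1, 1, 1, 0],
    [1, 1, 0, 1],
    [0, 1, 1, 1],
]
```

For the small example matrix, the pair of inputs 0 and 1 occurs in two rows, so computing it once reduces the count from six XOR gates to five.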
Comparative Analysis
Our most compact SMS4 S-box comprises 134 XOR and 36 AND gates, with conjugate basis pairs (0x94, 0x95), (0x51, 0x50) and (0x5C, 0x5D) respectively. We compare our most compact S-box design (case 5) with the one based on the $GF(2^8)$ inversion algorithm proposed in [4], which uses polynomial basis. The subfield operations and the numbers of XOR and AND gates required to design the SMS4 S-box based on [4] are given in table 3. The matrix computations are optimized using the greedy algorithm as in [1].
Conclusion and Future Work
In this paper we have proposed an improved design for the SMS4 S-box based on combinational logic with a low gate count. The proposed algorithm for computing the SMS4 S-box function is based on the composite field $GF(((2^2)^2)^2)$. We have simulated all possible cases of subfield combinations arising from the choice of normal basis, and from these we have determined the best case. All the transformation matrices are optimized using a greedy algorithm. Our best case S-box design reduces the gate count by 15% in XOR gates and 42% in AND gates relative to the S-box based on the inversion algorithm of [4]. Our compact architecture of the SMS4 S-box can save a significant amount of chip area in ASIC implementations of SMS4, and it can be used in SMS4 integrated circuits with tight area or throughput requirements, for applications ranging from smart cards to high-speed processing units. Future work will concentrate on the ASIC implementation of the S-box, where our design can be further improved using logic gate optimizations that depend on the specific CMOS standard cell library.
References
