Abstract: Developing a reconfigurable transceiver that supports multiple protocols seamlessly and efficiently is a challenging task. Wireless standards such as wireless local area network (IEEE 802.11a/g) and WiMAX (IEEE 802.16e) use block interleaving to counter burst errors during transmission. Field programmable gate array (FPGA) implementation of the floor and modulus (MOD) functions needed for the two-step permutation that yields the new index is quite complex. In this study, the authors propose a low-complexity, area-efficient reconfigurable architecture for a multimode interleaver address generator that supports multiple wireless standards. In addition, novel MOD_row and MOD_column circuits are proposed to compute the MOD function for the row and column counter values, respectively. The proposed address generation circuitry supports BPSK, QPSK, 16-QAM and 64-QAM modulation schemes under all possible code rates. The reconfigurable address generator for various block sizes and modulation schemes is implemented on a Xilinx Spartan XC3S400 FPGA and its functionality is verified through simulation. The synthesis results of the proposed design show a 60% reduction in resource utilisation and a 46% improvement in operating frequency over the existing approaches.
Introduction
Multistandard radios use a distinct transceiver block for each wireless standard to process the corresponding communication signals. This increases the silicon cost and power consumption of battery-operated wireless devices. To overcome this problem, the concept of software defined radio (SDR) has been introduced [1]. The main aim of SDR is to provide a single platform, consisting of a hardware layer and a number of software layers, that incorporates a set of radios from different communication standards. The reconfigurable nature of SDR enables improvements in the physical layer (PHY) of existing and upcoming wireless systems. Mitola [2] introduced the concept of cognitive radio (CR) in his dissertation. SDR based CRs are fully programmable wireless devices that can sense their environment and dynamically adapt their transmission waveform, channels and spectrum reuse for dynamic spectrum access [1]. The wideband input signal of a typical CR contains coexisting radio channels based on different wireless communication standards. With the rapid growth of broadband wireless access, designing wireless devices that operate at high data rates has become more challenging than for wired last mile access technologies [3]. IEEE has developed the standards IEEE 802.11a/g, popularly known as wireless local area network (WLAN) [4, 5], and IEEE 802.16e, known as mobile WiMAX [6]. Among the PHY layer subsystems, the forward error correction (FEC) block is the one that consumes more silicon, owing to the addressing/permutation tables used in conventional approaches for the interleaver and deinterleaver. To support the rapid growth in wireless technology, the interleaver address generator hardware needs to adapt to the different interleaving standards. Hence, there is scope for researchers to develop a reconfigurable address generator architecture that supports multiple radio standards and reduces the silicon cost.
The channel interleaver is a mandatory block in the PHY of 802.11a/g and 802.16e based transmitters. A channel interleaver for multistandard SDR has to support multiple interleaving functions. Interleaving plays a vital role in improving the performance of FEC codes in terms of bit error rate. The basic function of the interleaver block is to spread the encoded data into a randomised order. In the conventional approach, the received data is stored row-wise in the memory and read column-wise after applying certain permutations. To achieve more flexibility in the multimode interleaver, an address generator has to be developed using specialised blocks with reconfigurable capabilities.
Upadhyaya et al. [7] implemented the conventional interleaver on a field programmable gate array (FPGA) with a reduction in area. In [8], an address generating circuit for the 802.11 interleaver based on the conventional look-up table (LUT) method is presented. In [9], a simple technique is developed to implement the one-dimensional interleaver equation in matrix (2D) form. Khater et al. [10] carried out the hardware implementation of the address generator for the 1/2 code rate WiMAX channel interleaver. Chang [11] implemented an efficient dual-mode deinterleaver serving as both the IEEE 802.16e block deinterleaver and the outer deinterleaver for digital video broadcasting standards. Yu et al. [12] presented a high-speed block interleaver/deinterleaver that provides arbitrary column-wise permutation by adopting a low-power first in first out bank structure and a simple finite state machine (FSM). In [13], the authors derived a 2D translation of the functions for the WiMAX channel deinterleaver, which is more complex for 64-QAM. Upadhyaya et al. tested an FSM based address generating circuit on FPGA for the WiMAX multimode interleaver/deinterleaver in [14, 15] and for the WLAN multimode interleaver in [16], covering all permissible code rates and modulation schemes. Asghar et al. [17] proposed a twofold interleaver architecture for different spatial stream applications in 802.11n. Zhang et al. [18] presented a low-complexity interleaver/deinterleaver architecture suitable for MIMO applications in 802.11a/g/n wireless LAN. Upadhyaya et al. [19] designed and implemented separate hardware for QPSK, 16-QAM and 64-QAM address generators on FPGA for the WiMAX channel deinterleaver.
IET Computers & Digital Techniques

Research Article
In this paper, we propose a reconfigurable architecture for a multimode interleaver address generator suitable for multistandard SDR, obtained by exploiting the hardware that is redundant between different modulation schemes. In addition, we propose novel MOD_row and MOD_column circuits to compute the MOD function for the row and column counter values. Our work differs from the approach in [19] in the style of implementation, which eliminates both the redundant hardware resources and the ROM required to store the modulus (MOD) values. Moreover, we derive the mathematical expression of the address generator for the different modulation schemes and the Boolean expressions for the MOD_column and MOD_row circuits. A detailed analysis of the proposed architecture is performed and the results are compared with prevailing techniques, showing an improvement in logic utilisation and operating frequency.
The rest of the paper is organised as follows: Section 2 discusses the interleaving techniques in the WLAN/WiMAX wireless standards. In Section 3, a reconfigurable algorithm is proposed for multimode interleaver address generation. Section 4 explains the implementation of the reconfigurable architecture for the multimode interleaver. The performance analysis of the proposed architecture is presented in Section 5. Finally, the paper is concluded in Section 6.
Interleaving in WLAN/WiMAX standards
The wireless LAN standards 802.11a [4] and 802.11g [5] and the WiMAX standard 802.16e [6] support conventional block interleaving schemes. Channel interleaving is the process of rearranging code symbols to reduce the effect of burst errors. It processes one block of encoded bits at a time, with the block size equal to one OFDM symbol and depending on the modulation scheme for a specific code rate.
The detailed view of the conventional channel interleaver structure is shown in Fig. 1. It consists of an address generator and two RAMs of the same size. Both memories are controlled by the sel signal, generated by the address generator, in such a way that while one memory block is being written, the other is read, and vice versa. When sel = 'HIGH', the input data stream is written into M1 at the corresponding write addresses, as WE is active. Simultaneously, the memory M2 outputs the interleaved data stream for the read addresses. The read/write operations of the memory blocks are carried out according to the interleaver depth. When the specified depth is reached, the state of the sel signal is changed to swap the read and write operations. Table 1 shows the different depths ($N_{cbps}$) of the channel interleaver for IEEE 802.11a/g and 802.16e needed to cover the various code rates and modulation schemes [4-6].
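The alternating use of the two memories can be illustrated with a small behavioural model. This is only a sketch under stated assumptions: the function name `pingpong_interleave` and the list `write_addr` are hypothetical, writes are assumed to use the permuted address while reads are linear (the opposite arrangement would work equally well), and the input length is assumed to be a whole number of blocks.

```python
def pingpong_interleave(bits, write_addr):
    """Behavioural model of two memories that swap roles every block.

    write_addr[n] is a (hypothetical) interleaver address for input index n;
    its length plays the role of the interleaver depth.
    """
    depth = len(write_addr)
    assert len(bits) % depth == 0, "sketch assumes whole blocks only"
    m1, m2 = [None] * depth, [None] * depth
    sel = 1                       # sel = 1: write M1 and read M2
    out = []
    for start in range(0, len(bits), depth):
        wr, rd = (m1, m2) if sel else (m2, m1)
        for n in range(depth):
            wr[write_addr[n]] = bits[start + n]   # write at permuted address
            if start > 0:                         # nothing to read on block 0
                out.append(rd[n])                 # read other memory in order
        sel ^= 1                  # depth reached: swap read/write roles
    if bits:
        out.extend(m2 if sel else m1)             # flush the last written block
    return out
```

With a toy depth of 3 and `write_addr = [2, 0, 1]`, the input `a b c d e f` leaves the model as `b c a e f d`: each block is permuted independently, one block behind the input.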
Proposed reconfigurable algorithm for multimode interleaver address generator
This section describes the mathematical background of the algorithm for the proposed multimode interleaver address generator. Channel interleaving in WiMAX/WLAN is expressed by a set of equations that perform a two-step permutation, as discussed in [9]:
$$m_n = \left(\frac{N_{cbps}}{d}\right)(n \,\%\, d) + \left\lfloor \frac{n}{d} \right\rfloor \tag{1}$$
$$k_n = s\left\lfloor \frac{m_n}{s} \right\rfloor + \left(m_n + N_{cbps} - \left\lfloor \frac{d\, m_n}{N_{cbps}} \right\rfloor\right) \%\, s \tag{2}$$
Equation (1) ensures the randomisation of the encoded bits, and (2) ensures that adjacent coded bits are mapped alternately onto less and more significant bits of the signal constellation. Here, $m_n$ and $k_n$ are the first and second level permutation outputs, respectively; n is the index of the coded bit in the unpermuted source block and varies from 0 to ($N_{cbps}$ − 1); s is the parameter defined as max($N_{bpsc}$/2, 1), where $N_{bpsc}$ is the number of coded bits per sub-carrier, that is, 1, 2, 4 and 6 for BPSK, QPSK, 16-QAM and 64-QAM, respectively [13]; d is the number of columns; and $N_{cbps}$ is the block size corresponding to the number of coded bits per allocated sub-channel. Table 2 shows the theoretical values of $N_{bpsc}$, s, $N_{cbps}$ and the number of rows in the interleaver memory for the different signal constellations [16]. The modulo and floor functions are denoted by % and ⌊⌋, respectively.
To generate the permutation values for all modulation schemes and code rates, a MATLAB program is developed using (1) and (2). As d is chosen as 16, the number of columns is fixed (= d) and the number of rows is given by $N_{cbps}$/d for all $N_{cbps}$. The general expression for the four modulation schemes chosen for discussion is derived by examining the correlation between the addresses, as shown in Table 3 [16].
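The two-step permutation is easy to check numerically. The sketch below is in Python rather than the MATLAB used by the authors, and it assumes that (1) and (2) take the standard IEEE 802.11a/802.16e form, which is restated in the comments. The function names are illustrative.

```python
def interleave_index(n, n_cbps, n_bpsc, d=16):
    """Return (m_n, k_n) for input bit index n, with 0 <= n < n_cbps.

    Assumed standard form of the two permutations:
      m_n = (N_cbps/d) * (n % d) + floor(n/d)
      k_n = s*floor(m_n/s) + (m_n + N_cbps - floor(d*m_n/N_cbps)) % s
    """
    s = max(n_bpsc // 2, 1)                  # s = max(N_bpsc/2, 1)
    m = (n_cbps // d) * (n % d) + n // d     # first permutation
    k = s * (m // s) + (m + n_cbps - (d * m) // n_cbps) % s  # second
    return m, k


def address_table(n_cbps, n_bpsc, d=16):
    """Interleaver addresses k_n for n = 0 .. n_cbps - 1."""
    return [interleave_index(n, n_cbps, n_bpsc, d)[1] for n in range(n_cbps)]
```

For the 802.11a block sizes, `address_table(192, 4)[:4]` gives `[0, 13, 24, 37]` (16-QAM) and `address_table(288, 6)[:4]` gives `[0, 20, 37, 54]` (64-QAM). In each case the full table is a permutation of 0 to $N_{cbps}$ − 1.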
BPSK/QPSK
For both BPSK and QPSK, the value of the parameter s is 1. Therefore, (2) reduces to $k_n = m_n + 0$.
Introducing a 2D array in which i increments when j expires, the ranges of i and j are defined as in [13]. The interleaver can now be realised as a 2D row-column matrix of size j × i. Therefore, (2) can be rewritten as
where $k_n$, j = 0, 1, …, ($N_{cbps}$/d) − 1 and i = 0, 1, …, (d − 1) represent the interleaver address, row and column values, respectively.
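The s = 1 case can be checked against the general two-step permutation. The sketch below assumes the standard form of (1) and (2), and further assumes that the column index i corresponds to n % d and the row index j to ⌊n/d⌋. Under those assumptions the address is simply ($N_{cbps}$/d)·i + j. This form is derived here and may differ in notation from the authors' expression.

```python
def k_direct(n, n_cbps, s, d=16):
    """Two-step permutation in its assumed standard form."""
    m = (n_cbps // d) * (n % d) + n // d
    return s * (m // s) + (m + n_cbps - (d * m) // n_cbps) % s


def k_2d_s1(i, j, n_cbps, d=16):
    """Address from column i (0..d-1) and row j (0..n_cbps/d - 1) for s = 1."""
    return (n_cbps // d) * i + j


def check_s1(n_cbps, d=16):
    """True if the 2D form matches the direct form for every (i, j)."""
    rows = n_cbps // d
    return all(
        k_2d_s1(i, j, n_cbps, d) == k_direct(d * j + i, n_cbps, 1, d)
        for j in range(rows)
        for i in range(d)
    )
```

For example, `check_s1(48)` and `check_s1(96)` both hold. Because the second permutation contributes nothing when s = 1, the addresses within one row are spaced by exactly $N_{cbps}$/d.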
16-QAM
The parameter 's' is 2 for 16-QAM as the number of coded bits per subcarrier is 4. Therefore, (1) and (2) can be rewritten as
Considering the 2D indices i and j, the 2D transformation of the interleaver for 16-QAM can be described as:
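One way to see the 16-QAM case is to derive it directly from the standard form of (1) and (2), rather than from the authors' own expressions. The derivation assumes i = n % d, j = ⌊n/d⌋ and an even number of rows, as with $N_{cbps}$ = 192. Under those assumptions the row value is used unchanged in even columns, and in odd columns it is incremented when j is even and decremented when j is odd. The function names below are illustrative.

```python
def k_direct(n, n_cbps, s, d=16):
    """Two-step permutation in its assumed standard form."""
    m = (n_cbps // d) * (n % d) + n // d
    return s * (m // s) + (m + n_cbps - (d * m) // n_cbps) % s


def k_16qam(i, j, rows):
    """16-QAM address without floor or general modulo (rows assumed even)."""
    if i % 2 == 0:          # even column: row value used directly
        row = j
    elif j % 2 == 0:        # odd column, even row: increment by 1
        row = j + 1
    else:                   # odd column, odd row: decrement by 1
        row = j - 1
    return rows * i + row


def check_16qam(n_cbps, d=16):
    rows = n_cbps // d
    return all(
        k_16qam(i, j, rows) == k_direct(d * j + i, n_cbps, 2, d)
        for j in range(rows)
        for i in range(d)
    )
```

Only the least significant bits of the two counters are needed to pick the offset, which is why a MOD2 value is sufficient for this modulation scheme.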
64-QAM
For 64-QAM transmission, the number of coded bits per sub-carrier is 6. Thus, using s = 3, (1) and (2) can be rewritten as
The 2D transformation of the interleaver for 64-QAM can be described as
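For 64-QAM the same reasoning gives a small offset table indexed by the column and row values modulo 3. The sketch below derives it from the standard form of (1) and (2), assuming i = n % d, j = ⌊n/d⌋ and a row count that is a multiple of 3, as with $N_{cbps}$ = 288. The table `OFFSET_64QAM` is a construct of this sketch and is not taken from the paper.

```python
def k_direct(n, n_cbps, s, d=16):
    """Two-step permutation in its assumed standard form."""
    m = (n_cbps // d) * (n % d) + n // d
    return s * (m // s) + (m + n_cbps - (d * m) // n_cbps) % s


# OFFSET_64QAM[i % 3][j % 3]: value added to the row counter.
OFFSET_64QAM = [
    [0, 0, 0],      # column % 3 == 0: row used directly
    [2, -1, -1],    # column % 3 == 1: +2, or -1
    [1, 1, -2],     # column % 3 == 2: +1, or -2
]


def k_64qam(i, j, rows):
    """64-QAM address (rows assumed to be a multiple of 3)."""
    return rows * i + j + OFFSET_64QAM[i % 3][j % 3]


def check_64qam(n_cbps, d=16):
    rows = n_cbps // d
    return all(
        k_64qam(i, j, rows) == k_direct(d * j + i, n_cbps, 3, d)
        for j in range(rows)
        for i in range(d)
    )
```

The three rows of the table correspond to three progressive address patterns, and the required offsets are limited to +1, +2, −1 and −2.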
Using (6), (9) and (12), a reconfigurable algorithm is developed to eliminate the need for the modulo and floor functions, as described in Algorithm 1 (see Fig. 2).
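A counter-based loop of this kind might look as follows. It is not a transcription of Algorithm 1 (Fig. 2). It is a sketch that assumes the standard form of (1) and (2), generates addresses in input order with the column counter as the inner loop, and uses running residue counters as stand-ins for the MOD_row and MOD_column circuits. The offset tables are derived from the permutation equations, and `address_stream` is a hypothetical name.

```python
# Row offsets indexed as OFFSETS[s][column residue][row residue].
OFFSETS = {
    1: [[0]],                                   # BPSK / QPSK
    2: [[0, 0], [1, -1]],                       # 16-QAM
    3: [[0, 0, 0], [2, -1, -1], [1, 1, -2]],    # 64-QAM
}


def address_stream(n_cbps, n_bpsc, d=16):
    """Yield interleaver addresses using counters, a multiply and an add."""
    rows = n_cbps // d                  # configuration constant per block size
    s = max(n_bpsc // 2, 1)
    table = OFFSETS[s]
    row = col = 0
    row_mod = col_mod = 0               # residues of row and col modulo s
    for _ in range(n_cbps):
        yield col * rows + row + table[col_mod][row_mod]
        col += 1
        col_mod = col_mod + 1 if col_mod + 1 < s else 0
        if col == d:                    # column counter wraps at d
            col = col_mod = 0
            row += 1
            row_mod = row_mod + 1 if row_mod + 1 < s else 0
```

`list(address_stream(192, 4))[:4]` gives `[0, 13, 24, 37]`, the same values obtained by evaluating the permutation equations directly. Inside the loop, only comparisons, increments, one multiplication and one addition are used.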
Proposed reconfigurable address generator architecture
In this section, the architectural details of the proposed reconfigurable multimode interleaver address generator are presented. The architecture for all modulation schemes is shown in Fig. 3. It comprises a row counter, a column counter, a selection unit, a multiplier and an adder. The output of the column counter is compared with d to generate a fixed count between 0 and d − 1. The row counter, in contrast, is variable: it counts between 0 and ($N_{cbps}$/d) − 1 depending on the block size selected by multiplexers (MUXs) M6 and M7. From (6), the hardware required to implement BPSK and QPSK is quite similar, and the addresses are equally spaced, as shown in Table 4. The addresses of 16-QAM and 64-QAM, however, are not equally spaced. Therefore, the address generator has to produce two progressive patterns for 16-QAM and three for 64-QAM. Thus, the design procedure adopted for QPSK can be extended to 16-QAM and 64-QAM with additional components such as an incrementer, a decrementer and separate MOD circuits for the row and column counters. The structure shown inside the dashed line in Fig. 3 represents the selection unit for the BPSK, QPSK, 16-QAM and 64-QAM modulation schemes. Each modulation scheme and its corresponding block size are encoded with specific binary values, as tabulated in Table 4. From Table 4, the modulation schemes can be categorised into two groups (BPSK and QPSK; 16-QAM and 64-QAM) based on the address spacing in the increment values. From a hardware perspective, a simple EX-OR gate is enough to distinguish the two groups. The MUX M1 selects the row counter output directly if the EX-OR of mod_type is '0'; otherwise it selects the incremented/decremented value of the row counter.
Likewise, if the MOD_column value is '0', the row counter output is selected directly in M5 for all modulation schemes, as given in (6), (9) and (12). If the MOD_column value is '1', the MUX M3 selects the decrement-by-1 logic, else the output of MUX M2. M2 in turn generates two different outputs based on mod_type[0]: an increment by 1 for 16-QAM and an increment by 2 for 64-QAM. For 64-QAM, the MOD_column value can also reach 2. In this case, the MUX M4 selects the row counter output incremented by 1 if MOD_row[1] = '0', else decremented by 2. In our design, the size of the MUXs is reduced compared with the logic in [19] by using a special kind of select signal. Moreover, the control signals are not supplied externally but are generated internally from the encoded binary values.
Working of MOD_column
As the number of columns (d) is chosen as 16, the column counter value ranges from 0 to (d − 1), that is, up to a maximum of 15. Hence, a 4-bit MOD_column circuit, shown in Fig. 4, is proposed to compute the MOD value of the column counter, with the 4-bit input $C_3$, $C_2$, $C_1$, $C_0$ ordered from MSB to LSB. The final MOD2 or MOD3 value for the column counter is obtained by performing the successive steps shown in Table 5.
Logical expression for MOD_column:
In general, a 2-bit ripple carry adder takes two 2-bit input numbers ($B_1$, $B_0$) and ($A_1$, $A_0$), that is, 4 input bits in total, and produces a 2-bit sum ($Sum_1$, $Sum_0$) and a carry bit ($C_{out}$). The general Boolean expressions for the sum and carry of a 2-bit ripple carry adder are
The four input bits ($C_3$, $C_2$, $C_1$, $C_0$) are grouped into two 2-bit numbers. To obtain the Boolean expression of the sum ($s_1$, $s_0$) and
Similarly, the Boolean expression of the sum ($s_3$, $s_2$) of the second 2-bit adder in Fig. 4 can be obtained by substituting the values of (11), (12) and (13).
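The 2-bit ripple carry adder used in both stages can be written at gate level with the textbook half-adder and full-adder expressions. The sketch below models it with bitwise operators. The signal names follow the text, while the intermediate carry `c0` and the function name are introduced here.

```python
def ripple_add2(a1, a0, b1, b0):
    """Gate-level 2-bit ripple carry adder; returns (sum1, sum0, cout)."""
    sum0 = a0 ^ b0                          # half adder on the LSBs
    c0 = a0 & b0                            # carry into the MSB stage
    sum1 = a1 ^ b1 ^ c0                     # full adder on the MSBs
    cout = (a1 & b1) | (c0 & (a1 ^ b1))     # carry out of the MSB stage
    return sum1, sum0, cout
```

For example, `ripple_add2(1, 1, 0, 1)` adds 3 and 1 and returns `(0, 0, 1)`, that is, a sum of '00' with a carry of 1.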
From Table 6, it is noted that the sum value ($s_3$, $s_2$) coincides with the MOD3 (modulo 3) value of the 4-bit input, except when the sum value is 3 ('11'), where the required MOD3 value is 0 ('00'), that is, its complement. The key point is that if the sum value reaches 3, it has to be brought to 0. Hence, we introduce a MUX that selects either the direct value of the sum ($s_3$, $s_2$) or its complement ($\bar{s}_3$, $\bar{s}_2$), with the control signal (sel) generated by ANDing the sum bits $s_3$ and $s_2$.
Table 5 (worked example, partially recovered): in step 1, the input 12 = '1100' is grouped into two bits each, as '11' and '00'.
The output of the MUX is
Finally, the output of MOD_column is obtained by selecting either MOD2 ($C_0$) or MOD3 (Out1), with mod_type[0] as the selection signal.
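The MOD_column steps can be mimicked in software. The 4-bit input is split into two 2-bit groups, the groups are added, the carry is folded back in (valid because 4 ≡ 1 modulo 3), and a result of '11' is mapped to '00'. This is a behavioural sketch, not the circuit of Fig. 4. The flag `use_mod3` is a hypothetical stand-in for mod_type[0], whose encoding is not assumed here.

```python
def mod_column(c, use_mod3):
    """MOD2 or MOD3 of a 4-bit column count c (0..15)."""
    assert 0 <= c <= 15
    if not use_mod3:
        return c & 1                    # MOD2 is simply the LSB C0
    hi, lo = (c >> 2) & 3, c & 3        # groups (C3, C2) and (C1, C0)
    first = hi + lo                     # first 2-bit adder, range 0..6
    s10, carry = first & 3, first >> 2  # sum bits (s1, s0) and carry
    second = s10 + carry                # second adder, range 0..3
    s3, s2 = (second >> 1) & 1, second & 1
    sel = s3 & s2                       # '11' detected
    return (~second & 3) if sel else second   # complement '11' to '00'
```

For the input 12 = '1100', the groups are '11' and '00', the first sum is '11' with no carry, and the MUX replaces it with its complement '00', which is 12 modulo 3.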
Working of MOD_row
From Table 4, the maximum block size ($N_{cbps}$) for the IEEE 802.11a/g and IEEE 802.16e interleavers is 576. Therefore, the row counter counts up to a maximum value of ($N_{cbps}$/d) − 1 = 35, which is represented in binary with 6 bits as '100011'. Hence, a MOD_row circuit is proposed to compute the MOD function for a 6-bit input named $R_5$, $R_4$, $R_3$, $R_2$, $R_1$, $R_0$ from MSB to LSB, as shown in Fig. 5. The final MOD2 or MOD3 value for the row counter is obtained by performing the successive steps shown in Table 7. The Boolean expressions for the MOD_row circuit can be derived in the same manner as for MOD_column. The MUX selects either the direct value of the sum ($s_5$, $s_4$) or its complement ($\bar{s}_5$, $\bar{s}_4$), with the control signal (sel) generated by ANDing the sum bits $s_5$ and $s_4$, as given in Fig. 5. Finally, the output of MOD_row (Out) is obtained by selecting either the MOD2 value ($R_0$) or the MOD3 value (Out2), with mod_type[0] as the selection signal.
Therefore, the proposed MOD circuits for the row and column counters eliminate the need for the two ROMs of dimension 16 × 3-bit and 64 × 3-bit used in [19] to store the MOD values for 16-QAM and 64-QAM, respectively, which yields a substantial reduction in resources and an improvement in performance.
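A behavioural sketch of the 6-bit case follows the same idea with three 2-bit groups. The exact adder arrangement of Fig. 5 is not reproduced. The sketch assumes the groups are accumulated one after another with the carry folded back in, and `use_mod3` is again a hypothetical stand-in for mod_type[0].

```python
def add2_fold(a, b):
    """Add two 2-bit values and fold the carry back in (4 ≡ 1 modulo 3)."""
    t = a + b                       # range 0..6
    return (t & 3) + (t >> 2)       # range 0..3, congruent to t modulo 3


def mod_row(r, use_mod3):
    """MOD2 or MOD3 of a 6-bit row count r (0..63)."""
    assert 0 <= r <= 63
    if not use_mod3:
        return r & 1                            # MOD2 is the LSB R0
    g2, g1, g0 = (r >> 4) & 3, (r >> 2) & 3, r & 3
    acc = add2_fold(add2_fold(g2, g1), g0)      # sum bits (s5, s4)
    sel = (acc >> 1) & acc & 1                  # both sum bits set: '11'
    return (~acc & 3) if sel else acc           # complement '11' to '00'
```

For the maximum row count 35 = '100011', the groups are '10', '00' and '11'. The accumulated value is 2, which equals 35 modulo 3.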
Performance analysis of the proposed architecture
In this section, the performance of the proposed MOD function circuit and of the reconfigurable address generator is analysed and compared in detail with existing architectures.
MOD circuit analysis and synthesis results
Several implementation techniques for computing the modulo function with varying input and modulus bit widths on FPGA and ASIC platforms are discussed in [20-22]. In [20], Sivakumar et al. discuss the implementation of an X mod m architecture with MOD values of 3, 5, 6, 7, 9 and 10 in an ASIC chip using 3 μm CMOS technology. Since our proposed MOD circuit for address generation is implemented on FPGA, it is difficult to make a direct comparison with [20].
In [21], Butler et al. developed two methodologies for computing x mod z with z fixed as 3, where x is the n-bit input and z is the MOD value. The first approach is based on combinational circuits and the second uses a cascade of LUTs. For a fair comparison with [21], we also synthesised the proposed MOD circuit on an Altera Stratix IV EP4SE530F43C3NES FPGA using the Altera Quartus II tool with input bit widths of 8 and 16.
From Table 8, it is inferred that the proposed MOD circuit is much faster and consumes fewer resources than the first approach in [21]. It shows an average reduction of 65% in total registers and a 28% improvement in operating speed for input widths of 8 and 16 bits.
From the results, it is concluded that for higher input bit widths, the proposed circuit provides further reduction in resource utilisation and improvement in operating speed. Compared with the second design using LUTs, our proposed design also consumes fewer logic resources. From Table 9, it is observed that the proposed design achieves an average reduction of 44% in total registers. Since the LUT approach uses an LUT in each stage rather than combinational logic, that design is able to operate at high speed, at the cost of increased latency. The trade-off between complexity and latency is discussed in [21].
In [22], Gorodecky developed a circuit for the calculation of X mod P in which the Boolean expressions are represented in Reed-Muller XOR polynomial form. It uses XOR and AND operators for the computation. To compare the proposed MOD circuit with [22], we extended the logic behind our proposed MOD 3 circuit to develop a MOD 7 function with a 10-bit input, and synthesis was performed on both Virtex 7 XC7V285t and Spartan 3 XC3S1000 FPGAs. The results are tabulated in Table 10 and show better performance in terms of LUTs and propagation time.
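One plausible way to extend the MOD 3 idea to MOD 7 is to use 3-bit groups instead of 2-bit groups, since 8 ≡ 1 modulo 7. The sketch below makes that assumption. It is not taken from the paper, which does not detail its MOD 7 circuit, and the function name is illustrative.

```python
def mod7_fold(x):
    """x mod 7 for a 10-bit input, using 3-bit group addition only."""
    assert 0 <= x < 1024
    acc = 0
    for shift in (0, 3, 6, 9):          # four groups cover bits 0..9
        t = acc + ((x >> shift) & 7)    # 3-bit add, range 0..14
        acc = (t & 7) + (t >> 3)        # fold carry back in, range 0..7
    return 0 if acc == 7 else acc       # '111' maps to '000'
```

As in the MOD 3 circuit, the only correction needed at the end is to map the all-ones pattern to zero.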
Reconfigurable address generator analysis and results
The proposed reconfigurable address generator architecture is developed in Verilog HDL [23] and its functionality is verified using Xilinx ISE. The whole design is implemented on a Xilinx Spartan XC3S400 FPGA [24] device to allow a fair comparison of the proposed work with the existing approaches in [15, 19].
FPGA synthesis results:
In the FSM based method [15], a unique MUX is used for each block size across all modulation schemes, which results in an increase in hardware and a decrease in critical path delay. From Table 11, it is evident that the proposed technique shows a reduction of 48% in FPGA slices, 72% in flip flops and 73% in 4-input LUTs, with a 6% lower operating frequency, compared with the FSM method [15]. Compared with the LUT based approach [19], our work eliminates the conventional use of block RAMs to hold the addresses for each interleaver depth, which results in a significant reduction in slices (by 91%), flip flops (by 81%) and 4-input LUTs (by 91%), while the design operates 1.85 times faster. Compared with the MUX based approach in [19], our work shows a reduction of 55% in logic slices, 71% in flip flops and 56% in 4-input LUTs, and operates 46% faster. For a better comparison of the implementation results, a bar chart of resource utilisation for the various methods against the proposed method is shown in Fig. 7.
Conclusion
In this paper, a reconfigurable address generator architecture for the multimode interleaver of the IEEE 802.11a/g and 802.16e wireless standards, supporting all possible code rates and modulation schemes, has been presented. The complete hardware was developed in Verilog HDL and implemented on a Xilinx Spartan 3 FPGA. In addition, novel MOD_row and MOD_column circuits were proposed to compute the MOD function for the row and column counter values. The synthesis results of the proposed MOD circuit show a significant reduction in resource utilisation and an improvement in operating speed compared with existing modulo algorithms. MATLAB simulation results confirm the functionality of the address generator for various interleaver depths. A detailed analysis of the implementation results shows that the performance of the proposed method is improved compared with the existing methods. The proposed work shows an average reduction of 60% in resource utilisation and an improvement of 46% in operating frequency compared with the existing approaches. Thus, the proposed reconfigurable address generator architecture can be used to support various modulation schemes in the multimode interleaver of multistandard SDR devices.
Acknowledgment
The authors thank UK-India Education and Research Initiative (UKIERI) Thematic Partnerships under grant no. UKUTP201100134, India for providing necessary support in this work.
Table 11: Comparison of logic resources between the proposed and existing approaches. Columns: logic resources; FSM based method [15]; LUT based method [19]; MUX based method [19]; proposed.
References
