Abstract: A Radix-4 Quasi-cyclic shift network (QSN) for reconfigurable QC-LDPC decoders is presented in this paper. A complexity reduction technique is described that reduces the total gate count at each stage, in addition to the fact that a Radix-4 logarithmic barrel shifter naturally needs fewer stages than a Radix-2 one. The proposed Radix-4 QSN architecture supports various code rates and all sub-matrix sizes. Moreover, a novel Radix-4 signal generator is proposed, which is an essential element of reconfigurable LDPC decoders. The synthesis, placement and routing (P&R) of the proposed network are performed using TSMC 90-nm standard cell CMOS technology. The implementation results show that the proposed network outperforms its predecessors by about 11% in area and 38% in clock frequency.
ally very complex. The switch network is one of the main sources of complexity and of the critical path within LDPC decoders [1, 2]. QC-LDPC codes simplify the switch network: they eliminate random shifting (or permutation), so that only cyclic shifting is required [1]. Conventional barrel shifter architectures are not sufficient for modern reconfigurable LDPC decoders, because they do not support cyclic shifting when the number of inputs is smaller than the network size, and therefore cannot serve LDPC decoders with sub-matrices of multiple sizes [3, 4, 5].
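To make this limitation concrete, the following Python sketch models a conventional N-port cyclic barrel shifter behaviourally, as a rotation of all N ports, and compares it with the cyclic shift over only the n active inputs that a QC-LDPC sub-matrix of size n requires. The function names and the use of `None` for unused ports are illustrative assumptions, not part of the original design.

```python
def conventional_cyclic_shift(x, c, N):
    """Rotate all N ports of a conventional cyclic barrel shifter by c.

    Only len(x) ports carry data; the unused ports are modelled as None.
    """
    padded = list(x) + [None] * (N - len(x))
    return [padded[(j - c) % N] for j in range(N)]


def desired_cyclic_shift(x, c):
    """Cyclic shift over the n = len(x) active inputs only (what QC-LDPC needs)."""
    n = len(x)
    return [x[(j - c) % n] for j in range(n)]


x = list(range(5))  # n = 5 active inputs on an N = 8 port network
print(conventional_cyclic_shift(x, 2, 8)[:5])  # [None, None, 0, 1, 2]
print(desired_cyclic_shift(x, 2))              # [3, 4, 0, 1, 2]
```

With n < N, inputs 3 and 4 land on ports 5 and 6 instead of wrapping around to ports 0 and 1, so the first n outputs are wrong; only when n = N do the two functions agree.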
Recently, many switch network designs have been presented for reconfigurable QC-LDPC decoders. Efficient designs based on the Benes network, the Banyan network and barrel shifters are given in [3], [4] and [5], respectively. To the best of our knowledge, the Radix-2 Quasi-cyclic shift network (QSN) design [5] performs better than all other designs. The QSN architecture uses two conventional logarithmic barrel shifters and a merge network to perform the required cyclic shifts for an arbitrary number of inputs. This work presents a novel approach to designing the QSN based on a high-radix number system.
Radix-4 QSN architecture
Radix-2 logarithmic barrel shifters are generally built from two to one multiplexers as the base unit; the Radix-4 approach uses four to one multiplexers instead. At stage $s$, a Radix-2 network can apply one of two shift amounts, $0 \times 2^s$ or $1 \times 2^s$, whereas a Radix-4 network can apply one of four, $0 \times 4^s$, $1 \times 4^s$, $2 \times 4^s$ or $3 \times 4^s$. Hence the first stage ($s = 0$) offers shift amounts of 0/1/2/3 and the third stage ($s = 2$) offers 0/16/32/48. A network of size 'N' therefore needs $\lceil \log_4 N \rceil$ stages, roughly half the $\lceil \log_2 N \rceil$ stages of the Radix-2 design [5], and the stages are the main contributors to both complexity and critical path. A 16×16 Radix-4 logarithmic barrel shifter built from four to one multiplexers is shown in Fig. 1 (a). Radix-4 QSN requires a base-4 representation of the shift amount, so each stage is controlled by a 2-bit value. The most complex part of the barrel shifter architecture shown in Fig. 1 (a) is the fixed wired interconnection network between the multiplexer stages. Each multiplexer needs four wired inputs, so the 16 multiplexers of one stage require 64 (= 16 × 4) interconnecting wires, and the two-stage 16×16 barrel shifter of Fig. 1 (a) requires 128 interconnecting wires in total. All the interconnections are implemented using Eq. (1).
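The stage arithmetic above can be checked with a small behavioural model. The Python sketch below splits the shift amount into 2-bit base-4 digits and applies one 4:1 selection per stage. The wiring rule it uses (output j of stage s reads input (j - d·4^s) mod N, with d the stage's control digit) is an assumption standing in for Eq. (1), which is not reproduced here, and all function names are hypothetical.

```python
def num_stages(N, radix):
    """Smallest S with radix**S >= N, i.e. ceil(log_radix N)."""
    stages, span = 0, 1
    while span < N:
        span *= radix
        stages += 1
    return stages


def base4_digits(shift, stages):
    """2-bit control value of each stage: base-4 digit s of the shift amount."""
    return [(shift >> (2 * s)) & 0b11 for s in range(stages)]


def radix4_barrel_shift(x, shift):
    """Cyclic shift of x (0 <= shift < len(x)) built from stages of 4:1 muxes.

    Assumed wiring: at stage s, output j selects input (j - d * 4**s) mod N,
    where d in {0, 1, 2, 3} is the stage's control digit.
    """
    N = len(x)
    data = list(x)
    for s, d in enumerate(base4_digits(shift, num_stages(N, 4))):
        step = d * 4 ** s  # one of 0, 4**s, 2 * 4**s, 3 * 4**s
        data = [data[(j - step) % N] for j in range(N)]
    return data


def conventional_wires(N):
    """Wires feeding the 4:1 multiplexers: four per multiplexer, N per stage."""
    return 4 * N * num_stages(N, 4)


print(num_stages(16, 4), num_stages(16, 2))  # 2 4
print(num_stages(96, 4), num_stages(96, 2))  # 4 7
print(base4_digits(37, 3))                   # [1, 1, 2] since 37 = 1 + 1*4 + 2*16
print(conventional_wires(16))                # 128
```

For the 96-port network discussed in the results, this gives 4 Radix-4 stages against 7 Radix-2 stages, and the wire count for N = 16 matches the 128 wires stated for Fig. 1 (a).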
where 'N' is the network size, 'iNum' is a stage input, 'miNum' is the input number of a 4×1 multiplexer and 'mNum' is the multiplexer number. of QSN [5] solved the problem by using two barrel shifters instead of one, together with a merge stage that combines the outputs of the two networks. The proposed architecture for Radix-4 QSN is shown in Fig. 1 (c) Fig. 1 (c) because now it is the responsibility of the second barrel shifter (or reverse network) to provide all the upward shifting signals to the merge network. The control value of the direct network is 'c' (the cyclic shift amount), while the control value of the reverse network is 'r' (the difference between the number of inputs and the cyclic shift amount). A merge network combines the signals from both networks, as shown in Fig. 1 (c). Once the upward directed signals are eliminated, the first $4^s$ multiplexers in each stage are removed completely, the second $4^s$ multiplexers reduce to two to one multiplexers and the next $4^s$ multiplexers reduce to three to one multiplexers; the rest remain four to one multiplexers. These reductions give a significant area saving, particularly when the number of stages is large. The total number of multiplexers required for Radix-4 QSN is calculated using Table I. The proposed complexity reduction method reduces not only the number of multiplexers but also the number of interconnecting wires between stages, which is calculated using Eq. (2).
Total Wires
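The elimination rule described above lends itself to a quick count. The following Python sketch is not Eq. (2) or Table I, which are not reproduced here; it simply assumes that, in the direct network, output j of stage s keeps only the inputs j - k·4^s (k = 0..3) that do not wrap below index 0, and that a completely removed multiplexer still leaves one pass-through wire. Function names are hypothetical.

```python
from collections import Counter


def num_stages(N):
    """Number of Radix-4 stages, ceil(log4 N)."""
    stages, span = 0, 1
    while span < N:
        span *= 4
        stages += 1
    return stages


def stage_fan_in(N, s):
    """Surviving data inputs per output of stage s in the direct network.

    Assumption: output j may read input j - k * 4**s (k = 0..3) only if that
    index does not wrap below 0; wrapped paths are served by the reverse network.
    """
    return [min(4, j // 4 ** s + 1) for j in range(N)]


def reduced_cost(N):
    """Multiplexer counts by size and wire count for one reduced network."""
    muxes, wires = Counter(), 0
    for s in range(num_stages(N)):
        fan_in = stage_fan_in(N, s)
        muxes.update(f for f in fan_in if f > 1)  # 2:1, 3:1 and 4:1 multiplexers
        wires += sum(fan_in)  # a removed multiplexer leaves one pass-through wire
    return muxes, wires


muxes, wires = reduced_cost(16)
print(dict(sorted(muxes.items())), wires)  # {2: 5, 3: 5, 4: 17} 98
```

Under these assumptions, the 16×16 direct network needs 27 multiplexers (5 two to one, 5 three to one, 17 four to one) and 98 wires, against 32 four to one multiplexers and 128 wires for the conventional shifter of Fig. 1 (a). The per-stage pattern of fan-ins (1, 2, 3, then 4) is exactly the first/second/next $4^s$ grouping described in the text.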
For the 16×16 Radix-4 QSN network shown in Fig. 1 (c Fig. 2 (a). It is clear that the zeros shift in a cyclic manner and each shifted zero is replaced by a one. Hence, the controller for the merge stage can be implemented as a Radix-4 barrel shifter with all inputs equal to zero. Furthermore, all the downward directed interconnecting wires are fed with the value one, as shown in Fig. 2 (b). The control signal generation algorithm for Radix-4 QSN is shown in Fig. 2 (c (1) is used to make the connections between the multiplexer stages.
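Putting the pieces together, the following Python sketch models the direct network, the reverse network and the merge stage behaviourally. As an interpretation of the description above, not the authors' RTL, the merge control is produced by the same non-cyclic Radix-4 shifter fed with all-zero inputs and with a one on every removed path, which yields a thermometer code that is 1 exactly at outputs j < c. The names `radix4_shift` and `qsn`, and the convention that a 1 selects the reverse network, are assumptions.

```python
def radix4_shift(data, amount, fill, up=False):
    """Non-cyclic logarithmic shifter built from 4:1 multiplexer stages.

    Stage s moves data by d * 4**s positions (d = base-4 digit s of amount);
    positions whose source would wrap around receive `fill` instead.
    """
    N = len(data)
    out = list(data)
    s = 0
    while 4 ** s < N:
        step = ((amount >> (2 * s)) & 0b11) * 4 ** s
        if up:  # reverse network: output j reads j + step
            out = [out[j + step] if j + step < N else fill for j in range(N)]
        else:   # direct network: output j reads j - step
            out = [out[j - step] if j - step >= 0 else fill for j in range(N)]
        s += 1
    return out


def qsn(x, c, N):
    """Cyclic shift by c of n = len(x) <= N inputs on an N-port Radix-4 QSN."""
    n = len(x)
    assert 0 < n <= N and 0 <= c < n
    padded = list(x) + [None] * (N - n)
    direct = radix4_shift(padded, c, None)                      # x[j - c] for j >= c
    reverse = radix4_shift(padded, (n - c) % n, None, up=True)  # x[j + n - c] for j < c
    select = radix4_shift([0] * N, c, 1)  # zeros in, ones on removed paths: 1 iff j < c
    return [reverse[j] if select[j] else direct[j] for j in range(n)]


print(qsn(list(range(5)), 2, 16))  # [3, 4, 0, 1, 2]
```

Because the direct network only ever moves data downward and the reverse network only upward, neither needs wrap-around paths; the zero-input shifter marks which outputs the reverse network must supply, so one N-port network serves every sub-matrix size n ≤ N.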
Implementation and comparison results
The proposed Radix-4 QSN design (with 8-bit word length) was modeled in Verilog HDL and synthesized with TSMC 90-nm CMOS technology (all inputs and outputs were loaded with buffers). The layout was carried out using 9-layer metal technology. Table II shows the implementation and comparison results for the proposed Radix-4 QSN architecture; Radix-4 QSN clearly outperforms [3, 5]. A 96 × 96 network is the key requirement for IEEE 802.11n and IEEE 802.16e standard LDPC decoders. Generally, an area scaling factor of $(1.414^2)^2 \approx 4$ and a frequency scaling factor of $1.414^2 \approx 2$ are used to convert a 90-nm result to a 180-nm result [6]. Thus, the scaled area of the 96 × 96 network is 0.1317 × 4 = 0.527 mm², which is about 11% smaller than [5] and 27% smaller than [3]. The scaled frequency of the 96 × 96 network is 650 ÷ 2 = 325 MHz, which is about 38% and 70% higher than [5] and [3], respectively.
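The technology scaling arithmetic can be reproduced directly. The Python sketch below rounds the factors $(1.414^2)^2$ and $1.414^2$ to 4 and 2, as the text does; the helper name is hypothetical, and only the 0.1317 mm² and 650 MHz inputs come from the text.

```python
def scale_90nm_to_180nm(area_mm2, freq_mhz):
    """Scale a 90-nm result to 180 nm with the rounded factors used in the text."""
    area_factor = round((1.414 ** 2) ** 2)  # 3.9976... rounded to 4
    freq_factor = round(1.414 ** 2)         # 1.9994... rounded to 2
    return area_mm2 * area_factor, freq_mhz / freq_factor


area, freq = scale_90nm_to_180nm(0.1317, 650)
print(f"{area:.3f} mm^2, {freq:.0f} MHz")  # 0.527 mm^2, 325 MHz
```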
Conclusion
The proposed work describes an efficient Radix-4 QSN architecture for reconfigurable QC-LDPC decoders and paves the way for the development and implementation of high-radix QSN networks. A novel complexity reduction technique is described that reduces the gate count at each stage. Furthermore, a novel signal generator suitable for Radix-4 QSN is proposed. The proposed design shows a clear improvement over its predecessors in both area and clock frequency.
