Abstract-Multi-mode message passing switch networks for a multi-standard QC-LDPC decoder are presented. An enhanced self-routing switch network with only one barrel shifter permutation structure and a shifter-based two-way duplicated switch network are proposed to support the 19 and 3 different sub-matrix sizes defined in IEEE 802.16e and IEEE 802.11n, respectively. The proposed switch networks can route decoding messages of different sizes in parallel without signal congestion. The enhanced self-routing switch network switches messages at different expansion factors with the lowest hardware complexity. For smaller expansion factors, the two-way duplicated switch network raises decoder throughput by increasing the parallelism of message passing. Synthesized in 130nm CMOS, the proposed enhanced self-routing and two-way duplicated switch networks have gate counts of 21.9k and 37.4k, respectively, at a 384MHz operating frequency.
I. INTRODUCTION
Low-density parity-check (LDPC) codes, defined by very sparse parity-check matrices, were first introduced by Gallager [1]. Quasi-cyclic (QC) LDPC codes are described by sparse parity-check matrices composed of blocks of circulant matrices [2]. At short to medium block lengths, the performance of QC-LDPC block codes compares favorably with that of randomly constructed LDPC codes [3]. The parity-check matrix H of a QC-LDPC code can be decomposed into cyclically shifted identity matrices and zero matrices. QC-LDPC decoders therefore use a message passing network that operates at the sub-matrix size to switch decoding messages between the check nodes and the bit nodes [4]. QC-LDPC decoders are also well suited for hardware implementation because their parity-check matrices are regular [5]. Recent high-speed communication systems, such as IEEE 802.16e [6] and IEEE 802.11n [7], employ QC-LDPC codes to enhance their performance. However, the multi-rate, multi-size QC-LDPC codes defined in IEEE 802.16e and IEEE 802.11n are irregular, which makes it difficult for a decoder to support all code rates under variable matrix sizes [8].
Depending on the system parameters, both the sub-matrix size, defined by the expansion factor $z_f$, and the operation mode vary in IEEE 802.16e and IEEE 802.11n. Dedicating separate message passing hardware to each specific sub-matrix size increases signal congestion and hardware complexity. A flexible switch network that supports all sub-matrix sizes is therefore necessary. The Omega and Benes sorting networks were exploited for regular LDPC decoders [4]. However, such traditional sorting networks are limited to a fixed size and are therefore suitable only for fixed-size switching schemes.
The self-routing (SR) switch network, which uses only one barrel shifter, was proposed to support the 19 different sub-matrix sizes defined in IEEE 802.16e [9]. However, it needs an additional control logic circuit to decide the expected routed messages when the shift amount is larger than half of the maximum expansion factor. An enhanced SR switch network [10] switches messages in parallel with a simpler routing decision rule and reduces the control logic complexity. A shifter-based two-way duplicated switch network is proposed to increase message passing throughput when the sub-matrix size is small [11]. Because the duplicated parts are shifted together with the original messages, the selected outputs of the duplicated switch network are equivalent to the result of barrel shifting. These proposed switch networks support the 19 and 3 different sub-matrix sizes defined in IEEE 802.16e and IEEE 802.11n, respectively.
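The equivalence between shifting duplicated messages and barrel shifting can be illustrated with a small behavioral model. This is a sketch under stated assumptions, not the circuit of [11]. The function names `duplicated_shift` and `cyclic_shift` are hypothetical. The model assumes a single group of $z$ messages with $z \le z_{\max}/2$, so that the messages and their duplicate fit side by side in one $z_{\max}$-wide shifter. It also assumes a shift convention in which output $i$ receives message $(i + s) \bmod z$.

```python
def duplicated_shift(msgs, shift, z_max):
    """Cyclically shift z messages using one z_max-wide shifter and duplication.

    Hypothetical behavioral model (not the circuit of [11]): the messages are
    placed twice, back to back, so one shift over the z_max lanes followed by
    keeping the first z lanes reproduces a z-point cyclic shift.
    """
    z = len(msgs)
    # Both copies must fit into the shifter, i.e. z <= z_max / 2.
    if not (0 < 2 * z <= z_max and 0 <= shift < z):
        raise ValueError("requires 2*z <= z_max and 0 <= shift < z")
    # Source format: original messages, duplicated messages, unused lanes.
    lanes = list(msgs) + list(msgs) + [None] * (z_max - 2 * z)
    # One barrel shift over the full width (left rotation by `shift`).
    routed = lanes[shift:] + lanes[:shift]
    # Selection: the first z lanes hold msgs[(i + shift) % z] for i in range(z).
    return routed[:z]


def cyclic_shift(msgs, shift):
    """Reference z-point cyclic shift: output i takes msgs[(i + shift) % z]."""
    z = len(msgs)
    return [msgs[(i + shift) % z] for i in range(z)]


# Check the equivalence for every IEEE 802.16e expansion factor <= z_max / 2.
Z_MAX = 96
for z in range(24, Z_MAX // 2 + 1, 4):
    data = list(range(z))
    for s in range(z):
        assert duplicated_shift(data, s, Z_MAX) == cyclic_shift(data, s)
```

The first $z$ selected lanes never reach the wrap-around region of the shifter. A plain logical shifter would therefore give the same selected outputs as a rotating one in this model.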
The remainder of this paper is organized as follows. Section II introduces the IEEE 802.16e and IEEE 802.11n LDPC code structures. Section III describes the proposed architectures of the enhanced SR and two-way duplicated switch networks, which support all the sub-matrix sizes defined in IEEE 802.16e and IEEE 802.11n. Section IV presents the implementation results of the proposed switch networks applied to the QC-LDPC decoder, and Section V concludes the paper.
II. QC-LDPC CODES IN IEEE 802.16E/IEEE 802.11N
In the IEEE 802.16e mobile WiMAX and IEEE 802.11n wireless LAN systems, the $M \times N$ parity-check matrix $H$ can be decomposed into $z_f \times z_f$ sub-matrices, where $M$ is the number of parity-check equations and $N$ is the code length. Each sub-matrix is either the zero matrix or a cyclically shifted identity matrix. Note that the expansion factor $z_f$ takes 19 different values in IEEE 802.16e [6] and 3 in IEEE 802.11n [7]. The parity-check matrix $H$ can also be expanded from the $m_b \times n_b$ base matrix $H_b$, with $m_b = M/z_f$ and $n_b = N/z_f$. The code rate $R = 1 - m_b/n_b$ is therefore determined by $m_b/n_b$. In the IEEE 802.16e system, the maximum value of $m_b$ is 12 and $n_b$ is fixed at 24. The parity-check matrix size is determined by the code rate and the expansion factor $z_f$ of the sub-matrices. The expansion factors defined in IEEE 802.16e [6] range from 24 to 96 with an increment of 4. Those defined in IEEE 802.11n [7] range from 27 to 81 with an increment of 27. $H$ is expanded from the base matrix $H_b$ by replacing each 1 in $H_b$ with a $z_f \times z_f$ circularly right-shifted identity matrix and each 0 in $H_b$ with a $z_f \times z_f$ zero matrix. The structure of a rate-1/2 parity-check matrix $H$ is shown in Fig. 1. Note that $H$ can be partitioned into two parts, $H_1$ and $H_2$, where $H_1$ corresponds to the information bits and $H_2$ to the parity-check bits. $H_2$ can be further partitioned into two parts, $h$ and $H'_2$, where $H'_2$ has a dual-diagonal structure.
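The base-matrix expansion described above can be sketched in a few lines. The entry encoding here is an assumption for illustration: each entry of $H_b$ holds either -1, for a zero sub-matrix, or a shift value $p \ge 0$, for an identity circularly shifted right by $p$. The description above uses 0/1 entries. The function names and the toy base matrix are hypothetical and are not taken from either standard.

```python
# Expansion factors (sub-matrix sizes) as described above.
Z_80216E = list(range(24, 97, 4))  # 24, 28, ..., 96: 19 values
Z_80211N = [27, 54, 81]            # 3 values


def expand_base_matrix(hb, zf):
    """Expand an m_b x n_b base matrix of shift values into the binary H.

    Assumed encoding (hypothetical, for illustration): -1 marks a zf x zf
    zero matrix; p >= 0 marks the zf x zf identity circularly shifted right
    by p columns.
    """
    mb, nb = len(hb), len(hb[0])
    H = [[0] * (nb * zf) for _ in range(mb * zf)]
    for i, row in enumerate(hb):
        for j, p in enumerate(row):
            if p < 0:
                continue  # zero sub-matrix: the block stays all zeros
            for r in range(zf):
                # Row r of a right-shifted identity has its 1 in column (r + p) mod zf.
                H[i * zf + r][j * zf + (r + p) % zf] = 1
    return H


def code_rate(mb, nb):
    """Design rate R = 1 - m_b / n_b, assuming H has full row rank."""
    return 1 - mb / nb


H = expand_base_matrix([[0, 1, -1], [2, -1, 0]], zf=3)  # toy 2 x 3 base matrix
for row in H:
    print("".join(map(str, row)))
```

With $n_b = 24$, the IEEE 802.16e code lengths $N = 24 z_f$ span 576 to 2304 bits. `code_rate(12, 24)` gives the rate-1/2 case.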
III. MULTI-MODE MESSAGE PASSING SWITCH NETWORK
The variable sub-matrix size makes it difficult to apply fixed-size crossbar switches, such as the Omega and Benes sorting networks [4]. For a multi-standard QC-LDPC decoder, two shifter-based structures, each with a single permutation network [10], [11], are proposed to route messages for all code rates and lengths. The enhanced SR switch network is suited to message passing at larger sub-matrix sizes. It supports all sub-matrix sizes with a simpler routing decision rule. For smaller sub-matrix sizes, the two-way duplicated switch network can be configured for different expansion factors. It can then switch two groups of messages concurrently to improve decoding throughput.
A. Enhanced Self-routing Switch Network
With only one barrel shifter, the SR switch network was proposed to support the 19 different sub-matrix sizes defined in IEEE 802.16e [9]. Fig. 2 illustrates the SR switch network, which routes all messages concurrently through a three-stage switch network. The first stage combines the source messages with self-routing bits (SRBs). The second stage is the barrel shifter permutation network, and the third stage is the selection scheme. In the final selection, the $z$ expected output messages are chosen from the $z_{\max}$ routed messages according to the SRBs assigned in the first stage. Here $z$ is the expected expansion factor, which is not larger than the maximum expansion factor $z_{\max}$.
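The three stages can be illustrated with a simplified behavioral model. This is a sketch, not the hardware of [9]. The function name `self_routing_switch` is hypothetical. The model assumes $z_{\max} = 96$ and a left rotation, so that output $i$ receives message $(i + s) \bmod z$. This matches a right-shifted identity sub-matrix, whose row $i$ has its 1 in column $(i + s) \bmod z$. Stage 3 is reduced to "keep the lanes whose SRB is 1, in lane order". The routing lookup engine itself is not modeled.

```python
def self_routing_switch(msgs, shift, z_max):
    """Behavioral model of the three-stage SR switch for one sub-matrix.

    Simplification (assumption): stage 3 keeps the lanes whose SRB is 1, in
    lane order; the hardware routing lookup engine is not reproduced here.
    """
    z = len(msgs)
    if not (0 < z <= z_max and 0 <= shift < z):
        raise ValueError("requires 0 < z <= z_max and 0 <= shift < z")
    # Stage 1: attach SRB = 1 to each source message; pad with SRB = 0 lanes.
    lanes = [(m, 1) for m in msgs] + [(None, 0)] * (z_max - z)
    # Stage 2: one z_max-wide barrel shifter (left rotation by `shift`).
    routed = lanes[shift:] + lanes[:shift]
    # Stage 3: select the z routed messages flagged by their SRBs.
    return [m for m, srb in routed if srb == 1]


# Every IEEE 802.16e and IEEE 802.11n expansion factor through one 96-lane shifter.
for z in list(range(24, 97, 4)) + [27, 54, 81]:
    data = list(range(z))
    for s in range(z):
        assert self_routing_switch(data, s, 96) == [(i + s) % z for i in range(z)]
```

After the rotation, the valid lanes are split between the beginning and the end of the $z_{\max}$-wide output. This is why the third stage must use the SRBs to locate the $z$ expected messages.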
The third-stage routing lookup engine compares the corresponding routed self-routing bits in parallel according to the expected expansion factor and the shift amount. Routed messages whose corresponding self-routing bit is available (SRB = 1 indicates availability) are selected as the expected output messages. However, when the shift amount is larger than $z_{\max}/2$, both routed SRBs seen by the routing lookup engine are available. The SR switch network therefore needs an additional logic circuit to decide the expected output message. The enhanced SR network [10] uses a simpler routing decision rule to route the messages with lower hardware complexity. Fig. 3(a) shows the routing lookup scheme for $z \le z_{\max}/2$. There, the first routing decision data are constructed from the $(z+1)$-th to the $2z$-th routed data. The second routing decision data are constructed from the first to the $z$-th routed data. Fig. 3(b) shows the routing lookup scheme for $z > z_{\max}/2$. The first part of the first routing decision data is constructed from the $(z+1)$-th to the $z_{\max}$-th routed data. The second part is constructed from the first to the $(2z-96)$-th routed data, where $z_{\max} = 96$. The second routing decision data are constructed from the first to the $z$-th routed data. When the shift amount is larger than
B. Two-way Duplicated Switch Network
The two-way duplicated switch network can improve decoding efficiency when the expected expansion factor $z$ is smaller than $z_{\max}/2$. With a smaller expansion factor, the switch network can be configured to process two duplicated message groups associated with different sub-matrices. Fig. 4 shows the source message format for two-way duplicated parallelism. The original messages are first converted into this source message format. The first and second message groups can then be switched simultaneously according to their different shift amounts.
IV. IMPLEMENTATION RESULTS
Table II summarizes the performance comparison between the proposed switch networks [10], [11] and existing switch networks. Synthesized in 130nm CMOS, the proposed enhanced SR and two-way duplicated switch networks occupy 0.1095 mm² and 0.1872 mm² of area, respectively, at a 384MHz operating frequency. The flexible permuter proposed in [12] supports only the three expansion factors defined in IEEE 802.11n. A flexible barrel shifter and a multi-stage shifting network with multi-stage multiplexers were applied to switch variable-size messages in IEEE 802.16e LDPC decoders [13], [14], [15]. However, these switch networks increase signal congestion because their multiplexers are duplicated for the variable sub-matrix sizes. With a single permutation network, the proposed enhanced SR switch network meets all the expansion factor requirements of IEEE 802.16e and IEEE 802.11n with the lowest hardware complexity. To improve throughput when $z \le z_{\max}/2$, the two-way duplicated switch network switches two message groups simultaneously.
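The areas quoted above and the gate counts quoted in the abstract come from the same 130nm, 384MHz synthesis. Dividing one by the other is therefore a quick consistency check. The sketch assumes both gate counts use the same gate-equivalent unit, which the text does not state. The function name `area_per_gate_um2` is hypothetical.

```python
# Synthesis results quoted in the text (130 nm CMOS, 384 MHz).
RESULTS = {
    "enhanced SR": {"gates": 21.9e3, "area_mm2": 0.1095},
    "two-way duplicated": {"gates": 37.4e3, "area_mm2": 0.1872},
}


def area_per_gate_um2(gates, area_mm2):
    """Average area per gate equivalent, in um^2 (1 mm^2 = 1e6 um^2)."""
    return area_mm2 * 1e6 / gates


for name, r in RESULTS.items():
    print(f"{name}: {area_per_gate_um2(r['gates'], r['area_mm2']):.2f} um^2/gate")

# Relative area cost of the two-way network versus the enhanced SR network.
ratio = RESULTS["two-way duplicated"]["area_mm2"] / RESULTS["enhanced SR"]["area_mm2"]
print(f"area ratio: {ratio:.2f}")
```

Both networks come out at about 5.0 µm² per gate, so the two sets of figures agree with each other. The two-way duplicated network costs roughly 1.7 times the area of the enhanced SR network, in exchange for its higher parallelism at small expansion factors.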
The two-way duplicated switch network was applied to a QC-LDPC decoder compliant with both IEEE 802.11n and IEEE 802.16e. Its message passing efficiency is double that of the enhanced SR switch network when $z_f < z_{\max}/2$. In a 90nm 1P9M process, the decoder chip occupies 6.25 mm² of silicon area and achieves 300 MHz in post-layout simulation. For IEEE 802.16e, the decoder chip with the two-way duplicated switch network operates at 107 MHz. At that frequency it achieves a maximum data rate of 63.36 Mb/s with 20 iterations and dissipates 203 mW at a 1.0 V supply.
V. CONCLUSION
The enhanced SR switch network and the two-way duplicated switch network provide the permutation function required for different sub-matrix sizes. Each uses only one permutation shifter to provide the 19 + 3 different switch network sizes defined in IEEE 802.16e and IEEE 802.11n, which significantly reduces signal routing congestion. Synthesized in a 130nm process, the enhanced SR switch network switches messages at different sub-matrix sizes with the lowest hardware complexity. For smaller expansion factors, the two-way duplicated switch network improves decoding throughput by 50% by increasing the parallelism of message passing.
