Abstract-This paper presents approaches to developing efficient networks for non-binary quasi-cyclic LDPC (QC-LDPC) decoders. By exploiting the intrinsic shifting and symmetry properties of the check matrices, a significant reduction in memory size and routing complexity can be achieved. Two efficient network architectures are proposed, one for Class-I and one for Class-II non-binary QC-LDPC decoders. Comparison results show that, for the 64-ary (1260, 630) rate-0.5 Class-I code, the proposed scheme saves up to 70.6% of the hardware required by the shuffle network compared with the state-of-the-art designs. The proposed decoder example for the 32-ary (992, 496) rate-0.5 Class-II code achieves a 93.8% shuffle network reduction compared with conventional designs. Furthermore, based on the similarity of Class-I and Class-II codes, a shuffle network that supports both classes of codes is developed at very low cost.
I. INTRODUCTION
Compared with their binary counterparts, LDPC codes defined over a Galois field GF(q) of order higher than two offer better error-correction capability, given a proper encoding approach and code length [1]. However, the improved decoding performance comes with higher computational complexity. To this end, [2] proposed several sub-optimal decoding algorithms based on the n-norm $\|\cdot\|_n$. As n tends to infinity, the n-norm becomes the maximum norm and the algorithm reduces to the Min-Max algorithm, which is well suited to practical use because it achieves a good compromise between hardware cost and decoding performance. Meanwhile, a special class of non-binary LDPC codes, named non-binary QC-LDPC codes, has been constructed with the architecture-aware scheme [3]. Even so, the few implementations of non-binary QC-LDPC decoders employing the Min-Max algorithm still suffer from a high hardware cost [4]-[5]. Without exploiting the inherent geometry properties of QC-LDPC codes, those designs employ either a conventional bi-directional network or two shuffle networks for shuffling and re-shuffling, leading to a q-fold increase in network complexity.
In this paper, to make full use of the benefits introduced by the architecture-aware scheme, special emphasis is placed on investigating the geometry properties of the corresponding H matrices. Rather than reconfiguring the global shuffle networks for each layer, the proposed approach employs two kinds of local shuffle network to eliminate the unnecessary network costs for Class-I and Class-II QC-LDPC codes, respectively. The designs are reconfigurable, memory efficient, highly parallel, and of low routing complexity. To demonstrate the advantages, both the 64-ary (1260, 630) rate-0.5 Class-I code and the 32-ary (992, 496) rate-0.5 Class-II code are employed as examples. It is shown that, if flexibility is taken into account, the shuffle network cost can be reduced by 70.6% for the Class-I code and by 93.8% for the Class-II code compared with the state-of-the-art designs. On the other hand, if flexibility is not a necessity, more memory and control logic can be eliminated. Moreover, a new local shuffle network that is compatible with both classes is proposed at minimal additional cost.
The remainder of this paper is organized as follows. In Section II, construction methods for both Class-I and Class-II codes are briefly reviewed. In Section III, the geometry properties of both codes are investigated and summarized. Different layer partition choices are proposed in Section IV. Section V describes the shuffle networks. The hardware cost estimation and comparisons with other designs are given in Section VI. Finally, Section VII concludes the paper.
II. NON-BINARY QC-LDPC CODES CONSTRUCTION
Like their binary counterparts, non-binary QC-LDPC codes are motivated by architecture-aware design [3]. Using two similar array dispersions of matrices, constructions for two classes of QC-LDPC codes, referred to as Class-I and Class-II codes, have been proposed as well. It is known that the elements of GF(q) can be represented as powers of a primitive element α. Accordingly, any element of GF(q) can be dispersed into its circulant permutation matrix (CPM). Therefore, the construction steps for Class-I codes can be stated as follows:
Construction for Class-I Non-Binary QC-LDPC Codes 
[Construction steps for the base matrix $W^{(1)}$ and its sub-matrices $W^{(1)}_{i,j}$]
By simply replacing the multiplications with additions, we obtain the construction steps for Class-II codes as follows. The similar construction steps for both codes yield resemblances in the corresponding geometry properties and network designs, which are stated in Sections III and V, respectively.
Construction for Class-II Non-Binary QC-LDPC Codes
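[Construction steps for the base matrix $W^{(2)}$ and its sub-matrices $W^{(2)}_{i,j}$]
Replace each entry of the resulting matrix by its CPM.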
III. PROPERTIES OF NON-BINARY QC-LDPC CODES
Although some apparent properties of non-binary QC-LDPC codes have been addressed in the previous literature [4]-[5], more geometry properties hidden behind the algebraic structures need to be revealed for efficient network designs.
A. Shifting Properties of Class-I Codes
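According to the construction steps, one can verify that $W^{(1)}_{i,j}$ and its upper-left neighbor are identical: $W^{(1)}_{i,j} = W^{(1)}_{(i-1) \bmod c,\,(j-1) \bmod c}$.
A similar permutation property holds at the lower level, within each sub-matrix $W^{(1)}_{i,j}$.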
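Moreover, with the definition of the CPM, the useful properties of Class-I codes can be summarized as follows:
Proposition 1 The Class-I non-binary QC-LDPC codes satisfy shifting properties at three different levels:
1) The i-th row of the base matrix $W^{(1)}$ is exactly the 1-step right cyclic-shift of the [(i-1) mod c]-th row. Therefore, $W^{(1)}_{i,j} = W^{(1)}_{(i-1) \bmod c,\,(j-1) \bmod c}$;
2), 3) Analogous row-wise shifting properties hold within each sub-matrix $W^{(1)}_{i,j}$ and within each CPM; the two differ only in the value of the multiplicand (β or α).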
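The first shifting property can be checked mechanically. The following is a minimal sketch, assuming the base matrix is available as a c × c array of hashable labels (one label per sub-matrix); `toy_base_matrix` is a hypothetical stand-in used only for illustration and is not the construction of the paper.

```python
def is_block_circulant(W):
    """Return True if W[i][j] == W[(i-1) % c][(j-1) % c] for all i, j.

    This is the statement that row i is the 1-step right cyclic-shift
    of row (i-1) mod c.
    """
    c = len(W)
    return all(
        W[i][j] == W[(i - 1) % c][(j - 1) % c]
        for i in range(c)
        for j in range(c)
    )


def toy_base_matrix(c):
    """Hypothetical c x c array whose label depends only on (j - i) mod c."""
    return [[(j - i) % c for j in range(c)] for i in range(c)]


W = toy_base_matrix(4)
print(is_block_circulant(W))          # the toy array satisfies the property

W[0][1], W[0][2] = W[0][2], W[0][1]   # swap two entries to break the structure
print(is_block_circulant(W))          # the perturbed array does not
```

A check of this kind is useful when a base matrix is generated by a script, since the local shuffle network relies on the property holding for every row.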
B. Symmetry Properties of Class-II Codes
Unlike Class-I codes, it is not trivial to uncover the geometry properties of Class-II codes. Since the surjective function mapping the elements of the subgroups $G'_t$ and $G''_{m-t}$ to power forms of α is not specified, the value assignment scheme is not unique. Given this degree of freedom, a specific surjective function that yields symmetry properties is deliberately introduced. Without loss of generality, the details of the surjective function are described for the subgroup $G'_t$.
Substituting Eq. (3) into the construction steps, we show that the matrix $W^{(2)}$ has the symmetry property shown in Eq. (5). Denote the sub-matrix of $W^{(2)}$ by $W^{(2)}_{i,j}$, where $W^{(2)}$ is a $c \times c$ array of $n \times n$ sub-matrices. Combining Eq. (5) with the matrix structure, it follows that $W^{(2)}_{i,j}$ and its mirror about the anti-diagonal are identical, i.e., $W^{(2)}_{i,j} = W^{(2)}_{c-j-1,\,c-i-1}$. Similarly, Eq. (4) corresponds to the self-symmetry of $W^{(2)}_{i,j}$ about its own anti-diagonal, $w^{(2)}_{(i,j)(k,l)} = w^{(2)}_{(i,j)(n-l-1,\,n-k-1)}$.
On the other hand, for Class-II codes, both the base matrix $W^{(2)}$ and its sub-matrix $W^{(2)}_{i,j}$ are self-symmetric about their own diagonals. This is apparent from the 4th step of the construction for Class-II codes: $W^{(2)}_{i,j} = W^{(2)}_{j,i}$ and $w^{(2)}_{(i,j)(k,l)} = w^{(2)}_{(i,j)(l,k)}$.
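The two symmetries discussed here, about the diagonal and about the anti-diagonal, can be tested with a few lines of code. The sketch below is only illustrative: the toy matrix uses bitwise XOR of the indices as a stand-in for addition in GF(2^m), which is an assumption made for the example and not the surjective function of the paper.

```python
def symmetric_about_diagonal(M):
    """Check M[k][l] == M[l][k] for all k, l."""
    n = len(M)
    return all(M[k][l] == M[l][k] for k in range(n) for l in range(n))


def symmetric_about_antidiagonal(M):
    """Check M[k][l] == M[n-l-1][n-k-1] for all k, l."""
    n = len(M)
    return all(
        M[k][l] == M[n - l - 1][n - k - 1]
        for k in range(n)
        for l in range(n)
    )


def toy_xor_matrix(n):
    """Hypothetical n x n matrix (n a power of two) with entry k XOR l."""
    return [[k ^ l for l in range(n)] for k in range(n)]


M = toy_xor_matrix(8)
print(symmetric_about_diagonal(M), symmetric_about_antidiagonal(M))
```

For n a power of two, complementing both indices leaves their XOR unchanged, which is why this toy matrix has both symmetries; a matrix built with an arbitrary index assignment generally does not.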
The above properties lead to the following proposition:
Proposition 2 The Class-II non-binary QC-LDPC codes satisfy geometry properties at three different levels:
1. The base matrix $W^{(2)}$, a $c \times c$ array of sub-matrices, is symmetric about its diagonal and anti-diagonal, i.e., $W^{(2)}_{i,j} = W^{(2)}_{c-j-1,\,c-i-1}$ and $W^{(2)}_{i,j} = W^{(2)}_{j,i}$;
2. The $n \times n$ sub-matrix $W^{(2)}_{i,j}$ is also symmetric about its diagonal and anti-diagonal, i.e., we have $w^{(2)}_{(i,j)(k,l)} = w^{(2)}_{(i,j)(n-l-1,\,n-k-1)}$ and $w^{(2)}_{(i,j)(k,l)} = w^{(2)}_{(i,j)(l,k)}$;
3. Each row of the CPM of $w^{(2)}_{(i,j)(k,l)}$ is the right cyclic-shift of the row above it multiplied by α, and the first row is the right cyclic-shift of the last row multiplied by α.
Choosing m = 5 and t = 2, we construct a 32-ary (992, 496) rate-0.5 Class-II code. Another Class-II code constructed with a random surjective function is used as a benchmark. The decoding performance with the EMS algorithm and a maximum of 10 iterations is illustrated in Fig. 1. It is shown that the performance of the Class-II code obtained with the proposed method is as good as that of the random one. Therefore, the proposed approach introduces symmetry properties without degrading the decoding performance.
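The row-wise property of the CPM stated in the third item of Proposition 2 can be illustrated as follows. This is a sketch under an explicit assumption: the CPM of the element α^e is taken to be the (q-1) × (q-1) matrix whose row r has a single nonzero entry α^((e+r) mod (q-1)) in column (e+r) mod (q-1). Entries are stored as exponents of α, with `None` standing for the zero element; the function names are hypothetical.

```python
def alpha_multiplied_cpm(e, q):
    """Build the assumed CPM of alpha^e over GF(q), storing exponents of alpha."""
    size = q - 1
    cpm = [[None] * size for _ in range(size)]
    for r in range(size):
        pos = (e + r) % size
        cpm[r][pos] = pos  # nonzero entry alpha^pos placed in column pos
    return cpm


def shift_right_times_alpha(row, q):
    """Right cyclic-shift a row by one position, then multiply it by alpha."""
    size = q - 1
    shifted = [row[-1]] + row[:-1]
    # Multiplying alpha^x by alpha gives alpha^((x + 1) mod (q - 1)).
    return [None if x is None else (x + 1) % size for x in shifted]


def has_row_shift_property(cpm, q):
    """Check that every row is the shifted, alpha-multiplied row above it.

    Row 0 is compared with the last row, covering the wrap-around case.
    """
    size = q - 1
    return all(
        cpm[r] == shift_right_times_alpha(cpm[(r - 1) % size], q)
        for r in range(size)
    )


print(has_row_shift_property(alpha_multiplied_cpm(3, 8), 8))
```

Under this assumed definition the whole CPM is determined by its first row, which is what allows a decoder to store one row per CPM instead of the full matrix.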
IV. LAYER PARTITION CHOICES OF QC-LDPC CODES
In order to reduce the number of decoding iterations and make the best use of the geometry properties, the layered decoding schedule is combined with the Min-Max algorithm here. The k-th iteration for layer t can be formulated as follows:
[Layered Min-Max update equations for the k-th iteration of layer t]
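The core operation of a Min-Max check node can be sketched as an elementary combination of two message vectors. The example below is a simplified illustration and not the decoder formulation of the paper: it assumes a field GF(2^m) whose elements are labelled by integers so that field addition is bitwise XOR, messages in which a smaller value means a more likely symbol, and edge coefficients of H that have already been applied by a permutation step.

```python
def min_max_combine(m1, m2):
    """Elementary Min-Max step over GF(2^m).

    out[a] is the minimum, over all pairs (a1, a2) with a1 XOR a2 == a,
    of max(m1[a1], m2[a2]).
    """
    q = len(m1)
    out = [float("inf")] * q
    for a1 in range(q):
        for a2 in range(q):
            a = a1 ^ a2  # addition in GF(2^m) under the assumed labelling
            out[a] = min(out[a], max(m1[a1], m2[a2]))
    return out


# Two toy message vectors over GF(4)
print(min_max_combine([0, 1, 2, 3], [0, 2, 1, 4]))  # [0, 1, 1, 1]
```

Because each output needs only comparisons, no additions of reliabilities are involved, which is the reason the Min-Max algorithm is attractive for hardware.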
As stated in Propositions 1 and 2, both classes of non-binary QC-LDPC codes have a regular algebraic construction that is easily accommodated by the layered decoding scheme. Under the constraint that each column has weight at most 1 within a layer, two layer partition options can be proposed as follows:
1. Choose each sub-block row of $W_{i,j}$ as one layer, which consists of (q-1) rows. This option is defined as the Layer-I choice;
2. Choose each row of a CPM as one layer, which consists of only one row. This option is defined as the Layer-II choice.
V. LOCAL SHUFFLE NETWORKS FOR BOTH CODES
The architecture of the (u, v) non-binary QC-LDPC decoder is shown in Fig. 2, where u = ρ(q-1) is the code length, u-v = γ(q-1) is the number of check symbols, and w is the layer height. The decoder is composed of w CNUs, a de-/permutation block, a global shuffle network (GSN), and u VNUs with a local shuffle network (Π). In what follows, based on the geometry properties above, different reduced-complexity shuffle networks for both codes are presented.
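As a quick worked example, the parameters ρ and γ, and the layer counts implied by the two layer partition choices of Section IV, can be computed for the two example codes. This is a small sketch; the function names are hypothetical, and the layer counts assume that the check matrix has γ block rows of (q-1) rows each.

```python
def code_parameters(u, v, q):
    """Return (rho, gamma) from u = rho*(q-1) and u - v = gamma*(q-1)."""
    size = q - 1
    if u % size or (u - v) % size:
        raise ValueError("u and u - v must be multiples of q - 1")
    return u // size, (u - v) // size


def layer_options(u, v, q):
    """Number of layers and layer height w for the Layer-I and Layer-II choices."""
    _, gamma = code_parameters(u, v, q)
    return {
        "Layer-I": {"layers": gamma, "height": q - 1},
        "Layer-II": {"layers": gamma * (q - 1), "height": 1},
    }


print(code_parameters(1260, 630, 64))  # (20, 10)
print(code_parameters(992, 496, 32))   # (32, 16)
print(layer_options(1260, 630, 64))
```

The Layer-I choice gives fewer, taller layers (more CNUs working in parallel), while the Layer-II choice needs a single CNU at the price of many more layers per iteration.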
A. Local Shuffle Network for Class-I Codes
1) Generating Algorithm for Local Shuffle Network
Apparently, Propositions 1.2 and 1.3 differ only in the value of the multiplicand (β or α). Without loss of generality, the Layer-I decoding scheme is chosen as an example.
Scheduling Algorithm for Local Shuffle Network-I
[Algorithm listing: computation of the destination index for each VNU (i, j)]
The index of each VNU can be rewritten in the form i(q-1)+j, abbreviated as (i, j). For the example depicted in Fig. 3, the index of VNU 7 can be rewritten as (2, 1). Based on the new scheduling algorithm, the destination index is (1, 0). Therefore, the extrinsic message should be transferred from VNU 7 to VNU 3 (1×3+0 = 3), which matches the previous analysis.
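The index arithmetic of this example can be reproduced with a short sketch. The decomposition into (i, j) follows the text; the destination rule used here, decrementing both i and j with wrap-around, is an assumption inferred from the single example of Fig. 3 and may differ from the actual scheduling algorithm. The values ρ = 3 and q = 4 are likewise assumed for the toy case.

```python
def split_index(vnu, q):
    """Rewrite a VNU index as (i, j) with vnu = i*(q-1) + j."""
    return divmod(vnu, q - 1)


def join_index(i, j, q):
    """Inverse of split_index."""
    return i * (q - 1) + j


def destination(vnu, rho, q):
    """Assumed rule: (i, j) -> ((i-1) mod rho, (j-1) mod (q-1))."""
    i, j = split_index(vnu, q)
    return join_index((i - 1) % rho, (j - 1) % (q - 1), q)


print(split_index(7, 4))     # (2, 1)
print(destination(7, 3, 4))  # 3
```

Note that the rule does not depend on the layer index, which is consistent with the observation that the inter-layer interconnections can be fixed.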
2) Architecture of Local Shuffle Network
It is observed that the inter-layer shuffle scheduling is independent of the current layer index. That is, whatever the value of i, the extrinsic message transfer between the i-th layer and the (i+1)-th layer is exactly the same, so it can be implemented with the fixed interconnections shown in Fig. 4. The complexity of the resulting network is 1/q of that of the conventional one. Only $2\log_2\rho - 1$ stages and $\rho(\log_2\rho - 1/2)$ 2×2 crossbar switches are required. The number of control bits, which can be pre-acquired with INDEX(n), is $\rho(\log_2\rho - 1/2)$. The local shuffle network for Eq. (12) is given in Fig. 6. With a modified INDEX(n), this approach is suitable for Class-I codes.
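The stage and switch counts quoted above are easy to evaluate numerically. The sketch below assumes that ρ is a power of two, so that log2 ρ is an integer, and that each 2×2 switch takes one control bit, as the equal counts in the text suggest; `shuffle_network_cost` is a hypothetical helper name.

```python
import math


def shuffle_network_cost(rho):
    """Return (stages, switches, control_bits) for a network of size rho.

    stages   = 2*log2(rho) - 1
    switches = rho*(log2(rho) - 1/2), i.e. rho/2 switches in each stage
    """
    log_rho = math.log2(rho)
    if not log_rho.is_integer():
        raise ValueError("this sketch assumes rho is a power of two")
    stages = 2 * int(log_rho) - 1
    switches = stages * (rho // 2)
    control_bits = switches  # one bit per 2x2 switch (assumption)
    return stages, switches, control_bits


print(shuffle_network_cost(32))  # (9, 144, 144)
```

For ρ = 32, as in the 32-ary (992, 496) example, this gives 9 stages and 144 switches, since 32 × (5 - 1/2) = 144.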

VI. IMPLEMENTATION COMPLEXITY COMPARISONS
Without loss of generality, it is assumed that the proposed (u, v) decoder employs the Layer-I partition. A $(b_q, b_f)$ uniform quantization is adopted, in which $b_f$ out of the $b_q$ bits are used for the fractional part. Table I lists the comparison of the proposed design with others. Compared with the state-of-the-art decoders, the proposed one greatly reduces the hardware complexity. According to [4] and [5], their networks cannot incorporate flexibility into the design due to the use of ROM. In Table I, there are two approaches to designing the local shuffle network for Class-I codes. The first one (#1) is reconfigurable for any Class-I code with code length ρ(q-1). The second one (#2) is suitable only for a specific Class-I code. The #3 and #4 approaches deal with Class-II codes and codes of both classes, respectively. Take the 64-ary (1260, 630) rate-0.5 Class-I code as an example. While offering more flexibility, the proposed shuffle network #1 achieves hardware savings of 69.2% and 70.6% compared with [4] and [5], respectively. Because the #2 approach further eliminates all memory elements required by #1, more reduction can be expected. For the configurable version of the 32-ary (992, 496) rate-0.5 Class-II code decoder with the #3 network, the total saving is (k-1)/k = 15/16 ≈ 93.8%.
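The last figure can be read as a simple cost ratio: a saving of (k-1)/k corresponds to a network that costs 1/k of the reference one. A minimal sketch of this arithmetic follows, with k = 16 taken from the text and hypothetical function names; the reference cost of 1.0 is an arbitrary unit.

```python
def relative_saving(proposed_cost, reference_cost):
    """Fraction of the reference cost that is saved by the proposed design."""
    return 1.0 - proposed_cost / reference_cost


def saving_from_ratio(k):
    """Saving when the proposed network costs 1/k of the reference: (k-1)/k."""
    return (k - 1) / k


k = 16
print(saving_from_ratio(k))            # 0.9375
print(relative_saving(1.0 / k, 1.0))   # same value, from explicit costs
```

The value 0.9375 is what the text rounds to 93.8%.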
VII. CONCLUSIONS
In this paper, by exploiting the geometry properties of non-binary QC-LDPC codes, novel approaches to designing efficient networks for decoders are proposed, which outperform the state-of-the-art designs with savings of at least 69.2%.
