The equivalent binary parity check matrices of the binary images of cycle-free non-binary LDPC codes contain numerous bit-level cycles. In this paper, we show how to transform these binary parity check matrices into cycle-free forms. The proposed methodology applies not only to the binary images of non-binary LDPC codes but also to a large class of binary LDPC codes. Specifically, we present an extended p-reducible (EPR) LDPC code structure that eliminates the bit-level cycles. For non-binary LDPC codes with short symbol-level cycles, the EPR-LDPC codes largely avoid the corresponding short bit-level cycles. For decoding the EPR-LDPC codes, we propose a hybrid hard-decision decoder for the binary symmetric channel and a hybrid parallel decoder for the binary-input Gaussian channel. A simple code optimization algorithm for these binary decoders is also provided. Simulation results demonstrate the advantages of the proposed binary constructions, namely better performance and lower decoding complexity.
I. INTRODUCTION
Low density parity check (LDPC) codes, as a class of forward error control codes, have gained considerable attention during the last decade due to their excellent decoding performance over different channels [1], [2]. The performance of an infinitely long LDPC code is usually evaluated in terms of the threshold of the average performance of its code ensemble (codes with the same degree distributions) under the cycle-free condition [1], [3]-[7].
LDPC codes suffer performance degradation if their parity check matrices contain a non-negligible number of short cycles, especially for short block lengths. Moreover, codes with large girths have better minimum/stopping distance bounds, which also implies enhanced decoding performance. In this paper, we refer to the cycles in binary parity check matrices as bit-level cycles and the cycles in non-binary parity check matrices as symbol-level cycles. In [8]-[10], the authors show how to construct parity check matrices with fewer bit-level cycles and large girths for binary LDPC codes. For non-binary LDPC codes, investigations indicate that their Tanner graphs can become sparser as the field size increases. For short to moderate block lengths, non-binary LDPC codes with sparser graphs are more likely to outperform binary ones. In [11], [12], the authors investigate a particular type of non-binary LDPC codes, namely non-binary cycle LDPC codes, whose column weights are two. In [11], these codes are optimized using Cayley graphs. In [12], the authors propose bit-level coefficient selection methods to optimize the symbol-level performance of non-binary cycle LDPC codes.
On the other hand, soft-decision decoding of non-binary LDPC codes is potentially much more complex. The complexity of the q-ary sum-product algorithm (QSPA) is $O(q^2)$ per check-sum operation. The Fourier-transform QSPA reduces this to $O(q \log q)$ [5]. The extended min-sum (EMS) algorithm in [13] further reduces the complexity to $O(n_m \log n_m)$, where $n_m < q$, at the cost of a small performance loss. However, the computational complexity of the EMS decoder is still much higher than that of a binary decoder. Hence, in [14], [15], the authors propose an extended binary representation of non-binary LDPC codes that can be decoded by binary decoders; over the BEC, its binary decoding complexity is only $O(q)$. Theoretically, based on the decoding error probability, the authors in [16], [17] prove that the decoding complexity can be minimized by constructing the LDPC codes with properly chosen degree distributions.
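To make the $O(q^2)$ versus $O(q \log q)$ comparison concrete: over $\mathbb{F}_{2^p}$, once each incoming message has been permuted according to its edge label (a step omitted here), the QSPA check-node update combines probability vectors by XOR convolution, because field addition is bitwise XOR of the binary images. The Python sketch below is an illustration, not the decoder of [5]; it computes one pairwise convolution directly and via the fast Walsh-Hadamard transform, which plays the role of the Fourier transform in characteristic 2. Indexing each message by the integer value of the $p$-bit image is an assumption of the sketch, and all function names are hypothetical.

```python
import random


def xor_convolve_direct(a, b):
    """O(q^2): out[z] = sum of a[x] * b[y] over all x, y with x ^ y == z."""
    q = len(a)
    out = [0.0] * q
    for x in range(q):
        for y in range(q):
            out[x ^ y] += a[x] * b[y]
    return out


def walsh_hadamard(v):
    """Unnormalized fast Walsh-Hadamard transform; len(v) must be a power of 2."""
    v = list(v)
    h = 1
    while h < len(v):
        for start in range(0, len(v), 2 * h):
            for k in range(start, start + h):
                x, y = v[k], v[k + h]
                v[k], v[k + h] = x + y, x - y
        h *= 2
    return v


def xor_convolve_fast(a, b):
    """O(q log q): transform, multiply pointwise, transform back and divide by q."""
    q = len(a)
    spectrum = [x * y for x, y in zip(walsh_hadamard(a), walsh_hadamard(b))]
    return [c / q for c in walsh_hadamard(spectrum)]


def random_pmf(q, rng):
    """Random probability vector over the q elements of GF(2^p) (illustrative input)."""
    weights = [rng.random() for _ in range(q)]
    total = sum(weights)
    return [w / total for w in weights]


if __name__ == "__main__":
    rng = random.Random(1)
    q = 8  # GF(2^3)
    a, b = random_pmf(q, rng), random_pmf(q, rng)
    slow, fast = xor_convolve_direct(a, b), xor_convolve_fast(a, b)
    print(max(abs(s - f) for s, f in zip(slow, fast)))  # only floating-point round-off
```

The transform is its own inverse up to the factor $1/q$, which is why the same routine is applied twice.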
A. Related Works
The codewords of a non-binary LDPC code are often transmitted over binary-input channels in their bit-vector forms, i.e., as binary images of the non-binary LDPC code. At the receiver, the non-binary decoder transforms the received bit sequences back to their non-binary forms and performs symbol-level decoding [2], [6], [12], [18], [19] to retrieve the information bits. Alternatively, to reduce the computational complexity, one can use a binary decoder that retrieves the information bits from the binary representations of the non-binary parity check matrices [14], [15], [20]. Using binary decoders is natural and practical for fast and correct information recovery when the receiver has only limited computational resources, especially for moderate to long block lengths, where non-binary decoders do not have a clear advantage over binary decoders. However, the binary representation of a non-binary parity check matrix has numerous bit-level cycles, even if the non-binary parity check matrix has no symbol-level cycle [14], [20]. Thus, in [14], [15], the authors introduce the (punctured) extended binary representation of non-binary LDPC codes to address this issue; when there is no symbol-level cycle, this representation is also cycle-free. In [20], the authors propose a hybrid hard-decision decoder specifically for the BEC, which eliminates the local decoding cycles by introducing matrix inverse operations; it has lower computational complexity than decoding the extended binary representation. In addition, the authors in [21] show how to optimize the binary representation of a non-binary parity check matrix from the perspective of stopping sets.
B. Contributions
In this paper, we aim to further improve the bit-level decoding performance and reduce the bit-level decoding complexity. We propose bit-level decoders for different channel models to achieve enhanced decoding performance, and we develop a new methodology to construct generalized binary representations that avoid short bit-level cycles. Specifically, we propose a hybrid hard-decision decoder for the binary symmetric channel and a hybrid parallel decoder for the binary-input Gaussian channel. We also develop an extended p-reducible (EPR) LDPC code structure for a large class of LDPC codes with decoding complexity $O(m_s)$, $m_s < q$. For non-binary LDPC codes with short symbol-level cycles, the EPR-LDPC codes largely avoid the corresponding bit-level cycles. Experimental studies show that the proposed EPR-LDPC codes under the hybrid parallel decoder achieve up to 0.8 dB performance gain over the optimized non-binary cycle LDPC codes [11], [12], [18], with much lower decoding complexity.
Contributions of this paper are summarized as follows.
1) We propose an extended iterative hard-decision decoder and a hybrid parallel decoder for different channel models. A simple code optimization algorithm for these binary decoders is also provided to ensure enhanced decoding performance.
2) We propose an EPR-LDPC code structure to avoid short bit-level cycles. A general framework is given to model the construction and optimization of EPR-LDPC codes, and key results and conditions for their construction and optimization are derived.
C. Organization of the paper
The contents of this paper are organized as follows. In Section II, we introduce the binary representations of non-binary LDPC codes and give a unified framework for the extended binary representation. In Section III, we give the details of the EPR-LDPC codes. In Section IV, we present the proposed binary decoders and a simple code optimization algorithm. Section V presents the simulation results.
II. BINARY REPRESENTATIONS FOR NON-BINARY LDPC CODES

A. Binary Images for Non-binary LDPC Codes
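Let $\mathbb{F}_q$ be the finite field of size $q = 2^p$. Let $\mathbb{F}_q^* = \mathbb{F}_q \setminus \{0\}$ and let $\mathbb{F}_q^N$ be the $q$-ary column vector space of dimension $N$.
Let $H = (h_{i,j})_{M \times N}$, $h_{i,j} \in \mathbb{F}_q$, be the parity check matrix. Then the non-binary LDPC code $C$ is defined as the kernel of $H$. If we endow $\mathbb{F}_q$ with a binary vector space structure, every $u \in \mathbb{F}_q$ can be denoted by a binary vector $\bar{u} = (\bar{u}_1, \bar{u}_2, \ldots, \bar{u}_p)^T$. As a result, each codeword $x = (x_1, x_2, \ldots, x_N)^T \in \mathbb{F}_q^N$ in $C$ has the binary vector representation
$$\bar{x} = (\bar{x}_1^T, \bar{x}_2^T, \ldots, \bar{x}_N^T)^T \in \mathbb{F}_2^{Np}.$$
These binary vectors are the binary images of the non-binary LDPC codes.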
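To obtain the binary representation of $H$, we use the companion matrix $A$ of the primitive polynomial of $\mathbb{F}_q$ [20], [22], [23]. Let $\alpha$ be a primitive element of $\mathbb{F}_q$. Then each non-zero element $\alpha^k \in \mathbb{F}_q$ is represented by the $p \times p$ binary matrix $A^k$.
If we replace every $h_{i,j}$ in $H$ by its binary matrix representation $A_{i,j}$ (also called the matrix label), with $A_{i,j} = 0_{p \times p}$ when $h_{i,j} = 0$, we get the equivalent binary parity check matrix
$$\bar{H} = (A_{i,j})_{M \times N}.$$
As a result, the equivalent binary LDPC code $\bar{C}$ is defined as the kernel of the matrix $\bar{H}$.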
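The following Python sketch illustrates why $\bar{H}$ carries bit-level cycles even when $H$ has no symbol-level cycle: a single matrix label can already contain a length-4 cycle. It assumes $\mathbb{F}_8$ is built from the primitive polynomial $x^3 + x + 1$, stores field elements as 3-bit integers in the polynomial basis, and takes the label of $h$ to be the matrix of $x \mapsto hx$ (column $k$ is the binary image of $h\alpha^k$); for $h = \alpha^k$ this is the $k$th power of the multiplication-by-$\alpha$ matrix, a companion matrix of the primitive polynomial. The function names are hypothetical.

```python
def gf_mul(a, b, p, poly):
    """Multiply a and b in GF(2^p); elements are p-bit ints, poly includes the x^p term."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if a & (1 << p):  # reduce modulo the primitive polynomial
            a ^= poly
    return result


def label_matrix(h, p, poly):
    """p x p binary label of x -> h*x; column k is the binary image of h * alpha^k."""
    cols = [gf_mul(h, 1 << k, p, poly) for k in range(p)]
    return [[(cols[k] >> i) & 1 for k in range(p)] for i in range(p)]


def count_4_cycles(H):
    """Length-4 cycles in a binary matrix: a row pair sharing c columns gives c*(c-1)/2."""
    total = 0
    for r1 in range(len(H)):
        for r2 in range(r1 + 1, len(H)):
            shared = sum(1 for x, y in zip(H[r1], H[r2]) if x and y)
            total += shared * (shared - 1) // 2
    return total


if __name__ == "__main__":
    p, poly = 3, 0b1011  # GF(8) built from x^3 + x + 1 (assumed primitive polynomial)
    for h in range(1, 1 << p):
        print(h, label_matrix(h, p, poly), count_4_cycles(label_matrix(h, p, poly)))
```

Under these assumptions, the labels of $h = 1$ (the identity) and $h = \alpha$ contain no length-4 cycle, whereas the label of $h = 1 + \alpha$, with rows $(1,0,1)$, $(1,1,1)$, $(0,1,1)$, contains two.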
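With a slight abuse of notation, in the following we denote any binary parity check matrix by $\bar{H}$ and any non-binary parity check matrix by $H$. We define $\mathrm{diag}(B_1, B_2, \ldots, B_N)$ as the block-diagonal matrix
$$\mathrm{diag}(B_1, B_2, \ldots, B_N) = \begin{pmatrix} B_1 & 0 & \cdots & 0 \\ 0 & B_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & B_N \end{pmatrix},$$
where the $B_j$, $j = 1, 2, \ldots, N$, need not be square matrices.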
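We also define $\otimes$ as the Kronecker product of two matrices. Let $B = (b_{i,j})_{m \times n}$ and $B' = (b'_{i,j})_{h \times g}$ be two arbitrary matrices. The Kronecker product of $B$ and $B'$ is the $mh \times ng$ matrix
$$B \otimes B' = \begin{pmatrix} b_{1,1} B' & \cdots & b_{1,n} B' \\ \vdots & \ddots & \vdots \\ b_{m,1} B' & \cdots & b_{m,n} B' \end{pmatrix}.$$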
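Both block operators are easy to mis-index, so here is a plain-Python sketch of $\mathrm{diag}(\cdot)$ and $\otimes$ (no external libraries assumed; the function names are illustrative). As a usage example, the Kronecker product of a mother matrix with $I_{p \times p}$ gives the layout of $\bar{H}$ when every non-zero label is the identity.

```python
def kron(B, Bp):
    """Kronecker product B (x) Bp: the (i, j) block of the result is B[i][j] * Bp."""
    h, g = len(Bp), len(Bp[0])
    return [
        [B[i][j] * Bp[r][c] for j in range(len(B[0])) for c in range(g)]
        for i in range(len(B))
        for r in range(h)
    ]


def block_diag(*blocks):
    """diag(B_1, ..., B_N): blocks along the diagonal, zeros elsewhere; blocks may be non-square."""
    n_cols = sum(len(b[0]) for b in blocks)
    out, offset = [], 0
    for b in blocks:
        for row in b:
            out.append([0] * offset + list(row) + [0] * (n_cols - offset - len(row)))
        offset += len(b[0])
    return out


if __name__ == "__main__":
    mother = [[1, 1, 0], [0, 1, 1]]  # illustrative mother matrix
    identity = [[1, 0], [0, 1]]
    for row in kron(mother, identity):  # layout of H-bar with identity labels, p = 2
        print(row)
    print(block_diag([[1, 1]], [[1], [1]]))
```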
B. Extended Binary Representation for Non-Binary LDPC Codes
In this subsection, we give a unified framework for the extended binary representation. Let $\mathbb{N}$ be the set of natural numbers including 0 and $\mathbb{N}^* = \mathbb{N} \setminus \{0\}$. Let $\mathbb{N}_q = \{0, 1, \ldots, q-1\}$ and $\mathbb{N}_q^* = \mathbb{N}_q \setminus \{0\}$. For an arbitrary matrix $B$, we denote the entries of $B$ by $B(i, j)$, $i, j \in \mathbb{N}$, where $i$ and $j$ are the row and column indices, respectively. In addition, $B(i, 0)$ denotes the $i$th row vector and $B(0, j)$ the $j$th column vector. Let $I_{p \times p}$ be the $p \times p$ identity matrix. The extended representation is based on the equivalent binary LDPC code and begins with a linear transformation of a binary vector $\bar{x}_j \in \mathbb{F}_2^p$ [14]. Let $\Phi$ be the $p \times (q-1)$ binary matrix
$$\Phi = (\Phi(0,1), \Phi(0,2), \ldots, \Phi(0,q-1)),$$
where each column vector $\Phi(0, j)$, $j = 1, 2, \ldots, q-1$, is the binary representation of $j \in \mathbb{N}_q^*$. Let $\bar{x}_j$ be the binary vector representation of the coded symbol $x_j$ and $v_j = \Phi^T \bar{x}_j \in \mathbb{F}_2^{q-1}$. Note that $\Phi$ is the parity check matrix of the $[q-1, q-1-p]$ Hamming code, so each $v_j$ is a codeword of the simplex code (the dual of the Hamming code). The extended binary representation of $x$ is then
$$v = (v_1^T, v_2^T, \ldots, v_N^T)^T \in \mathbb{F}_2^{N(q-1)}.$$
In addition, for each non-zero $A_{i,j}$, we can obtain a $(q-1) \times (q-1)$ matrix $\Omega_{i,j}$ by utilizing an endomorphism of $\mathbb{N}_q$ and an isomorphism between $\mathbb{N}_q$ and $\mathbb{F}_2^p$ [14]. If we replace each non-zero $A_{i,j}$ in $\bar{H}$ by $\Omega_{i,j}$ and each zero $A_{i,j}$ by $0_{(q-1) \times (q-1)}$, we get the extended binary parity check matrix $\Omega = (\Omega_{i,j})_{M \times N}$. Then $\Omega v = 0$ and the simplex constraints on $v$ together form the extended binary representation.
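A quick numerical check of the two facts about $\Phi$ used above: with column $j$ of $\Phi$ equal to the binary image of $j$ (least significant bit in the first row, an indexing assumption of this sketch; Python index $k$ corresponds to $j = k + 1$), the columns are exactly the distinct non-zero vectors of $\mathbb{F}_2^p$, as required of a Hamming parity check matrix, and every non-zero $\bar{x}_j$ maps to a simplex codeword $v_j = \Phi^T \bar{x}_j$ of weight $q/2$. The helper names are illustrative.

```python
from itertools import product


def phi_matrix(p):
    """p x (q-1) matrix; list column k holds the binary image of j = k + 1 (LSB in row 0)."""
    q = 1 << p
    return [[(j >> i) & 1 for j in range(1, q)] for i in range(p)]


def extend(x_bits, Phi):
    """v_j = Phi^T x_j over GF(2)."""
    p, n = len(Phi), len(Phi[0])
    return [sum(Phi[i][k] * x_bits[i] for i in range(p)) % 2 for k in range(n)]


if __name__ == "__main__":
    p = 3
    Phi = phi_matrix(p)
    for x in product((0, 1), repeat=p):
        v = extend(list(x), Phi)
        print(x, v, sum(v))  # every non-zero x gives weight q/2 = 4
```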
In Section III, we will introduce a matrix map $f_\omega$ for constructing and optimizing the parity check matrix of the EPR-LDPC codes. The same matrix map can also be used to simplify the construction of $\Omega$. As a result,
$$\Omega_{i,j} = f_\omega(\Phi, A_{i,j}).$$
Details about $f_\omega$ are given in Section III. Here, we only present the unified framework for the extended binary representation.
III. EXTENDED p-REDUCIBLE CODE
In this section, we give a general framework to model the construction and optimization of the EPR-LDPC codes and show how to design EPR-LDPC codes that satisfy given girth constraints.
A. Definition of the EPR-LDPC Codes
We first give the definition of $p$-reducible codes. Definition 1 ($p$-reducible): Given a binary parity check matrix $\bar{H}$ and an integer $p > 1$, the binary code defined by $\bar{H}$ is called $p$-reducible iff, after rearranging the columns and rows of $\bar{H}$, $\bar{H}$ can be expressed as $(A_{i,j})_{M \times N}$, where each $A_{i,j}$ is either the $p \times p$ zero matrix or a $p \times p$ full-rank matrix. The binary code defined by $\bar{H}$ is called strictly $p$-reducible iff there is no integer $p' \neq p$ such that the binary code defined by $\bar{H}$ is also $p'$-reducible.
From Definition 1, the equivalent binary LDPC codes are $\log_2 q$-reducible. The length-12 (3,4)-regular Gallager code in [24] is a strictly 3-reducible code. For quasi-cyclic binary LDPC codes, note that an $n \times n$ binary circulant is full-rank iff its associated polynomial [25] and $1 - x^n$ have no common zero, i.e., their greatest common divisor over $\mathbb{F}_2$ is 1. Hence, if $(Np, (N-M)p)$ binary quasi-cyclic LDPC codes are constructed with $p \times p$ full-rank circulant matrices, these codes are $p$-reducible. In [26], the authors show that the parity check matrices of irregular repeat-accumulate (IRA) codes can be constructed from an array of circulant permutation matrices. The resulting parity check matrices are composed of identity matrices and circulant permutation matrices. If the circulant permutation matrices are of size $p \times p$, these IRA codes are $p$-reducible. Moreover, if protograph LDPC codes are obtained by filling the base matrices with $p \times p$ full-rank matrices and zero matrices, the resulting codes are also $p$-reducible. These examples represent only a small subset of the $p$-reducible codes.
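The full-rank condition for circulant labels can be checked numerically. The Python sketch below assumes the usual convention that the first row of the circulant holds the coefficients of $a(x)$ (bit $k$ is the coefficient of $x^k$) and that each further row is the previous one multiplied by $x$ modulo $x^n - 1$, i.e., a cyclic shift. It compares the GF(2) rank of the circulant with the standard identity $\mathrm{rank} = n - \deg \gcd(a(x), x^n - 1)$, so the circulant is full-rank exactly when the gcd is 1. All helper names are illustrative.

```python
def circulant_rows(a, n):
    """Rows of the n x n binary circulant: row r is x^r * a(x) mod (x^n - 1), as a bitmask."""
    mask = (1 << n) - 1
    return [((a << r) | (a >> (n - r))) & mask for r in range(n)]


def gf2_rank(rows):
    """Rank over GF(2) of rows given as int bitmasks."""
    rank, rows = 0, list(rows)
    while rows:
        pivot = rows.pop()
        if pivot:
            rank += 1
            low = pivot & -pivot  # lowest set bit serves as the pivot column
            rows = [r ^ pivot if r & low else r for r in rows]
    return rank


def gf2_poly_gcd(a, b):
    """gcd of binary polynomials stored as ints (bit k = coefficient of x^k)."""
    while b:
        while a and a.bit_length() >= b.bit_length():
            a ^= b << (a.bit_length() - b.bit_length())  # a <- a mod b
        a, b = b, a
    return a


def circulant_is_full_rank(a, n):
    """Full rank iff gcd(a(x), x^n - 1) = 1; over GF(2), x^n - 1 = x^n + 1."""
    return gf2_poly_gcd(a, (1 << n) | 1) == 1


if __name__ == "__main__":
    for a, n in ((0b011, 3), (0b0111, 4), (0b1011, 7)):
        print(bin(a), n, gf2_rank(circulant_rows(a, n)), circulant_is_full_rank(a, n))
```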
Next, we give the definitions that will be used in the following sections. Definition 2: The mother matrix $\Lambda_p$ of a binary matrix $\bar{H}$ or of a non-binary matrix $H$ is defined as a matrix with each entry being either 0 or 1. The binary matrix $\bar{H}$ can be obtained by replacing the 0s by zero matrices of size $p \times p$ and the 1s by non-zero matrices of size $p \times p$. These $p \times p$ matrices are also referred to as the matrix labels. The non-binary matrix $H$ can be obtained by replacing the 0s in $\Lambda_p$ by the zero element of $\mathbb{F}_{2^p}$ and the 1s by non-zero elements of $\mathbb{F}_{2^p}$. Cycles in $\Lambda_p$ or $H$ are referred to as symbol-level cycles. Cycles in $\bar{H}$ are referred to as bit-level cycles.
$\ldots, N\}$ be a set of matrices, where $\Psi^e_{i_q j}$, $i_q = 1, 2, \ldots, M(q-1)$, are matrices with $p$ rows and $p'_j \le q-1$ columns. Let $f_\omega$ be a matrix map such that $f_\omega(\Psi^e_{i_q j}, \Psi_j)$ is a matrix with $p'_j$ columns. If each non-zero $A_{i,j}$ is a full-rank matrix, we define $f_e$ as a function that takes $\bar{H}$ and $\Psi^e_j$, $j = 1, \ldots, N$, as inputs and outputs
. Before the detailed constructions, we give the definition of the EPR-LDPC codes to convey the general idea.
Definition 4 (EPR-LDPC):
Let $\Psi^e = \{\Psi^e_j, j = 1, 2, \ldots, N\}$ be the extended generator matrix set. Each $\Psi^e_j$ is a full-rank binary matrix with $p$ rows and $p'_j \le q-1$ columns, and the non-zero columns of each $\Psi^e_j$ are distinct. Let
$v^e = (v_1^{eT}, v_2^{eT}, \ldots, v_N^{eT})^T$ with $v^e_j = (\Psi^e_j)^T \bar{x}_j$, where $\bar{x} = (\bar{x}_1^T, \bar{x}_2^T, \ldots, \bar{x}_N^T)^T$ is a binary codeword of $\bar{H}$. We associate each $\bar{H}^c_j$ with a matrix set $\tilde{\Psi}^e_j$. Then, the EPR-LDPC code is defined as the kernel of the parity check matrix $\Omega^e = f_e(\bar{H}, \tilde{\Psi}^e_1, \tilde{\Psi}^e_2, \ldots, \tilde{\Psi}^e_N)$, such that $\Omega^e v^e = 0$ and the matrices in $\tilde{\Psi}^e_j$ have the same number of columns as $\Psi^e_j$. The above definition is very broad. It not only defines more generalized binary representations for non-binary LDPC codes, but also defines generalized representations for a large class of binary LDPC codes. Every $p$-reducible code can be associated with an EPR-LDPC code.
B. Mapping definition and Examples
In this subsection, we give the details about the matrix map f ω . We begin with the basic notations and definitions. Let wt(·) be the function that calculates the number of non-zero columns in a matrix or of the non-zero elements in a vector.
Definition 5 ($\preceq$): For two vectors $a$, $b$, we write $a \preceq b$ if $a$ is obtained by replacing some elements of $b$ by zeros. For two matrices $A$, $B$, we write $A \preceq B$ if $A$ is obtained by replacing some column vectors of $B$ by zero vectors.
Definition 6 ($\prec$): For two vectors $a$, $b$, we write $a \prec b$ if $a \preceq b$ and $\mathrm{wt}(a) < \mathrm{wt}(b)$. For two matrices $A$, $B$, we write $A \prec B$ if $A \preceq B$ and $\mathrm{wt}(A) < \mathrm{wt}(B)$.
Note that $\Psi^e_j$ has distinct non-zero vectors as its columns and $\Phi$ has all the non-zero vectors of $\mathbb{F}_2^p$ as its columns. Hence the non-zero column vectors of each $\Psi^e_j$ form a subset of the column vectors of $\Phi$. Without loss of generality, we assume that $\Psi^e_j \preceq \Phi$ for all $j \in \{1, 2, \ldots, N\}$. Since the zero columns in $\Psi^e_j$ result in zero bits in $v^e_j$, which can be ignored or readily removed, this assumption does not violate Definitions 3 and 4 and also facilitates the discussion of EPR-LDPC codes. With a slight abuse of notation, we use $f_\omega(\Psi^e_{i_q j}, \Psi_j)$ to denote the resulting binary matrix and
represent different additions between the bits in $v^e_j$. For better understanding, we give simple examples of $f_\omega$.
Example 1:
Since the extended generator matrix $\Psi^e_j$ has distinct non-zero columns, the corresponding bits in $v^e_j$ represent different combinations of the bits in $\bar{x}_j$. Moreover, the additions between different binary parity check equations within $\bar{H}^T_i \bar{x} = 0$, $i \in \{1, 2, \ldots, M\}$, can be formulated as $\Phi^T \bar{H}^T_i \bar{x} = 0$, which results in $q-1$ different binary parity check equations [14], [27]. We divide the $q-1$ binary parity check equations into $N$ partitions, with the $j$th partition consisting of different combinations of the bits in $\bar{x}_j$, i.e., $\Phi^T A_{i,j} \bar{x}_j$. If we set some of the $q-1$ equations to be zero equations, then there exists exactly one $\Psi^e_{i_q j}$ for the $j$th partition such that the $q-1$ rows of $f_\omega(\Psi^e_{i_q j}, A_{i,j})$ represent the $q-1$ rows within the $j$th partition, respectively; e.g., if $p = 3$ and
If we set the first and third rows in Ω i,j to be zero vectors, then we have
More details are displayed in Fig. 1.
Note that each $v^e_j$ is a codeword generated by $\Psi^e_j$, and a combination of the bits in $\bar{x}_j$ can be represented as a non-zero entry of $f_\omega(\Psi^e_{i_q j}, A_{i,j})$. The map $f_\omega$ can also be used to represent parity check relationships among the bits of a single $v^e_j$; the construction of such matrices is straightforward, so we omit it for brevity. Moreover, different additions of the rows of $\bar{H}$ and combinations of the parity check relationships for each $v^e_j$ can all be represented by $f_\omega$.
C. Main Properties of the Mapping
f ω (B, Ψ j ) are necessary and sufficient conditions for each other.
Proof: Since $\Psi_j$ is a $p \times p$ full-rank matrix, the vectors $\Psi_j^T \Phi(0, i')$, $i' = 1, 2, \ldots, q-1$, are distinct column vectors. Then $f_\omega(\Phi, \Psi_j)$ has exactly one non-zero entry in each row and each column.
$\Phi$, the zero columns in $B$ result in zero rows in $f_\omega(B, \Psi_j)$. Then $f_\omega(B, \Psi_j)$ can be obtained by setting some rows of $f_\omega(\Phi, \Psi_j)$ to zero vectors. Since $f_\omega(\Phi, \Psi_j)$ has only one non-zero entry in each column, some columns become zero vectors in
, it means that the columns in B generating the zero rows in f ω (B, Ψ j ) are set to be zero vectors. Since there is a one-to-one correspondence between
. This completes the proof.
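The proof of Lemma 1 implies that $f_\omega(\Phi, \Psi_j)$ is a permutation matrix. The Python sketch below gives one concrete realization, based on our reading of Example 1 and of that proof rather than on a definition stated in the text: the row indexed by a non-zero $u$ (i.e., by $\Phi(0, u)$) carries the parity check $u^T A \bar{x}_j = (A^T u)^T \bar{x}_j$, which is exactly the bit of $v_j = \Phi^T \bar{x}_j$ at position $A^T u$, so that row gets a single 1 in column $A^T u$. The name `f_omega_full` and the example label (the binary image of $1 + \alpha$ in $\mathbb{F}_8$ built from $x^3 + x + 1$) are hypothetical. The check confirms $\Omega_{i,j} v_j = \Phi^T A_{i,j} \bar{x}_j$ for every $\bar{x}_j$.

```python
from itertools import product


def to_int(bits):
    """Binary column vector (LSB first) -> integer index, matching Phi(0, j) = image of j."""
    return sum(b << i for i, b in enumerate(bits))


def to_bits(n, p):
    return [(n >> i) & 1 for i in range(p)]


def gf2_matvec(M, x):
    return [sum(m & b for m, b in zip(row, x)) % 2 for row in M]


def transpose(M):
    return [list(col) for col in zip(*M)]


def f_omega_full(A):
    """Assumed realization of f_omega(Phi, A): row u-1 has a single 1 in column (A^T u)-1."""
    p = len(A)
    q = 1 << p
    At = transpose(A)
    Omega = [[0] * (q - 1) for _ in range(q - 1)]
    for u in range(1, q):
        w = to_int(gf2_matvec(At, to_bits(u, p)))
        if w == 0:
            raise ValueError("label must be full rank")
        Omega[u - 1][w - 1] = 1
    return Omega


def phi_t_times(x_bits, p):
    """Phi^T x: entry j-1 is the parity Phi(0, j) . x."""
    return [sum(b for i, b in enumerate(x_bits) if (j >> i) & 1) % 2 for j in range(1, 1 << p)]


if __name__ == "__main__":
    A = [[1, 0, 1], [1, 1, 1], [0, 1, 1]]  # hypothetical label: 1 + alpha in GF(8)
    Omega = f_omega_full(A)
    for x in product((0, 1), repeat=3):
        lhs = gf2_matvec(Omega, phi_t_times(list(x), 3))  # Omega_{i,j} v_j
        rhs = phi_t_times(gf2_matvec(A, list(x)), 3)  # Phi^T A_{i,j} x_j
        assert lhs == rhs
    print(Omega)
```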
In the following, if each $A_{i,j}$ in $\bar{H}$ is replaced by $f_\omega(\Phi, A_{i,j})$, we denote the resulting matrix by
for all i and j, Ω coincides with the parity check matrix for the extended binary representation. According to Lemma 1, we also have the following properties of Ω.
Lemma 2: 1) For all the non-zero A i,j , i ∈ {1, 2, . . . , M }, j ∈ {1, 2, . . . , N }, the column and row weights of the corresponding
Then the row weights of Ω T i are the same and equal to the weight of Λ p (i, 0). Let Λ p (0, j), j = 1, 2, . . . , N be the j th column vector of Λ p . Then the column weights of Ω c j are equal to the weight of Λ p (0, j). Degree distributions of Ω are the same as those of Λ p .
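Lemma 2's claim that $\Omega$ inherits the degree distributions of $\Lambda_p$ can be checked numerically. This self-contained Python sketch assumes, as our reading of the proof of Lemma 1, that $f_\omega(\Phi, A)$ acts as the permutation $u \mapsto A^T u$ on the non-zero vectors; it draws random full-rank labels (which, as noted later in the text, need not come from $\mathbb{F}_q$) and builds $\Omega$ from an illustrative mother matrix. Every row and column of $\Omega$ should then have the weight of the corresponding row or column of $\Lambda_p$. Function names are hypothetical.

```python
import random


def gf2_rank(rows):
    """Rank over GF(2) of rows given as int bitmasks."""
    rank, rows = 0, list(rows)
    while rows:
        pivot = rows.pop()
        if pivot:
            rank += 1
            low = pivot & -pivot
            rows = [r ^ pivot if r & low else r for r in rows]
    return rank


def random_label(p, rng):
    """Random full-rank p x p label; rows[i] is a bitmask with bit k = A[i][k]."""
    while True:
        rows = [rng.randrange(1, 1 << p) for _ in range(p)]
        if gf2_rank(rows) == p:
            return rows


def omega_block(rows, p):
    """Assumed f_omega(Phi, A): row u-1 has its 1 in column (A^T u)-1; A^T u = XOR of rows[i] with u_i = 1."""
    q1 = (1 << p) - 1
    block = [[0] * q1 for _ in range(q1)]
    for u in range(1, q1 + 1):
        w = 0
        for i in range(p):
            if (u >> i) & 1:
                w ^= rows[i]
        block[u - 1][w - 1] = 1
    return block


def build_omega(mother, p, rng):
    """Replace each 1 of the mother matrix by an Omega block and each 0 by a zero block."""
    q1 = (1 << p) - 1
    M, N = len(mother), len(mother[0])
    omega = [[0] * (N * q1) for _ in range(M * q1)]
    for i in range(M):
        for j in range(N):
            if mother[i][j]:
                blk = omega_block(random_label(p, rng), p)
                for r in range(q1):
                    omega[i * q1 + r][j * q1:(j + 1) * q1] = blk[r]
    return omega


if __name__ == "__main__":
    mother = [[1, 1, 0, 1], [0, 1, 1, 1], [1, 0, 1, 0]]  # illustrative Lambda_p
    omega = build_omega(mother, 3, random.Random(0))
    print(sorted({sum(row) for row in omega}), sorted({sum(col) for col in zip(*omega)}))
```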
Note that if $\bar{H}$ has the following form
where Ψ j , j = 1, 2, . . . , N are p × p full-rank matrices, then the parity check matrix Ω associated with Eq. (1) has the following form
(2)
According to the first item of Lemma 2, $\Omega$ in Eq. (2) is composed of $q-1$ disjoint copies of $\Lambda_p$. As a result, codes defined by such $\Omega$, including the extended binary representation, suffer a performance loss. In fact, if each $\Psi_j$ is the matrix representation of a non-binary symbol in $\mathbb{F}_q$, Eq. (1) defines the equivalent binary parity check matrix of the non-binary column-scaled LDPC (CS-LDPC) code [28] (including non-binary QC-LDPC codes and finite-geometry non-binary LDPC codes).
D. General Framework for Constructing and Optimizing the EPR-LDPC Codes
In this subsection, we present a general framework for the exhaustive search of $\Omega^e$. Since the non-zero bits in $v^e_j$ represent different combinations of the bits in $\bar{x}_j$, the parity check relationships for $v^e$ can be obtained from the parity check relationships for the corresponding combinations, and the desired $\Omega^e$ can be constructed by searching among different combinations of the parity check relationships for $v^e$. Definition 4 may suggest that we should search for $\Omega^e$ based on a given $\Psi^e$. However, to ensure good decoding performance for $\Omega^e$, we first determine the desired $\Omega^e$ and then update $\Psi^e$. That is:
1) Based on $\bar{H}$ and $\Phi$, we find and store some of the parity check relationships for $v$.
2) Using these parity check relationships, we construct different $\Omega^e$s row by row such that each new row does not introduce cycles shorter than a given length.
3) Among these $\Omega^e$s, we select one with the desired performance threshold. Then, we update $\Psi^e$ and $v^e$.
By utilizing $f_\omega$, we can model the search processes (Steps 2 and 3) as choosing proper $\tilde{\Psi}^e_j$ for all $j \in \{1, 2, \ldots, N\}$.
Then Ω e is obtained by replacing each
) and each $f_\omega(\Psi^e_{i_q j}, A_{i,j})$ corresponds to a row of $\Omega^e_{i,j}$ (some of the $\Psi^e_{i_q j}$s could be zero matrices).
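Step 2 of the framework builds candidate $\Omega^e$s row by row and rejects rows that close short cycles. A minimal Python sketch for the simplest target, no length-4 cycles (girth at least 6): a new row closes a length-4 cycle with the accepted rows iff it shares two or more 1-positions with one of them. Rows are int bitmasks and the candidate list is given; the sketch does not model how candidates are derived from $\bar{H}$ and $\Phi$, nor the threshold-based selection of Step 3. Function names are hypothetical.

```python
def creates_4_cycle(new_row, rows):
    """True if new_row shares two or more 1-positions with an already accepted row."""
    return any(bin(new_row & r).count("1") >= 2 for r in rows)


def greedy_rows(candidates):
    """Accept candidate rows (int bitmasks) in order, skipping any that would close a length-4 cycle."""
    accepted = []
    for row in candidates:
        if row and not creates_4_cycle(row, accepted):
            accepted.append(row)
    return accepted


if __name__ == "__main__":
    candidates = [0b0011, 0b0110, 0b0111, 0b1100, 0b1001]  # hypothetical candidate rows
    print([format(r, "04b") for r in greedy_rows(candidates)])
```

Here `0b0111` is rejected because it shares two positions with `0b0011`; the four accepted rows form a single length-8 cycle.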
Note that the $A_{i,j}$s need not be the matrix representations of non-binary symbols of $\mathbb{F}_q$. Moreover, we refer to $\Psi^e_j(0, 2^{i-1}) = 0$, $\forall i \in \{1, 2, \ldots, p\}$, as the trivial case of the EPR-LDPC code. If each $A_{i,j}$ is replaced by
, the resulting $\Omega^e$ coincides with the matrix $\Omega$. If $\Psi^e_j \prec \Phi$, $j = 1, 2, \ldots, N$, and $\Omega^e = \Omega$, the resulting codes form a class of punctured EPR-LDPC codes for the parity check matrix $\Omega$ (the decoding messages for the punctured bits are set to 0 in soft-decision decoders [29]). It is easy to verify that the codes in [15] form a trivial case of this punctured EPR-LDPC code for non-binary LDPC codes.
E. Bit-level Cycles in Ω
In the previous subsection, the matrix map $f_\omega$ was introduced to define the EPR-LDPC code and to formulate the exhaustive search for $\Omega^e$. In this subsection, we investigate the relation between the symbol-level cycles in $\Lambda_p$ and the bit-level cycles in $\Omega$ based on the properties of $f_\omega$. In general, we assume that $\Lambda_p$ has girth $g_h$, where $g_h = 0$ means that $\Lambda_p$ is cycle-free. Before the detailed analysis, we first define the matrix cycle.
Definition 7 (matrix cycle): Given a binary parity check matrix $\bar{H}$, let $\Lambda_p$ be its mother matrix. A matrix cycle of length $g$ exists in $\bar{H}$ iff the corresponding positions in $\Lambda_p$ form a symbol-level cycle of length $g$.
Lemma 3: If the girth of the mother matrix $\Lambda_p$ is $g_h > 0$, then the girth of the associated parity check matrix $\Omega$ of the $p$-reducible code is $g_s \ge g_h$, which is caused by a length-$g_s$ symbol-level cycle in $\Lambda_p$. If $g_h = 0$, then $g_s = 0$.
Proof:
Since each $\Omega_{i,j}$ is a $(q-1) \times (q-1)$ permutation matrix and hence cycle-free (by the first item of Lemma 2), if $\Lambda_p$ contains no cycle, $\Omega$ has no bit-level cycle. Moreover, a cycle in $\Lambda_p$ can only cause a matrix cycle of the same length in $\Omega$, and the matrix cycle only contains bit-level cycles of that length. Because $\Omega_{i,j}$ is not necessarily equal to $I_{(q-1) \times (q-1)}$, a matrix cycle does not always cause bit-level cycles of the same length. Thus, the girth of the binary parity check matrix $\Omega$ is not less than the girth of its mother matrix $\Lambda_p$.
For a non-binary parity check matrix $H$, if $H$ satisfies the cycle-free condition, its associated $\Omega$ also satisfies the cycle-free condition. A bit-level cycle in $\Omega$ is caused by a symbol-level cycle of the same length in $H$. Moreover, when $H$ contains cycles, investigations indicate that the length-4 cycles contribute the most to the performance degradation. Next, we show that a length-4 symbol-level cycle in $H$ does not always result in length-4 bit-level cycles in $\Omega$.
Theorem 4: Let the non-zero matrix labels be uniformly taken from $\mathbb{F}_q^*$, and denote by $p_4$ the probability that a length-4 symbol-level cycle in the non-binary parity check matrix $H$ results in length-4 bit-level cycles in $\Omega$ based on $f_\omega$. Then
$$p_4 = \frac{1}{q-1}.$$
Proof: Since length-4 bit-level cycles are only caused by length-4 symbol-level cycles, we only consider the bit-level cycles within a symbol-level cycle. Let $(i_1, j_1)$, $(i_1, j_2)$, $(i_2, j_1)$, $(i_2, j_2)$ be the coordinates of the four entries that form a length-4 symbol-level cycle in $H$. Let
be the matrix cycle corresponding to this length-4 symbol-level cycle. Let $\alpha_1, \beta_1, \alpha_2, \beta_2 \in \{1, 2, \ldots, q-1\}$ denote the column indices of the non-zero entries in $\Omega_{i_1,j_1}$, $\Omega_{i_1,j_2}$, $\Omega_{i_2,j_1}$ and $\Omega_{i_2,j_2}$, respectively, with $\alpha_1, \beta_1$ in the same row and $\alpha_2, \beta_2$ in the same row. Let $S_1 = \{(\alpha_1, \beta_1) : \alpha_1, \beta_1 \in \{1, 2, \ldots, q-1\}\}$ and $S_2 = \{(\alpha_2, \beta_2) : \alpha_2, \beta_2 \in \{1, 2, \ldots, q-1\}\}$ be the two-tuple sets containing all the different rows of $(\Omega_{i_1,j_1}, \Omega_{i_1,j_2})$ and $(\Omega_{i_2,j_1}, \Omega_{i_2,j_2})$, respectively. Then,
Let $\mathcal{S}$ be the set containing all the rows that could be involved in length-4 matrix cycles, i.e., $\mathcal{S} = \{(\alpha, \beta) : \alpha, \beta = 1, 2, \ldots, q-1\}$, with $|\mathcal{S}| = (q-1)^2$ and $S_1, S_2 \subset \mathcal{S}$. A length-4 bit-level cycle exists iff $S_1 \cap S_2 \neq \emptyset$, so $p_4 = \Pr(S_1 \cap S_2 \neq \emptyset) = 1 - \Pr(S_1 \cap S_2 = \emptyset)$. We can calculate $\Pr(S_1 \cap S_2 = \emptyset)$ by counting the choices of $S_1$ and $S_2$ over $\mathcal{S}$. Since there are $q-1$ different non-zero $\Omega_{i,j}$s, different $\Omega_{i,j}$s place the same row vectors in different rows, and no two different $S_i$s have common elements, the different $S_i$s divide $\mathcal{S}$ into $q-1$ disjoint subsets. Because each $S_i$ is chosen uniformly, for a given $S_1$ there are $q-2$ choices of $S_2$ that do not form cycles. As a result, $\Pr(S_1 \cap S_2 = \emptyset) = \frac{(q-1)(q-2)}{(q-1)^2} = \frac{q-2}{q-1}$, and hence $p_4 = \frac{1}{q-1}$.
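The value $1 - \frac{(q-1)(q-2)}{(q-1)^2} = \frac{1}{q-1}$ can be checked by brute force for small fields. The Python sketch below assumes that the labels are the multiplication-by-$h$ matrices of $\mathbb{F}_{2^p}$ in the polynomial basis (primitive polynomials $x^2 + x + 1$ and $x^3 + x + 1$) and that each $\Omega_{i,j}$ acts as the permutation $u \mapsto A^T u$ of the non-zero vectors (our reading of the proof of Lemma 1). It enumerates all $(q-1)^4$ label choices for one length-4 symbol-level cycle and counts those whose $2(q-1) \times 2(q-1)$ matrix cycle contains a length-4 bit-level cycle, i.e., a top row and a bottom row sharing two columns. Function names are hypothetical.

```python
from fractions import Fraction
from itertools import product


def gf_mul(a, b, p, poly):
    """Multiply a and b in GF(2^p); poly is the primitive polynomial including x^p."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if a & (1 << p):
            a ^= poly
    return result


def omega_perm(h, p, poly):
    """Assumed Omega_{i,j} for label h: row u has its single 1 in column A_h^T u."""
    cols = [gf_mul(h, 1 << k, p, poly) for k in range(p)]  # column k of A_h
    return {
        u: sum((bin(cols[k] & u).count("1") & 1) << k for k in range(p))
        for u in range(1, 1 << p)
    }


def has_bit_level_4_cycle(p11, p12, p21, p22):
    """Does [[O11, O12], [O21, O22]] contain a top row and a bottom row sharing two columns?"""
    inv21 = {col: row for row, col in p21.items()}
    return any(p12[u] == p22[inv21[p11[u]]] for u in p11)


def exact_p4(p, poly):
    """Fraction of all (q-1)^4 label choices whose matrix cycle has a length-4 bit-level cycle."""
    q = 1 << p
    perms = [omega_perm(h, p, poly) for h in range(1, q)]
    hits = sum(has_bit_level_4_cycle(*quad) for quad in product(perms, repeat=4))
    return Fraction(hits, (q - 1) ** 4)


if __name__ == "__main__":
    print(exact_p4(2, 0b111), exact_p4(3, 0b1011))
```

Under these assumptions the enumeration gives exactly $1/3$ for $q = 4$ and $1/7$ for $q = 8$, matching $1/(q-1)$: a shared row pair exists iff $h_{i_1 j_1} h_{i_2 j_2} = h_{i_1 j_2} h_{i_2 j_1}$, which fixes one of the four labels.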
From Theorem 4, as the field size increases, the probability that a length-4 symbol-level cycle in $H$ results in length-4 bit-level cycles in $\Omega$ decreases as $1/(q-1)$.
Corollary 5: For any $p$-reducible code, let the matrix labels be chosen uniformly from a set $\{B_g, g = 1, 2, \ldots, Q\}$. If there exists an integer $P \le Q$ such that $\mathrm{rank}(f_\omega(\Phi, B_{g_i}) + f_\omega(\Phi, B_{g_j})) = q-1$ for all $i \neq j$, $i, j \in \{1, 2, \ldots, P\}$, then the probability $p'_4$ that a length-4 symbol-level cycle in $\Lambda_p$ results in length-4 bit-level cycles in $\Omega$ satisfies
and $P \le q-1$ for $q = 2^p \ge 4$. When $P = 1$, $p'_4 = 1$. Proof: The $P$ matrix labels result in at most $q-1$ disjoint subsets of $\mathcal{S}$, so $P \le q-1$. The proof of the above inequality, which depends on the value of $Q - P$, is similar to the proof of Theorem 4.
According to Corollary 5, $p'_4$ can be minimized by enlarging $q$ and minimizing $Q - P$. Now consider a short matrix cycle of length $g_c > 4$. Based on the proof of Theorem 4, we expect that the probability that the corresponding bit-level cycles exist is also small and depends on both $q$ and $g_c$. We also make the following observation for short cycles of length larger than 4.
Observation 1: 1) For a $p$-reducible code in Corollary 5, the probability that a symbol-level cycle of length $g_c$ in $H$ causes corresponding bit-level cycles in $\Omega$ is larger than $\frac{1}{q-1}$.
2) This probability increases as the length of the symbol-level cycle increases and decreases as $q = 2^p$ increases.
F. Design of EPR-LDPC Codes according to Ω
In this subsection, we show how to efficiently find a parity check matrix $\Omega^e$ with a prescribed girth. The resulting $\Omega^e$ can also be seen as a significant generalization of $\Omega$ for both binary and non-binary LDPC codes. Compared with $\Omega$, $\Omega^e$ has fewer bit-level cycles. In addition, unlike $\Omega$ in Eq. (2), $\Omega^e$ is not composed of disjoint sub-matrices. For any girth-optimized $p$-reducible code, $\Omega$ can be constructed to further improve the decoding performance. First, according to Observation 1, if $q$ is large enough, $\Omega$ can in many cases be constructed such that short bit-level cycles are largely avoided. In these cases, we can always avoid further short bit-level cycles by simply changing the associated matrix labels $A_{i,j}$ in $\bar{H}$ and searching for proper $\Omega_{i,j}$s that do not form cycles. However, for some $p$-reducible codes, e.g., the codes in Corollary 5 with $P = 1$ and the non-binary CS-LDPC codes, short symbol-level cycles always cause corresponding short bit-level cycles in $\Omega$. In these cases, if $\Lambda_p$ contains some length-4 symbol-level cycles, we can still avoid a few length-4 bit-level cycles in $\Omega$: if the weight of one row is 3, we can add the row of smaller weight to the row of larger weight such that the resulting row does not have weight 1. However, this row addition operation can only handle a limited number of length-4 bit-level cycles. Cycles longer than 4 are hard to avoid without altering the structure of $\Omega$. In addition, $\Omega$ in Eq. (2) is composed of disjoint sub-matrices, which causes a performance loss. Therefore, we need another representation of $\bar{H}$ that generally results in good performance.
In fact, since $\Omega$ already avoids a large number of the corresponding short bit-level cycles in many cases and the remaining cycles in $\Omega$ can be handled carefully, we can find the desired representation, i.e., $\Omega^e$, more efficiently from $\Omega$ than by searching among numerous parity check combinations. The details are given below.
Step 1: Let $q = 2^p$ with $p > 1$. We construct a binary matrix set $\{B'_1, B'_2, B'_3, \ldots\}$, where each $B'_i$ is a cycle-free $2 \times (q-1)$ or $2 \times 2(q-1)$ matrix. In addition, $B'_i v_j = 0$ for all $i, j$, or $(B'_i(0,1), \ldots, B'_i(0,q-1)) v_j = 0$ and $(B'_i(0,q), \ldots, B'_i(0,2q-2)) v_j = 0$ for all $i, j$.
Step 2: Given a parity check matrix $\bar{H}$ as in Definition 1 with mother matrix $\Lambda_p$, we construct $\Omega$ by using $f_\omega$. If $\Lambda_p$ contains length-4 cycles, we use the row addition operation to eliminate some length-4 bit-level cycles.
Step 3: Let $g_s$ be an even number. For the matrix cycles of length less than $g_s$ that result in bit-level cycles in $\Omega$, we set the rows across these matrix cycles to zero vectors and move these zero rows to the lower part of the resulting matrix. Then, as illustrated in Fig. 2, we place, one by one, the $B'_i$s that do not cause bit-level cycles of length less than $g_s$ within these zero rows (at non-overlapping column positions). The resulting matrix is denoted by $\Omega^e$. Note that, for a practical $p$-reducible LDPC code, the row addition operation in Step 2 can be omitted, since the length-4 cycles in $\Lambda_p$ are in general eliminated; we then only need the row replacement operation in Step 3 to handle the bit-level cycles. Moreover, the codes defined by $\Omega^e$ and $\Omega$ tend to have larger code lengths and code spaces than their associated binary LDPC codes. How to obtain the correct $\bar{x}$ and avoid undetected errors for $v^e$, apart from enlarging the girth of $\Omega^e$, is addressed in the next section.
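Step 3 accepts a $B'_i$ only if it creates no bit-level cycle shorter than $g_s$. One way to test a tentative placement, sketched below in Python under the assumption that the candidate matrices are small enough to check exhaustively, is to recompute the girth of the Tanner graph by breadth-first search from every node; following the convention of Section III-E, a cycle-free matrix is reported as girth 0. The helper names are hypothetical.

```python
from collections import deque


def tanner_girth(H):
    """Girth of the Tanner graph of binary matrix H; 0 if it is cycle-free (Section III-E convention)."""
    m, n = len(H), len(H[0])
    adj = [[] for _ in range(m + n)]  # check nodes 0..m-1, bit nodes m..m+n-1
    for i in range(m):
        for j in range(n):
            if H[i][j]:
                adj[i].append(m + j)
                adj[m + j].append(i)
    best = None
    for root in range(m + n):  # BFS from every node; the minimum over all roots is exact
        dist, parent = {root: 0}, {root: None}
        queue = deque([root])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in dist:
                    dist[w], parent[w] = dist[u] + 1, u
                    queue.append(w)
                elif w != parent[u]:  # non-tree edge: a cycle of length <= dist[u] + dist[w] + 1
                    length = dist[u] + dist[w] + 1
                    if best is None or length < best:
                        best = length
    return 0 if best is None else best


def placement_ok(H, g_s):
    """Accept a tentative matrix only if it has no cycle shorter than g_s."""
    g = tanner_girth(H)
    return g == 0 or g >= g_s


if __name__ == "__main__":
    print(tanner_girth([[1, 1, 0], [0, 1, 1], [1, 0, 1]]))  # a single length-6 cycle
```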
IV. BIT-LEVEL DECODERS FOR THE EPR-LDPC CODES
A. General Sum-Product Decoding
Let $C$ be the non-binary LDPC code, $\bar{C}$ the equivalent binary LDPC code, $C^e$ the EPR-LDPC code, and $C^e_j$ the binary LDPC code generated by $\Psi^e_j$. Obviously, the following isomorphism exists
The above equation implies that, to obtain enhanced decoding performance for $\bar{C}$, we should decode $v^e$ by utilizing the parity check relationships of $C^e$ and $C^e_1 \times C^e_2 \times \cdots \times C^e_N$ simultaneously. If $\Psi^e_j = \Phi$, $j = 1, 2, \ldots, N$, Eq. (4) represents the isomorphism between the (punctured) extended binary representation and its non-binary counterpart [14], [15]. Decoding applications of the extended binary representation over general channel models are given in [27]. The above isomorphism also exists for an arbitrary $p$-reducible code. To obtain the best decoding results for $\bar{C}$, each $\mathrm{wt}(\Psi^e_j)$ should be large enough (so that more parity check bits are involved). On the other hand, a large $\mathrm{wt}(\Psi^e_j)$ generally results in higher decoding complexity. Thus, there is a trade-off between the choice of $\Psi^e_j$, $j = 1, 2, \ldots, N$, and the decoding performance. To obtain optimized $\Psi^e_j$s, we need to maximize each $\mathrm{wt}(\Psi^e_j)$ while minimizing the probability that short bit-level cycles exist. Next, we first show how to obtain $\bar{C}$. Then we optimize the extended generator matrix set.
Assume that $\bar{x} = (\bar{x}_1^T, \bar{x}_2^T, \ldots, \bar{x}_N^T)^T$ is transmitted over a binary input channel, and let $\bar{y} = (\bar{y}_1^T, \bar{y}_2^T, \ldots, \bar{y}_N^T)^T$ be the received sequence. The proposed decoders are binary decoders that make decisions on both $\bar{x}$ and $v^e$. In the following, we first give the general sum-product decoding procedure for the proposed decoders, and then develop two variants for different channel models.
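Fig. 3. General decoding procedure for the proposed binary decoders (bit nodes $\bar{x}$, extended bit nodes $v^e$, and constraint nodes $c^e$, connected through $\Psi^e$ and $\Omega^e$).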
As shown in Fig. 3, the bits in $\bar{x}$ are represented as bit nodes, the bits in $v^e$ as extended bit nodes, and every row of $\Omega^e$ as a constraint node in $c^e$. The general decoding procedure is described as follows.
Step 1: Let $\mu^{(0)}_{\bar{x}}$ be the message from the channel. The message $\mu$ is further tailored based on the generator matrix set $\Psi^e$.
Step 5: At iteration $h$, if the hard decision $\hat{v}^e$ of $v^e$ satisfies $\Omega^e \hat{v}^e = 0$, we obtain $\bar{x}$ from $v^e$ according to $\Psi^e$. Since the decoding process is binary and the zero columns in each $\Psi^e_j$ and in $\Omega^e$ can be removed, the computational complexity of the check-vector-sum operation grows linearly with the number of non-zero columns in $\Omega^e_i$, $i = 1, 2, \ldots, M$, and the complexity of tailoring $\mu^{(l+1)}_v$ grows linearly with the number of non-zero columns in $\Psi^e_j$, $j = 1, 2, \ldots, N$. Let the maximum number of non-zero columns in each $\Omega^e_i$ be $\phi^e \le q - 1$ and the maximum number of non-zero columns in each $\Psi^e_j$ be $\psi^e \le q - 1$. Then the computational complexity is dominated by $O(m_s)$, where $m_s = \max\{\phi^e, \psi^e\}$.
In the general decoding procedure, once the decoding of $v^e$ over $\Omega^e$ is accomplished, every $\bar{x}_j$ has to be recovered from $v^e_j$. To guarantee that $\bar{x}_j$ can be successfully resolved from $v^e_j$, we give the following conditions on the extended generator matrices.
Theorem 6: Consider the p-reducible codes. For all $j \in \{1, 2, \ldots, N\}$ and $q = 2^p \ge 4$: 1) if $\mathrm{wt}(\Psi^e_j) > q/2 - 1$, every bit in $\bar{x}_j$ can be resolved from $v^e_j$; 2) if $\mathrm{wt}(\Psi^e_j) = q/2 - 1$, $\bar{x}_j$ can be resolved with probability $1 - (q-1)/\binom{q-1}{q/2-1}$.
Proof:
Recall that $\Phi$ is a $p \times (q-1)$ matrix and … $\Psi^e_j(0,1), \Psi^e_j(0,2), \ldots, \Psi^e_j(0,q-1)\}$ be the set formed by the column vectors of $\Psi^e_j$. Then $\mathrm{wt}(\Psi^e_j) = |V^e_j| - 1$. Let $V' = \{\Phi(0,1), \Phi(0,2), \ldots, \Phi(0,2^{p-1})\}$ be the set of all unit vectors. Then the non-zero vectors in $V$ and $V^e_j$ can be expressed as sums of the vectors in $V'$.
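If $|V^e_j|$ is larger than the size of a $(p-1)$-dimensional subspace of $V$, then $\mathrm{rank}(\Psi^e_j) = p$ and every bit in $\bar{x}_j$ can be resolved. The size of a $(p-1)$-dimensional subspace can be calculated by $\sum_{i=0}^{p-1}\binom{p-1}{i} = 2^{p-1} = q/2$, so this condition is equivalent to $\mathrm{wt}(\Psi^e_j) > q/2 - 1$. If $|V^e_j| = \sum_{i=0}^{p-1}\binom{p-1}{i}$, i.e., $\mathrm{wt}(\Psi^e_j) = q/2 - 1$, the rank of $\Psi^e_j$ is either $p$ or $p-1$, because the span of $q/2 - 1$ distinct non-zero vectors contains at least $q/2$ elements. Then the probability that $\bar{x}_j$ can be resolved equals the probability that the non-zero vectors in $\Psi^e_j$ do not form a $(p-1)$-dimensional subspace, which depends on the number of $(p-1)$-dimensional subspaces. To calculate the number of $(p-1)$-dimensional subspaces of $V$, we first introduce the Gaussian binomial coefficient, which counts the $k$-dimensional subspaces of $\mathbb{F}_2^n$: $\binom{n}{k}_2 = \prod_{i=0}^{k-1}\frac{2^{n-i}-1}{2^{i+1}-1}$.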
Then the number of $(p-1)$-dimensional subspaces over $\mathbb{F}_2$ is $\binom{p}{p-1}_2 = 2^p - 1 = q - 1$.
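Hence, the probability that $\bar{x}_j$ can be resolved when $\mathrm{wt}(\Psi^e_j) = q/2 - 1$, with the $q/2 - 1$ distinct non-zero columns chosen uniformly at random, is $1 - (q-1)/\binom{q-1}{q/2-1}$.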
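The counting argument can be checked by brute force for small $p$. The sketch below (helper names `gf2_rank`, `resolvable_fraction`, and `closed_form` are ours, not the paper's) assumes that every set of $q/2 - 1$ distinct non-zero columns in $\mathbb{F}_2^p$ is equally likely, counts the sets of full rank $p$, and compares the fraction with the closed form $1 - (q-1)/\binom{q-1}{q/2-1}$, i.e., one minus the number $q - 1$ of $(p-1)$-dimensional subspaces divided by the number of candidate column sets.

```python
from fractions import Fraction
from itertools import combinations
from math import comb


def gf2_rank(vectors):
    """Rank over GF(2) of vectors encoded as integer bit masks."""
    basis = []
    for v in vectors:
        for b in basis:
            # XOR with b clears b's leading bit from v whenever it is set
            v = min(v, v ^ b)
        if v:
            basis.append(v)
    return len(basis)


def resolvable_fraction(p):
    """Fraction of (q/2 - 1)-subsets of non-zero vectors of F_2^p with rank p."""
    q = 2 ** p
    subsets = list(combinations(range(1, q), q // 2 - 1))
    full = sum(1 for s in subsets if gf2_rank(s) == p)
    return Fraction(full, len(subsets))


def closed_form(p):
    """1 - (q - 1 hyperplanes) / (number of candidate column sets)."""
    q = 2 ** p
    return 1 - Fraction(q - 1, comb(q - 1, q // 2 - 1))


for p in (2, 3, 4):
    print(p, resolvable_fraction(p), closed_form(p))  # the two columns agree
```

For $p = 3$ ($q = 8$), 28 of the 35 candidate sets have full rank, giving probability $4/5$.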
In addition, if $\bar{x}_j$ can be resolved, $\mathrm{wt}(\Psi^e_j)$ is at least the size of a basis of the $p$-dimensional space $\mathbb{F}_2^p$, i.e., $\mathrm{wt}(\Psi^e_j) \ge \log_2 q$, $j = 1, 2, \ldots, N$. Thus, the least number of non-zero columns required for each $\Psi^e_j$ is $p$. However, if $\mathrm{wt}(\Psi^e_j) = p$ for all $j$, the desired parity check matrix $\Omega^e$ may not exist. Theorem 6 provides sufficient conditions for the successful decoding of $\bar{x}$. Moreover, a larger $\mathrm{wt}(\Psi^e_j)$ generally results in better decoding performance. For the punctured EPR-LDPC code decoded over the parity check matrix $\Omega$, if all the punctured bits are recovered, $\Phi$ can be used instead of $\Psi^e_j$ to resolve $\bar{x}_j$.
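Resolving $\bar{x}_j$ from $v^e_j$ amounts to solving a linear system over $\mathbb{F}_2$. The sketch below assumes that $v^e_j = \Psi^{eT}_j \bar{x}_j$ (the same map used to initialize the extended bit nodes in Example 2), so each extended bit is the parity of $\bar{x}_j$ on one column of $\Psi^e_j$. The helpers `extend`, `solve_gf2`, and `transpose` are hypothetical; Gauss-Jordan elimination returns a unique $\bar{x}_j$ exactly when $\mathrm{rank}(\Psi^e_j) = p$.

```python
def extend(Psi, x):
    """Extended bits v = Psi^T x (mod 2); Psi is a p x w list of 0/1 rows."""
    p, w = len(Psi), len(Psi[0])
    return [sum(Psi[i][c] & x[i] for i in range(p)) % 2 for c in range(w)]


def solve_gf2(A, b):
    """Solve A x = b over GF(2) by Gauss-Jordan elimination.

    A is an m x p list of 0/1 rows and b a list of m bits. Returns the unique
    solution, or None if rank(A) < p or the system is inconsistent.
    """
    m, p = len(A), len(A[0])
    rows = [list(row) + [bit] for row, bit in zip(A, b)]  # augmented matrix
    r = 0
    for c in range(p):
        pivot = next((i for i in range(r, m) if rows[i][c]), None)
        if pivot is None:
            return None  # rank < p: x is not uniquely determined
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(m):
            if i != r and rows[i][c]:
                rows[i] = [a ^ e for a, e in zip(rows[i], rows[r])]
        r += 1
    if any(row[p] for row in rows[r:]):
        return None  # inconsistent: v is not of the form Psi^T x
    return [rows[i][p] for i in range(p)]


def transpose(M):
    return [list(col) for col in zip(*M)]


# p = 3 with five distinct non-zero columns: wt = 5 > q/2 - 1 = 3, so rank is 3
Psi = [[1, 0, 1, 0, 1],
       [0, 1, 1, 0, 1],
       [0, 0, 0, 1, 1]]
x = [1, 0, 1]
v = extend(Psi, x)
print(v, solve_gf2(transpose(Psi), v))  # [1, 0, 1, 1, 0] [1, 0, 1]
```

If the columns of $\Psi^e_j$ only span a $(p-1)$-dimensional subspace, `solve_gf2` returns `None`, matching the failure case of Theorem 6.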
B. Decoding over Different Channels
Next, we apply the general decoding procedure to the binary symmetric channel (BSC).
Example 2: In this example, we present an extended iterative hard-decision decoder for the BSC. Let ⊞ denote bit-wise addition in the vector space over $\mathbb{F}_2$. Then, for the simplex code [30]
. As a result, the iterative decoding procedure is described as follows.
Step 1: Let $\hat{v}^e$ be the message for the extended bit nodes, initialized by $\hat{v}^e_j = \Psi^{eT}_j \bar{y}_j$, $j = 1, 2, \ldots, N$, and let $b$ be the bit-flipping threshold.
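Step 2: If $z = \Omega^e \hat{v}^e = 0$, then $v^e = \hat{v}^e$. Otherwise, compute $s = z^T \Omega^e = (s_j)_{1 \times N}$, where the multiplication is carried out over the integers (decimal multiplication) rather than modulo 2. For all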
Step 3: Stop the procedure when $\Omega^e \hat{v}^e = 0$ or the maximum number of iterations is reached. Then, for the trivial case, $\bar{x}_j = (v^e_j(1), v^e_j(2), \ldots, v^e_j(2^{p-1}))^T$. Below, we show how to apply the BP algorithm to the decoding of the EPR-LDPC code over the binary input Gaussian channel.
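The flipping rule in Step 2 of Example 2 is truncated in this version of the text. As an illustration only, the sketch below assumes a common threshold rule in which every bit whose count $s_j$ of unsatisfied checks (the integer product $z^T \Omega^e$) is at least $b$ is flipped. The function `bit_flip_decode` and the toy matrix (the incidence matrix of the complete graph $K_4$, whose Tanner graph has no length-4 cycles) are our own choices, not the authors' construction.

```python
def syndrome(H, v):
    """z = H v (mod 2) for a list-of-rows binary matrix H."""
    return [sum(h & x for h, x in zip(row, v)) % 2 for row in H]


def bit_flip_decode(H, v_hat, b, max_iter=50):
    """Threshold bit-flipping decoding (flip rule assumed, see lead-in).

    Returns (decoded bits, True if all checks are satisfied).
    """
    v = list(v_hat)
    for _ in range(max_iter):
        z = syndrome(H, v)
        if not any(z):
            return v, True
        # s = z^T H with integer arithmetic: unsatisfied checks touching each bit
        s = [sum(z[i] * H[i][j] for i in range(len(H))) for j in range(len(v))]
        flips = [j for j, count in enumerate(s) if count >= b]
        if not flips:
            break  # no bit reaches the threshold: give up
        for j in flips:
            v[j] ^= 1
    return v, not any(syndrome(H, v))


# Incidence matrix of K4: rows = vertices, columns = edges (01, 02, 03, 12, 13, 23)
H = [[1, 1, 1, 0, 0, 0],
     [1, 0, 0, 1, 1, 0],
     [0, 1, 0, 1, 0, 1],
     [0, 0, 1, 0, 1, 1]]
codeword = [1, 1, 0, 1, 0, 0]  # triangle 0-1-2 satisfies every check
received = [1, 1, 0, 1, 0, 1]  # one bit flipped by the BSC
print(bit_flip_decode(H, received, b=2))  # ([1, 1, 0, 1, 0, 0], True)
```

With column weight 2 and no length-4 cycles, a single error is the only bit with $s_j = 2$, so the threshold $b = 2$ isolates it.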
Example 3: We give a hybrid parallel decoder for the EPR-LDPC codes that combines the BP decoder with the hard-decision decoder of Example 2. The BP decoder and the hard-decision decoder exchange decoding messages iteratively, and a decoding round is finished once the two decoders have exchanged information once. A $(\mu, \nu)$ decoding round is a decoding round in which the BP decoder performs $\mu$ decoding iterations and the hard-decision decoder performs $\nu$ decoding iterations. Unlike Example 2 and the general decoding procedure, where $\bar{x}$ is transmitted, we choose to transmit $v^e$. Assume BPSK is used, and let $y^e$ be the received sequence. The decoding process is described below.
Step 1: Initialize the message for the $v$th extended bit node by $\mu^{(0)}_{v,c} = \frac{2}{\sigma^2} y^e(v)$ and the message for the $c$th constraint node by $\omega$
where $N_c$ is the set of extended bit nodes connected to the $c$th constraint node.
where $M_v$ is the set of constraint nodes connected to the $v$th extended bit node.
Step 4: At iteration $\mu$ of a $(\mu, \nu)$ decoding round, let the hard decision be $\hat{v}^e$. We then apply the hard-decision decoder of Example 2 $\nu$ times. If $\hat{v}^e(v) = 1$, $\mu$ … $|\mu_{v,c}|$. Then, go to Step 2.
Step 5: At iteration $h\mu\nu$, if the hard decision $\hat{v}^e$ satisfies $\Omega^e \hat{v}^e = 0$, then $v^e = \hat{v}^e$ and, for the trivial case, $\bar{x}_j = (v^e_j(1), v^e_j(2), \ldots, v^e_j(2^{p-1}))^T$. If $\bar{x}$ is transmitted in Example 3, the messages for the extended bit nodes can be initialized as follows. First, let $v^e(v) = \sum_{i' \in S_v} \bar{x}(i')$ (summation over $\mathbb{F}_2$), where $S_v$ is the set of all bit nodes connected to the $v$th extended bit node. Let $\mathrm{hard}(\cdot)$ be the hard-decision function, so that $\mathrm{hard}(v^e(v)) = \sum_{i' \in S_v} \mathrm{hard}(\bar{x}(i'))$. Then the initialized messages for the extended bit node $v^e(v)$ are
The decoding procedure is the same as in Example 3. Each bit in $v^e$ is then transmitted over a copied Gaussian channel.
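The update equations of Steps 2 and 3 in Example 3 are missing from this version of the text. Assuming they are the standard flooding sum-product (BP) updates with the tanh rule, the sketch below shows only the BP part (not the interleaved hard-decision iterations of Step 4): channel LLRs $\frac{2}{\sigma^2} y^e(v)$ under BPSK with bit 0 mapped to $+1$, check-to-bit messages over $N_c$, and bit-to-check messages over $M_v$. The helper `xor_llr` gives the LLR of a modulo-2 sum of independent bits; as a further assumption, since the exact initialization formula is not given, `init_from_bits` uses it to initialize the extended bit nodes when $\bar{x}$ is transmitted, which is consistent with $\mathrm{hard}(v^e(v)) = \sum_{i' \in S_v} \mathrm{hard}(\bar{x}(i'))$. All names are hypothetical.

```python
import math


def xor_llr(llrs):
    """LLR of the modulo-2 sum of independent bits (tanh rule)."""
    prod = 1.0
    for llr in llrs:
        prod *= math.tanh(llr / 2.0)
    prod = max(min(prod, 1 - 1e-12), -1 + 1e-12)  # keep atanh finite
    return 2.0 * math.atanh(prod)


def init_from_bits(x_llrs, S):
    """Initial LLR of each extended bit v from the LLRs of the bits in S[v] (= S_v)."""
    return [xor_llr([x_llrs[i] for i in S_v]) for S_v in S]


def bp_decode(H, channel_llrs, max_iter=50):
    """Flooding sum-product decoding on binary H; returns (hard decisions, success)."""
    M, n = len(H), len(H[0])
    N = [[v for v in range(n) if H[c][v]] for c in range(M)]   # N_c
    Mv = [[c for c in range(M) if H[c][v]] for v in range(n)]  # M_v
    mu = {(v, c): channel_llrs[v] for v in range(n) for c in Mv[v]}
    v_hat = [1 if llr < 0 else 0 for llr in channel_llrs]
    for _ in range(max_iter):
        # check-to-bit messages: tanh rule over N_c excluding the target bit
        omega = {(c, v): xor_llr([mu[(u, c)] for u in N[c] if u != v])
                 for c in range(M) for v in N[c]}
        total = [channel_llrs[v] + sum(omega[(c, v)] for c in Mv[v]) for v in range(n)]
        v_hat = [1 if t < 0 else 0 for t in total]
        if all(sum(v_hat[v] for v in N[c]) % 2 == 0 for c in range(M)):
            return v_hat, True
        # bit-to-check messages: extrinsic sum over M_v excluding the target check
        for v in range(n):
            for c in Mv[v]:
                mu[(v, c)] = total[v] - omega[(c, v)]
    return v_hat, False


sigma = 0.8
y = [1.0, 0.9, 1.1, -0.2, 1.0, 0.8]  # all-zero word sent, bit 3 received with wrong sign
H = [[1, 1, 1, 0, 0, 0],
     [1, 0, 0, 1, 1, 0],
     [0, 1, 0, 1, 0, 1],
     [0, 0, 1, 0, 1, 1]]
llrs = [2.0 * yv / sigma ** 2 for yv in y]
print(bp_decode(H, llrs))  # ([0, 0, 0, 0, 0, 0], True)
```

Reusing one box-plus routine for both the check-node update and the initialization keeps the soft messages and the hard-decision relation above consistent.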
To evaluate the performance of an EPR-LDPC code ensemble over copied Gaussian channels, we use the Monte-Carlo method for infinitely long LDPC codes introduced in [2]. That is, by simulating an infinitely long EPR-LDPC code from the code ensemble, we evaluate the performance in terms of the minimum signal-to-noise ratio (MSNR), denoted $T_b$, at which the average syndrome bit entropy reaches a given value after a given number of decoding iterations.
C. Code Optimization for The Binary Decoders
For the mother matrix $\Lambda_p$, the optimization of the matrix labels for non-binary decoders has been studied in [2], [12], where several optimization methods based on the equivalent binary LDPC codes are proposed. The degree distributions of the resulting $\bar{H}$ can be efficiently calculated according to [20]. For the EPR-LDPC code, we can optimize the matrix labels according to Corollary 5. For different associated p-reducible codes, the optimized matrix labels for the same $p$ can differ considerably, because different p-reducible codes use matrix label sets with different structures. Moreover, optimizing the matrix labels alone is not enough to obtain enhanced decoding performance for EPR-LDPC codes; the parity check matrix $\Omega^e$ and the extended generator matrix set $\Psi^e$ must also be constructed carefully. In the following, we present a simple algorithm based on the proposed framework to achieve this goal: after the matrix labels are optimized, we guarantee that each $\mathrm{wt}(\Psi^e_j)$ is large enough and optimize the girth and degree distributions of $\Omega^e$. $\Lambda_p$ is assumed to be constructed by the modified progressive-edge-growth (PEG) algorithm, although it can also be constructed by other random methods or with specific structures. Depending on the purpose, we can either fix the code length and change $\Lambda_p$, or fix $\Lambda_p$ and change the code length. The details are as follows.
Step 1: The binary parity check matrix $\bar{H}$ is obtained by filling $\Lambda_p$ with the optimized $p \times p$ matrix labels according to Corollary 5. Let $\psi > q/2 - 1$ and $\phi > 0$ be two integers, and let $T_b$ be the MSNR in dB. We search for $\Omega^e$ with the generator matrix set $\Psi^e$ and the associated $\bar{\Psi}^e_j$, $j = 1, 2, \ldots, N$, satisfying $\mathrm{wt}(\Psi^e_j) \ge \psi$, $j = 1, 2, \ldots, N$, and $\mathrm{wt}(\Psi^e_{ij}) \ge \phi$, $i = 1, 2, \ldots, M$, such that the MSNR of the resulting degree distributions does not exceed $T_b$ (for short block length codes, we skip the MSNR examination).
Step 2: For an even number $g_s$, we check whether the girth of $\Omega^e$ is at least $g_s$. If the girth of $\Omega^e$ is smaller than $g_s$, set $p = p + 1$ and go to Step 1.
Step 3: When $p$ is large enough, we may change some of the $\Psi^e_{i_q j}$'s that generate the rows across the associated matrix cycles to eliminate further bit-level cycles. Then we update $\bar{\Psi}^e_j$, $j = 1, 2, \ldots, N$, and the associated $\Psi^e$.
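Step 2 of the algorithm needs the girth of $\Omega^e$. A minimal sketch, assuming $\Omega^e$ is given as a dense 0/1 list of rows: run a breadth-first search from every node of the Tanner graph; a non-tree edge $(u, w)$ met during the search from a source closes a cycle of length at most $d(u) + d(w) + 1$, and minimizing over all sources gives the girth exactly (the graph is bipartite, so all cycles are even). The names `tanner_girth` and `girth_ok` are hypothetical, and this brute-force search, roughly $O(V \cdot E)$, is one straightforward choice rather than necessarily the authors' method.

```python
import math
from collections import deque


def tanner_girth(H):
    """Girth of the Tanner graph of binary matrix H (math.inf if it has no cycle)."""
    M, n = len(H), len(H[0])
    adj = [[] for _ in range(n + M)]  # nodes 0..n-1: bits, n..n+M-1: checks
    for c in range(M):
        for v in range(n):
            if H[c][v]:
                adj[v].append(n + c)
                adj[n + c].append(v)
    girth = math.inf
    for s in range(n + M):
        dist, parent = {s: 0}, {s: None}
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in dist:
                    dist[w], parent[w] = dist[u] + 1, u
                    queue.append(w)
                elif parent[u] != w:
                    # non-tree edge: closes a cycle through the BFS tree of s
                    girth = min(girth, dist[u] + dist[w] + 1)
    return girth


def girth_ok(H, g_s):
    """Step 2 test: is the girth at least the even target g_s?"""
    return tanner_girth(H) >= g_s


# Incidence matrix of K4: its triangles become length-6 cycles in the Tanner graph
K4 = [[1, 1, 1, 0, 0, 0],
      [1, 0, 0, 1, 1, 0],
      [0, 1, 0, 1, 0, 1],
      [0, 0, 1, 0, 1, 1]]
print(tanner_girth(K4), girth_ok(K4, 6))  # 6 True
```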
V. SIMULATION
A. Different binary forms of a non-binary LDPC code
In this subsection, we present simulation results for different representations of a non-binary LDPC code under different decoders. No undetected errors were observed in our simulations. We consider a code over $\mathbb{F}_8$ of rate $R = 0.5311$ and length 12000 bits. For a fair comparison, the code was chosen to have similar MSNRs for its different representations.
Let $v^e = v$ and $\Omega^e_i = \Omega_i$ for some $i$, i.e., the block lengths of $\Omega^e$ and $\Omega$ are the same. We search for $\Omega^e$ using the method in Section IV-C. The degree distributions and MSNRs for $H$, $\bar{H}$, $\Omega^e$, and $\Omega$ are displayed in Table II and Table III. The rate of the code defined by $\Omega^e$ is $R = 0.5355$. The MSNR for $\Omega^e$ is $E_b/N_0 = 0.73$ dB and the MSNR for $\Omega$ is $E_b/N_0 = 0.68$ dB, while the capacity limit is $E_b/N_0 = 0.30$ dB. The comparison is shown in Fig. 4, where HEPR (hard-decision decoder for the EPR-LDPC code) is the extended hard-decision decoder for $\Omega^e$, SEPR (soft-decision decoder for the EPR-LDPC code) is the hybrid parallel decoder for $\Omega^e$, QSPA is the q-ary sum-product decoder for $H$, SEB (soft-decision decoder for the equivalent binary LDPC code) is the binary BP decoder for $\bar{H}$, and SER (soft-decision decoder for the extended binary representation) is the hybrid parallel decoder for $\Omega$. Due to the short bit-level cycles in $\bar{H}$, SEB suffers a performance loss of about 1 dB. The decoding complexity per check-sum of QSPA is $O(q^2)$. In our simulation, SEPR achieves the second-best performance with a much lower decoding complexity, and its performance gap to QSPA is within 0.2 dB. Moreover, SEPR outperforms SER at the same block length.
Consider the rate-1/2 non-binary LDPC code over $\mathbb{F}_{16}$ characterized by
The associated EPR-LDPC code with a block length of 2048 bits is optimized by the algorithm in Section IV-C. The performance comparison under different decoders is given in Fig. 5. In this example, SEPR again outperforms SER and is closer to QSPA than SER.
B. Short length optimization
Short cycles can cause severe performance degradation for short block codes. In this subsection, we eliminate short cycles in $\Omega^e$ and give comparative results for different outputs of the optimization in Section IV-C, which are displayed in Table I. Let $\bar{x}$ be the bit sequence transmitted over the binary input Gaussian channel. $H$ is constructed by the modified progressive-edge-growth (PEG) method, and the hybrid parallel decoder is adopted. The block length is 360 bits. Let $M_s = \sum_j \mathrm{wt}(\Psi^e_j)$ be the length of $v^e$ and $g_s$ be the girth. For the non-binary (3,6)-regular LDPC code, the performance comparison is given in Fig. 6. The 32-ary LDPC code achieves the best performance in our simulation, owing to the optimization of both the girth and the field size.
C. Comparison of codes from literature
Consider the rate-1/2 non-binary LDPC code over $\mathbb{F}_{16}$ in Section V-A. We compare the performance of the EPR-LDPC code with optimized non-binary cycle LDPC codes (optimized under similar assumptions) and girth-optimized binary LDPC codes from the literature. In Fig. 7, SPB59 is the sphere packing bound for a block length of 2048 bits. The codes from [11] are non-binary cycle codes of length 5376 bits, the code from [12] is a non-binary cycle code of length 2048 bits, and the code from [18] is a non-binary cycle code of length 3000 bits; these codes are optimized for non-binary decoders. The code from [9] is a (3,6) QC-LDPC code of length 2294 bits, and the code from [10] is a PEG-LDPC code of length 2694 bits; these codes are optimized for binary decoders. Our irregular EPR-LDPC code with a block length of 2048 bits (decoded by the decoder in Example 3) outperforms the others, even with a much shorter length than the codes in [11] and a much smaller field size than the code in [12]. The EPR-LDPC code achieves a performance gain of up to 0.8 dB (at BER $= 10^{-4}$) over the optimized non-binary cycle LDPC codes, with a much lower decoding complexity.
D. Binary Erasure Channel
In this subsection, we investigate the performance of the EPR-LDPC code over the BEC with different sizes of matrix labels. Consider the p-reducible code of rate 1/2. Let $\Lambda_{p2}$ be a $500 \times 1000$ mother matrix. The degree distributions are displayed in Table III. The hybrid parallel decoder in Example 3 is adapted to the BEC, and the MSNR is 0.49. For $p = 2, 3, 4, 5, 6$, with $\Lambda_{p2}$ filled with randomly generated matrix labels, we compare the performance of the different $\Omega$ in Fig. 8. As the block length $N_q = 1000(q-1)$ increases, the decoding curves approach the MSNR, as expected.
Fig. 7 legend: 16-ary EPR-LDPC; 256-ary code from [11]; 256-ary code from [12]; 16-ary code from [11]; 64-ary code from [18]; binary code from [10]; binary code from [9]; QSPA.
0.038x 8 0.046x 6 0.248x 9 0.026x 7 0.004x 9 0.026x 7 0.136x 10 0.007x 8 0.007x 8 0.044x 11 0.001x 9 0.001x 9 0.008x 12 0.002x 16 0.001x 17 0.008x 17 0.004x 18 0.019x 18 0.012x 19 0.032x 19 0.021x 20 0.041x 20 0.029x 21 0.039x 21 0.031x 22 0.030x 22 0.026x 23 0.020x 23 0.019x 24 0.014x 24 0.015x 25 0.014x 25 0.015x 26 0.016x 26 0.019x 27 0.017x 27 0.021x 28 0.015x 28 0.020x 29 0.011x 29 0.016x 30 0.007x 30 0.011x 31 0.004x 31 0.006x 32 0.002x 32 0.003x 33 0.001x 33 0.001x 34 T b -0.18dB 0.59dB
VI. CONCLUSION
When there is no symbol-level cycle, the EPR-LDPC code has no bit-level cycle. Unlike that of the extended binary representation, the parity check matrix of the EPR-LDPC code is not composed of disjoint sub-matrices. When short symbol-level cycles exist, the EPR-LDPC code can largely avoid the corresponding short bit-level cycles. Decoding of the EPR-LDPC code by the proposed decoders (the hybrid hard-decision decoder and the hybrid parallel decoder) achieves a computational complexity of $O(m_s)$, where $m_s < q$. Simulations show that the EPR-LDPC code outperforms the extended binary representation at the same block length. In addition, compared to the optimized non-binary cycle LDPC codes under non-binary decoders, the EPR-LDPC code under the proposed decoder achieves a performance gain of up to 0.8 dB.
0.138x 0.002x 3 0.102x 0.02x 7 0.235x 2 0.004x 4 0.183x 2 0.095x 8 0.140x 3 0.007x 5 0.113x 3 0.205x 9 0.084x 4 0.051x 6 0.039x 4 0.260x 10 0.075x 5 0.176x 7 0.016x 5 0.218x 11 0.052x 6 0.291x 8 0.028x 6 0.128x 12 0.024x 7 0.268x 9 0.040x 7 0.054x 13 0.006x 8 0.147x 10 0.033x 8 0.016x 14 0.001x 9 0.048x 11 0.026x 9 0.003x 15 0.001x 13 0.006x 12 0.032x 10 0.003x 14 0.038x 11 0.005x 15 0.030x 12 0.008x 16 0.016x 13 0.010x 17 0.006x 14 0.013x 18 0.001x 15 0.016x 19 0.001x 59 0.020x 20 0.001x 60 0.021x 21 0.003x 61 0.021x 22 0.006x 62 0.019x 23 0.010x 63 0.017x 24 0.015x 64 0.016x 25 0.021x 65 0.016x 26 0.027x 66 0.016x 27 0.032x 67 0.014x 28 0.035x 68 0.011x 29 0.034x 69 0.008x 30 0.031x 70 0.005x 31 0.026x 71 0.002x 32 0.020x 72 0.001x 33 0.014x 73 0.009x 74 0.006x 75 0.003x 76 0.002x 77 0.001x 78 T b 0.73dB 0.49
