A new approach for encoding one class of quasi-cyclic low-density parity-check (QC-LDPC) codes is proposed. The proposed encoding method is applicable to parity-check matrices having a dual-diagonal parity structure with a single column of weight three in the parity generation region. Instead of finding the parity bits directly, the proposed method finds them through vector correction. The proposed LDPC encoding scheme is readily applicable to the matrices defined in the IEEE physical layer standards, and the post-processing operation that extracts the correction vector requires less computational effort than solving the linear equations for the parity bits as proposed by Myung et al.
Introduction
Ever since the rediscovery of low-density parity-check (LDPC) codes [1], making them practical has been one of the focal points of research due to their remarkable performance [2] and relatively simple decoding structure [3]. Together with great efforts to reduce decoding complexity, the increase in hardware speed and the sophistication of very large scale integrated (VLSI) circuits have made the implementation of LDPC codes possible. Moreover, LDPC codes have been adopted in the current IEEE physical layer standards [4], [5] as an optional error correction coding scheme.
Compared to the effort devoted to reducing decoder complexity, less attention has been paid to encoding methods. Previously, encoding was a more complicated process than decoding due to the random structure of LDPC codes. However, quasi-cyclic low-density parity-check (QC-LDPC) codes [6], [7] were introduced as a class of structured LDPC codes to make LDPC codes feasible even for long codewords. Owing to their algebraic structure, QC-LDPC codes are more memory efficient than general LDPC codes. They can be encoded in linear time with shift registers, and the memory required for storing a QC-LDPC code is reduced by a factor of 1/Z if Z × Z circulant permutation matrices are used. The encoding complexity became almost linear in time with the introduction of the approximate lower triangulation of the parity-check matrix [8], and systematic encoding became even faster with a simple dual-diagonal structure that does not sacrifice overall performance [9]. The systematic encoding schemes in [8], [9] suggest finding the parity bits directly. Given the intuitive assumption that the encoding operation is based on binary arithmetic, the parity bits of LDPC codes can ultimately be found through back-substitutions and bit inversions. Thus, we propose to bypass finding the initial parity vector in order to reduce encoding complexity and latency, and to find the rest of the parity bits through a vector correction technique.
Notation: a boldface letter denotes a vector or a matrix; $\vec{(\cdot)}$ denotes a vector; $(\cdot)^T$ denotes the transpose.
Approximate Lower Triangular QC-LDPC Codes with Dual-Diagonal Parity Structure with Single Weight-Three Column
In QC-LDPC codes, the parity-check matrix $\mathbf{H}$ can be partitioned into circulant permutation matrices (sub-blocks) of size $Z \times Z$. For example, in [4] three sub-block sizes are suggested: $Z = 27$, $Z = 54$, and $Z = 81$. Let $\mathbf{Z}^{a}_{m,n}$, located at the $m$-th row and the $n$-th column of sub-blocks, be either a $Z \times Z$ zero sub-block or the identity matrix $\mathbf{I}$ cyclically shifted to the right $a$ times, $0 \le a < Z$. With the basic sub-block matrices $\mathbf{Z}$, the $mZ \times nZ$ parity-check matrix $\mathbf{H}$ can be composed of
As the parity-check matrix is designed for systematic codes only, the matrix $\mathbf{H}$ can again be divided into two regions: $\mathbf{H}_s$ is the sub-matrix for the region where the systematic bits are multiplied, and $\mathbf{H}_p$ represents the parity bit region. The QC-LDPC codes suggested in the latest high-throughput PHY standards such as [4], [5] are systematic, i.e., they encode an information block of size $k$, $\mathbf{s} = (s_0, s_1, \ldots, s_{k-1})^T$, into a codeword $\mathbf{c} = (s_0, \ldots, s_{k-1}, p_0, \ldots, p_{n-k-1})^T$ of size $n$ by adding $n - k$ parity bits, obtained so that the codeword satisfies $\mathbf{H}\mathbf{c} = \mathbf{0}$.
Copyright © 2012 The Institute of Electronics, Information and Communication Engineers
The vectors $\mathbf{p}_a$ and $\mathbf{p}_b$ are defined as $\mathbf{p}_a = (p_0, p_1, \ldots, p_{Z-1})^T$ and $\mathbf{p}_b = (p_Z, p_{Z+1}, \ldots, p_{n-k-1})^T$, respectively. The QC-LDPC codes in [4], [5], [9] have a dual-diagonal parity structure, and the parity portion $\mathbf{H}_p$ of the matrix can be further decomposed into four sub-matrices as
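where $\mathbf{I}^{0}$ denotes the identity matrix $\mathbf{I}_{Z \times Z}$ with zero cyclic shift, and $\ell$ in $\mathbf{I}^{\ell}$ denotes the value of the cyclic shift. The $\mathbf{0}$ in $\mathbf{H}_p$ denotes null (zero) sub-matrices, which have no edges connecting check nodes to variable nodes. Furthermore, the first block column of $\mathbf{H}_p$, which contains the sub-blocks $\mathbf{I}^{\ell}$ and $\mathbf{I}^{0}$, is the weight-three column in the parity generation region.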
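The sub-block composition described in this section can be illustrated with a short sketch. The function names, the toy base matrix, and the convention that -1 marks a zero sub-block are assumptions made for illustration; they are not taken from the letter or from the tables in [4].

```python
import numpy as np

def circulant_permutation(Z, a):
    """Z x Z identity matrix with its columns cyclically shifted right a times."""
    return np.roll(np.eye(Z, dtype=np.uint8), a, axis=1)

def expand(base, Z):
    """Expand a base matrix of shift values into a binary parity-check matrix.

    Assumed convention: an entry a >= 0 is the shift of a circulant
    permutation sub-block, and -1 marks a Z x Z zero sub-block.
    """
    zero = np.zeros((Z, Z), dtype=np.uint8)
    return np.vstack([
        np.hstack([zero if a < 0 else circulant_permutation(Z, a) for a in row])
        for row in base
    ])

# Hypothetical 2 x 3 base matrix with Z = 4 (not taken from any standard).
base = [[1, -1, 0],
        [2, 0, -1]]
H = expand(base, Z=4)
print(H.shape)                            # (8, 12)

# Multiplying by a shifted identity only rotates the vector.
x = np.arange(4)
print(circulant_permutation(4, 1) @ x)    # [1 2 3 0]
```

Only the shift values need to be stored, which is where the 1/Z memory saving mentioned in the Introduction comes from.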
Encoding Procedures for Approximate Lower Triangular QC-LDPC Codes

Conventional Fast Encoding Scheme by Simple Parity-Check Matrix
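The efficient encoding method proposed by Richardson et al. [8] assumes that $\mathbf{H}$ is in approximate lower triangular form. The parity-check matrix $\mathbf{H}$ has the form
$$\mathbf{H} = \begin{bmatrix} \mathbf{A} & \mathbf{B} & \mathbf{T} \\ \mathbf{C} & \mathbf{D} & \mathbf{E} \end{bmatrix} \tag{4}$$
where, for the QC-LDPC codes defined in [4], $\mathbf{T}$ is of size $(m-1)Z \times (m-1)Z$ and $\mathbf{E}$ is of size $Z \times (m-1)Z$. Thus, sub-matrices $\mathbf{A}$ and $\mathbf{C}$ correspond to the systematic part $\mathbf{H}_s$, and sub-matrices $\mathbf{B}$, $\mathbf{D}$, $\mathbf{T}$, $\mathbf{E}$ belong to the parity bit generation part $\mathbf{H}_p$. Note that $\mathbf{T}$ is lower triangular with identity matrices along the diagonal. By some matrix manipulations, the sectioned matrix $\mathbf{H}$ in (4), applied to the codeword vector $\mathbf{c} = [\mathbf{s}^T\ \mathbf{p}_a^T\ \mathbf{p}_b^T]^T$, is summarized as [8]
$$\mathbf{A}\mathbf{s} + \mathbf{B}\mathbf{p}_a + \mathbf{T}\mathbf{p}_b = \mathbf{0} \tag{5}$$
$$(-\mathbf{E}\mathbf{T}^{-1}\mathbf{A} + \mathbf{C})\mathbf{s} + (-\mathbf{E}\mathbf{T}^{-1}\mathbf{B} + \mathbf{D})\mathbf{p}_a = \mathbf{0} \tag{6}$$
where $\mathbf{E}\mathbf{T}^{-1}$ is a $Z \times (m-1)Z$ sub-matrix composed of identity matrices (i.e., $[\mathbf{I}\ {-\mathbf{I}}\ \mathbf{I}\ {-\mathbf{I}}\ \cdots\ {-\mathbf{I}}\ \mathbf{I}]$), which accumulates the rows of sub-matrix $\mathbf{A}$. Note that $-\mathbf{E}\mathbf{T}^{-1}\mathbf{B} + \mathbf{D} = \mathbf{I}$, since the addition of all sub-block matrices in the weight-three part of the matrix $\mathbf{H}_p$ suggested in the IEEE 802.11n standard [4] results simply in the $Z \times Z$ identity matrix $\mathbf{I}$. Solving (5) and (6) leads to the direct solution for the parity vectors $\mathbf{p}_a$ and $\mathbf{p}_b$. Thus, each parity bit vector can be derived as [8]
$$\mathbf{p}_a = -(-\mathbf{E}\mathbf{T}^{-1}\mathbf{B} + \mathbf{D})^{-1}(-\mathbf{E}\mathbf{T}^{-1}\mathbf{A} + \mathbf{C})\mathbf{s} \tag{7}$$
$$\mathbf{p}_b = -\mathbf{T}^{-1}(\mathbf{A}\mathbf{s} + \mathbf{B}\mathbf{p}_a) \tag{8}$$
Since $-\mathbf{E}\mathbf{T}^{-1}\mathbf{B} + \mathbf{D} = \mathbf{I}$ (i.e., the row accumulation of the weight-three column), the first parity vector $\mathbf{p}_a$ can be found by simplifying (7) to $\mathbf{p}_a = \mathbf{E}\mathbf{T}^{-1}\mathbf{A}\mathbf{s} + \mathbf{C}\mathbf{s}$. Then, the second parity vector $\mathbf{p}_b$ is obtained by block accumulation and back substitution, starting with plugging $\mathbf{p}_a$ into (8).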
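The conventional procedure described above can be sketched on a toy code. Everything concrete here is a hypothetical choice for illustration: the sub-block size, the shift `ELL` of the weight-three column, the block row `MID` holding its unshifted identity, and the systematic base matrix `HS_BASE`. The sketch assumes binary arithmetic, so all signs vanish and the accumulation reduces to XOR.

```python
import numpy as np

# Toy parameters, all hypothetical: sub-block size, shift of the weight-three
# column, block row holding its unshifted identity, number of block rows.
Z, ELL, MID, M = 4, 1, 2, 4
# Hypothetical systematic base matrix (shift values, -1 = zero sub-block).
HS_BASE = [[0, -1, 2, 1],
           [3, 1, -1, 0],
           [-1, 2, 0, 3],
           [1, 0, 3, -1]]

def shift(v, a):
    """Product of the circulant permutation with shift a and the vector v."""
    return np.roll(v, -a)

def block_row(shifts, s):
    """One block row of H_s times s, as an XOR of shifted Z-bit segments."""
    acc = np.zeros(Z, dtype=np.uint8)
    for j, a in enumerate(shifts):
        if a >= 0:
            acc ^= shift(s[j * Z:(j + 1) * Z], a)
    return acc

def encode_conventional(s):
    lam = [block_row(row, s) for row in HS_BASE]
    # p_a = E T^{-1} A s + C s: over GF(2) this accumulates every block row.
    p_a = np.zeros(Z, dtype=np.uint8)
    for block in lam:
        p_a ^= block
    # Back substitution for p_b; it cannot start before p_a is known.
    p_b = [lam[0] ^ shift(p_a, ELL)]
    for i in range(1, M - 1):
        nxt = lam[i] ^ p_b[-1]
        if i == MID:
            nxt = nxt ^ p_a
        p_b.append(nxt)
    return p_a, p_b, lam

s = np.random.default_rng(1).integers(0, 2, size=4 * Z, dtype=np.uint8)
p_a, p_b, lam = encode_conventional(s)
# The last block row of H c must vanish if the parity bits are consistent.
last_row = lam[M - 1] ^ shift(p_a, ELL) ^ p_b[-1]
print(last_row.any())   # False
```

Note the data dependency: the back substitution loop consumes `p_a`, so it cannot begin until the full row accumulation has finished.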
Proposed Novel Vector Correction Method for Lower Computational Complexity
As illustrated in Fig. 1, the sub-matrix $\mathbf{H}_p$ can be repartitioned into three sub-matrices as
$$\mathbf{H}_p = [\mathbf{V}\ \mathbf{Q}\ \mathbf{U}] \tag{9}$$
The purpose of the re-partitioning is not a simpler derivation of the parity bits; rather, it allows a different parity vector correction to be applied in each sector during the post-processing stage. The sub-matrix $\mathbf{V}$ corresponds to the weight-three column described in Sect. 2. The column boundary between $\mathbf{Q}$ and $\mathbf{U}$ is set between the $((n - \frac{m}{2})Z - 1)$-th and the $((n - \frac{m}{2})Z)$-th columns. After locating the boundary between $\mathbf{Q}$ and $\mathbf{U}$, the three-step encoding process proceeds as follows.
Calculate Vector Cs
Once the information vector $\mathbf{s}$ is given, compute the vector $\mathbf{C}\mathbf{s}$ for later use in finding $\mathbf{p}_a$. Because the cyclic shift values defined in the sparse matrix $\mathbf{C}$ are known in advance, the matrix operation $\mathbf{C}\mathbf{s}$ becomes an accumulation operation based on shift registers [10].
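As a minimal sketch of this step, the product of one block row with the information vector can be computed without forming any matrix, by XOR-accumulating cyclically shifted segments. The function name, the shift values, and the -1 marker for a zero sub-block are assumptions for illustration only.

```python
import numpy as np

def block_row_times_vector(shifts, s, Z):
    """XOR-accumulate cyclically shifted Z-bit segments of s.

    shifts[j] is the cyclic shift of the j-th sub-block of the block row;
    -1 (an assumed convention) marks a zero sub-block that is skipped.
    """
    s = np.asarray(s, dtype=np.uint8)
    acc = np.zeros(Z, dtype=np.uint8)
    for j, a in enumerate(shifts):
        if a >= 0:
            # Multiplying by the shifted identity is a rotation of the segment.
            acc ^= np.roll(s[j * Z:(j + 1) * Z], -a)
    return acc

Z = 4
c_shifts = [1, -1, 0]    # hypothetical block row C with two non-zero sub-blocks
s = np.array([1, 0, 1, 1,  0, 1, 0, 0,  1, 1, 0, 1], dtype=np.uint8)
cs = block_row_times_vector(c_shifts, s, Z)
print(cs)                # [1 0 1 0]
```

The cost is one Z-bit XOR per non-zero sub-block, which is why the operation counts in the complexity comparison scale with the number of non-zero sub-blocks.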
Find $\mathbf{p}_b^{tmp}$ by Back Substitution
Let the $Z$ parity bits $p_0, p_1, \ldots, p_{Z-1}$ of vector $\mathbf{p}_a$ be set to all zeros as a temporary solution. Then, find the remaining parity vector $\mathbf{p}_b^{tmp}$ using the back substitution procedure [9], realized as
where
Note that $\vec{p}_0$ is initially equivalent to an all-zero vector in the proposed method. Given the above definitions, we define the correcting vector $\mathbf{f}$ as
Apply Post Processing to Correct Parity Vector $\mathbf{p}_b^{tmp}$
The assumption of the temporary all-zero vector $\mathbf{p}_a$ should be validated. If it is correct, the right-hand side of the following (12) must sum to a zero vector, as
However, it is very unlikely that the $\mathbf{p}_b^{tmp}$ derived by (10) is correct (i.e., the vector $\mathbf{f}$ is non-zero in most cases).
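The proposed encoding method suggests that the correct solution for $\mathbf{p}_a$ is simply $\mathbf{p}_a = \mathbf{f}$. On the other hand, the correct solution for the vector $\mathbf{p}_b$ is found by adding either $\mathbf{f}_{\ell}$ or $\mathbf{f}_{\kappa}$ to $\mathbf{p}_b^{tmp}$ in every sub-block unit $Z$. Here, $\mathbf{f}_{\ell}$ and $\mathbf{f}_{\kappa}$ are defined as $\mathbf{I}^{\ell}\mathbf{f}$ and $(\mathbf{I}^{\ell} + \mathbf{I}^{0})\mathbf{f}$, respectively. In summary, we obtain the final solution for $\mathbf{p}_a$ and $\mathbf{p}_b$ as
Note that "$\ell$" is the cyclic shift value identical to the one designated in (3).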
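The three steps can be sketched end to end on a toy code and checked against the parity-check equations. This is a sketch under stated assumptions, not the authors' implementation: the toy parameters and `HS_BASE` are hypothetical, the weight-three column is assumed to hold the shift `ELL` in the first and last block rows and an unshifted identity in block row `MID`, and the correcting vector is taken as the residual of the last block row (the last block of the systematic product plus the last temporary parity block), which is how this sketch reads (10) and (11).

```python
import numpy as np

# Toy parameters, all hypothetical: sub-block size, shift of the weight-three
# column, block row holding its unshifted identity, number of block rows.
Z, ELL, MID, M = 4, 1, 2, 4
# Hypothetical systematic base matrix (shift values, -1 = zero sub-block).
HS_BASE = [[0, -1, 2, 1],
           [3, 1, -1, 0],
           [-1, 2, 0, 3],
           [1, 0, 3, -1]]

def shift(v, a):
    """Product of the circulant permutation with shift a and the vector v."""
    return np.roll(v, -a)

def block_row(shifts, s):
    """One block row of H_s times s, as an XOR of shifted Z-bit segments."""
    acc = np.zeros(Z, dtype=np.uint8)
    for j, a in enumerate(shifts):
        if a >= 0:
            acc ^= shift(s[j * Z:(j + 1) * Z], a)
    return acc

def encode_vector_correction(s):
    lam = [block_row(row, s) for row in HS_BASE]
    # Back substitution with p_a temporarily set to all zeros.
    tmp = [lam[0]]
    for i in range(1, M - 1):
        tmp.append(lam[i] ^ tmp[-1])
    # Correcting vector: residual of the last block row (assumed reading).
    f = lam[M - 1] ^ tmp[-1]
    f_l = shift(f, ELL)          # shifted copy of f
    f_k = f_l ^ f                # shifted copy plus unshifted copy
    # Blocks settled before block row MID get f_l, the rest get f_k.
    p_b = [t ^ (f_l if i < MID else f_k) for i, t in enumerate(tmp)]
    return f, p_b                # p_a is simply f

def dense_parity_check():
    """Expand the toy code into a binary matrix, used only for checking."""
    P = lambda a: np.roll(np.eye(Z, dtype=np.uint8), a, axis=1)
    O = np.zeros((Z, Z), dtype=np.uint8)
    rows = []
    for i in range(M):
        sys_blocks = [P(a) if a >= 0 else O for a in HS_BASE[i]]
        v = P(ELL) if i in (0, M - 1) else (P(0) if i == MID else O)
        dual = [P(0) if j in (i, i + 1) else O for j in range(1, M)]
        rows.append(np.hstack(sys_blocks + [v] + dual))
    return np.vstack(rows)

s = np.random.default_rng(0).integers(0, 2, size=4 * Z, dtype=np.uint8)
p_a, p_b = encode_vector_correction(s)
c = np.concatenate([s, p_a, *p_b])
syndrome = dense_parity_check().astype(int) @ c % 2
print(syndrome.any())   # False
```

In this sketch the back substitution never waits for the first parity vector; the correction at the end is one cyclic shift, one XOR to form the combined vector, and one XOR per parity sub-block.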
Theorem 1:
For a QC-LDPC code whose parity-check matrix $\mathbf{H}$ has an almost dual-diagonal lower triangular form, with a weight-three column composed of cyclically shifted sub-blocks in the parity generation part defined in (3), the parity bit vector $\mathbf{p}_a$ can be obtained by finding $\mathbf{f}$, defined in the steps from (10) to (11).
Proof: Since Hc = 0 should be satisfied,
T = 0 must also be true, where x is defined as
The relation between the correcting vector $\mathbf{f}$ and the direct solution (7) can be elaborated by solving the linear equations
Adding all of the above binary linear equations leads to the direct solution found in [9], and the result is summarized such that
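In contrast to the direct solution for $\mathbf{p}_a$ shown in (7), we can acquire it equivalently by finding the correcting vector $\mathbf{f}$, which requires operations (10) and (11) only. Note that the binary operations $\mathbf{T}^{-1}\mathbf{A}\mathbf{s}$ in (10) and $\mathbf{C}\mathbf{s}$ in (11) are common to both [9] and the proposed encoding method. In other words, we can avoid the unnecessary numerical computations involving the left-hand side of (16) proposed in [9].
Theorem 2:
The parity bit vector $\mathbf{p}_b$ is obtained by adding either $\mathbf{f}_{\ell}$ or $\mathbf{f}_{\kappa}$ to $\mathbf{p}_b^{tmp}$ in every sub-block unit $Z$, where the former is defined as $\mathbf{f}_{\ell} = \mathbf{I}^{\ell}\mathbf{f}$ and the latter is defined as $\mathbf{f}_{\kappa} = (\mathbf{I}^{\ell} + \mathbf{I}^{0})\mathbf{f}$, with $\ell$ the cyclic shift value designated in (3).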
Proof: Given that the true solution for $\mathbf{p}_a$ is $\mathbf{f}$, the vector $\vec{p}_{i+1}$ is easily found by adding $\mathbf{f}_{\ell} = \mathbf{I}^{\ell}\mathbf{f}$ to both sides of the equations in (15) that belong to region "Q" in Fig. 1. Therefore, we have
For the correction of the parity vectors belonging to region "U" in Fig. 1, it is clear that they are linearly dependent on $\mathbf{p}_a = \mathbf{f}$ as well as on $\mathbf{I}^{\ell}\mathbf{p}_a = \mathbf{f}_{\ell}$. Thus, adding the vector $\mathbf{f}_{\kappa} = (\mathbf{I}^{\ell} + \mathbf{I}^{0})\mathbf{f}$ to both sides of the $(\frac{m}{2} + i)$-th row of the corresponding linear equations in (15), we obtain
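The argument of both proofs can be traced on a small worked case. The layout below is an assumption made for illustration (four block rows, with the shifted identity in the first and last block rows of the weight-three column and the unshifted identity in the third), and $\vec{\lambda}_i$ is notation introduced here for the $i$-th $Z$-bit block of $\mathbf{H}_s\mathbf{s}$; neither is taken from the letter's own equations.

```latex
% Assumed toy layout: m = 4 block rows; the weight-three column holds
% I^{\ell} in block rows 0 and 3 and I^{0} in block row 2.
% \vec{\lambda}_i (assumed notation) is the i-th Z-bit block of H_s s.
\begin{align*}
\vec{\lambda}_0 + \mathbf{I}^{\ell}\mathbf{p}_a + \vec{p}_1 &= \mathbf{0}, &
\vec{\lambda}_1 + \vec{p}_1 + \vec{p}_2 &= \mathbf{0}, \\
\vec{\lambda}_2 + \mathbf{I}^{0}\mathbf{p}_a + \vec{p}_2 + \vec{p}_3 &= \mathbf{0}, &
\vec{\lambda}_3 + \mathbf{I}^{\ell}\mathbf{p}_a + \vec{p}_3 &= \mathbf{0}.
\end{align*}
% Summing all four rows over GF(2) cancels every \vec{p}_i and both
% I^{\ell} p_a terms, leaving the direct solution:
\begin{align*}
\mathbf{p}_a &= \vec{\lambda}_0 + \vec{\lambda}_1 + \vec{\lambda}_2 + \vec{\lambda}_3 .
\end{align*}
% Back substitution with p_a = 0 gives the temporary blocks, and the
% residual of the last row reproduces the same sum:
\begin{align*}
\vec{p}_1^{\,tmp} &= \vec{\lambda}_0, \qquad
\vec{p}_2^{\,tmp} = \vec{\lambda}_0 + \vec{\lambda}_1, \qquad
\vec{p}_3^{\,tmp} = \vec{\lambda}_0 + \vec{\lambda}_1 + \vec{\lambda}_2, \\
\mathbf{f} &= \vec{\lambda}_3 + \vec{p}_3^{\,tmp} = \mathbf{p}_a .
\end{align*}
% Re-solving the rows with p_a = f shows which correction each block needs:
\begin{align*}
\vec{p}_1 &= \vec{p}_1^{\,tmp} + \mathbf{I}^{\ell}\mathbf{f}, \qquad
\vec{p}_2 = \vec{p}_2^{\,tmp} + \mathbf{I}^{\ell}\mathbf{f}, \qquad
\vec{p}_3 = \vec{p}_3^{\,tmp} + (\mathbf{I}^{\ell} + \mathbf{I}^{0})\mathbf{f} .
\end{align*}
```

Blocks determined before the row holding $\mathbf{I}^{0}$ pick up only the shifted copy of $\mathbf{f}$; blocks determined at or after that row pick up both the shifted and the unshifted copy.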
Complexity Comparison
In this section, we compare the encoding complexity of our proposed scheme with that of Richardson's scheme [8], [9]. We analyze the number of binary multiplications (AND) and modulo-2 additions (XOR) required during the entire encoding process. Since a general expression for the number of ANDs and XORs depends on the average number of edges, we compare the complexities through direct numerical analysis with respect to the parity-check matrix $\mathbf{H}$ having codeword length 1944, $Z = 81$, and rate $R = 1/2$, details of which are found in [4]. As shown in Table 1, there are operations that the conventional and the proposed encoding methods have in common. Therefore, we compare only the computations that are exclusive to each method, disregarding the computations common to both.
Table 2: Comparison of total required computations.
In addition, we exclude the computations for $\mathbf{T}^{-1}\mathbf{A}$ and $\mathbf{T}^{-1}\mathbf{B}$, which can be found before the information vector $\mathbf{s}$ is multiplied. This exclusion is reasonable because the pre-computation also applies to Richardson's scheme, and the results can be assumed to be stored in read-only memory (ROM).
Although both methods need to find $\mathbf{T}^{-1}\mathbf{A}\mathbf{s}$ and $\mathbf{C}\mathbf{s}$, the conventional method requires the extra row vector accumulation $\mathbf{E}\mathbf{T}^{-1}\mathbf{A}\mathbf{s}$ to find $\mathbf{p}_a$, whose cost is proportional to the average density of the rows in sub-matrix $\mathbf{A}$. In contrast, this operation is not needed for the proposed method, which requires a calculation of $\mathbf{f}$ instead. Finding $\mathbf{f}$ involves comparatively fewer XOR operations than finding $\mathbf{p}_a$. There are 55 non-zero $Z \times Z$ sub-blocks in matrix $\mathbf{A}$ and 6 non-zero $Z \times Z$ sub-blocks in $\mathbf{C}$. Thus, the number of AND operations required for finding $\mathbf{p}_a$ by the conventional method is roughly 9 times that of finding the vector $\mathbf{f}$ by the proposed method.
The extra post-processing computation for the proposed method is trivial. The vector $\mathbf{f}_{\ell} = \mathbf{I}^{\ell}\mathbf{f}$ can be found by cyclically shifting $\mathbf{f}$ to the right by "$\ell$", the cyclic shift value designated in (3). Finding $\mathbf{f}_{\kappa} = (\mathbf{I}^{\ell} + \mathbf{I}^{0})\mathbf{f}$ requires $Z$ XOR operations. Moreover, the entire post-processing vector correction procedure can be achieved by bit-flipping wherever the elements of $\mathbf{f}$, $\mathbf{f}_{\ell}$, and $\mathbf{f}_{\kappa}$ are "1". Thus, the total number of XOR operations needed for finding †$\mathbf{p}_b$, as listed in Table 2, can be further reduced to $(318 + 1) \times Z$. As summarized in Tables 1 and 2, it is clear that the proposed vector-correction-based LDPC encoding method has slightly lower complexity than the conventional method.
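The figures quoted in this section can be put into concrete numbers for Z = 81. Treating the "roughly 9 times" factor as the ratio of non-zero sub-blocks in A to those in C is an interpretation made here, not something the letter states; the variable names are illustrative.

```python
Z = 81                            # sub-block size of the length-1944, rate-1/2 code
blocks_A, blocks_C = 55, 6        # non-zero sub-blocks quoted in the text

# Assumed reading: operation counts scale with the number of non-zero sub-blocks.
ratio = blocks_A / blocks_C
print(round(ratio, 2))            # 9.17

# Reduced XOR count quoted for the parity correction, (318 + 1) x Z.
xor_total = (318 + 1) * Z
print(xor_total)                  # 25839
```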
Meanwhile, the overall latency of the proposed method is much lower than that of the conventional method. The conventional method cannot proceed to compute $\mathbf{p}_b$ without first acquiring $\mathbf{p}_a$, whereas the proposed method can directly go on to extract $\mathbf{p}_b^{tmp}$. In addition, obtaining $\mathbf{p}_a$ by the conventional method clearly takes longer than finding $\mathbf{f}$ and adding the correcting vectors to $\mathbf{p}_b^{tmp}$.
Conclusions
In this letter, we proposed an alternative encoding method for QC-LDPC codes with a dual-diagonal parity structure and a single weight-three column in the parity generation sub-matrix. We have demonstrated that the direct approach to determining $\mathbf{p}_a$, as used by the fast encoding scheme [9], is unnecessary. Moreover, the vector correction encoding method is directly applicable to the current IEEE 802.11n and IEEE 802.16e standards, while the overall complexity and encoding latency are lower than those of the conventional method.
