I. Introduction
Gallager first introduced low-density parity-check (LDPC) codes in 1962 in his PhD thesis [1]; they were rediscovered more than 30 years later, in 1996, by MacKay [2]. These error-correction codes continue to attract researchers, and their efficient hardware implementation remains a major challenge. LDPC codes became popular because they admit highly parallelizable decoding algorithms and offer good error-correction performance [3]. As a result, they have been adopted in recent communication standards such as WLAN (IEEE 802.11n) [4], WiMAX (IEEE 802.16e) [5], 10GBASE-T (IEEE 802.3an) [6], and WPAN (IEEE 802.15.3c) [7].
Quasi-cyclic LDPC (QC-LDPC) structured codes have gained significant importance because of their flexible hardware implementation and good bit error ratio (BER) performance compared with random codes. QC-LDPC codes are used in the IEEE 802.11n and 802.16e standards, which support multiple code rates and code lengths. Many decoding algorithms exist for QC-LDPC codes, each trading off hardware complexity, decoding throughput, and error-correction performance. Iterative message-passing algorithms offer excellent error-correction performance at the cost of high decoding complexity. The soft-decision Sum-Product Algorithm (SPA) achieves the best decoding performance but has very high decoding complexity [8]. Several modifications have been proposed to simplify the check node operation in SPA. The check node computation is simplified by reducing the non-linear function [9, 10] and the logarithmic functions, which lowers the implementation complexity [11]. The Min-Sum (MS) algorithm [8] further simplifies the check node operations of SPA to reduce decoding complexity, but at the cost of decoding performance. Hence many modifications of the MS algorithm [12], such as normalized MS and offset MS decoding, have been made to strike a balance between complexity and performance.
In this paper, we present a low-complexity fully parallel QC-LDPC decoder for the IEEE 802.11n wireless standard, based on the Min-Sum algorithm, with far fewer memory bits and reduced complexity. The proposed architecture is applied to the rate-1/2 code with a code length of 648 bits.
The paper is organized as follows. An overview of QC-LDPC codes is provided in Section II. Section III describes the QC-LDPC codes of IEEE 802.11n. Section IV covers the LDPC decoding algorithms. Section V discusses the proposed fully parallel architecture of the LDPC decoder. Section VI provides the synthesis results of the proposed architecture, and Section VII concludes the paper.
II. QC-LDPC Codes
There are two types of LDPC codes: LDPC block codes and LDPC convolutional codes. Of the two, LDPC block codes are the more widely used because of their practical hardware implementation. LDPC codes can be represented by a matrix or graphically. In the graphical representation, an LDPC code is characterized by a bipartite graph, also known as a Tanner graph, as shown in Fig. 1. A Tanner graph consists of two types of nodes: variable nodes and check nodes. Variable nodes correspond to codeword bits, and check nodes correspond to parity-check constraints. In general, the variable node unit passes its bit estimates to the check node unit, which performs the parity-check operations [3] and returns updated values to the variable node unit.
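The relation between the matrix form and the Tanner graph can be sketched in a few lines of Python. This is only an illustration: the 4 x 8 matrix and the function name `tanner_graph` are assumptions chosen for the example and are not taken from IEEE 802.11n. Each 1 in row i, column j becomes an edge between check node i and variable node j.

```python
# Small illustrative parity-check matrix (an assumption, NOT an IEEE 802.11n matrix).
H = [
    [0, 1, 0, 1, 1, 0, 0, 1],
    [1, 1, 1, 0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0, 1, 1, 1],
    [1, 0, 0, 1, 1, 0, 1, 0],
]


def tanner_graph(H):
    """Return the adjacency lists (check_to_var, var_to_check) of the Tanner graph of H."""
    m, n = len(H), len(H[0])
    # Variable nodes attached to each check node (one list per row of H).
    check_to_var = [[j for j in range(n) if H[i][j] == 1] for i in range(m)]
    # Check nodes attached to each variable node (one list per column of H).
    var_to_check = [[i for i in range(m) if H[i][j] == 1] for j in range(n)]
    return check_to_var, var_to_check


check_to_var, var_to_check = tanner_graph(H)
```

Here `check_to_var[i]` lists the codeword bits involved in parity check i, and `var_to_check[j]` lists the parity checks that bit j takes part in.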
Quasi-cyclic LDPC (QC-LDPC) codes are a structured subclass of LDPC codes that allow an even more efficient implementation while retaining good performance. A quasi-cyclic code of index t is a linear code in which a cyclic shift of any codeword by t positions is also a codeword. Because of this cyclic structure, QC-LDPC codes require less memory than conventional LDPC codes. In addition, QC-LDPC codes support high-speed decoding because of the sparseness of their parity-check matrix [5]. QC-LDPC codes were proposed to reduce the complexity of LDPC codes while achieving similar performance. Their parity-check matrices consist of concatenated circulant sub-matrices. Each sub-matrix is a square matrix in which every row is a cyclic shift of the previous row, and the first row is the cyclic shift of the last row.
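The circulant property can be illustrated with a minimal Python sketch. The function name `circulant` and the example first row are assumptions made for this illustration; a right shift by one position per row is assumed as the shift direction.

```python
def circulant(first_row):
    """Build a square circulant matrix: each row is the previous row shifted right by one."""
    n = len(first_row)
    # Entry (row, col) is the first-row entry that has moved 'row' positions to the right.
    return [[first_row[(col - row) % n] for col in range(n)] for row in range(n)]


# Hypothetical first row, chosen only for illustration.
C = circulant([1, 0, 1, 0, 0])
```

Only the first row has to be stored to describe the whole sub-matrix, which is where the memory saving of the quasi-cyclic structure comes from.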
H =
0 1 0 1 1 0 0 1
1 1 1 0 0 1 0 0
0 0 1 0 0 1 1 1
1 0 0 1 1 0 1 0
III. Quasi-Cyclic LDPC Codes for IEEE 802.11
QC-LDPC codes [13] are structured LDPC codes proposed for the IEEE 802.11n, IEEE 802.16e [12], and IEEE 802.22 standards. The IEEE 802.11n LDPC codes support code lengths N = 648, 1296, and 1944 and code rates r = 1/2, 2/3, 3/4, and 5/6, giving 12 different codes, as shown in Table I [16]. The parity-check matrices of these codes are partitioned block-wise into smaller z_i × z_i sub-matrices, with z_1 = 27, z_2 = 54, and z_3 = 81 for the short, middle, and long codewords, respectively. The parity-check matrix H is arranged in ρ = N_i/z_i block columns and γ = (1 − r)ρ block rows. As shown in Fig. 2, the entire H matrix of the block LDPC code is composed of identity matrices with different cyclic shifts (represented as "1") and null matrices (represented as "0"). The expansion factor z, defined as the size of the identity matrix, is 27 here. For rate 1/2, the base matrix has ρ = 24 block columns and γ = 12 block rows, so the code length is N = ρ × z = 648 bits with k = 324 information bits [17].
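The block structure described above can be sketched as follows. This is an illustration under stated assumptions: the 2 x 4 base matrix, its shift values, and the names `shifted_identity` and `expand` are hypothetical and are not the IEEE 802.11n base matrix. The sketch uses the actual shift value for each block, with `None` marking a null block. The last lines repeat the dimension arithmetic for the rate-1/2, N = 648 code.

```python
def shifted_identity(z, shift):
    """z x z identity matrix whose columns are cyclically shifted right by 'shift'."""
    return [[1 if col == (row + shift) % z else 0 for col in range(z)]
            for row in range(z)]


def expand(base, z):
    """Expand a base matrix of shift values into a binary H (None = z x z null block)."""
    H = []
    for block_row in base:
        blocks = [[[0] * z for _ in range(z)] if s is None else shifted_identity(z, s)
                  for s in block_row]
        for r in range(z):
            # Concatenate row r of every block in this block row.
            H.append([bit for blk in blocks for bit in blk[r]])
    return H


# Hypothetical base matrix, for illustration only (NOT from IEEE 802.11n).
base = [[0, 1, None, 2],
        [2, None, 0, 1]]
H = expand(base, 3)

# Dimension arithmetic for the rate-1/2, N = 648 code described in the text.
N, rate, z_std = 648, 0.5, 27
rho = N // z_std                # number of block columns
gamma = int((1 - rate) * rho)   # number of block rows
rows = gamma * z_std            # number of parity-check rows in H
```

With these values `rho`, `gamma`, and `rows` evaluate to 24, 12, and 324, matching the figures quoted for the 648-bit code.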
IV. LDPC Decoding Algorithm
LDPC codes can be decoded iteratively in several ways, and the choice depends on the complexity and error-performance requirements of the decoder. The Sum-Product (SP) algorithm and the Min-Sum algorithm are two well-known soft-decision decoding algorithms. They are widely used in LDPC decoders and are regarded as standard decoders. Both perform row and column operations iteratively using the check node message α and the variable node message β. A flow chart of the iterative decoding algorithm is shown in Fig. 3 [12]-[16]. In the SP algorithm, during row processing (check node update), each check node C_i computes the message α for each connected variable node V_j from the β messages of all other variable nodes V_j' (j' ≠ j) connected to C_i. In the check node update (row processing), α is computed as in (2).
Quasi Cyclic Low Density Parity Check Decoder Using Min-sum Algorithm for IEEE 802.11n
DOI: 10.9790/4200-0604020107 www.iosrjournals.org
Here V(i) = {j : H_ij = 1} represents the set of variable nodes that participate in check equation i, and C(j) = {i : H_ij = 1} denotes the set of check nodes that take part in the update of variable node j. The term V(i)\j denotes all variable nodes in V(i) except node j, and the term C(j)\i denotes all check nodes in C(j) except node i.
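As a hedged illustration of the row processing step, the sketch below uses the textbook form of the sum-product check node update, in which the sign of α is the product of the signs of the incoming β messages over V(i)\j and the magnitude is φ(Σ φ(|β|)), with φ(x) = −ln tanh(x/2). This textbook form is an assumption and may differ in detail from (2). The names `phi` and `spa_check_update`, the clamping limits, and the input values are also assumptions made for the example.

```python
import math


def phi(x):
    """phi(x) = -ln(tanh(x / 2)); x is clamped to avoid log(0) and saturation."""
    x = min(max(x, 1e-12), 30.0)
    return -math.log(math.tanh(x / 2.0))


def spa_check_update(beta_row):
    """Given the beta messages from the nodes in V(i), return alpha for each of them."""
    alphas = []
    for j in range(len(beta_row)):
        # Messages from all variable nodes in V(i) except node j.
        others = beta_row[:j] + beta_row[j + 1:]
        sign = 1.0
        for b in others:
            if b < 0:
                sign = -sign
        magnitude = phi(sum(phi(abs(b)) for b in others))
        alphas.append(sign * magnitude)
    return alphas


# Hypothetical beta messages for a check node of degree 4.
alphas = spa_check_update([2.0, -1.5, 0.8, 3.0])
```

The two nested calls to `phi` are the non-linear operations that make this update expensive in hardware.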
Column processing stage (variable node update)
The variable node V_j computes the message β for check node C_i by adding the channel information for column j, denoted λ, to the α messages from all other check nodes C_i' (i' ≠ i) connected to V_j. In the variable node update (column processing), β is computed as in (4).
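The column processing step described above can be sketched as follows, assuming the usual form β_ij = λ_j + Σ α_i'j, with the sum taken over i' in C(j)\i. The function name `variable_node_update` and the numeric values are assumptions made for the example.

```python
def variable_node_update(lam_j, alpha_col):
    """Return beta for each check node in C(j): channel value plus all other alphas."""
    # Sum everything once, then remove each check node's own contribution.
    total = lam_j + sum(alpha_col)
    return [total - a for a in alpha_col]


# Hypothetical channel value and incoming alpha messages for one variable node.
betas = variable_node_update(0.5, [1.0, -2.0, 0.25])
```

Computing the full sum once and subtracting the excluded term is a common way to avoid recomputing a separate sum for every outgoing message.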
Code estimation
In the Min-Sum decoding algorithm, code estimation is done by (5).
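A minimal sketch of the code estimation step follows, assuming the common form in which the a-posteriori value of bit j is λ_j plus all α messages from C(j), and the estimated bit is 1 when that value is negative. The sign convention (positive values favour bit 0), the function name `estimate_bit`, and the numbers are assumptions and may differ from (5).

```python
def estimate_bit(lam_j, alpha_col):
    """Hard decision for variable node j from its a-posteriori value."""
    posterior = lam_j + sum(alpha_col)
    # Assumed convention: negative value -> bit 1, otherwise bit 0.
    return 1 if posterior < 0 else 0


# Two hypothetical variable nodes.
bits = [estimate_bit(0.5, [1.0, -2.0, 0.25]), estimate_bit(-0.3, [0.4, 0.2])]
```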
Syndrome check
Finally, for error detection, the syndrome check is performed as in (6).
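The syndrome check can be sketched as below, assuming the standard condition that every parity check is satisfied modulo 2, that is, the product of H and the estimated codeword is the all-zero vector. The 4 x 8 matrix, the test words, and the function names `syndrome` and `is_codeword` are assumptions for illustration and are not taken from IEEE 802.11n.

```python
# Small illustrative parity-check matrix (an assumption, NOT an IEEE 802.11n matrix).
H = [
    [0, 1, 0, 1, 1, 0, 0, 1],
    [1, 1, 1, 0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0, 1, 1, 1],
    [1, 0, 0, 1, 1, 0, 1, 0],
]


def syndrome(H, c):
    """Return the syndrome of word c: one parity bit per row of H (mod 2)."""
    return [sum(h * b for h, b in zip(row, c)) % 2 for row in H]


def is_codeword(H, c):
    """True when every parity check is satisfied, so decoding can stop."""
    return not any(syndrome(H, c))


valid = [1, 0, 0, 1, 0, 1, 0, 1]       # satisfies all four checks
corrupted = [1, 1, 0, 1, 0, 1, 0, 1]   # same word with bit 1 flipped
```

In an iterative decoder this check is typically evaluated after each iteration, and decoding stops as soon as the syndrome is all zero or the iteration limit is reached.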
Min-Sum algorithm
Row processing stage (check node update)
The Min-Sum (MS) algorithm simplifies the check node (row processing) stage of SP decoding by approximating the magnitude computation in the check node update equation with a minimum function.
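The simplification can be sketched as follows: the sign of α is still the product of the signs of the other β messages, while the magnitude is simply the smallest of their magnitudes. The function name `min_sum_check_update` and the input values are assumptions made for the example, and no normalization or offset correction is applied.

```python
def min_sum_check_update(beta_row):
    """Min-sum check node update: sign product times minimum magnitude over V(i)\\j."""
    alphas = []
    for j in range(len(beta_row)):
        # Messages from all variable nodes in V(i) except node j.
        others = beta_row[:j] + beta_row[j + 1:]
        negatives = sum(1 for b in others if b < 0)
        sign = -1.0 if negatives % 2 else 1.0
        alphas.append(sign * min(abs(b) for b in others))
    return alphas


# Hypothetical beta messages for a check node of degree 4.
alphas = min_sum_check_update([2.0, -1.5, 0.8, 3.0])
```

Only comparisons and sign logic remain, which is what makes this update attractive for a fully parallel hardware implementation.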
Column processing stage (variable node update)
In the Min-Sum decoding algorithm, the column operation is the same as in Sum-Product decoding.
V. Fully Parallel Architecture
The fully parallel architecture does not depend on any structural properties of the parity-check matrix. It connects every check node and variable node of the parity-check matrix H. The block structure of the fully parallel decoder is presented in Fig. 4. The fully parallel architecture achieves the highest throughput and the lowest latency. For the considered code rate and codeword length, the fully parallel decoder uses 648 hardware variable nodes and 324 hardware check nodes.
Check Node Unit (Row Processor)
The following schematic shows how six inputs are compared using five two-input comparators to produce the minimum of the six inputs at the output. The check node processing performs the comparison on the magnitudes of the inputs. An XOR gate is used to compute the sign bit. The inputs and outputs use sign-magnitude representation. In the H matrix used, each of the 324 rows has a weight (number of 1s) of either 7 or 8, so the comp6 module is instantiated according to the row weight, as shown in Fig. 5. In this way, each check node is mapped to an individual row processor.
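A behavioural sketch of such a six-input unit is given below in Python rather than Verilog. The arrangement of the five comparators (three, then one, then one), the function name `comp2`, and the input values are assumptions; the actual arrangement in Fig. 5 may differ. Each input is a (sign bit, magnitude) pair in sign-magnitude form.

```python
def comp2(a, b):
    """Two-input comparator: return the smaller of two magnitudes."""
    return a if a < b else b


def comp6(inputs):
    """Minimum magnitude of six sign-magnitude inputs plus the XOR of their sign bits."""
    if len(inputs) != 6:
        raise ValueError("comp6 expects exactly six (sign, magnitude) pairs")
    signs = [s for s, _ in inputs]
    mags = [m for _, m in inputs]
    # Stage 1: three comparators working on input pairs.
    m01 = comp2(mags[0], mags[1])
    m23 = comp2(mags[2], mags[3])
    m45 = comp2(mags[4], mags[5])
    # Stages 2 and 3: two more comparators, five in total.
    m0123 = comp2(m01, m23)
    minimum = comp2(m0123, m45)
    # Sign bit: XOR of all six input sign bits.
    sign = 0
    for s in signs:
        sign ^= s
    return sign, minimum


# Hypothetical inputs as (sign bit, magnitude) pairs.
result = comp6([(0, 5), (1, 3), (0, 7), (1, 2), (1, 6), (0, 4)])
```

For a row of weight 7, each outgoing message depends on the six other inputs, which is the case this six-input unit covers.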
VI. Results
To evaluate the proposed fully parallel decoder architecture, a Verilog description was synthesized on a Xilinx Virtex-5 device for a QC-LDPC decoder with a block length of 648 bits and a code rate of 1/2 for the IEEE 802.11n standard. The implementation complexity of the proposed LDPC decoder is analyzed by examining the hardware resource utilization and routing complexity. A summary of the FPGA device utilization generated by the Xilinx Synthesis Tool is shown in Table II. Table II lists the resources occupied by the fully parallel architecture of the QC-LDPC decoder using the Min-Sum algorithm and shows that the hardware resources required by the decoder are low.
VII. Conclusion
In this paper, a fully parallel quasi-cyclic LDPC decoder has been implemented for IEEE 802.11n with a codeword length of 648 bits and a code rate of 1/2. The performance and complexity of the decoder have been analyzed by software simulations. The results show that the QC-LDPC decoder architecture has reduced complexity in terms of area. The decoder provides a maximum throughput of 82.24 Mbps. The proposed architecture is well suited to high-data-rate communication systems.
