40 research outputs found
On Maximum Contention-Free Interleavers and Permutation Polynomials over Integer Rings
An interleaver is a critical component for the channel coding performance of
turbo codes. Algebraic constructions are of particular interest because they
admit analytical designs and simple, practical hardware implementation.
Contention-free interleavers have been recently shown to be suitable for
parallel decoding of turbo codes. In this correspondence, it is shown that
permutation polynomials generate maximum contention-free interleavers, i.e.,
every factor of the interleaver length becomes a possible degree of parallel
processing of the decoder. Further, it is shown by computer simulations that
turbo codes using these interleavers perform very well for the 3rd Generation
Partnership Project (3GPP) standard.Comment: 13 pages, 2 figures, submitted as a correspondence to the IEEE
Transactions on Information Theory, revised versio
Configurable and Scalable Turbo Decoder for 4G Wireless Receivers
The increasing requirements of high data rates and quality of service (QoS) in fourth-generation (4G) wireless communication require the implementation of practical capacity approaching codes. In this chapter, the application of Turbo coding schemes that have recently been adopted in the IEEE 802.16e WiMax standard and 3GPP Long Term Evolution (LTE) standard are reviewed. In order to process several 4G wireless standards with a common hardware module, a reconfigurable and scalable Turbo decoder architecture is presented. A parallel Turbo decoding scheme with scalable parallelism tailored to the target throughput is applied to support high data rates in 4G applications. High-level decoding parallelism is achieved by employing contention-free interleavers. A multi-banked memory structure and routing network among memories and MAP decoders are designed to operate at full speed with parallel interleavers. A new on-line address generation technique is introduced to support multiple Turbo
interleaving patterns, which avoids the interleaver address memory that is typically necessary in the traditional designs. Design trade-offs in terms of area and power efficiency are analyzed for different parallelism and clock frequency goals
Configurable and Scalable High Throughput Turbo Decoder Architecture for Multiple 4GWireless Standards
In this paper, we propose a novel multi-code turbo decoder architecture for 4G wireless systems. To support various 4G standards, a configurable multi-mode MAP (maximum a posteriori) decoder is designed for both binary and duo-binary turbo codes with small resource overhead (less than 10%) compared to the single-mode architecture. To achieve high data rates in 4G, we present a parallel turbo decoder architecture with scalable parallelism tailored to the given throughput requirements. High-level parallelism is achieved by employing contention-free interleavers. Multi-banked memory structure and routing network among memories and MAP decoders are designed to operate at full speed with parallel interleavers. We designed a very low-complexity recursive on-line address generator supporting multiple interleaving patterns, which avoids the interleaver address memory. Design trade-offs in terms of area and power efficiency are explored to find the optimal architectures. A 711 Mbps data rate is feasible with 32 Radix-4 MAP decoders running at 200 MHz clock rate.Texas Instruments Incorporate
Pruned Bit-Reversal Permutations: Mathematical Characterization, Fast Algorithms and Architectures
A mathematical characterization of serially-pruned permutations (SPPs)
employed in variable-length permuters and their associated fast pruning
algorithms and architectures are proposed. Permuters are used in many signal
processing systems for shuffling data and in communication systems as an
adjunct to coding for error correction. Typically only a small set of discrete
permuter lengths are supported. Serial pruning is a simple technique to alter
the length of a permutation to support a wider range of lengths, but results in
a serial processing bottleneck. In this paper, parallelizing SPPs is formulated
in terms of recursively computing sums involving integer floor and related
functions using integer operations, in a fashion analogous to evaluating
Dedekind sums. A mathematical treatment for bit-reversal permutations (BRPs) is
presented, and closed-form expressions for BRP statistics are derived. It is
shown that BRP sequences have weak correlation properties. A new statistic
called permutation inliers that characterizes the pruning gap of pruned
interleavers is proposed. Using this statistic, a recursive algorithm that
computes the minimum inliers count of a pruned BR interleaver (PBRI) in
logarithmic time complexity is presented. This algorithm enables parallelizing
a serial PBRI algorithm by any desired parallelism factor by computing the
pruning gap in lookahead rather than a serial fashion, resulting in significant
reduction in interleaving latency and memory overhead. Extensions to 2-D block
and stream interleavers, as well as applications to pruned fast Fourier
transforms and LTE turbo interleavers, are also presented. Moreover,
hardware-efficient architectures for the proposed algorithms are developed.
Simulation results demonstrate 3 to 4 orders of magnitude improvement in
interleaving time compared to existing approaches.Comment: 31 page
High-Throughput Contention-Free Concurrent Interleaver Architecture for Multi-Standard Turbo Decoder
To meet the higher data rate requirement of emerging wireless communication technology, numerous parallel turbo decoder architectures have been developed. However, the interleaver has become a major bottleneck that limits the achievable throughput in the parallel decoders due to the massive memory conflicts. In this paper, we propose a flexible Double-Buffer based Contention-Free (DBCF) interleaver architecture that can efficiently solve the memory conflict problem for parallel turbo decoders with very high parallelism.
The proposed DBCF architecture enables high throughput concurrent interleaving for multi-standard turbo decoders that support UMTS/HSPA+, LTE and WiMAX, with small datapath delays and low hardware cost. We implemented the DBCF
interleaver with a 65nm CMOS technology. The implementation of this highly efficient DBCF interleaver architecture shows significant improvement in terms of the maximum throughput and occupied chip area compared to the previous work.HuaweiNational Science Foundatio
Implementation of a 3GPP LTE Turbo Decoder Accelerator on GPU
This paper presents a 3GPP LTE compliant turbo decoder accelerator on GPU. The challenge of implementing a turbo decoder is finding an efficient mapping of the decoder algorithm on GPU, e.g. finding a good way to parallelize workload
across cores and allocate and use fast on-die memory to improve throughput. In our implementation, we increase throughput through 1) distributing the decoding workload for a codeword across multiple cores, 2) decoding multiple codewords simultaneously to increase concurrency and 3) employing memory optimization
techniques to reduce memory bandwidth requirements. In addition, we analyze how different MAP algorithm approximations affect both throughput and bit error rate (BER) performance of this decoder