41 research outputs found
High-Throughput Contention-Free Concurrent Interleaver Architecture for Multi-Standard Turbo Decoder
To meet the higher data rate requirement of emerging wireless communication technology, numerous parallel turbo decoder architectures have been developed. However, the interleaver has become a major bottleneck that limits the achievable throughput in the parallel decoders due to the massive memory conflicts. In this paper, we propose a flexible Double-Buffer based Contention-Free (DBCF) interleaver architecture that can efficiently solve the memory conflict problem for parallel turbo decoders with very high parallelism.
The proposed DBCF architecture enables high throughput concurrent interleaving for multi-standard turbo decoders that support UMTS/HSPA+, LTE and WiMAX, with small datapath delays and low hardware cost. We implemented the DBCF
interleaver with a 65nm CMOS technology. The implementation of this highly efficient DBCF interleaver architecture shows significant improvement in terms of the maximum throughput and occupied chip area compared to the previous work.HuaweiNational Science Foundatio
Turbo NOC: a framework for the design of Network On Chip based turbo decoder architectures
This work proposes a general framework for the design and simulation of
network on chip based turbo decoder architectures. Several parameters in the
design space are investigated, namely the network topology, the parallelism
degree, the rate at which messages are sent by processing nodes over the
network and the routing strategy. The main results of this analysis are: i) the
most suited topologies to achieve high throughput with a limited complexity
overhead are generalized de-Bruijn and generalized Kautz topologies; ii)
depending on the throughput requirements different parallelism degrees, message
injection rates and routing algorithms can be used to minimize the network area
overhead.Comment: submitted to IEEE Trans. on Circuits and Systems I (submission date
27 may 2009
Configurable and Scalable Turbo Decoder for 4G Wireless Receivers
The increasing requirements of high data rates and quality of service (QoS) in fourth-generation (4G) wireless communication require the implementation of practical capacity approaching codes. In this chapter, the application of Turbo coding schemes that have recently been adopted in the IEEE 802.16e WiMax standard and 3GPP Long Term Evolution (LTE) standard are reviewed. In order to process several 4G wireless standards with a common hardware module, a reconfigurable and scalable Turbo decoder architecture is presented. A parallel Turbo decoding scheme with scalable parallelism tailored to the target throughput is applied to support high data rates in 4G applications. High-level decoding parallelism is achieved by employing contention-free interleavers. A multi-banked memory structure and routing network among memories and MAP decoders are designed to operate at full speed with parallel interleavers. A new on-line address generation technique is introduced to support multiple Turbo
interleaving patterns, which avoids the interleaver address memory that is typically necessary in the traditional designs. Design trade-offs in terms of area and power efficiency are analyzed for different parallelism and clock frequency goals
Configurable and Scalable High Throughput Turbo Decoder Architecture for Multiple 4GWireless Standards
In this paper, we propose a novel multi-code turbo decoder architecture for 4G wireless systems. To support various 4G standards, a configurable multi-mode MAP (maximum a posteriori) decoder is designed for both binary and duo-binary turbo codes with small resource overhead (less than 10%) compared to the single-mode architecture. To achieve high data rates in 4G, we present a parallel turbo decoder architecture with scalable parallelism tailored to the given throughput requirements. High-level parallelism is achieved by employing contention-free interleavers. Multi-banked memory structure and routing network among memories and MAP decoders are designed to operate at full speed with parallel interleavers. We designed a very low-complexity recursive on-line address generator supporting multiple interleaving patterns, which avoids the interleaver address memory. Design trade-offs in terms of area and power efficiency are explored to find the optimal architectures. A 711 Mbps data rate is feasible with 32 Radix-4 MAP decoders running at 200 MHz clock rate.Texas Instruments Incorporate
On chip interconnects for multiprocessor turbo decoding architectures
International audienc
Turbo NOC: a framework for the design of Network-on-Chip-basedturbo decoder architectures
This paper proposes a general framework for the design and simulation of network-on-chip-based turbo decoder architectures. Several parameters in the design space are investigated, namely, network topology, parallelism degree, the rate at which messages are sent by processing nodes over the network, and routing strategy. The main results of this analysis are as follows: 1) the most suited topologies to achieve high throughput with a limited complexity overhead are generalized de Bruijn and generalized Kautz topologies and 2) depending on the throughput requirements, different parallelism degrees, message injection rates, and routing algorithms can be used to minimize the network area overhead
Implementation of a 3GPP LTE Turbo Decoder Accelerator on GPU
This paper presents a 3GPP LTE compliant turbo decoder accelerator on GPU. The challenge of implementing a turbo decoder is finding an efficient mapping of the decoder algorithm on GPU, e.g. finding a good way to parallelize workload
across cores and allocate and use fast on-die memory to improve throughput. In our implementation, we increase throughput through 1) distributing the decoding workload for a codeword across multiple cores, 2) decoding multiple codewords simultaneously to increase concurrency and 3) employing memory optimization
techniques to reduce memory bandwidth requirements. In addition, we analyze how different MAP algorithm approximations affect both throughput and bit error rate (BER) performance of this decoder
Improving Network-on-Chip-based Turbo Decoder Architectures
In this work novel results concerning Networkon- Chip-based turbo decoder architectures are presented. Stemming from previous publications, this work concentrates first on improving the throughput by exploiting adaptive-bandwidth-reduction techniques. This technique shows in the best case an improvement of more than 60 Mb/s. Moreover, it is known that double-binary turbo decoders require higher area than binary ones. This characteristic has the negative effect of increasing the data width of the network nodes. Thus, the second contribution of this work is to reduce the network complexity to support doublebinary codes, by exploiting bit-level and pseudo-floatingpoint representation of the extrinsic information. These two techniques allow for an area reduction of up to more than the 40 % with a performance degradation of about 0.2 d