46 research outputs found
On chip interconnects for multiprocessor turbo decoding architectures
International audienc
Configurable and Scalable Turbo Decoder for 4G Wireless Receivers
The increasing requirements of high data rates and quality of service (QoS) in fourth-generation (4G) wireless communication require the implementation of practical capacity approaching codes. In this chapter, the application of Turbo coding schemes that have recently been adopted in the IEEE 802.16e WiMax standard and 3GPP Long Term Evolution (LTE) standard are reviewed. In order to process several 4G wireless standards with a common hardware module, a reconfigurable and scalable Turbo decoder architecture is presented. A parallel Turbo decoding scheme with scalable parallelism tailored to the target throughput is applied to support high data rates in 4G applications. High-level decoding parallelism is achieved by employing contention-free interleavers. A multi-banked memory structure and routing network among memories and MAP decoders are designed to operate at full speed with parallel interleavers. A new on-line address generation technique is introduced to support multiple Turbo
interleaving patterns, which avoids the interleaver address memory that is typically necessary in the traditional designs. Design trade-offs in terms of area and power efficiency are analyzed for different parallelism and clock frequency goals
Turbo NOC: a framework for the design of Network On Chip based turbo decoder architectures
This work proposes a general framework for the design and simulation of
network on chip based turbo decoder architectures. Several parameters in the
design space are investigated, namely the network topology, the parallelism
degree, the rate at which messages are sent by processing nodes over the
network and the routing strategy. The main results of this analysis are: i) the
most suited topologies to achieve high throughput with a limited complexity
overhead are generalized de-Bruijn and generalized Kautz topologies; ii)
depending on the throughput requirements different parallelism degrees, message
injection rates and routing algorithms can be used to minimize the network area
overhead.Comment: submitted to IEEE Trans. on Circuits and Systems I (submission date
27 may 2009
Configurable and Scalable High Throughput Turbo Decoder Architecture for Multiple 4GWireless Standards
In this paper, we propose a novel multi-code turbo decoder architecture for 4G wireless systems. To support various 4G standards, a configurable multi-mode MAP (maximum a posteriori) decoder is designed for both binary and duo-binary turbo codes with small resource overhead (less than 10%) compared to the single-mode architecture. To achieve high data rates in 4G, we present a parallel turbo decoder architecture with scalable parallelism tailored to the given throughput requirements. High-level parallelism is achieved by employing contention-free interleavers. Multi-banked memory structure and routing network among memories and MAP decoders are designed to operate at full speed with parallel interleavers. We designed a very low-complexity recursive on-line address generator supporting multiple interleaving patterns, which avoids the interleaver address memory. Design trade-offs in terms of area and power efficiency are explored to find the optimal architectures. A 711 Mbps data rate is feasible with 32 Radix-4 MAP decoders running at 200 MHz clock rate.Texas Instruments Incorporate
VLSI Architectures for WIMAX Channel Decoders
This chapter describes the main architectures proposed in the literature to
implement the channel decoders required by the WiMax standard, namely
convolutional codes, turbo codes (both block and convolutional) and LDPC. Then
it shows a complete design of a convolutional turbo code encoder/decoder system
for WiMax.Comment: To appear in the book "WIMAX, New Developments", M. Upena, D. Dalal,
Y. Kosta (Ed.), ISBN978-953-7619-53-
Turbo NOC: a framework for the design of Network-on-Chip-basedturbo decoder architectures
This paper proposes a general framework for the design and simulation of network-on-chip-based turbo decoder architectures. Several parameters in the design space are investigated, namely, network topology, parallelism degree, the rate at which messages are sent by processing nodes over the network, and routing strategy. The main results of this analysis are as follows: 1) the most suited topologies to achieve high throughput with a limited complexity overhead are generalized de Bruijn and generalized Kautz topologies and 2) depending on the throughput requirements, different parallelism degrees, message injection rates, and routing algorithms can be used to minimize the network area overhead
A Reconfigurable Outer Modem Platform for Future Communications Systems
Future mobile and wireless communications networks
require flexible modem architectures with high performance.
Efficient utilization of
application specific flexibility is key to fulfill these
requirements.
For high throughput a single processor can not provide
the necessary computational power.
Hence multi-processor architectures become necessary.
This paper presents a multi-processor platform based on a new
dynamically reconfigurable application specific instruction set processor (dr-ASIP)
for the application domain of channel decoding.
Inherently parallel decoding tasks can be mapped onto individual processing nodes.
The implied challenging inter-processor communication is efficiently handled
by a Network-on-Chip (NoC) such that the throughput of each node is not degraded.
The dr-ASIP features Viterbi and Log-MAP decoding
for support of convolutional and turbo codes
of more than 10 currently specified mobile and wireless standards.
Furthermore, its flexibility allows for adaptation to future systems
A programmable, scalable-throughput interleaver
Abstract The interleaver stages of digital communication standards show a surprisingly large variation in throughput, state sizes, and permutation functions. Furthermore, data rates for 4G standards such as LTE-Advanced will exceed typical baseband clock frequencies of handheld devices. Multistream operation for Software Defined Radio and iterative decoding algorithms will call for ever higher interleave data rates. Our interleave machine is built around 8 single-port SRAM banks and can be programmed to generate up to 8 addresses every clock cycle. The scalable architecture combines SIMD and VLIW concepts with an efficient resolution of bank conflicts. A wide range of cellular, connectivity, and broadcast interleavers have been mapped on this machine, with throughputs up to more than 0.5 Gsymbol/second. Although it was designed for channel interleaving, the application domain of the interleaver extends also to Turbo interleaving. The presented configuration of the architecture is designed as a part of a programmable outer receiver on a prototype board. It offers (near) universal programmability to enable the implementation of new interleavers. The interleaver measures 2.09 mm2 in 65 nm CMOS (including memories) and proves functional on silicon