Search CORE

79 research outputs found

Adaptive OFDM System Design For Cognitive Radio

Author: Kokkeler André B.J.
Smit Gerard J.M.
Zhang Qiwei
Publication venue: IEEE Communications Society, Germany Chapter
Publication date: 01/01/2006
Field of study

Recently, Cognitive Radio has been proposed as a promising technology to improve spectrum utilization. A highly flexible OFDM system is considered to be a good candidate for the Cognitive Radio baseband processing where individual carriers can be switched off for frequencies occupied by a licensed user. In order to support such an adaptive OFDM system, we propose a Multiprocessor System-on-Chip (MPSoC) architecture which can be dynamically reconfigured. However, the complexity and flexibility of the baseband processing makes the MPSoC design a difficult task. This paper presents a design technology for mapping flexible OFDM baseband for Cognitive Radio on a multiprocessor System-on-Chip (MPSoC)

CiteSeerX

University of Twente Research Information

Flexible LDPC Decoder Architectures

Author: AWAIS MUHAMMAD
CONDO CARLO
Publication venue: 'Hindawi Limited'
Publication date: 01/01/2012
Field of study

Flexible channel decoding is getting significance with the increase in number of wireless standards and modes within a standard. A flexible channel decoder is a solution providing interstandard and intrastandard support without change in hardware. However, the design of efficient implementation of flexible low-density parity-check (LDPC) code decoders satisfying area, speed, and power constraints is a challenging task and still requires considerable research effort. This paper provides an overview of state-of-the-art in the design of flexible LDPC decoders. The published solutions are evaluated at two levels of architectural design: the processing element (PE) and the interconnection structure. A qualitative and quantitative analysis of different design choices is carried out, and comparison is provided in terms of achieved flexibility, throughput, decoding efficiency, and area (power) consumption

Crossref

Directory of Open Access Journals

PORTO@iris (Publications Open Repository TOrino - Politecnico di Torino)

Static Address Generation Easing: a Design Methodology for Parallel Interleaver Architectures

Author: Chavet Cyrille
Coussy Philippe
Martin Eric
Urard Pascal
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 21/02/2010
Field of study

4 pagesInternational audienceFor high throughput applications, turbo-like iterative decoders are implemented with parallel architectures. However, to be efficient parallel architectures require to avoid collision accesses i.e. concurrent read/write accesses should not target the same memory block. This consideration applies to the two main classes of turbo-like codes which are Low Density Parity Check (LDPC) and Turbo-Codes. In this paper we propose a methodology which finds a collision-free mapping of the variables in the memory banks and which optimizes the resulting interleaving architecture. Finally, we show through a pedagogical example the interest of our approach compared to state-of-the-art techniques

arXiv.org e-Print Archive

HAL-Université de Bretagne Occidentale

On chip interconnects for multiprocessor turbo decoding architectures

Author: A. Baghdadi
Angiolini
Bahl
Benedetto
Benedetto
Benes
Benini
Boutillon
Cooley
Dinoi
Dobkin
Dolinar
Douillard
G. Masera
H. Moussa
Imase
Imase
Kwak
M. Martina
Martina
Masera
Opferman
Papaharalabos
Tarable
Viterbi
Wang
Zhang
Publication venue: Elsevier
Publication date: 01/01/2011
Field of study

International audienc

Crossref

HAL-Université de Bretagne Occidentale

HAL Descartes

PORTO@iris (Publications Open Repository TOrino - Politecnico di Torino)

Hal-Diderot

PORTO Publications Open Repository TOrino

Implementation of a High Throughput Soft MIMO Detector on GPU

Author: A Burg
B Hochwald
D Garrett
J Janhunen
Joseph R. Cavallaro
Michael Wu
S Chen
Siddharth Gupta
U Fincke
X Huang
Yang Sun
YLC Jong de
Z Guo
Publication venue: Springer
Publication date: 01/07/2011
Field of study

Multiple-input multiple-output (MIMO) significantly increases the throughput of a communication system by employing multiple antennas at the transmitter and the receiver. To extract maximum performance from a MIMO system, a computationally intensive search based detector is needed. To meet the challenge of MIMO detection, typical suboptimal MIMO detectors are ASIC or FPGA designs. We aim to show that a MIMO detector on Graphic processor unit (GPU), a low-cost parallel programmable co-processor, can achieve high throughput and can serve as an alternative to ASIC/FPGA designs. However, careful architecture aware software design is needed to leverage the performance offered by GPU. We propose a novel soft MIMO detection algorithm, multi-pass trellis traversal (MTT), and show that we can achieve ASIC/FPGA-like performance and handle different configurations in software on GPU. The proposed design can be used to accelerate wireless physical layer simulations and to offload MIMO detection processing in wireless testbed platforms.NokiaNokia Siemens Networks (NSN)Texas InstrumentsXilinxNational Science Foundatio

Crossref

DSpace at Rice University

On the application of graphics processor to wireless receiver design

Author: Wu Michael
Publication venue
Publication date: 01/01/2010
Field of study

In many wireless systems, a Turbo decoder is often combined with a soft-output multiple-input and multiple-output (MIMO) detector at the receiver to maximize performance in many 4G and beyond wireless standards. Although custom application specific designs are usually used to meet this challenge, programmable graphics processing units (GPU) has become an alternative to the traditional ASIC and FPGA solution for wireless applications. However, careful architecture-aware algorithm design and mapping are required to maximize performance of a communication block on GPU. For MIMO soft detection, we implemented a new MIMO soft detection algorithm, multi-pass trellis traversal (MTT). For Turbo decoding, we used a parallel window algorithm. We showed that our implementations can achieve high throughput while maintaining good performance. This work will allow us to implement a complete iterative MIMO receiver in software on GPU in the future

DSpace at Rice University

Implementation of a 3GPP LTE Turbo Decoder Accelerator on GPU

Author: Cavallaro Joseph R.
Wu Michael
Publication venue: IEEE
Publication date: 01/10/2010
Field of study

This paper presents a 3GPP LTE compliant turbo decoder accelerator on GPU. The challenge of implementing a turbo decoder is finding an efficient mapping of the decoder algorithm on GPU, e.g. finding a good way to parallelize workload across cores and allocate and use fast on-die memory to improve throughput. In our implementation, we increase throughput through 1) distributing the decoding workload for a codeword across multiple cores, 2) decoding multiple codewords simultaneously to increase concurrency and 3) employing memory optimization techniques to reduce memory bandwidth requirements. In addition, we analyze how different MAP algorithm approximations affect both throughput and bit error rate (BER) performance of this decoder

DSpace at Rice University

Energy-Efficient Computing for Mobile Signal Processing

Author: Seo Sangwon
Publication venue
Publication date: 01/01/2011
Field of study

Mobile devices have rapidly proliferated, and deployment of handheld devices continues to increase at a spectacular rate. As today's devices not only support advanced signal processing of wireless communication data but also provide rich sets of applications, contemporary mobile computing requires both demanding computation and efficiency. Most mobile processors combine general-purpose processors, digital signal processors, and hardwired application-specific integrated circuits to satisfy their high-performance and low-power requirements. However, such a heterogeneous platform is inefficient in area, power and programmability. Improving the efficiency of programmable mobile systems is a critical challenge and an active area of computer systems research. SIMD (single instruction multiple data) architectures are very effective for data-level-parallelism intense algorithms in mobile signal processing. However, new characteristics of advanced wireless/multimedia algorithms require architectural re-evaluation to achieve better energy efficiency. Therefore, fourth generation wireless protocol and high definition mobile video algorithms are analyzed to enhance a wide-SIMD architecture. The key enhancements include 1) programmable crossbar to support complex data alignment, 2) SIMD partitioning to support fine-grain SIMD computation, and 3) fused operation to support accelerating frequently used instruction pairs. Near-threshold computation has been attractive in low-power architecture research because it balances performance and power. To further improve energy efficiency in mobile computing, near-threshold computation is applied to a wide SIMD architecture. This proposed near-threshold wide SIMD architecture-Diet SODA-presents interesting architectural design decisions such as 1) very wide SIMD datapath to compensate for degraded performance induced by near-threshold computation and 2) scatter-gather data prefetcher to exploit large latency gap between memory and the SIMD datapath. Although near-threshold computation provides excellent energy efficiency, it suffers from increased delay variations. A systematic study of delay variations in near-threshold computing is performed and simple techniques-structural duplication and voltage/frequency margining-are explored to tolerate and mitigate the delay variations in near-threshold wide SIMD architectures. This dissertation analyzes representative wireless/multimedia mobile signal processing algorithms, proposes an energy-efficient programmable platform, and evaluates performance and power. A main theme of this dissertation is that the performance and efficiency of programmable embedded systems can be significantly improved with a combination of parallel SIMD and near-threshold computations.Ph.D.Electrical EngineeringUniversity of Michigan, Horace H. Rackham School of Graduate Studieshttp://deepblue.lib.umich.edu/bitstream/2027.42/86356/1/swseo_1.pd

Deep Blue Documents at the University of Michigan

Application of a design space exploration tool to enhance interleaver generation

Author: Chavet Cyrille
Coussy Philippe
Martin Eric
Urard Pascal
Publication venue
Publication date: 20/06/2007
Field of study

This paper presents a methodology to efficiently explore the design space of communication adapters. In most digital signal processing (DSP) applications, the overall performance of the system is significantly affected by communication architectures, as a consequence the designers need specifically optimized adapters. By explicitly modeling these communications within an effective graph-theoretic model and analysis framework, we automatically generate an optimized architecture, named Space-Time AdapteR (STAR). Our design flow inputs a C description of Input/Output data scheduling, and user requirements (throughput, latency, parallelism...), and formalizes communication constraints through a Resource Constraints Graph (RCG). Design space exploration is then performed through associated tools, to synthesize a STAR component under time-to-market constraints. The proposed approach has been tested to design an industrial data mixing block example: an Ultra-Wideband interleaver

arXiv.org e-Print Archive

HAL-Université de Bretagne Occidentale

NEUROSURGERY ENTHUSIASTIC WOMEN SOCIETY

GPUs as Storage System Accelerators

Author: Al-Kiswany Samer
Gharaibeh Abdullah
Ripeanu Matei
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 16/05/2012
Field of study

Massively multicore processors, such as Graphics Processing Units (GPUs), provide, at a comparable price, a one order of magnitude higher peak performance than traditional CPUs. This drop in the cost of computation, as any order-of-magnitude drop in the cost per unit of performance for a class of system components, triggers the opportunity to redesign systems and to explore new ways to engineer them to recalibrate the cost-to-performance relation. This project explores the feasibility of harnessing GPUs' computational power to improve the performance, reliability, or security of distributed storage systems. In this context, we present the design of a storage system prototype that uses GPU offloading to accelerate a number of computationally intensive primitives based on hashing, and introduce techniques to efficiently leverage the processing power of GPUs. We evaluate the performance of this prototype under two configurations: as a content addressable storage system that facilitates online similarity detection between successive versions of the same file and as a traditional system that uses hashing to preserve data integrity. Further, we evaluate the impact of offloading to the GPU on competing applications' performance. Our results show that this technique can bring tangible performance gains without negatively impacting the performance of concurrently running applications.Comment: IEEE Transactions on Parallel and Distributed Systems, 201

arXiv.org e-Print Archive

Crossref