79 research outputs found
Adaptive OFDM System Design For Cognitive Radio
Recently, Cognitive Radio has been proposed as a promising technology to improve spectrum utilization. A highly flexible OFDM system is considered to be a good candidate for the Cognitive Radio baseband processing where individual carriers can be switched off for frequencies occupied by a licensed user. In order to support such an adaptive OFDM system, we propose a Multiprocessor System-on-Chip (MPSoC) architecture which can be dynamically reconfigured. However, the complexity and flexibility of the baseband processing makes the MPSoC design a difficult task. This paper presents a design technology for mapping flexible OFDM baseband for Cognitive Radio on a multiprocessor System-on-Chip (MPSoC)
Flexible LDPC Decoder Architectures
Flexible channel decoding is getting significance with the increase in number of wireless standards and modes within a standard. A flexible channel decoder is a solution providing interstandard and intrastandard support without change in hardware. However, the design of efficient implementation of flexible low-density parity-check (LDPC) code decoders satisfying area, speed, and power constraints is a challenging task and still requires considerable research effort. This paper provides an overview of state-of-the-art in the design of flexible LDPC decoders. The published solutions are evaluated at two levels of architectural design: the processing element (PE) and the interconnection structure. A qualitative and quantitative analysis
of different design choices is carried out, and comparison is provided in terms of achieved flexibility, throughput, decoding efficiency, and area (power) consumption
Static Address Generation Easing: a Design Methodology for Parallel Interleaver Architectures
4 pagesInternational audienceFor high throughput applications, turbo-like iterative decoders are implemented with parallel architectures. However, to be efficient parallel architectures require to avoid collision accesses i.e. concurrent read/write accesses should not target the same memory block. This consideration applies to the two main classes of turbo-like codes which are Low Density Parity Check (LDPC) and Turbo-Codes. In this paper we propose a methodology which finds a collision-free mapping of the variables in the memory banks and which optimizes the resulting interleaving architecture. Finally, we show through a pedagogical example the interest of our approach compared to state-of-the-art techniques
On chip interconnects for multiprocessor turbo decoding architectures
International audienc
Implementation of a High Throughput Soft MIMO Detector on GPU
Multiple-input multiple-output (MIMO) significantly increases the throughput of a communication system by employing multiple antennas at the transmitter
and the receiver. To extract maximum performance from a MIMO system, a computationally intensive search based detector is needed. To meet the challenge
of MIMO detection, typical suboptimal MIMO detectors are ASIC or FPGA designs. We aim to show that a MIMO detector on Graphic processor unit (GPU),
a low-cost parallel programmable co-processor, can achieve high throughput and can serve as an alternative to ASIC/FPGA designs. However, careful architecture
aware software design is needed to leverage the performance offered by GPU. We propose a novel soft MIMO detection algorithm, multi-pass trellis traversal
(MTT), and show that we can achieve ASIC/FPGA-like performance and handle different configurations in software on GPU. The proposed design can be used
to accelerate wireless physical layer simulations and to offload MIMO detection processing in wireless testbed platforms.NokiaNokia Siemens Networks (NSN)Texas InstrumentsXilinxNational Science Foundatio
On the application of graphics processor to wireless receiver design
In many wireless systems, a Turbo decoder is often combined with a soft-output multiple-input and multiple-output (MIMO) detector at the receiver to maximize performance in many 4G and beyond wireless standards. Although custom application specific designs are usually used to meet this challenge, programmable graphics processing units (GPU) has become an alternative to the traditional ASIC and FPGA solution for wireless applications. However, careful architecture-aware algorithm design and mapping are required to maximize performance of a communication block on GPU. For MIMO soft detection, we implemented a new MIMO soft detection algorithm, multi-pass trellis traversal (MTT). For Turbo decoding, we used a parallel window algorithm. We showed that our implementations can achieve high throughput while maintaining good performance. This work will allow us to implement a complete iterative MIMO receiver in software on GPU in the future
Implementation of a 3GPP LTE Turbo Decoder Accelerator on GPU
This paper presents a 3GPP LTE compliant turbo decoder accelerator on GPU. The challenge of implementing a turbo decoder is finding an efficient mapping of the decoder algorithm on GPU, e.g. finding a good way to parallelize workload
across cores and allocate and use fast on-die memory to improve throughput. In our implementation, we increase throughput through 1) distributing the decoding workload for a codeword across multiple cores, 2) decoding multiple codewords simultaneously to increase concurrency and 3) employing memory optimization
techniques to reduce memory bandwidth requirements. In addition, we analyze how different MAP algorithm approximations affect both throughput and bit error rate (BER) performance of this decoder
Energy-Efficient Computing for Mobile Signal Processing
Mobile devices have rapidly proliferated, and deployment of handheld devices continues to increase at a spectacular rate. As today's devices not only support advanced signal processing of wireless communication data but also provide rich sets of applications, contemporary mobile computing requires both demanding computation and efficiency. Most mobile processors combine general-purpose processors, digital signal processors, and hardwired application-specific integrated circuits to satisfy their high-performance and low-power requirements. However, such a heterogeneous platform is inefficient in area, power and programmability. Improving the efficiency of programmable mobile systems is a critical challenge and an active area of computer systems research.
SIMD (single instruction multiple data) architectures are very effective for data-level-parallelism intense algorithms in mobile signal processing. However, new characteristics of advanced wireless/multimedia algorithms require architectural re-evaluation to achieve better energy efficiency. Therefore, fourth generation wireless protocol and high definition mobile video algorithms are analyzed to enhance a wide-SIMD architecture. The key enhancements include 1) programmable crossbar to support complex data alignment, 2) SIMD partitioning to support fine-grain SIMD computation, and 3) fused operation to support accelerating frequently used instruction pairs.
Near-threshold computation has been attractive in low-power architecture research because it balances performance and power. To further improve energy efficiency in mobile computing, near-threshold computation is applied to a wide SIMD architecture. This proposed near-threshold wide SIMD architecture-Diet SODA-presents interesting architectural design decisions such as 1) very wide SIMD datapath to compensate for degraded performance induced by near-threshold computation and 2) scatter-gather data prefetcher to exploit large latency gap between memory and the SIMD datapath. Although near-threshold computation provides excellent energy efficiency, it suffers from increased delay variations. A systematic study of delay variations in near-threshold computing is performed and simple techniques-structural duplication and voltage/frequency margining-are explored to tolerate and mitigate the delay variations in near-threshold wide SIMD architectures.
This dissertation analyzes representative wireless/multimedia mobile signal processing algorithms, proposes an energy-efficient programmable platform, and evaluates performance and power. A main theme of this dissertation is that the performance and efficiency of programmable embedded systems can be significantly improved with a combination of parallel SIMD and near-threshold computations.Ph.D.Electrical EngineeringUniversity of Michigan, Horace H. Rackham School of Graduate Studieshttp://deepblue.lib.umich.edu/bitstream/2027.42/86356/1/swseo_1.pd
Application of a design space exploration tool to enhance interleaver generation
This paper presents a methodology to efficiently explore the design space of
communication adapters. In most digital signal processing (DSP) applications,
the overall performance of the system is significantly affected by
communication architectures, as a consequence the designers need specifically
optimized adapters. By explicitly modeling these communications within an
effective graph-theoretic model and analysis framework, we automatically
generate an optimized architecture, named Space-Time AdapteR (STAR). Our design
flow inputs a C description of Input/Output data scheduling, and user
requirements (throughput, latency, parallelism...), and formalizes
communication constraints through a Resource Constraints Graph (RCG). Design
space exploration is then performed through associated tools, to synthesize a
STAR component under time-to-market constraints. The proposed approach has been
tested to design an industrial data mixing block example: an Ultra-Wideband
interleaver
GPUs as Storage System Accelerators
Massively multicore processors, such as Graphics Processing Units (GPUs),
provide, at a comparable price, a one order of magnitude higher peak
performance than traditional CPUs. This drop in the cost of computation, as any
order-of-magnitude drop in the cost per unit of performance for a class of
system components, triggers the opportunity to redesign systems and to explore
new ways to engineer them to recalibrate the cost-to-performance relation. This
project explores the feasibility of harnessing GPUs' computational power to
improve the performance, reliability, or security of distributed storage
systems. In this context, we present the design of a storage system prototype
that uses GPU offloading to accelerate a number of computationally intensive
primitives based on hashing, and introduce techniques to efficiently leverage
the processing power of GPUs. We evaluate the performance of this prototype
under two configurations: as a content addressable storage system that
facilitates online similarity detection between successive versions of the same
file and as a traditional system that uses hashing to preserve data integrity.
Further, we evaluate the impact of offloading to the GPU on competing
applications' performance. Our results show that this technique can bring
tangible performance gains without negatively impacting the performance of
concurrently running applications.Comment: IEEE Transactions on Parallel and Distributed Systems, 201
- …