Search CORE

98 research outputs found

FlexCore: Massively Parallel and Flexible Processing for Large MIMO Access Points

Author: Georgis G
Husmann C
Jamieson K
Nikitopoulos K
Publication venue: 'The Academy of Transdisciplinary Learning and Advanced Studies'
Publication date: 27/03/2017
Field of study

Large MIMO base stations remain among wireless network designers’ best tools for increasing wireless throughput while serving many clients, but current system designs, sacrifice throughput with simple linear MIMO detection algorithms. Higher-performance detection techniques are known, but remain off the table because these systems parallelize their computation at the level of a whole OFDM subcarrier, sufficing only for the less demanding linear detection approaches they opt for. This paper presents FlexCore, the first computational architecture capable of parallelizing the detection of large numbers of mutually-interfering information streams at a granularity below individual OFDM subcarriers, in a nearly-embarrassingly parallel manner while utilizing any number of available processing elements. For 12 clients sending 64-QAM symbols to a 12-antenna base station, our WARP testbed evaluation shows similar network throughput to the state-of-the-art while using an order of magnitude fewer processing elements. For the same scenario, our combined WARP-GPU testbed evaluation demonstrates a 19x computational speedup, with 97% increased energy efficiency when compared with the state of the art. Finally, for the same scenario, an FPGA-based comparison between FlexCore and the state of the art shows that FlexCore can achieve up to 96% better energy efficiency, and can offer up to 32x the processing throughput

UCL Discovery

Large-Scale MIMO Detection for 3GPP LTE: Algorithms and FPGA Implementations

Author: Cavallaro Joseph R.
Dick Chris
Studer Christoph
Wang Guohui
Wu Michael
Yin Bei
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2014
Field of study

Large-scale (or massive) multiple-input multiple-output (MIMO) is expected to be one of the key technologies in next-generation multi-user cellular systems, based on the upcoming 3GPP LTE Release 12 standard, for example. In this work, we propose - to the best of our knowledge - the first VLSI design enabling high-throughput data detection in single-carrier frequency-division multiple access (SC-FDMA)-based large-scale MIMO systems. We propose a new approximate matrix inversion algorithm relying on a Neumann series expansion, which substantially reduces the complexity of linear data detection. We analyze the associated error, and we compare its performance and complexity to those of an exact linear detector. We present corresponding VLSI architectures, which perform exact and approximate soft-output detection for large-scale MIMO systems with various antenna/user configurations. Reference implementation results for a Xilinx Virtex-7 XC7VX980T FPGA show that our designs are able to achieve more than 600 Mb/s for a 128 antenna, 8 user 3GPP LTE-based large-scale MIMO system. We finally provide a performance/complexity trade-off comparison using the presented FPGA designs, which reveals that the detector circuit of choice is determined by the ratio between BS antennas and users, as well as the desired error-rate performance.Comment: To appear in the IEEE Journal of Selected Topics in Signal Processin

arXiv.org e-Print Archive

CiteSeerX

Repository for Publications and Research Data

Design and Architecture of Spatial Multiplexing MIMO Decoders for FPGAs

Author: Amiri Kiarash
Cavallaro Joseph R.
Dick Chris
Rao Raghu
Publication venue: IEEE
Publication date: 01/10/2008
Field of study

Spatial multiplexing multiple-input-multiple-output (MIMO) communication systems have recently drawn significant attention as a means to achieve tremendous gains in wireless system capacity and link reliability. The optimal hard decision detection for MIMO wireless systems is the maximum likelihood (ML) detector. ML detection is attractive due to its superior performance (in terms of BER). However, direct implementation grows exponentially with the number of antennas and the modulation scheme, making its ASIC or FPGA implementation infeasible for all but low-density modulation schemes using a small number of antennas. Sphere decoding (SD) solves the ML detection problem in a computationally efficient manner. However, even with this complexity reduction, real-time implementation on a DSP processor is generally not feasible and high-performance parallel computing platforms such as FPGAs are increasingly being employed for this class of applications. The sphere detection problem affords many opportunities for algorithm and micro-architecture optimizations and tradeoffs. This paper provides an overview of techniques to simplify and minimize FPGA resource utilization of sphere detectors for high performance low-latency systems

Crossref

DSpace at Rice University

Implementation of 2x2 soft-input soft-output sphere decoder

Author: Yan J.
Publication venue
Publication date: 01/01/2013
Field of study

Repository TU/e

Pure OAI Repository

Efficient Algorithmic and Architectural Optimization of QR-based Detector for V-BLAST

Author: Fariborz Sobhanmanesh
Saeid Nooshabadi
Publication venue: 'Croatian Communications and Information Society'
Publication date: 01/01/2005
Field of study

The use of multiple antennas at both transmitting and receiving sides of a rich scattering communication channel improves the spectral efficiency and capacity of digital transmission systems compared with the single antenna communication systems. However algorithmic complexity in the realization of the receiver is a major problem for its implementation in hardware. This paper investigates a near optimal algorithm for V-BLAST detection in MIMO wireless communication systems based on the QR factorization technique, offering remarkable reduction in the hardware complexity. Specifically, we analyze some hardware implementation aspects of the selected algorithm through MATLAB simulations and demonstrate its robustness. This technique can be used in an efficient fixed point VLSI implementation of the algorithm. We also provide the VLSI architecture that implements the algorithm

Directory of Open Access Journals

HRČAK - Portal of Croatian Scientific and Professional Journals

Hrčak - Portal of scientific journals of Croatia

Doctor of Philosophy

Author: Hedstrom Jonathan Carl
Publication venue: University of Utah
Publication date: 01/01/2017
Field of study

dissertationThe continuous growth of wireless communication use has largely exhausted the limited spectrum available. Methods to improve spectral efficiency are in high demand and will continue to be for the foreseeable future. Several technologies have the potential to make large improvements to spectral efficiency and the total capacity of networks including massive multiple-input multiple-output (MIMO), cognitive radio, and spatial-multiplexing MIMO. Of these, spatial-multiplexing MIMO has the largest near-term potential as it has already been adopted in the WiFi, WiMAX, and LTE standards. Although transmitting independent MIMO streams is cheap and easy, with a mere linear increase in cost with streams, receiving MIMO is difficult since the optimal methods have exponentially increasing cost and power consumption. Suboptimal MIMO detectors such as K-Best have a drastically reduced complexity compared to optimal methods but still have an undesirable exponentially increasing cost with data-rate. The Markov Chain Monte Carlo (MCMC) detector has been proposed as a near-optimal method with polynomial cost, but it has a history of unusual performance issues which have hindered its adoption. In this dissertation, we introduce a revised derivation of the bitwise MCMC MIMO detector. The new approach resolves the previously reported high SNR stalling problem of MCMC without the need for hybridization with another detector method or adding heuristic temperature scaling terms. Another common problem with MCMC algorithms is an unknown convergence time making predictable fixed-length implementations problematic. When an insufficient number of iterations is used on a slowly converging example, the output LLRs can be unstable and overconfident, therefore, we develop a method to identify rare, slowly converging runs and mitigate their degrading effects on the soft-output information. This improves forward-error-correcting code performance and removes a symptomatic error floor in bit-error-rates. Next, pseudo-convergence is identified with a novel way to visualize the internal behavior of the Gibbs sampler. An effective and efficient pseudo-convergence detection and escape strategy is suggested. Finally, the new excited MCMC (X-MCMC) detector is shown to have near maximum-a-posteriori (MAP) performance even with challenging, realistic, highly-correlated channels at the maximum MIMO sizes and modulation rates supported by the 802.11ac WiFi specification, 8x8 256 QAM. Further, the new excited MCMC (X-MCMC) detector is demonstrated on an 8-antenna MIMO testbed with the 802.11ac WiFi protocol, confirming its high performance. Finally, a VLSI implementation of the X-MCMC detector is presented which retains the near-optimal performance of the floating-point algorithm while having one of the lowest complexities found in the near-optimal MIMO detector literature

The University of Utah: J. Willard Marriott Digital Library

Algorithms and Architectures of Energy-Efficient Error-Resilient MIMO Detectors for Memory-Dominated Wireless Communication Systems

Author: Ahmed M. Eltawil
Chung-An Shen
Fadi J. Kurdahi
Muhammad S. Khairy
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2014
Field of study

In a broadband MIMO-OFDM wireless communication system, embedded buffering memories occupy a large portion of the chip area and a significant amount of power consumption. Due to process variations of advanced CMOS technologies, it becomes both challenging and costly to maintain perfectly functioning memories under all anticipated operating conditions. Thus, Voltage over Scaling (VoS) has emerged as a means to achieve energy efficient systems resulting in a tradeoff between energy efficiency and reliability. In this paper we present the algorithm and VLSI architecture of a novel error-resilient K-Best MIMO detector based on the combined distribution of channel noise and induced errors due to VoS. The simulation results show that, compared with a conventional MIMO detector design, the proposed algorithm provides up-to 4.5 dB gain to achieve the near-optimal Packet Error Rate (PER) performance in the 4 × 4 64-QAM system. Furthermore, based on experimental results, when jointly considering the detector and memory power consumption, the proposed resilient scheme with VoS memory can achieve up to 32.64% savings compared to the conventional K-Best detector with perfect memory. © 2014 IEEE

Crossref

eScholarship - University of California

Low-Power Embedded Design Solutions and Low-Latency On-Chip Interconnect Architecture for System-On-Chip Design

Author: Yang Yoon Seok
Publication venue
Publication date: 11/01/2021
Field of study

This dissertation presents three design solutions to support several key system-on-chip (SoC) issues to achieve low-power and high performance. These are: 1) joint source and channel decoding (JSCD) schemes for low-power SoCs used in portable multimedia systems, 2) efficient on-chip interconnect architecture for massive multimedia data streaming on multiprocessor SoCs (MPSoCs), and 3) data processing architecture for low-power SoCs in distributed sensor network (DSS) systems and its implementation. The first part includes a low-power embedded low density parity check code (LDPC) - H.264 joint decoding architecture to lower the baseband energy consumption of a channel decoder using joint source decoding and dynamic voltage and frequency scaling (DVFS). A low-power multiple-input multiple-output (MIMO) and H.264 video joint detector/decoder design that minimizes energy for portable, wireless embedded systems is also designed. In the second part, a link-level quality of service (QoS) scheme using unequal error protection (UEP) for low-power network-on-chip (NoC) and low latency on-chip network designs for MPSoCs is proposed. This part contains WaveSync, a low-latency focused network-on-chip architecture for globally-asynchronous locally-synchronous (GALS) designs and a simultaneous dual-path routing (SDPR) scheme utilizing path diversity present in typical mesh topology network-on-chips. SDPR is akin to having a higher link width but without the significant hardware overhead associated with simple bus width scaling. The last part shows data processing unit designs for embedded SoCs. We propose a data processing and control logic design for a new radiation detection sensor system generating data at or above Peta-bits-per-second level. Implementation results show that the intended clock rate is achieved within the power target of less than 200mW. We also present a digital signal processing (DSP) accelerator supporting configurable MAC, FFT, FIR, and 3-D cross product operations for embedded SoCs. It consumes 12.35mW along with 0.167mm2 area at 333MHz

Texas A&M Repository

Recommended from our members

Efficient VLSI architectures for MIMO and cryptography systems

Author: Li Qingwei
Publication venue: 'Oregon State University'
Publication date
Field of study

Multiple-input multiple-output (MIMO) communication systems have recently been considered as one of the most significant technology breakthroughs for modern wireless communications, due to the higher spectral efficiency and improved link reliability. The sphere decoding algorithm (SDA) has been widely used for maximum likelihood (ML) detection in MIMO systems. It is of great interest to develop low-complexity and high-speed VLSI architectures for the MIMO sphere decoders. The first part of this dissertation is focused on the low-complexity and high-speed sphere decoder design for the MIMO systems. It includes the algorithms simplification, and transformations, hardware optimization and architecture development. Specifically, we propose the layered reordered K-Best sphere decoding algorithm and dynamic K-best sphere decoding algorithm, which can significantly improve the detection performance or reduce the hardware complexity. We also present the efficient K-Best sorting architecture, which greatly simplifies the sorting operation of the K-Best SDA. In addition, we introduce the early-pruning K-Best SD scheme, which eliminates the unlikely candidate at early decoding stages, thus saves computational complexity and power consumptions. For the conventional sphere decoder design, we develop the parallel and pipeline interleaved sphere decoder architecture, which considerably increases the decoding throughput with negligible extra complexity. Finally, we design the efficient radius and list updating units for the list sphere decoder, which increases the speed of obtaining the new radius and reduces the complexity for generating the new candidate list. The wireless communication technologies are widely used for the benefits of portability and flexibility. However, the wireless security is extremely important to protect the private and sensitive information since the communication medium, the airwave, is shared and open to the public. Cryptography is the most standard and efficient way for information protection. The second part of this thesis is thus dedicated to the high-speed and efficient architecture design for the cryptography systems including ECC and Tate pairing. We propose an efficient fast architecture for the ECC in Lopez-Dahab projective coordinates. Compared with the conventional point operation implementations, the point addition and doubling operations can be significantly accelerated with reasonable hardware overhead by applying parallel processing and hardware reusing. Moreover, we develop a complexity reduction scheme and an overlapped processing architecture for the Tate pairing in characteristic three. The proposed architecture can achieve over 2 times speedup compared with conventional sequential implementations for the Duursma-Lee and Kwon-BGOS algorithms

ScholarsArchive@OSU