619 research outputs found
Large-Scale MIMO Detection for 3GPP LTE: Algorithms and FPGA Implementations
Large-scale (or massive) multiple-input multiple-output (MIMO) is expected to
be one of the key technologies in next-generation multi-user cellular systems,
based on the upcoming 3GPP LTE Release 12 standard, for example. In this work,
we propose - to the best of our knowledge - the first VLSI design enabling
high-throughput data detection in single-carrier frequency-division multiple
access (SC-FDMA)-based large-scale MIMO systems. We propose a new approximate
matrix inversion algorithm relying on a Neumann series expansion, which
substantially reduces the complexity of linear data detection. We analyze the
associated error, and we compare its performance and complexity to those of an
exact linear detector. We present corresponding VLSI architectures, which
perform exact and approximate soft-output detection for large-scale MIMO
systems with various antenna/user configurations. Reference implementation
results for a Xilinx Virtex-7 XC7VX980T FPGA show that our designs are able to
achieve more than 600 Mb/s for a 128 antenna, 8 user 3GPP LTE-based large-scale
MIMO system. We finally provide a performance/complexity trade-off comparison
using the presented FPGA designs, which reveals that the detector circuit of
choice is determined by the ratio between BS antennas and users, as well as the
desired error-rate performance.Comment: To appear in the IEEE Journal of Selected Topics in Signal Processin
Efficient FPGA implementation of high-throughput mixed radix multipath delay commutator FFT processor for MIMO-OFDM
This article presents and evaluates pipelined architecture designs for an improved high-frequency Fast Fourier
Transform (FFT) processor implemented on Field Programmable Gate Arrays (FPGA) for Multiple Input Multiple Output
Orthogonal Frequency Division Multiplexing (MIMO-OFDM). The architecture presented is a Mixed-Radix Multipath Delay
Commutator. The presented parallel architecture utilizes fewer hardware resources compared to Radix-2 architecture,
while maintaining simple control and butterfly structures inherent to Radix-2 implementations. The high-frequency
design presented allows enhancing system throughput without requiring additional parallel data paths common in
other current approaches, the presented design can process two and four independent data streams in parallel
and is suitable for scaling to any power of two FFT size N. FPGA implementation of the architecture demonstrated
significant resource efficiency and high-throughput in comparison to relevant current approaches within
literature. The proposed architecture designs were realized with Xilinx System Generator (XSG) and evaluated
on both Virtex-5 and Virtex-7 FPGA devices. Post place and route results demonstrated maximum frequency
values over 400 MHz and 470 MHz for Virtex-5 and Virtex-7 FPGA devices respectively
Efficient DSP and Circuit Architectures for Massive MIMO: State-of-the-Art and Future Directions
Massive MIMO is a compelling wireless access concept that relies on the use
of an excess number of base-station antennas, relative to the number of active
terminals. This technology is a main component of 5G New Radio (NR) and
addresses all important requirements of future wireless standards: a great
capacity increase, the support of many simultaneous users, and improvement in
energy efficiency. Massive MIMO requires the simultaneous processing of signals
from many antenna chains, and computational operations on large matrices. The
complexity of the digital processing has been viewed as a fundamental obstacle
to the feasibility of Massive MIMO in the past. Recent advances on
system-algorithm-hardware co-design have led to extremely energy-efficient
implementations. These exploit opportunities in deeply-scaled silicon
technologies and perform partly distributed processing to cope with the
bottlenecks encountered in the interconnection of many signals. For example,
prototype ASIC implementations have demonstrated zero-forcing precoding in real
time at a 55 mW power consumption (20 MHz bandwidth, 128 antennas, multiplexing
of 8 terminals). Coarse and even error-prone digital processing in the antenna
paths permits a reduction of consumption with a factor of 2 to 5. This article
summarizes the fundamental technical contributions to efficient digital signal
processing for Massive MIMO. The opportunities and constraints on operating on
low-complexity RF and analog hardware chains are clarified. It illustrates how
terminals can benefit from improved energy efficiency. The status of technology
and real-life prototypes discussed. Open challenges and directions for future
research are suggested.Comment: submitted to IEEE transactions on signal processin
Baseband Processing for 5G and Beyond: Algorithms, VLSI Architectures, and Co-design
In recent years the number of connected devices and the demand for high data-rates have been significantly increased. This enormous growth is more pronounced by the introduction of the Internet of things (IoT) in which several devices are interconnected to exchange data for various applications like smart homes and smart cities. Moreover, new applications such as eHealth, autonomous vehicles, and connected ambulances set new demands on the reliability, latency, and data-rate of wireless communication systems, pushing forward technology developments. Massive multiple-input multiple-output (MIMO) is a technology, which is employed in the 5G standard, offering the benefits to fulfill these requirements. In massive MIMO systems, base station (BS) is equipped with a very large number of antennas, serving several users equipments (UEs) simultaneously in the same time and frequency resource. The high spatial multiplexing in massive MIMO systems, improves the data rate, energy and spectral efficiencies as well as the link reliability of wireless communication systems. The link reliability can be further improved by employing channel coding technique. Spatially coupled serially concatenated codes (SC-SCCs) are promising channel coding schemes, which can meet the high-reliability demands of wireless communication systems beyond 5G (B5G). Given the close-to-capacity error correction performance and the potential to implement a high-throughput decoder, this class of code can be a good candidate for wireless systems B5G. In order to achieve the above-mentioned advantages, sophisticated algorithms are required, which impose challenges on the baseband signal processing. In case of massive MIMO systems, the processing is much more computationally intensive and the size of required memory to store channel data is increased significantly compared to conventional MIMO systems, which are due to the large size of the channel state information (CSI) matrix. In addition to the high computational complexity, meeting latency requirements is also crucial. Similarly, the decoding-performance gain of SC-SCCs also do come at the expense of increased implementation complexity. Moreover, selecting the proper choice of design parameters, decoding algorithm, and architecture will be challenging, since spatial coupling provides new degrees of freedom in code design, and therefore the design space becomes huge. The focus of this thesis is to perform co-optimization in different design levels to address the aforementioned challenges/requirements. To this end, we employ system-level characteristics to develop efficient algorithms and architectures for the following functional blocks of digital baseband processing. First, we present a fast Fourier transform (FFT), an inverse FFT (IFFT), and corresponding reordering scheme, which can significantly reduce the latency of orthogonal frequency-division multiplexing (OFDM) demodulation and modulation as well as the size of reordering memory. The corresponding VLSI architectures along with the application specific integrated circuit (ASIC) implementation results in a 28 nm CMOS technology are introduced. In case of a 2048-point FFT/IFFT, the proposed design leads to 42% reduction in the latency and size of reordering memory. Second, we propose a low-complexity massive MIMO detection scheme. The key idea is to exploit channel sparsity to reduce the size of CSI matrix and eventually perform linear detection followed by a non-linear post-processing in angular domain using the compressed CSI matrix. The VLSI architecture for a massive MIMO with 128 BS antennas and 16 UEs along with the synthesis results in a 28 nm technology are presented. As a result, the proposed scheme reduces the complexity and required memory by 35%–73% compared to traditional detectors while it has better detection performance. Finally, we perform a comprehensive design space exploration for the SC-SCCs to investigate the effect of different design parameters on decoding performance, latency, complexity, and hardware cost. Then, we develop different decoding algorithms for the SC-SCCs and discuss the associated decoding performance and complexity. Also, several high-level VLSI architectures along with the corresponding synthesis results in a 12 nm process are presented, and various design tradeoffs are provided for these decoding schemes
FlexCore: Massively Parallel and Flexible Processing for Large MIMO Access Points
Large MIMO base stations remain among wireless network designers’ best tools for increasing wireless throughput while serving many clients, but current system designs, sacrifice throughput with simple linear MIMO detection algorithms. Higher-performance detection techniques are known, but remain off the table because these systems parallelize their computation at the level of a whole OFDM subcarrier, sufficing only for the less demanding linear detection approaches they opt for. This paper presents FlexCore, the first computational architecture capable of parallelizing the detection of large numbers of mutually-interfering information streams at a granularity below individual OFDM subcarriers, in a nearly-embarrassingly parallel manner while utilizing any number of available processing elements. For 12 clients sending 64-QAM symbols to a 12-antenna base station, our WARP testbed evaluation shows similar network throughput to the state-of-the-art while using an order of magnitude fewer processing elements. For the same scenario, our combined WARP-GPU testbed evaluation demonstrates a 19x computational speedup, with 97% increased energy efficiency when compared with the state of the art. Finally, for the same scenario, an FPGA-based comparison between FlexCore and the state of the art shows that FlexCore can achieve up to 96% better energy efficiency, and can offer up to 32x the processing throughput
- …