370 research outputs found
High Throughput VLSI Architecture for Soft-Output MIMO Detection Based on A Greedy Graph Algorithm
Maximum-likelihood (ML) decoding is a computationally
intensive task for multiple-input multiple-output (MIMO)
wireless channel detection. This paper presents a new
graph-based algorithm to achieve near-ML performance for soft
MIMO detection. Instead of using the traditional tree-search
structure, we represent the search space of the MIMO
signals with a directed graph, and a greedy algorithm is
applied to compute the a posteriori probability (APP) for each
transmitted bit. The proposed detector has two advantages:
1) it keeps a fixed throughput and has a regular and parallel
datapath structure which makes it amenable to high speed
VLSI implementation, and 2) it attempts to maximize the a
posteriori probability by making the locally optimum choice
at each stage with the hope of finding the global minimum
Euclidean distance for every transmitted bit x_k ∈ {-1, +1}.
Compared to the soft K-best detector, the proposed solution
significantly reduces the complexity because sorting is not
required, while still maintaining good bit error rate (BER)
performance. The proposed greedy detection algorithm has
been designed and synthesized for a 4 x 4 16-QAM MIMO
system in a TSMC 65 nm CMOS technology. The detector
achieves a maximum throughput of 600 Mbps with a 0.79
mm² core area. Nokia Corporation; National Science Foundation
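For reference, the per-bit quantity that a soft detector of this kind approximates can be written down by brute force. The sketch below is not the paper's greedy graph algorithm; it is an exhaustive max-log baseline for a small real-valued system y = Hx + n with x_k ∈ {-1, +1}. The function name `max_log_llrs`, the `noise_var` parameter, and the sign convention (positive LLR favours +1) are assumptions made for illustration.

```python
from itertools import product


def max_log_llrs(H, y, noise_var=1.0):
    """Exhaustive max-log LLRs for y = H x + n with x_k in {-1, +1}.

    Assumed convention (not from the source): LLR_k > 0 favours x_k = +1,
    and noise_var is the real noise variance per receive dimension.
    """
    n_tx = len(H[0])
    # Smallest squared Euclidean distance seen so far per (bit index, bit value).
    best = {(k, s): float("inf") for k in range(n_tx) for s in (-1, +1)}
    for x in product((-1, +1), repeat=n_tx):
        # Squared Euclidean distance ||y - H x||^2 for this candidate vector.
        dist = sum(
            (y_i - sum(h * x_j for h, x_j in zip(row, x))) ** 2
            for row, y_i in zip(H, y)
        )
        for k, s in enumerate(x):
            if dist < best[(k, s)]:
                best[(k, s)] = dist
    # Max-log approximation: difference of the two per-bit minimum distances.
    return [(best[(k, -1)] - best[(k, +1)]) / (2.0 * noise_var) for k in range(n_tx)]
```

The loop visits all 2^n candidate vectors, which is exactly the cost that tree-search, K-best and greedy detectors try to avoid; it is only practical as a correctness reference for very small systems.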
Parallel SUMIS Soft Detector for Large MIMO Systems on Multicore and GPU
The number of transmit and receive antennas is an important factor that affects the performance and complexity of a MIMO system. A MIMO system with a very large number of antennas is a promising candidate technology for the next generations of wireless systems. However, the vast majority of the methods proposed for conventional MIMO systems are not suitable for large dimensions. In this context, high-performance computing systems, such as multicore CPUs and graphics processing units, have become attractive for the efficient implementation of parallel signal processing algorithms with high computational requirements. In the present work, two practical parallel approaches to the Subspace Marginalization with Interference Suppression (SUMIS) detector for large MIMO systems are proposed. Both approaches have been evaluated and compared in terms of performance and complexity with other detectors for different system parameters.
This work has been partially supported by the Spanish MINECO Grant RACHEL TEC2013-47141-C4-4-R, the PROMETEO FASE II 2014/003 Project and FPU AP-2012/71274.
Ramiro Sánchez, C.; Simarro, MA.; Gonzalez, A.; Vidal Maciá, AM. (2019). Parallel SUMIS Soft Detector for Large MIMO Systems on Multicore and GPU. The Journal of Supercomputing. 75(3):1256-1267. https://doi.org/10.1007/s11227-018-2403-9
Large-Scale MIMO Detection for 3GPP LTE: Algorithms and FPGA Implementations
Large-scale (or massive) multiple-input multiple-output (MIMO) is expected to
be one of the key technologies in next-generation multi-user cellular systems,
based on the upcoming 3GPP LTE Release 12 standard, for example. In this work,
we propose - to the best of our knowledge - the first VLSI design enabling
high-throughput data detection in single-carrier frequency-division multiple
access (SC-FDMA)-based large-scale MIMO systems. We propose a new approximate
matrix inversion algorithm relying on a Neumann series expansion, which
substantially reduces the complexity of linear data detection. We analyze the
associated error, and we compare its performance and complexity to those of an
exact linear detector. We present corresponding VLSI architectures, which
perform exact and approximate soft-output detection for large-scale MIMO
systems with various antenna/user configurations. Reference implementation
results for a Xilinx Virtex-7 XC7VX980T FPGA show that our designs are able to
achieve more than 600 Mb/s for a 128 antenna, 8 user 3GPP LTE-based large-scale
MIMO system. We finally provide a performance/complexity trade-off comparison
using the presented FPGA designs, which reveals that the detector circuit of
choice is determined by the ratio between BS antennas and users, as well as the
desired error-rate performance. Comment: To appear in the IEEE Journal of Selected Topics in Signal Processing
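The idea of replacing an exact matrix inverse with a truncated Neumann series can be illustrated with a short, dependency-free sketch. This is not the paper's VLSI algorithm: splitting the matrix into its diagonal and off-diagonal parts, the function name `neumann_inverse`, and the `num_terms` parameter are all assumptions chosen for illustration, and the example uses real-valued entries for brevity.

```python
def neumann_inverse(A, num_terms=3):
    """Approximate inv(A) by a truncated Neumann series around diag(A).

    With A = D + E (D diagonal, E off-diagonal):
        inv(A) ~= sum_{n=0}^{num_terms-1} (-inv(D) E)^n inv(D)
    The series converges only if the spectral radius of inv(D) E is below 1,
    for example when A is strongly diagonally dominant.
    """
    n = len(A)
    d_inv = [1.0 / A[i][i] for i in range(n)]
    # M = -inv(D) E: zero diagonal, off-diagonal entries scaled row-wise.
    M = [[0.0 if i == j else -d_inv[i] * A[i][j] for j in range(n)] for i in range(n)]
    # The n = 0 term is inv(D); each later term is M times the previous one.
    term = [[d_inv[i] if i == j else 0.0 for j in range(n)] for i in range(n)]
    approx = [row[:] for row in term]
    for _ in range(num_terms - 1):
        term = [
            [sum(M[i][k] * term[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)
        ]
        approx = [[approx[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    return approx
```

With `num_terms=1` the result is simply the inverse of the diagonal. Each additional term costs one more matrix product, so the truncation length trades accuracy against complexity, which is the kind of trade-off the abstract describes between approximate and exact linear detection.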
A Scalable VLSI Architecture for Soft-Input Soft-Output Depth-First Sphere Decoding
Multiple-input multiple-output (MIMO) wireless transmission imposes huge
challenges on the design of efficient hardware architectures for iterative
receivers. A major challenge is soft-input soft-output (SISO) MIMO demapping,
often approached by sphere decoding (SD). In this paper, we introduce what is, to
the best of our knowledge, the first VLSI architecture for SISO SD applying a single
tree-search approach. Compared with a soft-output-only base architecture
similar to the one proposed by Studer et al. in IEEE J-SAC 2008, the
architectural modifications for soft input still allow a one-node-per-cycle
execution. For a 4x4 16-QAM system, the area increases by only 57% and the operating
frequency degrades by only 34%. Comment: Accepted for IEEE Transactions on Circuits and Systems II Express
Briefs, May 2010. This draft from April 2010 will not be updated any more.
Please refer to IEEE Xplore for the final version. *) The final publication
will appear with the modified title "A Scalable VLSI Architecture for
Soft-Input Soft-Output Single Tree-Search Sphere Decoding".
High-Throughput Soft-Output MIMO Detector Based on Path-Preserving Trellis-Search Algorithm
In this paper, we propose a novel path-preserving trellis-search (PPTS) algorithm and its high-speed VLSI architecture for soft-output multiple-input multiple-output (MIMO) detection. We represent the search space of the MIMO signal with an unconstrained trellis, where each node in a given stage of the trellis maps to a possible complex-valued symbol transmitted by the corresponding antenna. Based on the trellis model, we convert the soft-output MIMO detection problem into a multiple-shortest-paths problem subject to the constraint that every trellis node must be covered by this set of paths. The PPTS detector is guaranteed to have soft information for every possible symbol transmitted on every antenna, so that the log-likelihood ratio (LLR) for each transmitted data bit can be formed more accurately. Simulation results show that the PPTS algorithm can achieve near-optimal error performance with a low search complexity. The PPTS algorithm is a hardware-friendly data-parallel algorithm because the search operations are evenly distributed among multiple trellis nodes for parallel processing. As a case study, we have designed and synthesized a fully parallel systolic-array detector and two folded detectors for a 4x4 16-QAM system using a 1.08 V TSMC 65-nm CMOS technology. With a 1.18 mm² core area, the folded detector can achieve a throughput of 2.1 Gbps. With a 3.19 mm² core area, the fully parallel systolic-array detector can achieve a throughput of 6.4 Gbps.
Efficient DSP and Circuit Architectures for Massive MIMO: State-of-the-Art and Future Directions
Massive MIMO is a compelling wireless access concept that relies on the use
of an excess number of base-station antennas, relative to the number of active
terminals. This technology is a main component of 5G New Radio (NR) and
addresses all important requirements of future wireless standards: a great
capacity increase, the support of many simultaneous users, and improvement in
energy efficiency. Massive MIMO requires the simultaneous processing of signals
from many antenna chains, and computational operations on large matrices. The
complexity of the digital processing has been viewed as a fundamental obstacle
to the feasibility of Massive MIMO in the past. Recent advances in
system-algorithm-hardware co-design have led to extremely energy-efficient
implementations. These exploit opportunities in deeply-scaled silicon
technologies and perform partly distributed processing to cope with the
bottlenecks encountered in the interconnection of many signals. For example,
prototype ASIC implementations have demonstrated zero-forcing precoding in real
time at a 55 mW power consumption (20 MHz bandwidth, 128 antennas, multiplexing
of 8 terminals). Coarse and even error-prone digital processing in the antenna
paths permits a reduction of consumption by a factor of 2 to 5. This article
summarizes the fundamental technical contributions to efficient digital signal
processing for Massive MIMO. The opportunities and constraints on operating on
low-complexity RF and analog hardware chains are clarified. It illustrates how
terminals can benefit from improved energy efficiency. The status of technology
and real-life prototypes is discussed. Open challenges and directions for future
research are suggested. Comment: submitted to IEEE Transactions on Signal Processing
A Novel VLSI Architecture of Fixed-complexity Sphere Decoder
Fixed-complexity Sphere Decoder (FSD) is a recently proposed technique for
Multiple-Input Multiple-Output (MIMO) detection. It has several outstanding
features such as constant throughput and large potential parallelism, which
make it suitable for efficient VLSI implementation. However, to the best of our
knowledge, no VLSI implementation of FSD has been reported in the literature,
although some FPGA prototypes of FSD with pipeline architecture have been
developed. These solutions achieve very high throughput but at very high cost
of hardware resources, making them impractical in real applications. In this
paper, we present a novel four-nodes-per-cycle parallel architecture of FSD,
with a breadth-first processing that allows for short critical path. The
implementation achieves a throughput of 213.3 Mbps at 400 MHz clock frequency,
at a cost of 0.18 mm² silicon area in 0.13 µm CMOS technology. The proposed
solution is much more economical compared with the existing FPGA
implementations, and very suitable for practical applications because of its
balanced performance and hardware-complexity; moreover it has the flexibility
to be expanded into an eight-nodes-per-cycle version in order to double the
throughput. Comment: 8 pages, this paper has been accepted by the conference DSD 201