370 research outputs found

    High Throughput VLSI Architecture for Soft-Output MIMO Detection Based on A Greedy Graph Algorithm

    Maximum-likelihood (ML) decoding is a very computationally intensive task for multiple-input multiple-output (MIMO) wireless channel detection. This paper presents a new graph-based algorithm to achieve near-ML performance for soft MIMO detection. Instead of using the traditional tree-search-based structure, we represent the search space of the MIMO signals with a directed graph, and a greedy algorithm is applied to compute the a posteriori probability (APP) for each transmitted bit. The proposed detector has two advantages: 1) it keeps a fixed throughput and has a regular and parallel datapath structure, which makes it amenable to high-speed VLSI implementation, and 2) it attempts to maximize the a posteriori probability by making the locally optimum choice at each stage with the hope of finding the global minimum Euclidean distance for every transmitted bit x_k ∈ {-1, +1}. Compared to the soft K-best detector, the proposed solution significantly reduces the complexity because sorting is not required, while still maintaining good bit error rate (BER) performance. The proposed greedy detection algorithm has been designed and synthesized for a 4 x 4 16-QAM MIMO system in a TSMC 65 nm CMOS technology. The detector achieves a maximum throughput of 600 Mbps with a 0.79 mm² core area.
    Sponsors: Nokia Corporation; National Science Foundation
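For orientation, the per-bit quantity that such soft detectors approximate can be sketched with an exhaustive search. This is not the greedy graph algorithm from the abstract; it is a brute-force max-log baseline under simplifying assumptions: a real-valued channel, symbols equal to the bits x_k ∈ {-1, +1}, and Gaussian noise with variance `noise_var` per receive dimension. The function name `max_log_llrs` and its parameters are hypothetical.

```python
from itertools import product


def max_log_llrs(h, y, noise_var):
    """Brute-force max-log LLRs for y = h x + noise, with x_k in {-1, +1}.

    h is a list of rows (receive antennas x transmit antennas), y a list of
    received samples. Illustrative baseline only: the cost grows as 2**n_tx.
    """
    n_tx = len(h[0])
    # Smallest squared Euclidean distance seen so far for each (bit index, bit value).
    best = {(k, b): float("inf") for k in range(n_tx) for b in (-1, 1)}
    for x in product((-1, 1), repeat=n_tx):
        dist = sum(
            (y[i] - sum(h[i][j] * x[j] for j in range(n_tx))) ** 2
            for i in range(len(y))
        )
        for k, b in enumerate(x):
            if dist < best[(k, b)]:
                best[(k, b)] = dist
    # Positive LLR means x_k = +1 is the more likely hypothesis.
    return [(best[(k, -1)] - best[(k, 1)]) / (2.0 * noise_var) for k in range(n_tx)]


llrs = max_log_llrs([[1.0, 0.0], [0.0, 1.0]], [0.9, -1.1], 0.5)
```

Reduced-complexity detectors such as the one described above aim to find the same two minimum distances per bit without visiting every candidate vector.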

    Parallel SUMIS Soft Detector for Large MIMO Systems on Multicore and GPU

    The number of transmit and receive antennas is an important factor that affects the performance and complexity of a MIMO system. A MIMO system with a very large number of antennas is a promising candidate technology for the next generations of wireless systems. However, the vast majority of the methods proposed for conventional MIMO systems are not suitable for large dimensions. In this context, the use of high-performance computing systems, such as multicore CPUs and graphics processing units (GPUs), has become attractive for the efficient implementation of parallel signal processing algorithms with high computational requirements. In the present work, two practical parallel approaches to the Subspace Marginalization with Interference Suppression (SUMIS) detector for large MIMO systems are proposed. Both approaches have been evaluated and compared in terms of performance and complexity with other detectors for different system parameters.
    This work has been partially supported by the Spanish MINECO Grant RACHEL TEC2013-47141-C4-4-R, the PROMETEO FASE II 2014/003 Project and FPU AP-2012/71274.
    Ramiro Sánchez, C.; Simarro, MA.; Gonzalez, A.; Vidal Maciá, AM. (2019). Parallel SUMIS Soft Detector for Large MIMO Systems on Multicore and GPU. The Journal of Supercomputing. 75(3):1256-1267. https://doi.org/10.1007/s11227-018-2403-9

    Large-Scale MIMO Detection for 3GPP LTE: Algorithms and FPGA Implementations

    Large-scale (or massive) multiple-input multiple-output (MIMO) is expected to be one of the key technologies in next-generation multi-user cellular systems, based on the upcoming 3GPP LTE Release 12 standard, for example. In this work, we propose, to the best of our knowledge, the first VLSI design enabling high-throughput data detection in single-carrier frequency-division multiple access (SC-FDMA)-based large-scale MIMO systems. We propose a new approximate matrix inversion algorithm relying on a Neumann series expansion, which substantially reduces the complexity of linear data detection. We analyze the associated error, and we compare its performance and complexity to those of an exact linear detector. We present corresponding VLSI architectures, which perform exact and approximate soft-output detection for large-scale MIMO systems with various antenna/user configurations. Reference implementation results for a Xilinx Virtex-7 XC7VX980T FPGA show that our designs are able to achieve more than 600 Mb/s for a 128-antenna, 8-user 3GPP LTE-based large-scale MIMO system. We finally provide a performance/complexity trade-off comparison using the presented FPGA designs, which reveals that the detector circuit of choice is determined by the ratio between BS antennas and users, as well as the desired error-rate performance.
    Comment: To appear in the IEEE Journal of Selected Topics in Signal Processing
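The idea of approximating a matrix inverse with a truncated Neumann series can be sketched as follows. This is a generic illustration, not the paper's architecture: it assumes the matrix is split into its diagonal part D and off-diagonal part E, and that the matrix is diagonally dominant enough for the series to converge (as tends to happen when base-station antennas far outnumber users). The names `mat_mul`, `neumann_inverse`, and `terms` are hypothetical.

```python
def mat_mul(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def neumann_inverse(a, terms):
    """Approximate inv(a) with the first `terms` terms of a Neumann series.

    With a = D + E (D diagonal, E off-diagonal) and theta = -inv(D) E,
    inv(a) = sum over n >= 0 of theta**n inv(D), provided the series converges.
    """
    n = len(a)
    d_inv = [[(1.0 / a[i][i]) if i == j else 0.0 for j in range(n)]
             for i in range(n)]
    theta = [[0.0 if i == j else -a[i][j] / a[i][i] for j in range(n)]
             for i in range(n)]
    term = d_inv                          # n = 0 term: inv(D)
    total = [row[:] for row in d_inv]
    for _ in range(terms - 1):
        term = mat_mul(theta, term)       # next term: theta**n inv(D)
        total = [[total[i][j] + term[i][j] for j in range(n)]
                 for i in range(n)]
    return total


def residual(a, approx):
    """Largest absolute entry of a @ approx - I, a simple error measure."""
    n = len(a)
    p = mat_mul(a, approx)
    return max(abs(p[i][j] - (1.0 if i == j else 0.0))
               for i in range(n) for j in range(n))


gram = [[4.0, 1.0, 0.0], [1.0, 5.0, 1.0], [0.0, 1.0, 6.0]]
coarse = neumann_inverse(gram, 2)   # few terms: cheap, less accurate
fine = neumann_inverse(gram, 20)    # more terms: closer to the exact inverse
```

Keeping only a few terms replaces an exact inversion with a handful of matrix multiplications, at the price of a residual error that shrinks as more terms are added.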

    A Scalable VLSI Architecture for Soft-Input Soft-Output Depth-First Sphere Decoding

    Multiple-input multiple-output (MIMO) wireless transmission imposes huge challenges on the design of efficient hardware architectures for iterative receivers. A major challenge is soft-input soft-output (SISO) MIMO demapping, often approached by sphere decoding (SD). In this paper, we introduce, to the best of our knowledge, the first VLSI architecture for SISO SD applying a single tree-search approach. Compared with a soft-output-only base architecture similar to the one proposed by Studer et al. in IEEE J-SAC 2008, the architectural modifications for soft input still allow a one-node-per-cycle execution. For a 4x4 16-QAM system, the area increases by only 57% and the operating frequency degrades by only 34%.
    Comment: Accepted for IEEE Transactions on Circuits and Systems II Express Briefs, May 2010. This draft from April 2010 will not be updated any more. Please refer to IEEE Xplore for the final version. *) The final publication will appear with the modified title "A Scalable VLSI Architecture for Soft-Input Soft-Output Single Tree-Search Sphere Decoding".

    High-Throughput Soft-Output MIMO Detector Based on Path-Preserving Trellis-Search Algorithm

    In this paper, we propose a novel path-preserving trellis-search (PPTS) algorithm and its high-speed VLSI architecture for soft-output multiple-input multiple-output (MIMO) detection. We represent the search space of the MIMO signal with an unconstrained trellis, where each node in a stage of the trellis maps to a possible complex-valued symbol transmitted by the corresponding antenna. Based on the trellis model, we convert the soft-output MIMO detection problem into a multiple shortest paths problem subject to the constraint that every trellis node must be covered in this set of paths. The PPTS detector is guaranteed to have soft information for every possible symbol transmitted on every antenna so that the log-likelihood ratio (LLR) for each transmitted data bit can be more accurately formed. Simulation results show that the PPTS algorithm can achieve near-optimal error performance with a low search complexity. The PPTS algorithm is a hardware-friendly data-parallel algorithm because the search operations are evenly distributed among multiple trellis nodes for parallel processing. As a case study, we have designed and synthesized a fully-parallel systolic-array detector and two folded detectors for a 4x4 16-QAM system using a 1.08 V TSMC 65-nm CMOS technology. With a 1.18 mm² core area, the folded detector can achieve a throughput of 2.1 Gbps. With a 3.19 mm² core area, the fully-parallel systolic-array detector can achieve a throughput of 6.4 Gbps.

    Efficient DSP and Circuit Architectures for Massive MIMO: State-of-the-Art and Future Directions

    Massive MIMO is a compelling wireless access concept that relies on the use of an excess number of base-station antennas, relative to the number of active terminals. This technology is a main component of 5G New Radio (NR) and addresses all important requirements of future wireless standards: a great capacity increase, the support of many simultaneous users, and improvement in energy efficiency. Massive MIMO requires the simultaneous processing of signals from many antenna chains, and computational operations on large matrices. The complexity of the digital processing has been viewed as a fundamental obstacle to the feasibility of Massive MIMO in the past. Recent advances in system-algorithm-hardware co-design have led to extremely energy-efficient implementations. These exploit opportunities in deeply-scaled silicon technologies and perform partly distributed processing to cope with the bottlenecks encountered in the interconnection of many signals. For example, prototype ASIC implementations have demonstrated zero-forcing precoding in real time at a 55 mW power consumption (20 MHz bandwidth, 128 antennas, multiplexing of 8 terminals). Coarse and even error-prone digital processing in the antenna paths permits a reduction of consumption by a factor of 2 to 5. This article summarizes the fundamental technical contributions to efficient digital signal processing for Massive MIMO. The opportunities and constraints on operating on low-complexity RF and analog hardware chains are clarified. It illustrates how terminals can benefit from improved energy efficiency. The status of technology and real-life prototypes is discussed. Open challenges and directions for future research are suggested.
    Comment: submitted to IEEE Transactions on Signal Processing

    A Novel VLSI Architecture of Fixed-complexity Sphere Decoder

    Fixed-complexity Sphere Decoder (FSD) is a recently proposed technique for Multiple-Input Multiple-Output (MIMO) detection. It has several outstanding features, such as constant throughput and large potential parallelism, which make it suitable for efficient VLSI implementation. However, to the best of our knowledge, no VLSI implementation of FSD has been reported in the literature, although some FPGA prototypes of FSD with pipeline architecture have been developed. These solutions achieve very high throughput but at a very high cost in hardware resources, making them impractical in real applications. In this paper, we present a novel four-nodes-per-cycle parallel architecture of FSD, with breadth-first processing that allows for a short critical path. The implementation achieves a throughput of 213.3 Mbps at a 400 MHz clock frequency, at a cost of 0.18 mm² silicon area in 0.13 μm CMOS technology. The proposed solution is much more economical than the existing FPGA implementations, and very suitable for practical applications because of its balanced performance and hardware complexity; moreover, it has the flexibility to be expanded into an eight-nodes-per-cycle version in order to double the throughput.
    Comment: 8 pages, this paper has been accepted by the conference DSD 201