162 research outputs found

    Implementation of a High Throughput Soft MIMO Detector on GPU

    Multiple-input multiple-output (MIMO) significantly increases the throughput of a communication system by employing multiple antennas at the transmitter and the receiver. To extract maximum performance from a MIMO system, a computationally intensive search-based detector is needed. To meet the challenge of MIMO detection, typical suboptimal MIMO detectors are ASIC or FPGA designs. We aim to show that a MIMO detector on a graphics processing unit (GPU), a low-cost parallel programmable co-processor, can achieve high throughput and can serve as an alternative to ASIC/FPGA designs. However, careful architecture-aware software design is needed to leverage the performance offered by the GPU. We propose a novel soft MIMO detection algorithm, multi-pass trellis traversal (MTT), and show that we can achieve ASIC/FPGA-like performance and handle different configurations in software on a GPU. The proposed design can be used to accelerate wireless physical layer simulations and to offload MIMO detection processing in wireless testbed platforms.
    Sponsors: Nokia; Nokia Siemens Networks (NSN); Texas Instruments; Xilinx; National Science Foundation

    An efficient GPU implementation of fixed-complexity sphere decoders for MIMO wireless systems

    The use of many-core processors such as general-purpose graphics processing units (GPUs) has recently become attractive for the efficient implementation of signal processing algorithms for communication systems. This is due to the cost-effectiveness of GPUs together with their capacity for parallel processing. This paper presents an implementation of the widely employed fixed-complexity sphere decoder on GPUs, which considerably decreases the computational time required for the data detection stage in multiple-input multiple-output systems. Both the hard- and soft-output versions of the method have been implemented. Speedup results show that the proposed GPU implementation improves on the runtime of the parallel execution of the methods on a high-performance multi-core CPU. In addition, the throughput of the algorithm is evaluated and is shown to outperform other recent implementations and to fulfill the real-time requirements of several LTE configurations. © 2012 IOS Press and the authors. All rights reserved.
    This work was partially funded by the TEC2009-13741 project of the Spanish Ministry of Science and by the PROMETEO/2009/013 project of the Generalitat Valenciana.
    Roger Varea, S.; Ramiro Sánchez, C.; González Salvador, A.; Almenar Terré, V.; Vidal Maciá, AM. (2012). An efficient GPU implementation of fixed-complexity sphere decoders for MIMO wireless systems. Integrated Computer-Aided Engineering. 19(4):341-350. https://doi.org/10.3233/ICA-2012-0410

    Probabilistically Bounded Soft Sphere Detection for MIMO-OFDM Receivers: Algorithm and System Architecture

    Iterative soft detection and channel decoding for MIMO-OFDM downlink receivers is studied in this work. The proposed inner soft sphere detection employs a variable upper bound on the number of candidates per transmit antenna and uses a breadth-first candidate-search algorithm. The upper bounds are based on the probability distribution of the number of candidates found inside the spherical region formed around the received symbol vector. The detection accuracy of unbounded breadth-first candidate search is preserved while a significant reduction in search latency and area cost is achieved. This probabilistically bounded candidate-search algorithm improves on the error-rate performance of non-probabilistically bounded soft sphere detection algorithms while providing smaller detection latency with the same hardware resources. A prototype architecture of the soft sphere detector is synthesized on a Xilinx FPGA and for an ASIC design. Using the area cost of a single soft sphere detector, the level of processing parallelism required to achieve the targeted high data rates for future wireless systems (for example, a 1 Gbps data rate) is determined.
    Sponsors: Nokia; National Science Foundation

    Efficient Implementation of MIMO Decoders

    Baseband Processing for 5G and Beyond: Algorithms, VLSI Architectures, and Co-design

    In recent years, the number of connected devices and the demand for high data rates have increased significantly. This growth is amplified by the introduction of the Internet of Things (IoT), in which many devices are interconnected to exchange data for applications such as smart homes and smart cities. Moreover, new applications such as eHealth, autonomous vehicles, and connected ambulances set new demands on the reliability, latency, and data rate of wireless communication systems, pushing technology development forward. Massive multiple-input multiple-output (MIMO), a technology employed in the 5G standard, can help fulfill these requirements. In massive MIMO systems, the base station (BS) is equipped with a very large number of antennas and serves several user equipments (UEs) simultaneously in the same time and frequency resource. The high spatial multiplexing in massive MIMO systems improves the data rate, the energy and spectral efficiencies, and the link reliability of wireless communication systems. The link reliability can be further improved by employing channel coding. Spatially coupled serially concatenated codes (SC-SCCs) are promising channel coding schemes that can meet the high-reliability demands of wireless communication systems beyond 5G (B5G). Given their close-to-capacity error-correction performance and the potential to implement a high-throughput decoder, this class of codes can be a good candidate for B5G wireless systems. Achieving these advantages requires sophisticated algorithms, which impose challenges on baseband signal processing. In massive MIMO systems, the processing is much more computationally intensive, and the memory required to store channel data is significantly larger than in conventional MIMO systems, because of the large size of the channel state information (CSI) matrix.
In addition to the high computational complexity, meeting latency requirements is also crucial. Similarly, the decoding-performance gain of SC-SCCs comes at the expense of increased implementation complexity. Moreover, selecting suitable design parameters, decoding algorithms, and architectures is challenging, since spatial coupling provides new degrees of freedom in code design and the design space therefore becomes huge. The focus of this thesis is co-optimization across design levels to address these challenges and requirements. To this end, we employ system-level characteristics to develop efficient algorithms and architectures for the following functional blocks of digital baseband processing. First, we present a fast Fourier transform (FFT), an inverse FFT (IFFT), and a corresponding reordering scheme, which significantly reduce the latency of orthogonal frequency-division multiplexing (OFDM) demodulation and modulation as well as the size of the reordering memory. The corresponding VLSI architectures are introduced along with application-specific integrated circuit (ASIC) implementation results in a 28 nm CMOS technology. For a 2048-point FFT/IFFT, the proposed design leads to a 42% reduction in the latency and in the size of the reordering memory. Second, we propose a low-complexity massive MIMO detection scheme. The key idea is to exploit channel sparsity to reduce the size of the CSI matrix and then perform linear detection followed by non-linear post-processing in the angular domain using the compressed CSI matrix. The VLSI architecture for a massive MIMO system with 128 BS antennas and 16 UEs is presented along with synthesis results in a 28 nm technology. The proposed scheme reduces the complexity and required memory by 35%–73% compared to traditional detectors while achieving better detection performance.
Finally, we perform a comprehensive design space exploration for SC-SCCs to investigate the effect of different design parameters on decoding performance, latency, complexity, and hardware cost. We then develop different decoding algorithms for SC-SCCs and discuss the associated decoding performance and complexity. Several high-level VLSI architectures are also presented along with the corresponding synthesis results in a 12 nm process, and various design tradeoffs are provided for these decoding schemes.

    Multicore implementation of a fixed-complexity tree-search detector for MIMO communications

    Multicore systems allow the efficient implementation of signal processing algorithms for communication systems due to their high parallel processing capabilities. In this paper, we present a high-throughput multicore implementation of a fixed-complexity tree-search-based detector of interest for MIMO wireless communication systems. Experimental results confirm that this implementation accelerates the data detection stage for different constellation sizes and numbers of subcarriers.
    This work was supported by the TEC2009-13741 project of the Spanish Ministry of Science, by the PROMETEO/2009/013 project and ACOMP/2012/076 of the Generalitat Valenciana, and the Vicerrectorado de Investigacion de la UPV through Programa de Apoyo a la Investigacion y desarrollo (PAID-05-11-2898).
    Ramiro Sánchez, C.; Roger Varea, S.; Gonzalez, A.; Almenar Terré, V.; Vidal Maciá, AM. (2013). Multicore implementation of a fixed-complexity tree-search detector for MIMO communications. The Journal of Supercomputing (Online). 65(3):1010-1019. https://doi.org/10.1007/s11227-012-0839-x

    FPGA based technical solutions for high throughput data processing and encryption for 5G communication: A review

    Field-programmable gate array (FPGA) devices are well suited to high-speed processing applications, given their flexibility, parallel processing capability, and power efficiency. In this review paper, we first present an overview of the key applications of FPGA-based platforms in 5G networks and systems, which exploit the improved performance offered by such devices. FPGA-based implementations of cloud radio access network (C-RAN) accelerators, network function virtualization (NFV)-based network slicers, cognitive radio systems, and multiple-input multiple-output (MIMO) channel characterizers are the main applications considered that can benefit from the high processing rate, power efficiency, and flexibility of FPGAs. Furthermore, implementations of encryption/decryption algorithms on the Xilinx Zynq UltraScale+ MPSoC ZCU102 FPGA platform are discussed. We then introduce our high-speed and lightweight implementation of the well-known AES-128 algorithm, developed on the same FPGA platform, and compare it with similar solutions already published in the literature. The comparison results indicate that our AES-128 implementation enables efficient hardware usage for a given data rate (up to 28.16 Gbit/s), resulting in higher efficiency (8.64 Mbps/slice) than the other solutions considered. Finally, applications of the ZCU102 platform for high-speed processing are explored, such as image and signal processing, visual recognition, and hardware resource management.