71 research outputs found

    High-Throughput Soft-Output MIMO Detector Based on Path-Preserving Trellis-Search Algorithm

    In this paper, we propose a novel path-preserving trellis-search (PPTS) algorithm and its high-speed VLSI architecture for soft-output multiple-input multiple-output (MIMO) detection. We represent the search space of the MIMO signal with an unconstrained trellis, where each node in a given stage of the trellis maps to a possible complex-valued symbol transmitted by the corresponding antenna. Based on the trellis model, we convert the soft-output MIMO detection problem into a multiple-shortest-paths problem subject to the constraint that every trellis node must be covered by this set of paths. The PPTS detector is guaranteed to have soft information for every possible symbol transmitted on every antenna, so the log-likelihood ratio (LLR) for each transmitted data bit can be formed more accurately. Simulation results show that the PPTS algorithm can achieve near-optimal error performance with a low search complexity. The PPTS algorithm is a hardware-friendly, data-parallel algorithm because the search operations are evenly distributed among multiple trellis nodes for parallel processing. As a case study, we have designed and synthesized a fully parallel systolic-array detector and two folded detectors for a 4x4 16-QAM system using a 1.08 V TSMC 65-nm CMOS technology. With a 1.18 mm² core area, the folded detector can achieve a throughput of 2.1 Gbps. With a 3.19 mm² core area, the fully parallel systolic-array detector can achieve a throughput of 6.4 Gbps.
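
As a rough illustration of why per-symbol soft information matters, the sketch below forms max-log LLRs for one antenna from a table holding the smallest path metric found for each candidate symbol. It is not the PPTS detector itself. The function name `max_log_llrs`, the bit labeling (bit b of the symbol index) and the sign convention are assumptions made for this example only.

```python
def max_log_llrs(metrics, bits_per_symbol, noise_var=1.0):
    """Max-log LLRs for one antenna.

    metrics[s] is assumed to be the smallest path metric (squared Euclidean
    distance) found among paths in which this antenna sends symbol index s.
    Hypothetical labeling: bit b of the index s is the b-th data bit.
    Sign convention (assumed): a positive LLR favors bit value 1.
    """
    llrs = []
    for b in range(bits_per_symbol):
        # Best metric among symbols whose bit b is 0, and whose bit b is 1.
        best0 = min(m for s, m in enumerate(metrics) if not (s >> b) & 1)
        best1 = min(m for s, m in enumerate(metrics) if (s >> b) & 1)
        llrs.append((best0 - best1) / noise_var)
    return llrs


# Usage with made-up metrics for a 4-point constellation (2 bits per symbol).
example = max_log_llrs([0.2, 1.5, 0.9, 3.0], bits_per_symbol=2)
print(example)
```

Both minima exist for every bit only when every symbol has a metric. A search that prunes symbols can leave one of them empty, which is the gap that full per-symbol coverage avoids.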

    Low complexity MIMO detection algorithms and implementations

    University of Minnesota Ph.D. dissertation. December 2014. Major: Electrical Engineering. Advisor: Gerald E. Sobelman. 1 computer file (PDF); ix, 111 pages. MIMO techniques use multiple antennas at both the transmitter and receiver sides to achieve diversity gain, multiplexing gain, or both. One of the key challenges in exploiting the potential of MIMO systems is to design high-throughput, low-complexity detection algorithms while achieving near-optimal performance. In this thesis, we design and optimize algorithms for MIMO detection and investigate the associated performance and FPGA implementation aspects. First, we study and optimize a K-Best-based, high-throughput, low-energy hard-output MIMO detection algorithm developed by Shabany and Gulak and extend it to the complex domain. The new method uses simple lookup tables and is fully scalable over a wide range of K values and constellation sizes. This technique reduces the computational complexity without sacrificing performance, and the complexity scales only sub-linearly with the constellation size. Second, we apply the bidirectional technique to trellis search and propose a high-performance soft-output bidirectional path-preserving trellis-search (PPTS) detector for MIMO systems. A comparative error analysis of single-direction and bidirectional PPTS detectors is given. We demonstrate that the bidirectional PPTS detector can minimize the detection error. Next, we design a novel bidirectional processing algorithm for soft-output MIMO systems. It combines features from several types of fixed-complexity tree-search procedures. The proposed approach achieves higher performance than previously proposed algorithms at a comparable computational cost. Moreover, its parallel nature and fixed-throughput characteristics make it attractive for very-large-scale integration (VLSI) implementation. Following that, we present a novel low-complexity hard-output MIMO detection algorithm for LTE and WiFi applications. We provide a well-defined tradeoff between computational complexity and performance. The proposed algorithm uses a much smaller number of Euclidean distance (ED) calculations while incurring only a 0.5 dB loss compared to maximum likelihood detection (MLD). A 3x3 MIMO system with a 16-QAM detector architecture is designed, and the latency and hardware costs are estimated. Finally, we present a stochastic computing implementation of trigonometric and hyperbolic functions, which can be used for QR decomposition and other wireless communications and signal processing applications.

    Advanced Wireless Digital Baseband Signal Processing Beyond 100 Gbit/s

    The continuing trend towards higher data rates in wireless communication systems will, in addition to higher spectral efficiency and lower signal processing latencies, lead to throughput requirements for digital baseband signal processing beyond 100 Gbit/s, which is at least one order of magnitude higher than the tens of Gbit/s targeted in the 5G standardization. At the same time, advances in silicon technology due to shrinking feature sizes and increased performance parameters alone won't provide the necessary gain, especially in energy efficiency for wireless transceivers, which have tightly constrained power and energy budgets. In this paper, we highlight the challenges for wireless digital baseband signal processing beyond 100 Gbit/s and the limitations of today's architectures. Our focus lies on channel decoding and MIMO detection, which are major sources of complexity in digital baseband signal processing. We discuss techniques at the algorithmic and architectural levels that aim to close this gap. For the first time, we show Turbo-Code decoding techniques towards 100 Gbit/s and a complete MIMO receiver beyond 100 Gbit/s in 28 nm technology.

    A Parallel Radix-Sort-Based VLSI Architecture for Finding the First W Maximum/Minimum Values

    Very-large-scale integration (VLSI) architectures for finding the first W (W>2) maximum (or minimum) values are required in the implementation of several applications such as nonbinary low-density-parity-check decoders, K-best multiple-input–multiple-output (MIMO) detectors, and turbo product codes. In this brief, a parallel radix-sort-based VLSI architecture for finding the first W maximum (or minimum) values is proposed. The described architecture, called Bit-Wise-And (BWA) architecture, relies on analyzing input data from the most significant bit to the least significant one, with very simple logic circuits. One key feature in the BWA architecture is its high level of scalability, which enables the adoption of this solution in a large spectrum of applications, corresponding to large ranges for both W and the size of the input data set. Experimental results, achieved by implementing the proposed architecture on a high-speed 90-nm CMOS standard-cell technology, show that BWA architecture requires significantly less area than other solutions available in the literature, i.e., less than or about 50% in all the considered cases and about 50% in the worst case. Moreover, the BWA architecture exhibits the lowest area–delay product among almost all considered cases
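
The MSB-to-LSB idea can be mimicked in software with a radix-select pass over the bits. The sketch below is only a behavioral model written for this listing, not the BWA circuit from the brief. The function name `first_w_max` and its parameters are hypothetical, and it assumes unsigned integers that fit in `bits` bits and `w` no larger than the input size.

```python
def first_w_max(values, w, bits):
    """Return the w largest values, scanning bits from MSB to LSB.

    Behavioral model only. Assumes 0 <= v < 2**bits for every v and
    1 <= w <= len(values).
    """
    candidates = list(range(len(values)))  # indices still undecided
    chosen = []                            # indices known to be in the top w
    for b in range(bits - 1, -1, -1):
        need = w - len(chosen)
        if need == 0:
            break
        ones = [i for i in candidates if (values[i] >> b) & 1]
        zeros = [i for i in candidates if not (values[i] >> b) & 1]
        if len(ones) > need:
            # The remaining winners all have this bit set; drop the rest.
            candidates = ones
        else:
            # Every candidate with this bit set is a winner.
            chosen.extend(ones)
            candidates = zeros
    # Any candidates left agree on every bit, so they are equal in value.
    chosen.extend(candidates[: w - len(chosen)])
    return sorted((values[i] for i in chosen), reverse=True)


print(first_w_max([5, 12, 7, 3, 12, 9, 0, 15], w=3, bits=4))
```

The first W minimum values follow from the same routine by complementing every input bit before the scan and complementing the results afterwards.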

    Baseband Processing for 5G and Beyond: Algorithms, VLSI Architectures, and Co-design

    In recent years, the number of connected devices and the demand for high data rates have increased significantly. This enormous growth is made more pronounced by the introduction of the Internet of things (IoT), in which many devices are interconnected to exchange data for applications such as smart homes and smart cities. Moreover, new applications such as eHealth, autonomous vehicles, and connected ambulances set new demands on the reliability, latency, and data rate of wireless communication systems, pushing technology development forward. Massive multiple-input multiple-output (MIMO) is a technology employed in the 5G standard that offers the benefits needed to fulfill these requirements. In massive MIMO systems, the base station (BS) is equipped with a very large number of antennas, serving several user equipments (UEs) simultaneously in the same time and frequency resource. The high spatial multiplexing in massive MIMO systems improves the data rate, the energy and spectral efficiencies, and the link reliability of wireless communication systems. The link reliability can be further improved by employing channel coding. Spatially coupled serially concatenated codes (SC-SCCs) are promising channel coding schemes that can meet the high-reliability demands of wireless communication systems beyond 5G (B5G). Given their close-to-capacity error correction performance and the potential to implement a high-throughput decoder, this class of codes can be a good candidate for B5G wireless systems. Achieving these advantages requires sophisticated algorithms, which impose challenges on the baseband signal processing. In massive MIMO systems, the processing is much more computationally intensive and the memory required to store channel data is significantly larger than in conventional MIMO systems, owing to the large size of the channel state information (CSI) matrix. In addition to the high computational complexity, meeting latency requirements is also crucial. Similarly, the decoding-performance gain of SC-SCCs comes at the expense of increased implementation complexity. Moreover, selecting the design parameters, decoding algorithm, and architecture is challenging, since spatial coupling provides new degrees of freedom in code design and the design space therefore becomes huge. The focus of this thesis is to perform co-optimization at different design levels to address these challenges and requirements. To this end, we employ system-level characteristics to develop efficient algorithms and architectures for the following functional blocks of digital baseband processing. First, we present a fast Fourier transform (FFT), an inverse FFT (IFFT), and a corresponding reordering scheme, which can significantly reduce the latency of orthogonal frequency-division multiplexing (OFDM) demodulation and modulation as well as the size of the reordering memory. The corresponding VLSI architectures are introduced along with application-specific integrated circuit (ASIC) implementation results in a 28 nm CMOS technology. For a 2048-point FFT/IFFT, the proposed design leads to a 42% reduction in the latency and in the size of the reordering memory. Second, we propose a low-complexity massive MIMO detection scheme. The key idea is to exploit channel sparsity to reduce the size of the CSI matrix and then perform linear detection followed by non-linear post-processing in the angular domain using the compressed CSI matrix. The VLSI architecture for a massive MIMO system with 128 BS antennas and 16 UEs is presented along with synthesis results in a 28 nm technology. The proposed scheme reduces the complexity and required memory by 35%–73% compared to traditional detectors while achieving better detection performance. Finally, we perform a comprehensive design space exploration for the SC-SCCs to investigate the effect of different design parameters on decoding performance, latency, complexity, and hardware cost. We then develop different decoding algorithms for the SC-SCCs and discuss the associated decoding performance and complexity. Several high-level VLSI architectures are also presented along with the corresponding synthesis results in a 12 nm process, and various design tradeoffs are provided for these decoding schemes.

    Transmission strategies for broadband wireless systems with MMSE turbo equalization

    This monograph details efficient transmission strategies for single-carrier wireless broadband communication systems employing iterative (turbo) equalization. In particular, the first part focuses on the design and analysis of low-complexity and robust MMSE-based turbo equalizers operating in the frequency domain. Accordingly, several novel receiver schemes are presented which improve the convergence properties and error performance over the existing turbo equalizers. The second part discusses concepts and algorithms that aim to increase the power and spectral efficiency of the communication system by efficiently exploiting the available resources at the transmitter side based upon the channel conditions. The challenging issue encountered in this context is how the transmission rate and power can be optimized while a specific convergence constraint of the turbo equalizer is guaranteed. This work addresses the design and analysis of efficient transmission concepts for wireless broadband single-carrier communication systems with iterative (turbo) equalization and channel decoding. This includes, on the one hand, the development of low-complexity receiver-side frequency-domain equalizers based on the principle of soft interference cancellation minimum mean squared error (SC-MMSE) filtering and, on the other hand, the design of transmitter-side algorithms that exploit channel state information to improve bandwidth and power efficiency in single-user and multiuser systems with multiple antennas (so-called multiple-input multiple-output (MIMO) systems). The first part of this work presents a general framework for turbo equalization methods based on linear MMSE estimation, nonlinear MMSE estimation, and combined MMSE and maximum a posteriori (MAP) estimation. In this context, two new receiver concepts are introduced that improve performance and convergence relative to existing SC-MMSE turbo equalizers in various channel environments. The first receiver, PDA SC-MMSE, combines the probabilistic data association (PDA) approach with the well-known SC-MMSE equalizer. In contrast to the SC-MMSE, the PDA SC-MMSE uses internal decision feedback, so that interference suppression takes into account soft decisions from previous detection steps in addition to the a priori information from the channel decoder. Thanks to this additional internal decision feedback, the PDA SC-MMSE achieves a substantial performance gain over the SC-MMSE in spatially uncorrelated MIMO channels without significantly increasing the complexity of the equalizer. The second receiver, hybrid SC-MMSE, combines group-based SC-MMSE frequency-domain filtering with MAP detection. This receiver has scalable computational complexity and is highly robust to spatial correlation in MIMO channels. Numerical results of simulations based on channel-sounder measurements in multiuser channels with strong spatial correlation impressively demonstrate the superiority of the hybrid SC-MMSE approach over the conventional SC-MMSE-based receiver. The second part investigates the influence of system and channel model parameters on the convergence properties of the presented iterative receivers by means of so-called correlation charts. Using semi-analytical computations of the equalizer and channel decoder correlation functions, a simple procedure is developed for predicting the bit error probability of SC-MMSE and PDA SC-MMSE turbo equalizers in MIMO fading channels. Furthermore, two error bounds on the outage probability of the receivers are presented. The semi-analytical method and the derived bounds enable low-effort estimation and optimization of the performance of the iterative system. The third and final part examines strategies for rate and power allocation in communication systems with conventional iterative SC-MMSE receivers. First, the problem of maximizing the instantaneous sum rate subject to convergence of the iterative receiver is considered for a two-user channel with fixed power allocation. Using the area theorem of extrinsic information transfer (EXIT) functions, an upper bound on the achievable rate region is derived. Based on this bound, a simple algorithm is developed that selects for each user, from a set of given channel codes with different code rates, the one that improves the instantaneous throughput of the multiuser system. In addition to instantaneous rate allocation, an outage-based approach to rate allocation is developed, in which the channel codes are selected for the users subject to a specified outage probability of the iterative receiver. Furthermore, a new design criterion for irregular convolutional codes is derived that reduces the outage probability of turbo SC-MMSE systems and thus increases the reliability of data transmission. A series of simulation results from capacity and throughput computations is presented that demonstrates the effectiveness of the proposed algorithms and optimization methods in multiuser channels. Finally, various measures for minimizing the transmit power in single-user systems with transmitter-side singular value decomposition (SVD)-based precoding are investigated. It is shown that a method that optimizes the power levels of the transmitter with respect to the bit error rate of the iterative receiver outperforms conventional power allocation methods.

    Soft-output detection for transit antenna index modulation-based schemes.

    Master of Sciences in Electronic Engineering. University of KwaZulu-Natal, Durban 2016. Abstract available in PDF file.

    VLSI architectures design for encoders of High Efficiency Video Coding (HEVC) standard

    The growing popularity of high-resolution video and the continuously increasing demand for high-quality video on mobile devices are creating a stronger need for more efficient video encoders. In response, HEVC, the newest video coding standard, has been developed by a joint team formed by ISO/IEC MPEG and ITU-T VCEG. Its design goal is to achieve a 50% compression gain over its predecessor H.264 with equal or even higher perceptual video quality. Motion estimation (ME), one of the most critical modules in video coding, accounts for roughly 50%-70% of the computational complexity of the video encoder. This high consumption of computational resources limits encoder performance, especially for full HD or ultra HD video, in terms of coding speed, bit rate, and video quality. The major part of this work therefore concentrates on reducing the computational complexity and improving the timing performance of motion estimation algorithms for the HEVC standard. First, a new strategy for calculating the SAD (sum of absolute differences) for motion estimation is designed based on statistical properties of the pixel data of video sequences. These statistics demonstrate that the size relationship between the sums of two sets of pixels has a definite connection with the distribution of the size relationships between individual pixels from the two sets. Taking advantage of this observation, only a small proportion of pixels needs to be involved in the SAD calculation. Simulations show that the amount of computation required by the full search algorithm is reduced by about 58% on average and by up to 70% in the best case. Second, from the perspective of parallelization, an enhanced TZ search for HEVC is proposed using novel schemes of multiple MVPs (motion vector predictors) and a shared MVP. Specifically, by resorting to multiple MVPs, the initial search process is performed in parallel at multiple search centers, and the ME processing engines for the PUs within one CU are parallelized based on the MVP sharing scheme at the CU (coding unit) level. Moreover, the SAD module of the ME engine is also implemented in parallel for a PU size of 32×32. Experiments indicate that it achieves an appreciable improvement in the throughput and coding efficiency of the HEVC video encoder. The other part of this thesis is devoted to VLSI architecture design for finding the first W maximum/minimum values, targeting high speed and low hardware cost. The architecture based on the novel bit-wise AND scheme has only half the area of the best reference solution, and its critical path delay is comparable with that of other implementations. The FPCG (full parallel comparison grid) architecture, which uses an optimized comparator-based structure, is 3.6 times faster on average and up to 5.2 times faster at best than the reference architectures. Finally, the architecture using the partial sorting strategy strikes a good balance between timing performance and area, with a speed slightly lower than or comparable to that of the FPCG architecture and an acceptable hardware cost.

    Overcoming CubeSat downlink limits with VITAMIN: a new variable coded modulation protocol

    Thesis (M.S.) University of Alaska Fairbanks, 2013. Many space missions, including low Earth orbit CubeSats, communicate in a highly dynamic environment because of variations in geometry, weather, and interference. At the same time, most missions communicate using fixed channel codes, modulations, and symbol rates, resulting in a constant data rate that does not adapt to the dynamic conditions. When conditions are good, the fixed data rate can be far below the theoretical maximum, called the Shannon limit; when conditions are bad, the fixed data rate may not work at all. To move beyond these fixed communications and achieve higher total data volume from emerging high-tech instruments, this thesis investigates the use of error-correcting codes and different modulations. Variable coded modulation (VCM) takes advantage of the dynamic link by transmitting more information when the signal-to-noise ratio (SNR) is high. Likewise, VCM can throttle down the information rate when SNR is low without having to stop all communications. VCM outperforms fixed communications, which can operate only at a fixed information rate and only as long as a certain signal threshold is met. This thesis presents a new VCM protocol and tests its performance in both software and hardware simulations. The protocol is geared towards CubeSat downlinks, as complexity is concentrated in the receiver while the transmission operations are kept simple. This thesis explores bin-packing as a way to optimize the selection of VCM modes based on expected SNR levels over time. Working end-to-end simulations were created using MATLAB and LabVIEW, while the hardware simulations were done with software-defined radios. Results show that a CubeSat using VCM communications will deliver twice the data throughput of a fixed communications system.
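
The basic VCM decision, picking the fastest mode the current SNR supports, can be sketched in a few lines. The mode table below is entirely hypothetical (names, SNR thresholds and rates are made up for illustration and are not taken from the VITAMIN protocol), and the Shannon limit is computed for an ideal AWGN channel.

```python
import math

# Hypothetical mode table: (name, required SNR in dB, information bits per symbol).
MODES = [
    ("BPSK r=1/2", 1.0, 0.5),
    ("QPSK r=1/2", 4.0, 1.0),
    ("QPSK r=3/4", 7.0, 1.5),
    ("8PSK r=3/4", 11.0, 2.25),
]


def shannon_bits_per_symbol(snr_db):
    """Shannon limit log2(1 + SNR) for an AWGN channel, SNR given in dB."""
    return math.log2(1.0 + 10.0 ** (snr_db / 10.0))


def pick_mode(snr_db):
    """Highest-rate mode whose SNR threshold is met, or None if none is usable."""
    usable = [mode for mode in MODES if mode[1] <= snr_db]
    return max(usable, key=lambda mode: mode[2]) if usable else None


# Usage: sweep a few SNR values, as might occur over one pass.
for snr_db in (0.0, 5.0, 9.0, 14.0):
    mode = pick_mode(snr_db)
    name = mode[0] if mode else "no link"
    print(snr_db, name, round(shannon_bits_per_symbol(snr_db), 2))
```

A fixed-rate link corresponds to a table with a single entry: it wastes capacity above its threshold and drops out entirely below it.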