48 research outputs found

    Baseband Processing for 5G and Beyond: Algorithms, VLSI Architectures, and Co-design

    Get PDF
    In recent years the number of connected devices and the demand for high data-rates have been significantly increased. This enormous growth is more pronounced by the introduction of the Internet of things (IoT) in which several devices are interconnected to exchange data for various applications like smart homes and smart cities. Moreover, new applications such as eHealth, autonomous vehicles, and connected ambulances set new demands on the reliability, latency, and data-rate of wireless communication systems, pushing forward technology developments. Massive multiple-input multiple-output (MIMO) is a technology, which is employed in the 5G standard, offering the benefits to fulfill these requirements. In massive MIMO systems, base station (BS) is equipped with a very large number of antennas, serving several users equipments (UEs) simultaneously in the same time and frequency resource. The high spatial multiplexing in massive MIMO systems, improves the data rate, energy and spectral efficiencies as well as the link reliability of wireless communication systems. The link reliability can be further improved by employing channel coding technique. Spatially coupled serially concatenated codes (SC-SCCs) are promising channel coding schemes, which can meet the high-reliability demands of wireless communication systems beyond 5G (B5G). Given the close-to-capacity error correction performance and the potential to implement a high-throughput decoder, this class of code can be a good candidate for wireless systems B5G. In order to achieve the above-mentioned advantages, sophisticated algorithms are required, which impose challenges on the baseband signal processing. In case of massive MIMO systems, the processing is much more computationally intensive and the size of required memory to store channel data is increased significantly compared to conventional MIMO systems, which are due to the large size of the channel state information (CSI) matrix. In addition to the high computational complexity, meeting latency requirements is also crucial. Similarly, the decoding-performance gain of SC-SCCs also do come at the expense of increased implementation complexity. Moreover, selecting the proper choice of design parameters, decoding algorithm, and architecture will be challenging, since spatial coupling provides new degrees of freedom in code design, and therefore the design space becomes huge. The focus of this thesis is to perform co-optimization in different design levels to address the aforementioned challenges/requirements. To this end, we employ system-level characteristics to develop efficient algorithms and architectures for the following functional blocks of digital baseband processing. First, we present a fast Fourier transform (FFT), an inverse FFT (IFFT), and corresponding reordering scheme, which can significantly reduce the latency of orthogonal frequency-division multiplexing (OFDM) demodulation and modulation as well as the size of reordering memory. The corresponding VLSI architectures along with the application specific integrated circuit (ASIC) implementation results in a 28 nm CMOS technology are introduced. In case of a 2048-point FFT/IFFT, the proposed design leads to 42% reduction in the latency and size of reordering memory. Second, we propose a low-complexity massive MIMO detection scheme. The key idea is to exploit channel sparsity to reduce the size of CSI matrix and eventually perform linear detection followed by a non-linear post-processing in angular domain using the compressed CSI matrix. The VLSI architecture for a massive MIMO with 128 BS antennas and 16 UEs along with the synthesis results in a 28 nm technology are presented. As a result, the proposed scheme reduces the complexity and required memory by 35%–73% compared to traditional detectors while it has better detection performance. Finally, we perform a comprehensive design space exploration for the SC-SCCs to investigate the effect of different design parameters on decoding performance, latency, complexity, and hardware cost. Then, we develop different decoding algorithms for the SC-SCCs and discuss the associated decoding performance and complexity. Also, several high-level VLSI architectures along with the corresponding synthesis results in a 12 nm process are presented, and various design tradeoffs are provided for these decoding schemes

    Low-Power Embedded Design Solutions and Low-Latency On-Chip Interconnect Architecture for System-On-Chip Design

    Get PDF
    This dissertation presents three design solutions to support several key system-on-chip (SoC) issues to achieve low-power and high performance. These are: 1) joint source and channel decoding (JSCD) schemes for low-power SoCs used in portable multimedia systems, 2) efficient on-chip interconnect architecture for massive multimedia data streaming on multiprocessor SoCs (MPSoCs), and 3) data processing architecture for low-power SoCs in distributed sensor network (DSS) systems and its implementation. The first part includes a low-power embedded low density parity check code (LDPC) - H.264 joint decoding architecture to lower the baseband energy consumption of a channel decoder using joint source decoding and dynamic voltage and frequency scaling (DVFS). A low-power multiple-input multiple-output (MIMO) and H.264 video joint detector/decoder design that minimizes energy for portable, wireless embedded systems is also designed. In the second part, a link-level quality of service (QoS) scheme using unequal error protection (UEP) for low-power network-on-chip (NoC) and low latency on-chip network designs for MPSoCs is proposed. This part contains WaveSync, a low-latency focused network-on-chip architecture for globally-asynchronous locally-synchronous (GALS) designs and a simultaneous dual-path routing (SDPR) scheme utilizing path diversity present in typical mesh topology network-on-chips. SDPR is akin to having a higher link width but without the significant hardware overhead associated with simple bus width scaling. The last part shows data processing unit designs for embedded SoCs. We propose a data processing and control logic design for a new radiation detection sensor system generating data at or above Peta-bits-per-second level. Implementation results show that the intended clock rate is achieved within the power target of less than 200mW. We also present a digital signal processing (DSP) accelerator supporting configurable MAC, FFT, FIR, and 3-D cross product operations for embedded SoCs. It consumes 12.35mW along with 0.167mm2 area at 333MHz

    LOW-COMPLEXITY AND HIGH-PERFORMANCE SOFT MIMO DETECTION BASED ON DISTRIBUTED M-ALGORITHM THROUGH TRELLIS-DIAGRAM

    Get PDF
    This paper presents a novel low-complexity multiple-input multipleoutput (MIMO) detection scheme using a distributed M-algorithm (DM) to achieve high performance soft MIMO detection. To reduce the searching complexity, we build a MIMO trellis graph and split the searching operations among different nodes, where each node will apply the M-algorithm. Instead of keeping a global candidate list as the traditional detector does, this algorithm keeps multiple small candidate lists to generate soft information. Since the DM algorithm can achieve good BER performance with a small M, the sorting cost of the DM algorithm is lower than that of the conventional K-best MIMO algorithm. The proposed algorithm is very suitable for high speed parallel processing.NokiaNokia Siemens Networks (NSN)XilinxNational Science Foundatio

    Soft MIMO Detection on Graphics Processing Units and Performance Study of Iterative MIMO Decoding

    Get PDF
    In this thesis we have presented an implementation of soft Multi Input Multi Output (MIMO) detection, single tree search algorithm on Graphics Processing Units (GPUs). We have compared its performance on different GPUs and a Central Processing Unit (CPU). We have also done a performance study of iterative decoding algorithms. We have shown that by increasing the number of outer iterations error rate performance can be further improved. GPUs are specialized devices specially designed to accelerate graphics processing. They are massively parallel devices which can run thousands of threads simultaneously. Because of their tremendous processing power there is an increasing interest in using them for scientific and general purpose computations. Hence companies like Nvidia, Advanced Micro Devices (AMD) etc. have started their support for General Purpose GPU (GPGPU) applications. Nvidia came up with Compute Unified Device Architecture (CUDA) to program its GPUs. Efforts are made to come up with a standard language for parallel computing that can be used across platforms. OpenCL is the first such language which is supported by all major GPU and CPU vendors. MIMO detector has a high computational complexity. We have implemented a soft MIMO detector on GPUs and studied its throughput and latency performance. We have shown that a GPU can give throughput of up to 4Mbps for a soft detection algorithm which is more than sufficient for most general purpose tasks like voice communication etc. Compare to CPU a throughput increase of ~7x is achieved. We also compared the performances of two GPUs one with low computational power and one with high computational power. These comparisons show effect of thread serialization on algorithms with the lower end GPU's execution time curve shows a slope of 1/2. To further improve error rate performance iterative decoding techniques are employed where a feedback path is employed between detector and decoder. With an eye towards GPU implementation we have explored these algorithms. Better error rate performance however, comes at a price of higher power dissipation and more latency. By simulations we have shown that one can predict based on the Signal to Noise Ratio (SNR) values how many iterations need to be done before getting an acceptable Bit Error Rate (BER) and Frame Error Rate (FER) performance. Iterative decoding technique shows that a SNR gain of ~1:5dB is achieved when number of outer iterations is increased from zero. To reduce the complexity one can adjust number of possible candidates the algorithm can generate. We showed that where a candidate list of 128 is not sufficient for acceptable error rate performance for a 4x4 MIMO system using 16-QAM modulation scheme, performances are comparable with the list size of 512 and 1024 respectively

    VLSI Implementation of Low Power Reconfigurable MIMO Detector

    Get PDF
    Multiple Input Multiple Output (MIMO) systems are a key technology for next generation high speed wireless communication standards like 802.11n, WiMax etc. MIMO enables spatial multiplexing to increase channel bandwidth which requires the use of multiple antennas in the receiver and transmitter side. The increase in bandwidth comes at the cost of high silicon complexity of MIMO detectors which result, due to the intricate algorithms required for the separation of these spatially multiplexed streams. Previous implementations of MIMO detector have mainly dealt with the issue of complexity reduction, latency minimization and throughput enhancement. Although, these detectors have successfully mapped algorithms to relatively simpler circuits but still, latency and throughput of these systems need further improvements to meet standard requirements. Additionally, most of these implementations don’t deal with the requirements of reconfigurability of the detector to multiple modulation schemes and different antennae configurations. This necessary requirement provides another dimension to the implementation of MIMO detector and adds to the implementation complexity. This thesis focuses on the efficient VLSI implementation of the MIMO detector with an emphasis on performance and re-configurability to different modulation schemes. MIMO decoding in our detector is based on the fixed sphere decoding algorithm which has been simplified for an effective VLSI implementation without considerably degrading the near optimal bit error rate performance. The regularity of the architecture makes it suitable for a highly parallel and pipelined implementation. The decoder has intrinsic traits for dynamic re-configurability to different modulation and encoding schemes. This detector architecture can be easily tuned for high/low performance requirements with slight degradation/improvement in Bit Error Rate (BER) depending on needs of the overlying application. Additionally, various architectural optimizations like pipelining, parallel processing, hardware scheduling, dynamic voltage and frequency scaling have been explored to improve the performance, energy requirements and re-configurability of the design

    Low-Power and Error-Resilient VLSI Circuits and Systems.

    Full text link
    Efficient low-power operation is critically important for the success of the next-generation signal processing applications. Device and supply voltage have been continuously scaled to meet a more constrained power envelope, but scaling has created resiliency challenges, including increasing timing faults and soft errors. Our research aims at designing low-power and robust circuits and systems for signal processing by drawing circuit, architecture, and algorithm approaches. To gain an insight into the system faults due to supply voltage reduction, we researched the two primary effects that determine the minimum supply voltage (VMIN) in Intel’s tri-gate CMOS technology, namely process variations and gate-dielectric soft breakdown. We determined that voltage scaling increases the timing window that sequential circuits are vulnerable. Thus, we proposed a new hold-time violation metric to define hold-time VMIN, which has been adopted as a new design standard. Device scaling increases soft errors which affect circuit reliability. Through extensive soft error characterization using two 65nm CMOS test chips, we studied the soft error mechanisms and its dependence on supply voltage and clock frequency. This study laid the foundation of the first 65nm DSP chip design for a NASA spaceflight project. To mitigate such random errors, we proposed a new confidence-driven architecture that effectively enhances the error resiliency of deeply scaled CMOS and post-CMOS circuits. Designing low-power resilient systems can effectively leverage application-specific algorithmic approaches. To explore design opportunities in the algorithmic domain, we demonstrate an application-specific detection and decoding processor for multiple-input multiple-output (MIMO) wireless communication. To enhance the receive error rate for a robust wireless communication, we designed a joint detection and decoding technique by enclosing detection and decoding in an iterative loop to enhance both interference cancellation and error reduction. A proof-of-concept chip design was fabricated for the next-generation 4x4 256QAM MIMO systems. Through algorithm-architecture optimizations and low-power circuit techniques, our design achieves significant improvements in throughput, energy efficiency and error rate, paving the way for future developments in this area.PhDElectrical EngineeringUniversity of Michigan, Horace H. Rackham School of Graduate Studieshttp://deepblue.lib.umich.edu/bitstream/2027.42/110323/1/uchchen_1.pd

    Récepteur itératif pour les systèmes MIMO-OFDM basé sur le décodage sphérique : convergence, performance et complexité

    Get PDF
    Recently, iterative processing has been widely considered to achieve near-capacity performance and reliable high data rate transmission, for future wireless communication systems. However, such an iterative processing poses significant challenges for efficient receiver design. In this thesis, iterative receiver combining multiple-input multiple-output (MIMO) detection with channel decoding is investigated for high data rate transmission. The convergence, the performance and the computational complexity of the iterative receiver for MIMO-OFDM system are considered. First, we review the most relevant hard-output and soft-output MIMO detection algorithms based on sphere decoding, K-Best decoding, and interference cancellation. Consequently, a low-complexity K-best (LCK- Best) based decoder is proposed in order to substantially reduce the computational complexity without significant performance degradation. We then analyze the convergence behaviors of combining these detection algorithms with various forward error correction codes, namely LTE turbo decoder and LDPC decoder with the help of Extrinsic Information Transfer (EXIT) charts. Based on this analysis, a new scheduling order of the required inner and outer iterations is suggested. The performance of the proposed receiver is evaluated in various LTE channel environments, and compared with other MIMO detection schemes. Secondly, the computational complexity of the iterative receiver with different channel coding techniques is evaluated and compared for different modulation orders and coding rates. Simulation results show that our proposed approaches achieve near optimal performance but more importantly it can substantially reduce the computational complexity of the system. From a practical point of view, fixed-point representation is usually used in order to reduce the hardware costs in terms of area, power consumption and execution time. Therefore, we present efficient fixed point arithmetic of the proposed iterative receiver based on LC-KBest decoder. Additionally, the impact of the channel estimation on the system performance is studied. The proposed iterative receiver is tested in a real-time environment using the MIMO WARP platform.Pour permettre l’accroissement de débit et de robustesse dans les futurs systèmes de communication sans fil, les processus itératifs sont de plus considérés dans les récepteurs. Cependant, l’adoption d’un traitement itératif pose des défis importants dans la conception du récepteur. Dans cette thèse, un récepteur itératif combinant les techniques de détection multi-antennes avec le décodage de canal est étudié. Trois aspects sont considérés dans un contexte MIMOOFDM: la convergence, la performance et la complexité du récepteur. Dans un premier temps, nous étudions les différents algorithmes de détection MIMO à décision dure et souple basés sur l’égalisation, le décodage sphérique, le décodage K-Best et l’annulation d’interférence. Un décodeur K-best de faible complexité (LC-K-Best) est proposé pour réduire la complexité sans dégradation significative des performances. Nous analysons ensuite la convergence de la combinaison de ces algorithmes de détection avec différentes techniques de codage de canal, notamment le décodeur turbo et le décodeur LDPC en utilisant le diagramme EXIT. En se basant sur cette analyse, un nouvel ordonnancement des itérations internes et externes nécessaires est proposé. Les performances du récepteur ainsi proposé sont évaluées dans différents modèles de canal LTE, et comparées avec différentes techniques de détection MIMO. Ensuite, la complexité des récepteurs itératifs avec différentes techniques de codage de canal est étudiée et comparée pour différents modulations et rendement de code. Les résultats de simulation montrent que les approches proposées offrent un bon compromis entre performance et complexité. D’un point de vue implémentation, la représentation en virgule fixe est généralement utilisée afin de réduire les coûts en termes de surface, de consommation d’énergie et de temps d’exécution. Nous présentons ainsi une représentation en virgule fixe du récepteur itératif proposé basé sur le décodeur LC K-Best. En outre, nous étudions l’impact de l’estimation de canal sur la performance du système. Finalement, le récepteur MIMOOFDM itératif est testé sur la plateforme matérielle WARP, validant le schéma proposé

    Cooperative Partial Detection for MIMO Relay Networks

    Get PDF
    This paper was submitted by the author prior to final official version. For official version please see http://hdl.handle.net/1911/64372Cooperative communication has recently re-emerged as a possible paradigm shift to realize the promises of the ever increasing wireless communication market; how- ever, there have been few, if any, studies to translate theoretical results into feasi- ble schemes with their particular practical challenges. The multiple-input multiple- output (MIMO) technique is another method that has been recently employed in different standards and protocols, often as an optional scenario, to further improve the reliability and data rate of different wireless communication applications. In this work, we look into possible methods and algorithms for combining these two tech- niques to take advantage of the benefits of both. In this thesis, we will consider methods that consider the limitations of practical solutions, which, to the best of our knowledge, are the first time to be considered in this context. We will present complexity reduction techniques for MIMO systems in cooperative systems. Furthermore, we will present architectures for flexible and configurable MIMO detectors. These architectures could support a range of data rates, modulation orders and numbers of antennas, and therefore, are crucial in the different nodes of cooperative systems. The breadth-first search employed in our realization presents a large opportunity to exploit the parallelism of the FPGA in order to achieve high data rates. Algorithmic modifications to address potential sequential bottlenecks in the traditional bread-first search-based SD are highlighted in the thesis. We will present a novel Cooperative Partial Detection (CPD) approach in MIMO relay channels, where instead of applying the conventional full detection in the relay, the relay performs a partial detection and forwards the detected parts of the message to the destination. We will demonstrate how this approach leads to controlling the complexity in the relay and helping it choose how much it is willing to cooperate based on its available resources. We will discuss the complexity implications of this method, and more importantly, present hardware verification and over-the-air experimentation of CPD using the Wireless Open-access Research Platform (WARP).NSF grants EIA-0321266, CCF-0541363, CNS-0551692, CNS-0619767, EECS-0925942, and CNS-0923479, Nokia, Xilinx, Nokia Siemens Networks, Texas Instruments, and Azimuth Systems
    corecore