316 research outputs found

    A CORDIC based QR Decomposition Technique for MIMO Detection

    Get PDF
    CORDIC based improved real and complex QR Decomposition (QRD) for channel pre-processing operations in (Multiple-Input Multiple-Output) MIMO detectors are presented in this paper. The proposed design utilizes pipelining and parallel processing techniques and reduces the latency and hardware complexity of the module respectively. Computational complexity analysis report shows the superiority of our module by 16% compared to literature. The implementation results reveal that the proposed QRD takes shorter latency compared to literature. The power consumption of 2x2 real channel matrix and 2x2 complex channel matrix was found to be 12mW and 44mW respectively on the state-of-the-art Xilinx Virtex 5 FPGA

    APPLICATION SPECIFIC PRECISION ANALYSIS OF CHOLESKY DECOMPOSITION IN MMSE MIMO RECEIVER SYSTEMS

    Get PDF
    We conduct an exploration study of various bit precisions for Cholesky decomposition. This research focuses on obtaining the minimum required signal to noise ratio (SNR) in Cholesky decomposition by reducing the internal precision of the computation. Primary goal of this research is to minimize resources and reduce power by performing calculations at a lower internal precision than the full 32-bit fixed or floating point. Cholesky decomposition is a key component in minimum mean square error (MMSE) multiple-input multiple-output (MIMO) receiver systems. It is used to calculate inverse of a matrix in many modern wireless systems. Cholesky decomposition is a very computation heavy process. We have investigated the effects of internal bit precisions in Cholesky decomposition. This is an exploration study to provide a benchmark for system designers to help decide on the internal precision of their system given SNRline, signal and noise variances, required output SNR and symbol error rate. Using pseudo floating point to control internal bit precision we have simulated Cholesky decomposition at various internal bit precisions with variable signal and noise variances, and SNRline values. These simulations have provided SNR for lower triangular matrix L, its inverse L-1, and the solution vector x (from the matrix equation Ax = b). In order to observe the effects of various bit precisions on SNR and symbol error probability, SNR in L and L-1 are plotted against condition number for 2x2, 4x4, 8x8, and 16x16 input matrices, and loss in symbol error probability (Psym) is plotted against condition number for 4x4 matrices for QPSK, 16QAM and 64QAM constellations. We find that as the internal precision is lowered there is a loss in SNR for L and L-1 matrices. It is further observed that loss in symbol error rate is negligible for internal bit precisions of 28 bits and 24 bits in all constellations. The loss in symbol error rate begins to show at 20 bits of precision and then increases drastically, especially for higher SNRline. These results provide an excellent resource for system designers. With these benchmarks, designers can decide on the internal precision of their systems according to their specifications

    Doctor of Philosophy

    Get PDF
    dissertationThe continuous growth of wireless communication use has largely exhausted the limited spectrum available. Methods to improve spectral efficiency are in high demand and will continue to be for the foreseeable future. Several technologies have the potential to make large improvements to spectral efficiency and the total capacity of networks including massive multiple-input multiple-output (MIMO), cognitive radio, and spatial-multiplexing MIMO. Of these, spatial-multiplexing MIMO has the largest near-term potential as it has already been adopted in the WiFi, WiMAX, and LTE standards. Although transmitting independent MIMO streams is cheap and easy, with a mere linear increase in cost with streams, receiving MIMO is difficult since the optimal methods have exponentially increasing cost and power consumption. Suboptimal MIMO detectors such as K-Best have a drastically reduced complexity compared to optimal methods but still have an undesirable exponentially increasing cost with data-rate. The Markov Chain Monte Carlo (MCMC) detector has been proposed as a near-optimal method with polynomial cost, but it has a history of unusual performance issues which have hindered its adoption. In this dissertation, we introduce a revised derivation of the bitwise MCMC MIMO detector. The new approach resolves the previously reported high SNR stalling problem of MCMC without the need for hybridization with another detector method or adding heuristic temperature scaling terms. Another common problem with MCMC algorithms is an unknown convergence time making predictable fixed-length implementations problematic. When an insufficient number of iterations is used on a slowly converging example, the output LLRs can be unstable and overconfident, therefore, we develop a method to identify rare, slowly converging runs and mitigate their degrading effects on the soft-output information. This improves forward-error-correcting code performance and removes a symptomatic error floor in bit-error-rates. Next, pseudo-convergence is identified with a novel way to visualize the internal behavior of the Gibbs sampler. An effective and efficient pseudo-convergence detection and escape strategy is suggested. Finally, the new excited MCMC (X-MCMC) detector is shown to have near maximum-a-posteriori (MAP) performance even with challenging, realistic, highly-correlated channels at the maximum MIMO sizes and modulation rates supported by the 802.11ac WiFi specification, 8x8 256 QAM. Further, the new excited MCMC (X-MCMC) detector is demonstrated on an 8-antenna MIMO testbed with the 802.11ac WiFi protocol, confirming its high performance. Finally, a VLSI implementation of the X-MCMC detector is presented which retains the near-optimal performance of the floating-point algorithm while having one of the lowest complexities found in the near-optimal MIMO detector literature

    Energy Efficient VLSI Circuits for MIMO-WLAN

    Get PDF
    Mobile communication - anytime, anywhere access to data and communication services - has been continuously increasing since the operation of the first wireless communication link by Guglielmo Marconi. The demand for higher data rates, despite the limited bandwidth, led to the development of multiple-input multiple-output (MIMO) communication which is often combined with orthogonal frequency division multiplexing (OFDM). Together, these two techniques achieve a high bandwidth efficiency. Unfortunately, techniques such as MIMO-OFDM significantly increase the signal processing complexity of transceivers. While fast improvements in the integrated circuit (IC) technology enabled to implement more signal processing complexity per chip, large efforts had and have to be done for novel algorithms as well as for efficient very large scaled integration (VLSI) architectures in order to meet today's and tomorrow's requirements for mobile wireless communication systems. In this thesis, we will present architectures and VLSI implementations of complete physical (PHY) layer application specific integrated circuits (ASICs) under the constraints imposed by an industrial wireless communication standard. Contrary to many other publications, we do not elaborate individual components of a MIMO-OFDM communication system stand-alone, but in the context of the complete PHY layer ASIC. We will investigate the performance of several MIMO detectors and the corresponding preprocessing circuits, being integrated into the entire PHY layer ASIC, in terms of achievable error-rate, power consumption, and area requirement. Finally, we will assemble the results from the proposed PHY layer implementations in order to enhance the energy efficiency of a transceiver. To this end, we propose a cross-layer optimization of PHY layer and medium access control (MAC) layer

    A low-complexity MIMO subspace detection algorithm

    Get PDF

    Algorithm Development and VLSI Implementation of Energy Efficient Decoders of Polar Codes

    Get PDF
    With its low error-floor performance, polar codes attract significant attention as the potential standard error correction code (ECC) for future communication and data storage. However, the VLSI implementation complexity of polar codes decoders is largely influenced by its nature of in-series decoding. This dissertation is dedicated to presenting optimal decoder architectures for polar codes. This dissertation addresses several structural properties of polar codes and key properties of decoding algorithms that are not dealt with in the prior researches. The underlying concept of the proposed architectures is a paradigm that simplifies and schedules the computations such that hardware is simplified, latency is minimized and bandwidth is maximized. In pursuit of the above, throughput centric successive cancellation (TCSC) and overlapping path list successive cancellation (OPLSC) VLSI architectures and express journey BP (XJBP) decoders for the polar codes are presented. An arbitrary polar code can be decomposed by a set of shorter polar codes with special characteristics, those shorter polar codes are referred to as constituent polar codes. By exploiting the homogeneousness between decoding processes of different constituent polar codes, TCSC reduces the decoding latency of the SC decoder by 60% for codes with length n = 1024. The error correction performance of SC decoding is inferior to that of list successive cancellation decoding. The LSC decoding algorithm delivers the most reliable decoding results; however, it consumes most hardware resources and decoding cycles. Instead of using multiple instances of decoding cores in the LSC decoders, a single SC decoder is used in the OPLSC architecture. The computations of each path in the LSC are arranged to occupy the decoder hardware stages serially in a streamlined fashion. This yields a significant reduction of hardware complexity. The OPLSC decoder has achieved about 1.4 times hardware efficiency improvement compared with traditional LSC decoders. The hardware efficient VLSI architectures for TCSC and OPLSC polar codes decoders are also introduced. Decoders based on SC or LSC algorithms suffer from high latency and limited throughput due to their serial decoding natures. An alternative approach to decode the polar codes is belief propagation (BP) based algorithm. In BP algorithm, a graph is set up to guide the beliefs propagated and refined, which is usually referred to as factor graph. BP decoding algorithm allows decoding in parallel to achieve much higher throughput. XJBP decoder facilitates belief propagation by utilizing the specific constituent codes that exist in the conventional factor graph, which results in an express journey (XJ) decoder. Compared with the conventional BP decoding algorithm for polar codes, the proposed decoder reduces the computational complexity by about 40.6%. This enables an energy-efficient hardware implementation. To further explore the hardware consumption of the proposed XJBP decoder, the computations scheduling is modeled and analyzed in this dissertation. With discussions on different hardware scenarios, the optimal scheduling plans are developed. A novel memory-distributed micro-architecture of the XJBP decoder is proposed and analyzed to solve the potential memory access problems of the proposed scheduling strategy. The register-transfer level (RTL) models of the XJBP decoder are set up for comparisons with other state-of-the-art BP decoders. The results show that the power efficiency of BP decoders is improved by about 3 times

    Algorithm Development and VLSI Implementation of Energy Efficient Decoders of Polar Codes

    Get PDF
    With its low error-floor performance, polar codes attract significant attention as the potential standard error correction code (ECC) for future communication and data storage. However, the VLSI implementation complexity of polar codes decoders is largely influenced by its nature of in-series decoding. This dissertation is dedicated to presenting optimal decoder architectures for polar codes. This dissertation addresses several structural properties of polar codes and key properties of decoding algorithms that are not dealt with in the prior researches. The underlying concept of the proposed architectures is a paradigm that simplifies and schedules the computations such that hardware is simplified, latency is minimized and bandwidth is maximized. In pursuit of the above, throughput centric successive cancellation (TCSC) and overlapping path list successive cancellation (OPLSC) VLSI architectures and express journey BP (XJBP) decoders for the polar codes are presented. An arbitrary polar code can be decomposed by a set of shorter polar codes with special characteristics, those shorter polar codes are referred to as constituent polar codes. By exploiting the homogeneousness between decoding processes of different constituent polar codes, TCSC reduces the decoding latency of the SC decoder by 60% for codes with length n = 1024. The error correction performance of SC decoding is inferior to that of list successive cancellation decoding. The LSC decoding algorithm delivers the most reliable decoding results; however, it consumes most hardware resources and decoding cycles. Instead of using multiple instances of decoding cores in the LSC decoders, a single SC decoder is used in the OPLSC architecture. The computations of each path in the LSC are arranged to occupy the decoder hardware stages serially in a streamlined fashion. This yields a significant reduction of hardware complexity. The OPLSC decoder has achieved about 1.4 times hardware efficiency improvement compared with traditional LSC decoders. The hardware efficient VLSI architectures for TCSC and OPLSC polar codes decoders are also introduced. Decoders based on SC or LSC algorithms suffer from high latency and limited throughput due to their serial decoding natures. An alternative approach to decode the polar codes is belief propagation (BP) based algorithm. In BP algorithm, a graph is set up to guide the beliefs propagated and refined, which is usually referred to as factor graph. BP decoding algorithm allows decoding in parallel to achieve much higher throughput. XJBP decoder facilitates belief propagation by utilizing the specific constituent codes that exist in the conventional factor graph, which results in an express journey (XJ) decoder. Compared with the conventional BP decoding algorithm for polar codes, the proposed decoder reduces the computational complexity by about 40.6%. This enables an energy-efficient hardware implementation. To further explore the hardware consumption of the proposed XJBP decoder, the computations scheduling is modeled and analyzed in this dissertation. With discussions on different hardware scenarios, the optimal scheduling plans are developed. A novel memory-distributed micro-architecture of the XJBP decoder is proposed and analyzed to solve the potential memory access problems of the proposed scheduling strategy. The register-transfer level (RTL) models of the XJBP decoder are set up for comparisons with other state-of-the-art BP decoders. The results show that the power efficiency of BP decoders is improved by about 3 times

    Baseband Processing for 5G and Beyond: Algorithms, VLSI Architectures, and Co-design

    Get PDF
    In recent years the number of connected devices and the demand for high data-rates have been significantly increased. This enormous growth is more pronounced by the introduction of the Internet of things (IoT) in which several devices are interconnected to exchange data for various applications like smart homes and smart cities. Moreover, new applications such as eHealth, autonomous vehicles, and connected ambulances set new demands on the reliability, latency, and data-rate of wireless communication systems, pushing forward technology developments. Massive multiple-input multiple-output (MIMO) is a technology, which is employed in the 5G standard, offering the benefits to fulfill these requirements. In massive MIMO systems, base station (BS) is equipped with a very large number of antennas, serving several users equipments (UEs) simultaneously in the same time and frequency resource. The high spatial multiplexing in massive MIMO systems, improves the data rate, energy and spectral efficiencies as well as the link reliability of wireless communication systems. The link reliability can be further improved by employing channel coding technique. Spatially coupled serially concatenated codes (SC-SCCs) are promising channel coding schemes, which can meet the high-reliability demands of wireless communication systems beyond 5G (B5G). Given the close-to-capacity error correction performance and the potential to implement a high-throughput decoder, this class of code can be a good candidate for wireless systems B5G. In order to achieve the above-mentioned advantages, sophisticated algorithms are required, which impose challenges on the baseband signal processing. In case of massive MIMO systems, the processing is much more computationally intensive and the size of required memory to store channel data is increased significantly compared to conventional MIMO systems, which are due to the large size of the channel state information (CSI) matrix. In addition to the high computational complexity, meeting latency requirements is also crucial. Similarly, the decoding-performance gain of SC-SCCs also do come at the expense of increased implementation complexity. Moreover, selecting the proper choice of design parameters, decoding algorithm, and architecture will be challenging, since spatial coupling provides new degrees of freedom in code design, and therefore the design space becomes huge. The focus of this thesis is to perform co-optimization in different design levels to address the aforementioned challenges/requirements. To this end, we employ system-level characteristics to develop efficient algorithms and architectures for the following functional blocks of digital baseband processing. First, we present a fast Fourier transform (FFT), an inverse FFT (IFFT), and corresponding reordering scheme, which can significantly reduce the latency of orthogonal frequency-division multiplexing (OFDM) demodulation and modulation as well as the size of reordering memory. The corresponding VLSI architectures along with the application specific integrated circuit (ASIC) implementation results in a 28 nm CMOS technology are introduced. In case of a 2048-point FFT/IFFT, the proposed design leads to 42% reduction in the latency and size of reordering memory. Second, we propose a low-complexity massive MIMO detection scheme. The key idea is to exploit channel sparsity to reduce the size of CSI matrix and eventually perform linear detection followed by a non-linear post-processing in angular domain using the compressed CSI matrix. The VLSI architecture for a massive MIMO with 128 BS antennas and 16 UEs along with the synthesis results in a 28 nm technology are presented. As a result, the proposed scheme reduces the complexity and required memory by 35%–73% compared to traditional detectors while it has better detection performance. Finally, we perform a comprehensive design space exploration for the SC-SCCs to investigate the effect of different design parameters on decoding performance, latency, complexity, and hardware cost. Then, we develop different decoding algorithms for the SC-SCCs and discuss the associated decoding performance and complexity. Also, several high-level VLSI architectures along with the corresponding synthesis results in a 12 nm process are presented, and various design tradeoffs are provided for these decoding schemes

    Scalable System Design for Covert MIMO Communications

    Get PDF
    In modern communication systems, bandwidth is a limited commodity. Bandwidth efficient systems are needed to meet the demands of the ever-increasing amount of data that users share. Of particular interest is the U.S. Military, where high-resolution pictures and video are used and shared. In these environments, covert communications are necessary while still providing high data rates. The promise of multi-antenna systems providing higher data rates has been shown on a small scale, but limitations in hardware prevent large systems from being implemented
    corecore