63 research outputs found

    VLSI Implementation of a Soft-Output Signal Detector for Multi-Mode Adaptive MIMO Systems

    Get PDF
    This paper presents a multimode soft-output multiple-input multiple-output (MIMO) signal detector that is efficient in hardware cost and energy consumption. The detector is capable of dealing with spatial-multiplexing (SM),break space-division-multiple-access (SDMA), and spatial-diversity (SD) signals of 4 ✕ 4 antenna and 64-QAM modulation. Implementation-friendly algorithms, which reuse most of the mathematical operations in these three MIMO modes, are proposed to provide accurate soft detection information, i.e., log-likelihood ratio, with much reduced complexity. A unified reconfigurable VLSI architecture has been developed to eliminate the implementation of multiple detector modules. In addition, several block level technologies, such as parallel metric update and fast bit-flipping, are adopted to enable a more efficient design. To evaluate the proposed techniques, we implemented the triple-mode MIMO detector in a 65-nm CMOS technology. The core area is 0.25 mm2 with 83.7 K gates. The maximum detecting throughput is 1 Gb/s at 167-MHz clock frequency and 1.2-V supply, which archives the data rate envisioned by the emerging long-term evolution advanced standard. Under frequency-selective channels, the detector consumes 59.3-, 10.5-, and 169.6-pJ energy per bit detection in SM, SD, and SDMA modes, respectively

    High-Throughput Soft-Output MIMO Detector Based on Path-Preserving Trellis-Search Algorithm

    Get PDF
    In this paper, we propose a novel path-preserving trellis-search (PPTS) algorithm and its high-speed VLSI architecture for soft-output multiple-input-multiple-output (MIMO) detection. We represent the search space of the MIMO signal with an unconstrained trellis, where each node in stage of the trellis maps to a possible complex-valued symbol transmitted by antenna. Based on the trellis model, we convert the soft-output MIMO detection problem into a multiple shortest paths problem subject to the constraint that every trellis node must be covered in this set of paths. The PPTS detector is guaranteed to have soft information for every possible symbol transmitted on every antenna so that the log-likelihood ratio (LLR) for each transmitted data bit can be more accurately formed. Simulation results show that the PPTS algorithm can achieve near-optimal error performance with a low search complexity. The PPTS algorithm is a hardware-friendly data-parallel algorithm because the search operations are evenly distributed among multiple trellis nodes for parallel processing. As a case study, we have designed and synthesized a fully-parallel systolic-array detector and two folded detectors for a 4x4 16-QAM system using a 1.08 V TSMC 65-nm CMOS technology.With a 1.18 mm2 core area, the folded detector can achieve a throughput of 2.1 Gbps.With a 3.19 mm2 core area, the fully-parallel systolic-array detector can achieve a throughput of 6.4 Gbps

    Performance - Complexity Comparison of Receivers for a LTE MIMO–OFDM System

    Get PDF
    Implementation of receivers for spatial multiplexing multiple-input multiple-output (MIMO) orthogonal-frequency-division-multiplexing (OFDM) systems is considered. The linear minimum mean-square error (LMMSE) and the K-best list sphere detector (LSD) are compared to the iterative successive interference cancellation (SIC) detector and the iterative K-best LSD. The performance of the algorithms is evaluated in 3G long-term evolution (LTE) system. The SIC algorithm is found to perform worse than the K-best LSD when the MIMO channels are highly correlated, while the performance difference diminishes when the correlation decreases. The receivers are designed for 2X2 and 4X4 antenna systems and three different modulation schemes. Complexity results for FPGA and ASIC implementations are found. A modification to the K-best LSD which increases its detection rate is introduced. The ASIC receivers are designed to meet the decoding throughput requirements in LTE and the K-best LSD is found to be the most complex receiver although it gives the best reliable data transmission throughput. The SIC receiver has the best performance–complexity tradeoff in the 2X2 system but in the 4X4 case, the K-best LSD is the most efficient. A receiver architecture which could be reconfigured to using a simple or a more complex detector as the channel conditions change would achieve the best performance while consuming the least amount of power in the receiver

    Implementation of a 3GPP LTE Turbo Decoder Accelerator on GPU

    Get PDF
    This paper presents a 3GPP LTE compliant turbo decoder accelerator on GPU. The challenge of implementing a turbo decoder is finding an efficient mapping of the decoder algorithm on GPU, e.g. finding a good way to parallelize workload across cores and allocate and use fast on-die memory to improve throughput. In our implementation, we increase throughput through 1) distributing the decoding workload for a codeword across multiple cores, 2) decoding multiple codewords simultaneously to increase concurrency and 3) employing memory optimization techniques to reduce memory bandwidth requirements. In addition, we analyze how different MAP algorithm approximations affect both throughput and bit error rate (BER) performance of this decoder

    Implementation of a High Throughput 3GPP Turbo Decoder on GPU

    Get PDF
    Turbo code is a computationally intensive channel code that is widely used in current and upcoming wireless standards. General-purpose graphics processor unit (GPGPU) is a programmable commodity processor that achieves high performance computation power by using many simple cores. In this paper, we present a 3GPP LTE compliant Turbo decoder accelerator that takes advantage of the processing power of GPU to offer fast Turbo decoding throughput. Several techniques are used to improve the performance of the decoder. To fully utilize the computational resources on GPU, our decoder can decode multiple codewords simultaneously, divide the workload for a single codeword across multiple cores, and pack multiple codewords to fit the single instruction multiple data (SIMD) instruction width. In addition, we use shared memory judiciously to enable hundreds of concurrent multiple threads while keeping frequently used data local to keep memory access fast. To improve efficiency of the decoder in the high SNR regime, we also present a low complexity early termination scheme based on average extrinsic LLR statistics. Finally, we examine how different workload partitioning choices affect the error correction performance and the decoder throughput.Renesas MobileTexas InstrumentsXilinxNational Science Foundatio

    Efficient DSP and Circuit Architectures for Massive MIMO: State-of-the-Art and Future Directions

    Full text link
    Massive MIMO is a compelling wireless access concept that relies on the use of an excess number of base-station antennas, relative to the number of active terminals. This technology is a main component of 5G New Radio (NR) and addresses all important requirements of future wireless standards: a great capacity increase, the support of many simultaneous users, and improvement in energy efficiency. Massive MIMO requires the simultaneous processing of signals from many antenna chains, and computational operations on large matrices. The complexity of the digital processing has been viewed as a fundamental obstacle to the feasibility of Massive MIMO in the past. Recent advances on system-algorithm-hardware co-design have led to extremely energy-efficient implementations. These exploit opportunities in deeply-scaled silicon technologies and perform partly distributed processing to cope with the bottlenecks encountered in the interconnection of many signals. For example, prototype ASIC implementations have demonstrated zero-forcing precoding in real time at a 55 mW power consumption (20 MHz bandwidth, 128 antennas, multiplexing of 8 terminals). Coarse and even error-prone digital processing in the antenna paths permits a reduction of consumption with a factor of 2 to 5. This article summarizes the fundamental technical contributions to efficient digital signal processing for Massive MIMO. The opportunities and constraints on operating on low-complexity RF and analog hardware chains are clarified. It illustrates how terminals can benefit from improved energy efficiency. The status of technology and real-life prototypes discussed. Open challenges and directions for future research are suggested.Comment: submitted to IEEE transactions on signal processin

    Low complexity MIMO detection algorithms and implementations

    Get PDF
    University of Minnesota Ph.D. dissertation. December 2014. Major: Electrical Engineering. Advisor: Gerald E. Sobelman. 1 computer file (PDF); ix, 111 pages.MIMO techniques use multiple antennas at both the transmitter and receiver sides to achieve diversity gain, multiplexing gain, or both. One of the key challenges in exploiting the potential of MIMO systems is to design high-throughput, low-complexity detection algorithms while achieving near-optimal performance. In this thesis, we design and optimize algorithms for MIMO detection and investigate the associated performance and FPGA implementation aspects.First, we study and optimize a detection algorithm developed by Shabany and Gulak for a K-Best based high throughput and low energy hard output MIMO detection and expand it to the complex domain. The new method uses simple lookup tables, and it is fully scalable for a wide range of K-values and constellation sizes. This technique reduces the computational complexity, without sacrificing performance and the complexity scales only sub-linearly with the constellation size. Second, we apply the bidirectional technique to trellis search and propose a high performance soft output bidirectional path preserving trellis search (PPTS) detector for MIMO systems. The comparative error analysis between single direction and bidirectional PPTS detectors is given. We demonstrate that the bidirectional PPTS detector can minimize the detection error. Next, we design a novel bidirectional processing algorithm for soft-output MIMO systems. It combines features from several types of fixed complexity tree search procedures. The proposed approach achieves a higher performance than previously proposed algorithms and has a comparable computational cost. Moreover, its parallel nature and fixed throughput characteristics make it attractive for very large scale integration (VLSI) implementation.Following that, we present a novel low-complexity hard output MIMO detection algorithm for LTE and WiFi applications. We provide a well-defined tradeoff between computational complexity and performance. The proposed algorithm uses a much smaller number of Euclidean distance (ED) calculations while attaining only a 0.5dB loss compared to maximum likelihood detection (MLD). A 3x3 MIMO system with a 16QAM detector architecture is designed, and the latency and hardware costs are estimated.Finally, we present a stochastic computing implementation of trigonometric and hyperbolic functions which can be used for QR decomposition and other wireless communications and signal processing applications
    • …
    corecore