93 research outputs found

    High-Throughput Soft-Output MIMO Detector Based on Path-Preserving Trellis-Search Algorithm

    In this paper, we propose a novel path-preserving trellis-search (PPTS) algorithm and its high-speed VLSI architecture for soft-output multiple-input multiple-output (MIMO) detection. We represent the search space of the MIMO signal with an unconstrained trellis, where each node in a given stage of the trellis maps to a possible complex-valued symbol transmitted by the corresponding antenna. Based on the trellis model, we convert the soft-output MIMO detection problem into a multiple-shortest-paths problem subject to the constraint that every trellis node must be covered by this set of paths. The PPTS detector is guaranteed to have soft information for every possible symbol transmitted on every antenna, so the log-likelihood ratio (LLR) for each transmitted data bit can be formed more accurately. Simulation results show that the PPTS algorithm can achieve near-optimal error performance with a low search complexity. The PPTS algorithm is a hardware-friendly, data-parallel algorithm because the search operations are evenly distributed among multiple trellis nodes for parallel processing. As a case study, we have designed and synthesized a fully-parallel systolic-array detector and two folded detectors for a 4x4 16-QAM system using a 1.08 V TSMC 65-nm CMOS technology. With a 1.18 mm² core area, the folded detector can achieve a throughput of 2.1 Gbps. With a 3.19 mm² core area, the fully-parallel systolic-array detector can achieve a throughput of 6.4 Gbps.
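To illustrate why covering every symbol hypothesis matters for LLR formation, here is a minimal Python sketch of max-log LLR computation from a candidate list. It is not the PPTS algorithm itself: the function name `max_log_llrs`, the candidate format (bit tuple plus distance), and the sign convention (positive favours bit 0) are all assumptions made for illustration, and scaling by the noise variance is omitted.

```python
def max_log_llrs(candidates, num_bits):
    """Max-log LLRs from a list of (bits, distance) candidates.

    `bits` is a tuple of 0/1 values and `distance` a path metric
    (smaller is better). Hypothetical helper, not from the paper.
    """
    llrs = []
    for k in range(num_bits):
        # Best metric among candidates with bit k equal to 0 and to 1.
        d0 = min((d for bits, d in candidates if bits[k] == 0), default=None)
        d1 = min((d for bits, d in candidates if bits[k] == 1), default=None)
        if d0 is None or d1 is None:
            # No counter-hypothesis in the list: the LLR is undefined.
            llrs.append(None)
        else:
            # Assumed convention: positive value favours bit 0.
            llrs.append(d1 - d0)
    return llrs


full_list = [((0, 0), 0.2), ((0, 1), 1.0), ((1, 0), 2.5)]
pruned_list = [((0, 0), 0.2), ((0, 1), 1.0)]
print(max_log_llrs(full_list, 2))
print(max_log_llrs(pruned_list, 2))
```

With the pruned list, no candidate has the first bit equal to 1, so that LLR cannot be formed. A detector that keeps at least one path through every trellis node avoids this case by construction.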

    VLSI Implementation of Low Power Reconfigurable MIMO Detector

    Multiple Input Multiple Output (MIMO) systems are a key technology for next-generation high-speed wireless communication standards such as 802.11n and WiMax. MIMO enables spatial multiplexing to increase channel bandwidth, which requires multiple antennas at both the receiver and the transmitter. The increase in bandwidth comes at the cost of high silicon complexity in MIMO detectors, a result of the intricate algorithms required to separate the spatially multiplexed streams. Previous implementations of MIMO detectors have mainly dealt with complexity reduction, latency minimization and throughput enhancement. Although these detectors have successfully mapped algorithms to relatively simple circuits, their latency and throughput still need further improvement to meet standard requirements. Additionally, most of these implementations do not address the reconfigurability of the detector across multiple modulation schemes and antenna configurations. This requirement provides another dimension to the implementation of a MIMO detector and adds to the implementation complexity. This thesis focuses on the efficient VLSI implementation of the MIMO detector, with an emphasis on performance and reconfigurability across modulation schemes. MIMO decoding in our detector is based on the fixed sphere decoding algorithm, which has been simplified for an effective VLSI implementation without considerably degrading the near-optimal bit error rate performance. The regularity of the architecture makes it suitable for a highly parallel and pipelined implementation. The decoder has intrinsic traits for dynamic reconfigurability across modulation and encoding schemes. The detector architecture can be easily tuned for high or low performance requirements, with a slight degradation or improvement in Bit Error Rate (BER), depending on the needs of the overlying application. Additionally, architectural optimizations such as pipelining, parallel processing, hardware scheduling, and dynamic voltage and frequency scaling have been explored to improve the performance, energy requirements and reconfigurability of the design.

    FlexCore: Massively Parallel and Flexible Processing for Large MIMO Access Points

    Large MIMO base stations remain among wireless network designers' best tools for increasing wireless throughput while serving many clients, but current system designs sacrifice throughput by using simple linear MIMO detection algorithms. Higher-performance detection techniques are known, but remain off the table because these systems parallelize their computation at the level of a whole OFDM subcarrier, which suffices only for the less demanding linear detection approaches they opt for. This paper presents FlexCore, the first computational architecture capable of parallelizing the detection of large numbers of mutually-interfering information streams at a granularity below individual OFDM subcarriers, in a nearly-embarrassingly parallel manner, while utilizing any number of available processing elements. For 12 clients sending 64-QAM symbols to a 12-antenna base station, our WARP testbed evaluation shows similar network throughput to the state of the art while using an order of magnitude fewer processing elements. For the same scenario, our combined WARP-GPU testbed evaluation demonstrates a 19x computational speedup, with 97% increased energy efficiency when compared with the state of the art. Finally, for the same scenario, an FPGA-based comparison between FlexCore and the state of the art shows that FlexCore can achieve up to 96% better energy efficiency and can offer up to 32x the processing throughput.

    ASIC Implementation Comparison of SIC and LSD Receivers for MIMO-OFDM

    MIMO-OFDM receivers with horizontal encoding are considered in this paper. The successive interference cancellation (SIC) algorithm is compared to the K-best list sphere detector (LSD). A modification to the K-best LSD algorithm is introduced. The SIC and K-best LSD receivers are designed for a 2 x 2 antenna system with 64-quadrature amplitude modulation (QAM). The ASIC implementation results for both architectures are presented. The K-best LSD outperforms the SIC receiver in bad channel conditions, but the SIC receiver performs better in channels with less correlated MIMO streams. The latency of the K-best LSD is large due to the high modulation order and list size. The throughput of the SIC receiver is more than 6 times higher than that of the K-best LSD.
    Sponsors: Tekes; Finnish Funding Agency for Technology and Innovation; Nokia; Texas Instruments; Nokia Siemens Networks (NSN); Elektrobit
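For readers unfamiliar with K-best detection, the sketch below shows a generic breadth-first K-best search in Python. It is a textbook-style illustration, not the modified K-best LSD from the paper. It assumes a real-valued, already QR-decomposed system `z = R s + noise` with upper-triangular `R`, and the function name `k_best_detect` and the small 4-PAM example are hypothetical.

```python
def k_best_detect(R, z, alphabet, K):
    """Breadth-first K-best search for small ||z - R s||^2.

    R is an upper-triangular real matrix (list of rows), z the rotated
    receive vector, alphabet the real symbol set. Returns the K surviving
    (metric, symbols) pairs, best first. Illustrative sketch only.
    """
    n = len(z)
    survivors = [(0.0, [])]  # (accumulated metric, symbols of layers i+1..n-1)
    for i in range(n - 1, -1, -1):  # detect the last layer first
        candidates = []
        for metric, tail in survivors:
            # Interference from the layers that are already decided.
            interference = sum(R[i][j] * tail[j - i - 1] for j in range(i + 1, n))
            for s in alphabet:
                e = z[i] - interference - R[i][i] * s
                candidates.append((metric + e * e, [s] + tail))
        # Keep only the K partial paths with the smallest metric.
        candidates.sort(key=lambda c: c[0])
        survivors = candidates[:K]
    return survivors


R = [[2.0, 0.5],
     [0.0, 1.5]]
sent = [1, -3]
z = [sum(R[i][j] * sent[j] for j in range(2)) for i in range(2)]  # noise-free
result = k_best_detect(R, z, alphabet=[-3, -1, 1, 3], K=2)
print(result[0])
```

Each layer expands every survivor by the full alphabet before sorting, so the work per layer grows with both K and the constellation size. This is consistent with the abstract's remark that latency is large for a high modulation order and list size.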

    Fully Pipelined Implementation of Tree-Search Algorithms for Vector Precoding

    The nonlinear vector precoding (VP) technique has been proven to achieve close-to-capacity performance in multiuser multiple-input multiple-output (MIMO) downlink channels. The performance benefit with respect to its linear counterparts stems from the incorporation of a perturbation signal that reduces the power of the precoded signal. The computation of this perturbation element, which is known to belong to the class of NP-hard problems, is the main obstacle to the hardware implementation of VP systems. In this respect, several tree-search algorithms have so far been proposed for the closest-point lattice search problem in VP systems. Nevertheless, the optimality of these algorithms has been assessed mainly in terms of error-rate performance and computational complexity, leaving the hardware cost of their implementation an open issue. The parallel data-processing capabilities of field-programmable gate arrays (FPGA) and the loopless nature of the proposed tree-search algorithms have enabled an efficient hardware implementation of a VP system that provides a very high data-processing throughput.

    A High-Speed QR Decomposition Processor for Carrier-Aggregated LTE-A Downlink Systems

    This paper presents a high-speed QR decomposition (QRD) processor targeting the carrier-aggregated 4 × 4 Long Term Evolution-Advanced (LTE-A) receiver. The processor provides robustness in spatially correlated channels with reduced complexity by using modifications to the Householder transform, such as decomposing-target redefinition and matrix real-valued decomposition. In terms of hardware design, we extensively explore flexibilities in systolic architectures using a high-level synthesis tool to achieve area-power efficiency. In a 65 nm CMOS technology, the processor occupies a core area of 0.77 mm² and produces 72 MQRD per second, the highest reported throughput. The power consumed in the proposed processor is 219 mW.
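The abstract mentions a matrix real-valued decomposition. The Python sketch below shows the common textbook form, in which a complex m × n channel matrix H is mapped to the real 2m × 2n matrix [[Re H, -Im H], [Im H, Re H]]. Whether the paper uses exactly this form is an assumption, and the helper names `real_valued_decomposition`, `stack_real` and `matvec` are hypothetical.

```python
def real_valued_decomposition(H):
    """Map a complex m x n matrix to the real 2m x 2n block matrix
    [[Re H, -Im H], [Im H, Re H]] (textbook form, assumed here)."""
    top = [[h.real for h in row] + [-h.imag for h in row] for row in H]
    bottom = [[h.imag for h in row] + [h.real for h in row] for row in H]
    return top + bottom


def stack_real(v):
    """Stack a complex vector as [real parts..., imaginary parts...]."""
    return [x.real for x in v] + [x.imag for x in v]


def matvec(A, x):
    """Plain matrix-vector product for lists of rows."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]


H = [[1 + 2j, 3 - 1j],
     [0.5j, 2 + 0j]]
x = [1 - 1j, -2 + 3j]

y_complex = matvec(H, x)                                    # complex model
y_real = matvec(real_valued_decomposition(H), stack_real(x))  # real model

# The real-valued model reproduces the stacked complex result.
print(all(abs(a - b) < 1e-12 for a, b in zip(y_real, stack_real(y_complex))))
```

The real-valued form doubles the matrix dimensions but lets the decomposition work with real arithmetic only. This is one reason it is popular in hardware-oriented detector and QRD designs.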

    Partial Detection for Multiple Antenna Cooperation

    Multi-antenna relays can significantly increase the speed and reliability of wireless systems. However, because of the complexity of MIMO detection, there is considerable overhead in implementing a MIMO relay if the conventional detect-and-forward strategy is used. To address this challenge, we propose a novel cooperative partial detection (CPD) strategy that partitions the detection task between the relay and the destination. CPD leverages the structure of tree-based close-to-ML MIMO detectors and modifies the tree traversal so that, instead of all the levels of the tree, only a subset of the levels, and thus a subset of the transmitted streams, is visited. This novel approach reduces the tree levels, i.e. dimensions, in both the relay and the destination. Moreover, CPD provides a flexible method to control the level of partitioning between the relay and the destination, and thus to adjust the detection computational complexity in each. Monte-Carlo simulation results show that, under equal transmit-power and complexity constraints at the destination, CPD achieves better BER performance than the non-relay scenario, with limited computational overhead in the relay.
    Sponsors: Nokia; National Science Foundation

    A Study on High Performance Gbps MIMO Wireless System

    Doctoral dissertation, Kyushu Institute of Technology. Degree number: 情工博甲第294号. Date conferred: December 25, 2014 (Heisei 26). Contents: 1 Introduction; 2 Wireless System Overview; 3 RC4 Encryption Architectures; 4 MIMO Detection Algorithm and Architecture; 5 LDPC Decoder Architecture; 6 Conclusion and Future Work