13 research outputs found

    LOW-COMPLEXITY AND HIGH-PERFORMANCE SOFT MIMO DETECTION BASED ON DISTRIBUTED M-ALGORITHM THROUGH TRELLIS-DIAGRAM

    Get PDF
    This paper presents a novel low-complexity multiple-input multipleoutput (MIMO) detection scheme using a distributed M-algorithm (DM) to achieve high performance soft MIMO detection. To reduce the searching complexity, we build a MIMO trellis graph and split the searching operations among different nodes, where each node will apply the M-algorithm. Instead of keeping a global candidate list as the traditional detector does, this algorithm keeps multiple small candidate lists to generate soft information. Since the DM algorithm can achieve good BER performance with a small M, the sorting cost of the DM algorithm is lower than that of the conventional K-best MIMO algorithm. The proposed algorithm is very suitable for high speed parallel processing.NokiaNokia Siemens Networks (NSN)XilinxNational Science Foundatio

    High-Throughput Soft-Output MIMO Detector Based on Path-Preserving Trellis-Search Algorithm

    Get PDF
    In this paper, we propose a novel path-preserving trellis-search (PPTS) algorithm and its high-speed VLSI architecture for soft-output multiple-input-multiple-output (MIMO) detection. We represent the search space of the MIMO signal with an unconstrained trellis, where each node in stage of the trellis maps to a possible complex-valued symbol transmitted by antenna. Based on the trellis model, we convert the soft-output MIMO detection problem into a multiple shortest paths problem subject to the constraint that every trellis node must be covered in this set of paths. The PPTS detector is guaranteed to have soft information for every possible symbol transmitted on every antenna so that the log-likelihood ratio (LLR) for each transmitted data bit can be more accurately formed. Simulation results show that the PPTS algorithm can achieve near-optimal error performance with a low search complexity. The PPTS algorithm is a hardware-friendly data-parallel algorithm because the search operations are evenly distributed among multiple trellis nodes for parallel processing. As a case study, we have designed and synthesized a fully-parallel systolic-array detector and two folded detectors for a 4x4 16-QAM system using a 1.08 V TSMC 65-nm CMOS technology.With a 1.18 mm2 core area, the folded detector can achieve a throughput of 2.1 Gbps.With a 3.19 mm2 core area, the fully-parallel systolic-array detector can achieve a throughput of 6.4 Gbps

    Implementation of a High Throughput Soft MIMO Detector on GPU

    Get PDF
    Multiple-input multiple-output (MIMO) significantly increases the throughput of a communication system by employing multiple antennas at the transmitter and the receiver. To extract maximum performance from a MIMO system, a computationally intensive search based detector is needed. To meet the challenge of MIMO detection, typical suboptimal MIMO detectors are ASIC or FPGA designs. We aim to show that a MIMO detector on Graphic processor unit (GPU), a low-cost parallel programmable co-processor, can achieve high throughput and can serve as an alternative to ASIC/FPGA designs. However, careful architecture aware software design is needed to leverage the performance offered by GPU. We propose a novel soft MIMO detection algorithm, multi-pass trellis traversal (MTT), and show that we can achieve ASIC/FPGA-like performance and handle different configurations in software on GPU. The proposed design can be used to accelerate wireless physical layer simulations and to offload MIMO detection processing in wireless testbed platforms.NokiaNokia Siemens Networks (NSN)Texas InstrumentsXilinxNational Science Foundatio

    FlexCore: Massively Parallel and Flexible Processing for Large MIMO Access Points

    Get PDF
    Large MIMO base stations remain among wireless network designers’ best tools for increasing wireless throughput while serving many clients, but current system designs, sacrifice throughput with simple linear MIMO detection algorithms. Higher-performance detection techniques are known, but remain off the table because these systems parallelize their computation at the level of a whole OFDM subcarrier, sufficing only for the less demanding linear detection approaches they opt for. This paper presents FlexCore, the first computational architecture capable of parallelizing the detection of large numbers of mutually-interfering information streams at a granularity below individual OFDM subcarriers, in a nearly-embarrassingly parallel manner while utilizing any number of available processing elements. For 12 clients sending 64-QAM symbols to a 12-antenna base station, our WARP testbed evaluation shows similar network throughput to the state-of-the-art while using an order of magnitude fewer processing elements. For the same scenario, our combined WARP-GPU testbed evaluation demonstrates a 19x computational speedup, with 97% increased energy efficiency when compared with the state of the art. Finally, for the same scenario, an FPGA-based comparison between FlexCore and the state of the art shows that FlexCore can achieve up to 96% better energy efficiency, and can offer up to 32x the processing throughput

    Fully Pipelined Implementation of Tree-Search Algorithms for Vector Precoding

    Get PDF
    The nonlinear vector precoding (VP) technique has been proven to achieve close-to-capacity performance in multiuser multiple-input multiple-output (MIMO) downlink channels. The performance benefit with respect to its linear counterparts stems from the incorporation of a perturbation signal that reduces the power of the precoded signal. The computation of this perturbation element, which is known to belong in the class of NP-hard problems, is the main aspect that hinders the hardware implementation of VP systems. To this respect, several tree-search algorithms have been proposed for the closest-point lattice search problem in VP systems hitherto. Nevertheless, the optimality of these algorithms has been assessed mainly in terms of error-rate performance and computational complexity, leaving the hardware cost of their implementation an open issue. The parallel data-processing capabilities of field-programmable gate arrays (FPGA) and the loopless nature of the proposed tree-search algorithms have enabled an efficient hardware implementation of a VP system that provides a very high data-processing throughput

    On the application of graphics processor to wireless receiver design

    Get PDF
    In many wireless systems, a Turbo decoder is often combined with a soft-output multiple-input and multiple-output (MIMO) detector at the receiver to maximize performance in many 4G and beyond wireless standards. Although custom application specific designs are usually used to meet this challenge, programmable graphics processing units (GPU) has become an alternative to the traditional ASIC and FPGA solution for wireless applications. However, careful architecture-aware algorithm design and mapping are required to maximize performance of a communication block on GPU. For MIMO soft detection, we implemented a new MIMO soft detection algorithm, multi-pass trellis traversal (MTT). For Turbo decoding, we used a parallel window algorithm. We showed that our implementations can achieve high throughput while maintaining good performance. This work will allow us to implement a complete iterative MIMO receiver in software on GPU in the future

    An Iterative Soft Decision Based LR-Aided MIMO Detector

    Get PDF
    The demand for wireless and high-rate communication system is increasing gradually and multiple-input-multiple-output (MIMO) is one of the feasible solutions to accommodate the growing demand for its spatial multiplexing and diversity gain. However, with high number of antennas, the computational and hardware complexity of MIMO increases exponentially. This accumulating complexity is a paramount problem in MIMO detection system directly leading to large power consumption. Hence, the major focus of this dissertation is algorithmic and hardware development of MIMO decoder with reduced complexity for both real and complex domain, which can be a beneficial solution with power efficiency and high throughput. Both hard and soft domain MIMO detectors are considered. The use of lattice reduction (LR) algorithm and on-demand-child-expansion for the reduction of noise propagation and node calculation respectively are the two of the key features of our developed architecture, presented in this literature. The real domain iterative soft MIMO decoding algorithm, simulated for 4 × 4 MIMO with different modulation scheme, achieves 1.1 to 2.7 dB improvement over Lease Sphere Decoder (LSD) and more than 8x reduction in list size, K as well as complexity of the detector. Next, the iterative real domain K-Best decoder is expanded to the complex domain with new detection scheme. It attains 6.9 to 8.0 dB improvement over real domain K-Best decoder and 1.4 to 2.5 dB better performance over conventional complex decoder for 8 × 8 MIMO with 64 QAM modulation scheme. Besides K, a new adjustable parameter, Rlimit has been introduced in order to append re-configurability trading-off between complexity and performance. After that, a novel low-power hardware architecture of complex decoder is developed for 8 × 8 MIMO and 64 QAM modulation scheme. The total word length of only 16 bits has been adopted limiting the bit error rate (BER) degradation to 0.3 dB with K and Rlimit equal to 4. The proposed VLSI architecture is modeled in Verilog HDL using Xilinx and synthesized using Synopsys Design Vision in 45 nm CMOS technology. According to the synthesize result, it achieves 1090.8 Mbps throughput with power consumption of 580 mW and latency of 0.33 us. The maximum frequency the design proposed is 181.8 MHz. All of the proposed decoders mentioned above are bounded by the fixed K. Hence, an adaptive real domain K-Best decoder is further developed to achieve the similar performance with less K, thereby reducing the computational complexity of the decoder. It does not require accurate SNR measurement to perform the initial estimation of list size, K. Instead, the difference between the first two minimal distances is considered, which inherently eliminates complexity. In summary, a novel iterative K-Best detector for both real and complex domain with efficient VLSI design is proposed in this dissertation. The results from extensive simulation and VHDL with analysis using Synopsys tool are also presented for justification and validation of the proposed works

    An Iterative Soft Decision Based LR-Aided MIMO Detector

    Get PDF
    The demand for wireless and high-rate communication system is increasing gradually and multiple-input-multiple-output (MIMO) is one of the feasible solutions to accommodate the growing demand for its spatial multiplexing and diversity gain. However, with high number of antennas, the computational and hardware complexity of MIMO increases exponentially. This accumulating complexity is a paramount problem in MIMO detection system directly leading to large power consumption. Hence, the major focus of this dissertation is algorithmic and hardware development of MIMO decoder with reduced complexity for both real and complex domain, which can be a beneficial solution with power efficiency and high throughput. Both hard and soft domain MIMO detectors are considered. The use of lattice reduction (LR) algorithm and on-demand-child-expansion for the reduction of noise propagation and node calculation respectively are the two of the key features of our developed architecture, presented in this literature. The real domain iterative soft MIMO decoding algorithm, simulated for 4 × 4 MIMO with different modulation scheme, achieves 1.1 to 2.7 dB improvement over Lease Sphere Decoder (LSD) and more than 8x reduction in list size, K as well as complexity of the detector. Next, the iterative real domain K-Best decoder is expanded to the complex domain with new detection scheme. It attains 6.9 to 8.0 dB improvement over real domain K-Best decoder and 1.4 to 2.5 dB better performance over conventional complex decoder for 8 × 8 MIMO with 64 QAM modulation scheme. Besides K, a new adjustable parameter, Rlimit has been introduced in order to append re-configurability trading-off between complexity and performance. After that, a novel low-power hardware architecture of complex decoder is developed for 8 × 8 MIMO and 64 QAM modulation scheme. The total word length of only 16 bits has been adopted limiting the bit error rate (BER) degradation to 0.3 dB with K and Rlimit equal to 4. The proposed VLSI architecture is modeled in Verilog HDL using Xilinx and synthesized using Synopsys Design Vision in 45 nm CMOS technology. According to the synthesize result, it achieves 1090.8 Mbps throughput with power consumption of 580 mW and latency of 0.33 us. The maximum frequency the design proposed is 181.8 MHz. All of the proposed decoders mentioned above are bounded by the fixed K. Hence, an adaptive real domain K-Best decoder is further developed to achieve the similar performance with less K, thereby reducing the computational complexity of the decoder. It does not require accurate SNR measurement to perform the initial estimation of list size, K. Instead, the difference between the first two minimal distances is considered, which inherently eliminates complexity. In summary, a novel iterative K-Best detector for both real and complex domain with efficient VLSI design is proposed in this dissertation. The results from extensive simulation and VHDL with analysis using Synopsys tool are also presented for justification and validation of the proposed works
    corecore