    A Scalable VLSI Architecture for Soft-Input Soft-Output Depth-First Sphere Decoding

    Multiple-input multiple-output (MIMO) wireless transmission imposes huge challenges on the design of efficient hardware architectures for iterative receivers. A major challenge is soft-input soft-output (SISO) MIMO demapping, often approached by sphere decoding (SD). In this paper, we introduce what is, to the best of our knowledge, the first VLSI architecture for SISO SD applying a single tree-search approach. Compared with a soft-output-only base architecture similar to the one proposed by Studer et al. in IEEE J-SAC 2008, the architectural modifications for soft input still allow a one-node-per-cycle execution. For a 4x4 16-QAM system, the area increases by 57% and the operating frequency degrades by only 34%. Comment: Accepted for IEEE Transactions on Circuits and Systems II: Express Briefs, May 2010. This draft from April 2010 will not be updated any more. Please refer to IEEE Xplore for the final version. The final publication will appear with the modified title "A Scalable VLSI Architecture for Soft-Input Soft-Output Single Tree-Search Sphere Decoding".
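
    The depth-first tree search that sphere decoding relies on can be illustrated with a minimal hard-output sketch. It is not the SISO single tree-search architecture of the paper: it assumes a real-valued system model, an upper-triangular matrix R obtained from a QR decomposition of the channel, and an already rotated receive vector y. The names sphere_decode, R, y and alphabet are hypothetical and chosen only for this illustration.

```python
def sphere_decode(R, y, alphabet):
    """Depth-first search for the symbol vector s minimising ||y - R s||^2.

    Assumptions (not from the paper): real-valued model, R upper triangular
    with non-zero diagonal, hard output only.
    """
    n = len(y)
    best = {"metric": float("inf"), "symbols": None}
    s = [None] * n  # symbols decided so far (filled from the last row upwards)

    def search(level, partial_metric):
        if level < 0:
            # Reached a leaf: it is better than the best so far, so shrink the radius.
            best["metric"] = partial_metric
            best["symbols"] = list(s)
            return
        # Contribution of the symbols already fixed on deeper rows.
        interference = sum(R[level][j] * s[j] for j in range(level + 1, n))
        residual = y[level] - interference
        # Visit children in order of increasing distance increment.
        children = sorted(alphabet, key=lambda a: abs(residual - R[level][level] * a))
        for a in children:
            metric = partial_metric + abs(residual - R[level][level] * a) ** 2
            if metric >= best["metric"]:
                break  # children are sorted, so the remaining ones are pruned too
            s[level] = a
            search(level - 1, metric)

    search(n - 1, 0.0)
    return best["symbols"], best["metric"]


# Noise-free example: y = R @ [1, -3, 3]
R = [[2.0, 0.5, -0.25], [0.0, 1.5, 0.5], [0.0, 0.0, 1.0]]
y = [-0.25, -3.0, 3.0]
print(sphere_decode(R, y, [-3, -1, 1, 3]))  # → ([1, -3, 3], 0.0)
```

    In this sketch each call to search visits one tree node, which loosely mirrors the one-node-per-cycle execution mentioned in the abstract; a soft-input soft-output decoder would additionally track per-bit counter-hypotheses and include a priori information in the metric.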

    Probabilistically Bounded Soft Sphere Detection for MIMO-OFDM Receivers: Algorithm and System Architecture

    Iterative soft detection and channel decoding for MIMO OFDM downlink receivers is studied in this work. The proposed inner soft sphere detection employs a variable upper bound on the number of candidates per transmit antenna and utilizes the breadth-first candidate-search algorithm. The upper bounds are based on the probability distribution of the number of candidates found inside the spherical region formed around the received symbol vector. The detection accuracy of unbounded breadth-first candidate search is preserved, while a significant reduction of the search latency and area cost is achieved. This probabilistically bounded candidate-search algorithm improves the error-rate performance of non-probabilistically bounded soft sphere detection algorithms, while providing smaller detection latency with the same hardware resources. A prototype architecture of the soft sphere detector is synthesized on a Xilinx FPGA and for an ASIC design. Using the area cost of a single soft sphere detector, the level of processing parallelism required to achieve the targeted high data rates of future wireless systems (for example, a 1 Gbps data rate) is determined.

    Efficient VLSI Implementation of Soft-input Soft-output Fixed-complexity Sphere Decoder

    The fixed-complexity sphere decoder (FSD) is one of the most promising techniques for the implementation of multiple-input multiple-output (MIMO) detection, with relevant advantages in terms of constant throughput and high flexibility of parallel architecture. The reported works on FSD are mainly based on software-level simulations, and few details have been provided on hardware implementation. The authors present a study based on a four-nodes-per-cycle parallel FSD architecture with several examples of VLSI implementation in 4 × 4 systems with both 16-quadrature amplitude modulation (QAM) and 64-QAM and with both real and complex signal models. The implementation aspects and details of the architecture are analysed in order to provide a variety of performance-complexity trade-offs. The authors also provide a parallel implementation of a log-likelihood-ratio (LLR) generator with an optimised algorithm to extend the proposed FSD architecture into a soft-input soft-output (SISO) MIMO detector. To the authors' best knowledge, this is the first complete VLSI implementation of an FSD-based SISO MIMO detector. The implementation results show that the proposed SISO FSD architecture is highly efficient and flexible, making it very suitable for real applications.

    Parallel Searching-Based Sphere Detector for MIMO Downlink OFDM Systems

    In this paper, the implementation of a detector with a parallel partial candidate-search algorithm is described. Two fully independent partial candidate-search processes are simultaneously employed for two groups of transmit antennas, based on the QR decomposition (QRD) and QL decomposition (QLD) of a multiple-input multiple-output (MIMO) channel matrix. By using separate simultaneous candidate-search processes, the proposed implementation of the QRD-QLD searching-based sphere detector provides a smaller latency and a lower computational complexity than the original QRD-M detector for similar error-rate performance in wireless communication systems employing four transmit and four receive antennas with a 16-QAM or 64-QAM constellation. It is shown that in coded MIMO orthogonal frequency division multiplexing (MIMO OFDM) systems, the detection latency and computational complexity of a receiver can be substantially reduced by using the proposed QRD-QLD detector implementation. The QRD-QLD-based sphere detector is also implemented on a field-programmable gate array (FPGA) and as an application-specific integrated circuit (ASIC), and its hardware design complexity is compared with that of other sphere detectors reported in the literature.

    A High Throughput Configurable SDR Detector for Multi-user MIMO Wireless Systems

    Spatial division multiplexing (SDM) in MIMO technology significantly increases the spectral efficiency, and hence capacity, of a wireless communication system: it is a core component of next-generation wireless systems, e.g. WiMAX, 3GPP LTE and other OFDM-based communication schemes. Moreover, spatial division multiple access (SDMA) is one of the widely used techniques for sharing the wireless medium between different mobile devices. Sphere detection is a prominent method of reducing the detection complexity in both SDM and SDMA systems while maintaining BER performance comparable with optimum maximum-likelihood (ML) detection. On the other hand, with different standards supporting different system parameters, it is crucial for both base stations and handset devices to be configurable and to switch seamlessly between different modes without the need for separate dedicated hardware units. This challenge emphasizes the need for software-defined radio (SDR) designs that target handset devices. In this paper, we propose the architecture and FPGA realization of a configurable sort-free sphere detector, Flex-Sphere, that supports 4-, 16- and 64-QAM modulations as well as combinations of 2, 3 and 4 antenna/user configurations for handsets. The detector provides a data rate of up to 857.1 Mbps, which fits well within the requirements of the next-generation wireless standards. The algorithmic optimizations employed to produce an FPGA-friendly realization are discussed.

    High-Throughput Soft-Output MIMO Detector Based on Path-Preserving Trellis-Search Algorithm

    In this paper, we propose a novel path-preserving trellis-search (PPTS) algorithm and its high-speed VLSI architecture for soft-output multiple-input multiple-output (MIMO) detection. We represent the search space of the MIMO signal with an unconstrained trellis, where each node in a given stage of the trellis maps to a possible complex-valued symbol transmitted by the corresponding antenna. Based on the trellis model, we convert the soft-output MIMO detection problem into a multiple-shortest-paths problem subject to the constraint that every trellis node must be covered by this set of paths. The PPTS detector is guaranteed to have soft information for every possible symbol transmitted on every antenna, so that the log-likelihood ratio (LLR) for each transmitted data bit can be formed more accurately. Simulation results show that the PPTS algorithm can achieve near-optimal error performance with a low search complexity. The PPTS algorithm is a hardware-friendly data-parallel algorithm because the search operations are evenly distributed among multiple trellis nodes for parallel processing. As a case study, we have designed and synthesized a fully parallel systolic-array detector and two folded detectors for a 4x4 16-QAM system using a 1.08 V TSMC 65-nm CMOS technology. With a 1.18 mm² core area, the folded detector can achieve a throughput of 2.1 Gbps. With a 3.19 mm² core area, the fully parallel systolic-array detector can achieve a throughput of 6.4 Gbps.
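
    Why it matters that every symbol on every antenna is covered by some surviving path can be seen from a generic max-log LLR computation over a candidate list. The sketch below is not the PPTS algorithm itself. The function name max_log_llrs, the candidate format, the sign convention (positive values favour bit 1) and the clipping value are all assumptions made for illustration.

```python
def max_log_llrs(candidates, num_bits, noise_var, clip=8.0):
    """Max-log LLRs from a list of (bits, metric) candidates.

    bits   : tuple of 0/1 values labelling the whole symbol vector
    metric : Euclidean distance metric of that candidate
    Assumed convention: LLR > 0 means bit value 1 is more likely.
    """
    llrs = []
    for k in range(num_bits):
        # Best (smallest) metric among candidates with bit k equal to 0 and to 1.
        best0 = min((m for bits, m in candidates if bits[k] == 0), default=None)
        best1 = min((m for bits, m in candidates if bits[k] == 1), default=None)
        if best0 is None:
            llr = clip        # no counter-hypothesis for bit k = 0: fall back to clipping
        elif best1 is None:
            llr = -clip       # no counter-hypothesis for bit k = 1: fall back to clipping
        else:
            llr = max(-clip, min(clip, (best0 - best1) / noise_var))
        llrs.append(llr)
    return llrs


candidates = [((0, 1), 0.5), ((1, 1), 2.5), ((0, 0), 1.5)]
print(max_log_llrs(candidates, num_bits=2, noise_var=1.0))  # → [-2.0, 1.0]
```

    When a list lacks a candidate for one bit value, the sketch can only return the clipping value. A search that keeps at least one path through every trellis node avoids that fallback, which is consistent with the abstract's claim that the LLRs can be formed more accurately.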

    A Parallel Radix-Sort-Based VLSI Architecture for Finding the First W Maximum/Minimum Values

    Very-large-scale integration (VLSI) architectures for finding the first W (W>2) maximum (or minimum) values are required in the implementation of several applications such as nonbinary low-density parity-check decoders, K-best multiple-input–multiple-output (MIMO) detectors, and turbo product codes. In this brief, a parallel radix-sort-based VLSI architecture for finding the first W maximum (or minimum) values is proposed. The described architecture, called the Bit-Wise-And (BWA) architecture, relies on analyzing input data from the most significant bit to the least significant one, with very simple logic circuits. One key feature of the BWA architecture is its high level of scalability, which enables the adoption of this solution in a large spectrum of applications, corresponding to large ranges for both W and the size of the input data set. Experimental results, achieved by implementing the proposed architecture on a high-speed 90-nm CMOS standard-cell technology, show that the BWA architecture requires significantly less area than other solutions available in the literature, i.e., about 50% or less in all the considered cases, reaching about 50% in the worst case. Moreover, the BWA architecture exhibits the lowest area–delay product in almost all considered cases.
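
    The idea of scanning the inputs from the most significant bit to the least significant one can be sketched in software as a radix-style selection. This is only an illustration of the MSB-first principle, not the BWA circuit described in the brief. The function name top_w_msb_first and its parameters are hypothetical, and the inputs are assumed to be non-negative integers that fit in the given number of bits.

```python
def top_w_msb_first(values, w, bits):
    """Select the w largest values by examining one bit position at a time.

    Assumptions: values are non-negative integers below 2**bits and
    1 <= w <= len(values). The result is not sorted.
    """
    if not 0 < w <= len(values):
        raise ValueError("w must be between 1 and len(values)")
    candidates = list(range(len(values)))  # indices still undecided
    selected = []                          # indices already known to be in the top w
    need = w                               # how many more values must be selected
    for b in range(bits - 1, -1, -1):      # MSB first
        ones = [i for i in candidates if (values[i] >> b) & 1]
        zeros = [i for i in candidates if not (values[i] >> b) & 1]
        if len(ones) >= need:
            # Enough candidates have this bit set: the rest cannot be in the top w.
            candidates = ones
        else:
            # Every candidate with this bit set is in the top w; keep looking among the rest.
            selected.extend(ones)
            need -= len(ones)
            candidates = zeros
    # Remaining candidates are equal in all examined bits; take as many as still needed.
    selected.extend(candidates[:need])
    return [values[i] for i in selected]


print(sorted(top_w_msb_first([5, 3, 9, 1, 7, 9], w=3, bits=4), reverse=True))  # → [9, 9, 7]
```

    For the minimum values, the same sketch can be applied with the roles of the set and cleared bits swapped.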

    Performance - Complexity Comparison of Receivers for a LTE MIMO–OFDM System

    The implementation of receivers for spatial-multiplexing multiple-input multiple-output (MIMO) orthogonal frequency-division multiplexing (OFDM) systems is considered. The linear minimum mean-square error (LMMSE) detector and the K-best list sphere detector (LSD) are compared to the iterative successive interference cancellation (SIC) detector and the iterative K-best LSD. The performance of the algorithms is evaluated in a 3G long-term evolution (LTE) system. The SIC algorithm is found to perform worse than the K-best LSD when the MIMO channels are highly correlated, while the performance difference diminishes as the correlation decreases. The receivers are designed for 2X2 and 4X4 antenna systems and three different modulation schemes. Complexity results for FPGA and ASIC implementations are reported. A modification to the K-best LSD which increases its detection rate is introduced. The ASIC receivers are designed to meet the decoding throughput requirements of LTE, and the K-best LSD is found to be the most complex receiver, although it delivers the highest reliable data throughput. The SIC receiver has the best performance–complexity tradeoff in the 2X2 system, but in the 4X4 case the K-best LSD is the most efficient. A receiver architecture that could be reconfigured to use a simple or a more complex detector as the channel conditions change would achieve the best performance while consuming the least amount of power in the receiver.
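
    The K-best breadth-first search behind a list sphere detector can be sketched as follows. This is a minimal illustration under stated assumptions, not the receiver evaluated in the paper: it uses a real-valued model with an upper-triangular matrix R from a QR decomposition, and the names k_best_detect, R, y, alphabet and k are hypothetical.

```python
def k_best_detect(R, y, alphabet, k):
    """Breadth-first K-best search for small ||y - R s||^2.

    Returns up to k survivors as (metric, symbols) pairs, best first.
    Assumptions: real-valued model, R upper triangular.
    """
    n = len(y)
    # Each survivor is (partial metric, symbols decided so far for rows level+1 .. n-1).
    survivors = [(0.0, ())]
    for level in range(n - 1, -1, -1):
        expanded = []
        for metric, tail in survivors:
            # tail[i] is the symbol decided for row level + 1 + i.
            interference = sum(R[level][level + 1 + i] * sym for i, sym in enumerate(tail))
            for a in alphabet:
                increment = abs(y[level] - interference - R[level][level] * a) ** 2
                expanded.append((metric + increment, (a,) + tail))
        # Keep only the k partial candidates with the smallest metrics.
        expanded.sort(key=lambda c: c[0])
        survivors = expanded[:k]
    return survivors


# Noise-free example: y = R @ [1, -3, 3]
R = [[2.0, 0.5, -0.25], [0.0, 1.5, 0.5], [0.0, 0.0, 1.0]]
y = [-0.25, -3.0, 3.0]
print(k_best_detect(R, y, [-3, -1, 1, 3], k=2)[0])  # → (0.0, (1, -3, 3))
```

    The number of nodes visited per level is fixed by k, which gives a constant detection rate at the price of sorting effort. The returned list of survivors is the kind of candidate list from which soft outputs are usually derived.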
