1,869 research outputs found

    Large-Scale MIMO Detection for 3GPP LTE: Algorithms and FPGA Implementations

    Large-scale (or massive) multiple-input multiple-output (MIMO) is expected to be one of the key technologies in next-generation multi-user cellular systems, based on the upcoming 3GPP LTE Release 12 standard, for example. In this work, we propose, to the best of our knowledge, the first VLSI design enabling high-throughput data detection in single-carrier frequency-division multiple access (SC-FDMA)-based large-scale MIMO systems. We propose a new approximate matrix inversion algorithm relying on a Neumann series expansion, which substantially reduces the complexity of linear data detection. We analyze the associated error, and we compare its performance and complexity to those of an exact linear detector. We present corresponding VLSI architectures, which perform exact and approximate soft-output detection for large-scale MIMO systems with various antenna/user configurations. Reference implementation results for a Xilinx Virtex-7 XC7VX980T FPGA show that our designs are able to achieve more than 600 Mb/s for a 128-antenna, 8-user 3GPP LTE-based large-scale MIMO system. We finally provide a performance/complexity trade-off comparison using the presented FPGA designs, which reveals that the detector circuit of choice is determined by the ratio between BS antennas and users, as well as the desired error-rate performance. Comment: To appear in the IEEE Journal of Selected Topics in Signal Processing.
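The abstract above names a Neumann series expansion as the basis of its approximate matrix inversion. As a rough illustration of that general idea only (not the paper's algorithm or its VLSI architecture), the sketch below splits a matrix A into its diagonal D and off-diagonal part E and sums the first few terms of A^{-1} = sum_n (-D^{-1}E)^n D^{-1}. The function names, the 3x3 test matrix, and the number of terms are all illustrative assumptions; the series converges only when the matrix is sufficiently diagonally dominant.

```python
def mat_mul(a, b):
    """Multiply two dense matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def neumann_inverse(a, terms):
    """Approximate inv(a) with the first `terms` terms of a Neumann series.

    Assumes a = D + E with D the diagonal part, and that the spectral
    radius of D^{-1} E is below 1 (e.g. a is strongly diagonally dominant).
    """
    n = len(a)
    d_inv = [[1.0 / a[i][i] if i == j else 0.0 for j in range(n)]
             for i in range(n)]
    e = [[0.0 if i == j else a[i][j] for j in range(n)] for i in range(n)]
    # Iteration matrix M = -D^{-1} E.
    m = [[-x for x in row] for row in mat_mul(d_inv, e)]
    term = d_inv                      # n = 0 term: D^{-1}
    total = [row[:] for row in d_inv]
    for _ in range(1, terms):
        term = mat_mul(m, term)       # next term: M^n D^{-1}
        total = [[total[i][j] + term[i][j] for j in range(n)]
                 for i in range(n)]
    return total


def max_residual(a, a_inv):
    """Largest absolute entry of (a @ a_inv - I)."""
    prod = mat_mul(a, a_inv)
    n = len(a)
    return max(abs(prod[i][j] - (1.0 if i == j else 0.0))
               for i in range(n) for j in range(n))


# Hypothetical diagonally dominant matrix, chosen only for illustration.
A = [[4.0, 1.0, 0.0],
     [1.0, 5.0, 1.0],
     [0.0, 1.0, 6.0]]
err_1 = max_residual(A, neumann_inverse(A, 1))
err_8 = max_residual(A, neumann_inverse(A, 8))
```

Comparing `err_1` with `err_8` shows the trade-off the abstract alludes to: each extra term costs one more matrix product but shrinks the inversion error, and the shrinkage is faster the more diagonally dominant the matrix is.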

    A survey of the state-of-the-art and focused research in range systems

    In this one-year renewal of NASA Contract No. 2-304, basic research, development, and implementation in the areas of modern estimation algorithms and digital communication systems have been performed. In the first area, a basic study on the conversion of general classes of practical signal processing algorithms into systolic array algorithms is considered, producing four publications. Also studied were the finite word length effects and convergence rates of lattice algorithms, producing two publications. In the second area, the use of an efficient importance sampling simulation technique for evaluating digital communication system performance was studied, producing two publications.
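The abstract above mentions importance sampling for evaluating communication system performance but gives no details. As a generic, hedged illustration of why the technique helps with rare error events (not the method used in the contract work), the sketch below estimates the Gaussian tail probability P(X > t) by drawing samples from a proposal shifted to N(t, 1) and reweighting them by the likelihood ratio. The threshold, sample count, seed, and function names are assumptions made for this example.

```python
import math
import random


def tail_prob_exact(t):
    """Exact P(X > t) for X ~ N(0, 1), via the complementary error function."""
    return 0.5 * math.erfc(t / math.sqrt(2.0))


def tail_prob_importance_sampling(t, n_samples, seed=0):
    """Estimate P(X > t) by sampling from N(t, 1) instead of N(0, 1).

    Each sample x that lands beyond the threshold contributes the likelihood
    ratio p(x) / q(x) = exp(-t * x + t**2 / 2), where p is the N(0, 1)
    density and q is the shifted N(t, 1) density.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        x = rng.gauss(t, 1.0)
        if x > t:
            total += math.exp(-t * x + 0.5 * t * t)
    return total / n_samples


# Illustrative settings: a rare event that plain Monte Carlo with 20,000
# samples would usually not observe even once.
estimate = tail_prob_importance_sampling(4.0, 20000)
exact = tail_prob_exact(4.0)
```

With the shifted proposal roughly half the samples fall in the tail region, so a modest sample count is enough to resolve a probability on the order of 1e-5.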

    A bibliography on parallel and vector numerical algorithms

    This is a bibliography of numerical methods. It also includes a number of other references on machine architecture, programming languages, and other topics of interest to scientific computing. Certain conference proceedings and anthologies that have been published in book form are also listed.

    Group implicit concurrent algorithms in nonlinear structural dynamics

    During the 1970s and 1980s, considerable effort was devoted to developing efficient and reliable time stepping procedures for transient structural analysis. Mathematically, the equations governing this type of problem are generally stiff, i.e., they exhibit a wide spectrum in the linear range. The algorithms best suited to this type of application are those which accurately integrate the low frequency content of the response without necessitating the resolution of the high frequency modes. This means that the algorithms must be unconditionally stable, which in turn rules out explicit integration. The most exciting possibility in the algorithm development area in recent years has been the advent of parallel computers with multiprocessing capabilities. This work is therefore mainly concerned with the development of parallel algorithms in the area of structural dynamics. A primary objective is to devise unconditionally stable and accurate time stepping procedures which lend themselves to an efficient implementation on concurrent machines. Some features of the new computer architecture are summarized. A brief survey of current efforts in the area is presented. A new class of concurrent procedures, or Group Implicit (GI) algorithms, is introduced and analyzed. The numerical simulation shows that GI algorithms hold considerable promise for application on coarse grain as well as medium grain parallel computers.
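The claim above that stiffness rules out explicit integration can be seen on the scalar test equation y' = -lam * y. This is a minimal sketch under stated assumptions: it uses forward and backward Euler on a first-order model problem, not the Group Implicit algorithms or the second-order structural equations of the abstract, and the values of `lam`, `h`, and `steps` are chosen only for illustration.

```python
def forward_euler(lam, h, steps, y0=1.0):
    """Explicit Euler for y' = -lam * y; stable only if h * lam <= 2."""
    y = y0
    for _ in range(steps):
        y = y + h * (-lam * y)
    return y


def backward_euler(lam, h, steps, y0=1.0):
    """Implicit Euler for y' = -lam * y; stable for any step size h > 0."""
    y = y0
    for _ in range(steps):
        # Solve y_new = y + h * (-lam * y_new) for y_new.
        y = y / (1.0 + h * lam)
    return y


# A fast-decaying (high-frequency-like) mode with a step size far too
# large for it: h * lam = 10.
explicit_result = forward_euler(1000.0, 0.01, 20)
implicit_result = backward_euler(1000.0, 0.01, 20)
```

The explicit update multiplies the solution by (1 - h * lam) = -9 each step and diverges, while the implicit update multiplies it by 1 / (1 + h * lam) = 1/11 and decays, which is the behavior an unconditionally stable scheme needs when the step size is set by the slow modes.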

    Solution of partial differential equations on vector and parallel computers

    The present status of numerical methods for partial differential equations on vector and parallel computers is reviewed. The relevant aspects of these computers are discussed and a brief review of their development is included, with particular attention paid to those characteristics that influence algorithm selection. Both direct and iterative methods are given for elliptic equations, as well as explicit and implicit methods for initial boundary value problems. The intent is to point out attractive methods as well as areas where this class of computer architecture cannot be fully utilized because of either hardware restrictions or the lack of adequate algorithms. Application areas utilizing these computers are briefly discussed.

    A Many-Core Overlay for High-Performance Embedded Computing on FPGAs

    In this work, we propose a configurable many-core overlay for high-performance embedded computing. The size of internal memory, supported operations and number of ports can be configured independently for each core of the overlay. The overlay was evaluated with matrix multiplication, LU decomposition and fast Fourier transform (FFT) on a ZYNQ-7020 FPGA platform. The results show that using a system-level many-core overlay avoids complex hardware design and still provides good performance results. Comment: Presented at First International Workshop on FPGAs for Software Programmers (FSP 2014) (arXiv:1408.4423)

    Parallelization techniques for scientific and engineering applications and implementation of the boundary element method (BEM)

    This dissertation reports the implementation of a boundary element method (BEM) application on the massively parallel MasPar MP-1 and MP-2 computers. That implementation provides a case study to demonstrate several techniques for parallelization of sequential algorithms and for optimization of parallel programs. An existing formal technique for transforming a sequential algorithm into a systolic architecture is presented. This dissertation then discusses how a parallel systolic algorithm on a mesh-connected computer can be derived from such a systolic architecture. The matrix multiplication algorithm used in the BEM implementation is derived in this way. As part of the BEM implementation, this dissertation covers a novel method of solving a system of linear equations, using matrix inversion and LU decomposition. This method is shown to be less expensive than LU decomposition alone. Several parallelizations of matrix inversion are considered. Finally, this dissertation presents techniques for transforming parallel program source code to increase performance. The transformation improves performance by decreasing processor local memory access cost and by increasing processor utilization.