1,869 research outputs found
Large-Scale MIMO Detection for 3GPP LTE: Algorithms and FPGA Implementations
Large-scale (or massive) multiple-input multiple-output (MIMO) is expected to
be one of the key technologies in next-generation multi-user cellular systems,
based on the upcoming 3GPP LTE Release 12 standard, for example. In this work,
we propose - to the best of our knowledge - the first VLSI design enabling
high-throughput data detection in single-carrier frequency-division multiple
access (SC-FDMA)-based large-scale MIMO systems. We propose a new approximate
matrix inversion algorithm relying on a Neumann series expansion, which
substantially reduces the complexity of linear data detection. We analyze the
associated error, and we compare its performance and complexity to those of an
exact linear detector. We present corresponding VLSI architectures, which
perform exact and approximate soft-output detection for large-scale MIMO
systems with various antenna/user configurations. Reference implementation
results for a Xilinx Virtex-7 XC7VX980T FPGA show that our designs are able to
achieve more than 600 Mb/s for a 128 antenna, 8 user 3GPP LTE-based large-scale
MIMO system. We finally provide a performance/complexity trade-off comparison
using the presented FPGA designs, which reveals that the detector circuit of
choice is determined by the ratio between BS antennas and users, as well as the
desired error-rate performance.Comment: To appear in the IEEE Journal of Selected Topics in Signal Processin
A survey of the state-of-the-art and focused research in range systems
In this one-year renewal of NASA Contract No. 2-304, basic research, development, and implementation in the areas of modern estimation algorithms and digital communication systems have been performed. In the first area, basic study on the conversion of general classes of practical signal processing algorithms into systolic array algorithms is considered, producing four publications. Also studied were the finite word length effects and convergence rates of lattice algorithms, producing two publications. In the second area of study, the use of efficient importance sampling simulation technique for the evaluation of digital communication system performances were studied, producing two publications
A bibliography on parallel and vector numerical algorithms
This is a bibliography of numerical methods. It also includes a number of other references on machine architecture, programming language, and other topics of interest to scientific computing. Certain conference proceedings and anthologies which have been published in book form are listed also
Group implicit concurrent algorithms in nonlinear structural dynamics
During the 70's and 80's, considerable effort was devoted to developing efficient and reliable time stepping procedures for transient structural analysis. Mathematically, the equations governing this type of problems are generally stiff, i.e., they exhibit a wide spectrum in the linear range. The algorithms best suited to this type of applications are those which accurately integrate the low frequency content of the response without necessitating the resolution of the high frequency modes. This means that the algorithms must be unconditionally stable, which in turn rules out explicit integration. The most exciting possibility in the algorithms development area in recent years has been the advent of parallel computers with multiprocessing capabilities. So, this work is mainly concerned with the development of parallel algorithms in the area of structural dynamics. A primary objective is to devise unconditionally stable and accurate time stepping procedures which lend themselves to an efficient implementation in concurrent machines. Some features of the new computer architecture are summarized. A brief survey of current efforts in the area is presented. A new class of concurrent procedures, or Group Implicit algorithms is introduced and analyzed. The numerical simulation shows that GI algorithms hold considerable promise for application in coarse grain as well as medium grain parallel computers
Solution of partial differential equations on vector and parallel computers
The present status of numerical methods for partial differential equations on vector and parallel computers was reviewed. The relevant aspects of these computers are discussed and a brief review of their development is included, with particular attention paid to those characteristics that influence algorithm selection. Both direct and iterative methods are given for elliptic equations as well as explicit and implicit methods for initial boundary value problems. The intent is to point out attractive methods as well as areas where this class of computer architecture cannot be fully utilized because of either hardware restrictions or the lack of adequate algorithms. Application areas utilizing these computers are briefly discussed
A Many-Core Overlay for High-Performance Embedded Computing on FPGAs
In this work, we propose a configurable many-core overlay for
high-performance embedded computing. The size of internal memory, supported
operations and number of ports can be configured independently for each core of
the overlay. The overlay was evaluated with matrix multiplication, LU
decomposition and Fast-Fourier Transform (FFT) on a ZYNQ-7020 FPGA platform.
The results show that using a system-level many-core overlay avoids complex
hardware design and still provides good performance results.Comment: Presented at First International Workshop on FPGAs for Software
Programmers (FSP 2014) (arXiv:1408.4423
Parallelization techniques for scientific and engineering applications and implementation of the boundary element method (BEM)
This dissertation reports the implementation of a boundary element method (BEM) application on the massively parallel MasPar MP-1 and MP-2 computers. That implementation provides a case study to demonstrate several techniques for parallelization of sequential algorithms and for optimization of parallel programs;An existing formal technique for transforming a sequential algorithm into a systolic architecture is presented. This dissertation then discusses how a parallel systolic algorithm on a mesh-connected computer can be derived from such a systolic Architecture; The matrix multiplication algorithm used in the BEM implementation is derived in this way;As part of the BEM implementation, this dissertation covers a novel method of solving a system of linear equations, using matrix inversion and LU decomposition. This method is shown to be less expensive than LU decomposition alone. Several parallelizations of matrix inversion are considered;Finally, this dissertation presents techniques for transforming parallel program source code to increase performance. The transformation improves performance by decreasing processor local memory access cost and by increasing processor utilization
- …