1,991 research outputs found

    A VLSI pipeline design of a fast prime factor DFT on a finite field

    Get PDF
    A conventional prime factor discrete Fourier transform (DFT) algorithm is used to realize a discrete Fourier-like transform on the finite field, GF(q sub n). A pipeline structure is used to implement this prime factor DFT over GF(q sub n). This algorithm is developed to compute cyclic convolutions of complex numbers and to decode Reed-Solomon codes. Such a pipeline fast prime factor DFT algorithm over GF(q sub n) is regular, simple, expandable, and naturally suitable for VLSI implementation. An example illustrating the pipeline aspect of a 30-point transform over GF(q sub n) is presented

    A survey of the state of the art and focused research in range systems, task 2

    Get PDF
    Contract generated publications are compiled which describe the research activities for the reporting period. Study topics include: equivalent configurations of systolic arrays; least squares estimation algorithms with systolic array architectures; modeling and equilization of nonlinear bandlimited satellite channels; and least squares estimation and Kalman filtering by systolic arrays

    Large-Scale MIMO Detection for 3GPP LTE: Algorithms and FPGA Implementations

    Full text link
    Large-scale (or massive) multiple-input multiple-output (MIMO) is expected to be one of the key technologies in next-generation multi-user cellular systems, based on the upcoming 3GPP LTE Release 12 standard, for example. In this work, we propose - to the best of our knowledge - the first VLSI design enabling high-throughput data detection in single-carrier frequency-division multiple access (SC-FDMA)-based large-scale MIMO systems. We propose a new approximate matrix inversion algorithm relying on a Neumann series expansion, which substantially reduces the complexity of linear data detection. We analyze the associated error, and we compare its performance and complexity to those of an exact linear detector. We present corresponding VLSI architectures, which perform exact and approximate soft-output detection for large-scale MIMO systems with various antenna/user configurations. Reference implementation results for a Xilinx Virtex-7 XC7VX980T FPGA show that our designs are able to achieve more than 600 Mb/s for a 128 antenna, 8 user 3GPP LTE-based large-scale MIMO system. We finally provide a performance/complexity trade-off comparison using the presented FPGA designs, which reveals that the detector circuit of choice is determined by the ratio between BS antennas and users, as well as the desired error-rate performance.Comment: To appear in the IEEE Journal of Selected Topics in Signal Processin

    Real-time refocusing using an FPGA-based standard plenoptic camera

    Get PDF
    Plenoptic cameras are receiving increased attention in scientific and commercial applications because they capture the entire structure of light in a scene, enabling optical transforms (such as focusing) to be applied computationally after the fact, rather than once and for all at the time a picture is taken. In many settings, real-time inter active performance is also desired, which in turn requires significant computational power due to the large amount of data required to represent a plenoptic image. Although GPUs have been shown to provide acceptable performance for real-time plenoptic rendering, their cost and power requirements make them prohibitive for embedded uses (such as in-camera). On the other hand, the computation to accomplish plenoptic rendering is well structured, suggesting the use of specialized hardware. Accordingly, this paper presents an array of switch-driven finite impulse response filters, implemented with FPGA to accomplish high-throughput spatial-domain rendering. The proposed architecture provides a power-efficient rendering hardware design suitable for full-video applications as required in broadcasting or cinematography. A benchmark assessment of the proposed hardware implementation shows that real-time performance can readily be achieved, with a one order of magnitude performance improvement over a GPU implementation and three orders ofmagnitude performance improvement over a general-purpose CPU implementation

    Number theoretic techniques applied to algorithms and architectures for digital signal processing

    Get PDF
    Many of the techniques for the computation of a two-dimensional convolution of a small fixed window with a picture are reviewed. It is demonstrated that Winograd's cyclic convolution and Fourier Transform Algorithms, together with Nussbaumer's two-dimensional cyclic convolution algorithms, have a common general form. Many of these algorithms use the theoretical minimum number of general multiplications. A novel implementation of these algorithms is proposed which is based upon one-bit systolic arrays. These systolic arrays are networks of identical cells with each cell sharing a common control and timing function. Each cell is only connected to its nearest neighbours. These are all attractive features for implementation using Very Large Scale Integration (VLSI). The throughput rate is only limited by the time to perform a one-bit full addition. In order to assess the usefulness to these systolic arrays a 'cost function' is developed to compare them with more conventional techniques, such as the Cooley-Tukey radix-2 Fast Fourier Transform (FFT). The cost function shows that these systolic arrays offer a good way of implementing the Discrete Fourier Transform for transforms up to about 30 points in length. The cost function is a general tool and allows comparisons to be made between different implementations of the same algorithm and between dissimilar algorithms. Finally a technique is developed for the derivation of Discrete Cosine Transform (DCT) algorithms from the Winograd Fourier Transform Algorithm. These DCT algorithms may be implemented by modified versions of the systolic arrays proposed earlier, but requiring half the number of cells

    DFT algorithms for bit-serial GaAs array processor architectures

    Get PDF
    Systems and Processes Engineering Corporation (SPEC) has developed an innovative array processor architecture for computing Fourier transforms and other commonly used signal processing algorithms. This architecture is designed to extract the highest possible array performance from state-of-the-art GaAs technology. SPEC's architectural design includes a high performance RISC processor implemented in GaAs, along with a Floating Point Coprocessor and a unique Array Communications Coprocessor, also implemented in GaAs technology. Together, these data processors represent the latest in technology, both from an architectural and implementation viewpoint. SPEC has examined numerous algorithms and parallel processing architectures to determine the optimum array processor architecture. SPEC has developed an array processor architecture with integral communications ability to provide maximum node connectivity. The Array Communications Coprocessor embeds communications operations directly in the core of the processor architecture. A Floating Point Coprocessor architecture has been defined that utilizes Bit-Serial arithmetic units, operating at very high frequency, to perform floating point operations. These Bit-Serial devices reduce the device integration level and complexity to a level compatible with state-of-the-art GaAs device technology

    A bibliography on parallel and vector numerical algorithms

    Get PDF
    This is a bibliography of numerical methods. It also includes a number of other references on machine architecture, programming language, and other topics of interest to scientific computing. Certain conference proceedings and anthologies which have been published in book form are listed also

    An improved architecture for the adaptive discrete cosine transform

    Get PDF

    A 64-point Fourier transform chip for high-speed wireless LAN application using OFDM

    No full text
    In this article, we present a novel fixed-point 16-bit word-width 64-point FFT/IFFT processor developed primarily for the application in the OFDM based IEEE 802.11a Wireless LAN (WLAN) baseband processor. The 64-point FFT is realized by decomposing it into a 2-D structure of 8-point FFTs. This approach reduces the number of required complex multiplications compared to the conventional radix-2 64-point FFT algorithm. The complex multiplication operations are realized using shift-and-add operations. Thus, the processor does not use any 2-input digital multiplier. It also does not need any RAM or ROM for internal storage of coefficients. The proposed 64-point FFT/IFFT processor has been fabricated and tested successfully using our in-house 0.25 ?m BiCMOS technology. The core area of this chip is 6.8 mm2. The average dynamic power consumption is 41 mW @ 20 MHz operating frequency and 1.8 V supply voltage. The processor completes one parallel-to-parallel (i. e., when all input data are available in parallel and all output data are generated in parallel) 64-point FFT computation in 23 cycles. These features show that though it has been developed primarily for application in the IEEE 802.11a standard, it can be used for any application that requires fast operation as well as low power consumption
    • …
    corecore