884 research outputs found

    Fast Fourier Transform algorithm design and tradeoffs

    Get PDF
    The Fast Fourier Transform (FFT) is a mainstay of certain numerical techniques for solving fluid dynamics problems. The Connection Machine CM-2 is the target for an investigation into the design of multidimensional Single Instruction Stream/Multiple Data (SIMD) parallel FFT algorithms for high performance. Critical algorithm design issues are discussed, necessary machine performance measurements are identified and made, and the performance of the developed FFT programs are measured. Fast Fourier Transform programs are compared to the currently best Cray-2 FFT program

    Non-power-of-Two FFTs: Exploring the Flexibility of the Montium TP

    Get PDF
    Coarse-grain reconfigurable architectures, like the Montium TP, have proven to be a very successful approach for low-power and high-performance computation of regular digital signal processing algorithms. This paper presents the implementation of a class of non-power-of-two FFTs to discover the limitations and Flexibility of the Montium TP for less regular algorithms. A non-power-of-two FFT is less regular compared to a traditional power-of-two FFT. The results of the implementation show the processing time, accuracy, energy consumption and Flexibility of the implementation

    A study of the communication cost of the FFT on torus multicomputers

    Get PDF
    The computation of a one-dimensional FFT on a c-dimensional torus multicomputer is analyzed. Different approaches are proposed which differ in the way they use the interconnection network. The first approach is based on the multidimensional index mapping technique for the FFT computation. The second approach starts from a hypercube algorithm and then embeds the hypercube onto the torus. The third approach reduces the communication cost of the hypercube algorithm by pipelining the communication operations. A novel methodology to pipeline the communication operations on a torus is proposed. Analytical models are presented to compare the different approaches. This comparison study shows that the best approach depends on the number of dimensions of the torus and the communication start-up and transfer times. The analytical models allow us to select the most efficient approach for the available machine.Peer ReviewedPostprint (published version

    Generating and Searching Families of FFT Algorithms

    Full text link
    A fundamental question of longstanding theoretical interest is to prove the lowest exact count of real additions and multiplications required to compute a power-of-two discrete Fourier transform (DFT). For 35 years the split-radix algorithm held the record by requiring just 4n log n - 6n + 8 arithmetic operations on real numbers for a size-n DFT, and was widely believed to be the best possible. Recent work by Van Buskirk et al. demonstrated improvements to the split-radix operation count by using multiplier coefficients or "twiddle factors" that are not n-th roots of unity for a size-n DFT. This paper presents a Boolean Satisfiability-based proof of the lowest operation count for certain classes of DFT algorithms. First, we present a novel way to choose new yet valid twiddle factors for the nodes in flowgraphs generated by common power-of-two fast Fourier transform algorithms, FFTs. With this new technique, we can generate a large family of FFTs realizable by a fixed flowgraph. This solution space of FFTs is cast as a Boolean Satisfiability problem, and a modern Satisfiability Modulo Theory solver is applied to search for FFTs requiring the fewest arithmetic operations. Surprisingly, we find that there are FFTs requiring fewer operations than the split-radix even when all twiddle factors are n-th roots of unity.Comment: Preprint submitted on March 28, 2011, to the Journal on Satisfiability, Boolean Modeling and Computatio

    Algebraic Signal Processing Theory: Cooley-Tukey Type Algorithms for Polynomial Transforms Based on Induction

    Full text link
    A polynomial transform is the multiplication of an input vector x\in\C^n by a matrix \PT_{b,\alpha}\in\C^{n\times n}, whose (k,)(k,\ell)-th element is defined as p(αk)p_\ell(\alpha_k) for polynomials p_\ell(x)\in\C[x] from a list b={p0(x),,pn1(x)}b=\{p_0(x),\dots,p_{n-1}(x)\} and sample points \alpha_k\in\C from a list α={α0,,αn1}\alpha=\{\alpha_0,\dots,\alpha_{n-1}\}. Such transforms find applications in the areas of signal processing, data compression, and function interpolation. Important examples include the discrete Fourier and cosine transforms. In this paper we introduce a novel technique to derive fast algorithms for polynomial transforms. The technique uses the relationship between polynomial transforms and the representation theory of polynomial algebras. Specifically, we derive algorithms by decomposing the regular modules of these algebras as a stepwise induction. As an application, we derive novel O(nlogn)O(n\log{n}) general-radix algorithms for the discrete Fourier transform and the discrete cosine transform of type 4.Comment: 19 pages. Submitted to SIAM Journal on Matrix Analysis and Application

    Inter-motherboard Memory Scheduling

    Full text link
    Exploring the performance benefits of applying memory scheduling beyond the motherboardSerrano Gómez, M. (2009). Inter-motherboard Memory Scheduling. http://hdl.handle.net/10251/14163Archivo delegad
    corecore