    Fast Fourier Transforms for Finite Inverse Semigroups

    We extend the theory of fast Fourier transforms on finite groups to finite inverse semigroups. We use a general method for constructing the irreducible representations of a finite inverse semigroup to reduce the problem of computing its Fourier transform to two subproblems: computing Fourier transforms on its maximal subgroups, and computing a fast zeta transform on its poset structure. We then exhibit explicit fast algorithms for particular inverse semigroups of interest, specifically the rook monoid and its wreath products by arbitrary finite groups.
    Comment: ver 3: Added improved upper and lower bounds for the memory required by the fast zeta transform on the rook monoid. ver 2: Corrected typos and (naive) bounds on memory requirements. 30 pages, 0 figures
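For the rook monoid, the idempotents are the partial identity maps on subsets of {1, ..., n}, and they are ordered exactly like the Boolean lattice of subsets. A zeta transform over that lattice has a classical fast algorithm, sketched below as the simplest illustration of what a "fast zeta transform on a poset" means. This is an assumption-laden illustration only: the paper's transform runs over a larger poset, and the bitmask encoding and function names here are our own.

```python
def zeta_subsets(f, n):
    """Fast zeta transform on the Boolean lattice of subsets of {0, ..., n-1}.

    f is a list of length 2**n indexed by bitmask; the result g satisfies
    g[S] = sum of f[T] over all subsets T of S. Each of the n passes adds the
    value at S-without-i into S, for n * 2**(n-1) additions in total.
    """
    g = list(f)
    for i in range(n):
        bit = 1 << i
        for mask in range(1 << n):
            if mask & bit:
                # g[mask ^ bit] has bit i clear, so this pass never modifies it
                g[mask] += g[mask ^ bit]
    return g


def mobius_subsets(g, n):
    """Inverse (Mobius) transform: recovers f from g = zeta_subsets(f, n)."""
    f = list(g)
    for i in range(n):
        bit = 1 << i
        for mask in range(1 << n):
            if mask & bit:
                f[mask] -= f[mask ^ bit]
    return f


def zeta_naive(f, n):
    """Direct reference: for each mask S, sum f[T] over every T contained in S."""
    return [sum(f[t] for t in range(1 << n) if t & s == t) for s in range(1 << n)]
```

With f identically 1, zeta_subsets([1] * 8, 3) returns [1, 2, 2, 4, 2, 4, 4, 8], i.e. 2^|S| for each subset S, which is the number of subsets below S.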

    Fast Fourier Transforms for the Rook Monoid

    We define the notion of the Fourier transform for the rook monoid (also called the symmetric inverse semigroup) and provide two efficient divide-and-conquer algorithms (fast Fourier transforms, or FFTs) for computing it. This paper marks the first extension of group FFTs to non-group semigroups.
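The rook monoid R_n consists of all partial injective maps on an n-element set (equivalently, placements of non-attacking rooks on an n x n board), composed as partial functions. The sketch below is not an FFT; it only enumerates the object the FFT is defined over, by brute force, and checks its order against the standard count sum over k of C(n,k)^2 k!, along with the inverse-semigroup identity s s^-1 s = s. The frozenset-of-pairs encoding and the helper names are our own choices for illustration.

```python
from itertools import combinations, permutations
from math import comb, factorial


def rook_monoid(n):
    """Enumerate R_n: every partial injective map on {0, ..., n-1}, stored as a
    frozenset of (domain_point, image_point) pairs."""
    pts = range(n)
    elems = []
    for k in range(n + 1):
        for dom in combinations(pts, k):        # sorted domain of size k
            for img in permutations(pts, k):    # ordered distinct images
                elems.append(frozenset(zip(dom, img)))
    return elems


def compose(s, t):
    """Return s after t: x -> s(t(x)), defined only where both steps are defined."""
    s_map = dict(s)
    return frozenset((x, s_map[y]) for x, y in t if y in s_map)


def inverse(s):
    """Partial inverse: swap every (domain, image) pair."""
    return frozenset((y, x) for x, y in s)


def order_formula(n):
    """|R_n| = sum_k C(n, k)^2 * k!  (choose domain, choose image, match them)."""
    return sum(comb(n, k) ** 2 * factorial(k) for k in range(n + 1))
```

For n = 0, 1, 2, 3, 4 the orders are 1, 2, 7, 34, 209; already at n = 3 the monoid is much larger than the symmetric group S_3 of order 6 that sits inside it as its group of units.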

    Wafer-Scale Fast Fourier Transforms

    We have implemented fast Fourier transforms for one-, two-, and three-dimensional arrays on the Cerebras CS-2, a system whose memory and processing elements reside on a single silicon wafer. The wafer-scale engine (WSE) comprises a two-dimensional mesh of roughly 850,000 processing elements (PEs) with fast local memory and equally fast nearest-neighbor interconnections. Our wafer-scale FFT (wsFFT) parallelizes an n^3 problem with up to n^2 PEs. At this scale, a PE processes only a single vector of the 3D domain (known as a pencil) per superstep, where each of the three supersteps performs the FFT along one of the three axes of the input array. Between supersteps, wsFFT redistributes (transposes) the data to bring all elements of each one-dimensional pencil being transformed into the memory of a single PE. Each redistribution causes an all-to-all communication along one of the mesh dimensions. Given the level of parallelism, the messages transmitted between pairs of PEs can be as small as a single word. In theory, a mesh is not ideal for all-to-all communication because of its limited bisection bandwidth. However, the mesh interconnecting PEs on the WSE lies entirely on-wafer and achieves nearly peak bandwidth even with tiny messages. This high efficiency on fine-grain communication allows wsFFT to achieve unprecedented levels of parallelism and performance. We analyse in detail computation and communication time, as well as weak and strong scaling, using both FP16 and FP32 precision. With 32-bit arithmetic on the CS-2, we achieve 959 microseconds for a 3D FFT of a 512^3 complex input array using a 512x512 subgrid of the on-wafer PEs. This is the largest ever parallelization for this problem size and the first implementation that breaks the millisecond barrier.
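The superstep structure rests on the fact that a 3D DFT factors into 1D DFTs along each axis in turn. The sketch below mimics it on a single machine: an n x n x n array is stored flat in row-major order and, in each of three supersteps, every pencil along one axis is gathered, transformed with a radix-2 FFT, and written back. On the WSE each pencil would sit in one PE and the gather would be the all-to-all redistribution described above; here it is just strided indexing. The layout and function names are our assumptions, not wsFFT's code, and n must be a power of two.

```python
import cmath


def fft(v):
    """Recursive radix-2 Cooley-Tukey FFT; len(v) must be a power of two."""
    n = len(v)
    if n == 1:
        return list(v)
    even, odd = fft(v[0::2]), fft(v[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k], out[k + n // 2] = even[k] + t, even[k] - t
    return out


def fft3d_by_pencils(a, n):
    """3D FFT of an n*n*n array stored flat with index (x*n + y)*n + z.

    Three supersteps, one per axis: gather each 1D pencil along that axis,
    transform it, and scatter the result back in place.
    """
    a = list(a)
    strides = (n * n, n, 1)  # distance between neighbours along x, y, z
    for axis in range(3):
        s = strides[axis]
        # the two remaining axes enumerate the n*n pencils of this superstep
        o1, o2 = [strides[i] for i in range(3) if i != axis]
        for i in range(n):
            for j in range(n):
                base = i * o1 + j * o2
                pencil = [a[base + k * s] for k in range(n)]
                for k, val in enumerate(fft(pencil)):
                    a[base + k * s] = val
    return a


def dft3d_naive(a, n):
    """Direct O(n**6) 3D DFT, used only to check the pencil version."""
    out = []
    for u in range(n):
        for v in range(n):
            for w in range(n):
                s = 0j
                for x in range(n):
                    for y in range(n):
                        for z in range(n):
                            phase = -2j * cmath.pi * (u * x + v * y + w * z) / n
                            s += a[(x * n + y) * n + z] * cmath.exp(phase)
                out.append(s)
    return out
```

The only data movement between supersteps is the change of stride, which is why a distributed implementation needs one all-to-all exchange per axis.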

    Fast Quantum Fourier Transforms for a Class of Non-abelian Groups

    An algorithm is presented that allows the construction of fast Fourier transforms for any solvable group on a classical computer. The special structure of the recursion formula at the core of this algorithm makes it a good starting point for systematically obtaining fast Fourier transforms for solvable groups on a quantum computer. The inherent structure of the Hilbert space imposed by the qubit architecture suggests considering groups of order 2^n first (where n is the number of qubits). As an example, fast quantum Fourier transforms for all 4 classes of non-abelian 2-groups with a cyclic normal subgroup of index 2 are explicitly constructed in terms of quantum circuits. The (quantum) complexity of the Fourier transform for these groups of size 2^n is O(n^2) in all cases.
    Comment: 16 pages, LaTeX2
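The non-abelian circuits of the paper are not reproduced here. As a point of comparison for the O(n^2) bound, the sketch below simulates the textbook QFT circuit for the abelian cyclic group Z_(2^n) on a classical statevector: n Hadamards, n(n-1)/2 controlled phase rotations, and floor(n/2) swaps for the final bit reversal. The bit-to-qubit convention and function names are our own assumptions; the result is checked against the DFT with kernel exp(+2*pi*i*x*k/N) / sqrt(N).

```python
import cmath
import math


def qft_statevector(amps):
    """Apply the textbook QFT circuit to a statevector of length N = 2**n.

    Qubit b is bit b of the basis-state index. Returns (new_amps, gate_count).
    """
    N = len(amps)
    n = N.bit_length() - 1
    assert 1 << n == N, "length must be a power of two"
    a = [complex(x) for x in amps]
    gates = 0
    inv_sqrt2 = 1 / math.sqrt(2)
    for b in range(n - 1, -1, -1):
        # Hadamard on qubit b: mix each pair of indices differing only in bit b
        for i in range(N):
            if not (i >> b) & 1:
                j = i | (1 << b)
                a[i], a[j] = (a[i] + a[j]) * inv_sqrt2, (a[i] - a[j]) * inv_sqrt2
        gates += 1
        # controlled phase rotation from every lower qubit c onto qubit b
        for c in range(b - 1, -1, -1):
            phase = cmath.exp(2j * math.pi / 2 ** (b - c + 1))
            for i in range(N):
                if (i >> b) & 1 and (i >> c) & 1:
                    a[i] *= phase
            gates += 1
    # final bit reversal of the qubit order (floor(n/2) swap gates)
    out = [0j] * N
    for i in range(N):
        out[int(format(i, f"0{n}b")[::-1], 2)] = a[i]
    gates += n // 2
    return out, gates


def dft(amps):
    """Reference: out[k] = sum_x amps[x] * exp(2*pi*i*x*k/N) / sqrt(N)."""
    N = len(amps)
    return [sum(amps[x] * cmath.exp(2j * math.pi * x * k / N) for x in range(N))
            / math.sqrt(N) for k in range(N)]
```

For n = 4 qubits the circuit uses 4 + 6 + 2 = 12 gates, against the N log N = 64 arithmetic operations a classical FFT on the same 16 amplitudes would need; the quadratic count in n is the baseline the abstract's O(n^2) result matches.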

    Some applications of fast Fourier transforms

    Ordered fast Fourier transforms on a massively parallel hypercube multiprocessor

    Design alternatives for ordered Fast Fourier Transform (FFT) algorithms were examined on massively parallel hypercube multiprocessors such as the Connection Machine. Particular emphasis is placed on reducing communication, which is known to dominate the overall computing time. To this end, the ordering and computational phases of the FFT were combined, and sequence-to-processor maps that reduce communication were used. The class of ordered transforms is expanded to include any FFT in which the order of the transform is the same as that of the input sequence. Two such orderings are examined, namely standard order and A-order, which can be implemented with equal ease on the Connection Machine, where orderings are determined by geometries and priorities. If the sequence has N = 2^r elements and the hypercube has P = 2^d processors, then a standard-order FFT can be implemented with d + r/2 + 1 parallel transmissions. An A-order sequence can be transformed with 2d - r/2 parallel transmissions, which is r - d + 1 fewer than for standard order. A parallel method for computing the trigonometric coefficients is presented that uses neither trigonometric functions nor interprocessor communication. A performance of 0.9 GFLOPS was obtained for an A-order transform on the Connection Machine.
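The transmission counts above are simple enough to tabulate. The helper below just evaluates the two quoted formulas and their difference. Its guard (r even, so that r/2 is an integer, and r/2 <= d <= r) is our own assumption about the regime the formulas target, not a condition stated in the abstract, and the function name is hypothetical.

```python
def hypercube_fft_transmissions(r, d):
    """Parallel transmission counts for an ordered FFT of N = 2**r elements on a
    hypercube of P = 2**d processors, per the formulas quoted in the abstract.

    Returns (standard_order, a_order, savings). The input guard is an assumption.
    """
    if r % 2 or not (r // 2 <= d <= r):
        raise ValueError("assumed regime: even r and r/2 <= d <= r")
    standard = d + r // 2 + 1   # d + r/2 + 1
    a_order = 2 * d - r // 2    # 2d - r/2
    return standard, a_order, standard - a_order
```

For example, N = 2^20 points on P = 2^16 processors (r = 20, d = 16) gives 27 transmissions for standard order and 22 for A-order, a saving of 5 = r - d + 1.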

    Fast computation of magnetostatic fields by Non-uniform Fast Fourier Transforms

    The bottleneck of micromagnetic simulations is the computation of the long-range magnetostatic fields. On regular grids with N nodes this can be tackled with Fast Fourier Transforms in O(N log N) time, whereas the geometrically more versatile finite element methods (FEM) are limited to O(N^(4/3)) in the best case. We report the implementation of a Non-uniform Fast Fourier Transform algorithm that brings N log N scaling to FEM, with no loss of accuracy in the results.
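A type-1 NUFFT evaluates f(k) = sum_j c_j exp(-i k x_j) at M uniform frequencies from sources at nonuniform points x_j. The 1D sketch below uses Gaussian gridding in the style of Greengard and Lee: spread each source onto an oversampled regular grid with a periodized Gaussian, run an ordinary FFT, then divide out the Gaussian's Fourier coefficients. The abstract does not say which NUFFT variant the authors use; the oversampling factor R = 2, spreading half-width msp = 12 and width parameter tau are our assumptions (chosen for roughly 1e-10 relative accuracy), and a magnetostatic solver would need the 3D version combined with the appropriate field kernel.

```python
import cmath
import math


def fft(v):
    """Recursive radix-2 FFT: out[k] = sum_m v[m] * exp(-2j*pi*k*m/len(v))."""
    n = len(v)
    if n == 1:
        return list(v)
    even, odd = fft(v[0::2]), fft(v[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k], out[k + n // 2] = even[k] + t, even[k] - t
    return out


def nufft1d_type1(x, c, M, msp=12):
    """Approximate f[k] = sum_j c[j] * exp(-1j * k * x[j]) for k = -M/2 .. M/2 - 1.

    Assumes every x[j] lies in [0, 2*pi) and M is a power of two.
    """
    R = 2
    Mr = R * M                                      # oversampled grid size
    tau = math.pi * msp / (M * M * R * (R - 0.5))   # Gaussian width parameter
    h = 2 * math.pi / Mr                            # grid spacing
    grid = [0j] * Mr
    for xj, cj in zip(x, c):
        m0 = int(xj // h)  # grid index at or just below xj
        for m in range(m0 - msp + 1, m0 + msp + 1):
            d = m * h - xj  # unwrapped distance; m % Mr applies periodicity
            grid[m % Mr] += cj * math.exp(-d * d / (4 * tau))
    G = fft(grid)
    # G[k] / Mr is the trapezoidal estimate of the smoothed coefficient; the
    # Gaussian's coefficient is sqrt(tau/pi) * exp(-k^2 tau), so divide it out.
    return [math.sqrt(math.pi / tau) * math.exp(k * k * tau) * G[k % Mr] / Mr
            for k in range(-M // 2, M // 2)]


def direct_type1(x, c, M):
    """Direct O(len(x) * M) evaluation of the same sums, for checking."""
    return [sum(cj * cmath.exp(-1j * k * xj) for xj, cj in zip(x, c))
            for k in range(-M // 2, M // 2)]
```

Each source touches only 2 * msp grid points, so the cost is O(len(x) * msp + M log M) rather than the O(len(x) * M) of the direct sum; this is the mechanism by which an NUFFT recovers N log N scaling on unstructured node sets.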