26,749 research outputs found

    PROCESSOR ARCHITECTURES FOR FAST COMPUTATION OF MULTI-DIMENSIONAL UNITARY TRANSFORMS.

    Get PDF
    This work presents the development of new algorithms and special purpose sequential processor architectures for the computation of a class of one-, two- and multi-dimensional unitary transforms. In particular, a technique is presented to factorize the transformation matrices of a class of multi-dimensional unitary transforms, having separable kernels, into products of sparse matrices. These sparse matrices consist of Kronecker products of factors of the one-dimensional transformation matrix. Such factorizations result in fast algorithms for the computation of a variety of multi-dimensional unitary transforms including Fourier, Walsh-Hadamard and generalized Walsh transforms. It is shown that the u-dimensional Fourier and generalized Walsh transforms can be implemented with a u-dimensional radix-r butterfly operation requiring considerably fewer complex multiplications than the conventional implementation using a one-dimensional radix-r butterfly operation. Residue number principles and techniques are applied to develop novel special purpose sequential processor architectures for the computation of one-dimensional discrete Fourier and Walsh-Hadamard transforms and convolutions in real-time. The residue number system (RNS) based implementations yield a significant improvement in processing speed over the conventional realizations using the binary number system. As an illustration of the factorization techniques developed in this work, novel sequential architectures of RNS-based fast Fourier, Walsh-Hadamard and generalized Walsh transform processors for real-time processing of two-dimensional signals are presented. These sequential processor architectures are capable of processing large bandwidth (\u3e 5 M.Hz) input sequences. The application of the proposed FFT processors for the real-time computation of two-dimensional convolutions is also investigated. A special memory structure to support two-dimensional convolution operations is presented and it is shown that the two-dimensional FFT processor architecture proposed in this work requires less hardware than the conventional implementations. The FFT algorithms and processor architectures are verified by computer simulation.Dept. of Electrical and Computer Engineering. Paper copy at Leddy Library: Theses & Major Papers - Basement, West Bldg. / Call Number: Thesis1981 .N246. Source: Dissertation Abstracts International, Volume: 42-08, Section: B, page: 3366. Thesis (Ph.D.)--University of Windsor (Canada), 1981

    4.45 Pflops Astrophysical N-Body Simulation on K computer -- The Gravitational Trillion-Body Problem

    Full text link
    As an entry for the 2012 Gordon-Bell performance prize, we report performance results of astrophysical N-body simulations of one trillion particles performed on the full system of K computer. This is the first gravitational trillion-body simulation in the world. We describe the scientific motivation, the numerical algorithm, the parallelization strategy, and the performance analysis. Unlike many previous Gordon-Bell prize winners that used the tree algorithm for astrophysical N-body simulations, we used the hybrid TreePM method, for similar level of accuracy in which the short-range force is calculated by the tree algorithm, and the long-range force is solved by the particle-mesh algorithm. We developed a highly-tuned gravity kernel for short-range forces, and a novel communication algorithm for long-range forces. The average performance on 24576 and 82944 nodes of K computer are 1.53 and 4.45 Pflops, which correspond to 49% and 42% of the peak speed.Comment: 10 pages, 6 figures, Proceedings of Supercomputing 2012 (http://sc12.supercomputing.org/), Gordon Bell Prize Winner. Additional information is http://www.ccs.tsukuba.ac.jp/CCS/eng/gbp201

    A general purpose subroutine for fast fourier transform on a distributed memory parallel machine

    Get PDF
    One issue which is central in developing a general purpose Fast Fourier Transform (FFT) subroutine on a distributed memory parallel machine is the data distribution. It is possible that different users would like to use the FFT routine with different data distributions. Thus, there is a need to design FFT schemes on distributed memory parallel machines which can support a variety of data distributions. An FFT implementation on a distributed memory parallel machine which works for a number of data distributions commonly encountered in scientific applications is presented. The problem of rearranging the data after computing the FFT is also addressed. The performance of the implementation on a distributed memory parallel machine Intel iPSC/860 is evaluated
    corecore