
    Reducing branch delay to zero in pipelined processors

    A mechanism to reduce the cost of branches in pipelined processors is described and evaluated. It is based on the use of multiple prefetch, early computation of the target address, delayed branch, and parallel execution of branches. The implementation of this mechanism using a branch target instruction memory is described. An analytical model of the performance of this implementation makes it possible to measure the efficiency of the mechanism with a very low computational cost. The model is used to determine the size of cache lines that maximizes the processor performance, to compare the performance of the mechanism with that of other schemes, and to analyze the performance of the mechanism with two alternative cache organizations.

    PoisFFT - A Free Parallel Fast Poisson Solver

    A fast Poisson solver software package, PoisFFT, is presented. It is available as free software under the GNU GPL version 3. The package uses the fast Fourier transform to directly solve the Poisson equation on a uniform orthogonal grid. It can solve the pseudo-spectral approximation and the second-order finite difference approximation of the continuous solution. The paper reviews the mathematical methods for the fast Poisson solver and discusses the software implementation and parallelization. The use of PoisFFT in an incompressible flow solver is also demonstrated.
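
    The idea summarized in this abstract can be illustrated with a minimal one-dimensional sketch. It is not PoisFFT's code or API: the function names `fft` and `solve_poisson_periodic`, the periodic boundary condition, and the choice to return the zero-mean solution are all assumptions made for illustration. The sketch transforms the right-hand side, divides each mode by the eigenvalue of either the continuous operator (pseudo-spectral) or the second-order finite difference operator, and transforms back.

```python
import cmath
import math


def fft(values, inverse=False):
    """Recursive radix-2 FFT (unnormalized); len(values) must be a power of two."""
    n = len(values)
    if n == 1:
        return list(values)
    even = fft(values[0::2], inverse)
    odd = fft(values[1::2], inverse)
    sign = 1.0 if inverse else -1.0
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(sign * 2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def solve_poisson_periodic(f, length, finite_difference=False):
    """Solve u'' = f on a periodic uniform grid (hypothetical helper).

    Returns the zero-mean solution sampled at the same grid points as f.
    """
    n = len(f)
    h = length / n
    f_hat = fft([complex(v) for v in f])
    u_hat = [0j] * n  # the mean mode (index 0) is left at zero
    for j in range(1, n):
        m = j if j <= n // 2 else j - n   # signed mode number
        k = 2.0 * math.pi * m / length    # angular wavenumber
        if finite_difference:
            # Eigenvalue of the second-order central difference operator.
            eig = (2.0 * math.cos(k * h) - 2.0) / h**2
        else:
            # Eigenvalue of the continuous second derivative.
            eig = -k * k
        u_hat[j] = f_hat[j] / eig
    # Inverse transform, with the 1/n normalization applied here.
    return [v.real / n for v in fft(u_hat, inverse=True)]
```

    With f = -sin(x) on [0, 2π), the pseudo-spectral variant recovers sin(x) to rounding error, while the finite difference variant differs from it by a small discretization error that shrinks as the grid is refined.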

    DSPSR: Digital Signal Processing Software for Pulsar Astronomy

    DSPSR is a high-performance, open-source, object-oriented, digital signal processing software library and application suite for use in radio pulsar astronomy. Written primarily in C++, the library implements an extensive range of modular algorithms that can optionally exploit both multiple-core processors and general-purpose graphics processing units. After over a decade of research and development, DSPSR is now stable and in widespread use in the community. This paper presents a detailed description of its functionality, justification of major design decisions, analysis of phase-coherent dispersion removal algorithms, and demonstration of performance on some contemporary microprocessor architectures. Comment: 15 pages, 10 figures, to be published in PAS

    Fast Fourier Transform algorithm design and tradeoffs

    The Fast Fourier Transform (FFT) is a mainstay of certain numerical techniques for solving fluid dynamics problems. The Connection Machine CM-2 is the target for an investigation into the design of multidimensional Single Instruction Stream/Multiple Data (SIMD) parallel FFT algorithms for high performance. Critical algorithm design issues are discussed, necessary machine performance measurements are identified and made, and the performance of the developed FFT programs is measured. The Fast Fourier Transform programs are compared to the best current Cray-2 FFT program.

    Feasibility study for the advanced one-dimensional high temperature optical strain measurement system, phase 3

    The Instrumentation and Control Technology Division is developing optical strain measurement systems for applications using high temperature wire and fiber specimens. This feasibility study has determined that stable optical signals can be obtained from specimens at temperatures beyond 2,400 °C. A system using an area array sensor is proposed to alleviate off-axis decorrelation arising from rigid body motions. A digital signal processor (DSP) is recommended to perform speckle correlations at a rate near the data acquisition rate. Design parameters are discussed, and fundamental limits on the speckle shift strain measurement technique are defined.

    Synthetic Aperture Radar (SAR) data processing

    The available and optimal methods for generating SAR imagery for NASA applications were identified. The SAR image quality and data processing requirements associated with these applications were studied. Mathematical operations and algorithms required to process sensor data into SAR imagery were defined. The architecture of SAR image formation processors was discussed, and technology necessary to implement the SAR data processors used in both general purpose and dedicated imaging systems was addressed.

    High-resolution ab initio three-dimensional X-ray diffraction microscopy

    Coherent X-ray diffraction microscopy is a method of imaging non-periodic isolated objects at resolutions limited, in principle, only by the largest scattering angles recorded. We demonstrate X-ray diffraction imaging with high resolution in all three dimensions, as determined by a quantitative analysis of the reconstructed volume images. These images are retrieved from the 3D diffraction data using no a priori knowledge about the shape or composition of the object, which has never before been demonstrated on a non-periodic object. We also construct 2D images of thick objects with infinite depth of focus (without loss of transverse spatial resolution). These methods can be used to image biological and materials science samples at high resolution using X-ray undulator radiation, and they establish the techniques to be used in atomic-resolution ultrafast imaging at X-ray free-electron laser sources. Comment: 22 pages, 11 figures, submitte