Reducing branch delay to zero in pipelined processors
A mechanism to reduce the cost of branches in pipelined processors is described and evaluated. It is based on the use of multiple prefetch, early computation of the target address, delayed branch, and parallel execution of branches. The implementation of this mechanism using a branch target instruction memory is described. An analytical model of the performance of this implementation makes it possible to measure the efficiency of the mechanism with a very low computational cost. The model is used to determine the size of cache lines that maximizes the processor performance, to compare the performance of the mechanism with that of other schemes, and to analyze the performance of the mechanism with two alternative cache organizations.
PoisFFT - A Free Parallel Fast Poisson Solver
A fast Poisson solver software package, PoisFFT, is presented. It is available
as free software under the GNU GPL version 3. The package
uses the fast Fourier transform to directly solve the Poisson equation on a
uniform orthogonal grid. It can solve the pseudo-spectral approximation and the
second order finite difference approximation of the continuous solution. The
paper reviews the mathematical methods for the fast Poisson solver and
discusses the software implementation and parallelization. The use of PoisFFT
in an incompressible flow solver is also demonstrated.
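
The abstract above names two discretizations that an FFT-based solver can invert directly. The following is a minimal sketch of that idea for the simplest case, a 1-D periodic problem u'' = f, and is not PoisFFT's code or API. The function names `fft` and `solve_poisson_periodic`, the `spectral` flag, and the choice of periodic boundaries are assumptions made for illustration. The only difference between the two approximations is the eigenvalue by which each Fourier mode is divided.

```python
import cmath
import math

def fft(x, inverse=False):
    """Recursive radix-2 FFT (unnormalized); len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2], inverse)
    odd = fft(x[1::2], inverse)
    sign = 1 if inverse else -1
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(sign * 2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out

def solve_poisson_periodic(f, length, spectral=True):
    """Solve u'' = f on a periodic uniform grid; returns the zero-mean solution.

    spectral=True  divides by the continuous eigenvalue -(2*pi*m/L)^2.
    spectral=False divides by the eigenvalue of the second-order central
                   difference, -(2*sin(pi*m/n)/h)^2.
    """
    n = len(f)
    h = length / n
    f_hat = fft([complex(v) for v in f])
    u_hat = [0j] * n
    for k in range(1, n):  # mode k = 0 stays zero, which fixes the mean of u
        m = k if k <= n // 2 else k - n  # signed wavenumber index
        if spectral:
            eig = -(2 * math.pi * m / length) ** 2
        else:
            eig = -(2 * math.sin(math.pi * m / n) / h) ** 2
        u_hat[k] = f_hat[k] / eig
    u = fft(u_hat, inverse=True)
    return [v.real / n for v in u]

# Usage: f = -sin(x) on [0, 2*pi) has the zero-mean solution u = sin(x).
n, L = 32, 2 * math.pi
xs = [L * j / n for j in range(n)]
u = solve_poisson_periodic([-math.sin(x) for x in xs], L)
print(max(abs(a - math.sin(x)) for a, x in zip(u, xs)))
```

The right-hand side must have zero mean for a periodic solution to exist; the sketch simply discards the mean mode rather than checking for it.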
DSPSR: Digital Signal Processing Software for Pulsar Astronomy
DSPSR is a high-performance, open-source, object-oriented, digital signal
processing software library and application suite for use in radio pulsar
astronomy. Written primarily in C++, the library implements an extensive range
of modular algorithms that can optionally exploit both multiple-core processors
and general-purpose graphics processing units. After over a decade of research
and development, DSPSR is now stable and in widespread use in the community.
This paper presents a detailed description of its functionality, justification
of major design decisions, analysis of phase-coherent dispersion removal
algorithms, and demonstration of performance on some contemporary
microprocessor architectures. Comment: 15 pages, 10 figures, to be published in PAS
Fast Fourier Transform algorithm design and tradeoffs
The Fast Fourier Transform (FFT) is a mainstay of certain numerical techniques for solving fluid dynamics problems. The Connection Machine CM-2 is the target for an investigation into the design of multidimensional Single Instruction Stream/Multiple Data (SIMD) parallel FFT algorithms for high performance. Critical algorithm design issues are discussed, necessary machine performance measurements are identified and made, and the performance of the developed FFT programs is measured. The Fast Fourier Transform programs are compared to the currently best Cray-2 FFT program.
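
One reason multidimensional transforms map well onto SIMD hardware is that they separate into independent 1-D transforms along each axis, so every row (and then every column) runs the same instruction sequence on different data. The sketch below illustrates only that row-column decomposition and is not the CM-2 implementation from the report. The names `dft` and `dft2_row_column` are hypothetical, and a plain O(n^2) DFT stands in for the 1-D FFT to keep the example short.

```python
import cmath
import math

def dft(x):
    """Direct 1-D discrete Fourier transform (stand-in for a 1-D FFT)."""
    n = len(x)
    return [
        sum(x[j] * cmath.exp(-2j * math.pi * j * k / n) for j in range(n))
        for k in range(n)
    ]

def dft2_row_column(a):
    """2-D DFT of a list of rows: transform every row, then every column."""
    rows = [dft(r) for r in a]                 # independent row transforms
    cols = [dft(list(c)) for c in zip(*rows)]  # independent column transforms
    return [list(r) for r in zip(*cols)]       # transpose back to [k1][k2]

# Usage: a unit impulse at the origin transforms to a constant spectrum.
impulse = [[1, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]
print(dft2_row_column(impulse))
```

In a data-parallel setting the two list comprehensions are the parallel steps, and the transposes between them correspond to the inter-processor communication whose cost such a study would need to measure.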
Feasibility study for the advanced one-dimensional high temperature optical strain measurement system, phase 3
The Instrumentation and Control Technology Division is developing optical strain measurement systems for applications using high temperature wire and fiber specimens. This feasibility study has determined that stable optical signals can be obtained from specimens at temperatures beyond 2,400 °C. A system using an area array sensor is proposed to alleviate off-axis decorrelation arising from rigid body motions. A digital signal processor (DSP) is recommended to perform speckle correlations at a rate near the data acquisition rate. Design parameters are discussed, and fundamental limits on the speckle shift strain measurement technique are defined.
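
The speckle correlation step mentioned above amounts to finding the displacement at which a later speckle record best matches a reference record. The following is a minimal sketch of that step for 1-D integer-pixel shifts, not the system proposed in the study. The function names, the normalized zero-mean correlation measure, and the synthetic random data are all assumptions for illustration; converting the recovered shift to strain depends on optical geometry that the abstract does not give.

```python
import random

def normalized_correlation(a, b):
    """Zero-mean normalized correlation of two equal-length sequences."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    num = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    den = (sum((x - mean_a) ** 2 for x in a)
           * sum((y - mean_b) ** 2 for y in b)) ** 0.5
    return num / den if den > 0 else 0.0

def speckle_shift(reference, shifted, max_shift):
    """Integer lag (in pixels) at which `shifted` best matches `reference`.

    A positive lag means reference[i] lines up with shifted[i + lag].
    """
    n = len(reference)
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_shift, max_shift + 1):
        lo = max(0, -lag)       # overlap region in reference coordinates
        hi = min(n, n - lag)
        score = normalized_correlation(reference[lo:hi],
                                       shifted[lo + lag:hi + lag])
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag

# Usage with synthetic data: two windows cut from one random record,
# offset by 5 pixels, stand in for speckle patterns before and after loading.
rng = random.Random(0)
base = [rng.random() for _ in range(300)]
reference = base[20:276]
shifted = base[15:271]
print(speckle_shift(reference, shifted, max_shift=10))
```

The brute-force search costs on the order of `n * max_shift` multiply-adds per record, which gives a feel for why a dedicated DSP is recommended to keep pace with acquisition.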
Synthetic Aperture Radar (SAR) data processing
The available and optimal methods for generating SAR imagery for NASA applications were identified. The SAR image quality and data processing requirements associated with these applications were studied. Mathematical operations and algorithms required to process sensor data into SAR imagery were defined. The architecture of SAR image formation processors was discussed, and technology necessary to implement the SAR data processors used in both general purpose and dedicated imaging systems was addressed.
High-resolution ab initio three-dimensional X-ray diffraction microscopy
Coherent X-ray diffraction microscopy is a method of imaging non-periodic
isolated objects at resolutions only limited, in principle, by the largest
scattering angles recorded. We demonstrate X-ray diffraction imaging with high
resolution in all three dimensions, as determined by a quantitative analysis of
the reconstructed volume images. These images are retrieved from the 3D
diffraction data using no a priori knowledge about the shape or composition of
the object, which has never before been demonstrated on a non-periodic object.
We also construct 2D images of thick objects with infinite depth of focus
(without loss of transverse spatial resolution). These methods can be used to
image biological and materials science samples at high resolution using X-ray
undulator radiation, and establish the techniques to be used in
atomic-resolution ultrafast imaging at X-ray free-electron laser sources. Comment: 22 pages, 11 figures, submitte