Nonuniform Fast Fourier Transforms Using Min-Max Interpolation
The fast Fourier transform (FFT) is used widely in signal processing for efficient computation of the Fourier transform (FT) of finite-length signals over a set of uniformly spaced frequency locations. However, in many applications, one requires nonuniform sampling in the frequency domain, i.e., a nonuniform FT. Several papers have described fast approximations for the nonuniform FT based on interpolating an oversampled FFT. This paper presents an interpolation method for the nonuniform FT that is optimal in the min-max sense of minimizing the worst-case approximation error over all signals of unit norm. The proposed method easily generalizes to multidimensional signals. Numerical results show that the min-max approach provides substantially lower approximation errors than conventional interpolation methods. The min-max criterion is also useful for optimizing the parameters of interpolation kernels such as the Kaiser-Bessel function.
Peer Reviewed. http://deepblue.lib.umich.edu/bitstream/2027.42/85840/1/Fessler70.pd
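As a rough illustration of the interpolation idea this abstract describes (not the paper's min-max kernel), the sketch below approximates a nonuniform DFT by linearly interpolating an oversampled FFT; all function names here are our own:

```python
import numpy as np

def ndft(x, freqs):
    """Exact nonuniform DFT: X(f_m) = sum_n x[n] exp(-2j*pi*f_m*n),
    with freqs given as arbitrary frequencies in cycles/sample, 0 <= f < 1."""
    n = np.arange(len(x))
    return np.exp(-2j * np.pi * np.outer(freqs, n)) @ x

def nufft_interp(x, freqs, oversamp=8):
    """Approximate NDFT: zero-pad to an oversampled grid, take one FFT,
    then linearly interpolate the dense spectrum at the requested points.
    (A real NUFFT would use an optimized kernel, e.g. min-max or
    Kaiser-Bessel, instead of linear interpolation.)"""
    N = len(x)
    K = oversamp * N
    X_dense = np.fft.fft(x, K)                  # spectrum on K uniform points
    grid = np.arange(K + 1) / K                 # append f = 1 for wraparound
    X_wrap = np.concatenate([X_dense, X_dense[:1]])
    re = np.interp(freqs, grid, X_wrap.real)    # interpolate real and
    im = np.interp(freqs, grid, X_wrap.imag)    # imaginary parts separately
    return re + 1j * im

rng = np.random.default_rng(0)
x = rng.standard_normal(64)
freqs = rng.uniform(0.0, 1.0, 100)
err4 = np.max(np.abs(nufft_interp(x, freqs, 4) - ndft(x, freqs)))
err16 = np.max(np.abs(nufft_interp(x, freqs, 16) - ndft(x, freqs)))
```

Raising the oversampling factor shrinks the interpolation error, which is the trade-off (oversampling versus kernel quality) that the min-max design targets.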
High-Speed Function Approximation using a Minimax Quadratic Interpolator
A table-based method for high-speed function approximation in single-precision floating-point format is presented in this paper. Our focus is the approximation of reciprocal, square root, square root reciprocal, exponentials, logarithms, trigonometric functions, powering (with a fixed exponent p), and special functions. The algorithm presented here combines table look-up, an enhanced minimax quadratic approximation, and an efficient evaluation of the second-degree polynomial (using a specialized squaring unit, redundant arithmetic, and multioperand addition). The execution times and area costs of an architecture implementing our method are estimated, showing that it achieves the fast execution times of linear approximation methods with the reduced area requirements of other second-degree interpolation algorithms. Moreover, the use of an enhanced minimax approximation which, through an iterative process, takes into account the effect of rounding the polynomial coefficients to a finite size allows for a further reduction in the size of the look-up tables, making our method very suitable for the implementation of an elementary function generator in state-of-the-art DSPs or graphics processing units (GPUs).
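The table-plus-quadratic scheme can be sketched in software: split the input range into 2^k intervals, store one set of degree-2 coefficients per interval, and evaluate a small polynomial at lookup time. This toy version uses a least-squares fit in place of the paper's minimax approximation, and approximates the reciprocal on [1, 2):

```python
import numpy as np

def build_tables(f, k=6, deg=2, samples=64):
    """Per-interval degree-2 coefficients for f on [1, 2).
    np.polyfit (least squares) stands in for a true minimax fit."""
    coeffs = []
    edges = 1.0 + np.arange(2**k + 1) / 2**k
    for lo, hi in zip(edges[:-1], edges[1:]):
        t = np.linspace(lo, hi, samples)
        coeffs.append(np.polyfit(t - lo, f(t), deg))  # local coordinate t - lo
    return np.array(coeffs), edges

def approx(x, coeffs, edges, k=6):
    """Table lookup (top k bits pick the interval) + quadratic evaluation."""
    idx = np.minimum(((x - 1.0) * 2**k).astype(int), 2**k - 1)
    return np.array([np.polyval(coeffs[i], xi - edges[i])
                     for i, xi in zip(idx, x)])

coeffs, edges = build_tables(lambda t: 1.0 / t, k=6)
x = np.linspace(1.0, 2.0 - 1e-9, 1000)
err = np.max(np.abs(approx(x, coeffs, edges) - 1.0 / x))
```

With 64 intervals the quadratic pieces are already accurate to well under 10^-5; the hardware version additionally rounds the coefficients to finite wordlengths, which is what the iterative minimax refinement compensates for.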
Optimized linear, quadratic and cubic interpolators for elementary function hardware implementations
This paper presents a method for designing linear, quadratic and cubic interpolators that compute elementary functions using truncated multipliers, squarers and cubers. Initial coefficient values are obtained using a Chebyshev series approximation. A direct search algorithm is then used to optimize the quantized coefficient values to meet a user-specified error constraint. The algorithm minimizes coefficient lengths to reduce lookup table requirements, maximizes the number of truncated columns to reduce the area, delay and power of the arithmetic units, and minimizes the maximum absolute error of the interpolator output. The method can be used to design interpolators to approximate any function to a user-specified accuracy, up to and beyond 53 bits of precision (e.g., the IEEE double-precision significand). Linear, quadratic and cubic interpolator designs that approximate reciprocal, square root, reciprocal square root and sine are presented and analyzed. Area, delay and power estimates are given for 16, 24 and 32-bit interpolators that compute the reciprocal function, targeting a 65 nm CMOS technology from IBM. Results indicate the proposed method uses smaller arithmetic units and has reduced lookup table sizes compared to previously proposed methods. The method can be used to optimize coefficients in other systems while accounting for coefficient quantization as well as truncation and rounding effects of multiple arithmetic units.
Peer reviewed. Electrical and Computer Engineering
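The two-stage flow described above (Chebyshev starting point, then a direct search over quantized coefficients) can be sketched as follows. This is a toy stand-in for the paper's algorithm: degree 2, one interval, and a simple one-ulp neighborhood search, with the function and wordlength chosen arbitrarily for illustration:

```python
import numpy as np
from numpy.polynomial import chebyshev as C

# Stage 1: Chebyshev series gives the initial degree-2 coefficients
# for 1/x on [1, 2].
xs = np.linspace(1.0, 2.0, 512)
cheb = C.Chebyshev.fit(xs, 1.0 / xs, deg=2, domain=[1.0, 2.0])

q = 12                                   # fractional bits (assumed wordlength)

def err_of(c):
    """Max absolute error of the coefficient vector c over the interval."""
    p = C.Chebyshev(c, domain=[1.0, 2.0])
    return np.max(np.abs(p(xs) - 1.0 / xs))

# Stage 2: quantize, then direct search: nudge each coefficient by one
# ulp while that reduces the maximum error.
start = np.round(cheb.coef * 2**q) / 2**q
best = start.copy()
improved = True
while improved:
    improved = False
    for i in range(len(best)):
        for step in (-1.0 / 2**q, 1.0 / 2**q):
            cand = best.copy()
            cand[i] += step
            if err_of(cand) < err_of(best):
                best, improved = cand, True
```

The search never accepts a worse candidate, so the optimized quantized coefficients are at least as accurate as naive rounding; the paper's full algorithm additionally searches over coefficient lengths and truncated-column counts.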
Maximin design on non hypercube domain and kernel interpolation
In the paradigm of computer experiments, the choice of an experimental design is an important issue. When no information is available about the black-box function to be approximated, an exploratory design has to be used. In this context, two dispersion criteria are usually considered: minimax and maximin. In the case of a hypercube domain, a standard strategy consists of taking the maximin design within the class of Latin hypercube designs. However, in a non-hypercube context, the Latin hypercube strategy does not apply. Moreover, whatever the design, the black-box function is typically approximated by kernel interpolation. Here, we first provide a theoretical justification for the maximin criterion with respect to kernel interpolation. Then, we propose simulated annealing algorithms to determine maximin designs in any bounded connected domain, and we prove the convergence of the different schemes.
Comment: 3 figures
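A minimal sketch of the simulated-annealing idea on a non-hypercube domain (here the unit disk, chosen for illustration; step size and cooling schedule are our own arbitrary choices, not the paper's):

```python
import numpy as np

rng = np.random.default_rng(1)

def in_disk(p):
    return float(np.dot(p, p)) <= 1.0      # example non-hypercube domain

def min_dist(pts):
    """Minimum pairwise distance: the quantity a maximin design maximizes."""
    d = np.linalg.norm(pts[:, None] - pts[None, :], axis=-1)
    return d[np.triu_indices(len(pts), 1)].min()

def sample_disk(n):
    pts = []
    while len(pts) < n:                    # rejection sampling from the square
        p = rng.uniform(-1.0, 1.0, 2)
        if in_disk(p):
            pts.append(p)
    return np.array(pts)

def maximin_sa(pts, iters=3000, T0=0.1, step=0.1):
    """Perturb one point at a time; always accept improvements, accept
    deteriorations with Boltzmann probability under a cooling temperature."""
    cur, cur_d = pts.copy(), min_dist(pts)
    best, best_d = cur.copy(), cur_d
    for t in range(iters):
        T = T0 * (1.0 - t / iters) + 1e-9
        i = rng.integers(len(cur))
        cand = cur.copy()
        cand[i] = cur[i] + rng.normal(0.0, step, 2)
        if not in_disk(cand[i]):           # stay inside the domain
            continue
        d = min_dist(cand)
        if d > cur_d or rng.random() < np.exp((d - cur_d) / T):
            cur, cur_d = cand, d
            if cur_d > best_d:
                best, best_d = cur.copy(), cur_d
    return best, best_d
```

Because the best-so-far configuration is tracked separately, the returned design is never worse than the random starting design.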
Fast, area-efficient 32-bit LNS for computer arithmetic operations
PhD Thesis
The logarithmic number system (LNS) has been proposed as an alternative to floating-point. Multiplication, division and square-root operations are accomplished with fixed-point arithmetic, but addition and subtraction are considerably more challenging. Recent work has demonstrated that these operations, too, can be done with speed and accuracy similar to their floating-point equivalents, but the necessary circuitry is complex. In particular, it is dominated by the large lookup tables needed to store a non-linear function.
This thesis describes the architectures required to implement a newly designed approach for producing a fast and area-efficient 32-bit LNS arithmetic unit. The designs are structured around two different algorithms. First, a new cotransformation procedure is introduced for the singularity region encountered during subtraction; the technique generates less total storage than the cotransformation method in the previous LNS architecture. Second, an improvement to an existing interpolation process is proposed that reduces the total tables to an extent that allows their easy synthesis in logic. Consequently, the total delays in the system can be significantly reduced.
Comparison with the previous best LNS design and with floating-point units shows that the new LNS architecture offers significantly better speed while sustaining accuracy within floating-point limits. In addition, its implementation is more economical than the previous best LNS system and almost equivalent to existing floating-point arithmetic units.
University Malaysia Perlis; Ministry of Higher Education, Malaysia
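The asymmetry the abstract describes is easy to see in a software model of LNS arithmetic (positive values only, for brevity): multiplication and division are plain fixed-point add/subtract of exponents, while addition and subtraction need a non-linear function that hardware must tabulate and interpolate:

```python
import math

# Toy LNS: a value x > 0 is stored as its base-2 logarithm log2(x).

def to_lns(x):    return math.log2(x)
def from_lns(l):  return 2.0 ** l

def lns_mul(a, b): return a + b          # trivial in LNS
def lns_div(a, b): return a - b          # trivial in LNS
def lns_sqrt(a):   return a / 2.0        # a one-bit shift in hardware

def lns_add(a, b):
    """log2(2^a + 2^b) = max + sb(d), with sb(d) = log2(1 + 2^d), d <= 0.
    sb is the non-linear function held in the (interpolated) lookup tables."""
    big, small = max(a, b), min(a, b)
    return big + math.log2(1.0 + 2.0 ** (small - big))

def lns_sub(a, b):
    """log2(2^a - 2^b) = a + db(d), with db(d) = log2(1 - 2^d), d = b - a < 0.
    db blows up as d -> 0: this is the singularity region that the
    cotransformation technique in the thesis addresses. Assumes 2^a > 2^b."""
    return a + math.log2(1.0 - 2.0 ** (b - a))
```

For example, multiplying 3 by 5 is just `to_lns(3.0) + to_lns(5.0)`, whereas adding them routes through `sb`; the table storage for `sb` and especially `db` near its singularity is what dominates the circuit area.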