
5 research outputs found

High-Speed Function Approximation using a Minimax Quadratic Interpolator

Author: Bruguera Javier
Muller Jean-Michel
Oberman Stuart
Pineiro Jose-Alejandro
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2005

A table-based method for high-speed function approximation in single-precision floating-point format is presented in this paper. Our focus is the approximation of reciprocal, square root, square root reciprocal, exponentials, logarithms, trigonometric functions, powering (with a fixed exponent p), or special functions. The algorithm presented here combines table look-up, an enhanced minimax quadratic approximation, and an efficient evaluation of the second-degree polynomial (using a specialized squaring unit, redundant arithmetic, and multioperand addition). The execution times and area costs of an architecture implementing our method are estimated, showing that it achieves the fast execution times of linear approximation methods and the reduced area requirements of other second-degree interpolation algorithms. Moreover, the enhanced minimax approximation, through an iterative process, takes into account the effect of rounding the polynomial coefficients to a finite size. This allows a further reduction in the size of the look-up tables, making our method very suitable for implementing an elementary function generator in state-of-the-art DSPs or graphics processing units (GPUs).
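The table-plus-quadratic idea in this abstract can be sketched in software. The sketch below is only an illustration under stated assumptions: it interpolates at Chebyshev nodes as a near-minimax stand-in for the paper's enhanced minimax fit, it does not round the coefficients to a finite size, and the names `build_table` and `approx` and the choice of 64 segments are hypothetical.

```python
import math

def build_table(f, segments=64):
    """Per-segment quadratic coefficients (c0, c1, c2) for f on [1, 2).

    Assumption: Chebyshev-node interpolation stands in for a true
    minimax fit; the coefficients are kept in full double precision.
    """
    h = 1.0 / segments
    table = []
    for i in range(segments):
        a = 1.0 + i * h  # left edge of segment i
        # Three Chebyshev nodes on [0, h], as offsets from the left edge.
        d = [h / 2 * (1 - math.cos((2 * k + 1) * math.pi / 6)) for k in range(3)]
        y = [f(a + dk) for dk in d]
        # Newton divided differences through the three nodes.
        f01 = (y[1] - y[0]) / (d[1] - d[0])
        f12 = (y[2] - y[1]) / (d[2] - d[1])
        f012 = (f12 - f01) / (d[2] - d[0])
        # Expand the Newton form into c0 + c1*t + c2*t^2.
        c2 = f012
        c1 = f01 - f012 * (d[0] + d[1])
        c0 = y[0] - f01 * d[0] + f012 * d[0] * d[1]
        table.append((c0, c1, c2))
    return table

def approx(table, x):
    """Evaluate the piecewise quadratic at x in [1, 2)."""
    segments = len(table)
    i = int((x - 1.0) * segments)   # leading bits of x select the table entry
    t = x - (1.0 + i / segments)    # remaining bits form the polynomial argument
    c0, c1, c2 = table[i]
    return c0 + c1 * t + c2 * (t * t)  # t*t mirrors a dedicated squaring step

if __name__ == "__main__":
    recip = build_table(lambda x: 1.0 / x)
    print(approx(recip, 1.37), 1.0 / 1.37)
```

In hardware the coefficient table would hold rounded fixed-point values, which is exactly the effect the abstract says the enhanced minimax procedure accounts for; this sketch ignores it.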

HAL-ENS-LYON

CiteSeerX

INRIA a CCSD electronic archive server

Hal-Diderot

Improving Goldschmidt Division, Square Root and Square Root Reciprocal

Author: Ercegovac Milos
Imbert Laurent
Matula David
Muller Jean-Michel
Wei Guoheng
Publication venue: HAL CCSD
Publication date: 01/01/1999

The aim of this paper is to accelerate division, square root and square root reciprocal computations when the Goldschmidt method is used on a pipelined multiplier. This is done by replacing the last iteration with the addition of a correcting term that can be looked up during the early iterations. We describe several variants of the Goldschmidt algorithm, assuming a 4-cycle pipelined multiplier, and discuss the number of cycles required and the error achieved. Extensions to multipliers with other than four cycles are given.
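For readers unfamiliar with the baseline being improved, here is a minimal sketch of the standard Goldschmidt division iteration. It does not model the paper's correcting term or the multiplier pipeline; the function name, the 16-entry seed table, and the default of four iterations are assumptions made for illustration.

```python
# Hypothetical 16-entry seed table: reciprocal of each segment midpoint on [1, 2).
_SEED = [1.0 / (1.0 + (i + 0.5) / 16.0) for i in range(16)]

def goldschmidt_divide(n, d, iterations=4):
    """Approximate n / d for a divisor d normalised to [1, 2)."""
    if not 1.0 <= d < 2.0:
        raise ValueError("divisor must lie in [1, 2)")
    f = _SEED[int((d - 1.0) * 16.0)]  # table look-up gives a rough 1/d
    for _ in range(iterations):
        n *= f         # numerator converges towards the quotient
        d *= f         # denominator converges towards 1
        f = 2.0 - d    # next scaling factor
    return n

if __name__ == "__main__":
    print(goldschmidt_divide(1.0, 1.5), 1.0 / 1.5)
```

The two multiplications in each iteration are independent of each other, which is why the method suits a pipelined multiplier. The error in the denominator is roughly squared at each step, so the last iteration is the costliest relative to the accuracy it adds.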

INRIA a CCSD electronic archive server

Division and square root for mobile and scientific computing markets

Author: Holimath Vijaykumar
Publication venue: Universidade de Santiago de Compostela. Servizo de Publicacións e Intercambio Científico
Publication date: 01/01/2007

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Repositorio Institucional da Universidade de Santiago de Compostela

Beschleunigung Hydrodynamischer Astrophysikalischer Simulationen mit FPGA-Basierten Rekonfigurierbaren Koprozessoren

Author: Lienhart Gerhard
Publication date: 01/01/2004

This dissertation deals with the use of reconfigurable coprocessors to accelerate astrophysical simulation algorithms, starting from a hybrid platform consisting of a standard computer and a compute accelerator for gravitational simulation (GRAPE). For simulations that must take hydrodynamics into account, the simulation method used for that purpose, Smoothed Particle Hydrodynamics (SPH), severely limits the achievable performance of the overall system. The approach pursued was to accelerate the SPH method with an FPGA-based coprocessor platform. Analyses of the simulation codes showed that SPH calculations using floating-point numbers with 16 mantissa bits are sufficiently accurate. To realize this approach, an FPGA coprocessor in the form of a PCI plug-in card was used, equipped with a modern Xilinx Virtex-II 3000 FPGA. FPGA designs were developed that reach more than 3 GFlops, at sufficiently high precision, for the extensive but simply structured SPH calculations. For this purpose, a library of arithmetic modules for reconfigurable logic was developed. All modules are parameterized with respect to computational precision, and specialized operators were developed for various numerical boundary conditions. This made it possible to build arithmetic units, in the form of a pipeline, that are optimally adapted to the problem. For the SPH pipelines, 50-60 floating-point operations could be implemented using about 50% of the FPGA resources, with a resulting clock speed of 66 MHz. The circuits are able to perform the calculations synchronously with the maximum data rate of the memory and the PCI interface.
To exploit the acceleration potential (about a factor of 10) effectively, a deep restructuring of the simulation algorithm will be required, which will be the subject of further research.

Heidelberger Dokumentenserver

High-Speed Inverse Square Roots

Author: Kent E. Wires
Michael J. Schulte
Publication venue: IEEE Computer Society Press

Inverse square roots are used in several digital signal processing, multimedia, and scientific computing applications. This paper presents a high-speed method for computing inverse square roots. This method uses a table lookup, operand modification, and multiplication to obtain an initial approximation to the inverse square root. This is followed by a modified Newton-Raphson iteration, consisting of one square, one multiply-complement, and one multiply-add operation. The initial approximation and Newton-Raphson iteration employ specialized hardware to reduce the delay, area, and power dissipation. Application of this method is illustrated through the design of an inverse square root unit for operands in the IEEE single precision format. An implementation of this unit with a 4-layer metal, 2.5 Volt, 0.25 micron CMOS standard cell library has a cycle time of 6.7 ns, an area of 0.41 mm², a latency of five cycles, and a throughput of one result per cycle.
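The overall shape of the scheme (table seed, then one Newton-Raphson step) can be sketched as follows. This is a plain textbook Newton-Raphson step and not the paper's modified iteration; operand modification and the specialized hardware are not modelled, and the name `inv_sqrt`, the 256-entry table, and the input range [1, 4) are assumptions for illustration.

```python
import math

SEGMENTS = 256  # hypothetical table size
# Seed table: 1/sqrt at the midpoint of each segment of [1, 4).
_SEED = [1.0 / math.sqrt(1.0 + (i + 0.5) * 3.0 / SEGMENTS) for i in range(SEGMENTS)]

def inv_sqrt(x):
    """Approximate 1/sqrt(x) for x in [1, 4): table seed plus one Newton step."""
    if not 1.0 <= x < 4.0:
        raise ValueError("x must lie in [1, 4)")
    i = min(int((x - 1.0) * SEGMENTS / 3.0), SEGMENTS - 1)
    y = _SEED[i]           # initial approximation from the table
    s = y * y              # square
    c = 3.0 - x * s        # multiply, then complement against 3
    return 0.5 * y * c     # y * (3 - x*y^2) / 2

if __name__ == "__main__":
    print(inv_sqrt(2.0), 1.0 / math.sqrt(2.0))
```

The range [1, 4) is chosen so that an even exponent can be factored out of a floating-point operand before the lookup. A single Newton step roughly squares the seed's relative error, so the accuracy of the final result is set almost entirely by the quality of the initial approximation.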

CiteSeerX