Search CORE

2,441 research outputs found

Composite Iterative Algorithm and Architecture for q-th Root Calculation

Author: Bruguera Javier
Vazquez Alvaro
Publication venue: HAL CCSD
Publication date: 10/03/2011
Field of study

An algorithm for the q-th root extraction, being q any integer, is presented in this paper. The algorithm is based on an optimized implementation of X^{1/q} by a sequence of parallel and/or overlapped operations: (1) reciprocal, (2) digit-recurrence logarithm, (3) left-to-right carry-free multiplication and (4) on-line exponential. A detailed error analysis and two architectures are proposed, for low precision q and for higher precision q. The execution time and hardware requirements are estimated for single and double precision floating-point computations for several radices; this helps to determine which radices result in the most efficient implementations. The architectures proposed improve the features of other architectures for q-th root extraction.Dans cet article, nous présentons un algorithme matériel pour l'extraction de la racine q-ième d'un nombre X, où q est un entier naturel non nul. Cet algorithme est basé sur une implantation optimisée de la fonction X^{1/q} par une séquence d'opérations parallèles et/ou superposées: (1) réciproque, (2) logarithme chiffre par chiffre, (3) multiplication de gauche-à-droite sans propagation de retenue et (4) exponentielle en ligne. Une analyse détaillée des erreurs et deux architectures sont proposées, pour q de basse précision et pour q de précision plus haute. Le temps d'exécution et les composants matériels à utiliser sont estimés pour des calculs en virgule flottante simple et double précision et pour plusieurs bases. Cette étude aide à déterminer quelles bases mènent aux implantations les plus efficaces. Les architectures proposées améliorent les caractéristiques d'architectures précédentes destinées à l'extraction des racines

HAL-ENS-LYON

INRIA a CCSD electronic archive server

Hal-Diderot

HAL-Rennes 1

Floating Point Calculation of the Cube Function on FPGAs

Author: Osorio Roberto
Publication venue: Institute of Electrical and Electronics Engineers
Publication date: 01/01/2023
Field of study

© 2023 IEEE. This version of the paper has been accepted for publication. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works. The final published paper is available online at: https://doi.org/10.1109/TPDS.2022.3220039[Abstract]: Specialized arithmetic units allow fast and efficient computation of lesser used mathematical functions. The overall impact of those units would be negligible in a general purpose processor, as added circuitry makes chips more complex despite most software would seldom make use of it. On the opposite side, custom computing machines are built for a specific task, and they can always benefit from specialized units if they are available. In this work, floating point architectures are proposed for computing the cube on Intel and Xilinx FPGAs. Those implementations reduce the cost and latency compared to using simple floating point multiplications and squarers.This work was supported by the Ministry of Science and Innovation of Spain (PID2019-104184RB-I00 / AEI / 10.13039/501100011033), and by Xunta de Galicia and FEDER funds of the EU (Centro de Investigaci´on de Galicia accreditation 2019–2022, ref. ED431G 2019/01; Consolidation Program of Competitive Reference Groups, ref. ED431C 2021/30).Xunta de Galicia; ED431G 2019/01Xunta de Galicia; ED431C 2021/3

Repositorio da Universidade da Coruña

GRAPE-5: A Special-Purpose Computer for N-body Simulation

Author: Fukushige Toshiyuki
Kawai Atsushi
Makino Junichiro
Taiji Makoto
Publication venue: 'Oxford University Press (OUP)'
Publication date: 07/09/1999
Field of study

We have developed a special-purpose computer for gravitational many-body simulations, GRAPE-5. GRAPE-5 is the successor of GRAPE-3. Both consist of eight custom pipeline chips (G5 chip and GRAPE chip). The difference between GRAPE-5 and GRAPE-3 are: (1) The G5 chip contains two pipelines operating at 80 MHz, while the GRAPE chip had one at 20 MHz. Thus, the calculation speed of the G5 chip and that of GRAPE-5 board are 8 times faster than that of GRAPE chip and GRAPE-3 board. (2) The GRAPE-5 board adopted PCI bus as the interface to the host computer instead of VME of GRAPE-3, resulting in the communication speed one order of magnitude faster. (3) In addition to the pure 1/r potential, the G5 chip can calculate forces with arbitrary cutoff functions, so that it can be applied to Ewald or P^3M methods. (4) The pairwise force calculated on GRAPE-5 is about 10 times more accurate than that on GRAPE-3. On one GRAPE-5 board, one timestep of 128k-body simulation with direct summation algorithm takes 14 seconds. With Barnes-Hut tree algorithm (theta = 0.75), one timestep of 10^6-body simulation can be done in 16 seconds.Comment: 19 pages, 24 Postscript figures, 3 tables, Latex, submitted to Publications of the Astronomical Society of Japa

arXiv.org e-Print Archive

CiteSeerX

Crossref

CERN Document Server

Design for Implementation of Image Processing Algorithms

Author: Whitesell Jamison D
Publication venue: RIT Scholar Works
Publication date: 01/01/2014
Field of study

Color image processing algorithms are first developed using a high-level mathematical modeling language. Current integrated development environments offer libraries of intrinsic functions, which on one hand enable faster development, but on the other hand hide the use of fundamental operations. The latter have to be detailed for an efficient hardware and/or software physical implementation. Based on the experience accumulated in the process of implementing a segmentation algorithm, this thesis outlines a design for implementation methodology comprised of a development flow and associated guidelines. The methodology enables algorithm developers to iteratively optimize their algorithms while maintaining the level of image integrity required by their application. Furthermore, it does not require algorithm developers to change their current development process. Rather, the design for implementation methodology is best suited for optimizing a functionally correct algorithm, thus appending to an algorithm developer\u27s design process of choice. The application of this methodology to four segmentation algorithm steps produced measured results with 2-D correlation coefficients (CORR2) better than 0.99, peak-signal-to-noise-ratio (PSNR) better than 70 dB, and structural-similarity-index (SSIM) better than 0.98, for a majority of test cases

RIT Scholar Works

Fast Quantum Modular Exponentiation

Author: A. G. Fowler
A. M. Steane
A. Yao
D. Deutsch
D. E. Knuth
L. Grover
M. A. Nielsen
M. A. Nielsen
M. D. Ercegovac
M. Oskin
P. W. Shor
P. W. Shor
R. Cleve
R. Van Meter
S. Beauregard
Publication venue: 'American Physical Society (APS)'
Publication date: 29/03/2005
Field of study

We present a detailed analysis of the impact on modular exponentiation of architectural features and possible concurrent gate execution. Various arithmetic algorithms are evaluated for execution time, potential concurrency, and space tradeoffs. We find that, to exponentiate an n-bit number, for storage space 100n (twenty times the minimum 5n), we can execute modular exponentiation two hundred to seven hundred times faster than optimized versions of the basic algorithms, depending on architecture, for n=128. Addition on a neighbor-only architecture is limited to O(n) time when non-neighbor architectures can reach O(log n), demonstrating that physical characteristics of a computing device have an important impact on both real-world running time and asymptotic behavior. Our results will help guide experimental implementations of quantum algorithms and devices.Comment: to appear in PRA 71(5); RevTeX, 12 pages, 12 figures; v2 revision is substantial, with new algorithmic variants, much shorter and clearer text, and revised equation formattin

arXiv.org e-Print Archive

Crossref

Dynamic Reconfigurable on the Lifting Steps Wavelet Packet Processor with Frame-Based Psychoacoustic Optimized Time- Frequency Tiling for Real-Time Audio Applications

Author: Petrovsky Alexander
Petrovsky Alexey
Rodionov Maxim
Publication venue: 'IntechOpen'
Publication date: 16/01/2013
Field of study

IntechOpen

Crossref