Search CORE

9 research outputs found

Hardware division by small integer constants

Author: de Dinechin Florent
Didier Laurent-Stéphane
Gener Yılmaz Serhan
Gören Sezer
Ugurdag Fatih
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 24/11/2016
Field of study

International audienceThis article studies the design of custom circuits for division by a small positive constant. Such circuits can be useful to specific FPGA and ASIC applications. The first problem studied is the Euclidean division of an unsigned integer by a constant, computing a quotient and a remainder. Several new solutions are proposed and compared against the state of the art. As the proposed solutions use small look-up tables, they match well the hardware resources of an FPGA. The article then studies whether the division by the product of two constants is better implemented as two successive dividers or as one atomic divider. It also considers the case when only a quotient or only a remainder are needed. Finally, it addresses the correct rounding of the division of a floating-point number by a small integer constant. All these solutions, and the previous state of the art, are compared in terms of timing, area, and area-timing product. In general, the relevance domains of the various techniques are very different on FPGA and on ASIC

INRIA a CCSD electronic archive server

Hardware division by small integer constants

Author: Boudon Vincent
Demaison Jean
Margulès Laurent
Merke Ilona
Perrin Agnès
ROTGER Maud
Willner Helge
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2009
Field of study

HAL-uB

Crossref

eResearch@Ozyegin

INRIA a CCSD electronic archive server

HAL-INSU

Publikationsserver der RWTH Aachen University

Hal-Diderot

HAL - UPEC / UPEM

Integer Division by Constants: Optimal Bounds

Author: Bartlett Colin
Kaser Owen
Lemire Daniel
Publication venue: 'Elsevier BV'
Publication date: 01/06/2021
Field of study

The integer division of a numerator n by a divisor d gives a quotient q and a remainder r. Optimizing compilers accelerate software by replacing the division of n by d with the division of c * n (or c * n + c) by m for convenient integers c and m chosen so that they approximate the reciprocal: c/m ~= 1/d. Such techniques are especially advantageous when m is chosen to be a power of two and when d is a constant so that c and m can be precomputed. The literature contains many bounds on the distance between c/m and the divisor d. Some of these bounds are optimally tight, while others are not. We present optimally tight bounds for quotient and remainder computations

arXiv.org e-Print Archive

R-libre

Directory of Open Access Journals

PubMed Central

Reflections on 10 years of FloPoCo

Author: de Dinechin Florent
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 10/06/2019
Field of study

International audienceThe FloPoCo open-source arithmetic core generator project started modestly in 2008 [1], with a few parametric floating point cores. It has since then evolved to become a framework for research on hardware arithmetic cores at large, including among others: LNS arithmetic [2], random number generators [3], elementary functions [4]–[9], specialized operators such as constant multiplication and division [10]–[13], various FPGA-specific optimization techniques [14]–[16], and more recently signal-processing transforms and filters [17], [18] (more references can be found on the project’s web site: http://flopoco.gforge.inria.fr/)

Crossref

INRIA a CCSD electronic archive server

Karatsuba with Rectangular Multipliers for FPGAs

Author: de Dinechin Florent
Gustafsson Oscar
Kappauf Johannes
Kumm Martin
Zipf Peter
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2018
Field of study

Best paper awardInternational audienceThis work presents an extension of Karatsuba's method to efficiently use rectangular multipliers as a base for larger multipliers. The rectangular multipliers that motivate this work are the embedded 18×25-bit signed multipliers found in the DSP blocks of recent Xilinx FPGAs: The traditional Karatsuba approach must under-use them as square 18×18 ones. This work shows that rectangular multipliers can be efficiently exploited in a modified Karatsuba method if their input word sizes have a large greatest common divider. In the Xilinx FPGA case, this can be obtained by using the embedded multipliers as 16×24 unsigned and as 17×25 signed ones. The obtained architectures are implemented with due detail to architectural features such as the pre-adders and post-adders available in Xilinx DSP blocks. They are synthesized and compared with traditional Karatsuba, but also with (non-Karatsuba) state-of-the-art tiling techniques that make use of the full rectangular multipliers. The proposed technique improves resource consumption and performance for multipliers of numbers larger than 64 bits

Publikationer från Linköpings universitet

Crossref

INRIA a CCSD electronic archive server

Digitala Vetenskapliga Arkivet - Academic Archive On-line

Optimisations arithmétiques et synthèse de haut niveau

Author: Uguen Yohann
Publication venue: HAL CCSD
Publication date: 13/11/2019
Field of study

High-level synthesis (HLS) tools offer increased productivity regarding FPGA programming.However, due to their relatively young nature, they still lack many arithmetic optimizations.This thesis proposes safe arithmetic optimizations that should always be applied.These optimizations are simple operator specializations, following the C semantic.Other require to a lift the semantic embedded in high-level input program languages, which are inherited from software programming, for an improved accuracy/cost/performance ratio.To demonstrate this claim, the sum-of-product of floating-point numbers is used as a case study. The sum is performed on a fixed-point format, which is tailored to the application, according to the context in which the operator is instantiated.In some cases, there is not enough information about the input data to tailor the fixed-point accumulator.The fall-back strategy used in this thesis is to generate an accumulator covering the entire floating-point range.This thesis explores different strategies for implementing such a large accumulator, including new ones.The use of a 2's complement representation instead of a sign+magnitude is demonstrated to save resources and to reduce the accumulation loop delay.Based on a tapered precision scheme and an exact accumulator, the posit number systems claims to be a candidate to replace the IEEE floating-point format.A throughout analysis of posit operators is performed, using the same level of hardware optimization as state-of-the-art floating-point operators.Their cost remains much higher that their floating-point counterparts in terms of resource usage and performance. Finally, this thesis presents a compatibility layer for HLS tools that allows one code to be deployed on multiple tools.This library implements a strongly typed custom size integer type along side a set of optimized custom operators.À cause de la nature relativement jeune des outils de synthèse de haut-niveau (HLS), de nombreuses optimisations arithmétiques n'y sont pas encore implémentées. Cette thèse propose des optimisations arithmétiques se servant du contexte spécifique dans lequel les opérateurs sont instanciés.Certaines optimisations sont de simples spécialisations d'opérateurs, respectant la sémantique du C.D'autres nécéssitent de s'éloigner de cette sémantique pour améliorer le compromis précision/coût/performance.Cette proposition est démontré sur des sommes de produits de nombres flottants.La somme est réalisée dans un format en virgule-fixe défini par son contexte.Quand trop peu d’informations sont disponibles pour définir ce format en virgule-fixe, une stratégie est de générer un accumulateur couvrant l'intégralité du format flottant.Cette thèse explore plusieurs implémentations d'un tel accumulateur.L'utilisation d'une représentation en complément à deux permet de réduire le chemin critique de la boucle d'accumulation, ainsi que la quantité de ressources utilisées. Un format alternatif aux nombres flottants, appelé posit, propose d'utiliser un encodage à précision variable.De plus, ce format est augmenté par un accumulateur exact.Pour évaluer précisément le coût matériel de ce format, cette thèse présente des architectures d'opérateurs posits, implémentés avec le même degré d'optimisation que celui de l'état de l'art des opérateurs flottants.Une analyse détaillée montre que le coût des opérateurs posits est malgré tout bien plus élevé que celui de leurs équivalents flottants.Enfin, cette thèse présente une couche de compatibilité entre outils de HLS, permettant de viser plusieurs outils avec un seul code. Cette bibliothèque implémente un type d'entiers de taille variable, avec de plus une sémantique strictement typée, ainsi qu'un ensemble d'opérateurs ad-hoc optimisés

Recommended from our members

Formal Verification of Divider and Square-root Arithmetic Circuits Using Computer Algebra Methods

Author: Yasin Atif
Publication venue: ScholarWorks@UMass Amherst
Publication date: 16/07/2020
Field of study

A considerable progress has been made in recent years in verification of arithmetic circuits such as multipliers, fused multiply-adders, multiply-accumulate, and other components of arithmetic datapaths, both in integer and finite field domain. However, the verification of hardware dividers and square-root functions have received only a limited attention from the verification community, with a notable exception for theorem provers and other inductive, non-automated systems. Division, square root, and transcendental functions are all tied to the basic Intel architecture and proving correctness of such algorithms is of grave importance. Although belonging to the same iterative-subtract class of architectures, they widely differ from each other. IEEE floating point standard specifies square-rooting and division as basic arithmetic operation alongside the usual three basic operations. The difficulty of formally verifying hardware implementation of a divider/square-root can be attributed mostly to the modeling of its characteristic function and the high memory complexity required by standard algebraic approach. The work proposed in this thesis discusses formal verification of combinational divider and square-root circuits. Specifically, it addresses the problem of formally verifying gate-level circuits using an algebraic model. In contrast to standard verification approaches using satisfiability (SAT) or equivalence checking, the proposed method verifies whether the gate-level circuit actually performs the intended function or not, without a need for a reference design. Firstly, we present a verification methodology for a constant divider, where the divisor value is fixed to a constant integer. Albeit simpler case of verification, it provides us with the basic understanding of verification techniques and the underlying issues applicable to divider verification. Secondly, a layered verification approach is proposed for the verification of generic array dividers. Finally, the work proposed in this thesis will further analyze the divider and square-root circuits and aim to curb the memory explosion issue experienced by computer algebra based verification methods in order to successfully verify large bit-width divider-type arithmetic circuits. More specifically, a novel idea of hardware rewriting is introduced, which avoids the high memory complexity. The mentioned technique verifies a 256-bit gate-level square-root circuit with around 260,000 gates in just under 18 minutes and 127-bit gate-level divider circuit in under one minute

ScholarWorks@UMass Amherst

Hardware division by small integer constants

Author: Dinechin F. de
Gener Y. S.
Uğurdağ Hasan Fatih
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/12/2017
Field of study

Due to copyright restrictions, the access to the full text of this article is only available via subscription.This article studies the design of custom circuits for division by a small positive constant. Such circuits can be useful for specific FPGA and ASIC applications. The first problem studied is the Euclidean division of an unsigned integer by a constant, computing a quotient and remainder. Several new solutions are proposed and compared against the state-of-the-art. As the proposed solutions use small look-up tables, they match well with the hardware resources of an FPGA. The article then studies whether the division by the product of two constants is better implemented as two successive dividers or as one atomic divider. It also considers the case when only a quotient or only a remainder is needed. Finally, it addresses the correct rounding of the division of a floating-point number by a small integer constant. All these solutions, and the previous state-of-the-art, are compared in terms of timing, area, and area-timing product. In general, the relevance domains of the various techniques are different on FPGA and on ASIC

eResearch@Ozyegin

Hardware Division by Small Integer Constants

Author: Florent de Dinechin
H. Fatih Ugurdag
Laurent-Stephane Didier
Sezer Goren
Y. Serhan Gener
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date
Field of study

Crossref