Search CORE

106 research outputs found

Multi-operand Decimal Adder Trees for FPGAs

Author: de Dinechin Florent
Vazquez Alvaro
Publication venue: HAL CCSD
Publication date: 14/10/2010
Field of study

The research and development of hardware designs for decimal arithmetic is currently going under an intense activity. For most part, the methods proposed to implement fixed and floating point operations are intended for ASIC designs. Thus, a direct mapping or adaptation of these techniques into a FPGA could be far from an optimal solution. Only a few studies have considered new methods more suitable for FPGA implementations. A basic operation that has not received enough attention in this context is multi-operand BCD addition. For example, it is of interest for low latency implementations of decimal fixed and floating point multipliers and decimal fused multiply-add units. We have explored the most representative proposals for multi-operand BCD addition and found that the resultant implementations in FPGAs are still very inefficient in terms of both area and latency when compared to their binary counterparts. In this paper we present a new method for fast and efficient implementation of multi-operand BCD addition in current FPGA devices. In particular, our proposal maps quite well into the slice structure of the Xilinx Virtex-5/Virtex-6 families and it is highly pipelineable. The synthesis results for a Virtex-6 device indicate that our implementations halve the area and latency of previous proposals, presenting area and delay figures close to those of optimal binary adder trees.La recherche sur l'implantation en matériel de l'arithmétique décimale est actuellement très active, la plupart des travaux portant sur des opérateurs pour les processeurs, en virgule fixe ou flottante. Mais les techniques développées pour un circuit intégré n'aboutissent pas forcément à une implémentation optimale dans un FPGA. Il n'y a que peu d'études ciblant explicitement les FPGA. Cet article s'intéresse dans ce contexte, à l'addition BCD multi-opérande, au cœur de multiplieurs et de multiplieurs-accumulateurs à faible latence. Nous étudions les architectures proposées pour cette opération décimale, et nous observons que, sur FPGA, leur performance (surface et latence) est très inférieure à celle des opérations binaire à précision comparable. Nous présentons donc dans cet article une nouvelle technique d'addition BCD multi-opérandes qui s'avère plus efficace que les propositions précédentes sur les FPGA actuels. Elle s'adapte particulièrement bien à la structure fine des FPGA Xilinx Virtex-5/Virtex-6, et se prête bien au pipeline. Les résultats de synthèse montrent que notre implémentation divise par deux la surface et la latence par rapport aux propositions précédentes, les ramenant à des valeurs comparables à celles des meilleurs additionneurs multi-opérandes binaires

HAL-ENS-LYON

INRIA a CCSD electronic archive server

Hal-Diderot

BCD Multiplier

Author: Abulinain Ahmed Kamal Saad Fathallah
Publication venue: 'Whiting & Birch, Ltd.'
Publication date: 01/01/2015
Field of study

BCD multipliers are the basis of accurate decimal multiplication used in banking systems, scientific calculations, etc. Fractions convert poorly into binary numbers giving rise to conversion error. Therefore, banking industry have been using Binary Coded Decimal numbering system for their banking business transaction to circumvent the error between decimal fraction number to binary. Here we will explore some single-digit Binary Coded Decimal Multiplication units that perform multiplication in hardware for the purpose of future implementation

UTPedia

FPGA Implementation of Double Precision Floating Point Multiplier

Author: Abdullah Mohd.
Chourasia Bharti
Publication venue: 'Auricle Technologies, Pvt., Ltd.'
Publication date: 31/12/2022
Field of study

High speed computation is the need of today’s generation of Processors.  To accomplish this major task,  many functions  are implemented  inside the hardware  of the processor rather than  having  software  computing  the  same  task. Majority of the operations which the processor executes are Arithmetic operations which are widely used in many applications that require heavy mathematical operations such as scientific calculations, image and signal processing. Especially in the field of signal processing, multiplication division operation is widely used in many applications. The major issue with these operations in hardware is that much iteration is required which results in slow operation while fast algorithms require complex computations within each cycle. The result of a Division operation results in a either  in Quotient  and  Remainder  or a Floating  point  number  which is the  major reason  to  make it  more complex than  Multiplication  operation

International Journal on Recent and Innovation Trends in Computing and Communication

BCD Multiplier

Author: Abulinain Ahmed Kamal Saad Fathallah
Publication venue: 'Whiting & Birch, Ltd.'
Publication date: 01/01/2015
Field of study

UTPedia

Evolution, Revolution and Convolution Recent Progress in Field-Programmable Logic

Author: Alfke P
Publication venue: CERN
Publication date: 01/01/2001
Field of study

CERN Document Server

Diseño e implementación de operaciones aritméticas en punto flotante decimal según el estándar IEEE 754-2008

Author: Minchola Carlos
Publication venue
Publication date: 01/01/2010
Field of study

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Biblos-e Archivo

Algorithms and architectures for decimal transcendental function computation

Author: Chen Dongdong
Publication venue: 'University of Saskatchewan Library'
Publication date: 01/01/2011
Field of study

Nowadays, there are many commercial demands for decimal floating-point (DFP) arithmetic operations such as financial analysis, tax calculation, currency conversion, Internet based applications, and e-commerce. This trend gives rise to further development on DFP arithmetic units which can perform accurate computations with exact decimal operands. Due to the significance of DFP arithmetic, the IEEE 754-2008 standard for floating-point arithmetic includes it in its specifications. The basic decimal arithmetic unit, such as decimal adder, subtracter, multiplier, divider or square-root unit, as a main part of a decimal microprocessor, is attracting more and more researchers' attentions. Recently, the decimal-encoded formats and DFP arithmetic units have been implemented in IBM's system z900, POWER6, and z10 microprocessors. Increasing chip densities and transistor count provide more room for designers to add more essential functions on application domains into upcoming microprocessors. Decimal transcendental functions, such as DFP logarithm, antilogarithm, exponential, reciprocal and trigonometric, etc, as useful arithmetic operations in many areas of science and engineering, has been specified as the recommended arithmetic in the IEEE 754-2008 standard. Thus, virtually all the computing systems that are compliant with the IEEE 754-2008 standard could include a DFP mathematical library providing transcendental function computation. Based on the development of basic decimal arithmetic units, more complex DFP transcendental arithmetic will be the next building blocks in microprocessors. In this dissertation, we researched and developed several new decimal algorithms and architectures for the DFP transcendental function computation. These designs are composed of several different methods: 1) the decimal transcendental function computation based on the table-based first-order polynomial approximation method; 2) DFP logarithmic and antilogarithmic converters based on the decimal digit-recurrence algorithm with selection by rounding; 3) a decimal reciprocal unit using the efficient table look-up based on Newton-Raphson iterations; and 4) a first radix-100 division unit based on the non-restoring algorithm with pre-scaling method. Most decimal algorithms and architectures for the DFP transcendental function computation developed in this dissertation have been the first attempt to analyze and implement the DFP transcendental arithmetic in order to achieve faithful results of DFP operands, specified in IEEE 754-2008. To help researchers evaluate the hardware performance of DFP transcendental arithmetic units, the proposed architectures based on the different methods are modeled, verified and synthesized using FPGAs or with CMOS standard cells libraries in ASIC. Some of implementation results are compared with those of the binary radix-16 logarithmic and exponential converters; recent developed high performance decimal CORDIC based architecture; and Intel's DFP transcendental function computation software library. The comparison results show that the proposed architectures have significant speed-up in contrast to the above designs in terms of the latency. The algorithms and architectures developed in this dissertation provide a useful starting point for future hardware-oriented DFP transcendental function computation researches

eCommons@USASK

University of Saskatchewan Research Archive

FPGA Implementation of Fast Binary Multiplication Based on Customized Basic Cells

Author: Abd Al-Rahman Al-Nounou
Fadi Obeidat
Mohammad Al-Khaleel
Osama Al-Khaleel
Publication venue: 'Pensoft Publishers'
Publication date: 01/01/2022
Field of study

Multiplication is considered one of the most time-consuming and a key operation in wide variety of embedded applications. Speeding up this operation has a significant impact on the overall performance of these applications. A vast number of multiplication approaches are found in the literature where the goal is always to achieve a higher performance. One of these approaches relies on using smaller multiplier blocks which are built based on direct Boolean algebra equations to build large multipliers. In this work, we present a methodology for designing binary multipliers where different sizes customized partial products generation (CPPG) cells are designed and used as smaller building blocks. The sizes of the designed CPPG cells are 2&times;2, 3&times;3, 4&times;4, 5&times;5, and 6&times;6. We use these cells to build 8&times;8, 16&times;16, 32&times;32, 64&times;64, and 128&times;128 binary multipliers. All of the CPPG cells and the binary multipliers are described using the VHDL language, tested, and implemented using XILINX ISE 14.6 tools targeting different FPGA families. The implementation results show that the best performance is achieved when cell 3&times;3 is used and Virtex-7 FPGA is targeted. The binary multipliers that are designed using the proposed CPPG cells achieve better performance when compared with the binary multipliers presented in the literature. As an application that utilizes the proposed multiplier, a Multiply-Accumulate (MAC) unit is designed and implemented in Spartan-3E. The implementation results of the MAC unit demonstrate the effectiveness of the proposed multiplier

ZENODO

Directory of Open Access Journals

NEUROSURGERY ENTHUSIASTIC WOMEN SOCIETY

ARPHA OAI-PMH Endpoint

ARPHA Preprints

Implementation of Separable & Steerable Gaussian Smoothers on an FPGA

Author: Joginipelly Arjun
Publication venue: ScholarWorks@UNO
Publication date: 17/12/2010
Field of study

Smoothing filters have been extensively used for noise removal and image restoration. Directional filters are widely used in computer vision and image processing tasks such as motion analysis, edge detection, line parameter estimation and texture analysis. It is practically impossible to tune the filters to all possible positions and orientations in real time due to huge computation requirement. The efficient way is to design a few basis filters, and express the output of a directional filter as a weighted sum of the basis filter outputs. Directional filters having these properties are called Steerable Filters. This thesis work emphasis is on the implementation of proposed computationally efficient separable and steerable Gaussian smoothers on a Xilinx VirtexII Pro FPGA platform. FPGAs are Field Programmable Gate Arrays which consist of a collection of logic blocks including lookup tables, flip flops and some amount of Random Access Memory. All blocks are wired together using an array of interconnects. The proposed technique [2] is implemented on a FPGA hardware taking the advantage of parallelism and pipelining

University of New Orleans

DESIGN OF ON-LINE DECIMAL MULTIPLIER

Author
Publication venue
Publication date
Field of study