Multi-operand Decimal Adder Trees for FPGAs

Author: de Dinechin Florent
Vazquez Alvaro
Publication venue: HAL CCSD
Publication date: 14/10/2010

Research and development of hardware designs for decimal arithmetic is currently very active. For the most part, the methods proposed to implement fixed- and floating-point operations are intended for ASIC designs, so a direct mapping or adaptation of these techniques onto an FPGA could be far from optimal. Only a few studies have considered new methods more suitable for FPGA implementations. A basic operation that has not received enough attention in this context is multi-operand BCD addition, which is of interest for low-latency implementations of decimal fixed- and floating-point multipliers and decimal fused multiply-add units. We have explored the most representative proposals for multi-operand BCD addition and found that the resulting FPGA implementations are still very inefficient in both area and latency compared to their binary counterparts. In this paper we present a new method for fast and efficient implementation of multi-operand BCD addition in current FPGA devices. In particular, our proposal maps well onto the slice structure of the Xilinx Virtex-5/Virtex-6 families and is highly pipelineable. Synthesis results for a Virtex-6 device indicate that our implementations halve the area and latency of previous proposals, with area and delay figures close to those of optimal binary adder trees.
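As a rough illustration of what multi-operand BCD addition computes, the following Python sketch sums several decimal operands column by column and propagates decimal carries. It is a software model only, not the FPGA method of the paper; the function names and the fixed operand width are assumptions made for this example.

```python
def to_bcd_digits(n, width):
    """Return the decimal digits of n, least significant first, padded to width."""
    return [(n // 10**i) % 10 for i in range(width)]


def bcd_multi_operand_add(operands, width):
    """Add several BCD operands column by column, propagating decimal carries."""
    digit_rows = [to_bcd_digits(x, width) for x in operands]
    result = []
    carry = 0
    for col in range(width):
        # Sum one digit column of all operands plus the incoming carry
        column_sum = carry + sum(row[col] for row in digit_rows)
        result.append(column_sum % 10)
        carry = column_sum // 10
    # Any remaining carry extends the result beyond the operand width
    while carry:
        result.append(carry % 10)
        carry //= 10
    return result


def digits_to_int(digits):
    """Convert a least-significant-first digit list back to an integer."""
    return sum(d * 10**i for i, d in enumerate(digits))
```

With many operands the column sums exceed 9 by a wide margin, which is why hardware designs reduce the columns in a tree rather than with a chain of two-operand decimal adders.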

HAL-ENS-LYON

INRIA a CCSD electronic archive server

Hal-Diderot

HAL-Rennes 1

HIGH-SPEED CO-PROCESSORS BASED ON REDUNDANT NUMBER SYSTEMS

Author: Kaivani Amir
Publication venue: 'University of Saskatchewan Library'

There is a growing demand for high-speed arithmetic co-processors for use in applications with computationally intensive tasks. For instance, Fast Fourier Transform (FFT) co-processors are used in real-time multimedia services and financial applications use decimal co-processors to perform large amounts of decimal computations. Using redundant number systems to eliminate word-wide carry propagation within interim operations is a well-known technique to increase the speed of arithmetic hardware units. Redundant number systems are mostly useful in applications where many consecutive arithmetic operations are performed prior to the final result, making it advantageous for arithmetic co-processors. This thesis discusses the implementation of two popular arithmetic co-processors based on redundant number systems: namely, the binary FFT co-processor and the decimal arithmetic co-processor. FFT co-processors consist of several consecutive multipliers and adders over complex numbers. FFT architectures are implemented based on fixed-point and floating-point arithmetic. The main advantage of floating-point over fixed-point arithmetic is the wide dynamic range it introduces. Moreover, it avoids numerical issues such as scaling and overflow/underflow concerns at the expense of higher cost. Furthermore, floating-point implementation allows for an FFT co-processor to collaborate with general purpose processors. This offloads computationally intensive tasks from the primary processor. The first part of this thesis, which is devoted to FFT co-processors, proposes a new FFT architecture that uses a new Binary-Signed Digit (BSD) carry-limited adder, a new floating-point BSD multiplier and a new floating-point BSD three-operand adder. Finally, a new unit labeled as Fused-Dot-Product-Add (FDPA) is designed to compute AB+CD+E over floating-point BSD operands. The second part of the thesis discusses decimal arithmetic operations implemented in hardware using redundant number systems. 
These operations are widely used in decimal floating-point co-processors. A new signed-digit decimal adder is proposed, along with a sequential decimal multiplier that uses redundant number systems to increase its operating frequency. New redundant decimal division and square-root units are also proposed. The architectures proposed in this thesis were all implemented in a hardware description language (Verilog) and synthesized using Synopsys Design Compiler. The evaluation results demonstrate the speed improvement of the new arithmetic units over previous pertinent works: the FFT and decimal co-processors designed in this thesis run at least 10% faster than previous designs. These architectures are meant to fulfill the demand for the high-speed co-processors required in various applications such as multimedia services and financial computations.
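The idea of avoiding word-wide carry propagation in interim operations can be illustrated with carry-save addition, one of the simplest redundant representations. This Python sketch is a generic textbook example and not the binary signed-digit or decimal signed-digit designs of the thesis; the function names are assumptions made for this example.

```python
def carry_save_add(a, b, c):
    """3:2 compressor: reduce three non-negative integers to a (sum, carry) pair.

    Each bit position is handled independently, so no carry ripples
    across the word. The pair satisfies sum + carry == a + b + c.
    """
    partial_sum = a ^ b ^ c
    carry = ((a & b) | (a & c) | (b & c)) << 1
    return partial_sum, carry


def sum_operands(operands):
    """Accumulate operands in redundant (sum, carry) form, resolving carries once."""
    s, c = 0, 0
    for x in operands:
        s, c = carry_save_add(s, c, x)
    # The only carry-propagating addition happens here, at the very end
    return s + c
```

The running total stays in redundant form throughout the loop, which matches the abstract's point that such representations pay off when many consecutive operations precede the final result.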

eCommons@USASK

University of Saskatchewan Research Archive

Analysis and implementation of decimal arithmetic hardware in nanometer CMOS technology

Author: Castellanos Ivan Dario
Publication date: 01/07/2008

Scope and Method of Study: In today's society, decimal arithmetic is growing considerably in importance given its relevance in financial and commercial applications. Decimal calculations on binary hardware significantly impact performance, mainly because most systems use software to emulate decimal calculations. The introduction of dedicated decimal hardware, on the other hand, promises to improve performance by two or three orders of magnitude. The building blocks of binary arithmetic are studied and applied to the development of decimal arithmetic hardware. New findings are contrasted with existing implementations and validated through extensive simulation. Findings and Conclusions: New architectures and a significant study of decimal arithmetic were developed and implemented. The architectures proposed include a floating-point comparator compliant with the current IEEE-754 revision draft, a study on decimal division, partial product reduction schemes using decimal compressor trees, and a final implementation of a decimal multiplier using advanced techniques for partial product generation. The results of each hardware implementation in nanometer technologies are weighed against existing proposals and show improvements in area, delay, and power.

SHAREOK repository

Decimal Floating-point Fused Multiply Add with Redundant Number Systems

Author: Han Liu
Publication venue: 'University of Saskatchewan Library'

The IEEE standard for decimal floating-point arithmetic was officially released in 2008. The new decimal floating-point (DFP) format and arithmetic can be applied to remedy the conversion error caused by representing decimal floating-point numbers in binary floating-point format and to improve the performance of decimal processing in commercial and financial applications. Many architectures and algorithms for individual arithmetic functions on decimal floating-point numbers have been proposed and investigated (e.g., addition, multiplication, division, and square root). However, because decimal numbers are represented less efficiently on binary devices, the area and performance of DFP arithmetic units are not comparable with those of their binary counterparts. IBM proposed a binary fused multiply-add (FMA) function in the POWER series of processors in order to improve the performance of floating-point computations and to reduce the complexity of hardware design in reduced instruction set computing (RISC) systems. Such an instruction has also been proven suitable for efficiently implementing not only stand-alone addition and multiplication, but also division, square root, and other transcendental functions. Additionally, unconventional number systems, including digit sets and encodings, have shown advantages in performance and area efficiency in many applications of computer arithmetic. In this research, by analyzing typical binary floating-point FMA designs and the design strategy of unconventional number systems, "a high performance decimal floating-point fused multiply-add (DFMA) with redundant internal encodings" was proposed. First, the fixed-point components inside the DFMA (i.e., addition and multiplication) were studied as the basis of the FMA architecture. The specific number systems were also applied to improve the basic decimal fixed-point arithmetic.
The superiority of redundant number systems in stand-alone decimal fixed-point addition and multiplication has been demonstrated by the synthesis results. Afterwards, a new DFMA architecture that exploits the specific redundant internal operands was proposed. Overall, the specific number system improved not only the efficiency of the fixed-point addition and multiplication inside the FMA, but also the architecture and algorithms used to build the FMA itself. Division, square root, reciprocal, reciprocal square root, and many other functions that rely on Newton's method or similar methods can benefit from the proposed DFMA architecture. With a few on-chip memory devices (e.g., look-up tables), or even only software routines, these functions can be implemented on top of the hardwired FMA function. Therefore, the proposed DFMA can be implemented on chip alone as a key component to reduce hardware cost. Additionally, our research on decimal arithmetic with unconventional number systems broadens the ways of performing other high-performance decimal arithmetic (e.g., stand-alone division and square root) on basic binary devices (i.e., AND gates, OR gates, and binary full adders). The proposed techniques are also expected to be helpful in other non-binary applications.
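To show how a reciprocal can be built from fused multiply-add steps with Newton's method, here is a minimal software sketch using Python's standard `decimal` module, whose `Decimal.fma` method computes `self*other + third` with a single rounding. It illustrates the general technique only, not the proposed DFMA hardware; the function name, the initial guess, and the iteration count are assumptions made for this example, and the default 28-digit context is assumed.

```python
from decimal import Decimal


def reciprocal(d, initial_guess, iterations=6):
    """Approximate 1/d by Newton's iteration, using only fused multiply-adds.

    Each step computes the residual e = 1 - d*x and then the update
    x = x + x*e, so the error is roughly squared per iteration.
    The initial guess must be close enough to 1/d for convergence.
    """
    d = Decimal(d)
    x = Decimal(initial_guess)
    one = Decimal(1)
    for _ in range(iterations):
        e = (-d).fma(x, one)  # e = 1 - d*x, rounded once
        x = x.fma(e, x)       # x = x + x*e, rounded once
    return x
```

For example, `reciprocal(3, "0.3")` starts with a residual of 0.1, which shrinks to 0.01, 0.0001, and so on until it falls below the working precision. A hardware unit would typically take the initial guess from a small look-up table, as the abstract suggests.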

eCommons@USASK

University of Saskatchewan Research Archive

Design and synthesis of reversible logic

Author: Chua Shin Cheng
Publication venue: Curtin University
Publication date: 01/01/2016

Energy loss during computation is an important issue in digital design. Today, all electronic devices suffer from energy loss caused by the conventional logic system they use. The energy lost in the form of heat poses immense challenges for present-day circuit design. Reversible logic was invented to overcome this. Since the properties of reversible logic differ greatly from those of conventional logic, synthesis methods developed for conventional logic cannot be used for reversible logic. In this dissertation, we propose new synthesis algorithms and several circuit designs using reversible logic.
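The key property that separates reversible from conventional logic is that a reversible gate maps inputs to outputs one-to-one, so no information is discarded. The sketch below checks this for the Toffoli gate, a standard textbook example chosen here for illustration; the dissertation's own gates and synthesis algorithms are not shown, and the helper names are assumptions.

```python
from itertools import product


def toffoli(a, b, c):
    """Toffoli (CCNOT) gate: flip c only when both controls a and b are 1."""
    return a, b, c ^ (a & b)


def and_padded(a, b, c):
    """Conventional AND forced into three outputs; it discards input information."""
    return a & b, 0, 0


def is_reversible(gate, n_bits):
    """A gate is reversible if distinct inputs always yield distinct outputs."""
    inputs = list(product((0, 1), repeat=n_bits))
    outputs = {gate(*bits) for bits in inputs}
    return len(outputs) == len(inputs)
```

Here `is_reversible(toffoli, 3)` holds while `is_reversible(and_padded, 3)` does not, and applying `toffoli` twice returns the original inputs, since the gate is its own inverse.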

espace@Curtin

QUANTUM COMPUTING AND HPC TECHNIQUES FOR SOLVING MICRORHEOLOGY AND DIMENSIONALITY REDUCTION PROBLEMS

Author: Orts Gómez Francisco José
Publication date: 23/09/2021

Doctoral thesis in its public exhibition period. Doctorate in Computer Science (RD99/11) (8908

Repositorio Institucional de la Universidad de Almería (Spain)

Modular decomposition techniques for stored-logic digital filters

Author: Mohamed A. Bin Nun (7204172)
Publication date: 01/01/1977

Digital filtering is an important signal processing technique whose theory is now well established. At present, however, there are no well-defined and systematic methods available for realising digital filters in hardware. This project aims to develop such methods which are general and technology independent, and adopts a systems and sub-systems design philosophy. The realisation problem is approached in a new way using concepts from finite-automata theory and implementing complete digital filter sections as stored-logic units. Two methods are introduced and developed. [Continues.]

Loughborough University Institutional Repository

An instrumentation system for the measurement and display of the dynamic force distribution under the foot during locomotion

Author: Solomon Edward Gerald
Publication venue: Department of Electrical Engineering
Publication date: 01/01/1978

Bibliography: pages 81-83. The clinical assessment of the weight-bearing foot during locomotion is normally based on subjective judgement rather than on quantitative measurement. The techniques which have been proposed for recording the dynamic forces acting on the foot are either too complex for clinical practice or make it difficult to relate the measured force distribution to the physical surface of the foot. The system that has been developed measures the vertical foot/ground forces during gait and immediately displays the data in a manner which can be readily assimilated. The instrumentation system consists of a segmented force plate constructed from 16 transparent beams mounted so that the total load as well as the centre of pressure on any beam can be ascertained. When the foot contacts the plate, its plantar surface is photographed through the transparent force plate by a television camera while a second television camera photographs a lateral aspect of the legs and feet. A composite video display is then generated consisting of (i) a lateral view of the legs and feet, (ii) a view of the plantar surface of the planted foot with centre of pressure lines superimposed, and (iii) a bar chart display of the load carried by each beam. The system output is recorded on a video tape recorder which has a stop-motion facility. This enables a frame-by-frame analysis to be made subsequently and selected stills to be photographed as a permanent record. Three series of photographs are presented which clearly show the differences between normal and abnormal gait.

Cape Town University OpenUCT