9 research outputs found

    Multi-operand Decimal Adder Trees for FPGAs

    Research and development of hardware designs for decimal arithmetic is currently very active. For the most part, the methods proposed to implement fixed- and floating-point operations are intended for ASIC designs, so mapping or adapting these techniques directly to an FPGA could be far from optimal. Only a few studies have considered new methods better suited to FPGA implementations. A basic operation that has not received enough attention in this context is multi-operand BCD addition, which is of interest for low-latency implementations of decimal fixed- and floating-point multipliers and decimal fused multiply-add units. We have explored the most representative proposals for multi-operand BCD addition and found that the resulting FPGA implementations are still very inefficient in terms of both area and latency when compared to their binary counterparts. In this paper we present a new method for fast and efficient implementation of multi-operand BCD addition in current FPGA devices. In particular, our proposal maps quite well onto the slice structure of the Xilinx Virtex-5/Virtex-6 families and is highly pipelineable. The synthesis results for a Virtex-6 device indicate that our implementations halve the area and latency of previous proposals, with area and delay figures close to those of optimal binary adder trees.
Research on hardware implementations of decimal arithmetic is currently very active, with most work addressing fixed- or floating-point operators for processors. But techniques developed for an integrated circuit do not necessarily lead to an optimal implementation in an FPGA, and only a few studies explicitly target FPGAs. In this context, this article addresses multi-operand BCD addition, which is at the core of low-latency multipliers and multiply-accumulators. We study the architectures proposed for this decimal operation and observe that, on FPGAs, their performance (area and latency) is far below that of binary operations of comparable precision. We therefore present a new multi-operand BCD addition technique that proves more efficient than previous proposals on current FPGAs. It fits the fine-grained structure of Xilinx Virtex-5/Virtex-6 FPGAs particularly well and lends itself to pipelining. Synthesis results show that our implementation halves area and latency relative to previous proposals, bringing them to values comparable to those of the best binary multi-operand adders.
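
To make the operation concrete, the following is a minimal software sketch of multi-operand BCD addition done column by column: the digits of each decimal column are summed in binary, and the column sum is then split into a result digit and a carry into the next column. It only illustrates what the operation computes; it is not the FPGA mapping proposed in the paper, and the function names are hypothetical.

```python
def to_bcd_digits(n, width):
    """Return the decimal digits of n, least significant first, padded to width."""
    return [(n // 10**i) % 10 for i in range(width)]


def multi_operand_bcd_add(operands, width):
    """Add several non-negative integers digit by digit in base 10.

    Each column of BCD digits is reduced to a binary sum, which is then
    split into one result digit and a carry into the next column.
    """
    columns = [to_bcd_digits(x, width) for x in operands]
    result, carry = [], 0
    for i in range(width):
        total = carry + sum(col[i] for col in columns)
        result.append(total % 10)   # decimal digit kept in this column
        carry = total // 10         # carry may exceed 1 with many operands
    while carry:                    # flush the remaining carry digits
        result.append(carry % 10)
        carry //= 10
    return result


def digits_to_int(digits):
    """Rebuild an integer from digits stored least significant first."""
    return sum(d * 10**i for i, d in enumerate(digits))
```

With four operands the carry out of a column can be as large as 3, which is one reason multi-operand decimal adders need more than the single carry bit of a two-operand BCD adder.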

    A New Family of High-Performance Parallel Decimal Multipliers

    DESIGN OF ON-LINE DECIMAL MULTIPLIER

    DESIGN OF ON-LINE DECIMAL MULTIPLIER

    HIGH-SPEED CO-PROCESSORS BASED ON REDUNDANT NUMBER SYSTEMS

    There is a growing demand for high-speed arithmetic co-processors for use in applications with computationally intensive tasks. For instance, Fast Fourier Transform (FFT) co-processors are used in real-time multimedia services, and financial applications use decimal co-processors to perform large amounts of decimal computation. Using redundant number systems to eliminate word-wide carry propagation within interim operations is a well-known technique for increasing the speed of arithmetic hardware units. Redundant number systems are most useful in applications where many consecutive arithmetic operations are performed before the final result is produced, which makes them advantageous for arithmetic co-processors. This thesis discusses the implementation of two popular arithmetic co-processors based on redundant number systems: the binary FFT co-processor and the decimal arithmetic co-processor. FFT co-processors consist of several consecutive multipliers and adders over complex numbers. FFT architectures are implemented using either fixed-point or floating-point arithmetic. The main advantage of floating-point over fixed-point arithmetic is its wide dynamic range. Moreover, it avoids numerical issues such as scaling and overflow/underflow, at the expense of higher cost. Furthermore, a floating-point implementation allows an FFT co-processor to collaborate with general-purpose processors, offloading computationally intensive tasks from the primary processor. The first part of this thesis, which is devoted to FFT co-processors, proposes a new FFT architecture that uses a new Binary-Signed Digit (BSD) carry-limited adder, a new floating-point BSD multiplier and a new floating-point BSD three-operand adder. Finally, a new unit called Fused-Dot-Product-Add (FDPA) is designed to compute AB+CD+E over floating-point BSD operands.
The second part of the thesis discusses decimal arithmetic operations implemented in hardware using redundant number systems. These operations are commonly used in decimal floating-point co-processors. A new signed-digit decimal adder is proposed, along with a sequential decimal multiplier that uses redundant number systems to increase its operating frequency. New redundant decimal division and square-root units are also proposed. The architectures proposed in this thesis were all implemented in a hardware description language (Verilog) and synthesized using Synopsys Design Compiler. The evaluation results show the speed improvement of the new arithmetic units over previous pertinent works: the FFT and decimal co-processors designed in this thesis run at least 10% faster than those of previous works. These architectures are meant to meet the demand for the high-speed co-processors required in applications such as multimedia services and financial computations.
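
The idea of avoiding word-wide carry propagation during interim operations can be illustrated with carry-save addition, a simple redundant representation in which a value is held as a pair (sum word, carry word). This is a swapped-in textbook technique chosen for brevity, not the BSD adder proposed in the thesis; the function names are hypothetical and the sketch assumes non-negative integers.

```python
def carry_save_add(a, b, c):
    """3:2 compression of three words into a (sum, carry) pair.

    Every bit position is handled independently, so no carry ripples
    across the word. The invariant is a + b + c == sum_word + carry_word.
    """
    sum_word = a ^ b ^ c
    carry_word = ((a & b) | (a & c) | (b & c)) << 1
    return sum_word, carry_word


def accumulate(values):
    """Accumulate many operands while keeping the running total redundant."""
    s, c = 0, 0
    for v in values:
        s, c = carry_save_add(s, c, v)   # constant-depth step per operand
    return s + c                          # one carry-propagate addition at the end
```

Only the final `s + c` needs a full carry-propagate adder, which is why redundant formats pay off when many operations are chained before the result is needed.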

    Analysis and implementation of decimal arithmetic hardware in nanometer CMOS technology

    Scope and Method of Study: Decimal arithmetic is growing considerably in importance given its relevance to financial and commercial applications. Decimal calculations on binary hardware significantly impact performance, mainly because most systems emulate decimal calculations in software. Dedicated decimal hardware, on the other hand, promises to improve performance by two or three orders of magnitude. The building blocks of binary arithmetic are studied and applied to the development of decimal arithmetic hardware. New findings are contrasted with existing implementations and validated through extensive simulation.
Findings and Conclusions: New architectures and a significant study of decimal arithmetic were developed and implemented. The proposed architectures include a floating-point comparator compliant with the current IEEE-754 revision draft, a study of decimal division, partial product reduction schemes using decimal compressor trees, and a final implementation of a decimal multiplier using advanced techniques for partial product generation. The results of each hardware implementation in nanometer technologies are weighed against existing proposals and show improvements in area, delay, and power.

    Decimal Floating-point Fused Multiply Add with Redundant Number Systems

    The IEEE standard for decimal floating-point arithmetic was officially released in 2008. The new decimal floating-point (DFP) format and arithmetic can remedy the conversion error caused by representing decimal floating-point numbers in binary floating-point format and improve the performance of decimal processing in commercial and financial applications. Many architectures and algorithms for individual arithmetic functions on decimal floating-point numbers (e.g., addition, multiplication, division, and square root) have been proposed and investigated. However, because representing decimal numbers on binary devices is less efficient, the area and performance of DFP arithmetic units are not comparable with those of their binary counterparts. IBM introduced a binary fused multiply-add (FMA) function in the POWER series of processors in order to improve the performance of floating-point computations and to reduce the complexity of hardware design in reduced instruction set computing (RISC) systems. Such an instruction has also been proven suitable for efficiently implementing not only stand-alone addition and multiplication, but also division, square root, and transcendental functions. Additionally, unconventional number systems, including digit sets and encodings, have shown advantages in performance and area efficiency in many applications of computer arithmetic. In this research, by analyzing typical binary floating-point FMA designs and the design strategy of unconventional number systems, "a high performance decimal floating-point fused multiply-add (DFMA) with redundant internal encodings" was proposed. First, the fixed-point components inside the DFMA (i.e., addition and multiplication) were studied as the basis of the FMA architecture. Specific number systems were also applied to improve the basic decimal fixed-point arithmetic.
The synthesis results demonstrate the superiority of redundant number systems in stand-alone decimal fixed-point addition and multiplication. Afterwards, a new DFMA architecture that exploits the specific redundant internal operands was proposed. Overall, the specific number system improved not only the efficiency of the fixed-point addition and multiplication inside the FMA, but also the architecture and algorithms used to build the FMA itself. Division, square root, reciprocal, reciprocal square root, and many other functions that rely on Newton's method or similar methods can benefit from the proposed DFMA architecture. With a few on-chip memory devices (e.g., look-up tables), or even with software routines alone, these functions can be implemented on top of the hardwired FMA function. Therefore, the proposed DFMA can be implemented on chip on its own as a key component to reduce hardware cost. Additionally, our research on decimal arithmetic with unconventional number systems broadens the ways of performing other high-performance decimal arithmetic (e.g., stand-alone division and square root) on basic binary devices (i.e., AND gates, OR gates, and binary full adders). The proposed techniques are also expected to be helpful in other non-binary applications.
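
The claim that division and reciprocal can be built on a fused multiply-add can be sketched in software with Python's `decimal` module, whose `Decimal.fma` method computes `self*other + third` with a single rounding. The Newton-Raphson recurrence used here is x_{k+1} = x_k + x_k(1 - d x_k). This is only an illustration of the principle, not the DFMA architecture of the thesis; the function name, the 16-digit precision, and the way the initial guess is supplied are assumptions (in hardware the seed would typically come from a small look-up table).

```python
from decimal import Decimal, getcontext

getcontext().prec = 16  # assumption: 16 significant digits, as in decimal64


def reciprocal_newton(d, x0, iterations=5):
    """Approximate 1/d using only fused multiply-add steps.

    x0 is an initial guess for 1/d (hypothetical seed value).
    Each iteration roughly doubles the number of correct digits.
    """
    one = Decimal(1)
    x = x0
    for _ in range(iterations):
        e = (-d).fma(x, one)   # e = 1 - d*x, rounded once
        x = x.fma(e, x)        # x = x + x*e, rounded once
    return x


# Division as multiplication by the reciprocal: 7 / 3
quotient = Decimal(7) * reciprocal_newton(Decimal(3), Decimal("0.3"))
```

The final multiplication introduces one more rounding, so a correctly rounded quotient would need an extra correction step that this sketch leaves out.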

    Variable radix online decimal arithmetic

    This thesis studies the combination of decimal arithmetic and online arithmetic to obtain an online system that operates on decimal digits using the RBCD encoding and meets the requirements of both arithmetics. The main contributions of this thesis are the following. An online decimal adder (olDFA) that adds two RBCD numbers using a minimum-latency decomposition method. A 3-stage pipelined version of the olDFA (olDFAp) is also presented to reduce computation time. Based on a study of how the data stream is processed, a solution is proposed for obtaining the maximum theoretical throughput of an online adder. Finally, the simulation results of the two proposed designs are compared statistically with parallel adders that use the RBCD encoding. An online multi-operand decimal adder, defining a method for building trees with olDFAs and olDFAps as base elements. We also present analytical expressions for the architectures, which are useful in preliminary studies of the systems to be designed. The designs presented are compared according to specific simulation criteria, studying the resulting delay and area. Two strategies for designing online multi-operand, multi-format decimal adders. The first strategy uses an olDFA with a conversion stage, and the second designs the multi-format adder specifically by modifying the internal architecture of the olDFA. Both strategies are compared in area and delay under the same simulation criteria. An online decimal multiplier that follows the online decimal multiplication algorithm using the RBCD encoding, which is based on a recurrent accumulated residue and produces the product digits starting from the most significant digit (MSD). The residue generated in each cycle is used in later cycles to compensate for the error caused by the lack of data that is characteristic of online arithmetic. Two architectures are presented to compare two different ways of implementing the algorithm, one of them using speculation. A 16x16-digit RTL online decimal multiplier has also been designed and inserted into an online system. For comparison, we have also implemented the same system with our online multiplier replaced by a fast parallel decimal multiplier. An online decimal divider that follows an algorithm based on a residue used to compensate for the error caused by the absence of all the input data in each cycle (characteristic of online arithmetic). To this end, a correction module is implemented that performs the vector-by-digit multiplication of the digits obtained in previous cycles by the accumulated residue. The algorithm is based on the strategy of splitting the quotient value q into two variables, qH and qL. A selection function, based on selection constants, is also implemented to obtain each of the quotient variables. The contributions listed above have been published in international conferences [10, 12] and in journals [11] indexed by the ISI Journal Citation Reports (JCR).
The rest of the thesis is structured as follows. Chapter 2 presents the fundamentals of the online decimal adder (olDFA), using the RBCD encoding and based on decomposition of the inputs. A 3-stage pipelined version of the olDFA (olDFAp) is presented to reduce computation time, and the processing of the data stream is optimized. Chapter 3 defines a method for building online multi-operand decimal adder trees and presents analytical expressions for the architectures, which are useful for preliminary studies of the systems to be designed. Two designs for online multi-format decimal addition are also proposed, one with a conversion stage and the other modifying the internal architecture of the olDFA. Both designs are also presented with a study of their pipelined versions. Chapter 4 presents an algorithm for online decimal multiplication based on residue recurrence and describes the design of two architectures. The first is an online decimal multiplication architecture without speculation, and the second uses speculation to reduce the high computation cost of each cycle. The two architectures are compared according to a set of simulation parameters.
Chapter 5 presents the algorithm for online decimal division using the RBCD encoding. The algorithm is based on the strategy of splitting the quotient value q into two variables, qH and qL. Because not all the data is available, the algorithm generates an error that is compensated by the correction module. Finally, the delay and area results obtained by simulating the implementation of the designed algorithm are presented. [...] results obtained are consistent with the theoretical study.
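
Online arithmetic produces result digits most significant digit first, which requires a redundant digit set so that no carry has to travel from the least significant end. The sketch below shows the principle with a textbook carry-free signed-digit addition in radix 10. It is not the olDFA described in the thesis: the digit range [-7, 7] is an assumption about the RBCD digit set, operands are treated as integers rather than fractions, and the function names are hypothetical.

```python
def split(p):
    """Write a position sum p as 10*t + w with transfer t in {-1, 0, 1} and |w| <= 6."""
    if p >= 7:
        return 1, p - 10
    if p <= -7:
        return -1, p + 10
    return 0, p


def sd_decimal_add_msd_first(x, y):
    """Add two radix-10 signed-digit numbers given most significant digit first.

    Input digits are assumed to lie in [-7, 7]. Each output digit depends
    only on the inputs at its own position and at the next less significant
    position, so it can be emitted after a one-digit delay.
    """
    assert len(x) == len(y)
    out = []
    prev_w = 0                      # interim digit waiting for its transfer
    for xi, yi in zip(x, y):
        t, w = split(xi + yi)
        out.append(prev_w + t)      # finalise the digit one position to the left
        prev_w = w
    out.append(prev_w)              # least significant digit receives no transfer
    return out


def sd_value(digits):
    """Integer value of a signed-digit string stored MSD first."""
    v = 0
    for d in digits:
        v = 10 * v + d
    return v
```

Because the interim digit is bounded by 6 in magnitude and the incoming transfer by 1, every output digit stays within [-7, 7], so the result can feed another adder of the same kind without conversion.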