533 research outputs found

    High-speed radix-10 multiplication using partial shifter adder tree-based convertor

    Get PDF
    A radix-10 multiplication is the foremost frequent operations employed by several monetary business and user-oriented applications, decimal multiplier using in state of art digital systems are significantly good but can be upgraded with time delay and area optimization. This work is proposed a more area and time delay optimized new design of overloaded decimal digit set (ODDS) architecture-based radix-10 multiplier for signed numbers. Binary coded decimal (BCD) to binary followed by binary multiplication and finally binary to BCD conversion are 3 major modules employed in radix-10 multiplication. This paperwork presents a replacement technique for binary coded decimal (BCD) to binary and vice-versa convertors in radix-10 multiplication. A novel addition tree structure called as partial shifter adder (PSA) tree-based approach has been developed for BCD to binary conversion, and it is used to add partially generated products. To meet our major concern i.e. speed, we need particular high-speed multiplication, hence the proposed PSA based radix-10 multiplier is using vertical cross binary multiplication and concurrent shifter-based addition method. The design has been tested on 45nm technology-based Zynq-7 field programmable gate array (FPGA) devices with a 6-input lookup table (LUTs). A combinational implementation maps quite well into the slice structure of the Xilinx Zynq-7 families field programmable gate array. The synthesis results for a Zynq-7 device indicate that our design outperforms in terms of the area and time delay

    RADIX-10 PARALLEL DECIMAL MULTIPLIER

    Get PDF
    This paper introduces novel architecture for Radix-10 decimal multiplier. The new generation of highperformance decimal floating-point units (DFUs) is demanding efficient implementations of parallel decimal multiplier. The parallel generation of partial products is performed using signed-digit radix-10 recoding of the multiplier and a simplified set of multiplicand multiples. The reduction of partial products is implemented in a tree structure based on a new algorithm decimal multioperand carry-save addition that uses a unconventional decimal-coded number systems. We further detail these techniques and it significantly improves the area and latency of the previous design, which include: optimized digit recoders, decimal carry-save adders (CSA’s) combining different decimal-coded operands, and carry free adders implemented by special designed bit counters

    BCD Multiplier

    Get PDF
    BCD multipliers are the basis of accurate decimal multiplication used in banking systems, scientific calculations, etc. Fractions convert poorly into binary numbers giving rise to conversion error. Therefore, banking industry have been using Binary Coded Decimal numbering system for their banking business transaction to circumvent the error between decimal fraction number to binary. Here we will explore some single-digit Binary Coded Decimal Multiplication units that perform multiplication in hardware for the purpose of future implementation

    HIGH-SPEED CO-PROCESSORS BASED ON REDUNDANT NUMBER SYSTEMS

    Get PDF
    There is a growing demand for high-speed arithmetic co-processors for use in applications with computationally intensive tasks. For instance, Fast Fourier Transform (FFT) co-processors are used in real-time multimedia services and financial applications use decimal co-processors to perform large amounts of decimal computations. Using redundant number systems to eliminate word-wide carry propagation within interim operations is a well-known technique to increase the speed of arithmetic hardware units. Redundant number systems are mostly useful in applications where many consecutive arithmetic operations are performed prior to the final result, making it advantageous for arithmetic co-processors. This thesis discusses the implementation of two popular arithmetic co-processors based on redundant number systems: namely, the binary FFT co-processor and the decimal arithmetic co-processor. FFT co-processors consist of several consecutive multipliers and adders over complex numbers. FFT architectures are implemented based on fixed-point and floating-point arithmetic. The main advantage of floating-point over fixed-point arithmetic is the wide dynamic range it introduces. Moreover, it avoids numerical issues such as scaling and overflow/underflow concerns at the expense of higher cost. Furthermore, floating-point implementation allows for an FFT co-processor to collaborate with general purpose processors. This offloads computationally intensive tasks from the primary processor. The first part of this thesis, which is devoted to FFT co-processors, proposes a new FFT architecture that uses a new Binary-Signed Digit (BSD) carry-limited adder, a new floating-point BSD multiplier and a new floating-point BSD three-operand adder. Finally, a new unit labeled as Fused-Dot-Product-Add (FDPA) is designed to compute AB+CD+E over floating-point BSD operands. The second part of the thesis discusses decimal arithmetic operations implemented in hardware using redundant number systems. These operations are popularly used in decimal floating-point co-processors. A new signed-digit decimal adder is proposed along with a sequential decimal multiplier that uses redundant number systems to increase the operational frequency of the multiplier. New redundant decimal division and square-root units are also proposed. The architectures proposed in this thesis were all implemented using Hardware-Description-Language (Verilog) and synthesized using Synopsys Design Compiler. The evaluation results prove the speed improvement of the new arithmetic units over previous pertinent works. Consequently, the FFT and decimal co-processors designed in this thesis work with at least 10% higher speed than that of previous works. These architectures are meant to fulfill the demand for the high-speed co-processors required in various applications such as multimedia services and financial computations

    A New Family of High.Performance Parallel Decimal Multipliers

    Full text link

    EARLY ESTIMATION OF DELAY IN BINARY TO BCD CONVERTOR

    Get PDF
    A novel high speed architecture for fixed bit binary to BCD conversion which is better in terms of delay is presented in this paper. In recent years, decimal data processing applications have grown and thus there is a need to have hardware support for decimal arithmetic. Decimal digit multipliers are having Binary to BCD conversion as the basic building block. This decimal multiplication in turn is an integral part of commercial, internet and financial based applications

    Algorithms and architectures for decimal transcendental function computation

    Get PDF
    Nowadays, there are many commercial demands for decimal floating-point (DFP) arithmetic operations such as financial analysis, tax calculation, currency conversion, Internet based applications, and e-commerce. This trend gives rise to further development on DFP arithmetic units which can perform accurate computations with exact decimal operands. Due to the significance of DFP arithmetic, the IEEE 754-2008 standard for floating-point arithmetic includes it in its specifications. The basic decimal arithmetic unit, such as decimal adder, subtracter, multiplier, divider or square-root unit, as a main part of a decimal microprocessor, is attracting more and more researchers' attentions. Recently, the decimal-encoded formats and DFP arithmetic units have been implemented in IBM's system z900, POWER6, and z10 microprocessors. Increasing chip densities and transistor count provide more room for designers to add more essential functions on application domains into upcoming microprocessors. Decimal transcendental functions, such as DFP logarithm, antilogarithm, exponential, reciprocal and trigonometric, etc, as useful arithmetic operations in many areas of science and engineering, has been specified as the recommended arithmetic in the IEEE 754-2008 standard. Thus, virtually all the computing systems that are compliant with the IEEE 754-2008 standard could include a DFP mathematical library providing transcendental function computation. Based on the development of basic decimal arithmetic units, more complex DFP transcendental arithmetic will be the next building blocks in microprocessors. In this dissertation, we researched and developed several new decimal algorithms and architectures for the DFP transcendental function computation. These designs are composed of several different methods: 1) the decimal transcendental function computation based on the table-based first-order polynomial approximation method; 2) DFP logarithmic and antilogarithmic converters based on the decimal digit-recurrence algorithm with selection by rounding; 3) a decimal reciprocal unit using the efficient table look-up based on Newton-Raphson iterations; and 4) a first radix-100 division unit based on the non-restoring algorithm with pre-scaling method. Most decimal algorithms and architectures for the DFP transcendental function computation developed in this dissertation have been the first attempt to analyze and implement the DFP transcendental arithmetic in order to achieve faithful results of DFP operands, specified in IEEE 754-2008. To help researchers evaluate the hardware performance of DFP transcendental arithmetic units, the proposed architectures based on the different methods are modeled, verified and synthesized using FPGAs or with CMOS standard cells libraries in ASIC. Some of implementation results are compared with those of the binary radix-16 logarithmic and exponential converters; recent developed high performance decimal CORDIC based architecture; and Intel's DFP transcendental function computation software library. The comparison results show that the proposed architectures have significant speed-up in contrast to the above designs in terms of the latency. The algorithms and architectures developed in this dissertation provide a useful starting point for future hardware-oriented DFP transcendental function computation researches

    Fast decimal floating-point division

    Get PDF
    A new implementation for decimal floating-point (DFP) division is introduced. The algorithm is based on high-radix SRT division The SRT division algorithm is named after D. Sweeney, J. E. Robertson, and T. D. Tocher. with the recurrence in a new decimal signed-digit format. Quotient digits are selected using comparison multiples, where the magnitude of the quotient digit is calculated by comparing the truncated partial remainder with limited precision multiples of the divisor. The sign is determined concurrently by investigating the polarity of the truncated partial remainder. A timing evaluation using a logic synthesis shows a significant decrease in the division execution time in contrast with one of the fastest DFP dividers reported in the open literatureHooman Nikmehr, Braden Phillips and Cheng-Chew Li

    Multi-operand Decimal Adder Trees for FPGAs

    Get PDF
    The research and development of hardware designs for decimal arithmetic is currently going under an intense activity. For most part, the methods proposed to implement fixed and floating point operations are intended for ASIC designs. Thus, a direct mapping or adaptation of these techniques into a FPGA could be far from an optimal solution. Only a few studies have considered new methods more suitable for FPGA implementations. A basic operation that has not received enough attention in this context is multi-operand BCD addition. For example, it is of interest for low latency implementations of decimal fixed and floating point multipliers and decimal fused multiply-add units. We have explored the most representative proposals for multi-operand BCD addition and found that the resultant implementations in FPGAs are still very inefficient in terms of both area and latency when compared to their binary counterparts. In this paper we present a new method for fast and efficient implementation of multi-operand BCD addition in current FPGA devices. In particular, our proposal maps quite well into the slice structure of the Xilinx Virtex-5/Virtex-6 families and it is highly pipelineable. The synthesis results for a Virtex-6 device indicate that our implementations halve the area and latency of previous proposals, presenting area and delay figures close to those of optimal binary adder trees.La recherche sur l'implantation en matériel de l'arithmétique décimale est actuellement très active, la plupart des travaux portant sur des opérateurs pour les processeurs, en virgule fixe ou flottante. Mais les techniques développées pour un circuit intégré n'aboutissent pas forcément à une implémentation optimale dans un FPGA. Il n'y a que peu d'études ciblant explicitement les FPGA. Cet article s'intéresse dans ce contexte, à l'addition BCD multi-opérande, au cœur de multiplieurs et de multiplieurs-accumulateurs à faible latence. Nous étudions les architectures proposées pour cette opération décimale, et nous observons que, sur FPGA, leur performance (surface et latence) est très inférieure à celle des opérations binaire à précision comparable. Nous présentons donc dans cet article une nouvelle technique d'addition BCD multi-opérandes qui s'avère plus efficace que les propositions précédentes sur les FPGA actuels. Elle s'adapte particulièrement bien à la structure fine des FPGA Xilinx Virtex-5/Virtex-6, et se prête bien au pipeline. Les résultats de synthèse montrent que notre implémentation divise par deux la surface et la latence par rapport aux propositions précédentes, les ramenant à des valeurs comparables à celles des meilleurs additionneurs multi-opérandes binaires

    FPGA Implementation of Double Precision Floating Point Multiplier

    Get PDF
    High speed computation is the need of today’s generation of Processors.  To accomplish this major task,  many functions  are implemented  inside the hardware  of the processor rather than  having  software  computing  the  same  task. Majority of the operations which the processor executes are Arithmetic operations which are widely used in many applications that require heavy mathematical operations such as scientific calculations, image and signal processing. Especially in the field of signal processing, multiplication division operation is widely used in many applications. The major issue with these operations in hardware is that much iteration is required which results in slow operation while fast algorithms require complex computations within each cycle. The result of a Division operation results in a either  in Quotient  and  Remainder  or a Floating  point  number  which is the  major reason  to  make it  more complex than  Multiplication  operation
    • …
    corecore