
    On synthesis and optimization of floating point units

    This work describes the effect of architectural/system level design decisions on the performance of floating point arithmetic units. By modeling with VHDL and using design synthesis techniques, different architectures of floating point adders, multipliers and multiply-accumulate fused units are compared using different technologies and cell libraries. Some modifications to recently published works have been proposed to minimize the energy-delay product, with special emphasis on power reduction. A new low power, high performance, transition-activity-scaled, double data path floating point multiplier has been proposed and its validity is proved by comparing it to a single data path floating point multiplier. A transition-activity-scaled, triple data path floating point adder has been compared with a high speed, single data path floating point adder using optimized Leading Zero Anticipatory (LZA) logic. Three different architectures of floating point multiply-accumulate fused units are evaluated for their suitability for high speed, low power and minimum area. The findings of this work validate different higher level design methodologies for floating point arithmetic units irrespective of the rapidly changing underlying technology.
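
    As a rough illustration of the normalization step that the Leading Zero Anticipatory (LZA) logic mentioned above accelerates in a floating point adder, the following Python sketch (not from the paper; operand widths and names are assumed for illustration) computes the left-shift amount after an effective subtraction by counting leading zeros of the result. Real LZA hardware predicts this count directly from the operands, in parallel with the subtraction, so the normalizing shifter does not have to wait for the adder.

        def normalize_after_subtract(sig_a: int, sig_b: int, exp: int, width: int = 24):
            # Behavioral reference for post-subtraction normalization in an FP adder:
            # the left-shift amount equals the number of leading zeros of the difference.
            # Assumes sig_a >= sig_b > 0 and both fit in 'width' bits (24 for single precision).
            diff = sig_a - sig_b
            lz = width - diff.bit_length()        # leading zeros within the 'width'-bit field
            return diff << lz, exp - lz           # normalized significand, adjusted exponent

        # Example: two aligned significands that nearly cancel need a large normalizing shift.
        sig, e = normalize_after_subtract(1 << 23, (1 << 23) - 2, exp=130)
        assert sig == 1 << 23 and e == 130 - 22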

    Design of a Single Precision Floating Point Divider and Multiplier with Pipelined Architecture

    High speed computation is a central requirement for today's generation of processors. To accomplish this, many functions are implemented in the processor hardware rather than computed by software. The majority of the operations a processor executes are arithmetic operations, which are widely used in applications that require heavy mathematical work, such as scientific calculations and image and signal processing. In the field of signal processing especially, multiplication and division are widely used. The major issue with these operations in hardware is that simple algorithms require many iterations, which results in slow operation, while fast algorithms require complex computations within each cycle. A division operation produces either a quotient and remainder or a floating point number, which is the main reason it is more complex than multiplication. The work described in this paper includes the design and verification of a floating point divider and multiplier. The inputs and outputs of both the multiplier and the divider use the single precision IEEE Standard for floating point numbers.
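
    The following Python sketch (a behavioral model only, not the paper's hardware design) shows the datapath steps a single precision floating point multiplier performs: unpack the IEEE 754 fields, multiply the 24-bit significands, add the exponents, normalize, and round. Subnormals, zero, NaN/infinity, overflow and round-to-nearest-even tie handling are deliberately omitted.

        import struct

        def fields(x: float):
            # sign (1 bit), biased exponent (8 bits), fraction (23 bits) of an IEEE 754 binary32
            bits = struct.unpack('>I', struct.pack('>f', x))[0]
            return bits >> 31, (bits >> 23) & 0xFF, bits & 0x7FFFFF

        def fp32_mul(a: float, b: float) -> float:
            sa, ea, fa = fields(a)
            sb, eb, fb = fields(b)
            sign = sa ^ sb
            prod = ((1 << 23) | fa) * ((1 << 23) | fb)   # 24x24-bit significand product
            exp = ea + eb - 127                          # remove one copy of the bias
            if prod & (1 << 47):                         # product in [2, 4): renormalize
                prod >>= 1
                exp += 1
            sig = (prod >> 23) + ((prod >> 22) & 1)      # keep 24 bits, round to nearest (ties up)
            if sig >> 24:                                # rounding overflowed into the next binade
                sig >>= 1
                exp += 1
            bits = (sign << 31) | (exp << 23) | (sig & 0x7FFFFF)
            return struct.unpack('>f', struct.pack('>I', bits))[0]

        assert fp32_mul(1.5, 2.5) == 3.75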

    HIGH-SPEED CO-PROCESSORS BASED ON REDUNDANT NUMBER SYSTEMS

    There is a growing demand for high-speed arithmetic co-processors for use in applications with computationally intensive tasks. For instance, Fast Fourier Transform (FFT) co-processors are used in real-time multimedia services, and financial applications use decimal co-processors to perform large amounts of decimal computations. Using redundant number systems to eliminate word-wide carry propagation within interim operations is a well-known technique to increase the speed of arithmetic hardware units. Redundant number systems are most useful in applications where many consecutive arithmetic operations are performed before the final result, making them advantageous for arithmetic co-processors. This thesis discusses the implementation of two popular arithmetic co-processors based on redundant number systems: namely, the binary FFT co-processor and the decimal arithmetic co-processor. FFT co-processors consist of several consecutive multipliers and adders over complex numbers. FFT architectures are implemented based on fixed-point and floating-point arithmetic. The main advantage of floating-point over fixed-point arithmetic is the wide dynamic range it introduces; moreover, it avoids numerical issues such as scaling and overflow/underflow concerns, at the expense of higher cost. Furthermore, a floating-point implementation allows an FFT co-processor to collaborate with general purpose processors, offloading computationally intensive tasks from the primary processor. The first part of this thesis, which is devoted to FFT co-processors, proposes a new FFT architecture that uses a new Binary-Signed Digit (BSD) carry-limited adder, a new floating-point BSD multiplier and a new floating-point BSD three-operand adder. Finally, a new unit labeled Fused-Dot-Product-Add (FDPA) is designed to compute AB+CD+E over floating-point BSD operands. The second part of the thesis discusses decimal arithmetic operations implemented in hardware using redundant number systems. These operations are commonly used in decimal floating-point co-processors. A new signed-digit decimal adder is proposed along with a sequential decimal multiplier that uses redundant number systems to increase the operational frequency of the multiplier. New redundant decimal division and square-root units are also proposed. The architectures proposed in this thesis were all implemented in a hardware description language (Verilog) and synthesized using Synopsys Design Compiler. The evaluation results demonstrate the speed improvement of the new arithmetic units over previous pertinent works; consequently, the FFT and decimal co-processors designed in this thesis operate at least 10% faster than previous designs. These architectures are meant to fulfill the demand for the high-speed co-processors required in various applications such as multimedia services and financial computations.
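
    The core idea behind redundant number systems — keeping intermediate results in a form that avoids word-wide carry propagation until a single final conversion — can be sketched in Python with ordinary binary carry-save addition (this illustrates the principle only, not the thesis's binary signed-digit adder or its decimal units):

        def carry_save_add(x: int, y: int, z: int):
            # 3:2 compression: per-bit sum and carry, with no word-wide carry chain.
            s = x ^ y ^ z
            c = ((x & y) | (x & z) | (y & z)) << 1
            return s, c

        def add_many(values):
            # Keep the running total in redundant (sum, carry) form; only one
            # carry-propagate addition is needed at the very end.
            s, c = 0, 0
            for v in values:
                s, c = carry_save_add(s, c, v)
            return s + c

        assert add_many([3, 5, 7, 11]) == 26

    Signed-digit representations generalize the same idea to subtraction as well, which is what makes them attractive inside FFT butterflies and decimal arithmetic units.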

    Self-timed design in GaAs - case study of a high-speed parallel multiplier

    The problems with synchronous designs at high clock frequencies have been well documented. This makes an asynchronous approach attractive for high speed technologies like GaAs. We investigate the issues involved by describing the design of a parallel multiplier that can be part of a floating point multiplier. We first present a new architecture, called the partial array of arrays (PAA), that is more regular than a partial tree approach while having the same latency. We then show how this architecture can be used in a self-timed implementation in the style of micropipelines. We next describe how the final carry propagate adder can be designed using a new precharged logic family in GaAs that we developed as part of this project. We conclude with some general observations on doing asynchronous design in GaAs.

    Evaluation of High Speed Hardware Multipliers - Fixed Point and Floating point

    There is a huge demand for high speed arithmetic blocks due to the increased performance of processing units. As system clock frequencies rise, the arithmetic blocks must keep pace with the greater requirement for computational power. Area and speed are usually conflicting constraints, so improving speed mostly results in larger area. In our research we try to determine the best solution to this problem by comparing the results of different multipliers. Two algorithms for high speed hardware multipliers, the parallel multiplier and the bit-serial multiplier, were studied and implemented in different sizes. The two multipliers were compared by implementing each of them separately in VHDL. A number of high speed adder designs are also developed, and the algorithm and design of these adders are discussed. The results of this research help in choosing between serial and parallel multipliers, for both fixed point and floating point operands, when building different systems. As multipliers form one of the most important components of many systems, analysing different multipliers will help in designing a better system in terms of area and speed. DOI: http://dx.doi.org/10.11591/ijece.v3i6.418
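
    To make the serial-versus-parallel trade-off concrete, here is a small Python sketch (illustrative only, not the paper's VHDL; the 16-bit width is an assumption) of the two multiplication styles compared above: a bit-serial shift-and-add multiplier that produces one partial product per cycle, and a parallel multiplier that forms all partial products at once and sums them in a single combinational pass.

        WIDTH = 16  # assumed operand width for the sketch

        def bit_serial_multiply(a: int, b: int, width: int = WIDTH) -> int:
            # One partial product per 'cycle': small hardware, 'width' iterations.
            acc = 0
            for i in range(width):
                if (b >> i) & 1:
                    acc += a << i
            return acc

        def parallel_multiply(a: int, b: int, width: int = WIDTH) -> int:
            # All partial products exist at once; hardware reduces them with a
            # compressor/adder tree instead of iterating, trading area for speed.
            partial_products = [(a << i) for i in range(width) if (b >> i) & 1]
            return sum(partial_products)

        assert bit_serial_multiply(123, 456) == parallel_multiply(123, 456) == 123 * 456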

    Combined Integer and Floating Point Multiplication Architecture(CIFM) for FPGAs and Its Reversible Logic Implementation

    In this paper, the authors propose the idea of a combined integer and floating point multiplier (CIFM) for FPGAs. The authors propose the replacement of existing 18x18 dedicated multipliers in FPGAs with dedicated 24x24 multipliers designed with small 4x4 bit multipliers. It is also proposed that for every dedicated 24x24 bit multiplier block designed with 4x4 bit multipliers, four redundant 4x4 multipliers should be provided to enable self-repairability (to recover from faults). In the proposed CIFM, reconfigurability at run time is also provided, resulting in low power. The major motivation for providing the dedicated 24x24 bit multiplier stems from the fact that a single precision floating point multiplier requires a 24x24 bit integer multiplier for mantissa multiplication. A reconfigurable, self-repairable 24x24 bit multiplier (implemented with 4x4 bit multiply modules) ideally suits this purpose, making FPGAs more suitable for integer as well as floating point operations. A dedicated 4x4 bit multiplier is also proposed in this paper. Moreover, in recent years, reversible logic has emerged as a promising technology with applications in low power CMOS, quantum computing, nanotechnology, and optical computing. It is not possible to realize quantum computing without reversible logic. Thus, this paper also provides the reversible logic implementation of the proposed CIFM. The reversible CIFM designed and proposed here will form the basis of completely reversible FPGAs. Comment: Published in the proceedings of the 49th IEEE International Midwest Symposium on Circuits and Systems (MWSCAS 2006), Puerto Rico, August 2006. Nominated for the Student Paper Award (12 papers were nominated for the Student Paper Award among all submissions).
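
    The decomposition behind the proposed dedicated multiplier — building a 24x24-bit product out of small 4x4-bit multiplier blocks — can be sketched in Python as follows (an illustration of the digit-product decomposition only; the paper's actual partial product arrangement, redundancy and reversible-logic mapping are not modeled):

        def mul4x4(a: int, b: int) -> int:
            # Stand-in for one dedicated 4x4-bit multiplier block (inputs are 4-bit values).
            return a * b

        def mul24x24(a: int, b: int) -> int:
            # Split each 24-bit operand into six 4-bit digits: 6 x 6 = 36 small
            # multiplies, each shifted into position and accumulated.
            acc = 0
            for i in range(6):
                for j in range(6):
                    da = (a >> (4 * i)) & 0xF
                    db = (b >> (4 * j)) & 0xF
                    acc += mul4x4(da, db) << (4 * (i + j))
            return acc

        assert mul24x24(0xABCDEF, 0x123456) == 0xABCDEF * 0x123456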

    Floating-Point Matrix Product on FPGA


    A general framework for efficient FPGA implementation of matrix product

    Original article can be found at: http://www.medjcn.com/ Copyright Softmotor Limited. High performance systems are required by developers for fast processing of computationally intensive applications. Reconfigurable hardware devices in the form of Field-Programmable Gate Arrays (FPGAs) have been proposed as viable system building blocks in the construction of high performance systems at an economical price. Given the importance and use of matrix algorithms in scientific computing applications, they seem ideal candidates to harness and exploit the advantages offered by FPGAs. In this paper, a system for generating matrix algorithm cores is described. The system provides a catalog of efficient user-customizable cores, designed for FPGA implementation, covering three matrix algorithm categories: (i) matrix operations, (ii) matrix transforms and (iii) matrix decomposition. The generated core can be either a general purpose core or a specific application core. The methodology used in the design and implementation of two specific image processing application cores is presented. The first core is a fully pipelined matrix multiplier for colour space conversion based on distributed arithmetic principles, while the second is a parallel floating-point matrix multiplier designed for 3D affine transformations.
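
    For the colour space conversion core, the distributed arithmetic principle mentioned above replaces multipliers with table lookups over bit-planes of the inputs. A minimal Python sketch of the idea (unsigned inputs only, illustrative coefficients and bit width, not the paper's core) is:

        def da_dot_product(coeffs, xs, width=8):
            # Distributed arithmetic: precompute a LUT of all coefficient subset sums,
            # then process the inputs one bit-plane at a time (bit-serial, multiplier-free).
            K = len(coeffs)
            lut = [sum(coeffs[k] for k in range(K) if (addr >> k) & 1)
                   for addr in range(1 << K)]
            acc = 0
            for b in range(width):                    # one LUT lookup per bit of input precision
                addr = sum(((x >> b) & 1) << k for k, x in enumerate(xs))
                acc += lut[addr] << b
            return acc

        # A fixed 3-coefficient dot product, as in one row of a colour conversion matrix:
        assert da_dot_product([2, 3, 5], [1, 2, 4], width=3) == 2 * 1 + 3 * 2 + 5 * 4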

    Measuring Improvement when Using HUB Formats to Implement Floating-Point Systems under Round-to-Nearest

    Funded by MEC under grant TIN2013-42253-P. This paper analyzes the benefits of using HUB formats to implement floating-point arithmetic under round-to-nearest mode from a quantitative point of view. Using HUB formats to represent numbers allows the removal of the rounding logic of arithmetic units, including sticky-bit computation. This is shown for floating-point adders, multipliers, and converters. Experimental analysis demonstrates that HUB formats and the corresponding arithmetic units maintain the same accuracy as conventional ones. On the other hand, the implementation of these units, based on basic architectures, shows that HUB formats simultaneously improve area, speed, and power consumption. Specifically, based on synthesis data, a HUB single-precision adder is about 14% faster while consuming 38% less area and 26% less power than the conventional adder. Similarly, a HUB single-precision multiplier is 17% faster, uses 22% less area, and consumes slightly less power than the conventional multiplier. At the same speed, the adder and multiplier achieve area and power reductions of up to 50% and 40%, respectively.
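
    The reason HUB formats can drop rounding logic can be sketched directly: a HUB number stores its fraction bits plus an implicit trailing '1' (a half-ULP bias), so the nearest HUB value to an exact result is obtained by plain truncation of the extra bits, with the same half-ULP error bound as conventional round-to-nearest. A small Python illustration of this property (fixed-point fractions for simplicity, with an assumed 8-bit precision, not the paper's floating-point units):

        import math, random

        FRAC_BITS = 8
        ULP = 2.0 ** -FRAC_BITS

        def round_nearest_conventional(x: float) -> float:
            # Conventional round-to-nearest onto the grid k * ULP.
            return round(x / ULP) * ULP

        def nearest_hub(x: float) -> float:
            # HUB grid: stored bits f plus an implicit trailing '1', i.e. values (f + 1/2) * ULP.
            # Truncating the extra bits of x lands on the nearest HUB point; no round/sticky logic.
            f = math.floor(x / ULP)
            return f * ULP + ULP / 2

        for _ in range(1000):
            x = random.uniform(0.0, 1.0)
            assert abs(round_nearest_conventional(x) - x) <= ULP / 2 + 1e-12
            assert abs(nearest_hub(x) - x) <= ULP / 2 + 1e-12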