Search CORE

27,483 research outputs found

Resource Efficient Single Precision Floating Point Multiplier Using Karatsuba Algorithm

Author: K Gowreesrinivas V
P Samundiswary
Publication venue: IAES Indonesia Section
Publication date: 06/09/2018
Field of study

In floating point arithmetic operations, multiplication is the most required operation for many signal processing and scientific applications. 24-bit length mantissa multiplication is involved to obtain the floating point multiplication final result for two given single precision floating point numbers. This mantissa multiplication plays the major role in the performance evaluation in respect of occupied area and propagation delay. This paper presents the design and analysis of single precision floating point multiplication using karatsuba algorithm with vedic multiplier with the considering of modified 2x1 multiplexers and modified 4:2 compressors in order to overcome the drawbacks in the existing techniques. Further, the performance analysis of single precision floating point multiplier is analyzed in terms of area and delay using Karatsuba Algorithm with different existing techniques such as 4x1 multiplexers and 3:2 compressors and modified techniques such as 2x1 multiplexers, 4:2 compressors. From the simulation results, it is observed that single precision floating point multiplication with karatsuba algorithm using modified 4:2 compressor with XOR-MUX logic provides better performance with efficient usage of resources such as area and delay than that of existing techniques. All the blocks involved for floating point multiplication are coded with Verilog and synthesized using Xilinx ISE Simulator

Indonesian Journal of Electrical Engineering and Informatics (IJEEI)

Recommended from our members

Notes on the SWTPC MP-N Calculator Interface and the Calc-1 Program

Author: Long Daniel Paul
Publication venue: North Texas State University
Publication date: 01/05/1979
Field of study

This interface was bought to perform floating-point arithmetic and for its function capabilities such as SIN, COS, and e^x. My application required an integer truncation function that is not performed by this calculator, so i wrote a small assembly language subroutine to do it. A potentially irritating problem is that the calculator chip does not automatically convert to scientific notation if the numbers become too big to display in floating point. The control program must keep track of the display mode

UNT Digital Library

Design of a Single Precision Floating Point Divider and Multiplier with Pipelined Architecture

Author: Keni Mayuresh Vijay
Publication venue: RIT Scholar Works
Publication date: 01/12/2017
Field of study

High speed computation is the need of today’s generation of Processors. To accomplish this major task, many functions are implemented inside the hardware of the processor rather than having software computing the same task. Majority of the operations which the processor executes are Arithmetic operations which are widely used in many applications that require heavy mathematical operations such as scientific calculations, image and signal processing. Especially in the field of signal processing, multiplication division operation is widely used in many applications. The major issue with these operations in hardware is that many iteration’s are required which results in slow operation while fast algorithms require complex computations within each cycle. The result of a Division operation results in a either in Quotient and Remainder or a Floating point number which is the major reason to make it more complex than Multiplication operation. The work described in this paper includes design and verification of a floating point divider and multiplier. The inputs of both the Multiplier and Divider and also the output are designed using the single precision IEEE Standard for floating point numbers

RIT Scholar Works

A general framework for efficient FPGA implementation of matrix product

Author: Amira A.
Bensaali F.
Sotudeh R.
Publication venue
Publication date: 01/01/2007
Field of study

Original article can be found at: http://www.medjcn.com/ Copyright Softmotor LimitedHigh performance systems are required by the developers for fast processing of computationally intensive applications. Reconfigurable hardware devices in the form of Filed-Programmable Gate Arrays (FPGAs) have been proposed as viable system building blocks in the construction of high performance systems at an economical price. Given the importance and the use of matrix algorithms in scientific computing applications, they seem ideal candidates to harness and exploit the advantages offered by FPGAs. In this paper, a system for matrix algorithm cores generation is described. The system provides a catalog of efficient user-customizable cores, designed for FPGA implementation, ranging in three different matrix algorithm categories: (i) matrix operations, (ii) matrix transforms and (iii) matrix decomposition. The generated core can be either a general purpose or a specific application core. The methodology used in the design and implementation of two specific image processing application cores is presented. The first core is a fully pipelined matrix multiplier for colour space conversion based on distributed arithmetic principles while the second one is a parallel floating-point matrix multiplier designed for 3D affine transformations.Peer reviewe

University of Hertfordshire Research Archive

Optimistic Parallelization of Floating-Point Accumulation

Author: DeHon André
Kapre Nachiket
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2007
Field of study

Floating-point arithmetic is notoriously non-associative due to the limited precision representation which demands intermediate values be rounded to fit in the available precision. The resulting cyclic dependency in floating-point accumulation inhibits parallelization of the computation, including efficient use of pipelining. In practice, however, we observe that floating-point operations are "mostly" associative. This observation can be exploited to parallelize floating-point accumulation using a form of optimistic concurrency. In this scheme, we first compute an optimistic associative approximation to the sum and then relax the computation by iteratively propagating errors until the correct sum is obtained. We map this computation to a network of 16 statically-scheduled, pipelined, double-precision floating-point adders on the Virtex-4 LX160 (-12) device where each floating-point adder runs at 296 MHz and has a pipeline depth of 10. On this 16 PE design, we demonstrate an average speedup of 6× with randomly generated data and 3-7× with summations extracted from Conjugate Gradient benchmarks

CiteSeerX

Crossref

Caltech Authors

ScholarlyCommons@Penn