
2,933 research outputs found

High-speed radix-10 multiplication using partial shifter adder tree-based convertor

Author: Malviya Utsav Kumar
Publication venue: 'Universitas Ahmad Dahlan'
Publication date: 01/04/2021

Radix-10 multiplication is one of the most frequent operations in monetary business and user-oriented applications. Decimal multipliers in state-of-the-art digital systems perform well but can still be improved in time delay and area. This work proposes a new, more area- and delay-optimized radix-10 multiplier for signed numbers based on the overloaded decimal digit set (ODDS) architecture. Three major modules are employed in radix-10 multiplication: binary coded decimal (BCD) to binary conversion, binary multiplication, and binary to BCD conversion. This paper presents a replacement technique for the BCD-to-binary and binary-to-BCD converters. A novel addition tree structure, called the partial shifter adder (PSA) tree, has been developed for BCD to binary conversion, and it is also used to add the partially generated products. To meet the primary goal of speed, the proposed PSA-based radix-10 multiplier uses vertical cross binary multiplication and a concurrent shifter-based addition method. The design has been tested on 45nm technology-based Zynq-7 field programmable gate array (FPGA) devices with 6-input lookup tables (LUTs). A combinational implementation maps quite well onto the slice structure of the Xilinx Zynq-7 FPGA family. The synthesis results for a Zynq-7 device indicate that the design outperforms in terms of area and time delay

Journal of Education and Learning (EduLearn)

TELKOMNIKA (Telecommunication Computing Electronics and Control)

UAD Journal Management System
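
The BCD-to-binary step named in the abstract above can be illustrated in software with plain shift-and-add arithmetic, where multiplying by 10 is done as (x << 3) + (x << 1). This is only a minimal sequential sketch under that assumption, not the paper's PSA tree; the function name `bcd_to_binary` and the packed 4-bits-per-digit input format are hypothetical choices made for illustration.

```python
def bcd_to_binary(bcd: int, digits: int) -> int:
    """Convert a packed BCD value (4 bits per decimal digit) to a plain integer.

    Uses Horner's rule: value = value * 10 + digit, with the multiply by 10
    expressed as two shifts and an add, as a hardware datapath would do.
    """
    value = 0
    for i in reversed(range(digits)):          # most significant digit first
        nibble = (bcd >> (4 * i)) & 0xF        # extract one decimal digit
        if nibble > 9:
            raise ValueError("invalid BCD digit")
        value = (value << 3) + (value << 1) + nibble   # value * 8 + value * 2 + digit
    return value


if __name__ == "__main__":
    print(bcd_to_binary(0x1234, 4))
```

A tree-based converter would evaluate groups of digits in parallel instead of one digit per step; the loop here only shows the arithmetic identity that both approaches rely on.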

RADIX-10 PARALLEL DECIMAL MULTIPLIER

Author: INGLE MRUNALINI E.
PANSE TEJASWINI
Publication venue: Institute for Project Management Pvt. Ltd
Publication date: 30/07/2020

This paper introduces a novel architecture for a radix-10 decimal multiplier. The new generation of high-performance decimal floating-point units (DFUs) demands efficient implementations of parallel decimal multipliers. The parallel generation of partial products is performed using signed-digit radix-10 recoding of the multiplier and a simplified set of multiplicand multiples. The reduction of partial products is implemented in a tree structure based on a new algorithm for decimal multioperand carry-save addition that uses unconventional decimal-coded number systems. We further detail these techniques, which significantly improve the area and latency of the previous design and include optimized digit recoders, decimal carry-save adders (CSAs) combining different decimal-coded operands, and carry-free adders implemented by specially designed bit counters

Interscience Research Network
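
The signed-digit radix-10 recoding mentioned in the abstract above can be sketched as follows. The sketch assumes the common digit set {-5, ..., 5}, in which each recoded digit depends only on its own decimal digit and its lower neighbour, so all digits could be produced in parallel. The abstract does not spell out its exact recoding, so the digit set and the function names here are assumptions for illustration.

```python
from typing import List


def sd_radix10_recode(n: int) -> List[int]:
    """Recode a non-negative integer into signed radix-10 digits in [-5, 5].

    Digits are returned least significant first. A digit >= 5 is replaced by
    (digit - 10) and sends a carry of 1 to the next position.
    """
    if n < 0:
        raise ValueError("n must be non-negative")
    digits = [int(c) for c in reversed(str(n))]
    digits.append(0)                      # room for a carry out of the top digit
    recoded = []
    for i, d in enumerate(digits):
        carry_out = 1 if d >= 5 else 0
        carry_in = 1 if i > 0 and digits[i - 1] >= 5 else 0
        recoded.append(d - 10 * carry_out + carry_in)
    return recoded


def multiply_with_recoding(multiplicand: int, multiplier: int) -> int:
    """Multiply using only the multiplicand multiples 0x..5x and sign flips."""
    multiples = [multiplicand * k for k in range(6)]
    total = 0
    for i, d in enumerate(sd_radix10_recode(multiplier)):
        partial = multiples[abs(d)] if d >= 0 else -multiples[abs(d)]
        total += partial * 10 ** i        # one partial product per recoded digit
    return total


if __name__ == "__main__":
    print(sd_radix10_recode(12345))
    print(multiply_with_recoding(987, 6789))
```

Under this assumed recoding only the multiples up to 5x are needed, which matches the abstract's mention of a simplified set of multiplicand multiples.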

HIGH-SPEED CO-PROCESSORS BASED ON REDUNDANT NUMBER SYSTEMS

Author: Kaivani Amir
Publication venue: 'University of Saskatchewan Library'
Publication date

There is a growing demand for high-speed arithmetic co-processors for use in applications with computationally intensive tasks. For instance, Fast Fourier Transform (FFT) co-processors are used in real-time multimedia services and financial applications use decimal co-processors to perform large amounts of decimal computations. Using redundant number systems to eliminate word-wide carry propagation within interim operations is a well-known technique to increase the speed of arithmetic hardware units. Redundant number systems are mostly useful in applications where many consecutive arithmetic operations are performed prior to the final result, making it advantageous for arithmetic co-processors. This thesis discusses the implementation of two popular arithmetic co-processors based on redundant number systems: namely, the binary FFT co-processor and the decimal arithmetic co-processor. FFT co-processors consist of several consecutive multipliers and adders over complex numbers. FFT architectures are implemented based on fixed-point and floating-point arithmetic. The main advantage of floating-point over fixed-point arithmetic is the wide dynamic range it introduces. Moreover, it avoids numerical issues such as scaling and overflow/underflow concerns at the expense of higher cost. Furthermore, floating-point implementation allows for an FFT co-processor to collaborate with general purpose processors. This offloads computationally intensive tasks from the primary processor. The first part of this thesis, which is devoted to FFT co-processors, proposes a new FFT architecture that uses a new Binary-Signed Digit (BSD) carry-limited adder, a new floating-point BSD multiplier and a new floating-point BSD three-operand adder. Finally, a new unit labeled as Fused-Dot-Product-Add (FDPA) is designed to compute AB+CD+E over floating-point BSD operands. The second part of the thesis discusses decimal arithmetic operations implemented in hardware using redundant number systems. 
These operations are popularly used in decimal floating-point co-processors. A new signed-digit decimal adder is proposed along with a sequential decimal multiplier that uses redundant number systems to increase the operational frequency of the multiplier. New redundant decimal division and square-root units are also proposed. The architectures proposed in this thesis were all implemented using Hardware-Description-Language (Verilog) and synthesized using Synopsys Design Compiler. The evaluation results prove the speed improvement of the new arithmetic units over previous pertinent works. Consequently, the FFT and decimal co-processors designed in this thesis work with at least 10% higher speed than that of previous works. These architectures are meant to fulfill the demand for the high-speed co-processors required in various applications such as multimedia services and financial computations

eCommons@USASK

University of Saskatchewan Research Archive

Scalable Emulation of Sign-Problem-Free Hamiltonians with Room Temperature p-bits

Author: Camsari Kerem Y.
Chowdhury Shuvro
Datta Supriyo
Publication venue: 'American Physical Society (APS)'
Publication date: 30/09/2019

The growing field of quantum computing is based on the concept of a q-bit which is a delicate superposition of 0 and 1, requiring cryogenic temperatures for its physical realization along with challenging coherent coupling techniques for entangling them. By contrast, a probabilistic bit or a p-bit is a robust classical entity that fluctuates between 0 and 1, and can be implemented at room temperature using present-day technology. Here, we show that a probabilistic coprocessor built out of room temperature p-bits can be used to accelerate simulations of a special class of quantum many-body systems that are sign-problem-free or stoquastic, leveraging the well-known Suzuki-Trotter decomposition that maps a d-dimensional quantum many body Hamiltonian to a d+1-dimensional classical Hamiltonian. This mapping allows an efficient emulation of a quantum system by classical computers and is commonly used in software to perform Quantum Monte Carlo (QMC) algorithms. By contrast, we show that a compact, embedded MTJ-based coprocessor can serve as a highly efficient hardware-accelerator for such QMC algorithms providing several orders of magnitude improvement in speed compared to optimized CPU implementations. Using realistic device-level SPICE simulations we demonstrate that the correct quantum correlations can be obtained using a classical p-circuit built with existing technology and operating at room temperature. The proposed coprocessor can serve as a tool to study stoquastic quantum many-body systems, overcoming challenges associated with physical quantum annealers. Comment: Fixed minor typos and expanded Appendi

arXiv.org e-Print Archive

eScholarship - University of California

Evaluation of High Speed Hardware Multipliers - Fixed Point and Floating point

Author: Abbas Syed Haider
Ahmed Awais
Haider Hussnain
Siddique Muhammad Faheem
Publication venue: 'Institute of Advanced Engineering and Science'
Publication date: 01/12/2013

There is a huge demand for high-speed arithmetic blocks, driven by the increased performance of processing units. As system clock frequencies rise, the arithmetic blocks must keep pace with the demand for more computational power. Area and speed are usually conflicting constraints, so improving speed mostly results in larger area. In our research we try to determine the best solution to this problem by comparing the results of different multipliers. Two algorithms for high-speed hardware multipliers, the parallel multiplier and the bit-serial multiplier, were studied and implemented in different sizes. The workings of these two multipliers were compared by implementing each of them separately in VHDL. A number of high-speed adder designs are developed, and the algorithms and designs of these adders are discussed. The results of this research will help us choose the better option between serial and parallel multipliers, for both fixed-point and floating-point multiplication, to fabricate in different systems. As multipliers form one of the most important components of many systems, analysing different multipliers will help us frame a better system with better area and speed. DOI: http://dx.doi.org/10.11591/ijece.v3i6.418

Institute of Advanced Engineering and Science

Study on bit parallel and serial arithmetic logic approaches

Author: Vähäsöyrinki V. (Veikko)
Publication venue: University of Oulu
Publication date: 10/03/2023

This paper provides a general overview of how computers process numbers and perform arithmetic. Different ways to implement digital arithmetic logic are presented. Bit-serial designs can save chip real estate, but require more clock cycles for arithmetic operations such as additions and multiplications. Bit-parallel designs produce results in fewer clock cycles, but require more gates, e.g., due to carry-look-ahead generators. This may translate into higher power dissipation. This BSc thesis presents an exploration of bit-serial-parallel and bit-parallel arithmetic logic designs. The intention is to gain an understanding of their basic design characteristics

University of Oulu Repository - Jultika
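
The bit-serial trade-off described in the abstract above (one full adder reused over many clock cycles) can be modelled in a few lines. This is a behavioural sketch only, assuming an unsigned adder of a fixed word width; the function name `bit_serial_add` is hypothetical and no claim is made that it mirrors the thesis designs.

```python
def bit_serial_add(a: int, b: int, width: int) -> int:
    """Add two unsigned integers one bit per 'clock cycle'.

    A single full adder and a one-bit carry register are reused for every
    bit position, so the addition takes `width` cycles. The result wraps
    modulo 2**width, as a fixed-width hardware register would.
    """
    carry = 0                                  # models the carry flip-flop
    result = 0
    for cycle in range(width):                 # least significant bit first
        a_bit = (a >> cycle) & 1
        b_bit = (b >> cycle) & 1
        sum_bit = a_bit ^ b_bit ^ carry
        carry = (a_bit & b_bit) | (carry & (a_bit ^ b_bit))
        result |= sum_bit << cycle
    return result


if __name__ == "__main__":
    print(bit_serial_add(5, 3, 4))
```

A bit-parallel adder would instantiate one full adder per bit position and finish in a single cycle, at the cost of `width` times the adder logic plus any carry-look-ahead circuitry.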

High accuracy computation with linear analog optical systems: a critical study

Author: Athale Ravindra A.
Psaltis Demetri
Publication venue: Optical Society of America
Publication date: 15/09/1986

High accuracy optical processors based on the algorithm of digital multiplication by analog convolution (DMAC) are studied for ultimate performance limitations. Variations of optical processors that perform high accuracy vector-vector inner products are studied in abstract and with specific examples. It is concluded that the use of linear analog optical processors in performing digital computations with DMAC leads to impractical requirements for the accuracy of analog optical systems and the complexity of postprocessing electronics

Caltech Authors
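
The idea behind digital multiplication by analog convolution (DMAC), as named in the abstract above, is that convolving the digit sequences of two numbers yields a mixed-radix result whose weighted sum is their product. The sketch below shows that arithmetic in software, under the assumption of unsigned operands; the helper names are hypothetical and the code does not model any optical system.

```python
from typing import List, Tuple


def to_digits(n: int, base: int) -> List[int]:
    """Return the digits of a non-negative integer, least significant first."""
    if n < 0:
        raise ValueError("n must be non-negative")
    digits = [n % base]
    n //= base
    while n:
        digits.append(n % base)
        n //= base
    return digits


def dmac_multiply(a: int, b: int, base: int = 2) -> Tuple[List[int], int]:
    """Multiply by convolving digit sequences, then weighting by powers of base.

    The convolution outputs are not reduced to single digits: each value is a
    multi-level quantity that the analog stage would have to resolve exactly.
    """
    da, db = to_digits(a, base), to_digits(b, base)
    conv = [0] * (len(da) + len(db) - 1)
    for i, x in enumerate(da):
        for j, y in enumerate(db):
            conv[i + j] += x * y               # discrete convolution of digits
    product = sum(c * base ** k for k, c in enumerate(conv))
    return conv, product


if __name__ == "__main__":
    levels, product = dmac_multiply(13, 11)
    print(levels, product)
```

Each convolution output can be as large as the shorter operand's digit count times (base - 1) squared, so the number of levels to be distinguished grows with word length. This is one way to see why the accuracy required of the analog stage becomes demanding.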

High Resolution Single-Chip Radix II FFT Processor for High-Tech Application

Author: Teymourzadeh Rozita
Publication venue: 'IntechOpen'
Publication date: 01/01/2017

Electrical motors are vital components of many industrial processes, and their failure leads to losses on the production line. Motor functionality and behavior should be monitored to avoid catastrophic production failures. Hence, a high-tech DSP processor, which can be realized as an embedded system, is a significant tool for electrical harmonic analysis. This chapter introduces the principal embedded design of a novel high-tech 1024-point FFT processor architecture for high-performance harmonic measurement techniques. In the FFT processor algorithm, pipelining and parallel implementation are incorporated in order to enhance performance. The proposed FFT makes use of floating point to realize a higher-precision FFT. Since floating-point architecture limits the maximum clock frequency and increases the power consumption, the chapter focuses on improving the speed, area, resolution and power consumption, as well as latency, of the FFT. It illustrates a very large-scale integration (VLSI) implementation of the floating-point parallel pipelined (FPP) 1024-point Radix II FFT processor, using a novel architecture that employs only a single butterfly together with an intelligent controller. The functionality of the conventional Radix II FFT was verified as an embedded design in FPGA prototyping. For area and power consumption, the proposed Radix II FPP-FFT was optimized in ASIC under Silterra 0.18 µm and Mimos 0.35 µm technology libraries
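
For orientation, the radix-2 butterfly that the abstract above builds its processor around can be sketched in software as a textbook recursive decimation-in-time FFT. This is a reference model only, assuming a power-of-two input length; it does not reflect the chapter's single-butterfly hardware schedule, and the function name `fft_radix2` is hypothetical.

```python
import cmath
from typing import List, Sequence


def fft_radix2(x: Sequence[complex]) -> List[complex]:
    """Recursive radix-2 decimation-in-time FFT for power-of-two lengths."""
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    if n == 1:
        return [complex(x[0])]
    even = fft_radix2(x[0::2])                 # FFT of even-indexed samples
    odd = fft_radix2(x[1::2])                  # FFT of odd-indexed samples
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * cmath.pi * k / n)
        t = twiddle * odd[k]
        out[k] = even[k] + t                   # butterfly: upper output
        out[k + n // 2] = even[k] - t          # butterfly: lower output
    return out


if __name__ == "__main__":
    spectrum = fft_radix2([1, 1, 1, 1, 0, 0, 0, 0])
    print([round(abs(v), 3) for v in spectrum])
```

A 1024-point transform in this form performs 10 stages of 512 butterflies each; a design with a single physical butterfly would time-multiplex those operations under a controller.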