2,376 research outputs found

    Design a Low Power Built in Self-Test (BIST) Architecture for Fast Multiplier and Optimize in Terms of Real Time Functionality

    Get PDF
    Aiming low power during testing, Maypresent a methodology for derivingBIST Architecture forfast Multipliers. In my propose Researchseveral design rules for designing the Wallace tree in order to be fully testable under the cell fault model. The proposed low power BISTArchitecture for the derived multipliers is achieved by: (i) IntroducingTest Pattern Generators (ii) Properly assigning the TPGsoutputs to the multiplier inputs and(iii) Significantly reducing the test vector length. In this work, I have implemented 4bit * 4bit Multiplier with many test pattern generators (TPG) alternative. A BIST TPG Architecture was use of 6 bit counter. I have calculated operation speed, time delay, area, power consumption for Design. Reductionof power dissipationachieved byproperly assigning the TPG output to the multiplier input,significantly reducing the test set length, suitableTPG built of a6-bit counter DOI: 10.17762/ijritcc2321-8169.150317

    Vector processing-aware advanced clock-gating techniques for low-power fused multiply-add

    Get PDF
    The need for power efficiency is driving a rethink of design decisions in processor architectures. While vector processors succeeded in the high-performance market in the past, they need a retailoring for the mobile market that they are entering now. Floating-point (FP) fused multiply-add (FMA), being a functional unit with high power consumption, deserves special attention. Although clock gating is a well-known method to reduce switching power in synchronous designs, there are unexplored opportunities for its application to vector processors, especially when considering active operating mode. In this research, we comprehensively identify, propose, and evaluate the most suitable clock-gating techniques for vector FMA units (VFUs). These techniques ensure power savings without jeopardizing the timing. We evaluate the proposed techniques using both synthetic and “real-world” application-based benchmarking. Using vector masking and vector multilane-aware clock gating, we report power reductions of up to 52%, assuming active VFU operating at the peak performance. Among other findings, we observe that vector instruction-based clock-gating techniques achieve power savings for all vector FP instructions. Finally, when evaluating all techniques together, using “real-world” benchmarking, the power reductions are up to 80%. Additionally, in accordance with processor design trends, we perform this research in a fully parameterizable and automated fashion.The research leading to these results has received funding from the RoMoL ERC Advanced Grant GA 321253 and is supported in part by the European Union (FEDER funds) under contract TTIN2015-65316-P. The work of I. Ratkovic was supported by a FPU research grant from the Spanish MECD.Peer ReviewedPostprint (author's final draft

    A Synthesizable single-cycle multiply-accumulator

    Get PDF
    The multiplication and multiply-accumulate operations are expensive to implement in hardware for Digital Signal Processing, video, and graphics applications. A standard multiply-accumulator has three inputs and a single output that is equal to the product of two of its inputs added to the third input. For some applications it is desirable for a multiply-accumulator to have two outputs; one output that is the product of the first two inputs, and a second output that is the multiply-accumulate result. The goal of this thesis is to investigate algorithms and architectures used to design multipliers and multiply-accumulators, and to create a multiply-accumulator that computes both outputs in a single clock cycle. Often times in high speed designs the most time-consuming operations are pipelined to meet the system timing requirements. If the multiply-accumulate computation can be reduced to a single-cycle operation the overall processor performance can be improved for many applications. A multiply-accumulator with two outputs can be created using a combination of standard multiply, add, or multiply-accumulate components. Using these components, a multiplier and a multiply-accumulator can be used to produce the outputs in the most time-efficient manner. A multiplier and an adder will result in a smaller design with a larger worst-case delay. Therefore, the goal is to create a multiply-accumulator that is comparable in speed, but requires less area than a design using an industry standard multiplier and multiply-accumulator

    Fast Implementation of Lifting Based DWT Architecture For Image Compression

    Get PDF
    Technological growth in semiconductor industry have led to unprecedented demand for faster area efficient and low power VLSI circuits for complex image processing applications DWT-IDWT is one of the most popular IP that is used for image transformation In this work a high speed low power DWT IDWT architecture is designed and implemented on ASIC using 130nm Technology 2D DWT architecture based on lifting scheme architecture uses multipliers and adders thus consuming power This paper addresses power reduction in multiplier by proposing a modified algorithm for BZFAD multiplier The proposed BZFAD multiplier is 65 faster and occupies 44 less area compared with the generic multipliers The DWT architecture designed based on modified BZFAD multiplier achieves 35 less power reduction and operates at frequency of 200MHz with latency of 1536 clock cycles for 512x512 image The developed DWT can be used as an IP for VLSI implementatio

    A Survey on Approximate Multiplier Designs for Energy Efficiency: From Algorithms to Circuits

    Full text link
    Given the stringent requirements of energy efficiency for Internet-of-Things edge devices, approximate multipliers, as a basic component of many processors and accelerators, have been constantly proposed and studied for decades, especially in error-resilient applications. The computation error and energy efficiency largely depend on how and where the approximation is introduced into a design. Thus, this article aims to provide a comprehensive review of the approximation techniques in multiplier designs ranging from algorithms and architectures to circuits. We have implemented representative approximate multiplier designs in each category to understand the impact of the design techniques on accuracy and efficiency. The designs can then be effectively deployed in high-level applications, such as machine learning, to gain energy efficiency at the cost of slight accuracy loss.Comment: 38 pages, 37 figure

    Power-efficient design of 16-bit mixed-operand multipliers

    Get PDF
    Thesis (M. Eng.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2004.Includes bibliographical references (p. 53).Multiplication is an expensive and slow arithmetic operation, which plays an important role in many DSP algorithms. It usually lies in the critical-delay paths, having an effect on performance of the system as well as consuming large power. Consequently, significant improvements in both power and performance can be achieved in the overall DSP system by carefully designing and optimizing power and performance of the multiplier. This thesis explores several circuit-level techniques for power-efficiently designing multipliers, including supply voltage reduction, efficient multiplication algorithms, low power circuit logic styles, and transistor sizing using dynamic and static tuners. Based on these techniques, several 16-bit multipliers have been successfully designed and implemented in 0.13[micro]m CMOS technology at the supply voltage of 1.5V and 0.9V. The multipliers are modified to handle multiplications of two 16-bit operands in which each can be either signed magnitude or two's complement formats. Examining power-performance characteristics of these multipliers reveals that both array and tree structures are feasible solutions for designing 16-bit multipliers, and complementary CMOS and single-ended CPL-TG logics are promising candidates for power-efficient design. The appropriate choices of structures and logic styles depend on power and performance constraints of the particular design.by Sataporn Pornpromlikit.M.Eng

    Low-Power, Low-Cost, & High-Performance Digital Designs : Multi-bit Signed Multiplier design using 32nm CMOS Technology

    Get PDF
    Binary multipliers are ubiquitous in digital hardware. Digital multipliers along with the adders play a major role in computing, communicating, and controlling devices. Multipliers are used majorly in the areas of digital signal and image processing, central processing unit (CPU) of the computers, high-performance and parallel scientific computing, machine learning, physical layer design of the communication equipment, etc. The predominant presence and increasing demand for low-power, low-cost, and high-performance digital hardware led to this work of developing optimized multiplier designs. Two optimized designs are proposed in this work. One is an optimized 8 x 8 Booth multiplier architecture which is implemented using 32nm CMOS technology. Synthesis (pre-layout) and post-layout results show that the delay is reduced by 24.7% and 25.6% respectively, the area is reduced by 5.5% and 15% respectively, the power consumption is reduced by 21.5% and 26.6% respectively, and the area-delay-product is reduced by 28.8% and 36.8% respectively when compared to the performance results obtained for the state-of-the-art 8 x 8 Booth multiplier designed using 32nm CMOS technology with 1.05 V supply voltage at 500 MHz input frequency. Another is a novel radix-8 structure with 3-bit grouping to reduce the number of partial products along with the effective partial product reduction schemes for 8 x 8, 16 x 16, 32 x 32, and 64 x 64 signed multipliers. Comparing the performance results of the (synthesized, post-layout) designs of sizes 32 x 32, and 64 x 64 based on the simple novel radix-8 structure with the estimated performance measurements for the optimized Booth multiplier design presented in this work, reduction in delay by (2.64%, 0.47%) and (2.74%, 18.04%) respectively, and reduction in area-delay-product by (12.12%, -5.17%) and (17.82%, 12.91%) respectively can be observed. With the use of the higher radix structure, delay, area, and power consumption can be further reduced. Appropriate adder deployment, further exploring the optimized grouping or compression strategies, and applying more low-power design techniques such as power-gating, multi-Vt MOS transistor utilization, multi-VDD domain creation, etc., help, along with the higher radix structures, realizing the more efficient multiplier designs
    • …
    corecore