2,491 research outputs found

    A Modified Architecture for Radix-4 Booth Multiplier with Adaptive Hold Logic

    Get PDF
    High speed digital multipliers are most efficiently used in many applications such as Fourier transform, discrete cosine transforms, and digital filtering. The throughput of the multipliers is based on speed of the multiplier, and then the entire performance of the circuit depends on it. The pMOS transistor in negative bias cause negative bias temperature instability (NBTI), which increases the threshold voltage of the transistor and reduces the multiplier speed. Similarly, the nMOS transistor in positive bias cause positive bias temperature instability (PBTI).These effects reduce the transistor speed and the system may fail due to timing violations. So here a new multiplier was designed with novel adaptive hold logic (AHL) using Radix-4 Modified Booth Multiplier. By using Radix-4 Modified Booth Encoding (MBE), we can reduce the number of partial products by half. Modified booth multiplier helps to provide higher throughput with low power consumption. This can adjust the AHL circuit to reduce the performance degradation. The expected result will be reduce threshold voltage, increase throughput and speed and also reduce power. This modified multiplier design is coded by Verilog and simulated using Xilinx ISE 12.1 and implemented in Spartan 3E FPGA kit

    Energy Aware Design and Analysis for Synchronous and Asynchronous Circuits

    Get PDF
    Power dissipation has become a major concern for IC designers. Various low power design techniques have been developed for synchronous circuits. Asynchronous circuits, however. have gained more interests recently due to their benefits in lower noise, easy timing control, etc. But few publications on energy reduction techniques for asynchronous logic are available. Power awareness indicates the ability of the system power to scale with changing conditions and quality requirements. Scalability is an important figure-of-merit since it allows the end user to implement operational policy. just like the user of mobile multimedia equipment needs to select between better quality and longer battery operation time. This dissertation discusses power/energy optimization and performs analysis on both synchronous and asynchronous logic. The major contributions of this dissertation include: 1 ) A 2-Dimensional Pipeline Gating technique for synchronous pipelined circuits to improve their power awareness has been proposed. This technique gates the corresponding clock lines connected to registers in both vertical direction (the data flow direction) and horizontal direction (registers within each pipeline stage) based on current input precision. 2) Two energy reduction techniques, Signal Bypassing & Insertion and Zero Insertion. have been developed for NCL circuits. Both techniques use Nulls to replace redundant Data 0\u27s based on current input precision in order to reduce the switching activity while Signal Bypassing & Insertion is for non-pipelined NCI, circuits and Zero Insertion is for pipelined counterparts. A dynamic active-bit detection scheme is also developed as an expansion. 3) Two energy estimation techniques, Equivalent Inverter Modeling based on Input Mapping in transistor-level and Switching Activity Modeling in gate-level, have been proposed. The former one is for CMOS gates with feedbacks and the latter one is for NCL circuits

    Age-Acknowledging Reliable Multiplier Design with Adaptive Hold Logic

    Full text link
    Digital multipliers are among the most critical arithmetic functional units. The overall performance of these systems depends on the throughput of the multiplier. Meanwhile, the negative bias temperature instability effect occurs when a pMOS transistor is under negative bias (Vgs = −Vdd), increasing the threshold voltage of the pMOS transistor, and reducing multiplier speed. A similar phenomenon, positive bias temperature instability, occurs when an nMOS transistor is under positive bias. Both effects degrade transistor speed, and in the long term, the system may fail due to timing violations. Therefore, it is important to design reliable high performance multipliers. In this paper, we propose an aging-aware multiplier design with novel adaptive hold logic (AHL) circuit. The multiplier is able to provide higher throughput through the variable latency and can adjust the AHL circuit to mitigate performance degradation that is due to the aging effect. Moreover, the proposed architecture can be applied to a column- or row-bypassing multiplier. The experimental results show that our proposed architecture with 16 ×16 and 32 ×32 column-bypassing multipliers can attain up to 62.88% and 76.28% performance improvement, respectively, compared with 16×16 and 32×32 fixed-latency column-bypassing multipliers. Furthermore, our proposed architecture with 16 × 16 and 32 × 32 row-bypassing multipliers can achieve up to 80.17% and 69.40% performance improvement as compared with 16×16 and 32 × 32 fixed-latency row-bypassing multipliers

    Improving Compute & Data Efficiency of Flexible Architectures

    Get PDF

    Power-Aware Design Methodologies for FPGA-Based Implementation of Video Processing Systems

    Get PDF
    The increasing capacity and capabilities of FPGA devices in recent years provide an attractive option for performance-hungry applications in the image and video processing domain. FPGA devices are often used as implementation platforms for image and video processing algorithms for real-time applications due to their programmable structure that can exploit inherent spatial and temporal parallelism. While performance and area remain as two main design criteria, power consumption has become an important design goal especially for mobile devices. Reduction in power consumption can be achieved by reducing the supply voltage, capacitances, clock frequency and switching activities in a circuit. Switching activities can be reduced by architectural optimization of the processing cores such as adders, multipliers, multiply and accumulators (MACS), etc. This dissertation research focuses on reducing the switching activities in digital circuits by considering data dependencies in bit level, word level and block level neighborhoods in a video frame. The bit level data neighborhood dependency consideration for power reduction is illustrated in the design of pipelined array, Booth and log-based multipliers. For an array multiplier, operands of the multipliers are partitioned into higher and lower parts so that the probability of the higher order parts being zero or one increases. The gating technique for the pipelined approach deactivates part(s) of the multiplier when the above special values are detected. For the Booth multiplier, the partitioning and gating technique is integrated into the Booth recoding scheme. In addition, a delay correction strategy is developed for the Booth multiplier to reduce the switching activities of the sign extension part in the partial products. A novel architecture design for the computation of log and inverse-log functions for the reduction of power consumption in arithmetic circuits is also presented. This also utilizes the proposed partitioning and gating technique for further dynamic power reduction by reducing the switching activities. The word level and block level data dependencies for reducing the dynamic power consumption are illustrated by presenting the design of a 2-D convolution architecture. Here the similarities of the neighboring pixels in window-based operations of image and video processing algorithms are considered for reduced switching activities. A partitioning and detection mechanism is developed to deactivate the parallel architecture for window-based operations if higher order parts of the pixel values are the same. A neighborhood dependent approach (NDA) is incorporated with different window buffering schemes. Consideration of the symmetry property in filter kernels is also applied with the NDA method for further reduction of switching activities. The proposed design methodologies are implemented and evaluated in a FPGA environment. It is observed that the dynamic power consumption in FPGA-based circuit implementations is significantly reduced in bit level, data level and block level architectures when compared to state-of-the-art design techniques. A specific application for the design of a real-time video processing system incorporating the proposed design methodologies for low power consumption is also presented. An image enhancement application is considered and the proposed partitioning and gating, and NDA methods are utilized in the design of the enhancement system. Experimental results show that the proposed multi-level power aware methodology achieves considerable power reduction. Research work is progressing In utilizing the data dependencies in subsequent frames in a video stream for the reduction of circuit switching activities and thereby the dynamic power consumption

    Multiplier Based On Add And Shift Method By Passing Zero

    Get PDF
    In this paper, a low-power structure for shift-and-add multipliers is proposed. The architec-ture considerably lowers the switching activity of conventional multipliers. The modification to the multiplier which multiplies A by B include the removal of the shifting register, direct feeding of A to the adder, bypassing the adder whenever possible, using a ring counter instead of a binary counter and removal of the partial product shift. The architecture makes use of a low-power ring counter proposed in this work . The proposed multiplier can be used for low-power applications where the speed is not a primary design parameter

    ArrayFlex: A Systolic Array Architecture with Configurable Transparent Pipelining

    Full text link
    Convolutional Neural Networks (CNNs) are the state-of-the-art solution for many deep learning applications. For maximum scalability, their computation should combine high performance and energy efficiency. In practice, the convolutions of each CNN layer are mapped to a matrix multiplication that includes all input features and kernels of each layer and is computed using a systolic array. In this work, we focus on the design of a systolic array with configurable pipeline with the goal to select an optimal pipeline configuration for each CNN layer. The proposed systolic array, called ArrayFlex, can operate in normal, or in shallow pipeline mode, thus balancing the execution time in cycles and the operating clock frequency. By selecting the appropriate pipeline configuration per CNN layer, ArrayFlex reduces the inference latency of state-of-the-art CNNs by 11%, on average, as compared to a traditional fixed-pipeline systolic array. Most importantly, this result is achieved while using 13%-23% less power, for the same applications, thus offering a combined energy-delay-product efficiency between 1.4x and 1.8x.Comment: DATE 202

    Research in the development of an improved multiplier phototube Final report

    Get PDF
    Cascade aperture design, gas pressure effects, gain calibration, and photon counting efficiency of multiplier phototub

    An efficient multiple precision floating-point Multiply-Add Fused unit

    Get PDF
    Multiply-Add Fused (MAF) units play a key role in the processor's performance for a variety of applications. The objective of this paper is to present a multi-functional, multiple precision floating-point Multiply-Add Fused (MAF) unit. The proposed MAF is reconfigurable and able to execute a quadruple precision MAF instruction, or two double precision instructions, or four single precision instructions in parallel. The MAF architecture features a dual-path organization reducing the latency of the floating-point add (FADD) instruction and utilizes the minimum number of operating components to keep the area low. The proposed MAF design was implemented on a 65 nm silicon process achieving a maximum operating frequency of 293.5 MHz at 381 mW power
    • …
    corecore