5,840 research outputs found

    Serial-data computation in VLSI

    Get PDF

    Highly Efficient Twin Module Structure of 64-Bit Exponential Function Implemented on SGI RASC Platform

    Get PDF
    This paper presents an implementation of the double precision exponential function. A novel table-based architecture, together with short Taylor expansion, provides a low latency (30 clock cycles) which is comparable to 32 bit implementations. A low area consumption of a single exp() module (roughtly 4% of XC4LX200) allows that several modules can be implemented in a single FPGAs.The employment of massive parallelism results in high performance of the module. Nevertheless, because of the external memory interface limitation, only a twin module structure is presented in this paper. This implementation aims primarily to meet quantum chemistry huge and strict requirements for precision and speed. Each module is capable of processing at speed of 200MHz with max. error of 1 ulp, RMSE equals 0.6

    Implementation of Input Oriented Dynamic Voltage and Frequency Scaling for Multiplier on FPGA

    Full text link
    This paper presents an Implementation of Dynamic voltage and frequency scaling according to input data. In the conventional method the power supply is fixed and independent on workload, so, voltage and area will be consumed unnecessary .Paper proposes the approach which focuses on making system dynamic for low power digital multiplier on reconfigurable device FPGA (Spartan III). For making system Dynamic input workload should be known and scanning is used to detect range of input so system can adjust voltage and frequency. Control signal generated from scanning which can dynamically change voltage and frequency for low power consumption according to input data

    Truncated SIMD Multiplier Architecture for Approximate Computing in Low-Power Programmable Processors

    Get PDF
    [Abstract]: Approximate computing has been exploited for many years in application-specific architectures. Recently, it has also been proposed for low-power programmable processors. However, this poses some challenges as, in a microprocessor, the energy consumed by fetching and decoding an instruction may be significantly higher than that of the execution itself. Therefore, approximate computing would be advisable only for those instructions, in which the execution stage is significantly expensive in terms of energy consumption. In this paper, we present new architectures for truncated SIMD multipliers able to calculate signed and unsigned products from 8 × 8 to 64× 64 bits. Next, we analyze the precision loss incurred by truncation for all product sizes. We implement accurate and truncated architectures for both scalar and SIMD products and find that truncation allows area savings of up to 27%. The proposed design is experimentally evaluated in different scenarios, showing potential energy savings ranging from 29% to 42%. Finally, this paper analyzes the overall convenience of introducing truncated SIMD architectures with respect to accurate SIMD and scalar architectures.This work was supported in part by the Ministry of Economy and Competitiveness of Spain under Project TIN2016-75845-P (AEI/FEDER, UE), in part by the Xunta de Galicia and FEDER Funds of the EU under the Consolidation Program of Competitive Reference Groups under Grant ED431C 2017/04, and in part by the Centro Singular de Investigación de Galicia Accreditation 2016 2019 under Grant ED431G/01.Xunta de Galicia; ED431C 2017/04Xunta de Galicia; ED431G/0

    Design of Energy-Efficient Approximate Arithmetic Circuits

    Get PDF
    Energy consumption has become one of the most critical design challenges in integrated circuit design. Arithmetic computing circuits, in particular array-based arithmetic computing circuits such as adders, multipliers, squarers, have been widely used. In many cases, array-based arithmetic computing circuits consume a significant amount of energy in a chip design. Hence, reduction of energy consumption of array-based arithmetic computing circuits is an important design consideration. To this end, designing low-power arithmetic circuits by intelligently trading off processing precision for energy saving in error-resilient applications such as DSP, machine learning and neuromorphic circuits provides a promising solution to the energy dissipation challenge of such systems. To solve the chip’s energy problem, especially for those applications with inherent error resilience, array-based approximate arithmetic computing (AAAC) circuits that produce errors while having improved energy efficiency have been proposed. Specifically, a number of approximate adders, multipliers and squarers have been presented in the literature. However, the chief limitation of these designs is their un-optimized processing accuracy, which is largely due to the current lack of systemic guidance for array-based AAAC circuit design pertaining to optimal tradeoffs between error, energy and area overhead. Therefore, in this research, our first contribution is to propose a general model for approximate array-based approximate arithmetic computing to guide the minimization of processing error. As part of this model, the Error Compensation Unit (ECU) is identified as a key building block for a wide range of AAAC circuits. We develop theoretical analysis geared towards addressing two critical design problems of the ECU, namely, determination of optimal error compensation values and identification of the optimal error compensation scheme. We demonstrate how this general AAAC model can be leveraged to derive practical design insights that may lead to optimal tradeoffs between accuracy, energy dissipation and area overhead. To further minimize energy consumption, delay and area of AAAC circuits, we perform ECU logic simplification by introducing don't cares. By applying the proposed model, we propose an approximate 16x16 fixed-width Booth multiplier that consumes 44.85% and 28.33% less energy and area compared with theoretically the most accurate fixed-width Booth multiplier when implemented using a 90nm CMOS standard cell library. Furthermore, it reduces average error, max error and mean square error by 11.11%, 28.11% and 25.00%, respectively, when compared with the best reported approximate Booth multiplier and outperforms the best reported approximate design significantly by 19.10% in terms of the energy-delay-mean square error product (EDE_(ms)). Using the same approach, significant energy consumption, area and error reduction is achieved for a squarer unit, with more than 20.00% EDE_(ms) reduction over existing fixed-width squarer designs. To further reduce error and cost by utilizing extra signatures and don't cares, we demonstrate a 16-bit fixed-width squarer that improves the energy-delay-max error (EDE_(max)) by 15.81%
    • …
    corecore