Search CORE

205 research outputs found

Imprecise Arithmetic for Low Power Image Processing

Author: Albicocco Pietro
Cardarilli Gian Carlo
Nannarelli Alberto
Petricca Massimo
Re Marco
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2012
Field of study

Sometimes reducing the precision of a numerical processor, by introducing errors, can lead to significant performance (delay, area and power dissipation) improvements without compromising the overall quality of the processing. In this work, we show how to perform the two basic operations, addition and multiplication, in an imprecise manner by simplifying the hardware implementation. With the proposed 'sloppy' operations, we obtain a reduction in delay, area and power dissipation, and the error introduced is still acceptable for applications such as image processing. © 2012 IEEE

CiteSeerX

Crossref

ART

Online Research Database In Technology

Sloppy Addition and Multiplication

Author: Nannarelli Alberto
Publication venue: Technical University of Denmark
Publication date: 01/01/2011
Field of study

Online Research Database In Technology

Computer arithmetic based on the Continuous Valued Number System

Author: Mirhassani Mitra
Publication venue: 'University of Windsor Leddy Library'
Publication date: 01/01/2007
Field of study

Scholarship at UWindsor

Design of Energy-Efficient Approximate Arithmetic Circuits

Author: Shao Botang
Publication venue
Publication date: 05/02/2015
Field of study

Energy consumption has become one of the most critical design challenges in integrated circuit design. Arithmetic computing circuits, in particular array-based arithmetic computing circuits such as adders, multipliers, squarers, have been widely used. In many cases, array-based arithmetic computing circuits consume a significant amount of energy in a chip design. Hence, reduction of energy consumption of array-based arithmetic computing circuits is an important design consideration. To this end, designing low-power arithmetic circuits by intelligently trading off processing precision for energy saving in error-resilient applications such as DSP, machine learning and neuromorphic circuits provides a promising solution to the energy dissipation challenge of such systems. To solve the chip’s energy problem, especially for those applications with inherent error resilience, array-based approximate arithmetic computing (AAAC) circuits that produce errors while having improved energy efficiency have been proposed. Specifically, a number of approximate adders, multipliers and squarers have been presented in the literature. However, the chief limitation of these designs is their un-optimized processing accuracy, which is largely due to the current lack of systemic guidance for array-based AAAC circuit design pertaining to optimal tradeoffs between error, energy and area overhead. Therefore, in this research, our first contribution is to propose a general model for approximate array-based approximate arithmetic computing to guide the minimization of processing error. As part of this model, the Error Compensation Unit (ECU) is identified as a key building block for a wide range of AAAC circuits. We develop theoretical analysis geared towards addressing two critical design problems of the ECU, namely, determination of optimal error compensation values and identification of the optimal error compensation scheme. We demonstrate how this general AAAC model can be leveraged to derive practical design insights that may lead to optimal tradeoffs between accuracy, energy dissipation and area overhead. To further minimize energy consumption, delay and area of AAAC circuits, we perform ECU logic simplification by introducing don't cares. By applying the proposed model, we propose an approximate 16x16 fixed-width Booth multiplier that consumes 44.85% and 28.33% less energy and area compared with theoretically the most accurate fixed-width Booth multiplier when implemented using a 90nm CMOS standard cell library. Furthermore, it reduces average error, max error and mean square error by 11.11%, 28.11% and 25.00%, respectively, when compared with the best reported approximate Booth multiplier and outperforms the best reported approximate design significantly by 19.10% in terms of the energy-delay-mean square error product (EDE_(ms)). Using the same approach, significant energy consumption, area and error reduction is achieved for a squarer unit, with more than 20.00% EDE_(ms) reduction over existing fixed-width squarer designs. To further reduce error and cost by utilizing extra signatures and don't cares, we demonstrate a 16-bit fixed-width squarer that improves the energy-delay-max error (EDE_(max)) by 15.81%

Texas A&M Repository

Recommended from our members

Multi-Valued Majority Logic Circuits Using Spin Waves

Author: Rajapandian Sankara Narayanan
Publication venue: ScholarWorks@UMass Amherst
Publication date: 01/01/2013
Field of study

With increasing data sets for processing, there is a requirement to build faster and smaller arithmetic circuits. One of the ways to improve the performance of higher order arithmetic units is to reduce the carry propagation levels. Multi-valued logic enables this by reducing the number of digits required to represent a range of numbers. Area reduction is also obtained through fewer operations and signals required to realise a function. Though theoretically multi-valued logic has these advantages, implementation of the multi-valued logic using CMOS has not been efficient. The main reason is because multi-valued logic is emulated in CMOS using binary switches. Two main approaches are followed in CMOS in implementing multi-valued logic using CMOS. Voltage mode logic, where the logic states are encoded using the node voltages suffer from low noise margins and limitation of radix due to the power supply. Current mode logic, where the branch currents are used to represent the logic levels suffer from high power consumption due to static current flow and requirement of restoration devices. The mindset of the post-CMOS approaches explored so far for multi-valued logic circuit design has been to replace the CMOS switches with their novel nano switches. Hence they too suffer from the same issues as CMOS implementation. Our value proposition is through the use of a truly multi-state device based on electron spin. Spin waves, which are a collection of electron spins of an atom enables multi-valued logic by allowing encoding information in the amplitude and phase of the wave.Another advantage of the spin wave fabric is that the computation is through wave propagation and interference which does not involve any movement of charge. This enables building low energy,smaller and faster multi-valued circuits. In this thesis, implementation of the basic building blocks of multi-valued logic using these novel spin wave based devices is shown. Building of arithmetic circuits like adders using these building blocks have also been demonstrated. To quantify the benefits of spin wave based multi-valued circuits, they are benchmarked with CMOS. For 32-bits, our projected comparisons show a 5X increased performance, 125X area improvement and 1717X power reduction for hexa-decimal spin wave based adders compared to binary CMOS. Similarly there is a 4X increase in performance of hexa-decimal SPWF multiplier compared to CMOS for 16 bits. Finally, we have implemented the I/O circuits for smooth interface between binary CMOS and multi-valued SPWF logic

ScholarWorks@UMass Amherst

Design of approximate overclocked datapath

Author: Shi Kan
Publication venue: Electrical and Electronic Engineering, Imperial College London
Publication date: 01/03/2016
Field of study

Embedded applications can often demand stringent latency requirements. While high degrees of parallelism within custom FPGA-based accelerators may help to some extent, it may also be necessary to limit the precision used in the datapath to boost the operating frequency of the implementation. However, by reducing the precision, the engineer introduces quantisation error into the design. In this thesis, we describe an alternative circuit design methodology when considering trade-offs between accuracy, performance and silicon area. We compare two different approaches that could trade accuracy for performance. One is the traditional approach where the precision used in the datapath is limited to meet a target latency. The other is a proposed new approach which simply allows the datapath to operate without timing closure. We demonstrate analytically and experimentally that for many applications it would be preferable to simply overclock the design and accept that timing violations may arise. Since the errors introduced by timing violations occur rarely, they will cause less noise than quantisation errors. Furthermore, we show that conventional forms of computer arithmetic do not fail gracefully when pushed beyond the deterministic clocking region. In this thesis we take a fresh look at Online Arithmetic, originally proposed for digit serial operation, and synthesize unrolled digit parallel online arithmetic operators to allow for graceful degradation. We quantify the impact of timing violations on key arithmetic primitives, and show that substantial performance benefits can be obtained in comparison to binary arithmetic. Since timing errors are caused by long carry chains, these result in errors in least significant digits with online arithmetic, causing less impact than conventional implementations.Open Acces

Spiral - Imperial College Digital Repository

Improving the Hardware Performance of Arithmetic Circuits using Approximate Computing

Author: Adams Lizzie M
Publication venue: 'University of Saskatchewan Library'
Publication date: 16/03/2021
Field of study

An application that can produce a useful result despite some level of computational error is said to be error resilient. Approximate computing can be applied to error resilient applications by intentionally introducing error to the computation in order to improve performance, and it has been shown that approximation is especially well-suited for application in arithmetic computing hardware. In this thesis, novel approximate arithmetic architectures are proposed for three different operations, namely multiplication, division, and the multiply accumulate (MAC) operation. For all designs, accuracy is evaluated in terms of mean relative error distance (MRED) and normalized mean error distance (NMED), while hardware performance is reported in terms of critical path delay, area, and power consumption. Three approximate Booth multipliers (ABM-M1, ABM-M2, ABM-M3) are designed in which two novel inexact partial product generators are used to reduce the dimensions of the partial product matrix. The proposed multipliers are compared to other state-of-the-art designs in terms of both accuracy and hardware performance, and are found to reduce power consumption by up to 56% when compared to the exact multiplier. The function of the multipliers is verified in several image processing applications. Two approximate restoring dividers (AXRD-M1, AXRD-M2) are proposed along with a novel inexact restoring divider cell. In the first divider, the conventional cells are replaced with the proposed inexact cells in several columns. The second divider computes only a subset of the trial subtractions, after which the divisor and partial remainder are rounded and encoded so that they may be used to estimate the remaining quotient bits. The proposed dividers are evaluated for accuracy and hardware performance alongside several benchmarking designs, and their function is verified using change detection and foreground extraction applications. An approximate MAC unit is presented in which the multiplication is implemented using a modified version of ABM-M3. The delay is reduced by using a fused architecture where the accumulator is summed as part of the multiplier compression. The accuracy and hardware savings of the MAC unit are measured against several works from the literature, and the design is utilized in a number of convolution operations

University of Saskatchewan Research Archive

A Survey on Approximate Multiplier Designs for Energy Efficiency: From Algorithms to Circuits

Author: Chen Chuangtao
Han Jie
Qian Weikang
Wang Xuan
Wen Chenyi
Wu Ying
Xiao Weihua
Yin Xunzhao
Zhuo Cheng
Publication venue
Publication date: 29/06/2023
Field of study

Given the stringent requirements of energy efficiency for Internet-of-Things edge devices, approximate multipliers, as a basic component of many processors and accelerators, have been constantly proposed and studied for decades, especially in error-resilient applications. The computation error and energy efficiency largely depend on how and where the approximation is introduced into a design. Thus, this article aims to provide a comprehensive review of the approximation techniques in multiplier designs ranging from algorithms and architectures to circuits. We have implemented representative approximate multiplier designs in each category to understand the impact of the design techniques on accuracy and efficiency. The designs can then be effectively deployed in high-level applications, such as machine learning, to gain energy efficiency at the cost of slight accuracy loss.Comment: 38 pages, 37 figure

arXiv.org e-Print Archive