Search CORE

318 research outputs found

Approximate Computing Survey, Part I: Terminology and Software & Hardware Approximation Techniques

Author: Armeniakos Giorgos
Hanif Muhammad Abdullah
Jiao Xun
Leon Vasileios
Pekmestzi Kiamal
Shafique Muhammad
Soudris Dimitrios
Publication venue
Publication date: 20/07/2023
Field of study

The rapid growth of demanding applications in domains applying multimedia processing and machine learning has marked a new era for edge and cloud computing. These applications involve massive data and compute-intensive tasks, and thus, typical computing paradigms in embedded systems and data centers are stressed to meet the worldwide demand for high performance. Concurrently, the landscape of the semiconductor field in the last 15 years has constituted power as a first-class design concern. As a result, the community of computing systems is forced to find alternative design approaches to facilitate high-performance and/or power-efficient computing. Among the examined solutions, Approximate Computing has attracted an ever-increasing interest, with research works applying approximations across the entire traditional computing stack, i.e., at software, hardware, and architectural levels. Over the last decade, there is a plethora of approximation techniques in software (programs, frameworks, compilers, runtimes, languages), hardware (circuits, accelerators), and architectures (processors, memories). The current article is Part I of our comprehensive survey on Approximate Computing, and it reviews its motivation, terminology and principles, as well it classifies and presents the technical details of the state-of-the-art software and hardware approximation techniques.Comment: Under Review at ACM Computing Survey

arXiv.org e-Print Archive

A Study on Efficient Designs of Approximate Arithmetic Circuits

Author: Venkatachalam Suganthi
Publication venue: 'University of Saskatchewan Library'
Publication date: 03/01/2019
Field of study

Approximate computing is a popular field where accuracy is traded with energy. It can benefit applications such as multimedia, mobile computing and machine learning which are inherently error resilient. Error introduced in these applications to a certain degree is beyond human perception. This flexibility can be exploited to design area, delay and power efficient architectures. However, care must be taken on how approximation compromises the correctness of results. This research work aims to provide approximate hardware architectures with error metrics and design metrics analyzed and their effects in image processing applications. Firstly, we study and propose unsigned array multipliers based on probability statistics and with approximate 4-2 compressors, full adders and half adders. This work deals with a new design approach for approximation of multipliers. The partial products of the multiplier are altered to introduce varying probability terms. Logic complexity of approximation is varied for the accumulation of altered partial products based on their probability. The proposed approximation is utilized in two variants of 16-bit multipliers. Synthesis results reveal that two proposed multipliers achieve power savings of 72% and 38% respectively compared to an exact multiplier. They have better precision when compared to existing approximate multipliers. Mean relative error distance (MRED) figures are as low as 7.6% and 0.02% for the proposed approximate multipliers, which are better than the previous state-of-the-art works. Performance of the proposed multipliers is evaluated with geometric mean filtering application, where one of the proposed models achieves the highest peak signal to noise ratio (PSNR). Second, approximation is proposed for signed Booth multiplication. Approximation is introduced in partial product generation and partial product accumulation circuits. In this work, three multipliers (ABM-M1, ABM-M2, and ABM-M3) are proposed in which the modified Booth algorithm is approximated. In all three designs, approximate Booth partial product generators are designed with different variations of approximation. The approximations are performed by reducing the logic complexity of the Booth partial product generator, and the accumulation of partial products is slightly modified to improve circuit performance. Compared to the exact Booth multiplier, ABM-M1 achieves up to 15% reduction in power consumption with an MRED value of 7.9 × 10-4. ABM-M2 has power savings of up to 60% with an MRED of 1.1 × 10-1. ABM-M3 has power savings of up to 50% with an MRED of 3.4 × 10-3. Compared to existing approximate Booth multipliers, the proposed multipliers ABM-M1 and ABM-M3 achieve up to a 41% reduction in power consumption while exhibiting very similar error metrics. Image multiplication and matrix multiplication are used as case studies to illustrate the high performance of the proposed approximate multipliers. Third, distributed arithmetic based sum of products units approximation is analyzed. Sum of products units are key elements in many digital signal processing applications. Three approximate sum of products models which are based on distributed arithmetic are proposed. They are designed for different levels of accuracy. First model of approximate sum of products achieves an improvement up to 64% on area and 70% on power, when compared to conventional unit. Other two models provide an improvement of 32% and 48% on area and 54% and 58% on power, respectively, with a reduced error rate compared to the first model. Third model achieves MRED and normalized mean error distance (NMED) as low as 0.05% and 0.009%. Performance of approximate units is evaluated with a noisy image smoothing application, where the proposed models are capable of achieving higher PSNR than existing state of the art techniques. Fourth, approximation is applied in division architecture. Two approximation models are proposed for restoring divider. In the first design, approximation is performed at circuit level, where approximate divider cells are utilized in place of exact ones by simplifying the logic equations. In the second model, restoring divider is analyzed strategically and number of restoring divider cells are reduced by finding the portions of divisor and dividend with significant information. An approximation factor

p

is used in both designs. In model 1, the design with p=8 has a 58% reduction in both area and power consumption compared to exact design, with a Q-MRED of 1.909 × 10-2 and Q-NMED of 0.449 × 10-2. The second model with an approximation factor p=4 has 54% area savings and 62% power savings compared to exact design. The proposed models are found to have better error metrics compared to existing designs, with better performance at similar error values. A change detection image processing application is used for real time assessment of proposed and existing approximate dividers and one of the models achieves a PSNR of 54.27 dB

eCommons@USASK

University of Saskatchewan Research Archive

Improving the Hardware Performance of Arithmetic Circuits using Approximate Computing

Author: Adams Lizzie M
Publication venue: 'University of Saskatchewan Library'
Publication date: 16/03/2021
Field of study

An application that can produce a useful result despite some level of computational error is said to be error resilient. Approximate computing can be applied to error resilient applications by intentionally introducing error to the computation in order to improve performance, and it has been shown that approximation is especially well-suited for application in arithmetic computing hardware. In this thesis, novel approximate arithmetic architectures are proposed for three different operations, namely multiplication, division, and the multiply accumulate (MAC) operation. For all designs, accuracy is evaluated in terms of mean relative error distance (MRED) and normalized mean error distance (NMED), while hardware performance is reported in terms of critical path delay, area, and power consumption. Three approximate Booth multipliers (ABM-M1, ABM-M2, ABM-M3) are designed in which two novel inexact partial product generators are used to reduce the dimensions of the partial product matrix. The proposed multipliers are compared to other state-of-the-art designs in terms of both accuracy and hardware performance, and are found to reduce power consumption by up to 56% when compared to the exact multiplier. The function of the multipliers is verified in several image processing applications. Two approximate restoring dividers (AXRD-M1, AXRD-M2) are proposed along with a novel inexact restoring divider cell. In the first divider, the conventional cells are replaced with the proposed inexact cells in several columns. The second divider computes only a subset of the trial subtractions, after which the divisor and partial remainder are rounded and encoded so that they may be used to estimate the remaining quotient bits. The proposed dividers are evaluated for accuracy and hardware performance alongside several benchmarking designs, and their function is verified using change detection and foreground extraction applications. An approximate MAC unit is presented in which the multiplication is implemented using a modified version of ABM-M3. The delay is reduced by using a fused architecture where the accumulator is summed as part of the multiplier compression. The accuracy and hardware savings of the MAC unit are measured against several works from the literature, and the design is utilized in a number of convolution operations

University of Saskatchewan Research Archive

Optimization of new Chinese Remainder theorems using special moduli sets

Author: Narayanaswamy Narendran
Publication venue: LSU Digital Commons
Publication date: 01/01/2010
Field of study

The residue number system (RNS) is an integer number representation system, which is capable of supporting parallel, high-speed arithmetic. This system also offers some useful properties for error detection, error correction and fault tolerance. It has numerous applications in computation-intensive digital signal processing (DSP) operations, like digital filtering, convolution, correlation, Discrete Fourier Transform, Fast Fourier Transform, direct digital frequency synthesis, etc. The residue to binary conversion is based on Chinese Remainder Theorem (CRT) and Mixed Radix Conversion (MRC). However, the CRT requires a slow large modulo operation while the MRC requires finding the mixed radix digits which is a slow process. The new Chinese Remainder Theorems (CRT I, CRT II and CRT III) make the computations faster and efficient without any extra overheads. But, New CRTs are hardware intensive as they require many inverse modulus operators, modulus operators, multipliers and dividers. Dividers and inverse modulus operators in turn needs many half and full adders and subtractors. So, some kind of optimization is necessary to implement these theorems practically. In this research, for the optimization, new both co-prime and non co-prime multi modulus sets are proposed that simplify the new Chinese Remainder theorems by eliminating the huge summations, inverse modulo operators, and dividers. Furthermore, the proposed hardware optimization removes the multiplication terms in the theorems, which further simplifies the implementation

Louisiana State University

Approximate Computing Survey, Part II: Application-Specific & Architectural Approximation Techniques and Applications

Author: Armeniakos Giorgos
Hanif Muhammad Abdullah
Jiao Xun
Leon Vasileios
Pekmestzi Kiamal
Shafique Muhammad
Soudris Dimitrios
Publication venue
Publication date: 20/07/2023
Field of study

The challenging deployment of compute-intensive applications from domains such Artificial Intelligence (AI) and Digital Signal Processing (DSP), forces the community of computing systems to explore new design approaches. Approximate Computing appears as an emerging solution, allowing to tune the quality of results in the design of a system in order to improve the energy efficiency and/or performance. This radical paradigm shift has attracted interest from both academia and industry, resulting in significant research on approximation techniques and methodologies at different design layers (from system down to integrated circuits). Motivated by the wide appeal of Approximate Computing over the last 10 years, we conduct a two-part survey to cover key aspects (e.g., terminology and applications) and review the state-of-the art approximation techniques from all layers of the traditional computing stack. In Part II of our survey, we classify and present the technical details of application-specific and architectural approximation techniques, which both target the design of resource-efficient processors/accelerators & systems. Moreover, we present a detailed analysis of the application spectrum of Approximate Computing and discuss open challenges and future directions.Comment: Under Review at ACM Computing Survey

arXiv.org e-Print Archive

The use of reversible logic gates in the design of residue number systems

Author: Asadpour Ailin
Emrani Zarandi Azadeh Alsadat
Molahosseini Amir Sabbagh
Publication venue: Institute of Advanced Engineering and Science
Publication date: 01/04/2023
Field of study

Reversible computing is an emerging technique to achieve ultra-low-power circuits. Reversible arithmetic circuits allow for achieving energy-efficient high-performance computational systems. Residue number systems (RNS) provide parallel and fault-tolerant additions and multiplications without carry propagation between residue digits. The parallelism and fault-tolerance features of RNS can be leveraged to achieve high-performance reversible computing. This paper proposed RNS full reversible circuits, including forward converters, modular adders and multipliers, and reverse converters used for a class of RNS moduli sets with the composite form {2k, 2p-1}. Modulo 2n-1, 2n, and 2n+1 adders and multipliers were designed using reversible gates. Besides, reversible forward and reverse converters for the 3-moduli set {2n-1, 2n+k, 2n+1} have been designed. The proposed RNS-based reversible computing approach has been applied for consecutive multiplications with an improvement of above 15% in quantum cost after the twelfth iteration, and above 27% in quantum depth after the ninth iteration. The findings show that the use of the proposed RNS-based reversible computing in convolution results in a significant improvement in quantum depth in comparison to conventional methods based on weighted binary adders and multipliers

ZENODO

NEUROSURGERY ENTHUSIASTIC WOMEN SOCIETY

Institute of Advanced Engineering and Science

Approximation Opportunities in Edge Computing Hardware : A Systematic Literature Review

Author: Damsgaard Hans Jakob
Nurmi Jari
Ometov Aleksandr
Publication venue
Publication date: 29/11/2022
Field of study

With the increasing popularity of the Internet of Things and massive Machine Type Communication technologies, the number of connected devices is rising. However, while enabling valuable effects to our lives, bandwidth and latency constraints challenge Cloud processing of their associated data amounts. A promising solution to these challenges is the combination of Edge and approximate computing techniques that allows for data processing nearer to the user. This paper aims to survey the potential benefits of these paradigms’ intersection. We provide a state-of-the-art review of circuit-level and architecture-level hardware techniques and popular applications. We also outline essential future research directions.publishedVersionPeer reviewe

ZENODO

NEUROSURGERY ENTHUSIASTIC WOMEN SOCIETY

Trepo - Institutional Repository of Tampere University

Recommended from our members

Formal Verification of Divider and Square-root Arithmetic Circuits Using Computer Algebra Methods

Author: Yasin Atif
Publication venue: ScholarWorks@UMass Amherst
Publication date: 16/07/2020
Field of study

A considerable progress has been made in recent years in verification of arithmetic circuits such as multipliers, fused multiply-adders, multiply-accumulate, and other components of arithmetic datapaths, both in integer and finite field domain. However, the verification of hardware dividers and square-root functions have received only a limited attention from the verification community, with a notable exception for theorem provers and other inductive, non-automated systems. Division, square root, and transcendental functions are all tied to the basic Intel architecture and proving correctness of such algorithms is of grave importance. Although belonging to the same iterative-subtract class of architectures, they widely differ from each other. IEEE floating point standard specifies square-rooting and division as basic arithmetic operation alongside the usual three basic operations. The difficulty of formally verifying hardware implementation of a divider/square-root can be attributed mostly to the modeling of its characteristic function and the high memory complexity required by standard algebraic approach. The work proposed in this thesis discusses formal verification of combinational divider and square-root circuits. Specifically, it addresses the problem of formally verifying gate-level circuits using an algebraic model. In contrast to standard verification approaches using satisfiability (SAT) or equivalence checking, the proposed method verifies whether the gate-level circuit actually performs the intended function or not, without a need for a reference design. Firstly, we present a verification methodology for a constant divider, where the divisor value is fixed to a constant integer. Albeit simpler case of verification, it provides us with the basic understanding of verification techniques and the underlying issues applicable to divider verification. Secondly, a layered verification approach is proposed for the verification of generic array dividers. Finally, the work proposed in this thesis will further analyze the divider and square-root circuits and aim to curb the memory explosion issue experienced by computer algebra based verification methods in order to successfully verify large bit-width divider-type arithmetic circuits. More specifically, a novel idea of hardware rewriting is introduced, which avoids the high memory complexity. The mentioned technique verifies a 256-bit gate-level square-root circuit with around 260,000 gates in just under 18 minutes and 127-bit gate-level divider circuit in under one minute

ScholarWorks@UMass Amherst