Search CORE

2,396 research outputs found

Design of Wallace Tree Multiplier with Power Efficient Adiabatic Logic

Author: Ms. Smita S.Wakhare, Prof. Payal M.Ghutke
Publication venue: 'Auricle Technologies, Pvt., Ltd.'
Publication date: 30/09/2014
Field of study

The objective of this project is to design high performance arithmetic circuits which are faster and have lower power consumption using a new adiabatic logic family of CMOS and to analyze its performance for sequential circuits and effects upon cascading. It commencement on evaluation of a computational block before its evaluation phase begins, and quickly performs a final evaluation as soon as the inputs are valid. This adiabatic logic family is best suited to arithmetic circuits because the critical path is made of a long chain of cascaded inverting gates.In this paper we are going to design Wallace Tree Multiplier using full adder structure with adiabatic logic. As the major advantage of this logic which is higher speed and low power consumption is observed upon cascading, it is most suitable for arithmetic circuits

International Journal on Recent and Innovation Trends in Computing and Communication

A Survey on Approximate Multiplier Designs for Energy Efficiency: From Algorithms to Circuits

Author: Chen Chuangtao
Han Jie
Qian Weikang
Wang Xuan
Wen Chenyi
Wu Ying
Xiao Weihua
Yin Xunzhao
Zhuo Cheng
Publication venue
Publication date: 29/06/2023
Field of study

Given the stringent requirements of energy efficiency for Internet-of-Things edge devices, approximate multipliers, as a basic component of many processors and accelerators, have been constantly proposed and studied for decades, especially in error-resilient applications. The computation error and energy efficiency largely depend on how and where the approximation is introduced into a design. Thus, this article aims to provide a comprehensive review of the approximation techniques in multiplier designs ranging from algorithms and architectures to circuits. We have implemented representative approximate multiplier designs in each category to understand the impact of the design techniques on accuracy and efficiency. The designs can then be effectively deployed in high-level applications, such as machine learning, to gain energy efficiency at the cost of slight accuracy loss.Comment: 38 pages, 37 figure

arXiv.org e-Print Archive

Design for Power and Area Efficient Approximate Multipliers

Author: NADDUNOORO MALATHI
NARESH SANDHI
Publication venue: International Journal of Innovative Technology and Research
Publication date: 12/11/2019
Field of study

Multimedia and image processing applications, may tolerate errors in calculations but still generate meaningful and beneficial results. This work deals with a high speed approximate multiplier with TDM tree and carry prediction circuit. The modified multiplier utilizes an optimised TDM carry save tree which reduces the device utilization on FPGA as well as the combinational path delay and power consumption. The proposed design is analyzed using the simulation and implementation results on Xilinx Spartan 3E family

International Journal of Innovative Technology and Research (IJITR)

Energy-efficient design and implementation of approximate floating-point multiplier

Author: Paparouni Theodora
Παπαρούνη Θεοδώρα
Publication venue
Publication date: 28/01/2020
Field of study

DSpace at NTUA

Compressor based approximate multiplier architectures for media processing applications

Author: Ahmed Syed Ershad
Kumar Uppugunduru Anil
Publication venue: Institute of Advanced Engineering and Science
Publication date: 01/08/2021
Field of study

Approximate computing is an attractive technique to gain substantial improvement in the area, speed, and power in applications where exact computation is not required. This paper proposes two improved multiplier designs based on a new 4:2 approximate compressor circuit to simplify the hardware at the partial product reduction stage. The proposed multiplier designs are targeted towards error-tolerant applications. Exhaustive error and hardware analysis has been carried out on the existing and proposed multiplier designs. The results prove that the proposed approximate multiplier architecture performs better than the existing architectures without significant compromise on quality metrics. Experimental results show that die-area and power consumed are reduced upto 28%, and 25.29% respectively in comparison with the existing designs without significant compromise on accuracy

ZENODO

NEUROSURGERY ENTHUSIASTIC WOMEN SOCIETY

Institute of Advanced Engineering and Science

Energy-efficient embedded machine learning algorithms for smart sensing systems

Author: OSTA MARIO
Publication venue: Universit\ue0 degli studi di Genova
Publication date: 27/02/2020
Field of study

Embedded autonomous electronic systems are required in numerous application domains such as Internet of Things (IoT), wearable devices, and biomedical systems. Embedded electronic systems usually host sensors, and each sensor hosts multiple input channels (e.g., tactile, vision), tightly coupled to the electronic computing unit (ECU). The ECU extracts information by often employing sophisticated methods, e.g., Machine Learning. However, embedding Machine Learning algorithms poses essential challenges in terms of hardware resources and energy consumption because of: 1) the high amount of data to be processed; 2) computationally demanding methods. Leveraging on the trade-off between quality requirements versus computational complexity and time latency could reduce the system complexity without affecting the performance. The objectives of the thesis are to develop: 1) energy-efficient arithmetic circuits outperforming state of the art solutions for embedded machine learning algorithms, 2) an energy-efficient embedded electronic system for the \u201celectronic-skin\u201d (e-skin) application. As such, this thesis exploits two main approaches: Approximate Computing: In recent years, the approximate computing paradigm became a significant major field of research since it is able to enhance the energy efficiency and performance of digital systems. \u201cApproximate Computing\u201d(AC) turned out to be a practical approach to trade accuracy for better power, latency, and size . AC targets error-resilient applications and offers promising benefits by conserving some resources. Usually, approximate results are acceptable for many applications, e.g., tactile data processing,image processing , and data mining ; thus, it is highly recommended to take advantage of energy reduction with minimal variation in performance . In our work, we developed two approximate multipliers: 1) the first one is called \u201cMETA\u201d multiplier and is based on the Error Tolerant Adder (ETA), 2) the second one is called \u201cApproximate Baugh-Wooley(BW)\u201d multiplier where the approximations are implemented in the generation of the partial products. We showed that the proposed approximate arithmetic circuits could achieve a relevant reduction in power consumption and time delay around 80.4% and 24%, respectively, with respect to the exact BW multiplier. Next, to prove the feasibility of AC in real world applications, we explored the approximate multipliers on a case study as the e-skin application. The e-skin application is defined as multiple sensing components, including 1) structural materials, 2) signal processing, 3) data acquisition, and 4) data processing. Particularly, processing the originated data from the e-skin into low or high-level information is the main problem to be addressed by the embedded electronic system. Many studies have shown that Machine Learning is a promising approach in processing tactile data when classifying input touch modalities. In our work, we proposed a methodology for evaluating the behavior of the system when introducing approximate arithmetic circuits in the main stages (i.e., signal and data processing stages) of the system. Based on the proposed methodology, we first implemented the approximate multipliers on the low-pass Finite Impulse Response (FIR) filter in the signal processing stage of the application. We noticed that the FIR filter based on (Approx-BW) outperforms state of the art solutions, while respecting the tradeoff between accuracy and power consumption, with an SNR degradation of 1.39dB. Second, we implemented approximate adders and multipliers respectively into the Coordinate Rotational Digital Computer (CORDIC) and the Singular Value Decomposition (SVD) circuits; since CORDIC and SVD take a significant part of the computationally expensive Machine Learning algorithms employed in tactile data processing. We showed benefits of up to 21% and 19% in power reduction at the cost of less than 5% accuracy loss for CORDIC and SVD circuits when scaling the number of approximated bits. 2) Parallel Computing Platforms (PCP): Exploiting parallel architectures for near-threshold computing based on multi-core clusters is a promising approach to improve the performance of smart sensing systems. In our work, we exploited a novel computing platform embedding a Parallel Ultra Low Power processor (PULP), called \u201cMr. Wolf,\u201d for the implementation of Machine Learning (ML) algorithms for touch modalities classification. First, we tested the ML algorithms at the software level; for RGB images as a case study and tactile dataset, we achieved accuracy respectively equal to 97% and 83.5%. After validating the effectiveness of the ML algorithm at the software level, we performed the on-board classification of two touch modalities, demonstrating the promising use of Mr. Wolf for smart sensing systems. Moreover, we proposed a memory management strategy for storing the needed amount of trained tensors (i.e., 50 trained tensors for each class) in the on-chip memory. We evaluated the execution cycles for Mr. Wolf using a single core, 2 cores, and 3 cores, taking advantage of the benefits of the parallelization. We presented a comparison with the popular low power ARM Cortex-M4F microcontroller employed, usually for battery-operated devices. We showed that the ML algorithm on the proposed platform runs 3.7 times faster than ARM Cortex M4F (STM32F40), consuming only 28 mW. The proposed platform achieves 15 7 better energy efficiency than the classification done on the STM32F40, consuming 81mJ per classification and 150 pJ per operation

Archivio istituzionale della ricerca - Università di Genova

High-Performance Accurate and Approximate Multipliers for FPGA-Based Hardware Accelerators

Author: Kumar Akash
Rehman Semeen
Shafique Muhammad
Ullah Salim
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 07/02/2023
Field of study

Multiplication is one of the widely used arithmetic operations in a variety of applications, such as image/video processing and machine learning. FPGA vendors provide high-performance multipliers in the form of DSP blocks. These multipliers are not only limited in number and have fixed locations on FPGAs but can also create additional routing delays and may prove inefficient for smaller bit-width multiplications. Therefore, FPGA vendors additionally provide optimized soft IP cores for multiplication. However, in this work, we advocate that these soft multiplier IP cores for FPGAs still need better designs to provide high-performance and resource efficiency. Toward this, we present generic area-optimized, low-latency accurate, and approximate softcore multiplier architectures, which exploit the underlying architectural features of FPGAs, i.e., lookup table (LUT) structures and fast-carry chains to reduce the overall critical path delay (CPD) and resource utilization of multipliers. Compared to Xilinx multiplier LogiCORE IP, our proposed unsigned and signed accurate architecture provides up to 25% and 53% reduction in LUT utilization, respectively, for different sizes of multipliers. Moreover, with our unsigned approximate multiplier architectures, a reduction of up to 51% in the CPD can be achieved with an insignificant loss in output accuracy when compared with the LogiCORE IP. For illustration, we have deployed the proposed multiplier architecture in accelerators used in image and video applications, and evaluated them for area and performance gains. Our library of accurate and approximate multipliers is opensource and available online at https://cfaed.tu-dresden.de/pd-downloads to fuel further research and development in this area, facilitate reproducible research, and thereby enabling a new research direction for the FPGA community

Qucosa

HSSS - Hochschulschriftenserver der SLUB

Technische Universität Dresden: Qucosa

Two examples of approximate arithmetic to reduce hardware complexity and power consumption

Author: Altet Sanahujes Josep
Calomarde Palomino Antonio
Etxezarreta Imanol
Fontova Pau
Fornt Mas Jordi
Jin Leixin
Moll Echeto Francisco de Borja
Morancho Llena Enrique
Rubio Sola Jose Antonio
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2022
Field of study

© 2022 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes,creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.As the end of Moore's Law approaches, electronic system designers must find ways to keep up with the ever increasing computational demands of the modern era. Some computationally intensive applications, such as multimedia processing, computer vision and artificial intelligence, present a unique feature that makes them especially suitable for hardware-level optimizations: their inherent robustness to noise and errors. This allows circuit designers to relax the constraint that arithmetic operations, such as multiplications and additions, must be completely accurate. Instead, approximations can be used in the arithmetic units, enabling system-level reductions in hardware area and power consumption, as well as improvements in performance, while hardly affecting the output of the final application. In this work, we explore two approximate arithmetic techniques. First, we consider approximations at the circuit design level by implementing several approximate multiplier units and evaluating their accuracy when used in executing YOLOv3, a state-of-the-art camera-based object detection deep neural network. Second, we apply the technique of overscaling to induce approximations in adder circuits by aggressively undervoltaging and overclocking them, and we compare the behavior of exact and approximate adders under these conditions. We find that, on one hand, some approximate multipliers are able to execute the YOLO network with almost no effect on the results, and on the other, approximate adder circuits are much more resilient to overscaling techniques than exact adders.This work was partially supported by Spanish MCIN/AEI/10.13039/501100011033, Project PID2019-103869RB-C33.Peer ReviewedPostprint (author's final draft

UPCommons. Portal del coneixement obert de la UPC

Design and evaluation of approximate arithmetic units for machine learning

Author: Miguel José Nunes Almeida
Publication venue
Publication date: 11/10/2023
Field of study

Repositório Aberto da Universidade do Porto

Improving the Hardware Performance of Arithmetic Circuits using Approximate Computing

Author: Adams Lizzie M
Publication venue: 'University of Saskatchewan Library'
Publication date: 16/03/2021
Field of study

An application that can produce a useful result despite some level of computational error is said to be error resilient. Approximate computing can be applied to error resilient applications by intentionally introducing error to the computation in order to improve performance, and it has been shown that approximation is especially well-suited for application in arithmetic computing hardware. In this thesis, novel approximate arithmetic architectures are proposed for three different operations, namely multiplication, division, and the multiply accumulate (MAC) operation. For all designs, accuracy is evaluated in terms of mean relative error distance (MRED) and normalized mean error distance (NMED), while hardware performance is reported in terms of critical path delay, area, and power consumption. Three approximate Booth multipliers (ABM-M1, ABM-M2, ABM-M3) are designed in which two novel inexact partial product generators are used to reduce the dimensions of the partial product matrix. The proposed multipliers are compared to other state-of-the-art designs in terms of both accuracy and hardware performance, and are found to reduce power consumption by up to 56% when compared to the exact multiplier. The function of the multipliers is verified in several image processing applications. Two approximate restoring dividers (AXRD-M1, AXRD-M2) are proposed along with a novel inexact restoring divider cell. In the first divider, the conventional cells are replaced with the proposed inexact cells in several columns. The second divider computes only a subset of the trial subtractions, after which the divisor and partial remainder are rounded and encoded so that they may be used to estimate the remaining quotient bits. The proposed dividers are evaluated for accuracy and hardware performance alongside several benchmarking designs, and their function is verified using change detection and foreground extraction applications. An approximate MAC unit is presented in which the multiplication is implemented using a modified version of ABM-M3. The delay is reduced by using a fused architecture where the accumulator is summed as part of the multiplier compression. The accuracy and hardware savings of the MAC unit are measured against several works from the literature, and the design is utilized in a number of convolution operations

University of Saskatchewan Research Archive