Search CORE

483 research outputs found

VLSI Implementation of Deep Neural Network Using Integral Stochastic Computing

Author: Ardakani Arash
Gross Warren J.
Hanyu Takahiro
Leduc-Primeau François
Onizawa Naoya
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2016
Field of study

The hardware implementation of deep neural networks (DNNs) has recently received tremendous attention: many applications in fact require high-speed operations that suit a hardware implementation. However, numerous elements and complex interconnections are usually required, leading to a large area occupation and copious power consumption. Stochastic computing has shown promising results for low-power area-efficient hardware implementations, even though existing stochastic algorithms require long streams that cause long latencies. In this paper, we propose an integer form of stochastic computation and introduce some elementary circuits. We then propose an efficient implementation of a DNN based on integral stochastic computing. The proposed architecture has been implemented on a Virtex7 FPGA, resulting in 45% and 62% average reductions in area and latency compared to the best reported architecture in literature. We also synthesize the circuits in a 65 nm CMOS technology and we show that the proposed integral stochastic architecture results in up to 21% reduction in energy consumption compared to the binary radix implementation at the same misclassification rate. Due to fault-tolerant nature of stochastic architectures, we also consider a quasi-synchronous implementation which yields 33% reduction in energy consumption w.r.t. the binary radix implementation without any compromise on performance.Comment: 11 pages, 12 figure

arXiv.org e-Print Archive

HAL-Université de Bretagne Occidentale

PolyPublie

Automated Design of Approximate Accelerators

Author: Castro-Godínez Jorge
Publication venue: KIT-Bibliothek, Karlsruhe
Publication date: 04/01/2021
Field of study

In den letzten zehn Jahren hat das Bedürfnis nach Recheneffizienz die Entwicklung neuer Geräte, Architekturen und Entwurfstechniken motiviert. Approximate Computing hat sich als modernes, energieeffizientes Entwurfsparadigma für Anwendungen herausgestellt, die eine inhärente Fehlertoleranz aufweisen. Wenn die Genauigkeit der Ergebnisse in aktuellen Anwendungen wie Bildverarbeitung, Computer Vision und maschinellem Lernen auf ein akzeptables Maß reduziert wird, können Einsparungen im Schaltungsbereich, bei der Schaltkreisverzögerung und beim Stromverbrauch erzielt werden. Mit dem Aufkommen dieses Approximate Computing Paradigmas wurden in der Literatur viele approximierte Funktionseinheiten angegeben, insbesondere approximierte Addierer und Multiplizierer. Für eine Vielzahl solcher approximierter Schaltkreise und unter Berücksichtigung ihrer Verwendung als Bausteine für den Entwurf von approximierten Beschleunigern für fehlertolerante Anwendungen, ergibt sich eine Herausforderung: die Auswahl dieser approximierten Schaltkreise für eine bestimmte Anwendung, die die erforderlichen Ressourcen minimieren und gleichzeitig eine definierte Genauigkeit erfüllen. Diese Dissertation schlägt automatisierte Methoden zum Entwerfen und Implementieren von approximierten Beschleunigern vor, die aus approximierten arithmetischen Schaltungen aufgebaut sind. Um dies zu erreichen, befasst sich diese Dissertation mit folgenden Herausforderungen und liefert die nachfolgenden neuartigen Beiträge: In der Literatur wurden viele approximierte Addierer und Multiplizierer vorgestellt, indem entweder approximierte Entwürfe aus genauen Implementierungen wie dem Ripple-Carry-Addierer vorgeschlagen oder durch Approximate Logic Synthesis (ALS) Methoden generiert wurden. Ein repräsentativer Satz dieser approximierten Komponenten ist erforderlich, um approximierte Beschleuniger zu bauen. In diesem Sinne präsentiert diese Dissertation zwei Ansätze, um solche approximierte arithmetische Schaltungen zu erstellen. Zunächst wird AUGER vorgestellt, ein Tool, mit dem Register-Transfer Level (RTL) Beschreibungen für einen breiten Satz von approximierten Addierern und Multiplizierer für unterschiedliche Datenbitbreiten- und Genauigkeitskonfigurationen generiert werden können. Mit AUGER kann eine Design Space Exploration (DSE) von approximierten Komponenten durchgeführt werden, um diejenigen zu finden, die für eine gegebene Bitbreite, einen gegebenen Approximationsbereich und eine gegebene Schaltungsmetrik Pareto-optimal sind. Anschließend wird AxLS vorgestellt, ein Framework für ALS, das die Implementierung modernster Methoden und den Vorschlag neuartiger Methoden ermöglicht, um strukturelle Netzlistentransformationen durchzuführen und approximierte arithmetische Schaltungen aus genauen Schaltungen zu generieren. Darüber hinaus bieten beide Werkzeuge eine Fehlercharakterisierung in Form einer Fehlerverteilung und Schaltungseigenschaften (Fläche, Schaltkreisverzögerung und Leistung) für jede von ihnen erzeugte approximierte Schaltung. Diese Informationen sind für das Untersuchungsziel dieser Dissertation von wesentlicher Bedeutung. Trotz der Fehlertoleranz müssen approximierte Beschleuniger so ausgelegt sein, dass sie Genauigkeitsvorgaben erfüllen. Für den Entwurf solcher Beschleuniger unter Verwendung von approximierten arithmetischen Schaltungen ist es daher unerlässlich zu bewerten, wie sich die durch approximierte Schaltungen verursachten Fehler durch andere Berechnungen ausbreiten, entweder genau oder ungenau, und sich schließlich am Ausgang ansammeln. Diese Dissertation schlägt analytische Modelle vor, um die Fehlerpropagation durch genaue und approximierte Berechnungen zu beschreiben. Mit ihnen wird eine automatisierte, compilerbasierte Methodik vorgeschlagen, um die Fehlerpropagation auf approximierten Beschleunigerdesigns abzuschätzen. Diese Methode ist in ein Tool, CEDA, integriert, um schnelle, simulationsfreie Genauigkeitsschätzungen von approximierten Beschleunigermodellen durchzuführen, die unter Verwendung von C-Code beschrieben wurden. Beim Entwurf von approximierten Beschleunigern benötigen sich wiederholende Simulationen auf Gate-Level und die Schaltungssynthese viel Zeit, um viele oder sogar alle möglichen Kombinationen für einen gegebenen Satz von approximierten arithmetischen Schaltungen zu untersuchen. Andererseits basieren aktuelle Trends beim Entwerfen von Beschleunigern auf High-Level Synthesis (HLS) Werkzeugen. In dieser Dissertation werden analytische Modelle zur Schätzung der erforderlichen Rechenressourcen vorgestellt, wenn approximierte Addierer und Multiplizierer in Konstruktionen von approximierten Beschleunigern verwendet werden. Darüber hinaus werden diese Modelle zusammen mit den vorgeschlagenen analytischen Modellen zur Genauigkeitsschätzung in eine DSE-Methodik für fehlertolerante Anwendungen, DSEwam, integriert, um Pareto-optimale oder nahezu Pareto-optimale Lösungen für approximierte Beschleuniger zu identifizieren. DSEwam ist in ein HLS-Tool integriert, um automatisch RTL-Beschreibungen von approximierten Beschleunigern aus C-Sprachbeschreibungen für eine bestimmte Fehlerschwelle und ein bestimmtes Minimierungsziel zu generieren. Die Verwendung von approximierten Beschleunigern muss sicherstellen, dass Fehler, die aufgrund von approximierten Berechnungen erzeugt werden, innerhalb eines definierten Maximalwerts für eine gegebene Genauigkeitsmetrik bleiben. Die Fehler, die durch approximierte Beschleuniger erzeugt werden, hängen jedoch von den Eingabedaten ab, die hinsichtlich der für das Design verwendeten Daten unterschiedlich sein können. In dieser Dissertation wird ECAx vorgestellt, eine automatisierte Methode zur Untersuchung und Anwendung feinkörniger Fehlerkorrekturen mit geringem Overhead in approximierten Beschleunigern, um die Kosten für die Fehlerkorrektur auf Softwareebene (wie es in der Literatur gemacht wird) zu senken. Dies erfolgt durch selektive Korrektur der signifikantesten Fehler (in Bezug auf ihre Größenordnung), die von approximierten Komponenten erzeugt werden, ohne die Vorteile der Approximationen zu verlieren. Die experimentelle Auswertung zeigt Beschleunigungsverbesserungen für die Anwendung im Austausch für einen leicht gestiegenen Flächen- und Leistungsverbrauch im approximierten Beschleunigerdesign

KITopen

NEUROSURGERY ENTHUSIASTIC WOMEN SOCIETY

A Genetic-algorithm-based Approach to the Design of DCT Hardware Accelerators

Author: Barbareschi M.
Barone S.
Bosio A.
Han J.
Traiola M.
Publication venue
Publication date: 01/01/2022
Field of study

As modern applications demand an unprecedented level of computational resources, traditional computing system design paradigms are no longer adequate to guarantee significant performance enhancement at an affordable cost. Approximate Computing (AxC) has been introduced as a potential candidate to achieve better computational performances by relaxing non-critical functional system specifications. In this article, we propose a systematic and high-abstraction-level approach allowing the automatic generation of near Pareto-optimal approximate configurations for a Discrete Cosine Transform (DCT) hardware accelerator. We obtain the approximate variants by using approximate operations, having configurable approximation degree, rather than full-precise ones. We use a genetic searching algorithm to find the appropriate tuning of the approximation degree, leading to optimal tradeoffs between accuracy and gains. Finally, to evaluate the actual HW gains, we synthesize non-dominated approximate DCT variants for two different target technologies, namely, Field Programmable Gate Arrays (FPGAs) and Application Specific Integrated Circuits (ASICs). Experimental results show that the proposed approach allows performing a meaningful exploration of the design space to find the best tradeoffs in a reasonable time. Indeed, compared to the state-of-the-art work on approximate DCT, the proposed approach allows an 18% average energy improvement while providing at the same time image quality improvement

Archivio della ricerca - Università degli studi di Napoli Federico II

A FRAMEWORK FOR OPTIMAL DESIGN OF LOW-POWER FIR FILTERS

Author: Banerjee Aparajita
Publication venue: 'Purdue University (bepress)'
Publication date: 01/01/2015
Field of study

Approximate Computing has emerged as a new low-power design approach for application domains characterized by intrinsic error resilience. Digital Signal Processing (DSP) is one such domain where outputs of acceptable quality can be produced even though the internal computations are carried out in an approximate manner. With the ever increasing need for data rates at lower power usage; the need for improved complexity reduction schemes for DSP systems continues. One of the most widely performed steps in DSP is FIR filtering. FIR filters are preferred due to their linea

Purdue E-Pubs

Calcul approximatif à haute efficacité énergétique pour des applications de l'internet des objets

Author: Ndour Geneviève
Publication venue: HAL CCSD
Publication date: 15/07/2019
Field of study

Reduced width units are ones of the power reduction methods. However such units have been mostly evaluated separately, i.e. not evaluated in a complete applications. In this thesis, we extend the RISC-V processor with reduced width computation and memory units, in which only a number of most significant bits (MSBs), configurable at runtime is active. The energy reduction vs quality of output trade-offs of applications executed with the extended RISC-V are studied. The results indicate that the energy can be reduced by up to 14% for an error ≤ 0.1%. Moreover we propose a generic energy model that includes both software parameters and hardware architecture ones. It allows software and hardware designers to have an early insight into the effects of optimizations on software and/or units.Les unités à taille réduite font partie des méthodes proposées pour la réduction de la consommation d’énergie. Cependant, la plupart de ces unités sont évaluées séparément,c’est-à-dire elles ne sont pas évaluées dans une application complète. Dans cette thèse, des unités à taille réduite pour le calcul et pour l’accès à la mémoire de données, configurables au moment de l’exécution, sont intégrées dans un processeur RISC-V. La réduction d’énergie et la qualité de sortie des applications exécutées sur le processeur RISC-V étendu avec ces unités, sont évaluées. Les résultats indiquent que la consommation d’énergie peut être réduite jusqu’à 14% pour une erreur ≤0.1%. De plus, nous avons proposé un modèle d’énergie générique qui inclut à la fois des paramètres logiciels et architecturaux. Le modèle permet aux concepteurs logiciels et matériels d’avoir un aperçu rapide sur l’impact des optimisations effectuées sur le code source et/ou sur les unités de calcul

Repository of the University of Rijeka

Croatian Digital Thesis Repository

Repository of the University of Rijeka, Faculty of Tourism and Hospitality Management, Opatija

Evolutionary design of digital VLSI hardware

Author: Thomson Robert
Publication venue: The University of Edinburgh
Publication date: 01/01/2005
Field of study

Edinburgh Research Archive

Investigation of reconfigurable-accuracy approximate adder designs for image processing applications

Author: Al-Ma'aitah Khaled Suleiman
Publication venue: Newcastle University
Publication date: 01/01/2019
Field of study

Ph. D. Thesis.In the last decades, integrated circuits with CMOS technology show progressive scaling challenges of both increased power density and power dissipation. Meanwhile, high-performance requirements of current and future application operations show rapid demands of computing resources like power. This design conflict has pushed much effort to search for high performance and energy efficient design approach, such as approximate computing. Approximate computing exploits the error resilience of compute- intensive applications such as image processing applications to implement approximation design techniques with different levels of abstractions and scalability. The basic principle is to relax the strict accuracy requirements in favour of a lower design complexity, thereby achieving more computational performance (i.e., speed) and energy saving. The adder arithmetic unit is considered one of the essential computational blocks in most of the applications. As such, much effort has explored new designs of an efficient approximate adder design. This thesis presents an investigation into design enhancement, novel approximate adder designs and implementation approaches. The first approach introduces a modification to the error detection technique of a popular configurable-accuracy approximate adder design. The proposed lightweight error detection technique reduces the required gates of the error detection circuit, thus, mitigating the design area overhead. Furthermore, at the error correction process of the adder, we have proposed an extensive error detection while activating more than one correction stage concurrently. As a result, this ensures achieving an optimum accuracy of outputs for the worst case of quality requirements. In general, approximate (speculative) adder designs use the seg- mentation technique to divide the adder into multiple short length sub-adders which operate in parallel. Hence, this would limit the long chains of carry propagation and result in a better performance operations. However, the use of overlapped parts of sub-adders regarding a better carry speculation and then more accuracy be- comes a significant challenge of a large design area overhead. The second approach continues mitigating this challenge by present- ing a novel and simpler adder dividing technique to a number of sub-adders. The new method uses what is known as the carry-kill signal for both limiting the carry propagation and applying adder segmentation. Further, between every two adjacent sub-adders, one AND gate and one XOR gate are used for carry speculation and error (i.e., carry propagation) detection respectively. Thus, a significant reduction of the design overhead has been achieved, yet, with acceptable levels of output results accuracy. In the third final approach, simple logic OR gates are used to build the approximate adder while compensating the conventional full adders operation. The resulted approximate adder design presents very low complex- ity, high speed, and low power consumption. Furthermore, instead of augmenting error recovery circuit, short bit-length exact adders are used as correction stages to control the general level of output quality (i.e., without error detection overhead). At the final correc- tion stage, the proposed design would operate the same as an exact adder. To validate the efficiency of these approaches, a number of adders with different bit-widths are designed and synthesized showing considerable reductions in the critical delay, silicon area and more savings in energy consumption, compared to other existing ap- proaches. In addition to acceptable levels or output errors, which are extensively analysed for each proposed design. In this study, the proposed configurable adder designs exhibit energy/quality trade-offs at a different number of correction stages. These trade-offs can be effectively exploited to implement adders in applications, where energy can be gracefully minimised within the envelope of quality requirements. As such, designs implemen- tation in an image processing application known as Gaussian blur filter was introduced, demonstrating the loss in the image quality at each error correction stage. The output images showed promis- ing results to use the proposed designs for more energy-efficient applications, where output quality requirements can be relaxed.Mutah Universit

Newcastle University eTheses

Practical Techniques for Improving Performance and Evaluating Security on Circuit Designs

Author: Xu Wenbin
Publication venue
Publication date: 20/11/2019
Field of study

As the modern semiconductor technology approaches to nanometer era, integrated circuits (ICs) are facing more and more challenges in meeting performance demand and security. With the expansion of markets in mobile and consumer electronics, the increasing demands require much faster delivery of reliable and secure IC products. In order to improve the performance and evaluate the security of emerging circuits, we present three practical techniques on approximate computing, split manufacturing and analog layout automation. Approximate computing is a promising approach for low-power IC design. Although a few accuracy-configurable adder (ACA) designs have been developed in the past, these designs tend to incur large area overheads as they rely on either redundant computing or complicated carry prediction. We investigate a simple ACA design that contains no redundancy or error detection/correction circuitry and uses very simple carry prediction. The simulation results show that our design dominates the latest previous work on accuracy-delay-power tradeoff while using 39% less area. One variant of this design provides finer-grained and larger tunability than that of the previous works. Moreover, we propose a delay-adaptive self-configuration technique to further improve the accuracy-delay-power tradeoff. Split manufacturing prevents attacks from an untrusted foundry. The untrusted foundry has front-end-of-line (FEOL) layout and the original circuit netlist and attempts to identify critical components on the layout for Trojan insertion. Although defense methods for this scenario have been developed, the corresponding attack technique is not well explored. Hence, the defense methods are mostly evaluated with the k-security metric without actual attacks. We develop a new attack technique based on structural pattern matching. Experimental comparison with existing attack shows that the new attack technique achieves about the same success rate with much faster speed for cases without the k-security defense, and has a much better success rate at the same runtime for cases with the k-security defense. The results offer an alternative and practical interpretation for k-security in split manufacturing. Analog layout automation is still far behind its digital counterpart. We develop the layout automation framework for analog/mixed-signal ICs. A hierarchical layout synthesis flow which works in bottom-up manner is presented. To ensure the qualified layouts for better circuit performance, we use the constraint-driven placement and routing methodology which employs the expert knowledge via design constraints. The constraint-driven placement uses simulated annealing process to find the optimal solution. The packing represented by sequence pairs and constraint graphs can simultaneously handle different kinds of placement constraints. The constraint-driven routing consists of two stages, integer linear programming (ILP) based global routing and sequential detailed routing. The experiment results demonstrate that our flow can handle complicated hierarchical designs with multiple design constraints. Furthermore, the placement performance can be further improved by using mixed-size block placement which works on large blocks in priority

Highly Automated Formal Verification of Arithmetic Circuits

Author: Sayed-Ahmed Amr
Publication venue
Publication date: 01/01/2017
Field of study

This dissertation investigates the problems of two distinctive formal verification techniques for verifying large scale multiplier circuits and proposes two approaches to overcome some of these problems. The first technique is equivalence checking based on recurrence relations, while the second one is the symbolic computation technique which is based on the theory of Gröbner bases. This investigation demonstrates that approaches based on symbolic computation have better scalability and more robustness than state-of-the-art equivalence checking techniques for verification of arithmetic circuits. According to this conclusion, the thesis leverages the symbolic computation technique to verify floating-point designs. It proposes a new algebraic equivalence checking, in contrast to classical combinational equivalence checking, the proposed technique is capable of checking the equivalence of two circuits which have different architectures of arithmetic units as well as control logic parts, e.g., floating-point multipliers

E-LIB Dokumentserver - Staats und Universitätsbibliothek Bremen