Search CORE

250 research outputs found

Assessing Approximate Arithmetic Designs in the presence of Process Variations and Voltage Scaling

Author: Naseer Adnan Aquib
Publication venue: 'Information Bulletin on Variable Stars (IBVS)'
Publication date: 01/01/2015
Field of study

As environmental concerns and portability of electronic devices move to the forefront of priorities, innovative approaches which reduce processor energy consumption are sought. Approximate arithmetic units are one of the avenues whereby significant energy savings can be achieved. Approximation of fundamental arithmetic units is achieved by judiciously reducing the number of transistors in the circuit. A satisfactory tradeoff of energy vs. accuracy of the circuit can be determined by trial-and-error methods of each functional approximation. Although the accuracy of the output is compromised, it is only decreased to an acceptable extent that can still fulfill processing requirements. A number of scenarios are evaluated with approximate arithmetic units to thoroughly cross-check them with their accurate counterparts. Some of the attributes evaluated are energy consumption, delay and process variation. Additionally, novel methods to create such approximate units are developed. One such method developed uses a Genetic Algorithm (GA), which mimics the biologically-inspired evolutionary techniques to obtain an optimal solution. A GA employs genetic operators such as crossover and mutation to mix and match several different types of approximate adders to find the best possible combination of such units for a given input set. As the GA usually consumes a significant amount of time as the size of the input set increases, we tackled this problem by using various methods to parallelize the fitness computation process of the GA, which is the most compute intensive task. The parallelization improved the computation time from 2,250 seconds to 1,370 seconds for up to 8 threads, using both OpenMP and Intel TBB. Apart from using the GA with seeded multiple approximate units, other seeds such as basic logic gates with limited logic space were used to develop completely new multi-bit approximate adders with good fitness levels. iii The effect of process variation was also calculated. As the number of transistors is reduced, the distribution of the transistor widths and gate oxide may shift away from a Gaussian Curve. This result was demonstrated in different types of single-bit adders with the delay sigma increasing from 6psec to 12psec, and when the voltage is scaled to Near-Threshold-Voltage (NTV) levels sigma increases by up to 5psec. Approximate Arithmetic Units were not affected greatly by the change in distribution of the thickness of the gate oxide. Even when considering the 3-sigma value, the delay of an approximate adder remains below that of a precise adder with additional transistors. Additionally, it is demonstrated that the GA obtains innovative solutions to the appropriate combination of approximate arithmetic units, to achieve a good balance between energy savings and accuracy

University of Central Florida (UCF): STARS (Showcase of Text, Archives, Research & Scholarship)

Method and system for fault accommodation of machines

Author: Frederick Dean Kimball
Goebel Kai Frank
Rausch Randal Thomas
Subbu Rajesh Venkat
Publication venue
Publication date: 08/03/2011
Field of study

A method for multi-objective fault accommodation using predictive modeling is disclosed. The method includes using a simulated machine that simulates a faulted actual machine, and using a simulated controller that simulates an actual controller. A multi-objective optimization process is performed, based on specified control settings for the simulated controller and specified operational scenarios for the simulated machine controlled by the simulated controller, to generate a Pareto frontier-based solution space relating performance of the simulated machine to settings of the simulated controller, including adjustment to the operational scenarios to represent a fault condition of the simulated machine. Control settings of the actual controller are adjusted, represented by the simulated controller, for controlling the actual machine, represented by the simulated machine, in response to a fault condition of the actual machine, based on the Pareto frontier-based solution space, to maximize desirable operational conditions and minimize undesirable operational conditions while operating the actual machine in a region of the solution space defined by the Pareto frontier

NASA Technical Reports Server

Linear-Phase FIR Digital Filter ‎Design with Reduced Hardware Complexity using Discrete Differential Evolution

Author: Rehan Muhammed Kunwar
Publication venue: 'University of Windsor Leddy Library'
Publication date: 01/01/2016
Field of study

Optimal design of xed coe cient nite word length linear phase FIR digital lters for custom ICs has been the focus of research in the past decade. With the ever increasing demands for high throughput and low power circuits, the need to design lters with reduced hardware complexity has become more crucial. Multiplierless lters provide substantial saving in hardware by using a shift add network to generate the lter coe cients. In this thesis, the multiplierless lter design problem is modeled as combinatorial optimization problem and is solved using a discrete Di erential Evolution algorithm. The Di erential Evolution algorithm\u27s population representation adapted for the nite word length lter design problem is developed and the mutation operator is rede ned for discrete valued parameters. Experiments show that the method is able to design lters up to a length of 300 taps with reduced hardware and shorter design times

Scholarship at UWindsor

Digital Filters

Author
Publication venue: 'IntechOpen'
Publication date: 20/04/2021
Field of study

The new technology advances provide that a great number of system signals can be easily measured with a low cost. The main problem is that usually only a fraction of the signal is useful for different purposes, for example maintenance, DVD-recorders, computers, electric/electronic circuits, econometric, optimization, etc. Digital filters are the most versatile, practical and effective methods for extracting the information necessary from the signal. They can be dynamic, so they can be automatically or manually adjusted to the external and internal conditions. Presented in this book are the most advanced digital filters including different case studies and the most relevant literature

Directory of Open Access Books (DOAB)

Automated Design of Approximate Accelerators

Author: Castro-Godínez Jorge
Publication venue: KIT-Bibliothek, Karlsruhe
Publication date: 04/01/2021
Field of study

In den letzten zehn Jahren hat das Bedürfnis nach Recheneffizienz die Entwicklung neuer Geräte, Architekturen und Entwurfstechniken motiviert. Approximate Computing hat sich als modernes, energieeffizientes Entwurfsparadigma für Anwendungen herausgestellt, die eine inhärente Fehlertoleranz aufweisen. Wenn die Genauigkeit der Ergebnisse in aktuellen Anwendungen wie Bildverarbeitung, Computer Vision und maschinellem Lernen auf ein akzeptables Maß reduziert wird, können Einsparungen im Schaltungsbereich, bei der Schaltkreisverzögerung und beim Stromverbrauch erzielt werden. Mit dem Aufkommen dieses Approximate Computing Paradigmas wurden in der Literatur viele approximierte Funktionseinheiten angegeben, insbesondere approximierte Addierer und Multiplizierer. Für eine Vielzahl solcher approximierter Schaltkreise und unter Berücksichtigung ihrer Verwendung als Bausteine für den Entwurf von approximierten Beschleunigern für fehlertolerante Anwendungen, ergibt sich eine Herausforderung: die Auswahl dieser approximierten Schaltkreise für eine bestimmte Anwendung, die die erforderlichen Ressourcen minimieren und gleichzeitig eine definierte Genauigkeit erfüllen. Diese Dissertation schlägt automatisierte Methoden zum Entwerfen und Implementieren von approximierten Beschleunigern vor, die aus approximierten arithmetischen Schaltungen aufgebaut sind. Um dies zu erreichen, befasst sich diese Dissertation mit folgenden Herausforderungen und liefert die nachfolgenden neuartigen Beiträge: In der Literatur wurden viele approximierte Addierer und Multiplizierer vorgestellt, indem entweder approximierte Entwürfe aus genauen Implementierungen wie dem Ripple-Carry-Addierer vorgeschlagen oder durch Approximate Logic Synthesis (ALS) Methoden generiert wurden. Ein repräsentativer Satz dieser approximierten Komponenten ist erforderlich, um approximierte Beschleuniger zu bauen. In diesem Sinne präsentiert diese Dissertation zwei Ansätze, um solche approximierte arithmetische Schaltungen zu erstellen. Zunächst wird AUGER vorgestellt, ein Tool, mit dem Register-Transfer Level (RTL) Beschreibungen für einen breiten Satz von approximierten Addierern und Multiplizierer für unterschiedliche Datenbitbreiten- und Genauigkeitskonfigurationen generiert werden können. Mit AUGER kann eine Design Space Exploration (DSE) von approximierten Komponenten durchgeführt werden, um diejenigen zu finden, die für eine gegebene Bitbreite, einen gegebenen Approximationsbereich und eine gegebene Schaltungsmetrik Pareto-optimal sind. Anschließend wird AxLS vorgestellt, ein Framework für ALS, das die Implementierung modernster Methoden und den Vorschlag neuartiger Methoden ermöglicht, um strukturelle Netzlistentransformationen durchzuführen und approximierte arithmetische Schaltungen aus genauen Schaltungen zu generieren. Darüber hinaus bieten beide Werkzeuge eine Fehlercharakterisierung in Form einer Fehlerverteilung und Schaltungseigenschaften (Fläche, Schaltkreisverzögerung und Leistung) für jede von ihnen erzeugte approximierte Schaltung. Diese Informationen sind für das Untersuchungsziel dieser Dissertation von wesentlicher Bedeutung. Trotz der Fehlertoleranz müssen approximierte Beschleuniger so ausgelegt sein, dass sie Genauigkeitsvorgaben erfüllen. Für den Entwurf solcher Beschleuniger unter Verwendung von approximierten arithmetischen Schaltungen ist es daher unerlässlich zu bewerten, wie sich die durch approximierte Schaltungen verursachten Fehler durch andere Berechnungen ausbreiten, entweder genau oder ungenau, und sich schließlich am Ausgang ansammeln. Diese Dissertation schlägt analytische Modelle vor, um die Fehlerpropagation durch genaue und approximierte Berechnungen zu beschreiben. Mit ihnen wird eine automatisierte, compilerbasierte Methodik vorgeschlagen, um die Fehlerpropagation auf approximierten Beschleunigerdesigns abzuschätzen. Diese Methode ist in ein Tool, CEDA, integriert, um schnelle, simulationsfreie Genauigkeitsschätzungen von approximierten Beschleunigermodellen durchzuführen, die unter Verwendung von C-Code beschrieben wurden. Beim Entwurf von approximierten Beschleunigern benötigen sich wiederholende Simulationen auf Gate-Level und die Schaltungssynthese viel Zeit, um viele oder sogar alle möglichen Kombinationen für einen gegebenen Satz von approximierten arithmetischen Schaltungen zu untersuchen. Andererseits basieren aktuelle Trends beim Entwerfen von Beschleunigern auf High-Level Synthesis (HLS) Werkzeugen. In dieser Dissertation werden analytische Modelle zur Schätzung der erforderlichen Rechenressourcen vorgestellt, wenn approximierte Addierer und Multiplizierer in Konstruktionen von approximierten Beschleunigern verwendet werden. Darüber hinaus werden diese Modelle zusammen mit den vorgeschlagenen analytischen Modellen zur Genauigkeitsschätzung in eine DSE-Methodik für fehlertolerante Anwendungen, DSEwam, integriert, um Pareto-optimale oder nahezu Pareto-optimale Lösungen für approximierte Beschleuniger zu identifizieren. DSEwam ist in ein HLS-Tool integriert, um automatisch RTL-Beschreibungen von approximierten Beschleunigern aus C-Sprachbeschreibungen für eine bestimmte Fehlerschwelle und ein bestimmtes Minimierungsziel zu generieren. Die Verwendung von approximierten Beschleunigern muss sicherstellen, dass Fehler, die aufgrund von approximierten Berechnungen erzeugt werden, innerhalb eines definierten Maximalwerts für eine gegebene Genauigkeitsmetrik bleiben. Die Fehler, die durch approximierte Beschleuniger erzeugt werden, hängen jedoch von den Eingabedaten ab, die hinsichtlich der für das Design verwendeten Daten unterschiedlich sein können. In dieser Dissertation wird ECAx vorgestellt, eine automatisierte Methode zur Untersuchung und Anwendung feinkörniger Fehlerkorrekturen mit geringem Overhead in approximierten Beschleunigern, um die Kosten für die Fehlerkorrektur auf Softwareebene (wie es in der Literatur gemacht wird) zu senken. Dies erfolgt durch selektive Korrektur der signifikantesten Fehler (in Bezug auf ihre Größenordnung), die von approximierten Komponenten erzeugt werden, ohne die Vorteile der Approximationen zu verlieren. Die experimentelle Auswertung zeigt Beschleunigungsverbesserungen für die Anwendung im Austausch für einen leicht gestiegenen Flächen- und Leistungsverbrauch im approximierten Beschleunigerdesign

KITopen

NEUROSURGERY ENTHUSIASTIC WOMEN SOCIETY

Approximation Opportunities in Edge Computing Hardware : A Systematic Literature Review

Author: Damsgaard Hans Jakob
Nurmi Jari
Ometov Aleksandr
Publication venue
Publication date: 29/11/2022
Field of study

With the increasing popularity of the Internet of Things and massive Machine Type Communication technologies, the number of connected devices is rising. However, while enabling valuable effects to our lives, bandwidth and latency constraints challenge Cloud processing of their associated data amounts. A promising solution to these challenges is the combination of Edge and approximate computing techniques that allows for data processing nearer to the user. This paper aims to survey the potential benefits of these paradigms’ intersection. We provide a state-of-the-art review of circuit-level and architecture-level hardware techniques and popular applications. We also outline essential future research directions.publishedVersionPeer reviewe

ZENODO

NEUROSURGERY ENTHUSIASTIC WOMEN SOCIETY

Trepo - Institutional Repository of Tampere University

Using reconfigurable computing technology to accelerate matrix decomposition and applications

Author: Wang Xinying
Publication venue: Iowa State University Digital Repository
Publication date: 01/01/2016
Field of study

Matrix decomposition plays an increasingly significant role in many scientific and engineering applications. Among numerous techniques, Singular Value Decomposition (SVD) and Eigenvalue Decomposition (EVD) are widely used as factorization tools to perform Principal Component Analysis for dimensionality reduction and pattern recognition in image processing, text mining and wireless communications, while QR Decomposition (QRD) and sparse LU Decomposition (LUD) are employed to solve the dense or sparse linear system of equations in bioinformatics, power system and computer vision. Matrix decompositions are computationally expensive and their sequential implementations often fail to meet the requirements of many time-sensitive applications. The emergence of reconfigurable computing has provided a flexible and low-cost opportunity to pursue high-performance parallel designs, and the use of FPGAs has shown promise in accelerating this class of computation. In this research, we have proposed and implemented several highly parallel FPGA-based architectures to accelerate matrix decompositions and their applications in data mining and signal processing. Specifically, in this dissertation we describe the following contributions: • We propose an efficient FPGA-based double-precision floating-point architecture for EVD, which can efficiently analyze large-scale matrices. • We implement a floating-point Hestenes-Jacobi architecture for SVD, which is capable of analyzing arbitrary sized matrices. • We introduce a novel deeply pipelined reconfigurable architecture for QRD, which can be dynamically configured to perform either Householder transformation or Givens rotation in a manner that takes advantage of the strengths of each. • We design a configurable architecture for sparse LUD that supports both symmetric and asymmetric sparse matrices with arbitrary sparsity patterns. • By further extending the proposed hardware solution for SVD, we parallelize a popular text mining tool-Latent Semantic Indexing with an FPGA-based architecture. • We present a configurable architecture to accelerate Homotopy l1-minimization, in which the modification of the proposed FPGA architecture for sparse LUD is used at its core to parallelize both Cholesky decomposition and rank-1 update. Our experimental results using an FPGA-based acceleration system indicate the efficiency of our proposed novel architectures, with application and dimension-dependent speedups over an optimized software implementation that range from 1.5ÃÂ to 43.6ÃÂ in terms of computation time

Digital Repository @ Iowa State University (ISU)

High-speed fir filter design and optimization using artificial intelligence techniques

Author: CEN LING
Publication venue
Publication date: 20/11/2006
Field of study

Ph.DDOCTOR OF PHILOSOPH

ScholarBank@NUS