Search CORE

64 research outputs found

Approximate Computing Survey, Part I: Terminology and Software & Hardware Approximation Techniques

Author: Armeniakos Giorgos
Hanif Muhammad Abdullah
Jiao Xun
Leon Vasileios
Pekmestzi Kiamal
Shafique Muhammad
Soudris Dimitrios
Publication venue
Publication date: 20/07/2023
Field of study

The rapid growth of demanding applications in domains applying multimedia processing and machine learning has marked a new era for edge and cloud computing. These applications involve massive data and compute-intensive tasks, and thus, typical computing paradigms in embedded systems and data centers are stressed to meet the worldwide demand for high performance. Concurrently, the landscape of the semiconductor field in the last 15 years has constituted power as a first-class design concern. As a result, the community of computing systems is forced to find alternative design approaches to facilitate high-performance and/or power-efficient computing. Among the examined solutions, Approximate Computing has attracted an ever-increasing interest, with research works applying approximations across the entire traditional computing stack, i.e., at software, hardware, and architectural levels. Over the last decade, there is a plethora of approximation techniques in software (programs, frameworks, compilers, runtimes, languages), hardware (circuits, accelerators), and architectures (processors, memories). The current article is Part I of our comprehensive survey on Approximate Computing, and it reviews its motivation, terminology and principles, as well it classifies and presents the technical details of the state-of-the-art software and hardware approximation techniques.Comment: Under Review at ACM Computing Survey

arXiv.org e-Print Archive

Parallélisme des nids de boucles pour l’optimisation du temps d’exécution et de la taille du code

Author: Elloumi Yaroub
Publication venue: HAL CCSD
Publication date: 16/12/2013
Field of study

The real time implementation algorithms always include nested loops which require important execution times. Thus, several nested loop parallelism techniques have been proposed with the aim of decreasing their execution times. These techniques can be classified in terms of granularity, which are the iteration level parallelism and the instruction level parallelism. In the case of the instruction level parallelism, the techniques aim to achieve a full parallelism. However, the loop carried dependencies implies shifting instructions in both side of nested loops. Consequently, these techniques provide implementations with non-optimal execution times and important code sizes, which represent limiting factors when implemented on embedded real-time systems. In this work, we are interested on enhancing the parallelism strategies of nested loops. The first contribution consists of purposing a novel instruction level parallelism technique, called “delayed multidimensional retiming”. It aims to scheduling the nested loops with the minimal cycle period, without achieving a full parallelism. The second contribution consists of employing the “delayed multidimensional retiming” when providing nested loop implementations on real time embedded systems. The aim is to respect an execution time constraint while using minimal code size. In this context, we proposed a first approach that selects the minimal instruction parallelism level allowing the execution time constraint respect. The second approach employs both instruction level parallelism and iteration level parallelism, by using the “delayed multidimensional retiming” and the “loop striping”Les algorithmes des systèmes temps réels incluent de plus en plus de nids de boucles, qui sont caractérisés par un temps d’exécution important. De ce fait, plusieurs démarches de parallélisme des boucles imbriquées ont été proposées dans l’objectif de réduire leurs temps d’exécution. Ces démarches peuvent être classifiées selon deux niveaux de granularité : le parallélisme au niveau des itérations et le parallélisme au niveau des instructions. Dans le cas du deuxième niveau de granularité, les techniques visent à atteindre un parallélisme total des instructions appartenant à une même itération. Cependant, le parallélisme est contraint par les dépendances des données inter-itérations ce qui implique le décalage des instructions à travers les boucles imbriquées, provocant ainsi une augmentation du code proportionnelle au niveau du parallélisme. Par conséquent, le parallélisme total au niveau des instructions des nids de boucles engendre des implémentations avec des temps d’exécution non-optimaux et des tailles du code importantes. Les travaux de cette thèse s’intéressent à l’amélioration des stratégies de parallélisme des nids de boucles. Une première contribution consiste à proposer une nouvelle technique de parallélisme au niveau des instructions baptisée « retiming multidimensionnel décalé ». Elle vise à ordonnancer les nids de boucles avec une période de cycle minimale, sans atteindre un parallélisme total. Une deuxième contribution consiste à mettre en pratique notre technique dans le contexte de l’implémentation temps réel embarquée des nids de boucles. L’objectif est de respecter la contrainte du temps d’exécution tout en utilisant un code de taille minimale. Dans ce contexte, nous avons proposé une première démarche d’optimisation qui consiste à utiliser notre technique pour déterminer le niveau parallélisme minimal. Par la suite, nous avons décrit une deuxième démarche permettant de combiner les parallélismes au niveau des instructions et au niveau des itérations, en utilisant notre technique et le « loop striping

Thèses en Ligne

HAL-Ecole des Ponts ParisTech

HAL - UPEC / UPEM

Design and application of reconfigurable circuits and systems

Author: Cheung Peter
Publication venue: Electrical & Electronic Engineering, Imperial College London
Publication date: 01/12/2015
Field of study

Open Acces

Spiral - Imperial College Digital Repository

Broadening the Scope of Multi-Objective Optimizations in Physical Synthesis of Integrated Circuits.

Author: Papa David Anthony
Publication venue
Publication date: 01/01/2010
Field of study

In modern VLSI design, physical synthesis tools are primarily responsible for satisfying chip-performance constraints by invoking a broad range of circuit optimizations, such as buffer insertion, logic restructuring, gate sizing and relocation. This process is known as timing closure. Our research seeks more powerful and efficient optimizations to improve the state of the art in modern chip design. In particular, we integrate timing-driven relocation, retiming, logic cloning, buffer insertion and gate sizing in novel ways to create powerful circuit transformations that help satisfy setup-time constraints. State-of-the-art physical synthesis optimizations are typically applied at two scales: i) global algorithms that affect the entire netlist and ii) local transformations that focus on a handful of gates or interconnections. The scale of modern chip designs dictates that only near-linear-time optimization algorithms can be applied at the global scope — typically limited to wirelength-driven placement and legalization. Localized transformations can rely on more time-consuming optimizations with accurate delay models. Few techniques bridge the gap between fully-global and localized optimizations. This dissertation broadens the scope of physical synthesis optimization to include accurate transformations operating between the global and local scales. In particular, we integrate groups of related transformations to break circular dependencies and increase the number of circuit elements that can be jointly optimized to escape local minima. Integrated transformations in this dissertation are developed by identifying and removing obstacles to successful optimizations. Integration is achieved through mapping multiple operations to rigorous mathematical optimization problems that can be solved simultaneously. We achieve computational scalability in our techniques by leveraging analytical delay models and focusing optimization efforts on carefully selected regions of the chip. In this regard, we make extensive use of a linear interconnect-delay model that accounts for the impact of subsequent repeated insertion. Our integrated transformations are evaluated on high-performance circuits with over 100,000 gates. Integrated optimization techniques described in this dissertation ensure graceful timing-closure process and impact nearly every aspect of a typical physical synthesis flow. They have been validated in EDA tools used at IBM for physical synthesis of high-performance CPU and ASIC designs, where they significantly improved chip performance.Ph.D.Computer Science & EngineeringUniversity of Michigan, Horace H. Rackham School of Graduate Studieshttp://deepblue.lib.umich.edu/bitstream/2027.42/78744/1/iamyou_1.pd

CiteSeerX

Deep Blue Documents at the University of Michigan

Compilation de systèmes temps réel

Author: Potop-Butucaru Dumitru
Publication venue: HAL CCSD
Publication date: 10/11/2015
Field of study

I introduce and advocate for the concept of Real-Time Systems Compilation. By analogy with classical compilation, real-time systems compilation consists in the fully automatic construction of running, correct-by-construction implementations from functional and non-functional specifications of embedded control systems. Like in a classical compiler, the whole process must be fast (thus enabling a trial-and-error design style) and produce reasonably efficient code. This requires the use of fast heuristics, and the use of fine-grain platform and application models. Unlike a classical compiler, a real-time systems compiler must take into account non-functional properties of a system and ensure the respect of non-functional requirements (in addition to functional correctness). I also present Lopht, a real-time systems compiler for statically-scheduled real-time systems we built by combining techniques and concepts from real-time scheduling, compilation, and synchronous languages

Thèses en Ligne

HAL-UNICE

INRIA a CCSD electronic archive server

Design, Analysis and Test of Logic Circuits under Uncertainty.

Author: Krishnaswamy Smita
Publication venue
Publication date
Field of study

Integrated circuits are increasingly susceptible to uncertainty caused by soft errors, inherently probabilistic devices, and manufacturing variability. As device technologies scale, these effects become detrimental to circuit reliability. In order to address this, we develop methods for analyzing, designing, and testing circuits subject to probabilistic effects. Our main contributions are: 1) a fast, soft-error rate (SER) analyzer that uses functional-simulation signatures to capture error effects, 2) novel design techniques that improve reliability using little area and performance overhead, 3) a matrix-based reliability-analysis framework that captures many types of probabilistic faults, and 4) test-generation/compaction methods aimed at probabilistic faults in logic circuits. SER analysis must account for the main error-masking mechanisms in ICs: logic, timing, and electrical masking. We relate logic masking to node testability of the circuit and utilize functional-simulation signatures, i.e., partial truth tables, to efficiently compute estability (signal probability and observability). To account for timing masking, we compute error-latching windows (ELWs) from timing analysis information. Electrical masking is incorporated into our estimates through derating factors for gate error probabilities. The SER of a circuit is computed by combining the effects of all three masking mechanisms within our SER analyzer called AnSER. Using AnSER, we develop several low-overhead techniques that increase reliability, including: 1) an SER-aware design method that uses redundancy already present within the circuit, 2) a technique that resynthesizes small logic windows to improve area and reliability, and 3) a post-placement gate-relocation technique that increases timing masking by decreasing ELWs. We develop the probabilistic transfer matrix (PTM) modeling framework to analyze effects beyond soft errors. PTMs are compressed into algebraic decision diagrams (ADDs) to improve computational efficiency. Several ADD algorithms are developed to extract reliability and error susceptibility information from PTMs representing circuits. We propose new algorithms for circuit testing under probabilistic faults, which require a reformulation of existing test techniques. For instance, a test vector may need to be repeated many times to detect a fault. Also, different vectors detect the same fault with different probabilities. We develop test generation methods that account for these differences, and integer linear programming (ILP) formulations to optimize test sets.Ph.D.Computer Science & EngineeringUniversity of Michigan, Horace H. Rackham School of Graduate Studieshttp://deepblue.lib.umich.edu/bitstream/2027.42/61584/1/smita_1.pd

Deep Blue Documents at the University of Michigan

Recommended from our members

Scalable algorithms for software based self test using formal methods

Author: Prabhu Mahesh
Publication venue
Publication date: 09/07/2014
Field of study

textTransistor scaling has kept up with Moore's law with a doubling of the number of transistors on a chip. More logic on a chip means more opportunities for manufacturing defects to slip in. This, in turn, has made processor testing after manufacturing a significant challenge. At-speed functional testing, being completely non-intrusive, has been seen as the ideal way of testing chips. However for processor testing, generating instruction level tests for covering all faults is a challenge given the issue of scalability. Data-path faults are relatively easier to control and observe compared to control-path faults. In this research we present a novel method to generate instruction level tests for hard to detect control-path faults in a processor. We initially map the gate level stuck-at fault to the Register Transfer Level (RTL) and build an equivalent faulty RTL model. The fault activation and propagation constraints are captured using Control and Data Flow Graphs of the RTL as a Liner Temporal Logic (LTL) property. This LTL property is then negated and given to a Bounded Model Checker based on a Bit-Vector Satisfiability Module Theories (SMT) solver. From the counter-example to the property we can extract a sequence of instructions that activates the gate level fault and propagates the fault effect to one of the observable points in the design. Other than the user supplying instruction constraints, this approach is completely automatic and does not require any manual intervention. Not all the design behaviors are required to generate a test for a fault. We use this insight to scale our previous methodology further. Underapproximations are design abstractions that only capture a subset of the original design behaviors. The use of RTL for test generation affords us two types of under-approximations: bit-width reduction and operator approximation. These are abstractions that perform reductions based on semantics of the RTL design. We also explore structural reductions of the RTL, called path based search, where we search through error propagation paths incrementally. This approach increases the size of the test generation problem step by step. In this way the SMT solver searches through the state space piecewise rather than doing the entire search at once. Experimental results show that our methods are robust and scalable for generating functional tests for hard to detect faults.Electrical and Computer Engineerin

Texas ScholarWorks

The application of genetic algorithms to high-level synthesis

Author: Heijligers M.J.M.
Publication venue: Technische Universiteit Eindhoven
Publication date: 01/01/1996
Field of study

Repository TU/e

Pure OAI Repository

Doctor of Philosophy

Author: Sun Xiaojun
Publication venue: University of Utah
Publication date: 01/01/2017
Field of study

dissertationFormal verification of hardware designs has become an essential component of the overall system design flow. The designs are generally modeled as finite state machines, on which property and equivalence checking problems are solved for verification. Reachability analysis forms the core of these techniques. However, increasing size and complexity of the circuits causes the state explosion problem. Abstraction is the key to tackling the scalability challenges. This dissertation presents new techniques for word-level abstraction with applications in sequential design verification. By bundling together k bit-level state-variables into one word-level constraint expression, the state-space is construed as solutions (variety) to a set of polynomial constraints (ideal), modeled over the finite (Galois) field of 2^k elements. Subsequently, techniques from algebraic geometry -- notably, Groebner basis theory and technology -- are researched to perform reachability analysis and verification of sequential circuits. This approach adds a "word-level dimension" to state-space abstraction and verification to make the process more efficient. While algebraic geometry provides powerful abstraction and reasoning capabilities, the algorithms exhibit high computational complexity. In the dissertation, we show that by analyzing the constraints, it is possible to obtain more insights about the polynomial ideals, which can be exploited to overcome the complexity. Using our algorithm design and implementations, we demonstrate how to perform reachability analysis of finite-state machines purely at the word level. Using this concept, we perform scalable verification of sequential arithmetic circuits. As contemporary approaches make use of resolution proofs and unsatisfiable cores for state-space abstraction, we introduce the algebraic geometry analog of unsatisfiable cores, and present algorithms to extract and refine unsatisfiable cores of polynomial ideals. Experiments are performed to demonstrate the efficacy of our approaches

The University of Utah: J. Willard Marriott Digital Library