64 research outputs found

    Approximate Computing Survey, Part I: Terminology and Software & Hardware Approximation Techniques

    Full text link
    The rapid growth of demanding applications in domains applying multimedia processing and machine learning has marked a new era for edge and cloud computing. These applications involve massive data and compute-intensive tasks, and thus, typical computing paradigms in embedded systems and data centers are stressed to meet the worldwide demand for high performance. Concurrently, the landscape of the semiconductor field in the last 15 years has constituted power as a first-class design concern. As a result, the community of computing systems is forced to find alternative design approaches to facilitate high-performance and/or power-efficient computing. Among the examined solutions, Approximate Computing has attracted an ever-increasing interest, with research works applying approximations across the entire traditional computing stack, i.e., at software, hardware, and architectural levels. Over the last decade, there is a plethora of approximation techniques in software (programs, frameworks, compilers, runtimes, languages), hardware (circuits, accelerators), and architectures (processors, memories). The current article is Part I of our comprehensive survey on Approximate Computing, and it reviews its motivation, terminology and principles, as well it classifies and presents the technical details of the state-of-the-art software and hardware approximation techniques.Comment: Under Review at ACM Computing Survey

    Parallélisme des nids de boucles pour l’optimisation du temps d’exécution et de la taille du code

    Get PDF
    The real time implementation algorithms always include nested loops which require important execution times. Thus, several nested loop parallelism techniques have been proposed with the aim of decreasing their execution times. These techniques can be classified in terms of granularity, which are the iteration level parallelism and the instruction level parallelism. In the case of the instruction level parallelism, the techniques aim to achieve a full parallelism. However, the loop carried dependencies implies shifting instructions in both side of nested loops. Consequently, these techniques provide implementations with non-optimal execution times and important code sizes, which represent limiting factors when implemented on embedded real-time systems. In this work, we are interested on enhancing the parallelism strategies of nested loops. The first contribution consists of purposing a novel instruction level parallelism technique, called “delayed multidimensional retiming”. It aims to scheduling the nested loops with the minimal cycle period, without achieving a full parallelism. The second contribution consists of employing the “delayed multidimensional retiming” when providing nested loop implementations on real time embedded systems. The aim is to respect an execution time constraint while using minimal code size. In this context, we proposed a first approach that selects the minimal instruction parallelism level allowing the execution time constraint respect. The second approach employs both instruction level parallelism and iteration level parallelism, by using the “delayed multidimensional retiming” and the “loop striping”Les algorithmes des systèmes temps réels incluent de plus en plus de nids de boucles, qui sont caractérisés par un temps d’exécution important. De ce fait, plusieurs démarches de parallélisme des boucles imbriquées ont été proposées dans l’objectif de réduire leurs temps d’exécution. Ces démarches peuvent être classifiées selon deux niveaux de granularité : le parallélisme au niveau des itérations et le parallélisme au niveau des instructions. Dans le cas du deuxième niveau de granularité, les techniques visent à atteindre un parallélisme total des instructions appartenant à une même itération. Cependant, le parallélisme est contraint par les dépendances des données inter-itérations ce qui implique le décalage des instructions à travers les boucles imbriquées, provocant ainsi une augmentation du code proportionnelle au niveau du parallélisme. Par conséquent, le parallélisme total au niveau des instructions des nids de boucles engendre des implémentations avec des temps d’exécution non-optimaux et des tailles du code importantes. Les travaux de cette thèse s’intéressent à l’amélioration des stratégies de parallélisme des nids de boucles. Une première contribution consiste à proposer une nouvelle technique de parallélisme au niveau des instructions baptisée « retiming multidimensionnel décalé ». Elle vise à ordonnancer les nids de boucles avec une période de cycle minimale, sans atteindre un parallélisme total. Une deuxième contribution consiste à mettre en pratique notre technique dans le contexte de l’implémentation temps réel embarquée des nids de boucles. L’objectif est de respecter la contrainte du temps d’exécution tout en utilisant un code de taille minimale. Dans ce contexte, nous avons proposé une première démarche d’optimisation qui consiste à utiliser notre technique pour déterminer le niveau parallélisme minimal. Par la suite, nous avons décrit une deuxième démarche permettant de combiner les parallélismes au niveau des instructions et au niveau des itérations, en utilisant notre technique et le « loop striping

    Design and application of reconfigurable circuits and systems

    No full text
    Open Acces

    Broadening the Scope of Multi-Objective Optimizations in Physical Synthesis of Integrated Circuits.

    Full text link
    In modern VLSI design, physical synthesis tools are primarily responsible for satisfying chip-performance constraints by invoking a broad range of circuit optimizations, such as buffer insertion, logic restructuring, gate sizing and relocation. This process is known as timing closure. Our research seeks more powerful and efficient optimizations to improve the state of the art in modern chip design. In particular, we integrate timing-driven relocation, retiming, logic cloning, buffer insertion and gate sizing in novel ways to create powerful circuit transformations that help satisfy setup-time constraints. State-of-the-art physical synthesis optimizations are typically applied at two scales: i) global algorithms that affect the entire netlist and ii) local transformations that focus on a handful of gates or interconnections. The scale of modern chip designs dictates that only near-linear-time optimization algorithms can be applied at the global scope — typically limited to wirelength-driven placement and legalization. Localized transformations can rely on more time-consuming optimizations with accurate delay models. Few techniques bridge the gap between fully-global and localized optimizations. This dissertation broadens the scope of physical synthesis optimization to include accurate transformations operating between the global and local scales. In particular, we integrate groups of related transformations to break circular dependencies and increase the number of circuit elements that can be jointly optimized to escape local minima. Integrated transformations in this dissertation are developed by identifying and removing obstacles to successful optimizations. Integration is achieved through mapping multiple operations to rigorous mathematical optimization problems that can be solved simultaneously. We achieve computational scalability in our techniques by leveraging analytical delay models and focusing optimization efforts on carefully selected regions of the chip. In this regard, we make extensive use of a linear interconnect-delay model that accounts for the impact of subsequent repeated insertion. Our integrated transformations are evaluated on high-performance circuits with over 100,000 gates. Integrated optimization techniques described in this dissertation ensure graceful timing-closure process and impact nearly every aspect of a typical physical synthesis flow. They have been validated in EDA tools used at IBM for physical synthesis of high-performance CPU and ASIC designs, where they significantly improved chip performance.Ph.D.Computer Science & EngineeringUniversity of Michigan, Horace H. Rackham School of Graduate Studieshttp://deepblue.lib.umich.edu/bitstream/2027.42/78744/1/iamyou_1.pd

    Compilation de systèmes temps réel

    Get PDF
    I introduce and advocate for the concept of Real-Time Systems Compilation. By analogy with classical compilation, real-time systems compilation consists in the fully automatic construction of running, correct-by-construction implementations from functional and non-functional specifications of embedded control systems. Like in a classical compiler, the whole process must be fast (thus enabling a trial-and-error design style) and produce reasonably efficient code. This requires the use of fast heuristics, and the use of fine-grain platform and application models. Unlike a classical compiler, a real-time systems compiler must take into account non-functional properties of a system and ensure the respect of non-functional requirements (in addition to functional correctness). I also present Lopht, a real-time systems compiler for statically-scheduled real-time systems we built by combining techniques and concepts from real-time scheduling, compilation, and synchronous languages

    Design, Analysis and Test of Logic Circuits under Uncertainty.

    Full text link
    Integrated circuits are increasingly susceptible to uncertainty caused by soft errors, inherently probabilistic devices, and manufacturing variability. As device technologies scale, these effects become detrimental to circuit reliability. In order to address this, we develop methods for analyzing, designing, and testing circuits subject to probabilistic effects. Our main contributions are: 1) a fast, soft-error rate (SER) analyzer that uses functional-simulation signatures to capture error effects, 2) novel design techniques that improve reliability using little area and performance overhead, 3) a matrix-based reliability-analysis framework that captures many types of probabilistic faults, and 4) test-generation/compaction methods aimed at probabilistic faults in logic circuits. SER analysis must account for the main error-masking mechanisms in ICs: logic, timing, and electrical masking. We relate logic masking to node testability of the circuit and utilize functional-simulation signatures, i.e., partial truth tables, to efficiently compute estability (signal probability and observability). To account for timing masking, we compute error-latching windows (ELWs) from timing analysis information. Electrical masking is incorporated into our estimates through derating factors for gate error probabilities. The SER of a circuit is computed by combining the effects of all three masking mechanisms within our SER analyzer called AnSER. Using AnSER, we develop several low-overhead techniques that increase reliability, including: 1) an SER-aware design method that uses redundancy already present within the circuit, 2) a technique that resynthesizes small logic windows to improve area and reliability, and 3) a post-placement gate-relocation technique that increases timing masking by decreasing ELWs. We develop the probabilistic transfer matrix (PTM) modeling framework to analyze effects beyond soft errors. PTMs are compressed into algebraic decision diagrams (ADDs) to improve computational efficiency. Several ADD algorithms are developed to extract reliability and error susceptibility information from PTMs representing circuits. We propose new algorithms for circuit testing under probabilistic faults, which require a reformulation of existing test techniques. For instance, a test vector may need to be repeated many times to detect a fault. Also, different vectors detect the same fault with different probabilities. We develop test generation methods that account for these differences, and integer linear programming (ILP) formulations to optimize test sets.Ph.D.Computer Science & EngineeringUniversity of Michigan, Horace H. Rackham School of Graduate Studieshttp://deepblue.lib.umich.edu/bitstream/2027.42/61584/1/smita_1.pd

    The application of genetic algorithms to high-level synthesis

    Get PDF

    Doctor of Philosophy

    Get PDF
    dissertationFormal verification of hardware designs has become an essential component of the overall system design flow. The designs are generally modeled as finite state machines, on which property and equivalence checking problems are solved for verification. Reachability analysis forms the core of these techniques. However, increasing size and complexity of the circuits causes the state explosion problem. Abstraction is the key to tackling the scalability challenges. This dissertation presents new techniques for word-level abstraction with applications in sequential design verification. By bundling together k bit-level state-variables into one word-level constraint expression, the state-space is construed as solutions (variety) to a set of polynomial constraints (ideal), modeled over the finite (Galois) field of 2^k elements. Subsequently, techniques from algebraic geometry -- notably, Groebner basis theory and technology -- are researched to perform reachability analysis and verification of sequential circuits. This approach adds a "word-level dimension" to state-space abstraction and verification to make the process more efficient. While algebraic geometry provides powerful abstraction and reasoning capabilities, the algorithms exhibit high computational complexity. In the dissertation, we show that by analyzing the constraints, it is possible to obtain more insights about the polynomial ideals, which can be exploited to overcome the complexity. Using our algorithm design and implementations, we demonstrate how to perform reachability analysis of finite-state machines purely at the word level. Using this concept, we perform scalable verification of sequential arithmetic circuits. As contemporary approaches make use of resolution proofs and unsatisfiable cores for state-space abstraction, we introduce the algebraic geometry analog of unsatisfiable cores, and present algorithms to extract and refine unsatisfiable cores of polynomial ideals. Experiments are performed to demonstrate the efficacy of our approaches
    • …
    corecore