1,153 research outputs found

    Elastic circuits

    Get PDF
    Elasticity in circuits and systems provides tolerance to variations in computation and communication delays. This paper presents a comprehensive overview of elastic circuits for those designers who are mainly familiar with synchronous design. Elasticity can be implemented both synchronously and asynchronously, although it was traditionally more often associated with asynchronous circuits. This paper shows that synchronous and asynchronous elastic circuits can be designed, analyzed, and optimized using similar techniques. Thus, choices between synchronous and asynchronous implementations are localized and deferred until late in the design process.Peer ReviewedPostprint (published version

    The impact of asynchrony on computer architecture

    Get PDF
    The performance characteristics of asynchronous circuits are quite different from those of their synchronous counterparts. As a result, the best asynchronous design of a particular system does not necessarily correspond to the best synchronous design, even at the algorithmic level. The goal of this thesis is to examine certain aspects of computer architecture and design in the context of an asynchronous VLSI implementation. We present necessary and sufficient conditions under which the degree of pipelining of a component can be modified without affecting the correctness of an asynchronous computation. As an instance of the improvements possible using an asynchronous architecture, we present circuits to solve the prefix problem with average-case behavior better than that possible by any synchronous solution in the case when the prefix operator has a right zero. We show that our circuit implementations are area-optimal given their performance characteristics, and have the best possible average-case latency. At the level of processor design, we present a mechanism for the implementation of precise exceptions in asynchronous processors. The novel feature of this mechanism is that it permits the presence of a data-dependent number of instructions in the execution pipeline of the processor. Finally, at the level of processor architecture, we present the architecture of a processor with an independent instruction stream for branches. The instruction set permits loops and function calls to be executed with minimal control-flow overhead

    A general model for performance optimization of sequential systems

    Get PDF
    Retiming, c-slow retiming and recycling are different transformations for the performance optimization of sequential circuits. For retiming and c-slow retiming, different models that provide exact solutions have already been proposed. An exact model for recycling was yet unknown. This paper presents a general formulation that covers the combination of the three schemes for performance optimization. It provides an exact model based on integer linear programming that resorts to the structural theory of marked graphs. A set of experiments has been designed to show the benefits in performance obtained by combining retiming and recycling. The results also show the applicability of the method in large circuits.Peer ReviewedPostprint (published version

    Technology Mapping for Circuit Optimization Using Content-Addressable Memory

    Get PDF
    The growing complexity of Field Programmable Gate Arrays (FPGA's) is leading to architectures with high input cardinality look-up tables (LUT's). This thesis describes a methodology for area-minimizing technology mapping for combinational logic, specifically designed for such FPGA architectures. This methodology, called LURU, leverages the parallel search capabilities of Content-Addressable Memories (CAM's) to outperform traditional mapping algorithms in both execution time and quality of results. The LURU algorithm is fundamentally different from other techniques for technology mapping in that LURU uses textual string representations of circuit topology in order to efficiently store and search for circuit patterns in a CAM. A circuit is mapped to the target LUT technology using both exact and inexact string matching techniques. Common subcircuit expressions (CSE's) are also identified and used for architectural optimization---a small set of CSE's is shown to effectively cover an average of 96% of the test circuits. LURU was tested with the ISCAS'85 suite of combinational benchmark circuits and compared with the mapping algorithms FlowMap and CutMap. The area reduction shown by LURU is, on average, 20% better compared to FlowMap and CutMap. The asymptotic runtime complexity of LURU is shown to be better than that of both FlowMap and CutMap

    An Operand-Optimized Asynchronous IEEE 754 Double-Precision Floating-Point Adder

    Full text link

    The Lutonium: A Sub-Nanojoule Asynchronous 8051 Microcontroller

    Get PDF
    We describe the Lutonium, an asynchronous 8051 microcontroller designed for low Et/sup 2/. In 0.18 /spl mu/m CMOS, at nominal 1.8 V, we expect a performance of 0.5 nJ per instruction at 200 MIPS. At 0.5 V, we expect 4 MIPS and 40 pJ/instruction, corresponding to 25,000 MIPS/Watt. We describe the structure of a fine-grain pipeline optimized for Et/sup 2/ efficiency, some of the peripherals implementation, and the advantages of an asynchronous implementation of a deep-sleep mechanism

    Doctor of Philosophy

    Get PDF
    dissertationThe design of integrated circuit (IC) requires an exhaustive verification and a thorough test mechanism to ensure the functionality and robustness of the circuit. This dissertation employs the theory of relative timing that has the advantage of enabling designers to create designs that have significant power and performance over traditional clocked designs. Research has been carried out to enable the relative timing approach to be supported by commercial electronic design automation (EDA) tools. This allows asynchronous and sequential designs to be designed using commercial cad tools. However, two very significant holes in the flow exist: the lack of support for timing verification and manufacturing test. Relative timing (RT) utilizes circuit delay to enforce and measure event sequencing on circuit design. Asynchronous circuits can optimize power-performance product by adjusting the circuit timing. A thorough analysis on the timing characteristic of each and every timing path is required to ensure the robustness and correctness of RT designs. All timing paths have to conform to the circuit timing constraints. This dissertation addresses back-end design robustness by validating full cyclical path timing verification with static timing analysis and implementing design for testability (DFT). Circuit reliability and correctness are necessary aspects for the technology to become commercially ready. In this study, scan-chain, a commercial DFT implementation, is applied to burst-mode RT designs. In addition, a novel testing approach is developed along with scan-chain to over achieve 90% fault coverage on two fault models: stuck-at fault model and delay fault model. This work evaluates the cost of DFT and its coverage trade-off then determines the best implementation. Designs such as a 64-point fast Fourier transform (FFT) design, an I2C design, and a mixed-signal design are built to demonstrate power, area, performance advantages of the relative timing methodology and are used as a platform for developing the backend robustness. Results are verified by performing post-silicon timing validation and test. This work strengthens overall relative timed circuit flow, reliability, and testability

    Elastic systems

    Get PDF
    Elastic systems provide tolerance to the variations in computation and communication delays. The incorporation of elasticity opens new opportunities for optimization using new correct-by-construction transformations that cannot be applied to rigid non-elastic systems. The basics of synchronous and asynchronous elastic systems will be reviewed. A set of behavior-preserving transformations will be presented: retiming, recycling, early evaluation, variable-latency units and speculative execution. The application of these transformations for performance and power optimization will be discussed. Finally, a novel framework for microarchitectural exploration will be introduced, showing that the optimal pipelining of a circuit can be automatically obtained by using the previous transformations.Peer ReviewedPostprint (published version
    • …
    corecore