4,406 research outputs found

    Asynchronous techniques for system-on-chip design

    Get PDF
    SoC design will require asynchronous techniques as the large parameter variations across the chip will make it impossible to control delays in clock networks and other global signals efficiently. Initially, SoCs will be globally asynchronous and locally synchronous (GALS). But the complexity of the numerous asynchronous/synchronous interfaces required in a GALS will eventually lead to entirely asynchronous solutions. This paper introduces the main design principles, methods, and building blocks for asynchronous VLSI systems, with an emphasis on communication and synchronization. Asynchronous circuits with the only delay assumption of isochronic forks are called quasi-delay-insensitive (QDI). QDI is used in the paper as the basis for asynchronous logic. The paper discusses asynchronous handshake protocols for communication and the notion of validity/neutrality tests, and completion tree. Basic building blocks for sequencing, storage, function evaluation, and buses are described, and two alternative methods for the implementation of an arbitrary computation are explained. Issues of arbitration, and synchronization play an important role in complex distributed systems and especially in GALS. The two main asynchronous/synchronous interfaces needed in GALS-one based on synchronizer, the other on stoppable clock-are described and analyzed

    Low Power Processor Architectures and Contemporary Techniques for Power Optimization – A Review

    Get PDF
    The technological evolution has increased the number of transistors for a given die area significantly and increased the switching speed from few MHz to GHz range. Such inversely proportional decline in size and boost in performance consequently demands shrinking of supply voltage and effective power dissipation in chips with millions of transistors. This has triggered substantial amount of research in power reduction techniques into almost every aspect of the chip and particularly the processor cores contained in the chip. This paper presents an overview of techniques for achieving the power efficiency mainly at the processor core level but also visits related domains such as buses and memories. There are various processor parameters and features such as supply voltage, clock frequency, cache and pipelining which can be optimized to reduce the power consumption of the processor. This paper discusses various ways in which these parameters can be optimized. Also, emerging power efficient processor architectures are overviewed and research activities are discussed which should help reader identify how these factors in a processor contribute to power consumption. Some of these concepts have been already established whereas others are still active research areas. © 2009 ACADEMY PUBLISHER

    Desynchronization: Synthesis of asynchronous circuits from synchronous specifications

    Get PDF
    Asynchronous implementation techniques, which measure logic delays at run time and activate registers accordingly, are inherently more robust than their synchronous counterparts, which estimate worst-case delays at design time, and constrain the clock cycle accordingly. De-synchronization is a new paradigm to automate the design of asynchronous circuits from synchronous specifications, thus permitting widespread adoption of asynchronicity, without requiring special design skills or tools. In this paper, we first of all study different protocols for de-synchronization and formally prove their correctness, using techniques originally developed for distributed deployment of synchronous language specifications. We also provide a taxonomy of existing protocols for asynchronous latch controllers, covering in particular the four-phase handshake protocols devised in the literature for micro-pipelines. We then propose a new controller which exhibits provably maximal concurrency, and analyze the performance of desynchronized circuits with respect to the original synchronous optimized implementation. We finally prove the feasibility and effectiveness of our approach, by showing its application to a set of real designs, including a complete implementation of the DLX microprocessor architectur

    Investigation of the Benefits of Interlocked Synchronous Pipelines

    Get PDF
    The majority of today’s digital circuits use synchronous pipelines. As the technology nodes get smaller, these pipelines are facing problems with area, power, and timing. One of the major sources of power consumption is the global clock and stall signals. These signals have to be routed across large sections of the chip, and with regards to stalling the pipeline, often face significant timing issues. One solution, developed by Hans M. Jacobson et al., is “Synchronous Interlocked Pipelines”. This pipeline design combines synchronous pipelines with the handshaking of asynchronous pipelines. Asynchronous pipelines are less power intensive because they send acknowledge and request signals to neighboring stages that allow stages to turn off when not being used. Jacobson et al. use this handshaking technique to create local valid and stall signals instead of using global ones. To test the benefits of this design, an asynchronous pipeline, synchronous pipeline, and interlocked synchronous pipeline were built using a generic 45 nm library. Comparisons showed that while the asynchronous and interlocked synchronous pipelines took up 4 times more area than the synchronous pipeline, the asynchronous pipeline had the highest throughput of the three pipeline designs, followed by the interlocked synchronous pipeline. The synchronous pipeline had the worst throughput

    Elastic circuits

    Get PDF
    Elasticity in circuits and systems provides tolerance to variations in computation and communication delays. This paper presents a comprehensive overview of elastic circuits for those designers who are mainly familiar with synchronous design. Elasticity can be implemented both synchronously and asynchronously, although it was traditionally more often associated with asynchronous circuits. This paper shows that synchronous and asynchronous elastic circuits can be designed, analyzed, and optimized using similar techniques. Thus, choices between synchronous and asynchronous implementations are localized and deferred until late in the design process.Peer ReviewedPostprint (published version

    Design techniques for high performance asynchronous arithmetic operators

    Get PDF
    High performance asynchronous arithmetic operator design techniques are proposed, which adopt some of the techniques commonly used in synchronous systems such as fast precharged logic and efficient latch design, while maintaining the features of localized and elastic pipelining control inherent in asynchronous design. A pipelined sixteen bit multiplier designed using these techniques is presented and its performance compared with several previously reported asynchronous and synchronous designs

    HPSAP: A High-Performance and Synthesizable Asynchronous Pipeline with Quasi-2phase Conversion Method

    Get PDF
    \ua9 2013 IEEE.This paper presents a high-performance and synthesizable asynchronous pipeline (HPSAP). First, a 4-phase pipeline controlled by the relative-timing (RT) controller is designed. The controller is small (7 gates) and its handshake protocol is highly concurrent, resulting in fewer component delays in cycle time. However, the RT pipeline\u27s throughput is limited by the inherent reset phase in the 4-phase protocol. Thus, the variant HPSAP pipeline is proposed with the quasi-2phase conversion method. Unlike other existing solutions which aimed at reducing the reset time of the 4-phase protocol, this method imitates the behavior of the 2-phase pipeline and re-activates the reset edge by two steps: 1) replacing all delay-matched units with the maximum delay; 2) adding a small pulse generator on each RT controller. The post-layout HSPICE simulation of a 4-bit 10-stage HPSAP first-in-first-out (FIFO) pipeline indicated a throughput of 5.382 giga data item per second (GDI/s) under SMIC 55nm CMOS technology, which was 77.5% and 14.65% higher than the Click pipeline and Mousetrap pipeline. In the pipeline with processing, a 32 7 32 bits multiplier was built, and the maximum working frequency of the HPSAP multiplier was faster than Mousetrap and the synchronous counterparts
    corecore