675 research outputs found

    Desynchronization: Synthesis of asynchronous circuits from synchronous specifications

    Get PDF
    Asynchronous implementation techniques, which measure logic delays at run time and activate registers accordingly, are inherently more robust than their synchronous counterparts, which estimate worst-case delays at design time, and constrain the clock cycle accordingly. De-synchronization is a new paradigm to automate the design of asynchronous circuits from synchronous specifications, thus permitting widespread adoption of asynchronicity, without requiring special design skills or tools. In this paper, we first of all study different protocols for de-synchronization and formally prove their correctness, using techniques originally developed for distributed deployment of synchronous language specifications. We also provide a taxonomy of existing protocols for asynchronous latch controllers, covering in particular the four-phase handshake protocols devised in the literature for micro-pipelines. We then propose a new controller which exhibits provably maximal concurrency, and analyze the performance of desynchronized circuits with respect to the original synchronous optimized implementation. We finally prove the feasibility and effectiveness of our approach, by showing its application to a set of real designs, including a complete implementation of the DLX microprocessor architectur

    Design methodologies for variation-aware integrated circuits

    Get PDF
    The scaling of VLSI technology has spurred a rapid growth in the semiconductor industry. With the CMOS device dimension scaling to and beyond 90nm technology, it is possible to achieve higher performance and to pack more complex functionalities on a single chip. However, the scaling trend has introduced drastic variation of process and design parameters, leading to severe variability of chip performance in nanometer regime. Also, the manufacturing community projects CMOS will scale for three to four more generations. Since the uncertainties due to variations are expected to increase in each generation, it will significantly impact the performance of design and consequently the yield. Another challenging issue in the nanometer IC design is the high power consumption due to the greater packing density, higher frequency of operation and excessive leakage power. Moreover, the circuits are usually over-designed to compensate for uncertainties due to variations. The over-designed circuits not only make timing closure difficult but also cause excessive power consumption. For portable electronics, excessive power consumption may reduce battery life; for non-portable systems it may impose great difficulties in cooling and packaging. The objective of my research has been to develop design methodologies to address variations and power dissipation for reliable circuit operation. The proposed work has been divided into three parts: the first part addresses the issues related with power/ground noise induced by clock distribution network and proposes techniques to reduce power/ground noise considering the effects of process variations. The second part proposes an elastic pipeline scheme for random circuits with feedback loops. The proposed scheme provides a low-power solution that has the same variation tolerance as the conventional approaches. The third section deals with discrete buffer and wire sizing for link-based non-tree clock network, which is an energy efficient structure for skew tolerance to variations. For the power/ground noise problem, our approach could reduce the peak current and the delay variations by 50% and 51% respectively. Compared to conventional approach, the elastic timing scheme reduces power dissipation by 20% − 27%. The sizing method achieves clock skew reduction of 45% with a small increase in power dissipation

    Power Reductions with Energy Recovery Using Resonant Topologies

    Get PDF
    The problem of power densities in system-on-chips (SoCs) and processors has become more exacerbated recently, resulting in high cooling costs and reliability issues. One of the largest components of power consumption is the low skew clock distribution network (CDN), driving large load capacitance. This can consume as much as 70% of the total dynamic power that is lost as heat, needing elaborate sensing and cooling mechanisms. To mitigate this, resonant clocking has been utilized in several applications over the past decade. An improved energy recovering reconfigurable generalized series resonance (GSR) solution with all the critical support circuitry is developed in this work. This LC resonant clock driver is shown to save about 50% driver power (\u3e40% overall), on a 22nm process node and has 50% less skew than a non-resonant driver at 2GHz. It can operate down to 0.2GHz to support other energy savings techniques like dynamic voltage and frequency scaling (DVFS). As an example, GSR can be configured for the simpler pulse series resonance (PSR) operation to enable further power saving for double data rate (DDR) applications, by using de-skewing latches instead of flip-flop banks. A PSR based subsystem for 40% savings in clocking power with 40% driver active area reduction xii is demonstrated. This new resonant driver generates tracking pulses at each transition of clock for dual edge operation across DVFS. PSR clocking is designed to drive explicit-pulsed latches with negative setup time. Simulations using 45nm IBM/PTM device and interconnect technology models, clocking 1024 flip-flops show the reductions, compared to non-resonant clocking. DVFS range from 2GHz/1.3V to 200MHz/0.5V is obtained. The PSR frequency is set \u3e3× the clock rate, needing only 1/10th the inductance of prior-art LC resonance schemes. The skew reductions are achieved without needing to increase the interconnect widths owing to negative set-up times. Applications in data circuits are shown as well with a 90nm example. Parallel resonant and split-driver non-resonant configurations as well are derived from GSR. Tradeoffs in timing performance versus power, based on theoretical analysis, are compared for the first time and verified. This enables synthesis of an optimal topology for a given application from the GSR

    High performance IC clock networks with grid and tree topologies

    Get PDF
    In this dissertation, an essential step in the integrated circuit (IC) physical design flow—the clock network design—is investigated. Clock network design entailsa series of computationally intensive, large-scale design and optimization tasks for the generation and distribution of the clock signal through different topologies. The lack or inefficacy of the automation for implementing high performance clock networks, especially for low-power, high speed and variation-aware implementations, is the main driver for this research. The synthesis and optimization methods for the two most commonly used clock topologies in IC design—the grid topology and the tree topology—are primarily investigated.The clock mesh network, which uses the grid topology, has very low skew variation at the cost of high power dissipation. Two novel clock mesh network designmethodologies are proposed in this dissertation in order to reduce the power dissipation. These are the first methods known in literature that combine clock meshsynthesis with incremental register placement and clock gating for power saving purposes. The application of the proposed automation methods on the emerging resonant rotary clocking technology, which also has the grid topology, is investigated in this dissertation as well.The clock tree topology has the advantage of lower power dissipation compared to other traditional clock topologies (e.g. clock mesh, clock spine, clock tree with cross links) at the cost of increased performance degradation due to on-chip variations. A novel clock tree buffer polarity assignment flow is proposed in this dissertation in order to reduce these effects of on-chip variations on the clock tree topology. The proposed polarity assignment flow is the first work that introduces post-silicon, dynamic reconfigurability for polarity assignment, enabling clock gating for low power operation of the variation-tolerant clock tree networks.Ph.D., Electrical Engineering -- Drexel University, 201

    6T-SRAM 1Mb Design with Test Structures and Post Silicon Validation

    Get PDF
    abstract: Static random-access memories (SRAM) are integral part of design systems as caches and data memories that and occupy one-third of design space. The work presents an embedded low power SRAM on a triple well process that allows body-biasing control. In addition to the normal mode operation, the design is embedded with Physical Unclonable Function (PUF) [Suh07] and Sense Amplifier Test (SA Test) mode. With PUF mode structures, the fabrication and environmental mismatches in bit cells are used to generate unique identification bits. These bits are fixed and known as preferred state of an SRAM bit cell. The direct access test structure is a measurement unit for offset voltage analysis of sense amplifiers. These designs are manufactured using a foundry bulk CMOS 55 nm low-power (LP) process. The details about SRAM bit-cell and peripheral circuit design is discussed in detail, for certain cases the circuit simulation analysis is performed with random variations embedded in SPICE models. Further, post-silicon testing results are discussed for normal operation of SRAMs and the special test modes. The silicon and circuit simulation results for various tests are presented.Dissertation/ThesisMasters Thesis Electrical Engineering 201

    Design and automation of voltage-scaled clock networks

    Get PDF
    In this dissertation, a vital step of VLSI physical design flow, synthesis of clock distribution networks, is investigated. Clock network synthesis (CNS) involves large and complex optimization problems to achieve high performance and low power demands of current integrated circuits (ICs). Ineffectiveness of existing methodologies to provide high performance at lower voltage nodes is the main driver for this dissertation research. A design and automation flow for voltage-scaled clock networks is proposed to satisfy tight timing constraints at high frequency (for high performance) and low voltage (for low power) operation. One implementation of voltage-scaled clock networks is low (voltage) swing clocking, which is a known technique, yet its applicability remains limited to designs with low performance demands. In this dissertation, novel methodologies are introduced to i) apply low swing clocking to legacy designs as a power saving methodology, ii) develop a complete CNS flow for low swing clocking of high performance ICs. These methodologies include slew-driven approaches that are better suited to future transistor and interconnect technologies. Second implementation of voltage-scaled clock networks is multi-voltage clocking, which is another known technique, yet its applicability remains limited to clock tree topology. In this dissertation, multi-voltage clocking with a clock mesh topology is investigated in order to address a missing aspect in the current IC design flows. Practical considerations of the current IC design flows are also investigated in this dissertation to expand the applicability of the proposed CNS flow. A novel methodology is introduced to facilitate clock gating within low swing clocking. The applicability of low swing clocking to FinFET technology, which is currently the industry norm, is shown to be effective.Ph.D., Electrical Engineering -- Drexel University, 201

    Synchronous handshake circuits

    Get PDF
    We present the synchronous implementation of handshake circuits as an extra feature in the otherwise asynchronous design flow based on Tangram. This synchronous option can be used in the mapping onto FPGAs or as a fallback option to provide a circuit that is easier to test and integrate in a synchronous environment. When single-rail and synchronous realizations of the same handshake circuit are compared, the synchronous versions typically require fewer state-holding elements, occupy less area, have similar performance, but consume significantly more power (in the examples studied up to a factor four). Synchronous handshake circuits provide a means to study clock-gating techniques based on the synthesis starting from a behavioral-level specification. In addition, the study provides hints as to where the asynchronous handshake circuits may be optimized further

    Timing Closure in Chip Design

    Get PDF
    Achieving timing closure is a major challenge to the physical design of a computer chip. Its task is to find a physical realization fulfilling the speed specifications. In this thesis, we propose new algorithms for the key tasks of performance optimization, namely repeater tree construction; circuit sizing; clock skew scheduling; threshold voltage optimization and plane assignment. Furthermore, a new program flow for timing closure is developed that integrates these algorithms with placement and clocktree construction. For repeater tree construction a new algorithm for computing topologies, which are later filled with repeaters, is presented. To this end, we propose a new delay model for topologies that not only accounts for the path lengths, as existing approaches do, but also for the number of bifurcations on a path, which introduce extra capacitance and thereby delay. In the extreme cases of pure power optimization and pure delay optimization the optimum topologies regarding our delay model are minimum Steiner trees and alphabetic code trees with the shortest possible path lengths. We presented a new, extremely fast algorithm that scales seamlessly between the two opposite objectives. For special cases, we prove the optimality of our algorithm. The efficiency and effectiveness in practice is demonstrated by comprehensive experimental results. The task of circuit sizing is to assign millions of small elementary logic circuits to elements from a discrete set of logically equivalent, predefined physical layouts such that power consumption is minimized and all signal paths are sufficiently fast. In this thesis we develop a fast heuristic approach for global circuit sizing, followed by a local search into a local optimum. Our algorithms use, in contrast to existing approaches, the available discrete layout choices and accurate delay models with slew propagation. The global approach iteratively assigns slew targets to all source pins of the chip and chooses a discrete layout of minimum size preserving the slew targets. In comprehensive experiments on real instances, we demonstrate that the worst path delay is within 7% of its lower bound on average after a few iterations. The subsequent local search reduces this gap to 2% on average. Combining global and local sizing we are able to size more than 5.7 million circuits within 3 hours. For the clock skew scheduling problem we develop the first algorithm with a strongly polynomial running time for the cycle time minimization in the presence of different cycle times and multi-cycle paths. In practice, an iterative local search method is much more efficient. We prove that this iterative method maximizes the worst slack, even when restricting the feasible schedule to certain time intervals. Furthermore, we enhance the iterative local approach to determine a lexicographically optimum slack distribution. The clock skew scheduling problem is then generalized to allow for simultaneous data path optimization. In fact, this is a time-cost tradeoff problem. We developed the first combinatorial algorithm for computing time-cost tradeoff curves in graphs that may contain cycles. Starting from the lowest-cost solution, the algorithm iteratively computes a descent direction by a minimum cost flow computation. The maximum feasible step length is then determined by a minimum ratio cycle computation. This approach can be used in chip design for several optimization tasks, e.g. threshold voltage optimization or plane assignment. Finally, the optimization routines are combined into a timing closure flow. Here, the global placement is alternated with global performance optimization. Netweights are used to penalize the length of critical nets during placement. After the global phase, the performance is improved further by applying more comprehensive optimization routines on the most critical paths. In the end, the clock schedule is optimized and clocktrees are inserted. Computational results of the design flow are obtained on real-world computer chips
    • …
    corecore