1,187 research outputs found

    Synthesis of Clock Trees with Useful Skew based on Sparse-Graph Algorithms

    Computer-aided design (CAD) for very large scale integration (VLSI) involve

    Desynchronization: Synthesis of asynchronous circuits from synchronous specifications

    Asynchronous implementation techniques, which measure logic delays at run time and activate registers accordingly, are inherently more robust than their synchronous counterparts, which estimate worst-case delays at design time and constrain the clock cycle accordingly. Desynchronization is a new paradigm to automate the design of asynchronous circuits from synchronous specifications, thus permitting widespread adoption of asynchronicity without requiring special design skills or tools. In this paper, we first study different protocols for desynchronization and formally prove their correctness, using techniques originally developed for distributed deployment of synchronous language specifications. We also provide a taxonomy of existing protocols for asynchronous latch controllers, covering in particular the four-phase handshake protocols devised in the literature for micro-pipelines. We then propose a new controller which exhibits provably maximal concurrency, and analyze the performance of desynchronized circuits with respect to the original synchronous optimized implementation. We finally prove the feasibility and effectiveness of our approach by showing its application to a set of real designs, including a complete implementation of the DLX microprocessor architecture.

    Modeling of thermally induced skew variations in clock distribution network

    The clock distribution network is sensitive to large thermal gradients on the die, as the performance of both clock buffers and interconnects is affected by temperature. A robust clock network design relies on accurate analysis of clock skew subject to temperature variations. In this work, we address the problem of thermally induced clock skew modeling in nanometer CMOS technologies. The complex thermal behavior of both buffers and interconnects is taken into account. In addition, our characterization of the temperature effect on buffers and interconnects gives designers valuable insight into the potential impact of thermal variations on clock networks. The use of an industry-standard data format in the interface allows our tool to be easily integrated into an existing design flow

    Delay-Bounded Routing for Shadow Registers

    The on-chip timing behaviour of synchronous circuits can be quantified at run-time by adding shadow registers, which allow designers to sample the most critical paths of a circuit at a different point in time than the user register would normally. In order to sample these paths precisely, the path skew between the user and the shadow register must be tightly controlled and consistent across all paths that are shadowed. Unlike a custom IC, FPGAs contain prefabricated resources from which composing an arbitrary routing delay is not trivial. This paper presents a method for inserting shadow registers with a minimum skew bound, whilst also reducing the maximum skew. To preserve circuit timing, we apply this to FPGA circuits post place-and-route, using only the spare resources left behind. We find that our techniques can achieve an average STA reported delay bound of ±200ps on a Xilinx device despite incomplete timing information, and achieve <1ps accuracy against our own delay model

    Pulse propagation, graph cover, and packet forwarding

    We study distributed systems, with a particular focus on graph problems and fault tolerance. Fault tolerance in a microprocessor or even a System-on-Chip can be improved by using a fault-tolerant pulse propagation design. The existing design TRIX achieves this goal by being a distributed system consisting of very simple nodes. We show that even in the typical mode of operation without faults, TRIX performs significantly better than a regular wire or clock tree: statistical evaluation of our simulated experiments shows that we achieve a skew with standard deviation of O(log log H), where H is the height of the TRIX grid. The distance-r generalization of classic graph problems can give us insights into how distance affects the hardness of a problem. For the distance-r dominating set problem, we present both an algorithmic upper bound and an unconditional lower bound for any graph class with certain high-girth and sparseness criteria. In particular, our algorithm achieves an O(r·f(r))-approximation in time O(r), where f is the expansion function, which correlates with density. For constant r, this implies a constant approximation factor in constant time. We also show that no algorithm can achieve a (2r + 1 − δ)-approximation for any δ > 0 in time O(r), not even on the class of cycles of girth at least 5r. Furthermore, we extend the algorithm to related graph cover problems and even to a different execution model. Finally, we investigate the problem of packet forwarding, which addresses the question of how and when best to forward packets in a distributed system. These packets are injected by an adversary. We build on the existing algorithm OED to handle more than a single destination. In particular, we show that buffers of size O(log n) are sufficient for this algorithm, in contrast to O(n) for the naive approach.
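To make the distance-r dominating set problem from the abstract above concrete, the following is a minimal sketch of a checker, not the approximation algorithm of the thesis. It uses a multi-source breadth-first search to test whether every vertex lies within r hops of a chosen vertex. The names `is_distance_r_dominating` and `cycle_graph` are hypothetical, introduced only for illustration. On a cycle, one chosen vertex covers itself and r neighbours on each side, i.e. 2r + 1 vertices, which is where the factor 2r + 1 naturally appears.

```python
from collections import deque


def is_distance_r_dominating(adj, chosen, r):
    """Return True if every vertex is within r hops of a vertex in `chosen`."""
    dist = {v: 0 for v in chosen}      # multi-source BFS: all chosen vertices start at 0
    queue = deque(chosen)
    while queue:
        v = queue.popleft()
        if dist[v] == r:               # do not explore beyond radius r
            continue
        for u in adj[v]:
            if u not in dist:
                dist[u] = dist[v] + 1
                queue.append(u)
    return len(dist) == len(adj)       # dominated iff every vertex was reached


def cycle_graph(n):
    """Adjacency lists of a cycle on n vertices (illustrative helper)."""
    return {v: [(v - 1) % n, (v + 1) % n] for v in range(n)}


# On a 21-cycle with r = 3, each chosen vertex covers 2r + 1 = 7 vertices,
# so three evenly spaced vertices suffice and two cannot.
ring = cycle_graph(21)
three_ok = is_distance_r_dominating(ring, [0, 7, 14], 3)
two_ok = is_distance_r_dominating(ring, [0, 7], 3)
```
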

    Timing Closure in Chip Design

    Achieving timing closure is a major challenge in the physical design of a computer chip. Its task is to find a physical realization fulfilling the speed specifications. In this thesis, we propose new algorithms for the key tasks of performance optimization, namely repeater tree construction, circuit sizing, clock skew scheduling, threshold voltage optimization and plane assignment. Furthermore, a new program flow for timing closure is developed that integrates these algorithms with placement and clock tree construction. For repeater tree construction, a new algorithm for computing topologies, which are later filled with repeaters, is presented. To this end, we propose a new delay model for topologies that accounts not only for the path lengths, as existing approaches do, but also for the number of bifurcations on a path, which introduce extra capacitance and thereby delay. In the extreme cases of pure power optimization and pure delay optimization, the optimum topologies regarding our delay model are minimum Steiner trees and alphabetic code trees with the shortest possible path lengths. We present a new, extremely fast algorithm that scales seamlessly between the two opposite objectives. For special cases, we prove the optimality of our algorithm. The efficiency and effectiveness in practice are demonstrated by comprehensive experimental results. The task of circuit sizing is to assign millions of small elementary logic circuits to elements from a discrete set of logically equivalent, predefined physical layouts such that power consumption is minimized and all signal paths are sufficiently fast. In this thesis we develop a fast heuristic approach for global circuit sizing, followed by a local search into a local optimum. Our algorithms use, in contrast to existing approaches, the available discrete layout choices and accurate delay models with slew propagation. The global approach iteratively assigns slew targets to all source pins of the chip and chooses a discrete layout of minimum size preserving the slew targets. In comprehensive experiments on real instances, we demonstrate that the worst path delay is within 7% of its lower bound on average after a few iterations. The subsequent local search reduces this gap to 2% on average. Combining global and local sizing, we are able to size more than 5.7 million circuits within 3 hours. For the clock skew scheduling problem we develop the first algorithm with a strongly polynomial running time for cycle time minimization in the presence of different cycle times and multi-cycle paths. In practice, an iterative local search method is much more efficient. We prove that this iterative method maximizes the worst slack, even when restricting the feasible schedule to certain time intervals. Furthermore, we enhance the iterative local approach to determine a lexicographically optimum slack distribution. The clock skew scheduling problem is then generalized to allow for simultaneous data path optimization. In fact, this is a time-cost tradeoff problem. We develop the first combinatorial algorithm for computing time-cost tradeoff curves in graphs that may contain cycles. Starting from the lowest-cost solution, the algorithm iteratively computes a descent direction by a minimum cost flow computation. The maximum feasible step length is then determined by a minimum ratio cycle computation. This approach can be used in chip design for several optimization tasks, e.g. threshold voltage optimization or plane assignment. Finally, the optimization routines are combined into a timing closure flow. Here, the global placement is alternated with global performance optimization. Net weights are used to penalize the length of critical nets during placement. After the global phase, the performance is improved further by applying more comprehensive optimization routines on the most critical paths. In the end, the clock schedule is optimized and clock trees are inserted. Computational results of the design flow are obtained on real-world computer chips.
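As an illustration of the clock skew scheduling problem mentioned in the abstract above, here is a minimal sketch that maximizes the worst setup slack with a textbook method: binary search over the slack, with a Bellman-Ford feasibility check on the resulting difference constraints. This is not the thesis's strongly polynomial algorithm or its iterative local search. It assumes a single clock period, setup constraints only (no hold constraints or multi-cycle paths), and a register graph containing at least one cycle. All names (`feasible_schedule`, `max_worst_slack`, the example delays) are hypothetical.

```python
def feasible_schedule(n, paths, period, slack):
    """Return clock arrival times giving every path at least `slack`, or None.

    paths: list of (i, j, delay) for a data path from register i to register j.
    Setup constraint: t[i] + delay + slack <= t[j] + period,
    i.e. the difference constraint t[i] - t[j] <= period - delay - slack.
    """
    # A constraint x_u - x_v <= w becomes an edge v -> u with weight w.
    edges = [(j, i, period - d - slack) for i, j, d in paths]
    dist = [0.0] * n                   # implicit source with zero-weight edges to all
    for _ in range(n):
        changed = False
        for v, u, w in edges:
            if dist[v] + w < dist[u] - 1e-12:
                dist[u] = dist[v] + w
                changed = True
        if not changed:
            return dist                # converged: distances are a valid schedule
    return None                        # still relaxing after n passes: negative cycle


def max_worst_slack(n, paths, period, tol=1e-6):
    """Binary search for the largest worst slack that admits a schedule."""
    lo = min(period - d for _, _, d in paths)   # achieved by zero skew
    hi = max(period - d for _, _, d in paths)   # no cycle mean can exceed this
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if feasible_schedule(n, paths, period, mid) is not None:
            lo = mid
        else:
            hi = mid
    return lo


# Three registers in a loop with path delays 8, 4 and 6 and a clock period of 7.
example_paths = [(0, 1, 8.0), (1, 2, 4.0), (2, 0, 6.0)]
best = max_worst_slack(3, example_paths, 7.0)
schedule = feasible_schedule(3, example_paths, 7.0, best)
```

In this example, zero skew leaves the slowest path with a slack of -1, while shifting clock arrival times spreads the slack around the loop so that the worst slack approaches the cycle average of 1.
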

    Throughput-driven floorplanning with wire pipelining

    The size of future high-performance SoCs is such that the time-of-flight of wires connecting distant pins in the layout can be much higher than the clock period. In order to keep the frequency as high as possible, the wires may be pipelined. However, the insertion of flip-flops may alter the throughput of the system due to the presence of loops in the logic netlist. In this paper, we address the problem of floorplanning a large design where long interconnects are pipelined, by including the throughput in the cost function of a tool based on simulated annealing. The results obtained on a series of benchmarks are then validated using a simple router that breaks long interconnects by suitably placing flip-flops along the wires

    High performance IC clock networks with grid and tree topologies

    In this dissertation, an essential step in the integrated circuit (IC) physical design flow, the clock network design, is investigated. Clock network design entails a series of computationally intensive, large-scale design and optimization tasks for the generation and distribution of the clock signal through different topologies. The lack or inefficacy of the automation for implementing high performance clock networks, especially for low-power, high speed and variation-aware implementations, is the main driver for this research. The synthesis and optimization methods for the two most commonly used clock topologies in IC design, the grid topology and the tree topology, are primarily investigated. The clock mesh network, which uses the grid topology, has very low skew variation at the cost of high power dissipation. Two novel clock mesh network design methodologies are proposed in this dissertation in order to reduce the power dissipation. These are the first methods known in literature that combine clock mesh synthesis with incremental register placement and clock gating for power saving purposes. The application of the proposed automation methods on the emerging resonant rotary clocking technology, which also has the grid topology, is investigated in this dissertation as well. The clock tree topology has the advantage of lower power dissipation compared to other traditional clock topologies (e.g. clock mesh, clock spine, clock tree with cross links) at the cost of increased performance degradation due to on-chip variations. A novel clock tree buffer polarity assignment flow is proposed in this dissertation in order to reduce these effects of on-chip variations on the clock tree topology. The proposed polarity assignment flow is the first work that introduces post-silicon, dynamic reconfigurability for polarity assignment, enabling clock gating for low power operation of the variation-tolerant clock tree networks. Ph.D., Electrical Engineering -- Drexel University, 201
