88 research outputs found

    Analysis and optimization of VLSI Clock Distribution Networks for skew variability reduction

    As VLSI technology moves into the Ultra-Deep Sub-Micron (UDSM) era, manufacturing variations, power supply noise and temperature variations greatly affect the performance and yield of VLSI circuits. The Clock Distribution Network (CDN), one of the largest and most important nets in any synchronous VLSI chip, is especially sensitive to these variations. To address this problem, variability-aware analysis and optimization techniques for VLSI circuits are needed. In the first part of this thesis, an analytical bound on the unwanted skew due to interconnect variation is established. Experimental results show that this bound is safer, tighter and computationally faster than existing approaches. This bound could be used in variation-aware clock tree synthesis. The second part of the thesis deals with optimizing a given clock tree to minimize unwanted skew variations. Non-tree CDNs have been recognized as a promising approach to overcoming the variation problem. We propose a novel non-tree CDN obtained by adding cross links to an existing clock tree. We analyze the effect of link insertion on clock skew variability and propose link insertion schemes. The resulting non-tree CDNs are shown to be highly tolerant to skew variability with very little increase in total wire length. This makes them suitable for applications such as ASIC design, where a significant increase in total wire length is unacceptable.
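The quantity this thesis bounds is the skew between clock sinks under interconnect variation. As a minimal sketch, assuming the common Elmore delay model for an RC tree (the thesis's own analytical bound is not reproduced here), the code below computes sink delays and the resulting skew. The `Node` class, its fields and all values are hypothetical; resistances are in ohms and capacitances in fF, so delays come out in fs.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical RC clock-tree node used only for this illustration."""
    name: str
    r: float = 0.0        # resistance of the wire from the parent to this node (ohms)
    c: float = 0.0        # capacitance lumped at this node (fF)
    children: list = field(default_factory=list)
    is_sink: bool = False


def downstream_cap(node: Node) -> float:
    """Total capacitance in the subtree rooted at `node` (recomputed per call; fine for small trees)."""
    return node.c + sum(downstream_cap(ch) for ch in node.children)


def elmore_delays(root: Node) -> dict:
    """Elmore delay from the root to every sink: sum over path edges of R_edge * C_downstream."""
    delays = {}

    def walk(node: Node, t: float) -> None:
        t += node.r * downstream_cap(node)
        if node.is_sink:
            delays[node.name] = t
        for ch in node.children:
            walk(ch, t)

    walk(root, 0.0)
    return delays


def skew(delays: dict) -> float:
    """Clock skew: latest minus earliest sink arrival."""
    return max(delays.values()) - min(delays.values())


if __name__ == "__main__":
    s1 = Node("s1", r=10.0, c=2.0, is_sink=True)
    s2 = Node("s2", r=20.0, c=2.0, is_sink=True)
    n1 = Node("n1", r=50.0, c=5.0, children=[s1, s2])
    d = elmore_delays(Node("root", children=[n1]))
    print(d, skew(d))  # {'s1': 470.0, 's2': 490.0} 20.0
```

Scaling one edge's `r` (for example, raising the resistance of the wire to `s2`) and recomputing `skew` shows how a single interconnect variation shifts skew, which is the effect a variation-aware bound has to cover.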

    Interconnect performance estimation models for design planning


    Timing modeling and optimization under the transmission line model


    Spiking Neural Networks for Inference and Learning: A Memristor-based Design Perspective

    On metrics of density and power efficiency, neuromorphic technologies have the potential to surpass mainstream computing technologies in tasks where real-time functionality, adaptability, and autonomy are essential. While algorithmic advances in neuromorphic computing are proceeding successfully, the potential of memristors to improve neuromorphic computing has not yet borne fruit, primarily because they are often used as a drop-in replacement for conventional memory. However, interdisciplinary approaches anchored in machine learning theory suggest that multifactor plasticity rules matching neural and synaptic dynamics to the device capabilities can take better advantage of memristor dynamics and stochasticity. Furthermore, such plasticity rules generally perform much better than classical Spike-Timing-Dependent Plasticity (STDP) rules. This chapter reviews recent developments in learning with spiking neural network models and their possible implementation with memristor-based hardware.
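For reference, the classical baseline the chapter compares against can be sketched as a pair-based STDP rule: a presynaptic spike shortly before a postsynaptic spike strengthens the synapse, the reverse order weakens it, with exponential decay in the spike-time difference. This is a minimal sketch, not the multifactor rules the chapter advocates. The amplitudes and time constants are hypothetical, and clipping the weight to `[w_min, w_max]` is an assumption standing in for a bounded memristor conductance.

```python
import math


def stdp_dw(t_pre: float, t_post: float,
            a_plus: float = 0.01, a_minus: float = 0.012,
            tau_plus: float = 20.0, tau_minus: float = 20.0) -> float:
    """Pair-based STDP weight change for one pre/post spike pair (times in ms).

    Post after pre (dt > 0) potentiates; post before pre (dt < 0) depresses.
    """
    dt = t_post - t_pre
    if dt > 0:
        return a_plus * math.exp(-dt / tau_plus)
    if dt < 0:
        return -a_minus * math.exp(dt / tau_minus)
    return 0.0


def apply_stdp(w: float, pre_spikes, post_spikes,
               w_min: float = 0.0, w_max: float = 1.0) -> float:
    """Apply all-to-all pair updates, clipping after each step.

    The clip to [w_min, w_max] stands in for a bounded device conductance (assumption).
    """
    for t_pre in pre_spikes:
        for t_post in post_spikes:
            w = min(max(w + stdp_dw(t_pre, t_post), w_min), w_max)
    return w


if __name__ == "__main__":
    print(apply_stdp(0.5, pre_spikes=[0.0], post_spikes=[10.0]))  # slightly above 0.5
    print(apply_stdp(0.5, pre_spikes=[10.0], post_spikes=[0.0]))  # slightly below 0.5
```

The rule depends only on two spike times and the current weight; the multifactor rules discussed in the chapter add further terms (for example, neuromodulatory or error signals) on top of this kind of timing dependence.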

    An efficient and optimal algorithm for simultaneous buffer and wire sizing


    High-performance and Low-power Clock Network Synthesis in the Presence of Variation.

    Semiconductor technology scaling requires continuous evolution of all aspects of the physical design of integrated circuits. Among the major design steps, clock-network synthesis has been greatly affected by technology scaling, rendering existing methodologies inadequate. Clock routing was previously sufficient for smaller ICs, but design difficulty and structural complexity increased greatly as interconnect delay and clock frequency rose in the 1990s. Since a clock network directly influences IC performance and often consumes a substantial portion of total power, both academia and industry developed synthesis methodologies to achieve low skew, low power and robustness to PVT variations. Nevertheless, clock-network synthesis under tight constraints is currently the least automated step in physical design and requires significant manual intervention, undermining turnaround time. The need for multi-objective optimization over a large parameter space and the increasing impact of process variation make clock-network synthesis particularly challenging. Our work identifies new objectives, constraints and concerns in clock-network synthesis for systems-on-chip and microprocessors. To address them, we generate novel clock-network structures and propose changes to traditional physical-design flows. We develop new modeling techniques and algorithms for clock power optimization subject to tight skew constraints in the presence of process variations. In particular, we offer SPICE-accurate optimizations of clock networks, coordinated to reduce nominal skew below 5 ps, satisfy slew constraints and trade off skew, insertion delay and power, while tolerating variations. To broaden the scope of clock-network-synthesis optimizations, we propose new techniques and a methodology that reduce dynamic power consumption by 6.8%-11.6% for large IC designs with macro blocks by integrating clock-network synthesis within global placement.
We also present a novel non-tree topology that is 2.3x more power-efficient than mesh structures. We fuse several clock trees to create large-scale redundancy in a clock network, bridging the gap between tree-like and mesh-like topologies. The integrated optimization techniques for high-quality clock networks described in this dissertation show strong empirical results in experiments with recent industry-released benchmarks in the presence of process variation. Our software implementations received first-place awards at the ISPD 2009 and ISPD 2010 Clock-Network Synthesis Contests organized by IBM Research and Intel Research.
Ph.D. thesis, Electrical Engineering, University of Michigan, Horace H. Rackham School of Graduate Studies. http://deepblue.lib.umich.edu/bitstream/2027.42/89711/1/ejdjsy_1.pd
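A first-order way to reason about clock dynamic power is the textbook switching-power model P = α·C·Vdd²·f, where a clock net charges and discharges its load once per cycle (α = 1). The sketch below is only that model, an assumption for illustration; the dissertation itself reports SPICE-accurate evaluation. The function name, the wire-capacitance-per-length parameter and all numbers are hypothetical.

```python
def clock_dynamic_power(wire_length_um: float, cap_per_um_ff: float,
                        sink_caps_ff, vdd_v: float, freq_hz: float,
                        activity: float = 1.0) -> float:
    """Dynamic switching power P = activity * C_total * Vdd^2 * f, in watts.

    A clock net charges and discharges its load once per cycle, so activity = 1.
    Capacitances are given in fF and converted to farads.
    """
    c_total_ff = wire_length_um * cap_per_um_ff + sum(sink_caps_ff)
    return activity * c_total_ff * 1e-15 * vdd_v ** 2 * freq_hz


if __name__ == "__main__":
    # Hypothetical numbers: 1 mm of wire at 0.2 fF/um plus 100 sinks of 1 fF, 1.0 V, 1 GHz.
    p = clock_dynamic_power(1000.0, 0.2, [1.0] * 100, vdd_v=1.0, freq_hz=1e9)
    print(f"{p * 1e3:.3f} mW")  # 0.300 mW
```

Because P is linear in C_total at fixed Vdd and f, any percentage reduction in switched clock capacitance (wire plus sink load) gives the same percentage reduction in clock dynamic power under this model.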

    Power and Thermal Management of System-on-Chip


    Case Studies on Clock Gating and Local Routing for VLSI Clock Mesh

    The clock is the key synchronizing element in all synchronous digital systems. The difference in clock arrival times between sink points is called the clock skew. This uncertainty in arrival times limits the operating frequency and can cause functional errors. Clock routing techniques can be broadly categorized into 'balanced tree' and 'fixed mesh' methods. The skew and delay of the balanced tree method are higher than those of the fixed mesh method. Although a fixed mesh inherently uses more wire length, the redundancy created by loops in a mesh structure reduces undesired delay variations. The fixed mesh method uses a single mesh over the entire chip, but it is hard to introduce clock gating in a single clock mesh. This thesis introduces 'reconfigurability' by using control structures such as transmission gates between sub-clock meshes, thus enabling clock gating in a clock mesh. By using the optimum PMOS and NMOS sizes for the transmission gates (SZF) and the optimum number of transmission gates between sub-clock meshes (NTG) for a 4x4 reconfigurable mesh, the average of the maximum skew over all benchmarks is reduced by 18.12 percent compared to a clock mesh with no transmission gates between the sub-clock meshes (reconfigurable mesh with NTG = 0). Further, the research proposes a 'modified zero skew method' to connect synchronous flip-flops, or sink points, in the circuit to the clock grids of the clock mesh. Wire length reduction algorithms can be applied to reduce the wire length used for a local clock distribution network. The modified version of the 'zero skew method' for local clock routing, which is based on Elmore delay balancing, aims to minimize wire length for a given skew bound of a CDN using a clock mesh and an H-tree. The results of the 'modified zero skew method' (HC_MZSK) show an average local wire length reduction of 17.75 percent across all ISPD benchmarks compared to the direct connection method.
The maximum skew of HC_MZSK is smaller in most test cases than that of other connection methods such as direct connection and modified AHHK. Thus, HC_MZSK for local routing reduces both the wire length and the maximum skew.
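HC_MZSK is described as a modified version of the zero skew method based on Elmore delay balancing. The modification itself is not specified here, so the sketch below shows only the classical exact zero-skew merge step (due to Tsay) that such methods build on. Given two subtrees with Elmore delays t1, t2 and total capacitances C1, C2 joined by a wire of length L with unit resistance r and unit capacitance c, equating the Elmore delays through both wire pieces places the tap point at fraction x = (t2 - t1 + rL(C2 + cL/2)) / (rL(C1 + C2 + cL)) of the wire, measured from subtree 1. Function names and numbers are hypothetical.

```python
def zero_skew_tap(t1: float, c1: float, t2: float, c2: float,
                  length: float, r_unit: float, c_unit: float) -> float:
    """Fraction x of the connecting wire, measured from subtree 1, at which the
    merged root should tap so both subtrees see equal Elmore delay.

    t1, t2: Elmore delays of the two subtrees; c1, c2: their total capacitances;
    r_unit, c_unit: wire resistance and capacitance per unit length.
    """
    r_wire = r_unit * length
    c_wire = c_unit * length
    x = (t2 - t1 + r_wire * (c2 + c_wire / 2)) / (r_wire * (c1 + c2 + c_wire))
    if not 0.0 <= x <= 1.0:
        # Delays too unbalanced for this wire; the classical method elongates (snakes) the wire instead.
        raise ValueError(f"no tap point on the wire (x = {x:.3f}); wire elongation needed")
    return x


def delay_through(frac: float, t: float, c: float,
                  length: float, r_unit: float, c_unit: float) -> float:
    """Elmore delay from the tap point through a wire piece of frac*length into a subtree."""
    seg = frac * length
    return r_unit * seg * (c_unit * seg / 2 + c) + t


if __name__ == "__main__":
    args = dict(length=100.0, r_unit=0.1, c_unit=0.2)
    x = zero_skew_tap(t1=10.0, c1=5.0, t2=20.0, c2=5.0, **args)
    d1 = delay_through(x, 10.0, 5.0, **args)
    d2 = delay_through(1 - x, 20.0, 5.0, **args)
    print(round(x, 4), round(d1, 3), round(d2, 3))  # 0.5333 65.111 65.111
```

The slower subtree (here subtree 2) ends up with the shorter wire piece, which is how the merge absorbs the delay imbalance without adding skew.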

    Efficient schemes to size transistors for optimal delay by solving fanout branches with balancing algorithm

    High-performance digital systems require minimal logic and properly sized transistors to operate in all PVT corners. In particular, high-speed data-path design is mostly about optimizing the system for better timing. In this work, the author proposed an improved timing model for analyzing parallel data-paths and comparing their performance. A novel transistor sizing technique is also proposed to minimize delay in parallel data-path circuits in the presence of practical wire capacitance. This technique makes it easier to calculate the optimal capacitance distribution in a fanout branch path that equalizes the delays in all branches while minimizing the overall delay from the primary inputs to the primary outputs of a circuit. This problem is widely termed the "load distribution problem at a branch". A collection of fast algorithms was designed to accurately solve the load distribution problem at branches in digital circuits for optimal delay. The author used prior work on Unified Logical Effort [1] as a tool for delay estimation and transistor sizing. This work also shows the impact of branching on the critical path. Experiments are run on industry-standard circuits using several tools developed to model the circuit. The newly developed theories are tested on the circuit models, which are also included in this work.
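The thesis builds on Unified Logical Effort [1] and proposes its own balancing algorithms for the load distribution problem; neither is reproduced here. For orientation only, the sketch below applies classical logical effort (Sutherland, Sproull and Harris) to a single path with fixed branching efforts b_i = (C_on + C_off) / C_on: path effort F = G·B·H with H = C_out / C_in, best stage effort F^(1/N), and minimum delay N·F^(1/N) + P in units of τ. The function names and the example path are hypothetical.

```python
import math


def path_delay_and_effort(g, p, b, c_in, c_out):
    """Minimum path delay (in units of tau) and optimal stage effort, classical logical effort.

    g, p, b: per-stage logical effort, parasitic delay and branching effort.
    Path effort F = G * B * H with H = c_out / c_in; best stage effort is F ** (1/N).
    """
    n = len(g)
    f_total = math.prod(g) * math.prod(b) * (c_out / c_in)
    f_hat = f_total ** (1.0 / n)
    return n * f_hat + sum(p), f_hat


def size_stages(g, b, c_out, f_hat):
    """Input capacitance of each stage, working backward from the load.

    Stage i drives b[i] times its on-path load, so C_in[i] = g[i] * b[i] * on_path / f_hat.
    """
    caps = [0.0] * len(g)
    on_path = c_out
    for i in reversed(range(len(g))):
        caps[i] = g[i] * b[i] * on_path / f_hat
        on_path = caps[i]
    return caps


if __name__ == "__main__":
    # Hypothetical path: inverter -> NAND2 (g = 4/3, p = 2) -> inverter, with a 2x branch after the NAND2.
    g, p, b = [1.0, 4 / 3, 1.0], [1.0, 2.0, 1.0], [1.0, 2.0, 1.0]
    delay, f_hat = path_delay_and_effort(g, p, b, c_in=1.0, c_out=27.0)
    caps = size_stages(g, b, c_out=27.0, f_hat=f_hat)
    print(round(delay, 2), [round(c, 2) for c in caps])
```

Here each b_i is an input. Deciding how much off-path load each branch should carry, so that branch delays are equal and the overall delay is minimal, is the part the thesis optimizes, and it would change the b_i values fed into this calculation.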

    Timing optimization during the physical synthesis of cell-based VLSI circuits

    Thesis (Ph.D.), Universidade Federal de Santa Catarina, Centro Tecnológico, Programa de Pós-Graduação em Engenharia de Automação e Sistemas, Florianópolis, 2016.
Abstract: The evolution of CMOS technology has made possible integrated circuits with billions of transistors assembled on a single silicon chip, giving rise to the term Very-Large-Scale Integration (VLSI). The required clock frequency affects the performance of a VLSI circuit and induces timing constraints that must be properly handled by synthesis tools. During the physical synthesis of VLSI circuits, several optimization techniques are used to iteratively reduce the number of timing violations until the target clock frequency is met. The dramatic increase in interconnect delay under technology scaling is one of the major challenges for the timing closure of modern VLSI circuits. In this scenario, effective interconnect synthesis techniques play a major role. For this reason, this thesis targets two timing optimization problems for effective interconnect synthesis: Incremental Timing-Driven Placement (ITDP) and Incremental Timing-Driven Layer Assignment (ITLA). To solve the ITDP problem, this thesis proposes a new Lagrangian Relaxation formulation that minimizes timing violations for both setup and hold timing constraints. This work also proposes a net-based technique that uses Lagrange multipliers as net weights, which are dynamically updated using an accurate timing analyzer. The net-based technique employs a novel discrete search that relocates cells, using the Euclidean distance to define a proper neighborhood. To solve the ITLA problem, this thesis proposes a network flow approach that simultaneously handles critical and non-critical segments and exploits a few flow conservation conditions to extract timing information for each net segment individually, thereby enabling the use of an external timing engine.
The experimental validation using benchmark suites derived from industrial circuits demonstrates the effectiveness of the proposed techniques when compared with state-of-the-art works.
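In a Lagrangian Relaxation formulation of timing-driven placement, each timing constraint carries a nonnegative multiplier that grows while the constraint is violated. The thesis's exact formulation and its mapping from multipliers to net weights are not given in this abstract, so the sketch below assumes a generic setup-only version: longest-path arrival times on a small timing DAG, then one projected subgradient step λ_j ← max(0, λ_j + s·(a_j - q_j)) per endpoint j with arrival time a_j and required time q_j. Hold constraints (a_j ≥ q_j) would use the opposite sign. Pin names, delays, the step size s and the function names are hypothetical.

```python
from graphlib import TopologicalSorter  # Python 3.9+


def arrival_times(arcs: dict, source_arrivals: dict) -> dict:
    """Longest-path (latest) arrival time at every pin of a timing DAG.

    arcs maps (from_pin, to_pin) -> arc delay; pins with no fan-in start at
    source_arrivals.get(pin, 0.0).
    """
    preds = {}
    for u, v in arcs:
        preds.setdefault(u, set())
        preds.setdefault(v, set()).add(u)
    at = {}
    for pin in TopologicalSorter(preds).static_order():  # fan-in pins come first
        if preds[pin]:
            at[pin] = max(at[u] + arcs[(u, pin)] for u in preds[pin])
        else:
            at[pin] = source_arrivals.get(pin, 0.0)
    return at


def update_multipliers(lam: dict, at: dict, required: dict, step: float) -> dict:
    """One projected subgradient step on setup constraints a_j <= q_j."""
    return {j: max(0.0, lam.get(j, 0.0) + step * (at[j] - q))
            for j, q in required.items()}


if __name__ == "__main__":
    arcs = {("in", "g1"): 1.0, ("g1", "out1"): 2.0, ("in", "out2"): 1.5}
    at = arrival_times(arcs, {"in": 0.0})
    lam = update_multipliers({}, at, {"out1": 2.5, "out2": 2.0}, step=1.0)
    print(at["out1"], lam)  # 3.0 {'out1': 0.5, 'out2': 0.0}
```

Only the violating endpoint (`out1`) accumulates a positive multiplier. One hypothetical way to derive net weights from this would be to raise the weight of each net in proportion to the multipliers of the endpoints it feeds, then re-run timing after each placement move; whether the thesis does exactly this is not stated.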