55 research outputs found

    Synthesis and Verification of Digital Circuits using Functional Simulation and Boolean Satisfiability.

    Full text link
    The semiconductor industry has long relied on the steady trend of transistor scaling, that is, the shrinking of the dimensions of silicon transistor devices, as a way to improve the cost and performance of electronic devices. However, several design challenges have emerged as transistors have become smaller. For instance, wires are not scaling as fast as transistors, and delay associated with wires is becoming more significant. Moreover, in the design flow for integrated circuits, accurate modeling of wire-related delay is available only toward the end of the design process, when the physical placement of logic units is known. Consequently, one can only know whether timing performance objectives are satisfied, i.e., if timing closure is achieved, after several design optimizations. Unless timing closure is achieved, time-consuming design-flow iterations are required. Given the challenges arising from increasingly complex designs, failing to quickly achieve timing closure threatens the ability of designers to produce high-performance chips that can match continually growing consumer demands. In this dissertation, we introduce powerful constraint-guided synthesis optimizations that take into account upcoming timing closure challenges and eliminate expensive design iterations. In particular, we use logic simulation to approximate the behavior of increasingly complex designs leveraging a recently proposed concept, called bit signatures, which allows us to represent a large fraction of a complex circuit's behavior in a compact data structure. By manipulating these signatures, we can efficiently discover a greater set of valid logic transformations than was previously possible and, as a result, enhance timing optimization. Based on the abstractions enabled through signatures, we propose a comprehensive suite of novel techniques: (1) a fast computation of circuit don't-cares that increases restructuring opportunities, (2) a verification methodology to prove the correctness of speculative optimizations that efficiently utilizes the computational power of modern multi-core systems, and (3) a physical synthesis strategy using signatures that re-implements sections of a critical path while minimizing perturbations to the existing placement. Our results indicate that logic simulation is effective in approximating the behavior of complex designs and enables a broader family of optimizations than previous synthesis approaches.Ph.D.Computer Science & EngineeringUniversity of Michigan, Horace H. Rackham School of Graduate Studieshttp://deepblue.lib.umich.edu/bitstream/2027.42/61793/1/splaza_1.pd

    Synthesis of Clock Trees with Useful Skew based on Sparse-Graph Algorithms

    Get PDF
    Computer-aided design (CAD) for very large scale integration (VLSI) involve

    Energy-Efficient Digital Circuit Design using Threshold Logic Gates

    Get PDF
    abstract: Improving energy efficiency has always been the prime objective of the custom and automated digital circuit design techniques. As a result, a multitude of methods to reduce power without sacrificing performance have been proposed. However, as the field of design automation has matured over the last few decades, there have been no new automated design techniques, that can provide considerable improvements in circuit power, leakage and area. Although emerging nano-devices are expected to replace the existing MOSFET devices, they are far from being as mature as semiconductor devices and their full potential and promises are many years away from being practical. The research described in this dissertation consists of four main parts. First is a new circuit architecture of a differential threshold logic flipflop called PNAND. The PNAND gate is an edge-triggered multi-input sequential cell whose next state function is a threshold function of its inputs. Second a new approach, called hybridization, that replaces flipflops and parts of their logic cones with PNAND cells is described. The resulting \hybrid circuit, which consists of conventional logic cells and PNANDs, is shown to have significantly less power consumption, smaller area, less standby power and less power variation. Third, a new architecture of a field programmable array, called field programmable threshold logic array (FPTLA), in which the standard lookup table (LUT) is replaced by a PNAND is described. The FPTLA is shown to have as much as 50% lower energy-delay product compared to conventional FPGA using well known FPGA modeling tool called VPR. Fourth, a novel clock skewing technique that makes use of the completion detection feature of the differential mode flipflops is described. This clock skewing method improves the area and power of the ASIC circuits by increasing slack on timing paths. An additional advantage of this method is the elimination of hold time violation on given short paths. Several circuit design methodologies such as retiming and asynchronous circuit design can use the proposed threshold logic gate effectively. Therefore, the use of threshold logic flipflops in conventional design methodologies opens new avenues of research towards more energy-efficient circuits.Dissertation/ThesisDoctoral Dissertation Computer Science 201

    Synthesis Methodologies for Robust and Reconfigurable Clock Networks

    Get PDF
    In today\u27s aggressively scaled technology nodes, billions of transistors are packaged into a single integrated circuit. Electronic Design Automation (EDA) tools are needed to automatically assemble the transistors into a functioning system. One of the most important design steps in the physical synthesis is the design of the clock network. The clock network delivers a synchronizing clock signal to each sequential element. The clock signal is required to be delivered meeting timing constraints under variations and in multiple operating modes. Synthesizing such clock networks is becoming increasingly difficult with the complex power management methodologies and severe manufacturing variations. Clock network synthesis is an important problem because it has a direct impact on the functional correctness, the maximum operating frequency, and the overall power consumption of each synchronous integrated circuit. In this dissertation, we proposed synthesis methodologies for robust and reconfigurable clock networks. We have made three contributions to this topic. First, we have proposed a clock network optimization framework that can achieve better timing quality than previous frameworks. Our proposed framework improves timing quality by reducing the propagation delay on critical paths in a clock network using buffer sizing and layer assignment. Second, we have proposed a clock tree synthesis methodology that integrates the clock tree synthesis with the clock tree optimization. The methodology improves timing quality by avoiding to synthesize clock trees with topologies that are sensitive to variations. Third, we have proposed a clock network that can reconfigure the topology based on the active mode of operation. Lastly, we conclude the dissertation with future research directions

    Timing Closure in Chip Design

    Get PDF
    Achieving timing closure is a major challenge to the physical design of a computer chip. Its task is to find a physical realization fulfilling the speed specifications. In this thesis, we propose new algorithms for the key tasks of performance optimization, namely repeater tree construction; circuit sizing; clock skew scheduling; threshold voltage optimization and plane assignment. Furthermore, a new program flow for timing closure is developed that integrates these algorithms with placement and clocktree construction. For repeater tree construction a new algorithm for computing topologies, which are later filled with repeaters, is presented. To this end, we propose a new delay model for topologies that not only accounts for the path lengths, as existing approaches do, but also for the number of bifurcations on a path, which introduce extra capacitance and thereby delay. In the extreme cases of pure power optimization and pure delay optimization the optimum topologies regarding our delay model are minimum Steiner trees and alphabetic code trees with the shortest possible path lengths. We presented a new, extremely fast algorithm that scales seamlessly between the two opposite objectives. For special cases, we prove the optimality of our algorithm. The efficiency and effectiveness in practice is demonstrated by comprehensive experimental results. The task of circuit sizing is to assign millions of small elementary logic circuits to elements from a discrete set of logically equivalent, predefined physical layouts such that power consumption is minimized and all signal paths are sufficiently fast. In this thesis we develop a fast heuristic approach for global circuit sizing, followed by a local search into a local optimum. Our algorithms use, in contrast to existing approaches, the available discrete layout choices and accurate delay models with slew propagation. The global approach iteratively assigns slew targets to all source pins of the chip and chooses a discrete layout of minimum size preserving the slew targets. In comprehensive experiments on real instances, we demonstrate that the worst path delay is within 7% of its lower bound on average after a few iterations. The subsequent local search reduces this gap to 2% on average. Combining global and local sizing we are able to size more than 5.7 million circuits within 3 hours. For the clock skew scheduling problem we develop the first algorithm with a strongly polynomial running time for the cycle time minimization in the presence of different cycle times and multi-cycle paths. In practice, an iterative local search method is much more efficient. We prove that this iterative method maximizes the worst slack, even when restricting the feasible schedule to certain time intervals. Furthermore, we enhance the iterative local approach to determine a lexicographically optimum slack distribution. The clock skew scheduling problem is then generalized to allow for simultaneous data path optimization. In fact, this is a time-cost tradeoff problem. We developed the first combinatorial algorithm for computing time-cost tradeoff curves in graphs that may contain cycles. Starting from the lowest-cost solution, the algorithm iteratively computes a descent direction by a minimum cost flow computation. The maximum feasible step length is then determined by a minimum ratio cycle computation. This approach can be used in chip design for several optimization tasks, e.g. threshold voltage optimization or plane assignment. Finally, the optimization routines are combined into a timing closure flow. Here, the global placement is alternated with global performance optimization. Netweights are used to penalize the length of critical nets during placement. After the global phase, the performance is improved further by applying more comprehensive optimization routines on the most critical paths. In the end, the clock schedule is optimized and clocktrees are inserted. Computational results of the design flow are obtained on real-world computer chips

    Broadening the Scope of Multi-Objective Optimizations in Physical Synthesis of Integrated Circuits.

    Full text link
    In modern VLSI design, physical synthesis tools are primarily responsible for satisfying chip-performance constraints by invoking a broad range of circuit optimizations, such as buffer insertion, logic restructuring, gate sizing and relocation. This process is known as timing closure. Our research seeks more powerful and efficient optimizations to improve the state of the art in modern chip design. In particular, we integrate timing-driven relocation, retiming, logic cloning, buffer insertion and gate sizing in novel ways to create powerful circuit transformations that help satisfy setup-time constraints. State-of-the-art physical synthesis optimizations are typically applied at two scales: i) global algorithms that affect the entire netlist and ii) local transformations that focus on a handful of gates or interconnections. The scale of modern chip designs dictates that only near-linear-time optimization algorithms can be applied at the global scope — typically limited to wirelength-driven placement and legalization. Localized transformations can rely on more time-consuming optimizations with accurate delay models. Few techniques bridge the gap between fully-global and localized optimizations. This dissertation broadens the scope of physical synthesis optimization to include accurate transformations operating between the global and local scales. In particular, we integrate groups of related transformations to break circular dependencies and increase the number of circuit elements that can be jointly optimized to escape local minima. Integrated transformations in this dissertation are developed by identifying and removing obstacles to successful optimizations. Integration is achieved through mapping multiple operations to rigorous mathematical optimization problems that can be solved simultaneously. We achieve computational scalability in our techniques by leveraging analytical delay models and focusing optimization efforts on carefully selected regions of the chip. In this regard, we make extensive use of a linear interconnect-delay model that accounts for the impact of subsequent repeated insertion. Our integrated transformations are evaluated on high-performance circuits with over 100,000 gates. Integrated optimization techniques described in this dissertation ensure graceful timing-closure process and impact nearly every aspect of a typical physical synthesis flow. They have been validated in EDA tools used at IBM for physical synthesis of high-performance CPU and ASIC designs, where they significantly improved chip performance.Ph.D.Computer Science & EngineeringUniversity of Michigan, Horace H. Rackham School of Graduate Studieshttp://deepblue.lib.umich.edu/bitstream/2027.42/78744/1/iamyou_1.pd

    Design and Optimization for Resilient Energy Efficient Computing

    Get PDF
    Heutzutage sind moderne elektronische Systeme ein integraler Bestandteil unseres Alltags. Dies wurde unter anderem durch das exponentielle Wachstum der Integrationsdichte von integrierten Schaltkreisen ermöglicht zusammen mit einer Verbesserung der Energieeffizienz, welche in den letzten 50 Jahren stattfand, auch bekannt als Moore‘s Gesetz. In diesem Zusammenhang ist die Nachfrage von energieeffizienten digitalen Schaltkreisen enorm angestiegen, besonders in Anwendungsfeldern wie dem Internet of Things (IoT). Da der Leistungsverbrauch von Schaltkreisen stark mit der Versorgungsspannung verknĂŒpft ist, wurden effiziente Verfahren entwickelt, welche die Versorgungsspannung in den nahen Schwellenspannung-Bereich skalieren, zusammengefasst unter dem Begriff Near-Threshold-Computing (NTC). Mithilfe dieser Verfahren kann eine Erhöhung der Energieeffizienz von Schaltungen um eine ganze GrĂ¶ĂŸenordnung ermöglicht werden. Neben der verbesserten Energiebilanz ergeben sich jedoch zahlreiche Herausforderungen was den Schaltungsentwurf angeht. Zum Beispiel fĂŒhrt das Reduzieren der Versorgungsspannung in den nahen Schwellenspannungsbereich zu einer verzehnfachten Erhöhung der SensibilitĂ€t der Schaltkreise gegenĂŒber Prozessvariation, Spannungsfluktuationen und TemperaturverĂ€nderungen. Die EinflĂŒsse dieser Variationen reduzieren die ZuverlĂ€ssigkeit von NTC Schaltkreisen und sind ihr grĂ¶ĂŸtes Hindernis bezĂŒglich einer umfassenden Nutzung. Traditionelle AnsĂ€tze und Methoden aus dem nominalen Spannungsbereich zur Kompensation von VariabilitĂ€t können nicht effizient angewandt werden, da die starken Performance-Variationen und SensitivitĂ€ten im nahen Schwellenspannungsbereich dessen KapazitĂ€ten ĂŒbersteigen. Aus diesem Grund sind neue Entwurfsparadigmen und Entwurfsautomatisierungskonzepte fĂŒr die Anwendung von NTC erforderlich. Das Ziel dieser Arbeit ist die zuvor erwĂ€hnten Probleme durch die Bereitstellung von ganzheitlichen Methoden zum Design von NTC Schaltkreisen sowie dessen Entwurfsautomatisierung anzugehen, welche insbesondere auf der Schaltungs- sowie Logik-Ebene angewandt werden. Dabei werden tiefgehende Analysen der ZuverlĂ€ssigkeit von NTC Systemen miteinbezogen und Optimierungsmethoden werden vorgeschlagen welche die ZuverlĂ€ssigkeit, Performance und Energieeffizienz verbessern. Die BeitrĂ€ge dieser Arbeit sind wie folgt: Schaltungssynthese und Timing Closure unter Einbezug von Variationen: Das Einhalten von Anforderungen an das zeitliche Verhalten und ZuverlĂ€ssigkeit von NTC ist eine anspruchsvolle Aufgabe. Die Auswirkungen von VariabilitĂ€t kommen bei starken Performance-Schwankungen, welche zu teuren zeitlichen Sicherheitsmargen fĂŒhren, oder sich in Hold-Time VerstĂ¶ĂŸen ausdrĂŒcken, verursacht durch funktionale Störungen, zum Vorschein. Die konventionellen AnsĂ€tze beschrĂ€nken sich dabei alleine auf die Erhöhung von zeitlichen Sicherheitsmargen. Dies ist jedoch sehr ineffizient fĂŒr NTC, wegen dem starken Ausmaß an Variationen und den erhöhten Leckströmen. In dieser Arbeit wird ein Konzept zur Synthese und Timing Closure von Schaltkreisen unter Variationen vorgestellt, welches sowohl die SensitivitĂ€t gegenĂŒber Variationen reduziert als auch die Energieeffizienz, Performance und ZuverlĂ€ssigkeit verbessert und zugleich den Mehraufwand von Timing Closures [1, 2] verringert. Simulationsergebnisse belegen, dass unser vorgeschlagener Ansatz die Verzögerungszeit um 87% reduziert und die Performance und Energieeffizienz um 25% beziehungsweise 7.4% verbessert, zu Kosten eines erhöhten FlĂ€chenbedarfs von 4.8%. SchichtĂŒbergreifende ZuverlĂ€ssigkeits-, Energieeffizienz- und Performance-Optimierung von Datenpfaden: SchichtĂŒbergreifende Analyse von Prozessor-Datenpfaden, welche den ganzen Weg spannen vom Kompilierer zum Schaltungsentwurf, kann potenzielle OptimierungsansĂ€tze aufzeigen. Ein Datenpfad ist eine Kombination von mehreren funktionalen Einheiten, welche diverse Instruktionen verarbeiten können. Unsere Analyse zeigt, dass die AusfĂŒhrungszeiten von Instruktionen bei niedrigen Versorgungsspannungen stark variieren, weshalb eine Klassifikation in schnelle und langsame Instruktionen vorgenommen werden kann. Des Weiteren können funktionale Instruktionen als hĂ€ufig und selten genutzte Instruktionen kategorisiert werden. Diese Arbeit stellt eine Multi-Zyklen-Instruktionen-Methode vor, welche die Energieeffizienz und Belastbarkeit von funktionalen Einheiten erhöhen kann [3]. ZusĂ€tzlich stellen wir einen Partitionsalgorithmus vor, welcher ein fein-granulares Power-gating von selten genutzten Einheiten ermöglicht [4] durch Partition von einzelnen funktionalen Einheiten in mehrere kleinere Einheiten. Die vorgeschlagenen Methoden verbessern das zeitliche Schaltungsverhalten signifikant, und begrenzen zugleich die Leckströme betrĂ€chtlich, durch Einsatz einer Kombination von Schaltungs-Redesign- und Code-Replacement-Techniken. Simulationsresultate zeigen, dass die entwickelten Methoden die Performance und Energieeffizienz von arithmetisch-logischen Einheiten (ALU) um 19% beziehungsweise 43% verbessern. Des Weiteren kann der Zuwachs in Performance der optimierten Schaltungen in eine Verbesserung der ZuverlĂ€ssigkeit umgewandelt werden [5, 6]. Post-Fabrication und Laufzeit-Tuning: Prozess- und Laufzeitvariationen haben einen starken Einfluss auf den Minimum Energy Point (MEP) von NTC-Schaltungen, welcher mit der energieeffizientesten Versorgungsspannung assoziiert ist. Es ist ein besonderes Anliegen, die NTC-Schaltung nach der Herstellung (post-fabrication) so zu kalibrieren, dass sich die Schaltung im MEP-Zustand befindet, um die beste Energieeffizient zu erreichen. In dieser Arbeit, werden Post-Fabrication und Laufzeit-Tuning vorgeschlagen, welche die Schaltung basierend auf Geschwindigkeits- und Leistungsverbrauch-Messungen nach der Herstellung auf den MEP kalibrieren. Die vorgestellten Techniken ermitteln den MEP per Chip-Basis um den Einfluss von Prozessvariationen mit einzubeziehen und dynamisch die Versorgungsspannung und Frequenz zu adaptieren um zeitabhĂ€ngige Variationen wie Workload und Temperatur zu adressieren. Zu diesem Zweck wird in die Firmware eines Chips ein Regression-Modell integriert, welches den MEP basierend auf Workload- und Temperatur-Messungen zur Laufzeit extrahiert. Das Regressions-Modell ist fĂŒr jeden Chip einzigartig und basiert lediglich auf Post-Fabrication-Messungen. Simulationsergebnisse zeigen das der entwickelte Ansatz eine sehr hohe prognostische Treffsicherheit und Energieeffizienz hat, Ă€hnlich zu hardware-implementierten Methoden, jedoch ohne hardware-seitigen Mehraufwand [7, 8]. Selektierte Flip-Flop Optimierung: Ultra-Low-Voltage Schaltungen mĂŒssen im nominalen Versorgungsspannungs-Mode arbeiten um zeitliche Anforderungen von laufenden Anwendungen zu erfĂŒllen. In diesem Fall ist die Schaltung von starken Alterungsprozessen betroffen, welche die Transistoren durch Erhöhung der Schwellenspannungen degradieren. Unsere tiefgehenden Analysen haben gezeigt das gewisse Flip-Flop-Architekturen von diesen Alterungserscheinungen beeinflusst werden indem fĂ€lschlicherweise konstante Werte ( \u270\u27 oder \u271\u27) fĂŒr eine lange Zeit gespeichert sind. Im Vergleich zu anderen Komponenten sind Flip-Flops sensitiver zu Alterungsprozessen und versagen unter anderem dabei einen neuen Wert innerhalb des vorgegebenen zeitlichen Rahmens zu ĂŒbernehmen. Außerdem kann auch ein geringfĂŒgiger Spannungsabfall zu diesen zeitlichen VerstĂ¶ĂŸen fĂŒhren, falls die betreffenden gealterten Flip-Flops zum kritischen Pfad zuzuordnen sind. In dieser Arbeit wird eine selektiver Flip-Flop-Optimierungsmethode vorgestellt, welche die Schaltungen bezĂŒglich Robustheit gegen statische Alterung und Spannungsabfall optimieren. Dabei werden zuerst optimierte robuste Flip-Flops generiert und diese dann anschließend in die Standard-Zellen-Bibliotheken integriert. Flip-Flops, die in der Schaltung zum kritischen Pfad gehören und Alterung sowie Spannungsabfall erfahren, werden durch die optimierten robusten Versionen ersetzt, um das Zeitverhalten und die ZuverlĂ€ssigkeit der Schaltung zu verbessern [9, 10]. Simulationsergebnisse zeigen, dass die erwartete Lebenszeit eines Prozessors um 37% verbessert werden kann, wĂ€hrend Leckströme um nur 0.1% erhöht werden. WĂ€hrend NTC das Potenzial hat große Energieeffizienz zu ermöglichen, ist der Einsatz in neue Anwendungsfeldern wie IoT wegen den zuvor erwĂ€hnten Problemen bezĂŒglich der hohen SensitivitĂ€t gegenĂŒber Variationen und deshalb mangelnder ZuverlĂ€ssigkeit, noch nicht durchsetzbar. In dieser Dissertation und in noch nicht publizierten Werken [11–17], stellen wir Lösungen zu diesen Problemen vor, die eine Integration von NTC in heutige Systeme ermöglichen

    Some Applications of the Weighted Combinatorial Laplacian

    Get PDF
    The weighted combinatorial Laplacian of a graph is a symmetric matrix which is the discrete analogue of the Laplacian operator. In this thesis, we will study a new application of this matrix to matching theory yielding a new characterization of factor-criticality in graphs and matroids. Other applications are from the area of the physical design of very large scale integrated circuits. The placement of the gates includes the minimization of a quadratic form given by a weighted Laplacian. A method based on the dual constrained subgradient method is proposed to solve the simultaneous placement and gate-sizing problem. A crucial step of this method is the projection to the flow space of an associated graph, which can be performed by minimizing a quadratic form given by the unweighted combinatorial Laplacian.Andwendungen der gewichteten kombinatorischen Laplace-Matrix Die gewichtete kombinatorische Laplace-Matrix ist das diskrete Analogon des Laplace-Operators. In dieser Arbeit stellen wir eine neuartige Charakterisierung von Faktor-KritikalitĂ€t von Graphen und Matroiden mit Hilfe dieser Matrix vor. Wir untersuchen andere Anwendungen im Bereich des Entwurfs von höchstintegrierten Schaltkreisen. Die Platzierung basiert auf der Minimierung einer quadratischen Form, die durch eine gewichtete kombinatorische Laplace-Matrix gegeben ist. Wir prĂ€sentieren einen Algorithmus fĂŒr das allgemeine simultane Platzierungs- und GattergrĂ¶ĂŸen-Optimierungsproblem, der auf der dualen Subgradientenmethode basiert. Ein wichtiger Bestandteil dieses Verfahrens ist eine Projektion auf den Flussraum eines assoziierten Graphen, die als die Minimierung einer durch die Laplace-Matrix gegebenen quadratischen Form aufgefasst werden kann

    A Structured Design Methodology for High Performance VLSI Arrays

    Get PDF
    abstract: The geometric growth in the integrated circuit technology due to transistor scaling also with system-on-chip design strategy, the complexity of the integrated circuit has increased manifold. Short time to market with high reliability and performance is one of the most competitive challenges. Both custom and ASIC design methodologies have evolved over the time to cope with this but the high manual labor in custom and statistic design in ASIC are still causes of concern. This work proposes a new circuit design strategy that focuses mostly on arrayed structures like TLB, RF, Cache, IPCAM etc. that reduces the manual effort to a great extent and also makes the design regular, repetitive still achieving high performance. The method proposes making the complete design custom schematic but using the standard cells. This requires adding some custom cells to the already exhaustive library to optimize the design for performance. Once schematic is finalized, the designer places these standard cells in a spreadsheet, placing closely the cells in the critical paths. A Perl script then generates Cadence Encounter compatible placement file. The design is then routed in Encounter. Since designer is the best judge of the circuit architecture, placement by the designer will allow achieve most optimal design. Several designs like IPCAM, issue logic, TLB, RF and Cache designs were carried out and the performance were compared against the fully custom and ASIC flow. The TLB, RF and Cache were the part of the HEMES microprocessor.Dissertation/ThesisPh.D. Electrical Engineering 201
    • 

    corecore