33 research outputs found

    Timing optimization during the physical synthesis of cell-based VLSI circuits

    Thesis (doctorate) - Universidade Federal de Santa Catarina, Centro Tecnológico, Programa de Pós-Graduação em Engenharia de Automação e Sistemas, Florianópolis, 2016. Abstract: The evolution of CMOS technology made possible integrated circuits with billions of transistors assembled into a single silicon chip, giving rise to the term Very-Large-Scale Integration (VLSI). The required clock frequency affects the performance of a VLSI circuit and induces timing constraints that must be properly handled by synthesis tools. During the physical synthesis of VLSI circuits, several optimization techniques are used to iteratively reduce the number of timing violations until the target clock frequency is met. The dramatic increase of interconnect delay under technology scaling represents one of the major challenges for the timing closure of modern VLSI circuits. In this scenario, effective interconnect synthesis techniques play a major role. For this reason, this thesis targets two timing optimization problems for effective interconnect synthesis: Incremental Timing-Driven Placement (ITDP) and Incremental Timing-Driven Layer Assignment (ITLA). For solving the ITDP problem, this thesis proposes a new Lagrangian Relaxation formulation that minimizes timing violations for both setup and hold timing constraints. This work also proposes a net-based technique that uses Lagrange multipliers as net weights, which are dynamically updated using an accurate timing analyzer. The net-based technique makes use of a novel discrete search that relocates cells, employing the Euclidean distance to define a proper neighborhood. For solving the ITLA problem, this thesis proposes a network flow approach that simultaneously handles critical and non-critical segments and exploits a few flow conservation conditions to extract timing information for each net segment individually, thereby enabling the use of an external timing engine. The experimental validation using benchmark suites derived from industrial circuits demonstrates the effectiveness of the proposed techniques when compared with state-of-the-art works.
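
The abstract above describes Lagrange multipliers used as net weights and updated from timing-analysis results. The sketch below illustrates that idea with a generic projected subgradient step; it is not the thesis's actual update rule, which the abstract does not give. The function name `update_net_weights`, the `step` parameter, and the slack values are all hypothetical.

```python
def update_net_weights(weights, slacks, step=0.1):
    """Projected subgradient step on per-net multipliers (illustrative only).

    weights: dict mapping net name -> current multiplier (>= 0)
    slacks:  dict mapping net name -> timing slack reported by a timing
             analyzer (negative slack means a violation)
    """
    updated = {}
    for net, weight in weights.items():
        violation = -slacks[net]  # positive when the net violates timing
        # Raise the weight of violating nets, lower it for nets with
        # positive slack, and keep the multiplier non-negative.
        updated[net] = max(0.0, weight + step * violation)
    return updated


# Hypothetical slacks (in ns) as an external timing engine might report them.
weights = {"n1": 1.0, "n2": 1.0, "n3": 0.05}
slacks = {"n1": -2.0, "n2": 0.5, "n3": 1.0}
new_weights = update_net_weights(weights, slacks)
```

In a placement loop, weights like these would typically scale each net's wirelength cost, so that nets on violating paths are shortened first.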

    Rapid SoC Design: On Architectures, Methodologies and Frameworks

    Modern applications like machine learning, autonomous vehicles, and 5G networking require an order of magnitude boost in processing capability. For several decades, chip designers have relied on Moore’s Law, the doubling of transistor count every two years, to deliver improved performance, higher energy efficiency, and an increase in transistor density. With the end of Dennard scaling and a slowdown in Moore’s Law, system architects have developed several techniques to deliver the traditional performance and power improvements we have come to expect. More recently, chip designers have turned towards heterogeneous systems composed of more specialized processing units to buttress the traditional processing units. These specialized units improve the overall performance, power, and area (PPA) metrics across a wide variety of workloads and applications. While the GPU serves as a classical example, accelerators for machine learning, approximate computing, graph processing, and database applications have become commonplace. This has led to an exponential growth in the variety (and count) of these compute units found in modern embedded and high-performance computing platforms. The various techniques adopted to combat the slowing of Moore’s Law directly translate to an increase in complexity for modern systems-on-chip (SoCs). This increase in complexity in turn leads to an increase in design effort and validation time for hardware and the accompanying software stacks. This is further aggravated by fabrication challenges (photolithography, tooling, and yield) faced at advanced technology nodes (below 28 nm). The inherent complexity in modern SoCs translates into increased costs and time-to-market delays. This holds true across the spectrum, from mobile/handheld processors to high-performance data-center appliances. This dissertation presents several techniques to address the challenges of rapidly producing complex SoCs. 
The first part of this dissertation focuses on foundations and architectures that aid in rapid SoC design. It presents a variety of architectural techniques that were developed and leveraged to rapidly construct complex SoCs at advanced process nodes. The next part of the dissertation focuses on the gap between a completed design model (in RTL form) and its physical manifestation (a GDS file that will be sent to the foundry for fabrication). It presents methodologies and a workflow for rapidly walking a design through to completion at arbitrary technology nodes. It also presents progress on creating tools and a flow that depends entirely on open-source tools. The last part presents a framework that not only speeds up the integration of a hardware accelerator into an SoC ecosystem, but also emphasizes software adoption and usability. PhD, Electrical and Computer Engineering, University of Michigan, Horace H. Rackham School of Graduate Studies. http://deepblue.lib.umich.edu/bitstream/2027.42/168119/1/ajayi_1.pd

    Multi-objective Optimisation of Digital Circuits based on Cell Mapping in an Industrial EDA Flow

    Modern electronic design automation (EDA) tools can handle the complexity of state-of-the-art electronic systems by decomposing them into smaller blocks or cells, introducing different levels of abstraction and staged design flows. However, across the independently optimised design steps, overhead and inefficiency can accumulate in the resulting overall design. Performing design-specific optimisation from a more global viewpoint requires more time due to the larger search space, but has the potential to provide solutions with improved performance. In this work, a fully-automated, multi-objective (MO) EDA flow is introduced to address this issue. It specifically tunes drive strength mapping, prior to physical implementation, through multi-objective population-based search algorithms. Designs are evaluated with respect to their power, performance and area (PPA). The proposed approach is aimed at digital circuit optimisation at the block level, where it is capable of expanding the design space and offers a set of trade-off solutions for different case-specific utilisation. We have applied the proposed MOEDA framework to ISCAS-85 and EPFL benchmark circuits using a commercial 65 nm standard cell library. The experimental results demonstrate how the MOEDA flow enhances the solutions initially generated by the standard digital flow, and how a significant improvement in PPA metrics is achieved at the same time.

    Multi-objective digital circuit block optimisation based on cell mapping in an industrial electronic design automation flow

    Modern electronic design automation (EDA) tools can handle the complexity of state‐of‐the‐art electronic systems by decomposing them into smaller blocks or cells, introducing different levels of abstraction and staged design flows. However, across the independently optimised design steps, overheads and inefficiencies can accumulate in the resulting overall design. Performing design‐specific optimisation from a more global viewpoint requires more time due to the larger search space but has the potential to provide solutions with improved performance. In this work, a fully‐automated, multi‐objective (MO) EDA flow is introduced to address this issue. It specifically tunes drive strength mapping, prior to physical implementation, through MO population‐based search algorithms. Designs are evaluated with respect to their power, performance and area (PPA). The proposed approach is aimed at digital circuit optimisation at the block level, where it is capable of expanding the design space and offers a set of trade‐off solutions for different case‐specific utilisation. We have applied the proposed multi‐objective electronic design automation (MOEDA) framework to ISCAS‐85 and EPFL benchmark circuits by using a commercial 65 nm standard cell library. The experimental results demonstrate how the MOEDA flow enhances the solutions initially generated by the standard digital flow and how a significant improvement in PPA metrics is achieved at the same time.
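
The "set of trade-off solutions" mentioned above is usually understood as a Pareto front over the PPA metrics. The following is a minimal sketch of Pareto filtering, assuming every metric is to be minimised (performance is represented by delay). The design names and metric values are invented for illustration and do not come from the paper.

```python
def dominates(a, b):
    """True if design a is no worse than b on every metric and strictly
    better on at least one (all metrics are minimised)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(designs):
    """Keep only the designs that no other design dominates."""
    return {
        name: metrics
        for name, metrics in designs.items()
        if not any(
            dominates(other, metrics)
            for other_name, other in designs.items()
            if other_name != name
        )
    }


# Hypothetical (power in mW, delay in ns, area in um^2) per candidate mapping.
designs = {
    "A": (1.0, 2.0, 100.0),
    "B": (1.2, 1.5, 110.0),
    "C": (1.3, 2.1, 120.0),  # worse than A on all three metrics
    "D": (0.9, 2.5, 95.0),
}
front = pareto_front(designs)
```

Here `front` keeps A, B and D: each is best on at least one metric, while C is dominated by A and is discarded.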

    Machine Learning Techniques for Performance Prediction and Diagnosis of VLSI Designs

    As the cost of scaling down the manufacturing process of integrated circuits grows larger and its performance gains become smaller, designs must grow in complexity in order to achieve expected performance improvements. As this complexity grows, the development of automation tools for design, validation, and debug is critical. The number of machine learning-based techniques aiming to improve available tools has grown rapidly in recent years, as machine learning has shown an extraordinary capability for extracting knowledge from data and handling complicated non-linear behaviors, which makes it the best approach to mimic a human manual process among mathematical or algorithmic options. The work presented in this dissertation aims to evaluate the application of machine learning techniques in two different areas of the integrated circuit design process: pre-routing timing prediction and performance debugging of microprocessor cores. The strategy proposed for pre-route timing prediction is based on machine learning models that predict the post-routing timing using only placed, but unrouted, circuit databases. This strategy prevents over-design due to pessimistic timing estimations, and it saves time by reducing the need for multiple design iterations. Such iterations arise when inaccurate timing estimations guide circuit optimizations such as gate resizing, logic restructuring, or threshold voltage assignment, leading to design violations once routing is executed. The obtained results show that our models achieve a prediction quality on par with a commercial sign-off static timing analysis tool, with a 3× speedup. For the task of performance debugging of microprocessor cores, we focus on bugs that affect the generation-by-generation performance improvement in new designs. This task is very challenging due to the lack of an accurate golden performance model, unlike its functional counterpart. 
In addition, there is limited visibility of the performance at intermediate steps of the design, and overall, the debugging infrastructure is lacking, which makes this problem even more challenging. Currently this process is executed in a highly manual manner, which requires large amounts of time to fully characterize a bug. Therefore, automated techniques for performance debugging are essential to keep up the performance gains obtained by new microarchitectural designs. In this dissertation, we focus on detecting the presence of a performance bug and localizing the microarchitectural unit in which the bug might be; more detailed debugging is left for future work. Our proposed techniques achieved up to 91.5% bug detection accuracy, and up to 98% top-3 (out of 16 possible) bug localization accuracy, on bugs with average IPC impact > 1%.

    Logical simulation of communication subsystem for Universal Serial Bus (USB)

    The primary purpose of this thesis was to design a logical simulation of a communication sub-block for the effective communication of digital data between the host and the peripheral devices. The module designed is a Serial Interface Engine in the Universal Serial Bus that controls the flow of data between the host and the peripheral devices, with emphasis on the study of timing and control signals and their practical aspects. In this study an attempt was made to realize data communication in hardware using the Verilog Hardware Description Language, which is supported by most popular logic synthesis tools. Various techniques such as Cyclic Redundancy Checks, bit-stuffing and Non-Return-to-Zero encoding are implemented in the design to provide enhanced performance of the module.
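
Bit-stuffing, one of the techniques named above, can be illustrated in a few lines. In USB, the transmitter inserts a zero after six consecutive ones, and the receiver discards it. The sketch below is in Python purely for illustration (the thesis implements its design in Verilog); the function names `bit_stuff` and `bit_unstuff` are hypothetical.

```python
def bit_stuff(bits, run=6):
    """Insert a 0 after every `run` consecutive 1s (transmitter side)."""
    out = []
    ones = 0
    for bit in bits:
        out.append(bit)
        if bit == 1:
            ones += 1
            if ones == run:
                out.append(0)  # stuffed bit forces a transition on the line
                ones = 0
        else:
            ones = 0
    return out


def bit_unstuff(bits, run=6):
    """Drop the bit that follows every `run` consecutive 1s (receiver side)."""
    out = []
    ones = 0
    skip_next = False
    for bit in bits:
        if skip_next:
            skip_next = False  # this is the stuffed 0; discard it
            ones = 0
            continue
        out.append(bit)
        if bit == 1:
            ones += 1
            if ones == run:
                skip_next = True
        else:
            ones = 0
    return out


payload = [1] * 8
stuffed = bit_stuff(payload)       # [1, 1, 1, 1, 1, 1, 0, 1, 1]
recovered = bit_unstuff(stuffed)   # equals payload again
```

The stuffed zero guarantees regular signal transitions, which the receiver needs to keep its clock synchronized with the transmitter.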