178 research outputs found

    Physical design of USB1.1

    Get PDF
    In earlier days, interfacing peripheral devices to host computer has a big problematic. There existed so many different kinds’ ports like serial port, parallel port, PS/2 etc. And their use restricts many situations, Such as no hot-pluggability and involuntary configuration. There are very less number of methods to connect the peripheral devices to host computer. The main reason that Universal Serial Bus was implemented to provide an additional benefits compared to earlier interfacing ports. USB is designed to allow many peripheral be connecting using single standardize interface. It provides an expandable fast, cost effective, hot-pluggable plug and play serial hardware interface that makes life of computer user easier allowing them to plug different devices to into USB port and have them configured automatically. In this thesis demonstrated the USB v1.1 architecture part in briefly and generated gate level net list form RTL code by applying the different constraints like timing, area and power. By applying the various types design constraints so that the performance was improved by 30%. And then it implemented in physically by using SoC encounter EDI system, estimation of chip size, power analysis and routing the clock signal to all flip-flops presented in the design. To reduce the clock switching power implemented register clustering algorithm (DBSCAN). In this design implementation TSMC 180nm technology library is used

    Physical Design Methodologies for Low Power and Reliable 3D ICs

    Get PDF
    As the semiconductor industry struggles to maintain its momentum down the path following the Moore's Law, three dimensional integrated circuit (3D IC) technology has emerged as a promising solution to achieve higher integration density, better performance, and lower power consumption. However, despite its significant improvement in electrical performance, 3D IC presents several serious physical design challenges. In this dissertation, we investigate physical design methodologies for 3D ICs with primary focus on two areas: low power 3D clock tree design, and reliability degradation modeling and management. Clock trees are essential parts for digital system which dissipate a large amount of power due to high capacitive loads. The majority of existing 3D clock tree designs focus on minimizing the total wire length, which produces sub-optimal results for power optimization. In this dissertation, we formulate a 3D clock tree design flow which directly optimizes for clock power. Besides, we also investigate the design methodology for clock gating a 3D clock tree, which uses shutdown gates to selectively turn off unnecessary clock activities. Different from the common assumption in 2D ICs that shutdown gates are cheap thus can be applied at every clock node, shutdown gates in 3D ICs introduce additional control TSVs, which compete with clock TSVs for placement resources. We explore the design methodologies to produce the optimal allocation and placement for clock and control TSVs so that the clock power is minimized. We show that the proposed synthesis flow saves significant clock power while accounting for available TSV placement area. Vertical integration also brings new reliability challenges including TSV's electromigration (EM) and several other reliability loss mechanisms caused by TSV-induced stress. These reliability loss models involve complex inter-dependencies between electrical and thermal conditions, which have not been investigated in the past. In this dissertation we set up an electrical/thermal/reliability co-simulation framework to capture the transient of reliability loss in 3D ICs. We further derive and validate an analytical reliability objective function that can be integrated into the 3D placement design flow. The reliability aware placement scheme enables co-design and co-optimization of both the electrical and reliability property, thus improves both the circuit's performance and its lifetime. Our electrical/reliability co-design scheme avoids unnecessary design cycles or application of ad-hoc fixes that lead to sub-optimal performance. Vertical integration also enables stacking DRAM on top of CPU, providing high bandwidth and short latency. However, non-uniform voltage fluctuation and local thermal hotspot in CPU layers are coupled into DRAM layers, causing a non-uniform bit-cell leakage (thereby bit flip) distribution. We propose a performance-power-resilience simulation framework to capture DRAM soft error in 3D multi-core CPU systems. In addition, a dynamic resilience management (DRM) scheme is investigated, which adaptively tunes CPU's operating points to adjust DRAM's voltage noise and thermal condition during runtime. The DRM uses dynamic frequency scaling to achieve a resilience borrow-in strategy, which effectively enhances DRAM's resilience without sacrificing performance. The proposed physical design methodologies should act as important building blocks for 3D ICs and push 3D ICs toward mainstream acceptance in the near future

    CAD methodologies for low power and reliable 3D ICs

    Get PDF
    The main objective of this dissertation is to explore and develop computer-aided-design (CAD) methodologies and optimization techniques for reliability, timing performance, and power consumption of through-silicon-via(TSV)-based and monolithic 3D IC designs. The 3D IC technology is a promising answer to the device scaling and interconnect problems that industry faces today. Yet, since multiple dies are stacked vertically in 3D ICs, new problems arise such as thermal, power delivery, and so on. New physical design methodologies and optimization techniques should be developed to address the problems and exploit the design freedom in 3D ICs. Towards the objective, this dissertation includes four research projects. The first project is on the co-optimization of traditional design metrics and reliability metrics for 3D ICs. It is well known that heat removal and power delivery are two major reliability concerns in 3D ICs. To alleviate thermal problem, two possible solutions have been proposed: thermal-through-silicon-vias (T-TSVs) and micro-fluidic-channel (MFC) based cooling. For power delivery, a complex power distribution network is required to deliver currents reliably to all parts of the 3D IC while suppressing the power supply noise to an acceptable level. However, these thermal and power networks pose major challenges in signal routability and congestion. In this project, a co-optimization methodology for signal, power, and thermal interconnects in 3D ICs is presented. The goal of the proposed approach is to improve signal, thermal, and power noise metrics and to provide fast and accurate design space explorations for early design stages. The second project is a study on 3D IC partition. For a 3D IC, the target circuit needs to be partitioned into multiple parts then mapped onto the dies. The partition style impacts design quality such as footprint, wirelength, timing, and so on. In this project, the design methodologies of 3D ICs with different partition styles are demonstrated. For the LEON3 multi-core microprocessor, three partitioning styles are compared: core-level, block-level, and gate-level. The design methodologies for such partitioning styles and their implications on the physical layout are discussed. Then, to perform timing optimizations for 3D ICs, two timing constraint generation methods are demonstrated that lead to different design quality. The third project is on the buffer insertion for timing optimization of 3D ICs. For high performance 3D ICs, it is crucial to perform thorough timing optimizations. Among timing optimization techniques, buffer insertion is known to be the most effective way. The TSVs have a large parasitic capacitance that increases the signal slew and the delay on the downstream. In this project, a slew-aware buffer insertion algorithm is developed that handles full 3D nets and considers TSV parasitics and slew effects on delay. Compared with the well-known van Ginneken algorithm and a commercial tool, the proposed algorithm finds buffering solutions with lower delay values and acceptable runtime overhead. The last project is on the ultra-high-density logic designs for monolithic 3D ICs. The nano-scale 3D interconnects available in monolithic 3D IC technology enable ultra-high-density device integration at the individual transistor-level. The benefits and challenges of monolithic 3D integration technology for logic designs are investigated. First, a 3D standard cell library for transistor-level monolithic 3D ICs is built and their timing and power behavior are characterized. Then, various interconnect options for monolithic 3D ICs that improve design quality are explored. Next, timing-closed, full-chip GDSII layouts are built and iso-performance power comparisons with 2D IC designs are performed. Important design metrics such as area, wirelength, timing, and power consumption are compared among transistor-level monolithic 3D, gate-level monolithic 3D, TSV-based 3D, and traditional 2D designs.PhDCommittee Chair: Lim, Sung Kyu; Committee Member: Bakir, Muhannad; Committee Member: Kim, Hyesoon; Committee Member: Lee, Hsien-Hsin; Committee Member: Mukhopadhyay, Saiba

    Timing Closure in Chip Design

    Get PDF
    Achieving timing closure is a major challenge to the physical design of a computer chip. Its task is to find a physical realization fulfilling the speed specifications. In this thesis, we propose new algorithms for the key tasks of performance optimization, namely repeater tree construction; circuit sizing; clock skew scheduling; threshold voltage optimization and plane assignment. Furthermore, a new program flow for timing closure is developed that integrates these algorithms with placement and clocktree construction. For repeater tree construction a new algorithm for computing topologies, which are later filled with repeaters, is presented. To this end, we propose a new delay model for topologies that not only accounts for the path lengths, as existing approaches do, but also for the number of bifurcations on a path, which introduce extra capacitance and thereby delay. In the extreme cases of pure power optimization and pure delay optimization the optimum topologies regarding our delay model are minimum Steiner trees and alphabetic code trees with the shortest possible path lengths. We presented a new, extremely fast algorithm that scales seamlessly between the two opposite objectives. For special cases, we prove the optimality of our algorithm. The efficiency and effectiveness in practice is demonstrated by comprehensive experimental results. The task of circuit sizing is to assign millions of small elementary logic circuits to elements from a discrete set of logically equivalent, predefined physical layouts such that power consumption is minimized and all signal paths are sufficiently fast. In this thesis we develop a fast heuristic approach for global circuit sizing, followed by a local search into a local optimum. Our algorithms use, in contrast to existing approaches, the available discrete layout choices and accurate delay models with slew propagation. The global approach iteratively assigns slew targets to all source pins of the chip and chooses a discrete layout of minimum size preserving the slew targets. In comprehensive experiments on real instances, we demonstrate that the worst path delay is within 7% of its lower bound on average after a few iterations. The subsequent local search reduces this gap to 2% on average. Combining global and local sizing we are able to size more than 5.7 million circuits within 3 hours. For the clock skew scheduling problem we develop the first algorithm with a strongly polynomial running time for the cycle time minimization in the presence of different cycle times and multi-cycle paths. In practice, an iterative local search method is much more efficient. We prove that this iterative method maximizes the worst slack, even when restricting the feasible schedule to certain time intervals. Furthermore, we enhance the iterative local approach to determine a lexicographically optimum slack distribution. The clock skew scheduling problem is then generalized to allow for simultaneous data path optimization. In fact, this is a time-cost tradeoff problem. We developed the first combinatorial algorithm for computing time-cost tradeoff curves in graphs that may contain cycles. Starting from the lowest-cost solution, the algorithm iteratively computes a descent direction by a minimum cost flow computation. The maximum feasible step length is then determined by a minimum ratio cycle computation. This approach can be used in chip design for several optimization tasks, e.g. threshold voltage optimization or plane assignment. Finally, the optimization routines are combined into a timing closure flow. Here, the global placement is alternated with global performance optimization. Netweights are used to penalize the length of critical nets during placement. After the global phase, the performance is improved further by applying more comprehensive optimization routines on the most critical paths. In the end, the clock schedule is optimized and clocktrees are inserted. Computational results of the design flow are obtained on real-world computer chips

    The impact of design techniques in the reduction of power consumption of SoCs Multimedia

    Get PDF
    Orientador: Guido Costa Souza de AraújoDissertação (mestrado) - Universidade Estadual de Campinas, Instituto de ComputaçãoResumo: A indústria de semicondutores sempre enfrentou fortes demandas em resolver problema de dissipação de calor e reduzir o consumo de energia em dispositivos. Esta tendência tem sido intensificada nos últimos anos com o movimento de sustentabilidade ambiental. A concepção correta de um sistema eletrônico de baixo consumo de energia é um problema de vários níveis de complexidade e exige estratégias sistemáticas na sua construção. Fora disso, a adoção de qualquer técnica de redução de energia sempre está vinculada com objetivos especiais e provoca alguns impactos no projeto. Apesar dos projetistas conheçam bem os impactos de forma qualitativa, as detalhes quantitativas ainda são incógnitas ou apenas mantidas dentro do 'know-how' das empresas. Neste trabalho, de acordo com resultados experimentais baseado num plataforma de SoC1 industrial, tentamos quantificar os impactos derivados do uso de técnicas de redução de consumo de energia. Nos concentramos em relacionar o fator de redução de energia de cada técnica aos impactos em termo de área, desempenho, esforço de implementação e verificação. Na ausência desse tipo de dados, que relacionam o esforço de engenharia com as metas de consumo de energia, incertezas e atrasos serão frequentes no cronograma de projeto. Esperamos que este tipo de orientações possam ajudar/guiar os arquitetos de projeto em selecionar as técnicas adequadas para reduzir o consumo de energia dentro do alcance de orçamento e cronograma de projetoAbstract: The semiconductor industry has always faced strong demands to solve the problem of heat dissipation and reduce the power consumption in electronic devices. This trend has been increased in recent years with the action of environmental sustainability. The correct conception of an electronic system for low power consumption is an issue with multiple levels of complexities and requires systematic approaches in its construction. However, the adoption of any technique for reducing the power consumption is always linked with some specific goals and causes some impacts on the project. Although the designers know well that these impacts can affect the design in a quality aspect, the quantitative details are still unkown or just be kept inside the company's know-how. In this work, according to the experimental results based on an industrial SoC2 platform, we try to quantify the impacts of the use of low power techniques. We will relate the power reduction factor of each technique to the impact in terms of area, performance, implementation and verification effort. In the absence of such data, which relates the engineering effort to the goals of power consumption, uncertainties and delays are frequent. We hope that such guidelines can help/guide the project architects in selecting the appropriate techniques to reduce the power consumption within the limit of budget and project scheduleMestradoCiência da ComputaçãoMestre em Ciência da Computaçã

    FinFET Cell Library Design and Characterization

    Get PDF
    abstract: Modern-day integrated circuits are very capable, often containing more than a billion transistors. For example, the Intel Ivy Bridge 4C chip has about 1.2 billion transistors on a 160 mm2 die. Designing such complex circuits requires automation. Therefore, these designs are made with the help of computer aided design (CAD) tools. A major part of this custom design flow for application specific integrated circuits (ASIC) is the design of standard cell libraries. Standard cell libraries are a collection of primitives from which the automatic place and route (APR) tools can choose a collection of cells and implement the design that is being put together. To operate efficiently, the CAD tools require multiple views of each cell in the standard cell library. This data is obtained by characterizing the standard cell libraries and compiling the results in formats that the tools can easily understand and utilize. My thesis focusses on the design and characterization of one such standard cell library in the ASAP7 7 nm predictive design kit (PDK). The complete design flow, starting from the choice of the cell architecture, design of the cell layouts and the various decisions made in that process to obtain optimum results, to the characterization of those cells using the Liberate tool provided by Cadence design systems Inc., is discussed in this thesis. The end results of the characterized library are used in the APR of a few open source register-transfer logic (RTL) projects and the efficiency of the library is demonstrated.Dissertation/ThesisMasters Thesis Computer Engineering 201

    Algorithmic techniques for nanometer VLSI design and manufacturing closure

    Get PDF
    As Very Large Scale Integration (VLSI) technology moves to the nanoscale regime, design and manufacturing closure becomes very difficult to achieve due to increasing chip and power density. Imperfections due to process, voltage and temperature variations aggravate the problem. Uncertainty in electrical characteristic of individual device and wire may cause significant performance deviations or even functional failures. These impose tremendous challenges to the continuation of Moore's law as well as the growth of semiconductor industry. Efforts are needed in both deterministic design stage and variation-aware design stage. This research proposes various innovative algorithms to address both stages for obtaining a design with high frequency, low power and high robustness. For deterministic optimizations, new buffer insertion and gate sizing techniques are proposed. For variation-aware optimizations, new lithography-driven and post-silicon tuning-driven design techniques are proposed. For buffer insertion, a new slew buffering formulation is presented and is proved to be NP-hard. Despite this, a highly efficient algorithm which runs > 90x faster than the best alternatives is proposed. The algorithm is also extended to handle continuous buffer locations and blockages. For gate sizing, a new algorithm is proposed to handle discrete gate library in contrast to unrealistic continuous gate library assumed by most existing algorithms. Our approach is a continuous solution guided dynamic programming approach, which integrates the high solution quality of dynamic programming with the short runtime of rounding continuous solution. For lithography-driven optimization, the problem of cell placement considering manufacturability is studied. Three algorithms are proposed to handle cell flipping and relocation. They are based on dynamic programming and graph theoretic approaches, and can provide different tradeoff between variation reduction and wire- length increase. For post-silicon tuning-driven optimization, the problem of unified adaptivity optimization on logical and clock signal tuning is studied, which enables us to significantly save resources. The new algorithm is based on a novel linear programming formulation which is solved by an advanced robust linear programming technique. The continuous solution is then discretized using binary search accelerated dynamic programming, batch based optimization, and Latin Hypercube sampling based fast simulation

    Hardware implementation of a low-power two-dimensional discrete cosine transform

    Get PDF
    Thesis (M.Eng.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2002.Includes bibliographical references (p. 143-144).This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections.In this project, a JPEG compliant, low-power dedicated, two-dimensional, Discrete Cosine Transform (DCT) core meeting all IBM Softcore requirements is developed. Power is optimized completely at the algorithmic, architectural, and logic levels. The architecture uses row-column decomposition of a fast 1-D algorithm implemented with distributed arithmetic. It features clock gating schemes as well as power-aware schemes that utilize input correlations to dynamically scale down power consumption. This is done by eliminating glitching in the ROM Accumulate (RAC) units to effectively stop unnecessary computation. The core is approximately 180K transistors, runs at a maximum of 100MHz, is synthesized to a .18[mu]m double-well CMOS technology with a 1.8V power supply, and consumes between 63 and 87 mW of power at 100MHz depending on the image data. The thesis explores the algorithmic evaluations, architectural design, development of the C and VHDL models, verification methods, synthesis operations, static timing analysis, design for test compliance, power analysis, and performance comparisons for the development of the core. The work has been completed in the ASIC Digital Cores I department of the IBM Microelectronics Division in Burlington, Vermont as part of the third assignment in the MIT VI-A program.by Rajul R. Shah.M.Eng

    Clock routing for high performance microprocessor designs.

    Get PDF
    Tian, Haitong.Chinese abstract is on unnumbered page.Thesis (M.Phil.)--Chinese University of Hong Kong, 2011.Includes bibliographical references (p. 65-74).Abstracts in English and Chinese.Abstract --- p.iAcknowledgement --- p.iiiChapter 1 --- Introduction --- p.1Chapter 1.1 --- Motivations --- p.1Chapter 1.2 --- Our Contributions --- p.2Chapter 1.3 --- Organization of the Thesis --- p.3Chapter 2 --- Background Study --- p.4Chapter 2.1 --- Traditional Clock Routing Problem --- p.4Chapter 2.2 --- Tree-Based Clock Routing Algorithms --- p.5Chapter 2.2.1 --- Clock Routing Using H-tree --- p.5Chapter 2.2.2 --- Method of Means and Medians(MMM) --- p.6Chapter 2.2.3 --- Geometric Matching Algorithm (GMA) --- p.8Chapter 2.2.4 --- Exact Zero-Skew Algorithm --- p.9Chapter 2.2.5 --- Deferred Merge Embedding (DME) --- p.10Chapter 2.2.6 --- Boundary Merging and Embedding (BME) Algorithm --- p.14Chapter 2.2.7 --- Planar Clock Routing Algorithm --- p.17Chapter 2.2.8 --- Useful-skew Tree Algorithm --- p.18Chapter 2.3 --- Non-Tree Clock Distribution Networks --- p.19Chapter 2.3.1 --- Grid (Mesh) Structure --- p.20Chapter 2.3.2 --- Spine Structure --- p.20Chapter 2.3.3 --- Hybrid Structure --- p.21Chapter 2.4 --- Post-grid Clock Routing Problem --- p.22Chapter 2.5 --- Limitations of the Previous Work --- p.24Chapter 3 --- Post-Grid Clock Routing Problem --- p.26Chapter 3.1 --- Introduction --- p.26Chapter 3.2 --- Problem Definition --- p.27Chapter 3.3 --- Our Approach --- p.30Chapter 3.3.1 --- Delay-driven Path Expansion Algorithm --- p.31Chapter 3.3.2 --- Pre-processing to Connect Critical ports --- p.34Chapter 3.3.3 --- Post-processing to Reduce Capacitance --- p.36Chapter 3.4 --- Experimental Results --- p.39Chapter 3.4.1 --- Experiment Setup --- p.39Chapter 3.4.2 --- Validations of the Delay and Slew Estimation --- p.39Chapter 3.4.3 --- Comparisons with the Tree Grow (TG) Approach --- p.41Chapter 3.4.4 --- Lowest Achievable Delays --- p.42Chapter 3.4.5 --- Simulation Results --- p.42Chapter 4 --- Non-tree Based Post-Grid Clock Routing Problem --- p.44Chapter 4.1 --- Introduction --- p.44Chapter 4.2 --- Handling Ports with Large Load Capacitances --- p.46Chapter 4.2.1 --- Problem Ports Identification --- p.47Chapter 4.2.2 --- Non-Tree Construction --- p.47Chapter 4.2.3 --- Wire Link Selection --- p.48Chapter 4.3 --- Path Expansion in Non-tree Algorithm --- p.51Chapter 4.4 --- Limitations of the Non-tree Algorithm --- p.51Chapter 4.5 --- Experimental Results --- p.51Chapter 4.5.1 --- Experiment Setup --- p.51Chapter 4.5.2 --- Validations of the Delay and Slew Estimation --- p.52Chapter 4.5.3 --- Lowest Achievable Delays --- p.53Chapter 4.5.4 --- Results on New Benchmarks --- p.53Chapter 4.5.5 --- Simulation Results --- p.55Chapter 5 --- Efficient Partitioning-based Extension --- p.57Chapter 5.1 --- Introduction --- p.57Chapter 5.2 --- Partition-based Extension --- p.58Chapter 5.3 --- Experimental Results --- p.61Chapter 5.3.1 --- Experiment Setup --- p.61Chapter 5.3.2 --- Running Time Improvement with Partitioning Technique --- p.61Chapter 6 --- Conclusion --- p.63Bibliography --- p.6
    corecore