330 research outputs found

    Timing and Area Optimization for Standard-Cell VLSI Circuit Design

    Get PDF
    Coordinated Science Laboratory was formerly known as Control Systems LaboratoryJoint Services Electronics Program / N00014-90-J-127

    Simultaneous slack budgeting and retiming for synchronous circuits optimization

    Full text link
    Abstract- With the challenges of growing functionality and scaling chip size, the possible performance improvements should be considered in the earlier IC design stages, which gives more freedom to the later optimization. Potential slack as an effective metric of possible performance improvements is considered in this work which, as far as we known, is the first work that maximizes the potential slack by retiming for synchronous sequential circuit. A simultaneous slack budgeting and incremental retiming algorithm is proposed for maximizing potential slack. The overall slack budget is optimized by relocating the FFs iteratively with the MIS-based slack estimation. Compared with the potential slack of a well-known min-period retiming, our algorithm improves potential slack averagely 19.6 % without degrading the circuit performance in reasonable runtime. Furthermore, at the expense of a small amount of timing performance, 0.52 % and 2.08%, the potential slack is increased averagely by 19.89 % and 28.16 % separately, which give a hint of the tradeoff between the timing performance and the slack budget.

    Fast algorithms for retiming large digital circuits

    Get PDF
    The increasing complexity of VLSI systems and shrinking time to market requirements demand good optimization tools capable of handling large circuits. Retiming is a powerful transformation that preserves functionality, and can be used to optimize sequential circuits for a wide range of objective functions by judiciously relocating the memory elements. Leiserson and Saxe, who introduced the concept, presented algorithms for period optimization (minperiod retiming) and area optimization (minarea retiming). The ASTRA algorithm proposed an alternative view of retiming using the equivalence between retiming and clock skew optimization;The first part of this thesis defines the relationship between the Leiserson-Saxe and the ASTRA approaches and utilizes it for efficient minarea retiming of large circuits. The new algorithm, Minaret, uses the same linear program formulation as the Leiserson-Saxe approach. The underlying philosophy of the ASTRA approach is incorporated to reduce the number of variables and constraints in this linear program. This allows minarea retiming of circuits with over 56,000 gates in under fifteen minutes;The movement of flip-flops in control logic changes the state encoding of finite state machines, requiring the preservation of initial (reset) states. In the next part of this work the problem of minimizing the number of flip-flops in control logic subject to a specified clock period and with the guarantee of an equivalent initial state, is formulated as a mixed integer linear program. Bounds on the retiming variables are used to guarantee an equivalent initial state in the retimed circuit. These bounds lead to a simple method for calculating an equivalent initial state for the retimed circuit;The transparent nature of level sensitive latches enables level-clocked circuits to operate faster and require less area. However, this transparency makes the operation of level-clocked circuits very complex, and optimization of level-clocked circuits is a difficult task. This thesis also presents efficient algorithms for retiming large level-clocked circuits. The relationship between retiming and clock skew optimization for level-clocked circuits is defined and utilized to develop efficient retiming algorithms for period and area optimization. Using these algorithms a circuit with 56,000 gates could be retimed for minimum period in under twenty seconds and for minimum area in under 1.5 hours

    Timing Closure in Chip Design

    Get PDF
    Achieving timing closure is a major challenge to the physical design of a computer chip. Its task is to find a physical realization fulfilling the speed specifications. In this thesis, we propose new algorithms for the key tasks of performance optimization, namely repeater tree construction; circuit sizing; clock skew scheduling; threshold voltage optimization and plane assignment. Furthermore, a new program flow for timing closure is developed that integrates these algorithms with placement and clocktree construction. For repeater tree construction a new algorithm for computing topologies, which are later filled with repeaters, is presented. To this end, we propose a new delay model for topologies that not only accounts for the path lengths, as existing approaches do, but also for the number of bifurcations on a path, which introduce extra capacitance and thereby delay. In the extreme cases of pure power optimization and pure delay optimization the optimum topologies regarding our delay model are minimum Steiner trees and alphabetic code trees with the shortest possible path lengths. We presented a new, extremely fast algorithm that scales seamlessly between the two opposite objectives. For special cases, we prove the optimality of our algorithm. The efficiency and effectiveness in practice is demonstrated by comprehensive experimental results. The task of circuit sizing is to assign millions of small elementary logic circuits to elements from a discrete set of logically equivalent, predefined physical layouts such that power consumption is minimized and all signal paths are sufficiently fast. In this thesis we develop a fast heuristic approach for global circuit sizing, followed by a local search into a local optimum. Our algorithms use, in contrast to existing approaches, the available discrete layout choices and accurate delay models with slew propagation. The global approach iteratively assigns slew targets to all source pins of the chip and chooses a discrete layout of minimum size preserving the slew targets. In comprehensive experiments on real instances, we demonstrate that the worst path delay is within 7% of its lower bound on average after a few iterations. The subsequent local search reduces this gap to 2% on average. Combining global and local sizing we are able to size more than 5.7 million circuits within 3 hours. For the clock skew scheduling problem we develop the first algorithm with a strongly polynomial running time for the cycle time minimization in the presence of different cycle times and multi-cycle paths. In practice, an iterative local search method is much more efficient. We prove that this iterative method maximizes the worst slack, even when restricting the feasible schedule to certain time intervals. Furthermore, we enhance the iterative local approach to determine a lexicographically optimum slack distribution. The clock skew scheduling problem is then generalized to allow for simultaneous data path optimization. In fact, this is a time-cost tradeoff problem. We developed the first combinatorial algorithm for computing time-cost tradeoff curves in graphs that may contain cycles. Starting from the lowest-cost solution, the algorithm iteratively computes a descent direction by a minimum cost flow computation. The maximum feasible step length is then determined by a minimum ratio cycle computation. This approach can be used in chip design for several optimization tasks, e.g. threshold voltage optimization or plane assignment. Finally, the optimization routines are combined into a timing closure flow. Here, the global placement is alternated with global performance optimization. Netweights are used to penalize the length of critical nets during placement. After the global phase, the performance is improved further by applying more comprehensive optimization routines on the most critical paths. In the end, the clock schedule is optimized and clocktrees are inserted. Computational results of the design flow are obtained on real-world computer chips

    Physical Design Methodologies for Low Power and Reliable 3D ICs

    Get PDF
    As the semiconductor industry struggles to maintain its momentum down the path following the Moore's Law, three dimensional integrated circuit (3D IC) technology has emerged as a promising solution to achieve higher integration density, better performance, and lower power consumption. However, despite its significant improvement in electrical performance, 3D IC presents several serious physical design challenges. In this dissertation, we investigate physical design methodologies for 3D ICs with primary focus on two areas: low power 3D clock tree design, and reliability degradation modeling and management. Clock trees are essential parts for digital system which dissipate a large amount of power due to high capacitive loads. The majority of existing 3D clock tree designs focus on minimizing the total wire length, which produces sub-optimal results for power optimization. In this dissertation, we formulate a 3D clock tree design flow which directly optimizes for clock power. Besides, we also investigate the design methodology for clock gating a 3D clock tree, which uses shutdown gates to selectively turn off unnecessary clock activities. Different from the common assumption in 2D ICs that shutdown gates are cheap thus can be applied at every clock node, shutdown gates in 3D ICs introduce additional control TSVs, which compete with clock TSVs for placement resources. We explore the design methodologies to produce the optimal allocation and placement for clock and control TSVs so that the clock power is minimized. We show that the proposed synthesis flow saves significant clock power while accounting for available TSV placement area. Vertical integration also brings new reliability challenges including TSV's electromigration (EM) and several other reliability loss mechanisms caused by TSV-induced stress. These reliability loss models involve complex inter-dependencies between electrical and thermal conditions, which have not been investigated in the past. In this dissertation we set up an electrical/thermal/reliability co-simulation framework to capture the transient of reliability loss in 3D ICs. We further derive and validate an analytical reliability objective function that can be integrated into the 3D placement design flow. The reliability aware placement scheme enables co-design and co-optimization of both the electrical and reliability property, thus improves both the circuit's performance and its lifetime. Our electrical/reliability co-design scheme avoids unnecessary design cycles or application of ad-hoc fixes that lead to sub-optimal performance. Vertical integration also enables stacking DRAM on top of CPU, providing high bandwidth and short latency. However, non-uniform voltage fluctuation and local thermal hotspot in CPU layers are coupled into DRAM layers, causing a non-uniform bit-cell leakage (thereby bit flip) distribution. We propose a performance-power-resilience simulation framework to capture DRAM soft error in 3D multi-core CPU systems. In addition, a dynamic resilience management (DRM) scheme is investigated, which adaptively tunes CPU's operating points to adjust DRAM's voltage noise and thermal condition during runtime. The DRM uses dynamic frequency scaling to achieve a resilience borrow-in strategy, which effectively enhances DRAM's resilience without sacrificing performance. The proposed physical design methodologies should act as important building blocks for 3D ICs and push 3D ICs toward mainstream acceptance in the near future

    Voltage stacking for near/sub-threshold operation

    Get PDF

    Algorithmic techniques for nanometer VLSI design and manufacturing closure

    Get PDF
    As Very Large Scale Integration (VLSI) technology moves to the nanoscale regime, design and manufacturing closure becomes very difficult to achieve due to increasing chip and power density. Imperfections due to process, voltage and temperature variations aggravate the problem. Uncertainty in electrical characteristic of individual device and wire may cause significant performance deviations or even functional failures. These impose tremendous challenges to the continuation of Moore's law as well as the growth of semiconductor industry. Efforts are needed in both deterministic design stage and variation-aware design stage. This research proposes various innovative algorithms to address both stages for obtaining a design with high frequency, low power and high robustness. For deterministic optimizations, new buffer insertion and gate sizing techniques are proposed. For variation-aware optimizations, new lithography-driven and post-silicon tuning-driven design techniques are proposed. For buffer insertion, a new slew buffering formulation is presented and is proved to be NP-hard. Despite this, a highly efficient algorithm which runs > 90x faster than the best alternatives is proposed. The algorithm is also extended to handle continuous buffer locations and blockages. For gate sizing, a new algorithm is proposed to handle discrete gate library in contrast to unrealistic continuous gate library assumed by most existing algorithms. Our approach is a continuous solution guided dynamic programming approach, which integrates the high solution quality of dynamic programming with the short runtime of rounding continuous solution. For lithography-driven optimization, the problem of cell placement considering manufacturability is studied. Three algorithms are proposed to handle cell flipping and relocation. They are based on dynamic programming and graph theoretic approaches, and can provide different tradeoff between variation reduction and wire- length increase. For post-silicon tuning-driven optimization, the problem of unified adaptivity optimization on logical and clock signal tuning is studied, which enables us to significantly save resources. The new algorithm is based on a novel linear programming formulation which is solved by an advanced robust linear programming technique. The continuous solution is then discretized using binary search accelerated dynamic programming, batch based optimization, and Latin Hypercube sampling based fast simulation
    corecore