13 research outputs found

    Why we need statistical static timing analysis

    Get PDF
    Abstract A

    Variability-driven module selection with joint design time optimization and post-silicon tuning

    Full text link
    Abstract—Increasing delay and power variation are significant chal-lenges to the designers as technology scales to the deep sub-micron (DSM) regime. Traditional module selection techniques in high level synthesis use worst case delay/power information to perform the optimization, and therefore may be too pessimistic such that extra resources are used to guarantee design requirements. Parametric yield, which is defined as the probability of the synthesized hardware meeting the performance/power constraints, can be used to guide design space exploration. The para-metric yield can be effectively improved by combining both design-time variation-aware optimization and post silicon tuning techniques (such as adaptive body biasing (ABB)). In this paper, we propose a module selection algorithm that combines design-time optimization with post-silicon tuning (using ABB) to maximize design yield. A variation-aware module selection algorithm based on efficient performance and power yield gradient computation is developed. The post silicon optimization is formulated as an efficient sequential conic program to determine the optimal body bias distribution, which in turn affects design-time module selection. The experiment results show that significant yield can be achieved compared to traditional worst-case driven module selection technique. To the best of our knowledge, this is the first variability-driven high level synthesis technique that considers post-silicon tuning during design time optimization. 1 I

    Analysis of performance variation in 16nm FinFET FPGA devices

    Get PDF

    A Geometric Programming-Based Worst Case Gate Sizing Method Incorporating Spatial Correlation

    Full text link

    Statistical static timing analysis of nonzero clock skew circuit

    Get PDF
    As microprocessor and ASIC manufacturers continue to push the limits of transistor sizing into the sub-100nm regime, variations in the manufacturing process lead to increased uncertainty about the exact geometry and performance of the resulting devices. Traditional corner-based Static Timing Analysis (STA) assumes worst-case values for process parameters such as transistor channel length and threshold voltage when verifying integrated circuit timing performance. This has become unrea-sonably pessimistic and causes over-design that degrades full-chip performance, wastes engineering effort, and erodes profits while providing negligible yield improvement. Recently, Statistical Static Timing Analysis (SSTA) methods, which model process variations statistically as probability distribution functions (PDFs) rather than deterministically, have emerged to more accurately portray integrated circuit performance. This analysis has been thoroughly performed on traditional zero clock skew circuits where the synchronizing clock signal is assumed to arrive in phase with respect to each register. However, designers will often schedule the clock skew to different registers in order to decrease the minimum clock period of the entire circuit. Clock skew scheduling (CSS) imparts very different timing constraints that are based, in part, on the topology of the circuit. In this thesis, SSTA is applied to nonzero clock skew circuits in order to determine the accuracy improvement relative to their zero skew counterparts, and also to assess how the results of skew scheduling might be impacted with more accurate statistical modeling. For 99.7% timing yield (3 variation), SSTA is observed to improve the accuracy, and therefore increase the timing margin, of nonzero clock skew circuits by up to 2.5x, and on average by 1.3x, the amount seen by zero skew circuits.M.S., Computer Engineering -- Drexel University, 200

    Large-scale simulations of intrinsic parameter fluctuations in nano-scale MOSFETs

    Get PDF
    Intrinsic parameter fluctuations have become a serious obstacle to the continued scaling of MOSFET devices, particularly in the sub-100 nm regime. The increase in intrinsic parameter fluctuations means that simulations on a statistical scale are necessary to capture device parameter distributions. In this work, large-scale simulations of samples of 100,000s of devices are carried out in order to accurately characterise statistical variability of the threshold voltage in a real 35 nm MOSFET. Simulations were performed for the two dominant sources of statistical variability – random discrete dopants (RDD) and line edge roughness (LER). In total ∼400,000 devices have been simulated, taking approximately 500,000 CPU hours (60 CPU years). The results reveal the true shape of the distribution of threshold voltage, which is shown to be positively skewed for random dopants and negatively skewed for line edge roughness. Through further statistical analysis and data mining, techniques for reconstructing the distributions of the threshold voltage are developed. By using these techniques, methods are demonstrated that allow statistical enhancement of random dopant and line edge roughness simulations, thereby reducing the computational expense necessary to accurately characterise their effects. The accuracy of these techniques is analysed and they are further verified against scaled and alternative device architectures. The combined effects of RDD and LER are also investigated and it is demonstrated that the statistical combination of the individual RDD and LER-induced distributions of threshold voltage closely matches that obtained from simulations. By applying the statistical enhancement techniques developed for RDD and LER, it is shown that the computational cost of characterising their effects can be reduced by 1–2 orders of magnitude

    Efficient Monte Carlo Based Methods for Variability Aware Analysis and Optimization of Digital Circuits.

    Full text link
    Process variability is of increasing concern in modern nanometer-scale CMOS. The suitability of Monte Carlo based algorithms for efficient analysis and optimization of digital circuits under variability is explored in this work. Random sampling based Monte Carlo techniques incur high cost of computation, due to the large sample size required to achieve target accuracy. This motivates the need for intelligent sample selection techniques to reduce the number of samples. As these techniques depend on information about the system under analysis, there is a need to tailor the techniques to fit the specific application context. We propose efficient smart sampling based techniques for timing and leakage power consumption analysis of digital circuits. For the case of timing analysis, we show that the proposed method requires 23.8X fewer samples on average to achieve comparable accuracy as a random sampling approach, for benchmark circuits studied. It is further illustrated that the parallelism available in such techniques can be exploited using parallel machines, especially Graphics Processing Units. Here, we show that SH-QMC implemented on a Multi GPU is twice as fast as a single STA on a CPU for benchmark circuits considered. Next we study the possibility of using such information from statistical analysis to optimize digital circuits under variability, for example to achieve minimum area on silicon though gate sizing while meeting a timing constraint. Though several techniques to optimize circuits have been proposed in literature, it is not clear how much gains are obtained in these approaches specifically through utilization of statistical information. Therefore, an effective lower bound computation technique is proposed to enable efficient comparison of statistical design optimization techniques. It is shown that even techniques which use only limited statistical information can achieve results to within 10% of the proposed lower bound. We conclude that future optimization research should shift focus from use of more statistical information to achieving more efficiency and parallelism to obtain speed ups.Ph.D.Electrical EngineeringUniversity of Michigan, Horace H. Rackham School of Graduate Studieshttp://deepblue.lib.umich.edu/bitstream/2027.42/78936/1/tvvin_1.pd

    Timing Closure in Chip Design

    Get PDF
    Achieving timing closure is a major challenge to the physical design of a computer chip. Its task is to find a physical realization fulfilling the speed specifications. In this thesis, we propose new algorithms for the key tasks of performance optimization, namely repeater tree construction; circuit sizing; clock skew scheduling; threshold voltage optimization and plane assignment. Furthermore, a new program flow for timing closure is developed that integrates these algorithms with placement and clocktree construction. For repeater tree construction a new algorithm for computing topologies, which are later filled with repeaters, is presented. To this end, we propose a new delay model for topologies that not only accounts for the path lengths, as existing approaches do, but also for the number of bifurcations on a path, which introduce extra capacitance and thereby delay. In the extreme cases of pure power optimization and pure delay optimization the optimum topologies regarding our delay model are minimum Steiner trees and alphabetic code trees with the shortest possible path lengths. We presented a new, extremely fast algorithm that scales seamlessly between the two opposite objectives. For special cases, we prove the optimality of our algorithm. The efficiency and effectiveness in practice is demonstrated by comprehensive experimental results. The task of circuit sizing is to assign millions of small elementary logic circuits to elements from a discrete set of logically equivalent, predefined physical layouts such that power consumption is minimized and all signal paths are sufficiently fast. In this thesis we develop a fast heuristic approach for global circuit sizing, followed by a local search into a local optimum. Our algorithms use, in contrast to existing approaches, the available discrete layout choices and accurate delay models with slew propagation. The global approach iteratively assigns slew targets to all source pins of the chip and chooses a discrete layout of minimum size preserving the slew targets. In comprehensive experiments on real instances, we demonstrate that the worst path delay is within 7% of its lower bound on average after a few iterations. The subsequent local search reduces this gap to 2% on average. Combining global and local sizing we are able to size more than 5.7 million circuits within 3 hours. For the clock skew scheduling problem we develop the first algorithm with a strongly polynomial running time for the cycle time minimization in the presence of different cycle times and multi-cycle paths. In practice, an iterative local search method is much more efficient. We prove that this iterative method maximizes the worst slack, even when restricting the feasible schedule to certain time intervals. Furthermore, we enhance the iterative local approach to determine a lexicographically optimum slack distribution. The clock skew scheduling problem is then generalized to allow for simultaneous data path optimization. In fact, this is a time-cost tradeoff problem. We developed the first combinatorial algorithm for computing time-cost tradeoff curves in graphs that may contain cycles. Starting from the lowest-cost solution, the algorithm iteratively computes a descent direction by a minimum cost flow computation. The maximum feasible step length is then determined by a minimum ratio cycle computation. This approach can be used in chip design for several optimization tasks, e.g. threshold voltage optimization or plane assignment. Finally, the optimization routines are combined into a timing closure flow. Here, the global placement is alternated with global performance optimization. Netweights are used to penalize the length of critical nets during placement. After the global phase, the performance is improved further by applying more comprehensive optimization routines on the most critical paths. In the end, the clock schedule is optimized and clocktrees are inserted. Computational results of the design flow are obtained on real-world computer chips

    Algorithmic techniques for nanometer VLSI design and manufacturing closure

    Get PDF
    As Very Large Scale Integration (VLSI) technology moves to the nanoscale regime, design and manufacturing closure becomes very difficult to achieve due to increasing chip and power density. Imperfections due to process, voltage and temperature variations aggravate the problem. Uncertainty in electrical characteristic of individual device and wire may cause significant performance deviations or even functional failures. These impose tremendous challenges to the continuation of Moore's law as well as the growth of semiconductor industry. Efforts are needed in both deterministic design stage and variation-aware design stage. This research proposes various innovative algorithms to address both stages for obtaining a design with high frequency, low power and high robustness. For deterministic optimizations, new buffer insertion and gate sizing techniques are proposed. For variation-aware optimizations, new lithography-driven and post-silicon tuning-driven design techniques are proposed. For buffer insertion, a new slew buffering formulation is presented and is proved to be NP-hard. Despite this, a highly efficient algorithm which runs > 90x faster than the best alternatives is proposed. The algorithm is also extended to handle continuous buffer locations and blockages. For gate sizing, a new algorithm is proposed to handle discrete gate library in contrast to unrealistic continuous gate library assumed by most existing algorithms. Our approach is a continuous solution guided dynamic programming approach, which integrates the high solution quality of dynamic programming with the short runtime of rounding continuous solution. For lithography-driven optimization, the problem of cell placement considering manufacturability is studied. Three algorithms are proposed to handle cell flipping and relocation. They are based on dynamic programming and graph theoretic approaches, and can provide different tradeoff between variation reduction and wire- length increase. For post-silicon tuning-driven optimization, the problem of unified adaptivity optimization on logical and clock signal tuning is studied, which enables us to significantly save resources. The new algorithm is based on a novel linear programming formulation which is solved by an advanced robust linear programming technique. The continuous solution is then discretized using binary search accelerated dynamic programming, batch based optimization, and Latin Hypercube sampling based fast simulation
    corecore