6 research outputs found

    A timing optimization method based on clock skew scheduling and partitioning in a parallel computing environment

    Paper presented at the Midwest Symposium on Circuits and Systems, San Juan, Puerto Rico. This paper describes the implementation of a heuristic method to perform non-zero clock skew scheduling of digital VLSI circuits in a parallel computing environment. In the proposed method, the circuit is divided into partitions with a low number of timing paths between them. Clock skew scheduling is applied independently to each partition, either sequentially or in parallel on a computing cluster, and the results are iteratively merged. The proposed method scales better than conventional non-zero clock skew scheduling techniques because each iteration step analyzes smaller circuits (the partitions) and the analyses of these partitions can be parallelized. It is demonstrated that after only the first iteration step of the proposed method, feasible clock schedules for 65% of the ISCAS'89 benchmark circuits are computed. For these circuits, average speedups of 2.1X and 2.6X are observed for sequential and parallel application of clock skew scheduling to partitions, respectively.

    Linearization of The Timing Analysis and Optimization of Level-Sensitive Circuits

    This thesis describes a linear programming (LP) formulation applicable to the static timing analysis of large scale synchronous circuits with level-sensitive latches. The automatic timing analysis procedure presented here is composed of deriving the connectivity information, constructing the LP model and solving the clock period minimization problem of synchronous digital VLSI circuits. In synchronous circuits with level-sensitive latches, operation at a reduced clock period (higher clock frequency) is possible by taking advantage of both non-zero clock skew scheduling and time borrowing. Clock skew scheduling is performed in order to exploit the benefits of nonidentical clock signal delays on circuit timing. The time borrowing property of level-sensitive circuits permits higher operating frequencies compared to edge-sensitive circuits. Considering time borrowing in the timing analysis, however, introduces non-linearity in this timing analysis. The modified big M (MBM) method is defined in order to transform the non-linear constraints arising in the problem formulation into solvable linear constraints. Equivalent LP model problems for single-phase clock synchronization of the ISCAS'89 benchmark circuits are generated and these problems are solved by the industrial LP solver CPLEX. Through the simultaneous application of time borrowing and clock skew scheduling, up to 63% improvements are demonstrated in minimum clock period with respect to zero-skew edge-sensitive synchronous circuits. The timing constraints governing the level-sensitive synchronous circuit operation not only solve the clock period minimization problem but also provide a common framework for the general timing analysis of such circuits. The inclusion of additional constraints into the problem formulation in order to meet the timing requirements imposed by specific application environments is discussed.

    Clustering for the optimisation of asynchronous controllers

    The miniaturisation of integrated circuits is bringing new problems in terms of power consumption, speed, and variability tolerance. Current synchronous designs are struggling to cope with these problems, and in consequence new optimisations and paradigms are being studied. This thesis studies optimisations such as clock skew for synchronous circuits, and asynchronous circuits as an alternative paradigm. The performance analysis of both cases is equivalent, and graph-theoretic algorithms on cycles have been implemented to calculate the optimum speed. Asynchronous controllers are essential for a good asynchronous design. To create a connectivity structure of controllers it is necessary to group the memory elements (registers) of the circuit into clusters. Clustering registers affects power consumption, performance, area, and variability tolerance. Producing a good clustering is hard because of the high number of registers and the trade-offs involved in optimising all these characteristics. An initial problem in clustering controllers is deciding how many controllers are wanted. A design with one cluster has the same problems as a synchronous design: high power consumption and excessive sensitivity to variations in temperature, voltage, manufacturing, etc. On the other hand, having as many controllers as registers produces too much area overhead for all the new logic and wires that need to be added. It is important to keep clusters as sparsely connected as possible in order to design simple controllers and to minimise the impact on area. We know from benchmarks and industrial designs that the register graph is highly connected, and the controllers graph is almost complete. A variation of Min-Cut can give us a solution to optimise this property. The clustering will have an impact on performance. 
Grouping registers implies a loss of freedom, and optimisations like clock skew or the asynchronous circuit will be handicapped by this loss in reaching the maximum speed. From the placement point of view, the registers in each cluster need to be close together to minimise the clock tree. The ideal solution is a partition of the space. The worst solution is to have the registers scattered around. The contributions of this thesis are two clustering algorithms: a local search solution to minimise the number of connections, and a k-means implementation that combines the minimisation of the clock trees and the maximisation of performance, using parameters to balance them. These algorithms have been implemented in the Elastix EDA tool and executed on ISCAS benchmarks and SUN Microsystems OpenSparc processo
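
As a rough illustration of the placement-driven side of register clustering, the sketch below runs plain k-means (Lloyd's algorithm) on register coordinates. It is not the thesis's algorithm: the abstract says the actual implementation also weighs performance through balancing parameters, which are omitted here. The function name `kmeans`, the coordinate data and the choice of the first `k` points as starting centres are all assumptions made for this example.

```python
from typing import List, Tuple

Point = Tuple[float, float]  # (x, y) placement of one register; hypothetical data model


def kmeans(points: List[Point], k: int, iters: int = 50) -> List[int]:
    """Plain Lloyd's algorithm: return a cluster index for every register."""
    # Deterministic start (an assumption for this sketch): the first k points.
    centres = list(points[:k])
    labels = None
    for _ in range(iters):
        # Assignment step: each register joins its nearest centre.
        new_labels = [
            min(range(k),
                key=lambda c: (px - centres[c][0]) ** 2 + (py - centres[c][1]) ** 2)
            for px, py in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the centres will not move either
        labels = new_labels
        # Update step: move each centre to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centre if a cluster becomes empty
                centres[c] = (
                    sum(m[0] for m in members) / len(members),
                    sum(m[1] for m in members) / len(members),
                )
    return labels


# Toy placement: two groups of registers in opposite corners of the die.
registers = [(0.0, 0.0), (10.0, 10.0), (0.0, 1.0), (1.0, 0.0), (10.0, 11.0), (11.0, 10.0)]
clusters = kmeans(registers, k=2)
```

Registers that sit close together end up sharing a cluster, which is the property the abstract associates with a small clock tree per controller.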

    Advanced Timing and Synchronization Methodologies for Digital VLSI Integrated Circuits

    This dissertation addresses timing and synchronization methodologies that are critical to the design, analysis and optimization of high-performance, integrated digital VLSI systems. As process sizes shrink and design complexities increase, achieving timing closure for digital VLSI circuits becomes a significant bottleneck in the integrated circuit design flow. Circuit designers are motivated to investigate and employ alternative methods to satisfy the timing and physical design performance targets. Such novel methods for the timing and synchronization of complex circuitry are developed in this dissertation and analyzed for performance and applicability. The mainstream integrated circuit design flow is normally tuned for zero clock skew, edge-triggered circuit design. Non-zero clock skew or multi-phase clock synchronization is seldom used because the lack of design automation tools increases the length and cost of the design cycle. For similar reasons, level-sensitive registers have not become an industry standard despite their superior size, speed and power consumption characteristics compared to conventional edge-triggered flip-flops. In this dissertation, novel design and analysis techniques that fully automate the design and analysis of non-zero clock skew circuits are presented. Clock skew scheduling of both edge-triggered and level-sensitive circuits is investigated in order to exploit maximum circuit performance. The effects of multi-phase clocking on non-zero clock skew, level-sensitive circuits are investigated, leading to advanced synchronization methodologies. Improvements in the scalability of the computational timing analysis process with clock skew scheduling are explored through partitioning and parallelization. The integration of the proposed design and analysis methods into the physical design flow of integrated circuits synchronized with a next-generation clocking technology (resonant rotary clocking technology) is also presented. 
Based on the design and analysis methods presented in this dissertation, a computer-aided design tool for the design of rotary clock synchronized integrated circuits is developed.

    Timing Closure in Chip Design

    Achieving timing closure is a major challenge to the physical design of a computer chip. Its task is to find a physical realization fulfilling the speed specifications. In this thesis, we propose new algorithms for the key tasks of performance optimization, namely repeater tree construction, circuit sizing, clock skew scheduling, threshold voltage optimization and plane assignment. Furthermore, a new program flow for timing closure is developed that integrates these algorithms with placement and clocktree construction. For repeater tree construction a new algorithm for computing topologies, which are later filled with repeaters, is presented. To this end, we propose a new delay model for topologies that not only accounts for the path lengths, as existing approaches do, but also for the number of bifurcations on a path, which introduce extra capacitance and thereby delay. In the extreme cases of pure power optimization and pure delay optimization the optimum topologies regarding our delay model are minimum Steiner trees and alphabetic code trees with the shortest possible path lengths. We present a new, extremely fast algorithm that scales seamlessly between the two opposite objectives. For special cases, we prove the optimality of our algorithm. The efficiency and effectiveness in practice are demonstrated by comprehensive experimental results. The task of circuit sizing is to assign millions of small elementary logic circuits to elements from a discrete set of logically equivalent, predefined physical layouts such that power consumption is minimized and all signal paths are sufficiently fast. In this thesis we develop a fast heuristic approach for global circuit sizing, followed by a local search into a local optimum. Our algorithms use, in contrast to existing approaches, the available discrete layout choices and accurate delay models with slew propagation. 
The global approach iteratively assigns slew targets to all source pins of the chip and chooses a discrete layout of minimum size preserving the slew targets. In comprehensive experiments on real instances, we demonstrate that the worst path delay is within 7% of its lower bound on average after a few iterations. The subsequent local search reduces this gap to 2% on average. Combining global and local sizing we are able to size more than 5.7 million circuits within 3 hours. For the clock skew scheduling problem we develop the first algorithm with a strongly polynomial running time for the cycle time minimization in the presence of different cycle times and multi-cycle paths. In practice, an iterative local search method is much more efficient. We prove that this iterative method maximizes the worst slack, even when restricting the feasible schedule to certain time intervals. Furthermore, we enhance the iterative local approach to determine a lexicographically optimum slack distribution. The clock skew scheduling problem is then generalized to allow for simultaneous data path optimization. In fact, this is a time-cost tradeoff problem. We developed the first combinatorial algorithm for computing time-cost tradeoff curves in graphs that may contain cycles. Starting from the lowest-cost solution, the algorithm iteratively computes a descent direction by a minimum cost flow computation. The maximum feasible step length is then determined by a minimum ratio cycle computation. This approach can be used in chip design for several optimization tasks, e.g. threshold voltage optimization or plane assignment. Finally, the optimization routines are combined into a timing closure flow. Here, the global placement is alternated with global performance optimization. Netweights are used to penalize the length of critical nets during placement. After the global phase, the performance is improved further by applying more comprehensive optimization routines on the most critical paths. 
In the end, the clock schedule is optimized and clocktrees are inserted. Computational results of the design flow are obtained on real-world computer chips.
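
For readers unfamiliar with clock skew scheduling, the following minimal sketch shows the textbook formulation for edge-triggered registers: setup and hold requirements become difference constraints, feasibility at a given cycle time is a negative-cycle check (Bellman-Ford), and the minimum cycle time is found by binary search. This is not the strongly polynomial algorithm or the iterative local search described in the abstract. The function names, the path tuple layout and the delay values are assumptions for illustration, and setup and hold times are taken as zero.

```python
from typing import Dict, List, Optional, Tuple

# (source register, destination register, min path delay, max path delay)
Path = Tuple[str, str, float, float]


def feasible_schedule(paths: List[Path], period: float) -> Optional[Dict[str, float]]:
    """Return clock arrival times that satisfy all constraints, or None."""
    registers = sorted({r for p in paths for r in p[:2]})
    # A constraint t[v] - t[u] <= w is stored as the edge (u, v, w).
    edges = []
    for src, dst, d_min, d_max in paths:
        edges.append((dst, src, period - d_max))  # setup: t[src] + d_max <= t[dst] + T
        edges.append((src, dst, d_min))           # hold:  t[src] + d_min >= t[dst]
    # Bellman-Ford from a virtual source joined to every register with weight 0.
    dist = {r: 0.0 for r in registers}
    for _ in range(len(registers)):
        changed = False
        for u, v, w in edges:
            if dist[u] + w < dist[v] - 1e-12:
                dist[v] = dist[u] + w
                changed = True
        if not changed:
            return dist  # shortest-path distances satisfy every constraint
    return None  # still relaxing after |V| passes: negative cycle, infeasible


def min_period(paths: List[Path], tol: float = 1e-6) -> Tuple[float, Dict[str, float]]:
    """Binary search for the smallest feasible cycle time (within tol)."""
    # Zero skew is feasible at T = largest max delay when all min delays are >= 0.
    lo, hi = 0.0, max(p[3] for p in paths)
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if feasible_schedule(paths, mid) is not None:
            hi = mid
        else:
            lo = mid
    return hi, feasible_schedule(paths, hi)


# Toy circuit: two registers in a loop with unbalanced path delays.
paths = [("A", "B", 2.0, 8.0), ("B", "A", 1.0, 4.0)]
period, schedule = min_period(paths)
```

With zero skew this toy loop needs a cycle time of 8 (its longest path); letting the clock reach register B later than register A brings the feasible cycle time down to about 6, the average of the two maximum delays around the loop.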