
    Fast and exact simultaneous gate and wire sizing by Lagrangian relaxation


    Timing Closure in Chip Design

    Achieving timing closure is a major challenge in the physical design of a computer chip; its task is to find a physical realization fulfilling the speed specifications. In this thesis, we propose new algorithms for the key tasks of performance optimization, namely repeater tree construction, circuit sizing, clock skew scheduling, threshold voltage optimization, and plane assignment. Furthermore, a new program flow for timing closure is developed that integrates these algorithms with placement and clocktree construction.

    For repeater tree construction, a new algorithm for computing topologies, which are later filled with repeaters, is presented. To this end, we propose a new delay model for topologies that accounts not only for the path lengths, as existing approaches do, but also for the number of bifurcations on a path, which introduce extra capacitance and thereby delay. In the extreme cases of pure power optimization and pure delay optimization, the optimum topologies under our delay model are minimum Steiner trees and alphabetic code trees with the shortest possible path lengths, respectively. We present a new, extremely fast algorithm that scales seamlessly between these two opposite objectives. For special cases, we prove the optimality of our algorithm. Its efficiency and effectiveness in practice are demonstrated by comprehensive experimental results.

    The task of circuit sizing is to assign millions of small elementary logic circuits to elements from a discrete set of logically equivalent, predefined physical layouts such that power consumption is minimized and all signal paths are sufficiently fast. In this thesis, we develop a fast heuristic approach for global circuit sizing, followed by a local search into a local optimum. In contrast to existing approaches, our algorithms use the available discrete layout choices and accurate delay models with slew propagation. The global approach iteratively assigns slew targets to all source pins of the chip and chooses a discrete layout of minimum size preserving the slew targets. In comprehensive experiments on real instances, we demonstrate that the worst path delay is within 7% of its lower bound on average after a few iterations. The subsequent local search reduces this gap to 2% on average. Combining global and local sizing, we are able to size more than 5.7 million circuits within 3 hours.

    For the clock skew scheduling problem, we develop the first algorithm with a strongly polynomial running time for cycle time minimization in the presence of different cycle times and multi-cycle paths. In practice, an iterative local search method is much more efficient. We prove that this iterative method maximizes the worst slack, even when restricting the feasible schedule to certain time intervals. Furthermore, we enhance the iterative local approach to determine a lexicographically optimal slack distribution. The clock skew scheduling problem is then generalized to allow for simultaneous data path optimization; in fact, this is a time-cost tradeoff problem. We develop the first combinatorial algorithm for computing time-cost tradeoff curves in graphs that may contain cycles. Starting from the lowest-cost solution, the algorithm iteratively computes a descent direction by a minimum cost flow computation; the maximum feasible step length is then determined by a minimum ratio cycle computation. This approach can be used in chip design for several optimization tasks, e.g. threshold voltage optimization or plane assignment.
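    The scheduling results above revolve around cycle computations in the timing graph: the worst slack achievable by skew scheduling is limited by critical cycles, and with unit cycle weights maximizing it essentially amounts to a minimum mean cycle computation. As an illustration only, here is a self-contained Python sketch of Karp's textbook minimum mean cycle algorithm, not the thesis's strongly polynomial or iterative methods:

```python
def min_mean_cycle(n, edges):
    """Karp's algorithm: minimum mean weight over all directed cycles.

    n     -- number of vertices, labelled 0..n-1
    edges -- list of (u, v, w): a directed edge u -> v of weight w
    Returns the minimum cycle mean, or None if the graph is acyclic.
    """
    INF = float("inf")
    # d[k][v]: minimum weight of a walk with exactly k edges ending at v
    # (walks may start at any vertex, emulated by d[0][v] = 0 for all v).
    d = [[INF] * n for _ in range(n + 1)]
    d[0] = [0.0] * n
    for k in range(1, n + 1):
        for u, v, w in edges:
            if d[k - 1][u] + w < d[k][v]:
                d[k][v] = d[k - 1][u] + w
    best = None
    for v in range(n):
        if d[n][v] == INF:
            continue  # no walk of n edges ends here
        worst = max((d[n][v] - d[k][v]) / (n - k)
                    for k in range(n) if d[k][v] < INF)
        best = worst if best is None else min(best, worst)
    return best

# Toy graph: one 3-cycle of total weight 6 -> cycle mean 2.0
print(min_mean_cycle(3, [(0, 1, 3), (1, 2, 1), (2, 0, 2)]))
```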
    Finally, the optimization routines are combined into a timing closure flow. Here, global placement is alternated with global performance optimization; net weights are used to penalize the length of critical nets during placement. After the global phase, performance is improved further by applying more comprehensive optimization routines on the most critical paths. In the end, the clock schedule is optimized and clocktrees are inserted. Computational results of the design flow are obtained on real-world computer chips.
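    A toy sketch of the global sizing loop described in this abstract: for each source pin, pick the smallest discrete layout meeting the pin's slew target, then tighten targets where slack is negative. The cell library, RC slew estimate, and required time below are invented for the demo, not taken from the thesis.

```python
# (area, drive_resistance) for each discrete size of one cell, smallest first
SIZES = [(1.0, 8.0), (2.0, 4.0), (4.0, 2.0), (8.0, 1.0)]
LOAD = 3.0            # capacitive load at the pin (made-up units)
REQUIRED = 10.0       # made-up timing budget at the pin

def output_slew(r_drive, load):
    # Crude RC product as a slew estimate, purely illustrative.
    return r_drive * load

def pick_size(slew_target):
    # Smallest layout whose slew meets the target; strongest as fallback.
    for area, r in SIZES:
        if output_slew(r, LOAD) <= slew_target:
            return area, r
    return SIZES[-1]

target = 30.0
for it in range(5):
    area, r = pick_size(target)
    slack = REQUIRED - output_slew(r, LOAD)
    if slack < 0:
        target *= 0.5          # pin is critical: demand a sharper slew
    print(f"iter {it}: area={area}, slack={slack}")
```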

    Scalable Analysis, Verification and Design of IC Power Delivery

    Due to recent aggressive process scaling into the nanometer regime, power delivery network design faces many challenges that impose more stringent and specific requirements on EDA tools. For example, from the perspective of analysis, simulation efficiency for large grids must be improved, and it must be possible to analyze the entire network together with off-chip models and nonlinear devices. Gated power delivery networks have multiple on/off operating conditions that need to be fully verified against the design requirements. Good power delivery network designs not only have to conserve wiring resources for signal routing, but also need optimal parameters assigned to system components such as decaps, voltage regulators, and converters. This dissertation presents new methodologies to address these challenging problems.

    First, a novel parallel partitioning-based approach, which provides a flexible network partitioning scheme exploiting locality, is proposed for power grid static analysis. In addition, a fast combined CPU-GPU analysis engine that adopts a boundary-relaxation method to encompass several simulation strategies is developed to simulate power delivery networks with off-chip models and active circuits. These two analysis approaches achieve scalable simulation runtime. Then, for gated power delivery networks, the challenge posed by the large verification space is addressed by a strategy that efficiently identifies a number of candidates for the worst-case operating condition; the computational complexity is reduced from O(2^N) to O(N). Finally, motivated by a proposed two-level hierarchical optimization, this dissertation presents a novel locality-driven partitioning scheme that facilitates divide-and-conquer-based scalable wire sizing for large power delivery networks; simultaneous sizing of multiple partitions is allowed, which leads to substantial runtime improvement. Moreover, the electrical interactions between active regulators/converters and passive networks, and their influence on key system design specifications, are analyzed comprehensively. With the derived design insights, the system-level co-design of a complete power delivery network is enabled by an automatic optimization flow. Results show significant performance enhancement brought by the co-design.
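    As a minimal illustration of the partitioned, boundary-relaxation style of power grid analysis described above (not the dissertation's engine), the numpy sketch below solves a DC grid system G v = i by repeatedly solving each partition exactly while holding the other partition's latest boundary voltages fixed. The 4-node conductance matrix is invented for the demo.

```python
import numpy as np

# 4-node conductance matrix (diagonally dominant, invented for the demo)
G = np.array([[ 3., -1., -1.,  0.],
              [-1.,  3.,  0., -1.],
              [-1.,  0.,  3., -1.],
              [ 0., -1., -1.,  3.]])
i = np.array([1.0, 0.5, 0.5, 1.0])            # current injections per node

parts = [np.array([0, 1]), np.array([2, 3])]  # two partitions of the grid
v = np.zeros(4)
for sweep in range(50):                       # block Gauss-Seidel sweeps
    for p in parts:
        q = np.setdiff1d(np.arange(4), p)     # boundary: nodes outside p
        # Solve this partition exactly, treating the other partition's
        # latest voltages as fixed boundary sources.
        v[p] = np.linalg.solve(G[np.ix_(p, p)], i[p] - G[np.ix_(p, q)] @ v[q])

print(v)                  # ~= np.linalg.solve(G, i) after convergence
```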

    Discrete Gate Sizing Methodologies for Delay, Area and Power Optimization

    The modeling of individual gates and the optimization of circuit performance have long been critical issues in the VLSI industry. In this work, we first study the gate sizing problem for today's industrial designs and explore the contributions and limitations of existing approaches, which mainly suffer from producing only continuous solutions, relying on outdated timing models, or running inefficiently. In this dissertation, we present a new discrete gate sizing technique that optimizes different aspects of circuit performance, including delay, area, and power consumption. Our method is fast and efficient because it applies local search instead of global exhaustive search during gate size selection, which greatly reduces the search space and the computational cost. It is also flexible with respect to the timing model and can handle constraints on input/output slew and output load capacitance, which few previous works have addressed. We then propose a new timing model that is derived from the classic Elmore delay model but incorporates features of the modern timing models found in standard cell libraries. With this new timing model, we are able to formulate the combinatorial discrete sizing problem as a simplified mathematical expression and apply the existing Lagrangian relaxation method, which is shown to converge to an optimal solution. We demonstrate that classic Elmore delay model based gate sizing approaches can still be valid; our work may therefore provide a new perspective on the numerous Elmore delay model based research works in areas such as placement, routing, layout, buffer insertion, and timing analysis.
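    For reference, the classic Elmore delay that the proposed timing model builds on (the standard textbook definition, stated here for context):

```latex
% Elmore delay at node i of an RC tree:
%   C_k   capacitance at node k
%   R_ki  resistance of the portion of the root-to-i path
%         that is shared with the root-to-k path
T_D(i) \;=\; \sum_{k} R_{ki}\, C_k
```

    Upsizing a gate roughly lowers its drive resistance while raising its input capacitance, which is exactly the trade-off a discrete sizing move must weigh.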

    Resource Management Algorithms for Computing Hardware Design and Operations: From Circuits to Systems

    The complexity of computing hardware has increased at an unprecedented rate over the last few decades. At the chip level, we have entered the era of multi-/many-core processors made of billions of transistors. With a transistor budget of this scale, many functions are integrated into a single chip; as a result, chips today consist of many heterogeneous cores with intensive interaction among them. At the circuit level, with the end of Dennard scaling, continuously shrinking process technology has made power density a grand challenge, and circuit variation further exacerbates the problem by consuming a substantial timing margin. At the system level, the rise of warehouse-scale computers and data centers has put resource management into a new perspective: the ability to dynamically provision computation resources in these gigantic systems is crucial to their performance.

    In this thesis, three resource management algorithms are discussed. The first assigns adaptivity resources to circuit blocks under a constraint on the overhead; the adaptivity improves the circuit's resilience to variation in a cost-effective way. The second manages link bandwidth resources in application-specific Networks-on-Chip, guaranteeing quality of service for time-critical traffic with an emphasis on power. The third manages the computation resources of a data center while guarding against ill states of the system; Q-learning is employed to cope with the dynamic nature of the system, and Linear Temporal Logic is leveraged to describe temporal constraints. All three algorithms are evaluated in extensive experiments, and the results are compared to several previous works, showing the advantages of our methods.
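    A minimal, self-contained sketch of the tabular Q-learning update mentioned above; the states, actions, reward, and environment are invented placeholders, not the thesis's data-center model.

```python
import random

ALPHA, GAMMA, EPS = 0.1, 0.9, 0.2
states, actions = range(4), range(2)  # e.g. load level x {scale up, scale down}
Q = {(s, a): 0.0 for s in states for a in actions}

def step(s, a):
    # Placeholder environment: random next load; reward penalizes scaling
    # up (a == 0) when the load is already low.
    s2 = random.choice(states)
    r = -1.0 if (a == 0 and s < 2) else 1.0
    return s2, r

s = 0
for _ in range(10_000):
    a = (random.choice(actions) if random.random() < EPS
         else max(actions, key=lambda act: Q[(s, act)]))  # epsilon-greedy
    s2, r = step(s, a)
    # Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
    best_next = max(Q[(s2, act)] for act in actions)
    Q[(s, a)] += ALPHA * (r + GAMMA * best_next - Q[(s, a)])
    s = s2
```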

    Algorithms for VLSI Circuit Optimization and GPU-Based Parallelization

    This research addresses critical challenges in several problems of VLSI design automation: sophisticated solution search on DAG topologies, simultaneous multi-stage design optimization, optimization for multi-scenario and multi-core designs, and GPU-based parallel computing for runtime acceleration.

    Discrete optimization for VLSI design automation problems is often quite complex, due to the inconsistency and interference between solutions on reconvergent paths in a directed acyclic graph (DAG). This research proposes a systematic solution search guided by a global view of the solution space. The key idea is joint relaxation and restriction (JRR), which is similar in spirit to mathematical relaxation techniques such as Lagrangian relaxation: relaxation and restriction together provide a global view and iteratively improve the solution.

    Traditionally, circuit optimization is carried out in a sequence of separate optimization stages. The problem with sequential optimization is that the best solution in one stage may be worse for another. To overcome this difficulty, we perform multiple optimization techniques simultaneously; searching the combined solution space gives a broader view of the problem and leads to better overall results. This research takes this approach on two problems: simultaneous technology mapping and cell placement, and simultaneous gate sizing and threshold voltage assignment.

    Modern processors have multiple working modes that trade off power consumption against performance, or maintain a certain performance level in a power-efficient way. As a result, the design of a circuit needs to accommodate different scenarios, such as different supply voltage settings. This research handles this multi-scenario optimization problem with the Lagrangian relaxation technique: multiple scenarios are taken care of simultaneously, balanced by Lagrangian multipliers, and multiple objectives and constraints are likewise dealt with by the relaxation. This research proposes a new method to calculate the subgradients of the Lagrangian function and solve the Lagrangian dual problem more effectively.

    Multi-core architectures also pose new problems and challenges for design automation. For example, multiple cores on the same chip may share an identical design in some parts while differing in the rest. In the case of buffer insertion, the identical parts have to be carefully optimized for all the cores under different environmental parameters; this problem has much higher complexity than buffer insertion on a single core. This research proposes an algorithm, based on critical component analysis, that optimizes the buffering solution for multiple cores simultaneously.

    Under intensifying time-to-market pressure, circuit optimization not only needs to find high-quality solutions, but also has to produce results fast. Recent advances in general-purpose graphics processing unit (GPGPU) technology provide massive parallel computing power. This research turns the complex computation task of circuit optimization into many subtasks processed by parallel threads; the proposed task partitioning and scheduling methods take advantage of GPU computing power and achieve significant speedups without sacrificing solution quality.
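    A minimal sketch of solving a Lagrangian dual by projected subgradient ascent, the generic machinery behind the multi-scenario approach above. The one-variable toy problem (minimize x^2 subject to x >= 1) is ours, not the dissertation's formulation or its improved subgradient computation.

```python
lam = 0.0                          # multiplier for the constraint 1 - x <= 0
for k in range(1, 200):
    x = lam / 2.0                  # argmin_x of L(x, lam) = x^2 + lam*(1 - x)
    g = 1.0 - x                    # a subgradient of the dual function at lam
    lam = max(0.0, lam + g / k)    # ascend with diminishing step, project >= 0
print(x, lam)                      # approaches the optimum x = 1, lam = 2
```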