5 research outputs found

    Accelerating Thermal Simulations of 3D ICs with Liquid Cooling using Neural Networks

    Vertical integration is a promising solution to further increase the performance of future ICs, but such 3D ICs present complex thermal issues that cannot be solved by conventional cooling techniques. Interlayer liquid cooling has been proposed to extract the heat accumulated within the chip. However, the development of liquid-cooled 3D ICs strongly relies on the availability of accurate and fast thermal models. In this work, we present a novel thermal model for 3D ICs with interlayer liquid cooling that exploits neural network theory. Neural networks can be trained to mimic the thermal behaviour of 3D ICs with high accuracy, and their implementation can efficiently exploit the massive computational power of modern parallel architectures such as graphics processing units. We have designed an ad-hoc neural network model based on pertinent physical considerations of how heat propagates in 3D IC architectures, and we have explored the optimal configuration of the model to improve simulation speed without undermining accuracy. We have assessed the accuracy and run-time speed-ups of the proposed model against a 3D IC simulator based on a compact model. We show that the proposed thermal simulator achieves speed-ups of up to 106x for 3D ICs with liquid cooling while keeping the maximum absolute error below 1.0 degrees C.
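
    The sketch below illustrates the general idea only, not the authors' model: a small feed-forward network that maps a per-tile power map of the 3D stack to a temperature map, trained against reference temperatures that would come from a compact-model simulator. The tile counts, layer sizes, and the ThermalNet/train_step names are assumptions; PyTorch is used so the same model can run on a GPU.

        import torch
        import torch.nn as nn

        N_TILES = 16 * 16        # assumed 16x16 thermal tiles per die
        N_DIES = 3               # assumed number of dies in the 3D stack

        class ThermalNet(nn.Module):
            def __init__(self, n_cells=N_TILES * N_DIES, hidden=512):
                super().__init__()
                self.net = nn.Sequential(
                    nn.Linear(n_cells, hidden), nn.ReLU(),
                    nn.Linear(hidden, hidden), nn.ReLU(),
                    nn.Linear(hidden, n_cells),      # one temperature per tile
                )

            def forward(self, power_map):
                return self.net(power_map)

        device = "cuda" if torch.cuda.is_available() else "cpu"
        model = ThermalNet().to(device)
        optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
        loss_fn = nn.MSELoss()

        def train_step(power, temperature_ref):
            """One gradient step against reference temperatures from a compact-model simulator."""
            optimizer.zero_grad()
            loss = loss_fn(model(power), temperature_ref)
            loss.backward()
            optimizer.step()
            return loss.item()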

    Parallel Acceleration for Timing Analysis and Optimization of Adaptive Integrated Circuits

    Adaptive circuit design is a power-efficient approach to handle variations. Compared to conventional circuits, its implementation is more complicated especially when we deal with the fine-grained adaptivity. The unconventional and sophisticated nature of adaptive design further requires timing verification to validate the design. However timing analysis becomes more complicated due to complexities arising from nanometer VLSI technologies. A well-known challenge is process variations, which need to be addressed in timing analysis at least by considering different process corners. Adaptive circuit design further needs statistical static timing analysis (SSTA), which is much more time consuming than variation-oblivious timing analysis. Besides timing analysis, gate implementation selections of the adaptive design process are also computational expensive. This research focuses on parallel acceleration techniques for timing analysis and optimization of adaptive circuit. General purpose graphic processing units (GPGPU) and multithreading techniques are used in this work. Previous works on GPU acceleration for SSTA are mostly based on Monte Carlo based SSTA. By contrast, the parallelization techniques for principle component analysis (PCA) based SSTA are explored in this work, which is intrinsically more efficient. We develop a batch-based scheduling algorithm to partition the circuit graph into topological levels for GPU processing and investigate other techniques such as latency hiding. We propose a multithreading based acceleration method for the process of gate implementation selection and use the same batch-based scheduling result. The experiment result shows effectiveness of our parallel acceleration for timing analysis and for optimization with the performance up to 130Ă— and 5Ă— speedup respectively
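
    The levelization idea behind batch-based scheduling can be illustrated as below. This is only a sketch assumed from the abstract (the levelize function, fan-in encoding, and gate names are invented): the circuit DAG is partitioned into topological levels, and each level corresponds to one batch of gates that independent GPU threads could evaluate concurrently.

        from collections import defaultdict, deque

        def levelize(fanin):
            """fanin: dict mapping gate -> list of driver gates. Returns a list of levels."""
            fanout = defaultdict(list)
            indeg = {g: len(drivers) for g, drivers in fanin.items()}
            for g, drivers in fanin.items():
                for d in drivers:
                    fanout[d].append(g)
                    indeg.setdefault(d, 0)   # primary inputs may not appear as keys
            # Kahn's algorithm, grouped by level instead of producing a flat order.
            frontier = deque(g for g, d in indeg.items() if d == 0)
            levels = []
            while frontier:
                levels.append(list(frontier))
                nxt = deque()
                for g in frontier:
                    for s in fanout[g]:
                        indeg[s] -= 1
                        if indeg[s] == 0:
                            nxt.append(s)
                frontier = nxt
            return levels

        # Example: a and b are primary inputs; g1 depends on a, b; g2 depends on g1, b.
        print(levelize({"g1": ["a", "b"], "g2": ["g1", "b"]}))
        # -> [['a', 'b'], ['g1'], ['g2']]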

    Lagrangian relaxation-based multi-threaded discrete gate sizer

    In integrated circuit design, gate sizing is one of the key optimization techniques, repeatedly invoked to trade off delay for area and/or power of the gates during the logic design and physical design stages. With increasing design sizes of a million gates and larger, discrete gate sizes, and non-convex delay models, gate sizing algorithms that were designed for continuous sizes and convex delay models are slow and timing-inaccurate. Among the several published discrete gate sizing algorithms, recent works have shown that Lagrangian relaxation based gate sizers produce designs with the lowest power on average and with high timing accuracy. But they are also very slow due to a large number of expensive timing updates spread across hundreds of iterations of solving the Lagrangian subproblem. In this thesis we present a Lagrangian relaxation based multi-threaded discrete gate sizer for fast timing and power reduction by swapping gate sizes and threshold voltages. We developed two parallelization-enabling techniques to reduce the runtime of the Lagrangian subproblem solver: mutual exclusion edge (MEE) assignment and directed acyclic graph (DAG) based netlist traversal. MEEs are dummy edges assigned to reduce computational dependencies among gates sharing one or more common fan-ins. DAG based netlist traversal facilitates simultaneous resizing of gates belonging to different topological levels. We designed a Lagrange multiplier update framework that enables rapid convergence of the timing recovery and power recovery algorithms. To reduce the runtime of timing updates, we proposed a simple and fast-to-compute effective capacitance model and several mechanisms to calibrate the timing models and improve their accuracy. Compared to the state-of-the-art gate sizer, our proposed gate sizer is on average 15x faster and the optimized designs have only 1.7% higher power. In digital synchronous designs, simultaneous gate sizing and clock skew scheduling provides significantly more power saving. We extend the gate sizer to simultaneously schedule the clock skew, achieving an average of 18.8% more reduction in power with only a 20% increase in runtime.
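
    As a rough illustration of the Lagrangian-relaxation flow described above (a sketch under assumed data structures, not the thesis implementation), each gate's subproblem reduces to a discrete minimization of power plus multiplier-weighted delay, followed by a subgradient-style multiplier update. The function names, lookup-table layout, and step size are hypothetical.

        def solve_lr_subproblem(gates, sizes, power, delay, lam):
            """gates: iterable of gate ids; sizes[g]: candidate (size, Vt) options;
            power[g][s], delay[g][s]: lookup tables; lam[g]: Lagrange multiplier."""
            choice = {}
            for g in gates:   # gates in the same DAG level could be resized by parallel threads
                choice[g] = min(sizes[g], key=lambda s: power[g][s] + lam[g] * delay[g][s])
            return choice

        def update_multipliers(lam, slack, step=0.1):
            # Subgradient-style update: grow the multiplier where timing slack is negative,
            # and project back onto lam >= 0 to keep the dual feasible.
            return {g: max(0.0, lam[g] - step * slack[g]) for g in lam}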

    Algorithms for VLSI Circuit Optimization and GPU-Based Parallelization

    This research addresses several critical challenges in VLSI design automation, including sophisticated solution search on DAG topologies, simultaneous multi-stage design optimization, optimization of multi-scenario and multi-core designs, and GPU-based parallel computing for runtime acceleration. Discrete optimization for VLSI design automation problems is often quite complex, due to the inconsistency and interference between solutions on reconvergent paths in a directed acyclic graph (DAG). This research proposes a systematic solution search guided by a global view of the solution space. The key idea is joint relaxation and restriction (JRR), which is similar in spirit to mathematical relaxation techniques such as Lagrangian relaxation. Here, relaxation and restriction together provide a global view and iteratively improve the solution. Traditionally, circuit optimization is carried out in a sequence of separate optimization stages. The problem with sequential optimization is that the best solution in one stage may be worse for another. To overcome this difficulty, we take the approach of performing multiple optimization techniques simultaneously. By searching the combined solution space of multiple optimization techniques, a broader view of the problem leads to a better overall optimization result. This research takes this approach on two problems: simultaneous technology mapping and cell placement, and simultaneous gate sizing and threshold voltage assignment. Modern processors have multiple working modes, which trade off power consumption against performance or maintain a certain performance level in a power-efficient way. As a result, the design of a circuit needs to accommodate different scenarios, such as different supply voltage settings. This research deals with this multi-scenario optimization problem using the Lagrangian relaxation technique. Multiple scenarios are handled simultaneously through the balance provided by Lagrangian multipliers. Similarly, multiple objectives and constraints are dealt with simultaneously by Lagrangian relaxation. This research proposes a new method to calculate the subgradients of the Lagrangian function and to solve the Lagrangian dual problem more effectively. Multi-core architectures also pose new problems and challenges to design automation. For example, multiple cores on the same chip may have identical designs in some parts while differing from each other in the rest. In the case of buffer insertion, the identical part has to be carefully optimized for all the cores with different environmental parameters. This problem has much higher complexity than buffer insertion on single cores. This research proposes an algorithm that optimizes the buffering solution for multiple cores simultaneously, based on critical component analysis. Under intensifying time-to-market pressure, circuit optimization not only needs to find high-quality solutions, but also has to produce results fast. Recent advances in general-purpose graphics processing unit (GPGPU) technology provide massive parallel computing power. This research turns the complex computation task of circuit optimization into many subtasks processed by parallel threads. The proposed task partitioning and scheduling methods take advantage of GPU computing power and achieve significant speedups without sacrificing solution quality.
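
    A projected-subgradient step for the multi-scenario Lagrangian dual might look like the sketch below. It is purely illustrative; the research's actual subgradient computation is more elaborate, and dual_ascent_step, the scenario/arc indexing, and the step size are assumptions.

        def dual_ascent_step(lams, slacks, step):
            """lams[k][e]: multiplier of timing arc e under scenario k;
            slacks[k][e]: arc slack of the current solution under scenario k."""
            new = {}
            for k, arc_lams in lams.items():
                # Negative slack means the constraint is violated under scenario k,
                # so its multiplier grows; max() projects back onto lambda >= 0.
                new[k] = {e: max(0.0, lam_e - step * slacks[k][e])
                          for e, lam_e in arc_lams.items()}
            return new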

    Graphics Processing Unit-Based Computer-Aided Design Algorithms for Electronic Design Automation

    Electronic design automation (EDA) tools are a specific set of software that play important roles in modern integrated circuit (IC) design. These tools automate the IC design process across various stages. Among these stages, two important EDA design tools are the focus of this research: floorplanning and global routing. Specifically, the goal of this study is to parallelize these two tools so that their execution time can be significantly shortened on modern multi-core and graphics processing unit (GPU) architectures. The GPU is a massively parallel architecture, enabling thousands of independent threads to execute concurrently. Although a small set of EDA tools can benefit from using GPUs to accelerate their speed, most algorithms in this field are designed with the single-core paradigm in mind. The floorplanning and global routing algorithms are among the latter, and it is difficult to obtain any speedup on the GPU due to their inherently sequential nature. This work parallelizes the floorplanning and global routing algorithms through a novel approach and achieves significant speedups for both tools implemented on GPU hardware. Specifically, with a complete overhaul of the solution space and design space exploration, a GPU-based floorplanning algorithm is able to render 4-166X speedup, while achieving similar or improved solutions compared with the sequential algorithm. The GPU-based global routing algorithm is shown to achieve significant speedup against existing state-of-the-art routers while delivering competitive solution quality. Importantly, this parallel model for global routing renders a stable solution that is independent of the level of parallelism. In summary, this research has shown that, through a design paradigm overhaul, sequential algorithms can also benefit from massively parallel architectures. The findings of this study have a positive impact on the efficiency and design quality of the modern EDA design flow.
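
    One way to read the "overhaul of solution space exploration" for floorplanning is that many candidate perturbations are scored concurrently instead of one accept/reject move at a time. The sketch below illustrates only that idea: the cost model, move set, and function names are invented, and the parallel part is left as a comment rather than real GPU code.

        import random

        def cost(blocks):
            # Placeholder cost: bounding box of blocks packed in a single row
            # (real floorplanners combine packed area with wirelength).
            return sum(w for w, h in blocks) * max(h for w, h in blocks)

        def perturb(blocks):
            fp = list(blocks)
            i = random.randrange(len(fp))
            w, h = fp[i]
            fp[i] = (h, w)          # simplest possible move: rotate one block
            return fp

        def parallel_step(blocks, n_candidates=64):
            # Each candidate would map to an independent GPU thread; here they are
            # generated and scored sequentially only to keep the sketch self-contained.
            candidates = [perturb(blocks) for _ in range(n_candidates)]
            return min(candidates, key=cost)

        blocks = [(4, 2), (3, 3), (2, 5), (1, 6)]
        print(cost(blocks), cost(parallel_step(blocks)))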