16 research outputs found
SCALABLE INTEGRATED CIRCUIT SIMULATION ALGORITHMS FOR ENERGY-EFFICIENT TERAFLOP HETEROGENEOUS PARALLEL COMPUTING PLATFORMS
Integrated circuit technology has gone through several decades of aggressive scaling.It is increasingly challenging to analyze growing design complexity. Post-layout SPICE simulation can be computationally prohibitive due to the huge amount of parasitic elements, which can easily boost the computation and memory cost. As the decrease in device size, the circuits become more vulnerable to process variations. Designers need to statistically simulate the probability that a circuit does not meet the performance metric, which requires millions times of simulations to capture rare failure events.
Recent, multiprocessors with heterogeneous architecture have emerged as mainstream computing platforms. The heterogeneous computing platform can achieve highthroughput energy efficient computing. However, the application of such platform is not trivial and needs to reinvent existing algorithms to fully utilize the computing resources. This dissertation presents several new algorithms to address those aforementioned two significant and challenging issues on the heterogeneous platform.
Harmonic Balance (HB) analysis is essential for efficient verification of large postlayout RF and microwave integrated circuits (ICs). However, existing methods either suffer from excessively long simulation time and prohibitively large memory consumption or exhibit poor stability. This dissertation introduces a novel transient-simulation guided graph sparsification technique, as well as an efficient runtime performance modeling approach tailored for heterogeneous manycore CPU-GPU computing system to build nearly-optimal subgraph preconditioners that can lead to minimum HB simulation runtime. Additionally, we propose a novel heterogeneous parallel sparse block matrix algorithm by taking advantages of the structure of HB Jacobian matrices as well as GPU’s streaming multiprocessors to achieve optimal workload balancing during the preconditioning phase of HB analysis. We also show how the proposed preconditioned iterative algorithm can efficiently adapt to heterogeneous computing systems with different CPU and GPU computing capabilities. Extensive experimental results show that our HB solver can achieve up to 20X speedups and 5X memory reduction when compared with the state-of-the-art direct solver highly optimized for twelve-core CPUs.
In nowadays variation-aware IC designs, cell characterizations and SRAM memory yield analysis require many thousands or even millions of repeated SPICE simulations for relatively small nonlinear circuits. In this dissertation, for the first time, we present a massively parallel SPICE simulator on GPU, TinySPICE, for efficiently analyzing small nonlinear circuits. TinySPICE integrates a highly-optimized shared-memory based matrix solver and fast parametric three-dimensional (3D) LUTs based device evaluation method. A novel circuit clustering method is also proposed to improve the stability and efficiency of the matrix solver. Compared with CPU-based SPICE simulator, TinySPICE achieves up to 264X speedups for parametric SRAM yield analysis without loss of accuracy
Recommended from our members
Sensitivity technologies for large scale simulation.
Sensitivity analysis is critically important to numerous analysis algorithms, including large scale optimization, uncertainty quantification,reduced order modeling, and error estimation. Our research focused on developing tools, algorithms and standard interfaces to facilitate the implementation of sensitivity type analysis into existing code and equally important, the work was focused on ways to increase the visibility of sensitivity analysis. We attempt to accomplish the first objective through the development of hybrid automatic differentiation tools, standard linear algebra interfaces for numerical algorithms, time domain decomposition algorithms and two level Newton methods. We attempt to accomplish the second goal by presenting the results of several case studies in which direct sensitivities and adjoint methods have been effectively applied, in addition to an investigation of h-p adaptivity using adjoint based a posteriori error estimation. A mathematical overview is provided of direct sensitivities and adjoint methods for both steady state and transient simulations. Two case studies are presented to demonstrate the utility of these methods. A direct sensitivity method is implemented to solve a source inversion problem for steady state internal flows subject to convection diffusion. Real time performance is achieved using novel decomposition into offline and online calculations. Adjoint methods are used to reconstruct initial conditions of a contamination event in an external flow. We demonstrate an adjoint based transient solution. In addition, we investigated time domain decomposition algorithms in an attempt to improve the efficiency of transient simulations. Because derivative calculations are at the root of sensitivity calculations, we have developed hybrid automatic differentiation methods and implemented this approach for shape optimization for gas dynamics using the Euler equations. The hybrid automatic differentiation method was applied to a first order approximation of the Euler equations and used as a preconditioner. In comparison to other methods, the AD preconditioner showed better convergence behavior. Our ultimate target is to perform shape optimization and hp adaptivity using adjoint formulations in the Premo compressible fluid flow simulator. A mathematical formulation for mixed-level simulation algorithms has been developed where different physics interact at potentially different spatial resolutions in a single domain. To minimize the implementation effort, explicit solution methods can be considered, however, implicit methods are preferred if computational efficiency is of high priority. We present the use of a partial elimination nonlinear solver technique to solve these mixed level problems and show how these formulation are closely coupled to intrusive optimization approaches and sensitivity analyses. Production codes are typically not designed for sensitivity analysis or large scale optimization. The implementation of our optimization libraries into multiple production simulation codes in which each code has their own linear algebra interface becomes an intractable problem. In an attempt to streamline this task, we have developed a standard interface between the numerical algorithm (such as optimization) and the underlying linear algebra. These interfaces (TSFCore and TSFCoreNonlin) have been adopted by the Trilinos framework and the goal is to promote the use of these interfaces especially with new developments. Finally, an adjoint based a posteriori error estimator has been developed for discontinuous Galerkin discretization of Poisson's equation. The goal is to investigate other ways to leverage the adjoint calculations and we show how the convergence of the forward problem can be improved by adapting the grid using adjoint-based error estimates. Error estimation is usually conducted with continuous adjoints but if discrete adjoints are available it may be possible to reuse the discrete version for error estimation. We investigate the advantages and disadvantages of continuous and discrete adjoints through a simple example
Model Order Reduction
An increasing complexity of models used to predict real-world systems leads to the need for algorithms to replace complex models with far simpler ones, while preserving the accuracy of the predictions. This three-volume handbook covers methods as well as applications. This third volume focuses on applications in engineering, biomedical engineering, computational physics and computer science
Towards efficient SPICE-accurate nonlinear circuit simulation with on-the-fly support-circuit preconditioners
SPICE-accurate simulation of present-day large-scale nonlinear integrated circuit (IC) systems with millions of linear/nonlinear components can be prohibitively expensive, and thus extremely challenging. In this paper, we present a novel support-circuit preconditioning (SCP) technique for tackling large-scale nonlinear circuit simulations by exploiting sparsified graphs of a given circuit network. By extracting support graphs (SGs) from the original linear circuit networks, and combining them with nonlinear devices, support-circuit preconditioner can be efficiently computed using existing matrix solvers, allowing for on-the-fly updates during transient simulations when adopted in Krylov-subspace iterative solvers. Experimental results for a variety of large-scale circuit designs show that the proposed method achieves up to 22X speedups in solving the matrices involved in DC and transient (TR) simulations, and up to 8X reduction in memory usage, when compared with the simulator powered by the state-of-the-art direct solver KLU. © 2012 ACM