    A 3-D Fast transform-based preconditioner for large-scale power grid analysis on massively parallel architectures

    Efficient analysis of on-chip power delivery networks is one of the most challenging problems that EDA is confronted with. This paper addresses the problem of simulating general multi-layer power delivery networks with significant via resistances. An iterative solution method is combined with an efficient and extremely parallel preconditioning mechanism based on the application of a 3D Fast Transform solver, which enables harnessing the computational resources of massively parallel architectures, such as GPUs. Experimental evaluation of the proposed methodology on a set of large-scale industrial benchmarks demonstrates a speed-up of 290.2X for a 2.62M-node design over a state-of-the-art parallel direct solver, and a speed-up of 75.5X for a 10.51M-node design over a parallel iterative solver with a general-purpose preconditioner, when GPUs are utilized. © 2014 IEEE

    Parallel Simulation for VLSI Power Grid

    Due to the increasing complexity of VLSI circuits, power grid simulation has become more and more time-consuming. Hence, there is a need for a fast and accurate power grid simulator. To perform power grid simulation in a timely manner, parallel algorithms have been developed to accelerate the simulation. In this dissertation, we present parallel algorithms and software for power grid simulation on CPU-GPU platforms. The power grid is divided into disjoint partitions. The partitions are enlarged using the Breadth-First Search (BFS) method. In the partition enlarging process, a portion of the edges is ignored to make the matrix factorization lightweight. Solving the enlarged partitions using a direct solver serves as a preconditioner for the Preconditioned Conjugate Gradient (PCG) method that is used to solve the power grid. This work combines the advantages of direct solvers and iterative solvers to obtain an efficient hybrid parallel solver. Two-tier parallelism is harnessed using MPI for partitions and CUDA within each partition. The experiments conducted on supercomputing clusters demonstrate significant speed improvements over a state-of-the-art direct solver in both static and transient analysis.
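
    The idea of using direct solves on partitions as a PCG preconditioner can be illustrated with a minimal, serial sketch. It is not the dissertation's implementation: the partitions here are plain disjoint blocks (no BFS enlargement, no MPI or CUDA), the matrices are dense, and every name (`grid_matrix`, `make_preconditioner`, `pcg`, the pad conductance `g_pad`) is a hypothetical choice made for illustration. Taking only the diagonal block of each partition implicitly ignores the edges that cross partition boundaries.

```python
import math

def grid_matrix(rows, cols, g_wire=1.0, g_pad=0.1):
    """Dense conductance matrix of a rows x cols resistive mesh.
    Assumption: every node has a small conductance g_pad to ground,
    which makes the matrix symmetric positive definite."""
    n = rows * cols
    G = [[0.0] * n for _ in range(n)]
    for r in range(rows):
        for c in range(cols):
            i = r * cols + c
            G[i][i] += g_pad
            for dr, dc in ((0, 1), (1, 0)):  # right and down neighbours
                rr, cc = r + dr, c + dc
                if rr < rows and cc < cols:
                    j = rr * cols + cc
                    G[i][i] += g_wire
                    G[j][j] += g_wire
                    G[i][j] -= g_wire
                    G[j][i] -= g_wire
    return G

def cholesky(A):
    """Lower-triangular Cholesky factor L with A = L L^T."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = A[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L

def chol_solve(L, b):
    """Solve (L L^T) x = b by forward and backward substitution."""
    n = len(b)
    y = [0.0] * n
    for i in range(n):
        y[i] = (b[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x

def make_preconditioner(G, partitions):
    """Factor each partition's diagonal block once; applying the
    preconditioner is then one direct solve per partition."""
    factors = [cholesky([[G[i][j] for j in p] for i in p]) for p in partitions]
    def apply(r):
        z = [0.0] * len(r)
        for p, L in zip(partitions, factors):
            for i, v in zip(p, chol_solve(L, [r[i] for i in p])):
                z[i] = v
        return z
    return apply

def pcg(G, b, precond, tol=1e-10, max_iter=500):
    """Preconditioned Conjugate Gradient for G x = b, starting from x = 0."""
    n = len(b)
    matvec = lambda v: [sum(G[i][j] * v[j] for j in range(n)) for i in range(n)]
    dot = lambda u, v: sum(a * c for a, c in zip(u, v))
    x, r = [0.0] * n, list(b)
    z = precond(r)
    p, rz = list(z), dot(r, z)
    b_norm = math.sqrt(dot(b, b))
    for it in range(1, max_iter + 1):
        Gp = matvec(p)
        alpha = rz / dot(p, Gp)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * gi for ri, gi in zip(r, Gp)]
        if math.sqrt(dot(r, r)) <= tol * b_norm:
            return x, it
        z = precond(r)
        rz_new = dot(r, z)
        p = [zi + (rz_new / rz) * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x, max_iter

# Usage: a 4 x 6 mesh split into two 4 x 3 partitions (left and right half),
# with a unit current injected at every node.
rows, cols = 4, 6
G = grid_matrix(rows, cols)
b = [1.0] * (rows * cols)
partitions = [[r * cols + c for r in range(rows) for c in range(c0, c0 + 3)]
              for c0 in (0, 3)]
x, iters = pcg(G, b, make_preconditioner(G, partitions))
```

    In this toy setup the partition solves are independent of one another, which is the property a parallel implementation would exploit; how the enlarged, overlapping partitions of the dissertation are combined is not specified in the abstract and is not modelled here.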