
    Parallel Algorithms for Time and Frequency Domain Circuit Simulation

    As one of the most critical forms of pre-silicon verification, transistor-level circuit simulation is an indispensable step before committing to an expensive manufacturing process. However, circuit simulation can be computationally expensive, especially for ever-larger transistor circuits with more complex device models, so it is becoming increasingly desirable to accelerate it. Meanwhile, the emergence of multi-core machines offers a promising platform for circuit simulation alongside the established use of distributed-memory clusters, and provides abundant hardware computing resources. This research addresses the limitations of traditional serial circuit simulation and proposes new techniques for both time-domain and frequency-domain parallel circuit simulation. For time-domain simulation, this dissertation presents a parallel transient simulation methodology. This new approach, called WavePipe, exploits coarse-grained application-level parallelism by simultaneously computing circuit solutions at multiple adjacent time points in a way that resembles hardware pipelining. WavePipe has two embodiments: backward and forward pipelining schemes. While the former creates independent computing tasks that contribute to a larger future time step, the latter performs predictive computing along the forward direction. Unlike existing relaxation methods, WavePipe facilitates parallel circuit simulation without jeopardizing convergence or accuracy. As a coarse-grained parallel approach, it requires little parallel programming effort. Furthermore, it creates new avenues for fully utilizing increasingly parallel hardware by going beyond conventional finer-grained parallel device model evaluation and matrix solution. This dissertation also exploits the recently developed explicit telescopic projective integration method for efficient parallel transient circuit simulation by addressing the stability limitation of explicit numerical integration. The new method allows the effective time step to be controlled by the accuracy requirement instead of the stability limitation. Therefore, it not only leads to a noticeable efficiency improvement, but also lends itself to straightforward parallelization due to its explicit nature. For frequency-domain simulation, this dissertation presents a parallel harmonic balance approach, applicable to the steady-state and envelope-following analyses of both driven and autonomous circuits. The new approach is centered on a naturally parallelizable preconditioning technique that speeds up the core computation in harmonic-balance-based analysis. The proposed method facilitates parallel computing via the use of domain knowledge and simplifies parallel programming compared with fine-grained strategies. As a result, favorable runtime speedups are achieved.
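The idea of letting accuracy, not stability, set the effective time step can be illustrated with a single-level projective forward Euler scheme. This is only a minimal sketch, not the dissertation's telescopic method: the function name `projective_forward_euler`, the two-mode linear test problem, and the parameter values `h`, `k`, and `M` are all assumptions chosen for illustration. A few small explicit steps damp the fast mode, and the solution is then extrapolated over a much longer interval.

```python
import numpy as np

def projective_forward_euler(f, y0, h, k, M, n_outer):
    """Single-level projective forward Euler (illustrative sketch).

    Each outer step takes k + 1 small forward Euler steps of size h,
    then extrapolates linearly over a further M * h.
    """
    y = np.asarray(y0, dtype=float)
    t = 0.0
    for _ in range(n_outer):
        # Inner steps: small explicit steps that damp the fast components.
        for _ in range(k + 1):
            y_prev = y
            y = y + h * f(t, y)
            t += h
        # Projective step: extrapolate along the chord of the last inner step.
        y = y + M * (y - y_prev)
        t += M * h
    return t, y

# Hypothetical stiff test problem: decoupled modes with rates -1 and -1000.
rates = np.array([-1.0, -1000.0])
t_end, y_end = projective_forward_euler(
    lambda t, y: rates * y, [1.0, 1.0], h=1e-3, k=1, M=50, n_outer=20
)
```

Each outer step here advances time by 0.052. Plain forward Euler with that step size would be unstable for the fast mode, whereas the sketch stays stable and tracks the slow mode to first-order accuracy. Every evaluation is explicit, which is what makes this family of methods straightforward to parallelize.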

    Parallel VLSI Circuit Analysis and Optimization

    The prevalence of multi-core processors in recent years has introduced new opportunities and challenges to Electronic Design Automation (EDA) research and development. This dissertation presents several parallel Very Large Scale Integration (VLSI) circuit analysis and optimization methods that use the multi-core computing platform to tackle some of the most difficult contemporary Computer-Aided Design (CAD) problems. The first CAD application addressed is the analysis and optimization of mesh-based clock distribution networks. A mesh-based clock distribution network (also known as a clock mesh) is used in high-performance microprocessor designs as a reliable way of distributing clock signals to the entire chip. The second CAD application addressed is Simulation Program with Integrated Circuit Emphasis (SPICE)-like circuit simulation. SPICE simulation is often regarded as the bottleneck of the design flow, and parallel circuit simulation has recently attracted a lot of attention. The first part of the dissertation discusses circuit analysis techniques. First, a combination of a clock-network-specific model order reduction algorithm and a port sliding scheme is presented to tackle the challenges in analyzing large clock meshes with a large number of clock drivers. Our techniques run much faster than standard SPICE simulation and existing model order reduction techniques. They also provide a basis for clock mesh optimization. Then, a hierarchical multi-algorithm parallel circuit simulation (HMAPS) framework is presented as a novel technique for parallel circuit simulation. The inter-algorithm parallelism approach in HMAPS is completely different from existing intra-algorithm parallel circuit simulation techniques and achieves superlinear speedup in practice. The second part of the dissertation covers parallel circuit optimization. A modified asynchronous parallel pattern search (APPS) based method, which uses the efficient clock mesh simulation techniques, is presented for the clock driver size optimization problem. Our modified APPS method runs much faster than a continuous optimization method and effectively reduces the clock skew for all test circuits. The third part of the dissertation describes parallel performance modeling and optimization of the HMAPS framework. The performance models and runtime optimization scheme further improve the speed of HMAPS. The dynamically adapted HMAPS becomes a complete solution for parallel circuit simulation.

    GRChombo : Numerical Relativity with Adaptive Mesh Refinement

    In this work, we introduce GRChombo: a new numerical relativity code which incorporates full adaptive mesh refinement (AMR) using block structured Berger-Rigoutsos grid generation. The code supports non-trivial "many-boxes-in-many-boxes" mesh hierarchies and massive parallelism through the Message Passing Interface (MPI). GRChombo evolves the Einstein equation using the standard BSSN formalism, with an option to turn on CCZ4 constraint damping if required. The AMR capability permits the study of a range of new physics which has previously been computationally infeasible in a full 3+1 setting, whilst also significantly simplifying the process of setting up the mesh for these problems. We show that GRChombo can stably and accurately evolve standard spacetimes such as binary black hole mergers and scalar collapses into black holes, demonstrate the performance characteristics of our code, and discuss various physics problems which stand to benefit from the AMR technique. Comment: 48 pages, 24 figures

    Unstructured Grid Dynamical Modeling of Planetary Atmospheres using planetMPAS: The Influence of the Rigid Lid, Computational Efficiency, and Examples of Martian and Jovian Application

    We present a new planetary global circulation model, planetMPAS, based on the state-of-the-art NCAR MPAS General Circulation Model. Taking advantage of the cross compatibility between WRF and MPAS, planetMPAS includes most of the planetWRF physics parameterization schemes for terrestrial planets such as Mars and Titan. PlanetMPAS also includes a set of physics schemes that represent radiative transfer, dry convection, and moist convection with its associated microphysics for the Jovian atmosphere. We demonstrate that, despite the rigid-lid approximation, planetMPAS is suitable for simulating the climate systems of the Martian and Jovian atmospheres, with potential application to slowly rotating planets such as Titan. Simulations using planetMPAS show that the new model can reproduce many of the observed features on Mars and Jupiter, such as the seasonal CO2 cycle, polar argon enrichment, zonal mean temperature, and qualitative dust opacity on Mars, as well as the equatorial superrotation and banded zonal wind patterns on Jupiter. Comment: Manuscript has 61 pages, 20 figures, 2 tables, submitted to Planetary and Space Science

    GAMER: a GPU-Accelerated Adaptive Mesh Refinement Code for Astrophysics

    We present the newly developed code GAMER (GPU-accelerated Adaptive MEsh Refinement code), which adopts a novel approach to improving the performance of adaptive mesh refinement (AMR) astrophysical simulations by a large factor using the graphics processing unit (GPU). The AMR implementation is based on a hierarchy of grid patches with an oct-tree data structure. We adopt a three-dimensional relaxing TVD scheme for the hydrodynamic solver, and a multi-level relaxation scheme for the Poisson solver. Both solvers have been implemented on the GPU, so that hundreds of patches can be advanced in parallel. The computational overhead associated with data transfer between the CPU and GPU is carefully reduced by using the GPU's capability for asynchronous memory copies, and the time spent computing the ghost-zone values for each patch is hidden by overlapping it with the GPU computations. We demonstrate the accuracy of the code by performing several standard test problems in astrophysics. GAMER is a parallel code that can be run on a multi-GPU cluster system. We measure the performance of the code by performing purely baryonic cosmological simulations on different hardware configurations, in which detailed timing analyses compare the computations with and without GPU acceleration. Maximum speed-up factors of 12.19 and 10.47 are demonstrated using 1 GPU with 4096^3 effective resolution and 16 GPUs with 8192^3 effective resolution, respectively. Comment: 60 pages, 22 figures, 3 tables. More accuracy tests are included. Accepted for publication in ApJ

    3D simulation of magneto-mechanical coupling in MRI scanners using high order FEM and POD

    Magnetic Resonance Imaging (MRI) scanners have become an essential tool in the medical industry due to their ability to produce high resolution images of the human body. To generate an image of the body, MRI scanners combine strong static magnetic fields with transient gradient magnetic fields. The interaction of these magnetic fields with the conducting components present in superconducting MRI scanners gives rise to an important problem in the design of new MRI scanners. The transient magnetic fields induce eddy currents in conducting components. These eddy currents, in turn, result in electromagnetic stresses, which cause the conducting components to deform and vibrate. The vibrations are undesirable as they lead to a deterioration in image quality (with image artefacts) and to the generation of noise, which can cause patient discomfort. The eddy currents, in addition, lead to heat being dissipated and deposited into the cryostat, which is filled with helium in order to maintain the coils in a superconducting state. This deposition of heat can cause helium boil off and potentially result in a costly magnet quench. Understanding the mechanisms involved in the generation of these vibrations and the heat being deposited into the cryostat is, therefore, key for a successful MRI scanner design. This involves the solution of a coupled magneto-mechanical problem, which is the focus of this work. In this thesis, a new computational methodology for the solution of three-dimensional (3D) magneto-mechanical coupled problems with application to MRI scanner design is presented. To achieve this, first an accurate mathematical description of the magneto-mechanical coupling is presented, which is based on a Lagrangian formulation and the assumption of small displacements. Then, the problem is linearised using an AC-DC splitting of the fields, and a variational formulation for the solution of the linearised problem in a time-harmonic setting is presented. The problem is then discretised using high order finite elements, where a combination of hierarchical H1 and H(curl) basis functions is used. An efficient staggered algorithm for the solution of the coupled system is proposed, which combines the DC and AC stages and makes use of preconditioned iterative solvers when appropriate. This finite element methodology is then applied to a set of challenging academic and industrially relevant problems in order to demonstrate its accuracy and efficiency. However, in the design stage of a new MRI scanner, this coupled problem must be solved repeatedly for varying model parameters such as frequency or material properties. Thus, even if an efficient finite element solver is available for the solution of the coupled problem, the need for these repeated simulations results in a bottleneck in terms of computational cost, which leads to an increase in design time and its associated financial implications. Therefore, in order to optimise this process, the application of Reduced Order Modelling (ROM) techniques is considered. A ROM based on the Proper Orthogonal Decomposition (POD) method is presented and applied to a series of challenging MRI configurations. The accuracy and efficiency of this ROM are demonstrated by comparisons against the full order (high fidelity) finite element software, showing a substantial computational speed-up, which has major benefits in the optimisation of the design process of new MRI scanners.
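The POD step mentioned in the abstract above can be sketched generically: full-order solutions computed for a few parameter values are collected as columns of a snapshot matrix, and a truncated singular value decomposition yields a small basis for later solves. The snippet below is a minimal sketch under stated assumptions, not the thesis software: the function names `pod_basis` and `project`, the energy threshold, and the synthetic two-shape snapshot data are hypothetical and stand in for finite element solutions.

```python
import numpy as np

def pod_basis(snapshots, energy=0.9999):
    """Truncated POD basis of a snapshot matrix of shape (n_dof, n_snapshots)."""
    # Left singular vectors are the POD modes; singular values rank their energy.
    modes, sigma, _ = np.linalg.svd(snapshots, full_matrices=False)
    cumulative = np.cumsum(sigma**2) / np.sum(sigma**2)
    # Keep the smallest number of modes that captures the requested energy share.
    rank = min(int(np.searchsorted(cumulative, energy)) + 1, sigma.size)
    return modes[:, :rank]

def project(basis, field):
    """Orthogonal projection of a full-order field onto the span of the basis."""
    return basis @ (basis.T @ field)

# Hypothetical snapshots: two spatial shapes with parameter-dependent amplitudes.
x = np.linspace(0.0, 1.0, 50)
shape_a, shape_b = np.sin(np.pi * x), np.sin(2.0 * np.pi * x)
mu = np.linspace(1.0, 2.0, 8)
snapshots = np.outer(shape_a, mu) + np.outer(shape_b, 1.0 / mu)

basis = pod_basis(snapshots)
# A field for an unseen parameter combination, built from the same two shapes.
unseen = 1.5 * shape_a - 0.7 * shape_b
error = np.linalg.norm(project(basis, unseen) - unseen)
```

In this synthetic case two modes capture all eight snapshots, so the unseen field is reproduced to round-off. In a real reduced order model the basis would instead be used to project the governing system, so that each new parameter value requires only a small dense solve.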

    Center for Aeronautics and Space Information Sciences

    This report summarizes the research done during 1991/92 under the Center for Aeronautics and Space Information Science (CASIS) program. The topics covered are computer architecture, networking, and neural nets.