Parallel Algorithms for Time and Frequency Domain Circuit Simulation
As one of the most critical forms of pre-silicon verification, transistor-level circuit simulation
is an indispensable step before committing to an expensive manufacturing process.
However, considering the nature of circuit simulation, it can be computationally
expensive, especially for ever-larger transistor circuits with more complex device models.
Therefore, it is becoming increasingly desirable to accelerate circuit simulation.
On the other hand, the emergence of multi-core machines, alongside the established
use of distributed-memory clustered computing platforms, offers a promising way to
accelerate circuit simulation by providing abundant hardware computing resources. This
research addresses the limitations of traditional serial circuit simulations and proposes
new techniques for both time-domain and frequency-domain parallel circuit
simulations.
For time-domain simulation, this dissertation presents a parallel transient simulation
methodology. This new approach, called WavePipe, exploits coarse-grained
application-level parallelism by simultaneously computing circuit solutions at multiple
adjacent time points in a way resembling hardware pipelining. There are two
embodiments in WavePipe: backward and forward pipelining schemes. While the
former creates independent computing tasks that contribute to a larger future time
step, the latter performs predictive computing along the forward direction. Unlike
existing relaxation methods, WavePipe facilitates parallel circuit simulation without
jeopardizing convergence and accuracy. As a coarse-grained parallel approach, it requires
little parallel programming effort. Furthermore, it creates new avenues for fully
utilizing increasingly parallel hardware by going beyond conventional finer-grained
parallel device model evaluation and matrix solutions.
This dissertation also exploits the recently developed explicit telescopic projective
integration method for efficient parallel transient circuit simulation by addressing the
stability limitation of explicit numerical integration. The new method allows the
effective time step to be controlled by the accuracy requirement instead of the stability limitation.
Therefore, it not only leads to noticeable efficiency improvement, but also lends itself
to straightforward parallelization due to its explicit nature.
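The idea of letting accuracy rather than stability set the effective step can be illustrated with a minimal single-level projective forward Euler sketch. This is not the dissertation's telescopic scheme or its circuit equations: the two-mode test system, the function names, and the parameters h, k and m are all assumptions chosen for illustration. A few small explicit steps damp the fast mode, and a larger extrapolation step then advances the slow mode.

```python
import math

def rhs(y):
    # Hypothetical stiff test system (an assumption, not a circuit model):
    # a slow mode decaying at rate 1 and a fast mode decaying at rate 1000.
    return [-1.0 * y[0], -1000.0 * y[1]]

def euler_step(y, h):
    # One explicit (forward) Euler step of size h.
    return [yi + h * di for yi, di in zip(y, rhs(y))]

def projective_forward_euler(y0, h, k, m, n_outer):
    """Single-level projective integration.

    Each outer iteration takes k + 1 inner Euler steps of size h, then
    extrapolates over a further m * h along the chord through the last
    two inner iterates, so it advances time by (k + 1 + m) * h.
    """
    y = list(y0)
    for _outer in range(n_outer):
        for _inner in range(k):
            y = euler_step(y, h)
        y_next = euler_step(y, h)
        # Projective (extrapolation) step: y_next + (m * h) * (y_next - y) / h.
        y = [b + m * (b - a) for a, b in zip(y, y_next)]
    return y

# 100 outer iterations of 12 * h each reach t = 1.2.
y_end = projective_forward_euler([1.0, 1.0], h=1e-3, k=2, m=9, n_outer=100)
exact_slow = math.exp(-1.2)
print(y_end, exact_slow)
```

With these assumed parameters each outer iteration covers 0.012 time units, whereas plain forward Euler on the fast mode is only stable for steps below 2/1000 = 0.002. The inner steps and the extrapolation are all explicit evaluations, which is what makes schemes of this kind straightforward to parallelize.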
For frequency-domain simulation, this dissertation presents a parallel harmonic
balance approach, applicable to the steady-state and envelope-following analyses of
both driven and autonomous circuits. The new approach is centered on a naturally-parallelizable
preconditioning technique that speeds up the core computation in harmonic
balance based analysis. The proposed method facilitates parallel computing
via the use of domain knowledge and simplifies parallel programming compared with
fine-grained strategies. As a result, favorable runtime speedups are achieved.
Parallel VLSI Circuit Analysis and Optimization
The prevalence of multi-core processors in recent years has introduced new
opportunities and challenges to Electronic Design Automation (EDA) research and
development. In this dissertation, a few parallel Very Large Scale Integration (VLSI)
circuit analysis and optimization methods which utilize the multi-core computing
platform to tackle some of the most difficult contemporary Computer-Aided Design
(CAD) problems are presented. The first CAD application that is addressed
in this dissertation is the analysis and optimization of mesh-based clock distribution networks.
A mesh-based clock distribution network (also known as a clock mesh) is used in
high-performance microprocessor designs as a reliable way of distributing clock signals
to the entire chip. The second CAD application addressed in this dissertation
is the Simulation Program with Integrated Circuit Emphasis (SPICE) like circuit
simulation. SPICE simulation is often regarded as the bottleneck of the design flow.
Recently, parallel circuit simulation has attracted a lot of attention.
The first part of the dissertation discusses circuit analysis techniques. First, a
combination of clock network specific model order reduction algorithm and a port sliding
scheme is presented to tackle the challenges in analyzing large clock meshes with
a large number of clock drivers. Our techniques run much faster than the standard
SPICE simulation and existing model order reduction techniques. They also provide
a basis for the clock mesh optimization. Then, a hierarchical multi-algorithm parallel
circuit simulation (HMAPS) framework is presented as a novel technique for parallel
circuit simulation. The inter-algorithm parallelism approach in HMAPS is completely
different from the existing intra-algorithm parallel circuit simulation techniques and
achieves superlinear speedup in practice. The second part of the dissertation talks
about parallel circuit optimization. A modified asynchronous parallel pattern search
(APPS) based method which utilizes the efficient clock mesh simulation techniques for
the clock driver size optimization problem is presented. Our modified APPS method
runs much faster than a continuous optimization method and effectively reduces the
clock skew for all test circuits. The third part of the dissertation describes parallel
performance modeling and optimization of the HMAPS framework. The performance
models and runtime optimization scheme further improve the speed of HMAPS.
The dynamically adapted HMAPS becomes a complete solution for parallel circuit
simulation.
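The pattern search mentioned in the abstract above can be sketched in its simplest serial form. This is a plain synchronous compass search, not the modified asynchronous parallel pattern search (APPS) of the dissertation; the quadratic objective stands in for a clock-skew cost and, like all names here, is an assumption. In an asynchronous parallel variant the poll points would be evaluated concurrently rather than in a loop.

```python
def pattern_search(objective, x0, step=1.0, tol=1e-6, max_iter=10000):
    """Derivative-free compass search: poll +/- step along each coordinate."""
    x = list(x0)
    fx = objective(x)
    for _ in range(max_iter):
        if step < tol:
            break
        best_x, best_f = None, fx
        # Poll step: each trial point is independent of the others,
        # which is what a parallel variant would exploit.
        for i in range(len(x)):
            for sign in (1.0, -1.0):
                trial = list(x)
                trial[i] += sign * step
                f_trial = objective(trial)
                if f_trial < best_f:
                    best_x, best_f = trial, f_trial
        if best_x is None:
            step *= 0.5  # no improving point found: contract the pattern
        else:
            x, fx = best_x, best_f  # move to the best poll point
    return x, fx

def toy_cost(sizes):
    # Hypothetical stand-in for a skew objective over two driver sizes.
    return (sizes[0] - 1.0) ** 2 + (sizes[1] + 2.0) ** 2

best_sizes, best_cost = pattern_search(toy_cost, [0.0, 0.0])
print(best_sizes, best_cost)
```

Because the method needs only objective values, each evaluation could be a full clock mesh simulation without any change to the search logic.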
GRChombo: Numerical Relativity with Adaptive Mesh Refinement
In this work, we introduce GRChombo: a new numerical relativity code which
incorporates full adaptive mesh refinement (AMR) using block structured
Berger-Rigoutsos grid generation. The code supports non-trivial
"many-boxes-in-many-boxes" mesh hierarchies and massive parallelism through the
Message Passing Interface (MPI). GRChombo evolves the Einstein equation using
the standard BSSN formalism, with an option to turn on CCZ4 constraint damping
if required. The AMR capability permits the study of a range of new physics
which has previously been computationally infeasible in a full 3+1 setting,
whilst also significantly simplifying the process of setting up the mesh for
these problems. We show that GRChombo can stably and accurately evolve standard
spacetimes such as binary black hole mergers and scalar collapses into black
holes, demonstrate the performance characteristics of our code, and discuss
various physics problems which stand to benefit from the AMR technique.
Comment: 48 pages, 24 figures
Unstructured Grid Dynamical Modeling of Planetary Atmospheres using planetMPAS: The Influence of the Rigid Lid, Computational Efficiency, and Examples of Martian and Jovian Application
We present a new planetary global circulation model, planetMPAS, based on the
state-of-the-art NCAR MPAS General Circulation Model. Taking advantage of the
cross compatibility between WRF and MPAS, planetMPAS includes most of the
planetWRF physics parameterization schemes for terrestrial planets such as Mars
and Titan. PlanetMPAS also includes a set of physics that represents radiative
transfer, dry convection, moist convection and its associated microphysics for
the Jovian atmosphere. We demonstrate that, despite the rigid-lid
approximation, planetMPAS is suitable to simulate the climate systems in
Martian and Jovian atmospheres with potential application to slow-rotating
planets such as Titan. Simulations using planetMPAS show that the new model can
reproduce many aspects of the observed features on Mars and Jupiter, such as
the seasonal CO2 cycle, polar argon enrichment, zonal mean temperature, and
qualitative dust opacity on Mars, as well as the equatorial superrotation and
banded zonal wind patterns on Jupiter.
Comment: Manuscript has 61 pages, 20 figures, 2 tables, submitted to Planetary
and Space Science
GAMER: a GPU-Accelerated Adaptive Mesh Refinement Code for Astrophysics
We present the newly developed code, GAMER (GPU-accelerated Adaptive MEsh
Refinement code), which has adopted a novel approach to improve the performance
of adaptive mesh refinement (AMR) astrophysical simulations by a large factor
with the use of the graphics processing unit (GPU). The AMR implementation is
based on a hierarchy of grid patches with an oct-tree data structure. We adopt
a three-dimensional relaxing TVD scheme for the hydrodynamic solver, and a
multi-level relaxation scheme for the Poisson solver. Both solvers have been
implemented on the GPU, so that hundreds of patches can be advanced in parallel.
The computational overhead associated with the data transfer between CPU and
GPU is carefully reduced by using asynchronous memory copies, and the time
spent computing the ghost-zone values for each patch is hidden by overlapping
it with the GPU computations. We demonstrate
the accuracy of the code by performing several standard test problems in
astrophysics. GAMER is a parallel code that can be run in a multi-GPU cluster
system. We measure the performance of the code by performing purely-baryonic
cosmological simulations in different hardware implementations, in which
detailed timing analyses provide comparison between the computations with and
without GPU(s) acceleration. Maximum speed-up factors of 12.19 and 10.47 are
demonstrated using 1 GPU with 4096^3 effective resolution and 16 GPUs with
8192^3 effective resolution, respectively.
Comment: 60 pages, 22 figures, 3 tables. More accuracy tests are included.
Accepted for publication in ApJ
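As a rough illustration of what a relaxation scheme for a Poisson solver does, the sketch below applies Jacobi relaxation to a one-dimensional model problem. It is a single-level CPU sketch, not GAMER's multi-level GPU solver; the model equation -u'' = f with zero boundary values, the grid size, and the function name are assumptions. Every interior point is updated only from the previous sweep's values, so the updates within a sweep are independent and could be performed in parallel.

```python
def jacobi_poisson_1d(rhs, h, sweeps):
    """Relax -u'' = rhs on a uniform grid of spacing h with u = 0 at both ends."""
    n = len(rhs)
    u = [0.0] * n
    for _ in range(sweeps):
        new = u[:]  # boundary values stay at zero
        for i in range(1, n - 1):
            # Second-order stencil (-u[i-1] + 2 u[i] - u[i+1]) / h^2 = rhs[i],
            # solved for u[i] using the previous sweep's neighbours.
            new[i] = 0.5 * (u[i - 1] + u[i + 1] + h * h * rhs[i])
        u = new
    return u

# Model problem: -u'' = 2 on [0, 1], whose exact solution is u(x) = x (1 - x).
n_points = 17
h = 1.0 / (n_points - 1)
u = jacobi_poisson_1d([2.0] * n_points, h, sweeps=5000)
max_error = max(abs(u[i] - (i * h) * (1.0 - i * h)) for i in range(n_points))
print(max_error)
```

Plain relaxation on a single grid converges slowly as the grid is refined, which is one reason multi-level schemes are used in practice.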
3D simulation of magneto-mechanical coupling in MRI scanners using high order FEM and POD
Magnetic Resonance Imaging (MRI) scanners have become an essential tool in the medical industry due to their ability to produce high resolution images of the human body. To generate an image of the body, MRI scanners combine strong static magnetic fields with transient gradient magnetic fields. The interaction of these magnetic fields with the conducting components present in superconducting MRI scanners gives rise to an important problem in the design of new MRI scanners. The transient magnetic fields give rise to the appearance of eddy currents in conducting components. These eddy currents, in turn, result in electromagnetic stresses, which cause the conducting components to deform and vibrate. The vibrations are undesirable as they lead to a deterioration in image quality (with image artefacts) and to the generation of noise, which can cause patient discomfort. The eddy currents, in addition, lead to heat being dissipated and deposited into the cryostat, which is filled with helium in order to maintain the coils in a superconducting state. This deposition of heat can cause helium boil off and potentially result in a costly magnet quench. Understanding the mechanisms involved in the generation of these vibrations and the heat being deposited into the cryostat is, therefore, key for a successful MRI scanner design. This involves the solution of a coupled magneto-mechanical problem, which is the focus of this work.
In this thesis, a new computational methodology for the solution of three-dimensional (3D) magneto-mechanical coupled problems with application to MRI scanner design is presented. To achieve this, first an accurate mathematical description of the magneto-mechanical coupling is presented, which is based on a Lagrangian formulation and the assumption of small displacements. Then, the problem is linearised using an AC-DC splitting of the fields, and a variational formulation for the solution of the linearised problem in a time-harmonic setting is presented. The problem is then discretised using high order finite elements, where a combination of hierarchical H1 and H(curl) basis functions is used. An efficient staggered algorithm for the solution of the coupled system is proposed, which combines the DC and AC stages and makes use of preconditioned iterative solvers when appropriate. This finite element methodology is then applied to a set of challenging academic and industrially relevant problems in order to demonstrate its accuracy and efficiency.
This finite element methodology results in the accurate and efficient solution of the magneto-mechanical problem of interest. However, in the design stage of a new MRI scanner, this coupled problem must be solved repeatedly for varying model parameters such as frequency or material properties. Thus, even if an efficient finite element solver is available for the solution of the coupled problem, the need for these repeated simulations results in a bottleneck in terms of computational cost, which leads to an increase in design time and its associated financial implications. Therefore, in order to optimise this process, the application of Reduced Order Modelling (ROM) techniques is considered. A ROM based on the Proper Orthogonal Decomposition (POD) method is presented and applied to a series of challenging MRI configurations. The accuracy and efficiency of this ROM is demonstrated by performing comparisons against the full order or high fidelity finite element software, showing great performance in terms of computational speed-up, which has major benefits in the optimisation of the design process of new MRI scanners.
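The core of a POD-based reduced order model can be sketched as extracting a dominant mode from a set of full-order solution snapshots and projecting new fields onto it. The sketch below is a minimal one-mode illustration in pure Python using the method of snapshots with power iteration; it is not the thesis's implementation, and the function names and toy snapshot data are assumptions. It also assumes the snapshots are not all zero. A practical ROM would typically retain several modes, usually obtained from a singular value decomposition.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def dominant_pod_mode(snapshots, iters=200):
    """Return the leading POD mode of equally sized snapshot vectors."""
    m = len(snapshots)
    # Method of snapshots: small m x m correlation matrix C[i][j] = <s_i, s_j>.
    corr = [[dot(si, sj) for sj in snapshots] for si in snapshots]
    v = [1.0] * m
    # Power iteration for the dominant eigenvector of the correlation matrix.
    for _ in range(iters):
        w = [dot(row, v) for row in corr]
        norm = math.sqrt(dot(w, w))
        v = [x / norm for x in w]
    # Lift the eigenvector back to the full space and normalise it.
    size = len(snapshots[0])
    mode = [sum(v[i] * snapshots[i][k] for i in range(m)) for k in range(size)]
    norm = math.sqrt(dot(mode, mode))
    return [x / norm for x in mode]

def project(mode, field):
    """Approximate a full-order field by its component along the mode."""
    coeff = dot(mode, field)
    return [coeff * x for x in mode]

# Toy snapshots (hypothetical): one spatial shape sampled at three parameter values.
snapshots = [[1.0, 2.0, 2.0], [2.0, 4.0, 4.0], [3.0, 6.0, 6.0]]
mode = dominant_pod_mode(snapshots)
approx = project(mode, [2.0, 4.0, 4.0])
print(mode, approx)
```

Once the modes are computed offline, each new parameter value requires only a small reduced problem for the mode coefficients rather than a full finite element solve, which is where the speed-up in repeated design evaluations comes from.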
Center for Aeronautics and Space Information Sciences
This report summarizes the research done during 1991/92 under the Center for Aeronautics and Space Information Science (CASIS) program. The topics covered are computer architecture, networking, and neural nets.