3,327 research outputs found
A GPU-accelerated Direct-sum Boundary Integral Poisson-Boltzmann Solver
In this paper, we present a GPU-accelerated direct-sum boundary integral
method to solve the linear Poisson-Boltzmann (PB) equation. In our method, a
well-posed boundary integral formulation is used to ensure the fast convergence
of Krylov subspace based linear algebraic solver such as the GMRES. The
molecular surfaces are discretized with flat triangles and centroid
collocation. To speed up our method, we take advantage of the parallel nature
of the boundary integral formulation and parallelize the schemes within CUDA
shared memory architecture on GPU. The schemes use only
size-of-double device memory for a biomolecule with triangular surface
elements and partial charges. Numerical tests of these schemes show
well-maintained accuracy and fast convergence. The GPU implementation using one
GPU card (Nvidia Tesla M2070) achieves 120-150X speed-up to the implementation
using one CPU (Intel L5640 2.27GHz). With our approach, solving PB equations on
well-discretized molecular surfaces with up to 300,000 boundary elements will
take less than about 10 minutes, hence our approach is particularly suitable
for fast electrostatics computations on small to medium biomolecules
A pseudospectral matrix method for time-dependent tensor fields on a spherical shell
We construct a pseudospectral method for the solution of time-dependent,
non-linear partial differential equations on a three-dimensional spherical
shell. The problem we address is the treatment of tensor fields on the sphere.
As a test case we consider the evolution of a single black hole in numerical
general relativity. A natural strategy would be the expansion in tensor
spherical harmonics in spherical coordinates. Instead, we consider the simpler
and potentially more efficient possibility of a double Fourier expansion on the
sphere for tensors in Cartesian coordinates. As usual for the double Fourier
method, we employ a filter to address time-step limitations and certain
stability issues. We find that a tensor filter based on spin-weighted spherical
harmonics is successful, while two simplified, non-spin-weighted filters do not
lead to stable evolutions. The derivatives and the filter are implemented by
matrix multiplication for efficiency. A key technical point is the construction
of a matrix multiplication method for the spin-weighted spherical harmonic
filter. As example for the efficient parallelization of the double Fourier,
spin-weighted filter method we discuss an implementation on a GPU, which
achieves a speed-up of up to a factor of 20 compared to a single core CPU
implementation.Comment: 33 pages, 9 figure
SCALABLE INTEGRATED CIRCUIT SIMULATION ALGORITHMS FOR ENERGY-EFFICIENT TERAFLOP HETEROGENEOUS PARALLEL COMPUTING PLATFORMS
Integrated circuit technology has gone through several decades of aggressive scaling.It is increasingly challenging to analyze growing design complexity. Post-layout SPICE simulation can be computationally prohibitive due to the huge amount of parasitic elements, which can easily boost the computation and memory cost. As the decrease in device size, the circuits become more vulnerable to process variations. Designers need to statistically simulate the probability that a circuit does not meet the performance metric, which requires millions times of simulations to capture rare failure events.
Recent, multiprocessors with heterogeneous architecture have emerged as mainstream computing platforms. The heterogeneous computing platform can achieve highthroughput energy efficient computing. However, the application of such platform is not trivial and needs to reinvent existing algorithms to fully utilize the computing resources. This dissertation presents several new algorithms to address those aforementioned two significant and challenging issues on the heterogeneous platform.
Harmonic Balance (HB) analysis is essential for efficient verification of large postlayout RF and microwave integrated circuits (ICs). However, existing methods either suffer from excessively long simulation time and prohibitively large memory consumption or exhibit poor stability. This dissertation introduces a novel transient-simulation guided graph sparsification technique, as well as an efficient runtime performance modeling approach tailored for heterogeneous manycore CPU-GPU computing system to build nearly-optimal subgraph preconditioners that can lead to minimum HB simulation runtime. Additionally, we propose a novel heterogeneous parallel sparse block matrix algorithm by taking advantages of the structure of HB Jacobian matrices as well as GPU’s streaming multiprocessors to achieve optimal workload balancing during the preconditioning phase of HB analysis. We also show how the proposed preconditioned iterative algorithm can efficiently adapt to heterogeneous computing systems with different CPU and GPU computing capabilities. Extensive experimental results show that our HB solver can achieve up to 20X speedups and 5X memory reduction when compared with the state-of-the-art direct solver highly optimized for twelve-core CPUs.
In nowadays variation-aware IC designs, cell characterizations and SRAM memory yield analysis require many thousands or even millions of repeated SPICE simulations for relatively small nonlinear circuits. In this dissertation, for the first time, we present a massively parallel SPICE simulator on GPU, TinySPICE, for efficiently analyzing small nonlinear circuits. TinySPICE integrates a highly-optimized shared-memory based matrix solver and fast parametric three-dimensional (3D) LUTs based device evaluation method. A novel circuit clustering method is also proposed to improve the stability and efficiency of the matrix solver. Compared with CPU-based SPICE simulator, TinySPICE achieves up to 264X speedups for parametric SRAM yield analysis without loss of accuracy
- …