1,764 research outputs found
Enhancing speed and scalability of the ParFlow simulation code
Regional hydrology studies are often supported by high resolution simulations
of subsurface flow that require expensive and extensive computations. Efficient
usage of the latest high performance parallel computing systems becomes a
necessity. The simulation software ParFlow has been demonstrated to meet this
requirement and shown to have excellent solver scalability for up to 16,384
processes. In the present work we show that the code requires further
enhancements in order to fully take advantage of current petascale machines. We
identify ParFlow's way of parallelization of the computational mesh as a
central bottleneck. We propose to reorganize this subsystem using fast mesh
partition algorithms provided by the parallel adaptive mesh refinement library
p4est. We realize this in a minimally invasive manner by modifying selected
parts of the code to reinterpret the existing mesh data structures. We evaluate
the scaling performance of the modified version of ParFlow, demonstrating good
weak and strong scaling up to 458k cores of the Juqueen supercomputer, and test
an example application at large scale.
Comment: The final publication is available at link.springer.co
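The weak and strong scaling results mentioned in this abstract are usually reported as parallel efficiencies. A minimal sketch of those two metrics follows. The function names and the timings in the usage lines are hypothetical and are not taken from the ParFlow study.

```python
def strong_scaling_efficiency(t_base, p_base, t, p):
    """Strong scaling: total problem size is fixed, so the ideal
    runtime shrinks in proportion to p_base / p."""
    return (t_base * p_base) / (t * p)


def weak_scaling_efficiency(t_base, t):
    """Weak scaling: work per process is fixed, so the ideal
    runtime stays equal to the baseline runtime."""
    return t_base / t


# Hypothetical timings in seconds, for illustration only.
strong = strong_scaling_efficiency(t_base=100.0, p_base=1024, t=30.0, p=4096)
weak = weak_scaling_efficiency(t_base=10.0, t=12.5)
print(f"strong: {strong:.3f}, weak: {weak:.3f}")
```

An efficiency close to 1 means the run tracks ideal scaling; values well below 1 point to communication or setup costs that grow with the process count.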
Parallel Algorithms for Time and Frequency Domain Circuit Simulation
As one of the most critical forms of pre-silicon verification, transistor-level circuit simulation
is an indispensable step before committing to an expensive manufacturing process.
However, considering the nature of circuit simulation, it can be computationally
expensive, especially for ever-larger transistor circuits with more complex device models.
Therefore, it is becoming increasingly desirable to accelerate circuit simulation.
At the same time, the emergence of multi-core machines, alongside the established use
of distributed-memory cluster platforms, provides abundant hardware computing
resources and offers a promising way to speed up circuit simulation. This
research addresses the limitations of traditional serial circuit simulations and proposes
new techniques for both time-domain and frequency-domain parallel circuit
simulations.
For time-domain simulation, this dissertation presents a parallel transient simulation
methodology. This new approach, called WavePipe, exploits coarse-grained
application-level parallelism by simultaneously computing circuit solutions at multiple
adjacent time points in a way resembling hardware pipelining. There are two
embodiments in WavePipe: backward and forward pipelining schemes. While the
former creates independent computing tasks that contribute to a larger future time
step, the latter performs predictive computing along the forward direction. Unlike
existing relaxation methods, WavePipe facilitates parallel circuit simulation without
jeopardizing convergence or accuracy. As a coarse-grained parallel approach, it requires
little parallel programming effort. Furthermore, it creates new avenues for fully
utilizing increasingly parallel hardware by going beyond conventional finer-grained
parallel device model evaluation and matrix solution.
This dissertation also exploits the recently developed explicit telescopic projective
integration method for efficient parallel transient circuit simulation by addressing the
stability limitation of explicit numerical integration. The new method allows the
effective time step to be controlled by accuracy requirements rather than stability limits.
Therefore, it not only leads to noticeable efficiency improvement, but also lends itself
to straightforward parallelization due to its explicit nature.
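The idea behind projective integration can be sketched with a single-level projective forward Euler scheme: a few small explicit steps damp the fast dynamics, and the solution is then extrapolated over a much larger interval. This is only an illustrative sketch under stated assumptions. It is not the dissertation's telescopic (multi-level) method, and the function name, parameters, and test problem are hypothetical.

```python
import math


def projective_forward_euler(f, y0, h, k, M, n_outer):
    """Single-level projective forward Euler for dy/dt = f(t, y).

    Each outer iteration takes k + 1 small forward Euler steps of
    size h, then extrapolates over M * h using the slope estimated
    from the last two inner iterates.
    """
    t, y = 0.0, y0
    for _ in range(n_outer):
        y_prev = y
        for _ in range(k + 1):
            y_prev = y
            y = y + h * f(t, y)
            t += h
        # Projective step: linear extrapolation over M small-step widths.
        y = y + M * (y - y_prev)
        t += M * h
    return t, y


# Illustrative test problem dy/dt = -y with exact solution exp(-t).
t_end, y_end = projective_forward_euler(lambda t, y: -y, 1.0,
                                        h=0.01, k=2, M=7, n_outer=10)
print(t_end, y_end, math.exp(-t_end))
```

Here each outer iteration advances time by (k + 1 + M) * h while evaluating f only k + 1 times, which is where the efficiency gain of the approach comes from.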
For frequency-domain simulation, this dissertation presents a parallel harmonic
balance approach, applicable to the steady-state and envelope-following analyses of
both driven and autonomous circuits. The new approach is centered on a naturally-parallelizable
preconditioning technique that speeds up the core computation in harmonic
balance based analysis. The proposed method facilitates parallel computing
via the use of domain knowledge and simplifies parallel programming compared with
fine-grained strategies. As a result, favorable runtime speedups are achieved.
Scalability of broadcast performance in wireless network-on-chip
Networks-on-Chip (NoCs) are currently the paradigm of choice to interconnect the cores of a chip multiprocessor. However, conventional NoCs may not suffice to fulfill the on-chip communication requirements of processors with hundreds or thousands of cores. The main reason is that the performance of such networks drops as the number of cores grows, especially in the presence of multicast and broadcast traffic. This not only limits the scalability of current multiprocessor architectures, but also sets a performance wall that prevents the development of architectures that generate moderate-to-high levels of multicast. In this paper, a Wireless Network-on-Chip (WNoC) where all cores share a single broadband channel is presented. Such a design is conceived to provide low latency and ordered delivery for multicast/broadcast traffic, in an attempt to complement a wireline NoC that will transport the rest of the communication flows. To assess the feasibility of this approach, the network performance of WNoC is analyzed as a function of the system size and the channel capacity, and then compared to that of wireline NoCs with embedded multicast support. Based on this evaluation, preliminary results on the potential performance of the proposed hybrid scheme are provided, together with guidelines for the design of MAC protocols for WNoC.
Peer Reviewed. Postprint (published version)
Performance of a finite volume CEM code on multicomputers
Peer Reviewed
http://deepblue.lib.umich.edu/bitstream/2027.42/77161/1/AIAA-1994-236-711.pd
High-order, Dispersionless "Fast-Hybrid" Wave Equation Solver. Part I: Sampling Cost via Incident-Field Windowing and Recentering
This paper proposes a frequency/time hybrid integral-equation method for the
time dependent wave equation in two and three-dimensional spatial domains.
Relying on Fourier Transformation in time, the method utilizes a fixed
(time-independent) number of frequency-domain integral-equation solutions to
evaluate, with superalgebraically-small errors, time domain solutions for
arbitrarily long times. The approach relies on two main elements, namely, 1) A
smooth time-windowing methodology that enables accurate band-limited
representations for arbitrarily-long time signals, and 2) A novel Fourier
transform approach which, in a time-parallel manner and without causing
spurious periodicity effects, delivers numerically dispersionless
spectrally-accurate solutions. A similar hybrid technique can be obtained on
the basis of Laplace transforms instead of Fourier transforms, but we do not
consider the Laplace-based method in the present contribution. The algorithm
can handle dispersive media, it can tackle complex physical structures, it
enables parallelization in time in a straightforward manner, and it allows for
time leaping---that is, solution sampling at any given time $T$ at
$\mathcal{O}(1)$-bounded sampling cost, for arbitrarily large values of $T$,
and without requirement of evaluation of the solution at intermediate times.
The proposed frequency-time hybridization strategy, which generalizes to any
linear partial differential equation in the time domain for which
frequency-domain solutions can be obtained (including e.g. the time-domain
Maxwell equations), and which is applicable in a wide range of scientific and
engineering contexts, provides significant advantages over other available
alternatives such as volumetric discretization, time-domain integral equations,
and convolution-quadrature approaches.
Comment: 33 pages, 8 figures, revised and extended manuscript (and now
including direct comparisons to existing CQ and TDIE solver implementations)
(Part I of II)
Scalable parallel simulation of variably saturated flow
In this thesis we develop highly accurate simulation tools for variably saturated flow through porous media able to take advantage of the latest supercomputing resources. Hence, we aim for parallel scalability to very large compute resources of over 10^5 CPU cores. Our starting point is the parallel subsurface flow simulator ParFlow. This library is of widespread use in the hydrology community and known to have excellent parallel scalability up to 16k processes. We first investigate the numerical tools this library implements in order to perform the simulations it was designed for. ParFlow solves the governing equation for subsurface flow with a cell centered finite difference (FD) method. The code targets high performance computing (HPC) systems by means of distributed memory parallelism. We propose to reorganize ParFlow's mesh subsystem by using fast partitioning algorithms provided by the parallel adaptive mesh refinement (AMR) library p4est. We realize this in a minimally invasive manner by modifying selected parts of the code to reinterpret the existing mesh data structures. Furthermore, we evaluate the scaling performance of the modified version of ParFlow, demonstrating excellent weak and strong scaling up to 458k cores of the Juqueen supercomputer at the Jülich Supercomputing Centre. The above mentioned results were obtained for uniform meshes and hence without explicitly exploiting the AMR capabilities of the p4est library. A natural extension of our work is to activate such functionality and make ParFlow a true AMR application. Enabling ParFlow to use AMR is challenging for several reasons: the code may be based on assumptions on the parallel partition that cannot be maintained with AMR, it may use mesh-related metadata that is replicated on all CPUs, and it may assume uniform meshes in the construction of mathematical operators. Additionally, the use of locally refined meshes will certainly change the spectral properties of these operators.
In this work, we develop an algorithmic approach to activate the usage of locally refined grids in ParFlow. AMR allows meshes where elements of different size neighbor each other. In this case, ParFlow may produce erroneous results when it attempts to communicate data across inter-element boundaries. We propose and discuss two solutions to this issue operating at two different levels: the first manipulates the indices of the degrees of freedom, while the second operates directly on the degrees of freedom. Both approaches aim to introduce minimal changes to the original ParFlow code. In an AMR framework, the FD method used by ParFlow will require modifications to correctly deal with elements of different size. Mixed finite elements (MFE) are, on the other hand, better suited for the usage of AMR. It is known that the cell centered FD method used in ParFlow might be reinterpreted as an MFE discretization using Raviart-Thomas elements of lowest order. We conclude this thesis by presenting a block preconditioner for saddle point problems arising from an MFE discretization on locally refined meshes. We evaluate its robustness with respect to various classes of coefficients for uniform and locally refined meshes.
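The kind of fast mesh partitioning referred to above can be illustrated by ordering cells along a space-filling curve and cutting the curve into equal contiguous pieces. The sketch below assumes a z-order (Morton) curve on a uniform 2D grid. It does not use the p4est API, and the names morton_index and partition_cells are hypothetical.

```python
def morton_index(ix, iy, bits=16):
    """Interleave the bits of (ix, iy) to get the cell's position
    along a z-order (Morton) curve."""
    code = 0
    for b in range(bits):
        code |= ((ix >> b) & 1) << (2 * b)
        code |= ((iy >> b) & 1) << (2 * b + 1)
    return code


def partition_cells(nx, ny, num_ranks):
    """Sort the cells of an nx-by-ny grid along the Morton curve and
    give each rank one contiguous, nearly equal-sized segment."""
    cells = sorted(((ix, iy) for iy in range(ny) for ix in range(nx)),
                   key=lambda c: morton_index(c[0], c[1]))
    n = len(cells)
    # Rank r owns curve positions [r * n // P, (r + 1) * n // P).
    return [cells[r * n // num_ranks:(r + 1) * n // num_ranks]
            for r in range(num_ranks)]


parts = partition_cells(4, 4, 4)
for rank, owned in enumerate(parts):
    print(rank, owned)
```

Because each rank's cells form a contiguous curve segment, the segment boundaries alone describe the partition, which avoids replicating per-cell metadata on every process.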
High order resolution and parallel implementation on unstructured grids
The numerical solution of the two-dimensional inviscid Euler flow equations is presented. The unstructured mesh is generated by the advancing front technique. A cell-centred upwind finite volume method has been adopted to discretize the Euler equations. Both explicit and point implicit time stepping algorithms are derived. Flux calculations using Roe's and Osher's approximate Riemann solvers are studied. It is shown that both Roe's and Osher's schemes produce an accurate representation of discontinuities (e.g. shock waves). It is also shown that the point implicit scheme achieves better convergence performance than the explicit scheme. Validations have been done for subsonic and transonic flow over airfoils, supersonic flow past a compression corner and hypersonic flow past cylinder and blunt body geometries. An adaptive remeshing procedure is also applied to the numerical solution with the objective of getting improved results.
The issue of high order reconstruction on unstructured grids has been discussed. The methodology of the Taylor series expansion is adopted. The calculation of the gradient at a reference point is carried out by the use of either the Green-Gauss integral formula or the least-squares method. Some recently developed limiter construction methods have been used and their performance has been demonstrated using the test example of the transonic flow over a RAE 2822 airfoil. It has been shown that similar pressure distributions are obtained by all limiters except for shock wave regions where the limiter is active. The convergence problem is illustrated by the min-mod type limiter. It seems only the Venkatakrishnan limiter provides improved convergence. Other limiters do not appear to work as well as shown in their original publications. Also the convergence history given by the least-squares method appears better than that by the Green-Gauss method in the test
- …
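The least-squares gradient calculation mentioned in the last abstract can be sketched as follows: the gradient at a reference point is chosen to best fit the first-order Taylor expansion to the values at neighboring points. This is a generic unweighted 2D version under assumed inputs, not the thesis's implementation, and the function name and sample data are hypothetical.

```python
def least_squares_gradient(center, u_center, neighbors):
    """Estimate (du/dx, du/dy) at `center` from neighboring samples.

    `neighbors` is an iterable of ((x, y), u) pairs. The gradient g
    minimizes the sum of (u_j - u_center - g . (x_j - x_center))^2,
    solved here through the 2x2 normal equations.
    """
    xc, yc = center
    sxx = sxy = syy = bx = by = 0.0
    for (x, y), u in neighbors:
        dx, dy, du = x - xc, y - yc, u - u_center
        sxx += dx * dx
        sxy += dx * dy
        syy += dy * dy
        bx += dx * du
        by += dy * du
    det = sxx * syy - sxy * sxy
    if det == 0.0:
        raise ValueError("neighbors are collinear with the center")
    return ((syy * bx - sxy * by) / det, (sxx * by - sxy * bx) / det)


def field(x, y):
    """Illustrative linear field with gradient (2, -3)."""
    return 2.0 * x - 3.0 * y + 1.0


points = [(1.0, 0.0), (0.0, 1.0), (-1.0, -1.0)]
grad = least_squares_gradient((0.0, 0.0), field(0.0, 0.0),
                              [(p, field(*p)) for p in points])
print(grad)
```

For a linear field the fit is exact, which makes this a convenient sanity check before the gradient is used in a limited high-order reconstruction.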