
    Enhancing speed and scalability of the ParFlow simulation code

    Regional hydrology studies are often supported by high-resolution simulations of subsurface flow that require expensive and extensive computations. Efficient use of the latest high-performance parallel computing systems therefore becomes a necessity. The simulation software ParFlow has been demonstrated to meet this requirement and has been shown to have excellent solver scalability for up to 16,384 processes. In the present work we show that the code requires further enhancements to take full advantage of current petascale machines. We identify the way ParFlow parallelizes its computational mesh as a central bottleneck. We propose to reorganize this subsystem using fast mesh partitioning algorithms provided by the parallel adaptive mesh refinement library p4est. We realize this in a minimally invasive manner by modifying selected parts of the code to reinterpret the existing mesh data structures. We evaluate the scaling performance of the modified version of ParFlow, demonstrating good weak and strong scaling up to 458k cores of the Juqueen supercomputer, and test an example application at large scale. Comment: The final publication is available at link.springer.co
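The partitioning idea behind p4est can be illustrated on a uniform grid: elements are ordered along a Morton (z-order) space-filling curve, and each process receives one contiguous segment of that ordering, so a partition is obtained from a sort and an index split rather than from graph partitioning. The Python sketch below is a hedged illustration only; `morton_key` and `partition` are hypothetical helpers, not p4est or ParFlow functions, and real p4est operates on a forest of octrees with adaptive refinement.

```python
from typing import Dict, List, Tuple


def morton_key(i: int, j: int, bits: int = 16) -> int:
    """Interleave the bits of (i, j) to obtain the z-order (Morton) index."""
    key = 0
    for b in range(bits):
        key |= ((i >> b) & 1) << (2 * b)      # bits of i go to even positions
        key |= ((j >> b) & 1) << (2 * b + 1)  # bits of j go to odd positions
    return key


def partition(nx: int, ny: int, nprocs: int) -> Dict[int, List[Tuple[int, int]]]:
    """Sort cells along the Morton curve and cut the sequence into nearly equal,
    contiguous chunks, one per process (hypothetical helper)."""
    cells = sorted(((i, j) for i in range(nx) for j in range(ny)),
                   key=lambda c: morton_key(c[0], c[1]))
    n = len(cells)
    return {p: cells[p * n // nprocs:(p + 1) * n // nprocs] for p in range(nprocs)}


print(partition(4, 4, 4)[0])  # [(0, 0), (1, 0), (0, 1), (1, 1)]
```

For a 4 x 4 grid split over 4 processes, each process receives one compact 2 x 2 block of cells; repartitioning after the mesh changes only requires re-sorting and re-splitting the curve.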

    Parallel Algorithms for Time and Frequency Domain Circuit Simulation

    As the most critical form of pre-silicon verification, transistor-level circuit simulation is an indispensable step before committing to an expensive manufacturing process. However, circuit simulation can be computationally expensive, especially for ever-larger transistor circuits with more complex device models. It is therefore increasingly desirable to accelerate circuit simulation. At the same time, the emergence of multi-core machines offers a promising platform for circuit simulation in addition to the established use of distributed-memory clusters; both provide abundant hardware computing resources. This research addresses the limitations of traditional serial circuit simulation and proposes new techniques for both time-domain and frequency-domain parallel circuit simulation. For time-domain simulation, this dissertation presents a parallel transient simulation methodology. This new approach, called WavePipe, exploits coarse-grained application-level parallelism by simultaneously computing circuit solutions at multiple adjacent time points in a way resembling hardware pipelining. WavePipe has two variants: backward and forward pipelining. The former creates independent computing tasks that contribute to a larger future time step, while the latter performs predictive computing along the forward direction. Unlike existing relaxation methods, WavePipe enables parallel circuit simulation without jeopardizing convergence and accuracy. As a coarse-grained parallel approach, it requires little parallel programming effort; furthermore, it opens new avenues for fully utilizing increasingly parallel hardware by going beyond conventional finer-grained parallel device model evaluation and matrix solution. 
This dissertation also exploits the recently developed explicit telescopic projective integration method for efficient parallel transient circuit simulation by addressing the stability limitation of explicit numerical integration. The new method allows the effective time step to be controlled by accuracy requirements rather than by stability limits. Therefore, it not only leads to noticeable efficiency improvements but also lends itself to straightforward parallelization due to its explicit nature. For frequency-domain simulation, this dissertation presents a parallel harmonic balance approach, applicable to the steady-state and envelope-following analyses of both driven and autonomous circuits. The new approach is centered on a naturally parallelizable preconditioning technique that speeds up the core computation in harmonic-balance-based analysis. The proposed method facilitates parallel computing through the use of domain knowledge and simplifies parallel programming compared with fine-grained strategies. As a result, favorable runtime speedups are achieved.
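The stability issue that projective integration sidesteps can be seen in a single level of projective forward Euler: a few small explicit Euler steps damp the fast (stiff) modes, after which the solution is extrapolated over a much larger step along the secant of the last two inner iterates. The telescopic variant nests this construction over several levels; the sketch below shows only one level, and the helper names, the step parameters, and the two-mode test problem are hypothetical choices for illustration, not the dissertation's circuit formulation.

```python
import numpy as np


def forward_euler_step(f, t, y, h):
    """One explicit (forward) Euler step of size h."""
    return y + h * f(t, y)


def projective_forward_euler(f, t0, y0, h, k, M, n_outer):
    """Single-level projective forward Euler.

    Each outer step takes k + 1 inner forward Euler steps of size h, then
    extrapolates over M * h along the secant of the last two inner iterates,
    so one outer step advances (k + 1 + M) * h in time.
    """
    t, y = t0, np.asarray(y0, dtype=float)
    for _ in range(n_outer):
        y_prev = y
        for _ in range(k + 1):
            y_prev, y = y, forward_euler_step(f, t, y, h)
            t += h
        # Projective (extrapolation) step reusing the slope of the last inner step
        y = y + M * (y - y_prev)
        t += M * h
    return t, y


def stiff_rhs(t, y):
    """Hypothetical test problem: a fast mode (rate 1000) and a slow mode (rate 1)."""
    return np.array([-1000.0 * y[0], -1.0 * y[1]])


t_end, y_end = projective_forward_euler(stiff_rhs, 0.0, [1.0, 1.0], 1e-3, 2, 7, 100)
print(t_end, y_end, np.exp(-1.0))
```

With h = 10^-3, k = 2 and M = 7, each outer step advances 0.01 in time using three right-hand-side evaluations, whereas plain forward Euler with a step of 0.01 is unstable for the fast mode (its amplification factor is 1 - 0.01 x 1000 = -9).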

    Scalability of broadcast performance in wireless network-on-chip

    Networks-on-Chip (NoCs) are currently the paradigm of choice to interconnect the cores of a chip multiprocessor. However, conventional NoCs may not suffice to fulfill the on-chip communication requirements of processors with hundreds or thousands of cores. The main reason is that the performance of such networks drops as the number of cores grows, especially in the presence of multicast and broadcast traffic. This not only limits the scalability of current multiprocessor architectures, but also sets a performance wall that prevents the development of architectures that generate moderate-to-high levels of multicast. In this paper, a Wireless Network-on-Chip (WNoC) in which all cores share a single broadband channel is presented. Such a design is conceived to provide low latency and ordered delivery for multicast/broadcast traffic, in an attempt to complement a wireline NoC that will transport the remaining communication flows. To assess the feasibility of this approach, the network performance of the WNoC is analyzed as a function of the system size and the channel capacity, and then compared to that of wireline NoCs with embedded multicast support. Based on this evaluation, preliminary results on the potential performance of the proposed hybrid scheme are provided, together with guidelines for the design of MAC protocols for WNoC. Peer Reviewed. Postprint (published version)
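One way to see why broadcast traffic strains wireline NoCs is to count link traversals. Under a deliberately simplified model (an assumption, not the paper's evaluation), a broadcast in a k x k mesh that is emulated as N - 1 separate unicasts from a corner node under XY routing costs the sum of Manhattan distances, which is k^2(k - 1) link traversals, compared with a single transmission on a shared wireless channel. `unicast_broadcast_hops` is a hypothetical helper.

```python
from typing import Tuple


def unicast_broadcast_hops(k: int, src: Tuple[int, int] = (0, 0)) -> int:
    """Total link traversals when a broadcast in a k x k mesh is emulated by
    k*k - 1 unicasts under dimension-order (XY) routing (simplified model).

    With XY routing each unicast follows a shortest path, so its hop count is
    the Manhattan distance from the source to the destination.
    """
    sx, sy = src
    return sum(abs(x - sx) + abs(y - sy) for x in range(k) for y in range(k))


for k in (2, 4, 8, 16):
    n = k * k
    hops = unicast_broadcast_hops(k)
    print(f"{n:4d} cores: {hops:6d} link traversals, {hops / (n - 1):.2f} per destination")
```

For k = 8 (64 cores) this gives 448 link traversals per broadcast, about 7 per destination, and the per-destination cost grows with k, that is, with the square root of the core count. Even a multicast-aware wireline NoC that forwards along a spanning tree needs at least N - 1 link traversals per broadcast.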

    Performance of a finite volume CEM code on multicomputers

    Peer Reviewed. http://deepblue.lib.umich.edu/bitstream/2027.42/77161/1/AIAA-1994-236-711.pd

    High-order, Dispersionless "Fast-Hybrid" Wave Equation Solver. Part I: O(1) Sampling Cost via Incident-Field Windowing and Recentering

    This paper proposes a frequency/time hybrid integral-equation method for the time-dependent wave equation in two- and three-dimensional spatial domains. Relying on Fourier transformation in time, the method uses a fixed (time-independent) number of frequency-domain integral-equation solutions to evaluate time-domain solutions for arbitrarily long times with superalgebraically small errors. The approach relies on two main elements, namely, 1) a smooth time-windowing methodology that enables accurate band-limited representations of arbitrarily long time signals, and 2) a novel Fourier transform approach which, in a time-parallel manner and without causing spurious periodicity effects, delivers numerically dispersionless, spectrally accurate solutions. A similar hybrid technique can be obtained on the basis of Laplace transforms instead of Fourier transforms, but we do not consider the Laplace-based method in the present contribution. The algorithm can handle dispersive media, it can tackle complex physical structures, it enables parallelization in time in a straightforward manner, and it allows for time leaping, that is, solution sampling at any given time T at O(1)-bounded sampling cost, for arbitrarily large values of T, and without requiring evaluation of the solution at intermediate times. The proposed frequency-time hybridization strategy, which generalizes to any linear partial differential equation in the time domain for which frequency-domain solutions can be obtained (including, e.g., the time-domain Maxwell equations), and which is applicable in a wide range of scientific and engineering contexts, provides significant advantages over other available alternatives such as volumetric discretization, time-domain integral equations, and convolution-quadrature approaches. Comment: 33 pages, 8 figures, revised and extended manuscript (and now including direct comparisons to existing CQ and TDIE solver implementations) (Part I of II)
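A smooth time-windowing scheme of the kind described can be built from a C-infinity ramp: the difference of two shifted ramps gives one window, and consecutive windows telescope to a partition of unity, so a long signal splits into smooth, compactly supported pieces whose Fourier transforms decay rapidly. The construction below is a common textbook choice and an assumption; the paper's actual windows and their parameters (called `H` for the window spacing and `delta` for the transition width here) may differ.

```python
import numpy as np


def _eta(x):
    """C-infinity function: exp(-1/x) for x > 0, and 0 otherwise."""
    x = np.asarray(x, dtype=float)
    out = np.zeros_like(x)
    pos = x > 0
    out[pos] = np.exp(-1.0 / x[pos])
    return out


def smooth_ramp(t, delta):
    """Smooth step rising from 0 (t <= 0) to 1 (t >= delta); denominator is never 0."""
    x = np.asarray(t, dtype=float) / delta
    a, b = _eta(x), _eta(1.0 - x)
    return a / (a + b)


def window(t, k, H, delta):
    """k-th window: difference of two shifted ramps, supported in [k*H, (k+1)*H + delta].

    Summing windows k = 0..N-1 telescopes to ramp(t) - ramp(t - N*H),
    which equals 1 on [delta, N*H].
    """
    return smooth_ramp(t - k * H, delta) - smooth_ramp(t - (k + 1) * H, delta)


t = np.linspace(0.5, 4.0, 701)
total = sum(window(t, k, 1.0, 0.5) for k in range(4))
print(np.max(np.abs(total - 1.0)))
```

Because each windowed piece is smooth and of finite duration, it can be represented accurately with a bounded band of frequencies, which is the property the windowing step is meant to provide.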

    Scalable parallel simulation of variably saturated flow

    In this thesis we develop highly accurate simulation tools for variably saturated flow through porous media that are able to take advantage of the latest supercomputing resources. Hence, we aim for parallel scalability to very large compute resources of over 10^5 CPU cores. Our starting point is the parallel subsurface flow simulator ParFlow. This library is widely used in the hydrology community and is known to have excellent parallel scalability up to 16k processes. We first investigate the numerical tools this library implements to perform the simulations it was designed for. ParFlow solves the governing equation for subsurface flow with a cell-centered finite difference (FD) method. The code targets high performance computing (HPC) systems by means of distributed-memory parallelism. We propose to reorganize ParFlow's mesh subsystem by using fast partitioning algorithms provided by the parallel adaptive mesh refinement (AMR) library p4est. We realize this in a minimally invasive manner by modifying selected parts of the code to reinterpret the existing mesh data structures. Furthermore, we evaluate the scaling performance of the modified version of ParFlow, demonstrating excellent weak and strong scaling up to 458k cores of the Juqueen supercomputer at the Jülich Supercomputing Centre. These results were obtained for uniform meshes and hence without explicitly exploiting the AMR capabilities of the p4est library. A natural extension of our work is to activate this functionality and make ParFlow a true AMR application. Enabling ParFlow to use AMR is challenging for several reasons: the code may rely on assumptions about the parallel partition that cannot be maintained with AMR, it may use mesh-related metadata that is replicated on all CPUs, and it may assume uniform meshes in the construction of mathematical operators. Additionally, the use of locally refined meshes will certainly change the spectral properties of these operators. 
In this work, we develop an algorithmic approach to activate the use of locally refined grids in ParFlow. AMR allows meshes in which elements of different sizes neighbor each other. In this case, ParFlow may produce erroneous results when it communicates data across inter-element boundaries. We propose and discuss two solutions to this issue operating at two different levels: the first manipulates the indices of the degrees of freedom, while the second operates directly on the degrees of freedom. Both approaches aim to introduce minimal changes to the original ParFlow code. In an AMR framework, the FD method used by ParFlow will require modifications to deal correctly with elements of different sizes. Mixed finite elements (MFE), on the other hand, are better suited to AMR. It is known that the cell-centered FD method used in ParFlow can be reinterpreted as an MFE discretization using lowest-order Raviart-Thomas elements. We conclude this thesis by presenting a block preconditioner for saddle point problems arising from an MFE discretization on locally refined meshes. We evaluate its robustness with respect to various classes of coefficients for uniform and locally refined meshes.
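Why elements of different sizes matter for a cell-centered finite difference scheme can be seen in one dimension: the flux between two cells depends on the distance between their centers and on a harmonic average of the conductivities, both of which change when neighboring cells differ in size. The sketch below solves the steady linear problem -(K u')' = f with a two-point flux on a nonuniform mesh; it is a hypothetical illustration, not ParFlow's discretization, and it ignores both the nonlinearity of variably saturated (Richards) flow and the hanging-face situation that arises in 2D and 3D.

```python
import numpy as np


def solve_cell_centered_1d(faces, K, f, u_left, u_right):
    """Cell-centered FD (two-point flux) for -(K u')' = f with Dirichlet ends.

    `faces` are the cell boundaries (possibly nonuniform); K and f are
    per-cell constants. Returns cell centers and cell values.
    """
    faces = np.asarray(faces, dtype=float)
    K = np.asarray(K, dtype=float)
    dx = np.diff(faces)
    n = dx.size
    A = np.zeros((n, n))
    b = np.asarray(f, dtype=float) * dx  # source integrated over each cell
    # Interior faces: transmissibility from half-cell resistances in series
    for i in range(n - 1):
        T = 1.0 / (0.5 * dx[i] / K[i] + 0.5 * dx[i + 1] / K[i + 1])
        A[i, i] += T
        A[i + 1, i + 1] += T
        A[i, i + 1] -= T
        A[i + 1, i] -= T
    # Boundary faces: the cell center lies half a cell width from the boundary
    TL = K[0] / (0.5 * dx[0])
    TR = K[-1] / (0.5 * dx[-1])
    A[0, 0] += TL
    b[0] += TL * u_left
    A[-1, -1] += TR
    b[-1] += TR * u_right
    centers = 0.5 * (faces[:-1] + faces[1:])
    return centers, np.linalg.solve(A, b)


# Two layers (K = 1 on [0, 0.5], K = 4 on [0.5, 1]) on a nonuniform mesh
x, u = solve_cell_centered_1d([0.0, 0.25, 0.5, 0.625, 0.75, 1.0],
                              [1.0, 1.0, 4.0, 4.0, 4.0], np.zeros(5), 0.0, 1.0)
print(x, u)
```

In 1D this two-point flux reproduces piecewise-linear solutions exactly, even on nonuniform cells; in 2D and 3D, where one large cell can face several smaller ones, the line joining cell centers is generally no longer orthogonal to the shared face and the simple two-point flux loses accuracy.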

    High order resolution and parallel implementation on unstructured grids

    The numerical solution of the two-dimensional inviscid Euler flow equations is presented. The unstructured mesh is generated by the advancing front technique. A cell-centred upwind finite volume method has been adopted to discretize the Euler equations. Both explicit and point implicit time stepping algorithms are derived. Flux calculations using Roe's and Osher's approximate Riemann solvers are studied. It is shown that both Roe's and Osher's schemes produce an accurate representation of discontinuities (e.g. shock waves). It is also shown that the point implicit scheme achieves better convergence than the explicit scheme. Validations have been performed for subsonic and transonic flow over airfoils, supersonic flow past a compression corner, and hypersonic flow past cylinder and blunt body geometries. An adaptive remeshing procedure is also applied to the numerical solution with the objective of obtaining improved results. The issue of high-order reconstruction on unstructured grids is discussed. The Taylor series expansion methodology is adopted. The gradient at a reference point is calculated using either the Green-Gauss integral formula or the least-squares method. Several recently developed limiter construction methods have been used, and their performance has been demonstrated using the test example of the transonic flow over an RAE 2822 airfoil. It has been shown that all limiters yield similar pressure distributions except in shock wave regions, where the limiter is active. The convergence problem is illustrated by the minmod-type limiter. It seems that only the Venkatakrishnan limiter provides improved convergence. Other limiters do not appear to work as well as shown in their original publications. Also, the convergence history given by the least-squares method appears better than that given by the Green-Gauss method in the test case.
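The gradient and limiter steps can be sketched for a single 2D cell: an unweighted least-squares fit of the neighbor differences gives the gradient, and Venkatakrishnan's smooth limiter function scales it so that reconstructed face values stay (approximately) within the range of the neighboring cell averages. This is a hedged sketch; the function names, the unweighted fit, and the smoothing parameter `eps2` (zero by default) are assumptions, and the thesis's exact formulation may differ.

```python
import numpy as np


def lsq_gradient(xc, uc, xn, un):
    """Unweighted least-squares gradient at cell center xc from neighbor centers xn."""
    dX = np.asarray(xn, dtype=float) - np.asarray(xc, dtype=float)  # (m, 2) offsets
    dU = np.asarray(un, dtype=float) - uc                           # (m,) differences
    g, *_ = np.linalg.lstsq(dX, dU, rcond=None)
    return g


def venkatakrishnan_phi(d1, d2, eps2=0.0):
    """Venkatakrishnan's smooth limiter function for one face.

    d2: unlimited change grad . r from the cell center to the face point;
    d1: u_max - u_i if d2 > 0, else u_min - u_i.
    """
    if d2 == 0.0:
        return 1.0
    num = d1 * d1 + eps2 + 2.0 * d1 * d2
    den = d1 * d1 + 2.0 * d2 * d2 + d1 * d2 + eps2
    return num / den


def limited_gradient(xc, uc, xn, un, xf, eps2=0.0):
    """Scale the LSQ gradient by the smallest limiter value over face points xf (capped at 1)."""
    xc = np.asarray(xc, dtype=float)
    g = lsq_gradient(xc, uc, xn, un)
    umax, umin = max(uc, max(un)), min(uc, min(un))
    phi = 1.0
    for face_point in np.asarray(xf, dtype=float):
        d2 = float(g @ (face_point - xc))
        d1 = (umax - uc) if d2 > 0 else (umin - uc)
        phi = min(phi, venkatakrishnan_phi(d1, d2, eps2))
    return phi * g


xn = [(1.0, 0.0), (0.0, 1.0), (-1.0, 0.0), (0.0, -1.0)]
xf = [(0.5, 0.0), (0.0, 0.5), (-0.5, 0.0), (0.0, -0.5)]
print(limited_gradient((0.0, 0.0), 0.0, xn, [2.0, 3.0, -2.0, -3.0], xf))  # linear data: unlimited
print(limited_gradient((0.0, 0.0), 5.0, xn, [2.0, 3.0, -2.0, -3.0], xf))  # local maximum: zeroed
```

Unlike the non-differentiable min-based limiters, this limiter function is smooth in its arguments, which is the property usually credited for its better steady-state convergence; with `eps2` > 0 it also stops limiting in nearly uniform regions.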