
    Smolyak's algorithm: A powerful black box for the acceleration of scientific computations

    We provide a general discussion of Smolyak's algorithm for the acceleration of scientific computations. The algorithm first appeared in Smolyak's work on multidimensional integration and interpolation. Since then, it has been generalized in multiple directions and has been associated with the keywords: sparse grids, hyperbolic cross approximation, combination technique, and multilevel methods. Variants of Smolyak's algorithm have been employed in the computation of high-dimensional integrals in finance, chemistry, and physics, in the numerical solution of partial and stochastic differential equations, and in uncertainty quantification. Motivated by this broad and ever-increasing range of applications, we describe a general framework that summarizes fundamental results and assumptions in a concise, application-independent manner.
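    The combination-technique form of the algorithm can be sketched in a few lines. The following is a minimal Python illustration (not taken from the paper): Smolyak's formula combines anisotropic tensor-product trapezoidal rules on [0, 1]^d so that only index combinations with bounded total level contribute. The choice of 1D rule, the level range, and the test integrand are our own assumptions.

```python
import itertools
from math import comb, exp

def trap_rule(level):
    """Nodes and weights of the composite trapezoid rule on [0, 1] with 2**level + 1 points."""
    n = 2 ** level
    h = 1.0 / n
    nodes = [j * h for j in range(n + 1)]
    weights = [h] * (n + 1)
    weights[0] = weights[-1] = h / 2
    return nodes, weights

def tensor_quad(f, levels):
    """Full tensor-product quadrature at the given per-dimension levels."""
    rules = [list(zip(*trap_rule(l))) for l in levels]  # per-dim (node, weight) pairs
    total = 0.0
    for combo in itertools.product(*rules):
        xs = [nw[0] for nw in combo]
        w = 1.0
        for nw in combo:
            w *= nw[1]
        total += w * f(xs)
    return total

def smolyak_quad(f, d, q):
    """Smolyak combination formula: sum anisotropic tensor rules with levels i,
    q - d + 1 <= |i| <= q, weighted by (-1)**(q - |i|) * C(d - 1, q - |i|)."""
    total = 0.0
    for i in itertools.product(range(1, q + 1), repeat=d):
        s = sum(i)
        if q - d + 1 <= s <= q:
            total += (-1) ** (q - s) * comb(d - 1, q - s) * tensor_quad(f, i)
    return total
```

    For a smooth integrand such as f(x, y) = exp(x + y), the sparse sum reaches accuracy comparable to the full tensor rule while evaluating far fewer points, which is the acceleration the abstract refers to.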

    Jump at the onset of saltation

    We reveal a discontinuous transition in the saturated flux for aeolian saltation by explicitly simulating particle motion in turbulent flow. The discontinuity is followed by a coexistence interval with two metastable solutions. The modification of the wind profile due to momentum exchange exhibits a second maximum at high shear strength. The saturated flux depends on the strength of the wind as q_s = q_0 + A(u_* − u_t)(u_*^2 + u_t^2).
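    The scaling law can be evaluated directly; in this sketch the threshold shear velocity u_t, the offset q_0, and the prefactor A are placeholder values of our own, not the fitted constants from the simulations.

```python
def saturated_flux(u_star, u_t=0.22, q_0=0.0, A=0.2):
    """Saturated saltation flux q_s = q_0 + A (u_* - u_t)(u_*^2 + u_t^2).

    u_star : shear velocity u_*; u_t : threshold shear velocity.
    All default parameter values here are illustrative placeholders.
    """
    return q_0 + A * (u_star - u_t) * (u_star ** 2 + u_t ** 2)
```

    At the threshold u_* = u_t the flux reduces to q_0, and above it the flux grows roughly cubically in u_*, consistent with the form of the expression.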

    Role of Reaction Intermediate Diffusion on the Performance of Platinum Electrodes in Solid Acid Fuel Cells

    Understanding the reaction pathways for the hydrogen oxidation reaction (HOR) and the oxygen reduction reaction (ORR) is key to designing electrodes for solid acid fuel cells (SAFCs). In general, the electrochemical reactions of a fuel cell are considered to occur at the triple-phase boundary, where the electrocatalyst, electrolyte, and gas phase are in contact. In this concept, diffusion processes of reaction intermediates from the catalyst to the electrolyte remain unconsidered. Here, we unravel the reaction pathways for open-structured Pt electrodes with electrode thicknesses from 15 to 240 nm. These electrodes are characterized by a triple-phase boundary length and a thickness-dependent double-phase boundary area. We reveal that the double-phase boundary is the active catalytic interface for the HOR. For Pt layers ≤ 60 nm, the HOR is rate-limited by the processes at the gas/catalyst and/or catalyst/electrolyte interface, while the hydrogen surface diffusion step is fast. For thicker layers (> 60 nm), the diffusion of reaction intermediates on the surface of Pt becomes the limiting process. For the ORR, the predominant reaction pathway is via the triple-phase boundary. The double-phase boundary contributes additionally, with a diffusion length of a few nanometers. Based on our results, we propose that the molecular reaction mechanism at the electrode interfaces based upon the triple-phase boundary concept may need to be extended to an effective area near the triple-phase boundary to include all catalytically relevant diffusion processes of the reaction intermediates. © 2021 by the authors. Licensee MDPI, Basel, Switzerland.

    Tips for implementing multigrid methods on domains containing holes

    As part of our development of a computer code to perform 3D `constrained evolution' of Einstein's equations in 3+1 form, we discuss issues regarding the efficient solution of elliptic equations on domains containing holes (i.e., excised regions) via the multigrid method. We consider as a test case the Poisson equation with a nonlinear term added, as a means of illustrating the principles involved, and then move to a "real world" 3-dimensional problem: the solution of the conformally flat Hamiltonian constraint with Dirichlet and Robin boundary conditions. Using our vertex-centered multigrid code, we demonstrate globally second-order-accurate solutions of elliptic equations over domains containing holes, in two and three spatial dimensions. Keys to the success of this method are the choice of the restriction operator near the holes and the definition of the location of the inner boundary. In some cases (e.g., two holes in two dimensions), more and more smoothing may be required as the mesh spacing decreases to zero; however, for the resolutions currently of interest to many numerical relativists, it is feasible to maintain second-order convergence by concentrating smoothing (spatially) where it is needed most. This paper, and our publicly available source code, are intended to serve as semi-pedagogical guides for those who may wish to implement similar schemes. Comment: 18 pages, 11 figures, LaTeX. Added clarifications and references regarding the scope of the paper, its mathematical foundations, and the relevance of the work. Accepted for publication in Classical & Quantum Gravity.
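    For readers new to the method, the basic multigrid V-cycle on a simple domain (no holes) can be sketched as follows. This 1D Poisson example with Gauss-Seidel smoothing, full-weighting restriction, and linear interpolation is our own minimal illustration, not the authors' vertex-centered 3D code.

```python
import numpy as np

def smooth(u, f, h, iters=3):
    """Gauss-Seidel sweeps for -u'' = f with fixed Dirichlet boundaries."""
    for _ in range(iters):
        for i in range(1, len(u) - 1):
            u[i] = 0.5 * (u[i - 1] + u[i + 1] + h * h * f[i])
    return u

def residual(u, f, h):
    """r = f - A u at interior points (boundary residual set to zero)."""
    r = np.zeros_like(u)
    r[1:-1] = f[1:-1] - (2 * u[1:-1] - u[:-2] - u[2:]) / (h * h)
    return r

def restrict(r):
    """Full-weighting restriction onto a grid of half the resolution."""
    nc = (len(r) - 1) // 2 + 1
    rc = np.zeros(nc)
    for j in range(1, nc - 1):
        rc[j] = 0.25 * r[2 * j - 1] + 0.5 * r[2 * j] + 0.25 * r[2 * j + 1]
    return rc

def prolong(ec, n):
    """Linear interpolation of a coarse-grid correction to the fine grid."""
    e = np.zeros(n)
    e[::2] = ec
    e[1::2] = 0.5 * (e[:-2:2] + e[2::2])
    return e

def v_cycle(u, f, h):
    """One multigrid V-cycle for -u'' = f on a grid of 2**k + 1 points."""
    if len(u) <= 3:
        return smooth(u, f, h, iters=50)  # coarsest grid: solve by smoothing
    smooth(u, f, h)                       # pre-smoothing
    rc = restrict(residual(u, f, h))
    ec = v_cycle(np.zeros_like(rc), rc, 2 * h)
    u += prolong(ec, len(u))              # coarse-grid correction
    return smooth(u, f, h)                # post-smoothing

# Solve -u'' = pi^2 sin(pi x) on [0, 1]; exact solution u = sin(pi x).
n = 65
x = np.linspace(0.0, 1.0, n)
f = np.pi ** 2 * np.sin(np.pi * x)
u = np.zeros(n)
for _ in range(10):
    u = v_cycle(u, f, 1.0 / (n - 1))
```

    The paper's contribution is what to do with the restriction operator and inner-boundary definition when excised holes cut through the grid hierarchy; the cycle structure above is the common scaffolding.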

    GPU-Accelerated Large-Eddy Simulation of Turbulent Channel Flows

    High performance computing clusters that are augmented with cost- and power-efficient graphics processing units (GPUs) provide new opportunities to broaden the use of the large-eddy simulation technique to study high Reynolds number turbulent flows in fluids engineering applications. In this paper, we extend our earlier work on multi-GPU acceleration of an incompressible Navier-Stokes solver to include a large-eddy simulation (LES) capability. In particular, we implement the Lagrangian dynamic subgrid scale model and compare our results against existing direct numerical simulation (DNS) data of a turbulent channel flow at Reτ = 180. Overall, our LES results match fairly well with the DNS data. Our results show that the Reτ = 180 case can be simulated entirely on a single GPU, whereas higher Reynolds number cases can benefit from a GPU cluster.
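    As a rough illustration of the subgrid-scale idea, the eddy viscosity can be computed from the resolved strain rate. The sketch below uses the simpler static Smagorinsky closure with a fixed constant, not the Lagrangian dynamic model of the paper (which computes the coefficient on the fly along fluid pathlines); the 2D field, grid, and constant are our own assumptions.

```python
import numpy as np

def smagorinsky_nu_t(u, v, dx, dy, cs=0.17):
    """Static Smagorinsky eddy viscosity nu_t = (C_s * Delta)^2 * |S| for a
    2D resolved velocity field (u, v); Delta is the grid filter width.
    Note: this fixed C_s is a simplification of the paper's Lagrangian
    dynamic model, which determines the coefficient adaptively."""
    dudx = np.gradient(u, dx, axis=0)
    dudy = np.gradient(u, dy, axis=1)
    dvdx = np.gradient(v, dx, axis=0)
    dvdy = np.gradient(v, dy, axis=1)
    s12 = 0.5 * (dudy + dvdx)
    s_mag = np.sqrt(2.0 * (dudx ** 2 + 2.0 * s12 ** 2 + dvdy ** 2))  # |S| = sqrt(2 S_ij S_ij)
    delta = np.sqrt(dx * dy)  # filter width tied to the grid
    return (cs * delta) ** 2 * s_mag

# Example: uniform shear u = S*y gives |S| = S and a constant nu_t.
dx = dy = 0.1
S = 3.0
y = np.arange(0.0, 1.0, dy)
u = np.tile(S * y, (10, 1))
v = np.zeros_like(u)
nu_t = smagorinsky_nu_t(u, v, dx, dy)
```
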

    Carbon uptake and water use in woodlands and forests in southern Australia during an extreme heat wave event in the ‘Angry Summer’ of 2012/2013

    As a result of climate change, warmer temperatures are projected through the 21st century and are already increasing above modelled predictions. Apart from increases in the mean, warm/hot temperature extremes are expected to become more prevalent in the future, along with an increase in the frequency of droughts. It is crucial to better understand the response of terrestrial ecosystems to such temperature extremes for predicting land-surface feedbacks in a changing climate. While land-surface feedbacks in drought conditions and during heat waves have been reported from Europe and the US, direct observations of the impact of such extremes on the carbon and water cycles in Australia have been lacking. During the 2012/2013 summer, Australia experienced a record-breaking heat wave of exceptional spatial extent that lasted for several weeks. In this study we synthesised eddy-covariance measurements from seven woodland sites and one forest site across three biogeographic regions in southern Australia. These observations were combined with model results from BIOS2 (Haverd et al., 2013a, b) to investigate the effect of the summer heat wave on the carbon and water exchange of terrestrial ecosystems which are known for their resilience toward hot and dry conditions. We found that water-limited woodland and energy-limited forest ecosystems responded differently to the heat wave. During the most intense part of the heat wave, the woodlands experienced decreased latent heat flux (23 % of the background value), increased Bowen ratio (154 %) and reduced carbon uptake (60 %). At the same time, the forest ecosystem showed increased latent heat flux (151 %), reduced Bowen ratio (19 %) and increased carbon uptake (112 %). Higher temperatures caused increased ecosystem respiration at all sites (up to 139 %). During daytime all ecosystems remained carbon sinks, but carbon uptake was reduced in magnitude. The number of hours during which the ecosystems acted as carbon sinks was also reduced, which switched the woodlands into a carbon source on a daily average. Precipitation occurred after the first, most intense part of the heat wave, and the subsequent cooler temperatures in the temperate woodlands led to a recovery of the carbon sink, decreased the Bowen ratio (65 %) and hence increased evaporative cooling. Gross primary productivity in the woodlands recovered quickly with precipitation and cooler temperatures, but respiration remained high. While the forest proved relatively resilient to this short-term heat extreme, the response of the woodlands is the first direct evidence that the carbon sinks of large areas of Australia may not be sustainable in a future climate with an increased number, intensity and duration of heat waves.
    Eva van Gorsel, Sebastian Wolf, James Cleverly, Peter Isaac, Vanessa Haverd, Cäcilia Ewenz, Stefan Arndt, Jason Beringer, Víctor Resco de Dios, Bradley J. Evans, Anne Griebel, Lindsay B. Hutley, Trevor Keenan, Natascha Kljun, Craig Macfarlane, Wayne S. Meyer, Ian McHugh, Elise Pendall, Suzanne M. Prober and Richard Silberstei
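    The Bowen ratio referred to throughout is simply the ratio of sensible to latent heat flux. A tiny helper with made-up flux values (not the measured fluxes from the study) shows why a heat-wave collapse of evaporative cooling pushes the ratio up even when sensible heat is unchanged:

```python
def bowen_ratio(sensible_w_m2, latent_w_m2):
    """Bowen ratio = sensible heat flux H / latent heat flux LE (both in W m^-2)."""
    return sensible_w_m2 / latent_w_m2

# Illustrative (made-up) woodland fluxes: if LE falls to 23 % of its
# background value, as during the most intense part of the heat wave,
# the Bowen ratio rises sharply.
background = bowen_ratio(120.0, 200.0)
heat_wave = bowen_ratio(120.0, 0.23 * 200.0)
```
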

    A Full-Depth Amalgamated Parallel 3D Geometric Multigrid Solver for GPU Clusters

    Numerical computations of incompressible flow equations with pressure-based algorithms necessitate the solution of an elliptic Poisson equation, for which multigrid methods are known to be very efficient. In our previous work we presented a dual-level (MPI-CUDA) parallel implementation of the Navier-Stokes equations to simulate buoyancy-driven incompressible fluid flows on GPU clusters with simple iterative methods, while focusing on the scalability of the overall solver. In the present study we describe the implementation and performance of a multigrid method to solve the pressure Poisson equation within our MPI-CUDA parallel incompressible flow solver. Various design decisions and algorithmic choices for multigrid methods are explored in light of NVIDIA's recent Fermi architecture. We discuss how unique aspects of an MPI-CUDA implementation for GPU clusters are related to the software choices made to implement the multigrid method. We propose a new coarse-grid solution method, embedded multigrid with amalgamation, and show that the parallel implementation retains the numerical efficiency of the multigrid method. Performance measurements on the NCSA Lincoln and TACC Longhorn clusters are presented for up to 64 GPUs.

    An MPI-CUDA Implementation for Massively Parallel Incompressible Flow Computations on Multi-GPU Clusters

    Modern graphics processing units (GPUs) with many-core architectures have emerged as general-purpose parallel computing platforms that can tremendously accelerate simulation science applications. While multi-GPU workstations with several TeraFLOPS of peak computing power are available to accelerate computational problems, larger problems require even more resources. Conventional clusters of central processing units (CPUs) are now being augmented with multiple GPUs in each compute node to tackle large problems. The heterogeneous architecture of a multi-GPU cluster with a deep memory hierarchy creates unique challenges in developing scalable and efficient simulation codes. In this study, we pursue mixed MPI-CUDA implementations and investigate three strategies to probe the efficiency and scalability of incompressible flow computations on the Lincoln Tesla cluster at the National Center for Supercomputing Applications (NCSA). We exploit some of the advanced features of MPI and CUDA programming to overlap both GPU data transfers and MPI communications with computations on the GPU. We sustain approximately 2.4 TeraFLOPS on the 64 nodes of the NCSA Lincoln Tesla cluster using 128 GPUs with a total of 30,720 processing elements. Our results demonstrate that multi-GPU clusters can substantially accelerate computational fluid dynamics (CFD) simulations.
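    The core decomposition idea, independent of the MPI and CUDA specifics, is that each GPU owns a strip of the grid plus one-cell halo copies of its neighbours' boundary rows. The pure-NumPy sketch below is our own single-process stand-in: the halo exchange is reduced to array slicing, and the per-strip update plays the role of a GPU kernel launch. It verifies that the decomposed sweep reproduces the global one.

```python
import numpy as np

def jacobi_step(u):
    """One Jacobi sweep for the 2D Laplace equation; boundary values stay fixed."""
    new = u.copy()
    new[1:-1, 1:-1] = 0.25 * (u[:-2, 1:-1] + u[2:, 1:-1] +
                              u[1:-1, :-2] + u[1:-1, 2:])
    return new

def decomposed_step(u, nparts):
    """The same sweep on `nparts` horizontal strips with one-row halos.
    In the real MPI-CUDA solver each strip lives on its own GPU, the halo
    rows travel via MPI, and the interior update is overlapped with that
    exchange; here the 'exchange' is just a slice of the shared array."""
    new = u.copy()
    for part in np.array_split(np.arange(u.shape[0]), nparts):
        lo, hi = part[0], part[-1] + 1                       # owned rows [lo, hi)
        glo, ghi = max(lo - 1, 0), min(hi + 1, u.shape[0])   # add halo rows
        block = jacobi_step(u[glo:ghi, :])                   # per-"GPU" kernel
        new[lo:hi, :] = block[lo - glo:hi - glo, :]          # keep only owned rows
    return new

# A random field: the strip-decomposed sweep must match the global sweep.
rng = np.random.default_rng(0)
field = rng.random((17, 9))
```

    Because each strip sees exactly one halo row from each neighbour, the decomposed update is bitwise-equivalent to the global update, which is the property a correct halo exchange must preserve at any GPU count.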

    Scalability of Incompressible Flow Computations on Multi-GPU Clusters Using Dual-Level and Tri-Level Parallelism

    High performance computing using graphics processing units (GPUs) is gaining popularity in the scientific computing field, with many large compute clusters being augmented with multiple GPUs in each node. We investigate hybrid tri-level (MPI-OpenMP-CUDA) parallel implementations to explore the efficiency and scalability of incompressible flow computations on GPU clusters with up to 128 GPUs. This work details some of the unique issues faced when merging fine-grain parallelism on the GPU using CUDA with coarse-grain parallelism using OpenMP for intra-node and MPI for inter-node communication. Comparisons between the tri-level MPI-OpenMP-CUDA and dual-level MPI-CUDA implementations are shown using large-scale computational fluid dynamics (CFD) simulations. Our results demonstrate that a tri-level parallel implementation does not provide a significant advantage in performance over the dual-level implementation; however, further research is needed to justify our conclusion for a cluster with a high GPU-per-node density or when using software that can utilize OpenMP's fine-grain parallelism more effectively.