
    A scalable, efficient scheme for evaluation of stencil computations over unstructured meshes

    Pre-print. Stencil computations are a common class of operations that appear in many computational science and engineering applications. Stencil computations often benefit from compile-time analysis, exploitation of data locality, and parallelism. Post-processing of discontinuous Galerkin (dG) simulation solutions with B-spline kernels is an example of a numerical method that requires evaluating computationally intensive stencil operations over a mesh. Previous work on stencil computations has focused on structured meshes and given little attention to unstructured meshes. Performing stencil operations over an unstructured mesh requires sampling heterogeneous elements, which often leads to inefficient memory access patterns and limits data locality and reuse. In this paper, we present an efficient method for performing stencil computations over unstructured meshes that increases data locality and cache efficiency, together with a scalable approach to stencil tiling and concurrent execution. We provide experimental results in the context of post-processing of dG solutions that demonstrate the effectiveness of our approach.
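    One generic way to improve locality when sweeping a stencil over an unstructured mesh is to renumber the elements so that neighbours sit close together in memory. The sketch below uses reverse Cuthill-McKee (RCM) ordering, a standard bandwidth-reducing heuristic that the abstract does not mention. It is not the authors' tiling scheme. The names `rcm_order`, `bandwidth`, `apply_stencil` and the neighbour-list input `adj` are hypothetical, and the stencil is a toy neighbour average rather than a B-spline kernel.

```python
from collections import deque


def rcm_order(adj):
    """Reverse Cuthill-McKee ordering: BFS from low-degree seeds, then reverse.

    adj[v] lists the elements that share a face with element v (hypothetical input).
    Returns the old element ids in their new storage order.
    """
    n = len(adj)
    visited = [False] * n
    order = []
    # Try low-degree elements first as BFS seeds (one seed per connected component).
    for seed in sorted(range(n), key=lambda v: len(adj[v])):
        if visited[seed]:
            continue
        visited[seed] = True
        queue = deque([seed])
        while queue:
            v = queue.popleft()
            order.append(v)
            # Visit neighbours in order of increasing degree.
            for w in sorted(adj[v], key=lambda u: len(adj[u])):
                if not visited[w]:
                    visited[w] = True
                    queue.append(w)
    return order[::-1]


def bandwidth(adj, order):
    """Largest storage-index gap between neighbouring elements under `order`."""
    pos = {old: new for new, old in enumerate(order)}
    return max((abs(pos[v] - pos[w]) for v in range(len(adj)) for w in adj[v]), default=0)


def apply_stencil(values, adj, w_self=0.5):
    """One toy stencil sweep: blend each element's value with its neighbours' mean."""
    out = []
    for v, nbrs in enumerate(adj):
        nb_mean = sum(values[w] for w in nbrs) / len(nbrs) if nbrs else values[v]
        out.append(w_self * values[v] + (1.0 - w_self) * nb_mean)
    return out
```

    For example, on an eight-element chain whose ids are scrambled, the reordering reduces the bandwidth from 7 to 1. After the reordering, each sweep of `apply_stencil` reads neighbours that lie next to each other in memory.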

    Dynamic earthquake rupture modelled with an unstructured 3-D spectral element method applied to the 2011 M9 Tohoku earthquake

    An important goal of computational seismology is to simulate dynamic earthquake rupture and strong ground motion in realistic models that include crustal heterogeneities and complex fault geometries. To accomplish this, we incorporate dynamic rupture modelling capabilities into a spectral element solver on unstructured meshes, the 3-D open-source code SPECFEM3D, and employ state-of-the-art software to generate unstructured meshes of hexahedral elements. These tools provide high flexibility in representing fault systems with complex geometries, including branched and non-planar faults. The domain size is extended with progressive mesh coarsening to maintain accurate resolution of the static field. Our implementation of dynamic rupture does not affect the parallel scalability of the code. We verify our implementation by comparing our results with those of two finite element codes on benchmark problems, including branched faults. Finally, we present a preliminary dynamic rupture model of the 2011 Mw 9.0 Tohoku earthquake that includes a non-planar plate interface with heterogeneous frictional properties and initial stresses. Our simulation qualitatively reproduces the depth-dependent frequency content of the source and the large slip near the trench observed for this earthquake.
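    The abstract does not name the friction law. Linear slip weakening is one common choice in dynamic rupture modelling, so the sketch below assumes it purely for illustration. It shows how heterogeneous frictional properties can be expressed as per-node arrays along the fault. The function name and parameters are hypothetical, and the sketch is not SPECFEM3D's implementation.

```python
import numpy as np


def slip_weakening_strength(slip, normal_stress, mu_s, mu_d, d_c):
    """Fault strength under an assumed linear slip-weakening friction law.

    The friction coefficient drops linearly from the static value mu_s to the
    dynamic value mu_d over the critical slip distance d_c, then stays at mu_d.
    slip is the accumulated (non-negative) slip; normal_stress is the
    compressive normal stress taken as positive. Every argument may be a NumPy
    array, so spatially varying mu_s, mu_d or d_c model heterogeneous friction.
    """
    slip = np.asarray(slip, dtype=float)
    mu = mu_s - (mu_s - mu_d) * np.minimum(slip / d_c, 1.0)
    return mu * np.asarray(normal_stress, dtype=float)
```

    At each time step, a fault node can slip only where the shear traction reaches this strength. Varying `d_c` along the fault is one simple way to represent the heterogeneity the abstract refers to.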

    Fast GPU-Based Seismogram Simulation From Microseismic Events in Marine Environments Using Heterogeneous Velocity Models

    A novel approach is presented for the fast generation of synthetic seismograms from microseismic events using heterogeneous marine velocity models. The 3D elastic wave equation, a system of partial differential equations (PDEs), is solved numerically with the Fourier-domain pseudo-spectral method. This method parallelizes well on graphics processing unit (GPU) cards, which makes it faster than on traditional CPU-based computing platforms. Because forward simulation of large geological models is computationally expensive, combinations of individual synthetic seismic traces computed for specified microseismic event locations are used to simulate realistic patterns of microseismic activity in the subsurface. We explore the patterns generated by a few hundred microseismic events with different source mechanisms, combined with varying event amplitudes and origin times, using the simulated pressure and three-component particle velocity fields and 1D, 2D and 3D seismic visualizations.
    Shell Projects and Technology
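    The key step of a Fourier-domain pseudo-spectral method is computing spatial derivatives. The field is transformed to wavenumber space, multiplied by i k, and transformed back. Below is a minimal 1-D CPU sketch with NumPy, assuming a periodic domain. A 3D elastic solver would apply the same operation along each axis inside a time-stepping loop, and a GPU version would call a GPU FFT library instead of `numpy.fft`. `spectral_derivative` is a hypothetical helper, not code from the paper.

```python
import numpy as np


def spectral_derivative(f, length):
    """First derivative of a periodic, uniformly sampled field along its last axis.

    Differentiation becomes multiplication by i*k in the wavenumber domain,
    the core operation of a Fourier pseudo-spectral solver.
    """
    n = f.shape[-1]
    # Angular wavenumbers for sample spacing length / n.
    k = 2.0 * np.pi * np.fft.fftfreq(n, d=length / n)
    if n % 2 == 0:
        k[n // 2] = 0.0  # drop the unpaired Nyquist mode for an odd-order derivative
    return np.real(np.fft.ifft(1j * k * np.fft.fft(f)))
```

    For a smooth periodic field such as sin(3x) sampled at 64 points on [0, 2π), the result matches 3 cos(3x) to near machine precision. This spectral accuracy is why the method needs relatively few grid points per wavelength.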

    AxiSEM: broadband 3-D seismic wavefields in axisymmetric media

    We present a methodology to compute 3-D global seismic wavefields for realistic earthquake sources in visco-elastic anisotropic media, covering applications across the observable seismic frequency band with moderate computational resources. This is made possible by requiring axisymmetric background models, which allow a multipole expansion such that only a 2-D computational domain is needed, while the azimuthal third dimension is computed analytically on the fly. This dimensional collapse makes it feasible to store space–time wavefields on disk and use them to compute Fréchet sensitivity kernels for waveform tomography. We use the corresponding publicly available open-source spectral-element code AxiSEM (www.axisem.info). We demonstrate its excellent scalability on supercomputers and a diverse range of applications, from normal modes to small-scale lowermost mantle structures, tomographic models, and comparison with observed data. We also discuss further avenues to pursue with this methodology.
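    The dimensional collapse can be illustrated with a scalar field. If the solution on the 2-D meridional (s, z) domain is stored as a few azimuthal orders m, the value at any azimuth φ follows analytically from a Fourier series in φ. This sketch assumes a scalar quantity and a generic cosine/sine expansion. The actual method expands the vector wavefield, and for a moment-tensor point source only orders m ≤ 2 are excited. The function name is hypothetical.

```python
import numpy as np


def evaluate_at_azimuth(cos_terms, sin_terms, phi):
    """Rebuild a scalar field on the 2-D (s, z) grid at azimuth phi.

    cos_terms[m] and sin_terms[m] hold the order-m coefficient grids, so the
    field is sum over m of a_m * cos(m * phi) + b_m * sin(m * phi).
    """
    field = np.zeros_like(np.asarray(cos_terms[0], dtype=float))
    for m, (a, b) in enumerate(zip(cos_terms, sin_terms)):
        field += np.asarray(a, dtype=float) * np.cos(m * phi) + np.asarray(b, dtype=float) * np.sin(m * phi)
    return field
```

    Only the 2-D coefficient grids need to be stored on disk. Any azimuthal slice, and hence the full 3-D field, can then be reconstructed on demand at negligible cost.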

    Forward and adjoint simulations of seismic wave propagation on emerging large-scale GPU architectures

    Computational seismology is an area of wide sociological and economic impact, ranging from earthquake risk assessment to subsurface imaging and oil and gas exploration. At the core of these simulations is the modeling of wave propagation in a complex medium. Here we report on the extension of the high-order finite-element seismic wave simulation package SPECFEM3D to support the largest-scale hybrid and homogeneous supercomputers. Starting from an existing, highly tuned MPI code, we migrated to a CUDA version. To have an immediate impact on the science mission of computational seismologists, we had to port the entire production package rather than just individual kernels. One of the challenges in parallelizing finite-element codes is the potential for race conditions during the assembly phase. We therefore investigated different methods, such as mesh coloring and atomic updates on the GPU. To achieve strong scaling, we needed to ensure good overlap of data motion at all levels, including internode and host-accelerator transfers. Finally, we carefully tuned the GPU implementation. The new MPI/CUDA solver exhibits excellent scalability and achieves a node-to-node speedup over the carefully tuned equivalent multi-core MPI solver. To demonstrate the performance of both the forward and adjoint functionality, we present two case studies run on the Cray XE6 CPU and Cray XK6 GPU architectures at up to 896 nodes. In the first, focusing on the most commonly used forward simulations, we simulate seismic wave propagation generated by earthquakes in Turkey. In the second, testing the most complex seismic inversion type in the package, we use ambient seismic noise to image the 3-D crust and mantle structure beneath western Europe. © 2012 IEEE
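    Mesh coloring removes the assembly race described in this abstract. If elements that share a global node receive different colors, all elements of one color can update the global array concurrently without conflicts, at the cost of one pass per color. Below is a minimal serial Python sketch of greedy coloring, assuming elements are given as tuples of global node ids. The function names are hypothetical. The alternative the abstract mentions, atomic updates, would instead let all elements write at once.

```python
from collections import defaultdict


def color_elements(elements):
    """Greedy coloring: elements that share a global node get different colors."""
    node_colors = defaultdict(set)  # global node -> colors of elements touching it
    colors = []
    for elem in elements:
        used = set().union(*(node_colors[n] for n in elem))
        c = 0
        while c in used:
            c += 1
        colors.append(c)
        for n in elem:
            node_colors[n].add(c)
    return colors


def assemble(elements, local_values, n_nodes, colors):
    """Scatter-add element contributions into a global vector, one color at a time."""
    glob = [0.0] * n_nodes
    for c in sorted(set(colors)):
        # Elements of one color touch disjoint nodes, so on a GPU this inner
        # loop could run as a single kernel with plain (non-atomic) writes.
        for e, col in enumerate(colors):
            if col != c:
                continue
            for n, v in zip(elements[e], local_values[e]):
                glob[n] += v
    return glob
```

    Greedy coloring does not guarantee the minimum number of colors. Fewer colors means fewer sequential passes, which is one reason a GPU code might compare coloring against atomics.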

    PDE Solvers for Hybrid CPU-GPU Architectures

    Many problems of scientific and industrial interest are investigated by numerically solving partial differential equations (PDEs). For some of these problems, the scope of the investigation is limited by the cost of computational resources. A new approach to reducing these costs is the use of coprocessors, such as graphics processing units (GPUs) and Many Integrated Core (MIC) cards, which can execute floating-point operations at a higher rate than a central processing unit (CPU) of the same cost. They achieve this by placing a large number of processors in a single device, each with very limited dedicated memory per thread. Codes for a number of continuum methods, such as boundary element methods (BEM), finite element methods (FEM) and finite difference methods (FDM), have already been implemented on coprocessor architectures. These methods were designed before coprocessor architectures were adopted, so implementing them efficiently with reduced thread-level memory can be challenging. Other methods, such as Monte Carlo methods (MCM) and lattice Boltzmann methods (LBM) for kinetic formulations of PDEs, do operate efficiently with limited thread-level memory, but they are not competitive on CPUs and generally converge more poorly than the continuum methods. In this work, we introduce a class of methods in which the parallelism of kinetic formulations on GPUs is combined with the better convergence of continuum methods on CPUs. We first extend an existing Feynman-Kac formulation for determining the principal eigenpair of an elliptic operator to create a version that can retrieve arbitrarily many eigenpairs. This new method is implemented for multiple GPUs and combined with a standard deflation preconditioner on multiple CPUs to create a hybrid concurrent method whose convergence is superior to that of the deflation preconditioner alone.
The hybrid method exhibits good parallelism, with an efficiency of 80% on a problem with 300 million unknowns, run on a configuration of 324 CPU cores and 54 GPUs.
    Doctor of Philosophy
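    The eigenpair method in this abstract is beyond a short example. Its underlying idea, a Feynman-Kac representation evaluated by independent random walks, can be shown on the simplest case: the Dirichlet problem for the discrete Laplace equation on a square grid. This is a substitute illustration, not the thesis's eigenpair algorithm, and the grid, boundary function and names are hypothetical. Each walk needs only its current position, which is why such Monte Carlo methods suit the limited per-thread memory of GPUs.

```python
import random


def random_walk_estimate(start, n, boundary, n_walks, rng):
    """Monte Carlo estimate of u(start) for the discrete Laplace equation.

    Grid nodes are (i, j) with 0 <= i, j <= n; nodes with i or j equal to
    0 or n are boundary nodes carrying Dirichlet data boundary(i, j).
    By the Feynman-Kac representation, u(x) = E[boundary(X_exit)] for a
    simple symmetric random walk started at x and stopped at the boundary.
    """
    steps = ((1, 0), (-1, 0), (0, 1), (0, -1))
    total = 0.0
    for _ in range(n_walks):  # walks are independent, hence trivially parallel
        i, j = start
        while 0 < i < n and 0 < j < n:
            di, dj = rng.choice(steps)
            i, j = i + di, j + dj
        total += boundary(i, j)
    return total / n_walks


# u(i, j) = i is discrete-harmonic, so the estimate should be close to 3.0.
estimate = random_walk_estimate((3, 5), 10, lambda i, j: float(i), 4000, random.Random(0))
```

    The statistical error shrinks only as one over the square root of the number of walks. This slow convergence is the weakness that the hybrid scheme offsets by pairing GPU random walks with a CPU continuum solver.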

    Doctor of Philosophy

    Dissertation. Memory access irregularities are a major bottleneck for bandwidth-limited problems on graphics processing unit (GPU) architectures. GPU memory systems are designed to coalesce consecutive memory accesses into a single memory transaction. Noncontiguous accesses within a group of threads executing in lockstep may cause serialized memory transfers. Irregular algorithms may have data-dependent control flow and memory access, which require runtime information to evaluate. Compile-time methods for evaluating parallelism, such as static dependence graphs, cannot evaluate irregular algorithms. The goals of this dissertation are to study irregularities in the context of unstructured mesh and sparse matrix problems, to analyze the impact of vectorization width on irregularities, and to present data-centric methods that reduce control-flow and memory-access irregularity in those contexts. Reordering associative operations has often been exploited for performance gains in parallel algorithms. This dissertation presents a method for associative reordering of stencil computations over unstructured meshes that increases data reuse through caching. This novel parallelization scheme offers considerable speedups over standard methods. Vectorization width can have a significant impact on the performance of vectorized computations. Although the hardware vector width is generally fixed, the logical vector width used within a computation can range from one up to the width of the computation, and thread scheduling and resource limitations can then cause significant performance differences. This dissertation analyzes the impact of vectorization width on dense numerical computations such as 3D dG post-processing. Dynamic updates are difficult to perform efficiently on traditional sparse matrix formats. Explicitly controlling memory segmentation allows in-place dynamic updates of sparse matrices.
Dynamically updating the matrix without rebuilding or sorting greatly improves processing time and overall throughput. This dissertation presents a new sparse matrix format, dynamic compressed sparse row (DCSR), which supports dynamic streaming updates to a sparse matrix. A new method for parallel sparse matrix-matrix multiplication (SpMM) that uses dynamic updates is also presented.
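    The idea of explicitly controlling memory segmentation can be sketched with a simplified, hypothetical format in which every row reserves a fixed number of slots. This is not the DCSR layout from the dissertation, whose details the abstract does not give. It only shows why reserved slack makes streaming inserts cheap: an update touches a single row segment and never shifts the rest of the matrix. Unlike a production format, this sketch simply raises an error when a row's segment fills up.

```python
class SlackCSR:
    """Hypothetical CSR variant with a fixed-capacity segment per row.

    Row r owns slots [r * capacity, r * capacity + row_len[r]) of the shared
    cols/vals arrays; the unused tail of each segment is slack that lets new
    nonzeros be appended in place, without rebuilding or sorting.
    """

    def __init__(self, n_rows, capacity):
        self.n_rows = n_rows
        self.capacity = capacity
        self.row_len = [0] * n_rows
        self.cols = [-1] * (n_rows * capacity)
        self.vals = [0.0] * (n_rows * capacity)

    def add(self, r, c, v):
        """Stream one update: accumulate into (r, c) if present, else append."""
        base = r * self.capacity
        for k in range(base, base + self.row_len[r]):
            if self.cols[k] == c:
                self.vals[k] += v
                return
        if self.row_len[r] == self.capacity:
            # A fuller design might allocate an additional segment here.
            raise OverflowError(f"row {r} segment is full")
        k = base + self.row_len[r]
        self.cols[k], self.vals[k] = c, v
        self.row_len[r] += 1

    def matvec(self, x):
        """y = A x, reading only the occupied part of each row segment."""
        y = []
        for r in range(self.n_rows):
            base = r * self.capacity
            y.append(sum(self.vals[k] * x[self.cols[k]] for k in range(base, base + self.row_len[r])))
        return y
```

    The trade-off is memory: reserved but unused slots cost space, and choosing the segment size balances that waste against how often rows overflow.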

    Flexible Model Extension and Optimization of Earthquake Simulations (Flexible Modellerweiterung und Optimierung von Erdbebensimulationen)

    Simulations of realistic earthquake scenarios require scalable software and extensive supercomputing resources. As simulation fidelity increases, advanced rheological and source models need to be incorporated. I introduce a domain-specific language to handle this model flexibility while meeting the high efficiency requirements. The contributions in this thesis enabled the largest and longest dynamic rupture simulation to date of the 2004 Sumatra earthquake.

    Research on Accelerating Applications on GPU Clusters (GPUクラスタにおけるアプリケーション高速化に関する研究)

    筑波大学 (University of Tsukuba), 201