    Gamra: Simple meshing for complex earthquakes

    The static offsets caused by earthquakes are well described by elastostatic models with a discontinuity in the displacement along the fault. A traditional approach to modeling this discontinuity is to align the numerical mesh with the fault and solve the equations using finite elements. However, this distorted mesh can be difficult to generate and update. We present a new numerical method, inspired by the Immersed Interface Method (LeVeque and Li, 1994), for solving the elastostatic equations with embedded discontinuities. The method has been carefully designed so that it can be used on parallel machines on an adapted finite difference grid. We have implemented this method in Gamra, a new code for earth modeling. We demonstrate the correctness of the method with analytic tests, and we demonstrate its practical performance by solving a realistic earthquake model to extremely high precision.
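
    For orientation, the underlying problem can be written as the standard equations of linear elastostatics with a prescribed jump in displacement across the fault surface; this is a generic formulation given here for context, not necessarily the exact statement used in the paper:

        \nabla \cdot \sigma = 0, \qquad
        \sigma = \lambda\,(\nabla \cdot \mathbf{u})\,\mathbf{I} + \mu\,(\nabla \mathbf{u} + \nabla \mathbf{u}^{T}), \qquad
        [\![\,\mathbf{u}\,]\!]_{\Gamma} = \mathbf{s},

    where u is the displacement, sigma the stress, lambda and mu the Lame parameters, Gamma the fault surface, and s the prescribed slip producing the static offsets. The Immersed Interface Method builds the jump condition into the finite difference stencils near Gamma instead of aligning the mesh with the fault.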

    Improving Scientist Productivity, Architecture Portability, and Performance in ParFlow

    Legacy scientific applications represent significant investments by universities, engineers, and researchers and contain valuable implementations of key scientific computations. Over time, hardware architectures have changed. Adapting existing code to new architectures is time consuming and expensive, and it increases code complexity. The increase in complexity negatively affects the scientific impact of the applications, so there is an immediate need to reduce it. We propose using abstractions to manage and reduce code complexity, improving the scientific impact of applications. This thesis presents a set of abstractions targeting boundary conditions in iterative solvers. Many scientific applications represent physical phenomena as a set of partial differential equations (PDEs), structured around steady-state and boundary-condition equations and advanced from initial conditions. The proposed abstractions separate architecture-specific implementation details from the primary computation. We use ParFlow to demonstrate the effectiveness of the abstractions. ParFlow is a hydrologic and geoscience application that simulates surface and subsurface water flow. The abstractions have enabled ParFlow developers to add new boundary conditions for the first time in 15 years, and have enabled an experimental OpenMP version of ParFlow that is transparent to computational scientists. This is achieved without expensive rewrites of key computations or major codebase changes, improving developer productivity, enabling hardware portability, and allowing transparent performance optimizations.
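
    The thesis's actual interfaces are not reproduced in this abstract, but the flavor of such an abstraction can be sketched in plain C: the boundary-condition computation is written once as a callback, while the architecture-specific loop (serial here, OpenMP when enabled) lives in a separate driver. All names below (bc_fn, apply_boundary, dirichlet_bc) are hypothetical and are not ParFlow's API.

        #include <stddef.h>

        /* Hypothetical per-cell boundary-condition callback: the physics is
           written once, independent of how the loop over boundary cells runs. */
        typedef void (*bc_fn)(double *field, size_t idx, double value);

        static void dirichlet_bc(double *field, size_t idx, double value)
        {
            field[idx] = value;                 /* pin the boundary cell's value */
        }

        /* Architecture-specific driver: the iteration strategy is chosen here
           (plain loop, or OpenMP when compiled with -fopenmp), while the
           callback above stays unchanged. */
        static void apply_boundary(double *field, const size_t *cells, size_t n,
                                   bc_fn apply, double value)
        {
        #ifdef _OPENMP
            #pragma omp parallel for
        #endif
            for (long i = 0; i < (long)n; ++i)
                apply(field, cells[i], value);
        }

        int main(void)
        {
            double field[8] = {0};
            size_t boundary[2] = {0, 7};        /* first and last cell */
            apply_boundary(field, boundary, 2, dirichlet_bc, 1.0);
            return 0;
        }

    Separating the two layers in this way is what allows new boundary conditions, or an OpenMP backend, to be added without touching the other side.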

    COMPUTATIONAL SCIENCE CENTER

    Argonne's Laboratory computing center - 2007 annual report.

    Generating and auto-tuning parallel stencil codes

    In this thesis, we present a software framework, Patus, which generates high-performance stencil codes for different types of hardware platforms, including current multicore CPU and graphics processing unit architectures. The ultimate goals of the framework are productivity, portability (of both the code and its performance), and high performance on the target platform. A stencil computation updates every grid point in a structured grid based on the values of its neighboring points. This class of computations occurs frequently in scientific and general-purpose computing (e.g., in partial differential equation solvers or in image processing), justifying the focus on this kind of computation. The proposed key ingredients for achieving the goals of productivity, portability, and performance are domain-specific languages (DSLs) and the auto-tuning methodology. The Patus stencil specification DSL allows the programmer to express a stencil computation concisely and independently of hardware architecture-specific details. It thus increases programmer productivity by relieving him or her of low-level programming model issues and of manually applying hardware-specific code optimizations. The use of domain-specific languages also implies code reusability: once implemented, the same stencil specification can be reused on different hardware platforms, i.e., the specification code is portable across hardware architectures. Gearing the language toward a special purpose makes it amenable to more aggressive optimizations and therefore to potentially higher performance. Auto-tuning provides performance and performance portability by automatically adapting implementation-specific parameters to the characteristics of the hardware on which the code will run. Automating parameter tuning, which essentially amounts to solving an integer programming problem whose objective function is the code's performance as a function of the parameter configuration, also makes the system more productive to use than if the programmer had to fine-tune the code manually. We show performance results for a variety of stencils for which Patus was used to generate the corresponding implementations. The selection includes stencils taken from two real-world applications: a simulation of the temperature within the human body during hyperthermia cancer treatment, and a seismic application. These examples demonstrate the framework's flexibility and its ability to produce high-performance code.
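
    As a concrete, deliberately naive illustration of the kind of computation Patus targets, the following plain C code performs repeated sweeps of a 2D 5-point Jacobi stencil, updating each interior grid point from its four nearest neighbors. This is generic example code, not Patus output and not its DSL syntax.

        #include <stdlib.h>

        #define N 256                            /* grid points per dimension */

        /* One sweep of a 2D 5-point stencil: each interior point becomes the
           average of its four nearest neighbors (the Jacobi iteration for the
           discrete 2D Laplace equation). */
        static void jacobi_sweep(const double *in, double *out)
        {
            for (int i = 1; i < N - 1; ++i)
                for (int j = 1; j < N - 1; ++j)
                    out[i * N + j] = 0.25 * (in[(i - 1) * N + j] + in[(i + 1) * N + j] +
                                             in[i * N + (j - 1)] + in[i * N + (j + 1)]);
        }

        int main(void)
        {
            double *a = calloc((size_t)N * N, sizeof *a);
            double *b = calloc((size_t)N * N, sizeof *b);
            for (int t = 0; t < 100; ++t) {      /* iteration loop: swap buffers */
                jacobi_sweep(a, b);
                double *tmp = a; a = b; b = tmp;
            }
            free(a);
            free(b);
            return 0;
        }

    Patus's role, as described above, is to take a stencil specification at roughly this level of abstraction, generate optimized variants of the loop nest for CPUs or GPUs, and auto-tune their implementation parameters for the target machine.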

    Automatic Parallelization of Tiled Stencil Loop Nests on GPUs

    This thesis designs and implements a compiler framework based on the polyhedral model. The compiler automatically parallelizes loop nests, especially stencil kernels, into efficient GPU code using the loop tiling transformations that the polyhedral model describes. To enhance parallel performance, we introduce three practical techniques for processing different types of loop nests, and our experimental results demonstrate that these techniques can outperform previous approaches. Firstly, we aim to find efficient tiling transformations without violating data dependences. Selecting a tile's shape and size is an open, performance-critical issue that is constrained by the GPU hardware. We propose an approach that determines tile shapes so as to improve the two-level parallelism of GPUs. The new approach finds appropriate tiling hyperplanes by embedding parallelism-enhancing constraints into the polyhedral model to maximize intra-tile, i.e., intra-SM, parallelism. This improves the load balance among the streaming processors (SPs), which execute a wavefront of loop iterations within a tile. We also eliminate parallelism-hindering false dependences to optimize inter-tile, i.e., inter-SM, parallelism. This improves the load balance among the streaming multiprocessors (SMs), which execute a wavefront of tiles. Furthermore, to avoid a combinatorial explosion of tile size configurations, we present a model-driven approach to automating tile size selection, which is performance-critical for loop tiling transformations, especially for DOACROSS loop nests. Our tile size selection model accurately estimates the execution times of tiled loop nests running on GPUs, and the selected tile sizes lead to performance close to the best observed over the range of problem sizes tested. Finally, to address the difficulty and low performance of parallelizing widely used SOR (successive over-relaxation) stencil loop nests, we present a new tiled parallel SOR method, called MLSOR, which admits more efficient data-parallel SIMD execution on GPUs. Unlike the previous two approaches, which are dependence-preserving, the basic idea is to algorithmically restructure a stencil kernel based on a non-dependence-preserving parallelization scheme, avoiding pipelining to expose higher parallelism. The new approach can be implemented in compilers through a pattern-matching pass that optimizes SOR-like DOACROSS loop nests on GPUs.
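
    As a rough, CPU-side illustration of what loop tiling does to a stencil loop nest (generic example code, not the thesis's polyhedral transformation, its GPU code, or MLSOR), the iteration space can be traversed tile by tile; on a GPU each tile would roughly correspond to the work mapped onto one thread block, and the tile sizes TI and TJ are the kind of parameters the thesis's selection model chooses.

        #include <stdlib.h>

        #define N  1024
        #define TI 32        /* tile sizes are placeholders here; choosing them */
        #define TJ 32        /* well is exactly the tuning problem discussed above */

        /* Tiled 5-point stencil sweep: the i/j iteration space is cut into
           TI x TJ tiles, and all iterations of one tile run before the next
           tile starts. */
        static void tiled_sweep(const double *in, double *out)
        {
            for (int ii = 1; ii < N - 1; ii += TI)
                for (int jj = 1; jj < N - 1; jj += TJ)
                    for (int i = ii; i < ii + TI && i < N - 1; ++i)
                        for (int j = jj; j < jj + TJ && j < N - 1; ++j)
                            out[i * N + j] =
                                0.25 * (in[(i - 1) * N + j] + in[(i + 1) * N + j] +
                                        in[i * N + (j - 1)] + in[i * N + (j + 1)]);
        }

        int main(void)
        {
            double *a = calloc((size_t)N * N, sizeof *a);
            double *b = calloc((size_t)N * N, sizeof *b);
            tiled_sweep(a, b);
            free(a);
            free(b);
            return 0;
        }

    For a Jacobi-style stencil the tiles above are independent within a sweep; the dependence-carrying cases the thesis focuses on (DOACROSS and SOR-like nests) additionally require wavefront scheduling of tiles or, as in MLSOR, an algorithmic restructuring.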