Fast GPU-Based Seismogram Simulation From Microseismic Events in Marine Environments Using Heterogeneous Velocity Models
A novel approach is presented for fast generation of synthetic seismograms
due to microseismic events, using heterogeneous marine velocity models. The
partial differential equations (PDEs) for the 3D elastic wave equation have
been numerically solved using the Fourier domain pseudo-spectral method, which
is parallelizable on graphics processing unit (GPU) cards, thus making it
faster than traditional CPU-based computing platforms. Because forward
simulation of large geological models is computationally expensive,
several combinations of individual synthetic seismic traces are used for
specified microseismic event locations, in order to simulate the effect of
realistic microseismic activity patterns in the subsurface. We here explore the
patterns generated by a few hundred microseismic events with different source
mechanisms using various combinations, both in event amplitudes and origin
times, using the simulated pressure and three-component particle velocity
fields via 1D, 2D and 3D seismic visualizations.
Shell Projects and Technolog
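As a minimal sketch of how individual synthetic traces might be combined with different event amplitudes and origin times, the following assumes plain linear superposition of scaled, time-shifted traces on a common sampling grid. The function name `combine_traces` and the toy trace values are hypothetical and are not taken from the paper.

```python
def combine_traces(traces, amplitudes, delays, n_samples):
    """Sum scaled, time-shifted traces into one composite seismogram.

    traces     : list of per-event traces (lists of floats)
    amplitudes : one scale factor per event
    delays     : one origin-time shift per event, in samples
    """
    out = [0.0] * n_samples
    for trace, amp, delay in zip(traces, amplitudes, delays):
        for i, value in enumerate(trace):
            j = i + delay  # shift by the event's origin time
            if 0 <= j < n_samples:
                out[j] += amp * value
    return out


# Toy example: the same unit trace fired twice, the second time later and stronger.
unit = [0.0, 1.0, -0.5, 0.0]
composite = combine_traces([unit, unit], amplitudes=[1.0, 2.0],
                           delays=[0, 2], n_samples=8)
print(composite)  # [0.0, 1.0, -0.5, 2.0, -1.0, 0.0, 0.0, 0.0]
```

Because each trace is simulated once and only rescaled and shifted afterwards, many activity patterns can be explored without rerunning the expensive forward simulation.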
Multilayered abstractions for partial differential equations
How do we build maintainable, robust, and performance-portable scientific
applications? This thesis argues that the answer to this software engineering
question in the context of the finite element method is through the use of
layers of Domain-Specific Languages (DSLs) to separate the various concerns in
the engineering of such codes.
Performance-portable software achieves high performance on multiple diverse
hardware platforms without source code changes. We demonstrate that finite
element solvers written in a low-level language are not performance-portable,
and therefore code must be specialised to the target architecture by a code
generation framework. A prototype compiler for finite element variational forms
that generates CUDA code is presented, and is used to explore how good
performance on many-core platforms in automatically-generated finite element
applications can be achieved. The differing code generation requirements for
multi- and many-core platforms motivate the design of an additional
abstraction, called PyOP2, that enables unstructured mesh applications to be
performance-portable.
We present a runtime code generation framework comprising the Unified Form
Language (UFL), the FEniCS Form Compiler, and PyOP2. This toolchain separates
the succinct expression of a numerical method from the selection and
generation of efficient code for local assembly. This is further decoupled from
the selection of data formats and algorithms for efficient parallel
implementation on a specific target architecture.
We establish the successful separation of these concerns by demonstrating the
performance-portability of code generated from a single high-level source code
written in UFL across sequential C, CUDA, MPI and OpenMP targets. The
performance of the generated code exceeds the performance of comparable
alternative toolchains on multi-core architectures.
Automated cache optimisations of stencil computations for partial differential equations
This thesis focuses on numerical methods that solve partial differential equations.
Our focal point is the finite difference method, which solves partial
differential equations by approximating derivatives with explicit finite differences.
These partial differential equation solvers consist of stencil computations on structured grids.
Stencils for computing real-world practical applications are patterns often
characterised by many memory accesses and non-trivial arithmetic expressions
that lead to high computational costs compared to simple stencils used in much prior
proof-of-concept work.
In addition, the loop nests to express stencils on structured grids may often be complicated.
This work is highly motivated by a specific domain of stencil computations where one of the challenges is operations that are not aligned to the structured grid ("off-the-grid" operations).
These operations update neighbouring grid points through scatter and gather operations via non-affine memory accesses, such as A[B[i]].
In addition to this challenge, these practical stencils often include many computation fields (requiring multiple grid copies to be stored), complex data dependencies and imperfect loop nests.
In this work, we aim to increase the performance of stencil kernel execution.
We study automated cache-memory-dependent optimisations for stencil computations.
This work consists of two core parts with their respective contributions.
The first part of our work tries to reduce the data movement in stencil computations of practical interest.
Data movement is a dominant factor affecting the performance of high-performance computing applications.
It has long been a target of optimisations due to its impact on execution time and energy consumption.
This thesis tries to relieve this cost by applying temporal blocking optimisations, also known as time-tiling, to stencil computations.
Temporal blocking is a well-known technique to enhance data reuse in stencil computations.
However, it is rarely used in practical applications but rather in theoretical examples to prove its efficacy.
Applying temporal blocking to scientific simulations is more complex.
More specifically, in this work, we focus on the application context of seismic and medical imaging.
In this area, we often encounter scatter and gather operations due to signal sources and receivers at arbitrary locations in the computational domain.
These operations make the application of temporal blocking challenging.
We present an approach to overcome this challenge and successfully apply temporal blocking.
In the second part of our work, we extend the first part as an automated approach targeting a wide range of simulations modelled with partial differential equations.
Since temporal blocking is error-prone, tedious to apply by hand and highly complex to assimilate theoretically and practically, we are motivated to automate its application and automatically generate code that benefits from it.
We discuss algorithmic approaches and present a generalised compiler pipeline to automate the application of temporal blocking.
These passes are written in the Devito compiler. They are used to accelerate the computation of stencil kernels in areas such as seismic and medical imaging, computational fluid dynamics and machine learning.
Devito (www.devitoproject.org) is a Python package to implement optimised stencil computation (e.g., finite differences, image processing, machine learning) from high-level symbolic problem definitions.
Devito builds on SymPy (www.sympy.org) and employs automated code generation and just-in-time compilation to execute optimised computational kernels on several computer platforms, including CPUs, GPUs, and clusters thereof.
We show how we automate temporal blocking code generation without user intervention and often achieve better time-to-solution.
We enable domain-specific optimisation through compiler passes and offer temporal blocking gains from a high-level symbolic abstraction.
These automated optimisations benefit various computational kernels for solving real-world application problems.
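To make the idea of temporal blocking concrete, here is a minimal sketch on a 1D three-point stencil with fixed boundary values. It uses overlapped tiling (each tile recomputes a halo of `tblock` cells per side), which is only one of several temporal blocking schemes and is not claimed to be the scheme generated by Devito. All function names and parameters are hypothetical, and sources and receivers are left out.

```python
def step(u):
    """One time step of a 3-point averaging stencil; end points stay fixed."""
    new = u[:]
    for i in range(1, len(u) - 1):
        new[i] = 0.25 * u[i - 1] + 0.5 * u[i] + 0.25 * u[i + 1]
    return new


def naive(u, nt):
    """Reference: sweep the whole grid once per time step."""
    for _ in range(nt):
        u = step(u)
    return u


def time_tiled(u, nt, tile=4, tblock=2):
    """Advance each spatial tile several time steps before moving on.

    Each tile is loaded with a halo of t cells per side; after t local steps
    the halo cells are stale, but the tile's own cells are still exact.
    """
    n = len(u)
    done = 0
    while done < nt:
        t = min(tblock, nt - done)
        out = u[:]
        for lo in range(0, n, tile):
            hi = min(lo + tile, n)
            a, b = max(lo - t, 0), min(hi + t, n)  # tile plus halo, clipped
            local = u[a:b]
            for _ in range(t):
                local = step(local)
            out[lo:hi] = local[lo - a:hi - a]  # keep only the valid cells
        u = out
        done += t
    return u


u0 = [0.0] * 16
u0[5] = 1.0
u0[11] = -2.0
reference = naive(u0, 5)
tiled = time_tiled(u0, 5)
print(tiled == reference)
```

The tiled version touches each tile's data `tblock` times while it is small enough to stay in cache, at the price of redundant halo computation. An off-the-grid source that writes into the grid at every time step would have to be applied inside the local loop as well, which is the complication the abstract describes.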
Productive and efficient computational science through domain-specific abstractions
In an ideal world, scientific applications are computationally efficient,
maintainable and composable and allow scientists to work very productively. We
argue that these goals are achievable for a specific application field by
choosing suitable domain-specific abstractions that encapsulate domain
knowledge with a high degree of expressiveness.
This thesis demonstrates the design and composition of
domain-specific abstractions by abstracting the stages a scientist goes
through in formulating a problem of numerically solving a partial differential
equation. Domain knowledge is used to transform this problem into a different,
lower level representation and decompose it into parts which can be solved
using existing tools. A system for the portable solution of partial
differential equations using the finite element method on unstructured meshes
is formulated, in which contributions from different scientific communities
are composed to solve sophisticated problems.
The concrete implementations of these domain-specific abstractions are
Firedrake and PyOP2. Firedrake allows scientists to describe variational
forms and discretisations for linear and non-linear finite element problems
symbolically, in a notation very close to their mathematical models. PyOP2
abstracts the performance-portable parallel execution of local computations
over the mesh on a range of hardware architectures, targeting multi-core CPUs,
GPUs and accelerators. Thereby, a separation of concerns is achieved, in which
Firedrake encapsulates domain knowledge about the finite element method
separately from its efficient parallel execution in PyOP2, which in turn is
completely agnostic to the higher abstraction layer.
As a consequence of the composability of those abstractions, optimised
implementations for different hardware architectures can be
automatically generated without any changes to a single high-level
source. Performance matches or exceeds what is realistically attainable by
hand-written code. Firedrake and PyOP2 are combined to form a tool chain that
is demonstrated to be competitive with or faster than available alternatives
on a wide range of different finite element problems.
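The separation between a local computation and its execution over the mesh can be illustrated with a small sketch. The code below is not the PyOP2 API. `par_loop`, `lumped_length` and the toy 1D mesh are hypothetical names chosen for illustration, and the loop runs sequentially where a real back-end would choose a parallel schedule.

```python
def par_loop(kernel, element_to_node, node_data, out):
    """Apply a local kernel to every mesh element and scatter-add the result.

    The kernel sees only the data of one element; the indirection through
    element_to_node and the iteration order belong to the execution layer.
    """
    for nodes in element_to_node:
        local_in = [node_data[n] for n in nodes]   # gather
        local_out = kernel(local_in)               # local computation
        for n, value in zip(nodes, local_out):     # scatter-add
            out[n] += value
    return out


def lumped_length(coords):
    """Local kernel: give each end node half of the element's length."""
    half = abs(coords[1] - coords[0]) / 2.0
    return [half, half]


# Toy 1D mesh: 4 nodes, 3 line elements.
coordinates = [0.0, 1.0, 3.0, 6.0]
elements = [(0, 1), (1, 2), (2, 3)]
weights = par_loop(lumped_length, elements, coordinates, [0.0] * 4)
print(weights)  # [0.5, 1.5, 2.5, 1.5]
```

The kernel never mentions the mesh topology, so the same kernel could in principle be paired with a different loop implementation for a different architecture.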
Code generation for 3D partial differential equation models from a high-level functional intermediate language
Partial Differential Equation (PDE) modelling is an important tool in scientific domains for bridging
theory with reality; however, PDE models can be complex to program and even more difficult to abstract. The
evolving parallel computing landscape is also making it increasingly difficult to write and maintain codes
(such as PDE models) which retain performance across different parallel platforms. Computational
scientists should be able to focus on their science instead of also having to become high performance
computing experts in order to take advantage of faster parallel hardware. Current methods targeting this
problem either concentrate on very niche applications, are too simplistic for real world problems or are
too low-level to be easily programmable. Domain Specific Languages (DSLs) are a popular approach,
but they have two opposing goals: improving programmability, while also providing high performance.
This thesis presents a solution for developing performance portable 3D PDE models, using room
acoustics simulations as a case study, by raising the abstraction level in the existing hardware-agnostic,
intermediary language LIFT. This functional language and compiler is designed for DSLs to compile into
and provides a separation of concerns for developing parallel applications. This separation enables DSL
writers to focus on developing high-level abstractions providing productivity to the user, while LIFT turns
the intermediary parallel representation these abstractions compile down to into hardware-optimised
code. A suite of composable, algorithmic primitives enables LIFT to reuse functionality across domains
and an exploratory search space provides a way to find the best optimisations for a given platform.
As this thesis shows, room acoustic simulations are expressible in LIFT with only a few small
changes to the framework. These expressions are able to achieve comparable or better performance
to original hand-written benchmarks. Furthermore, such expressions enable room acoustics models to
run across multiple platforms and easily swap in optimisations. Being able to test out what optimisations
give the best performance for a given platform — without rewriting or retuning — allows computational
scientists to focus on their own work.
Optimisations previously inaccessible in LIFT are developed that target 3D stencils generally, including 3D PDE models. In particular, 2.5D Tiling and compiler passes to inline private arrays and structs
are added to the LIFT ecosystem, giving high performance to various 3D stencil codes. The 2.5D Tiling
optimisation is coded functionally for the first time in LIFT and is selected automatically by additional
rewrite rules. These rewrite rules, such as the one for 2.5D Tiling, are explored in a search space to find
the best set of optimisations for an application on a given platform.
Building on previous work, LIFT is extended to enable complex boundary conditions and room
shapes for room acoustics models. This is the first intermediate representation in a high-level code generator to do so. Additionally, it is also the first high-level framework to support frequency-dependent
boundary handling for room acoustics simulations. Combined, these contributions show that high-level
abstractions for 3D PDE models are possible, enabling computational scientists to optimise and parallelise their codes more easily across different parallel platforms.
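The notion of composing a stencil from small algorithmic primitives can be sketched in a few lines. The helpers below are written in Python for illustration only and are not LIFT syntax. The names `pad`, `slide` and `stencil3` and the clamp boundary handling are assumptions made for this example.

```python
def pad(left, right, xs):
    """Extend a 1D array by repeating its edge values (clamp boundary)."""
    return [xs[0]] * left + list(xs) + [xs[-1]] * right


def slide(size, step, xs):
    """Return the sliding windows of the given size over xs."""
    return [xs[i:i + size] for i in range(0, len(xs) - size + 1, step)]


def stencil3(xs):
    """A 3-point sum stencil as a composition: map(sum) . slide . pad."""
    return list(map(sum, slide(3, 1, pad(1, 1, xs))))


result = stencil3([1, 2, 3, 4])
print(result)  # [4, 6, 9, 11]
```

Because the stencil is a composition of generic pieces, a compiler can rewrite the composition (for example, tile the sliding windows) without the user changing the high-level expression.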
Communication-avoiding optimizations for large-scale unstructured-mesh applications with OP2
This thesis presents data movement-reducing and communication-avoiding optimizations and their practicable implementation for large-scale unstructured-mesh numerical simulation applications. Utilizing the high-level abstractions of the OP2 domain-specific library, we reason about techniques for reduced communications across a consecutive sequence of loops – a loop-chain. The optimizations are explored for shared-memory systems where multiple processors share a common memory space and distributed-memory systems that comprise separate memory spaces across multiple nodes. We elucidate the challenges when executing unstructured-mesh applications on large-scale high-performance systems that are specifically related to data sharing and movement, synchronization, and communication among processes. A key feature of the work is to mitigate these problems for real-world, large-scale applications and computing kernels, bringing together proven and effective techniques within a DSL framework.
On shared-memory systems, we explore cache-blocking tiling, a key technique for exploiting data locality, in unstructured-mesh applications by integrating the SLOPE library, a cache-blocking tiling library, with OP2. For distributed-memory systems, we analyze the trade-off between increased redundant computation in place of data movement and design a new communication-avoiding back-end for OP2 that applies these techniques automatically to any OP2 application targeting CPUs and GPUs.
The communication-avoiding optimizations are applied to two non-trivial applications, including the OP2 version of Rolls Royce’s production CFD application, Hydra, on problem sizes representative of real-world workloads. Results demonstrate how, for select configurations, the new communication-avoiding back-end provides runtime reductions of 30–65% for the loop-chains in these applications on both an HPE Cray EX system and an NVIDIA V100 GPU cluster. We model and examine the determinants and characteristics of a given unstructured-mesh loop-chain that lead to performance benefits with communication-avoidance techniques, providing insights into the general feasibility and profitability of using the optimizations for this class of applications.
Mining a Small Medical Data Set by Integrating the Decision Tree and t-test
Although several researchers have used statistical methods to show that aspiration followed by the injection of 95% ethanol left in situ (retention) is an effective treatment for ovarian endometriomas, very few discuss the different conditions that could generate different recovery rates for the patients. Therefore, this study combines statistical methods and decision tree techniques to analyze the postoperative status of ovarian endometriosis patients under different conditions. Since our collected data set is small, containing only 212 records, we use all of these data as the training data. Consequently, instead of using a resultant tree to generate rules directly, we first use the value of each node as a cut point to generate all possible rules from the tree. Then, using a t-test, we verify the rules to discover useful descriptive rules. Experimental results show that our approach can find new and interesting knowledge about recurrent ovarian endometriomas under different conditions.
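A minimal sketch of verifying one cut-point rule with a t-test follows. It assumes Welch's unequal-variance form of the two-sample t statistic, which the abstract does not specify, and it stops at the statistic without computing a p-value. The function names, field names and records are made up for illustration and are not data from the study.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Two-sample t statistic without assuming equal variances."""
    return (mean(a) - mean(b)) / sqrt(variance(a) / len(a) + variance(b) / len(b))


def rule_t_statistic(records, feature, cut, outcome):
    """Split records at a tree node's cut point and compare the outcome means."""
    below = [r[outcome] for r in records if r[feature] <= cut]
    above = [r[outcome] for r in records if r[feature] > cut]
    return welch_t(below, above)


# Made-up toy records: feature "x", outcome "y".
records = [
    {"x": 1, "y": 2.0}, {"x": 2, "y": 3.0}, {"x": 3, "y": 4.0},
    {"x": 6, "y": 7.0}, {"x": 7, "y": 9.0}, {"x": 8, "y": 8.0},
]
t_stat = rule_t_statistic(records, feature="x", cut=4, outcome="y")
print(round(t_stat, 3))
```

A rule would be kept only if the statistic is large enough in magnitude for the chosen significance level and degrees of freedom.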
Monte Carlo Method with Heuristic Adjustment for Irregularly Shaped Food Product Volume Measurement
Volume measurement plays an important role in the production and processing of food products. Various methods have been
proposed to measure the volume of food products with irregular shapes based on 3D reconstruction. However, 3D reconstruction
comes with a high computational cost. Furthermore, some of the volume measurement methods based on 3D reconstruction
have low accuracy. Another method for measuring the volume of objects uses the Monte Carlo method. The Monte Carlo method performs
volume measurements using random points. It only requires information regarding whether random points
fall inside or outside an object and does not require a 3D reconstruction. This paper proposes volume measurement using a
computer vision system for irregularly shaped food products without 3D reconstruction, based on the Monte Carlo method with
heuristic adjustment. Five images of a food product were captured using five cameras and processed to produce binary images.
Monte Carlo integration with heuristic adjustment was performed to measure the volume based on the information extracted from
the binary images. The experimental results show that the proposed method provided high accuracy and precision compared to the
water displacement method. In addition, the proposed method is more accurate and faster than the space carving method.
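The core Monte Carlo step can be sketched as follows: sample random points in a bounding box and scale the box volume by the fraction that falls inside the object. This is a simplified illustration, not the paper's method. It swaps the camera-based silhouette test for an analytic sphere test and leaves out the heuristic adjustment; the function names and parameters are hypothetical.

```python
import random


def monte_carlo_volume(inside, bounds, n_points=100_000, seed=0):
    """Estimate a volume from an inside/outside test on random points.

    inside : function (x, y, z) -> bool
    bounds : [(x0, x1), (y0, y1), (z0, z1)] bounding box of the object
    """
    rng = random.Random(seed)
    (x0, x1), (y0, y1), (z0, z1) = bounds
    box_volume = (x1 - x0) * (y1 - y0) * (z1 - z0)
    hits = 0
    for _ in range(n_points):
        point = (rng.uniform(x0, x1), rng.uniform(y0, y1), rng.uniform(z0, z1))
        if inside(*point):
            hits += 1
    return box_volume * hits / n_points


def in_unit_sphere(x, y, z):
    """Stand-in for the silhouette test: is the point inside a unit sphere?"""
    return x * x + y * y + z * z <= 1.0


estimate = monte_carlo_volume(in_unit_sphere, [(-1.0, 1.0)] * 3)
print(estimate)  # should be close to 4/3 * pi
```

In a camera-based setup, the inside test would instead project each point into every binary image and accept it only if it lands on the object's silhouette in all views.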