5,065 research outputs found
A Domain Specific Approach to High Performance Heterogeneous Computing
Users of heterogeneous computing systems face two problems: firstly, in
understanding the trade-off relationships between the observable
characteristics of their applications, such as latency and quality of the
result, and secondly, how to exploit knowledge of these characteristics to
allocate work to distributed computing platforms efficiently. A domain specific
approach addresses both of these problems. By considering a subset of
operations or functions, models of the observable characteristics or domain
metrics may be formulated in advance, and populated at run-time for task
instances. These metric models can then be used to express the allocation of
work as a constrained integer program, which can be solved using heuristics,
machine learning or Mixed Integer Linear Programming (MILP) frameworks. These
claims are illustrated using the example domain of derivatives pricing in
computational finance, with the domain metrics of workload latency or makespan
and pricing accuracy. For a large, varied workload of 128 Black-Scholes and
Heston model-based option pricing tasks, running upon a diverse array of 16
Multicore CPUs, GPUs and FPGAs platforms, predictions made by models of both
the makespan and accuracy are generally within 10% of the run-time performance.
When these models are used as inputs to machine learning and MILP-based
workload allocation approaches, a latency improvement of up to 24 and 270 times
over the heuristic approach is seen.Comment: 14 pages, preprint draft, minor revisio
Recommended from our members
Optimisations for quadrature representations of finite element tensors through automated code generation
We examine aspects of the computation of finite element matrices and vectors
which are made possible by automated code generation. Given a variational form
in a syntax which resembles standard mathematical notation, the low-level
computer code for building finite element tensors, typically matrices, vectors
and scalars, can be generated automatically via a form compiler. In particular,
the generation of code for computing finite element matrices using a quadrature
approach is addressed. For quadrature representations, a number of optimisation
strategies which are made possible by automated code generation are presented.
The relative performance of two different automatically generated
representations of finite element matrices is examined, with a particular
emphasis on complicated variational forms. It is shown that approaches which
perform best for simple forms are not tractable for more complicated problems
in terms of run time performance, the time required to generate the code or the
size of the generated code. The approach and optimisations elaborated here are
effective for a range of variational forms
A metadata-enhanced framework for high performance visual effects
This thesis is devoted to reducing the interactive latency of image processing computations in
visual effects. Film and television graphic artists depend upon low-latency feedback to receive
a visual response to changes in effect parameters. We tackle latency with a domain-specific optimising
compiler which leverages high-level program metadata to guide key computational and
memory hierarchy optimisations. This metadata encodes static and dynamic information about
data dependence and patterns of memory access in the algorithms constituting a visual effect –
features that are typically difficult to extract through program analysis – and presents it to the
compiler in an explicit form. By using domain-specific information as a substitute for program
analysis, our compiler is able to target a set of complex source-level optimisations that a vendor
compiler does not attempt, before passing the optimised source to the vendor compiler for
lower-level optimisation.
Three key metadata-supported optimisations are presented. The first is an adaptation of
space and schedule optimisation – based upon well-known compositions of the loop fusion and
array contraction transformations – to the dynamic working sets and schedules of a runtimeparameterised
visual effect. This adaptation sidesteps the costly solution of runtime code generation
by specialising static parameters in an offline process and exploiting dynamic metadata to
adapt the schedule and contracted working sets at runtime to user-tunable parameters. The second
optimisation comprises a set of transformations to generate SIMD ISA-augmented source code.
Our approach differs from autovectorisation by using static metadata to identify parallelism, in
place of data dependence analysis, and runtime metadata to tune the data layout to user-tunable
parameters for optimal aligned memory access. The third optimisation comprises a related set
of transformations to generate code for SIMT architectures, such as GPUs. Static dependence
metadata is exploited to guide large-scale parallelisation for tens of thousands of in-flight threads.
Optimal use of the alignment-sensitive, explicitly managed memory hierarchy is achieved by identifying
inter-thread and intra-core data sharing opportunities in memory access metadata.
A detailed performance analysis of these optimisations is presented for two industrially developed
visual effects. In our evaluation we demonstrate up to 8.1x speed-ups on Intel and AMD
multicore CPUs and up to 6.6x speed-ups on NVIDIA GPUs over our best hand-written implementations
of these two effects. Programmability is enhanced by automating the generation of
SIMD and SIMT implementations from a single programmer-managed scalar representation
Model Coupling between the Weather Research and Forecasting Model and the DPRI Large Eddy Simulator for Urban Flows on GPU-accelerated Multicore Systems
In this report we present a novel approach to model coupling for
shared-memory multicore systems hosting OpenCL-compliant accelerators, which we
call The Glasgow Model Coupling Framework (GMCF). We discuss the implementation
of a prototype of GMCF and its application to coupling the Weather Research and
Forecasting Model and an OpenCL-accelerated version of the Large Eddy Simulator
for Urban Flows (LES) developed at DPRI.
The first stage of this work concerned the OpenCL port of the LES. The
methodology used for the OpenCL port is a combination of automated analysis and
code generation and rule-based manual parallelization. For the evaluation, the
non-OpenCL LES code was compiled using gfortran, fort and pgfortran}, in each
case with auto-parallelization and auto-vectorization. The OpenCL-accelerated
version of the LES achieves a 7 times speed-up on a NVIDIA GeForce GTX 480
GPGPU, compared to the fastest possible compilation of the original code
running on a 12-core Intel Xeon E5-2640.
In the second stage of this work, we built the Glasgow Model Coupling
Framework and successfully used it to couple an OpenMP-parallelized WRF
instance with an OpenCL LES instance which runs the LES code on the GPGPI. The
system requires only very minimal changes to the original code. The report
discusses the rationale, aims, approach and implementation details of this
work.Comment: This work was conducted during a research visit at the Disaster
Prevention Research Institute of Kyoto University, supported by an EPSRC
Overseas Travel Grant, EP/L026201/
Statistical performance analysis with dynamic workload using S-NET
Volkmar Wieser, Philip K. F. Hölzenspies, Michael Roßbory, and Raimund Kirner, 'Statistical performance analysis with dynamic workload using S-NET'. Paper presented at the Workshop on Feedback-Directed Compiler Optimization for Multi-Core Architectures. Paris, France 23-25 January 2012In this paper the ADVANCE approach for engineering con- current software systems with well-balanced hardware ef- ficiency is adressed using the stream processing language S-Net. To obtain the cost information in the concurrent system the metrics throughput, latency, and jitter are evalu- ated by analyzing generated synthetical data as well as using an industrial related application in the future. As fall-out an Eclipse plugin for S-Net has been developed to provide sup- port for syntax highlighting, content assistance, hover help, and more, for easier and faster development. The presented results of the current work are on the one hand an indicator for the status quo of the ADVANCE vision and on the other hand used to improve the applied statistical analysis tech- niques within ADVANCE. Like the ADVANCE project, this work is still under development, but further improvements and speedups are expected in the near future
Model Exploration Using OpenMOLE - a workflow engine for large scale distributed design of experiments and parameter tuning
OpenMOLE is a scientific workflow engine with a strong emphasis on workload
distribution. Workflows are designed using a high level Domain Specific
Language (DSL) built on top of Scala. It exposes natural parallelism constructs
to easily delegate the workload resulting from a workflow to a wide range of
distributed computing environments. In this work, we briefly expose the strong
assets of OpenMOLE and demonstrate its efficiency at exploring the parameter
set of an agent simulation model. We perform a multi-objective optimisation on
this model using computationally expensive Genetic Algorithms (GA). OpenMOLE
hides the complexity of designing such an experiment thanks to its DSL, and
transparently distributes the optimisation process. The example shows how an
initialisation of the GA with a population of 200,000 individuals can be
evaluated in one hour on the European Grid Infrastructure.Comment: IEEE High Performance Computing and Simulation conference 2015, Jun
2015, Amsterdam, Netherland
- …