6,464 research outputs found
Autonomic management of multiple non-functional concerns in behavioural skeletons
We introduce and address the problem of concurrent autonomic management of
different non-functional concerns in parallel applications build as a
hierarchical composition of behavioural skeletons. We first define the problems
arising when multiple concerns are dealt with by independent managers, then we
propose a methodology supporting coordinated management, and finally we discuss
how autonomic management of multiple concerns may be implemented in a typical
use case. The paper concludes with an outline of the challenges involved in
realizing the proposed methodology on distributed target architectures such as
clusters and grids. Being based on the behavioural skeleton concept proposed in
the CoreGRID GCM, it is anticipated that the methodology will be readily
integrated into the current reference implementation of GCM based on Java
ProActive and running on top of major grid middleware systems.Comment: 20 pages + cover pag
Type-driven automated program transformations and cost modelling for optimising streaming programs on FPGAs
In this paper we present a novel approach to program optimisation based on compiler-based type-driven program transformations and a fast and accurate cost/performance model for the target architecture. We target streaming programs for the problem domain of scientific computing, such as numerical weather prediction. We present our theoretical framework for type-driven program transformation, our target high-level language and intermediate representation languages and the cost model and demonstrate the effectiveness of our approach by comparison with a commercial toolchain
A Fast and Accurate Cost Model for FPGA Design Space Exploration in HPC Applications
Heterogeneous High-Performance Computing
(HPC) platforms present a significant programming challenge,
especially because the key users of HPC resources are scientists,
not parallel programmers. We contend that compiler technology
has to evolve to automatically create the best program variant
by transforming a given original program. We have developed a
novel methodology based on type transformations for generating
correct-by-construction design variants, and an associated
light-weight cost model for evaluating these variants for
implementation on FPGAs. In this paper we present a key
enabler of our approach, the cost model. We discuss how we
are able to quickly derive accurate estimates of performance
and resource-utilization from the designās representation in our
intermediate language. We show results confirming the accuracy
of our cost model by testing it on three different scientific
kernels. We conclude with a case-study that compares a solution
generated by our framework with one from a conventional
high-level synthesis tool, showing better performance and
power-efficiency using our cost model based approach
Toward optimised skeletons for heterogeneous parallel architecture with performance cost model
High performance architectures are increasingly heterogeneous with shared and
distributed memory components, and accelerators like GPUs. Programming such
architectures is complicated and performance portability is a major issue as the
architectures evolve. This thesis explores the potential for algorithmic skeletons
integrating a dynamically parametrised static cost model, to deliver portable
performance for mostly regular data parallel programs on heterogeneous archi-
tectures.
The rst contribution of this thesis is to address the challenges of program-
ming heterogeneous architectures by providing two skeleton-based programming
libraries: i.e. HWSkel for heterogeneous multicore clusters and GPU-HWSkel
that enables GPUs to be exploited as general purpose multi-processor devices.
Both libraries provide heterogeneous data parallel algorithmic skeletons including
hMap, hMapAll, hReduce, hMapReduce, and hMapReduceAll.
The second contribution is the development of cost models for workload dis-
tribution. First, we construct an architectural cost model (CM1) to optimise
overall processing time for HWSkel heterogeneous skeletons on a heterogeneous
system composed of networks of arbitrary numbers of nodes, each with an ar-
bitrary number of cores sharing arbitrary amounts of memory. The cost model
characterises the components of the architecture by the number of cores, clock
speed, and crucially the size of the L2 cache. Second, we extend the HWSkel cost
model (CM1) to account for GPU performance. The extended cost model (CM2)
is used in the GPU-HWSkel library to automatically nd a good distribution
for both a single heterogeneous multicore/GPU node, and clusters of heteroge-
neous multicore/GPU nodes. Experiments are carried out on three heterogeneous
multicore clusters, four heterogeneous multicore/GPU clusters, and three single
heterogeneous multicore/GPU nodes. The results of experimental evaluations for
four data parallel benchmarks, i.e. sumEuler, Image matching, Fibonacci, and
Matrix Multiplication, show that our combined heterogeneous skeletons and cost
models can make good use of resources in heterogeneous systems. Moreover using
cores together with a GPU in the same host can deliver good performance either
on a single node or on multiple node architectures
- ā¦