Towards High-Level Programming of Multi-GPU Systems Using the SkelCL Library
Application programming for GPUs (Graphics Processing Units) is complex and error-prone, because the popular approaches — CUDA and OpenCL — are intrinsically low-level and offer no special support for systems consisting of multiple GPUs. The SkelCL library presented in this paper
is built on top of the OpenCL standard and offers preimplemented recurring computation and communication patterns (skeletons) which greatly simplify programming for multi-GPU systems. The library also provides an abstract vector data type and a high-level data (re)distribution mechanism to shield the programmer from the low-level data transfers between the system’s main memory and multiple GPUs. In this
paper, we focus on the specific support in SkelCL for systems with multiple GPUs and use a real-world application study from the area of medical imaging to demonstrate the reduced programming effort and competitive performance of SkelCL as compared to OpenCL and CUDA. Besides, we illustrate how SkelCL adapts to large-scale, distributed heterogeneous
systems in order to simplify their programming
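The idea of a distributed vector combined with a map skeleton can be illustrated with a minimal sketch. This is plain Python, not the SkelCL API: the names `block_distribute`, `map_skeleton`, and `gather` are hypothetical, and the "devices" are simulated as ordinary lists, so no GPU transfer actually takes place.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")
U = TypeVar("U")


def block_distribute(data: Sequence[T], num_devices: int) -> List[List[T]]:
    """Split data into contiguous blocks, one per (simulated) device.

    Block sizes differ by at most one element; the first blocks get the extras.
    """
    base, extra = divmod(len(data), num_devices)
    blocks: List[List[T]] = []
    start = 0
    for device in range(num_devices):
        size = base + (1 if device < extra else 0)
        blocks.append(list(data[start:start + size]))
        start += size
    return blocks


def map_skeleton(func: Callable[[T], U], blocks: List[List[T]]) -> List[List[U]]:
    """Apply func to every element, block by block (hypothetical skeleton)."""
    return [[func(x) for x in block] for block in blocks]


def gather(blocks: List[List[U]]) -> List[U]:
    """Concatenate the per-device blocks back into a single vector."""
    return [x for block in blocks for x in block]


if __name__ == "__main__":
    parts = block_distribute(list(range(10)), num_devices=3)
    print(gather(map_skeleton(lambda x: x * x, parts)))
```

The caller only names the distribution and the function to apply; how the blocks reach each device is hidden behind the helper functions, which is the separation the abstract describes.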
Towards an Adaptive Skeleton Framework for Performance Portability
The proliferation of widely available, but very different, parallel architectures
makes the ability to deliver good parallel performance
on a range of architectures, or performance portability, highly desirable.
Irregularly-parallel problems, where the number and size
of tasks is unpredictable, are particularly challenging and require
dynamic coordination.
The paper outlines a novel approach to delivering portable parallel
performance for irregularly parallel programs. The approach
combines declarative parallelism with JIT technology, dynamic
scheduling, and dynamic transformation.
We present the design of an adaptive skeleton library, with a task
graph implementation, JIT trace costing, and adaptive transformations.
We outline the architecture of the prototype adaptive skeleton
execution framework in Pycket, describing tasks, serialisation,
and the current scheduler. We report a preliminary evaluation of the
prototype framework using 4 micro-benchmarks and a small case
study on two NUMA servers (24 and 96 cores) and a small cluster
(17 hosts, 272 cores). Key results include Pycket delivering good
sequential performance, e.g. almost as fast as C for some benchmarks;
good absolute speedups on all architectures (up to 120 on
128 cores for sumEuler); and that the adaptive transformations do
improve performance
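To make the benchmark concrete: sumEuler is commonly defined as the sum of Euler's totient function over a range of integers, and it parallelises naturally by splitting the range into independent tasks. The sketch below is an assumption-laden Python illustration, not the Pycket framework described above; `sum_euler`, `chunk_size`, and `workers` are hypothetical names, and because of CPython's global interpreter lock the thread pool shows the task decomposition rather than a real speedup.

```python
from concurrent.futures import ThreadPoolExecutor
from math import gcd


def euler_totient(n: int) -> int:
    """Count the integers in 1..n that are coprime to n (naive definition)."""
    return sum(1 for k in range(1, n + 1) if gcd(n, k) == 1)


def sum_totient_chunk(lo: int, hi: int) -> int:
    """One task: sum the totients of the half-open range [lo, hi)."""
    return sum(euler_totient(n) for n in range(lo, hi))


def sum_euler(limit: int, chunk_size: int = 100, workers: int = 4) -> int:
    """Sum the totients of 1..limit by farming out fixed-size chunks as tasks."""
    bounds = [
        (lo, min(lo + chunk_size, limit + 1))
        for lo in range(1, limit + 1, chunk_size)
    ]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = pool.map(lambda b: sum_totient_chunk(*b), bounds)
        return sum(partials)


if __name__ == "__main__":
    print(sum_euler(1000))
```

The cost of a chunk grows with the size of the numbers in it, so equal-sized chunks carry unequal work. This kind of imbalance is what dynamic scheduling is meant to absorb.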
MAGDA: A Mobile Agent based Grid Architecture
Mobile agents mean both a technology
and a programming paradigm. They allow for a
flexible approach which can alleviate a number
of issues present in distributed and Grid-based
systems, by means of features such as migration,
cloning, messaging and other provided mechanisms.
In this paper we describe an architecture
(MAGDA – Mobile Agent based Grid Architecture)
we have designed and we are currently
developing to support programming and execution
of mobile-agent-based applications on Grid
systems
Adaptive structured parallelism for computational grids
Algorithmic skeletons abstract commonly-used patterns of parallel computation, communication, and interaction. They provide top-down design composition and control inheritance throughout the whole structure. Parallel programs are expressed by interweaving parameterised skeletons analogously to the way sequential structured programs are constructed.
This design paradigm, known as structured parallelism, provides a high-level parallel programming method which allows the abstract description of programs and fosters portability. That is to say, structured parallelism requires the description of the algorithm rather than its implementation, providing a clear and consistent meaning across platforms while their associated structure depends on the particular implementation. By decoupling the structure from the meaning of a parallel program, it benefits entirely from any performance improvements in the systems infrastructure
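The composition of parameterised skeletons can be sketched as ordinary higher-order functions. The following is a sequential Python illustration only, assuming hypothetical skeleton names `farm`, `fold`, and `pipe`; a real skeleton library would supply parallel implementations behind the same interface.

```python
from functools import reduce
from typing import Any, Callable, Iterable, List


def farm(worker: Callable[[Any], Any]) -> Callable[[Iterable[Any]], List[Any]]:
    """Skeleton that applies worker to every item independently."""
    return lambda items: [worker(x) for x in items]


def fold(op: Callable[[Any, Any], Any], initial: Any) -> Callable[[Iterable[Any]], Any]:
    """Skeleton that combines all items with a binary operator."""
    return lambda items: reduce(op, items, initial)


def pipe(*stages: Callable[[Any], Any]) -> Callable[[Any], Any]:
    """Skeleton that feeds the output of each stage into the next."""
    def run(value: Any) -> Any:
        for stage in stages:
            value = stage(value)
        return value
    return run


# The program states what is computed; the skeletons decide how it runs.
sum_of_squares = pipe(farm(lambda x: x * x), fold(lambda a, b: a + b, 0))

if __name__ == "__main__":
    print(sum_of_squares([1, 2, 3, 4]))
```

Because `sum_of_squares` is written purely in terms of the skeletons, replacing the list comprehension inside `farm` with a parallel implementation would change how the program executes without changing what it means.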
A Comparison of Big Data Frameworks on a Layered Dataflow Model
In the world of Big Data analytics, there is a series of tools aiming at
simplifying programming applications to be executed on clusters. Although each
tool claims to provide better programming, data and execution models, for which
only informal (and often confusing) semantics is generally provided, all share
a common underlying model, namely, the Dataflow model. The Dataflow model we
propose shows how various tools share the same expressiveness at different
levels of abstraction. The contribution of this work is twofold: first, we show
that the proposed model is (at least) as general as existing batch and
streaming frameworks (e.g., Spark, Flink, Storm), thus making it easier to
understand high-level data-processing applications written in such frameworks.
Second, we provide a layered model that can represent tools and applications
following the Dataflow paradigm and we show how the analyzed tools fit in each
level.
Comment: 19 pages, 6 figures, 2 tables, In Proc. of the 9th Intl Symposium on
High-Level Parallel Programming and Applications (HLPP), July 4-5 2016,
Muenster, German
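As an informal illustration of the Dataflow view, a word count can be written as a small chain of operators connected by streams. This sketch is not the layered model proposed in the paper and does not use the API of Spark, Flink, or Storm; the operator names `flat_map`, `map_op`, and `reduce_by_key` are hypothetical stand-ins implemented with Python generators.

```python
from typing import Callable, Dict, Iterable, Iterator, Tuple


def flat_map(func: Callable[[str], Iterable[str]], stream: Iterable[str]) -> Iterator[str]:
    """Operator: emit zero or more output items per input item."""
    for item in stream:
        yield from func(item)


def map_op(func: Callable[[str], Tuple[str, int]], stream: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Operator: emit exactly one output item per input item."""
    for item in stream:
        yield func(item)


def reduce_by_key(op: Callable[[int, int], int], stream: Iterable[Tuple[str, int]]) -> Dict[str, int]:
    """Operator: combine the values of all pairs that share a key."""
    totals: Dict[str, int] = {}
    for key, value in stream:
        totals[key] = op(totals[key], value) if key in totals else value
    return totals


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Dataflow graph: lines -> words -> (word, 1) pairs -> per-word totals."""
    words = flat_map(str.split, lines)
    pairs = map_op(lambda w: (w, 1), words)
    return reduce_by_key(lambda a, b: a + b, pairs)


if __name__ == "__main__":
    print(word_count(["a b a", "b c"]))
```

The same graph of operators could in principle be run over a finite batch or an unbounded stream, which is the sense in which batch and streaming tools can share one underlying model.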
Toward a Formal Semantics for Autonomic Components
Autonomic management can improve the QoS provided by parallel/distributed
applications. Within the CoreGRID Component Model, the autonomic management is
tailored to the automatic - monitoring-driven - alteration of the component
assembly and, therefore, is defined as the effect of (distributed) management
code. This work yields a semantics based on hypergraph rewriting suitable to
model the dynamic evolution and non-functional aspects of Service Oriented
Architectures and component-based autonomic applications. In this regard, our
main goal is to provide a formal description of adaptation operations that are
typically only informally specified. We contend that our approach makes it easier
to raise the level of abstraction of management code in autonomic and adaptive
applications.
Comment: 11 pages + cover pag
High Performance Composition Operators in Component Models
Scientific numerical applications always demand more computing and storage capabilities, in order to compute at a finer grain and/or to integrate more phenomena into their computations. At the same time, they are becoming more complex to develop. Moreover, the continual growth of computing and storage capabilities is achieved at the cost of increased infrastructure complexity. Thus, there is an important challenge to define programming abstractions able to deal with software and hardware complexity. An interesting approach is represented by software component models. This chapter first analyzes how high performance interactions are only partially supported by specialized component models. Then, it introduces HLCM, a component model that aims at efficiently supporting all kinds of static compositions
The Integration of Task and Data Parallel Skeletons
We describe a skeletal parallel programming library which integrates task and data parallel constructs within an API for C++. Traditional skeletal requirements for higher orderness and polymorphism are achieved through exploitation of operator overloading and templates, while the underlying parallelism is provided by MPI. We present a case study describing two algorithms for the travelling salesman problem