138,016 research outputs found
Structured arrows : a type-based framework for structured parallelism
This thesis deals with the important problem of parallelising sequential code. Despite the importance of parallelism in modern computing, writing parallel software still relies on many low-level and often error-prone approaches. These low-level approaches can lead to serious execution problems such as deadlocks and race conditions. Due to the non-deterministic behaviour of most parallel programs, testing parallel software can be both tedious and time-consuming. A way of providing guarantees of correctness for parallel programs would therefore provide significant benefit. Moreover, even if we ignore the problem of correctness, achieving good speedups is not straightforward, since this generally involves rewriting a program to consider a (possibly large) number of alternative parallelisations.
This thesis argues that new languages and frameworks are needed. These language and frameworks must not only support high-level parallel programming constructs, but must also provide predictable cost models for these parallel constructs. Moreover, they need to be built around solid, well-understood theories that ensure that: (a) changes to the source code will not change the functional behaviour of a program, and (b) the speedup obtained by doing the necessary changes is predictable. Algorithmic skeletons are parametric implementations of common patterns of parallelism that provide good abstractions for creating new high-level languages, and also support frameworks for parallel computing that satisfy the correctness and predictability requirements that we require.
This thesis presents a new type-based framework, based on the connection between structured parallelism and structured patterns of recursion, that provides parallel structures as type abstractions that can be used to statically parallelise a program. Specifically, this thesis exploits hylomorphisms as a single, unifying construct to represent the functional behaviour of parallel programs, and to perform correct code rewritings between alternative parallel implementations, represented as algorithmic skeletons. This thesis also defines a mechanism for deriving cost models for parallel constructs from a queue-based operational semantics. In this way, we can provide strong static guarantees about the correctness of a parallel program, while simultaneously achieving predictable speedups.âThis work was supported by the University of St Andrews (School of Computer
Science); by the EU FP7 grant âParaPhrase:Parallel Patterns Adaptive Heterogeneous Multicore Systemsâ (n. 288570); by the EU H2020
grant âRePhrase: Refactoring Parallel Heterogeneous Resource-Aware Applications
- a Software Engineering Approachâ (ICT-644235), by COST
Action IC1202 (TACLe), supported by COST (European Cooperation Science and Technology); and by EPSRC grant âDiscovery: Pattern Discovery
and Program Shaping for Manycore Systemsâ (EP/P020631/1)â -- Acknowledgement
A Comparison of Big Data Frameworks on a Layered Dataflow Model
In the world of Big Data analytics, there is a series of tools aiming at
simplifying programming applications to be executed on clusters. Although each
tool claims to provide better programming, data and execution models, for which
only informal (and often confusing) semantics is generally provided, all share
a common underlying model, namely, the Dataflow model. The Dataflow model we
propose shows how various tools share the same expressiveness at different
levels of abstraction. The contribution of this work is twofold: first, we show
that the proposed model is (at least) as general as existing batch and
streaming frameworks (e.g., Spark, Flink, Storm), thus making it easier to
understand high-level data-processing applications written in such frameworks.
Second, we provide a layered model that can represent tools and applications
following the Dataflow paradigm and we show how the analyzed tools fit in each
level.Comment: 19 pages, 6 figures, 2 tables, In Proc. of the 9th Intl Symposium on
High-Level Parallel Programming and Applications (HLPP), July 4-5 2016,
Muenster, German
Static Analysis of Deterministic Negotiations
Negotiation diagrams are a model of concurrent computation akin to workflow
Petri nets. Deterministic negotiation diagrams, equivalent to the much studied
and used free-choice workflow Petri nets, are surprisingly amenable to
verification. Soundness (a property close to deadlock-freedom) can be decided
in PTIME. Further, other fundamental questions like computing summaries or the
expected cost, can also be solved in PTIME for sound deterministic negotiation
diagrams, while they are PSPACE-complete in the general case.
In this paper we generalize and explain these results. We extend the
classical "meet-over-all-paths" (MOP) formulation of static analysis problems
to our concurrent setting, and introduce Mazurkiewicz-invariant analysis
problems, which encompass the questions above and new ones. We show that any
Mazurkiewicz-invariant analysis problem can be solved in PTIME for sound
deterministic negotiations whenever it is in PTIME for sequential
flow-graphs---even though the flow-graph of a deterministic negotiation diagram
can be exponentially larger than the diagram itself. This gives a common
explanation to the low-complexity of all the analysis questions studied so far.
Finally, we show that classical gen/kill analyses are also an instance of our
framework, and obtain a PTIME algorithm for detecting anti-patterns in
free-choice workflow Petri nets.
Our result is based on a novel decomposition theorem, of independent
interest, showing that sound deterministic negotiation diagrams can be
hierarchically decomposed into (possibly overlapping) smaller sound diagrams.Comment: To appear in the Proceedings of LICS 2017, IEEE Computer Societ
CU2CL: A CUDA-to-OpenCL Translator for Multi- and Many-core Architectures
The use of graphics processing units (GPUs) in
high-performance parallel computing continues to become more
prevalent, often as part of a heterogeneous system. For years,
CUDA has been the de facto programming environment for
nearly all general-purpose GPU (GPGPU) applications. In spite
of this, the framework is available only on NVIDIA GPUs,
traditionally requiring reimplementation in other frameworks
in order to utilize additional multi- or many-core devices.
On the other hand, OpenCL provides an open and vendorneutral
programming environment and runtime system. With
implementations available for CPUs, GPUs, and other types of
accelerators, OpenCL therefore holds the promise of a âwrite
once, run anywhereâ ecosystem for heterogeneous computing.
Given the many similarities between CUDA and OpenCL,
manually porting a CUDA application to OpenCL is typically
straightforward, albeit tedious and error-prone. In response
to this issue, we created CU2CL, an automated CUDA-to-
OpenCL source-to-source translator that possesses a novel design
and clever reuse of the Clang compiler framework. Currently,
the CU2CL translator covers the primary constructs found in
CUDA runtime API, and we have successfully translated many
applications from the CUDA SDK and Rodinia benchmark suite.
The performance of our automatically translated applications via
CU2CL is on par with their manually ported countparts
- âŠ