Run-time support for parallel object-oriented computing: the NIP lazy task creation technique and the NIP object-based software distributed shared memory
PhD Thesis
Advances in hardware technologies, combined with decreased costs, have started a trend towards massively parallel architectures that utilise commodity components. It is unreasonable to expect software developers to manage the high degree of parallelism made available by these architectures. This thesis argues that a new
programming model is essential for the development of parallel
applications and presents a model which embraces the notions of
object-orientation and implicit identification of parallelism. The new
model allows software engineers to concentrate on development issues,
using the object-oriented paradigm, whilst being freed from the burden
of explicitly managing parallel activity.
To support the programming model, the semantics of an execution
model are defined and implemented as part of a run-time support
system for object-oriented parallel applications. Details of the novel
techniques from the run-time system, in the areas of lazy task creation
and object-based, distributed shared memory, are presented.
The tasklet construct for representing potentially parallel
computation is introduced and further developed by this thesis. Three
caching techniques that take advantage of memory access patterns
exhibited in object-oriented applications are explored. Finally, the
performance characteristics of the introduced run-time techniques are
analysed through a number of benchmark applications.
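The NIP run-time itself is not shown in the abstract, but the lazy-task-creation idea behind the tasklet construct can be sketched in a few lines of Python. All names below (Tasklet, spawn, sync, thief) are illustrative, not the thesis's API: a potentially parallel computation is pushed onto a pool as a lightweight record, and it only becomes a real task if an idle worker steals it; otherwise the owner runs it inline, paying no task-creation cost.

```python
from collections import deque
import threading

class Tasklet:
    """A potentially-parallel computation: runs inline unless stolen."""
    def __init__(self, fn, *args):
        self.fn, self.args = fn, args
        self.result = None
        self.done = threading.Event()

    def run(self):
        self.result = self.fn(*self.args)
        self.done.set()

pool = deque()               # owner pushes/pops the right end; thieves steal from the left
lock = threading.Lock()

def spawn(fn, *args):
    # Lazy creation: just record the computation, don't start a task.
    t = Tasklet(fn, *args)
    with lock:
        pool.append(t)
    return t

def sync(t):
    # If nobody stole the tasklet, the owner runs it inline (the cheap path).
    with lock:
        stolen = t not in pool
        if not stolen:
            pool.remove(t)
    if not stolen:
        t.run()
    t.done.wait()            # if stolen, wait for the thief to finish it
    return t.result

def thief():
    # An idle worker turns pending tasklets into actual parallel tasks.
    while True:
        with lock:
            if not pool:
                return
            t = pool.popleft()
        t.run()

tasks = [spawn(lambda x=i: x * x) for i in range(8)]
worker = threading.Thread(target=thief)
worker.start()
results = [sync(t) for t in tasks]
worker.join()
print(results)  # [0, 1, 4, 9, 16, 25, 36, 49]
```

Whether any given tasklet runs inline or on the worker thread depends on timing, but every result is computed exactly once, which is the point of the construct.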
Exploring ‘Instancing’ and Its Applications in 3D Programming
‘Instancing’ is a technique widely used in 3D programming to draw multiple copies of an object with a single drawing command. The conventional approach is to issue a separate drawing command for each copy; instancing, by contrast, draws many copies of an object with repeating patterns substantially faster. With instancing, an object’s geometry data is stored once, however many copies are drawn. Without instancing, the geometry is stored per copy, requiring additional memory for each additional copy, and is read afresh each time the object is drawn. Because the geometry of the object is known before its copies are drawn, instancing yields both better memory usage and faster execution. Millions, or even billions, of objects can be drawn in the blink of an eye, because a Graphics Processing Unit (GPU) accelerates the computation with its massively parallel architecture. Instancing is popular in film and animation for rendering forests, flower fields, crowd simulations, and more. This research explores different applications of instancing: while drawing multiple copies of the same object, different patterns or characteristics are also incorporated. For example, a tulip festival is generated from a single tulip plant by instantiating an assortment of colors in different rows. A floral park with varied patterns of plants, soldiers on a battlefield in different movements, and more have been explored.
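The memory argument above can be made concrete with a back-of-the-envelope sketch. The mesh size and per-instance attribute layout below are made-up numbers for illustration, not figures from the paper:

```python
# Hypothetical mesh: 10,000 vertices, each (x, y, z) stored as 4-byte floats.
VERTEX_BYTES = 3 * 4
mesh_bytes = 10_000 * VERTEX_BYTES            # geometry stored once: ~120 KB

copies = 1_000_000                            # e.g. tulips in a festival scene

# Without instancing: every copy carries (and re-reads) its own geometry.
per_copy_total = mesh_bytes * copies          # ~120 GB

# With instancing: one mesh plus a small per-instance record, here a
# position and an RGB colour (6 floats) -- say, a tulip's row and hue.
instance_bytes = 6 * 4
instanced_total = mesh_bytes + copies * instance_bytes   # ~24 MB

print(per_copy_total // instanced_total)      # roughly 5000x less memory
```

In a real renderer the per-instance records would live in a GPU buffer consumed by an instanced draw call (e.g. OpenGL's `glDrawArraysInstanced`); the arithmetic, however, is the same.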
Handling Parallelism in a Concurrency Model
Programming models for concurrency are optimized for dealing with
nondeterminism, for example to handle asynchronously arriving events. To shield
the developer from data race errors effectively, such models may prevent shared
access to data altogether. However, this restriction also makes them unsuitable
for applications that require data parallelism. We present a library-based
approach for permitting parallel access to arrays while preserving the safety
guarantees of the original model. When applied to SCOOP, an object-oriented
concurrency model, the approach exhibits a negligible performance overhead
compared to ordinary threaded implementations of two parallel benchmark
programs.
Comment: MUSEPAT 201
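The paper's approach is specific to SCOOP and Eiffel, but the underlying idea — granting workers parallel access to disjoint slices of one array, so the model's race-freedom guarantee is preserved — can be sketched in plain Python. The slicing scheme and names here are illustrative only:

```python
from concurrent.futures import ThreadPoolExecutor
import array

def fill_slice(buf, lo, hi):
    # Each worker owns a disjoint [lo, hi) range of the shared buffer,
    # so no two workers ever touch the same element: no data race.
    for i in range(lo, hi):
        buf[i] = i * i

data = array.array('d', [0.0] * 1_000_000)
chunks = 4
step = len(data) // chunks

with ThreadPoolExecutor(max_workers=chunks) as pool:
    for k in range(chunks):
        pool.submit(fill_slice, data, k * step, (k + 1) * step)
# leaving the 'with' block waits for all workers to finish

print(data[10], data[999_999])  # 100.0 999998000001.0
```

The safety argument is purely structural: because the index ranges are disjoint by construction, handing out the views requires no locking at all, which is why such a scheme can have negligible overhead over raw threads.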
Group Communication Patterns for High Performance Computing in Scala
We developed a Functional object-oriented Parallel framework (FooPar) for
high-level high-performance computing in Scala. Central to this framework are
Distributed Memory Parallel Data structures (DPDs), i.e., collections of data
distributed in a shared-nothing system together with parallel operations on
these data. In this paper, we first present FooPar's architecture and the idea
of DPDs and group communications. Then, we show how DPDs can be implemented
elegantly and efficiently in Scala based on the Traversable/Builder pattern,
unifying Functional and Object-Oriented Programming. We prove the correctness
and safety of one communication algorithm and show how specification testing
(via ScalaCheck) can be used to bridge the gap between proof and
implementation. Furthermore, we show that the group communication operations of
FooPar outperform those of the MPJ Express open source MPI-bindings for Java,
both asymptotically and empirically. FooPar has already been shown to be
capable of achieving close-to-optimal performance for dense matrix-matrix
multiplication via JNI. In this article, we present results on a parallel
implementation of the Floyd-Warshall algorithm in FooPar, achieving more than
94% efficiency compared to the serial version on a cluster using 100 cores for
matrices of dimension 38000 × 38000.
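FooPar is a Scala framework; purely to illustrate the DPD idea — a collection split into per-node partitions, with a parallel map and a tree-shaped group reduction — here is a toy single-machine analogue in Python. The class and method names are invented for this sketch and are not FooPar's API:

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce

class DPD:
    """Toy stand-in for a distributed parallel data structure:
    a sequence split into partitions, one per (simulated) node."""
    def __init__(self, data, parts=4):
        n = len(data)
        self.partitions = [data[i * n // parts:(i + 1) * n // parts]
                           for i in range(parts)]

    def map(self, f):
        # Apply f partition-wise in parallel; no communication needed.
        out = DPD([], parts=0)
        with ThreadPoolExecutor() as pool:
            out.partitions = list(pool.map(lambda p: [f(x) for x in p],
                                           self.partitions))
        return out

    def tree_reduce(self, op):
        # Local reduction per partition, then pairwise combination --
        # log2(parts) rounds, mimicking a group communication pattern.
        partials = [reduce(op, p) for p in self.partitions]
        while len(partials) > 1:
            partials = [op(partials[i], partials[i + 1])
                        if i + 1 < len(partials) else partials[i]
                        for i in range(0, len(partials), 2)]
        return partials[0]

d = DPD(list(range(1, 101)), parts=4)
print(d.map(lambda x: x * x).tree_reduce(lambda a, b: a + b))  # 338350
```

In the real framework the partitions live on separate machines and the pairwise combination rounds are actual messages, which is where the asymptotic advantage over naive gather-then-reduce schemes comes from.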
On Designing Multicore-aware Simulators for Biological Systems
The stochastic simulation of biological systems is an increasingly popular technique in bioinformatics. It is often an enlightening technique, which may, however, turn out to be computationally expensive. We discuss the main opportunities to speed it up on multi-core platforms, which pose new challenges for parallelisation techniques. These opportunities are developed into two general families of solutions, involving both the single simulation and a bulk of independent simulations (either replicas, or runs derived from a parameter sweep). The proposed solutions are tested on the parallelisation of the CWC (Calculus of Wrapped Compartments) simulator, carried out by way of the FastFlow programming framework, which makes fast development and efficient execution on multi-cores possible.
Comment: 19 pages + cover pag
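The "bulk of independent simulations" family is the easy one to sketch: replicas differ only in their random seed and share nothing, so they can simply be farmed out to a worker pool. The random walk below is a placeholder for a single stochastic run, not the CWC simulator itself (FastFlow is a C++ framework; Python is used here only for illustration):

```python
import random
from concurrent.futures import ThreadPoolExecutor

def replica(seed, steps=10_000):
    # One independent stochastic run with its own private RNG,
    # standing in for a single simulation of the biological model.
    rng = random.Random(seed)
    x = 0
    for _ in range(steps):
        x += 1 if rng.random() < 0.5 else -1
    return x

# The bulk of independent simulations: runs differ only in their seed,
# exactly as replicas (or parameter-sweep points) do, so they
# parallelise trivially across cores.
with ThreadPoolExecutor(max_workers=8) as pool:
    results = list(pool.map(replica, range(32)))

print(len(results))  # 32
```

Parallelising a *single* simulation is the harder family of solutions, since the steps of one stochastic trajectory are sequentially dependent; that is where framework support of the kind the paper describes becomes necessary.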
High-Level Programming for Medical Imaging on Multi-GPU Systems Using the SkelCL Library
Application development for modern high-performance systems with Graphics Processing Units (GPUs) relies on low-level programming approaches like CUDA and OpenCL, which leads to complex, lengthy and error-prone programs.
In this paper, we present SkelCL – a high-level programming model for systems with multiple GPUs and its implementation as a library on top of OpenCL. SkelCL provides three main enhancements to the OpenCL standard: 1) computations are conveniently expressed using parallel patterns (skeletons); 2) memory management is simplified using parallel container data types; 3) an automatic data (re)distribution mechanism allows for scalability when using multi-GPU systems.
We use a real-world example from the field of medical imaging to motivate the design of our programming model and we show how application development using SkelCL is simplified without sacrificing performance: we were able to reduce the code size in our imaging example application by 50% while introducing only a moderate runtime overhead of less than 5%.
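SkelCL's skeletons are C++/OpenCL constructs, but the flavour of pattern-based programming can be conveyed with a few combinators in Python. The `skel_map` / `skel_zip` / `skel_reduce` names are invented for this sketch (SkelCL itself provides skeletons such as Map, Zip, and Reduce over container types, with the OpenCL kernels generated behind the scenes):

```python
from functools import reduce

# Skeleton combinators: a computation is described by composing patterns
# rather than by writing loops or kernel launches.
def skel_map(f):       return lambda xs: [f(x) for x in xs]
def skel_zip(f):       return lambda xs, ys: [f(x, y) for x, y in zip(xs, ys)]
def skel_reduce(f, e): return lambda xs: reduce(f, xs, e)

mul    = skel_zip(lambda a, b: a * b)
total  = skel_reduce(lambda a, b: a + b, 0.0)
square = skel_map(lambda a: a * a)

def dot(xs, ys):
    # Dot product as the canonical two-skeleton pipeline: zip(*), then reduce(+).
    return total(mul(xs, ys))

print(dot([1.0, 2.0, 3.0], [4.0, 5.0, 6.0]))  # 32.0
print(total(square([1.0, 2.0, 3.0])))         # 14.0
```

The code-size reduction the abstract reports comes from exactly this shift: the application states *what* pattern to apply, and memory transfers and device management stop appearing in user code.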