Run-time support for parallel object-oriented computing: the NIP lazy task creation technique and the NIP object-based software distributed shared memory
PhD Thesis
Advances in hardware technologies, combined with decreased costs, have started a trend towards massively parallel architectures that utilise commodity components. It is unreasonable to expect software developers to manage the high degree of parallelism made available by these architectures. This thesis argues that a new
programming model is essential for the development of parallel
applications and presents a model which embraces the notions of
object-orientation and implicit identification of parallelism. The new
model allows software engineers to concentrate on development issues,
using the object-oriented paradigm, whilst being freed from the burden
of explicitly managing parallel activity.
To support the programming model, the semantics of an execution
model are defined and implemented as part of a run-time support
system for object-oriented parallel applications. Details of the novel
techniques from the run-time system, in the areas of lazy task creation
and object-based, distributed shared memory, are presented.
The tasklet construct for representing potentially parallel
computation is introduced and further developed by this thesis. Three
caching techniques that take advantage of memory access patterns
exhibited in object-oriented applications are explored. Finally, the
performance characteristics of the introduced run-time techniques are
analysed through a number of benchmark applications.
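The NIP run-time itself is not shown in the abstract, but the lazy-task-creation idea behind the tasklet construct can be sketched in a few lines of Python. All names below (Tasklet, spawn, sync, thief) are illustrative, not the thesis's API: a potentially parallel computation is pushed onto a pool as a lightweight record, and it only becomes a real task if an idle worker steals it; otherwise the owner runs it inline, paying no task-creation cost.

```python
from collections import deque
import threading

class Tasklet:
    """A potentially-parallel computation: runs inline unless stolen."""
    def __init__(self, fn, *args):
        self.fn, self.args = fn, args
        self.result = None
        self.done = threading.Event()

    def run(self):
        self.result = self.fn(*self.args)
        self.done.set()

pool = deque()               # owner pushes/pops the right end; thieves steal from the left
lock = threading.Lock()

def spawn(fn, *args):
    # Lazy creation: just record the computation, don't start a task.
    t = Tasklet(fn, *args)
    with lock:
        pool.append(t)
    return t

def sync(t):
    # If nobody stole the tasklet, the owner runs it inline (the cheap path).
    with lock:
        stolen = t not in pool
        if not stolen:
            pool.remove(t)
    if not stolen:
        t.run()
    t.done.wait()            # if stolen, wait for the thief to finish it
    return t.result

def thief():
    # An idle worker turns pending tasklets into actual parallel tasks.
    while True:
        with lock:
            if not pool:
                return
            t = pool.popleft()
        t.run()

tasks = [spawn(lambda x=i: x * x) for i in range(8)]
worker = threading.Thread(target=thief)
worker.start()
results = [sync(t) for t in tasks]
worker.join()
print(results)  # [0, 1, 4, 9, 16, 25, 36, 49]
```

Whether any given tasklet runs inline or on the worker thread depends on timing, but every result is computed exactly once, which is the point of the construct.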
Exploring ‘Instancing’ and Its Applications in 3D Programming
‘Instancing’ is a technique widely used in 3D programming to draw multiple copies of an object with a single drawing command. The conventional approach is to issue a separate drawing command for each copy; instancing, by contrast, draws many copies of an object with repeating patterns substantially faster. With instancing, an object’s geometry data is stored once, however many copies are drawn. Without instancing, the geometry is stored per copy, requiring additional memory for each additional copy, and is read afresh each time the object is drawn. Because the geometry of the object is known before its copies are drawn, instancing yields both better memory usage and faster execution. Millions, or even billions, of objects can be drawn in the blink of an eye, because a Graphics Processing Unit (GPU) accelerates the computation with its massively parallel architecture. Instancing is popular in film and animation for rendering forests, flower fields, crowd simulations, and more. This research explores different applications of instancing: while drawing multiple copies of the same object, different patterns or characteristics are also incorporated. For example, a tulip festival is generated from a single tulip plant by instantiating an assortment of colors in different rows. A floral park with varied patterns of plants, soldiers on a battlefield in different movements, and more have been explored.
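The memory argument above can be made concrete with a back-of-the-envelope sketch. The mesh size and per-instance attribute layout below are made-up numbers for illustration, not figures from the paper:

```python
# Hypothetical mesh: 10,000 vertices, each (x, y, z) stored as 4-byte floats.
VERTEX_BYTES = 3 * 4
mesh_bytes = 10_000 * VERTEX_BYTES            # geometry stored once: ~120 KB

copies = 1_000_000                            # e.g. tulips in a festival scene

# Without instancing: every copy carries (and re-reads) its own geometry.
per_copy_total = mesh_bytes * copies          # ~120 GB

# With instancing: one mesh plus a small per-instance record, here a
# position and an RGB colour (6 floats) -- say, a tulip's row and hue.
instance_bytes = 6 * 4
instanced_total = mesh_bytes + copies * instance_bytes   # ~24 MB

print(per_copy_total // instanced_total)      # roughly 5000x less memory
```

In a real renderer the per-instance records would live in a GPU buffer consumed by an instanced draw call (e.g. OpenGL's `glDrawArraysInstanced`); the arithmetic, however, is the same.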
Handling Parallelism in a Concurrency Model
Programming models for concurrency are optimized for dealing with
nondeterminism, for example to handle asynchronously arriving events. To shield
the developer from data race errors effectively, such models may prevent shared
access to data altogether. However, this restriction also makes them unsuitable
for applications that require data parallelism. We present a library-based
approach for permitting parallel access to arrays while preserving the safety
guarantees of the original model. When applied to SCOOP, an object-oriented
concurrency model, the approach exhibits a negligible performance overhead
compared to ordinary threaded implementations of two parallel benchmark
programs.
Comment: MUSEPAT 201
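The paper's approach is specific to SCOOP and Eiffel, but the underlying idea — granting workers parallel access to disjoint slices of one array, so the model's race-freedom guarantee is preserved — can be sketched in plain Python. The slicing scheme and names here are illustrative only:

```python
from concurrent.futures import ThreadPoolExecutor
import array

def fill_slice(buf, lo, hi):
    # Each worker owns a disjoint [lo, hi) range of the shared buffer,
    # so no two workers ever touch the same element: no data race.
    for i in range(lo, hi):
        buf[i] = i * i

data = array.array('d', [0.0] * 1_000_000)
chunks = 4
step = len(data) // chunks

with ThreadPoolExecutor(max_workers=chunks) as pool:
    for k in range(chunks):
        pool.submit(fill_slice, data, k * step, (k + 1) * step)
# leaving the 'with' block waits for all workers to finish

print(data[10], data[999_999])  # 100.0 999998000001.0
```

The safety argument is purely structural: because the index ranges are disjoint by construction, handing out the views requires no locking at all, which is why such a scheme can have negligible overhead over raw threads.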
Group Communication Patterns for High Performance Computing in Scala
We developed a Functional object-oriented Parallel framework (FooPar) for
high-level high-performance computing in Scala. Central to this framework are
Distributed Memory Parallel Data structures (DPDs), i.e., collections of data
distributed in a shared-nothing system together with parallel operations on
these data. In this paper, we first present FooPar's architecture and the idea
of DPDs and group communications. Then, we show how DPDs can be implemented
elegantly and efficiently in Scala based on the Traversable/Builder pattern,
unifying Functional and Object-Oriented Programming. We prove the correctness
and safety of one communication algorithm and show how specification testing
(via ScalaCheck) can be used to bridge the gap between proof and
implementation. Furthermore, we show that the group communication operations of
FooPar outperform those of the MPJ Express open source MPI-bindings for Java,
both asymptotically and empirically. FooPar has already been shown to be
capable of achieving close-to-optimal performance for dense matrix-matrix
multiplication via JNI. In this article, we present results on a parallel
implementation of the Floyd-Warshall algorithm in FooPar, achieving more than
94% efficiency compared to the serial version on a cluster using 100 cores for
matrices of dimension 38000 × 38000.
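FooPar is a Scala framework; purely to illustrate the DPD idea — a collection split into per-node partitions, with a parallel map and a tree-shaped group reduction — here is a toy single-machine analogue in Python. The class and method names are invented for this sketch and are not FooPar's API:

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce

class DPD:
    """Toy stand-in for a distributed parallel data structure:
    a sequence split into partitions, one per (simulated) node."""
    def __init__(self, data, parts=4):
        n = len(data)
        self.partitions = [data[i * n // parts:(i + 1) * n // parts]
                           for i in range(parts)]

    def map(self, f):
        # Apply f partition-wise in parallel; no communication needed.
        out = DPD([], parts=0)
        with ThreadPoolExecutor() as pool:
            out.partitions = list(pool.map(lambda p: [f(x) for x in p],
                                           self.partitions))
        return out

    def tree_reduce(self, op):
        # Local reduction per partition, then pairwise combination --
        # log2(parts) rounds, mimicking a group communication pattern.
        partials = [reduce(op, p) for p in self.partitions]
        while len(partials) > 1:
            partials = [op(partials[i], partials[i + 1])
                        if i + 1 < len(partials) else partials[i]
                        for i in range(0, len(partials), 2)]
        return partials[0]

d = DPD(list(range(1, 101)), parts=4)
print(d.map(lambda x: x * x).tree_reduce(lambda a, b: a + b))  # 338350
```

In the real framework the partitions live on separate machines and the pairwise combination rounds are actual messages, which is where the asymptotic advantage over naive gather-then-reduce schemes comes from.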
On Designing Multicore-aware Simulators for Biological Systems
The stochastic simulation of biological systems is an increasingly popular technique in bioinformatics. It is often an enlightening technique, which may, however, turn out to be computationally expensive. We discuss the main opportunities to speed it up on multi-core platforms, which pose new challenges for parallelisation techniques. These opportunities are developed into two general families of solutions, involving both the single simulation and a bulk of independent simulations (either replicas, or runs derived from a parameter sweep). The proposed solutions are tested on the parallelisation of the CWC (Calculus of Wrapped Compartments) simulator, carried out by way of the FastFlow programming framework, which makes fast development and efficient execution on multi-cores possible.
Comment: 19 pages + cover pag
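The "bulk of independent simulations" family is the easy one to sketch: replicas differ only in their random seed and share nothing, so they can simply be farmed out to a worker pool. The random walk below is a placeholder for a single stochastic run, not the CWC simulator itself (FastFlow is a C++ framework; Python is used here only for illustration):

```python
import random
from concurrent.futures import ThreadPoolExecutor

def replica(seed, steps=10_000):
    # One independent stochastic run with its own private RNG,
    # standing in for a single simulation of the biological model.
    rng = random.Random(seed)
    x = 0
    for _ in range(steps):
        x += 1 if rng.random() < 0.5 else -1
    return x

# The bulk of independent simulations: runs differ only in their seed,
# exactly as replicas (or parameter-sweep points) do, so they
# parallelise trivially across cores.
with ThreadPoolExecutor(max_workers=8) as pool:
    results = list(pool.map(replica, range(32)))

print(len(results))  # 32
```

Parallelising a *single* simulation is the harder family of solutions, since the steps of one stochastic trajectory are sequentially dependent; that is where framework support of the kind the paper describes becomes necessary.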
High-Level Programming for Medical Imaging on Multi-GPU Systems Using the SkelCL Library
Application development for modern high-performance systems with Graphics Processing Units (GPUs) relies on low-level programming approaches like CUDA and OpenCL, which leads to complex, lengthy and error-prone programs.
In this paper, we present SkelCL – a high-level programming model for systems with multiple GPUs and its implementation as a library on top of OpenCL. SkelCL provides three main enhancements to the OpenCL standard: 1) computations are conveniently expressed using parallel patterns (skeletons); 2) memory management is simplified using parallel container data types; 3) an automatic data (re)distribution mechanism allows for scalability when using multi-GPU systems.
We use a real-world example from the field of medical imaging to motivate the design of our programming model and we show how application development using SkelCL is simplified without sacrificing performance: we were able to reduce the code size in our imaging example application by 50% while introducing only a moderate runtime overhead of less than 5%.
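SkelCL's skeletons are C++/OpenCL constructs, but the flavour of pattern-based programming can be conveyed with a few combinators in Python. The `skel_map` / `skel_zip` / `skel_reduce` names are invented for this sketch (SkelCL itself provides skeletons such as Map, Zip, and Reduce over container types, with the OpenCL kernels generated behind the scenes):

```python
from functools import reduce

# Skeleton combinators: a computation is described by composing patterns
# rather than by writing loops or kernel launches.
def skel_map(f):       return lambda xs: [f(x) for x in xs]
def skel_zip(f):       return lambda xs, ys: [f(x, y) for x, y in zip(xs, ys)]
def skel_reduce(f, e): return lambda xs: reduce(f, xs, e)

mul    = skel_zip(lambda a, b: a * b)
total  = skel_reduce(lambda a, b: a + b, 0.0)
square = skel_map(lambda a: a * a)

def dot(xs, ys):
    # Dot product as the canonical two-skeleton pipeline: zip(*), then reduce(+).
    return total(mul(xs, ys))

print(dot([1.0, 2.0, 3.0], [4.0, 5.0, 6.0]))  # 32.0
print(total(square([1.0, 2.0, 3.0])))         # 14.0
```

The code-size reduction the abstract reports comes from exactly this shift: the application states *what* pattern to apply, and memory transfers and device management stop appearing in user code.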