Search CORE

6,023 research outputs found

Skeletons for Distributed Topological Computation

Author: Breitinger S.
Carr H.
Chattopadhyay A.
Edelsbrunner H.
Geng Z.
Kale L.
Morozov D.
O’Sullivan B.
Schroeder W.
Telea A.
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2015
Field of study

Parallel implementation of topological algorithms is highly desirable, but the challenges, from reconstructing algorithms around independent threads through to runtime load balancing, have proven to be formidable. This problem, made all the more acute by the diversity of hardware platforms, has led to new kinds of implementation platform for computational science, with sophisticated runtime systems managing and coordinating large threadcounts to keep processing elements heavily utilized. While simpler and more portable than direct management of threads, these approaches still entangle program logic with resource management. Similar kinds of highly parallel runtime system have also been developed for functional languages. Here, however, language support for higher-order functions allows a cleaner separation between the algorithm and `skeletons' that express generic patterns of parallel computation. We report results on using this technique to develop a distributed version of the Joint Contour Net, a generalization of the Contour Tree to multifields. We present performance comparisons against a recent Haskell implementation using shared-memory parallelism, and initial work on a skeleton for distributed memory implementation that utilizes an innovative strategy to reduce inter-process communication overheads

Crossref

White Rose Research Online

Towards an Adaptive Skeleton Framework for Performance Portability

Author: Maier Patrick
Morton John Magnus
Trinder Phil
Publication venue: School of Computing Science, University of Glasgow
Publication date: 21/12/2015
Field of study

The proliferation of widely available, but very different, parallel architectures makes the ability to deliver good parallel performance on a range of architectures, or performance portability, highly desirable. Irregularly-parallel problems, where the number and size of tasks is unpredictable, are particularly challenging and require dynamic coordination. The paper outlines a novel approach to delivering portable parallel performance for irregularly parallel programs. The approach combines declarative parallelism with JIT technology, dynamic scheduling, and dynamic transformation. We present the design of an adaptive skeleton library, with a task graph implementation, JIT trace costing, and adaptive transformations. We outline the architecture of the protoype adaptive skeleton execution framework in Pycket, describing tasks, serialisation, and the current scheduler.We report a preliminary evaluation of the prototype framework using 4 micro-benchmarks and a small case study on two NUMA servers (24 and 96 cores) and a small cluster (17 hosts, 272 cores). Key results include Pycket delivering good sequential performance e.g. almost as fast as C for some benchmarks; good absolute speedups on all architectures (up to 120 on 128 cores for sumEuler); and that the adaptive transformations do improve performance

Enlighten

Geometry-Oblivious FMM for Compressing Dense SPD Matrices

Author: Biros George
Levitt James
Reiz Severin
Yu Chenhan D.
Publication venue
Publication date: 01/07/2017
Field of study

We present GOFMM (geometry-oblivious FMM), a novel method that creates a hierarchical low-rank approximation, "compression," of an arbitrary dense symmetric positive definite (SPD) matrix. For many applications, GOFMM enables an approximate matrix-vector multiplication in

N \log N

or even

N

time, where

N

is the matrix size. Compression requires

N \log N

storage and work. In general, our scheme belongs to the family of hierarchical matrix approximation methods. In particular, it generalizes the fast multipole method (FMM) to a purely algebraic setting by only requiring the ability to sample matrix entries. Neither geometric information (i.e., point coordinates) nor knowledge of how the matrix entries have been generated is required, thus the term "geometry-oblivious." Also, we introduce a shared-memory parallel scheme for hierarchical matrix computations that reduces synchronization barriers. We present results on the Intel Knights Landing and Haswell architectures, and on the NVIDIA Pascal architecture for a variety of matrices.Comment: 13 pages, accepted by SC'1

arXiv.org e-Print Archive

Crossref

Porting Decision Tree Algorithms to Multicore using FastFlow

Author: A.C. Sodan
I. Park
J.E. Gehrke
J.R. Quinlan
K. Asanovic
M. Aldinucci
M. Cole
M. Coppola
M. Joshi
M. Vanneschi
M. Zaki
M.K. Sreenivas
R. Jin
R.D. Blumofe
S. Ruggieri
S. Ruggieri
T. Lim
W. Thies
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2010
Field of study

The whole computer hardware industry embraced multicores. For these machines, the extreme optimisation of sequential algorithms is no longer sufficient to squeeze the real machine power, which can be only exploited via thread-level parallelism. Decision tree algorithms exhibit natural concurrency that makes them suitable to be parallelised. This paper presents an approach for easy-yet-efficient porting of an implementation of the C4.5 algorithm on multicores. The parallel porting requires minimal changes to the original sequential code, and it is able to exploit up to 7X speedup on an Intel dual-quad core machine.Comment: 18 pages + cove

arXiv.org e-Print Archive

CiteSeerX

Crossref

Archivio della Ricerca - Università di Pisa

UnipiEprints

MAGDA: A Mobile Agent based Grid Architecture

Author: B. DI MARTINO
MAZZOCCA NICOLA
R. AVERSA
S. VENTICINQUE
Publication venue
Publication date: 01/01/2006
Field of study

Mobile agents mean both a technology and a programming paradigm. They allow for a flexible approach which can alleviate a number of issues present in distributed and Grid-based systems, by means of features such as migration, cloning, messaging and other provided mechanisms. In this paper we describe an architecture (MAGDA – Mobile Agent based Grid Architecture) we have designed and we are currently developing to support programming and execution of mobile agent based application upon Grid systems

Archivio della ricerca - Università degli studi di Napoli Federico II

Archivio Istituzionale della Ricerca - Università degli Studi della Campania "Luigi Vanvitelli"

High-Level Programming for Medical Imaging on Multi-GPU Systems Using the SkelCL Library

Author: Gorlatch Sergei
Steuwer Michel
Publication venue: 'Elsevier BV'
Publication date: 01/01/2013
Field of study

Application development for modern high-performance systems with Graphics Processing Units (GPUs) relies on low-level programming approaches like CUDA and OpenCL, which leads to complex, lengthy and error-prone programs. In this paper, we present SkelCL – a high-level programming model for systems with multiple GPUs and its implementation as a library on top of OpenCL. SkelCL provides three main enhancements to the OpenCL standard: 1) computations are conveniently expressed using parallel patterns (skeletons); 2) memory management is simplified using parallel container data types; 3) an automatic data (re)distribution mechanism allows for scalability when using multi-GPU systems. We use a real-world example from the field of medical imaging to motivate the design of our programming model and we show how application development using SkelCL is simplified without sacrificing performance: we were able to reduce the code size in our imaging example application by 50% while introducing only a moderate runtime overhead of less than 5%

CiteSeerX

Elsevier - Publisher Connector

Enlighten

MaSiF: Machine learning guided auto-tuning of parallel skeletons

Author: Cole Murray
Collins Alexander
Fensch Christian
Leather Hugh
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/12/2013
Field of study

Heriot Watt Pure

FullSWOF_Paral: Comparison of two parallelization strategies (MPI and SKELGIS) on a software designed for hydrology applications

Author: Cordier Stéphane
Coullon Hélène
Delestre Olivier
Laguerre Christian
Le Minh Hoang
Pierre Daniel
Sadaka Georges
Publication venue: 'EDP Sciences'
Publication date: 18/07/2013
Field of study

In this paper, we perform a comparison of two approaches for the parallelization of an existing, free software, FullSWOF 2D (http://www. univ-orleans.fr/mapmo/soft/FullSWOF/ that solves shallow water equations for applications in hydrology) based on a domain decomposition strategy. The first approach is based on the classical MPI library while the second approach uses Parallel Algorithmic Skeletons and more precisely a library named SkelGIS (Skeletons for Geographical Information Systems). The first results presented in this article show that the two approaches are similar in terms of performance and scalability. The two implementation strategies are however very different and we discuss the advantages of each one.Comment: 27 page

arXiv.org e-Print Archive

EDP Sciences OAI-PMH repository (1.2.0)

HAL Descartes