FooPar: A Functional Object Oriented Parallel Framework in Scala

Author: A Grama
E Dekel
F Darema
GL Taboada
K Hwang
M Odersky
MJ Quinn
R Loogen
SJ Thompson
V Kumar
Publication date: 13/06/2013

We present FooPar, an extension for highly efficient parallel computing in the multi-paradigm programming language Scala. Scala offers concise and clean syntax and integrates functional programming features. Our framework FooPar combines these features with parallel computing techniques. FooPar is modular by design and supports easy access to different communication backends for distributed memory architectures as well as high-performance math libraries. In this article we use it to parallelize matrix-matrix multiplication and show its scalability by an isoefficiency analysis. In addition, results based on an empirical analysis on two supercomputers are given. We achieve close-to-optimal performance with respect to theoretical peak performance. Based on this result we conclude that FooPar gives full access to Scala's design features without suffering from performance drops when compared to implementations purely based on C and MPI.
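The abstract above names matrix-matrix multiplication as the benchmark kernel. The following is a minimal serial sketch of a blocked multiplication in plain Python, not FooPar's implementation or API. The function name `matmul_blocked` and the `block` parameter are hypothetical, chosen only for illustration.

```python
def matmul_blocked(a, b, block=2):
    """Multiply square matrices a and b (lists of lists), block by block.

    Illustrative only: the name and the block parameter are assumptions,
    not part of FooPar.
    """
    n = len(a)
    c = [[0.0] * n for _ in range(n)]
    for i0 in range(0, n, block):
        for j0 in range(0, n, block):
            for k0 in range(0, n, block):
                # Accumulate the product of one pair of blocks into the
                # (i0, j0) output block. In a distributed-memory setting,
                # each output block could be owned by a different process.
                for i in range(i0, min(i0 + block, n)):
                    for k in range(k0, min(k0 + block, n)):
                        aik = a[i][k]
                        for j in range(j0, min(j0 + block, n)):
                            c[i][j] += aik * b[k][j]
    return c
```

Splitting the work by output block is one common way to expose parallelism; the communication pattern a real framework uses to move the input blocks is not shown here.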

arXiv.org e-Print Archive

CiteSeerX

Crossref

University of Southern Denmark Research Output

Group Communication Patterns for High Performance Computing in Scala

Author: Hargreaves Felix P.
Merkle Daniel
Schneider-Kamp Peter
Publication date: 01/01/2014

We developed a Functional object-oriented Parallel framework (FooPar) for high-level high-performance computing in Scala. Central to this framework are Distributed Memory Parallel Data structures (DPDs), i.e., collections of data distributed in a shared-nothing system together with parallel operations on these data. In this paper, we first present FooPar's architecture and the idea of DPDs and group communications. Then, we show how DPDs can be implemented elegantly and efficiently in Scala based on the Traversable/Builder pattern, unifying Functional and Object-Oriented Programming. We prove the correctness and safety of one communication algorithm and show how specification testing (via ScalaCheck) can be used to bridge the gap between proof and implementation. Furthermore, we show that the group communication operations of FooPar outperform those of the MPJ Express open source MPI-bindings for Java, both asymptotically and empirically. FooPar has already been shown to be capable of achieving close-to-optimal performance for dense matrix-matrix multiplication via JNI. In this article, we present results on a parallel implementation of the Floyd-Warshall algorithm in FooPar, achieving more than 94% efficiency compared to the serial version on a cluster using 100 cores for matrices of dimension 38000 x 38000.
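For orientation, the serial Floyd-Warshall algorithm and the usual definition of parallel efficiency (serial time divided by cores times parallel time) can be sketched as follows. This is a plain-Python illustration under stated assumptions, not the FooPar implementation. The function names are hypothetical, and the abstract does not spell out its exact efficiency formula.

```python
import math


def floyd_warshall(dist):
    """Return all-pairs shortest path lengths for an adjacency matrix.

    dist[i][j] is the edge weight from i to j, math.inf if there is no edge,
    and 0 on the diagonal. The input is not modified.
    """
    n = len(dist)
    d = [row[:] for row in dist]
    for k in range(n):
        for i in range(n):
            dik = d[i][k]
            for j in range(n):
                # Relax the path i -> j through intermediate vertex k.
                if dik + d[k][j] < d[i][j]:
                    d[i][j] = dik + d[k][j]
    return d


def parallel_efficiency(t_serial, t_parallel, cores):
    """Efficiency E = T_serial / (p * T_parallel); assumed standard definition."""
    return t_serial / (cores * t_parallel)
```

Under this definition, an efficiency above 0.94 on 100 cores corresponds to a speedup of more than 94 over the serial run.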

arXiv.org e-Print Archive

CiteSeerX

Crossref

An evolutionary computation approach for optimizing connectivity in disaster response scenarios

Author: Alander
Aschenbruck
Asimakopoulou
Bai
Bani-Yassein
Bao
Beraldi
Berg
Boukerche
Camp
D.G. Reina
Dengiz
E. Asimakopoulou
F. Barrero
Fall
Goldberg
Hanzo
Lakshmi
Layuan
Martinez-Torres
Michalewicz
Murray
N. Bessis
Panichpapiboon
Perkins
Reina
Royer
S.L. Toral Marín
Toral
Tuna
Tzu-Chiang
Vecchio
Xu
Zhou
Publication venue: Elsevier BV
Publication date: 01/01/2012

Crossref

Edge Hill University Research Information Repository

UDORA - University of Derby Online Research Archive

Distributed and parallel sparse convex optimization for radio interferometry with PURIFY

Author: Cai Xiaohao
Christidi Ilektra
d'Avezac Mayeul
Guichard Roland
McEwen Jason D.
Perez-Suarez David
Pratley Luke
Publication date: 11/03/2019

Next generation radio interferometric telescopes are entering an era of big data with extremely large data sets. While these telescopes can observe the sky with higher sensitivity and resolution than before, computational challenges in image reconstruction need to be overcome to realize the potential of forthcoming telescopes. New methods in sparse image reconstruction and convex optimization techniques (cf. compressive sensing) have been shown to produce higher fidelity reconstructions of simulations and real observations than traditional methods. This article presents distributed and parallel algorithms and implementations to perform sparse image reconstruction, with significant practical considerations that are important for implementing these algorithms for Big Data. We benchmark the algorithms presented, showing that they are considerably faster than their serial equivalents. We then pre-sample gridding kernels to scale the distributed algorithms to larger data sizes, showing application times for 1 Gb to 2.4 Tb data sets over 25 to 100 nodes for up to 50 billion visibilities, and find that the run-times for the distributed algorithms range from 100 milliseconds to 3 minutes per iteration. This work presents an important step in working towards computationally scalable and efficient algorithms and implementations that are needed to image observations of both extended and compact sources from next generation radio interferometers such as the SKA. The algorithms are implemented in the latest versions of the SOPT (https://github.com/astro-informatics/sopt) and PURIFY (https://github.com/astro-informatics/purify) software packages (versions 3.1.0), which have been released alongside this article. Comment: 25 pages, 5 figures
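Sparse reconstruction by convex optimization typically relies on proximal operators. As a hedged illustration, the sketch below shows soft thresholding, the proximal operator of the l1 norm. The abstract does not name the operators used, and this is not code from SOPT or PURIFY. The function name `soft_threshold` is hypothetical.

```python
def soft_threshold(values, threshold):
    """Apply elementwise soft thresholding (prox of threshold * l1 norm).

    Each entry is shrunk toward zero by `threshold`; entries whose magnitude
    is at most `threshold` become exactly zero, which promotes sparsity.
    Illustrative only, not the SOPT or PURIFY API.
    """
    out = []
    for x in values:
        if x > threshold:
            out.append(x - threshold)
        elif x < -threshold:
            out.append(x + threshold)
        else:
            out.append(0.0)
    return out
```

Because the operator acts on each coefficient independently, it can be applied to disjoint parts of a coefficient vector on different nodes without communication.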

arXiv.org e-Print Archive

Southampton (e-Prints Soton)

Highly parallel sparse Cholesky factorization

Author: Gilbert John R.
Schreiber Robert

Several fine-grained parallel algorithms were developed and compared for computing the Cholesky factorization of a sparse matrix. The experimental implementations are on the Connection Machine, a distributed memory SIMD machine whose programming model conceptually supplies one processor per data element. In contrast to special-purpose algorithms in which the matrix structure conforms to the connection structure of the machine, the focus is on matrices with arbitrary sparsity structure. The most promising algorithm is one whose inner loop performs several dense factorizations simultaneously on a 2-D grid of processors. Virtually any massively parallel dense factorization algorithm can be used as the key subroutine. The sparse code attains execution rates comparable to those of the dense subroutine. Although at present architectural limitations prevent the dense factorization from realizing its potential efficiency, it is concluded that a regular data-parallel architecture can be used efficiently to solve arbitrarily structured sparse problems. A performance model is also presented and is used to analyze the algorithms.
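The abstract treats dense factorization as the key subroutine of the sparse algorithm. A minimal serial sketch of dense Cholesky factorization (A = L L^T with L lower triangular) is given below for reference. It is a textbook column-by-column variant in plain Python, not the Connection Machine code, and the function name `cholesky` is chosen for illustration.

```python
import math


def cholesky(a):
    """Return lower-triangular L with a = L * L^T.

    `a` must be a symmetric positive definite matrix given as a list of
    lists. Raises ValueError if a non-positive pivot is encountered.
    """
    n = len(a)
    lower = [[0.0] * n for _ in range(n)]
    for j in range(n):
        # Diagonal entry: subtract squares of the already computed row part.
        pivot = a[j][j] - sum(lower[j][k] ** 2 for k in range(j))
        if pivot <= 0:
            raise ValueError("matrix is not positive definite")
        lower[j][j] = math.sqrt(pivot)
        # Entries below the diagonal in column j.
        for i in range(j + 1, n):
            partial = sum(lower[i][k] * lower[j][k] for k in range(j))
            lower[i][j] = (a[i][j] - partial) / lower[j][j]
    return lower
```

The updates within one column are independent of each other, which is the kind of fine-grained parallelism a data-parallel machine can exploit.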

NASA Technical Reports Server