Search CORE

247 research outputs found

Group Communication Patterns for High Performance Computing in Scala

Author: Hargreaves Felix P.
Merkle Daniel
Schneider-Kamp Peter
Publication venue
Publication date: 01/01/2014
Field of study

We developed a Functional object-oriented Parallel framework (FooPar) for high-level high-performance computing in Scala. Central to this framework are Distributed Memory Parallel Data structures (DPDs), i.e., collections of data distributed in a shared nothing system together with parallel operations on these data. In this paper, we first present FooPar's architecture and the idea of DPDs and group communications. Then, we show how DPDs can be implemented elegantly and efficiently in Scala based on the Traversable/Builder pattern, unifying Functional and Object-Oriented Programming. We prove the correctness and safety of one communication algorithm and show how specification testing (via ScalaCheck) can be used to bridge the gap between proof and implementation. Furthermore, we show that the group communication operations of FooPar outperform those of the MPJ Express open source MPI-bindings for Java, both asymptotically and empirically. FooPar has already been shown to be capable of achieving close-to-optimal performance for dense matrix-matrix multiplication via JNI. In this article, we present results on a parallel implementation of the Floyd-Warshall algorithm in FooPar, achieving more than 94 % efficiency compared to the serial version on a cluster using 100 cores for matrices of dimension 38000 x 38000

arXiv.org e-Print Archive

CiteSeerX

Crossref

FooPar: A Functional Object Oriented Parallel Framework in Scala

Author: A Grama
A Grama
E Dekel
F Darema
GL Taboada
K Hwang
M Odersky
MJ Quinn
R Loogen
SJ Thompson
V Kumar
Publication venue
Publication date: 13/06/2013
Field of study

We present FooPar, an extension for highly efficient Parallel Computing in the multi-paradigm programming language Scala. Scala offers concise and clean syntax and integrates functional programming features. Our framework FooPar combines these features with parallel computing techniques. FooPar is designed modular and supports easy access to different communication backends for distributed memory architectures as well as high performance math libraries. In this article we use it to parallelize matrix matrix multiplication and show its scalability by a isoefficiency analysis. In addition, results based on a empirical analysis on two supercomputers are given. We achieve close-to-optimal performance wrt. theoretical peak performance. Based on this result we conclude that FooPar allows to fully access Scala's design features without suffering from performance drops when compared to implementations purely based on C and MPI

arXiv.org e-Print Archive

CiteSeerX

Crossref

University of Southern Denmark Research Output

Design of efficient Java message-passing collectives on multi-core clusters

Author: Doallo Ramón
López Taboada Guillermo
Ramos Garea Sabela
Touriño Juan
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2011
Field of study

This is a post-peer-review, pre-copyedit version of an article published in The Journal of Supercomputing. The final authenticated version is available online at: https://doi.org/10.1007/s11227-010-0464-5[Abstract] This paper presents a scalable and efficient Message-Passing in Java (MPJ) collective communication library for parallel computing on multi-core architectures. The continuous increase in the number of cores per processor underscores the need for scalable parallel solutions. Moreover, current system deployments are usually multi-core clusters, a hybrid shared/distributed memory architecture which increases the complexity of communication protocols. Here, Java represents an attractive choice for the development of communication middleware for these systems, as it provides built-in networking and multithreading support. As the gap between Java and compiled languages performance has been narrowing for the last years, Java is an emerging option for High Performance Computing (HPC). Our MPJ collective communication library increases Java HPC applications performance on multi-core clusters: (1) providing multi-core aware collective primitives; (2) implementing several algorithms (up to six) per collective operation, whereas publicly available MPJ libraries are usually restricted to one algorithm; (3) analyzing the efficiency of thread-based collective operations; (4) selecting at runtime the most efficient algorithm depending on the specific multi-core system architecture, and the number of cores and message length involved in the collective operation; (5) supporting the automatic performance tuning of the collectives depending on the system and communication parameters; and (6) allowing its integration in any MPJ implementation as it is based on MPJ point-to-point primitives. A performance evaluation on an InfiniBand and Gigabit Ethernet multi-core cluster has shown that the implemented collectives significantly outperform the original ones, as well as higher speedups when analyzing the impact of their use on collective communications intensive Java HPC applications. Finally, the presented library has been successfully integrated in MPJ Express (http://mpj-express.org), and will be distributed with the next release.Ministerio de Ciencia e Innovación; TIN2010-16735Ministerio de Educación; FPU; AP2009-2112Xunta de Galicia; PGIDIT06PXIB105228P

Repositorio da Universidade da Coruña

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Teaching Parallel Programming Using Java

Author: Akhtar Aleem
Carpenter Bryan
Javed Ansar
Shafi Aamir
Publication venue
Publication date: 01/01/2014
Field of study

This paper presents an overview of the "Applied Parallel Computing" course taught to final year Software Engineering undergraduate students in Spring 2014 at NUST, Pakistan. The main objective of the course was to introduce practical parallel programming tools and techniques for shared and distributed memory concurrent systems. A unique aspect of the course was that Java was used as the principle programming language. The course was divided into three sections. The first section covered parallel programming techniques for shared memory systems that include multicore and Symmetric Multi-Processor (SMP) systems. In this section, Java threads was taught as a viable programming API for such systems. The second section was dedicated to parallel programming tools meant for distributed memory systems including clusters and network of computers. We used MPJ Express-a Java MPI library-for conducting programming assignments and lab work for this section. The third and the final section covered advanced topics including the MapReduce programming model using Hadoop and the General Purpose Computing on Graphics Processing Units (GPGPU).Comment: 8 Pages, 6 figures, MPJ Express, MPI Java, Teaching Parallel Programmin

arXiv.org e-Print Archive

Crossref

Portsmouth University Research Portal (Pure)

Java in the High Performance Computing arena: Research, practice and experience

Author: Doallo Ramón
Expósito Roberto R.
López Taboada Guillermo
Ramos Garea Sabela
Touriño Juan
Publication venue: 'Elsevier BV'
Publication date: 01/01/2013
Field of study

This is a post-peer-review, pre-copyedit version of an article published in Science of Computer Programming. The final authenticated version is available online at: https://doi.org/10.1016/j.scico.2011.06.002[Abstract] The rising interest in Java for High Performance Computing (HPC) is based on the appealing features of this language for programming multi-core cluster architectures, particularly the built-in networking and multithreading support, and the continuous increase in Java Virtual Machine (JVM) performance. However, its adoption in this area is being delayed by the lack of analysis of the existing programming options in Java for HPC and thorough and up-to-date evaluations of their performance, as well as the unawareness on current research projects in this field, whose solutions are needed in order to boost the embracement of Java in HPC. This paper analyzes the current state of Java for HPC, both for shared and distributed memory programming, presents related research projects, and finally, evaluates the performance of current Java HPC solutions and research developments on two shared memory environments and two InfiniBand multi-core clusters. The main conclusions are that: (1) the significant interest in Java for HPC has led to the development of numerous projects, although usually quite modest, which may have prevented a higher development of Java in this field; (2) Java can achieve almost similar performance to natively compiled languages, both for sequential and parallel applications, being an alternative for HPC programming; (3) the recent advances in the efficient support of Java communications on shared memory and low-latency networks are bridging the gap between Java and natively compiled applications in HPC. Thus, the good prospects of Java in this area are attracting the attention of both industry and academia, which can take significant advantage of Java adoption in HPC.Ministerio de Ciencia e Innovación; TIN2010-16735Ministerio de Educación, Cultura y Deporte; AP2009-211

Repositorio da Universidade da Coruña

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

F-MPJ: scalable Java message-passing communications on parallel systems

Author: Doallo Ramón
López Taboada Guillermo
Touriño Juan
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2012
Field of study

Repositorio da Universidade da Coruña

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Device level communication libraries for high‐performance computing in Java

Author: Baker Mark
Carpenter Bryan
Doallo Ramón
López Taboada Guillermo
Shafi Aamir
Touriño Juan
Publication venue: 'Wiley'
Publication date: 01/01/2011
Field of study

This is the peer reviewed version of the following article: Taboada, G. L., Touriño, J. , Doallo, R. , Shafi, A. , Baker, M. and Carpenter, B. (2011), Device level communication libraries for high‐performance computing in Java. Concurrency Computat.: Pract. Exper., 23: 2382-2403. doi:10.1002/cpe.1777, which has been published in final form at https://doi.org/10.1002/cpe.1777. This article may be used for non-commercial purposes in accordance with Wiley Terms and Conditions for Use of Self-Archived Versions.[Abstract] Since its release, the Java programming language has attracted considerable attention from the high‐performance computing (HPC) community because of its portability, high programming productivity, and built‐in multithreading and networking support. As a consequence, several initiatives have been taken to develop a high‐performance Java message‐passing library to program distributed memory architectures, such as clusters. The performance of Java message‐passing applications relies heavily on the communications performance. Thus, the design and implementation of low‐level communication devices that support message‐passing libraries is an important research issue in Java for HPC. MPJ Express is our Java message‐passing implementation for developing high‐performance parallel Java applications. Its public release currently contains three communication devices: the first one is built using the Java New Input/Output (NIO) package for the TCP/IP; the second one is specifically designed for the Myrinet Express library on Myrinet; and the third one supports thread‐based shared memory communications. Although these devices have been successfully deployed in many production environments, previous performance evaluations of MPJ Express suggest that the buffering layer, tightly coupled with these devices, incurs a certain degree of copying overhead, which represents one of the main performance penalties. This paper presents a more efficient Java message‐passing communications device, based on Java Input/Output sockets, that avoids this buffering overhead. Moreover, this device implements several strategies, both in the communication protocol and in the HPC hardware support, which optimizes Java message‐passing communications. In order to evaluate its benefits, this paper analyzes the performance of this device comparatively with other Java and native message‐passing libraries on various high‐speed networks, such as Gigabit Ethernet, Scalable Coherent Interface, Myrinet, and InfiniBand, as well as on a shared memory multicore scenario. The reported communication overhead reduction encourages the upcoming incorporation of this device in MPJ ExpressMinisterio de Ciencia e Innovación; TIN2010-16735

Repositorio da Universidade da Coruña

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

NPB-MPJ: NAS Parallel Benchmarks Implementation for Message-Passing in Java

Author: Doallo Ramón
López Taboada Guillermo
Mallón Damián A.
Touriño Juan
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 08/05/2009
Field of study

This is a post-peer-review, pre-copyedit version. The final authenticated version is available online at: http://dx.doi.org/10.1109/PDP.2009.59[Abstract] Java is a valuable and emerging alternative for the development of parallel applications, thanks to the availability of several Java message-passing libraries and its full multithreading support. The combination of both shared and distributed memory programming is an interesting option for parallel programming multi-core systems. However, the concerns about Java performance are hindering its adoption in this field, although it is difficult to evaluate accurately its performance due to the lack of standard benchmarks in Java. This paper presents NPB-MPJ, the first extensive implementation of the NAS Parallel Benchmarks (NPB), the standard parallel benchmark suite, for Message-Passing in Java (MPJ) libraries. Together with the design and implementation details of NPB-MPJ, this paper gathers several optimization techniques that can serve as a guide for the development of more efficient Java applications for High Performance Computing (HPC). NPB-MPJ has been used in the performance evaluation of Java against C/Fortran parallel libraries on two representative multi-core clusters. Thus, NPB-MPJ provides an up-to-date snapshot of MPJ performance, whose comparative analysis of current Java and native parallel solutions confirms that MPJ is an alternative for parallel programming multi-core systems.Ministerio de Educación y Ciencia; TIN2004-07797-C02Ministerio de Educación y Ciencia; TIN2007-67537-C03-02Xunta de Galicia; PGIDIT06PXIB105228P

Repositorio da Universidade da Coruña

Low‐latency Java communication devices on RDMA‐enabled networks

Author: Bailey
Carpenter
Chang
Dean
Dunning
Expósito
Expósito
Fox
Maassen
Nieuwpoort
Philippsen
Saini
Shafi
Suganuma
Taboada
Taboada
Taboada
Thiruvathukal
Veldema
Welsh
Publication venue: 'Wiley'
Publication date: 01/01/2015
Field of study

This is the peer reviewed version of the following article: Expósito, R. R., Taboada, G. L., Ramos, S., Touriño, J., & Doallo, R. (2015). Low‐latency Java communication devices on RDMA‐enabled networks. Concurrency and Computation: Practice and Experience, 27(17), 4852-4879., which has been published in final form at https://doi.org/10.1002/cpe.3473. This article may be used for non-commercial purposes in accordance with Wiley Terms and Conditions for Use of Self-Archived Versions.[Abstract] Providing high‐performance inter‐node communication is a key capability for running high performance computing applications efficiently on parallel architectures. In fact, current systems deployments are aggregating a significant number of cores interconnected via advanced networking hardware with Remote Direct Memory Access (RDMA) mechanisms, that enable zero‐copy and kernel‐bypass features. The use of Java for parallel programming is becoming more promising thanks to some useful characteristics of this language, particularly its built‐in multithreading support, portability, easy‐to‐learn properties, and high productivity, along with the continuous increase in the performance of the Java virtual machine. However, current parallel Java applications generally suffer from inefficient communication middleware, mainly based on protocols with high communication overhead that do not take full advantage of RDMA‐enabled networks. This paper presents efficient low‐level Java communication devices that overcome these constraints by fully exploiting the underlying RDMA hardware, providing low‐latency and high‐bandwidth communications for parallel Java applications. The performance evaluation conducted on representative RDMA networks and parallel systems has shown significant point‐to‐point performance increases compared with previous Java communication middleware, allowing to obtain up to 40% improvement in application‐level performance on 4096 cores of a Cray XE6 supercomputer.Ministerio de Economía y Competitividad; TIN2013-42148-PXunta de Galicia; GRC2013/055Ministerio de Educación y Ciencia; AP2010-434

Repositorio da Universidade da Coruña

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Crossref

Design of Scalable Java Communication Middleware for Multi-Core Systems

Author: Doallo Ramón
Expósito Roberto R.
López Taboada Guillermo
Ramos Garea Sabela
Touriño Juan
Publication venue: 'Oxford University Press (OUP)'
Publication date: 01/02/2013
Field of study

This is a post-peer-review, pre-copyedit version of an article published in The Computer Journal. The final authenticated version is available online at: https://doi.org/10.1093/comjnl/bxs122[Abstract] This paper presents smdev, a shared memory communication middleware for multi-core systems. smdev provides a simple and powerful messaging application program interface that is able to exploit the underlying multi-core architecture replacing inter-process and network-based communications by threads and shared memory transfers. The performance evaluation of smdev on several multi-core systems has shown noticeable improvements compared with other Java shared memory solutions, reaching and even overcoming the performance of natively compiled libraries. Thus, smdev has obtained start-up latencies around 0.76 μs and almost 90 Gbps bandwidth for point-to-point communications, as well as high performance and scalability both for collective operations and representative messaging kernels. This fact has motivated the integration of smdev in F-MPJ, our message-passing implementation in Java.Ministerio de Ciencia e Innovación; TIN2010-1673

Repositorio da Universidade da Coruña