3,233 research outputs found

    Performance Analysis for Mesh and Mesh-Spectral Archetype Applications

    Get PDF
    This document outlines a simple method for benchmarking a parallel communication library and for using the results to model the performance of applications developed with that communication library. We use compositional performance analysis - decomposing a parallel program into its modular parts and analyzing their respective performances - to gain perspective on the performance of the whole program. This model is useful for predicting parallel program execution times for different types of program archetypes, (e.g., mesh and mesh-spectral) using communication libraries built with different message-passing schemes (e.g., Fortran M and Fortran with MPI) running on different architectures (e.g., IBM SP2 and a network of Pentium personal computers)

    HPCmatlab: A Framework for Fast Prototyping of Parallel Applications in Matlab

    Get PDF
    AbstractThe HPCmatlab framework has been developed for Distributed Memory Programming in Matlab/Octave using the Message Passing Interface (MPI). The communication routines in the MPI library are implemented using MEX wrappers. Point-to-point, collective as well as one-sided communication is supported. Benchmarking results show better performance than the Mathworks Distributed Computing Server. HPCmatlab has been used to successfully parallelize and speed up Matlab applications developed for scientific computing. The application results show good scalability, while preserving the ease of programmability. HPCmatlab also enables shared memory programming using Pthreads and Parallel I/O using the ADIOS package

    Achieving Efficient Strong Scaling with PETSc using Hybrid MPI/OpenMP Optimisation

    Full text link
    The increasing number of processing elements and decreas- ing memory to core ratio in modern high-performance platforms makes efficient strong scaling a key requirement for numerical algorithms. In order to achieve efficient scalability on massively parallel systems scientific software must evolve across the entire stack to exploit the multiple levels of parallelism exposed in modern architectures. In this paper we demonstrate the use of hybrid MPI/OpenMP parallelisation to optimise parallel sparse matrix-vector multiplication in PETSc, a widely used scientific library for the scalable solution of partial differential equations. Using large matrices generated by Fluidity, an open source CFD application code which uses PETSc as its linear solver engine, we evaluate the effect of explicit communication overlap using task-based parallelism and show how to further improve performance by explicitly load balancing threads within MPI processes. We demonstrate a significant speedup over the pure-MPI mode and efficient strong scaling of sparse matrix-vector multiplication on Fujitsu PRIMEHPC FX10 and Cray XE6 systems

    Study of Raspberry Pi 2 Quad-core Cortex A7 CPU Cluster as a Mini Supercomputer

    Full text link
    High performance computing (HPC) devices is no longer exclusive for academic, R&D, or military purposes. The use of HPC device such as supercomputer now growing rapidly as some new area arise such as big data, and computer simulation. It makes the use of supercomputer more inclusive. Todays supercomputer has a huge computing power, but requires an enormous amount of energy to operate. In contrast a single board computer (SBC) such as Raspberry Pi has minimum computing power, but require a small amount of energy to operate, and as a bonus it is small and cheap. This paper covers the result of utilizing many Raspberry Pi 2 SBCs, a quad-core Cortex A7 900 MHz, as a cluster to compensate its computing power. The high performance linpack (HPL) is used to benchmark the computing power, and a power meter with resolution 10mV / 10mA is used to measure the power consumption. The experiment shows that the increase of number of cores in every SBC member in a cluster is not giving significant increase in computing power. This experiment give a recommendation that 4 nodes is a maximum number of nodes for SBC cluster based on the characteristic of computing performance and power consumption.Comment: Pre-print of conference paper on International Conference on Information Technology and Electrical Engineerin
    corecore