Hierarchical Dynamic Loop Self-Scheduling on Distributed-Memory Systems Using an MPI+MPI Approach

Author: Ciorba Florina M.
Eleliemy Ahmed
Publication date: 01/01/2019

Computationally-intensive loops are the primary source of parallelism in scientific applications. Such loops are often irregular, and a balanced execution of their iterations is critical for achieving high performance. However, several factors may lead to load imbalance, such as problem characteristics and algorithmic and systemic variations. Dynamic loop self-scheduling (DLS) techniques are devised to mitigate these factors and, consequently, improve application performance. On distributed-memory systems, DLS techniques can be implemented using a hierarchical master-worker execution model and are, therefore, called hierarchical DLS techniques. These techniques self-schedule loop iterations at two levels of hardware parallelism: across and within compute nodes. Hybrid programming approaches that combine the message passing interface (MPI) with open multi-processing (OpenMP) dominate the implementation of hierarchical DLS techniques. The MPI-3 standard includes the feature of sharing memory regions among MPI processes. This feature introduced the MPI+MPI approach, which simplifies the implementation of parallel scientific applications. The present work designs and implements hierarchical DLS techniques by exploiting the MPI+MPI approach. Four well-known DLS techniques are considered in the evaluation proposed herein. The results indicate certain performance advantages of the proposed approach compared to the hybrid MPI+OpenMP approach.
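The abstract does not name the four DLS techniques it evaluates. As a minimal sketch of what self-scheduling at two levels can look like, the Python below assumes guided self-scheduling (GSS), where each request receives ceil(remaining / workers) iterations, and applies it once across nodes and again within a node. The function names and the choice of GSS are illustrative assumptions, not the authors' implementation, and no MPI is involved.

```python
import math


def gss_chunks(total_iterations, workers):
    """Chunk sizes handed out by guided self-scheduling (assumed technique).

    Each request receives ceil(remaining / workers) iterations, so chunks
    start large and shrink as the loop nears completion.
    """
    chunks = []
    remaining = total_iterations
    while remaining > 0:
        size = math.ceil(remaining / workers)
        chunks.append(size)
        remaining -= size
    return chunks


def hierarchical_gss(total_iterations, nodes, cores_per_node):
    """Two-level sketch (hypothetical helper).

    Level 1 splits the loop into node-level chunks; level 2 splits each
    node-level chunk again among the cores of the node that obtained it.
    """
    return [
        (outer, gss_chunks(outer, cores_per_node))
        for outer in gss_chunks(total_iterations, nodes)
    ]


if __name__ == "__main__":
    for outer, inner in hierarchical_gss(100, nodes=4, cores_per_node=2):
        print(outer, inner)
```

Decreasing chunk sizes trade scheduling overhead early in the loop for finer load balancing near the end, which is the usual motivation for this family of techniques.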

arXiv.org e-Print Archive

Crossref

edoc

A Taxonomy of Workflow Management Systems for Grid Computing

Author: Buyya Rajkumar
Yu Jia
Publication date: 01/01/2005

With the advent of Grid and application technologies, scientists and engineers are building more and more complex applications to manage and process large data sets, and to execute scientific experiments on distributed resources. Such application scenarios require means for composing and executing complex workflows. Therefore, many efforts have been made towards the development of workflow management systems for Grid computing. In this paper, we propose a taxonomy that characterizes and classifies various approaches for building and executing workflows on Grids. We also survey several representative Grid workflow systems developed by various projects world-wide to demonstrate the comprehensiveness of the taxonomy. The taxonomy not only highlights the design and engineering similarities and differences of the state of the art in Grid workflow systems, but also identifies the areas that need further research. Comment: 29 pages, 15 figures

arXiv.org e-Print Archive

CiteSeerX

A critical analysis of research potential, challenges and future directives in industrial wireless sensor networks

Author: Aslam Nauman
Cao Yue
Hussain Sajjad
Khan Noor Muhammad
Le-Minh Hoa
Raza Mohsin
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2018

In recent years, Industrial Wireless Sensor Networks (IWSNs) have emerged as an important research theme, with applications spanning a wide range of industries including automation, monitoring, process control, feedback systems and automotive. The wide scope of IWSN applications, ranging from small production units and large oil and gas industries to nuclear fission control, enables fast-paced research in this field. Although IWSNs offer the advantages of low cost, flexibility, scalability, self-healing, easy deployment and reformation, they pose certain limitations on available potential and introduce challenges on multiple fronts due to their susceptibility to highly complex and uncertain industrial environments. This paper presents a detailed discussion of design objectives, challenges and solutions for IWSNs. A careful evaluation of industrial systems, deadlines and possible hazards in the industrial atmosphere is also given. The paper also presents a thorough review of existing standards and industrial protocols and critically evaluates their potential, along with a detailed discussion of available hardware platforms, specific industrial energy harvesting techniques and their capabilities. The paper lists the main service providers for IWSN solutions and gives insight into future trends and research gaps in the field of IWSNs.

Northumbria Research Link

Crossref

Enlighten

A Parallel Adaptive P3M code with Hierarchical Particle Reordering

Author: H.M.P. Couchman
Robert J. Thacker
Publication venue: 'Elsevier BV'
Publication date: 01/01/2005

We discuss the design and implementation of HYDRA_OMP, a parallel implementation of the Smoothed Particle Hydrodynamics-Adaptive P3M (SPH-AP3M) code HYDRA. The code is designed primarily for conducting cosmological hydrodynamic simulations and is written in Fortran77+OpenMP. A number of optimizations for RISC processors and SMP-NUMA architectures have been implemented, the most important being hierarchical reordering of particles within chaining cells, which greatly improves data locality, thereby removing the cache misses typically associated with linked lists. Parallel scaling is good, with a minimum parallel scaling of 73% achieved on 32 nodes for a variety of modern SMP architectures. We give performance data in terms of the number of particle updates per second, which is a more useful performance metric than raw MFlops. A basic version of the code will be made available to the community in the near future. Comment: 34 pages, 12 figures, accepted for publication in Computer Physics Communications

arXiv.org e-Print Archive

CiteSeerX

Crossref

CERN Document Server

Dynamic Loop Scheduling Using MPI Passive-Target Remote Memory Access

Author: Ciorba Florina M.
Eleliemy Ahmed
Publication date: 01/01/2018

Scientific applications often contain large computationally-intensive parallel loops. Loop scheduling techniques aim to achieve load-balanced executions of such applications. For distributed-memory systems, existing dynamic loop scheduling (DLS) libraries are typically MPI-based and employ a master-worker execution model to assign variably-sized chunks of loop iterations. The master-worker execution model may adversely impact performance due to contention at the master. This work proposes a distributed chunk-calculation approach that does not require the master-worker execution scheme. Moreover, it considers the novel features in the latest MPI standards, such as passive-target remote memory access, shared-memory window creation, and atomic read-modify-write operations. To evaluate the proposed approach, five well-known DLS techniques, two applications, and two heterogeneous hardware setups have been considered. The DLS techniques implemented using the proposed approach outperformed their counterparts implemented using the traditional master-worker execution model.
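The idea of workers computing their own chunks without a master can be sketched as follows. This is a single-process Python simulation under stated assumptions: threads stand in for MPI processes, a lock-protected counter stands in for an atomic read-modify-write on a shared window (in MPI-3 this role could be played by a call such as `MPI_Fetch_and_op`), and chunks have a fixed size for simplicity, whereas the abstract describes variably-sized chunks. `SharedCounter` and `run_self_scheduling` are hypothetical names, not part of any library.

```python
import threading


class SharedCounter:
    """Stand-in for a loop-index counter held in a shared memory window."""

    def __init__(self):
        self._value = 0
        self._lock = threading.Lock()  # emulates atomicity of the update

    def fetch_and_add(self, amount):
        """Atomically return the current value and advance it by `amount`."""
        with self._lock:
            old = self._value
            self._value += amount
            return old


def run_self_scheduling(total_iterations, workers, chunk_size):
    """Each worker claims its own chunks; no master hands out work."""
    next_start = SharedCounter()
    executed = [[] for _ in range(workers)]

    def worker(rank):
        while True:
            start = next_start.fetch_and_add(chunk_size)
            if start >= total_iterations:
                break  # the loop is exhausted
            end = min(start + chunk_size, total_iterations)
            executed[rank].append((start, end))  # a real worker would run the iterations here

    threads = [threading.Thread(target=worker, args=(r,)) for r in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return executed


if __name__ == "__main__":
    for rank, chunks in enumerate(run_self_scheduling(103, workers=4, chunk_size=10)):
        print(rank, chunks)
```

Because every claim goes through one atomic update, each iteration is assigned exactly once regardless of how the workers interleave; which worker gets which chunk varies from run to run.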

arXiv.org e-Print Archive

Crossref

edoc

SKIRT: hybrid parallelization of radiative transfer simulations

Author: Baes Maarten
Camps Peter
Van De Putte Dries
Verstocken Sam
Publication venue: 'Elsevier BV'
Publication date: 01/01/2017

We describe the design, implementation and performance of the new hybrid parallelization scheme in our Monte Carlo radiative transfer code SKIRT, which has been used extensively for modeling the continuum radiation of dusty astrophysical systems including late-type galaxies and dusty tori. The hybrid scheme combines distributed memory parallelization, using the standard Message Passing Interface (MPI) to communicate between processes, and shared memory parallelization, providing multiple execution threads within each process to avoid duplication of data structures. The synchronization between multiple threads is accomplished through atomic operations without high-level locking (also called lock-free programming). This improves the scaling behavior of the code and substantially simplifies the implementation of the hybrid scheme. The result is an extremely flexible solution that adjusts to the number of available nodes, processors and memory, and consequently performs well on a wide variety of computing architectures. Comment: 21 pages, 20 figures

arXiv.org e-Print Archive

Ghent University Academic Bibliography

Density Functional Theory calculation on many-cores hybrid CPU-GPU architectures

Author: Alexey Neelov
Jean-François Méhaut
Luigi Genovese
Matthieu Ospici
Stefan Goedecker
Thierry Deutsch
Publication date: 01/01/2009

The implementation of a full electronic structure calculation code on a hybrid parallel architecture with Graphic Processing Units (GPU) is presented. The code on which our implementation is based is a GNU-GPL code based on Daubechies wavelets. It shows very good performance, systematic convergence properties and excellent efficiency on parallel computers. Our GPU-based acceleration fully preserves all these properties. In particular, the code is able to run on many cores, which may or may not have an associated GPU. It is thus able to run in parallel and massively parallel hybrid environments, including those with a non-homogeneous CPU/GPU ratio. With double precision calculations, we may achieve considerable speedup, between a factor of 20 for some operations and a factor of 6 for the whole DFT code. Comment: 14 pages, 8 figures

arXiv.org e-Print Archive

CiteSeerX

Crossref

Hal - Université Grenoble Alpes

INRIA a CCSD electronic archive server

edoc

HAL-CEA

Green and efficient RAN architectures

Author: Cardona Narcis
Chatzinotas Symeon
Correia Luis
Deruyck Margot
Garcia Concepción
Garcia-Lozano Mario
Gonzalez David
Grazioso Paolo
Joseph Wout
Lema Maria
Papaj Jan
Ruiz Silvia
Studer Lucio
Velez Fernando J
Publication venue: 'River Publishers'
Publication date: 01/01/2016

Ghent University Academic Bibliography