Load Balancing for Parallel Loops in Workstation Clusters

Author: Kim Tae-Hyung
Purtilo James M.
Publication date: 15/10/1998

Load imbalance is a serious impediment to achieving good performance in parallel processing. Global load balancing schemes do not adequately balance the parallel tasks generated from a single application. Dynamic loop scheduling methods are known to be useful in balancing parallel loops on shared-memory multiprocessor machines. However, their centralized nature causes a bottleneck for the relatively small number of processors in workstation clusters because of order-of-magnitude differences in communication overheads. Moreover, improvements to basic loop scheduling methods have not dealt effectively with irregularly distributed workloads in parallel loops, which commonly occur in applications for workstation clusters. In this paper, we present a new decentralized balancing method for parallel loops on workstation clusters. (Also cross-referenced as UMIACS-TR-96-6

Digital Repository at the University of Maryland
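
The abstract above contrasts centralized dynamic loop scheduling with a decentralized method but does not spell out either. As a rough illustration of the centralized family only, the sketch below uses guided self-scheduling, a classic rule in which each request receives ceil(remaining / P) iterations. It is not the method proposed by Kim and Purtilo, and the function name `guided_chunks` and the example sizes are assumptions made for illustration.

```python
import math


def guided_chunks(total_iterations, num_workers):
    """Yield (start, size) chunks using guided self-scheduling.

    Hypothetical helper: each request to the central scheduler gets
    ceil(remaining / num_workers) iterations, so chunks shrink as the
    loop drains. Every request goes through one shared counter, which
    is the centralization the abstract describes as a bottleneck when
    communication is expensive.
    """
    start = 0
    remaining = total_iterations
    while remaining > 0:
        size = math.ceil(remaining / num_workers)
        yield start, size
        start += size
        remaining -= size


# Example: 100 loop iterations shared among 4 workers (assumed numbers).
chunks = list(guided_chunks(100, 4))
sizes = [size for _, size in chunks]
print(sizes)
```

Large early chunks keep the number of scheduler requests low, while small late chunks even out finishing times. Each request still costs a round trip to the central scheduler, which matters far more on a workstation cluster than on a shared-memory machine.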

Optimisation of a parallel ocean general circulation model

Author: Beare MI
Stevens DP
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/1997

Abstract. This paper presents the development of a general-purpose parallel ocean circulation model, for use on a wide range of computer platforms, from traditional scalar machines to workstation clusters and massively parallel processors. Parallelism is provided, as a modular option, via high-level message-passing routines, thus hiding the technical intricacies from the user. An initial implementation highlights that the parallel efficiency of the model is adversely affected by a number of factors, for which optimisations are discussed and implemented. The resulting ocean code is portable and, in particular, allows science to be achieved on local workstations that could otherwise only be undertaken on state-of-the-art supercomputers

Crossref

Directory of Open Access Journals

HAL-INSU

University of East Anglia digital repository

Designing a scalable dynamic load-balancing algorithm for pipelined single program multiple data applications on a non-dedicated heterogeneous network of workstations

Author: Osman Ashraf
Publication venue: The Research Repository @ WVU
Publication date: 01/12/2003

Dynamic load balancing strategies have been shown to be the most critical part of an efficient implementation of various applications on large distributed computing systems. The need for dynamic load balancing strategies increases when the underlying hardware is a non-dedicated heterogeneous network of workstations (HNOW). This research focuses on the single program multiple data (SPMD) programming model, as it has been extensively used in parallel programming for its simplicity and scalability in terms of computational power and memory size. This dissertation formally defines and addresses the problem of designing a scalable dynamic load-balancing algorithm for pipelined SPMD applications on non-dedicated HNOW. During this process, the HNOW parameters, SPMD application characteristics, and load-balancing performance parameters are identified. The dissertation presents a taxonomy that categorizes general load balancing algorithms and a methodology that facilitates creating new algorithms that can harness the HNOW computing power and still preserve the scalability of the SPMD application. The dissertation devises a new algorithm, DLAH (Dynamic Load-balancing Algorithm for HNOW). DLAH is based on a modified diffusion technique, which incorporates the HNOW parameters. An analytical performance bound for the worst-case scenario of the diffusion technique has been derived. The dissertation develops and utilizes an HNOW simulation model to conduct extensive simulations. These simulations were used to validate DLAH and compare its performance to related dynamic algorithms. The simulation results show that the DLAH algorithm is scalable and performs well for both homogeneous and heterogeneous networks. A detailed sensitivity analysis was conducted to study the effects of key parameters on performance

The Research Repository @ WVU (West Virginia University)
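
The abstract above says DLAH builds on a modified diffusion technique but gives no details of the modification. For orientation, the sketch below shows only the plain first-order diffusion scheme on a homogeneous ring, in which every node moves a fraction `alpha` of each load difference to or from its neighbours. It is not DLAH and ignores the HNOW parameters; the names `diffusion_step` and `ring`, the topology, and the value of `alpha` are all assumptions chosen for illustration.

```python
def diffusion_step(loads, neighbors, alpha):
    """One synchronous step of plain first-order diffusion (not DLAH).

    Each node i updates its load by alpha * (load_j - load_i) for every
    neighbour j. Because the exchange is symmetric, total load is
    conserved.
    """
    new_loads = list(loads)
    for i, load in enumerate(loads):
        for j in neighbors[i]:
            new_loads[i] += alpha * (loads[j] - load)
    return new_loads


def ring(n):
    """Hypothetical topology: node i talks to its two ring neighbours."""
    return {i: [(i - 1) % n, (i + 1) % n] for i in range(n)}


# Example: all work starts on node 0 of a 4-node ring (assumed numbers).
loads = [100.0, 0.0, 0.0, 0.0]
topology = ring(4)
for _ in range(50):
    loads = diffusion_step(loads, topology, alpha=0.25)
print(loads)
```

Each node needs only its neighbours' loads, so no central coordinator is involved. The price is that balance is reached over several rounds rather than in one step, and `alpha` must be small enough relative to the node degree for the iteration to converge.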

CRAUL: Compiler and Run-Time Integration for Adaptation under Load

Author
Publication venue: 'Hindawi Limited'
Publication date: 01/01/1999

Crossref

Hierarchical Parallelisation of Functional Renormalisation Group Calculations -- hp-fRG

Author: Rohe Daniel
Publication venue: 'Elsevier BV'
Publication date: 01/01/2016

The functional renormalisation group (fRG) has evolved into a versatile tool in condensed matter theory for studying important aspects of correlated electron systems. Practical applications of the method often involve a high numerical effort, motivating the question of to what extent High Performance Computing (HPC) can leverage the approach. In this work we report on a multi-level parallelisation of the underlying computational machinery and show that this can speed up the code by several orders of magnitude. This in turn can extend the applicability of the method to otherwise inaccessible cases. We exploit three levels of parallelisation: distributed computing by means of message passing (MPI), shared-memory computing using OpenMP, and vectorisation by means of SIMD (single-instruction-multiple-data) units. Results are provided for two distinct HPC platforms, namely the IBM-based BlueGene/Q system JUQUEEN and an Intel Sandy-Bridge-based development cluster. We discuss how certain issues and obstacles were overcome in the course of adapting the code. Most importantly, we conclude that this vast improvement can actually be accomplished by introducing only moderate changes to the code, such that this strategy may serve as a guideline for other researchers to likewise improve the efficiency of their codes

arXiv.org e-Print Archive

Crossref

Juelich Shared Electronic Resources

The Parallel Implementation of a Full Configuration Interaction Program

Author: Alexeev Yuri
Gan Zhengting
Gordon Mark
Kendall Ricky
Publication venue: Iowa State University Digital Repository
Publication date: 01/07/2003

Both the replicated and distributed data parallel full configuration interaction (FCI) implementations are described. The implementation of the FCI algorithm is organized in a hybrid strings-integral driven approach. Redundant communication is avoided, and the network performance is further optimized by an improved distributed data interface library. Examples show linear scalability of the distributed data code on both PC and workstation clusters. The new parallel implementation greatly extends the hardware on which parallel FCI calculations can be performed. The timing data on the workstation cluster show great potential for using the new parallel FCI algorithm in expanding applications of complete active space self-consistent field applications

Digital Repository @ Iowa State University (ISU)

Crossref

Parallel Implementation of the PHOENIX Generalized Stellar Atmosphere Program

Author: Baron E.
France Allard
Peter H. Hauschildt
Rybicki G. B.
Schweitzer A.
Publication venue: 'University of Chicago Press'
Publication date: 17/07/1996

We describe the parallel implementation of our generalized stellar atmosphere and NLTE radiative transfer computer program PHOENIX. We discuss the parallel algorithms we have developed for radiative transfer, spectral line opacity, and NLTE opacity and rate calculations. Our implementation uses a MIMD design based on a relatively small number of MPI library calls. We report the results of test calculations on a number of different parallel computers and discuss the results of scalability tests. Comment: To appear in ApJ, 1997, vol 483. LaTeX, 34 pages, 3 Figures, uses AASTeX macros and styles natbib.sty, and psfig.st

arXiv.org e-Print Archive

CiteSeerX

Crossref

CERN Document Server

Monitorable network and CPU load statistics and their application to scheduling

Author: Meyer Trevor Ethan
Publication venue: Iowa State University Digital Repository
Publication date: 01/01/1995

Recent trends in high-speed computing have moved towards the use of networks of workstations as a cost-effective approach to parallel computing. One recently proposed solution involves the use of an existing network of workstation-class computers as a single multiprocessor, and much research is ongoing in this area. This dissertation describes work in the area of process scheduling on networks of workstations, specifically in the area of load analysis. After extensive background in the field is presented, measures of CPU and network load are defined, and a test parallel application program is presented, written for a network-multiprocessing software package called PVM. A series of experiments is then detailed, whose goal was to discover the relationship between the run time of the test application and the loads on the participating workstations and networks. The experiments include measurement of CPU loading and network loading during test application runs, during artificially elevated loads, and during quiet conditions. Results of the experiments are presented, and the applications of the results to the problem of task scheduling are examined. It is then claimed that several easily measured load measures are useful to task scheduling, by allowing run time to be predicted within a margin of error, and by allowing limiting network segments to be detected and avoided

Digital Repository @ Iowa State University (ISU)