Run-time Support for Parallelization of Data-Parallel Applications on Adaptive and Nonuniform Computational Environments

Author: Kaddoura Maher
Ranka Sanjay
Publication venue: SURFACE at Syracuse University
Publication date: 01/01/1995

In this paper we discuss the runtime support required for the parallelization of unstructured data-parallel applications on nonuniform and adaptive environments. The approach presented is reasonably general and is applicable to a wide variety of regular as well as irregular applications. We present performance results for the solution of an unstructured mesh on a cluster of heterogeneous workstations.

Syracuse University Research Facility and Collaborative Environment

Integrating Algorithmic and Systemic Load Balancing Strategies in Parallel Scientific Applications

Author: Ghafoor Sheikh Khaled
Publication venue: Scholars Junction
Publication date: 13/12/2003

Load imbalance is a major source of performance degradation in parallel scientific applications. Load balancing increases the efficient use of existing resources and improves the performance of parallel applications running in distributed environments. At a coarse level of granularity, advances in runtime systems for parallel programs have been proposed in order to control available resources as efficiently as possible by utilizing idle resources and using task migration. At a finer level of granularity, advances in algorithmic strategies for dynamically balancing computational loads by data redistribution have been proposed in order to respond to variations in processor performance during the execution of a given parallel application. Algorithmic and systemic load balancing strategies have complementary sets of advantages. An integration of these two techniques is possible, and it should result in a system that delivers advantages over each technique used in isolation. This thesis presents the design and implementation of a system that combines an algorithmic, fine-grained, data-parallel load balancing strategy called Fractiling with a systemic, coarse-grained, task-parallel load balancing system called Hector. It also reports on experimental results of running N-body simulations under this integrated system. The experimental results indicate that a distributed runtime environment that combines both algorithmic and systemic load balancing strategies can provide performance advantages with little overhead, underscoring the importance of this approach in large, complex scientific applications.

Scholars Junction - Mississippi State University Institutional Repository

Distributed memory compiler methods for irregular problems: Data copy reuse and runtime partitioning

Author: Das Raja
Mavriplis Dimitri
Ponnusamy Ravi
Saltz Joel
Publication date: 01/10/1991

Outlined here are two methods which we believe will play an important role in any distributed memory compiler able to handle sparse and unstructured problems. We describe how to link runtime partitioners to distributed memory compilers. In our scheme, programmers can implicitly specify how data and loop iterations are to be distributed between processors. This insulates users from having to deal explicitly with potentially complex algorithms that carry out work and data partitioning. We also describe a viable mechanism for tracking and reusing copies of off-processor data. In many programs, several loops access the same off-processor memory locations. As long as it can be verified that the values assigned to off-processor memory locations remain unmodified, we show that we can effectively reuse stored off-processor data. We present experimental data from a 3-D unstructured Euler solver run on an iPSC/860 to demonstrate the usefulness of our methods.

NASA Technical Reports Server

Syracuse University Research Facility and Collaborative Environment

SpECTRE: A Task-based Discontinuous Galerkin Code for Relativistic Astrophysics

Author: Bohn Andy
Deppe Nils
Diener Peter
Field Scott E.
Foucart Francois
Hébert François
Kidder Lawrence E.
Lippuner Jonas
Miller Jonah
Ott Christian D.
Scheel Mark A.
Schnetter Erik
Teukolsky Saul A.
Vincent Trevor
Publication venue: Elsevier BV
Publication date: 15/04/2017

We introduce a new relativistic astrophysics code, SpECTRE, that combines a discontinuous Galerkin method with a task-based parallelism model. SpECTRE's goal is to achieve more accurate solutions for challenging relativistic astrophysics problems such as core-collapse supernovae and binary neutron star mergers. The robustness of the discontinuous Galerkin method allows for the use of high-resolution shock capturing methods in regions where (relativistic) shocks are found, while exploiting high-order accuracy in smooth regions. A task-based parallelism model allows efficient use of the largest supercomputers for problems with a heterogeneous workload over disparate spatial and temporal scales. We argue that the locality and algorithmic structure of discontinuous Galerkin methods will exhibit good scalability within a task-based parallelism framework. We demonstrate the code on a wide variety of challenging benchmark problems in (non)-relativistic (magneto)-hydrodynamics. We demonstrate the code's scalability including its strong scaling on the NCSA Blue Waters supercomputer up to the machine's full capacity of 22,380 nodes using 671,400 threads.

Comment: 41 pages, 13 figures, and 7 tables. Ancillary data contains simulation input file

arXiv.org e-Print Archive

Louisiana State University

Caltech Authors

Run-time and compile-time support for adaptive irregular problems

Author: Hwang Yuan-Shin
Moon Bongki
Ponnusamy Ravi
Sharma Shamik D.
Publication venue: SURFACE at Syracuse University
Publication date: 01/01/1994

In adaptive irregular problems, the data arrays are accessed via indirection arrays, and data access patterns change during computation. Implementing such problems on distributed memory machines requires support for dynamic data partitioning, efficient preprocessing, and fast data migration. This research presents efficient runtime primitives for such problems. This new set of primitives is part of the CHAOS library. It subsumes the previous PARTI library, which targeted only static irregular problems. To demonstrate the efficacy of the runtime support, two real adaptive irregular applications have been parallelized using CHAOS primitives: a molecular dynamics code (CHARMM) and a particle-in-cell code (DSMC). The paper also proposes extensions to Fortran D which can allow compilers to generate more efficient code for adaptive problems. These language extensions have been implemented in the Syracuse Fortran 90D/HPF prototype compiler. The performance of the compiler-parallelized codes is compared with the hand-parallelized versions.

Syracuse University Research Facility and Collaborative Environment

Performance and Memory Space Optimizations for Embedded Systems

Author: Yemliha Taylan
Publication venue: SURFACE at Syracuse University
Publication date: 01/01/2011

Embedded systems have three common principles: real-time performance, low power consumption, and low price (limited hardware). Embedded computers use chip multiprocessors (CMPs) to meet these expectations. However, one of the major problems is the lack of efficient software support for CMPs; in particular, automated code parallelizers are needed. The aim of this study is to explore various ways to increase performance, as well as to reduce resource usage and energy consumption, for embedded systems. We use code restructuring, loop scheduling, data transformation, code and data placement, and scratch-pad memory (SPM) management as our tools in different embedded system scenarios. The majority of our work is focused on loop scheduling. The main contributions of our work are as follows. We propose a memory saving strategy that exploits the value locality in array data by storing arrays in a compressed form. Based on the compressed forms of the input arrays, our approach automatically determines the compressed forms of the output arrays and also automatically restructures the code. We propose and evaluate a compiler-directed code scheduling scheme, which considers both parallelism and data locality. It analyzes the code using a locality parallelism graph representation, and assigns the nodes of this graph to processors. We also introduce an Integer Linear Programming based formulation of the scheduling problem. We propose a compiler-based, SPM-conscious loop scheduling strategy for array/loop based embedded applications. The method is to distribute loop iterations across parallel processors in an SPM-conscious manner. The compiler identifies potential SPM hits and misses, and distributes loop iterations such that the processors have close execution times. We present an SPM management technique using Markov chain based data access. We propose a compiler-directed integrated code and data placement scheme for 2-D mesh based CMP architectures. Using a Code-Data Affinity Graph (CDAG) to represent the relationship between loop iterations and array data, it assigns the sets of loop iterations to processing cores and sets of data blocks to on-chip memories. We present a memory bank aware dynamic loop scheduling scheme for array-intensive applications. The goal is to minimize the number of memory banks needed for executing the group of loop iterations.

Syracuse University Research Facility and Collaborative Environment

Distributed Memory Compiler Methods for Irregular Problems -- Data Copy Reuse and Runtime Partitioning

Author: Das Raja
Mavriplis Dimitri
Ponnusamy Ravi
Saltz Joel
Publication venue: SURFACE at Syracuse University
Publication date: 01/10/1991

This paper outlines two methods which we believe will play an important role in any distributed memory compiler able to handle sparse and unstructured problems. We describe how to link runtime partitioners to distributed memory compilers. In our scheme, programmers can implicitly specify how data and loop iterations are to be distributed between processors. This insulates users from having to deal explicitly with potentially complex algorithms that carry out work and data partitioning. We also describe a viable mechanism for tracking and reusing copies of off-processor data. In many programs, several loops access the same off-processor memory locations. As long as it can be verified that the values assigned to off-processor memory locations remain unmodified, we show that we can effectively reuse stored off-processor data. We present experimental data from a 3-D unstructured Euler solver run on an iPSC/860 to demonstrate the usefulness of our methods.

Syracuse University Research Facility and Collaborative Environment

Runtime and language support for compiling adaptive irregular programs on distributed-memory machines

Author: Baden
Berger
Bird
Bodin
Bokhari
Bozkus
Brooks
Brooks
Chakrabarti
Chapman
Das
Das
Fox
Hiranandani
Koelbel
Lam
Leland
Lu
Mansour
Mavriplis
Mirchandaney
Nicol
Nour-Omid
Pathon
Ponnusamy
Ponnusamy
Rault
Rosing
Saltz
Saltz
V. Hanxleden
V. Hanxleden
Van Gunsteren
Venkatakrishnan
Venkatkrishnan
Vidwans
Weiner
Williams
Williams
Wilmoth
Wu
Publication venue: Wiley

Crossref

Run-time and Compile-time Support for Adaptive Irregular Problems

Author: Das Raja
Hwang Yuan-Shin
Moon Bongki
Ponnusamy Ravi
Saltz Joel
Sharma Shamik D.
Publication date: 15/10/1998

Digital Repository at the University of Maryland