Search CORE

33 research outputs found

Compilation techniques for irregular problems on parallel machines

Author: Das Subhendu
Publication venue: W&M ScholarWorks
Publication date: 01/01/1994
Field of study

Massively parallel computers have ushered in the era of teraflop computing. Even though large and powerful machines are being built, they are used by only a fraction of the computing community. The fundamental reason for this situation is that parallel machines are difficult to program. Development of compilers that automatically parallelize programs will greatly increase the use of these machines.;A large class of scientific problems can be categorized as irregular computations. In this class of computation, the data access patterns are known only at runtime, creating significant difficulties for a parallelizing compiler to generate efficient parallel codes. Some compilers with very limited abilities to parallelize simple irregular computations exist, but the methods used by these compilers fail for any non-trivial applications code.;This research presents development of compiler transformation techniques that can be used to effectively parallelize an important class of irregular programs. A central aim of these transformation techniques is to generate codes that aggressively prefetch data. Program slicing methods are used as a part of the code generation process. In this approach, a program written in a data-parallel language, such as HPF, is transformed so that it can be executed on a distributed memory machine. An efficient compiler runtime support system has been developed that performs data movement and software caching

College of William & Mary: W&M Publish

Extending HPF for advanced data parallel applications

Author: Chapman Barbara
Mehrotra Piyush
Zima Hans
Publication venue
Publication date
Field of study

The stated goal of High Performance Fortran (HPF) was to 'address the problems of writing data parallel programs where the distribution of data affects performance'. After examining the current version of the language we are led to the conclusion that HPF has not fully achieved this goal. While the basic distribution functions offered by the language - regular block, cyclic, and block cyclic distributions - can support regular numerical algorithms, advanced applications such as particle-in-cell codes or unstructured mesh solvers cannot be expressed adequately. We believe that this is a major weakness of HPF, significantly reducing its chances of becoming accepted in the numeric community. The paper discusses the data distribution and alignment issues in detail, points out some flaws in the basic language, and outlines possible future paths of development. Furthermore, we briefly deal with the issue of task parallelism and its integration with the data parallel paradigm of HPF

NASA Technical Reports Server

PARTI primitives for unstructured and block structured problems

Author: A. Sussman
André
Ashcraft
Baden
Baxter
Berger
Berryman
Brezany
Chase
Chen
Cheung
Choudhary
Chrisochoides
D. Mavriplis
Das
Foster
Fox
Fox
Gerndt
Hatcher
Havlak
Hiranandani
Hiranandani
J. Saltz
K. Crowley
Kennedy
Koelbel
Lemke
Li
Li
Li
Lonsdale
Lu
Lui
Mavriplis
Mirchandaney
Pothen
R. Das
R. Ponnusamy
Rogers
Rosing
S. Gupta
Saltz
Saltz
Saltz
Simon
Tseng
Vatsa
Williams
Williams
Zima
Publication venue: 'Elsevier BV'
Publication date
Field of study

Crossref

Distributed memory compiler methods for irregular problems: Data copy reuse and runtime partitioning

Author: Das Raja
Mavriplis Dimitri
Ponnusamy Ravi
Saltz Joel
Publication venue
Publication date: 01/10/1991
Field of study

Outlined here are two methods which we believe will play an important role in any distributed memory compiler able to handle sparse and unstructured problems. We describe how to link runtime partitioners to distributed memory compilers. In our scheme, programmers can implicitly specify how data and loop iterations are to be distributed between processors. This insulates users from having to deal explicitly with potentially complex algorithms that carry out work and data partitioning. We also describe a viable mechanism for tracking and reusing copies of off-processor data. In many programs, several loops access the same off-processor memory locations. As long as it can be verified that the values assigned to off-processor memory locations remain unmodified, we show that we can effectively reuse stored off-processor data. We present experimental data from a 3-D unstructured Euler solver run on iPSC/860 to demonstrate the usefulness of our methods

NASA Technical Reports Server

Syracuse University Research Facility and Collaborative Environment

Scalability and Performance Analysis of OpenMP Codes Using the Periscope Toolkit

Author: Benedict Shajulin
Gerndt Michael
Publication venue: Institute of Informatics, Slovak Academy of Sciences
Publication date: 10/02/2015
Field of study

In this paper, we present two new approaches while rendering necessary extensions to Periscope to perform scalability and performance analysis on OpenMP codes. Periscope is an online-based performance analysis toolkit which consists of a user defined number of analysis agents that automatically search for the performance properties while the application is running. In order to detect the scalability and performance bottlenecks of OpenMP codes using Periscope, a few newly defined performance properties and meta properties are formalized. We manifest our implementation by evaluating NAS OpenMP benchmarks. As shown in our results, our approach identifies the code regions which do not scale well and other performance problems, e.g. load imbalance in NAS parallel benchmarks

Computing and Informatics (E-Journal - Institute of Informatics, SAS, Bratislava)

Comparing Solution Methods for Dynamic Equilibrium Economies

Author: Jesus Fernandez-Villaverde
Juan F. Rubio-Ramirez
S. Boragan Aruoba
Publication venue
Publication date
Field of study

This paper compares solution methods for dynamic equilibrium economies. We compute and simulate the stochastic neoclassical growth model with leisure choice using Undetermined Coefficients in levels and in logs, Finite Elements, Chebyshev Polynomials, Second and Fifth Order Perturbations and Value Function Iteration for several calibrations. We document the performance of the methods in terms of computing time, implementation complexity and accuracy and we present some conclusions about our preferred approaches based on the reported evidence.Dynamic Equilibrium Economies, Computational Methods, Linear and Nonlinear Solution Methods

Research Papers in Economics

ARTICLE NO. PC971367 A Library-Based Approach to Task Parallelism in a Data-Parallel Language

Author: Alok Choudhary
David R. Kohr
Ian Foster
Rakesh Krishnaiyer
Publication venue
Publication date
Field of study

Pure data-parallel languages such as High Performance Fortran version 1 (HPF) do not allow efficient expression of mixed task/data-parallel computations or the coupling of separately compiled data-parallel modules. In this paper, we show how these common parallel program structures can be represented, with only minor extensions to the HPF model, by using a coordination library based on the Message Passing Interface (MPI). This library allows data-parallel tasks to exchange distributed data structures using calls to simple communication functions. We present microbenchmark results that characterize the performance of this library and that quantify the impact of optimizations that allow reuse of communication schedules in common situations. In addition, results from two-dimensional FFT, convolution, and multiblock programs demonstrate that the HPF/ MPI library can provide performance superior to that of pure HPF. We conclude that this synergistic combination of two parallel programming standards represents a useful approach to task parallelism in a data-parallel framework, increasing the range of problems addressable in HPF without requiring complex compile

CiteSeerX

Compiling global name-space programs for distributed execution

Author: Koelbel Charles
Mehrotra Piyush
Publication venue
Publication date
Field of study

Distributed memory machines do not provide hardware support for a global address space. Thus programmers are forced to partition the data across the memories of the architecture and use explicit message passing to communicate data between processors. The compiler support required to allow programmers to express their algorithms using a global name-space is examined. A general method is presented for analysis of a high level source program and along with its translation to a set of independently executing tasks communicating via messages. If the compiler has enough information, this translation can be carried out at compile-time. Otherwise run-time code is generated to implement the required data movement. The analysis required in both situations is described and the performance of the generated code on the Intel iPSC/2 is presented

NASA Technical Reports Server

Recommended from our members

Final report on the Copper Mountain conference on multigrid methods

Author
Publication venue: Front Range Scientific Computations, Inc., Denver, CO (United States). Dept. of Mathematics
Publication date: 01/10/1997
Field of study

The Copper Mountain Conference on Multigrid Methods was held on April 6-11, 1997. It took the same format used in the previous Copper Mountain Conferences on Multigrid Method conferences. Over 87 mathematicians from all over the world attended the meeting. 56 half-hour talks on current research topics were presented. Talks with similar content were organized into sessions. Session topics included: fluids; domain decomposition; iterative methods; basics; adaptive methods; non-linear filtering; CFD; applications; transport; algebraic solvers; supercomputing; and student paper winners

UNT Digital Library

Distributed Memory Compiler Methods for Irregular Problems -- Data Copy Reuse and Runtime Partitioning

Author: Das Raja
Mavriplis Dimitri
Ponnusamy Ravi
Saltz Joel
Publication venue: SURFACE at Syracuse University
Publication date: 01/10/1991
Field of study

This paper outlines two methods which we believe will play an important role in any distributed memory compiler able to handle sparse and unstructured problems. We describe how to link runtime partitioners to distributed memory compilers. In our scheme, programmers can implicitly specify how data and loop iterations are to be distributed between processors. This insulates users from having to deal explicitly with potentially complex algorithms that carry out work and data partitioning. We also describe a viable mechanism for tracking and reusing copies of off-processor data. In many programs, several loops access the same off-processor memory locations. As long as it can be verified that the values assigned to off-processor memory locations remain unmodified, we show that we can effectively reuse stored off-processor data. We present experimental data from a 3-D unstructured Euler solver run on an iPSC/860 to demonstrate the usefulness of our methods

Syracuse University Research Facility and Collaborative Environment