Search CORE

415 research outputs found

Optimal dynamic remapping of parallel computations

Author: Nicol David M.
Reynolds Paul F., Jr.
Publication venue
Publication date
Field of study

A large class of computations are characterized by a sequence of phases, with phase changes occurring unpredictably. The decision problem was considered regarding the remapping of workload to processors in a parallel computation when the utility of remapping and the future behavior of the workload is uncertain, and phases exhibit stable execution requirements during a given phase, but requirements may change radically between phases. For these problems a workload assignment generated for one phase may hinder performance during the next phase. This problem is treated formally for a probabilistic model of computation with at most two phases. The fundamental problem of balancing the expected remapping performance gain against the delay cost was addressed. Stochastic dynamic programming is used to show that the remapping decision policy minimizing the expected running time of the computation has an extremely simple structure. Because the gain may not be predictable, the performance of a heuristic policy that does not require estimnation of the gain is examined. The heuristic method's feasibility is demonstrated by its use on an adaptive fluid dynamics code on a multiprocessor. The results suggest that except in extreme cases, the remapping decision problem is essentially that of dynamically determining whether gain can be achieved by remapping after a phase change. The results also suggest that this heuristic is applicable to computations with more than two phases

NASA Technical Reports Server

Statistical methodologies for the control of dynamic remapping

Author: Nicol D. M.
Saltz J. H.
Publication venue
Publication date
Field of study

Following an initial mapping of a problem onto a multiprocessor machine or computer network, system performance often deteriorates with time. In order to maintain high performance, it may be necessary to remap the problem. The decision to remap must take into account measurements of performance deterioration, the cost of remapping, and the estimated benefits achieved by remapping. We examine the tradeoff between the costs and the benefits of remapping two qualitatively different kinds of problems. One problem assumes that performance deteriorates gradually, the other assumes that performance deteriorates suddenly. We consider a variety of policies for governing when to remap. In order to evaluate these policies, statistical models of problem behaviors are developed. Simulation results are presented which compare simple policies with computationally expensive optimal decision policies; these results demonstrate that for each problem type, the proposed simple policies are effective and robust

NASA Technical Reports Server

RIACS

Author: Oliger Joseph
Publication venue
Publication date
Field of study

Topics considered include: high-performance computing; cognitive and perceptual prostheses (computational aids designed to leverage human abilities); autonomous systems. Also included: development of a 3D unstructured grid code based on a finite volume formulation and applied to the Navier-stokes equations; Cartesian grid methods for complex geometry; multigrid methods for solving elliptic problems on unstructured grids; algebraic non-overlapping domain decomposition methods for compressible fluid flow problems on unstructured meshes; numerical methods for the compressible navier-stokes equations with application to aerodynamic flows; research in aerodynamic shape optimization; S-HARP: a parallel dynamic spectral partitioner; numerical schemes for the Hamilton-Jacobi and level set equations on triangulated domains; application of high-order shock capturing schemes to direct simulation of turbulence; multicast technology; network testbeds; supercomputer consolidation project

NASA Technical Reports Server

Load Balancing Unstructured Adaptive Grids for CFD Problems

Author: Biswas Rupak
Oliker Leonid
Publication venue
Publication date
Field of study

Mesh adaption is a powerful tool for efficient unstructured-grid computations but causes load imbalance among processors on a parallel machine. A dynamic load balancing method is presented that balances the workload across all processors with a global view. After each parallel tetrahedral mesh adaption, the method first determines if the new mesh is sufficiently unbalanced to warrant a repartitioning. If so, the adapted mesh is repartitioned, with new partitions assigned to processors so that the redistribution cost is minimized. The new partitions are accepted only if the remapping cost is compensated by the improved load balance. Results indicate that this strategy is effective for large-scale scientific computations on distributed-memory multiprocessors

NASA Technical Reports Server

Principles for problem aggregation and assignment in medium scale multiprocessors

Author: Nicol David M.
Saltz Joel H.
Publication venue
Publication date
Field of study

One of the most important issues in parallel processing is the mapping of workload to processors. This paper considers a large class of problems having a high degree of potential fine grained parallelism, and execution requirements that are either not predictable, or are too costly to predict. The main issues in mapping such a problem onto medium scale multiprocessors are those of aggregation and assignment. We study a method of parameterized aggregation that makes few assumptions about the workload. The mapping of aggregate units of work onto processors is uniform, and exploits locality of workload intensity to balance the unknown workload. In general, a finer aggregate granularity leads to a better balance at the price of increased communication/synchronization costs; the aggregation parameters can be adjusted to find a reasonable granularity. The effectiveness of this scheme is demonstrated on three model problems: an adaptive one-dimensional fluid dynamics problem with message passing, a sparse triangular linear system solver on both a shared memory and a message-passing machine, and a two-dimensional time-driven battlefield simulation employing message passing. Using the model problems, the tradeoffs are studied between balanced workload and the communication/synchronization costs. Finally, an analytical model is used to explain why the method balances workload and minimizes the variance in system behavior

NASA Technical Reports Server

Optimal pre-scheduling of problem remappings

Author: Nicol David M.
Saltz Joel H.
Publication venue
Publication date
Field of study

A large class of scientific computational problems can be characterized as a sequence of steps where a significant amount of computation occurs each step, but the work performed at each step is not necessarily identical. Two good examples of this type of computation are: (1) regridding methods which change the problem discretization during the course of the computation, and (2) methods for solving sparse triangular systems of linear equations. Recent work has investigated a means of mapping such computations onto parallel processors; the method defines a family of static mappings with differing degrees of importance placed on the conflicting goals of good load balance and low communication/synchronization overhead. The performance tradeoffs are controllable by adjusting the parameters of the mapping method. To achieve good performance it may be necessary to dynamically change these parameters at run-time, but such changes can impose additional costs. If the computation's behavior can be determined prior to its execution, it can be possible to construct an optimal parameter schedule using a low-order-polynomial-time dynamic programming algorithm. Since the latter can be expensive, the performance is studied of the effect of a linear-time scheduling heuristic on one of the model problems, and it is shown to be effective and nearly optimal

NASA Technical Reports Server

RIACS

Author: Moore Robert C.
Publication venue
Publication date
Field of study

The Research Institute for Advanced Computer Science (RIACS) was established by the Universities Space Research Association (USRA) at the NASA Ames Research Center (ARC) on June 6, 1983. RIACS is privately operated by USRA, a consortium of universities that serves as a bridge between NASA and the academic community. Under a five-year co-operative agreement with NASA, research at RIACS is focused on areas that are strategically enabling to the Ames Research Center's role as NASA's Center of Excellence for Information Technology. The primary mission of RIACS is charted to carry out research and development in computer science. This work is devoted in the main to tasks that are strategically enabling with respect to NASA's bold mission in space exploration and aeronautics. There are three foci for this work: (1) Automated Reasoning. (2) Human-Centered Computing. and (3) High Performance Computing and Networking. RIACS has the additional goal of broadening the base of researcher in these areas of importance to the nation's space and aeronautics enterprises. Through its visiting scientist program, RIACS facilitates the participation of university-based researchers, including both faculty and students, in the research activities of NASA and RIACS. RIACS researchers work in close collaboration with NASA computer scientists on projects such as the Remote Agent Experiment on Deep Space One mission, and Super-Resolution Surface Modeling

NASA Technical Reports Server

Hypergraph-partitioning-based remapping models for image-space-parallel direct volume rendering of unstructured grids

Author: Aykanat C.
Cambazoglu B.B.
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2007
Field of study

In this work, image-space-parallel direct volume rendering (DVR) of unstructured grids is investigated for distributed-memory architectures. A hypergraph-partitioning-based model is proposed for the adaptive screen partitioning problem in this context. The proposed model aims to balance the rendering loads of processors while trying to minimize the amount of data replication. In the parallel DVR framework we adopted, each data primitive is statically owned by its home processor, which is responsible from replicating its primitives on other processors. Two appropriate remapping models are proposed by enhancing the above model for use within this framework. These two remapping models aim to minimize the total volume of communication in data replication while balancing the rendering loads of processors. Based on the proposed models, a parallel DVR algorithm is developed. The experiments conducted on a PC cluster show that the proposed remapping models achieve better speedup values compared to the remapping models previously suggested for image-space-parallel DVR. © 2007 IEEE

Bilkent University Institutional Repository

Experiences with Mesh-like computations using Prediction Binary Trees

Author: CORDASCO G
COSENZA B
DE CHIARA R
ERRA UGO
SCARANO V.
Publication venue: Warszawa : Szkoła Wyższa Psychologii Społecznej.
Publication date: 01/01/2009
Field of study

In this paper we aim at exploiting the temporal coherence among successive phases of a computation, in order to implement a load-balancing technique in mesh-like computations to be mapped on a cluster of processors. A key concept, on which the load balancing schema is built on, is the use of a Predictor component that is in charge of providing an estimation of the unbalancing between successive phases. By using this information, our method partitions the computation in balanced tasks through the Prediction Binary Tree (PBT). At each new phase, current PBT is updated by using previous phase computing time for each task as next phase's cost estimate. The PBT is designed so that it balances the load across the tasks as well as reduces {\em dependency} among processors for higher performances. Reducing dependency is obtained by using rectangular tiles of the mesh, of almost-square shape (i. e. one dimension is at most twice the other). By reducing dependency, one can reduce inter-processors communication or exploit local dependencies among tasks (such as data locality). Furthermore, we also provide two heuristics which take advantage of data-locality. Our strategy has been assessed on a significant problem, Parallel Ray Tracing. Our implementation shows a good scalability, and improves performance in both cheaper commodity cluster and high performance clusters with low latency networks. We report different measurements showing that tasks granularity is a key point for the performances of our decomposition/mapping strategy

Archivio della Ricerca - Università della Basilicata

Archivio della Ricerca - Università di Salerno

Archivio Istituzionale della Ricerca - Università degli Studi della Campania "Luigi Vanvitelli"

A hypergraph-partitioning based remapping model for image-space parallel volume rendering

Author: Cambazoğlu Berkant Barla
Publication venue: Bilkent University
Publication date: 01/01/2000
Field of study

Ankara : The Department of Computer Engineering and the Institute of Engineering and Science of Bilkent Univ., 2000.Thesis (Master's) -- Bilkent University, 2000.Includes bibliographical references leaves 72-76.Cambazoğlu, Berkant BarlaM.S

Bilkent University Institutional Repository