5 research outputs found

    An assumed partition algorithm for determining processor inter-communication

    Get PDF
    The recent advent of parallel machines with tens of thousands of processors is presenting new challenges for obtaining scalability. A particular challenge for large-scale scientific software is determining the inter-processor communications required by the computation when a global description of the data is unavailable or too costly to store. We present a type of rendezvous algorithm that determines communication partners in a scalable manner by assuming the global distribution of the data. We demonstrate the scaling properties of the algorithm on up to 32,000 processors in the context of determining communication patterns for a matrix-vector multiply in the hypre software library. Our algorithm is very general and is applicable to a variety of situations in parallel computing

    Interprocessor Communication with Limited Memory

    No full text
    Abstract—Many parallel applications require periodic redistribution of workloads and associated data. In a distributed memory computer, this redistribution can be difficult if limited memory is available for receiving messages. We propose a model for optimizing the exchange of messages under such circumstances which we call the minimum phase remapping problem. We first show that the problem is NP-Complete, and then analyze several methodologies for addressing it. First, we show how the problem can be phrased as an instance of multicommodity flow. Next, we study a continuous approximation to the problem. We show that this continuous approximation has a solution which requires at most two more phases than the optimal discrete solution, but the question of how to consistently obtain a good discrete solution from the continuous problem remains open. We also devise a simple and practical approximation algorithm for the problem with a bound of 1.5 times the optimal number of phases. We also present an empirical study of variations of our algorithms which indicate that our approaches are quite practical. Index Terms—Interprocessor communication, dynamic load balancing, data migration, scheduling, NP-completeness, approximation algorithms.
    corecore