LDRD report: parallel repartitioning for optimal solver performance.
We have developed infrastructure, utilities and partitioning methods to improve data partitioning in linear solvers and preconditioners. Our efforts included incorporation of data repartitioning capabilities from the Zoltan toolkit into the Trilinos solver framework (allowing dynamic repartitioning of Trilinos matrices); implementation of efficient distributed data directories and unstructured communication utilities in Zoltan and Trilinos; development of a new multi-constraint geometric partitioning algorithm (which can generate one decomposition that is good with respect to multiple criteria); and research into hypergraph partitioning algorithms (which provide up to 56% reduction of communication volume compared to graph partitioning for a number of emerging applications). This report includes descriptions of the infrastructure and algorithms developed, along with results demonstrating the effectiveness of our approaches.
Communication Support for Adaptive Computation
Ali Pinar and Bruce Hendrickson
1 Introduction
In this work we address two problems associated with redistributing data amongst processors. The first problem is that of determining the inter-processor communication pattern necessary to perform a calculation like matrix-vector multiplication. Consider the situation when a calculation is first described or when it is repartitioned after dynamic load balancing. Processors do not know what communication operations to perform to enable the matrix-vector multiplication to proceed. Assuming the matrix is partitioned by rows, each processor can determine what it needs to receive by looking at its own domain, but it does not know which processor owns the desired data. We propose a distributed directory algorithm to efficiently determine the communication pattern (i.e., what a processor needs to receive from and send to every other processor). Our experiments show that the proposed algorithm performs efficiently on large numbers of processors.
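The directory idea described above can be illustrated with a small sequential simulation. This is only a sketch under stated assumptions, not the authors' algorithm: the excerpt does not give its details, so the modulo rule in `directory_rank`, the function names, and the three-phase structure are all hypothetical, and plain Python lists stand in for real message passing between processors.

```python
from collections import defaultdict


def directory_rank(index, num_procs):
    # Assumed placement rule: the directory entry for a global index lives
    # on processor (index mod P). Any rule that every processor can evaluate
    # locally would serve the same purpose.
    return index % num_procs


def build_comm_pattern(owned, needed):
    """Simulate a directory lookup for P processors.

    owned[p]  : set of global indices owned by processor p
    needed[p] : set of off-processor indices that processor p must receive
    Returns (recv_from, send_to), where recv_from[p][q] lists the indices
    p receives from q, and send_to[q][p] lists the indices q sends to p.
    """
    num_procs = len(owned)

    # Phase 1: every owner registers its indices with the directory processor.
    directory = [dict() for _ in range(num_procs)]
    for p in range(num_procs):
        for i in owned[p]:
            directory[directory_rank(i, num_procs)][i] = p

    # Phase 2: every processor asks the directory who owns what it needs.
    recv_from = [defaultdict(list) for _ in range(num_procs)]
    for p in range(num_procs):
        for i in sorted(needed[p]):
            owner = directory[directory_rank(i, num_procs)][i]
            recv_from[p][owner].append(i)

    # Phase 3: requesters inform owners, which yields the send lists.
    send_to = [defaultdict(list) for _ in range(num_procs)]
    for p in range(num_procs):
        for q, indices in recv_from[p].items():
            send_to[q][p] = list(indices)

    return recv_from, send_to


# Hypothetical example: three processors, two vector entries each.
owned = [{0, 1}, {2, 3}, {4, 5}]
needed = [{2, 5}, {0}, {1, 3}]
recv_from, send_to = build_comm_pattern(owned, needed)
```

In this sketch no processor ever needs a global table of ownership: each one only computes `directory_rank` locally and exchanges messages with the processors that rule selects, which is the property that makes a directory attractive when the number of processors is large.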