Bandwidth optimal all-reduce algorithms for clusters of workstations

Author: Bar-Noy
Bruck
Faraj
Gropp
Iannello
Karwande
Knodel
Lane
Patarasuk
Pitch Patarasuk
Rabenseifner
Thakur
van de Geijn
Xin Yuan
Yuan
Publication venue: 'Elsevier BV'

A Study of Process Arrival Patterns for MPI Collective Operations

Author: Ahmad Faraj
Pitch Patarasuk
Xin Yuan
Publication date: 01/01/2007

The process arrival pattern, which denotes the times at which different processes arrive at an MPI collective operation, can have a significant impact on the performance of the operation. In this work, we characterize the process arrival patterns in a set of MPI programs on two common cluster platforms, use a micro-benchmark to study the process arrival patterns in MPI programs with balanced loads, and investigate the impact of the process arrival pattern on collective algorithms. Our results show that (1) the differences between the times when different processes arrive at a collective operation are usually large enough to significantly affect performance; (2) application developers in general cannot effectively control the process arrival patterns in their MPI programs in cluster environments: balancing loads at the application level does not balance the process arrival patterns; and (3) the performance of collective communication algorithms is sensitive to process arrival patterns. These results indicate that the process arrival pattern is an important factor that must be taken into consideration in developing and optimizing MPI collective routines. We propose a scheme that achieves high performance with different process arrival patterns, and demonstrate that by explicitly considering the process arrival pattern, more efficient MPI collective routines than the current ones can be obtained.

CiteSeerX

Crossref

Software Approaches to Manage Resource Tradeoffs of Power and Energy Constrained Applications

Author: Medhat Ramy
Publication venue: 'University of Waterloo'
Publication date: 01/01/2017

Power and energy efficiency have become increasingly important design metrics for a wide spectrum of computing devices. Battery efficiency, which requires a mixture of energy and power efficiency, is exceedingly important, especially since there have been no groundbreaking advances in battery capacity recently. The need for energy and power efficiency stretches from small embedded devices to portable computers to large-scale data centers. The projected future of computing demand, referred to as exascale computing, demands that researchers find ways to perform exaFLOPs of computation at a power bound much lower than would be required by simply scaling today's standards. There is a large body of work on power and energy efficiency for a wide range of applications and at different levels of abstraction. However, there is a lack of work studying the nuances of the different tradeoffs that arise when operating under a power/energy budget. Moreover, there is no work on constructing a generalized model of applications running under power/energy constraints, which would allow designers to optimize their resource consumption, be it power, energy, time, bandwidth, or space. There is a need for an efficient model that can provide bounds on the optimality of an application's resource consumption, becoming a basis against which online resource management heuristics can be measured. In this thesis, we tackle the problem of managing resource tradeoffs of power/energy-constrained applications. We begin by studying the nuances of power/energy tradeoffs with the response time and throughput of stream processing applications. We then study the power-performance tradeoff of batch processing applications to identify a power configuration that maximizes performance under a power bound. Next, we study the tradeoff of power/energy with network bandwidth and precision. Finally, we study how to combine tradeoffs into a generalized model of applications running under resource constraints.
The work in this thesis presents detailed studies of the power/energy tradeoff with response time, throughput, performance, network bandwidth, and precision of stream and batch processing applications. To that end, we present an adaptive algorithm that manages stream processing tradeoffs of response time and throughput at the CPU level. At the task level, we present an online heuristic that adaptively distributes bounded power in a cluster to improve performance, as well as an offline approach to optimally bound performance. We demonstrate how power can be used to reduce bandwidth bottlenecks and extend our offline approach to model bandwidth tradeoffs. Moreover, we present a tool that identifies parts of a program that can be downgraded in precision with minimal impact on accuracy and maximal impact on energy consumption. Finally, we combine all the above tradeoffs into a flexible model that is efficient to solve and allows for bounding and/or optimizing the consumption of different resources.

University of Waterloo's Institutional Repository