6 research outputs found

    One-Sided Communication for High Performance Computing Applications

    Thesis (Ph.D.) - Indiana University, Computer Sciences, 2009. Parallel programming presents a number of critical challenges to application developers. Traditionally, message passing, in which one process explicitly sends data and another explicitly receives it, has been used to program parallel applications. With the recent growth in multi-core processors, the level of parallelism necessary for next-generation machines is cause for concern in the message passing community. The one-sided programming paradigm, in which only one of the two processes involved in communication actively participates in message transfer, has seen increased interest as a potential replacement for message passing. One-sided communication does not carry the heavy per-message overhead associated with modern message passing libraries. The paradigm offers lower synchronization costs and advanced data manipulation techniques such as remote atomic arithmetic and synchronization operations. These combine to present an appealing interface for applications with random communication patterns, which traditionally present message passing implementations with difficulties. This thesis presents a taxonomy of both the one-sided paradigm and of applications that are ideal for the one-sided interface. Three case studies, based on real-world applications, are used to motivate both taxonomies and verify the applicability of the MPI one-sided communication and Cray SHMEM one-sided interfaces to real-world problems. While our results show a number of shortcomings with existing implementations, they also suggest that a number of applications could benefit from the one-sided paradigm. Finally, an implementation of the MPI one-sided interface within Open MPI is presented, which provides a number of unique performance features necessary for efficient use of the one-sided programming paradigm.

    A performance evaluation of lock-free synchronization protocols

    In this paper, we investigate the practical performance of lock-free techniques that provide synchronization on shared-memory multiprocessors. Our goal is to provide a technique to allow designers of new protocols to quickly determine an algorithm’s performance characteristics. We develop a simple analytical performance model based on the architectural observations that memory accesses are expensive, synchronization instructions are more expensive, and that optimistic synchronization policies result in wasted communication bandwidth which can slow the system as a whole. Using our model, we evaluate the performance of five existing lock-free synchronization protocols. We validate our analysis by comparing our results with simulations of a parallel machine. Given this analysis, we identify those protocols which show promise of good performance in practice. In addition, we note that no existing protocols provide insensitivity to common delays while still offering performance equivalent to locks. Accordingly, we introduce a protocol, based on a combination of existing lock-free techniques, which satisfies these criteria.
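    The abstract's point that optimistic synchronization wastes work under contention can be illustrated with a small sketch. This is not one of the five protocols the paper evaluates, and it is not the paper's analytical model; it is a generic compare-and-swap retry loop. Because Python exposes no hardware compare-and-swap, the hypothetical `AtomicCell` class below simulates atomicity with a private lock, so the example shows the retry pattern and counts wasted attempts rather than demonstrating true lock-freedom. All names (`AtomicCell`, `optimistic_increment`, `run`) are assumptions made for illustration.

```python
import threading


class AtomicCell:
    """Simulated atomic word (assumption: a private lock stands in for the
    atomicity that a hardware compare-and-swap instruction would provide)."""

    def __init__(self, value=0):
        self._value = value
        self._guard = threading.Lock()

    def load(self):
        with self._guard:
            return self._value

    def compare_and_swap(self, expected, new):
        """Store `new` only if the cell still holds `expected`; report success."""
        with self._guard:
            if self._value != expected:
                return False
            self._value = new
            return True


def optimistic_increment(cell):
    """Retry until the CAS succeeds; return the number of wasted attempts."""
    wasted = 0
    while True:
        seen = cell.load()
        if cell.compare_and_swap(seen, seen + 1):
            return wasted
        wasted += 1  # another thread updated the cell first; this read was discarded


def run(num_threads=4, increments=1000):
    """Increment one shared cell from several threads; return (final, wasted)."""
    cell = AtomicCell()
    wasted = [0] * num_threads  # one slot per thread, so no sharing between writers

    def worker(slot):
        for _ in range(increments):
            wasted[slot] += optimistic_increment(cell)

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return cell.load(), sum(wasted)
```

    Calling `run(4, 1000)` always leaves the cell at 4000, while the wasted-attempt count varies from run to run with thread scheduling. Each wasted attempt corresponds to a read and an update attempt whose result was thrown away, which is the kind of cost the abstract attributes to optimistic policies.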
