93 research outputs found
Improving the scalability of parallel N-body applications with an event driven constraint based execution model
The scalability and efficiency of graph applications are significantly
constrained by conventional systems and their supporting programming models.
Technology trends like multicore, manycore, and heterogeneous system
architectures are introducing further challenges and possibilities for emerging
application domains such as graph applications. This paper explores the space
of effective parallel execution of ephemeral graphs that are dynamically
generated using the Barnes-Hut algorithm to exemplify dynamic workloads. The
workloads are expressed using the semantics of an Exascale computing execution
model called ParalleX. For comparison, results using conventional execution
model semantics are also presented. We find improved load balancing during
runtime and automatic parallelism discovery improving efficiency using the
advanced semantics for Exascale computing.Comment: 11 figure
UPC++: A high-performance communication framework for asynchronous computation
UPC++ is a C++ library that supports high-performance computation via an asynchronous communication framework. This paper describes a new incarnation that differs substantially from its predecessor, and we discuss the reasons for our design decisions. We present new design features, including future-based asynchrony management, distributed objects, and generalized Remote Procedure Call (RPC). We show microbenchmark performance results demonstrating that one-sided Remote Memory Access (RMA) in UPC++ is competitive with MPI-3 RMA; on a Cray XC40 UPC++ delivers up to a 25% improvement in the latency of blocking RMA put, and up to a 33% bandwidth improvement in an RMA throughput test. We showcase the benefits of UPC++ with irregular applications through a pair of application motifs, a distributed hash table and a sparse solver component. Our distributed hash table in UPC++ delivers near-linear weak scaling up to 34816 cores of a Cray XC40. Our UPC++ implementation of the sparse solver component shows robust strong scaling up to 2048 cores, where it outperforms variants communicating using MPI by up to 3.1x. UPC++ encourages the use of aggressive asynchrony in low-overhead RMA and RPC, improving programmer productivity and delivering high performance in irregular applications
GASNet Specification, v1.1
This document has been superseded by:
GASNet Specification, v1.8.1
(LBNL-2001064)
https://escholarship.org/uc/item/03b5g0q4
This GASNet specification describes a network-independent and language-independent high-performance communication interface intended for use in implementing the runtime system for global address space languages (such as UPC or Titanium)
Recommended from our members
Bulk file I/O extensions to Java
The file I/O classes present in Java have proven too inefficient to meet the demands of high-performance applications that perform large amounts of I/O. The inefficiencies stem primarily from the library interface which requires programs to read arrays a single element at a time. We present two extensions to the Java I/O libraries which alleviate this problem. The first adds bulk (array) I/O operations to the existing libraries, removing much of the overhead currently associated with array I/O. The second is a new library that adds direct support for asynchronous I/O to enable masking I/O latency with overlapped computation. The extensions were implemented in Titanium, a high-performance, parallel dialect of Java. We present experimental results that compare the performance of the extensions with the existing I/O libraries on a simple, external merge sort application. The results demonstrate that our extensions deliver vastly superior I/O performance for this array-based application
- …