6,489 research outputs found
{MAHEVE}: An Efficient Reliable Mapping of Asynchronous Iterative Applications on volatile and Heterogeneous Environments
International audienceThe asynchronous iteration model, called AIAC, has been proven to be an efficient solution for heterogeneous and distributed architectures. An efficient mapping of application tasks is essential to reduce their execution time. In this paper we present a new mapping algorithm, called MAHEVE (Mapping Algorithm for HEterogeneous and Volatile Environments) which is efficient on such architectures and integrates a fault tolerance mechanism to resist computing node failures. Our experiments show gains on a typical AIAC application execution time up to 65%, executed on distributed clusters architectures containing more than 400 computing cores with the JaceP2P-V2 environment
Performance Testing of Distributed Component Architectures
Performance characteristics, such as response time, throughput andscalability, are key quality attributes of distributed applications. Current practice,however, rarely applies systematic techniques to evaluate performance characteristics.We argue that evaluation of performance is particularly crucial in early developmentstages, when important architectural choices are made. At first glance, thiscontradicts the use of testing techniques, which are usually applied towards the endof a project. In this chapter, we assume that many distributed systems are builtwith middleware technologies, such as the Java 2 Enterprise Edition (J2EE) or theCommon Object Request Broker Architecture (CORBA). These provide servicesand facilities whose implementations are available when architectures are defined.We also note that it is the middleware functionality, such as transaction and persistenceservices, remote communication primitives and threading policy primitives,that dominates distributed system performance. Drawing on these observations, thischapter presents a novel approach to performance testing of distributed applications.We propose to derive application-specific test cases from architecture designs so thatthe performance of a distributed application can be tested based on the middlewaresoftware at early stages of a development process. We report empirical results thatsupport the viability of the approach
Improving the scalability of parallel N-body applications with an event driven constraint based execution model
The scalability and efficiency of graph applications are significantly
constrained by conventional systems and their supporting programming models.
Technology trends like multicore, manycore, and heterogeneous system
architectures are introducing further challenges and possibilities for emerging
application domains such as graph applications. This paper explores the space
of effective parallel execution of ephemeral graphs that are dynamically
generated using the Barnes-Hut algorithm to exemplify dynamic workloads. The
workloads are expressed using the semantics of an Exascale computing execution
model called ParalleX. For comparison, results using conventional execution
model semantics are also presented. We find improved load balancing during
runtime and automatic parallelism discovery improving efficiency using the
advanced semantics for Exascale computing.Comment: 11 figure
Gunrock: GPU Graph Analytics
For large-scale graph analytics on the GPU, the irregularity of data access
and control flow, and the complexity of programming GPUs, have presented two
significant challenges to developing a programmable high-performance graph
library. "Gunrock", our graph-processing system designed specifically for the
GPU, uses a high-level, bulk-synchronous, data-centric abstraction focused on
operations on a vertex or edge frontier. Gunrock achieves a balance between
performance and expressiveness by coupling high performance GPU computing
primitives and optimization strategies with a high-level programming model that
allows programmers to quickly develop new graph primitives with small code size
and minimal GPU programming knowledge. We characterize the performance of
various optimization strategies and evaluate Gunrock's overall performance on
different GPU architectures on a wide range of graph primitives that span from
traversal-based algorithms and ranking algorithms, to triangle counting and
bipartite-graph-based algorithms. The results show that on a single GPU,
Gunrock has on average at least an order of magnitude speedup over Boost and
PowerGraph, comparable performance to the fastest GPU hardwired primitives and
CPU shared-memory graph libraries such as Ligra and Galois, and better
performance than any other GPU high-level graph library.Comment: 52 pages, invited paper to ACM Transactions on Parallel Computing
(TOPC), an extended version of PPoPP'16 paper "Gunrock: A High-Performance
Graph Processing Library on the GPU
- …