96,180 research outputs found
Preparing HPC Applications for the Exascale Era: A Decoupling Strategy
Production-quality parallel applications are often a mixture of diverse
operations, such as computation- and communication-intensive, regular and
irregular, tightly coupled and loosely linked operations. In conventional
construction of parallel applications, each process performs all the
operations, which might result inefficient and seriously limit scalability,
especially at large scale. We propose a decoupling strategy to improve the
scalability of applications running on large-scale systems.
Our strategy separates application operations onto groups of processes and
enables a dataflow processing paradigm among the groups. This mechanism is
effective in reducing the impact of load imbalance and increases the parallel
efficiency by pipelining multiple operations. We provide a proof-of-concept
implementation using MPI, the de-facto programming system on current
supercomputers. We demonstrate the effectiveness of this strategy by decoupling
the reduce, particle communication, halo exchange and I/O operations in a set
of scientific and data-analytics applications. A performance evaluation on
8,192 processes of a Cray XC40 supercomputer shows that the proposed approach
can achieve up to 4x performance improvement.Comment: The 46th International Conference on Parallel Processing (ICPP-2017
Towards Distributed Model Transformations with LinTra
Performance and scalability of model transformations are becoming
prominent topics in Model-Driven Engineering. In previous works we introduced
LinTra, a platform for executing model transformations in parallel. LinTra is
based on the Linda model of a coordination language and is intended to be used
as a middleware where high-level model transformation languages are compiled.
In this paper we present the initial results of our analyses on the scalability of
out-place model-to-model transformation executions in LinTra when the models
and the processing elements are distributed over a set of machines.Universidad de Málaga. Campus de Excelencia Internacional AndalucĂa Tech
Parallel distributed algorithms of the beta-model of the small world graphs
The research goal is to develop a large-scale agent-based simulation environment to support implementations of Internet simulation applications.The Small Worlds (SW) graphs are used to model Web sites and social networks of Internet users. Each vertex represents the identity of a simple agent. In order to cope with scalability issues, we have to consider distributed parallel
processing. The focus of this paper is to present two parallel-distributed algorithms for the construction of a particular type of SW graph called Beta-model. The first algorithm serializes the graph construction, while the second constructs the graph in parallel
A Comparison of Parallel Graph Processing Implementations
The rapidly growing number of large network analysis problems has led to the
emergence of many parallel and distributed graph processing systems---one
survey in 2014 identified over 80. Since then, the landscape has evolved; some
packages have become inactive while more are being developed. Determining the
best approach for a given problem is infeasible for most developers. To enable
easy, rigorous, and repeatable comparison of the capabilities of such systems,
we present an approach and associated software for analyzing the performance
and scalability of parallel, open-source graph libraries. We demonstrate our
approach on five graph processing packages: GraphMat, the Graph500, the Graph
Algorithm Platform Benchmark Suite, GraphBIG, and PowerGraph using synthetic
and real-world datasets. We examine previously overlooked aspects of parallel
graph processing performance such as phases of execution and energy usage for
three algorithms: breadth first search, single source shortest paths, and
PageRank and compare our results to Graphalytics.Comment: 10 pages, 10 figures, Submitted to EuroPar 2017 and rejected. Revised
and submitted to IEEE Cluster 201
A Framework for Megascale Agent Based Model Simulations on Graphics Processing Units
Agent-based modeling is a technique for modeling dynamic systems from the bottom up. Individual elements of the system are represented computationally as agents. The system-level behaviors emerge from the micro-level interactions of the agents. Contemporary state-of-the-art agent-based modeling toolkits are essentially discrete-event simulators designed to execute serially on the Central Processing Unit (CPU). They simulate Agent-Based Models (ABMs) by executing agent actions one at a time. In addition to imposing an un-natural execution order, these toolkits have limited scalability. In this article, we investigate data-parallel computer architectures such as Graphics Processing Units (GPUs) to simulate large scale ABMs. We have developed a series of efficient, data parallel algorithms for handling environment updates, various agent interactions, agent death and replication, and gathering statistics. We present three fundamental innovations that provide unprecedented scalability. The first is a novel stochastic memory allocator which enables parallel agent replication in O(1) average time. The second is a technique for resolving precedence constraints for agent actions in parallel. The third is a method that uses specialized graphics hardware, to gather and process statistical measures. These techniques have been implemented on a modern day GPU resulting in a substantial performance increase. We believe that our system is the first ever completely GPU based agent simulation framework. Although GPUs are the focus of our current implementations, our techniques can easily be adapted to other data-parallel architectures. We have benchmarked our framework against contemporary toolkits using two popular ABMs, namely, SugarScape and StupidModel.GPGPU, Agent Based Modeling, Data Parallel Algorithms, Stochastic Simulations
- …