A GPU Implementation for Two-Dimensional Shallow Water Modeling
In this paper, we present a GPU implementation of a two-dimensional shallow
water model. Water simulations are useful for modeling floods, river/reservoir
behavior, and dam break scenarios. Our GPU implementation shows vast
performance improvements over the original Fortran implementation. By taking
advantage of the GPU, researchers and engineers will be able to study water
systems more efficiently and in greater detail.
Comment: 9 pages, 1 figure
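The shallow water equations evolve a water height field and two velocity fields on a 2D grid, which is what makes them so amenable to GPU parallelization: every cell updates from its neighbors. Below is a minimal NumPy sketch of one explicit finite-difference step with periodic boundaries. It is an illustration of the general scheme only, not the paper's solver; real flood models add bathymetry, friction, and wetting/drying handling, and the function name and arguments here are hypothetical.

```python
import numpy as np

def shallow_water_step(h, u, v, dx, dt, g=9.81):
    """One explicit step of the 2D shallow water equations on a periodic
    grid (height h, x-velocity u, y-velocity v). Illustrative sketch only."""
    # Continuity equation: dh/dt = -d(hu)/dx - d(hv)/dy  (central differences)
    dhu_dx = (np.roll(h * u, -1, axis=1) - np.roll(h * u, 1, axis=1)) / (2 * dx)
    dhv_dy = (np.roll(h * v, -1, axis=0) - np.roll(h * v, 1, axis=0)) / (2 * dx)
    h_new = h - dt * (dhu_dx + dhv_dy)
    # Momentum equations (surface-gradient term only): du/dt = -g dh/dx
    dh_dx = (np.roll(h, -1, axis=1) - np.roll(h, 1, axis=1)) / (2 * dx)
    dh_dy = (np.roll(h, -1, axis=0) - np.roll(h, 1, axis=0)) / (2 * dx)
    u_new = u - dt * g * dh_dx
    v_new = v - dt * g * dh_dy
    return h_new, u_new, v_new
```

Because each output cell depends only on a fixed neighborhood, a GPU port maps one thread per cell; the central-difference stencil also conserves total water volume on a periodic grid, a useful sanity check.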
A Visual Analytics System for Optimizing Communications in Massively Parallel Applications
Current and future supercomputers have tens of thousands of compute nodes interconnected with high-dimensional networks and complex network topologies for improved performance. Application developers are required to write scalable parallel programs in order to achieve high throughput on these machines. Application performance is largely determined by efficient inter-process communication. A common way to analyze and optimize performance is to profile parallel codes to identify communication bottlenecks. However, understanding gigabytes of profile data is not a trivial task. In this paper, we present a visual analytics system for identifying the scalability bottlenecks and improving the communication efficiency of massively parallel applications. The visualization methods used in this system are designed to help analysts comprehend large-scale and varied communication patterns on thousands of nodes in complex networks such as the 5D torus and the dragonfly. We also present efficient rerouting and remapping algorithms that can be coupled with our interactive visual analytics design for performance optimization. We demonstrate the utility of our system with several case studies using three benchmark applications on two leading supercomputers. The mapping suggestion from our system led to a 38% improvement in hop-bytes for the MiniAMR application on 4,096 MPI processes.

This research has been sponsored in part by the U.S. National Science Foundation through grant IIS-1320229, and the U.S. Department of Energy through grants DE-SC0012610 and DE-SC0014917. This research has been funded in part and used resources of the Argonne Leadership Computing Facility at Argonne National Laboratory, which is supported by the Office of Science of the U.S. Department of Energy under contract no. DE-AC02-06CH11357. This work was supported in part by the DOE Office of Science, ASCR, under award numbers 57L38, 57L32, 57L11, 57K50, and 508050
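The hop-bytes metric the abstract reports on is commonly defined as the sum, over all messages, of the message size multiplied by the number of network hops between sender and receiver; lowering it by remapping processes to nodes reduces link contention. A small sketch of that computation on a 2D torus follows. The function names and data layout are assumptions for illustration, not the paper's implementation.

```python
def torus_hops(a, b, dims):
    """Manhattan distance between node coordinates a and b on a torus,
    taking the shorter wrap-around direction in each dimension."""
    return sum(min(abs(x - y), d - abs(x - y)) for x, y, d in zip(a, b, dims))

def hop_bytes(messages, placement, dims):
    """Hop-bytes = sum over messages of (bytes sent) x (hops traveled).
    `messages` is a list of (src_rank, dst_rank, bytes); `placement`
    maps each rank to its node coordinates on the torus."""
    return sum(nbytes * torus_hops(placement[src], placement[dst], dims)
               for src, dst, nbytes in messages)
```

A remapping algorithm can then be evaluated simply by recomputing `hop_bytes` under the candidate `placement` and comparing against the original.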
Revisiting Actor Programming in C++
The actor model of computation has gained significant popularity over the
last decade. Its high level of abstraction makes it appealing for concurrent
applications in parallel and distributed systems. However, designing a
real-world actor framework that combines full scalability, strong reliability,
and high resource efficiency requires many conceptual and algorithmic additions
to the original model.
In this paper, we report on designing and building CAF, the "C++ Actor
Framework". CAF aims to provide a concurrent and distributed native
environment that scales up to very large, high-performance applications and
equally well down to small, constrained systems. We present the key
specifications and design concepts---in particular a message-transparent
architecture, type-safe message interfaces, and pattern matching
facilities---that make native actors a viable approach for many robust,
elastic, and highly distributed developments. We demonstrate the feasibility of
CAF in three scenarios: first for elastic, upscaling environments, second for
including heterogeneous hardware like GPGPUs, and third for distributed runtime
systems. Extensive performance evaluations indicate ideal runtime behaviour for
up to 64 cores at very low memory footprint, or in the presence of GPUs. In
these tests, CAF consistently outperforms the competing actor environments
Erlang, Charm++, SalsaLite, Scala, ActorFoundry, and even OpenMPI.
Comment: 33 pages
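The actor model's core idea, which the abstract assumes, is that each actor owns a private mailbox and processes one message at a time, so actor state needs no locks. A minimal thread-and-queue sketch of that idea is below; it illustrates the model only, not CAF itself, which layers typed message interfaces, pattern matching, and network transparency on top of native C++.

```python
import queue
import threading

class Actor:
    """Minimal actor: a private mailbox drained by a single worker thread,
    so the behavior function runs without locks. Sketch of the actor model
    only; names and shutdown convention here are illustrative choices."""

    def __init__(self, behavior):
        self.mailbox = queue.Queue()
        self.behavior = behavior
        threading.Thread(target=self._run, daemon=True).start()

    def send(self, msg):
        """Asynchronous, non-blocking message send."""
        self.mailbox.put(msg)

    def _run(self):
        while True:
            msg = self.mailbox.get()
            if msg is None:  # poison pill: shut the actor down
                break
            self.behavior(self, msg)
```

Usage: `Actor(lambda self, msg: print(msg)).send("hello")`. Because all communication goes through `send`, the same interface can later be backed by a network transport, which is the "message-transparent architecture" idea the abstract describes.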
Analysis of Distributed Systems Dynamics with Erlang Performance Lab
Modern, highly concurrent and large-scale systems require new methods for design, testing and monitoring. Their dynamics and scale require real-time tools that provide a holistic view of the whole system and the ability to show a more detailed view when needed. Such tools can help identify the causes of unwanted states, which is hardly possible with static analysis or a metrics-based approach. In this paper, a new tool for the analysis of distributed systems in Erlang is presented. It provides real-time monitoring of system dynamics at different levels of abstraction. The tool has been used to analyze a large-scale urban traffic simulation system running on a cluster of 20 computing nodes
A Visual Analytics Framework for Reviewing Streaming Performance Data
Understanding and tuning the performance of extreme-scale parallel computing
systems demands a streaming approach due to the computational cost of applying
offline algorithms to vast amounts of performance log data. Analyzing large
streaming data is challenging because the high arrival rate and the limited
time available to comprehend the data make it difficult for analysts to
examine it thoroughly without missing important changes or patterns. To support
streaming data analysis, we introduce a visual analytics framework comprising
three modules: data management, analysis, and interactive visualization. The
data management module collects various computing and communication performance
metrics from the monitored system using streaming data processing techniques
and feeds the data to the other two modules. The analysis module automatically
identifies important changes and patterns at the required latency. In
particular, we introduce a set of online and progressive analysis methods for
not only controlling the computational costs but also helping analysts better
follow the critical aspects of the analysis results. Finally, the interactive
visualization module provides the analysts with a coherent view of the changes
and patterns in the continuously captured performance data. Through a
multi-faceted case study on performance analysis of parallel discrete-event
simulation, we demonstrate the effectiveness of our framework for identifying
bottlenecks and locating outliers.
Comment: This is the author's preprint version that will be published in the Proceedings of the IEEE Pacific Visualization Symposium, 202
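The key constraint the abstract describes for online analysis is doing constant work per incoming sample while still flagging important changes. A standard way to meet that budget is Welford's streaming mean/variance update combined with a deviation threshold, sketched below. This is an illustration of the online-analysis idea under stated assumptions, not the paper's actual method; the 4-sigma threshold is an arbitrary illustrative choice.

```python
import math

class StreamingDetector:
    """Flags samples that deviate strongly from the running distribution,
    using Welford's online mean/variance: O(1) time and memory per sample,
    the kind of bounded cost a streaming pipeline requires. Sketch only."""

    def __init__(self, threshold=4.0):
        self.n = 0          # samples seen so far
        self.mean = 0.0     # running mean
        self.m2 = 0.0       # running sum of squared deviations
        self.threshold = threshold

    def update(self, x):
        """Consume one sample; return True if it looks like a change."""
        flagged = False
        if self.n >= 2:
            std = math.sqrt(self.m2 / (self.n - 1))
            if std > 0 and abs(x - self.mean) > self.threshold * std:
                flagged = True
        # Welford's incremental update keeps the statistics numerically stable.
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)
        return flagged
```

In a pipeline like the one described, one such detector per metric stream can run at ingest time, and only flagged intervals need to be surfaced to the interactive visualization.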