20,501 research outputs found
Propagation and Decay of Injected One-Off Delays on Clusters: A Case Study
Analytic, first-principles performance modeling of distributed-memory
applications is difficult due to a wide spectrum of random disturbances caused
by the application and the system. These disturbances (commonly called "noise")
destroy the assumptions of regularity that one usually employs when
constructing simple analytic models. Despite numerous efforts to quantify,
categorize, and reduce such effects, a comprehensive quantitative understanding
of their performance impact is not available, especially for long delays that
have global consequences for the parallel application. In this work, we
investigate various traces collected from synthetic benchmarks that mimic real
applications on simulated and real message-passing systems in order to pinpoint
the mechanisms behind delay propagation. We analyze the dependence of the
propagation speed of idle waves emanating from injected delays with respect to
the execution and communication properties of the application, study how such
delays decay under increased noise levels, and how they interact with each
other. We also show how fine-grained noise can make a system immune against the
adverse effects of propagating idle waves. Our results contribute to a better
understanding of the collective phenomena that manifest themselves in
distributed-memory parallel applications.Comment: 10 pages, 9 figures; title change
Parallel Anisotropic Unstructured Grid Adaptation
Computational Fluid Dynamics (CFD) has become critical to the design and analysis of aerospace vehicles. Parallel grid adaptation that resolves multiple scales with anisotropy is identified as one of the challenges in the CFD Vision 2030 Study to increase the capacity and capability of CFD simulation. The Study also cautions that computer architectures are undergoing a radical change and dramatic increases in algorithm concurrency will be required to exploit full performance. This paper reviews four different methods to parallel anisotropic grid generation. They cover both ends of the spectrum: (i) using existing state-of-the-art software optimized for a single core and modifying it for parallel platforms and (ii) designing and implementing scalable software with incomplete, but rapidly maturating functionality. A brief overview for each grid adaptation system is presented in the context of a telescopic approach for multilevel concurrency. These methods employ different approaches to enable parallel execution, which provides a unique opportunity to illustrate the relative behavior of each approach. Qualitative and quantitative metric evaluations are used to draw lessons for future developments in this critical area for parallel CFD simulation
Parallelizing RRT on distributed-memory architectures
This paper addresses the problem of improving the performance of the Rapidly-exploring Random Tree (RRT) algorithm by parallelizing it. For scalability reasons we do so on a distributed-memory architecture, using the message-passing paradigm. We present three parallel versions of RRT along with the technicalities involved in their implementation. We also evaluate the algorithms and study how they behave on different motion planning problems
The Scalability-Efficiency/Maintainability-Portability Trade-off in Simulation Software Engineering: Examples and a Preliminary Systematic Literature Review
Large-scale simulations play a central role in science and the industry.
Several challenges occur when building simulation software, because simulations
require complex software developed in a dynamic construction process. That is
why simulation software engineering (SSE) is emerging lately as a research
focus. The dichotomous trade-off between scalability and efficiency (SE) on the
one hand and maintainability and portability (MP) on the other hand is one of
the core challenges. We report on the SE/MP trade-off in the context of an
ongoing systematic literature review (SLR). After characterizing the issue of
the SE/MP trade-off using two examples from our own research, we (1) review the
33 identified articles that assess the trade-off, (2) summarize the proposed
solutions for the trade-off, and (3) discuss the findings for SSE and future
work. Overall, we see evidence for the SE/MP trade-off and first solution
approaches. However, a strong empirical foundation has yet to be established;
general quantitative metrics and methods supporting software developers in
addressing the trade-off have to be developed. We foresee considerable future
work in SSE across scientific communities.Comment: 9 pages, 2 figures. Accepted for presentation at the Fourth
International Workshop on Software Engineering for High Performance Computing
in Computational Science and Engineering (SEHPCCSE 2016
A Tale of Two Data-Intensive Paradigms: Applications, Abstractions, and Architectures
Scientific problems that depend on processing large amounts of data require
overcoming challenges in multiple areas: managing large-scale data
distribution, co-placement and scheduling of data with compute resources, and
storing and transferring large volumes of data. We analyze the ecosystems of
the two prominent paradigms for data-intensive applications, hereafter referred
to as the high-performance computing and the Apache-Hadoop paradigm. We propose
a basis, common terminology and functional factors upon which to analyze the
two approaches of both paradigms. We discuss the concept of "Big Data Ogres"
and their facets as means of understanding and characterizing the most common
application workloads found across the two paradigms. We then discuss the
salient features of the two paradigms, and compare and contrast the two
approaches. Specifically, we examine common implementation/approaches of these
paradigms, shed light upon the reasons for their current "architecture" and
discuss some typical workloads that utilize them. In spite of the significant
software distinctions, we believe there is architectural similarity. We discuss
the potential integration of different implementations, across the different
levels and components. Our comparison progresses from a fully qualitative
examination of the two paradigms, to a semi-quantitative methodology. We use a
simple and broadly used Ogre (K-means clustering), characterize its performance
on a range of representative platforms, covering several implementations from
both paradigms. Our experiments provide an insight into the relative strengths
of the two paradigms. We propose that the set of Ogres will serve as a
benchmark to evaluate the two paradigms along different dimensions.Comment: 8 pages, 2 figure
- …