Invasive compute balancing for applications with shared and hybrid parallelization
This is the author manuscript. The final version is available from the publisher via the DOI in this record. Achieving high scalability with dynamically adaptive algorithms in high-performance computing (HPC) is a non-trivial task. The invasive paradigm using compute migration represents an efficient alternative to classical data migration approaches for such algorithms in HPC. We present a core-distribution scheduler which realizes the migration of computational power by distributing the cores depending on the requirements specified by one or more parallel program instances. We validate our approach with different benchmark suites for simulations with artificial workload as well as applications based on dynamically adaptive shallow water simulations, and investigate concurrently executed adaptivity parameter studies on realistic Tsunami simulations. The invasive approach results in significantly faster overall execution times and higher hardware utilization than alternative approaches. Dynamic resource management is therefore mandatory for a more efficient execution of scenarios similar to our simulations, e.g. several Tsunami simulations in urgent computing, to overcome strong scalability challenges in the area of HPC. The optimizations obtained by invasive migration of cores can be generalized to similar classes of algorithms with dynamic resource requirements. This work was supported by the German Research Foundation (DFG) as part of the Transregional Collaborative Research Centre “Invasive Computing” (SFB/TR 89).
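The abstract above does not spell out the scheduling policy, so the following is only a minimal sketch of what distributing cores according to per-instance requirements could look like. It assumes each program instance reports a single non-negative workload number and that every instance keeps at least one core; the function name `distribute_cores` and the proportional policy are illustrative assumptions, not the authors' scheduler.

```python
def distribute_cores(total_cores, workloads):
    """Split total_cores among program instances in proportion to workloads.

    Hypothetical policy: every instance keeps one core, the remaining cores
    are shared proportionally, and rounding leftovers go to the instances
    with the largest fractional share.
    """
    n = len(workloads)
    if total_cores < n:
        raise ValueError("need at least one core per instance")
    total = sum(workloads)
    if total <= 0:
        raise ValueError("total workload must be positive")

    spare = total_cores - n                       # cores beyond the guaranteed one each
    shares = [spare * w / total for w in workloads]
    cores = [1 + int(s) for s in shares]          # floor of each proportional share

    # Hand out cores lost to rounding, largest fractional remainder first.
    leftover = total_cores - sum(cores)
    order = sorted(range(n), key=lambda i: shares[i] - int(shares[i]), reverse=True)
    for i in order[:leftover]:
        cores[i] += 1
    return cores
```

Calling the function again whenever the reported workloads change would move cores between instances instead of moving data, which is the idea the abstract refers to as compute migration.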
Evaluation of an efficient stack-RLE clustering concept for dynamically adaptive grids
This is the author accepted manuscript. The final version is available from the Society for Industrial and Applied Mathematics via the DOI in this record.
Abstract.
One approach to tackle the challenge of efficient implementations for parallel PDE simulations
on dynamically changing grids is the usage of space-filling curves (SFC). While SFC algorithms
possess advantageous properties such as low memory requirements and close-to-optimal partitioning
approaches with linear complexity, they require efficient communication strategies for keeping and
utilizing the connectivity information, in particular for dynamically changing grids. Our approach
is to use a sparse communication graph to store the connectivity information and to transfer data
block-wise. This permits efficient generation of multiple partitions per memory context (denoted
by clustering) which - in combination with a run-length encoding (RLE) - directly leads to elegant
solutions for shared, distributed and hybrid parallelization and allows cluster-based optimizations.
While previous work focused on specific aspects, we present in this paper an overall compact
summary of the stack-RLE clustering approach completed by aspects on the vertex-based communication
that ease up understanding the approach. The central contribution of this work is the proof
of suitability of the stack-RLE clustering approach for an efficient realization of different, relevant
building blocks of Scientific Computing methodology and real-life CSE applications: We show 95%
strong scalability for small-scale scalability benchmarks on 512 cores and weak scalability of over 90%
on 8192 cores for finite-volume solvers and changing grid structure in every time step; optimizations
of simulation data backends by writer tasks; comparisons of analytical benchmarks to analyze the
adaptivity criteria; and a Tsunami simulation as a representative real-world showcase of a wave propagation
for our approach which reduces the overall workload by 95% for parallel fully-adaptive mesh
refinement and, based on a comparison with SFC-ordered regular grid cells, reduces the computation
time by a factor of 7.6 with improved results and a factor of 62.2 with results of similar accuracy of
buoy station data.
This work was partly supported by the German Research
Foundation (DFG) as part of the Transregional Collaborative Research Centre “Invasive
Computing” (SFB/TR 89).
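To illustrate the run-length encoding mentioned in the abstract, here is a minimal sketch. It assumes, purely for illustration, that the communication metadata of one cluster is the sequence of neighbouring-cluster ids met along its boundary, so that consecutive shared edges with the same neighbour collapse into one `(id, count)` pair. The function names and the data layout are assumptions and are not taken from the paper.

```python
from itertools import groupby


def rle_encode(neighbor_ids):
    """Collapse runs of equal neighbour ids into (id, run_length) pairs."""
    return [(pid, sum(1 for _ in run)) for pid, run in groupby(neighbor_ids)]


def rle_decode(pairs):
    """Expand (id, run_length) pairs back into the per-edge id sequence."""
    return [pid for pid, count in pairs for _ in range(count)]


# Hypothetical boundary: three edges shared with cluster 3, two with 7, one with 2.
boundary = [3, 3, 3, 7, 7, 2]
encoded = rle_encode(boundary)
```

With such an encoding, the data for each neighbour forms one contiguous block, which is what makes block-wise transfer straightforward.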
Checkpoint and run-time adaptation with pluggable parallelisation
Enabling applications for computational Grids requires new approaches to develop applications that can effectively cope with resource volatility. Applications must be resilient to resource faults, adapting their behaviour to the available resources. This paper describes an approach to application-level adaptation that efficiently supports application-level checkpointing. The key of this work is the concept of pluggable parallelisation, which localises parallelisation issues into multiple modules that can be (un)plugged to match resource availability. This paper shows how pluggable parallelisation can be extended to effectively support checkpointing and run-time adaptation. We present the developed pluggable mechanism that helps the programmer to include checkpointing in the base (sequential) code. Based on these mechanisms and on previous work on pluggable parallelisation, our approach is able to automatically add support for checkpointing in parallel execution environments. Moreover, applications can adapt from a sequential execution to a multi-cluster configuration. Adaptation can be performed by checkpointing the application and restarting in a different mode, or it can be performed during run-time. Pluggable parallelisation intrinsically promotes the separation of software functionality from fault-tolerance and adaptation issues, facilitating their analysis and evolution. The work presented in this paper reinforces this idea by showing the feasibility of the approach and the performance benefits that can be achieved.
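As a rough illustration of application-level checkpointing in sequential code, the sketch below saves the loop state of a toy computation to a file and resumes from it on restart. It is not the pluggable mechanism described in the paper: the function `run_with_checkpoints`, the use of `pickle`, and the fixed checkpoint interval are all assumptions made for this example.

```python
import os
import pickle


def run_with_checkpoints(path, steps, every=10):
    """Sum the integers 0..steps-1, checkpointing (next_step, total) to `path`.

    If a checkpoint file already exists, the computation resumes from it
    instead of starting over.
    """
    step, total = 0, 0
    if os.path.exists(path):
        with open(path, "rb") as f:
            step, total = pickle.load(f)

    while step < steps:
        total += step
        step += 1
        if step % every == 0:
            # Write to a temporary file first, then rename, so that a crash
            # during the write cannot leave a truncated checkpoint behind.
            tmp = path + ".tmp"
            with open(tmp, "wb") as f:
                pickle.dump((step, total), f)
            os.replace(tmp, path)
    return total
```

Because the saved state is independent of how the loop is executed, a restart could in principle continue under a different execution mode, which is the adaptation-by-restart scenario the abstract mentions.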
Enhancing Energy Production with Exascale HPC Methods
High Performance Computing (HPC) resources have become a key factor in tackling more ambitious challenges in many disciplines. In this step forward, an explosion in the available parallelism and the use of special-purpose processors are crucial. With this goal, the HPC4E project applies new exascale HPC techniques to energy industry simulations, customizing them if necessary, and going beyond the state of the art in the required HPC exascale simulations for different energy sources. In this paper, a general overview of these methods is presented as well as some specific preliminary results. The research leading to these results has received funding from the European Union's Horizon 2020 Programme (2014-2020) under the HPC4E Project (www.hpc4e.eu), grant agreement n° 689772, the Spanish Ministry of Economy and Competitiveness under the CODEC2 project (TIN2015-63562-R), and from the Brazilian Ministry of Science, Technology and Innovation through Rede Nacional de Pesquisa (RNP). Computer time on Endeavour cluster is provided by the Intel Corporation, which enabled us to obtain the presented experimental results in uncertainty quantification in seismic imaging. Postprint (author's final draft)
Parallel computing 2011, ParCo 2011: book of abstracts
This book contains the abstracts of the presentations at the conference Parallel Computing 2011, 30 August - 2 September 2011, Ghent, Belgium.
Developing Real-Time Emergency Management Applications: Methodology for a Novel Programming Model Approach
Recent years have been characterized by the rise of highly distributed computing
platforms composed of heterogeneous computing and communication resources, including
centralized high-performance computing architectures (e.g. clusters or large shared-memory
machines), as well as multi-/many-core components also integrated into mobile nodes
and network facilities. The emergence of computational paradigms such as Grid and Cloud
Computing provides potential solutions to integrate such platforms with data systems, natural
phenomena simulations, knowledge discovery and decision support systems responding to a
dynamic demand of remote computing and communication resources and services.
In this context time-critical applications, notably emergency management systems, are
composed of complex sets of application components specialized for executing specific
computations, which are able to cooperate in such a way as to achieve a global goal in a
distributed manner. In recent years the scientific community has been addressing
the programming issues of distributed systems, aiming at the definition of applications
featuring an increasing complexity in the number of distributed components, in the spatial
distribution and cooperation between interested parties and in their degree of heterogeneity.
Over the last decade the research trend in distributed computing has focused on
a crucial objective. The wide-ranging composition of distributed platforms in terms of
different classes of computing nodes and network technologies, together with the strong diffusion of
applications that require real-time elaboration and online compute-intensive processing, as
in the case of emergency management systems, leads to a pronounced tendency of systems
towards properties like self-management, self-organization, self-control and, strictly speaking,
adaptivity.
Adaptivity implies the development, deployment, execution and management of applications
that, in general, are dynamic in nature. Dynamicity concerns the number and the specific
identification of cooperating components, the deployment and composition of the most
suitable versions of software components on processing and networking resources and
services, i.e., both the quantity and the quality of the application components to achieve
the needed Quality of Service (QoS). In time-critical applications the QoS specification
can dynamically vary during the execution, according to the user intentions and the
Gabriele Mencagli and Marco Vanneschi
Department of Computer Science, University of Pisa, L. Bruno Pontecorvo, Pisa
Italy
information produced by sensors and services, as well as according to the monitored state
and performance of networks and nodes.
The general reference point for this kind of systems is the Grid paradigm which, by
definition, aims to enable the access, selection and aggregation of a variety of distributed and
heterogeneous resources and services. However, though notable advancements have been
achieved in recent years, current Grid technology is not yet able to supply the needed software
tools with the features of high adaptivity, ubiquity, proactivity, self-organization, scalability
and performance, interoperability, as well as fault tolerance and security, of the emerging
applications.
For this reason in this chapter we will study a methodology for designing high-performance
computations able to exploit the heterogeneity and dynamicity of distributed environments
by expressing adaptivity and QoS-awareness directly at the application level. An effective
approach needs to address issues like QoS predictability of different application configurations
as well as the predictability of reconfiguration costs. Moreover adaptation strategies need to
be developed assuring properties like the stability degree of a reconfiguration decision and the
execution optimality (i.e. selecting reconfigurations that account for proper trade-offs among different
QoS objectives). In this chapter we will present the basic points of a novel approach that lays
the foundations for future programming model environments for time-critical applications
such as emergency management systems.
The organization of this chapter is the following. In Section 2 we will compare the existing
research works for developing adaptive systems in critical environments, highlighting their
drawbacks and inefficiencies. In Section 3, in order to clarify the application scenarios that
we are considering, we will present an emergency management system in which the run-time
selection of proper application configuration parameters is of great importance for meeting the
desired QoS constraints. In Section 4 we will describe the basic points of our approach in terms
of how compute-intensive operations can be programmed, how they can be dynamically
modified and how adaptation strategies can be expressed. In Section 5 our approach will
be contextualized to the definition of an adaptive parallel module, which is a building block
for composing complex and distributed adaptive computations. Finally in Section 6 we will
describe a set of experimental results that show the viability of our approach and in Section 7
we will give the concluding remarks of this chapter.
Shortening Time-to-Discovery with Dynamic Software Updates for Parallel High Performance Applications
Despite using multiple concurrent processors, a typical high performance parallel application is long-running, taking hours, even days to arrive at a solution. To modify a running high performance parallel application, the programmer has to stop the computation, change the code, redeploy, and enqueue the updated version to be scheduled to run, thus wasting not only the programmer’s time, but also expensive computing resources. To address these inefficiencies, this article describes how dynamic software updates can be used to modify a parallel application on the fly, thus saving the programmer’s time and using expensive computing resources more productively. The net effect of updating parallel applications dynamically is a reduction in their time-to-discovery metrics, the total time it takes from posing a problem to arriving at a solution. To explore the benefits of dynamic updates for high performance applications, this article takes a two-pronged approach. First, we describe our experience in building and evaluating a system for dynamically updating applications running on a parallel cluster. We then review a large body of literature describing the existing state of the art in dynamic software updates and point out how this research can be applied to high performance applications. Our experimental results indicate that dynamic software updates have the potential to become a powerful tool in reducing the time-to-discovery metrics for high performance parallel applications.
- …