Enabling adaptive scientific workflows via trigger detection
Next generation architectures necessitate a shift away from traditional
workflows in which the simulation state is saved at prescribed frequencies for
post-processing analysis. While the need to shift to in situ workflows has been
acknowledged for some time, much of the current research is focused on static
workflows, where the analysis that would have been done as a post-process is
performed concurrently with the simulation at user-prescribed frequencies.
More recently, research efforts have aimed to enable adaptive workflows, in which
the frequency, composition, and execution of computational and data
manipulation steps dynamically depend on the state of the simulation. Adapting
the workflow to the state of simulation in such a data-driven fashion puts
extremely strict efficiency requirements on the analysis capabilities that are
used to identify the transitions in the workflow. In this paper we build upon
earlier work on trigger detection using sublinear techniques to drive adaptive
workflows. Here we propose a methodology to detect the time when sudden heat
release occurs in simulations of turbulent combustion. Our proposed method
provides an alternative metric that can be used along with our former metric to
increase the robustness of trigger detection. We show the effectiveness of our
metric empirically for predicting heat release for two use cases. Comment: arXiv admin note: substantial text overlap with arXiv:1506.0825
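To illustrate the general idea of a sublinear trigger, the sketch below inspects only a fixed-size random sample of a field rather than every cell, and fires when the sampled fraction of values above a threshold is large enough. This is not the metric proposed in the paper; the function names, the threshold test, and the default sample size are all assumptions chosen for illustration.

```python
import random
from typing import Sequence


def sampled_fraction_above(values: Sequence[float], threshold: float,
                           sample_size: int, rng: random.Random) -> float:
    """Estimate the fraction of values above `threshold` from a random sample.

    The cost depends on `sample_size`, not on len(values), which is what
    makes the check sublinear in the size of the simulation state.
    """
    n = min(sample_size, len(values))
    if n == 0:
        return 0.0
    sample = rng.sample(values, n)  # sampling without replacement
    return sum(v > threshold for v in sample) / n


def should_trigger(values: Sequence[float], threshold: float,
                   min_fraction: float, sample_size: int = 256,
                   seed: int = 0) -> bool:
    """Hypothetical trigger: fire when the sampled estimate reaches `min_fraction`."""
    rng = random.Random(seed)
    estimate = sampled_fraction_above(values, threshold, sample_size, rng)
    return estimate >= min_fraction
```

In an adaptive workflow, a check of this kind would run every time step and switch on the more expensive analysis only once it fires.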
Adaptive planning for distributed systems using goal accomplishment tracking
Goal accomplishment tracking is the process of monitoring the progress of a task or series of tasks towards completing a goal. Goal accomplishment tracking is used to monitor goal progress in a variety of domains, including workflow processing, teleoperation and industrial manufacturing. Practically, it involves the constant monitoring of task execution, analysis of this data to determine the task progress and notification of interested parties. This information is usually used in a passive way to observe goal progress. However, responding to this information may prevent goal failures. In addition, responding proactively in an opportunistic way can also lead to goals being completed faster. This paper proposes an architecture to support the adaptive planning of tasks for fault tolerance or opportunistic task execution based on goal accomplishment tracking. It argues that dramatically increased performance can be gained by monitoring task execution and altering plans dynamically
Diva: A Declarative and Reactive Language for In-Situ Visualization
The use of adaptive workflow management for in situ visualization and
analysis has been a growing trend in large-scale scientific simulations.
However, coordinating adaptive workflows with traditional procedural
programming languages can be difficult because system flow is determined by
unpredictable scientific phenomena, which often appear in an unknown order and
can evade event handling. This makes the implementation of adaptive workflows
tedious and error-prone. Recently, reactive and declarative programming
paradigms have been recognized as well-suited solutions to similar problems in
other domains. However, there is a dearth of research on adapting these
approaches to in situ visualization and analysis. With this paper, we present a
language design and runtime system for developing adaptive systems through a
declarative and reactive programming paradigm. We illustrate how an adaptive
workflow programming system is implemented using our approach and demonstrate
it with a use case from a combustion simulation. Comment: 11 pages, 5 figures, 6 listings, 1 table, to be published in LDAV
2020. The article has gone through 2 major revisions: Emphasized
contributions, features and examples. Addressed connections between DIVA and
FRP. In sec. 3, we fixed a design flaw and addressed it in sec. 3.3-3.4.
Re-designed sec. 5 with a more concrete example and benchmark results.
Simplified the syntax of DIV
ASCR/HEP Exascale Requirements Review Report
This draft report summarizes and details the findings, results, and
recommendations derived from the ASCR/HEP Exascale Requirements Review meeting
held in June, 2015. The main conclusions are as follows. 1) Larger, more
capable computing and data facilities are needed to support HEP science goals
in all three frontiers: Energy, Intensity, and Cosmic. The expected scale of
the demand at the 2025 timescale is at least two orders of magnitude -- and in
some cases greater -- than that available currently. 2) The growth rate of data
produced by simulations is overwhelming the current ability, of both facilities
and researchers, to store and analyze it. Additional resources and new
techniques for data analysis are urgently needed. 3) Data rates and volumes
from HEP experimental facilities are also straining the ability to store and
analyze large and complex data volumes. Appropriately configured
leadership-class facilities can play a transformational role in enabling
scientific discovery from these datasets. 4) A close integration of HPC
simulation and data analysis will aid greatly in interpreting results from HEP
experiments. Such an integration will minimize data movement and facilitate
interdependent workflows. 5) Long-range planning between HEP and ASCR will be
required to meet HEP's research needs. To best use ASCR HPC resources the
experimental HEP program needs a) an established long-term plan for access to
ASCR computational and data resources, b) an ability to map workflows onto HPC
resources, c) the ability for ASCR facilities to accommodate workflows run by
collaborations that can have thousands of individual members, d) to transition
codes to the next-generation HPC platforms that will be available at ASCR
facilities, e) to build up and train a workforce capable of developing and
using simulations and analysis to support HEP scientific research on
next-generation systems. Comment: 77 pages, 13 Figures; draft report, subject to further revisio
Using simple PID-inspired controllers for online resilient resource management of distributed scientific workflows
Scientific workflows have become mainstream for conducting large-scale scientific research. As a result, many workflow applications and Workflow Management Systems (WMSs) have been developed as part of the cyberinfrastructure to allow scientists to execute their applications seamlessly on a range of distributed platforms. Although the scientific community has addressed this challenge from both theoretical and practical approaches, failure prediction, detection, and recovery still raise many research questions. In this paper, we propose an approach inspired by control theory, developed as part of autonomic computing, to predict failures before they happen and mitigate them when possible. The proposed approach is based on the proportional–integral–derivative controller (PID controller) control loop mechanism, which is widely used in industrial control systems, where the controller reacts by adjusting its output to mitigate faults. PID controllers aim to detect the possibility of a non-steady state far enough in advance so that an action can be performed to prevent it from happening. To demonstrate the feasibility of the approach, we tackle two common execution faults of large-scale data-intensive workflows: data storage overload and memory overflow. We developed a simulator, which implements and evaluates simple standalone PID-inspired controllers to autonomously manage data and memory usage of a data-intensive bioinformatics workflow that consumes/produces over 4.4 TB of data, and requires over 24 TB of memory to run all tasks concurrently. Experimental results obtained via simulation indicate that workflow executions may significantly benefit from the controller-inspired approach, in particular under online and unknown conditions. Simulation results show that nearly-optimal executions (slowdown of 1.01) can be attained when using our proposed method, and faults are detected and mitigated far in advance of their occurrence