3,469 research outputs found
TAP: Time-Aware Provenance for Distributed Systems
In this paper, we explore the use of provenance for analyzing execution dynamics in distributed systems. We argue that provenance could have significant practical benefits for system administrators, e.g., for reasoning about changes in a system’s state, diagnosing protocol misconfigurations, detecting intrusions, and pinpointing performance bottlenecks. However, to realize this vision, we must revisit several aspects of provenance management. As a first step, we present time-aware provenance (TAP), an enhanced provenance model that explicitly represents time, distributed state, and state changes. We outline our research agenda towards developing novel query processing, languages, and optimization techniques that can be used to efficiently and securely query time-aware provenance, even in the presence of transient state or untrusted nodes
Data Workflow - A Workflow Model for Continuous Data Processing
Online data or streaming data are getting more and more important for enterprise information systems, e.g. by integrating sensor data and workflows. The continuous flow of data provided e.g. by sensors requires new workflow models addressing the data perspective of these applications, since continuous data is potentially infinite while business process instances are always finite.\ud
In this paper a formal workflow model is proposed with data driven coordination and explicating properties of the continuous data processing. These properties can be used to optimize data workflows, i.e., reducing the computational power for processing the workflows in an engine by reusing intermediate processing results in several workflows
Towards Exascale Scientific Metadata Management
Advances in technology and computing hardware are enabling scientists from
all areas of science to produce massive amounts of data using large-scale
simulations or observational facilities. In this era of data deluge, effective
coordination between the data production and the analysis phases hinges on the
availability of metadata that describe the scientific datasets. Existing
workflow engines have been capturing a limited form of metadata to provide
provenance information about the identity and lineage of the data. However,
much of the data produced by simulations, experiments, and analyses still need
to be annotated manually in an ad hoc manner by domain scientists. Systematic
and transparent acquisition of rich metadata becomes a crucial prerequisite to
sustain and accelerate the pace of scientific innovation. Yet, ubiquitous and
domain-agnostic metadata management infrastructure that can meet the demands of
extreme-scale science is notable by its absence.
To address this gap in scientific data management research and practice, we
present our vision for an integrated approach that (1) automatically captures
and manipulates information-rich metadata while the data is being produced or
analyzed and (2) stores metadata within each dataset to permeate
metadata-oblivious processes and to query metadata through established and
standardized data access interfaces. We motivate the need for the proposed
integrated approach using applications from plasma physics, climate modeling
and neuroscience, and then discuss research challenges and possible solutions
- …