1,372 research outputs found
Provenance Threat Modeling
Provenance systems are used to capture history metadata, applications include
ownership attribution and determining the quality of a particular data set.
Provenance systems are also used for debugging, process improvement,
understanding data proof of ownership, certification of validity, etc. The
provenance of data includes information about the processes and source data
that leads to the current representation. In this paper we study the security
risks provenance systems might be exposed to and recommend security solutions
to better protect the provenance information.Comment: 4 pages, 1 figure, conferenc
The First Provenance Challenge
The first Provenance Challenge was set up in order to provide a forum for the community to help understand the capabilities of different provenance systems and the expressiveness of their provenance representations. To this end, a Functional Magnetic Resonance Imaging workflow was defined, which participants had to either simulate or run in order to produce some provenance representation, from which a set of identified queries had to be implemented and executed. Sixteen teams responded to the challenge, and submitted their inputs. In this paper, we present the challenge workflow and queries, and summarise the participants contributions
Towards Exascale Scientific Metadata Management
Advances in technology and computing hardware are enabling scientists from
all areas of science to produce massive amounts of data using large-scale
simulations or observational facilities. In this era of data deluge, effective
coordination between the data production and the analysis phases hinges on the
availability of metadata that describe the scientific datasets. Existing
workflow engines have been capturing a limited form of metadata to provide
provenance information about the identity and lineage of the data. However,
much of the data produced by simulations, experiments, and analyses still need
to be annotated manually in an ad hoc manner by domain scientists. Systematic
and transparent acquisition of rich metadata becomes a crucial prerequisite to
sustain and accelerate the pace of scientific innovation. Yet, ubiquitous and
domain-agnostic metadata management infrastructure that can meet the demands of
extreme-scale science is notable by its absence.
To address this gap in scientific data management research and practice, we
present our vision for an integrated approach that (1) automatically captures
and manipulates information-rich metadata while the data is being produced or
analyzed and (2) stores metadata within each dataset to permeate
metadata-oblivious processes and to query metadata through established and
standardized data access interfaces. We motivate the need for the proposed
integrated approach using applications from plasma physics, climate modeling
and neuroscience, and then discuss research challenges and possible solutions
Querying and managing opm-compliant scientific workflow provenance
Provenance, the metadata that records the derivation history of scientific results, is important in scientific workflows to interpret, validate, and analyze the result of scientific computing. Recently,
to promote and facilitate interoperability among heterogeneous provenance systems, the Open Provenance Model (OPM) has been proposed and has played an important role in the community.
In this dissertation, to efficiently query and manage OPM-compliant provenance, we first propose a provenance collection framework that collects both prospective provenance, which captures
an abstract workflow specification as a recipe for future data derivation and retrospective provenance, which captures past workflow execution and data derivation information. We then
propose a relational database-based provenance system, called OPMPROV that stores, reasons, and queries prospective and retrospective provenance, which is OPM-compliant provenance. We finally propose OPQL, an OPM-level provenance query language, that is directly defined over the OPM model. An OPQL query takes an
OPM graph as input and produces an OPM graph as output; therefore, OPQL queries are not tightly coupled to the underlying provenance storage strategies. Our provenance store, provenance collection framework, and provenance query language feature the native support of the OPM model
Information provenance for open distributed collaborative system
In autonomously managed distributed systems for collaboration, provenance can facilitate reuse of information that are interchanged, repetition of successful experiments, or to provide evidence for trust mechanisms that certain information existed at a certain period during collaboration. In this paper, we propose domain independent information provenance architecture for open collaborative distributed systems. The proposed system uses XML for interchanging information and RDF to track information provenance. The use of XML and RDF also ensures that information is universally acceptable even among heterogeneous nodes. Our proposed information provenance model can work on any operating systems or workflows.<br /
Visualization of Network Data Provenance
Visualization facilitates the understanding of scientific data both through exploration and explanation of the visualized data. Provenance also contributes to the understanding of data by containing the contributing factors behind a result. The visualization of provenance, although supported in existing workflow management systems, generally focuses on small (medium) sized provenance data, lacking techniques to deal with big data with high complexity. This paper discusses visualization techniques developed for exploration and explanation of provenance, including layout algorithm, visual style, graph abstraction techniques, and graph matching algorithm, to deal with the high complexity. We demonstrate through application to two extensively analyzed case studies that involved provenance capture and use over three year projects, the first involving provenance of a satellite imagery ingest processing pipeline and the other of provenance in a large-scale computer network testbed
- …