47,138 research outputs found
Recommended from our members
Provenance as First Class Cloud Data
Digital provenance is meta-data that describes the ancestry or history of a digital object. Most work on provenance focuses on how provenance increases the value of data to consumers. However, provenance is also valuable to storage providers. For example, provenance can provide hints on access patterns, detect anomalous behavior, and provide enhanced user search capabilities. As the next generation storage providers, cloud vendors are in the unique position to capitalize on this opportunity to incorporate provenance as a fundamental storage system primitive. To date, cloud offerings have not yet done so. We provide motivation for providers to treat provenance as first class data in the cloud and based on our experience with provenance in a local storage system, suggest a set of requirements that make provenance feasible and attractive.Engineering and Applied Science
Technical Note: Comparison of storage strategies of sea surface microlayer samples
The sea surface microlayer (SML) is an important biogeochemical system whose physico-chemical analysis often necessitates some degree of sample storage. However, many SML components degrade with time so the development of optimal storage protocols is paramount. We here briefly review some commonly used treatment and storage protocols. Using freshwater and saline SML samples from a river estuary, we investigated temporal changes in surfactant activity (SA) and the absorbance and fluorescence of chromophoric dissolved organic matter (CDOM) over four weeks, following selected sample treatment and storage protocols. Some variability in the effectiveness of individual protocols most likely reflects sample provenance. None of the various protocols examined performed any better than dark storage at 4 °C without pre-treatment. We therefore recommend storing samples refrigerated in the dark
Distributed storage and queryng techniques for a semantic web of scientific workflow provenance
In scientific workflow environments, scientists depend on provenance, which records the history of an experiment. Resource Description Framework is frequently used to represent provenance based on vocabularies such as the Open Provenance Model. For complex scientific workflows that generate large amounts of RDF triples, single-machine provenance management becomes inadequate over time. In this thesis, we research how HBase capabilities can be leveraged for distributed storage and querying of provenance data represented in RDF. We architect the ProvBase system that incorporates an HBase/Hadoop backend, propose a storage schema to hold provenance triples, and design querying algorithms to evaluate SPARQL queries in the system. We conduct an experimental study to show the feasibility of our approach
Recommended from our members
Layering in Provenance Systems
Digital provenance describes the ancestry or history of a digital object. Most existing provenance systems, however, operate at only one level of abstraction: the sys- tem call layer, a workflow specification, or the high-level constructs of a particular application. The provenance collectable in each of these layers is different, and all of it can be important. Single-layer systems fail to account for the different levels of abstraction at which users need to reason about their data and processes. These systems cannot integrate data provenance across layers and cannot answer questions that require an integrated view of the provenance.
We have designed a provenance collection structure facilitating the integration of provenance across multiple levels of abstraction, including a workflow engine, a web browser, and an initial runtime Python provenance tracking wrapper. We layer these components atop provenance-aware network storage (NFS) that builds upon a Provenance-Aware Storage System (PASS). We discuss the challenges of building systems that integrate provenance across multiple layers of abstraction, present how we augmented systems in each layer to integrate provenance, and present use cases that demonstrate how provenance spanning multiple layers provides functionality not available in existing systems. Our evaluation shows that the overheads imposed by layering provenance systems are reasonable.Engineering and Applied Science
Provenance in scientific workflow systems
Journal ArticleThe automated tracking and storage of provenance information promises to be a major advantage of scientific workflow systems. We discuss issues related to data and workflow provenance, and present techniques for focusing user attention on meaningful provenance through "user views," for managing the provenance of nested scientific data, and for using information about the evolution of a workflow specification to understand the difference in the provenance of similar data products
Presentation Panel on Management and Storage
What data should be kept over the long term? How does one track digital provenance? Should we track digital provenance? What constitutes a master/archival copy? What options are being evaluated for storage of large data sets “in perpetuity”? Cost/benefit analysis for storage solutions? What challenges are repositories facing with this data type?
http://www.dpconline.org/handbook/digital-preservation/why-digital-preservation-matter
Provenance-Aware Sensor Data Storage
Sensor network data has both historical and realtime value. Making historical sensor data useful, in particular, requires storage, naming, and indexing. Sensor data presents new challenges in these areas. Such data is location-specific but also distributed; it is collected in a particular physical location and may be most useful there, but it has additional value when combined with other sensor data collections in a larger distributed system. Thus, arranging location-sensitive peer-to-peer storage is one challenge. Sensor data sets do not have obvious names, so naming them in a globally useful fashion is another challenge. The last challenge arises from the need to index these sensor data sets to make them searchable. The key to sensor data identity is provenance, the full history or lineage of the data. We show how provenance addresses the naming and indexing issues and then present a
research agenda for constructing distributed, indexed repositories of sensor data.Engineering and Applied Science
Provenance-Aware Sensor Data Storage
Sensor network data has both historical and realtime value. Making historical sensor data useful, in particular, requires storage, naming, and indexing. Sensor data presents new challenges in these areas. Such data is location-specific but also distributed; it is collected in a particular physical location and may be most useful there, but it has additional value when combined with other sensor data collections in a larger distributed system. Thus, arranging location-sensitive peer-to-peer storage is one challenge. Sensor data sets do not have obvious names, so naming them in a globally useful fashion is another challenge. The last challenge arises from the need to index these sensor data sets to make them searchable. The key to sensor data identity is provenance, the full history or lineage of the data. We show how provenance addresses the naming and indexing issues and then present a
research agenda for constructing distributed, indexed repositories of sensor data.Engineering and Applied Science
- …