Requirements for Provenance on the Web
From where did this tweet originate? Was this quote from the New York Times modified? Daily, we rely on data from the Web, but often it is difficult or impossible to determine where it came from or how it was produced. This lack of provenance is particularly evident when people and systems deal with Web information or with any environment where information comes from sources of varying quality. Provenance is not captured pervasively in information systems. There are major technical, social, and economic impediments that stand in the way of using provenance effectively. This paper synthesizes requirements for provenance on the Web along a number of dimensions, focusing on three key aspects of provenance: the content of provenance, the management of provenance records, and the uses of provenance information. To illustrate these requirements, we use three synthesized scenarios that encompass provenance problems faced by Web users today.
Enhancing Workflow with a Semantic Description of Scientific Intent
Peer reviewed, Preprint
Using Provenance to support Good Laboratory Practice in Grid Environments
Conducting experiments and documenting results is the daily business of
scientists. Good and traceable documentation enables other scientists to
confirm procedures and results for increased credibility. Documentation and
scientific conduct are regulated and termed as "good laboratory practice."
Laboratory notebooks are used to record each step in conducting an experiment
and processing data. Originally, these notebooks were paper based. Due to
computerised research systems, acquired data became more elaborate, thus
increasing the need for electronic notebooks with data storage, computational
features and reliable electronic documentation. As a new approach to this, a
scientific data management system (DataFinder) is enhanced with features for
traceable documentation. Provenance recording is used to meet requirements of
traceability, and this information can later be queried for further analysis.
DataFinder has further important features for scientific documentation: It
employs a heterogeneous and distributed data storage concept. This enables
access to different types of data storage systems (e.g. Grid data
infrastructure, file servers). In this chapter we describe a number of building
blocks that are available or close to finished development. These components
are intended for assembling an electronic laboratory notebook for use in Grid
environments, while retaining maximal flexibility in usage scenarios as well as
maximal compatibility with one another. Through the usage of such a
system, provenance can successfully be used to trace the scientific workflow of
preparation, execution, evaluation, interpretation and archiving of research
data. The reliability of research results increases and the research process
remains transparent to remote research partners.
Comment: Book Chapter for "Data Provenance and Data Management for eScience,"
of Studies in Computational Intelligence series, Springer. 25 pages, 8
figures
PAV ontology: provenance, authoring and versioning
Provenance is a critical ingredient for establishing trust of published
scientific content. This is true whether we are considering a data set, a
computational workflow, a peer-reviewed publication or a simple scientific
claim with supportive evidence. Existing vocabularies such as DC Terms and the
W3C PROV-O are domain-independent and general-purpose, and they allow and
encourage extensions to cover more specific needs. We identify the specific
need for identifying or distinguishing between the various roles assumed by
agents manipulating digital artifacts, such as author, contributor and curator.
We present the Provenance, Authoring and Versioning ontology (PAV): a
lightweight ontology for capturing just enough descriptions essential for
tracking the provenance, authoring and versioning of web resources. We argue
that such descriptions are essential for digital scientific content. PAV
distinguishes between contributors, authors and curators of content and
creators of representations in addition to the provenance of originating
resources that have been accessed, transformed and consumed. We explore five
projects (and communities) that have adopted PAV, illustrating their usage
through concrete examples. Moreover, we present mappings that show how PAV
extends the PROV-O ontology to support broader interoperability.
The authors strove to keep PAV lightweight and compact by including only
those terms that have been demonstrated to be pragmatically useful in existing
applications, and by recommending terms from existing ontologies when
plausible.
We analyze and compare PAV with related approaches, namely Provenance
Vocabulary, DC Terms and BIBFRAME. We identify similarities and analyze their
differences with PAV, outlining strengths and weaknesses of our proposed model.
We specify SKOS mappings that align PAV with DC Terms.
Comment: 22 pages (incl 5 tables and 19 figures). Submitted to Journal of
Biomedical Semantics 2013-04-26 (#1858276535979415). Revised article
submitted 2013-08-30. Second revised article submitted 2013-10-06. Accepted
2013-10-07. Author proofs sent 2013-10-09 and 2013-10-16. Published
2013-11-22. Final version 2013-12-06.
http://www.jbiomedsem.com/content/4/1/3
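A minimal sketch of the role distinction the PAV abstract describes, assuming the PAV namespace `http://purl.org/pav/` and the property names `authoredBy`, `contributedBy`, `curatedBy`, `version` and `derivedFrom`. The helper functions and every `example.org` URI are hypothetical and exist only for illustration. The serializer is deliberately naive and does not escape literals.

```python
# Illustrative only: helper names and example.org URIs are assumptions,
# not part of the PAV paper or of any library.
PAV = "http://purl.org/pav/"


def pav_triples(resource, authors=(), contributors=(), curators=(),
                version=None, derived_from=None):
    """Build (subject, predicate, object) triples that keep agent roles apart."""
    triples = []
    for agent in authors:
        triples.append((resource, PAV + "authoredBy", agent))
    for agent in contributors:
        triples.append((resource, PAV + "contributedBy", agent))
    for agent in curators:
        triples.append((resource, PAV + "curatedBy", agent))
    if version is not None:
        triples.append((resource, PAV + "version", version))
    if derived_from is not None:
        triples.append((resource, PAV + "derivedFrom", derived_from))
    return triples


def to_ntriples(triples):
    """Naive N-Triples-style rendering: http(s) values become IRIs, the rest literals."""
    lines = []
    for s, p, o in triples:
        obj = f"<{o}>" if o.startswith("http") else f'"{o}"'
        lines.append(f"<{s}> <{p}> {obj} .")
    return "\n".join(lines)


example = pav_triples(
    "http://example.org/dataset/1",
    authors=["http://example.org/people/alice"],
    curators=["http://example.org/people/bob"],
    version="1.2",
    derived_from="http://example.org/dataset/0",
)
print(to_ntriples(example))
```

Keeping each role as its own predicate means a consumer can ask who curated a resource without that answer being mixed up with who authored it.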
trackr: A Framework for Enhancing Discoverability and Reproducibility of Data Visualizations and Other Artifacts in R
Research is an incremental, iterative process, with new results relying and
building upon previous ones. Scientists need to find, retrieve, understand, and
verify results in order to confidently extend them, even when the results are
their own. We present the trackr framework for organizing, automatically
annotating, discovering, and retrieving results. We identify sources of
automatically extractable metadata for computational results, and we define an
extensible system for organizing, annotating, and searching for results based
on these and other metadata. We present an open-source implementation of these
concepts for plots, computational artifacts, and woven dynamic reports
generated in the R statistical computing language.