PAV ontology: provenance, authoring and versioning
Provenance is a critical ingredient for establishing trust of published
scientific content. This is true whether we are considering a data set, a
computational workflow, a peer-reviewed publication or a simple scientific
claim with supportive evidence. Existing vocabularies such as DC Terms and the
W3C PROV-O are domain-independent and general-purpose; they allow and
encourage extensions to cover more specific needs. We identify a particular
need to distinguish between the various roles assumed by
agents manipulating digital artifacts, such as author, contributor and curator.
We present the Provenance, Authoring and Versioning ontology (PAV): a
lightweight ontology for capturing just enough descriptions essential for
tracking the provenance, authoring and versioning of web resources. We argue
that such descriptions are essential for digital scientific content. PAV
distinguishes between contributors, authors and curators of content and
creators of representations in addition to the provenance of originating
resources that have been accessed, transformed and consumed. We explore five
projects (and communities) that have adopted PAV illustrating their usage
through concrete examples. Moreover, we present mappings that show how PAV
extends the PROV-O ontology to support broader interoperability.
We strove to keep PAV lightweight and compact by including only
those terms that have proved pragmatically useful in existing
applications, and by recommending terms from existing ontologies when
plausible.
We analyze and compare PAV with related approaches, namely Provenance
Vocabulary, DC Terms and BIBFRAME. We identify similarities and analyze their
differences with PAV, outlining strengths and weaknesses of our proposed model.
We specify SKOS mappings that align PAV with DC Terms.
Comment: 22 pages (incl 5 tables and 19 figures). Submitted to Journal of
Biomedical Semantics 2013-04-26 (#1858276535979415). Revised article
submitted 2013-08-30. Second revised article submitted 2013-10-06. Accepted
2013-10-07. Author proofs sent 2013-10-09 and 2013-10-16. Published
2013-11-22. Final version 2013-12-06.
http://www.jbiomedsem.com/content/4/1/3
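The role distinctions described above can be sketched as plain triples. In this minimal sketch the pav: properties are real PAV terms, but the ex: resource and agent identifiers are hypothetical, and plain Python tuples stand in for a proper RDF library:

```python
# Sketch of PAV's role distinctions, using plain Python tuples in place
# of an RDF library. The pav: property names are real PAV terms; the
# ex: resources and agents are hypothetical.
PAV = "http://purl.org/pav/"

triples = {
    ("ex:article", PAV + "authoredBy",   "ex:alice"),   # origin of the knowledge
    ("ex:article", PAV + "curatedBy",    "ex:bob"),     # maintains/annotates the content
    ("ex:article", PAV + "createdBy",    "ex:carol"),   # made this digital representation
    ("ex:article", PAV + "importedFrom", "ex:raw.csv"), # originating resource consumed
}

def agents_with_role(triples, role):
    """Return the objects of a given pav: property, sorted for stable output."""
    return sorted(o for s, p, o in triples if p == PAV + role)
```

Keeping author, curator, and representation-creator as distinct properties is what lets a consumer query each role separately, rather than conflating them under a single "creator" term.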
Chemical information matters: an e-Research perspective on information and data sharing in the chemical sciences
Recently, a number of organisations have called for open access to scientific information and especially to the data obtained from publicly funded research, among which the Royal Society report and the European Commission press release are particularly notable. It has long been accepted that building research on the foundations laid by other scientists is both effective and efficient. Regrettably, some disciplines, chemistry being one, have been slow to recognise the value of sharing and have thus been reluctant to curate their data and information in preparation for exchanging it. The very significant increases in both the volume and the complexity of the datasets produced have encouraged the expansion of e-Research, and stimulated the development of methodologies for managing, organising, and analysing "big data". We review the evolution of cheminformatics, the amalgam of chemistry, computer science, and information technology, and assess the wider e-Science and e-Research perspective. Chemical information does matter, as do matters of communicating data and collaborating with data. For chemistry, unique identifiers, structure representations, and property descriptors are essential to the activities of sharing and exchange. Open science entails the sharing of more than mere facts: for example, the publication of negative outcomes can facilitate better understanding of which synthetic routes to choose, an aspiration of the Dial-a-Molecule Grand Challenge. The protagonists of open notebook science go even further and exchange their thoughts and plans. We consider the concepts of preservation, curation, provenance, discovery, and access in the context of the research lifecycle, and then focus on the role of metadata, particularly the ontologies on which the emerging chemical Semantic Web will depend. Among our conclusions, we present our choice of the "grand challenges" for the preservation and sharing of chemical information.
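The role of unique identifiers in sharing can be illustrated with a small sketch: the same molecule often appears under different names and different SMILES spellings, and keying records on a canonical identifier lets them be merged. The InChIKey below is ethanol's; the record contents and field names are illustrative:

```python
# Sketch: merging duplicate chemical records on a canonical identifier.
# "ethanol" and "ethyl alcohol" are the same molecule written two ways;
# keying on the InChIKey (here, ethanol's) combines the two records.
# Record contents and field names are illustrative.
records = [
    {"name": "ethanol",       "smiles": "CCO", "inchikey": "LFQSCWFLJHTTHZ-UHFFFAOYSA-N", "bp_c": 78.37},
    {"name": "ethyl alcohol", "smiles": "OCC", "inchikey": "LFQSCWFLJHTTHZ-UHFFFAOYSA-N", "source": "lab B"},
]

merged = {}
for rec in records:
    # setdefault creates one entry per identifier; update folds in the fields
    merged.setdefault(rec["inchikey"], {}).update(rec)
```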
Ontology Summit 2008 Communiqué: Towards an open ontology repository
Each annual Ontology Summit initiative makes a statement appropriate to that year's theme as part of our general advocacy designed to bring ontology science and engineering into the mainstream. The theme this year is "Towards an Open Ontology Repository". This communiqué represents the joint position of those who were engaged in the year's summit discourse on an Open Ontology Repository (OOR) and of those who endorse it below. In this discussion, we have agreed that an "ontology repository is a facility where ontologies and related information artifacts can be stored, retrieved and managed."
We believe in the promise of semantic technologies based on logic, databases and the Semantic Web, a Web of exposed data and of interpretations of that data (i.e., of semantics), using common standards. Such technologies enable distinguishable, computable, reusable, and sharable meaning of Web and other artifacts, including data, documents, and services. We also believe that making that vision a reality requires additional supporting resources, and that these resources should be open and extensible, and provide common services over the ontologies.
The lifecycle of provenance metadata and its associated challenges and opportunities
This chapter outlines some of the challenges and opportunities associated
with adopting provenance principles and standards in a variety of disciplines,
including data publication and reuse, and information sciences.
A Linked Data Approach to Sharing Workflows and Workflow Results
A bioinformatics analysis pipeline is often highly elaborate, due to the inherent complexity of biological systems and the variety and size of datasets. A digital equivalent of the ‘Materials and Methods’ section in wet laboratory publications would be highly beneficial to bioinformatics, for evaluating evidence and examining data across related experiments, while introducing the potential to find associated resources and integrate them as data and services. We present initial steps towards preserving bioinformatics ‘materials and methods’ by exploiting the workflow paradigm for capturing the design of a data analysis pipeline, and RDF to link the workflow, its component services, run-time provenance, and a personalized biological interpretation of the results. An example shows the reproduction of the unique graph of an analysis procedure, its results, provenance, and personal interpretation of a text mining experiment. It links data from Taverna, myExperiment.org, BioCatalogue.org, and ConceptWiki.org. The approach is relatively ‘light-weight’ and unobtrusive to bioinformatics users.
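The linking idea can be sketched with W3C PROV-O terms. The prov: properties below are standard PROV-O vocabulary; the ex: identifiers, standing in for a workflow run and its artifacts, are hypothetical:

```python
# Sketch: tracing a result back to its inputs through PROV-O-style triples.
# prov:used, prov:wasGeneratedBy and prov:wasDerivedFrom are standard PROV-O
# properties; the ex: identifiers are hypothetical.
PROV = "http://www.w3.org/ns/prov#"

triples = [
    ("ex:run1",           PROV + "used",           "ex:inputData"),
    ("ex:textMiningHits", PROV + "wasGeneratedBy", "ex:run1"),
    ("ex:interpretation", PROV + "wasDerivedFrom", "ex:textMiningHits"),
]

def upstream(entity, triples):
    """Follow provenance edges from an entity back to everything it depends on."""
    found, frontier = set(), {entity}
    while frontier:
        frontier = {o for s, p, o in triples if s in frontier} - found
        found |= frontier
    return found
```

Traversing the graph from the personal interpretation recovers the full chain back to the original input data, which is exactly the 'materials and methods' record the abstract argues for.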
FAIR principles and the IEDB: short-term improvements and a long-term vision of OBO-foundry mediated machine-actionable interoperability.
The Immune Epitope Database (IEDB), at www.iedb.org, has the mission to make published experimental data relating to the recognition of immune epitopes easily available to the scientific public. By presenting curated data in a searchable database, we have liberated it from the tables and figures of journal articles, making it more accessible and usable by immunologists. Recently, the principles of Findability, Accessibility, Interoperability and Reusability have been formulated as goals that data repositories should meet to enhance the usefulness of their data holdings. We here examine how the IEDB complies with these principles and identify broad areas of success, but also areas for improvement. We describe short-term improvements to the IEDB that are being implemented now, as well as a long-term vision of true 'machine-actionable interoperability', which we believe will require community agreement on standardization of knowledge representation that can be built on top of the shared use of ontologies.
The Role of Provenance Management in Accelerating the Rate of Astronomical Research
The availability of vast quantities of data through electronic archives has
transformed astronomical research. It has also enabled the creation of new
products, models and simulations, often from distributed input data and models,
that are themselves made electronically available. These products will only
provide maximal long-term value to astronomers when accompanied by records of
their provenance; that is, records of the data and processes used in the
creation of such products. We use the creation of image mosaics with the
Montage grid-enabled mosaic engine to emphasize the necessity of provenance
management and to understand the science requirements that higher-level
products impose on provenance management technologies. We describe experiments
with one technology, the "Provenance Aware Service Oriented Architecture"
(PASOA), that stores provenance information at each step in the computation of
a mosaic. The results inform the technical specifications of provenance
management systems, including the need for extensible systems built on common
standards. Finally, we describe examples of provenance management technology
emerging from the fields of geophysics and oceanography that have applicability
to astronomy applications.Comment: 8 pages, 1 figure; Proceedings of Science, 201
Community next steps for making globally unique identifiers work for biocollections data
Biodiversity data is being digitized and made available online at a rapidly increasing rate but current practices typically do not preserve linkages between these data, which impedes interoperation, provenance tracking, and assembly of larger datasets. For data associated with biocollections, the biodiversity community has long recognized that an essential part of establishing and preserving linkages is to apply globally unique identifiers at the point when data are generated in the field and to persist these identifiers downstream, but this is seldom implemented in practice. There has been neither coalescence towards a single identifier solution (as in some other domains), nor even a set of recommended best practices and standards to support multiple identifier schemes sharing consistent responses. In order to further progress towards a broader community consensus, a group of biocollections and informatics experts assembled in Stockholm in October 2014 to discuss community next steps to overcome current roadblocks. The workshop participants divided into four groups focusing on: identifier practice in current field biocollections; identifier application for legacy biocollections; identifiers as applied to biodiversity data records as they are published and made available in semantically marked-up publications; and cross-cutting identifier solutions that bridge across these domains. The main outcome was consensus on key issues, including recognition of differences between legacy and new biocollections processes, the need for identifier metadata profiles that can report information on identifier persistence missions, and the unambiguous indication of the type of object associated with the identifier. Current identifier characteristics are also summarized, and an overview of available schemes and practices is provided.
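The core recommendation, minting an identifier at the point of collection and carrying it downstream, can be sketched with Python's stdlib uuid module. The urn:uuid scheme is only one of the candidate schemes the community has discussed, and the record fields here are illustrative:

```python
# Sketch: minting a globally unique identifier for a specimen at collection
# time and persisting it in downstream records. The urn:uuid scheme is one
# of several possible choices; the record fields are illustrative.
import uuid

def mint_guid():
    """Mint a globally unique identifier for a newly collected specimen."""
    return f"urn:uuid:{uuid.uuid4()}"

specimen = {"guid": mint_guid(), "taxon": "Quercus robur", "collected": "2014-10-01"}

# A downstream record (e.g. a sequence derived from the specimen) carries
# the same identifier rather than re-describing the specimen, preserving
# the linkage the abstract says is usually lost.
sequence_record = {"derived_from": specimen["guid"], "type": "dna-sequence"}
```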