
    A model of provenance applied to biodiversity datasets

    The Web has become one of the main sources of biodiversity information. An increasing number of biodiversity research institutions add new specimens and their related information to their biological collections and make this information available on the Web. However, the mechanisms currently available provide insufficient provenance for biodiversity information. In this paper, we propose a new biodiversity provenance model extending the W3C PROV Data Model. Biodiversity data is mapped to terms from relevant ontologies, such as Dublin Core and GeoSPARQL, stored in triple stores and queried using SPARQL endpoints. Additionally, we provide a use case using our provenance model to enrich collection data.

    Searching Data: A Review of Observational Data Retrieval Practices in Selected Disciplines

    A cross-disciplinary examination of the user behaviours involved in seeking and evaluating data is surprisingly absent from the research data discussion. This review explores the data retrieval literature to identify commonalities in how users search for and evaluate observational research data. Two analytical frameworks rooted in information retrieval and science and technology studies are used to identify key similarities in practices as a first step toward developing a model describing data retrieval.

    Theory and Practice of Data Citation

    Citations are the cornerstone of knowledge propagation and the primary means of assessing the quality of research, as well as directing investments in science. Science is increasingly becoming "data-intensive", where large volumes of data are collected and analyzed to discover complex patterns through simulations and experiments, and most scientific reference works have been replaced by online curated datasets. Yet, given a dataset, there is no quantitative, consistent and established way of knowing how it has been used over time, who contributed to its curation, what results have been yielded or what value it has. The development of a theory and practice of data citation is fundamental for considering data as first-class research objects with the same relevance and centrality of traditional scientific products. Many works in recent years have discussed data citation from different viewpoints: illustrating why data citation is needed, defining the principles and outlining recommendations for data citation systems, and providing computational methods for addressing specific issues of data citation. The current panorama is many-faceted and an overall view that brings together diverse aspects of this topic is still missing. Therefore, this paper aims to describe the lay of the land for data citation, both from the theoretical (the why and what) and the practical (the how) angle. Comment: 24 pages, 2 tables, pre-print accepted in Journal of the Association for Information Science and Technology (JASIST), 201

    Community next steps for making globally unique identifiers work for biocollections data

    Biodiversity data is being digitized and made available online at a rapidly increasing rate but current practices typically do not preserve linkages between these data, which impedes interoperation, provenance tracking, and assembly of larger datasets. For data associated with biocollections, the biodiversity community has long recognized that an essential part of establishing and preserving linkages is to apply globally unique identifiers at the point when data are generated in the field and to persist these identifiers downstream, but this is seldom implemented in practice. There has neither been coalescence towards one single identifier solution (as in some other domains), nor even a set of recommended best practices and standards to support multiple identifier schemes sharing consistent responses. In order to further progress towards a broader community consensus, a group of biocollections and informatics experts assembled in Stockholm in October 2014 to discuss community next steps to overcome current roadblocks. The workshop participants divided into four groups focusing on: identifier practice in current field biocollections; identifier application for legacy biocollections; identifiers as applied to biodiversity data records as they are published and made available in semantically marked-up publications; and cross-cutting identifier solutions that bridge across these domains. The main outcome was consensus on key issues, including recognition of differences between legacy and new biocollections processes, the need for identifier metadata profiles that can report information on identifier persistence missions, and the unambiguous indication of the type of object associated with the identifier. Current identifier characteristics are also summarized, and an overview of available schemes and practices is provided.

    Predicting provenance of forensic soil samples: linking soil to ecological habitats by metabarcoding and supervised classification

    Environmental DNA (eDNA) is increasingly applied in ecological studies, including studies with the primary purpose of criminal investigation, in which eDNA from soil can be used to pair samples or reveal sample provenance. We collected soil eDNA samples as part of a large national biodiversity research project across 130 sites in Denmark. We investigated the potential for soil eDNA metabarcoding in predicting provenance in terms of environmental conditions, habitat type and geographic regions. We used linear regression for predicting environmental gradients of light, soil moisture, pH and nutrient status (represented by Ellenberg Indicator Values, EIVs) and Quadratic Discriminant Analysis (QDA) to predict habitat type and geographic region. eDNA data performed relatively well as a predictor of environmental gradients (R² > 0.81). Its ability to discriminate between habitat types was variable, with high accuracy for certain forest types and low accuracy for heathland, which was poorly predicted. Geographic region was also less accurately predicted by eDNA. We demonstrated the application of provenance prediction in forensic science by evaluating and discussing two mock crime scenes. Here, we listed the plant species from annotated sequences, which can further aid in identifying the likely habitat or, in case of rare species, a geographic region. Predictions of environmental gradients and habitat types together give an overall accurate description of a crime scene, but care should be taken when interpreting annotated sequences, e.g. due to erroneous assignments in GenBank. Our approach demonstrates that important habitat properties can be derived from soil eDNA, and exemplifies a range of potential applications of eDNA in forensic ecology.