
    Community next steps for making globally unique identifiers work for biocollections data

    Biodiversity data are being digitized and made available online at a rapidly increasing rate, but current practices typically do not preserve linkages between these data, which impedes interoperation, provenance tracking, and assembly of larger datasets. For data associated with biocollections, the biodiversity community has long recognized that an essential part of establishing and preserving linkages is to apply globally unique identifiers at the point when data are generated in the field and to persist these identifiers downstream, but this is seldom implemented in practice. There has been neither coalescence towards one single identifier solution (as in some other domains), nor even a set of recommended best practices and standards to support multiple identifier schemes sharing consistent responses. To make further progress towards a broader community consensus, a group of biocollections and informatics experts assembled in Stockholm in October 2014 to discuss community next steps for overcoming current roadblocks. The workshop participants divided into four groups focusing on: identifier practice in current field biocollections; identifier application for legacy biocollections; identifiers as applied to biodiversity data records as they are published and made available in semantically marked-up publications; and cross-cutting identifier solutions that bridge across these domains. The main outcome was consensus on key issues, including recognition of differences between legacy and new biocollections processes, the need for identifier metadata profiles that can report information on identifier persistence missions, and the unambiguous indication of the type of object associated with the identifier. Current identifier characteristics are also summarized, and an overview of available schemes and practices is provided.
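
    The workshop's call for identifier metadata profiles suggests a concrete data shape. Below is a minimal sketch of what such a profile might carry; the field names are assumptions for illustration, not a published standard.

```python
# A minimal sketch of an identifier metadata profile, assuming
# illustrative field names; nothing here is a published standard.
from dataclasses import dataclass
from enum import Enum

class ObjectType(Enum):
    # Unambiguous indication of the type of object behind the identifier
    PHYSICAL_SPECIMEN = "physical-specimen"
    DIGITAL_RECORD = "digital-record"
    COLLECTION_EVENT = "collection-event"

@dataclass
class IdentifierProfile:
    guid: str                    # the globally unique identifier itself
    scheme: str                  # e.g. "uuid", "doi", "http-uri"
    object_type: ObjectType      # what kind of object the GUID denotes
    issuing_org: str             # who minted and maintains the identifier
    persistence_statement: str   # the issuer's persistence "mission"

profile = IdentifierProfile(
    guid="urn:uuid:0f8fad5b-d9cb-469f-a165-70867728950e",
    scheme="uuid",
    object_type=ObjectType.PHYSICAL_SPECIMEN,
    issuing_org="Example Herbarium",
    persistence_statement="Resolvable for at least 25 years",
)
print(profile.object_type.value)
```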

    DRIVER Technology Watch Report

    This report is part of the Discovery Workpackage (WP4) and is the third of four deliverables. Its objective is to give an overview of the latest technical developments in the world of digital repositories, digital libraries and beyond, in order to serve as theoretical and practical input for the technical DRIVER developments, especially those focused on enhanced publications. The report consists of two main parts: one focuses on interoperability standards for enhanced publications; the other consists of three subchapters that give a landscape picture of current and emerging technologies and communities crucial to DRIVER, namely the GRID, CRIS and LTP communities and technologies. Every chapter contains a theoretical explanation, followed by case studies and the outcomes and opportunities for DRIVER in this field.
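
    DRIVER-style repository interoperability rests on metadata harvesting via OAI-PMH. As a hedged illustration of that exchange pattern, the sketch below lists record identifiers from a repository endpoint; the base URL is a placeholder, not an actual DRIVER service.

```python
# A hedged sketch of OAI-PMH harvesting, the metadata-exchange protocol
# underlying DRIVER-style repository interoperability. The endpoint URL
# is a placeholder, not an actual DRIVER service.
import urllib.request
import xml.etree.ElementTree as ET

BASE_URL = "https://repository.example.org/oai"  # hypothetical endpoint
OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"

def list_record_identifiers(metadata_prefix: str = "oai_dc"):
    """Fetch one page of records and yield their OAI identifiers."""
    url = f"{BASE_URL}?verb=ListRecords&metadataPrefix={metadata_prefix}"
    with urllib.request.urlopen(url) as resp:
        root = ET.parse(resp).getroot()
    for header in root.iter(OAI_NS + "header"):
        yield header.findtext(OAI_NS + "identifier")

for oai_id in list_record_identifiers():
    print(oai_id)
```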

    Persistent Identifiers and Sharing of Digital Information About Scientific Specimens

    Using persistent identifiers (PIDs) in the production and sharing of digital data about scientific specimens serves an overarching goal: enabling the creation of relationships between data. The assignment of unique PIDs is an essential step for enabling findability and accessibility of digital data under the FAIR data principles. Implementation of the digital extended specimen links the digital object record with associated and derived specimen parts and research data. Linking to atomized information such as collection event, collector, locality, collection, institution, taxon (identification), people involved in analyzing and processing the specimen, other related specimens, and many other subsamples and derived and related data can be accomplished with a system incorporating numerous types of unique PIDs. These many identifiers need to be maintained by organizations to prevent broken links and to provide redirects for older identifiers. While community development of best practice is influenced by experts in digital data architecture, it must also take account of challenges rooted in the history of data sharing about scientific specimens. The development of identifier systems and normalization around digital object structure and vocabulary needs to accommodate the needs of managers of diverse collections. Most providers work with a collection management system, under limitations imposed by past decisions and by limited time and finances, so data sharing practices should address these issues to encourage compliance. This paper uses a combination of a literature review and several interviews with workers in the field to explore community collaboration, PIDs, and increased mobilization of shared data.
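
    The maintenance point above, that organizations must prevent broken links and redirect older identifiers, can be illustrated with a single resolution step. A minimal sketch follows, assuming an HTTP-resolvable PID; the DOI shown is a commonly used example identifier, not one from the paper.

```python
# A minimal sketch of PID resolution, assuming an HTTP-resolvable
# identifier; the DOI below is a commonly used example DOI, not one
# taken from the paper.
import urllib.request

def resolve_pid(pid_url: str) -> str:
    """Follow redirects (e.g. from a superseded identifier) and
    return the URL of the current landing resource."""
    req = urllib.request.Request(pid_url, method="HEAD")
    with urllib.request.urlopen(req) as resp:
        return resp.geturl()  # final URL after any 30x redirects

print(resolve_pid("https://doi.org/10.5555/12345678"))
```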

    Escaping the Big Brother: an empirical study on factors influencing identification and information leakage on the Web

    This paper presents a study of factors that may increase the risk of personal information leakage due to the possibility of connecting user profiles that are not explicitly linked together. First, we introduce a technique for user identification based on cross-site checking and linking of user attributes. Then, we describe the experimental evaluation of the identification technique both in a real setting and on an online sample, showing its accuracy in discovering unknown personal data. Finally, we combine the results on the accuracy of identification with the results of a questionnaire completed by the same subjects who performed the test in the real setting. The aim of the study was to discover factors that make users vulnerable to this kind of technique. We found that the number of social networks used, their features, and especially the number of profiles abandoned and forgotten by the user are factors that increase the likelihood of identification and the privacy risks.
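
    The abstract does not specify the linking technique itself, so the sketch below is an illustrative stand-in only: it links two profiles when the Jaccard overlap of their public attributes crosses a threshold. All attribute values and the threshold are invented for the example.

```python
# An illustrative stand-in, not the paper's actual algorithm: link two
# profiles when the Jaccard overlap of their public attributes crosses
# a threshold. Attribute values and the threshold are invented.
def jaccard(a: set, b: set) -> float:
    return len(a & b) / len(a | b) if a | b else 0.0

profile_site1 = {"alice_92", "Alice", "Turin", "photography"}
profile_site2 = {"alice_92", "Alice R.", "Turin", "hiking"}

score = jaccard(profile_site1, profile_site2)
LINK_THRESHOLD = 0.3  # hypothetical tuning parameter
print(f"similarity={score:.2f}, linked={score >= LINK_THRESHOLD}")
```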

    Online Journalism: Crowdsourcing, and Media Websites in an Era of Participation

    The era of participation, in which readers contribute to online media websites, has changed online journalism. Research interest now focuses on removing the distinction between the publisher/entrepreneur and the journalist/user, with the ultimate goal of actively involving citizens in the journalistic process and in the web presence of media websites. The evolution of technology, the deep media crisis, and the growing dissatisfaction of citizens create the conditions for journalism to work with citizens, in particular through citizen journalism and journalism crowdsourcing. Crowdsourcing is a form of collective online activity in which a person or a group of people volunteer to engage in work that involves mutual benefit to both sides. The main research question concerns the analysis of the current situation regarding crowdsourcing, co-creation and user-generated content (UGC), and the adoption by media websites around the world of practices such as crowdcreation, user comments, crowdwisdom, mobile instant-messaging applications (MIMs) and crowdvoting. Even today, very few media outlets have tried to apply the proposed model of journalism, which this study investigates. The results of the study shape new perspectives and practices for online journalism and democracy.

    Digital Extended Specimens: Enabling an Extensible Network of Biodiversity Data Records as Integrated Digital Objects on the Internet

    The early twenty-first century has witnessed massive expansions in availability and accessibility of digital data in virtually all domains of the biodiversity sciences. Led by an array of asynchronous digitization activities spanning ecological, environmental, climatological, and biological collections data, these initiatives have resulted in a plethora of mostly disconnected and siloed data, leaving to researchers the tedious and time-consuming manual task of finding and connecting them in usable ways, integrating them into coherent data sets, and making them interoperable. The focus to date has been on elevating analog and physical records to digital replicas in local databases prior to elevating them to ever-growing aggregations of essentially disconnected discipline-specific information. In the present article, we propose a new interconnected network of digital objects on the Internet—the Digital Extended Specimen (DES) network—that transcends existing aggregator technology, augments the DES with third-party data through machine algorithms, and provides a platform for more efficient research and robust interdisciplinary discovery.
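
    To make the "integrated digital object" idea concrete, the sketch below models a DES as an identified record that accumulates machine-added links to third-party data. Class and attribute names are assumptions for exposition, not the DES specification.

```python
# A sketch of a Digital Extended Specimen as an identified digital
# object that accumulates links to derived and third-party data.
# Class and attribute names are assumptions, not the DES specification.
from dataclasses import dataclass, field

@dataclass
class DigitalExtendedSpecimen:
    pid: str                 # persistent identifier of the digital object
    specimen_record: dict    # the digitized specimen core record
    linked_data: dict = field(default_factory=dict)  # PID -> relation

    def attach(self, other_pid: str, relation: str) -> None:
        """Record a machine-resolvable link to e.g. a sequence,
        image, or publication that extends this specimen."""
        self.linked_data[other_pid] = relation

des = DigitalExtendedSpecimen(
    pid="https://pid.example.org/des/123",  # hypothetical PID
    specimen_record={"scientificName": "Quercus robur"},
)
des.attach("https://pid.example.org/seq/456", "hasSequence")
print(des.linked_data)
```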

    Sample data processing in an additive and reproducible taxonomic workflow by using character data persistently linked to preserved individual specimens

    We present the model and implementation of a workflow that blazes a trail in systematic biology for the re-usability of character data (data on any kind of character of the pheno- and genotypes of organisms) and their additivity from specimen to taxon level. We take into account that any taxon characterization is based on a limited set of sampled individuals and characters, and that consequently any new individual and any new character may affect the recognition of biological entities and/or the subsequent delimitation and characterization of a taxon. Taxon concepts thus frequently change during the knowledge generation process in systematic biology. Structured character data are therefore needed not only for the knowledge generation process but also for easily adapting characterizations of taxa. We aim to facilitate the construction and reproducibility of taxon characterizations from structured character data of changing sample sets by establishing a stable and unambiguous association between each sampled individual and the data processed from it. Our workflow implementation uses the European Distributed Institute of Taxonomy Platform, a comprehensive taxonomic data management and publication environment, to: (i) establish a reproducible connection between sampled individuals and all samples derived from them; (ii) stably link sample-based character data with the metadata of the respective samples; (iii) record and store structured specimen-based character data in formats allowing data exchange; (iv) reversibly assign sample metadata and character datasets to taxa in an editable classification and display them; and (v) organize data exchange via standard exchange formats and enable the link between the character datasets and samples in research collections, ensuring high visibility and instant re-usability of the data. The workflow implemented will contribute to organizing the interface between phylogenetic analysis and revisionary taxonomic or monographic work.
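
    The additivity mechanism can be pictured as follows: character data remain bound to individual specimens, and any taxon characterization is recomputed from the specimens currently assigned to that taxon. The sketch below illustrates only this idea, with invented names; the actual implementation is the EDIT Platform workflow described in the paper.

```python
# A hedged sketch of the additivity idea only: character data stay
# attached to individual specimens, and a taxon characterization is
# recomputed from whichever specimens are currently assigned to it.
# Names are illustrative; the paper's implementation is the EDIT
# Platform, not this code.
from collections import defaultdict

specimen_characters = {  # specimen PID -> {character: observed state}
    "spec-001": {"leaf shape": "ovate", "petal count": "5"},
    "spec-002": {"leaf shape": "ovate", "petal count": "4-5"},
}
taxon_assignment = {"spec-001": "Taxon A", "spec-002": "Taxon A"}

def characterize(taxon: str) -> dict:
    """Aggregate the states observed across all specimens
    currently assigned to the taxon."""
    states = defaultdict(set)
    for pid, assigned in taxon_assignment.items():
        if assigned == taxon:
            for character, state in specimen_characters[pid].items():
                states[character].add(state)
    return dict(states)

print(characterize("Taxon A"))
# Reassigning spec-002 and rerunning reproduces updated
# characterizations for both taxa.
```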

    Monitoring the open access policy of Horizon 2020

    This study is framed within the context of the contract ‘Monitoring the open access policy of Horizon 2020 – RTD/2019/SC/021’. It reports an authoritative set of metrics for compliance with the European Commission open access mandate within the Framework Programme thus far, and provides advice on how to systematically monitor compliance in the future. Open access requirements for publications under Horizon 2020 are set out in Article 29.2 of the Horizon 2020 Model Grant Agreement (MGA). Regarding open access to research data, the Commission is conducting the Horizon 2020 Open Research Data Pilot (ORDP). The ORDP aims to improve and maximise access to, and reuse of, research data generated by Horizon 2020 projects, balancing the need for openness with the protection of intellectual property rights, privacy concerns, security and commercialisation, as well as questions of data management and preservation. The present study aims to examine, monitor and quantify compliance with the open access requirements of the MGA, for both publications and research data. The study concludes with specific recommendations to improve the monitoring of compliance with the policy under Horizon Europe, together with an assessment of the efficiency and effectiveness of the Horizon 2020 open access policy. The key findings of this study indicate that the European Commission’s leadership in Open Science policy has paid off. Compliance has steadily increased over recent years, achieving a success rate that places the European Commission at the forefront globally (83% open access to scientific publications). The study also makes apparent that monitoring – particularly with regard to the specific terms of the policy – cannot be achieved by self-reporting alone, or without the European Commission collaborating closely with other funding agencies across Europe and beyond to agree on common standards and the common elements of the underlying infrastructure. In particular, the European Open Science Cloud (EOSC) should encompass all components needed to foster a linked ecosystem in which information is exchanged on demand, easing the process for both researchers (who need only deposit once) and funders (who need only record information once).
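
    The headline metric is straightforward to compute once publication-level open access status is known; the toy sketch below shows the calculation on invented placeholder records.

```python
# A toy sketch of the headline compliance metric: the share of
# publications with an open access full text. The records below are
# invented placeholders, not data from the study.
publications = [
    {"id": "pub-1", "open_access": True},
    {"id": "pub-2", "open_access": True},
    {"id": "pub-3", "open_access": False},
]

compliant = sum(p["open_access"] for p in publications)
rate = compliant / len(publications)
print(f"open access compliance: {rate:.0%}")  # the study reports 83%
```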