6,200 research outputs found

    Online Scientific Data Curation, Publication, and Archiving

    Get PDF
    Science projects are data publishers. The scale and complexity of current and future science data change the nature of the publication process. Publication is becoming a major project component. At a minimum, a project must preserve the ephemeral data it gathers. Derived data can be reconstructed from metadata, but metadata is ephemeral. Longer term, a project should expect some archive to preserve the data. We observe that published scientific data needs to be available forever: this gives rise to the data pyramid of versions and to data inflation, where the derived data volumes explode. As an example, this article describes the Sloan Digital Sky Survey (SDSS) strategies for data publication, data access, curation, and preservation.
    Comment: original at http://research.microsoft.com/scripts/pubs/view.asp?TR_ID=MSR-TR-2002-7
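    A minimal sketch of the data-inflation effect the abstract describes, using made-up volumes rather than SDSS figures: if each release republishes all earlier data alongside the new data, and published data must stay available forever, the total archived volume grows far faster than the survey itself.

        # Illustrative only: hypothetical per-release raw-data growth in terabytes.
        raw_tb = [5, 10, 20, 40, 80]

        release_size = 0.0   # size of the current cumulative release
        archived = 0.0       # everything ever published stays online
        for year, tb in enumerate(raw_tb, start=1):
            release_size += tb
            archived += release_size
            print(f"release {year}: {release_size:.0f} TB current, "
                  f"{archived:.0f} TB archived in total")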

    Enforcing public data archiving policies in academic publishing: A study of ecology journals

    Full text link
    To improve the quality and efficiency of research, groups within the scientific community seek to exploit the value of data sharing. Funders, institutions, and specialist organizations are developing and implementing strategies to encourage or mandate data sharing within and across disciplines, with varying degrees of success. Academic journals in ecology and evolution have adopted several types of public data archiving policies requiring authors to make data underlying scholarly manuscripts freely available. Yet anecdotes from the community and studies evaluating data availability suggest that these policies have not achieved the desired effects, in terms of either the quantity or the quality of available datasets. We conducted a qualitative, interview-based study with journal editorial staff and other stakeholders in the academic publishing process to examine how journals enforce data archiving policies. We specifically sought to establish who editors and other stakeholders perceive as responsible for ensuring data completeness and quality in the peer review process. Our analysis revealed little consensus with regard to how data archiving policies should be enforced and who should hold authors accountable for dataset submissions. Themes in interviewee responses included hopefulness that reviewers would take the initiative to review datasets and trust in authors to ensure the completeness and quality of their datasets. We highlight problematic aspects of these thematic responses and offer potential starting points for improvement of the public data archiving process.
    Comment: 35 pages, 1 figure, 1 table

    Citation and peer review of data: moving towards formal data publication

    Get PDF
    This paper discusses many of the issues associated with formally publishing data in academia, focusing primarily on the structures that need to be put in place for peer review and formal citation of datasets. Data publication is becoming increasingly important to the scientific community, as it will provide a mechanism for those who create data to receive academic credit for their work and will allow the conclusions arising from an analysis to be more readily verifiable, thus promoting transparency in the scientific process. Peer review of data will also provide a mechanism for ensuring the quality of datasets, and we provide suggestions on the types of activities one expects to see in the peer review of data. A simple taxonomy of data publication methodologies is presented and evaluated, and the paper concludes with a discussion of dataset granularity, transience, and semantics, along with a recommended human-readable citation syntax.
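    The paper's recommended citation syntax is not reproduced in the abstract, so the sketch below assembles a generic human-readable dataset citation from the elements such syntaxes typically cover (creators, year, title, version, repository, persistent identifier); the names and DOI are placeholders, and the element order follows common DataCite-style practice rather than the paper's specific recommendation.

        def cite_dataset(creators, year, title, version, repository, doi):
            """Assemble a generic human-readable dataset citation (illustrative)."""
            authors = "; ".join(creators)
            return (f"{authors} ({year}). {title} (Version {version}) "
                    f"[Data set]. {repository}. https://doi.org/{doi}")

        print(cite_dataset(["Doe, J.", "Roe, R."], 2011,
                           "Example observational dataset", "1.0",
                           "Example Data Centre", "10.0000/example"))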

    Integrating Research Data Management into Geographical Information Systems

    Full text link
    Ocean modelling requires the production of high-fidelity computational meshes upon which to solve the equations of motion. The production of such meshes by hand is often infeasible, considering the complexity of the bathymetry and coastlines. The use of Geographical Information Systems (GIS) is therefore a key component in discretising the region of interest and producing a mesh appropriate to resolve the dynamics. However, all data associated with the production of a mesh must be provided in order to contribute to the overall recomputability of the subsequent simulation. This work presents the integration of research data management in QMesh, a tool for generating meshes using GIS. The tool uses the PyRDM library to provide a quick and easy way for scientists to publish meshes, and all data required to regenerate them, to persistent online repositories. These repositories are assigned unique identifiers to enable proper citation of the meshes in journal articles.
    Comment: Accepted, camera-ready version. To appear in the Proceedings of the 5th International Workshop on Semantic Digital Archives (http://sda2015.dke-research.de/), held in Poznań, Poland on 18 September 2015 as part of the 19th International Conference on Theory and Practice of Digital Libraries (http://tpdl2015.info/)
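    PyRDM's own interface is not detailed in the abstract, so rather than guess at it, the sketch below illustrates the underlying workflow (deposit the mesh together with the inputs needed to regenerate it in a persistent repository, and get back a citable identifier) using Zenodo's public REST deposit API; the token, filenames, and metadata are placeholders.

        import requests

        ZENODO = "https://zenodo.org/api"
        TOKEN = "YOUR-ZENODO-TOKEN"  # placeholder credential

        # Create an empty deposition.
        r = requests.post(f"{ZENODO}/deposit/depositions",
                          params={"access_token": TOKEN}, json={})
        r.raise_for_status()
        dep_id = r.json()["id"]

        # Upload the mesh plus everything required to regenerate it (placeholder names).
        for fname in ["domain.msh", "coastline.shp", "generate_mesh.py"]:
            with open(fname, "rb") as fh:
                requests.post(f"{ZENODO}/deposit/depositions/{dep_id}/files",
                              params={"access_token": TOKEN},
                              data={"name": fname},
                              files={"file": fh}).raise_for_status()

        # Attach minimal metadata, publish, and print the citable DOI.
        meta = {"metadata": {"title": "Ocean model mesh and regeneration inputs",
                             "upload_type": "dataset",
                             "description": "Mesh plus the data used to build it.",
                             "creators": [{"name": "Doe, Jane"}]}}
        requests.put(f"{ZENODO}/deposit/depositions/{dep_id}",
                     params={"access_token": TOKEN}, json=meta).raise_for_status()
        r = requests.post(f"{ZENODO}/deposit/depositions/{dep_id}/actions/publish",
                          params={"access_token": TOKEN})
        print(r.json().get("doi"))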

    Sharing Qualitative and Qualitative Longitudinal Data in the UK: Archiving Strategies and Development

    Get PDF
    Over the past two decades, significant developments have occurred in the archiving of qualitative data in the UK. The first national archive for qualitative resources, Qualidata, was established in 1994. Since that time, further scientific reviews have supported the expansion of data resources for qualitative and qualitative longitudinal (QL) research in the UK and fuelled the development of a new ethos of data sharing and re-use among qualitative researchers. These have included the Timescapes Study and Archive, an initiative funded from 2007 to scale up QL research and create a specialist resource of QL data for sharing and re-use. These trends are part of a wider movement to enhance the status of research data in all their diverse forms, inculcate an ethos of data sharing, and develop infrastructure to facilitate data discovery and re-use. In this paper we trace the history of these developments and provide an overview of data policy initiatives that have set out to advance data sharing in the UK. The paper reveals a mixed infrastructure for qualitative and QL data resources in the UK, and explores the value of this, along with the implications for managing and co-ordinating resources across a complex network. The paper concludes with some suggestions for developing this mixed infrastructure to further support data sharing and re-use in the UK and beyond.

    The selection, appraisal and retention of digital scientific data: highlights of an ERPANET/CODATA workshop

    Get PDF
    CODATA and ERPANET collaborated to convene an international archiving workshop on the selection, appraisal, and retention of digital scientific data, which was held on 15-17 December 2003 at the Biblioteca Nacional in Lisbon, Portugal. The workshop brought together more than 65 researchers, data and information managers, archivists, and librarians from 13 countries to discuss the issues involved in making critical decisions regarding the long-term preservation of the scientific record. One of the major aims for this workshop was to provide an international forum to exchange information about data archiving policies and practices across different scientific, institutional, and national contexts. Highlights from the workshop discussions are presented.

    Keeping Research Data Safe 2: Final Report

    Get PDF
    The first Keeping Research Data Safe study, funded by JISC, made a major contribution to the understanding of long-term preservation costs for research data by developing a cost model and identifying cost variables for preserving research data in UK universities (Beagrie et al., 2008). However, it was completed over a very constrained timescale of four months, with little opportunity to follow up other major issues or sources of preservation cost information it identified. It noted that digital preservation costs are notoriously difficult to address, in part because of the absence of good case studies and longitudinal information for digital preservation costs or cost variables. In January 2009 JISC issued an ITT (invitation to tender) for a study on the identification of long-lived digital datasets for the purposes of cost analysis. The aim of this work was to provide a larger body of material and evidence against which existing and future data preservation cost modelling exercises could be tested and validated. The proposal for the KRDS2 study was submitted in response by a consortium consisting of four partners involved in the original Keeping Research Data Safe study (Universities of Cambridge and Southampton, Charles Beagrie Ltd, and OCLC Research) and four new partners with significant data collections and interests in preservation costs (Archaeology Data Service, University of London Computer Centre, University of Oxford, and the UK Data Archive). A range of supplementary materials in support of this main report have been made available on the KRDS2 project website at http://www.beagrie.com/jisc.php. That website will be maintained and continuously updated with future work as a resource for KRDS users.
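    As a toy illustration of what a preservation cost model quantifies (this is not the KRDS model; every parameter below is invented), total cost can be split into a one-off ingest component and a recurring storage-and-curation component whose unit price is assumed to decline each year:

        # Toy cost sketch with invented parameters, not the KRDS cost model.
        def preservation_cost(tb, years, ingest_per_tb=500.0,
                              storage_per_tb_year=100.0, annual_decline=0.15):
            total = tb * ingest_per_tb            # one-off acquisition/ingest
            unit = storage_per_tb_year
            for _ in range(years):
                total += tb * unit                # recurring storage and curation
                unit *= 1 - annual_decline        # assumed yearly price decline
            return total

        print(f"20-year cost for 10 TB: {preservation_cost(10, 20):,.0f} units")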

    Archiving Software Surrogates on the Web for Future Reference

    Full text link
    Software has long been established as an essential aspect of the scientific process in mathematics and other disciplines. However, reliably referencing software in scientific publications is still challenging for various reasons. A crucial factor is that the dynamics of software, its changing versions and states, are difficult to capture over time. We propose to archive and reference surrogates instead, which can be found on the Web and reflect the actual software to a remarkable extent. Our study shows that about half of the webpages of software are already archived, with almost all of them including some kind of documentation.
    Comment: TPDL 2016, Hannover, Germany
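    As a hedged illustration of the kind of check such a study involves (the paper's actual crawling pipeline is not described in the abstract), the Internet Archive's public Wayback availability endpoint can report whether a software webpage already has an archived snapshot; the URLs queried below are arbitrary examples.

        import requests

        def wayback_snapshot(url):
            """Return the closest archived snapshot URL for `url`, or None."""
            r = requests.get("https://archive.org/wayback/available",
                             params={"url": url}, timeout=30)
            r.raise_for_status()
            snap = r.json().get("archived_snapshots", {}).get("closest")
            return snap["url"] if snap and snap.get("available") else None

        for page in ["https://www.gnu.org/software/octave/",
                     "https://example.org/some-research-software"]:
            print(page, "->", wayback_snapshot(page))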

    Archiving Web Site Resources: A Records Management View

    Get PDF
    In this paper, we propose the use of records management principles to identify and manage Web site resources with enduring value as records. Current Web archiving activities, collaborative or organisational, whilst extremely valuable in their own right, often do not and cannot incorporate requirements for proper records management. Material collected under such initiatives therefore may not be reliable or authentic from a legal or archival perspective, with insufficient metadata collected about the object during its active life, and valuable materials destroyed whilst ephemeral items are maintained. Education, training, and collaboration between stakeholders are integral to avoiding these risks and successfully preserving valuable Web-based materials.

    Mandated data archiving greatly improves access to research data

    Full text link
    The data underlying scientific papers should be accessible to researchers both now and in the future, but how best can we ensure that these data are available? Here we examine the effectiveness of four approaches to data archiving: no stated archiving policy, recommending (but not requiring) archiving, and two versions of mandating data deposition at acceptance. We control for differences between data types by trying to obtain data from papers that use a single, widespread population genetic analysis, STRUCTURE. At one extreme, we found that mandated data archiving policies that require the inclusion of a data availability statement in the manuscript improve the odds of finding the data online almost a thousand-fold compared to having no policy. However, archiving rates at journals with less stringent policies were only very slightly higher than those with no policy at all. We also assessed the effectiveness of asking for data directly from authors and obtained over half of the requested datasets, albeit with a delay of about 8 days and some disagreement with authors. Given the long-term benefits of data accessibility to the academic community, we believe that journal-based mandatory data archiving policies and mandatory data availability statements should be more widely adopted.
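    A minimal sketch of the odds-ratio comparison behind the "almost a thousand-fold" figure, using invented counts rather than the study's data:

        # Hypothetical counts (found online, not found) per policy group;
        # not the study's data, just an odds-ratio illustration.
        found_mandated, missing_mandated = 95, 5
        found_none, missing_none = 2, 98

        odds_mandated = found_mandated / missing_mandated   # 19.0
        odds_none = found_none / missing_none               # ~0.0204
        print(f"odds ratio: {odds_mandated / odds_none:.0f}x")  # ~931x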