1,385 research outputs found

    Digital Preservation, Archival Science and Methodological Foundations for Digital Libraries

    Digital libraries, whether commercial, public or personal, lie at the heart of the information society. Yet, research into their long‐term viability and the meaningful accessibility of their contents remains in its infancy. In general, as we have pointed out elsewhere, ‘after more than twenty years of research in digital curation and preservation the actual theories, methods and technologies that can either foster or ensure digital longevity remain startlingly limited.’ Research led by DigitalPreservationEurope (DPE) and the Digital Preservation Cluster of DELOS has allowed us to refine the key research challenges – theoretical, methodological and technological – that need attention by researchers in digital libraries during the coming five to ten years, if we are to ensure that the materials held in our emerging digital libraries remain sustainable, authentic, accessible and understandable over time. Building on this work and taking the theoretical framework of archival science as bedrock, this paper investigates digital preservation and its foundational role if digital libraries are to have long‐term viability at the centre of the global information society.

    RegenBase: a knowledge base of spinal cord injury biology for translational research.

    Spinal cord injury (SCI) research is a data-rich field that aims to identify the biological mechanisms resulting in loss of function and mobility after SCI, as well as develop therapies that promote recovery after injury. SCI experimental methods, data and domain knowledge are locked in the largely unstructured text of scientific publications, making large scale integration with existing bioinformatics resources and subsequent analysis infeasible. The lack of standard reporting for experiment variables and results also makes experiment replicability a significant challenge. To address these challenges, we have developed RegenBase, a knowledge base of SCI biology. RegenBase integrates curated literature-sourced facts and experimental details, raw assay data profiling the effect of compounds on enzyme activity and cell growth, and structured SCI domain knowledge in the form of the first ontology for SCI, using Semantic Web representation languages and frameworks. RegenBase uses consistent identifier schemes and data representations that enable automated linking among RegenBase statements and also to other biological databases and electronic resources. By querying RegenBase, we have identified novel biological hypotheses linking the effects of perturbagens to observed behavioral outcomes after SCI. RegenBase is publicly available for browsing, querying and download. Database URL: http://regenbase.org
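    The abstract's claim that consistent identifier schemes enable automated linking among statements can be sketched with plain subject-predicate-object triples. The URIs, predicates, and data below are invented for illustration and are not the actual RegenBase vocabulary; a minimal pattern-matching query suffices to follow a shared identifier from a compound's enzyme target to an outcome.

    ```python
    # Statements as (subject, predicate, object) triples; all identifiers
    # here are hypothetical stand-ins for a Semantic Web vocabulary.
    triples = [
        ("rb:compound/42", "rb:inhibits", "uniprot:P97924"),
        ("uniprot:P97924", "rb:associatedWith", "rb:outcome/axon_growth"),
        ("rb:compound/42", "rb:testedIn", "rb:assay/7"),
    ]

    def query(pattern, triples):
        """Return triples matching a pattern; None acts as a wildcard."""
        return [t for t in triples
                if all(p is None or p == v for p, v in zip(pattern, t))]

    # Follow the shared identifier uniprot:P97924 to link a compound's
    # enzyme inhibition to a behavioural outcome.
    hits = query(("rb:compound/42", "rb:inhibits", None), triples)
    targets = [obj for _, _, obj in hits]
    outcomes = [obj for t in targets
                for _, _, obj in query((t, "rb:associatedWith", None), triples)]
    print(outcomes)  # ['rb:outcome/axon_growth']
    ```

    Because both statements reuse the same identifier for the protein, the join needs no schema mapping — the essence of the linking the abstract describes.
    
    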

    Extracting, Transforming and Archiving Scientific Data

    It is becoming common to archive research datasets that are not only large but also numerous. In addition, their corresponding metadata and the software required to analyse or display them need to be archived. Yet the manual curation of research data can be difficult and expensive, particularly in very large digital repositories, hence the importance of models and tools for automating digital curation tasks. The automation of these tasks faces three major challenges: (1) research data and data sources are highly heterogeneous, (2) future research needs are difficult to anticipate, (3) data is hard to index. To address these problems, we propose the Extract, Transform and Archive (ETA) model for managing and mechanizing the curation of research data. Specifically, we propose a scalable strategy for addressing the research-data problem, ranging from the extraction of legacy data to its long-term storage. We review some existing solutions and propose novel avenues of research. Comment: 8 pages, Fourth Workshop on Very Large Digital Libraries, 201
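    The three-stage pipeline named in the abstract can be sketched in a few lines. The data shapes and functions below are assumptions for illustration only — the paper itself defines the ETA model; this merely shows how extraction from a heterogeneous record, transformation into a normalised representation with fixity metadata, and content-addressed archiving might chain together.

    ```python
    import hashlib
    import json

    def extract(record):
        """Pull fields of interest out of a heterogeneous legacy record."""
        return {"id": record["id"], "values": record.get("values", [])}

    def transform(data):
        """Normalise into an archival representation with a checksum."""
        payload = json.dumps(data, sort_keys=True)
        return {"payload": payload,
                "sha256": hashlib.sha256(payload.encode()).hexdigest()}

    def archive(item, store):
        """Store the payload under its checksum (content-addressed)."""
        store[item["sha256"]] = item["payload"]
        return item["sha256"]

    store = {}
    key = archive(transform(extract({"id": "run-1", "values": [1, 2, 3]})), store)
    print(key in store)  # True
    ```

    Keying the archive on a checksum gives deduplication and fixity checking for free, two properties long-term storage typically requires.
    
    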

    Lurking in the Lab: Analysis of Data from Molecular Biology Laboratory Instruments

    OBJECTIVE: This project examined primary research data files found on instruments in a molecular biology teaching laboratory. Experimental data files were analyzed in order to learn more about the types of data generated by these instruments (e.g. file formats), and to evaluate current laboratory data management practices. SETTING: This project examined experimental data files from instruments in a teaching laboratory at Brandeis University. METHODOLOGY: Experimental data files and associated metadata on instrument hard drives were captured and analyzed using Xplorer2 software. Formats were categorized as proprietary or open, and characteristics such as file naming conventions were noted. Discussions with the faculty member and lab staff guided the project scope and informed the findings. RESULTS: Files in both proprietary and open formats were found on the instrument hard drives. 62% of the experimental data files were in proprietary formats. Image files in various formats accounted for the most prevalent types of data found. Instrument users varied widely in their approaches to data management tasks such as file naming conventions. CONCLUSIONS: This study found inconsistent approaches to managing data on laboratory instruments. Prevalence of proprietary file formats is a concern with this type of data. Students express frustration in working with these data, and files in these proprietary formats could pose curation and preservation challenges in the future. Teaching labs afford an opportunity for librarians interested in learning more about primary research data and data management practices.
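    The format census the methodology describes — classifying captured files as proprietary or open and computing their share — reduces to an extension lookup. The extension lists and filenames below are illustrative assumptions, not the study's actual classification or data.

    ```python
    import os
    from collections import Counter

    # Hypothetical classification; real instrument formats vary by vendor.
    PROPRIETARY = {".fcs", ".sld", ".ab1"}
    OPEN = {".csv", ".tif", ".txt"}

    def classify(filename):
        """Bucket a file by extension: proprietary, open, or unknown."""
        ext = os.path.splitext(filename)[1].lower()
        if ext in PROPRIETARY:
            return "proprietary"
        if ext in OPEN:
            return "open"
        return "unknown"

    files = ["gel1.tif", "run_05.fcs", "plate.sld", "notes.txt", "seq.ab1"]
    counts = Counter(classify(f) for f in files)
    share = counts["proprietary"] / len(files)
    print(counts, f"{share:.0%} proprietary")  # 60% proprietary
    ```

    A real census would also record naming conventions and embedded metadata, but even this crude tally surfaces the proprietary-format prevalence the study flags as a preservation risk.
    
    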

    Engineering polymer informatics: Towards the computer-aided design of polymers

    The computer-aided design of polymers is one of the holy grails of modern chemical informatics and of significant interest for a number of communities in polymer science. The paper outlines a vision for the in silico design of polymers and presents an information model for polymers based on modern semantic web technologies, thus laying the foundations for achieving the vision.

    Supporting e-Science: Scientific Research Data Curation

    One outcome of the continuous development of science is the creation of new methods of scientific research, which in turn generate different types of research output, including research data. While significant attention is given to the preservation of journal articles, books, and papers published in conference proceedings, less attention is given to the preservation of research data. To enable the use of data accumulated in previous research projects in new scientific research, research data must be preserved. The activity of research data preservation is called data curation. Data curation has become necessary if science wants to avoid data loss. Unfortunately, science itself cannot take care of research data easily; it needs help from professionals such as librarians and archivists to preserve research data and enable its reuse in future scientific research. Although some good solutions to this problem already exist, such as storing research data in digital repositories, no final decision has been made about who will take responsibility for this kind of activity in the long run.

    Requirements for a global data infrastructure in support of CMIP6

    The World Climate Research Programme (WCRP)’s Working Group on Climate Modelling (WGCM) Infrastructure Panel (WIP) was formed in 2014 in response to the explosive growth in size and complexity of Coupled Model Intercomparison Projects (CMIPs) between CMIP3 (2005–2006) and CMIP5 (2011–2012). This article presents the WIP recommendations for the global data infrastructure needed to support CMIP design, future growth, and evolution. Developed in close coordination with those who build and run the existing infrastructure (the Earth System Grid Federation; ESGF), the recommendations are based on several principles beginning with the need to separate requirements, implementation, and operations. Other important principles include the consideration of the diversity of community needs around data – a data ecosystem – the importance of provenance, the need for automation, and the obligation to measure costs and benefits. This paper concentrates on requirements, recognizing the diversity of communities involved (modelers, analysts, software developers, and downstream users). Such requirements include the need for scientific reproducibility and accountability alongside the need to record and track data usage. One key element is to generate a dataset-centric rather than system-centric focus, with an aim to making the infrastructure less prone to systemic failure. With these overarching principles and requirements, the WIP has produced a set of position papers, which are summarized in the latter pages of this document. They provide specifications for managing and delivering model output, including strategies for replication and versioning, licensing, data quality assurance, citation, long-term archiving, and dataset tracking.
The paper concludes with a future facing consideration of the global data infrastructure evolution that follows from the blurring of boundaries between climate and weather, and the changing nature of published scientific results in the digital age.
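    The dataset-centric tracking the abstract calls for — versioning plus persistent identifiers that survive the failure of any one serving system — can be sketched as a tiny registry. The identifier format and record layout below are illustrative assumptions, not the actual ESGF or CMIP6 schema.

    ```python
    import uuid

    def publish(dataset_id, version, registry):
        """Register a dataset version under its own unique tracking id."""
        tracking_id = f"hdl:example/{uuid.uuid4()}"  # hypothetical handle prefix
        registry.setdefault(dataset_id, {})[version] = tracking_id
        return tracking_id

    registry = {}
    publish("mip.model.experiment.tas", "v20190101", registry)
    publish("mip.model.experiment.tas", "v20200615", registry)  # supersedes earlier version
    print(sorted(registry["mip.model.experiment.tas"]))
    # ['v20190101', 'v20200615']
    ```

    Because the identifier belongs to the dataset version rather than to the node that serves it, replicas and re-publications can be reconciled later — the property that makes the infrastructure "less prone to systemic failure" in the abstract's terms.
    
    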