Scientific Data Collections: Use in Scholarly Communication and Implications for Data Curation

Abstract

127 p. Thesis (Ph.D.)--University of Illinois at Urbana-Champaign, 2009.

The landscape of scientific production and scholarly communication is changing: networked connectivity, the availability of data in digital forms, and the development of computing tools are helping to bring about changes in the way science is conducted and communicated. These changes are having an impact on the range of scientific production and communication activities, including data collection, data management, and the publication and dissemination of primary research materials. Practices related to data (and by extension, data collections) are integral to scientific information work, and include activities such as the collection, transformation, processing, management, sharing, preservation and archiving, accessing, and re-use of data. Understanding the nature of data-related practices and their relation to the production of scholarship is important for both theoretical and applied work in library and information science (LIS), as well as the emerging field of data curation. Data curation is the active and ongoing management of data through its lifecycle of interest and usefulness to scholarship, science, and education. If scientific data sets and collections are to be gathered and organized for long-term use, applicable theories will be needed to guide a variety of new and necessary practices for their management and preservation.

This dissertation concerns the development and use of shared scientific data collections (SDCs) and the roles and functions they perform in the conduct of scientific production and scholarly communication. Research on scholarly practices provides a foundation for the development of information systems, services, and tools to support the production of scholarship and science.
Scientific data collections in particular are essential to the conduct of 21st-century science, and the availability of primary research data is likely to restructure the knowledge formalization processes in those fields served by a shared SDC. In addition, the availability of publicly accessible data stores opens new possibilities to re-use data to investigate research questions beyond the original purposes for which the data were generated. However, problems related to the long-term management and preservation, or curation, of data collections are complex. Durable solutions and best practices are not yet settled, and while many data collections have the potential for wide scientific purpose or public appeal, there is as yet no framework for predicting which collections will be of most value to maintain for the long term.

Of particular interest are the community-based "Resource Collections" identified by the National Science Board (2005) in the Long-Lived Digital Data Collections report. At present, we have very little understanding of how they develop or the ways that they are used. It is anticipated that these collections will need consistent participation from domain scientists and data managers over the course of the data lifecycle, as curation activities will be integrally connected to the daily activities of research production. A scientific data collection from the neurosciences was selected as a case to analyze the features and characteristics of Resource Collections, and their significance for ongoing curation and stewardship in academic libraries.

Based on this research, it is evident that Resource Collections can have features beyond those described in the NSB report, with potential for considerable variation across this level of collection.
Analysis also shows that shared scientific data collections create an intersection of scientific production and scholarly communication, where they perform multiple roles, including data management, a data sharing space for collaborative work, and data publishing functions such as registration and certification. While it was anticipated that biologists would represent the most frequent collection users, end-use was predominantly by tool developers, informaticists, and computational scientists, which has implications for both dissemination and curation activities. Finally, as shared data collections are emerging as an integral and significant part of the scientific record, collection lifecycle stages are proposed and related to their curation and stewardship.

Limited. Restricted to the U of I community indefinitely during batch ingest of legacy ETD.
