Site-based data curation: bridging data collection protocols and curatorial processes at scientifically significant sites

Abstract

Research conducted at scientifically significant sites produces an abundance of important and highly valuable data. Yet, though sites are logical points for coordinating the curation of these data, their unique needs have been undersupported. Previous studies have shown that two principal stakeholder groups – scientific researchers and local resource managers – both need information that is most effectively collected and curated early in research workflows. However, well-designed site-based data curation interventions are necessary to accomplish this. Additionally, further research is needed to understand and align the data curation needs of researchers and resource managers, and to guide coordination of the data collection protocols used by researchers in the field and the data curation processes applied later by resource managers. This dissertation develops two case studies of research and curation at scientifically significant sites: geobiology at Yellowstone National Park and paleontology at the La Brea Tar Pits. The case studies investigate: What information do different stakeholders value about the natural sites at which they work? How do these values manifest in data collection protocols, curatorial processes, and infrastructures? And how are sometimes conflicting stakeholder priorities mediated through the use and development of shared information infrastructures? The case studies are developed through interviews with researchers and resource managers, as well as participatory methods to collaboratively develop “minimum information frameworks” – high-level models of the information needed by all stakeholders. Approaches from systems analysis are adapted to model data collection and curation workflows, identifying points of curatorial intervention early in the processes of generating and working with data.
Additionally, a general information model for site-based data collections is proposed with three classes of information documenting key aspects of the research project, a site’s structure, and individual specimens and measurements. This research contributes to our understanding of how data from scientifically significant sites can be aggregated, integrated, and reused over the long term, and how both researcher and resource manager needs can be reflected and supported during information modeling, workflow documentation, and the development of data infrastructure policy. It contributes prototypes of minimum information frameworks for both sites, as well as a general model that can serve as the basis for later site-based standards and infrastructure development.
