101 research outputs found

    Niche Modeling: Ecological Metaphors for Sustainable Software in Science

    Full text link
    This position paper provides some history of, and provocations for, the use of an ecological metaphor to describe software development environments. We do not claim that the ecological metaphor is the best or only way of looking at software; rather, we ask whether it can be a productive and thought-provoking one. Comment: Position paper submitted to the Workshop on Sustainable Software for Science: Practice and Experiences (WSSSPE) at SC13, Sunday, 17 November 2013, Denver, CO, US.

    Reconstructing the development of legacy database platforms

    Get PDF
    Through the “Migrating Research Data Collections” project, we are working to better understand and support data collection migration and change over time. We are developing case studies of database migration at a number of memory institutions, focusing first on Natural History Museums (NHMs; Thomer, Rayburn, and Tyler, 2020; Thomer, Weber, and Twidale, 2018). In developing our case studies, we have found that there are surprisingly few published histories of database platforms and software – particularly for domain- or museum-specific platforms. Additionally, documentation of the function and format of legacy systems can be quite hard to find. This has motivated our efforts to reconstruct the development history of the data systems described by our study participants in interviews. The timeline presented on this poster was developed through review of academic publications (if any) and other documentation of these data systems. Much of this was found through digital archival research (e.g., the Internet Archive’s Wayback Machine). The full dataset, with references, underlying this timeline is available at bit.ly/36vaEoM. We are using this resource to contextualize the evolution of the data systems in each of our case studies, and we hope this work will also be of interest to others studying infrastructure development and change. IMLS # RE-07-18-0118-18. Poster: http://deepblue.lib.umich.edu/bitstream/2027.42/162598/1/IDCC2020_FINALTOPRINT.pdf
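
    A sketch of the kind of digital archival query this reconstruction work involves: listing archived captures of a platform's website via the Internet Archive's Wayback Machine CDX API. The endpoint and parameters are real; the example domain and date range are illustrative, not drawn from the project's dataset.

```python
# Query the Wayback Machine CDX API for archived captures of a site.
import json
from urllib.request import urlopen
from urllib.parse import urlencode

def list_snapshots(site_url: str, year_from: str, year_to: str):
    """Return (timestamp, original_url) pairs for archived captures of site_url."""
    params = urlencode({
        "url": site_url,
        "output": "json",
        "from": year_from,
        "to": year_to,
        "filter": "statuscode:200",  # keep successful captures only
        "collapse": "timestamp:4",   # at most one capture per year
    })
    with urlopen(f"https://web.archive.org/cdx/search/cdx?{params}") as resp:
        rows = json.load(resp)
    if not rows:                     # no captures found
        return []
    header, captures = rows[0], rows[1:]
    ts, orig = header.index("timestamp"), header.index("original")
    return [(row[ts], row[orig]) for row in captures]

# e.g. list_snapshots("example-collections-platform.org", "2000", "2020")
```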

    The craft of database curation: Taking cues from quiltmaking

    Get PDF
    Data migration within library, archive, and museum collections is a critical process for maintaining collection data and ensuring its availability for future users. This work is also an under-supported component of digital curation. In this poster we present findings from 20 semi-structured interviews with archivists, collection managers, and curators who have recently completed a data migration. One of our main findings is the similarity between craft work and migration practices in memory institutions. To demonstrate these similarities, we use quiltmaking as a framework. The similarities include the practice of piecing multiple systems together to complete a workflow, reliance on community collaboration, and inter-generational labor. Our hope is that by highlighting the craftful qualities already embedded in this work, we can show alternative best practices for data migration and database management, in an effort to reach a broader understanding of what a successful data migration can look like.

    Site-based data curation: bridging data collection protocols and curatorial processes at scientifically significant sites

    Get PDF
    Research conducted at scientifically significant sites produces an abundance of important and highly valuable data. Yet, though sites are logical points for coordinating the curation of these data, their unique needs have been under-supported. Previous studies have shown that two principal stakeholder groups – scientific researchers and local resource managers – both need information that is most effectively collected and curated early in research workflows. However, well-designed site-based data curation interventions are necessary to accomplish this. Additionally, further research is needed to understand and align the data curation needs of researchers and resource managers, and to guide coordination of the data collection protocols used by researchers in the field and the data curation processes applied later by resource managers. This dissertation develops two case studies of research and curation at scientifically significant sites: geobiology at Yellowstone National Park and paleontology at the La Brea Tar Pits. The case studies investigate: What information do different stakeholders value about the natural sites at which they work? How do these values manifest in data collection protocols, curatorial processes, and infrastructures? And how are sometimes conflicting stakeholder priorities mediated through the use and development of shared information infrastructures? The case studies are developed through interviews with researchers and resource managers, as well as participatory methods to collaboratively develop “minimum information frameworks” – high-level models of the information needed by all stakeholders. Approaches from systems analysis are adapted to model data collection and curation workflows, identifying points of curatorial intervention early in the processes of generating and working with data. Additionally, a general information model for site-based data collections is proposed, with three classes of information documenting key aspects of the research project, a site’s structure, and individual specimens and measurements. This research contributes to our understanding of how data from scientifically significant sites can be aggregated, integrated, and reused over the long term, and how the needs of both researchers and resource managers can be reflected and supported during information modeling, workflow documentation, and the development of data infrastructure policy. It contributes prototypes of minimum information frameworks for both sites, as well as a general model that can serve as the basis for later site-based standards and infrastructure development.
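
    A minimal sketch of the proposed three-class information model, expressed as Python dataclasses. The three classes (project, site structure, specimen/measurement) come from the abstract; all field names are illustrative assumptions, not the dissertation's actual schema.

```python
from dataclasses import dataclass, field

@dataclass
class ResearchProject:
    """Key aspects of the research project (illustrative fields)."""
    title: str
    investigators: list[str]
    collection_protocol: str              # field protocol used to collect data

@dataclass
class SiteStructure:
    """A site's structure; sites can nest (park -> locality -> feature)."""
    site_name: str
    parent_site: "SiteStructure | None" = None
    coordinates: tuple[float, float] | None = None

@dataclass
class SpecimenRecord:
    """An individual specimen and its measurements."""
    specimen_id: str
    site: SiteStructure
    project: ResearchProject
    measurements: dict[str, float] = field(default_factory=dict)
```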

    Complications in Climate Data Classification: The Political and Cultural Production of Variable Names

    Get PDF
    Model intercomparison projects are a unique and highly specialized form of data-intensive collaboration in the earth sciences. Typically, a set of pre-determined boundary conditions (scenarios) is agreed upon by a community of model developers, who then test and simulate each of those scenarios with individual ‘runs’ of a climate model. Because both the human expertise and the computational power needed to produce an intercomparison project are exceptionally expensive, the data they produce are often archived for the broader climate science community to use in future research. Outside of high-energy physics and astronomy sky surveys, climate modeling intercomparisons are among the largest and most rapid methods of producing data in the natural sciences (Overpeck et al., 2010). But, like any collaborative eScience project, the discovery and broad accessibility of these data depend on classifications and categorizations in the form of structured metadata – namely the Climate and Forecast (CF) metadata standard, which provides a controlled vocabulary to normalize the naming of a dataset’s variables. Intriguingly, the CF standard’s original publication notes, “…conventions have been developed only for things we know we need. Instead of trying to foresee the future, we have added features as required and will continue to do this” (Gregory, 2003). Yet we have observed qualitatively that this is not the case: although the time period of intercomparison projects remains stable (2-3 years), the scale and complexity of models and their output continue to grow, and thus data creation and variable names consistently outpace the ratification of CF.
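
    To make the role of the CF controlled vocabulary concrete, here is a minimal sketch of the kind of check it enables: validating a dataset's variable names and units against the standard-name table. The two registered entries shown are genuine CF standard names with their canonical units; the full table is published as XML, and the unregistered variable in the example is hypothetical.

```python
# A tiny, illustrative subset of the CF standard-name table: name -> canonical units.
CF_STANDARD_NAMES = {
    "air_temperature": "K",
    "sea_water_salinity": "1e-3",
}

def check_variables(variables: dict[str, str]) -> list[str]:
    """Return problems found in {standard_name: units} pairs from a dataset."""
    problems = []
    for name, units in variables.items():
        if name not in CF_STANDARD_NAMES:
            problems.append(f"{name!r}: not in the CF standard-name table")
        elif units != CF_STANDARD_NAMES[name]:
            problems.append(f"{name!r}: units {units!r} differ from canonical "
                            f"{CF_STANDARD_NAMES[name]!r}")
    return problems

# A new model output variable that outpaces CF ratification surfaces here:
print(check_variables({"air_temperature": "K",
                       "tendency_of_permafrost_depth": "m s-1"}))
```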

    Supporting the long‐term curation and migration of natural history museum collections databases

    Full text link
    Migration of data collections from one platform to another is an important component of data curation – yet there is surprisingly little guidance for information professionals faced with this task. Data migration may be particularly challenging when these data collections are housed in relational databases, due to the complex ways that data, data schemas, and relational database management software become intertwined over time. Here we present results from a study of the maintenance, evolution, and migration of research databases housed in Natural History Museums. We find that database migration is an ongoing – rather than occasional – process for many collection managers, and that they creatively appropriate and innovate on many existing technologies in their migration work. This paper contributes descriptions of a preliminary set of common adaptations and “migration patterns” in the practices of database curators. It also outlines the strategies they use when facing collection-level data migration and describes the limitations of existing tools in supporting LAM and “small science” research database migration. We conclude by outlining future research directions for the maintenance and migration of collections and complex digital objects. Peer reviewed. https://deepblue.lib.umich.edu/bitstream/2027.42/147782/1/pra214505501055.pd
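
    As a generic illustration of the kind of work these migrations entail (not the paper's method), the sketch below moves records between two hypothetical schemas with Python's stdlib sqlite3, reconciling one small instance of schema drift.

```python
import sqlite3

# Hypothetical legacy schema, seeded with one record so the sketch runs end to end.
legacy = sqlite3.connect(":memory:")
legacy.execute("CREATE TABLE specimens (id INTEGER, genus TEXT, species TEXT, "
               "locality TEXT, date_collected TEXT)")
legacy.execute("INSERT INTO specimens VALUES "
               "(1, 'Smilodon', 'fatalis', 'Pit 91', '1915-06-01')")

new = sqlite3.connect(":memory:")
new.execute("CREATE TABLE specimen (id INTEGER PRIMARY KEY, taxon TEXT NOT NULL, "
            "locality TEXT, collected_on TEXT)")

# The legacy schema split genus and species; the new one stores a single taxon
# string -- a tiny example of the schema drift migrations must reconcile.
for sid, genus, species, loc, date in legacy.execute(
        "SELECT id, genus, species, locality, date_collected FROM specimens"):
    new.execute("INSERT INTO specimen VALUES (?, ?, ?, ?)",
                (sid, f"{genus} {species}", loc, date))
new.commit()
print(new.execute("SELECT * FROM specimen").fetchall())
```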

    Three Approaches to Documenting Database Migrations

    Get PDF
    Database migration is a crucial aspect of digital collections management, yet there are few best practices to guide practitioners in this work. There is also limited research on the patterns of use and processes motivating database migrations. In the “Migrating Research Data Collections” project, we are developing these best practices through a multi-case study of database and digital collections migration. We find that a first and fundamental problem faced by collection staff is a sheer lack of documentation about past database migrations. We contribute a discussion of ways information professionals can reconstruct missing documentation, and three approaches that others might take for documenting migrations going forward. [This paper is a conference pre-print presented at IDCC 2020 after lightweight peer review.]
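
    The abstract does not enumerate its three approaches, so the sketch below is only one hypothetical way to document a migration going forward: appending a small machine-readable log entry for each migration event. All field names and values are illustrative.

```python
import datetime
import json

migration_record = {
    "date": datetime.date.today().isoformat(),
    "source_system": "FileMaker Pro 11",   # hypothetical legacy platform
    "target_system": "PostgreSQL 15",      # hypothetical target platform
    "records_migrated": 48210,
    "transformations": [
        "split 'collector' into given and family names",
        "normalized dates to ISO 8601",
    ],
    "known_losses": ["per-record edit history not carried over"],
    "performed_by": "collections staff with IT support",
}

# One JSON object per line; the log itself becomes migration documentation.
with open("migration_log.jsonl", "a") as log:
    log.write(json.dumps(migration_record) + "\n")
```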

    Taxonomic Work as Information Work: Design for Semantic Refactoring

    Get PDF
    Taxonomy is the branch of science concerned with classifying organisms: drawing the line between cats and dogs, fish and fowl, animals and vegetables. Modern taxonomic work is built on a centuries-old tradition of qualitative research and description. There are aspects of this work that illustrate the pervasiveness and difficulty of a particular kind of qualitative data wrangling, which we call semantic refactoring: the review, normalization, and re-engineering of semantic structures. Because taxonomic work is conducted over long time spans, the processes underlying semantic refactoring become more visible. An examination of taxonomic data practices may inform our understanding of how (and if) collections of qualitative data scale, particularly when collaboratively created. NSF ABI Grant 1356515.
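
    A minimal sketch of one flavor of semantic refactoring the paper names: normalizing taxonomic names against a synonym table so that records created under older classifications resolve to the currently accepted name. The synonymy shown (Felis concolor to Puma concolor) is a real, well-known reclassification; the code itself is illustrative.

```python
# Synonym table: superseded name -> accepted name.
SYNONYMS = {
    "Felis concolor": "Puma concolor",  # cougar, moved to genus Puma
}

def normalize(name: str, table: dict[str, str]) -> str:
    """Follow synonym links until an accepted name is reached."""
    seen = set()
    while name in table and name not in seen:
        seen.add(name)                  # guard against cyclic synonymies
        name = table[name]
    return name

records = ["Felis concolor", "Puma concolor", "Smilodon fatalis"]
print([normalize(r, SYNONYMS) for r in records])
# -> ['Puma concolor', 'Puma concolor', 'Smilodon fatalis']
```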

    How and Why do Researchers Reference Data? A Study of Rhetorical Features and Functions of Data References in Academic Articles

    Full text link
    Data reuse is a common practice in the social sciences. While published data play an essential role in the production of social science research, they are not consistently cited, which makes it difficult to assess their full scholarly impact and give credit to the original data producers. Furthermore, it can be challenging to understand researchers' motivations for referencing data. Like references to academic literature, data references perform various rhetorical functions, such as paying homage, signaling disagreement, or drawing comparisons. This paper studies how and why researchers reference social science data in their academic writing. We develop a typology to model relationships between the entities that anchor data references, along with their features (access, actions, locations, styles, types) and functions (critique, describe, illustrate, interact, legitimize). We illustrate the use of the typology by coding multidisciplinary research articles (n=30) referencing social science data archived at the Inter-university Consortium for Political and Social Research (ICPSR). We show how our typology captures researchers' interactions with data and purposes for referencing data. Our typology provides a systematic way to document and analyze researchers' narratives about data use, extending our ability to give credit to data that support research. Comment: 35 pages, 2 appendices, 1 table.
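
    A minimal sketch of how a coder might record a single data reference under this typology. The five feature categories and five functions come straight from the abstract; the class shape and example values are illustrative assumptions.

```python
from dataclasses import dataclass

FEATURES = {"access", "actions", "locations", "styles", "types"}
FUNCTIONS = {"critique", "describe", "illustrate", "interact", "legitimize"}

@dataclass
class DataReference:
    article_id: str
    excerpt: str               # the passage anchoring the data reference
    features: dict[str, str]   # feature category -> observed value
    function: str              # one of FUNCTIONS

    def __post_init__(self):
        assert self.function in FUNCTIONS, f"unknown function: {self.function}"
        assert set(self.features) <= FEATURES, "unknown feature category"

ref = DataReference(
    article_id="article-017",
    excerpt="We draw on a panel study archived at ICPSR...",
    features={"access": "archived", "styles": "in-text mention"},
    function="legitimize",
)
```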

    Integrating research and teaching for data curation in iSchools

    Full text link
    The quickly changing nature of information science and technology creates unique and remarkable challenges for developing curricula focused on building data competencies. Faculty responsible for teaching current developments in information studies bear the burden of continuously updating their curricula without sacrificing broader teaching goals. This panel features diverse perspectives on teaching data curation skills in five US-based schools of information at the undergraduate and graduate levels. Panelists will present their unique perspectives on pedagogical approaches in courses dedicated to data curation, digital preservation, description and access standards, and data access and interchange. Topics will include flipped-classroom techniques, finding messy datasets, common pitfalls, hands-on labs, cloud-based tools, data carpentry labs, and sequencing learning objectives to match stages of the data life cycle. This panel will give ASIST conference participants an opportunity to see a range of junior faculty, each with IMLS-funded research projects related to data curation, share their experiences of teaching data competencies in the classroom. Peer reviewed. http://deepblue.lib.umich.edu/bitstream/2027.42/163368/2/pra2285.pdf ; http://deepblue.lib.umich.edu/bitstream/2027.42/163368/1/pra2285_am.pd
