
    Reconstructing the development of legacy database platforms

    Through the “Migrating Research Data Collections” project, we are working to better understand and support data collection migration and change over time. We are developing case studies of database migration at a number of memory institutions, focusing first on Natural History Museums (NHMs; Thomer, Rayburn and Tyler, 2020; Thomer, Weber and Twidale, 2018). In developing our case studies, we have found that there are surprisingly few published histories of database platforms and software, particularly for domain- or museum-specific platforms. Additionally, documentation of the function and format of legacy systems can be quite hard to find. This has motivated our efforts to reconstruct the development history of the data systems described by our study participants in interviews. The timeline presented on this poster has been developed through review of academic publications (where any exist) and other documentation of these data systems, much of which was found through digital archival research (e.g. the Internet Archive’s Wayback Machine). The full dataset underlying this timeline, with references, is available at bit.ly/36vaEoM. We are using this resource to contextualize the evolution of the data systems in each of our case studies, and we hope this work will also be of interest to others studying infrastructure development and change. IMLS # RE-07-18-0118-18. Poster: http://deepblue.lib.umich.edu/bitstream/2027.42/162598/1/IDCC2020_FINALTOPRINT.pdf

    The craft of database curation: Taking cues from quiltmaking

    Data migration within library, archive and museum collections is a critical process for maintaining collection data and ensuring its availability for future users. This work is also an under-supported component of digital curation. In this poster we present the findings from 20 semi-structured interviews with archivists, collection managers and curators who have recently completed a data migration. One of our main findings is the similarity between craft work and migration practices in memory institutions. To demonstrate these similarities, we use quiltmaking as a framework. These similarities include the practice of piecing multiple systems together to complete a workflow, relying on community collaboration, and inter-generational labor. Our hope is that by highlighting the craft-like qualities already embedded in this work we can show alternative best practices for data migration and database management, in an effort to build a broader understanding of what a successful data migration can look like.

    Complications in Climate Data Classification: The Political and Cultural Production of Variable Names

    Model intercomparison projects are a unique and highly specialized form of data-intensive collaboration in the earth sciences. Typically, a set of pre‐determined boundary conditions (scenarios) is agreed upon by a community of model developers, who then test and simulate each of those scenarios with individual ‘runs’ of a climate model. Because both the human expertise and the computational power needed to produce an intercomparison project are exceptionally expensive, the data they produce are often archived for the broader climate science community to use in future research. Outside of high energy physics and astronomy sky surveys, climate modeling intercomparisons are one of the largest and most rapid methods of producing data in the natural sciences (Overpeck et al., 2010). But, like any collaborative eScience project, the discovery and broad accessibility of these data depend on classifications and categorizations in the form of structured metadata—namely the Climate and Forecast (CF) metadata standard, which provides a controlled vocabulary to normalize the naming of a dataset’s variables. Intriguingly, the CF standard’s original publication notes that “conventions have been developed only for things we know we need. Instead of trying to foresee the future, we have added features as required and will continue to do this” (Gregory, 2003). Yet, qualitatively, we have observed that this is not the case: although the time period of intercomparison projects remains stable (2-3 years), the scale and complexity of models and their output continue to grow, and thus data creation and variable names consistently outpace the ratification of CF.
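    As a minimal illustration of the kind of normalization the CF standard enables, the sketch below checks a dataset's variable names against a toy subset of the standard-name vocabulary. The three-entry table and the variable names are hypothetical stand-ins; the real CF standard-name table holds thousands of entries.

```python
# Minimal sketch: checking a dataset's variable names against a toy
# subset of the CF standard-name table. CF_SUBSET is a hypothetical
# three-entry stand-in; the real CF vocabulary has thousands of names.
CF_SUBSET = {
    "air_temperature",
    "precipitation_flux",
    "sea_surface_temperature",
}

def check_variables(variables):
    """Split variable names into CF-recognized and unrecognized lists."""
    recognized = [v for v in variables if v in CF_SUBSET]
    unrecognized = [v for v in variables if v not in CF_SUBSET]
    return recognized, unrecognized

# A new model run may emit variables the standard has not yet ratified.
ok, pending = check_variables(["air_temperature", "soil_carbon_anomaly"])
```

    A check like this only surfaces the gap between model output and the controlled vocabulary; closing it still requires the community ratification process described above.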

    Supporting the long‐term curation and migration of natural history museum collections databases

    Migration of data collections from one platform to another is an important component of data curation – yet there is surprisingly little guidance for information professionals faced with this task. Data migration may be particularly challenging when these data collections are housed in relational databases, due to the complex ways that data, data schemas, and relational database management software become intertwined over time. Here we present results from a study of the maintenance, evolution and migration of research databases housed in Natural History Museums. We find that database migration is an on‐going – rather than occasional – process for many collection managers, and that they creatively appropriate and innovate on many existing technologies in their migration work. This paper contributes descriptions of a preliminary set of common adaptations and “migration patterns” in the practices of database curators. It also outlines the strategies they use when facing collection‐level data migration and describes the limitations of existing tools in supporting LAM and “small science” research database migration. We conclude by outlining future research directions for the maintenance and migration of collections and complex digital objects. Peer Reviewed. https://deepblue.lib.umich.edu/bitstream/2027.42/147782/1/pra214505501055.pd

    Three Approaches to Documenting Database Migrations

    Database migration is a crucial aspect of digital collections management, yet there are few best practices to guide practitioners in this work. There is also limited research on the patterns of use and processes motivating database migrations. In the “Migrating Research Data Collections” project, we are developing these best practices through a multi-case study of database and digital collections migration. We find that a first and fundamental problem faced by collection staff is a sheer lack of documentation about past database migrations. We contribute a discussion of ways information professionals can reconstruct missing documentation, and three approaches that others might take for documenting migrations going forward. [This paper is a conference pre-print presented at IDCC 2020 after lightweight peer review.]

    Taxonomic Work as Information Work: Design for Semantic Refactoring

    Taxonomy is the branch of science concerned with classifying organisms: drawing the line between cats and dogs, fish and fowl, animals and vegetables. Modern taxonomic work is built on a centuries-old tradition of qualitative research and description. There are aspects of this work that illustrate the pervasiveness and difficulty of a particular kind of qualitative data wrangling, which we call semantic refactoring: the review, normalization, and re-engineering of semantic structures. Because taxonomic work is conducted over long time spans, the processes underlying semantic refactoring become more visible. An examination of taxonomic data practices may inform our understanding of how (and if) collections of qualitative data scale, particularly when collaboratively created. NSF ABI Grant 1356515.

    The Phylogeny of a Dataset

    The field of evolutionary biology offers many approaches to study the changes that occur between and within generations of species; these methods have recently been adopted by cultural anthropologists, linguists and archaeologists to study the evolution of physical artifacts. In this paper, we further extend these approaches by using phylogenetic methods to model and visualize the evolution of a long-standing, widely used digital dataset in climate science. Our case study shows that clustering algorithms developed specifically for phylogenetic studies in evolutionary biology can be successfully adapted to the study of digital objects and their known offspring. Although we note a number of limitations with our initial effort, we argue that a quantitative approach to studying how digital objects evolve, are reused, and spawn new digital objects represents an important direction for the future of Information Science.
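    The clustering step can be sketched in miniature. The code below is a toy single-linkage agglomerative clustering over a hypothetical distance matrix between dataset versions; it is a crude stand-in for the phylogenetic methods the paper actually adapts, and all version names and distances are invented for illustration. In practice, distances might come from field-level diffs between archived versions of a dataset.

```python
# Toy sketch: recover a tree over dataset versions by single-linkage
# agglomerative clustering of pairwise distances. All names and
# distances are hypothetical.
def single_linkage(labels, dist):
    """Repeatedly merge the two closest clusters; return the nested
    tuple of merges (a crude stand-in for a phylogenetic tree)."""
    clusters = [({label}, label) for label in labels]

    def cdist(a, b):  # single linkage: distance between closest members
        return min(dist[(x, y) if x < y else (y, x)] for x in a for y in b)

    while len(clusters) > 1:
        i, j = min(
            ((i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))),
            key=lambda ij: cdist(clusters[ij[0]][0], clusters[ij[1]][0]),
        )
        (ma, ta), (mb, tb) = clusters[i], clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append((ma | mb, (ta, tb)))
    return clusters[0][1]

# Three hypothetical versions: v1 and v2 are near-identical siblings,
# while v3 diverged earlier.
tree = single_linkage(
    ["v1", "v2", "v3"],
    {("v1", "v2"): 1.0, ("v1", "v3"): 5.0, ("v2", "v3"): 4.0},
)
```

    The nested tuples returned here play the role of the tree topology; dedicated phylogenetics software would also estimate branch lengths and support values.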

    Privacy Impact Assessments for Digital Repositories

    Trustworthy data repositories ensure the security of their collections. We argue they should also ensure the security of researcher and human subject data. Here we demonstrate the use of a privacy impact assessment (PIA) to evaluate potential privacy risks to researchers, using the ICPSR’s Open Badges Research Credential System as a case study. We present our workflow and discuss potential privacy risks and mitigations for those risks. [This paper is a conference pre-print presented at IDCC 2020 after lightweight peer review.]

    The Product and System Specificities of Measuring Curation Impact

    Using three datasets archived at the National Center for Atmospheric Research (NCAR), we describe the creation of a ‘data usage index’ for curation-specific impact assessments. Our work is focused on quantitatively evaluating climate and weather data used in earth and space science research, but we also discuss the application of this approach to other research data contexts. We conclude with some proposed future directions for metric-based work in data curation.

    Taxonomy and the Production of Semantic Phenotypes

    Preprint of a chapter appearing in “Studies on the Semantic Web: Volume 33: Application of Semantic Technology in Biodiversity Science”. Taxonomists produce a myriad of phenotypic descriptions. Traditionally these are provided in terse (telegraphic) natural language. As seen in parallel within other fields of biology, researchers are exploring ways to formalize parts of the taxonomic process so that aspects of it are more computational in nature. We review the currently used data formalizations, mechanisms for persisting data, applications, and computing approaches related to the production of semantic descriptions (phenotypes); these, and their adopters, are limited in number. In order to move forward we step back and characterize taxonomists with respect to their typical workflow and tendencies. We then use these characteristics as a basis for exploring how we might create software that taxonomists will find intuitive within their current workflows, providing interface examples as thought experiments. NSF DBI-1356381; NSF 0956049. Proof of book chapter: https://deepblue.lib.umich.edu/bitstream/2027.42/148811/1/yoder_proof.pdf