17 research outputs found

    We’re Working On It: Transferring the Sloan Digital Sky Survey from Laboratory to Library

    This article reports on the transfer of a massive scientific dataset from a national laboratory to a university library, and from one kind of workforce to another. We use the transfer of the Sloan Digital Sky Survey (SDSS) archive to examine the emergence of a new workforce for scientific research data management. Many individuals with diverse educational backgrounds and domain experience are involved in SDSS data management: domain scientists, computer scientists, software and systems engineers, programmers, and librarians. These types of positions have been described using terms such as research technologist, data scientist, e-science professional, data curator, and more. The findings reported here are based on semi-structured interviews, ethnographic participant observation, and archival studies from 2011-2013. The library staff conducting the data storage and archiving of the SDSS archive faced two performance problems. The preservation specialist and the system administrator worked together closely to discover and implement solutions to the slow data transfer and verification processes. The team overcame these slow-downs by problem solving, working in a team, and writing code. The library team lacked the astronomy domain knowledge necessary to meet some of their preservation and curation goals. The case study reveals the variety of expertise, experience, and individuals essential to the SDSS data management process. A variety of backgrounds and educational histories emerge in the data managers studied. Teamwork is necessary to bring disparate expertise together, especially between those with technical and domain education. The findings have implications for data management education, policy, and relevant stakeholders. This article is part of continuing research on Knowledge Infrastructures.

    Embodying Research Methods into Fields and Tables: A Process of Informed Database Design

    One of the invisible aspects of large research projects in the social sciences is the method by which observations and other collected data are managed. In sufficiently large projects, it may be effective to address the data management problem at the outset by creating a database architecture and data processing workflow. Research methods, assumptions and technical limitations often drive the structure of the data to be collected, but this is rarely discussed within the framework of the research. This design process represents a complex selection and trade-off matrix of predictive approximation, given that aspects of the analysis are not performed until the data is collected, and the design is done before the data collection is started. An elegant design can afford an equally elegant analysis of the data, but also creates a cycle where the data structure dictates the focus and granularity of the analysis. We were faced with the problem of creating a system to support the projected data collection projects for a major, multi-method, 5-year research project on data curation practices. Our research focuses on specific techno-social practices of astronomers and will rely on a large volume of complex and heterogeneous source materials, such as email archives, scholarly publications, websites, reports, metadata headers, as well as in-person interviews. The research questions focus on the data management, curation, and sharing practices of astronomers, how these practices evolved, and mapping who shares what, when, with whom, and why, with specific interest in what data they generate, use, keep and discard. We also ask what is most important to curate, and how do they do so, what do they expect to use and decide will be of future use to others, and who do they envision as future users? 
The database structure will act as the connective tissue for the full term of the project while embodying the research methods, facilitating analysis, enabling data sharing, and minimizing effort. However, the process also represents a complex selection and trade-off matrix of predictive approximation of the intended analysis as the design defines the data set and the data set drives the analysis. This process-oriented poster documents the matrix we followed, the challenges and the solutions developed while operationalizing a data system for a large research project with major relational and descriptive aspects. Our resulting system utilizes existing competencies and departmental resources while meeting basic prerequisites for data security, sharing, interoperability, best practices and extensibility.

    Invisible Bodies: Representing Gender and Gender Variance in Medical Records and Health Data

    Prior to 2010, there was virtually no population-based health data on trans and gender variant populations at the federal, state, or local level in California. The population was invisible in the data. This research project took the formation of such an odd silence in the data as the motivation to form a qualitative approach to studying how gender has been encoded in medical records and health data in California. The project focused on identifying which aspects of gender were able to be recorded and what aspects were not able to be represented within the affordances observed in the recordkeeping structures. The research materials documented practices in public health and medical recordkeeping contexts at a critical historical juncture, when practices were in flux, both locally and nationally, focusing on three research sites in California, in San Francisco, Los Angeles, and Sacramento. All of the sites were directly engaged with recordkeeping questions around how gender is encoded for trans and gender variant populations, working collaboratively to develop the information structures for representing gender and gender variance in their recordkeeping systems. The analysis produced documentation of the information structures encoding gender at each of the sites and concluded that there are six major functional information recordkeeping elements conflated under the current system of binary gender markers, including elements of gender identity, social gender (pronouns), medical gender, legal gender, organ lists and sex assigned at birth. Several of these elements operate in unrelated fields of policy discourse, such as those around identity documents (legal gender) versus medical diagnoses (medical gender). These overlapping policy environments often conflict in reality, yielding uncountable populations of trans and gender variant persons attempting to access care with unresolvable incongruence between the legal, medical and social records of gender.
The information disparities in the functionality to record gender variance suggest that system designers need to be more concerned with the edge effects affecting populations who are poorly represented within data structures, especially as these recordkeeping systems are scaled up to include the entire general population. Many patients still withhold gender identity information from healthcare systems, which suggests that conflicts in recordkeeping practices cannot be resolved without first addressing basic safety and data security issues for trans and gender variant populations.

    Follow the Data: How astronomers use and reuse data

    We analyze the people and infrastructure involved in the building, sustaining, and curation of large astronomy sky surveys. Our research assesses what new infrastructures, divisions of labor, knowledge, and expertise are necessary for the proper care of data. Between May 2011 and February 2012, we conducted fourteen interviews employing Sloan Digital Sky Survey (SDSS) data use as the focus. SDSS is a multi-faceted, multi-phased data-driven telescope project with hundreds of collaborators and thousands of users of the open data. The Follow the Data interview protocol identifies a single publication authored by each interviewee and uses it as a lens looking backward and forward to identify data uses leading into and out of the publication. The interviews revealed the ways these astronomers discover, locate, retrieve, and store external data for their research. Any given astronomy research project may employ multiple methods to discover, locate, retrieve, and store multiple datasets. Our research finds that informal and formal methods are used to discover and locate data, including person-to-person contact. Data retrieval and storage methods are often determined by the size of the dataset and the amount of infrastructure available to the researcher. Astronomy research practices are evolving rapidly with access to more data and better tools. The poster presentation will report further on how those data are used and reused in astronomy. Sands, A., Borgman, C. L., Wynholds, L., & Traweek, S. (2012, October 29). Follow the Data: How astronomers use and reuse data. Poster presented at the ASIS&T 75th Annual Meeting, Baltimore, MD. Retrieved from http://www.asis.org/asist2012/abstracts/341.htm

    When Use Cases Are Not Useful: Data Practices, Astronomy, and Digital Libraries

    As science becomes more dependent upon digital data, the need for data curation and for data digital libraries becomes more urgent. Questions remain about what researchers consider to be their data, their criteria for selecting and trusting data, and their orientation to data challenges. This paper reports findings from the first 18 months of research on astronomy data practices from the Data Conservancy. Initial findings suggest that issues for data production, use, preservation, and sharing revolve around factors that are rarely accommodated in use cases for digital library system design, including trust in data, funding structures, communication channels, and perceptions of scientific value.