
    What lies beneath?: Knowledge infrastructures in the subseafloor biosphere and beyond

    We present preliminary findings from a three-year research project comprising longitudinal qualitative case studies of data practices in four large, distributed, highly multidisciplinary scientific collaborations. This project follows a 2 × 2 research design: two of the collaborations are big science while two are little science, and two have completed data collection activities while two are ramping up data collection. This paper centers on one of these collaborations, a project bringing together scientists to study subseafloor microbial life. This collaboration is little science, characterized by small teams using small amounts of data to address specific questions. Our case study employs participant observation in a laboratory, interviews (n = 49 to date) with scientists in the collaboration, and document analysis. We present a data workflow that is typical for many of the scientists working in the observed laboratory. In particular, we show that, although this workflow results in datasets apparently similar in form, a large degree of heterogeneity nevertheless exists across scientists in this laboratory in the methods they employ to produce these datasets, even between scientists working on adjacent benches. To date, most studies of data in little science focus on heterogeneity in the types of data produced; this paper adds another dimension of heterogeneity to existing knowledge about data in little science. This additional dimension complicates the management and curation of data for subsequent reuse. Furthermore, the nature of the factors that contribute to heterogeneity of methods suggests that this dimension of heterogeneity is a persistent and unavoidable feature of little science. Funding: Alfred P. Sloan Foundation (#20113194).

    Data Management in the Long Tail: Science, Software, and Service

    Scientists in all fields face challenges in managing and sustaining access to their research data. The larger and longer term the research project, the more likely that scientists are to have resources and dedicated staff to manage their technology and data, leaving those scientists whose work is based on smaller and shorter term projects at a disadvantage. The volume and variety of data to be managed varies by many factors, only two of which are the number of collaborators and length of the project. As part of an NSF project to conceptualize the Institute for Empowering Long Tail Research, we explored opportunities offered by Software as a Service (SaaS). These cloud-based services are popular in business because they reduce costs and labor for technology management, and are gaining ground in scientific environments for similar reasons. We studied three settings where scientists conduct research in small and medium-sized laboratories. Two were NSF Science and Technology Centers (CENS and C-DEBI) and the third was a workshop of natural reserve scientists and managers. These laboratories have highly diverse data and practices, make minimal use of standards for data or metadata, and lack resources for data management or sustaining access to their data, despite recognizing the need. We found that SaaS could address technical needs for basic document creation, analysis, and storage, but did not support the diverse and rapidly changing needs for sophisticated domain-specific tools and services. These are much more challenging knowledge infrastructure requirements that require long-term investments by multiple stakeholders.

    Unearthing the Infrastructure: Humans and Sensors in Field-Based Scientific Research

    Distributed sensing systems for studying scientific phenomena are critical applications of information technologies. By embedding computational intelligence in the environment of study, sensing systems allow researchers to study phenomena at spatial and temporal scales that were previously impossible to achieve. We present an ethnographic study of field research practices among researchers in the Center for Embedded Networked Sensing (CENS), a National Science Foundation Science & Technology Center devoted to developing wireless sensing systems for scientific and social applications. Using the concepts of boundary objects and trading zones, we trace the processes of collaborative research around sensor technology development and adoption within CENS. Over the 10-year lifespan of CENS, sensor technologies, sensor data, field research methods, and statistical expertise each emerged as boundary objects that were understood differently by the science and technology partners. We illustrate how sensing technologies were incompatible with field-based environmental research until researchers “unearthed” their infrastructures, explicitly reintroducing human skill and expertise into the data collection process and developing new collaborative languages that emphasized building dynamic sensing systems that addressed human needs. In collaborating around a dynamic sensing model, the sensing systems became embedded not in the environment of study, but in the practices of the scientists. Status and citation: This is the revised and accepted version, prior to the publisher’s copy editing. Please cite the final version: Mayernik, Matthew S., Wallis, Jillian C., & Borgman, Christine L. (In press). Unearthing the infrastructure: Humans and sensors in field-based scientific research. Journal of Computer Supported Cooperative Work. doi: 10.1007/s10606-012-9178-

    Finishing the euchromatic sequence of the human genome

    The sequence of the human genome encodes the genetic instructions for human physiology, as well as rich information about human evolution. In 2001, the International Human Genome Sequencing Consortium reported a draft sequence of the euchromatic portion of the human genome. Since then, the international collaboration has worked to convert this draft into a genome sequence with high accuracy and nearly complete coverage. Here, we report the result of this finishing process. The current genome sequence (Build 35) contains 2.85 billion nucleotides interrupted by only 341 gaps. It covers ∼99% of the euchromatic genome and is accurate to an error rate of ∼1 event per 100,000 bases. Many of the remaining euchromatic gaps are associated with segmental duplications and will require focused work with new methods. The near-complete sequence, the first for a vertebrate, greatly improves the precision of biological analyses of the human genome, including studies of gene number, birth, and death. Notably, the human genome seems to encode only 20,000-25,000 protein-coding genes. The genome sequence reported here should serve as a firm foundation for biomedical research in the decades ahead.

    The Special Case of Scientific Data Sharing with Education

    The seemingly simple task of reusing data for science education relies on the presence of scientific data, scientists willing to share, infrastructure to provide access, and mechanisms to share between the two disparate communities of scientists and science students. What makes sharing between scientists and science students a special case of data sharing is that all of the implicit knowledge attending the data must pass along this same vector. Our work at the Center for Embedded Networked Sensing studying aspects of this data reuse problem has shown us a rough outline of how the future of this data sharing will look. Our approach is to start from the perspective of the scientists, looking for opportunities to support scientific research, and then to leverage the data for reuse by education. The investment needed to capture high-quality scientific data necessitates the consideration of reuse by the general population as well as by other interested scientific parties.

    Whose data do you trust? Integrity issues in the preservation of scientific data

    Integrity of content is a generic issue in curation and preservation, but has not been extensively studied in relation to scientific data. Data are now being seen as an important end product of scholarship in themselves. In this paper, we discuss data integrity issues in relation to environmental and ecological data, and the implications of these issues for the development of digital libraries of data. For users to trust and interpret the data in scientific digital libraries, they must be able to assess the integrity of those data. Criteria for data integrity vary by context, by scientific problem, by individual, and by a variety of other factors. The goal of this research is to identify functional requirements for digital libraries of scientific data, encompassing both technical and social factors that can affect data integrity. Mechanisms to ensure data integrity must be present at each stage of the data life cycle, from data collection to data preservation and curation. The implications of our research on data integrity are manifold for the iSchool research community, and we hope to promote discussion of these issues.

    An Exploration of the Life Cycle of eScience Collaboratory Data

    The success of eScience research depends not only upon effective collaboration between scientists and technologists but also upon the active involvement of information scientists. Archivists rarely receive scientific data until findings are published, by which time important information about their origins, context, and provenance may be lost. Research reported here addresses the life cycles of data from ecological research with embedded networked sensing technologies. A better understanding of these processes will enable information scientists to participate in earlier stages of the life cycle and to improve curation of these types of scientific data. Evidence from our interview study and field research yields nine life cycle phases, and three types of life cycle depending on the research goal. Findings highlight the impact of collaboration on the research processes and identify phases during which the integrity of the captured data may be compromised.

    Information challenges in collaborative science

    Collaborative research is on the rise, and presents difficult information challenges that must be overcome to make this mode of research effective. For instance, distributed research is regularly plagued by the “distance matters” problem, just as multidisciplinary research is often plagued by misaligned terminology and assumptions. These problems can potentially be alleviated by using information channels and mediation in new ways, as well as by balancing technology with policy and incentives. In this panel we engage with information challenges in the collaborative sciences from a variety of perspectives, with the hope that a rich discussion will emerge regarding these challenges.