54 research outputs found
What lies beneath?: Knowledge infrastructures in the subseafloor biosphere and beyond
We present preliminary findings from a three-year research project comprising longitudinal qualitative case studies of data practices in four large, distributed, highly multidisciplinary scientific collaborations. This project follows a 2 × 2 research design: two of the collaborations are big science while two are little science, and two have completed data collection activities while two are ramping up data collection. This paper centers on one of these collaborations, a project bringing together scientists to study subseafloor microbial life. This collaboration is little science, characterized by small teams using small amounts of data to address specific questions. Our case study employs participant observation in a laboratory, interviews (n=49 to date) with scientists in the collaboration, and document analysis. We present a data workflow that is typical for many of the scientists working in the observed laboratory. In particular, we show that, although this workflow results in datasets apparently similar in form, a large degree of heterogeneity nevertheless exists across scientists in this laboratory in terms of the methods they employ to produce these datasets, even between scientists working on adjacent benches. To date, most studies of data in little science focus on heterogeneity in terms of the types of data produced; this paper adds another dimension of heterogeneity to existing knowledge about data in little science. This additional dimension complicates the management and curation of data for subsequent reuse. Furthermore, the nature of the factors that contribute to heterogeneity of methods suggests that this dimension of heterogeneity is a persistent and unavoidable feature of little science. Alfred P. Sloan Foundation (#20113194)
Data Management in the Long Tail: Science, Software, and Service
Scientists in all fields face challenges in managing and sustaining access to their research data. The larger and longer term the research project, the more likely that scientists are to have resources and dedicated staff to manage their technology and data, leaving those scientists whose work is based on smaller and shorter term projects at a disadvantage. The volume and variety of data to be managed varies by many factors, only two of which are the number of collaborators and length of the project. As part of an NSF project to conceptualize the Institute for Empowering Long Tail Research, we explored opportunities offered by Software as a Service (SaaS). These cloud-based services are popular in business because they reduce costs and labor for technology management, and are gaining ground in scientific environments for similar reasons. We studied three settings where scientists conduct research in small and medium-sized laboratories. Two were NSF Science and Technology Centers (CENS and C-DEBI) and the third was a workshop of natural reserve scientists and managers. These laboratories have highly diverse data and practices, make minimal use of standards for data or metadata, and lack resources for data management or sustaining access to their data, despite recognizing the need. We found that SaaS could address technical needs for basic document creation, analysis, and storage, but did not support the diverse and rapidly changing needs for sophisticated domain-specific tools and services. These are much more challenging knowledge infrastructure requirements that require long-term investments by multiple stakeholders.
Unearthing the Infrastructure: Humans and Sensors in Field-Based Scientific Research
Distributed sensing systems for studying scientific phenomena are critical applications of information technologies. By embedding computational intelligence in the environment of study, sensing systems allow researchers to study phenomena at spatial and temporal scales that were previously impossible to achieve. We present an ethnographic study of field research practices among researchers in the Center for Embedded Networked Sensing (CENS), a National Science Foundation Science & Technology Center devoted to developing wireless sensing systems for scientific and social applications. Using the concepts of boundary objects and trading zones, we trace the processes of collaborative research around sensor technology development and adoption within CENS. Over the 10-year lifespan of CENS, sensor technologies, sensor data, field research methods, and statistical expertise each emerged as boundary objects that were understood differently by the science and technology partners. We illustrate how sensing technologies were incompatible with field-based environmental research until researchers “unearthed” their infrastructures, explicitly reintroducing human skill and expertise into the data collection process and developing new collaborative languages that emphasized building dynamic sensing systems that addressed human needs. In collaborating around a dynamic sensing model, the sensing systems became embedded not in the environment of study, but in the practices of the scientists. Status and citation: This is the revised and accepted version, prior to publisher’s copy editing. Please cite the final version: Mayernik, Matthew S., Wallis, Jillian C., & Borgman, Christine L. (In press). Unearthing the infrastructure: Humans and sensors in field-based scientific research. Journal of Computer Supported Cooperative Work. doi: 10.1007/s10606-012-9178-
Finishing the euchromatic sequence of the human genome
The sequence of the human genome encodes the genetic instructions for human physiology, as well as rich information about human evolution. In 2001, the International Human Genome Sequencing Consortium reported a draft sequence of the euchromatic portion of the human genome. Since then, the international collaboration has worked to convert this draft into a genome sequence with high accuracy and nearly complete coverage. Here, we report the result of this finishing process. The current genome sequence (Build 35) contains 2.85 billion nucleotides interrupted by only 341 gaps. It covers ∼99% of the euchromatic genome and is accurate to an error rate of ∼1 event per 100,000 bases. Many of the remaining euchromatic gaps are associated with segmental duplications and will require focused work with new methods. The near-complete sequence, the first for a vertebrate, greatly improves the precision of biological analyses of the human genome, including studies of gene number, birth, and death. Notably, the human genome seems to encode only 20,000-25,000 protein-coding genes. The genome sequence reported here should serve as a firm foundation for biomedical research in the decades ahead.
The Special Case of Scientific Data Sharing with Education
The seemingly simple task of reusing data for science education relies on the presence of scientific data, scientists willing to share, infrastructure to provide access, and mechanisms to share between the two disparate communities of scientists and science students. What makes sharing between scientists and science students a special case of data sharing is that all of the implicit knowledge attending the data must pass along this same vector. Our work at the Center for Embedded Networked Sensing studying aspects of this data reuse problem has shown us a rough outline of how the future of this data sharing will look. Our approach is to start from the perspective of the scientists, looking for opportunities to support scientific research, and then to leverage the data for reuse by education. The investment needed to capture high-quality scientific data necessitates considering reuse by the general population as well as by other interested scientific parties.
Whose data do you trust? Integrity issues in the preservation of scientific data
Integrity of content is a generic issue in curation and preservation, but has not been extensively studied in relation to scientific data. Data are now being seen as an important end product of scholarship in themselves. In this paper, we discuss data integrity issues in relation to environmental and ecological data, and the implications of these issues for the development of data digital libraries. For users to trust and interpret the data in scientific digital libraries, they must be able to assess the integrity of those data. Criteria for data integrity vary by context, by scientific problem, by individual, and by a variety of other factors. The goal of this research is to identify functional requirements for digital libraries of scientific data, encompassing both technical and social factors that can affect data integrity. Mechanisms to ensure data integrity have to be present at each stage in the data life cycle, from data collection to data preservation and curation. The implications of our research on data integrity are manifold for the iSchool research community, and we hope to promote discussion of these issues.
An Exploration of the Life Cycle of eScience Collaboratory Data
The success of eScience research depends not only upon effective collaboration between scientists and technologists but also upon the active involvement of information scientists. Archivists rarely receive scientific data until findings are published, by which time important information about their origins, context, and provenance may be lost. Research reported here addresses the life cycles of data from ecological research with embedded networked sensing technologies. A better understanding of these processes will enable information scientists to participate in earlier stages of the life cycle and to improve curation of these types of scientific data. Evidence from our interview study and field research yields nine life cycle phases and three types of life cycle, depending on the research goal. Findings highlight the impact of collaboration on research processes and identify phases during which the integrity of the captured data may be compromised.
The Distribution of Data Management Responsibility within Scientific Research Groups
Scientific data often are expensive to produce or impossible to reproduce. Those data may be of great future value for reuse, recombination, and replication by other researchers. However, the potential value of these data can only be achieved if the data producers manage them properly. Visions of data management and the role of the data producer have been constructed by data curators and funders from the top down, but we have little understanding of what data management looks like on the ground. What do data producers see as their data management responsibilities? The exploratory research reported in this dissertation provides a rich description of data management tasks performed by members of six research groups and of members' perceptions of data management responsibilities. Groups were selected from the Center for Embedded Networked Sensing (CENS), an NSF-funded Science and Technology Research Center, where researchers are already experiencing the data deluge. Document analysis, semi-structured interviews, and field observations were coded and analyzed for emergent themes and used to construct models of data management practices. Significant findings include: (i) these six research groups acquired a diverse array of data, (ii) a generalized data life cycle can be applied to the practices of these groups, (iii) researchers actively managed their data throughout the data life cycle to support their own use, and (iv) data management tasks were distributed between the members of a research group and are tied to data handling tasks such as collection, processing, and analysis. The data management tasks performed by researchers are categorized into four core functions: selection for quality, verification for validity, storage for accessibility, and documentation for interpretability. A set of roles and responsibilities was identified for the data producers collaborating on each research project.
These findings suggest that including author contribution statements in publications would assist future users of those data in determining whom to contact with questions about their creation and context. This study reveals how, when, and why science and technology researchers manage their data and makes recommendations for data management within research groups that will make data more usable and sharable.
Information challenges in collaborative science
Collaborative research is on the rise and presents difficult information challenges that must be overcome to make this mode of research effective. For instance, distributed research is regularly plagued by the “distance matters” problem, just as multidisciplinary research is often plagued by misaligned terminology and assumptions. These problems can potentially be alleviated by using information channels and mediation in new ways, as well as by balancing technology with policy and incentives. In this panel we engage with information challenges in collaborative sciences from a variety of perspectives, with the hope that a rich discussion will emerge regarding these challenges.