63 research outputs found

    DataONE: Facilitating eScience through Collaboration

    Objective: To introduce DataONE, a multi-institutional, multinational, and interdisciplinary collaboration that is developing the cyberinfrastructure and organizational structure to support the full information lifecycle of biological, ecological, and environmental data and tools to be used by researchers, educators, and the public at large. Setting: The dynamic world of data-intensive science at the point where it interacts with the grand challenges facing environmental sciences. Methods: Briefly discuss science’s “fourth paradigm,” then introduce how DataONE is being developed to answer the challenges presented by this new environment. Sociocultural perspectives are the primary focus of the discussion. Results: DataONE is highly collaborative. This is a result of its cyberinfrastructure architecture, its interdisciplinary nature, and its organizational diversity. The organizational structure of an agile management team, diverse leadership team, and productive working groups provides for a successful collaborative environment where substantial contributions to the DataONE mission have been made by a large number of people. Conclusions: Librarians and information science researchers are key partners in the development of DataONE. These roles are likely to grow as more scientists engage data at all points of the data lifecycle.

    Assessment, Usability, and Sociocultural Impacts of DataONE

    DataONE, funded from 2009 to 2019 by the U.S. National Science Foundation, is an early example of a large-scale project that built both a cyberinfrastructure and a culture of data discovery, sharing, and reuse. DataONE used a Working Group model, where a diverse group of participants collaborated on targeted research and development activities to achieve broader project goals. This article summarizes the work carried out by two of DataONE’s working groups: Usability & Assessment (2009-2019) and Sociocultural Issues (2009-2014). The activities of these working groups provide a unique longitudinal look at how scientists, librarians, and other key stakeholders engaged in convergence research to identify and analyze practices around research data management through the development of boundary objects, an iterative assessment program, and reflection. Members of the working groups disseminated their findings widely in papers, presentations, and datasets, reaching international audiences through publications in 25 different journals and presentations to over 5,000 people at interdisciplinary venues. The working groups helped inform the DataONE cyberinfrastructure and influenced the evolving data management landscape. By studying working groups over time, the paper also presents lessons learned about the working group model for global large-scale projects that bring together participants from multiple disciplines and communities in convergence research.

    Perceived discontinuities and continuities in transdisciplinary scientific working groups

    We examine the DataONE (Data Observation Network for Earth) project, a transdisciplinary organization tasked with creating a cyberinfrastructure platform to ensure preservation of and access to environmental science and biological science data. Its objective was a difficult one to achieve, requiring innovative solutions. The DataONE project used a working group structure to organize its members. We use organizational discontinuity theory as our lens to understand the factors associated with success in such projects. Based on quantitative and qualitative data collected from DataONE members, we offer recommendations for the use of working groups in transdisciplinary synthesis. Recommendations include welcoming diverse opinions and world views, establishing shared communication practices, scheduling periodic synchronous face-to-face meetings, and ensuring the active participation of bridge builders or knowledge brokers, such as librarians, who know how to ask questions about disciplines not their own.

    Complex adaptive systems theory applied to virtual scientific collaborations: The case of DataONE

    This study explores, from a complex adaptive systems perspective, the emergence of DataONE, a multidisciplinary, multinational, and multi-institutional virtual scientific collaboration to develop a cyberinfrastructure for earth sciences data. Data were collected through 15 semi-structured interviews, observation of three 3-day meetings, and 51 online surveys. The main contribution of this study is the development of a complexity framework and its application to a project such as DataONE. The findings reveal that DataONE behaves like a complex adaptive system: various individuals and institutions interact, adapt, and coevolve to achieve their own and common goals, and during the process new structures, relationships, and products emerge that harmonize with DataONE’s goals. DataONE is quite resilient to threats and adaptive to its environment, which are important strengths. These strengths come from its diversified structure and balanced management style, which allows for frequent interaction among members. The study also offers insights to PI(s), managers, and funding institutions on how to treat complex systems. Additional results regarding multidisciplinarity, library and information sciences, and communication studies are presented as well.

    Using peer review to support development of community resources for research data management

    This work is licensed under a Creative Commons 1.0 Public Domain Dedication. The definitive version was published in Journal of eScience Librarianship 6 (2017): e1114, doi:10.7191/jeslib.2017.1114. To ensure that resources designed to teach skills and best practices for scientific research data sharing and management are useful, the maintainers of those materials need to evaluate and update them to ensure their accuracy, currency, and quality. This paper advances the use and process of outside peer review for community resources in addressing ongoing accuracy, quality, and currency issues. It further describes the next step of moving the updated materials to an online collaborative community platform for future iterative review in order to build upon mechanisms for open science, ongoing iteration, participation, and transparent community engagement. DataONE is supported by US National Science Foundation Awards 08-30944 and 14-30508, William Michener, Principal Investigator; Matthew Jones, Patricia Cruse, David Vieglais, and Suzanne Allard, Co-Principal Investigators.

    Data Curation Education in Research Centers Poster

    The volume of scientific data is growing exponentially across all scientific disciplines. Competent information professionals are needed to sort, catalog, store, and retrieve this data for future research and education requirements. In response to this need, the goal of the Data Curation Education in Research Centers (DCERC) project is to develop a curriculum to educate information science students in the critical field of scientific data curation. Three master's degree students at the University of Tennessee (UT) and three doctoral students at the University of Illinois, Urbana-Champaign are completing year one of the program. Each brings to the field of data curation skills obtained from prior work in diverse scientific and engineering professions. In the summers of 2012 and 2013, the master's students will travel to the National Center for Atmospheric Research (NCAR) in Boulder, Colorado, to work alongside scientists and researchers and to experience the demands of data curation at the source of data creation. The NCAR experience will allow students to assimilate the skills learned from the Fundamentals in Data Curation course, which will be completed in Spring 2012. This poster session will display and demonstrate the goals, student achievements, and overall program performance by providing examples of the specific skill sets the students are obtaining, projects they are completing, and expected future milestones.

    Server‐side workflow execution using data grid technology for reproducible analyses of data‐intensive hydrologic systems

    Many geoscience disciplines utilize complex computational models for advancing understanding and sustainable management of Earth systems. Executing such models and their associated data preprocessing and postprocessing routines can be challenging for a number of reasons including (1) accessing and preprocessing the large volume and variety of data required by the model, (2) postprocessing large data collections generated by the model, and (3) orchestrating data processing tools, each with unique software dependencies, into workflows that can be easily reproduced and reused. To address these challenges, the work reported in this paper leverages the Workflow Structured Object functionality of the Integrated Rule‐Oriented Data System and demonstrates how it can be used to access distributed data, encapsulate hydrologic data processing as workflows, and federate with other community‐driven cyberinfrastructure systems. The approach is demonstrated for a study investigating the impact of drought on populations in the Carolinas region of the United States. The analysis leverages computational modeling along with data from the Terra Populus project and data management and publication services provided by the Sustainable Environment‐Actionable Data project. The work is part of a larger effort under the DataNet Federation Consortium project that aims to demonstrate data and computational interoperability across cyberinfrastructure developed independently by scientific communities.
    Plain Language Summary: Executing computational workflows in the geosciences can be challenging, especially when dealing with large, distributed, and heterogeneous data sets and computational tools. We present a methodology for addressing this challenge using the Integrated Rule‐Oriented Data System (iRODS) Workflow Structured Object (WSO). We demonstrate the approach through an end‐to‐end application of data access, processing, and publication of digital assets for a scientific study analyzing drought in the Carolinas region of the United States.
    Key Points: Reproducibility of data‐intensive analyses remains a significant challenge. Data grids are useful for reproducibility of workflows requiring large, distributed data sets. Data and computations should be co‐located on servers to create executable Web‐resources.
    Peer Reviewed. https://deepblue.lib.umich.edu/bitstream/2027.42/137520/1/ess271_am.pdf https://deepblue.lib.umich.edu/bitstream/2027.42/137520/2/ess271.pd

    Data Sharing And Data Reuse: An Investigation Of Descriptive Information Facilitators And Inhibitors

    This dissertation examines how descriptive information inhibits or facilitates data sharing and reuse. DataONE serves as the test environment. The objective is to identify descriptive information made discoverable through DataONE and subsequently determine which of this descriptive information helps scientists determine data reusability. This study uses a mixed-method approach, which includes a data profiling assessment in the form of a quantitative and qualitative content analysis and a quasi-experiment think-aloud. A quantitative and qualitative content analysis was conducted on a stratified sample of data extracted from DataONE to examine types of descriptive information made available through the shared data. Participants searched a quasi-experiment interface and thought aloud about what information inhibited or facilitated their determination of data reusability. Additionally, participants completed a post-result usefulness survey, a post-search rank order survey, and a post-search factors survey. The quantitative and qualitative content analysis shows that the shared data contains 30 unique pieces of descriptive information found in the records. The quasi-experiment think-aloud indicates that scientists found three pieces of descriptive information particularly useful for determining data reusability: (a) the data description, (b) the attribute table, and (c) the research methods. In conclusion, metadata schema, member node standards, and community standards impact what types of descriptive information are provided through the shared data. Attribute and unit lists, research methods information, and succinctly written abstracts facilitate data reuse. However, long abstracts, the same information repeated in multiple places, and the exclusion of data descriptions inhibit data reuse. The findings and recommendations assist funding agencies and scientific organizations in understanding the current state of data being shared and prioritizing how to meet the needs of scientists regarding data reuse. This dissertation provides guidance to developers of current and future data sharing environments and infrastructures, research data management and scientific communities, scientific data managers, creators of data management plans, and funding agencies, and has implications beyond DataONE. Doctor of Philosophy