    Managing scientific data with named data networking

    Many scientific domains, such as climate science and High Energy Physics (HEP), have data management requirements that are not well supported by the IP network architecture. Named Data Networking (NDN) is a new network architecture whose service model is better aligned with the needs of data-oriented applications. NDN provides features such as best-location retrieval, caching, load sharing, and transparent failover that would otherwise be painstakingly (re-)implemented by each application using point-to-point semantics in an IP network. We present the first scientific data management application designed and implemented on top of NDN. We use this application to manage climate and HEP data over a dedicated, high-performance testbed. Our application has two main components: a UI for dataset discovery queries and a federation of synchronized name catalogs. We show how NDN primitives can be used to implement common data management operations such as publishing, search, efficient retrieval, and publication access control.

    Conducting a Self-Assessment of a Long-Term Archive for Interdisciplinary Scientific Data as a Trustworthy Digital Repository

    4th International Conference on Open Repositories. This presentation was part of the session: Conference Presentations. Date: 2009-05-19 03:00 PM – 04:30 PM. Long-term preservation and stewardship of scientific data and research-related information is paramount to the future of science and scholarship. Disciplinary and interdisciplinary scientific data archives can offer capabilities for managing and preserving data for research, education, and decision-making activities of future communities representing various scientific and scholarly disciplines. However, meeting the requirements for a trusted digital repository presents challenges to ensure that archived collections will be discoverable, accessible, and usable in the future. Assessing whether scientific data archives meet the requirements for trustworthy repositories will help to ensure that today's collections of scientific data will be available in the future. A continuing self-assessment of a long-term archive for interdisciplinary scientific data is being conducted to identify improvements needed to become a trustworthy repository for managing and providing access to interdisciplinary scientific data by future communities of users. Recommendations are offered for archives of scientific data to meet the requirements of a trustworthy repository. NAS

    Managing scientific data

    Data-oriented scientific processes depend on fast, accurate analysis of experimental data generated through empirical observation and simulation. However, scientists are increasingly overwhelmed by the volume of data produced by their own experiments. With improving instrument precision and the growing complexity of simulated models, data overload promises to only get worse. The inefficiency of existing database management systems (DBMSs) for addressing the requirements of scientists has led to many application-specific systems. Unlike their general-purpose counterparts, these systems require more resources, hindering reuse of knowledge. Still, the data-management community aspires to general-purpose scientific data management. Here, we explore the most important requirements of such systems and the techniques being used to address them.

    BSML: A Binding Schema Markup Language for Data Interchange in Problem Solving Environments (PSEs)

    We describe a binding schema markup language (BSML) for describing data interchange between scientific codes. Such a facility is an important constituent of scientific problem solving environments (PSEs). BSML is designed to integrate with a PSE or application composition system that views model specification and execution as a problem of managing semistructured data. The data interchange problem is addressed by three techniques for processing semistructured data: validation, binding, and conversion. We present BSML and describe its application to a PSE for wireless communications system design.

    DataJoint: managing big scientific data using MATLAB or Python

    The rise of big data in modern research poses serious challenges for data management: Large and intricate datasets from diverse instrumentation must be precisely aligned, annotated, and processed in a variety of ways to extract new insights. While high levels of data integrity are expected, research teams have diverse backgrounds, are geographically dispersed, and rarely possess a primary interest in data science. Here we describe DataJoint, an open-source toolbox designed for manipulating and processing scientific data under the relational data model. Designed for scientists who need a flexible and expressive database language with few basic concepts and operations, DataJoint facilitates multi-user access, efficient queries, and distributed computing. With implementations in both MATLAB and Python, DataJoint is not limited to particular file formats, acquisition systems, or data modalities and can be quickly adapted to new experimental designs. DataJoint and related resources are available at http://datajoint.github.com

    Managing scientific research data: data packaging and organizing materials for curation

    The SGS-LTER research site was established in 1980 by researchers at Colorado State University as part of a network of long-term research sites within the US LTER Network, supported by the National Science Foundation. Scientists within the Natural Resource Ecology Lab, Department of Forest and Rangeland Stewardship, Department of Soil and Crop Sciences, and Biology Department at CSU, California State Fullerton, USDA Agricultural Research Service, University of Northern Colorado, and the University of Wyoming, among others, have contributed to our understanding of the structure and functions of the shortgrass steppe and other diverse ecosystems across the network while maintaining a common mission and sharing expertise, data and infrastructure. Presentation held at the Front Range Data Librarian Meeting on June 16, 2014 at CSU Libraries and Natural Resource Ecology Laboratory in Fort Collins, Colorado. NSF Grant DEB-1027319

    Managing scientific data for long-term access and use

    Preservation of data for long-term use will require data management strategies that include curation and preservation planning and implementation. While data management and curatorial activities have been an integral part of some scientific domains for years (see, for example, high energy particle physics), these are new concepts in other areas of science. Concepts such as provenance, representation for re-use, and work-flow capture are rarely understood, let alone addressed. By bringing together theories and best practices from archives, museum studies, and library and information science (LIS), it is possible to address these problems. Drawing on current research into scientific data management problems, this panel will consider questions about sharing and re-use of data, curation and preservation, and the intersection of scientific production and scholarly communication. Our research explores information work and problems across a range of scientific areas in the life and physical sciences, including genomics, neuroscience, ecology, and earth science. As more scientific work products are shifted to open or shared data collections (including archives, repositories and databases), we will need to understand how these systems are implemented and used to support collaboration and discovery, as well as scholarly and scientific communication. Peer Reviewed. http://deepblue.lib.umich.edu/bitstream/2027.42/57315/1/14504301123_ftp.pd