    Informatics solutions for large ocean optics datasets

    Ocean Optics XXI, Glasgow, Scotland, October 8-12, 2012.
    Lack of observations that span the wide range of critical space and time scales continues to limit many aspects of oceanography. As ocean observatories and observing networks mature, the role for optical technologies and approaches in helping to overcome this limitation continues to grow. As a result, the quantity and complexity of data produced are increasing at a pace that threatens to overwhelm the capacity of individual researchers, who must cope with large, high-resolution datasets; complex, multi-stage analyses; and the challenge of preserving sufficient metadata and provenance information to ensure reproducibility and avoid costly reprocessing or data loss. We have developed approaches to address these challenges in the context of a case study involving very large numbers (~1 billion) of images collected at coastal observatories by Imaging FlowCytobot, an automated submersible flow cytometer that produces high-resolution images of plankton and other microscopic particles at rates up to 10 Hz for months to years. By developing partnerships between the oceanographers generating and using such data and computer scientists focused on improving science outcomes, we have prototyped a replicable system. It provides simple and ubiquitous access to observational data and products via web services in standard formats; accelerates image processing by enabling algorithms developed with desktop applications to be rapidly deployed and evaluated on shared, high-performance servers; and improves data integrity by replacing error-prone manual data management processes with generalized, automated services. The informatics system is currently in operation for multiple Imaging FlowCytobot datasets and is being tested with other types of ocean imagery.
    This research was supported by grants from the Gordon and Betty Moore Foundation, NSF, NASA, and ONR (NOPP).
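
    The web-services layer described above suggests a simple client-side access pattern. The sketch below shows what fetching observational metadata might look like in Python; the endpoint URL, bin identifier, and JSON field names are all hypothetical placeholders, not the system's actual API.

        # Minimal sketch of pulling Imaging FlowCytobot sample metadata from a
        # web service in a standard format (JSON). The service root, path, and
        # field names are invented for illustration. Requires: pip install requests
        import requests

        BASE_URL = "https://example.org/ifcb/api"  # hypothetical service root

        def fetch_bin_metadata(bin_id: str) -> dict:
            """Fetch metadata for one sample bin as JSON."""
            resp = requests.get(f"{BASE_URL}/bins/{bin_id}", timeout=30)
            resp.raise_for_status()
            return resp.json()

        meta = fetch_bin_metadata("D20120708T000000_IFCB010")  # hypothetical ID
        print(meta.get("timestamp"), meta.get("n_images"))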

    Environmental metabolomics: databases and tools for data analysis

    © The Author(s), 2015. This article is distributed under the terms of the Creative Commons Attribution License. The definitive version was published in Marine Chemistry 177 (2015): 366–373, doi:10.1016/j.marchem.2015.06.012.
    Metabolomics is the study of small molecules, or 'metabolites', that are the end products of biological processes. While -omics technologies such as genomics, transcriptomics, and proteomics measure the metabolic potential of organisms, metabolomics provides detailed information on the organic compounds produced during metabolism and found within cells and in the environment. Improvements in analytical techniques have expanded our understanding of metabolomics, and developments in computational tools have made metabolomics data accessible to a broad segment of the scientific community. Yet metabolomics methods have been applied to only a limited number of projects in the marine environment. Here, we review analysis techniques for mass spectrometry data and summarize the current state of metabolomics databases. We then describe a boutique database developed in our laboratory for efficient data analysis and selection of mass spectral targets for metabolite identification. The code to implement the database is freely available on GitHub (https://github.com/joefutrelle/domdb). Data organization and analysis are critical, but often under-appreciated, components of metabolomics research. Future advances in environmental metabolomics will take advantage of continued development of new tools that facilitate analysis of large metabolomics datasets.
    The field data populating the database come from scientific cruises funded by grants from the National Science Foundation to EBK and KL (Atlantic Ocean, OCE-1154320) and E.V. Armbrust (Pacific Ocean, OCE-1205233). The laboratory experiment with coastal seawater was funded by a grant from the Gulf of Mexico Research Initiative to EBK and H.K. White. The laboratory experiments with microbial isolates and the database development are funded by the Gordon and Betty Moore Foundation through Grant GBMF3304 to EBK.
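
    To make the role of such a "boutique" database concrete, here is a small sketch of storing mass spectrometry features and selecting high-intensity candidates for identification. The schema and threshold are invented for illustration; the actual schema lives in the domdb repository linked above.

        # Toy feature table for LC-MS data: m/z, retention time, intensity.
        import sqlite3

        conn = sqlite3.connect(":memory:")
        conn.execute("""
            CREATE TABLE features (
                mz REAL,         -- mass-to-charge ratio
                rt REAL,         -- retention time (minutes)
                intensity REAL,  -- peak intensity
                sample TEXT      -- sample identifier
            )
        """)
        conn.executemany(
            "INSERT INTO features VALUES (?, ?, ?, ?)",
            [(212.0024, 5.2, 8.1e5, "cruise_A"),   # made-up example rows
             (212.0027, 5.3, 9.4e5, "cruise_B"),
             (355.1342, 12.7, 2.2e4, "cruise_A")],
        )
        # Pick the strongest features as candidate targets for identification.
        query = ("SELECT ROUND(mz, 3), ROUND(rt, 1), MAX(intensity) "
                 "FROM features GROUP BY ROUND(mz, 2) "
                 "HAVING MAX(intensity) > 1e5")
        for row in conn.execute(query):
            print(row)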

    Community-based metadata integration for environmental research

    Proceedings of the Seventh International Conference on Hydroscience and Engineering, Philadelphia, PA, September 2006. http://hdl.handle.net/1860/732
    The ability to aggregate information about environmental data and analysis processes across tools, services, and projects provides a powerful capability for discovering resources and coordinating projects, and a means to convey the rich, community-scale context of data. In this paper, we summarize the science and engineering use cases motivating the metadata and provenance infrastructure of the Environmental Cyberinfrastructure Demonstrator (ECID) Cyberenvironment project at the National Center for Supercomputing Applications (NCSA) and discuss the requirements driving our system design. We describe the user-level metadata and provenance capabilities being developed within ECID, summarize the team's experiences in building them, and show how that experience can inform the continuing development and refinement of collaborative environmental science environments.
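
    As a concrete illustration of the user-level metadata and provenance described, the sketch below records a dataset and an analysis run that consumed it as RDF triples. The namespace and property names are hypothetical placeholders, not ECID's actual vocabulary.

        # Recording provenance as RDF triples. Requires: pip install rdflib
        from rdflib import Graph, Literal, Namespace, RDF

        EX = Namespace("http://example.org/ecid/")  # hypothetical namespace

        g = Graph()
        dataset = EX["dataset/streamgauge-42"]
        run = EX["analysis/run-7"]

        g.add((dataset, RDF.type, EX.Dataset))
        g.add((run, RDF.type, EX.AnalysisRun))
        g.add((run, EX.usedInput, dataset))            # the run consumed the dataset
        g.add((run, EX.performedBy, Literal("jdoe")))  # user-level context

        print(g.serialize(format="turtle"))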

    Standards and practices for reporting plankton and other particle observations from images

    This technical manual guides the user through the process of creating a data table for submitting taxonomic and morphological information for plankton and other particles, derived from images, to a repository. It provides guidance for producing the documentation that should accompany a submission, describes data collection and processing techniques, and outlines the creation of a data file. Field names include scientificName, which represents the lowest possible taxonomic classification (e.g., genus if not certain of species, family if not certain of genus), and scientificNameID, the unique identifier from a reference database such as the World Register of Marine Species or AlgaeBase. The data table described here includes the field names associatedMedia, scientificName/scientificNameID for both automated and manual identification, biovolume, area_cross_section, length_representation, and width_representation. Additional steps instruct the user on how to format their data for submission to the Ocean Biodiversity Information System (OBIS). Examples of documentation and data files are provided for the user to follow. The documentation requirements and data table format are approved by both NASA's SeaWiFS Bio-optical Archive and Storage System (SeaBASS) and the National Science Foundation's Biological and Chemical Oceanography Data Management Office (BCO-DMO).
    This report was an outcome of a working group supported by the Ocean Carbon and Biogeochemistry (OCB) project office, which is funded by the US National Science Foundation (OCE-1558412) and the National Aeronautics and Space Administration (NNX17AB17G). AN, SB, and CP conceived and drafted the document. IC, IST, JF, and HS contributed to the main body of the document as well as the example files. All members of the working group contributed to the content of the document, including the conceptualization of the data table and metadata format. We would also like to thank the external reviewers Cecile Rousseaux (NASA GSFC), Susanne Menden-Deuer (URI), Frank Muller-Karger (USF), and Abigail Benson (USGS) for their valuable feedback.
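
    A minimal sketch of assembling a data file with the field names listed above, assuming a flat CSV layout. The example values, image URL, and identifier are invented placeholders; consult the manual's example files for authoritative formatting.

        import csv

        FIELDS = ["associatedMedia", "scientificName", "scientificNameID",
                  "biovolume", "area_cross_section",
                  "length_representation", "width_representation"]

        rows = [{
            "associatedMedia": "https://example.org/images/IFCB_00123.png",
            "scientificName": "Ditylum brightwellii",  # lowest certain taxonomic level
            "scientificNameID": "urn:lsid:marinespecies.org:taxname:000000",  # placeholder WoRMS ID
            "biovolume": 41233.5,  # units per the accompanying documentation
            "area_cross_section": 1523.0,
            "length_representation": 112.4,
            "width_representation": 21.8,
        }]

        with open("plankton_data.csv", "w", newline="") as f:
            writer = csv.DictWriter(f, fieldnames=FIELDS)
            writer.writeheader()
            writer.writerows(rows)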

    The First Provenance Challenge

    The first Provenance Challenge was set up to provide a forum for the community to help understand the capabilities of different provenance systems and the expressiveness of their provenance representations. To this end, a functional magnetic resonance imaging (fMRI) workflow was defined, which participants had to either simulate or run in order to produce a provenance representation, from which a set of identified queries had to be implemented and executed. Sixteen teams responded to the challenge and submitted their inputs. In this paper, we present the challenge workflow and queries, and summarise the participants' contributions.
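
    The challenge queries were largely lineage questions over the workflow's intermediate products. Below is a sketch of one such query, expressed as SPARQL over a toy RDF provenance graph; the vocabulary and artifact names are illustrative only, since each of the sixteen teams used its own representation.

        # Requires: pip install rdflib
        from rdflib import Graph, Namespace

        EX = Namespace("http://example.org/provchallenge/")
        g = Graph()
        # Toy lineage: atlas image <- warped image <- anatomy image.
        g.add((EX.atlas_image, EX.wasDerivedFrom, EX.warped_image))
        g.add((EX.warped_image, EX.wasDerivedFrom, EX.anatomy_image))

        # "What did the atlas image ultimately derive from?" asked with a
        # transitive SPARQL 1.1 property path.
        q = """
        PREFIX ex: <http://example.org/provchallenge/>
        SELECT ?ancestor WHERE { ex:atlas_image ex:wasDerivedFrom+ ?ancestor }
        """
        for row in g.query(q):
            print(row.ancestor)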

    Metadata Enrichment for Digital Preservation

    Description of structural and semantic relationships and properties of, within, and between resources is seen as a key issue in digital preservation. But the markup languages used to encode descriptions for migration between, and storage within, digital repositories are subject to the same interpretive problems that complicate other uses of markup. This paper reports on a project that aims to address these problems by explicating facts that otherwise would not support automated inferencing. These facts are expressed as RDF [Resource Description Framework] triples, stored in and retrieved from a scalable RDF-based repository.
    Library of Congress, award number A6075.
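
    For instance, a fact that markup typically leaves implicit, such as one file being a format migration of another, can be explicated as an RDF triple so that software can reason over it. The vocabulary below is a hypothetical placeholder, not the project's schema.

        # Requires: pip install rdflib
        from rdflib import Graph, Namespace

        PRES = Namespace("http://example.org/preservation#")  # hypothetical
        g = Graph()

        original = PRES["object/report-2006.pdf"]
        migrated = PRES["object/report-2006-pdfa"]

        # Explicit, machine-inferable statement of a structural relationship.
        g.add((migrated, PRES.isMigrationOf, original))

        print(g.serialize(format="turtle"))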

    Preserving Meaning, Not Just Objects: Semantics and Digital Preservation

    The ECHO DEPository project is a digital preservation research and development project funded by the National Digital Information Infrastructure and Preservation Program (NDIIPP) and administered by the Library of Congress. A key goal of this project is to investigate both practical solutions for supporting digital preservation activities today and the more fundamental research questions underlying the development of the next generation of digital preservation systems. To support on-the-ground preservation efforts in existing technical and organizational environments, we have developed tools to help curators collect and manage Web-based digital resources, such as the Web Archives Workbench (Kaczmarek et al., 2008), and to enhance existing repositories' support for interoperability and emerging preservation standards, such as the Hub and Spoke Tool Suite (Habing et al., 2008). In the longer term, however, we recognize that successful digital preservation activities will require a more precise and complete account of the meaning of relationships within and among digital objects. This article describes project efforts to identify the core underlying semantic issues affecting long-term digital preservation, and to model how semantic inference may help next-generation archives head off long-term preservation risks.
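
    To illustrate the kind of semantic inference meant here, the sketch below flags an archived object as at risk because its format depends on unmaintained rendering software. All classes, properties, and data are hypothetical, invented only to show the inference pattern.

        # Requires: pip install rdflib
        from rdflib import Graph, Literal, Namespace

        EX = Namespace("http://example.org/echodep#")  # hypothetical vocabulary
        g = Graph()
        g.add((EX.video_clip, EX.hasFormat, EX.RealMedia))
        g.add((EX.RealMedia, EX.renderedBy, EX.RealPlayer))
        g.add((EX.RealPlayer, EX.maintained, Literal(False)))

        # Chain object -> format -> renderer -> maintenance status to find
        # holdings whose meaning is at risk of becoming unrecoverable.
        q = """
        PREFIX ex: <http://example.org/echodep#>
        SELECT ?obj WHERE {
          ?obj ex:hasFormat ?fmt .
          ?fmt ex:renderedBy ?app .
          ?app ex:maintained false .
        }
        """
        for row in g.query(q):
            print("preservation risk:", row.obj)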