1,216 research outputs found

    AgBioData consortium recommendations for sustainable genomics and genetics databases for agriculture

    The future of agricultural research depends on data. The sheer volume of agricultural biological data being produced today makes excellent data management essential. Governmental agencies, publishers and science funders require data management plans for publicly funded research. Furthermore, the value of data increases exponentially when they are properly stored, described, integrated and shared, so that they can be easily utilized in future analyses. AgBioData (https://www.agbiodata.org) is a consortium of people working at agricultural biological databases, data archives and knowledgebases who strive to identify common issues in database development, curation and management, with the goal of creating database products that are more Findable, Accessible, Interoperable and Reusable. We strive to promote authentic, detailed, accurate and explicit communication between all parties involved in scientific data. As a step toward this goal, we present the current state of biocuration, ontologies, metadata and persistence, database platforms, programmatic (machine) access to data, communication and sustainability with regard to data curation. Each section describes challenges and opportunities for these topics, along with recommendations and best practices.

    Darwin Core: An Evolving Community-Developed Biodiversity Data Standard

    Biodiversity data derive from myriad sources stored in various formats on many distinct hardware and software platforms. An essential step towards understanding global patterns of biodiversity is to provide a standardized view of these heterogeneous data sources to improve interoperability. Fundamental to this advance are definitions of common terms. This paper describes the evolution and development of Darwin Core, a data standard for publishing and integrating biodiversity information. We focus on the categories of terms that define the standard, differences between simple and relational Darwin Core, how the standard has been implemented, and the community processes that are essential for maintenance and growth of the standard. We present case-study extensions of the Darwin Core into new research communities, including metagenomics and genetic resources. We close by showing how Darwin Core records are integrated to create new knowledge products documenting species distributions and changes due to environmental perturbations

    A reporting format for leaf-level gas exchange data and metadata

    Leaf-level gas exchange data support the mechanistic understanding of plant fluxes of carbon and water. These fluxes inform our understanding of ecosystem function, are an important constraint on parameterization of terrestrial biosphere models, are necessary to understand the response of plants to global environmental change, and are integral to efforts to improve crop production. Collection of these data using gas analyzers can be both technically challenging and time consuming, and individual studies generally focus on a small range of species, restricted time periods, or limited geographic regions. The high value of these data is exemplified by the many publications that reuse and synthesize gas exchange data; however, the lack of metadata and data reporting conventions makes full and efficient use of these data difficult. Here we propose a reporting format for leaf-level gas exchange data and metadata to provide guidance to data contributors on how to store data in repositories to maximize their discoverability, facilitate their efficient reuse, and add value to individual datasets. For data users, the reporting format will better allow data repositories to optimize data search and extraction, and more readily integrate similar data into harmonized synthesis products. The reporting format specifies data table variable naming and unit conventions, as well as metadata characterizing experimental conditions and protocols. For common data types that were the focus of this initial version of the reporting format, i.e., survey measurements, dark respiration, carbon dioxide and light response curves, and parameters derived from those measurements, we took a further step of defining required additional data and metadata that would maximize the potential reuse of those data types. To aid data contributors and the development of data ingest tools by data repositories, we provided a translation table comparing the outputs of common gas exchange instruments. Extensive consultation with data collectors, data users, instrument manufacturers, and data scientists was undertaken in order to ensure that the reporting format met community needs. The reporting format presented here is intended to form a foundation for future development that will incorporate additional data types and variables as gas exchange systems and measurement approaches advance in the future. The reporting format is published in the U.S. Department of Energy's ESS-DIVE data repository, with documentation and future development efforts being maintained in a version control system.

    OBO Foundry food ontology interconnectivity

    Since its creation in 2016, the FoodOn ontology has become an interconnected partner in various academic and government inter-agency ontology work spanning agricultural and public health domains. This paper examines existing and potential data interoperability capabilities arising from FoodOn and partner food-related ontologies belonging to the encyclopedic Open Biological and Biomedical Ontology Foundry (OBO) vocabulary platform, and how research organizations and industry might utilize them for their own operations or for data exchange. Projects are seeking standardized vocabulary across all direct food supply activities ranging from agricultural production, harvesting, preparation, food processing, marketing, distribution and consumption, as well as indirectly, within health, economic, food security and sustainability analysis and reporting tools. To satisfy this demand and provide data requires establishing domain specific ontologies whose curators coordinate closely to produce recommended patterns for food system vocabulary

    Complex adaptive systems based data integration : theory and applications

    Data Definition Languages (DDLs) have been created and used to represent data in programming languages and in database dictionaries. This representation includes descriptions in the form of data fields and relations in the form of a hierarchy, with the common exception of relational databases where relations are flat. Network computing created an environment that enables relatively easy and inexpensive exchange of data. What followed was the creation of new DDLs claiming better support for automatic data integration. It is uncertain from the literature if any real progress has been made toward achieving an ideal state or limit condition of automatic data integration. This research asserts that difficulties in accomplishing integration are indicative of socio-cultural systems in general and are caused by some measurable attributes common in DDLs. This research’s main contributions are: (1) a theory of data integration requirements to fully support automatic data integration from autonomous heterogeneous data sources; (2) the identification of measurable related abstract attributes (Variety, Tension, and Entropy); (3) the development of tools to measure them. The research uses a multi-theoretic lens to define and articulate these attributes and their measurements. The proposed theory is founded on the Law of Requisite Variety, Information Theory, Complex Adaptive Systems (CAS) theory, Sowa’s Meaning Preservation framework and Zipf distributions of words and meanings. Using the theory, the attributes, and their measures, this research proposes a framework for objectively evaluating the suitability of any data definition language with respect to degrees of automatic data integration. This research uses thirteen data structures constructed with various DDLs from the 1960s to date. No DDL examined (and therefore no DDL similar to those examined) is designed to satisfy the Law of Requisite Variety. No DDL examined is designed to support CAS evolutionary processes that could result in fully automated integration of heterogeneous data sources. There is no significant difference in measures of Variety, Tension, and Entropy among DDLs investigated in this research. A direction to overcome the common limitations discovered in this research is suggested and tested by proposing GlossoMote, a theoretical, mathematically sound description language that satisfies the data integration theory requirements. GlossoMote is not merely a new syntax; it is a drastic departure from existing DDL constructs. The feasibility of the approach is demonstrated with a small-scale experiment and evaluated using the proposed assessment framework and other means. The promising results require additional research to evaluate the commercial-use potential of GlossoMote’s approach.

    A survey of semantic web technology for agriculture.

    Semantic web technologies have become a popular technique to apply meaning to unstructured data. They have been infrequently applied to problems within the agricultural domain when compared to complementary domains. Despite this lack of application, agriculture has a large number of semantic resources that have been developed by large NGOs such as the Food and Agriculture Organization (FAO). This survey is intended to motivate further research in the application of semantic web technologies for agricultural problems by making available a self-contained reference that provides a comprehensive review of preexisting semantic resources and their construction methods, data interchange standards, and a survey of the current applications of semantic web technologies.

    Xml Beyond The Tags

    XML is quickly being utilized in the field of technical communication to transfer information from database to person and company to company. Often communicators will structure information without a second thought of how or why certain tags are used to mark up the information. Because the company or a manual says to use those tags, the communicator does so. However, if professionals want to unlock the true potential of XML for better sharing of information across platforms, they need to understand the effects that the technologies using XML, as well as political and cultural factors, have on the tags being used. This thesis reviewed literature from multiple fields utilizing XML to find how tag choices can be influenced. XML allows for the sharing of information across multiple platforms and databases. Because of this efficiency, XML is utilized by many technologies. Often communicators must tag information so that the technologies can find the marked-up information; therefore, technologies like single sourcing, data mining, and knowledge management influence the types of tags created. Additionally, cultural and political influences are analyzed to see how they play a role in determining what tags are used and created for specific documents. The thesis concludes with predictions on the future of XML and the technological, political, and cultural influences associated with XML tag sets based on information found within the thesis.