Data sharing and ontology use among agricultural genetics, genomics, and breeding databases and resources of the AgBioData Consortium
Over the last several decades, there has been rapid growth in the number and
scope of agricultural genetics, genomics and breeding (GGB) databases and
resources. The AgBioData Consortium (https://www.agbiodata.org/) currently
represents 44 databases and resources covering model or crop plant and animal
GGB data, ontologies, pathways, genetic variation and breeding platforms
(referred to as 'databases' throughout). One of the goals of the Consortium is
to facilitate FAIR (Findable, Accessible, Interoperable, and Reusable) data
management and the integration of datasets, which requires data sharing, along
with structured vocabularies and/or ontologies. Two AgBioData working groups,
focused on Data Sharing and Ontologies, conducted a survey to assess the status
and future needs of the members in those areas. A total of 33 researchers
responded to the survey, representing 37 databases. Results suggest that data
sharing practices by AgBioData databases are in a healthy state, but it is not
clear whether this is true for all metadata and data types across all
databases; and that ontology use has not substantially changed since a similar
survey was conducted in 2017. We recommend 1) providing training for database
personnel in specific data sharing techniques, as well as in ontology use; 2)
further study on what metadata is shared, and how well it is shared among
databases; 3) promoting an understanding of data sharing and ontologies in the
stakeholder community; 4) improving data sharing and ontologies for specific
phenotypic data types and formats; and 5) lowering specific barriers to data
sharing and ontology use, by identifying sustainability solutions, and the
identification, promotion, or development of data standards. Combined, these
improvements are likely to help AgBioData databases increase development
efforts towards improved ontology use and data sharing via programmatic means.
Comment: 17 pages, 8 figures
Recommended from our members
The Gene Ontology in 2010: extensions and refinements
The Gene Ontology (GO) Consortium (GOC; http://www.geneontology.org) continues to develop, maintain and use a set of structured, controlled vocabularies for the annotation of genes, gene products and sequences. The GO ontologies are expanding both in content and in structure. Several new relationship types have been introduced and used, along with existing relationships, to create links between and within the GO domains. These improve the representation of biology, facilitate querying, and allow GO developers to systematically check for and correct inconsistencies within the GO. Gene product annotation using GO continues to increase both in the total number of annotations and in species coverage. GO tools, such as the ontology-editing tool OBO-Edit and the GOC ontology browser AmiGO, have seen major improvements in functionality, speed and ease of use.
This is the publisher’s final PDF. The published article is copyrighted by the author(s) and published by Oxford University Press. The published article can be found at: http://nar.oxfordjournals.org/
The Gene Ontology (GO) Cellular Component Ontology: integration with SAO (Subcellular Anatomy Ontology) and other recent developments.
BACKGROUND: The Gene Ontology (GO) (http://www.geneontology.org/) contains a set of terms for describing the activity and actions of gene products across all kingdoms of life. Each of these activities is executed in a location within a cell or in the vicinity of a cell. In order to capture this context, the GO includes a sub-ontology called the Cellular Component (CC) ontology (GO-CCO). The primary use of this ontology is for GO annotation, but it has also been used for phenotype annotation, and for the annotation of images. Another ontology with similar scope to the GO-CCO is the Subcellular Anatomy Ontology (SAO), part of the Neuroscience Information Framework Standard (NIFSTD) suite of ontologies. The SAO also covers cell components, but in the domain of neuroscience.
DESCRIPTION: Recently, the GO-CCO was enriched in content and links to the Biological Process and Molecular Function branches of GO as well as to other ontologies. This was achieved in several ways. We carried out an amalgamation of SAO terms with GO-CCO ones; as a result, nearly 100 new neuroscience-related terms were added to the GO. The GO-CCO also contains relationships to GO Biological Process and Molecular Function terms, as well as connecting to external ontologies such as the Cell Ontology (CL). Terms representing protein complexes in the Protein Ontology (PRO) reference GO-CCO terms for their species-generic counterparts. GO-CCO terms can also be used to search a variety of databases.
CONCLUSIONS: In this publication we provide an overview of the GO-CCO, its overall design, and some recent extensions that make use of additional spatial information. One of the most recent developments of the GO-CCO was the merging of the SAO into it, resulting in a single unified ontology designed to serve the needs of GO annotators as well as the specific needs of the neuroscience community.
Data sharing and ontology use among agricultural genetics, genomics, and breeding databases and resources of the AgBioData Consortium
Over the last couple of decades, there has been rapid growth in the number and scope of agricultural genetics, genomics and breeding (GGB) databases and resources. The AgBioData Consortium (https://www.agbiodata.org/) currently represents 44 databases and resources (https://www.agbiodata.org/databases) covering model or crop plant and animal GGB data, ontologies, pathways, genetic variation and breeding platforms (referred to as 'databases' throughout). One of the goals of the Consortium is to facilitate FAIR (Findable, Accessible, Interoperable, and Reusable) data management and the integration of datasets, which requires data sharing, along with structured vocabularies and/or ontologies. Two AgBioData working groups, focused on Data Sharing and Ontologies, respectively, conducted a Consortium-wide survey to assess the current status and future needs of the members in those areas. A total of 33 researchers responded to the survey, representing 37 databases. Results suggest that data-sharing practices by AgBioData databases are in a fairly healthy state, but it is not clear whether this is true for all metadata and data types across all databases; and that ontology use has not substantially changed since a similar survey was conducted in 2017. Based on our evaluation of the survey results, we recommend (i) providing training for database personnel in specific data-sharing techniques, as well as in ontology use; (ii) further study on what metadata is shared, and how well it is shared among databases; (iii) promoting an understanding of data sharing and ontologies in the stakeholder community; (iv) improving data sharing and ontologies for specific phenotypic data types and formats; and (v) lowering specific barriers to data sharing and ontology use, by identifying sustainability solutions, and the identification, promotion, or development of data standards. Combined, these improvements are likely to help AgBioData databases increase development efforts towards improved ontology use and data sharing via programmatic means.
Database URL: https://www.agbiodata.org/databases
Crowdsourcing biocuration: The Community Assessment of Community Annotation with Ontologies (CACAO).
Experimental data about gene functions curated from the primary literature have enormous value for research scientists in understanding biology. Using the Gene Ontology (GO), manual curation by experts has provided an important resource for studying gene function, especially within model organisms. Unprecedented expansion of the scientific literature and validation of the predicted proteins have increased both data value and the challenges of keeping pace. Capturing literature-based functional annotations is limited by the ability of biocurators to handle the massive and rapidly growing scientific literature. Within the community-oriented wiki framework for GO annotation called the Gene Ontology Normal Usage Tracking System (GONUTS), we describe an approach to expand biocuration through crowdsourcing with undergraduates. This multiplies the number of high-quality annotations in international databases, enriches our coverage of the literature on normal gene function, and pushes the field in new directions. Through the Community Assessment of Community Annotation with Ontologies (CACAO), an intercollegiate competition judged by experienced biocurators, we have contributed nearly 5,000 literature-based annotations. Many of those annotations are to organisms not currently well represented within GO. Over a 10-year history, our community contributors have spurred changes to the ontology in areas not traditionally covered by professional biocurators. The CACAO principle of relying on community members to participate in and shape the future of biocuration in GO is a powerful and scalable model for promoting the scientific enterprise. It also provides undergraduate students with a unique and enriching introduction to the critical reading of primary literature and the acquisition of marketable skills.
RNAcentral: a comprehensive database of non-coding RNA sequences
RNAcentral is a database of non-coding RNA (ncRNA) sequences that aggregates data from specialised ncRNA resources and provides a single entry point for accessing ncRNA sequences of all ncRNA types from all organisms. Since its launch in 2014, RNAcentral has integrated twelve new resources, taking the total number of collaborating databases to 22, and has begun importing new types of data, such as modified nucleotides from MODOMICS and PDB. We created new species-specific identifiers that refer to unique RNA sequences within the context of a single species. The website has been subject to continuous improvements focusing on text and sequence similarity searches as well as genome browsing functionality. All RNAcentral data are provided for free and are available for browsing, bulk downloads, and programmatic access at http://rnacentral.org/