
    Resolving Taxonomic Names using Evidence Extracted from Text

    Biological taxonomy is established on organism relationships, with scientific names as the primary identifiers; however, resolving various taxonomic names remains one of the greatest challenges in taxonomy and systematic biology overall. We proposed an evidence-based approach that extracts trait (character) evidence from published literature to facilitate the comparison of taxonomic concepts. In this poster, we report an initial set of results from our first case study using the plant genus Rubus. The case study tested the entire pipeline of the Explorer of Taxon Concepts toolkit we have developed and revealed challenging phenomena to be solved in the near future.

    A Natural Language Processing Pipeline to extract phenotypic data from formal taxonomic descriptions with a focus on flagellate plants

    Assembling large-scale phenotypic datasets for evolutionary and biodiversity studies of plants can be extremely difficult and time-consuming. New semi-automated Natural Language Processing (NLP) pipelines can extract phenotypic data from taxonomic descriptions, and their performance can be enhanced by incorporating information from ontologies, like the Plant Ontology (PO) and the Plant Trait Ontology (TO). These ontologies are powerful tools for comparing phenotypes across taxa for large-scale evolutionary and ecological analyses, but they are largely focused on terms associated with flowering plants. We describe a bottom-up approach to identify terms from flagellate plants (including bryophytes, lycophytes, ferns, and gymnosperms) that can be added to existing plant ontologies. We first parsed a large corpus of electronic taxonomic descriptions using the Explorer of Taxon Concepts tool (http://taxonconceptexplorer.org/) and identified flagellate-plant-specific terms that were missing from the existing ontologies. We extracted new structure and trait terms, and we are currently incorporating the missing structure terms into the PO and modifying the definitions of existing terms to expand their coverage to flagellate plants. We will incorporate trait terms into the TO in the near future.

    Incentivising Use of Structured Language in Biological Descriptions: Author-Driven Phenotype Data and Ontology Production

    Phenotypes are used for a multitude of purposes such as defining species, reconstructing phylogenies, diagnosing diseases or improving crop and animal productivity, but most of this phenotypic data is published in free-text narratives that are not computable. This means that the complex relationship between the genome, the environment and phenotypes is largely inaccessible to analysis, and important questions related to the evolution of organisms, their diseases or their response to climate change cannot be fully addressed. It takes great effort to manually convert free-text narratives to a computable format before they can be used in large-scale analyses. We argue that this manual curation approach is not a sustainable solution to produce computable phenotypic data for three reasons: 1) it does not scale to all of biodiversity; 2) it does not stop the publication of free-text phenotypes that will continue to need manual curation in the future; and, most importantly, 3) it does not solve the problem of inter-curator variation (curators interpret/convert a phenotype differently from each other). Our empirical studies have shown that inter-curator variation is as high as 40% even within a single project. With this level of variation, it is difficult to imagine that data integrated from multiple curation projects can be of high quality. The key causes of this variation have been identified as semantic vagueness in original phenotype descriptions and difficulties in using standardised vocabularies (ontologies). We argue that the authors describing phenotypes are the key to the solution. Given the right tools and appropriate attribution, the authors should be in charge of developing a project’s semantics and ontology. This will speed up ontology development and improve the semantic clarity of phenotype descriptions from the moment of publication. A proof of concept project on this idea was funded by NSF ABI in July 2017. We seek readers' input on or critique of the proposed approaches to help achieve community-based computable phenotype data production in the near future. Results from this project will be accessible through https://biosemantics.github.io/author-driven-production

    The Spider Anatomy Ontology (SPD)—A Versatile Tool to Link Anatomy with Cross-Disciplinary Data

    Spiders are a diverse group with a high eco-morphological diversity, which complicates anatomical descriptions, especially with regard to terminology. New terms are constantly proposed, and definitions and limits of anatomical concepts are regularly updated. Therefore, it is often challenging to find the correct terms, even for trained scientists, especially when the terminology has obstacles such as synonyms, disputed definitions, ambiguities, or homonyms. Here, we present the Spider Anatomy Ontology (SPD), which we developed by combining the functionality of a glossary (a controlled defined vocabulary) with a network of formalized relations between terms that can be used to compute inferences. The SPD follows the guidelines of the Open Biomedical Ontologies and is available through the NCBO BioPortal (ver. 1.1). It consists of 757 valid terms and definitions, is rooted in the Common Anatomy Reference Ontology (CARO), and has cross references to other ontologies, especially of arthropods. The SPD offers a wealth of anatomical knowledge that can be used as a resource for any scientific study, for example, to link images to phylogenetic datasets, compute structural complexity over phylogenies, and produce ancestral ontologies. By using a common reference in a standardized way, the SPD will help bridge diverse disciplines, such as genomics, taxonomy, systematics, evolution, ecology, and behavior.
    Affiliation: Ramirez, Martin Javier. Consejo Nacional de Investigaciones Científicas y Técnicas. Oficina de Coordinación Administrativa Parque Centenario. Museo Argentino de Ciencias Naturales "Bernardino Rivadavia"; Argentina
    Affiliation: Michalik, Peter. Ernst Moritz Arndt Universität Greifswald. Institut fur Geographie und Geologie; Alemani

    Challenges of comprehensive taxon sampling in comparative biology: Wrestling with rosids

    Using phylogenetic approaches to test hypotheses on a large scale, in terms of both species sampling and associated species traits and occurrence data—and doing this with rigor despite all the attendant challenges—is critical for addressing many broad questions in evolution and ecology. However, application of such approaches to empirical systems is hampered by a lingering series of theoretical and practical bottlenecks. The community is still wrestling with the challenges of how to develop species‐level, comprehensively sampled phylogenies and associated geographic and phenotypic resources that enable global‐scale analyses. We illustrate difficulties and opportunities using the rosids as a case study, arguing that assembly of biodiversity data that is scale‐appropriate—and therefore comprehensive and global in scope—is required to test global‐scale hypotheses. Synthesizing comprehensive biodiversity data sets in clades such as the rosids will be key to understanding the origin and present‐day evolutionary and ecological dynamics of the angiosperms.
    Peer Reviewed
    https://deepblue.lib.umich.edu/bitstream/2027.42/143800/1/ajb21059.pdf
    https://deepblue.lib.umich.edu/bitstream/2027.42/143800/2/ajb21059_am.pd