Developing New Tools for the Old Tree of Life
Millions of species reside in the Tree of Life, which makes resolving the evolutionary origins of many organisms difficult. Biologists draw on genetic and phenotypic information to resolve the Tree of Life, but this work can be slow and complex. Phenomic data (such as cell shape, metabolism and ecology), particularly for microorganisms, are usually found in scientific publications and have little digital presence beyond scanned copies in online databases. Extracting these data has been aided by a new text-mining program, MicroPIE (Microbial Phenomics Information Extractor), which sifts through published descriptions and builds a matrix of key phenomic characters. MicroPIE uses multiple natural language processing tools to extract data, and relies on the knowledge of microbiologists to help develop and verify those tools. One major challenge in building such a tool is the time needed to collect and edit phenomic data for the tens of thousands of sentences required to develop a working program. We have helped further the development of MicroPIE by providing sentences from published microbial descriptions, enabling it to identify new characters. We are also creating a “Gold Standard” matrix (GSM) of phenomic information for 100 different bacteria, which can be compared with MicroPIE's output to test whether MicroPIE has correctly identified and extracted phenomic information. So far, MicroPIE has shown potential to aid in resolving the microbial Tree of Life.
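Comparing an extracted matrix against a gold standard usually comes down to cell-level scoring. The sketch below is a hypothetical illustration, not MicroPIE's actual evaluation code: it assumes both matrices are dictionaries keyed by (taxon, character) pairs, counts a cell as correct only when the gold value matches exactly, and uses made-up toy data.

```python
from typing import Dict, Tuple

# Hypothetical representation: (taxon, character) -> value,
# e.g. ("Taxon A", "cell shape") -> "rod". Not MicroPIE's real format.
Matrix = Dict[Tuple[str, str], str]


def score_against_gold(extracted: Matrix, gold: Matrix) -> Dict[str, float]:
    """Cell-level precision, recall and F1 of an extracted matrix vs. a gold standard."""
    # A cell counts as correct only if the gold matrix holds the same value for it.
    correct = sum(1 for cell, value in extracted.items() if gold.get(cell) == value)
    precision = correct / len(extracted) if extracted else 0.0
    recall = correct / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


# Toy data for illustration only.
gold = {
    ("Taxon A", "cell shape"): "rod",
    ("Taxon A", "gram stain"): "positive",
    ("Taxon B", "cell shape"): "coccus",
    ("Taxon B", "motility"): "non-motile",
}
extracted = {
    ("Taxon A", "cell shape"): "rod",
    ("Taxon A", "gram stain"): "negative",  # wrong value: not counted as correct
    ("Taxon B", "cell shape"): "coccus",
}
print(score_against_gold(extracted, gold))
```

Here two of the three extracted cells are correct and two of the four gold cells are recovered, so precision is 2/3 and recall is 1/2. A real evaluation would likely also need value normalization (synonyms, units) before comparing cells.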
Resolving Taxonomic Names using Evidence Extracted from Text
Biological taxonomy is built on relationships among organisms, with scientific names as the primary identifiers; however, resolving taxonomic names remains one of the greatest challenges in taxonomy and in systematic biology as a whole. We proposed an evidence-based approach that extracts trait (character) evidence from the published literature to facilitate the comparison of taxonomic concepts. In this poster, we report initial results from our first case study, on the plant genus Rubus. The case study tested the entire pipeline of the Explorer of Taxon Concepts toolkit we have developed and revealed challenging phenomena to be addressed in the near future.
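One simple way to picture "trait evidence for comparing taxonomic concepts" is to line up the character states each source attributes to a concept. The sketch below is hypothetical and is not the Explorer of Taxon Concepts pipeline: it assumes each concept is summarized as a mapping from character to a set of states, and treats overlapping states as agreement and disjoint states as conflict. The characters and states are illustrative, not data from the Rubus case study.

```python
from typing import Dict, List, Set

# Hypothetical summary of a taxonomic concept: character -> set of states
# extracted from its source publication.
Concept = Dict[str, Set[str]]


def compare_concepts(a: Concept, b: Concept) -> Dict[str, List[str]]:
    """Sort characters into agreeing, conflicting, and one-sided evidence."""
    shared = sorted(set(a) & set(b))
    agree = [c for c in shared if a[c] & b[c]]           # at least one state in common
    conflict = [c for c in shared if not (a[c] & b[c])]  # disjoint state sets
    return {
        "agree": agree,
        "conflict": conflict,
        "only_a": sorted(set(a) - set(b)),  # described only in source A
        "only_b": sorted(set(b) - set(a)),  # described only in source B
    }


# Illustrative concepts from two hypothetical treatments.
concept_1 = {"petal color": {"white", "pink"}, "prickles": {"present"}, "leaflets": {"3", "5"}}
concept_2 = {"petal color": {"white"}, "prickles": {"absent"}, "fruit color": {"red"}}
print(compare_concepts(concept_1, concept_2))
```

Characters described in only one source are kept separate because missing evidence is not the same as contradicting evidence.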
A Semi-Automatic Approach to Extracting Biodiversity Knowledge from Textual Descriptions of Botanical Species
Final project report. Project code: 5402-1375-4301. This document describes the final state of the project. It first introduces the pressing need to access textual biodiversity information in a more structured and semantically meaningful way. It then reviews the main approaches that have been used to tackle this problem, emphasizing those that structure morphological descriptions and geographic distributions, since these are the project's main areas of interest. Next, it presents the organization of the project in detail, along with its three main stages: collection and transformation of source documents, semantic structuring of text fragments of interest, and finally, development of tools that exploit the structured information. The project's results are then presented: results and evaluations of the semantic structuring of morphological descriptions and geographic distributions, as well as the final state of the tools developed for preprocessing the original documents and for querying semantically structured text fragments. After the results, the project's stated objectives are compared with the results obtained. Finally, a series of recommendations is made so that future projects can build on the studies and tools produced by this project.
Reasoning over Taxonomic Change: Exploring Alignments for the Perelleschus Use Case
Classifications and phylogenetic inferences of organismal groups change in
light of new insights. Over time, these changes can result in imperfect
tracking of taxonomic perspectives through the re-/use of Code-compliant or
informal names. To mitigate these limitations, we introduce a novel approach
for aligning taxonomies through the interaction of human experts and logic
reasoners. We explore the performance of this approach with the Perelleschus
use case of Franz & Cardona-Duque (2013). The use case includes six taxonomies
published from 1936 to 2013, 54 taxonomic concepts (i.e., circumscriptions of
names individuated according to their respective source publications), and 75
expert-asserted Region Connection Calculus articulations (e.g., congruence,
proper inclusion, overlap, or exclusion). An Open Source reasoning toolkit is
used to analyze 13 paired Perelleschus taxonomy alignments under heterogeneous
constraints and interpretations. The reasoning workflow optimizes the logical
consistency and expressiveness of the input and infers the set of maximally
informative relations among the entailed taxonomic concepts. The latter are
then used to produce merge visualizations that represent all congruent and
non-congruent taxonomic elements among the aligned input trees. In this small
use case with 6-53 input concepts per alignment, the information gained through
the reasoning process is on average one order of magnitude greater than in the
input. The approach offers scalable solutions for tracking provenance among
succeeding taxonomic perspectives that may have differential biases in naming
conventions, phylogenetic resolution, ingroup and outgroup sampling, or
ostensive (member-referencing) versus intensional (property-referencing)
concepts and articulations.
Comment: 30 pages, 16 figures
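The five Region Connection Calculus articulations named in the abstract can be illustrated in the ostensive (member-referencing) case, where each concept is known by its member set. The sketch below is a simplification under that assumption. The paper's toolkit instead reasons over expert-asserted articulations, which lets it work without complete member lists. The relation symbols (`==`, `<`, `>`, `><`, `!`) and the two toy taxonomies are illustrative assumptions.

```python
from typing import Dict, FrozenSet, Tuple


def rcc5(a: FrozenSet[str], b: FrozenSet[str]) -> str:
    """RCC-5 articulation between two non-empty concepts given as member sets."""
    if not a or not b:
        raise ValueError("concepts must be non-empty")
    if a == b:
        return "=="  # congruence
    if a < b:
        return "<"   # proper inclusion: a inside b
    if a > b:
        return ">"   # inverse proper inclusion: b inside a
    if a & b:
        return "><"  # overlap: shared and unshared members on both sides
    return "!"       # exclusion: no shared members


# Two hypothetical taxonomies over the same specimens s1..s4.
t1: Dict[str, FrozenSet[str]] = {"A": frozenset({"s1", "s2"}), "B": frozenset({"s3", "s4"})}
t2: Dict[str, FrozenSet[str]] = {"X": frozenset({"s1", "s2", "s3"}), "Y": frozenset({"s4"})}

# With extensions known, every cross-taxonomy pair gets exactly one relation.
alignment: Dict[Tuple[str, str], str] = {(p, q): rcc5(t1[p], t2[q]) for p in t1 for q in t2}
print(alignment)
```

Because the five relations are jointly exhaustive and pairwise disjoint, each pair receives exactly one label. This is one way to see why an aligned output can hold many more definite relations than the handful of articulations an expert asserts as input.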