Phenex: Ontological Annotation of Phenotypic Diversity

Author: Cartik R. Kothari
Hilmar Lapp
James P. Balhoff
John G. Lundberg
Monte Westerfield
Paula Mabee
Peter E. Midford
Todd J. Vision
Wasila M. Dahdul
Publication date: 01/01/2010

Phenex is a platform-independent desktop application designed to facilitate efficient and consistent annotation of phenotypic variation using Entity-Quality syntax, drawing on terms from community ontologies for anatomical entities, phenotypic qualities, and taxonomic names. Despite the centrality of the phenotype to so much of biology, traditions for communicating information about phenotypes are idiosyncratic to different disciplines. Phenotypes seem to elude standardized descriptions due to the variety of traits that compose them and the difficulty of capturing the complex forms and subtle differences among organisms that we can readily observe. Consequently, phenotypes are refractory to attempts at data integration that would allow computational analyses across studies and study systems. Phenex addresses this problem by allowing scientists to employ standard ontologies and syntax to link computable phenotype annotations to evolutionary character matrices, as well as to link taxa and specimens to ontological identifiers. Ontologies have become a foundational technology for establishing shared semantics, and, more generally, for capturing and computing with biological knowledge
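The Entity-Quality (EQ) syntax mentioned in this abstract pairs an anatomical entity with a phenotypic quality and ties the pair to a taxon and a character state. A minimal sketch of such a record follows. The class names, field names, and identifiers are hypothetical illustrations, not the Phenex data model or real ontology IDs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Term:
    """An ontology term: a stable identifier plus a human-readable label."""
    term_id: str
    label: str


@dataclass(frozen=True)
class EQAnnotation:
    """One Entity-Quality statement linked to a taxon and a matrix cell."""
    taxon: Term
    entity: Term      # anatomical structure, e.g. from an anatomy ontology
    quality: Term     # phenotypic quality, e.g. from a quality ontology
    character: int    # column of the character matrix (hypothetical field)
    state: str        # state symbol scored in that column

    def describe(self) -> str:
        return f"{self.taxon.label}: {self.entity.label} is {self.quality.label}"


# All identifiers below are placeholders, not real ontology IDs.
annotation = EQAnnotation(
    taxon=Term("TAXON:0000001", "Species A"),
    entity=Term("ANAT:0000001", "dorsal fin"),
    quality=Term("QUAL:0000001", "elongated"),
    character=12,
    state="1",
)
print(annotation.describe())
```

Because every field is an identifier rather than free text, two annotations from different studies can be compared by ID instead of by wording.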

Crossref

Directory of Open Access Journals

PubMed Central

Carolina Digital Repository

Nature Precedings

The Teleost Anatomy Ontology: Anatomical Representation for the Genomics Age

Author: Balhoff James P.
Dahdul Wasila M.
Haendel Melissa A.
Lapp Hilmar
Lundberg John G.
Mabee Paula M.
Midford Peter E.
Vision Todd J.
Westerfield Monte
Publication venue: Oxford University Press (OUP)
Publication date: 15/04/2014

The rich knowledge of morphological variation among organisms reported in the systematic literature has remained in free-text format, impractical for use in large-scale synthetic phylogenetic work. This noncomputable format has also precluded linkage to the large knowledgebase of genomic, genetic, developmental, and phenotype data in model organism databases. We have undertaken an effort to prototype a curated, ontology-based evolutionary morphology database that maps to these genetic databases (http://kb.phenoscape.org) to facilitate investigation into the mechanistic basis and evolution of phenotypic diversity. Among the first requirements in establishing this database was the development of a multispecies anatomy ontology with the goal of capturing anatomical data in a systematic and computable manner. An ontology is a formal representation of a set of concepts with defined relationships between those concepts. Multispecies anatomy ontologies in particular are an efficient way to represent the diversity of morphological structures in a clade of organisms, but they present challenges in their development relative to single-species anatomy ontologies. Here, we describe the Teleost Anatomy Ontology (TAO), a multispecies anatomy ontology for teleost fishes derived from the Zebrafish Anatomical Ontology (ZFA) for the purpose of annotating varying morphological features across species. To facilitate interoperability with other anatomy ontologies, TAO uses the Common Anatomy Reference Ontology as a template for its upper level nodes, and TAO and ZFA are synchronized, with zebrafish terms specified as subtypes of teleost terms. 
We found that the details of ontology architecture have ramifications for querying, and we present general challenges in developing a multispecies anatomy ontology, including refinement of definitions, taxon-specific relationships among terms, and representation of taxonomically variable developmental pathways. This work was supported by the National Science Foundation (NSF DBI 0641025), National Institutes of Health (HG002659), and the National Evolutionary Synthesis Center (NSF EF-0423641).
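The abstract above notes that zebrafish terms are specified as subtypes of teleost terms and that ontology architecture affects querying. A minimal sketch of why this matters: a query on a teleost term can also return records annotated with the zebrafish subtype. The term identifiers, the `IS_A` table, and the records are hypothetical, and real ontology queries would normally go through a reasoner rather than a hand-written loop.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical subtype links: term -> its direct supertype.
IS_A: Dict[str, str] = {
    "ZFA:dorsal_fin": "TAO:dorsal_fin",   # zebrafish term as subtype of teleost term
    "TAO:dorsal_fin": "TAO:median_fin",
}

# Hypothetical annotated records: (record name, annotated term).
ANNOTATIONS: List[Tuple[str, str]] = [
    ("zebrafish gene record", "ZFA:dorsal_fin"),
    ("catfish character state", "TAO:dorsal_fin"),
    ("other record", "TAO:pelvic_fin"),
]


def subsumed_by(term: Optional[str], query: str) -> bool:
    """True if `term` equals `query` or is a (transitive) subtype of it."""
    while term is not None:
        if term == query:
            return True
        term = IS_A.get(term)
    return False


def search(query: str) -> List[str]:
    """Return every record annotated with `query` or one of its subtypes."""
    return [record for record, term in ANNOTATIONS if subsumed_by(term, query)]


print(search("TAO:dorsal_fin"))
print(search("ZFA:dorsal_fin"))
```

Querying the teleost term returns both the zebrafish and the catfish record, while querying the zebrafish term returns only the zebrafish record.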

KU ScholarWorks

Toward Synthesizing Our Knowledge of Morphology: Using Ontologies and Machine Reasoning to Extract Presence/Absence Evolutionary Phenotypes across Studies

Author: Hilmar Lapp
James P. Balhoff
Paula M. Mabee
T. Alexander Dececchi
Publication date: 01/01/2015

The reality of larger and larger molecular databases and the need to integrate data scalably have presented a major challenge for the use of phenotypic data. Morphology is currently primarily described in discrete publications, entrenched in noncomputer readable text, and requires enormous investments of time and resources to integrate across large numbers of taxa and studies. Here we present a new methodology, using ontology-based reasoning systems working with the Phenoscape Knowledgebase (KB; kb.phenoscape.org), to automatically integrate large amounts of evolutionary character state descriptions into a synthetic character matrix of neomorphic (presence/absence) data. Using the KB, which includes more than 55 studies of sarcopterygian taxa, we generated a synthetic supermatrix of 639 variable characters scored for 1051 taxa, resulting in over 145,000 populated cells. Of these characters, over 76% were made variable through the addition of inferred presence/absence states derived by machine reasoning over the formal semantics of the source ontologies. Inferred data reduced the missing data in the variable character-subset from 98.5% to 78.2%. Machine reasoning also enables the isolation of conflicts in the data, that is, cells where both presence and absence are indicated; reports regarding conflicting data provenance can be generated automatically. Further, reasoning enables quantification and new visualizations of the data, here for example, allowing identification of character space that has been undersampled across the fin-to-limb transition. The approach and methods demonstrated here to compute synthetic presence/absence supermatrices are applicable to any taxonomic and phenotypic slice across the tree of life, providing the data are semantically annotated. 
Because such data can also be linked to model organism genetics through computational scoring of phenotypic similarity, they open a rich set of future research questions into phenotype-to-genome relationships
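The idea of inferring presence/absence states and isolating conflicts can be illustrated with a small sketch. It assumes two simple rules that are not taken from the abstract itself: presence of a part implies presence of every whole it belongs to, and absence of a whole implies absence of its parts. The `PART_OF` table, taxon names, and function names are hypothetical, and the published work relies on ontology reasoners rather than code like this.

```python
from typing import Dict, Iterator, List, Set, Tuple

Cell = Tuple[str, str]  # (taxon, structure)

# Hypothetical part_of links: structure -> the whole it is part of.
PART_OF: Dict[str, str] = {
    "fin ray": "pectoral fin",
    "pectoral fin": "paired appendage",
}


def wholes(structure: str) -> Iterator[str]:
    """Yield every structure that `structure` is transitively part of."""
    while structure in PART_OF:
        structure = PART_OF[structure]
        yield structure


def parts(structure: str) -> Set[str]:
    """Return every structure that is transitively part of `structure`."""
    return {s for s in PART_OF if structure in wholes(s)}


def infer(assertions: List[Tuple[str, str, bool]]) -> Tuple[Dict[Cell, bool], Set[Cell]]:
    """Propagate asserted states; return (resolved cells, conflicting cells)."""
    seen: Dict[Cell, Set[bool]] = {}
    for taxon, structure, present in assertions:
        implied = wholes(structure) if present else parts(structure)
        for target in [structure, *implied]:
            seen.setdefault((taxon, target), set()).add(present)
    matrix = {cell: next(iter(states)) for cell, states in seen.items() if len(states) == 1}
    conflicts = {cell for cell, states in seen.items() if len(states) > 1}
    return matrix, conflicts


assertions = [
    ("Taxon A", "fin ray", True),        # implies the fin and the appendage are present
    ("Taxon B", "pectoral fin", False),  # implies the fin ray is absent
    ("Taxon C", "fin ray", True),
    ("Taxon C", "pectoral fin", False),  # contradicts the line above
]
matrix, conflicts = infer(assertions)
print(len(matrix), sorted(conflicts))
```

Four asserted cells become six resolved cells plus two flagged conflicts, which mirrors on a toy scale how inference both fills in missing data and exposes contradictory sources.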

CiteSeerX

Crossref

ZENODO

Dryad Digital Repository (Duke University)

PubMed Central

Carolina Digital Repository

Electronic Archiving System

NEUROSURGERY ENTHUSIASTIC WOMEN SOCIETY

Toward Synthesizing Our Knowledge of Morphology: Using Ontologies and Machine Reasoning to Extract Presence/Absence Evolutionary Phenotypes across Studies

Author: Balhoff James P.
Dececchi T. Alexander
Lapp Hilmar
Mabee Paula M.
Publication date: 01/01/2015

Carolina Digital Repository

Phylotastic! Making Tree-of-Life Knowledge Accessible, Reusable and Convenient

Background: Scientists rarely reuse expert knowledge of phylogeny, in spite of years of effort to assemble a great "Tree of Life" (ToL). A notable exception involves the use of Phylomatic, which provides tools to generate custom phylogenies from a large, pre-computed, expert phylogeny of plant taxa. This suggests great potential for a more generalized system that, starting with a query consisting of a list of any known species, would rectify non-standard names, identify expert phylogenies containing the implicated taxa, prune away unneeded parts, and supply branch lengths and annotations, resulting in a custom phylogeny suited to the user's needs. Such a system could become a sustainable community resource if implemented as a distributed system of loosely coupled parts that interact through clearly defined interfaces. Results: With the aim of building such a "phylotastic" system, the NESCent Hackathons, Interoperability, Phylogenies (HIP) working group recruited 2 dozen scientist-programmers to a weeklong programming hackathon in June 2012. During the hackathon (and a three-month follow-up period), 5 teams produced designs, implementations, documentation, presentations, and tests including: (1) a generalized scheme for integrating components; (2) proof-of-concept pruners and controllers; (3) a meta-API for taxonomic name resolution services; (4) a system for storing, finding, and retrieving phylogenies using semantic web technologies for data exchange, storage, and querying; (5) an innovative new service, DateLife.org, which synthesizes pre-computed, time-calibrated phylogenies to assign ages to nodes; and (6) demonstration projects. These outcomes are accessible via a public code repository (GitHub.com), a website (www.phylotastic.org), and a server image.
Conclusions: Approximately 9 person-months of effort (centered on a software development hackathon) resulted in the design and implementation of proof-of-concept software for 4 core phylotastic components, 3 controllers, and 3 end-user demonstration tools. While these products have substantial limitations, they suggest considerable potential for a distributed system that makes phylogenetic knowledge readily accessible in computable form. Widespread use of phylotastic systems will create an electronic marketplace for sharing phylogenetic knowledge that will spur innovation in other areas of the ToL enterprise, such as annotation of sources and methods and third-party methods of quality assessment. NESCent (the National Evolutionary Synthesis Center), NSF EF-0905606; iPlant Collaborative (NSF), DBI-0735191; Biodiversity Synthesis Center (BioSync) of the Encyclopedia of Life; Computer Science.
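The "prune away unneeded parts" step described in this abstract can be sketched as follows: given a large tree and a list of wanted species, keep only the wanted tips and collapse any internal node left with a single child. The nested-tuple tree representation and the function names are assumptions made for illustration. This is not the code of any phylotastic pruner, and it ignores branch lengths and name resolution.

```python
from typing import Optional, Set, Tuple, Union

# A tree is either a tip name or a tuple of subtrees (hypothetical representation).
Tree = Union[str, Tuple["Tree", ...]]


def prune(tree: Tree, keep: Set[str]) -> Optional[Tree]:
    """Return `tree` restricted to the tips in `keep`, or None if none remain."""
    if isinstance(tree, str):
        return tree if tree in keep else None
    kept = [sub for sub in (prune(child, keep) for child in tree) if sub is not None]
    if not kept:
        return None
    if len(kept) == 1:
        return kept[0]  # collapse a node that has only one surviving child
    return tuple(kept)


def to_newick(tree: Tree) -> str:
    """Serialize the topology in Newick style, without the trailing semicolon."""
    if isinstance(tree, str):
        return tree
    return "(" + ",".join(to_newick(child) for child in tree) + ")"


big_tree: Tree = ((("A", "B"), "C"), ("D", "E"))
wanted = {"A", "C", "E"}
print(to_newick(prune(big_tree, wanted)) + ";")
```

Collapsing single-child nodes keeps the pruned result a proper tree over exactly the requested species, which is what a downstream tool would expect.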

Crossref

Springer - Publisher Connector

PubMed Central

DukeSpace

eScholarship - University of California

The University of Arizona

Access to Research at National University of Ireland, Galway

Texas ScholarWorks

Phenoscape: Identifying Candidate Genes for Evolutionary Phenotypes

Author: Balhoff James P.
Dahdul Wasila M.
Dunham Rex A.
Eames B. Frank
Edmunds Richard C.
Lapp Hilmar
Lundberg John G.
Mabee Paula M.
Su Baofeng
Vision Todd J.
Westerfield Monte
Publication date: 24/10/2015

Phenotypes resulting from mutations in genetic model organisms can help reveal candidate genes for evolutionarily important phenotypic changes in related taxa. Although testing candidate gene hypotheses experimentally in nonmodel organisms is typically difficult, ontology-driven information systems can help generate testable hypotheses about developmental processes in experimentally tractable organisms. Here, we tested candidate gene hypotheses suggested by expert use of the Phenoscape Knowledgebase, specifically looking for genes that are candidates responsible for evolutionarily interesting phenotypes in the ostariophysan fishes that bear resemblance to mutant phenotypes in zebrafish. For this, we searched ZFIN for genetic perturbations that result in either loss of basihyal element or loss of scales phenotypes, because these are the ancestral phenotypes observed in catfishes (Siluriformes). We tested the identified candidate genes by examining their endogenous expression patterns in the channel catfish, Ictalurus punctatus. The experimental results were consistent with the hypotheses that these features evolved through disruption in developmental pathways at, or upstream of, brpf1 and eda/edar for the ancestral losses of basihyal element and scales, respectively. These results demonstrate that ontological annotations of the phenotypic effects of genetic alterations in model organisms, when aggregated within a knowledgebase, can be used effectively to generate testable, and useful, hypotheses about evolutionary changes in morphology

ResearchOnline at James Cook University

PubMed Central

Carolina Digital Repository

eScholarship - University of California

Evolutionary Characters, Phenotypes and Ontologies: Curating Data from the Systematic Biology Literature

Author: Balhoff James P.
Dahdul Wasila M.
Engeman Jeffrey
Grande Terry
Hilton Eric J.
Kothari Cartik
Lapp Hilmar
Lundberg John G.
Mabee Paula M.
Midford Peter E.
Vision Todd J.
Westerfield Monte
Publication venue: Public Library of Science
Publication date: 01/01/2010

BACKGROUND: The wealth of phenotypic descriptions documented in the published articles, monographs, and dissertations of phylogenetic systematics is traditionally reported in a free-text format, and it is therefore largely inaccessible for linkage to biological databases for genetics, development, and phenotypes, and difficult to manage for large-scale integrative work. The Phenoscape project aims to represent these complex and detailed descriptions with rich and formal semantics that are amenable to computation and integration with phenotype data from other fields of biology. This entails reconceptualizing the traditional free-text characters into the computable Entity-Quality (EQ) formalism using ontologies. METHODOLOGY/PRINCIPAL FINDINGS: We used ontologies and the EQ formalism to curate a collection of 47 phylogenetic studies on ostariophysan fishes (including catfishes, characins, minnows, knifefishes) and their relatives with the goal of integrating these complex phenotype descriptions with information from an existing model organism database (zebrafish, http://zfin.org). We developed a curation workflow for the collection of character, taxonomic and specimen data from these publications. A total of 4,617 phenotypic characters (10,512 states) for 3,449 taxa, primarily species, were curated into EQ formalism (for a total of 12,861 EQ statements) using anatomical and taxonomic terms from teleost-specific ontologies (Teleost Anatomy Ontology and Teleost Taxonomy Ontology) in combination with terms from a quality ontology (Phenotype and Trait Ontology). Standards and guidelines for consistently and accurately representing phenotypes were developed in response to the challenges that were evident from two annotation experiments and from feedback from curators. 
CONCLUSIONS/SIGNIFICANCE: The challenges we encountered and many of the curation standards and methods for improving consistency that we developed are generally applicable to any effort to represent phenotypes using ontologies. This is because an ontological representation of the detailed variations in phenotype, whether between mutant or wildtype, among individual humans, or across the diversity of species, requires a process by which a precise combination of terms from domain ontologies are selected and organized according to logical relations. The efficiencies that we have developed in this process will be useful for any attempt to annotate complex phenotypic descriptions using ontologies. We also discuss some ramifications of EQ representation for the domain of systematics

Public Library of Science (PLOS)

Directory of Open Access Journals

PubMed Central

DukeSpace

Carolina Digital Repository

Annotation of phenotypic diversity: decoupling data curation and ontology curation using Phenex

Author: Balhoff James P
Dahdul Wasila M
Dececchi T
Lapp Hilmar
Mabee Paula M
Vision Todd J
Publication date: 01/01/2014

Background: Phenex (http://phenex.phenoscape.org/) is a desktop application for semantically annotating the phenotypic character matrix datasets common in evolutionary biology. Since its initial publication, we have added new features that address several major bottlenecks in the efficiency of the phenotype curation process: allowing curators during the data curation phase to provisionally request terms that are not yet available from a relevant ontology; supporting quality control against annotation guidelines to reduce later manual review and revision; and enabling the sharing of files for collaboration among curators. Results: We decoupled data annotation from ontology development by creating an Ontology Request Broker (ORB) within Phenex. Curators can use the ORB to request a provisional term for use in data annotation; the provisional term can be automatically replaced with a permanent identifier once the term is added to an ontology. We added a set of annotation consistency checks to prevent common curation errors, reducing the need for later correction. We facilitated collaborative editing by improving the reliability of Phenex when used with online folder sharing services, via file change monitoring and continual autosave. Conclusions: With the addition of these new features, and in particular the Ontology Request Broker, Phenex users have been able to focus more effectively on data annotation. Phenoscape curators using Phenex have reported a smoother annotation workflow, with much reduced interruptions from ontology maintenance and file management issues.
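The provisional-term workflow described in this abstract, in which a curator annotates with a placeholder that is later swapped for a permanent identifier, can be sketched in a few lines. The identifier formats, field names, and function names are hypothetical and do not reflect the actual ORB interface or Phenex file format.

```python
from typing import Dict, List, Set

Annotation = Dict[str, str]


def resolve_provisional(annotations: List[Annotation],
                        resolved: Dict[str, str]) -> List[Annotation]:
    """Return copies of the annotations with resolved provisional IDs replaced."""
    return [
        {field: resolved.get(value, value) for field, value in annotation.items()}
        for annotation in annotations
    ]


def pending(annotations: List[Annotation], prefix: str = "PROVISIONAL:") -> Set[str]:
    """Collect provisional IDs that still await a permanent ontology term."""
    return {
        value
        for annotation in annotations
        for value in annotation.values()
        if value.startswith(prefix)
    }


# Placeholder identifiers, not real ontology IDs.
annotations = [
    {"entity": "PROVISIONAL:42", "quality": "QUAL:0000001"},
    {"entity": "ANAT:0000007", "quality": "PROVISIONAL:43"},
]
# Suppose the ontology editors have so far added only the first requested term.
resolved = {"PROVISIONAL:42": "ANAT:0000100"}

updated = resolve_provisional(annotations, resolved)
print(updated[0]["entity"], sorted(pending(updated)))
```

Annotation can proceed with placeholders immediately, and the swap to permanent identifiers is a mechanical pass that can be rerun whenever new terms land in the ontology.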

Crossref

Springer - Publisher Connector

PubMed Central

Carolina Digital Repository

The vertebrate taxonomy ontology: a framework for reasoning across model organism and species phenotypes

Author: Balhoff James P.
Blackburn David C.
Dahdul Wasila M.
Dececchi Thomas Alex
Ibrahim Nizar
Lapp Hilmar
Lundberg John G.
Mabee Paula M.
Midford Peter E.
Sereno Paul C.
Vision Todd J.
Westerfield Monte
Publication venue: Springer Science and Business Media LLC
Publication date: 22/11/2013

Background: A hierarchical taxonomy of organisms is a prerequisite for semantic integration of biodiversity data. Ideally, there would be a single, expansive, authoritative taxonomy that includes extinct and extant taxa, information on synonyms and common names, and monophyletic supraspecific taxa that reflect our current understanding of phylogenetic relationships. Description: As a step towards development of such a resource, and to enable large-scale integration of phenotypic data across vertebrates, we created the Vertebrate Taxonomy Ontology (VTO), a semantically defined taxonomic resource derived from the integration of existing taxonomic compilations, and freely distributed under a Creative Commons Zero (CC0) public domain waiver. The VTO includes both extant and extinct vertebrates and currently contains 106,947 taxonomic terms, 22 taxonomic ranks, 104,736 synonyms, and 162,400 cross-references to other taxonomic resources. Key challenges in constructing the VTO included (1) extracting and merging names, synonyms, and identifiers from heterogeneous sources; (2) structuring hierarchies of terms based on evolutionary relationships and the principle of monophyly; and (3) automating this process as much as possible to accommodate updates in source taxonomies. Conclusions: The VTO is the primary source of taxonomic information used by the Phenoscape Knowledgebase (http://phenoscape.org/), which integrates genetic and evolutionary phenotype data across both model and non-model vertebrates. The VTO is useful for inferring phenotypic changes on the vertebrate tree of life, which enables queries for candidate genes for various episodes in vertebrate evolution. Keywords: Data integration; Evolutionary biology; Paleontology; Taxonomic ran

KU ScholarWorks

PubMed Central

Carolina Digital Repository