Background: A hierarchical taxonomy of organisms is a prerequisite for semantic integration of biodiversity data. Ideally, there would be a single, expansive, authoritative taxonomy that includes extinct and extant taxa, information on synonyms and common names, and monophyletic supraspecific taxa that reflect our current understanding of phylogenetic relationships.
Description: As a step towards development of such a resource, and to enable large-scale integration of phenotypic data across vertebrates, we created the Vertebrate Taxonomy Ontology (VTO), a semantically defined taxonomic resource derived from the integration of existing taxonomic compilations, and freely distributed under a Creative Commons Zero (CC0) public domain waiver. The VTO includes both extant and extinct vertebrates and currently contains 106,947 taxonomic terms, 22 taxonomic ranks, 104,736 synonyms, and 162,400 cross-references to other taxonomic resources. Key challenges in constructing the VTO included (1) extracting and merging names, synonyms, and identifiers from heterogeneous sources; (2) structuring hierarchies of terms based on evolutionary relationships and the principle of monophyly; and (3) automating this process as much as possible to accommodate updates in source taxonomies.
Conclusions: The VTO is the primary source of taxonomic information used by the Phenoscape Knowledgebase (http://phenoscape.org/ webcite), which integrates genetic and evolutionary phenotype data across both model and non-model vertebrates. The VTO is useful for inferring phenotypic changes on the vertebrate tree of life, which enables queries for candidate genes for various episodes in vertebrate evolution.
Keywords: Data integration; Evolutionary biology; Paleontology; Taxonomic ran