23 research outputs found

    DNA barcoding and taxonomy: dark taxa and dark texts

    Both classical taxonomy and DNA barcoding are engaged in the task of digitizing the living world. Much of the taxonomic literature remains undigitized. The rise of open access publishing this century and the freeing of older literature from the shackles of copyright have greatly increased the online availability of taxonomic descriptions, but much of the literature of the mid- to late-twentieth century remains offline (‘dark texts’). DNA barcoding is generating a wealth of computable data that in many ways are much easier to work with than classical taxonomic descriptions, but many of the sequences are not identified to species level. These ‘dark taxa’ hamper the classical method of integrating biodiversity data, using shared taxonomic names. Voucher specimens are a potential common currency of both the taxonomic literature and sequence databases, and could be used to help link names, literature and sequences. An obstacle to this approach is the lack of stable, resolvable specimen identifiers. The paper concludes with an appeal for a global ‘digital dashboard’ to assess the extent to which biodiversity data are available online. This article is part of the themed issue ‘From DNA barcodes to biomes’.

    Ozymandias: a biodiversity knowledge graph

    Enormous quantities of biodiversity data are being made available online, but much of this data remains isolated in silos. One approach to breaking these silos is to map local, often database-specific identifiers to shared global identifiers. This mapping can then be used to construct a knowledge graph, where entities such as taxa, publications, people, places, specimens, sequences, and institutions are all part of a single, shared knowledge space. Motivated by the 2018 GBIF Ebbe Nielsen Challenge I explore the feasibility of constructing a “biodiversity knowledge graph” for the Australian fauna. The data cleaning and reconciliation steps involved in constructing the knowledge graph are described in detail. Examples are given of its application to understanding changes in patterns of taxonomic publication over time. A web interface to the knowledge graph (called “Ozymandias”) is available at https://ozymandias-demo.herokuapp.com
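The identifier-mapping step described in this abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the two source databases, their local identifiers, the predicates, and the DOI and ORCID values are all hypothetical, and the paper's actual pipeline may work quite differently.

```python
from collections import defaultdict

# Hypothetical mapping from database-specific identifiers to shared global ones.
ID_MAP = {
    "dbA:pub/17": "doi:10.9999/example.1",        # made-up DOI
    "dbB:ref-204": "doi:10.9999/example.1",       # same paper, other database
    "dbB:person-8": "orcid:0000-0000-0000-0000",  # made-up ORCID
}

# Hypothetical records, expressed as (subject, predicate, object) triples.
DB_A = [("dbA:taxon/3", "describedIn", "dbA:pub/17")]
DB_B = [("dbB:ref-204", "author", "dbB:person-8")]


def reconcile(triples, id_map):
    """Replace local identifiers with global ones wherever a mapping exists."""
    return {(id_map.get(s, s), p, id_map.get(o, o)) for s, p, o in triples}


def build_graph(*sources, id_map):
    """Merge reconciled triples from all sources into one adjacency structure."""
    graph = defaultdict(set)
    for source in sources:
        for s, p, o in reconcile(source, id_map):
            graph[s].add((p, o))
            graph[o].add((f"inverse:{p}", s))  # keep edges navigable both ways
    return graph


graph = build_graph(DB_A, DB_B, id_map=ID_MAP)
for edge in sorted(graph["doi:10.9999/example.1"]):
    print(edge)
```

Because both local publication identifiers resolve to the same global identifier, the taxon from one source and the author from the other end up attached to a single publication node, which is the sense in which shared identifiers break down silos.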

    An interactive DNA barcode browser

    This paper describes an interactive web application to display DNA barcode data. It supports both query by sequence and query by geographic area. By using n-gram indexing of DNA sequences, and alignment-free phylogeny construction, the user can interactively explore DNA barcode data in real time.
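As a rough illustration of n-gram indexing for alignment-free sequence search, the sketch below builds an inverted index from n-grams to sequence identifiers and ranks candidates by Jaccard similarity of their n-gram sets. The n-gram length, the similarity measure, and the function names are assumptions chosen for the example, not details taken from the paper.

```python
from collections import defaultdict


def ngrams(seq, n=5):
    """Return the set of overlapping n-grams in a DNA sequence."""
    seq = seq.upper()
    return {seq[i:i + n] for i in range(len(seq) - n + 1)}


def build_index(sequences, n=5):
    """Inverted index: n-gram -> set of ids of sequences containing it."""
    index = defaultdict(set)
    for seq_id, seq in sequences.items():
        for gram in ngrams(seq, n):
            index[gram].add(seq_id)
    return index


def query(index, sequences, query_seq, n=5):
    """Rank indexed sequences by Jaccard similarity to the query, best first."""
    q = ngrams(query_seq, n)
    # Only sequences sharing at least one n-gram with the query are scored.
    candidates = set()
    for gram in q:
        candidates |= index.get(gram, set())
    scored = []
    for seq_id in candidates:
        s = ngrams(sequences[seq_id], n)
        scored.append((seq_id, len(q & s) / len(q | s)))
    scored.sort(key=lambda item: (-item[1], item[0]))
    return scored


# Toy sequences, invented for the example.
barcodes = {"a": "ACGTACGTAC", "b": "ACGTACGTAA", "c": "TTTTTTTTTT"}
idx = build_index(barcodes)
print(query(idx, barcodes, "ACGTACGTAC"))
```

No alignment is computed at any point: the index lookup narrows the search to sequences that share at least one n-gram with the query, which is what makes this style of search fast enough for interactive use. The pairwise similarities could also be converted to distances and passed to a tree-building method.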

    Training and hackathon on building biodiversity knowledge graphs

    Knowledge graphs have the potential to unite disconnected digitized biodiversity data, and there are a number of efforts underway to build biodiversity knowledge graphs. More generally, the recent popularity of knowledge graphs, driven in part by the advent and success of the Google Knowledge Graph, has breathed life into the ongoing development of semantic web infrastructure and prototypes in the biodiversity informatics community. We describe a one-week training event and hackathon that focused on applying three specific knowledge graph technologies (the Neptune graph database, Metaphactory, and Wikidata) to a diverse set of biodiversity use cases. We give an overview of the training, the projects that were advanced throughout the week, and the critical discussions that emerged. We believe that the main barriers to adoption of biodiversity knowledge graphs are the lack of understanding of knowledge graphs and the lack of adoption of shared unique identifiers. Furthermore, we believe an important advancement in the outlook of knowledge graph development is the emergence of Wikidata as an identifier broker and as a scoping tool. To remedy the current barriers to biodiversity knowledge graph development, we recommend continued discussions at workshops and at conferences, which we expect to increase awareness and adoption of knowledge graph technologies.

    People are essential to linking biodiversity data

    People are one of the best known and most stable entities in the biodiversity knowledge graph. The wealth of public information associated with people and the ability to identify them uniquely open up the possibility to make more use of these data in biodiversity science. Person data are almost always associated with entities such as specimens, molecular sequences, taxonomic names, observations, images, traits and publications. For example, the digitization and the aggregation of specimen data from museums and herbaria allow us to view a scientist’s specimen collecting in conjunction with the whole corpus of their works. However, the metadata of these entities are also useful in validating data and in integrating data across collections and institutional databases, and can be the basis of future research into biodiversity and science. In addition, the ability to reliably credit collectors for their work has the potential to change the incentive structure to promote improved curation and maintenance of natural history collections.

    Phyloinformatics: toward a phylogenetic database

    Much of the interest in the “tree of life” is motivated by the notion that we can make much more meaningful use of biological information if we query the information in a phylogenetic framework. Assembling the tree of life raises numerous computational and data management issues. Biologists are generating large numbers of evolutionary trees (phylogenies). In contrast to sequence data, very few phylogenies (and the data from which they were derived) are stored in publicly accessible databases. Part of the reason is the need to develop new methods for storing, querying, and visualizing trees. This chapter explores some of these issues; it discusses some prototypes with a view to determining how far phylogenetics is toward its goal of a phylogenetic database.

    Taxonomy, supertrees, and the tree of life

    Some of the main practical impediments to the application of supertrees in large-scale phylogenetic analysis are inconsistent use of taxonomic names, trees incorporating taxa of different ranks, and poor taxonomic overlap between different phylogenetic studies. This chapter considers these problems and suggests some solutions. The notion of a “classification graph” is introduced to test for consistency between higher-level classifications. One strategy for coping with poor taxonomic overlap is to use a constraint tree that specifies some taxonomic groups that must appear in the supertree.

    Cospeciation

    Cospeciation is joint speciation of both host and parasite, resulting in host and parasite phylogenies being mirror images of each other.

    Rates and patterns of gene duplication and loss in the human genome

    Gene duplication has certainly played a major role in structuring vertebrate genomes but the extent and nature of the duplication events involved remains controversial. A recent study identified two major episodes of gene duplication: one episode of putative genome duplication ca. 500 Myr ago and a more recent gene-family expansion attributed to segmental or tandem duplications. We confirm this pattern using methods not reliant on molecular clocks for individual gene families. However, analysis of a simple model of the birth–death process suggests that the apparent recent episode of duplication is an artefact of the birth–death process. We show that a constant-rate birth–death model is appropriate for gene duplication data, allowing us to estimate the rate of gene duplication and loss in the vertebrate genome over the last 200 Myr (0.00115 and 0.00740 Myr⁻¹ lineage⁻¹, respectively). Finally, we show that increasing rates of gene loss reduce the impact of a genome-wide duplication event on the distribution of gene duplications through time.
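To see why a constant-rate birth–death process can make duplications look concentrated in the recent past, it may help to compute how likely a duplicate of a given age is to have left any surviving copy. The sketch below uses the textbook survival probability for a linear birth–death process with the two rates quoted in the abstract; whether this matches the "simple model" analysed in the paper is an assumption, and the function name is made up for the example.

```python
import math

# Rates quoted in the abstract, per Myr per lineage.
DUPLICATION_RATE = 0.00115  # birth rate
LOSS_RATE = 0.00740         # death rate


def survival_probability(t, birth=DUPLICATION_RATE, death=LOSS_RATE):
    """Probability that one lineage has at least one descendant after time t.

    Standard result for a constant-rate (linear) birth-death process:
        P(t) = r / (birth - death * exp(-r * t)),  with r = birth - death,
    and P(t) = 1 / (1 + birth * t) in the critical case birth == death.
    """
    if t < 0:
        raise ValueError("t must be non-negative")
    if math.isclose(birth, death):
        return 1.0 / (1.0 + birth * t)
    r = birth - death
    return r / (birth - death * math.exp(-r * t))


for age in (0, 50, 100, 200):  # ages in Myr
    print(age, round(survival_probability(age), 3))
```

With loss faster than duplication, the survival probability falls steadily with age (to roughly a quarter at 200 Myr for these rates), so older duplications are under-represented among surviving copies and recent ones appear as an excess even when the underlying rate is constant.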

    Tangled tales from multiple markers: reconciling conflict between phylogenies to build molecular supertrees

    Supertree methods combine information from multiple phylogenies into a larger, composite phylogeny. When there is no disagreement between the source phylogenies, constructing the supertree is straightforward. But in the (nearly universal) presence of disagreement between source trees, supertree methods seek to either represent or resolve this conflict. Existing supertree methods that resolve conflict between source trees do so in an ad hoc way. Gene tree parsimony is a supertree method that can combine molecular phylogenies for overlapping taxon sets and interprets conflict between these phylogenies in a biologically meaningful way. We review the method and discuss the relationship between gene tree parsimony and other supertree methods. Finally, we suggest that a better understanding of the causes of conflict between source trees should lead to appropriate ways of resolving this conflict when constructing supertrees.