68 research outputs found

    Going nuclear: gene family evolution and vertebrate phylogeny reconciled

    Gene duplications have been common throughout vertebrate evolution, introducing paralogy and so complicating phylogenetic inference from nuclear genes. Reconciled trees are one method capable of dealing with paralogy, using the relationship between a gene phylogeny and the phylogeny of the organisms containing those genes to identify gene duplication events. This allows us to infer phylogenies from gene families containing both orthologous and paralogous copies. Vertebrate phylogeny is well understood from morphological and palaeontological data, but studies using mitochondrial sequence data have failed to reproduce this classical view. Reconciled tree analysis of a database of 118 vertebrate gene families supports a largely classical vertebrate phylogeny.
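    The duplication-inference step behind reconciled trees can be illustrated with a small sketch: map each gene-tree node to the smallest species-tree clade containing its species (the LCA mapping), and flag a node as a duplication when it maps to the same species-tree node as one of its children. The toy trees and the plain-Python implementation below are illustrative only and are not taken from the paper or its 118-family database.

```python
# Minimal sketch of reconciled-tree duplication inference via LCA mapping.
# Trees are nested tuples; gene-tree leaves are labelled by the species the
# gene copy was sampled from. All data here are toy examples.

def clades_of(species_tree):
    """Return (list of (node, leaf_set) pairs, leaf_set of this subtree)."""
    if isinstance(species_tree, str):
        leaves = frozenset([species_tree])
        return [(species_tree, leaves)], leaves
    pairs, leaves = [], frozenset()
    for child in species_tree:
        child_pairs, child_leaves = clades_of(child)
        pairs.extend(child_pairs)
        leaves |= child_leaves
    pairs.append((species_tree, leaves))
    return pairs, leaves

def lca(leaves, clades):
    """Smallest species-tree clade containing all of `leaves`."""
    containing = [(node, s) for node, s in clades if leaves <= s]
    return min(containing, key=lambda pair: len(pair[1]))[0]

def reconcile(gene_tree, clades):
    """Return (species under node, LCA mapping, duplications in subtree)."""
    if isinstance(gene_tree, str):
        leaves = frozenset([gene_tree])
        return leaves, lca(leaves, clades), 0
    results = [reconcile(child, clades) for child in gene_tree]
    leaves = frozenset().union(*(r[0] for r in results))
    node_map = lca(leaves, clades)
    dups = sum(r[2] for r in results)
    # A gene-tree node is a duplication if it maps to the same
    # species-tree node as one of its children.
    if any(r[1] == node_map for r in results):
        dups += 1
    return leaves, node_map, dups

# Toy species tree and a gene family with an ancient duplication at its root.
species_tree = ((("human", "mouse"), "chicken"), "frog")
gene_tree = (((("human", "mouse"), "chicken"), "frog"),
             ((("human", "mouse"), "chicken"), "frog"))
clades, _ = clades_of(species_tree)
print("inferred duplications:", reconcile(gene_tree, clades)[2])  # 1
```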

    DNA barcoding and taxonomy: dark taxa and dark texts

    Both classical taxonomy and DNA barcoding are engaged in the task of digitizing the living world. Much of the taxonomic literature remains undigitized. The rise of open access publishing this century and the freeing of older literature from the shackles of copyright have greatly increased the online availability of taxonomic descriptions, but much of the literature of the mid- to late-twentieth century remains offline (‘dark texts’). DNA barcoding is generating a wealth of computable data that in many ways are much easier to work with than classical taxonomic descriptions, but many of the sequences are not identified to species level. These ‘dark taxa’ hamper the classical method of integrating biodiversity data, using shared taxonomic names. Voucher specimens are a potential common currency of both the taxonomic literature and sequence databases, and could be used to help link names, literature and sequences. An obstacle to this approach is the lack of stable, resolvable specimen identifiers. The paper concludes with an appeal for a global ‘digital dashboard’ to assess the extent to which biodiversity data are available online. This article is part of the themed issue ‘From DNA barcodes to biomes’.

    Wikipedia as an encyclopaedia of life

    In his 2003 essay, E. O. Wilson outlined his vision for an “encyclopaedia of life” comprising “an electronic page for each species of organism on Earth”, each page containing “the scientific name of the species, a pictorial or genomic presentation of the primary type specimen on which its name is based, and a summary of its diagnostic traits.” Although the “quiet revolution” in biodiversity informatics has generated numerous online resources, including some directly inspired by Wilson's essay (e.g., http://ispecies.org, http://www.eol.org), we are still some way from the goal of having available online all relevant information about a species, such as its taxonomy, evolutionary history, genomics, morphology, ecology, and behaviour. While the biodiversity community has been developing a plethora of databases, some with overlapping goals and duplicated content, Wikipedia has been slowly growing to the point where it now has over 100,000 pages on biological taxa. My goal in this essay is to explore the idea that, largely independent of the efforts of biodiversity informatics and well-funded international efforts, Wikipedia (http://en.wikipedia.org/wiki/Main_Page) has emerged as potentially the best platform for fulfilling E. O. Wilson’s vision.

    GA4GH: International policies and standards for data sharing across genomic research and healthcare.

    The Global Alliance for Genomics and Health (GA4GH) aims to accelerate biomedical advances by enabling the responsible sharing of clinical and genomic data through both harmonized data aggregation and federated approaches. The decreasing cost of genomic sequencing (along with other genome-wide molecular assays) and increasing evidence of its clinical utility will soon drive the generation of sequence data from tens of millions of humans, with increasing levels of diversity. In this perspective, we present the GA4GH strategies for addressing the major challenges of this data revolution. We describe the GA4GH organization, which is fueled by the development efforts of eight Work Streams and informed by the needs of 24 Driver Projects and other key stakeholders. We present the GA4GH suite of secure, interoperable technical standards and policy frameworks and review the current status of standards, their relevance to key domains of research and clinical care, and future plans of GA4GH. Broad international participation in building, adopting, and deploying GA4GH standards and frameworks will catalyze an unprecedented effort in data sharing that will be critical to advancing genomic medicine and ensuring that all populations can access its benefits.

    Enhanced display of scientific articles using extended metadata

    Although the Web has transformed science publishing, scientific papers themselves are still essentially "black boxes", with much of their content intended for human readers only. Typically, computer-readable metadata associated with an article is limited to bibliographic details. By expanding article metadata to include taxonomic names, identifiers for cited material (e.g., publications, sequences, specimens, and other data), and geographical coordinates, publishers could greatly increase the scientific value of their digital content. At the same time, this will provide novel ways for users to discover and navigate through that content, beyond the relatively limited linkage provided by bibliographic citation.

As a proof of concept, my entry in the Elsevier Grand Challenge extracted extended metadata from a set of articles from the journal _Molecular Phylogenetics and Evolution_ and used it to populate an entity-attribute-value database. A simple web interface to this database enables an enhanced display of the content of an article, including a map of localities mentioned either explicitly or implicitly (through links to geotagged data), taxonomic coverage, and both data and citation links. Metadata extraction was limited to information listed in tables in the articles (such as GenBank sequences and specimen codes); the body of the article was not used. This restriction was deliberate, in order to demonstrate that making extended metadata available does not require a journal's publisher to make the full text freely available (although this is desirable for other reasons).
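To make the data model concrete, here is a minimal sketch of an entity-attribute-value store of the kind described above, using Python's standard sqlite3 module. The table layout, attribute names, and all identifiers are invented for illustration and are not those used in the Grand Challenge entry.

```python
# Minimal sketch of an entity-attribute-value (EAV) store for extended
# article metadata. Table layout, attribute names, and identifiers are
# illustrative placeholders only.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE eav (
        entity    TEXT NOT NULL,   -- e.g. an article DOI
        attribute TEXT NOT NULL,   -- e.g. 'cites_sequence', 'latitude'
        value     TEXT NOT NULL
    )
""")

# Extended metadata for one hypothetical article: a taxon it mentions,
# a sequence and a specimen it cites, and a locality from one of its tables.
rows = [
    ("doi:10.0000/example", "taxon_name",     "Rattus rattus"),
    ("doi:10.0000/example", "cites_sequence", "genbank:XX123456"),
    ("doi:10.0000/example", "cites_specimen", "MUSEUM 12345"),
    ("doi:10.0000/example", "latitude",       "-36.85"),
    ("doi:10.0000/example", "longitude",      "174.76"),
]
conn.executemany("INSERT INTO eav VALUES (?, ?, ?)", rows)

# One benefit of the EAV layout: navigation queries need no schema changes,
# e.g. "which articles cite this specimen?"
for (article,) in conn.execute(
        "SELECT DISTINCT entity FROM eav "
        "WHERE attribute = 'cites_specimen' AND value = ?",
        ("MUSEUM 12345",)):
    print(article)
```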

    New Zealand and the new biogeography

    New Zealand is a source both of biogeographic problems and of biogeographic ideas. The efforts of biogeographers to grapple with the implications of the revolution in the earth sciences are described. Hennig’s work on New Zealand flies and Croizat’s critique of biogeography are discussed. The ideas of these two biologists have found their fullest expression in recent work by biologists in New Zealand and New York.

    Liberating links between datasets using lightweight data publishing: an example using plant names and the taxonomic literature

    Constructing a biodiversity knowledge graph will require making millions of cross links between diversity entities in different datasets. Researchers trying to bootstrap the growth of the biodiversity knowledge graph by constructing databases of links between these entities lack obvious ways to publish these sets of links. One appealing and lightweight approach is to create a “datasette”, a database that is wrapped together with a simple web server that enables users to query the data. Datasettes can be packaged into Docker containers and hosted online with minimal effort. This approach is illustrated using a dataset of links between globally unique identifiers for plant taxonomic names, and identifiers for the taxonomic articles that published those names.
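    As a rough illustration of the workflow described above (and not the actual dataset or commands from the paper), the sketch below builds a small SQLite table of name-to-publication links with Python's sqlite3 module; the trailing comments note the Datasette commands that serve it as a web-queryable database and package it into a Docker container. The column names and sample identifiers are placeholders.

```python
# Sketch of the "datasette" approach: build a small SQLite database of
# name-to-publication links, then serve it with Datasette. Column names
# and sample identifiers below are placeholders.
import sqlite3

conn = sqlite3.connect("plant_name_links.db")
conn.execute("""
    CREATE TABLE IF NOT EXISTS name_publication (
        name_id        TEXT NOT NULL,  -- e.g. an identifier for the plant name
        publication_id TEXT NOT NULL,  -- e.g. the DOI of the publishing article
        PRIMARY KEY (name_id, publication_id)
    )
""")
conn.execute(
    "INSERT OR IGNORE INTO name_publication VALUES (?, ?)",
    ("urn:lsid:example.org:names:0000000-0", "10.0000/example.doi"),
)
conn.commit()
conn.close()

# The database can then be published with the Datasette command-line tool,
# which wraps it in a small read-only query web server and can also build
# a Docker image for hosting:
#
#   datasette serve plant_name_links.db     # local web UI and JSON API
#   datasette package plant_name_links.db   # build a Docker container
```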

    Reconciling author names in taxonomic and publication databases

    Taxonomic names remain fundamental to linking biodiversity data, but information on these names resides in separate silos. Despite often making their contents available in RDF, records in these taxonomic databases are rarely linked to identifiers in external databases, such as DOIs for publications, or ORCIDs for people. This paper explores how author names in publication databases such as CrossRef and ORCID can be reconciled with author names in a taxonomic database using existing vocabularies and SPARQL queries.
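    The kind of SPARQL-based reconciliation described here might look something like the sketch below. The endpoint URL, the schema.org-style vocabulary, and the query itself are assumptions made for this example; the paper's own data model and queries may differ.

```python
# Illustrative sketch of fetching candidate author records via SPARQL.
# The endpoint, vocabulary, and query are assumptions for this example.
import requests

ENDPOINT = "https://example.org/sparql"  # hypothetical taxonomic-database endpoint

QUERY = """
PREFIX schema: <http://schema.org/>
SELECT ?author ?name ?orcid WHERE {
  ?author schema:name ?name .
  OPTIONAL { ?author schema:identifier ?orcid . }
  FILTER(CONTAINS(LCASE(?name), LCASE("%s")))
}
LIMIT 10
"""

def candidate_authors(surname):
    """Fetch records whose stored name contains the given surname."""
    response = requests.get(
        ENDPOINT,
        params={"query": QUERY % surname},
        headers={"Accept": "application/sparql-results+json"},
        timeout=30,
    )
    response.raise_for_status()
    return [
        {key: cell["value"] for key, cell in row.items()}
        for row in response.json()["results"]["bindings"]
    ]

# A name string from CrossRef or ORCID can then be compared against these
# candidates (exact match, initials, or fuzzy similarity) to propose links.
```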

    Wikidata and the bibliography of life

    Biological taxonomy rests on a long tail of publications spanning nearly three centuries. Not only is this literature vital to resolving disputes about taxonomy and nomenclature, for many species it represents a key source—indeed sometimes the only source—of information about that species. Unlike other disciplines such as biomedicine, the taxonomic community lacks a centralised, curated literature database (the “bibliography of life”). This article argues that Wikidata can be that database as it has flexible and sophisticated models of bibliographic information, and an active community of people and programs (“bots”) adding, editing, and curating that information.
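    As a small, hedged example of why Wikidata's bibliographic model is attractive for this purpose, the snippet below queries the public Wikidata SPARQL endpoint (https://query.wikidata.org/sparql) for a work by DOI using standard bibliographic properties (P356 for DOI, P1476 for title, P577 for publication date); the DOI itself is a placeholder.

```python
# Look up a bibliographic record in Wikidata by DOI via its public SPARQL
# endpoint. The DOI below is a placeholder and will return no rows.
import requests

ENDPOINT = "https://query.wikidata.org/sparql"

QUERY = """
SELECT ?work ?title ?date WHERE {
  ?work wdt:P356 "10.0000/EXAMPLE.DOI" ;   # DOI (stored upper-cased in Wikidata)
        wdt:P1476 ?title .
  OPTIONAL { ?work wdt:P577 ?date . }
}
"""

response = requests.get(
    ENDPOINT,
    params={"query": QUERY, "format": "json"},
    headers={"User-Agent": "bibliography-of-life-example/0.1"},
    timeout=30,
)
response.raise_for_status()
for row in response.json()["results"]["bindings"]:
    print(row["work"]["value"], row["title"]["value"])
```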