23 research outputs found

    Evaluation of BLAST-based edge-weighting metrics used for homology inference with the Markov Clustering algorithm

    Clustering protein sequences according to inferred homology is a fundamental step in the analysis of many large data sets. Since the publication of the Markov Clustering (MCL) algorithm in 2002, it has been the centerpiece of several popular applications. Each of these approaches generates an undirected graph that represents sequences as nodes connected to each other by edges weighted with a BLAST-based metric. MCL is then used to infer clusters of homologous proteins by analyzing these graphs. The various approaches differ only in how they weight the edges, yet there has been very little direct examination of the relative performance of alternative edge-weighting metrics. This study compares the performance of four BLAST-based edge-weighting metrics: the bit score, bit score ratio (BSR), bit score over anchored length (BAL), and negative common log of the expectation value (NLE). Performance is tested using the Extended CEGMA KOGs (ECK) database, which we introduce here. All metrics performed similarly when analyzing full-length sequences, but dramatic differences emerged as progressively larger fractions of the test sequences were split into fragments. The BSR and BAL successfully rescued subsets of clusters by strengthening certain types of alignments between fragmented sequences, but also shifted the largest correct scores down near the range of scores generated from spurious alignments. This penalty outweighed the benefits in most test cases, and was greatly exacerbated by increasing the MCL inflation parameter, making these metrics less robust than the bit score or the more popular NLE. Notably, the bit score performed as well as or better than the other three metrics in all scenarios. The results provide a strong case for use of the bit score, which appears to offer equivalent or superior performance to the more popular NLE.
The insight that MCL-based clustering methods can be improved using a more tractable edge-weighting metric will greatly simplify future implementations. We demonstrate this with our own minimalist Python implementation, Porthos, which uses only standard libraries and can process a graph with 25M+ edges connecting the 60k+ KOG sequences in half a minute using less than half a gigabyte of memory. https://doi.org/10.1186/s12859-015-0625-x https://doi.org/10.1186/s12859-015-0690-
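As a rough illustration of how BLAST hits might be turned into weighted edges for MCL, the sketch below computes three of the metrics named in the abstract using only the Python standard library. It is not the Porthos code. The function names, the tuple layout of `hits`, the cap of 200 for E-values of zero, the clamping of negative NLE values to zero, and the BSR definition (bit score divided by the smaller of the two self-alignment bit scores) are all assumptions made for this example. BAL is left out because the abstract does not define the anchored length.

```python
import math


def nle(evalue, cap=200.0):
    """Negative common log of the E-value, clamped to [0, cap].

    The cap for E-values reported as zero is an assumption, not a value from the paper.
    """
    if evalue <= 0.0:
        return cap
    return max(0.0, min(cap, -math.log10(evalue)))


def bit_score_ratio(bit_score, self_score_a, self_score_b):
    """Assumed BSR: bit score divided by the smaller self-alignment bit score."""
    denominator = min(self_score_a, self_score_b)
    if denominator <= 0:
        raise ValueError("self-alignment bit scores must be positive")
    return bit_score / denominator


def edge_weights(hits, self_scores=None, metric="bit"):
    """Build an undirected edge dictionary from (query, subject, bit_score, evalue) hits.

    Reciprocal hits collapse onto one edge that keeps the larger weight.
    """
    edges = {}
    for query, subject, bit_score, evalue in hits:
        if query == subject:
            continue  # self-hits are not edges
        if metric == "bit":
            weight = bit_score
        elif metric == "nle":
            weight = nle(evalue)
        elif metric == "bsr":
            weight = bit_score_ratio(bit_score, self_scores[query], self_scores[subject])
        else:
            raise ValueError(f"unknown metric: {metric}")
        key = tuple(sorted((query, subject)))
        edges[key] = max(weight, edges.get(key, 0.0))
    return edges
```

Calling `edge_weights(hits, metric="bit")` on parsed tabular BLAST output would give a dictionary keyed by sequence pairs that can be written out as a three-column edge list for an MCL run.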

    Directional auxin transport mechanisms in early diverging land plants

    The emergence and radiation of multicellular land plants was driven by crucial innovations to their body plans [1]. The directional transport of the phytohormone auxin represents a key, plant-specific mechanism for polarization and patterning in complex seed plants [2, 3, 4 and 5]. Here, we show that already in the early diverging land plant lineage, as exemplified by the moss Physcomitrella patens, auxin transport by PIN transporters is operational and diversified into ER-localized and plasma membrane-localized PIN proteins. Gain-of-function and loss-of-function analyses revealed that PIN-dependent intercellular auxin transport in Physcomitrella mediates crucial developmental transitions in tip-growing filaments and waves of polarization and differentiation in leaf-like structures. Plasma membrane PIN proteins localize in a polar manner to the tips of moss filaments, revealing an unexpected relation between polarization mechanisms in moss tip-growing cells and multicellular tissues of seed plants. Our results trace the origins of polarization and auxin-mediated patterning mechanisms and highlight the crucial role of polarized auxin transport during the evolution of multicellular land plants

    New phylogenetic hypotheses for the core Chlorophyta based on chloroplast sequence data

    Phylogenetic relationships in the green algal phylum Chlorophyta have long been subject to debate, especially at higher taxonomic ranks (order, class). The relationships among three traditionally defined and well-studied classes, Chlorophyceae, Trebouxiophyceae, and Ulvophyceae are of particular interest, as these groups are species-rich and ecologically important worldwide. Different phylogenetic hypotheses have been proposed over the past two decades and the monophyly of the individual classes has been disputed on occasion. Our study seeks to test these hypotheses by combining high throughput sequencing data from the chloroplast genome with increased taxon sampling. Our results suggest that while many of the deep relationships are still problematic to resolve, the classes Trebouxiophyceae and Ulvophyceae are likely not monophyletic as currently defined. Our results also support relationships among several trebouxiophycean taxa that were previously unresolved. Finally, we propose that the common term for the grouping of the three classes, “UTC clade,” be replaced with the term “core Chlorophyta” for the well-supported clade containing Chlorophyceae, taxa belonging to Ulvophyceae and Trebouxiophyceae, and the classes Chlorodendrophyceae and Pedinophyceae

    Unlocking collections: New records of Lepidoziaceae (Marchantiophyta) for the islands of Fiji

    The bryophyte flora of the islands of Fiji remains inadequately documented. Here, five liverwort species of Lepidoziaceae are reported as new to the Republic of Fiji: Lepidozia haskarliana, Neolepidozia cuneifolia, N. wallichiana, Telaranea major and Tricholepidozia melanesica.

    Building the Australian National Species List

    The Australian National Species List (AuNSL) is a unified, nationally accepted taxonomy for the native and naturalised biota of Australia. It is derived from a set of taxon-focussed resources including the Australian Plant Name Index and Australian Plant Census, the Australian Faunal Directory, and similar lists of fungi, lichens and bryophytes. These resources share a common infrastructure and contribute to the single national taxonomy (AuNSL), but retain their independent curation practices and online presentation. The AuNSL is now the core national infrastructure providing names and taxonomy for significant biodiversity data infrastructures including the Atlas of Living Australia, the Terrestrial Ecosystem Research Network, the Biodiversity Data Repository, and the Species Profile and Threats Database. As the go-to resource for names and taxonomy for Australia’s unique biodiversity, the AuNSL must be constantly updated to reflect taxonomic and nomenclatural change. For some taxonomic groups, the AuNSL is substantially complete, and the incorporation of new taxa and other novelties occurs with little time lag. For other taxonomic groups the data are patchy and updates sporadic. Like similar projects, the AuNSL would benefit from improvements to taxonomic data publishing and sharing. Such improvements have the potential to enable automated, real-time ingestion of new taxonomic and nomenclatural data, allowing curator time to be re-directed to backfilling the historical data from a dispersed and complex literature. Ideally, the AuNSL will be able to benefit from advances in automated approaches to processing the historical data, including via the sharing of standardised representations of such data. Here we outline the AuNSL data model and editor functionality, and describe our approach to sharing our data via existing and emerging standards such as Darwin Core and Taxon Concept Schema (TCS2).
We then describe what we, as consumers of taxonomic data from published works, really need from publishers of new and reprocessed historical data. In brief, we need structured taxonomic data conforming to an adequate standard.

    The Evolutionary Origin of a Terrestrial Flora

    Life on Earth as we know it would not be possible without the evolution of plants, and without the transition of plants to life on land. Land plants (also known as embryophytes) are a monophyletic lineage embedded within the green algae. Green algae as a whole are among the oldest eukaryotic lineages documented in the fossil record, and are well over a billion years old, while land plants are about 450–500 million years old. Much of green algal diversification took place before the origin of land plants, and the land plants are unambiguously members of a strictly freshwater lineage, the charophyte green algae. Contrary to single-gene and morphological analyses, genome-scale phylogenetic analyses indicate the sister taxon of land plants to be the Zygnematophyceae, a group of mostly unbranched filamentous or single-celled organisms. Indeed, several charophyte green algae have historically been used as model systems for certain problems, but often without a recognition of the specific phylogenetic relationships among land plants and (other) charophyte green algae. Insight into the phylogenetic and genomic properties of charophyte green algae opens up new opportunities to study key properties of land plants in closely related model systems. This review will outline the transition from single-celled algae to modern-day land plants, and will highlight the bright promise studying the charophyte green algae holds for better understanding plant evolution.

    Australian National Species List: Name Identifier Management and Linkages

    The Australian National Species List (AuNSL) is the provider of names and taxonomy for significant national biodiversity data infrastructures including the Atlas of Living Australia, the Terrestrial Ecosystem Research Network, the Biodiversity Data Repository, and the Species Profile and Threats Database. The AuNSL mints persistent identifiers for names covered by the codes of nomenclature and name-like objects such as phrase names. To ensure sustainability of identifiers, a mapping service is provided to always resolve all AuNSL identifiers including historical and deprecated forms. Names are used as the building blocks for recording taxon name usages and taxon concepts. We provide services for matching, disambiguation and taxonomic resolution of names. The AuNSL does not exist in a vacuum and supports identifier mappings to external resources and related systems such as the International Plant Name Index (IPNI), ZooBank, and the Biodiversity Heritage Library. To enable this integration, persistent identifiers from original and/or significant sources are required, and this data is currently limited and incomplete within the AuNSL. To address this issue, we need to look backwards for improved ways of matching to existing persistent identifiers and forward to improving capture of taxonomic novelties and name-like objects and their identifiers.

    The Australian National Species List steps up for biodiversity conservation

    The Australian National Species List (AuNSL) will bring together authoritative national taxonomies and supporting nomenclatural data for flora, fauna, fungi and algae. These data enable biodiversity infrastructures, such as the Atlas of Living Australia, to store information about taxa against a standard, permanently resolvable taxonomy, and to create linkages between them. The Biodiversity Data Repository (BDR) is a new Australian government infrastructure supporting environmental assessments under Australia's Environment Protection and Biodiversity Conservation Act. The BDR will bring together biodiversity data assets from a range of sources and enable decision making to be based on a more holistic view of taxa in Australia. To achieve its objectives, the BDR requires the ability to align disparate taxonomies to a national standard. To support the BDR we have restructured and consolidated the constituent datasets of the AuNSL and added a GraphQL interface from which we can provide machine access to the full suite of taxonomic and nomenclatural entities in the data and the linkages between them. Currently we provide a simple name check service for aligning supplied names with the standard AuNSL taxonomy and a full taxonomic backbone extract as a SKOS concept scheme expressed in JSON-LD. We are working towards an expanded set of taxonomic and nomenclatural services and a more generalised and complete graph of the AuNSL taxonomy, which can serve an increasing number of initiatives including the Australian Traits database (AusTraits), the Australian National DNA Library, and sensitive species data exchange mechanisms

    Green algal transcriptomes for phylogenetics and comparative genomics

    For technical details please see the README. The incredible diversity of green plants traces its origins to a single, unicellular common ancestor. The Green Algal Tree of Life project (www.gratol.org) aims to reconstruct the evolutionary history of these globally important organisms. Here we provide the results of our transcriptome sequencing initiative. The 32 high-coverage transcriptome assemblies (from 31 species), consisting of >2.7 million contigs with a mean length of ~1.3 kb, provide a rich resource for phylogenetics and evolutionary comparative genomics. Our focus on high-coverage sequencing means that these transcriptomes include a larger fraction of the expressed genome, and more full-length coding sequences, than other previous and contemporary sequencing efforts. The recovery of a full-length homolog of the Arabidopsis BIG gene (a huge 5098 amino acid protein) provides an example of the completeness of these transcriptomes. With a dataset of over 1600 orthologs (<10% missing sequences) we are identifying and isolating disparate evolutionary signals, and reconstructing deep phylogenetic relationships with unprecedented confidence. Transcriptome-wide comparative analyses are revealing broad-scale patterns of gene content evolution and implicating recurrent whole genome duplications in major evolutionary transitions. Detailed analyses of key molecular genetic mechanisms are providing insights into the origins and evolution of developmental regulation of three-dimensional growth, plant hormones, photorespiration, and desiccation tolerance. At present no publication is available for citation purposes. We therefore ask that all users of this data cite it using the figshare DOI(s) as appropriate. In addition, until we can provide a publication for citation purposes, we ask that users planning large-scale analyses of these data contact us to discuss their plans and thereby avoid conflict or overlap. Please cite: Cooper, Endymion; Delwiche, Charles (2016): Green algal transcriptomes for phylogenetics and comparative genomics. figshare. https://dx.doi.org/10.6084/m9.figshare.1604778 For additional information please contact Professor Charles Delwiche.