17 research outputs found

    BLAST-EXPLORER helps you building datasets for phylogenetic analysis

    Background: The right sampling of homologous sequences for phylogenetic or molecular evolution analyses is a crucial step, the quality of which can have a significant impact on the final interpretation of the study. There is no single way of constructing datasets suitable for phylogenetic analysis, because this task intimately depends on the scientific question we want to address. Moreover, database mining software such as BLAST, which is routinely used for searching homologous sequences, is not specifically optimized for this task. Results: To fill this gap, we designed BLAST-Explorer, an original and friendly web-based application that combines a BLAST search with a suite of tools that allows interactive, phylogenetic-oriented exploration of the BLAST results and flexible selection of homologous sequences among the BLAST hits. Once the selection of the BLAST hits is done using BLAST-Explorer, the corresponding sequences can be imported locally for external analysis or passed to the phylogenetic tree reconstruction pipelines available on the Phylogeny.fr platform. Conclusions: BLAST-Explorer provides a simple, intuitive and interactive graphical representation of the BLAST results and allows selection and retrieval of the BLAST hit sequences based on a wide range of criteria. Although BLAST-Explorer primarily aims at helping the construction of sequence datasets for further phylogenetic study, it can also be used as a standard BLAST server with enriched output. BLAST-Explorer is available at http://www.phylogeny.fr.

    PhyloSort: a user-friendly phylogenetic sorting tool and its application to estimating the cyanobacterial contribution to the nuclear genome of Chlamydomonas

    Background: Phylogenomic pipelines generate a large collection of phylogenetic trees that require manual inspection to answer questions about gene or genome evolution. A notable application of phylogenomics is to photosynthetic organelle (plastid) endosymbiosis. In the case of primary endosymbiosis, a heterotrophic protist engulfed a cyanobacterium, giving rise to the first photosynthetic eukaryote. Plastid establishment precipitated extensive gene transfer from the endosymbiont to the nuclear genome of the 'host'. Estimating the magnitude of this endosymbiotic gene transfer (EGT) and determining the functions of the prokaryotic genes remain controversial issues. We used phylogenomics to study EGT in the model green alga Chlamydomonas reinhardtii. To facilitate this procedure, we developed PhyloSort to rapidly search large collections of trees for monophyletic relationships. Here we present PhyloSort and its application to estimating EGT in Chlamydomonas. Results: PhyloSort is an open-source tool to sort phylogenetic trees by searching for user-specified subtrees that contain a monophyletic group of interest defined by operational taxonomic units in a phylogenomic context. Using PhyloSort, we identified 897 Chlamydomonas genes of putative cyanobacterial origin, of which 531 had bootstrap support values ≥ 50% for the grouping of the algal and cyanobacterial homologs. Conclusion: PhyloSort can be applied to quantify the number of genes that support different evolutionary hypotheses such as a taxonomic classification or endosymbiotic or horizontal gene transfer events. In our application, we demonstrate that cyanobacteria account for 3.5–6% of the protein-coding genes in the nuclear genome of Chlamydomonas.
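The sorting idea in the PhyloSort abstract, keeping only the trees in which a group of interest is monophyletic, can be illustrated with a minimal Python sketch. This is not PhyloSort's code or interface: the nested-tuple tree encoding, the function names, the taxon labels and the two toy gene trees are all assumptions made for illustration, and the check treats each tree as rooted and asks for a clade containing exactly the listed taxa.

```python
def leaves(node):
    """Return the set of leaf names under a node (a leaf is a plain string)."""
    if isinstance(node, str):
        return {node}
    result = set()
    for child in node:
        result |= leaves(child)
    return result


def has_monophyletic_group(tree, taxa):
    """True if some clade of the rooted tree contains exactly the given taxa."""
    target = set(taxa)
    if leaves(tree) == target:
        return True
    if isinstance(tree, str):
        return False
    # Only descend into children that still contain every target taxon.
    return any(has_monophyletic_group(child, target)
               for child in tree if target <= leaves(child))


# Hypothetical gene trees, written as nested tuples (illustrative only).
trees = {
    "gene_a": ((("Chlamy", "Cyano1"), "Cyano2"), ("Plant", "Fungus")),
    "gene_b": (("Chlamy", "Plant"), (("Cyano1", "Cyano2"), "Fungus")),
}
group = {"Chlamy", "Cyano1", "Cyano2"}

# Keep the names of the trees that support the algal + cyanobacterial grouping.
supporting = [name for name, tree in trees.items()
              if has_monophyletic_group(tree, group)]
print(supporting)
```

A real analysis would also need to handle unrooted trees and bootstrap support thresholds such as the ≥ 50% cutoff mentioned in the abstract, both of which this sketch leaves out.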

    PPNID: a reference database and molecular identification pipeline for plant-parasitic nematodes

    Motivation: The phylum Nematoda comprises the most cosmopolitan and abundant metazoans on Earth and plant-parasitic nematodes represent one of the most significant nematode groups, causing severe losses in agriculture. In practice, the demand for accurate nematode identification is high in ecological, agricultural, taxonomic and phylogenetic research. Despite their importance, morphological diagnosis is often a difficult task due to phenotypic plasticity and the absence of clear diagnostic characters, while molecular identification is very difficult due to the problematic database and complex genetic background. Results: The present study attempts to make up for currently available databases by creating a manually-curated database including all up-to-date authentic barcoding sequences. To facilitate the laborious process associated with the interpretation and identification of a given query sequence, we developed an automatic software pipeline for rapid species identification. The incorporated alignment function facilitates the examination of mutation distribution and therefore also reveals nucleotide autapomorphies, which are important in species delimitation. The implementation of genetic distance, plot and maximum likelihood phylogeny analysis provides more powerful optimality criteria than similarity searching and facilitates species delimitation using evolutionary or phylogeny species concepts. The pipeline streamlines several functions to facilitate more precise data analyses, and the subsequent interpretation is easy and straightforward.

    TARGeT: a web-based pipeline for retrieving and characterizing gene and transposable element families from genomic sequences

    Gene families compose a large proportion of eukaryotic genomes. The rapidly expanding genomic sequence database provides a good opportunity to study gene family evolution and function. However, most gene family identification programs are restricted to searching protein databases, where data often lag behind the genomic sequence data. Here, we report a user-friendly web-based pipeline, named TARGeT (Tree Analysis of Related Genes and Transposons), which uses either a DNA or amino acid ‘seed’ query to: (i) automatically identify and retrieve gene family homologs from a genomic database, (ii) characterize gene structure and (iii) perform phylogenetic analysis. Due to its high speed, TARGeT is also able to characterize very large gene families, including transposable elements (TEs). We evaluated TARGeT using well-annotated datasets, including the ascorbate peroxidase gene family of rice, maize and sorghum and several TE families in rice. In all cases, TARGeT rapidly recapitulated the known homologs and predicted new ones. We also demonstrated that TARGeT outperforms similar pipelines and has functionality that is not offered elsewhere.

    Comparative gene expression in toxic versus non-toxic strains of the marine dinoflagellate Alexandrium minutum

    Yang I, John U, Beszteri S, et al. Comparative gene expression in toxic versus non-toxic strains of the marine dinoflagellate Alexandrium minutum. BMC Genomics. 2010;11(1): 248. Background: The dinoflagellate Alexandrium minutum typically produces paralytic shellfish poisoning (PSP) toxins, which are known only from cyanobacteria and dinoflagellates. While a PSP toxin gene cluster has recently been characterized in cyanobacteria, the genetic background of PSP toxin production in dinoflagellates remains elusive. Results: We constructed and analysed an expressed sequence tag (EST) library of A. minutum, which contained 15,703 read sequences yielding a total of 4,320 unique expressed clusters. Of these clusters, 72% combined the forward and reverse reads of at least one bacterial clone. This sequence resource was then used to construct an oligonucleotide microarray. We analysed the expression of all clusters in three different strains. While the cyanobacterial PSP toxin genes were not found among the A. minutum sequences, 192 genes were differentially expressed between toxic and non-toxic strains. Conclusions: Based on this study and on the lack of identified PSP synthesis genes in the two existent Alexandrium tamarense EST libraries, we propose that the PSP toxin genes in dinoflagellates might be more different from their cyanobacterial counterparts than would be expected in the case of a recent gene transfer. As a starting point to identify possible PSP toxin-associated genes in dinoflagellates without relying on a priori sequence information, the sequences only present in mRNA pools of the toxic strain can be seen as putative candidates involved in toxin synthesis and regulation, or acclimation to intracellular PSP toxins.

    Computational tools for viral metagenomics and their application in clinical research

    There are 100 times more virions than eukaryotic cells in a healthy human body. The characterization of human-associated viral communities in a non-pathological state and the detection of viral pathogens in cases of infection are essential for medical care and epidemic surveillance. Viral metagenomics, the sequence-based analysis of the complete collection of viral genomes directly isolated from an organism or an ecosystem, bypasses the “single-organism-level” point of view of clinical diagnostics and thus the need to isolate and culture the targeted organism. The first part of this review is dedicated to a presentation of past research in viral metagenomics with an emphasis on human-associated viral communities (eukaryotic viruses and bacteriophages). In the second part, we review more precisely the computational challenges posed by the analysis of viral metagenomes, and we illustrate the problem of sequences that do not have homologs in public databases and the possible approaches to characterize them.

    A software pipeline for processing and identification of fungal ITS sequences

    Background: Fungi from environmental samples are typically identified to species level through DNA sequencing of the nuclear ribosomal internal transcribed spacer (ITS) region for use in BLAST-based similarity searches in the International Nucleotide Sequence Databases. These searches are time-consuming and regularly require a significant amount of manual intervention and complementary analyses. We here present software, in the form of an identification pipeline for large sets of fungal ITS sequences, developed to automate the BLAST process and several additional analysis steps. The performance of the pipeline was evaluated on a dataset of 350 ITS sequences from fungi growing as epiphytes on building material. Results: The pipeline was written in Perl and uses a local installation of NCBI-BLAST for the similarity searches of the query sequences. The variable subregion ITS2 of the ITS region is extracted from the sequences and used for additional searches of higher sensitivity. Multiple alignments of each query sequence and its closest matches are computed, and query sequences sharing at least 50% of their best matches are clustered to facilitate the evaluation of hypothetically conspecific groups. The pipeline proved to speed up the processing, as well as enhance the resolution, of the evaluation dataset considerably, and the fungi were found to belong chiefly to the Ascomycota, with Penicillium and Aspergillus as the two most common genera. The ITS2 was found to indicate a different taxonomic affiliation than did the complete ITS region for 10% of the query sequences, though this figure is likely to vary with the taxonomic scope of the query sequences. Conclusion: The present software readily assigns large sets of fungal query sequences to their respective best matches in the international sequence databases and places them in a larger biological context. The output is highly structured to be easy to process, although it still needs to be inspected and possibly corrected for the impact of the incomplete and sometimes erroneously annotated fungal entries in these databases. The open source pipeline is available for UNIX-type platforms, and updated releases of the target database are made available biweekly. The pipeline is easily modified to operate on other molecular regions and organism groups.
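The clustering rule mentioned in this abstract (grouping query sequences that share at least 50% of their best matches) can be sketched as follows. The abstract does not define how the shared fraction is computed or how groups are merged, so this Python sketch makes two labeled assumptions: overlap is measured against the smaller of the two hit lists, and clusters are formed by single linkage. The original pipeline is written in Perl; the function name, query names and accession-like identifiers here are hypothetical.

```python
from itertools import combinations


def cluster_by_shared_hits(best_hits, threshold=0.5):
    """Group queries whose best-match lists overlap by at least `threshold`.

    Assumptions (not from the abstract): overlap is the shared fraction of
    the smaller hit list, and clusters are merged by single linkage.
    """
    parent = {query: query for query in best_hits}

    def find(query):
        # Follow parent links to the cluster representative (path halving).
        while parent[query] != query:
            parent[query] = parent[parent[query]]
            query = parent[query]
        return query

    for a, b in combinations(best_hits, 2):
        hits_a, hits_b = set(best_hits[a]), set(best_hits[b])
        smaller = min(len(hits_a), len(hits_b))
        if smaller and len(hits_a & hits_b) / smaller >= threshold:
            parent[find(a)] = find(b)  # merge the two clusters

    clusters = {}
    for query in best_hits:
        clusters.setdefault(find(query), []).append(query)
    return sorted(sorted(members) for members in clusters.values())


# Hypothetical best-match lists per query sequence (identifiers are made up).
best_hits = {
    "q1": ["AB001", "AB002", "AB003", "AB004"],
    "q2": ["AB002", "AB003", "AB009", "AB010"],
    "q3": ["XY100", "XY101"],
}
clusters = cluster_by_shared_hits(best_hits)
print(clusters)
```

Here q1 and q2 share two of four best matches and fall into one candidate conspecific group, while q3 stays on its own. A different overlap definition, for example one relative to the union of both lists, would give stricter clusters.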

    Experimental design and statistical rigor in phylogenomics of horizontal and endosymbiotic gene transfer

    A growing number of phylogenomic investigations from diverse eukaryotes are examining conflicts among gene trees as evidence of horizontal gene transfer. If multiple foreign genes from the same eukaryotic lineage are found in a given genome, it is increasingly interpreted as concerted gene transfers during a cryptic endosymbiosis in the organism's evolutionary past, also known as "endosymbiotic gene transfer" or EGT. A number of provocative hypotheses of lost or serially replaced endosymbionts have been advanced; to date, however, these inferences largely have been post-hoc interpretations of genome-wide conflicts among gene trees. With data sets as large and complex as eukaryotic genome sequences, it is critical to examine alternative explanations for intra-genome phylogenetic conflicts, particularly how much conflicting signal is expected from directional biases and statistical noise. The availability of genome-level data both permits and necessitates phylogenomics that test explicit, a priori predictions of horizontal gene transfer, using rigorous statistical methods and clearly defined experimental controls.

    A reference guide for tree analysis and visualization

    The quantities of data obtained by the new high-throughput technologies, such as microarrays or ChIP-Chip arrays, and the large-scale OMICS approaches, such as genomics, proteomics and transcriptomics, are becoming vast. Sequencing technologies are becoming cheaper and easier to use and, thus, large-scale evolutionary studies of the origins of life for all species and their evolution are becoming more and more challenging. Databases holding information about how data are related and how they are hierarchically organized are expanding rapidly. Clustering analysis is becoming more and more difficult to apply to very large amounts of data, since the results of these algorithms cannot be efficiently visualized. Most of the available visualization tools that are able to represent such hierarchies project data in 2D and often lack the necessary user friendliness and interactivity. For example, the current phylogenetic tree visualization tools are not able to display large-scale trees with more than a few thousand nodes in an easy-to-understand way. In this study, we review tools that are currently available for the visualization and analysis of biological trees, mainly developed during the last decade. We describe the uniform and standard computer-readable formats to represent tree hierarchies and we comment on the functionality and the limitations of these tools. We also discuss how these tools can be developed further and should become integrated with various data sources. Here we focus on freely available software that offers users various tree-representation methodologies for biological data analysis.