227 research outputs found

    Shortest triplet clustering: reconstructing large phylogenies using representative sets

    Get PDF
    BACKGROUND: Understanding the evolutionary relationships among species based on their genetic information is one of the primary objectives in phylogenetic analysis. Reconstructing phylogenies for large data sets is still a challenging task in Bioinformatics. RESULTS: We propose a new distance-based clustering method, the shortest triplet clustering algorithm (STC), to reconstruct phylogenies. The main idea is the introduction of a natural definition of so-called k-representative sets. Based on k-representative sets, shortest triplets are reconstructed and serve as building blocks for the STC algorithm to agglomerate sequences for tree reconstruction in O(n(2)) time for n sequences. Simulations show that STC gives better topological accuracy than other tested methods that also build a first starting tree. STC appears as a very good method to start the tree reconstruction. However, all tested methods give similar results if balanced nearest neighbor interchange (BNNI) is applied as a post-processing step. BNNI leads to an improvement in all instances. The program is available at . CONCLUSION: The results demonstrate that the new approach efficiently reconstructs phylogenies for large data sets. We found that BNNI boosts the topological accuracy of all methods including STC, therefore, one should use BNNI as a post-processing step to get better topological accuracy

    TREEFINDER: a powerful graphical analysis environment for molecular phylogenetics

    Get PDF
    BACKGROUND: Most analysis programs for inferring molecular phylogenies are difficult to use, in particular for researchers with little programming experience. RESULTS: TREEFINDER is an easy-to-use integrative platform-independent analysis environment for molecular phylogenetics. In this paper the main features of TREEFINDER (version of April 2004) are described. TREEFINDER is written in ANSI C and Java and implements powerful statistical approaches for inferring gene tree and related analyzes. In addition, it provides a user-friendly graphical interface and a phylogenetic programming language. CONCLUSIONS: TREEFINDER is a versatile framework for analyzing phylogenetic data across different platforms that is suited both for exploratory as well as advanced studies

    FACT: Functional annotation transfer between proteins with similar feature architectures

    Get PDF
    <p>Abstract</p> <p>Background</p> <p>The increasing number of sequenced genomes provides the basis for exploring the genetic and functional diversity within the tree of life. Only a tiny fraction of the encoded proteins undergoes a thorough experimental characterization. For the remainder, bioinformatics annotation tools are the only means to infer their function. Exploiting significant sequence similarities to already characterized proteins, commonly taken as evidence for homology, is the prevalent method to deduce functional equivalence. Such methods fail when homologs are too diverged, or when they have assumed a different function. Finally, due to convergent evolution, functional equivalence is not necessarily linked to common ancestry. Therefore complementary approaches are required to identify functional equivalents.</p> <p>Results</p> <p>We present the <b>F</b>eature <b>A</b>rchitecture <b>C</b>omparison <b>T</b>ool <url>http://www.cibiv.at/FACT</url> to search for functionally equivalent proteins. FACT uses the similarity between feature architectures of two proteins, i.e., the arrangements of functional domains, secondary structure elements and compositional properties, as a proxy for their functional equivalence. A scoring function measures feature architecture similarities, which enables searching for functional equivalents in entire proteomes. Our evaluation of 9,570 EC classified enzymes revealed that FACT, using the full feature, set outperformed the existing architecture-based approaches by identifying significantly more functional equivalents as highest scoring proteins. We show that FACT can identify functional equivalents that share no significant sequence similarity. However, when the highest scoring protein of FACT is also the protein with the highest local sequence similarity, it is in 99% of the cases functionally equivalent to the query. We demonstrate the versatility of FACT by identifying a missing link in the yeast glutathione metabolism and also by searching for the human GolgA5 equivalent in <it>Trypanosoma brucei</it>.</p> <p>Conclusions</p> <p>FACT facilitates a quick and sensitive search for functionally equivalent proteins in entire proteomes. FACT is complementary to approaches using sequence similarity to identify proteins with the same function. Thus, FACT is particularly useful when functional equivalents need to be identified in evolutionarily distant species, or when functional equivalents are not homologous. The most reliable annotation transfers, however, are achieved when feature architecture similarity and sequence similarity are jointly taken into account.</p

    HaMStR: Profile hidden markov model based search for orthologs in ESTs

    Get PDF
    BACKGROUND: EST sequencing is a versatile approach for rapidly gathering protein coding sequences. They provide direct access to an organism's gene repertoire bypassing the still error-prone procedure of gene prediction from genomic data. Therefore, ESTs are often the only source for biological sequence data from taxa outside mainstream interest. The widespread use of ESTs in evolutionary studies and particularly in molecular systematics studies is still hindered by the lack of efficient and reliable approaches for automated ortholog predictions in ESTs. Existing methods either depend on a known species tree or cannot cope with redundancy in EST data. RESULTS: We present a novel approach (HaMStR) to mine EST data for the presence of orthologs to a curated set of genes. HaMStR combines a profile Hidden Markov Model search and a subsequent BLAST search to extend existing ortholog cluster with sequences from further taxa. We show that the HaMStR results are consistent with those obtained with existing orthology prediction methods that require completely sequenced genomes. A case study on the phylogeny of 35 fungal taxa illustrates that HaMStR is well suited to compile informative data sets for phylogenomic studies from ESTs and protein sequence data. CONCLUSION: HaMStR extends in a standardized manner a pre-defined set of orthologs with ESTs from further taxa. In the same fashion HaMStR can be applied to protein sequence data, and thus provides a comprehensive approach to compile ortholog cluster from any protein coding data. The resulting orthology predictions serve as the data basis for a variety of evolutionary studies. Here, we have demonstrated the application of HaMStR in a molecular systematics study. However, we envision that studies tracing the evolutionary fate of individual genes or functional complexes of genes will greatly benefit from HaMStR orthology predictions as well

    Accuracy of phylogeny reconstruction methods combining overlapping gene data sets

    Get PDF
    Background The availability of many gene alignments with overlapping taxon sets raises the question of which strategy is the best to infer species phylogenies from multiple gene information. Methods and programs abound that use the gene alignment in different ways to reconstruct the species tree. In particular, different methods combine the original data at different points along the way from the underlying sequences to the final tree. Accordingly, they are classified into superalignment, supertree and medium-level approaches. Here, we present a simulation study to compare different methods from each of these three approaches. Results We observe that superalignment methods usually outperform the other approaches over a wide range of parameters including sparse data and gene-specific evolutionary parameters. In the presence of high incongruency among gene trees, however, other combination methods show better performance than the superalignment approach. Surprisingly, some supertree and medium-level methods exhibit, on average, worse results than a single gene phylogeny with complete taxon information. Conclusions For some methods, using the reconstructed gene tree as an estimation of the species tree is superior to the combination of incomplete information. Superalignment usually performs best since it is less susceptible to stochastic error. Supertree methods can outperform superalignment in the presence of gene-tree conflict

    Ultrafast Approximation for Phylogenetic Bootstrap

    Get PDF
    Nonparametric bootstrap has been a widely used tool in phylogenetic analysis to assess the clade support of phylogenetic trees. However, with the rapidly growing amount of data, this task remains a computational bottleneck. Recently, approximation methods such as the RAxML rapid bootstrap (RBS) and the Shimodaira-Hasegawa-like approximate likelihood ratio test have been introduced to speed up the bootstrap. Here, we suggest an ultrafast bootstrap approximation approach (UFBoot) to compute the support of phylogenetic groups in maximum likelihood (ML) based trees. To achieve this, we combine the resampling estimated log-likelihood method with a simple but effective collection scheme of candidate trees. We also propose a stopping rule that assesses the convergence of branch support values to automatically determine when to stop collecting candidate trees. UFBoot achieves a median speed up of 3.1 (range: 0.66-33.3) to 10.2 (range: 1.32-41.4) compared with RAxML RBS for real DNA and amino acid alignments, respectively. Moreover, our extensive simulations show that UFBoot is robust against moderate model violations and the support values obtained appear to be relatively unbiased compared with the conservative standard bootstrap. This provides a more direct interpretation of the bootstrap support. We offer an efficient and easy-to-use software (available at http://www.cibiv.at/software/iqtree) to perform the UFBoot analysis with ML tree inference

    Critical growth of cerebral tissue in organoids: theory and experiments

    Full text link
    We develop a Fokker-Planck theory of tissue growth with three types of cells (symmetrically dividing, asymmetrically dividing and non-dividing) as main agents to study the growth dynamics of human cerebral organoids. Fitting the theory to lineage tracing data obtained in next generation sequencing experiments, we show that the growth of cerebral organoids is a critical process. We derive analytical expressions describing the time evolution of clonal lineage sizes and show how power-law distributions arise in the limit of long times due to the vanishing of a characteristic growth scale
    corecore