
    Algorithms: simultaneous error-correction and rooting for gene tree reconciliation and the gene duplication problem

    Background: Evolutionary methods are increasingly challenged by the fast-growing wealth of genomic sequence information. Evolutionary events such as gene duplication, loss, and deep coalescence account more than ever for incongruence between gene trees and the actual species tree. Gene tree reconciliation addresses this fundamental problem by invoking the minimum number of gene duplications and losses that reconcile a rooted gene tree with a rooted species tree. However, the reconciliation process is highly sensitive to topological error and incorrect rooting of the gene tree, and most gene trees in practice are not free of these problems. Thus, despite the promise of gene tree reconciliation, its applicability in practice is severely limited. Results: We introduce the problem of reconciling unrooted and erroneous gene trees by simultaneously rooting and error-correcting them, and describe an efficient algorithm for this problem. Moreover, we introduce an error-corrected version of the gene duplication problem, a standard application of gene tree reconciliation. Because the original version of this problem is NP-hard, we introduce an effective heuristic for our error-corrected version. Our experimental results suggest that our error-correcting approaches for unrooted input trees can significantly improve the accuracy of gene tree reconciliation and of species tree inference under the gene duplication problem. Furthermore, our algorithm for error-correcting reconciliation is efficient enough to handle truly large-scale phylogenetic studies. Conclusions: Our error-correction approach is a crucial step towards making gene tree reconciliation more robust, and thus towards improving the accuracy of applications that fundamentally rely on gene tree reconciliation, such as the inference of gene-duplication supertrees.
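To make the reconciliation step concrete, the following is a minimal sketch of the classical parsimony reconciliation of a rooted gene tree with a rooted species tree by least-common-ancestor (LCA) mapping. It is not the error-correcting or rooting algorithm described in the abstract. The tree encoding (nested 2-tuples, with each gene leaf labeled by its species name) and the function names `build_index`, `lca`, and `count_duplications` are assumptions made for illustration.

```python
def build_index(species_tree):
    """Record the parent and depth of every node in the species tree."""
    parent, depth = {}, {}
    stack = [(species_tree, None, 0)]
    while stack:
        node, par, d = stack.pop()
        parent[node] = par
        depth[node] = d
        if isinstance(node, tuple):  # internal node: visit both children
            for child in node:
                stack.append((child, node, d + 1))
    return parent, depth


def lca(a, b, parent, depth):
    """Least common ancestor of two species-tree nodes."""
    while depth[a] > depth[b]:
        a = parent[a]
    while depth[b] > depth[a]:
        b = parent[b]
    while a != b:
        a, b = parent[a], parent[b]
    return a


def count_duplications(gene_tree, species_tree):
    """Count duplications implied by the LCA mapping.

    Assumption: each gene-tree leaf is the name of the species it was
    sampled from, and both trees are rooted and binary.
    """
    parent, depth = build_index(species_tree)
    duplications = 0

    def mapping(g):
        nonlocal duplications
        if not isinstance(g, tuple):
            return g  # a gene leaf maps to its species leaf
        left, right = (mapping(child) for child in g)
        m = lca(left, right, parent, depth)
        # A gene node is a duplication if it maps to the same
        # species node as one of its children.
        if m == left or m == right:
            duplications += 1
        return m

    mapping(gene_tree)
    return duplications
```

For the species tree `(("A", "B"), "C")`, a congruent gene tree needs no duplications, whereas the gene tree `(("A", "C"), "B")` needs one at its root. This illustrates how a single topological or rooting error can change the inferred events.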

    Corrugated channels heat transfer efficiency analysis based on velocity fields resulting from computer simulation and PIV measurements

    Paper presented at the 8th International Conference on Heat Transfer, Fluid Mechanics and Thermodynamics, Mauritius, 11-13 July, 2011. Numerical and experimental studies of flow and heat transfer in corrugated channels are presented. Such channels are representative of compact heat exchangers, for example air or water pre-heaters. The most important characteristic parameter of these channels, apart from the channel wall shape, is the angle between the two corrugated sheets. The paper presents velocity fields in such channels measured by the PIV method. The efficiency analysis, based on irreversible entropy generation, takes account of two processes: flow resulting from the pressure gradient and heat transfer from the solid walls. This approach is first checked in detail for arrangements with two values of this angle, 0° and 90°, which allows the velocity fields obtained from the computer simulation to be compared with those from the PIV measurements (the latter in special corrugated sheets). More extensive computer simulation results, for different wall shapes (sinusoidal and semicircular) and for an angle of 90°, are also presented.
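The abstract does not state the entropy-generation expression used. As a hedged illustration only, the standard local volumetric entropy generation rate for an incompressible Newtonian fluid combines a heat-transfer term and a viscous-friction term. The symbols below (thermal conductivity k, dynamic viscosity μ, absolute temperature T, velocity components u_i, viscous dissipation function Φ) are assumptions not taken from the source.

```latex
\dot{S}'''_{\mathrm{gen}}
  = \underbrace{\frac{k}{T^{2}}\,\lvert \nabla T \rvert^{2}}_{\text{heat transfer}}
  + \underbrace{\frac{\mu}{T}\,\Phi}_{\text{fluid friction}},
\qquad
\Phi = \frac{1}{2}\sum_{i,j}
  \left( \frac{\partial u_i}{\partial x_j} + \frac{\partial u_j}{\partial x_i} \right)^{2}
```

In this form, a measured or simulated velocity field determines the friction term through its gradients, while the heat-transfer term requires the temperature field.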

    Maximum likelihood models and algorithms for gene tree evolution with duplications and losses

    Background: The abundance of new genomic data provides the opportunity to map the location of gene duplication and loss events on a species phylogeny. The first methods for mapping gene duplications and losses were based on a parsimony criterion, finding the mapping that minimizes the number of duplication and loss events. Probabilistic modeling of gene duplication and loss is relatively new and has largely focused on birth-death processes. Results: We introduce a new maximum likelihood model that estimates the speciation, gene duplication, and gene loss events in a gene tree within a species tree with branch lengths. We also provide an algorithm, efficient in practice, that computes optimal evolutionary scenarios for this model. We implemented the algorithm in the program DrML and verified its performance with empirical and simulated data. Conclusions: In test data sets, DrML finds optimal gene duplication and loss scenarios within minutes, even when the gene trees contain sequences from several hundred species. In many cases, these optimal scenarios differ from the LCA mapping that results from a parsimony gene tree reconciliation. Thus, DrML provides a new, practical statistical framework in which to study gene duplication.

    The great time series classification bake off: a review and experimental evaluation of recent algorithmic advances

    In the last five years a large number of new time series classification algorithms have been proposed in the literature. These algorithms have been evaluated on subsets of the 47 data sets in the University of California, Riverside time series classification archive. The archive has recently been expanded to 85 data sets, over half of which have been donated by researchers at the University of East Anglia. Aspects of previous evaluations have made comparisons between algorithms difficult. For example, several different programming languages have been used, experiments involved a single train/test split, and some used normalised data whilst others did not. The relaunch of the archive provides a timely opportunity to thoroughly evaluate algorithms on a larger number of data sets. We have implemented 18 recently proposed algorithms in a common Java framework and compared them against two standard benchmark classifiers (and each other) by performing 100 resampling experiments on each of the 85 data sets. We use these results to test several hypotheses relating to whether the algorithms are significantly more accurate than the benchmarks and each other. Our results indicate that only 9 of these algorithms are significantly more accurate than both benchmarks and that one classifier, the Collective of Transformation Ensembles, is significantly more accurate than all of the others. All of our experiments and results are reproducible: we release all of our code, results and experimental details, and we hope these experiments form the basis for more rigorous testing of new algorithms in the future.
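The contrast between a single train/test split and repeated resampling can be sketched as follows. This is a minimal illustration, not the authors' Java framework. The 1-nearest-neighbour classifier with Euclidean distance is a hypothetical stand-in (the abstract does not name the two benchmark classifiers), and the function names, the 50% training fraction, and the seed are assumptions.

```python
import random
from collections import defaultdict


def stratified_split(labels, train_fraction, rng):
    """Randomly split indices into train/test, preserving class proportions."""
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    train, test = [], []
    for indices in by_class.values():
        indices = indices[:]
        rng.shuffle(indices)
        k = max(1, round(len(indices) * train_fraction))
        train += indices[:k]
        test += indices[k:]
    return train, test


def one_nn_accuracy(series, labels, train, test):
    """Accuracy of 1-NN with squared Euclidean distance (stand-in classifier)."""
    correct = 0
    for i in test:
        nearest = min(
            train,
            key=lambda j: sum((a - b) ** 2 for a, b in zip(series[i], series[j])),
        )
        correct += labels[nearest] == labels[i]
    return correct / len(test)


def resampled_accuracy(series, labels, n_resamples=100, train_fraction=0.5, seed=0):
    """Mean test accuracy over repeated stratified resamples of one data set."""
    rng = random.Random(seed)  # fixed seed so the experiment is reproducible
    scores = []
    for _ in range(n_resamples):
        train, test = stratified_split(labels, train_fraction, rng)
        scores.append(one_nn_accuracy(series, labels, train, test))
    return sum(scores) / len(scores)
```

Averaging over many resamples reduces the dependence of the reported accuracy on one particular split, which is the motivation the abstract gives for moving beyond single-split evaluations.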

    Reconciliation Revisited: Handling Multiple Optima when Reconciling with Duplication, Transfer, and Loss

    Phylogenetic tree reconciliation is a powerful approach for inferring evolutionary events like gene duplication, horizontal gene transfer, and gene loss, which are fundamental to our understanding of molecular evolution. While duplication–loss (DL) reconciliation leads to a unique maximum-parsimony solution, duplication-transfer-loss (DTL) reconciliation yields a multitude of optimal solutions, making it difficult to infer the true evolutionary history of the gene family. This problem is further exacerbated by the fact that different event cost assignments yield different sets of optimal reconciliations. Here, we present an effective, efficient, and scalable method for dealing with these fundamental problems in DTL reconciliation. Our approach works by sampling the space of optimal reconciliations uniformly at random and aggregating the results. We show that even gene trees with only a few dozen genes often have millions of optimal reconciliations and present an algorithm to efficiently sample the space of optimal reconciliations uniformly at random in O(mn²) time per sample, where m and n denote the number of genes and species, respectively. We use these samples to understand how different optimal reconciliations vary in their node mappings and event assignments and to investigate the impact of varying event costs. We apply our method to a biological dataset of approximately 4700 gene trees from 100 taxa and observe that 93% of event assignments and 73% of mappings remain consistent across different multiple optima. Our analysis represents the first systematic investigation of the space of optimal DTL reconciliations and has many important implications for the study of gene family evolution. Funding: National Science Foundation (U.S.) (CAREER Award 0644282); National Institutes of Health (U.S.) (Grant RC2 HG005639); National Science Foundation (U.S.), Assembling the Tree of Life (Program) (Grant 0936234).

    Beyond representing orthology relations by trees

    Reconstructing the evolutionary past of a family of genes is an important aspect of many genomic studies. To help with this, simple relations on a set of sequences called orthology relations may be employed. In addition to being interesting from a practical point of view, they are also attractive from a theoretical perspective in that, e.g., a characterization is known for when such a relation is representable by a certain type of phylogenetic tree. For an orthology relation inferred from real biological data, however, it is generally too much to hope that it satisfies that characterization. Rather than trying to correct the data in some way, which has its own drawbacks, we propose as an alternative to represent an orthology relation δ in terms of a structure more general than a phylogenetic tree, called a phylogenetic network. To compute such a network in the form of a level-1 representation for δ, we formalize an orthology relation in terms of the novel concept of a symbolic 3-dissimilarity, which is motivated by the biological concept of a "cluster of orthologous groups", or COG for short. For such maps, which assign symbols rather than real values to elements, we introduce the novel Network-Popping algorithm, which has several attractive properties. In addition, we characterize an orthology relation δ on some set X that has a level-1 representation in terms of eight natural properties for δ as well as in terms of level-1 representations of orthology relations on certain subsets of X.

    Study on Phylogenetic Relationships, Variability, and Correlated Mutations in M2 Proteins of Influenza Virus A

    The M2 channel, an influenza virus transmembrane protein, serves as an important target for antiviral drug design. There are still disagreements concerning the role of some residues involved in proton transfer as well as the mechanism of inhibition by commercial drugs. The viral M2 proteins are highly conserved: about three quarters of the positions are occupied by a single residue in over 95% of cases. Nine M2 proteins from the H3N2 strain and possibly two proteins from H2N2 strains form a phylogenetic cluster closely related to 2RLF. The variability range is limited to 4 residues per position, with one exception. The 2RLF protein stands out by the presence of 2 serines at positions 19 and 50, which in most other M2 proteins are occupied by cysteines. The study of correlated mutations shows that there are several positions with significant mutational correlation that have not been described so far as functionally important. This suggests that there are 5 more residues potentially involved in the M2 mechanism of action. The original software used in this work (Consensus Constructor, SSSSg, Corm, Talana) is freely accessible as stand-alone offline applications upon request to the authors. The other software used in this work is freely available online for noncommercial purposes at public bioinformatics services such as ExPASy or NCBI. The study of mutational variability, evolutionary relationships, and correlated mutations presented in this paper is a potential way to explain more completely the role of significant factors in proton channel action and to clarify the mechanism of inhibition by specific drugs.