
    Neighbor Joining And Leaf Status

    The Neighbor Joining algorithm is among the most fundamental algorithmic results in computational biology. However, its definition and correctness proof are not straightforward. In particular, "the question 'what does the NJ method seek to do?' has until recently proved somewhat elusive" [Gascuel & Steel, 2006]. While a rigorous mathematical analysis is now available, it is still considered somewhat hard to follow and its proof tedious at best. In this work, we present an alternative interpretation of the goal of the Neighbor Joining algorithm by proving that it chooses to merge the two taxa u and v that maximize the "leaf-status", that is, the sum of distances of all leaves to the unique u-v-path.
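
    For orientation, the pair-selection step that the abstract reinterprets can be sketched with the standard textbook Q-criterion, Q(i, j) = (n - 2) d(i, j) - sum_k d(i, k) - sum_k d(j, k), where the pair with the smallest Q is merged. This is a minimal sketch and not the paper's leaf-status formulation; the function name `nj_pick_pair` and the example matrix are assumptions made for illustration.

```python
from itertools import combinations


def nj_pick_pair(dist):
    """Return ((i, j), q) for the pair minimizing the standard NJ Q-criterion.

    dist is assumed to be a symmetric list-of-lists distance matrix
    with a zero diagonal. (Hypothetical helper, for illustration only.)
    """
    n = len(dist)
    if n < 3:
        raise ValueError("need at least three taxa")
    # Sum of distances from each taxon to all others.
    row_sums = [sum(row) for row in dist]
    best_pair, best_q = None, float("inf")
    for i, j in combinations(range(n), 2):
        q = (n - 2) * dist[i][j] - row_sums[i] - row_sums[j]
        if q < best_q:
            best_pair, best_q = (i, j), q
    return best_pair, best_q


# Illustrative five-taxon distance matrix (made-up values).
example = [
    [0, 5, 9, 9, 8],
    [5, 0, 10, 10, 9],
    [9, 10, 0, 8, 7],
    [9, 10, 8, 0, 3],
    [8, 9, 7, 3, 0],
]
print(nj_pick_pair(example))  # ((0, 1), -50)
```

    Note that taxa 3 and 4 have the smallest raw distance in this matrix, yet the criterion selects taxa 0 and 1; this gap between "closest pair" and "selected pair" is the kind of behaviour an interpretation of the algorithm's goal has to explain.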

    RIO: Analyzing proteomes by automated phylogenomics using resampled inference of orthologs

    BACKGROUND: When analyzing protein sequences using sequence similarity searches, orthologous sequences (that diverged by speciation) are more reliable predictors of a new protein's function than paralogous sequences (that diverged by gene duplication). The utility of phylogenetic information in high-throughput genome annotation ("phylogenomics") is widely recognized, but existing approaches are either manual or not explicitly based on phylogenetic trees. RESULTS: Here we present RIO (Resampled Inference of Orthologs), a procedure for automated phylogenomics using explicit phylogenetic inference. RIO analyses are performed over bootstrap resampled phylogenetic trees to estimate the reliability of orthology assignments. We also introduce supplementary concepts that are helpful for functional inference. RIO has been implemented as a Perl pipeline connecting several C and Java programs. It is available at http://www.genetics.wustl.edu/eddy/forester/. A web server is at http://www.rio.wustl.edu/. RIO was tested on the Arabidopsis thaliana and Caenorhabditis elegans proteomes. CONCLUSION: The RIO procedure is particularly useful for the automated detection of first representatives of novel protein subfamilies. We also describe how some orthologies can be misleading for functional inference.

    Parsimony, likelihood and the role of models in molecular phylogenetics.

    Methods such as maximum parsimony (MP) are frequently criticized as being statistically unsound and not being based on any "model." On the other hand, advocates of MP claim that maximum likelihood (ML) has some fundamental problems. Here, we explore the connection between the different versions of MP and ML methods, particularly in light of recent theoretical results. We describe links between the two methods: for example, we describe how MP can be regarded as an ML method when there is no common mechanism between sites (such as might occur with morphological data and certain forms of molecular data). In the process, we clarify certain historical points of disagreement between proponents of the two methodologies, including a discussion of several forms of the ML optimality criterion. We also describe some additional results that shed light on how much needs to be assumed about underlying models of sequence evolution in order to successfully reconstruct evolutionary trees.

    A multimodal and multiobjective approach for phylogenetic trees reconstruction

    Advisor: Fernando Jose Von Zuben. Thesis (doctorate), Universidade Estadual de Campinas, Faculdade de Engenharia Eletrica e de Computação. Abstract: The reconstruction of phylogenetic trees can be interpreted as a systematic process of proposing a tree-like description of the relative dissimilarities observed among sets of homologous genetic attributes of the species being compared. The resulting phylogenetic tree presents a certain topology, or ancestry pattern, and the lengths of the edges of the tree indicate the number of evolutionary changes since the divergence from the common ancestor. Both topology and edge lengths are descriptive hypotheses of non-observable and conditional events, which is why there tend to be several high-quality hypotheses for the reconstruction, as well as multiple performance criteria. This thesis (i) deals with unrooted trees; (ii) emphasizes the least squares, minimum evolution, and maximum likelihood criteria; (iii) proposes an extension to the Neighbor Joining algorithm which offers multiple high-quality reconstruction hypotheses; and (iv) describes and uses a new tool for multiobjective optimization in the context of phylogenetic reconstruction. Artificial and real datasets are considered in the presentation of results, which point to some advantages and distinctive aspects of the proposed methodologies. Degree: Doctorate, Computer Engineering (Doctor of Electrical Engineering).

    Computational statistics in molecular phylogenetics

    Simulation remains a very important approach to testing the robustness and accuracy of phylogenetic inference methods. However, current simulation programs are limited, especially concerning realistic models for simulating insertions and deletions (indels). In this thesis I implement a new, portable and flexible application, named INDELible, which can be used to generate nucleotide, amino acid and codon sequence data by simulating indels (under several models of indel length distribution) as well as substitutions (under a rich repertoire of substitution models). In particular, I introduce a simulation study that makes use of one of INDELible’s many unique features to simulate data with indels under codon models that allow the nonsynonymous/synonymous substitution rate ratio to vary among sites and branches. These data are used to quantify, for the first time, the precise effects of indels and alignment errors on the false-positive rate and power of the widely used branch-site test of positive selection. Several alignment programs are used and assessed in this context. Through the simulation experiment, I show that insertions and deletions do not cause the test to generate excessive false positives if the alignment is correct, but alignment errors can lead to unacceptably high false-positive rates. Previous selection studies that use inferior alignment programs are revisited to demonstrate the applicability of my results in real-world situations. Further work uses simulated data from INDELible to examine the effects of tree shape and branch length on the alignment accuracy of several alignment programs, and the impact of alignment errors on different methods of phylogeny reconstruction. In particular, analysis is performed to explore which programs avoid generating the kind of alignment errors that are most detrimental to the process of phylogeny reconstruction.