2,595 research outputs found

    Lateral transfer in Stochastic Dollo models

    Full text link
    Lateral transfer, a process whereby species exchange evolutionary traits through non-ancestral relationships, is a frequent source of model misspecification in phylogenetic inference. Lateral transfer obscures the phylogenetic signal in the data as the histories of affected traits are mosaics of the overall phylogeny. We control for the effect of lateral transfer in a Stochastic Dollo model and a Bayesian setting. Our likelihood is highly intractable as the parameters are the solution of a sequence of large systems of differential equations representing the expected evolution of traits along a tree. We illustrate our method on a data set of lexical traits in Eastern Polynesian languages and obtain an improved fit over the corresponding model without lateral transfer.Comment: Improvements suggested by reviewer

    Assessment of mitochondrial genomes for heterobranch gastropod phylogenetics

    Get PDF
    Background Heterobranchia is a diverse clade of marine, freshwater, and terrestrial gastropod molluscs. It includes such disparate taxa as nudibranchs, sea hares, bubble snails, pulmonate land snails and slugs, and a number of (mostly small-bodied) poorly known snails and slugs collectively referred to as the “lower heterobranchs”. Evolutionary relationships within Heterobranchia have been challenging to resolve and the group has been subject to frequent and significant taxonomic revision. Mitochondrial (mt) genomes can be a useful molecular marker for phylogenetics but, to date, sequences have been available for only a relatively small subset of Heterobranchia. Results To assess the utility of mitochondrial genomes for resolving evolutionary relationships within this clade, eleven new mt genomes were sequenced including representatives of several groups of “lower heterobranchs”. Maximum likelihood analyses of concatenated matrices of the thirteen protein coding genes found weak support for most higher-level relationships even after several taxa with extremely high rates of evolution were excluded. Bayesian inference with the CAT + GTR model resulted in a reconstruction that is much more consistent with the current understanding of heterobranch phylogeny. Notably, this analysis recovered Valvatoidea and Orbitestelloidea in a polytomy with a clade including all other heterobranchs, highlighting these taxa as important to understanding early heterobranch evolution. Also, dramatic gene rearrangements were detected within and between multiple clades. However, a single gene order is conserved across the majority of heterobranch clades. Conclusions Analysis of mitochondrial genomes in a Bayesian framework with the site heterogeneous CAT + GTR model resulted in a topology largely consistent with the current understanding of heterobranch phylogeny. However, mitochondrial genomes appear to be too variable to serve as good phylogenetic markers for robustly resolving a number of deeper splits within this clade.publishedVersio

    An approximate search engine for structure

    Get PDF
    As the size of structural databases grows, the need for efficiently searching these databases arises. Thanks to previous and ongoing research, searching by attribute-value and by text has become commonplace in these databases. However, searching by topological or physical structure, especially for large databases and especially for approximate matches, is still an art. In this dissertation, efficient search techniques are presented for retrieving trees from a database that are similar to a given query tree. Rooted ordered labeled trees, rooted unordered labeled trees and free trees are considered. Ordered labeled trees are trees in which each node has a label and the left-to-right order among siblings matters. Unordered labeled trees are trees in which the parent-child relationship is significant, but the order among siblings is unimportant. Free trees (unrooted unordered trees) are acyclic graphs. These trees find many applications in bioinformatics, Web log analysis, phyloinformatics, XML processing, etc. Two types of similarity measures are investigated: (i) counting the mismatching paths in the query tree and a data tree, and (ii) measuring the topological relationship between the trees. The proposed approaches include storing the paths of trees in a suffix array, employing hashing techniques to speed up retrieval, and counting the number of up-down operations to move a token from one node to another node in a tree. Various filters for accelerating a search, different strategies for parallelizing these search algorithms and applications of these algorithms to XML and phylogenetic data management are discussed. The proposed techniques have been implemented into a phylogenetic search engine which is fully operational and is available on the World Wide Web. Experimental results on comparing the similarity measures with existing tree metrics and on evaluating the efficiency of the search techniques demonstrate the effectiveness of the search engine. Future work includes extending the techniques to other structural data, as well as developing new filters and algorithms for speeding up searching and mining in complex structures

    A Note on Encodings of Phylogenetic Networks of Bounded Level

    Full text link
    Driven by the need for better models that allow one to shed light into the question how life's diversity has evolved, phylogenetic networks have now joined phylogenetic trees in the center of phylogenetics research. Like phylogenetic trees, such networks canonically induce collections of phylogenetic trees, clusters, and triplets, respectively. Thus it is not surprising that many network approaches aim to reconstruct a phylogenetic network from such collections. Related to the well-studied perfect phylogeny problem, the following question is of fundamental importance in this context: When does one of the above collections encode (i.e. uniquely describe) the network that induces it? In this note, we present a complete answer to this question for the special case of a level-1 (phylogenetic) network by characterizing those level-1 networks for which an encoding in terms of one (or equivalently all) of the above collections exists. Given that this type of network forms the first layer of the rich hierarchy of level-k networks, k a non-negative integer, it is natural to wonder whether our arguments could be extended to members of that hierarchy for higher values for k. By giving examples, we show that this is not the case
    • …
    corecore