
    Distinguishing regional from within-codon rate heterogeneity in DNA sequence alignments

    We present an improved phylogenetic factorial hidden Markov model (FHMM) for detecting two types of mosaic structures in DNA sequence alignments, related to (1) recombination and (2) rate heterogeneity. The focus of the present work is on improving the modelling of the latter aspect. Earlier papers have modelled different degrees of rate heterogeneity with separate hidden states of the FHMM. This approach fails to appreciate the intrinsic difference between two types of rate heterogeneity: long-range regional effects, which are potentially related to differences in the selective pressure, and the short-term periodic patterns within the codons, which merely capture the signature of the genetic code. We propose an improved model that explicitly distinguishes between these two effects, and we assess its performance on a set of simulated DNA sequence alignments

    Horseshoe-based Bayesian nonparametric estimation of effective population size trajectories

    Phylodynamics is an area of population genetics that uses genetic sequence data to estimate past population dynamics. Modern state-of-the-art Bayesian nonparametric methods for recovering population size trajectories of unknown form use either change-point models or Gaussian process priors. Change-point models suffer from computational issues when the number of change-points is unknown and needs to be estimated. Gaussian process-based methods lack local adaptivity and cannot accurately recover trajectories that exhibit features such as abrupt changes in trend or varying levels of smoothness. We propose a novel, locally-adaptive approach to Bayesian nonparametric phylodynamic inference that has the flexibility to accommodate a large class of functional behaviors. Local adaptivity results from modeling the log-transformed effective population size a priori as a horseshoe Markov random field, a recently proposed statistical model that blends together the best properties of the change-point and Gaussian process modeling paradigms. We use simulated data to assess model performance, and find that our proposed method results in reduced bias and increased precision when compared to contemporary methods. We also use our models to reconstruct past changes in genetic diversity of human hepatitis C virus in Egypt and to estimate population size changes of ancient and modern steppe bison. These analyses show that our new method captures features of the population size trajectories that were missed by the state-of-the-art methods. Comment: 36 pages, including supplementary information.

    Analytic approach to the evolutionary effects of genetic exchange

    We present an approximate analytic study of our previously introduced model of evolution including the effects of genetic exchange. This model is motivated by the process of bacterial transformation. We solve for the velocity, the rate of increase of fitness, as a function of the fixed population size, $N$. We find that the velocity increases with $\ln N$, eventually saturating at an $N$ that depends on the strength of the recombination process. The analytical treatment is seen to agree well with direct numerical simulations of our model equations.

    In search of lost introns

    Many fundamental questions concerning the emergence and subsequent evolution of eukaryotic exon-intron organization are still unsettled. Genome-scale comparative studies, which can shed light on crucial aspects of eukaryotic evolution, require adequate computational tools. We describe novel computational methods for studying spliceosomal intron evolution. Our goal is to give a reliable characterization of the dynamics of intron evolution. Our algorithmic innovations address the identification of orthologous introns, and the likelihood-based analysis of intron data. We discuss a compression method for the evaluation of the likelihood function, which is noteworthy for phylogenetic likelihood problems in general. We prove that after $O(nL)$ preprocessing time, subsequent evaluations take $O(nL/\log L)$ time almost surely in the Yule-Harding random model of $n$-taxon phylogenies, where $L$ is the input sequence length. We illustrate the practicality of our methods by compiling and analyzing a data set involving 18 eukaryotes, more than in any other study to date. The study yields the surprising result that ancestral eukaryotes were fairly intron-rich. For example, the bilaterian ancestor is estimated to have had more than 90% as many introns as vertebrates do now.

    Preservation of information in a prebiotic package model

    The coexistence between different informational molecules has been the preferred mode to circumvent the limitation posed by imperfect replication on the amount of information stored by each of these molecules. Here we reexamine a classic package model in which distinct information carriers or templates are forced to coexist within vesicles, which in turn can proliferate freely through binary division. The combined dynamics of vesicles and templates is described by a multitype branching process which allows us to write equations for the average number of the different types of vesicles as well as for their extinction probabilities. The threshold phenomenon associated with the extinction of the vesicle population is studied quantitatively using finite-size scaling techniques. We conclude that the resultant coexistence is too frail in the presence of parasites and so confinement of templates in vesicles without an explicit mechanism of cooperation does not resolve the information crisis of prebiotic evolution. Comment: 9 pages, 8 figures, accepted version, to be published in PR
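As a rough illustration of how an extinction probability is obtained from a branching process, the sketch below treats a deliberately simplified single-type case, not the multitype model of the abstract. It assumes each vesicle either divides into two viable daughters with probability `p_divide` (a hypothetical parameter) or dies without offspring. The extinction probability is then the smallest non-negative solution of q = (1 - p) + p q^2, which fixed-point iteration from q = 0 converges to.

```python
def extinction_probability(p_divide, iterations=10_000):
    """Extinction probability of a single-type branching process.

    Assumption (not from the abstract): each individual leaves two
    offspring with probability p_divide and none otherwise, so the
    offspring generating function is f(q) = (1 - p) + p * q**2.
    Iterating q <- f(q) from q = 0 converges to the smallest
    non-negative fixed point, which is the extinction probability.
    """
    if not 0.0 <= p_divide <= 1.0:
        raise ValueError("p_divide must lie in [0, 1]")
    q = 0.0
    for _ in range(iterations):
        q = (1.0 - p_divide) + p_divide * q * q
    return q


if __name__ == "__main__":
    for p in (0.4, 0.6, 0.8):
        print(p, extinction_probability(p))
```

In this toy case the population dies out with certainty when `p_divide` is at most 1/2, and otherwise survives with positive probability; that threshold is the single-type analogue of the extinction threshold mentioned in the abstract.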

    MixtureTree: a program for constructing phylogeny

    Background: MixtureTree v1.0 is a Linux-based program (written in C++) that implements an algorithm based on mixture models for reconstructing phylogeny from binary sequence data, such as single-nucleotide polymorphisms (SNPs). In addition to the mixture algorithm with three different optimization options, the program also implements a bootstrap procedure with majority-rule consensus. Results: The MixtureTree program, written in C++, is a Linux-based package. The User's Guide and source code will be available at http://math.asu.edu/~scchen/MixtureTree.html. Conclusions: The efficiency of the mixture algorithm is relatively higher than that of some classical methods, such as the Neighbor-Joining, Maximum Parsimony, and Maximum Likelihood methods. Shortcomings of the mixture tree algorithm, such as being time consuming, can be addressed by implementing revised Expectation-Maximization (EM) algorithms instead of the traditional EM algorithm.

    Asexual and sexual replication in sporulating organisms

    This paper develops models describing asexual and sexual replication in sporulating organisms. Replication via sporulation is the replication strategy for all multicellular life, and may even be observed in unicellular life (such as with budding yeast). We consider diploid populations replicating via one of two possible sporulation mechanisms: (1) Asexual sporulation, whereby adult organisms produce single-celled diploid spores that grow into adults themselves. (2) Sexual sporulation, whereby adult organisms produce single-celled diploid spores that divide into haploid gametes. The haploid gametes enter a haploid "pool", where they may recombine with other haploids to form a diploid spore that then grows into an adult. We consider a haploid fusion rate given by second-order reaction kinetics. We work with a simplified model where the diploid genome consists of only two chromosomes, each of which may be rendered defective with a single point mutation of the wild-type. We find that the asexual strategy is favored when the rate of spore production is high compared to the characteristic growth rate from a spore to a reproducing adult. Conversely, the sexual strategy is favored when the rate of spore production is low compared to the characteristic growth rate from a spore to a reproducing adult. As the characteristic growth time increases, or as the population density increases, the critical ratio of spore production rate to organism growth rate at which the asexual strategy overtakes the sexual one is pushed to higher values. Therefore, the results of this model suggest that, for complex multicellular organisms, sexual replication is favored at high population densities, and low growth and sporulation rates. Comment: 8 pages, 5 figures, to be submitted to Journal of Theoretical Biology, figures not included in this submission.

    Efficient FPT algorithms for (strict) compatibility of unrooted phylogenetic trees

    In phylogenetics, a central problem is to infer the evolutionary relationships between a set of species $X$; these relationships are often depicted via a phylogenetic tree -- a tree having its leaves univocally labeled by elements of $X$ and without degree-2 nodes -- called the "species tree". One common approach for reconstructing a species tree consists in first constructing several phylogenetic trees from primary data (e.g. DNA sequences originating from some species in $X$), and then constructing a single phylogenetic tree maximizing the "concordance" with the input trees. The so-obtained tree is our estimation of the species tree and, when the input trees are defined on overlapping -- but not identical -- sets of labels, is called a "supertree". In this paper, we focus on two problems that are central when combining phylogenetic trees into a supertree: the compatibility and the strict compatibility problems for unrooted phylogenetic trees. These problems are strongly related, respectively, to the notions of "containing as a minor" and "containing as a topological minor" in the graph community. Both problems are known to be fixed-parameter tractable in the number of input trees $k$, by using their expressibility in Monadic Second Order Logic and a reduction to graphs of bounded treewidth. Motivated by the fact that the dependency on $k$ of these algorithms is prohibitively large, we give the first explicit dynamic programming algorithms for solving these problems, both running in time $2^{O(k^2)} \cdot n$, where $n$ is the total size of the input. Comment: 18 pages, 1 figure.

    Nocardia kroppenstedtii sp. nov., a novel actinomycete isolated from a lung transplant patient with a pulmonary infection

    An actinomycete, strain N1286T, isolated from a lung transplant patient with a pulmonary infection, was provisionally assigned to the genus Nocardia. The strain had chemotaxonomic and morphological properties typical of members of the genus Nocardia and formed a distinct phyletic line in the Nocardia 16S rRNA gene tree. It was most closely related to Nocardia farcinica DSM 43665T (99.8% gene similarity) but was distinguished from the latter by a low level of DNA:DNA relatedness. These strains were also distinguished by a broad range of phenotypic properties. On the basis of these data, it is proposed that isolate N1286T (=DSM 45810T = NCTC 13617T) should be classified as the type strain of a new Nocardia species for which the name Nocardia kroppenstedtii is proposed

    Fast computation of distance estimators

    BACKGROUND: Some distance methods are among the most commonly used methods for reconstructing phylogenetic trees from sequence data. The input to a distance method is a distance matrix, containing estimated pairwise distances between all pairs of taxa. Distance methods themselves are often fast, e.g., the famous and popular Neighbor Joining (NJ) algorithm reconstructs a phylogeny of n taxa in time $O(n^3)$. Unfortunately, the fastest practical algorithms known for computing the distance matrix from n sequences of length l take time proportional to $l \cdot n^2$. Since the sequence length typically is much larger than the number of taxa, the distance estimation is the bottleneck in phylogeny reconstruction. This bottleneck is especially apparent in reconstruction of large phylogenies or in applications where many trees have to be reconstructed, e.g., bootstrapping and genome-wide applications. RESULTS: We give an advanced algorithm for computing the number of mutational events between DNA sequences which is significantly faster than both Phylip and Paup. Moreover, we give a new method for estimating pairwise distances between sequences which contain ambiguity symbols. This new method is shown to be more accurate as well as faster than earlier methods. CONCLUSION: Our novel algorithm for computing distance estimators provides a valuable tool in phylogeny reconstruction. Since the running time of our distance estimation algorithm is comparable to that of most distance methods, the previous bottleneck is removed. All distance methods, such as NJ, require a distance matrix as input and, hence, our novel algorithm significantly improves the overall running time of all distance methods. In particular, we show for real world biological applications how the running time of phylogeny reconstruction using NJ is improved from a matter of hours to a matter of seconds.
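To make the bottleneck concrete, here is a minimal sketch of the straightforward baseline that the abstract describes as taking time proportional to $l \cdot n^2$: every pair of aligned sequences is compared site by site. It is not the faster algorithm proposed in the paper. The Jukes-Cantor correction is an assumption chosen for illustration, and the function names are hypothetical.

```python
import math
from itertools import combinations


def p_distance(a, b):
    """Fraction of aligned sites at which two sequences differ (O(l))."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def jukes_cantor(p):
    """Jukes-Cantor corrected distance for an observed proportion p.

    The correction is undefined for p >= 0.75; infinity is returned
    there as a convention of this sketch.
    """
    if p >= 0.75:
        return math.inf
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def distance_matrix(seqs):
    """Naive pairwise distance matrix: n*(n-1)/2 comparisons of length l."""
    n = len(seqs)
    d = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        d[i][j] = d[j][i] = jukes_cantor(p_distance(seqs[i], seqs[j]))
    return d


if __name__ == "__main__":
    alignment = ["ACGTACGT", "ACGTACGA", "ACGTTCGA"]
    for row in distance_matrix(alignment):
        print(["%.4f" % v for v in row])
```

The double loop over pairs, each costing one pass over l sites, is where the $l \cdot n^2$ term comes from. When l is much larger than n, this step can outweigh the $O(n^3)$ cost of Neighbor Joining itself.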