Distinguishing regional from within-codon rate heterogeneity in DNA sequence alignments
We present an improved phylogenetic factorial hidden Markov model (FHMM) for detecting two types of mosaic structures in DNA sequence alignments, related to (1) recombination and (2) rate heterogeneity. The focus of the present work is on improving the modelling of the latter aspect. Earlier papers have modelled different degrees of rate heterogeneity with separate hidden states of the FHMM. This approach fails to appreciate the intrinsic difference between two types of rate heterogeneity: long-range regional effects, which are potentially related to differences in the selective pressure, and the short-term periodic patterns within the codons, which merely capture the signature of the genetic code. We propose an improved model that explicitly distinguishes between these two effects, and we assess its performance on a set of simulated DNA sequence alignments.
Horseshoe-based Bayesian nonparametric estimation of effective population size trajectories
Phylodynamics is an area of population genetics that uses genetic sequence
data to estimate past population dynamics. Modern state-of-the-art Bayesian
nonparametric methods for recovering population size trajectories of unknown
form use either change-point models or Gaussian process priors. Change-point
models suffer from computational issues when the number of change-points is
unknown and needs to be estimated. Gaussian process-based methods lack local
adaptivity and cannot accurately recover trajectories that exhibit features
such as abrupt changes in trend or varying levels of smoothness. We propose a
novel, locally-adaptive approach to Bayesian nonparametric phylodynamic
inference that has the flexibility to accommodate a large class of functional
behaviors. Local adaptivity results from modeling the log-transformed effective
population size a priori as a horseshoe Markov random field, a recently
proposed statistical model that blends together the best properties of the
change-point and Gaussian process modeling paradigms. We use simulated data to
assess model performance, and find that our proposed method results in reduced
bias and increased precision when compared to contemporary methods. We also use
our models to reconstruct past changes in genetic diversity of human hepatitis
C virus in Egypt and to estimate population size changes of ancient and modern
steppe bison. These analyses show that our new method captures features of the
population size trajectories that were missed by the state-of-the-art methods. Comment: 36 pages, including supplementary information
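To make the idea of a locally adaptive prior concrete, the following is a minimal sketch of drawing one trajectory from a first-order horseshoe Markov random field, assuming the usual construction in which each increment is Gaussian with its own half-Cauchy local scale. The function names, the fixed global scale `tau`, and the starting value are illustrative assumptions and are not taken from the paper.

```python
import math
import random


def half_cauchy(rng, scale=1.0):
    """Draw from a half-Cauchy(0, scale) distribution via its inverse CDF."""
    u = rng.random()  # uniform on [0, 1)
    return scale * math.tan(math.pi * u / 2.0)


def sample_horseshoe_mrf(n_points, tau=0.1, start=0.0, seed=0):
    """Sample a trajectory whose increments have horseshoe-type scales.

    Assumed model (a sketch, not the authors' implementation):
        lambda_i ~ half-Cauchy(0, 1)                      (local scale)
        theta[i+1] - theta[i] ~ Normal(0, (tau * lambda_i)^2)
    Most increments are shrunk towards zero by the small global scale tau,
    while the heavy tail of lambda_i occasionally allows an abrupt jump.
    """
    rng = random.Random(seed)
    theta = [start]
    for _ in range(n_points - 1):
        lam = half_cauchy(rng)
        theta.append(theta[-1] + rng.gauss(0.0, tau * lam))
    return theta


# Example: a prior draw of a log effective population size on 50 grid cells.
log_ne = sample_horseshoe_mrf(50, tau=0.05, start=math.log(1000.0), seed=1)
```

A Gaussian random-walk prior would use one shared scale for every increment; the per-increment `lam` is what lets the trajectory stay smooth in most places and still change sharply in a few.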
Analytic approach to the evolutionary effects of genetic exchange
We present an approximate analytic study of our previously introduced model
of evolution including the effects of genetic exchange. This model is motivated
by the process of bacterial transformation. We solve for the velocity, the rate
of increase of fitness, as a function of the fixed population size. We
find that the velocity increases with population size, eventually saturating at
a point that depends on the strength of the recombination process. The analytical
treatment is seen to agree well with direct numerical simulations of our model equations.
In search of lost introns
Many fundamental questions concerning the emergence and subsequent evolution
of eukaryotic exon-intron organization are still unsettled. Genome-scale
comparative studies, which can shed light on crucial aspects of eukaryotic
evolution, require adequate computational tools.
We describe novel computational methods for studying spliceosomal intron
evolution. Our goal is to give a reliable characterization of the dynamics of
intron evolution. Our algorithmic innovations address the identification of
orthologous introns, and the likelihood-based analysis of intron data. We
discuss a compression method for the evaluation of the likelihood function,
which is noteworthy for phylogenetic likelihood problems in general. We prove
that, after a preprocessing phase, the time taken by subsequent evaluations is bounded almost surely in the Yule-Harding random model of
phylogenies, with the bound depending on the input sequence length.
We illustrate the practicality of our methods by compiling and analyzing a
data set involving 18 eukaryotes, more than in any other study to date. The
study yields the surprising result that ancestral eukaryotes were fairly
intron-rich. For example, the bilaterian ancestor is estimated to have had more
than 90% as many introns as vertebrates do now.
Preservation of information in a prebiotic package model
The coexistence between different informational molecules has been the
preferred mode to circumvent the limitation posed by imperfect replication on
the amount of information stored by each of these molecules. Here we reexamine
a classic package model in which distinct information carriers or templates are
forced to coexist within vesicles, which in turn can proliferate freely through
binary division. The combined dynamics of vesicles and templates is described
by a multitype branching process which allows us to write equations for the
average number of the different types of vesicles as well as for their
extinction probabilities. The threshold phenomenon associated with the extinction
of the vesicle population is studied quantitatively using finite-size scaling
techniques. We conclude that the resultant coexistence is too frail in the
presence of parasites and so confinement of templates in vesicles without an
explicit mechanism of cooperation does not resolve the information crisis of
prebiotic evolution. Comment: 9 pages, 8 figures, accepted version, to be published in PR
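As an illustration of how extinction probabilities follow from a multitype branching process, the sketch below iterates the offspring generating functions from zero, which converges to the smallest fixed point, that is, to the extinction probability of a lineage started by each type. The offspring distribution in the example (binary division with each daughter surviving independently with probability `s`) is a hypothetical toy case and not the package model of the paper.

```python
def extinction_probabilities(offspring, tol=1e-12, max_iter=10_000):
    """Extinction probability per starting type of a multitype branching process.

    offspring[i] is a list of (probability, counts) pairs, where counts[j] is
    the number of type-j children produced by one type-i individual.
    Iterating q <- f(q) from q = 0 converges to the minimal fixed point of the
    offspring generating functions f.
    """
    q = [0.0] * len(offspring)
    for _ in range(max_iter):
        new_q = []
        for outcomes in offspring:
            total = 0.0
            for prob, counts in outcomes:
                term = prob
                for q_j, c_j in zip(q, counts):
                    term *= q_j ** c_j  # note: 0.0 ** 0 == 1.0 in Python
                total += term
            new_q.append(total)
        if max(abs(a - b) for a, b in zip(new_q, q)) < tol:
            return new_q
        q = new_q
    return q


# Hypothetical single-type example: a vesicle divides into two daughters,
# each of which is viable independently with probability s.
s = 0.8
toy_model = [[((1 - s) ** 2, (0,)), (2 * s * (1 - s), (1,)), (s ** 2, (2,))]]
q_toy = extinction_probabilities(toy_model)
```

For this toy case the fixed point can be checked by hand: solving q = (1 - s + s q)^2 for s > 1/2 gives q = ((1 - s)/s)^2.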
MixtureTree: a program for constructing phylogeny
Background: MixtureTree v1.0 is a Linux-based program (written in C++) which implements an algorithm based on mixture models for reconstructing phylogeny from binary sequence data, such as single-nucleotide polymorphisms (SNPs). In addition to the mixture algorithm with three different optimization options, the program also implements a bootstrap procedure with majority-rule consensus. Results: The MixtureTree program, written in C++, is a Linux-based package. The User's Guide and source code will be available at http://math.asu.edu/~scchen/MixtureTree.html Conclusions: The efficiency of the mixture algorithm is relatively higher than that of some classical methods, such as the Neighbor-Joining, Maximum Parsimony and Maximum Likelihood methods. The shortcomings of the mixture tree algorithm, for example being time-consuming, can be improved by implementing other revised Expectation-Maximization (EM) algorithms instead of the traditional EM algorithm.
Asexual and sexual replication in sporulating organisms
This paper develops models describing asexual and sexual replication in
sporulating organisms. Replication via sporulation is the replication strategy
for all multicellular life, and may even be observed in unicellular life (such
as with budding yeast). We consider diploid populations replicating via one of
two possible sporulation mechanisms: (1) Asexual sporulation, whereby adult
organisms produce single-celled diploid spores that grow into adults
themselves. (2) Sexual sporulation, whereby adult organisms produce
single-celled diploid spores that divide into haploid gametes. The haploid
gametes enter a haploid "pool", where they may recombine with other haploids to
form a diploid spore that then grows into an adult. We consider a haploid
fusion rate given by second-order reaction kinetics. We work with a simplified
model where the diploid genome consists of only two chromosomes, each of which
may be rendered defective with a single point mutation of the wild-type. We
find that the asexual strategy is favored when the rate of spore production is
high compared to the characteristic growth rate from a spore to a reproducing
adult. Conversely, the sexual strategy is favored when the rate of spore
production is low compared to the characteristic growth rate from a spore to a
reproducing adult. As the characteristic growth time increases, or as the
population density increases, the critical ratio of spore production rate to
organism growth rate at which the asexual strategy overtakes the sexual one is
pushed to higher values. Therefore, the results of this model suggest that, for
complex multicellular organisms, sexual replication is favored at high
population densities, and low growth and sporulation rates. Comment: 8 pages, 5 figures, to be submitted to Journal of Theoretical
Biology, figures not included in this submission
Efficient FPT algorithms for (strict) compatibility of unrooted phylogenetic trees
In phylogenetics, a central problem is to infer the evolutionary
relationships between a set of species; these relationships are often
depicted via a phylogenetic tree -- a tree having its leaves univocally labeled
by elements of the species set and without degree-2 nodes -- called the "species tree". One
common approach for reconstructing a species tree consists in first
constructing several phylogenetic trees from primary data (e.g. DNA sequences
originating from some species in the set), and then constructing a single
phylogenetic tree maximizing the "concordance" with the input trees. The
so-obtained tree is our estimation of the species tree and, when the input
trees are defined on overlapping -- but not identical -- sets of labels, is
called "supertree". In this paper, we focus on two problems that are central
when combining phylogenetic trees into a supertree: the compatibility and the
strict compatibility problems for unrooted phylogenetic trees. These problems
are strongly related, respectively, to the notions of "containing as a minor"
and "containing as a topological minor" in the graph community. Both problems
are known to be fixed-parameter tractable in the number of input trees, by
using their expressibility in Monadic Second Order Logic and a reduction to
graphs of bounded treewidth. Motivated by the fact that the dependency of these
algorithms on the number of input trees is prohibitively large, we give the first
explicit dynamic programming algorithms for solving these problems, both with a
running time expressed in terms of the number of input trees and the total size of the input. Comment: 18 pages, 1 figure
Nocardia kroppenstedtii sp. nov., a novel actinomycete isolated from a lung transplant patient with a pulmonary infection
An actinomycete, strain N1286T, isolated from a lung transplant patient with a pulmonary infection, was provisionally assigned to the genus Nocardia. The strain had chemotaxonomic and morphological properties typical of members of the genus Nocardia and formed a distinct phyletic line in the Nocardia 16S rRNA gene tree. It was most closely related to Nocardia farcinica DSM 43665T (99.8% gene similarity) but was distinguished from the latter by a low level of DNA:DNA relatedness. These strains were also distinguished by a broad range of phenotypic properties. On the basis of these data, it is proposed that isolate N1286T (=DSM 45810T = NCTC 13617T) should be classified as the type strain of a new Nocardia species for which the name Nocardia kroppenstedtii is proposed.
Fast computation of distance estimators
BACKGROUND: Some distance methods are among the most commonly used methods for reconstructing phylogenetic trees from sequence data. The input to a distance method is a distance matrix, containing estimated pairwise distances between all pairs of taxa. Distance methods themselves are often fast, e.g., the famous and popular Neighbor Joining (NJ) algorithm reconstructs a phylogeny of n taxa in time O(n^3). Unfortunately, the fastest practical algorithms known for computing the distance matrix from n sequences of length l take time proportional to l·n^2. Since the sequence length typically is much larger than the number of taxa, the distance estimation is the bottleneck in phylogeny reconstruction. This bottleneck is especially apparent in reconstruction of large phylogenies or in applications where many trees have to be reconstructed, e.g., bootstrapping and genome-wide applications. RESULTS: We give an advanced algorithm for computing the number of mutational events between DNA sequences which is significantly faster than both Phylip and Paup. Moreover, we give a new method for estimating pairwise distances between sequences which contain ambiguity symbols. This new method is shown to be more accurate as well as faster than earlier methods. CONCLUSION: Our novel algorithm for computing distance estimators provides a valuable tool in phylogeny reconstruction. Since the running time of our distance estimation algorithm is comparable to that of most distance methods, the previous bottleneck is removed. All distance methods, such as NJ, require a distance matrix as input and, hence, our novel algorithm significantly improves the overall running time of all distance methods. In particular, we show for real world biological applications how the running time of phylogeny reconstruction using NJ is improved from a matter of hours to a matter of seconds.
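The l·n^2 cost described above can be seen in a naive baseline, sketched here for orientation only: every pair of sequences is compared site by site. This is not the fast algorithm of the paper. The Jukes-Cantor correction is an assumed choice of estimator (the abstract does not name one), and ambiguity symbols are treated as ordinary characters.

```python
import math


def p_distance(a, b):
    """Fraction of aligned sites at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    mismatches = sum(1 for x, y in zip(a, b) if x != y)
    return mismatches / len(a)


def jukes_cantor(p):
    """Jukes-Cantor corrected distance; infinite once p reaches 0.75."""
    if p >= 0.75:
        return math.inf
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def distance_matrix(seqs):
    """Naive pairwise distance matrix: n*(n-1)/2 comparisons of length l each."""
    n = len(seqs)
    d = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d[i][j] = d[j][i] = jukes_cantor(p_distance(seqs[i], seqs[j]))
    return d


# Tiny illustrative alignment (made-up sequences).
alignment = ["ACGT", "ACGA", "TCGA"]
matrix = distance_matrix(alignment)
```

The double loop over pairs and the inner scan over sites are what make this step dominate when sequences are long; a distance method such as NJ would then take `matrix` as its input.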