
    Proof of the Feldman-Karlin Conjecture on the Maximum Number of Equilibria in an Evolutionary System

    Feldman and Karlin conjectured that the number of isolated fixed points for deterministic models of viability selection and recombination among n possible haplotypes has an upper bound of 2^n - 1. Here a proof is provided. The upper bound of 3^{n-1} obtained by Lyubich et al. (2001) using Bezout's Theorem (1779) is reduced here to 2^n through a change of representation that reduces the third-order polynomials to second order. A further reduction to 2^n - 1 is obtained using the homogeneous representation of the system, which always yields one solution 'at infinity'. While the original conjecture was made for systems of viability selection and recombination, the results here generalize to viability selection with any arbitrary system of bi-parental transmission, which includes recombination and mutation as special cases. An example is constructed of a mutation-selection system that has 2^n - 1 fixed points given any n, which shows that 2^n - 1 is the sharpest possible upper bound that can be found for the general space of selection and transmission coefficients. Comment: 9 pages, 1 figure; v.4: final minor revisions, corrections, additions; v.3: expands theorem to cover all cases, obviating v.2 distinction of reducible/irreducible; details added to: discussion of Lyubich (1992), example that attains upper bound, and homotopy continuation method
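
    As a hedged illustration of the chain of bounds stated in the abstract (the concrete n = 3 numbers are only an arithmetic example, not taken from the paper):

```latex
% Chain of Bezout-style bounds on the number of isolated fixed points,
% as described in the abstract, evaluated at n = 3 for concreteness.
% (Uses amsmath for \text.)
\[
  \underbrace{3^{\,n-1}}_{\text{cubic system (Lyubich et al.)}}
  \;\longrightarrow\;
  \underbrace{2^{\,n}}_{\text{quadratic representation}}
  \;\longrightarrow\;
  \underbrace{2^{\,n}-1}_{\text{one root `at infinity' discarded}}
  \qquad\text{e.g. } n=3:\; 9 \longrightarrow 8 \longrightarrow 7.
\]
```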

    Comparative statistical analysis of bacteria genomes in "word" context

    Statistical analysis of bacterial genome texts has been performed on the basis of 20 complete genomes obtained from GenBank. It is revealed that the ranked word distributions are quite well approximated by a logarithmic law. Results obtained in the investigation of absent words show the considerably nonrandom character of DNA texts. Period-3 oscillations were found in the behaviour of the autocorrelation function in several genomes. Short-range autocorrelations are present in short (n=3) words and practically absent in longer words. Comment: 14 pages, 3 figures, submitted to Physica
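
    A minimal sketch of the kind of word statistics described above; the k-mer lengths, function names and toy sequence are illustrative assumptions, not the authors' code:

```python
# Hypothetical sketch: ranked "word" frequencies and absent words of a DNA text.
from collections import Counter
from itertools import product

def word_rank_distribution(seq: str, n: int = 3):
    """Count all overlapping words of length n and return (rank, frequency) pairs."""
    counts = Counter(seq[i:i + n] for i in range(len(seq) - n + 1))
    ranked = sorted(counts.values(), reverse=True)
    return list(enumerate(ranked, start=1))

def absent_words(seq: str, n: int):
    """Words of length n over {A,C,G,T} that never occur in the sequence."""
    present = {seq[i:i + n] for i in range(len(seq) - n + 1)}
    return [''.join(w) for w in product("ACGT", repeat=n) if ''.join(w) not in present]

if __name__ == "__main__":
    toy = "ATGCGTACGTTAGCATGCATGCGT" * 50      # placeholder for a genome text
    for rank, freq in word_rank_distribution(toy, n=3)[:5]:
        print(rank, freq)                      # compare visually with f(r) ~ a - b*log(r)
    print(len(absent_words(toy, n=5)), "absent 5-letter words")
```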

    Universality of Long-Range Correlations in Expansion-Randomization Systems

    We study the stochastic dynamics of sequences evolving by single site mutations, segmental duplications, deletions, and random insertions. These processes are relevant for the evolution of genomic DNA. They define a universality class of non-equilibrium 1D expansion-randomization systems with generic stationary long-range correlations in a regime of growing sequence length. We obtain explicitly the two-point correlation function of the sequence composition and the distribution function of the composition bias in sequences of finite length. The characteristic exponent χ of these quantities is determined by the ratio of two effective rates, which are explicitly calculated for several specific sequence evolution dynamics of the universality class. Depending on the value of χ, we find two different scaling regimes, which are distinguished by the detectability of the initial composition bias. All analytic results are accurately verified by numerical simulations. We also discuss the non-stationary build-up and decay of correlations, as well as more complex evolutionary scenarios, where the rates of the processes vary in time. Our findings provide a possible example of the emergence of universality in molecular biology. Comment: 23 pages, 15 figures
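
    A minimal simulation sketch of an expansion-randomization dynamics of the kind studied above; the rates p_mut, p_dup, p_del, p_ins, the binary alphabet, and the use of single-site duplication in place of segmental duplication are illustrative assumptions, not the paper's parameters:

```python
# Illustrative sketch (assumed rates): a binary sequence evolving by single-site
# mutation, single-site duplication, deletion and random insertion, followed by
# a measurement of the two-point composition correlation.
import random

def evolve(seq, steps, p_mut=0.2, p_dup=0.4, p_del=0.2, p_ins=0.2):
    seq = list(seq)
    for _ in range(steps):
        i = random.randrange(len(seq))
        r = random.random()
        if r < p_mut:                          # single-site mutation
            seq[i] = 1 - seq[i]
        elif r < p_mut + p_dup:                # duplicate the site
            seq.insert(i, seq[i])
        elif r < p_mut + p_dup + p_del:        # delete the site
            if len(seq) > 2:
                del seq[i]
        else:                                  # insert a random symbol
            seq.insert(i, random.randint(0, 1))
    return seq

def two_point_correlation(seq, r):
    """C(r) = <s_i s_{i+r}> - <s>^2 for a +/-1 spin representation of the sequence."""
    spins = [2 * s - 1 for s in seq]
    mean = sum(spins) / len(spins)
    pairs = [spins[i] * spins[i + r] for i in range(len(spins) - r)]
    return sum(pairs) / len(pairs) - mean ** 2

if __name__ == "__main__":
    s = evolve([0, 1] * 10, steps=5000)        # duplications + insertions > deletions: growing length
    print(len(s), [round(two_point_correlation(s, r), 3) for r in (1, 4, 16, 64)])
```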

    A Solvable Sequence Evolution Model and Genomic Correlations

    We study a minimal model for genome evolution whose elementary processes are single site mutation, duplication and deletion of sequence regions and insertion of random segments. These processes are found to generate long-range correlations in the composition of letters as long as the sequence length is growing, i.e., the combined rates of duplications and insertions are higher than the deletion rate. For constant sequence length, on the other hand, all initial correlations decay exponentially. These results are obtained analytically and by simulations. They are compared with the long-range correlations observed in genomic DNA, and the implications for genome evolution are discussed. Comment: 4 pages, 4 figures
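
    A small sketch of how a letter-composition correlation of the kind compared with genomic DNA above might be measured; the GC/AT ±1 mapping and the placeholder sequence are assumptions, not necessarily the paper's convention:

```python
# Hedged sketch: composition correlation C(r) of a DNA string under a GC/AT spin mapping.
def gc_spin(base: str) -> int:
    return 1 if base in "GC" else -1

def composition_correlation(dna: str, r: int) -> float:
    spins = [gc_spin(b) for b in dna]
    mean = sum(spins) / len(spins)
    pairs = [spins[i] * spins[i + r] for i in range(len(spins) - r)]
    return sum(pairs) / len(pairs) - mean * mean

if __name__ == "__main__":
    dna = "ATGCGCGTATATGCGCATAT" * 200          # placeholder for real genomic DNA
    for r in (1, 2, 4, 8, 16):
        print(r, round(composition_correlation(dna, r), 4))
```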

    Detecting Horizontally Transferred and Essential Genes Based on Dinucleotide Relative Abundance

    Various methods have been developed to detect horizontal gene transfer in bacteria, based on anomalous nucleotide composition, assuming that compositional features undergo amelioration in the host genome. Evolutionary theory predicts the inevitability of false positives when essential sequences are strongly conserved. Foreign genes could become more detectable on the basis of their higher order compositions if such features ameliorate more rapidly and uniformly than lower order features. This possibility is tested by comparing the heterogeneities of bacterial genomes with respect to strand-independent first- and second-order features, (i) G + C content and (ii) dinucleotide relative abundance, in 1 kb segments. Although statistical analysis confirms that (ii) is less inhomogeneous than (i) in all 12 species examined, extreme anomalies with respect to (ii) in the Escherichia coli K12 genome are typically co-located with essential genes.
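
    A hypothetical sketch of the two compositional features named above, computed per 1 kb window. Note that the paper uses strand-independent (strand-symmetrized) versions; this single-strand form, and the placeholder sequence, are illustrative only:

```python
# Sketch: per-window G+C content and dinucleotide relative abundance
# rho_XY = f_XY / (f_X * f_Y) on a single strand.
from collections import Counter

def gc_content(window: str) -> float:
    return sum(b in "GC" for b in window) / len(window)

def dinucleotide_relative_abundance(window: str) -> dict:
    mono = Counter(window)
    di = Counter(window[i:i + 2] for i in range(len(window) - 1))
    n, m = len(window), len(window) - 1
    rho = {}
    for xy, count in di.items():
        fx = mono[xy[0]] / n
        fy = mono[xy[1]] / n
        rho[xy] = (count / m) / (fx * fy) if fx and fy else 0.0
    return rho

def windows(genome: str, size: int = 1000):
    for start in range(0, len(genome) - size + 1, size):
        yield genome[start:start + size]

if __name__ == "__main__":
    genome = "ATGCGTTAGCCATGGCTA" * 300          # placeholder sequence
    for w in list(windows(genome))[:2]:
        print(round(gc_content(w), 3),
              round(dinucleotide_relative_abundance(w).get("CG", 0.0), 3))
```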

    Incorporating sequence quality data into alignment improves DNA read mapping

    New DNA sequencing technologies have achieved breakthroughs in throughput, at the expense of higher error rates. The primary way of interpreting biological sequences is via alignment, but standard alignment methods assume the sequences are accurate. Here, we describe how to incorporate the per-base error probabilities reported by sequencers into alignment. Unlike existing tools for DNA read mapping, our method models both sequencer errors and real sequence differences. This approach consistently improves mapping accuracy, even when the rate of real sequence difference is only 0.2%. Furthermore, when mapping Drosophila melanogaster reads to the Drosophila simulans genome, it increased the proportion of correctly mapped reads from 49% to 66%. This approach enables more effective use of DNA reads from organisms that lack reference genomes, are extinct, or are highly polymorphic.
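
    A hedged illustration of the general idea, not the authors' model or code: convert a per-base Phred quality into an error probability and fold it, together with an assumed real-divergence rate d, into a log-odds match/mismatch score. The 0.2% default for d echoes the rate mentioned in the abstract; the uniform-base background is an assumption:

```python
# Sketch: quality-aware log-odds score for aligning a read base to a reference base.
import math

def phred_to_error_prob(q: int) -> float:
    """Phred quality Q -> probability that the sequencer miscalled the base."""
    return 10.0 ** (-q / 10.0)

def quality_aware_score(read_base: str, ref_base: str, q: int, d: float = 0.002) -> float:
    """Log-odds score of observing read_base against ref_base.

    p = sequencer error probability (from the quality value)
    d = rate of real sequence difference (0.2% by default, as in the text)
    Errors and real differences are both assumed uniform over the 3 alternative bases.
    """
    p = phred_to_error_prob(q)
    if read_base == ref_base:
        # true base matches and is read correctly, or differs and the error restores it
        likelihood = (1 - d) * (1 - p) + d * p / 3
    else:
        # error on a matching true base, a real difference read correctly,
        # or a real difference misread into this particular base
        likelihood = (1 - d) * p / 3 + (d / 3) * (1 - p) + 2 * (d / 3) * (p / 3)
    background = 0.25                        # uniform base composition (assumption)
    return math.log(likelihood / background, 2)

if __name__ == "__main__":
    for q in (10, 20, 40):
        print(q, round(quality_aware_score("A", "A", q), 2),
                 round(quality_aware_score("A", "C", q), 2))
```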

    Correlation property of length sequences based on global structure of complete genome

    This paper considers three kinds of length sequences of the complete genome. Detrended fluctuation analysis, spectral analysis, and the mean distance spanned within time L are used to discuss the correlation properties of these sequences. The values of the exponents obtained by these methods for the three kinds of length sequences of bacteria indicate that long-range correlations exist in most of these sequences. The correlations have a rich variety of behaviours, including the presence of anti-correlations. Furthermore, using the exponent γ, it is found that these correlations are all linear (γ = 1.0 ± 0.03). It is also found that these sequences exhibit 1/f noise in some interval of frequency (f > 1). The length of this interval of frequency depends on the length of the sequence. The shape of the periodogram in f > 1 exhibits some periodicity. The period seems to depend on the length and the complexity of the length sequence. Comment: RevTeX, 9 pages with 5 figures and 3 tables. Phys. Rev. E, Jan. 1, 2001 (to appear)
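
    A hedged sketch of detrended fluctuation analysis, the first of the three methods listed above, applied to a placeholder series; the box sizes and toy data are illustrative assumptions:

```python
# Sketch: detrended fluctuation analysis; the scaling F(s) ~ s^alpha indicates
# long-range correlations when alpha differs from 0.5.
import numpy as np

def dfa(series, box_sizes):
    """Return (box size s, fluctuation F(s)) pairs for linear detrending."""
    x = np.asarray(series, dtype=float)
    profile = np.cumsum(x - x.mean())
    results = []
    for s in box_sizes:
        n_boxes = len(profile) // s
        if n_boxes < 2:
            continue
        f2 = []
        for b in range(n_boxes):
            seg = profile[b * s:(b + 1) * s]
            t = np.arange(s)
            coeffs = np.polyfit(t, seg, 1)          # linear trend in this box
            f2.append(np.mean((seg - np.polyval(coeffs, t)) ** 2))
        results.append((s, np.sqrt(np.mean(f2))))
    return results

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    toy = rng.normal(size=4096)                      # placeholder length sequence
    for s, f in dfa(toy, [8, 16, 32, 64, 128]):
        print(s, round(float(f), 3))                 # slope of log F vs log s gives alpha
```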

    ESPRIT: estimating species richness using large collections of 16S rRNA pyrosequences

    Recent metagenomics studies of environmental samples suggested that microbial communities are much more diverse than previously reported, and deep sequencing will significantly increase the estimate of total species diversity. Massively parallel pyrosequencing technology enables ultra-deep sequencing of complex microbial populations rapidly and inexpensively. However, computational methods for analyzing large collections of 16S ribosomal sequences are limited. We propose a new algorithm, referred to as ESPRIT, which addresses several computational issues with prior methods. We developed two versions of ESPRIT, one for personal computers (PCs) and one for computer clusters (CCs). The PC version is used for small- and medium-scale data sets and can process several tens of thousands of sequences within a few minutes, while the CC version is designed for large-scale problems and is able to analyze several hundreds of thousands of reads within one day. Large-scale experiments are presented that clearly demonstrate the effectiveness of the newly proposed algorithm. The source code and user guide are freely available at http://www.biotech.ufl.edu/people/sun/esprit.html
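
    A much-simplified sketch of the overall task, not the ESPRIT implementation (which uses k-mer screening and pairwise alignment): cluster reads at a distance cutoff and count the clusters as a crude richness estimate. The distance function, cutoff and toy reads below are assumptions for illustration:

```python
# Sketch: greedy clustering of reads into OTU-like groups at a fixed distance cutoff.
def distance(a: str, b: str) -> float:
    """Mismatch fraction on equal-length reads (a stand-in for an alignment distance)."""
    return sum(x != y for x, y in zip(a, b)) / len(a)

def greedy_otus(reads, cutoff=0.03):
    """Assign each read to the first cluster whose representative is within cutoff."""
    representatives = []
    for r in reads:
        for rep in representatives:
            if distance(r, rep) <= cutoff:
                break
        else:
            representatives.append(r)          # r starts a new cluster (OTU)
    return representatives

if __name__ == "__main__":
    toy_reads = ["ACGTACGTAC", "ACGTACGTAC", "ACGTTCGTAC", "GGGTACGTAC", "GGGTACGTCC"]
    otus = greedy_otus(toy_reads, cutoff=0.11)
    print(len(otus), "OTUs at 11% cutoff")
```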