29 research outputs found

    Exact correspondence between walk in nucleotide and protein sequence spaces

    In the course of evolution, genes traverse the nucleotide sequence space, which translates to a trajectory of changes in protein sequence space. The correspondence between regions of the nucleotide and protein sequence spaces is understood in general but not in detail. One unexplored question is how many sequences a protein can reach with a given number of nucleotide substitutions in its gene sequence. Here I propose an algorithm to calculate the volume of protein sequence space accessible to a given protein sequence as a function of the number of nucleotide substitutions made in the protein-coding sequence. The algorithm uses dynamic programming and completes all calculations within a couple of seconds on a desktop computer. I apply the algorithm to green fluorescent protein (GFP) and obtain a number of sequences four times higher than the previous estimate. However, given the astronomically large size of the protein sequence space, the previous estimate is acceptable as an order-of-magnitude estimate. The proposed algorithm has practical applications in the study of evolutionary trajectories in sequence space.
    This work was supported by the HHMI International Early Career Scientist Program (55007424), MINECO (BFU2015-68723-P), the Spanish Ministry of Economy and Competitiveness Centro de Excelencia Severo Ochoa 2013-2017 grant (SEV-2012-0208), the Secretaria d'Universitats i Recerca del Departament d'Economia i Coneixement de la Generalitat's AGAUR program (2014 SGR 0974), and the European Research Council under the European Union's Seventh Framework Programme (FP7/2007-2013, ERC grant agreement 335980_EinME).
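The abstract does not spell out the algorithm, so the following is only a minimal sketch of one dynamic-programming formulation that matches its description. It assumes the standard genetic code, an RNA coding sequence whose length is a multiple of three, that stop codons are excluded from the counted variants, and that "reachable with k substitutions" means the minimum nucleotide distance to some encoding of the protein is exactly k. The function names `new_variants_by_distance` and `reachable_counts` are hypothetical and not taken from the paper.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code in UCAG order; '*' marks stop codons.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def new_variants_by_distance(codon):
    """Return [c0, c1, c2, c3], where c_d is the number of amino acids whose
    closest codon lies exactly d nucleotide substitutions from `codon`."""
    best = {}
    for other, aa in CODON_TABLE.items():
        if aa == "*":  # assumption: stop codons are not counted as variants
            continue
        d = sum(a != b for a, b in zip(codon, other))
        best[aa] = min(d, best.get(aa, 3))
    counts = [0, 0, 0, 0]
    for d in best.values():
        counts[d] += 1
    return counts


def reachable_counts(coding_seq, max_subs):
    """exact[k] = number of protein sequences whose minimum nucleotide
    distance from `coding_seq` is exactly k, for k = 0..max_subs."""
    exact = [1] + [0] * max_subs  # before any codon: one (empty) sequence
    for i in range(0, len(coding_seq), 3):
        per_codon = new_variants_by_distance(coding_seq[i:i + 3])
        nxt = [0] * (max_subs + 1)
        for k, ways in enumerate(exact):
            if ways == 0:
                continue
            for d, n in enumerate(per_codon):
                if k + d <= max_subs:
                    nxt[k + d] += ways * n  # spend d substitutions on this codon
        exact = nxt
    return exact


if __name__ == "__main__":
    print(reachable_counts("UCGUCG", 6))
```

Under these assumptions, summing `exact[0..k]` gives the number of protein sequences reachable with at most k substitutions. The work grows as the number of codons times `max_subs`, which is consistent with a run time of seconds for a protein the size of GFP.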

    Comparison of approximate [5] and exact (this paper) number of possible amino acid sequences of GFP.


    Consideration of the serine-coded UCG codon.

    (a) The standard genetic code table with codons colored by distance from the UCG codon under consideration: the UCG codon itself is black; codons at a distance of one, two, and three nucleotide substitutions are blue, green, and red, respectively. (b) The list of amino acids that can be obtained from the serine UCG codon by zero (black), one (blue), two (green), and three (red) nucleotide substitutions. All amino acid variants are given on the left; only the variants that add to the protein sequence space are given on the right. (c) Graph representation of the number of possible amino acid variants when mutating the UCG codon. Black, blue, green, and red arrows correspond to zero, one, two, and three nucleotide substitutions, multiplying the previously available number of amino acid variants (here one, left circle) by one, five, ten, and four variants, respectively.
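The per-codon counts in panel (c) can be reproduced with a short script. This is an illustrative sketch rather than the paper's code: it assumes the standard genetic code, excludes stop codons, and assigns each amino acid to the smallest number of substitutions at which it first appears. The function name `amino_acids_by_distance` is hypothetical.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code in UCAG order; '*' marks stop codons.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def amino_acids_by_distance(codon):
    """Group amino acids (one-letter codes) by the minimum number of
    nucleotide substitutions needed to reach one of their codons."""
    best = {}
    for other, aa in CODON_TABLE.items():
        if aa == "*":  # assumption: stop codons are ignored
            continue
        d = sum(a != b for a, b in zip(codon, other))
        if d < best.get(aa, 4):
            best[aa] = d
    groups = {0: [], 1: [], 2: [], 3: []}
    for aa, d in best.items():
        groups[d].append(aa)
    return {d: sorted(aas) for d, aas in groups.items()}


if __name__ == "__main__":
    for d, aas in amino_acids_by_distance("UCG").items():
        print(d, len(aas), "".join(aas))
```

For UCG the group sizes are one, five, ten, and four, matching the multipliers in the caption and covering all 20 amino acids.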

    Prediction of protein folding rates from the amino acid sequence-predicted secondary structure


    Coupling between properties of the protein shape and the rate of protein folding.

    There are several important questions about the coupling between properties of the protein shape and the rate of protein folding. We have studied a series of structural descriptors intended to describe protein shapes (the radius of gyration, the radius of cross-section, and the coefficient of compactness) and their possible connection with folding behavior, either rates of folding or the emergence of folding intermediates, and compared them with the classical descriptors, protein chain length and contact order. We found that when a descriptor is normalized to eliminate the influence of protein size (the radius of gyration normalized to the radius of gyration of a ball of equal volume, the coefficient of compactness defined as the ratio of the accessible surface area of a protein to that of an ideal ball of equal volume, and relative contact order), it completely loses its ability to predict folding rates. On the other hand, when a descriptor correlates well with protein size (the radius of cross-section and absolute contact order in our consideration), it correlates well with the logarithm of folding rates and separates two-state folders from multi-state ones reasonably well. The critical control for the performance of the new descriptors demonstrated that the radius of cross-section has a somewhat higher predictive power (correlation coefficient -0.74) than size alone (correlation coefficient -0.65). Thus, we have shown that numerical descriptors of the overall shape geometry of protein structures are among the important determinants of the protein-folding rate and mechanism.

    Rate of sequence divergence under constant selection

    BACKGROUND: Divergence of two independently evolving sequences that originated from a common ancestor can be described by two parameters, the asymptotic level of divergence E and the rate r at which this level of divergence is approached. Constant negative selection impedes allele replacements and, therefore, is routinely assumed to decelerate sequence divergence. However, its impact on E and on r has not been formally investigated. RESULTS: Strong selection that favors only one allele can make E arbitrarily small and r arbitrarily large. In contrast, in the case of 4 possible alleles and equal mutation rates, the lowest value of r, attained when two alleles confer equal fitnesses and the other two are strongly deleterious, is only two times lower than its value under selective neutrality. CONCLUSIONS: Constant selection can strongly constrain the level of sequence divergence, but cannot substantially reduce the rate at which this level is approached. In particular, under any constant selection, the divergence of sequences that accumulated one substitution per neutral site since their origin from the common ancestor must already constitute at least one half of the asymptotic divergence at sites under such selection.