Historical contingency and entrenchment in protein evolution under purifying selection
The fitness contribution of an allele at one genetic site may depend on
alleles at other sites, a phenomenon known as epistasis. Epistasis can
profoundly influence the process of evolution in populations under selection,
and can shape the course of protein evolution across divergent species. Whereas
epistasis between adaptive substitutions has been the subject of extensive
study, relatively little is known about epistasis under purifying selection.
Here we use mechanistic models of thermodynamic stability in a ligand-binding
protein to explore the structure of epistatic interactions between
substitutions that fix in protein sequences under purifying selection. We find
that the selection coefficients of mutations that are nearly-neutral when they
fix are highly contingent on the presence of preceding mutations. Conversely,
mutations that are nearly-neutral when they fix are subsequently entrenched due
to epistasis with later substitutions. Our evolutionary model includes
insertions and deletions, as well as point mutations, and so it allows us to
quantify epistasis within each of these classes of mutations, and also to study
the evolution of protein length. We find that protein length remains largely
constant over time, because indels are more deleterious than point mutations.
Our results imply that, even under purifying selection, protein sequence
evolution is highly contingent on history and so it cannot be predicted by the
phenotypic effects of mutations assayed in the wild-type sequence.
Comment: 42 pages, 13 figures
Protein 3D Structure Computed from Evolutionary Sequence Variation
The evolutionary trajectory of a protein through sequence space is constrained by its function. Collections of sequence homologs record the outcomes of millions of evolutionary experiments in which the protein evolves according to these constraints. Deciphering the evolutionary record held in these sequences and exploiting it for predictive and engineering purposes presents a formidable challenge. The potential benefit of solving this challenge is amplified by the advent of inexpensive high-throughput genomic sequencing
Molecular Evolutionary Studies using Structural Genomics and Proteomics.
The field of molecular evolution has progressed with the accumulation of various molecular data. It started with the analysis of protein sequence data, followed by that of gene and genome sequence data. Recently, structural genomics and proteomics have offered new types of data for addressing molecular evolution questions. Structural genomics refers to genome-wide collection of protein structures, whereas proteomics is the study of all proteins in a cell or organism. In this thesis, I conducted molecular evolutionary projects using data provided by structural genomics and proteomics. First, I used protein structure information to explain why some human disease-associated amino acid residues (DARs) appear as the wild-type in other species. Because destabilizing protein structures is a primary reason why DARs are deleterious, I focused on protein stability and discovered that, in species where a DAR represents the wild-type, the destabilizing effect of the DAR is generally lessened by the observed amino acid substitutions in the spatial proximity of the DAR. This finding of compensatory residue substitutions has important implications for understanding epistasis in protein evolution. Second, the recently published human proteomes include peptides encoded by annotated pseudogenes, which are relics of formerly functional genes. These translated pseudogenes may actually be functional and subject to purifying selection. Alternatively, their translation may be accidental and not indicate functionality. My analysis suggests that a sizable fraction of the translated pseudogenes are subject to purifying selection acting at the protein level. Third, for the purpose of understanding protein evolution and structure-function relationships, protein structures are classified according to their structural similarities. A fold encompasses protein structures with similar core topologies. Current fold classifications implicitly assume that folds are discrete islands in the protein structure space, whereas increasing evidence supports a continuous fold space. I developed a likelihood method to classify structures into existing folds by considering the continuity in fold space. My results using this method demonstrated the growing importance of considering this continuity in fold classification. Together, my work illustrated the utility of structural genomics and proteomics in answering evolutionary questions and provided better understanding of gene and protein evolution.
PhD, Bioinformatics, University of Michigan, Horace H. Rackham School of Graduate Studies
http://deepblue.lib.umich.edu/bitstream/2027.42/113597/1/jinruixu_1.pd
Protein Fold Recognition Using Neural Networks
Accurately predicting the three-dimensional (3D) structures of proteins from their amino acid sequences alone remains a challenging problem. However, using protein fold recognition tools, it is often possible to achieve good models or at least to gain some more information, to aid scientists in their research. This thesis describes development of TUNE (Threading Using Neural Networks), a fold recognition program using artificial neural network (ANN) models. A new method to generate amino acid substitution matrices is described in chapter two. It uses an ANN to generalise amino acid substitutions observed in protein structure alignments. Matrices for alignment scoring from this approach were compared with classic alignment scoring schemes. From these neural network models, a series of encoding schemes were constructed. These schemes describe the amino acid types with a few numbers. They were generated to replace the orthogonal encoding scheme, so that smaller, faster and more accurate neural network models can be applied to bioinformatics problems. The TUNE model was introduced in chapter four to measure protein sequence-structure compatibility. Given the integrated residue structural environment descriptions, the model predicts probabilities of observing amino acid types in such environments. Using this model, a scoring function to measure the fitness of a residue in a protein structure model can be made for protein threading programs. The model in chapter two was extended by including the residue structural environment descriptions for predictions. A simple protein fold recognition program with a dynamic programming algorithm was developed using this model. The program was then tested in the fourth round of the Critical Assessment of protein Structure Prediction methods (CASP4) and produced reasonably good results.
An automatic method for assessing structural importance of amino acid positions
Background: A great deal is known about the qualitative aspects of the sequence-structure relationship, for example that buried residues are usually more conserved between structurally similar homologues, but no attempts have been made to quantitate the relationship between evolutionary conservation at a sequence position and change to global tertiary structure. In this paper we demonstrate that the Spearman correlation between sequence and structural change is suitable for this purpose.
Results:
Buried residues, bends, cysteines, prolines and leucines were significantly more likely to occupy positions highly correlated with structural change than expected by chance. Some buried residues were found to be less informative than expected, particularly residues involved in active sites and the binding of small molecules.
Conclusion:
The correlation-based method generates predictions of structural importance for superfamily positions which agree well with previous results of manual analyses, and may be of use in automated residue annotation pipelines. A PERL script which implements the method is provided.
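As a rough illustration of the statistic named in this abstract, the sketch below computes a Spearman rank correlation between two lists of per-pair measurements for one alignment position. It is not the authors' PERL script. The function names, the choice of inputs (a per-position sequence dissimilarity and a global structural distance for each pair of homologues), and the sample numbers are all assumptions made for illustration.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: the Pearson correlation of the two rank vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        return float("nan")  # undefined when one variable is constant
    return cov / sqrt(vx * vy)


# Hypothetical data for one alignment position: one entry per pair of homologues.
sequence_change = [0.0, 0.2, 0.5, 0.9]    # assumed residue dissimilarity at the position
structural_change = [0.4, 0.9, 1.6, 2.8]  # assumed global structural distance
rho = spearman(sequence_change, structural_change)
```

Under these assumptions, a position whose `rho` is close to 1 would be one where larger sequence changes go together with larger structural changes.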
Advancing the accuracy of protein fold recognition by utilizing profiles from hidden Markov models
Protein fold recognition is an important step towards solving protein function and tertiary structure prediction problems. Among a wide range of approaches proposed to solve this problem, pattern recognition based techniques have achieved the best results. The most effective pattern recognition-based techniques for solving this problem have been based on extracting evolutionary-based features. Most studies have relied on the Position Specific Scoring Matrix (PSSM) to extract these features. However, it is known that profile-profile sequence alignment techniques can identify more remote homologs than sequence-profile approaches like PSIBLAST. In this study we use a profile-profile sequence alignment technique, namely HHblits, to extract HMM profiles. We will show that, unlike previous studies, using the HMM profile to extract evolutionary information can significantly enhance the protein fold prediction accuracy. We develop a new pattern recognition based system called HMMFold which extracts HMM based evolutionary information and captures remote homology information better than previous studies. Using HMMFold we achieve up to 93.8% and 86.0% prediction accuracies when the sequential similarity rates are less than 40% and 25%, respectively. These results are up to 10% better than previously reported results for this task. Our results show significant enhancement especially for benchmarks with sequential similarity as low as 25%, which highlights the effectiveness of HMMFold in addressing this problem and its superiority over previously proposed approaches found in the literature.
New encouraging developments in contact prediction: Assessment of the CASP11 results
This article provides a report on the state-of-the-art in the prediction of intra-molecular residue-residue contacts in proteins based on the assessment of the predictions submitted to the CASP11 experiment. The assessment emphasis is placed on the accuracy in predicting long-range contacts. Twenty-nine groups participated in contact prediction in CASP11. At least eight of them used the recently developed evolutionary coupling techniques, with the top group (CONSIP2) reaching precision of 27% on target proteins that could not be modeled by homology. This result indicates a breakthrough in the development of methods based on the correlated mutation approach. Successful prediction of contacts was shown to be practically helpful in modeling three-dimensional structures; in particular target T0806 was modeled exceedingly well with accuracy not yet seen for ab initio targets of this size (>250 residues).
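The precision figure quoted in this abstract can be illustrated with a minimal sketch of how precision over long-range contact predictions might be computed. This is not the CASP11 assessors' code. The function name, the input layout, the sequence-separation cutoff of 24 residues, and the toy data are all assumptions made for illustration.

```python
def long_range_precision(predicted, true_contacts, top_n, min_separation=24):
    """Fraction of the top_n highest-scoring long-range predictions that are true.

    predicted      -- iterable of (i, j, score) with residue indices i, j
    true_contacts  -- iterable of (i, j) pairs observed in the known structure
    min_separation -- assumed cutoff on |i - j| for calling a pair long-range
    """
    long_range = [(i, j, s) for i, j, s in predicted if abs(i - j) >= min_separation]
    long_range.sort(key=lambda c: c[2], reverse=True)
    selected = long_range[:top_n]
    if not selected:
        return 0.0
    # Normalise pair order so (i, j) and (j, i) count as the same contact.
    true_set = {(min(i, j), max(i, j)) for i, j in true_contacts}
    hits = sum(1 for i, j, _ in selected if (min(i, j), max(i, j)) in true_set)
    return hits / len(selected)


# Toy example: (5, 10) is short-range and is ignored despite its high score.
predicted = [(1, 30, 0.9), (2, 40, 0.8), (5, 10, 0.95), (3, 50, 0.2)]
true_contacts = [(1, 30), (3, 50)]
precision = long_range_precision(predicted, true_contacts, top_n=2)
```

In this toy case the two highest-scoring long-range pairs are (1, 30) and (2, 40), of which one is a true contact, so `precision` is 0.5.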