Historical contingency and entrenchment in protein evolution under purifying selection

Author: McCandlish David M.
Plotkin Joshua B.
Shah Premal
Publication venue: 'Proceedings of the National Academy of Sciences'
Publication date: 15/07/2014

The fitness contribution of an allele at one genetic site may depend on alleles at other sites, a phenomenon known as epistasis. Epistasis can profoundly influence the process of evolution in populations under selection, and can shape the course of protein evolution across divergent species. Whereas epistasis between adaptive substitutions has been the subject of extensive study, relatively little is known about epistasis under purifying selection. Here we use mechanistic models of thermodynamic stability in a ligand-binding protein to explore the structure of epistatic interactions between substitutions that fix in protein sequences under purifying selection. We find that the selection coefficients of mutations that are nearly neutral when they fix are highly contingent on the presence of preceding mutations. Conversely, mutations that are nearly neutral when they fix are subsequently entrenched due to epistasis with later substitutions. Our evolutionary model includes insertions and deletions, as well as point mutations, and so it allows us to quantify epistasis within each of these classes of mutations, and also to study the evolution of protein length. We find that protein length remains largely constant over time, because indels are more deleterious than point mutations. Our results imply that, even under purifying selection, protein sequence evolution is highly contingent on history and so it cannot be predicted by the phenotypic effects of mutations assayed in the wild-type sequence. Comment: 42 pages, 13 figures

arXiv.org e-Print Archive

Cold Spring Harbor Laboratory Institutional Repository

PubMed Central

Protein 3D Structure Computed from Evolutionary Sequence Variation

Author: Andrea Pagnani
Chris Sander
Debora S. Marks
Lucy J. Colwell
Riccardo Zecchina
Robert Sheridan
Thomas A. Hopf
Publication venue: Public Library of Science
Publication date: 01/01/2011

The evolutionary trajectory of a protein through sequence space is constrained by its function. Collections of sequence homologs record the outcomes of millions of evolutionary experiments in which the protein evolves according to these constraints. Deciphering the evolutionary record held in these sequences and exploiting it for predictive and engineering purposes presents a formidable challenge. The potential benefit of solving this challenge is amplified by the advent of inexpensive high-throughput genomic sequencing.

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

PORTO@iris (Publications Open Repository TOrino - Politecnico di Torino)

PORTO Publications Open Repository TOrino

Molecular Evolutionary Studies using Structural Genomics and Proteomics.

Author: Xu Jinrui
Publication date: 01/01/2015

The field of molecular evolution has progressed with the accumulation of various molecular data. It started with the analysis of protein sequence data, followed by that of gene and genome sequence data. Recently, structural genomics and proteomics have offered new types of data for addressing molecular evolution questions. Structural genomics refers to genome-wide collection of protein structures, whereas proteomics is the study of all proteins in a cell or organism. In this thesis, I conducted molecular evolutionary projects using data provided by structural genomics and proteomics. First, I used protein structure information to explain why some human-disease associated amino acid residues (DARs) appear as the wild-type in other species. Because destabilizing protein structures is a primary reason why DARs are deleterious, I focused on protein stability and discovered that, in species where a DAR represents the wild-type, the destabilizing effect of the DAR is generally lessened by the observed amino acid substitutions in the spatial proximity of the DAR. This finding of compensatory residue substitutions has important implications for understanding epistasis in protein evolution. Second, the recently published human proteomes include peptides encoded by annotated pseudogenes, which are relics of formerly functional genes. These translated pseudogenes may actually be functional and subject to purifying selection. Alternatively, their translations may be accidental and may not indicate functionality. My analysis suggests that a sizable fraction of the translated pseudogenes are subject to purifying selection acting at the protein level. Third, for the purpose of understanding protein evolution and structure-function relationships, protein structures are classified according to their structural similarities. A fold encompasses protein structures with similar core topologies.
Current fold classifications implicitly assume that folds are discrete islands in the protein structure space, whereas increasing evidence supports a continuous fold space. I developed a likelihood method to classify structures into existing folds by considering the continuity in fold space. My results using this method demonstrated the growing importance of considering this continuity in fold classification. Together, my work illustrated the utility of structural genomics and proteomics in answering evolutionary questions and provided better understanding of gene and protein evolution. PhD, Bioinformatics, University of Michigan, Horace H. Rackham School of Graduate Studies. http://deepblue.lib.umich.edu/bitstream/2027.42/113597/1/jinruixu_1.pd

Deep Blue Documents at the University of Michigan

Protein Fold Recognition Using Neural Networks

Author: Lin Guang
Publication date: 01/01/2003

Accurately predicting the three-dimensional (3D) structures of proteins from their amino acid sequences alone remains a challenging problem. However, using protein fold recognition tools, it is often possible to obtain good models, or at least to gain further information to aid scientists in their research. This thesis describes the development of TUNE (Threading Using Neural Networks), a fold recognition program using artificial neural network (ANN) models. A new method to generate amino acid substitution matrices is described in chapter two. It uses an ANN to generalise amino acid substitutions observed in protein structure alignments. Matrices for alignment scoring from this approach were compared with classic alignment scoring schemes. From these neural network models, a series of encoding schemes were constructed. These schemes describe the amino acid types with a few numbers. They were generated to replace the orthogonal encoding scheme, so that smaller, faster and more accurate neural network models can be applied to bioinformatics problems. The TUNE model was introduced in chapter four to measure protein sequence-structure compatibility. Given the integrated residue structural environment descriptions, the model predicts probabilities of observing amino acid types in such environments. Using this model, a scoring function to measure the fitness of a residue in a protein structure model can be made for protein threading programs. The model in chapter two was extended by including the residue structural environment descriptions for predictions. A simple protein fold recognition program with a dynamic programming algorithm was developed using this model. The program was then tested in the fourth round of the Critical Assessment of protein Structure Prediction methods (CASP4) and produced reasonably good results.
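
The abstract mentions a dynamic programming algorithm for matching a sequence to a structure but gives no details. As a generic illustration only, the sketch below computes a global alignment score (Needleman-Wunsch style) between a sequence and a list of structural positions. The `score(residue, environment)` callback is a hypothetical stand-in for a learned compatibility score, and the linear gap penalty, the toy scoring rule, and all names are assumptions rather than TUNE's actual design.

```python
def align_score(sequence, environments, score, gap=-1.0):
    """Best global alignment score of a sequence against structural positions.

    score(residue, environment) -> float is a hypothetical compatibility
    function; gap is an assumed linear penalty per skipped residue or position.
    """
    n, m = len(sequence), len(environments)
    # dp[i][j] = best score aligning sequence[:i] with environments[:j]
    dp = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = i * gap
    for j in range(1, m + 1):
        dp[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            dp[i][j] = max(
                dp[i - 1][j - 1] + score(sequence[i - 1], environments[j - 1]),
                dp[i - 1][j] + gap,  # residue left unaligned
                dp[i][j - 1] + gap,  # structural position left unaligned
            )
    return dp[n][m]


# Toy compatibility rule (purely illustrative): hydrophobic residues prefer
# buried ("B") positions, all others prefer exposed ("E") positions.
def toy_score(residue, environment):
    hydrophobic = residue in "AVLIMFW"
    return 1.0 if hydrophobic == (environment == "B") else -1.0


best = align_score("LKAV", "BEBB", toy_score)
```

In a fold recognition setting, a score of this kind would be computed against each candidate fold and the candidates ranked by it; how the thesis normalises or ranks scores is not stated in the abstract.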

Open Research Online (The Open University)

OpenGrey Repository

An automatic method for assessing structural importance of amino acid positions

Author: Jones D.T.
Sadowski M.I.
Publication date: 01/01/2009

Background: A great deal is known about the qualitative aspects of the sequence-structure relationship, for example that buried residues are usually more conserved between structurally similar homologues, but no attempts have been made to quantitate the relationship between evolutionary conservation at a sequence position and change to global tertiary structure. In this paper we demonstrate that the Spearman correlation between sequence and structural change is suitable for this purpose. Results: Buried residues, bends, cysteines, prolines and leucines were significantly more likely to occupy positions highly correlated with structural change than expected by chance. Some buried residues were found to be less informative than expected, particularly residues involved in active sites and the binding of small molecules. Conclusion: The correlation-based method generates predictions of structural importance for superfamily positions which agree well with previous results of manual analyses, and may be of use in automated residue annotation pipelines. A PERL script which implements the method is provided.
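
As a rough illustration of the correlation this abstract describes, the sketch below computes a Spearman rank correlation between sequence change at one alignment position and global structural change across pairs of homologues. It is not the authors' PERL script: the function names and the toy numbers are assumptions made for illustration, and tied values are given average ranks.

```python
def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their rank positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation, computed as the Pearson correlation of the ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation undefined for constant input")
    return cov / (var_x * var_y) ** 0.5


# Hypothetical data: one value per pair of homologues. The first list is a
# substitution-score difference at a single position, the second a structural
# distance such as RMSD.
sequence_change = [0.0, 1.0, 2.0, 3.0, 5.0]
structural_change = [0.4, 0.9, 1.1, 2.5, 2.0]
rho = spearman(sequence_change, structural_change)
```

Positions whose `rho` is high would, under this reading of the abstract, be candidates for structural importance; the measures of sequence and structural change actually used in the paper are not specified here.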

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

UCL Discovery

PubMed Central

Advancing the accuracy of protein fold recognition by utilizing profiles from hidden Markov models

Author: Dehzangi A.
Heffernan R.
Lyons J.
Paliwal K.K.
Sharma Alokanand
Yang Yuedong
Zhou Y.
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2015

Protein fold recognition is an important step towards solving protein function and tertiary structure prediction problems. Among a wide range of approaches proposed to solve this problem, pattern recognition based techniques have achieved the best results. The most effective pattern recognition-based techniques for solving this problem have been based on extracting evolutionary-based features. Most studies have relied on the Position Specific Scoring Matrix (PSSM) to extract these features. However, it is known that profile-profile sequence alignment techniques can identify more remote homologs than sequence-profile approaches like PSIBLAST. In this study we use a profile-profile sequence alignment technique, namely HHblits, to extract HMM profiles. We will show that, unlike previous studies, using the HMM profile to extract evolutionary information can significantly enhance the protein fold prediction accuracy. We develop a new pattern recognition based system called HMMFold which extracts HMM based evolutionary information and captures remote homology information better than previous studies. Using HMMFold we achieve up to 93.8% and 86.0% prediction accuracies when the sequential similarity rates are less than 40% and 25%, respectively. These results are up to 10% better than previously reported results for this task. Our results show significant enhancement especially for benchmarks with sequential similarity as low as 25%, which highlights the effectiveness of HMMFold in addressing this problem and its superiority over previously proposed approaches found in the literature.

University of the South Pacific Electronic Research Repository

New encouraging developments in contact prediction: Assessment of the CASP11 results

Author: D'Andrea Daniel
Fidelis Krzysztof
Kryshtafovych Andriy
Monastyrskyy Bohdan
Tramontano Anna
Publication venue: 'Wiley'
Publication date: 01/01/2015

This article provides a report on the state of the art in the prediction of intra-molecular residue-residue contacts in proteins based on the assessment of the predictions submitted to the CASP11 experiment. The assessment emphasis is placed on the accuracy in predicting long-range contacts. Twenty-nine groups participated in contact prediction in CASP11. At least eight of them used the recently developed evolutionary coupling techniques, with the top group (CONSIP2) reaching precision of 27% on target proteins that could not be modeled by homology. This result indicates a breakthrough in the development of methods based on the correlated mutation approach. Successful prediction of contacts was shown to be practically helpful in modeling three-dimensional structures; in particular target T0806 was modeled exceedingly well with accuracy not yet seen for ab initio targets of this size (>250 residues).
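
The precision quoted in this abstract is the fraction of predicted contacts that are true contacts in the experimental structure. A minimal sketch of that calculation, restricted to long-range pairs, follows. The sequence-separation cutoff of 24 residues, the optional `top_n` truncation, and the function name are assumptions for illustration, not details taken from the abstract.

```python
def long_range_precision(predicted, true_contacts, min_separation=24, top_n=None):
    """Precision of predicted residue-residue contacts.

    predicted: list of (i, j, score) tuples; a higher score means more confident.
    true_contacts: set of (i, j) pairs with i < j observed in the structure.
    min_separation: assumed sequence-separation cutoff for "long-range".
    top_n: if given, evaluate only the top_n highest-scoring long-range pairs.
    """
    # Keep long-range pairs only and normalise each pair so that i < j.
    long_range = [(min(i, j), max(i, j), s) for i, j, s in predicted
                  if abs(i - j) >= min_separation]
    long_range.sort(key=lambda p: p[2], reverse=True)
    if top_n is not None:
        long_range = long_range[:top_n]
    if not long_range:
        return 0.0
    hits = sum((i, j) in true_contacts for i, j, _ in long_range)
    return hits / len(long_range)


# Toy example with made-up residue indices and scores.
predicted = [(1, 30, 0.9), (5, 40, 0.8), (10, 12, 0.95), (60, 20, 0.7), (3, 50, 0.1)]
true_contacts = {(1, 30), (20, 60), (10, 12)}
precision = long_range_precision(predicted, true_contacts, top_n=3)
```

The short-range pair (10, 12) is excluded before ranking, so a confident but local prediction does not inflate the long-range figure.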

PubMed Central

Archivio della ricerca- Università di Roma La Sapienza