58 research outputs found

Maximum entropy models for antibody diversity

Author: A. M. Walczak
C. G. Callan
Cordes
Halabi
Hanggi
Hozumi
Lieschke
Pal
Russ
Schneidman
Schneidman
Seno
Socolich
T. Mora
Tang
W. Bialek
Weinstein
Publication venue: 'Proceedings of the National Academy of Sciences'
Publication date: 28/12/2009

Recognition of pathogens relies on families of proteins showing great diversity. Here we construct maximum entropy models of the sequence repertoire, building on recent experiments that provide a nearly exhaustive sampling of the IgM sequences in zebrafish. These models are based solely on pairwise correlations between residue positions, but correctly capture the higher order statistical properties of the repertoire. Exploiting the interpretation of these models as statistical physics problems, we make several predictions for the collective properties of the sequence ensemble: the distribution of sequences obeys Zipf's law, the repertoire decomposes into several clusters, and there is a massive restriction of diversity due to the correlations. These predictions are completely inconsistent with models in which amino acid substitutions are made independently at each site, and are in good agreement with the data. Our results suggest that antibody diversity is not limited by the sequences encoded in the genome, and may reflect rapid adaptation to antigenic challenges. This approach should be applicable to the study of the global properties of other protein families
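For orientation, a generic pairwise maximum entropy model over a sequence $\sigma = (\sigma_1, \dots, \sigma_L)$ can be written as below. The symbols $h_i$, $J_{ij}$, and $Z$ are notation assumed here for illustration; the paper's exact parameterization may differ.

```latex
P(\sigma) = \frac{1}{Z} \exp\Big( \sum_{i} h_i(\sigma_i) + \sum_{i<j} J_{ij}(\sigma_i, \sigma_j) \Big),
\qquad
Z = \sum_{\sigma} \exp\Big( \sum_{i} h_i(\sigma_i) + \sum_{i<j} J_{ij}(\sigma_i, \sigma_j) \Big)
```

In this notation, a model in which substitutions are made independently at each site corresponds to setting every $J_{ij} = 0$.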

arXiv.org e-Print Archive

Crossref

PubMed Central

Beyond inverse Ising model: structure of the analytical solution for a class of inverse problems

Author: A. Braunstein
A. Tarantola
D. Ackley
D. Lachapelle de
E. Aurell
E. Aurell
E. Jaynes
E. Jaynes
E. Marinari
E. Moro
E. Schneidman
F. Lillo
F. Ricci-Tersenghi
G. Gori
H. Kappen
H. Nguyen
Iacopo Mastromatteo
J. Shlens
M. Bailly-Bechet
M. Mézard
M. Mézard
M. Socolich
M. Wainwright
M. Wainwright
M. Weigt
M. Welling
S. Cocco
S. Cocco
S. Cocco
T. Cover
T. Tanaka
V. Sessak
Y. Roudi
Y. Roudi
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 19/09/2012

I consider the problem of deriving couplings of a statistical model from measured correlations, a task which generalizes the well-known inverse Ising problem. After recalling that this problem can be mapped onto that of expressing the entropy of a system as a function of its corresponding observables, I show the conditions under which this can be done without resorting to iterative algorithms. I find that inverse problems are local (the inverse Fisher information is sparse) whenever the corresponding models have a factorized form, and the entropy can be split into a sum of small cluster contributions. I illustrate these ideas through two examples (the Ising model on a tree and the one-dimensional periodic chain with arbitrary order interaction) and support the results with numerical simulations. The extension of these methods to more general scenarios is finally discussed. Comment: 15 pages, 6 figures
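As a small illustration of a non-iterative inverse problem, the sketch below recovers the couplings of an Ising model on a tree directly from magnetizations and nearest-neighbor correlations. It relies on the standard fact that, on a tree, the coupling on an edge equals the $s_i s_j$ coefficient of the log of that edge's two-spin marginal. The function names (`tree_coupling`, `chain_moments`) and the test parameters are assumptions made for this example and are not taken from the paper.

```python
import itertools
import math


def tree_coupling(m_i, m_j, c_ij):
    """Coupling J_ij on a tree edge from <s_i>, <s_j>, and <s_i s_j> (spins are +1/-1)."""
    # Two-spin marginal reconstructed from its moments.
    p = {(a, b): (1 + a * m_i + b * m_j + a * b * c_ij) / 4.0
         for a in (-1, 1) for b in (-1, 1)}
    # J_ij is the s_i*s_j coefficient of log p(s_i, s_j).
    return 0.25 * math.log(p[(1, 1)] * p[(-1, -1)] / (p[(1, -1)] * p[(-1, 1)]))


def chain_moments(h, J):
    """Exact moments of a small open Ising chain (a tree) by brute-force enumeration."""
    n = len(h)
    states = list(itertools.product((-1, 1), repeat=n))
    weights = [
        math.exp(sum(h[i] * s[i] for i in range(n))
                 + sum(J[i] * s[i] * s[i + 1] for i in range(n - 1)))
        for s in states
    ]
    z = sum(weights)
    m = [sum(w * s[i] for w, s in zip(weights, states)) / z for i in range(n)]
    c = [sum(w * s[i] * s[i + 1] for w, s in zip(weights, states)) / z
         for i in range(n - 1)]
    return m, c


# Hypothetical fields and couplings for a three-spin open chain.
h_true = [0.3, -0.2, 0.1]
J_true = [0.5, -0.7]
m, c = chain_moments(h_true, J_true)
J_est = [tree_coupling(m[i], m[i + 1], c[i]) for i in range(len(J_true))]
print(J_est)
```

The recovered values should match `J_true` up to floating-point error, with no iterative fitting involved.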

arXiv.org e-Print Archive

Crossref

Hybrid approaches for the detection of networks of critical residues involved in functional motions in protein families

Author: D Armenta-Medina
Dagoberto Armenta-Medina
E Eyal
Ernesto Perez-Rueda
M Socolich
Publication venue: 'Springer Science and Business Media LLC'

Crossref

Pairwise maximum entropy models for studying large biological systems: when they can and when they can't work

Author: A Tang
C Shannon
D Johnson
D Mastronarde
D Ts'o
E Schneidman
E Vargas-Madrazo
F Rieke
H Lancaster
H Lancaster
J Eisenberg
J Nelson
J Oates
J Shlens
J Shlens
K Dill
M Bethge
M Socolich
N Friedman
N Slonim
O Sarmanov
O Sarmanov
Olaf Sporns
Peter E. Latham
R Bahadur
R Wrangham
S Amari
S DeVries
S Kullback
S Lockless
S Nirenberg
S Yu
Sheila Nirenberg
T Cover
V Sessak
W Russ
Y Dan
Yasser Roudi
Publication venue: 'Public Library of Science (PLoS)'
Publication date: 06/11/2008

One of the most critical problems we face in the study of biological systems is building accurate statistical descriptions of them. This problem has been particularly challenging because biological systems typically contain large numbers of interacting elements, which precludes the use of standard brute force approaches. Recently, though, several groups have reported that there may be an alternate strategy. The reports show that reliable statistical models can be built without knowledge of all the interactions in a system; instead, pairwise interactions can suffice. These findings, however, are based on the analysis of small subsystems. Here we ask whether the observations will generalize to systems of realistic size, that is, whether pairwise models will provide reliable descriptions of true biological systems. Our results show that, in most cases, they will not. The reason is that there is a crossover in the predictive power of pairwise models: If the size of the subsystem is below the crossover point, then the results have no predictive power for large systems. If the size is above the crossover point, the results do have predictive power. This work thus provides a general framework for determining the extent to which pairwise models can be used to predict the behavior of whole biological systems. Applied to neural data, the size of most systems studied so far is below the crossover point

arXiv.org e-Print Archive

Public Library of Science (PLOS)

CiteSeerX

Crossref

Directory of Open Access Journals

UCL Discovery

PubMed Central

Identifying and Seeing beyond Multiple Sequence Alignment Errors Using Intra-Molecular Protein Covariation

Author: A Löytynoja
A Marchler-Bauer
Andrew D. Fernandes
BP Kleinstiver
C Floudas
C Kim
C Yanofsky
CM Buslje
CW Hogue
Darren P. Martin
DD Pollock
DY Little
ERM Tillier
F Pazos
G Shackelford
GB Gloor
GB Gloor
Gregory B. Gloor
JD Thompson
KM Wong
KR Wollenberg
KY Yip
LC Martin
Lindi M. Wahl
M Socolich
MA Fares
O Gotoh
R Kolodny
R Oliveira
RC Edgar
Russell J. Dickson
S Dunn
SAA Travers
SW Lockless
WM Fitch
WR Atchley
Publication venue: Public Library of Science
Publication date: 28/06/2010

BACKGROUND: There is currently no way to verify the quality of a multiple sequence alignment that is independent of the assumptions used to build it. Sequence alignments are typically evaluated by a number of established criteria: sequence conservation, the number of aligned residues, the frequency of gaps, and the probable correct gap placement. Covariation analysis is used to find putatively important residue pairs in a sequence alignment. Different alignments of the same protein family give different results demonstrating that covariation depends on the quality of the sequence alignment. We thus hypothesized that current criteria are insufficient to build alignments for use with covariation analyses. METHODOLOGY/PRINCIPAL FINDINGS: We show that current criteria are insufficient to build alignments for use with covariation analyses as systematic sequence alignment errors are present even in hand-curated structure-based alignment datasets like those from the Conserved Domain Database. We show that current non-parametric covariation statistics are sensitive to sequence misalignments and that this sensitivity can be used to identify systematic alignment errors. We demonstrate that removing alignment errors due to 1) improper structure alignment, 2) the presence of paralogous sequences, and 3) partial or otherwise erroneous sequences, improves contact prediction by covariation analysis. Finally we describe two non-parametric covariation statistics that are less sensitive to sequence alignment errors than those described previously in the literature. CONCLUSIONS/SIGNIFICANCE: Protein alignments with errors lead to false positive and false negative conclusions (incorrect assignment of covariation and conservation, respectively). Covariation analysis can provide a verification step, independent of traditional criteria, to identify systematic misalignments in protein alignments. 
Two non-parametric statistics are shown to be somewhat insensitive to misalignment errors, providing increased confidence in contact prediction when analyzing alignments with erroneous regions because they emphasize pairwise covariation over group covariation.
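To make the idea of a covariation statistic concrete, here is a minimal sketch that computes the mutual information between two alignment columns. Mutual information is one commonly used covariation measure and is chosen here only for illustration; it is not claimed to be either of the statistics this paper introduces. The function name and the toy columns are hypothetical.

```python
from collections import Counter
import math


def column_mutual_information(col_a, col_b):
    """Mutual information (in bits) between two equally long alignment columns."""
    if len(col_a) != len(col_b):
        raise ValueError("columns must have the same number of sequences")
    n = len(col_a)
    count_a = Counter(col_a)
    count_b = Counter(col_b)
    count_ab = Counter(zip(col_a, col_b))
    mi = 0.0
    for (a, b), k in count_ab.items():
        # p(a,b) * log2( p(a,b) / (p(a) * p(b)) ), written with raw counts.
        mi += (k / n) * math.log2(k * n / (count_a[a] * count_b[b]))
    return mi


# Toy columns: perfectly covarying, then independent.
print(column_mutual_information("AAAACCCC", "GGGGTTTT"))
print(column_mutual_information("AACC", "GTGT"))
```

A misaligned block of sequences changes the joint counts for many column pairs at once, which is why a statistic of this kind reacts to systematic alignment errors.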

Public Library of Science (PLOS)

Scholarship@Western

Crossref

Directory of Open Access Journals

PubMed Central

Correlated Evolution of Nearby Residues in Drosophilid Proteins

Author: A Eyre-Walker
A Tanay
AFY Poon
AL Hughes
Benjamin Callahan
BH Davis
Boris I. Shraiman
C Branden
C Chothia
CH Yeang
CW Birky
D Karolchik
DA Kirby
DG Consortium
DJ Begun
DM Weinreich
Doris Bachtrog
E Neher
EA Ortlund
G Sella
GA Bazykin
GA Bazykin
Gil McVean
HA Orr
HRB Olivier Lichtarge
J Hey
J Wang
JA Shapiro
JC Fay
JC Whisstock
JH Gillespie
JH McDonald
JM Smith
K Fukami-Kobayashi
K Ridout
KR Takahasi
L Burger
LM Colgin
M Kimura
M Nei
M Slatkin
M Socolich
M Zvelebil
MV Meer
NGC Smith
NH Barton
P Andolfatto
P Andolfatto
Peter Andolfatto
Q Wang
R Kulathinal
Richard A. Neher
S Schwartz
SW Lockless
T Ohta
W Fitch
W Stephan
WG Hill
WR Rice
Z Yang
Publication venue: Public Library of Science
Publication date: 01/01/2011

Here we investigate the correlations between coding sequence substitutions as a function of their separation along the protein sequence. We consider both substitutions between the reference genomes of several Drosophilids as well as polymorphisms in a population sample of Zimbabwean Drosophila melanogaster. We find that amino acid substitutions are “clustered” along the protein sequence, that is, the frequency of additional substitutions is strongly enhanced within ≈10 residues of a first such substitution. No such clustering is observed for synonymous substitutions, supporting a “correlation length” associated with selection on proteins as the causative mechanism. Clustering is stronger between substitutions that arose in the same lineage than it is between substitutions that arose in different lineages. We consider several possible origins of clustering, concluding that epistasis (interactions between amino acids within a protein that affect function) and positional heterogeneity in the strength of purifying selection are primarily responsible. The role of epistasis is directly supported by the tendency of nearby substitutions that arose on the same lineage to preserve the total charge of the residues within the correlation length and by the preferential cosegregation of neighboring derived alleles in our population sample. We interpret the observed length scale of clustering as a statistical reflection of the functional locality (or modularity) of proteins: amino acids that are near each other on the protein backbone are more likely to contribute to, and collaborate toward, a common subfunction

Crossref

Directory of Open Access Journals

edoc

PubMed Central

Probing the Mutational Interplay between Primary and Promiscuous Protein Functions: A Computational-Experimental Approach

Protein promiscuity is of considerable interest due to its role in adaptive metabolic plasticity, its fundamental connection with molecular evolution and also because of its biotechnological applications. Current views on the relation between primary and promiscuous protein activities stem largely from laboratory evolution experiments aimed at increasing promiscuous activity levels. Here, on the other hand, we attempt to assess the main features of the simultaneous modulation of the primary and promiscuous functions during the course of natural evolution. The computational/experimental approach we propose for this task involves the following steps: a function-targeted, statistical coupling analysis of evolutionary data is used to determine a set of positions likely linked to the recruitment of a promiscuous activity for a new function; a combinatorial library of mutations on this set of positions is prepared and screened for both the primary and the promiscuous activities; a partial-least-squares reconstruction of the full combinatorial space is carried out; finally, an approximation to the Pareto set of variants with optimal primary/promiscuous activities is derived. Application of the approach to the emergence of folding catalysis in thioredoxin scaffolds reveals an unanticipated scenario: diverse patterns of primary/promiscuous activity modulation are possible, including a moderate (but likely significant in a biological context) simultaneous enhancement of both activities. We show that this scenario can be most simply explained on the basis of the conformational diversity hypothesis, although alternative interpretations cannot be ruled out. Overall, the results reported may help clarify the mechanisms of the evolution of new functions.
From a different viewpoint, the partial-least-squares-reconstruction/Pareto-set-prediction approach we have introduced provides the computational basis for an efficient directed-evolution protocol aimed at the simultaneous enhancement of several protein features and should therefore open new possibilities in the engineering of multi-functional enzymes.
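The final step described above, extracting a Pareto set, can be sketched as follows. This is a minimal illustration that assumes each variant is summarized by a pair (primary activity, promiscuous activity) with larger values being better on both axes; the function name and the activity values are hypothetical and are not data from the paper.

```python
def pareto_front(points):
    """Return the points not dominated by any other point (larger is better on both axes)."""
    front = []
    for p in points:
        # p is dominated if some q is at least as good on both axes and strictly better on one.
        dominated = any(
            q[0] >= p[0] and q[1] >= p[1] and (q[0] > p[0] or q[1] > p[1])
            for q in points
        )
        if not dominated:
            front.append(p)
    return front


# Hypothetical (primary, promiscuous) activities for four variants.
variants = [(1.0, 1.0), (2.0, 0.5), (0.5, 2.0), (0.4, 0.4)]
print(pareto_front(variants))
```

The quadratic scan is fine for a small combinatorial library; a sort-based sweep would be the usual choice for much larger sets.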

Public Library of Science (PLOS)

Crossref

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Directory of Open Access Journals

PubMed Central

Repositorio Institucional Universidad de Granada

FigShare

Are biological systems poised at criticality?

Author: A. Cavagna
A. Cavagna
A. Clauset
A. Herz
A. Horovitz
A. Schug
A. Tang
A.J. Hudspeth
A.M. Litke
B. Lunt
C. Adami
C. Haldeman
C.B. Anfinsen
D. Ackley
D. Bishop
D.J. Amit
D.J. Krause
D.M. Chen
D.T. Kemp
E. Schneidman
E. Schneidman
E.T. Jaynes
E.T. Jaynes
F. Auerbach
F. Rieke
G. Yule
G.K. Zipf
H. Watson
H. Wässle
H.S. Seung
I. Giardina
I.E. Ohiorhenuan
I.E. Ohiorhenuan
J. Chu
J. Hertz
J. Shlens
J. Shlens
J. Toner
J. Toner
J.A. Weinstein
J.B. Keller
J.J. Hopfield
J.J. Hopfield
J.M. Beggs
J.M. Beggs
J.M. Beggs
J.M. Cullen
K. Huang
K.P. Murphy
M. Ballerini
M. Ballerini
M. Magnasco
M. Meister
M. Mézard
M. Mézard
M. Ospeck
M. Socolich
M. Usher
M. Weigt
M.E.J. Newman
M.H. Cordes
M.O. Magnasco
N. Halabi
P. Bak
P. Bak
P. Bak
P. Bak
P. Zurek
R. Durbin
R. Segev
S. Camalet
S. Cocco
S. Gould
S. Yu
S. Zapperi
S.L. Veatch
S.W. Lockless
T. Duke
T. Gold
T. Mora
T. Vicsek
T.E. Harris
T.M. Cover
Thierry Mora
V. Daggett
V. Sessak
V.M. Eguíluz
W. Chen
W.P. Russ
William Bialek
Y. Choe
Á. Corral
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 10/12/2010

Many of life's most fascinating phenomena emerge from interactions among many elements--many amino acids determine the structure of a single protein, many genes determine the fate of a cell, many neurons are involved in shaping our thoughts and memories. Physicists have long hoped that these collective behaviors could be described using the ideas and methods of statistical mechanics. In the past few years, new, larger scale experiments have made it possible to construct statistical mechanics models of biological systems directly from real data. We review the surprising successes of this "inverse" approach, using examples from families of proteins, networks of neurons, and flocks of birds. Remarkably, in all these cases the models that emerge from the data are poised at a very special point in their parameter space--a critical point. This suggests there may be some deeper theoretical principle behind the behavior of these diverse systems. Comment: 21 pages

arXiv.org e-Print Archive

Crossref

Interrogating and Predicting Tolerated Sequence Diversity in Protein Folds: Application to E. elaterium Trypsin Inhibitor-II Cystine-Knot Miniprotein

Author: A Christmann
A Heitz
A Skerra
A Wentzel
AA Fodor
AD Nagi
Adam P. Silverman
AP Silverman
AR Ortiz
B Szenthe
D Le Nguyen
D Le-Nguyen
D Le-Nguyen
DJ Rodi
DJ Rodi
DS Gill
ET Boder
EV Shusta
F Pazos
G Chao
H Kolmar
HK Binz
I Kass
IN Shindyalov
J Gracy
J Reina
J Silverman
James M. Briggs
JC Gelly
Jennifer L. Lahti
Jennifer R. Cochran
JM Kowalski
JP Dekker
JR Cochran
K Hilpert
L Ellgaard
L Makowski
L Xu
LR Helms
M Andersson
M Socolich
MA Larkin
MC Kieke
MH Parker
ML Colgrave
NG Hoffman
O Olmea
P Colas
P Escoubas
P Fariselli
PD Holler
R Baggio
R Kratzner
RH Kimura
S Krause
S Mandava
S Park
S Reiss
SW Lockless
T Hey
U Gobel
W Ji
WP Russ
WR Atchley
XM Chen
Publication venue: Public Library of Science
Publication date: 01/09/2009

Cystine-knot miniproteins (knottins) are promising molecular scaffolds for protein engineering applications. Members of the knottin family have multiple loops capable of displaying conformationally constrained polypeptides for molecular recognition. While previous studies have illustrated the potential of engineering knottins with modified loop sequences, a thorough exploration into the tolerated loop lengths and sequence space of a knottin scaffold has not been performed. In this work, we used the Ecballium elaterium trypsin inhibitor II (EETI) as a model member of the knottin family and constructed libraries of EETI loop-substituted variants with diversity in both amino acid sequence and loop length. Using yeast surface display, we isolated properly folded EETI loop-substituted clones and applied sequence analysis tools to assess the tolerated diversity of both amino acid sequence and loop length. In addition, we used covariance analysis to study the relationships between individual positions in the substituted loops, based on the expectation that correlated amino acid substitutions will occur between interacting residue pairs. We then used the results of our sequence and covariance analyses to successfully predict loop sequences that facilitated proper folding of the knottin when substituted into EETI loop 3. The sequence trends we observed in properly folded EETI loop-substituted clones will be useful for guiding future protein engineering efforts with this knottin scaffold. Furthermore, our findings demonstrate that the combination of directed evolution with sequence and covariance analyses can be a powerful tool for rational protein engineering

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

Correlated Mutations: A Hallmark of Phenotypic Amino Acid Substitutions

Author: A Bairoch
A Fuchs
A Hamosh
A Lapedes
A Lupi
A Tanoue
A Tanoue
AA Fodor
Andreas Kowarsch
Angelika Fuchs
BC Lee
C von Mering
D Altschuh
D Altschuh
D Vitkup
DD Pollock
DD Pollock
Dmitrij Frishman
EE Winter
F Endo
F Pazos
GB Gloor
H Huang
HM Berman
I Feldman
I Kass
IN Shindyalov
JG Caporaso
LC Martin
M Krzywinski
M Socolich
MH Knaggs
MS Singer
N Lopez-Bigas
NGC Smith
O Noivirt
O Noivirt-Brik
O Olmea
O Olmea
P Fariselli
P Ledoux
P Tuffery
P Wong
PC Ng
PC Ng
PD Stenson
Philipp Pagel
PJ Kundrotas
RC Edgar
RE Steward
RR Gutell
S Henikoff
S Sunyaev
S Vicatos
SAA Travers
SD Dunn
SK Ng
SM Larson
T Hershkovitz
Thomas Lengauer
U Göbel
V Ramensky
W Kabsch
WP Russ
WR Taylor
ZO Wang
Publication venue: Public Library of Science
Publication date: 01/01/2010

Point mutations resulting in the substitution of a single amino acid can cause severe functional consequences, but can also be completely harmless. Understanding what determines the phenotypical impact is important both for planning targeted mutation experiments in the laboratory and for analyzing naturally occurring mutations found in patients. Common wisdom suggests using the extent of evolutionary conservation of a residue or a sequence motif as an indicator of its functional importance and thus vulnerability in case of mutation. In this work, we put forward the hypothesis that in addition to conservation, co-evolution of residues in a protein influences the likelihood of a residue to be functionally important and thus associated with disease. While the basic idea of a relation between co-evolution and functional sites has been explored before, we have conducted the first systematic and comprehensive analysis of point mutations causing disease in humans with respect to correlated mutations. We included 14,211 distinct positions with known disease-causing point mutations in 1,153 human proteins in our analysis. Our data show that (1) correlated positions are significantly more likely to be disease-associated than expected by chance, and that (2) this signal cannot be explained by conservation patterns of individual sequence positions. Although correlated residues have primarily been used to predict contact sites, our data are in agreement with previous observations that (3) many such correlations do not relate to physical contacts between amino acid residues. Access to our analysis results is provided at http://webclu.bio.wzw.tum.de/~pagel/supplements/correlated-positions/

Crossref

Directory of Open Access Journals

PubMed Central

PuSH