
20 research outputs found

Graphical Models of Residue Coupling in Protein Families

Author: Bailey-Kellogg Chris
Ramakrishnan Naren
Thomas John
Publication venue: Dartmouth Digital Commons
Publication date: 01/03/2005

Identifying residue coupling relationships within a protein family can provide important insights into intrinsic molecular processes, and has significant applications in modeling structure and dynamics, understanding function, and designing new or modified proteins. We present the first algorithm to infer an undirected graphical model representing residue coupling in protein families. Such a model serves as a compact description of the joint amino acid distribution, and can be used for predictive (will this newly designed protein be folded and functional?), diagnostic (why is this protein not stable or functional?), and abductive reasoning (what if I attempt to graft features of one protein family onto another?). Unlike current correlated mutation algorithms that are focused on assessing dependence, which can conflate direct and indirect relationships, our algorithm focuses on assessing independence, which modularizes variation and thus enables efficient reasoning of the types described above. Further, our algorithm can readily incorporate, as priors, hypotheses regarding possible underlying mechanistic/energetic explanations for coupling. The resulting approach constitutes a powerful and discriminatory mechanism to identify residue coupling from protein sequences and structures. Analysis results on the G-protein coupled receptor (GPCR) and PDZ domain families demonstrate the ability of our approach to effectively uncover and exploit models of residue coupling

Dartmouth Digital Commons (Dartmouth College)

HORI: a web server to compute Higher Order Residue Interactions in protein structures

Author: Gakkhar Sunita
Shameer Khader
Sowdhamini Ramanathan
Sreenivasan Raashi
Sundaramurthy Pandurangan
Publication venue: BioMed Central
Publication date: 01/01/2010

Background: Folding of a protein into its three-dimensional structure is influenced by both local and global interactions within the protein. Higher order residue interactions, such as pairwise, triplet and quadruplet ones, play a vital role in attaining the stable conformation of the protein structure. It is generally agreed that higher order interactions make a significant contribution to the potential energy landscape of folded proteins, and it is therefore important to identify them to estimate their contributions to the overall stability of a protein structure. Results: We developed HORI [Higher order residue interactions in proteins], a web server for the calculation of global and local higher order interactions in protein structures. The basic algorithm of HORI is based on the classical concept of four-body nearest-neighbour propensities of amino-acid residues. It has been proved that higher order residue interactions up to the level of quadruple interactions play a major role in the three-dimensional structure of proteins and are an important feature that can be used in protein structure analysis. Conclusion: The HORI server will be a useful resource for the structural bioinformatics community to perform analysis on protein structures based on higher order residue interactions. It is a highly interactive web server designed in three modules that enable the user to analyse higher order residue interactions in protein structures. The HORI server is available from the URL: http://caps.ncbs.res.in/hori

Crossref

Springer - Publisher Connector

PubMed Central

SMURFLite: combining simplified Markov random fields with simulated evolution improves remote homology detection for beta-structural proteins into the twilight zone

Author: B. Berger
L. J. Cowen
N. M. Daniels
R. Hosur
Publication venue: Oxford University Press
Publication date: 01/03/2012

Motivation: One of the most successful methods to date for recognizing protein sequences that are evolutionarily related has been profile hidden Markov models (HMMs). However, these models do not capture pairwise statistical preferences of residues that are hydrogen bonded in beta sheets. These dependencies have been partially captured in the HMM setting by simulated evolution in the training phase and can be fully captured by Markov random fields (MRFs). However, the MRFs can be computationally prohibitive when beta strands are interleaved in complex topologies. We introduce SMURFLite, a method that combines both simplified MRFs and simulated evolution to substantially improve remote homology detection for beta structures. Unlike previous MRF-based methods, SMURFLite is computationally feasible on any beta-structural motif

DSpace@MIT

Crossref

PubMed Central

Protein Design by Mining and Sampling an Undirected Graphical Model of Evolutionary Constraints

Author: Bailey-Kellogg Chris
Ramakrishnan Naren
Thomas John
Publication venue: Dartmouth Digital Commons
Publication date: 01/03/2007

Evolutionary pressures on proteins to maintain structure and function have constrained their sequences over time and across species. The sequence record thus contains valuable information regarding the acceptable variation and covariation of amino acids in members of a protein family. When designing new members of a protein family, with an eye toward modified or improved stability or functionality, it is incumbent upon a protein engineer to uncover such constraints and design conforming sequences. This paper develops such an approach for protein design: we first mine an undirected probabilistic graphical model of a given protein family, and then use the model generatively to sample new sequences. While sampling from an undirected model is difficult in general, we present two complementary algorithms that effectively sample the sequence space constrained by our protein family model. One algorithm focuses on the high-likelihood regions of the space. Sequences are generated by sampling the cliques in a graphical model according to their likelihood while maintaining neighborhood consistency. The other algorithm designs a fixed number of high-likelihood sequences that are reflective of the amino acid composition of the given family. A set of shuffled sequences is iteratively improved so as to increase their mean likelihood under the model. Tests for two important protein families, WW domains and PDZ domains, show that both sampling methods converge quickly and generate diverse high-quality sets of sequences for further biological study
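The second sampling strategy in the abstract above (start from shuffled sequences, then iteratively raise their mean likelihood while keeping the family's amino acid composition) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the toy alignment, the single coupled column pair in `edges`, the pseudocount-smoothed pair-frequency score, and the greedy swap rule. It is not the authors' graphical model or their algorithm.

```python
import math
import random
from collections import Counter

def make_scorer(msa, edges, pseudo=0.5):
    """Return a toy log-score: sum over coupled column pairs of the log
    smoothed frequency of the observed residue pair (hypothetical model)."""
    n = len(msa)
    counts = {e: Counter((s[e[0]], s[e[1]]) for s in msa) for e in edges}

    def score(seq):
        return sum(
            math.log((counts[(i, j)][(seq[i], seq[j])] + pseudo) / (n + 400 * pseudo))
            for i, j in edges
        )
    return score

def shuffle_columns(msa, rng):
    """Shuffle each alignment column independently; per-column composition is kept."""
    cols = [list(c) for c in zip(*msa)]
    for c in cols:
        rng.shuffle(c)
    return ["".join(row) for row in zip(*cols)]

def improve(seqs, score, rng, steps=2000):
    """Greedy improvement: swap one column entry between two sequences and
    keep the swap only if their combined score does not decrease."""
    seqs = [list(s) for s in seqs]
    for _ in range(steps):
        col = rng.randrange(len(seqs[0]))
        a, b = rng.sample(range(len(seqs)), 2)
        before = score(seqs[a]) + score(seqs[b])
        seqs[a][col], seqs[b][col] = seqs[b][col], seqs[a][col]
        if score(seqs[a]) + score(seqs[b]) < before:
            # Undo a swap that lowered the score.
            seqs[a][col], seqs[b][col] = seqs[b][col], seqs[a][col]
    return ["".join(s) for s in seqs]

# Toy family in which columns 0 and 2 co-vary (A with D, G with H).
msa = ["ACDE", "ACDF", "GCHE", "GCHF", "ACDE", "GCHE"]
edges = [(0, 2)]  # assumed coupling, not inferred from data
score = make_scorer(msa, edges)

rng = random.Random(0)
start = shuffle_columns(msa, rng)
designed = improve(start, score, rng)

def mean_score(seqs):
    return sum(score(s) for s in seqs) / len(seqs)
```

Because swaps happen within a column, each column of `designed` has the same residue counts as the input alignment, and because only non-worsening swaps are kept, `mean_score(designed)` is never below `mean_score(start)`.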

Dartmouth Digital Commons (Dartmouth College)

An evolutionary analysis of cAMP-specific Phosphodiesterase 4 alternative splicing

Author: Danziger Robert S
Johnson Keven R
Nicodemus-Johnson Jessie
Publication venue: BioMed Central
Publication date: 01/01/2010

Background Cyclic nucleotide phosphodiesterases (PDEs) hydrolyze the intracellular second messengers cyclic adenosine monophosphate (cAMP) and cyclic guanosine monophosphate (cGMP). The cAMP-specific PDE family 4 (PDE4) is widely expressed in vertebrates. Each of the four PDE4 gene isoforms (PDE4A-D) undergoes extensive alternative splicing via alternative transcription initiation sites, producing unique amino termini and yielding multiple splice variant forms from each gene isoform, termed long, short, super-short and truncated super-short. Many species across the vertebrate lineage contain multiple splice variants of each gene type, which are characterized by length and amino termini. Results A phylogenetic approach was used to visualize splice variant form genesis and identify conserved splice variants (genome conservation with EST support) across the vertebrate taxa. Bayesian and maximum likelihood phylogenetic inference indicated that PDE4 gene duplication occurred at the base of the vertebrate lineage and revealed additional gene duplications specific to the teleost lineage. Phylogenetic inference and PDE4 splice variant presence, or absence as determined by EST screens, were further supported by the genomic analysis of select vertebrate taxa. Two conserved PDE4 long form splice variants were found in each of the PDE4A, PDE4B, and PDE4C genes, and eight conserved long forms in the PDE4D gene. Conserved short and super-short splice variants were found in each of the PDE4A, PDE4B, and PDE4D genes, while truncated super-short variants were found in the PDE4C and PDE4D genes. PDE4 long form splice variants were found in all taxa sampled (invertebrate through mammals); short, super-short, and truncated super-short forms are detected primarily in tetrapods and mammals, indicating an increasing complexity in both alternative splicing and cAMP metabolism through vertebrate evolution.
Conclusions There was a progressive independent incorporation of multiple PDE4 splice variant forms and amino termini, increasing PDE4 proteome complexity from primitive vertebrates to humans. While PDE4 gene isoform duplicates with limited alternative splicing were found in teleosts, an expansion of both PDE4 splice variant forms, and alternatively spliced amino termini predominantly occurs in mammals. Since amino termini have been linked to intracellular targeting of the PDE4 enzymes, the conservation of amino termini in PDE4 splice variants in evolution highlights the importance of compartmentalization of PDE4-mediated cAMP hydrolysis.

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Limitations of Protein Structure Prediction Algorithms in Therapeutic Protein Development

Author: Mariam Zamara
Niazi Sarfaraz K.
Paracha Rehan Z.
Publication venue
Publication date: 01/01/2024

The three-dimensional protein structure is pivotal in comprehending biological phenomena. It directly governs protein function and hence aids in drug discovery. The development of protein prediction algorithms, such as AlphaFold2, ESMFold, and trRosetta, has given much hope in expediting protein-based therapeutic discovery. Though no study has reported a conclusive application of these algorithms, the efforts continue with much optimism. We intended to test the application of these algorithms in rank-ordering therapeutic proteins for their instability during the pre-translational modification stages, as may be predicted according to the confidence of the structure predicted by these algorithms. The selected molecules were based on a harmonized category of licensed therapeutic proteins; out of the 204 licensed products, 188 that were not conjugated were chosen for analysis, resulting in a lack of correlation between the confidence scores and structural or protein properties. It is crucial to note here that the predictive accuracy of these algorithms is contingent upon the presence of the known structure of the protein in the accessible database. Consequently, our conclusion emphasizes that these algorithms primarily replicate information derived from existing structures. While our findings caution against relying on these algorithms for drug discovery purposes, we acknowledge the need for a nuanced interpretation. It is important to consider their limitations and to recognize that their utility may be constrained to scenarios where known structures are available. Hence, caution is advised when applying these algorithms to characterize various attributes of therapeutic proteins without the support of adequate structural information. It is worth noting that the two main algorithms, AlphaFold2 and ESMFold, also showed a 72% correlation in their scores, pointing to similar limitations.
While much progress has been made in computational sciences, the Levinthal paradox remains unsolved

Directory of Open Access Journals

Coventry University Pure Portal

Correlated mutations via regularized multinomial regression

Author: Sreekumar Janardanan
ter Braak Cajo JF
van Dijk Aalt DJ
van Ham Roeland CHJ
Publication venue: BioMed Central
Publication date: 01/01/2011

Background In addition to sequence conservation, protein multiple sequence alignments contain evolutionary signal in the form of correlated variation among amino acid positions. This signal indicates positions in the sequence that influence each other, and can be applied for the prediction of intra- or intermolecular contacts. Although various approaches exist for the detection of such correlated mutations, in general these methods utilize only pairwise correlations. Hence, they tend to conflate direct and indirect dependencies. Results We propose RMRCM, a method for Regularized Multinomial Regression in order to obtain Correlated Mutations from protein multiple sequence alignments. Importantly, our method is not restricted to pairwise (column-column) comparisons only, but takes into account the network nature of relationships between protein residues in order to predict residue-residue contacts. The use of regularization ensures that the number of predicted links between columns in the multiple sequence alignment remains limited, preventing overprediction. Using simulated datasets we analyzed the performance of our approach in predicting residue-residue contacts, and studied how it is influenced by various types of noise. For various biological datasets, validation with protein structure data indicates a good performance of the proposed algorithm for the prediction of residue-residue contacts, in comparison to previous results. RMRCM can also be applied to predict interactions (in addition to only predicting interaction sites or contact sites), as demonstrated by predicting PDZ-peptide interactions. Conclusions A novel method is presented, which uses regularized multinomial regression in order to obtain correlated mutations from protein multiple sequence alignments

Springer - Publisher Connector

PubMed Central

A structural biology community assessment of AlphaFold2 applications

Author: Akdel Mehmet
Ascher David B.
Basquin Jérôme
Bateman Alex
Beltrao Pedro
Borkakoti Neera
Bryant Patrick
Burke David
Croll Tristan I.
Davey Norman E.
Dunham Alistair S.
Durairaj Janani
Elofsson Arne
Frost Adam
Good Lydia L.
Jänes Jürgen
Kajava Andrey V.
Kundrotas Petras
Laskowski Roman A.
Lindorff-Larsen Kresten
Mészáros Bálint
Ovchinnikov Sergey
Pardo Eduard Porta
Pires Douglas E. V.
Pozzati Gabriele
Rodrigues Carlos H. M.
Serra Victoria Ruiz
Shenoy Aditi
Stein Amelie
Thornton Janet M.
Valencia Alfonso
Velankar Sameer
Zalevsky Arthur O.
Zhu Wensi
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2022

Most proteins fold into 3D structures that determine how they function and orchestrate the biological processes of the cell. Recent developments in computational methods for protein structure predictions have reached the accuracy of experimentally determined models. Although this has been independently verified, the implementation of these methods across structural-biology applications remains to be tested. Here, we evaluate the use of AlphaFold2 (AF2) predictions in the study of characteristic structural elements; the impact of missense variants; function and ligand binding site predictions; modeling of interactions; and modeling of experimental structural data. For 11 proteomes, an average of 25% additional residues can be confidently modeled when compared with homology modeling, identifying structural features rarely seen in the Protein Data Bank. AF2-based predictions of protein disorder and complexes surpass dedicated tools, and AF2 models can be used across diverse applications equally well compared with experimentally determined structures, when the confidence metrics are critically considered. In summary, we find that these advances are likely to have a transformative impact in structural biology and broader life-science research

edoc

PubMed Central

MPG.PuRe

Computing Highly Correlated Positions Using Mutual Information and Graph Theory for G Protein-Coupled Receptors

Author: Carson C. Chow
Stefano Costanzi
Sarosh N. Fatakia
Matthieu Louis
Publication venue: Public Library of Science
Publication date: 05/03/2009

G protein-coupled receptors (GPCRs) are a superfamily of seven transmembrane-spanning proteins involved in a wide array of physiological functions and are the most common targets of pharmaceuticals. This study aims to identify a cohort or clique of positions that share high mutual information. Using a multiple sequence alignment of the transmembrane (TM) domains, we calculated the mutual information between all inter-TM pairs of aligned positions and ranked the pairs by mutual information. A mutual information graph was constructed with vertices corresponding to TM positions, and an edge was drawn between two vertices if their mutual information exceeded a threshold of statistical significance. Positions with high degree (i.e. those that had significant mutual information with a large number of other positions) were found to line a well-defined inter-TM ligand binding cavity for class A as well as class C GPCRs. Although the natural ligands of class C receptors bind to their extracellular N-terminal domains, the possibility of modulating their activity through ligands that bind to their helical bundle has been reported. Such positions were not found for class B GPCRs, in agreement with the observation that there are no known ligands that bind within their TM helical bundle. All identified key positions formed a clique within the MI graph of interest. For a subset of class A receptors we also considered the alignment of a portion of the second extracellular loop, and found that the two positions adjacent to the conserved Cys that bridges the loop with TM3 qualified as key positions. Our algorithm may be useful for localizing topologically conserved regions in other protein families
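The mutual information graph described in this abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the toy alignment, the fixed cutoff `threshold=0.5` (standing in for the paper's statistical-significance threshold, which is not given here), and the use of all column pairs rather than inter-TM pairs only are all hypothetical simplifications.

```python
import math
from collections import Counter
from itertools import combinations

def mutual_information(col_a, col_b):
    """Mutual information (in bits) between two alignment columns,
    estimated from raw residue frequencies with no correction."""
    n = len(col_a)
    pa, pb = Counter(col_a), Counter(col_b)
    pab = Counter(zip(col_a, col_b))
    return sum(
        (c / n) * math.log2((c / n) / ((pa[a] / n) * (pb[b] / n)))
        for (a, b), c in pab.items()
    )

def mi_graph(msa, threshold):
    """Build the MI graph: one vertex per column, an edge wherever the
    pairwise MI exceeds the (assumed) threshold. Returns edges and degrees."""
    cols = list(zip(*msa))
    edges = [
        (i, j)
        for i, j in combinations(range(len(cols)), 2)
        if mutual_information(cols[i], cols[j]) > threshold
    ]
    degree = Counter()
    for i, j in edges:
        degree[i] += 1
        degree[j] += 1
    return edges, degree

# Toy alignment: columns 0 and 2 co-vary perfectly (A with K, G with R).
msa = ["ADKL", "ADKL", "GHRL", "GHRL", "AHKV", "GDRV"]
cols = list(zip(*msa))
edges, degree = mi_graph(msa, threshold=0.5)
```

High-degree vertices in `degree` correspond to the "key positions" of the abstract. On real alignments, raw MI is biased by finite sample size and by phylogeny, so a significance test such as the one the abstract mentions would replace the fixed cutoff.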

Public Library of Science (PLOS)

Crossref

PubMed Central