Refining intra-protein contact prediction by graph analysis

Author: Eyal Eran
Frenkel-Morgenstern Milana
Magid Rachel
Pietrokovski Shmuel
Publication venue: BioMed Central
Publication date: 01/01/2007

Background: Accurate prediction of intra-protein residue contacts from sequence information will allow the prediction of protein structures. Basic predictions of such specific contacts can be further refined by jointly analyzing predicted contacts, and by adding information on the relative positions of contacts in the protein primary sequence. Results: We introduce a method for graph analysis refinement of intra-protein contacts, termed GARP. Our previously presented intra-contact prediction method by means of pair-to-pair substitution matrix (P2PConPred) was used to test the GARP method. In our approach, the top contact predictions obtained by a basic prediction method were used as edges to create a weighted graph. The edges were scored by a mutual clustering coefficient that identifies highly connected graph regions, and by the density of edges between the sequence regions of the edge nodes. A test set of 57 proteins with known structures was used to determine contacts. GARP improves the accuracy of the P2PConPred basic prediction method in whole proteins from 12% to 18%. Conclusion: Using a simple approach we increased the contact prediction accuracy of a basic method by 1.5 times. Our graph approach is simple to implement, can be used with various basic prediction methods, and can provide input for further downstream analyses.
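As a rough illustration of the edge-scoring idea in this abstract, the sketch below scores each predicted contact by how much the neighbourhoods of its two residues overlap. The abstract does not say which mutual clustering coefficient GARP uses, so the Jaccard form here is an assumption, and the function name `mutual_clustering_scores` and the example contacts are hypothetical.

```python
from collections import defaultdict


def mutual_clustering_scores(edges):
    """Score each edge (u, v) by the Jaccard overlap of the neighbourhoods of u and v.

    Assumption: the Jaccard variant of the mutual clustering coefficient;
    the abstract does not specify which variant GARP applies.
    """
    neighbours = defaultdict(set)
    for u, v in edges:
        neighbours[u].add(v)
        neighbours[v].add(u)

    scores = {}
    for u, v in edges:
        # Drop the endpoints themselves so an edge does not support itself.
        nu = neighbours[u] - {v}
        nv = neighbours[v] - {u}
        union = nu | nv
        scores[(u, v)] = len(nu & nv) / len(union) if union else 0.0
    return scores


# Hypothetical predicted contacts between residue indices (not data from the paper).
predicted = [(1, 5), (1, 6), (5, 6), (6, 20)]
scores = mutual_clustering_scores(predicted)
```

In this toy graph the contact (1, 5) sits inside a triangle and receives the top score, while the isolated contact (6, 20) scores zero, which is the kind of separation a refinement step could use to re-rank basic predictions.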

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

A Comprehensive Resource of Interacting Protein Regions for Refining Human Transcription Factor Networks

Author: A Lobley
A Patil
AC Gavin
AI Su
AK Dunker
AL Barabasi
AS Garza
B Guglielmi
C Haynes
DJ LaCount
DL Graham
DM Lawson
DR Rhodes
E Miyamoto-Sato
Etsuko Miyamoto-Sato
F Verrecchia
G Butland
G Rigaut
GD Bader
GP Singh
H Jeong
Hidetoshi Akasaka
Hiroshi Yanagawa
HJ Dyson
J Park
JA Wells
JD Han
Jean Peccoud
JF Rual
JJ Ward
Jun Sugiyama
Katsuya Hino
Kazuyo Masuoka
L Chen
L Giot
L Hakes
M Barrios-Rodiles
M Boxem
M Fromont-Racine
M Singhal
Masamichi Ishizaka
Masaru Tomita
N Nemoto
Naoya Hirai
NT Liberati
P Legrain
P Radivojac
P Shannon
P Uetz
PM Kim
PW Hammond
RD Finn
Rintaro Saito
RM Ewing
RW Roberts
S Fields
S Li
Shigeo Fujimori
T Ito
Takanori Washio
Tatsuhiro Yamashita
Tomohiro Oshikubo
U Stelzl
VN Uversky
X Shen
Y Chinenov
Y Zhang
Yasuo Matsumoto
Yosuke Ozawa
Z Dosztanyi
Publication venue: Public Library of Science
Publication date: 24/02/2010

Large-scale data sets of protein-protein interactions (PPIs) are a valuable resource for mapping and analysis of the topological and dynamic features of interactome networks. The currently available large-scale PPI data sets only contain information on interaction partners. The data presented in this study also include the sequences involved in the interactions (i.e., the interacting regions, IRs) suggested to correspond to functional and structural domains. Here we present the first large-scale IR data set obtained using mRNA display for 50 human transcription factors (TFs), including 12 transcription-related proteins. The core data set (966 IRs; 943 PPIs) displays a verification rate of 70%. Analysis of the IR data set revealed the existence of IRs that interact with multiple partners. Furthermore, these IRs were preferentially associated with intrinsic disorder. This finding supports the hypothesis that intrinsically disordered regions play a major role in the dynamics and diversity of TF networks through their ability to structurally adapt to and bind with multiple partners. Accordingly, this domain-based interaction resource represents an important step in refining protein interactions and networks at the domain level and in associating network analysis with biological structure and function

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

Prediction of Oxidation States of Cysteines and Disulphide Connectivity

Author: Du Aiguo
Publication venue: ScholarWorks @ Georgia State University
Publication date: 27/11/2007

Knowledge of cysteine oxidation state and disulfide bond connectivity is of great importance to protein chemistry and 3-D structures. This research is aimed at finding the most relevant features in the prediction of cysteine oxidation states and the disulfide bond connectivity of proteins. Models predicting the oxidation states of cysteines are developed with machine learning techniques such as Support Vector Machines (SVMs) and Associative Neural Networks (ASNNs). A record high prediction accuracy of oxidation state, 95%, is achieved by incorporating the oxidation states of N-terminus cysteines, flanking sequences of cysteines and global information on the protein chain (number of cysteines, length of the chain, amino acid composition of the chain, etc.) into the SVM encoding. This is 5% higher than the current methods. This indicates to us that the oxidation states of amino-terminal cysteines are informative about the oxidation states of other cysteines in the same protein chain. Satisfactory prediction results are also obtained with the newer and more inclusive SPX dataset, especially for chains with a higher number of cysteines. Compared to literature methods, our approach is a one-step prediction system, which is easier to implement and use. A side-by-side comparison of SVM and ASNN is conducted. Results indicate that SVM outperforms ASNN on this particular problem. For the prediction of correct pairings of cysteines to form disulfide bonds, we first study disulfide connectivity by calculating the local interaction potentials between the flanking sequences of the cysteine pairs. The obtained interaction potential is further adjusted by the coefficients related to the binding motif of enzymes during disulfide formation and also by the linear distance between the cysteine pairs. Finally, a maximum weight matching algorithm is applied and the performance of the interaction potentials is evaluated. Overall prediction accuracy is unsatisfactory compared with the literature.
SVM is used to predict the disulfide connectivity with the assumption that oxidation states of cysteines on the protein are known. Information on binding region during disulfide formation, distance between cysteine pairs, global information of the protein chain and the flanking sequences around the cysteine pairs are included in the SVM encoding. Prediction results illustrate the advantage of using possible anchor region information
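The weight matching step mentioned in this abstract can be sketched as follows. This is a minimal brute-force illustration, not the thesis's implementation: it assumes an even number of bonded cysteines, and the function name `best_pairing`, the residue positions, and the pair weights are all hypothetical stand-ins for the adjusted interaction potentials.

```python
def best_pairing(cysteines, weight):
    """Return (total_weight, pairs) for the highest-weight perfect pairing.

    cysteines: sorted list of residue positions (even length assumed).
    weight: dict mapping (i, j) with i < j to a pairing score.
    Brute force over all pairings, so only suitable for a handful of cysteines.
    """
    if not cysteines:
        return 0.0, []
    first, rest = cysteines[0], cysteines[1:]
    best = (float("-inf"), [])
    for i, partner in enumerate(rest):
        # Pair the first cysteine with each candidate, then solve the remainder.
        remaining = rest[:i] + rest[i + 1:]
        sub_total, sub_pairs = best_pairing(remaining, weight)
        total = weight[(first, partner)] + sub_total
        if total > best[0]:
            best = (total, [(first, partner)] + sub_pairs)
    return best


# Hypothetical positions and scores (not values from the thesis).
positions = [3, 14, 27, 40]
pair_weight = {
    (3, 14): 0.2, (3, 27): 0.9, (3, 40): 0.1,
    (14, 27): 0.3, (14, 40): 0.8, (27, 40): 0.4,
}
total, pairs = best_pairing(positions, pair_weight)
```

For chains with many cysteines the number of pairings grows quickly, so a polynomial-time maximum weight matching routine would be the more practical choice; the exhaustive search is used here only to keep the example self-contained.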

ScholarWorks @ Georgia State University

DBAC: A simple prediction method for protein binding hot spots based on burial levels and deeply buried atomic contacts

Author: Li Jinyan
Li Zhenhua
Wong Limsoon
Publication venue: BioMed Central
Publication date: 01/01/2011

Background: A protein binding hot spot is a cluster of residues in the interface that are energetically important for the binding of the protein with its interaction partner. Identifying protein binding hot spots can give useful information to protein engineering and drug design, and can also deepen our understanding of protein-protein interaction. These residues are usually buried inside the interface with very low solvent accessible surface area (SASA). Thus SASA is widely used as an outstanding feature in hot spot prediction by many computational methods. However, SASA is not capable of distinguishing slightly buried residues, of which most are non hot spots, and deeply buried ones that are usually inside a hot spot. Results: We propose a new descriptor called “burial level” for characterizing residues, atoms and atomic contacts. Specifically, burial level captures the depth the residues are buried. We identify different kinds of deeply buried atomic contacts (DBAC) at different burial levels that are directly broken in alanine substitution. We use their numbers as input for SVM to classify between hot spot or non hot spot residues. We achieve F measure of 0.6237 under the leave-one-out cross-validation on a data set containing 258 mutations. This performance is better than other computational methods. Conclusions: Our results show that hot spot residues tend to be deeply buried in the interface, not just having a low SASA value. This indicates that a high burial level is not only a necessary but also a more sufficient condition than a low SASA for a residue to be a hot spot residue. We find that those deeply buried atoms become increasingly more important when their burial levels rise up. This work also confirms the contribution of deeply buried interfacial atomic contacts to the energy of protein binding hot spot.
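For readers unfamiliar with the metric reported in this abstract, the sketch below shows how an F measure is computed from classification counts. It assumes the balanced F1 form (the abstract does not state the weighting), and the function name `f_measure` and the counts are hypothetical, not the paper's confusion matrix.

```python
def f_measure(true_pos, false_pos, false_neg):
    """Balanced F measure (F1): harmonic mean of precision and recall."""
    if true_pos == 0:
        return 0.0
    precision = true_pos / (true_pos + false_pos)
    recall = true_pos / (true_pos + false_neg)
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts for a hot spot classifier (not results from the paper).
score = f_measure(true_pos=8, false_pos=2, false_neg=8)
```

Because the harmonic mean is pulled toward the lower of precision and recall, a classifier cannot reach a high F measure by favouring one at the expense of the other, which matters when hot spots are the minority class.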

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

OPUS - University of Technology Sydney

PubMed Central

ScholarBank@NUS

HAAD: A Quick Algorithm for Accurate Prediction of Hydrogen Atoms in Protein Structures

Author: A Sali
A Verma
AA Canutescu
AA Kossiakoff
AC Anderson
AD MacKerell
Ambrish Roy
Andreas Hofmann
AT Brunger
BR Brooks
D Seeliger
E Lindahl
EL Ulrich
F Baud
FM Bickelhaupt
G Klebe
G Vriend
GD Rose
IK McDonald
J Chen
JM Word
KA Dill
L Pauling
LR Forrest
M Cohen
M Gochin
M Nilges
N Engler
RJ Read
RS Rowland
S Jones
Shakhnovich SaEI
SR Kimura
T Herrmann
V Pophristic
W Kabsch
W Wang
Y Li
Y Zhang
Yang Zhang
YQ Li
Yunqi Li
Publication venue: Public Library of Science
Publication date: 01/01/2009

Hydrogen constitutes nearly half of all atoms in proteins and their positions are essential for analyzing hydrogen-bonding interactions and refining atomic-level structures. However, most protein structures determined by experiments or computer prediction lack hydrogen coordinates. We present a new algorithm, HAAD, to predict the positions of hydrogen atoms based on the positions of heavy atoms. The algorithm is built on the basic rules of orbital hybridization followed by the optimization of steric repulsion and electrostatic interactions. We tested the algorithm using three independent data sets: ultra-high-resolution X-ray structures, structures determined by neutron diffraction, and NOE proton-proton distances. Compared with the widely used programs CHARMM and REDUCE, HAAD has a significantly higher accuracy, with the average RMSD of the predicted hydrogen atoms to the X-ray and neutron diffraction structures decreased by 26% and 11%, respectively. Furthermore, hydrogen atoms placed by HAAD have more matches with the NOE restraints and fewer clashes with heavy atoms. The average CPU cost by HAAD is 18 and 8 times lower than that of CHARMM and REDUCE, respectively. The significant advantage of HAAD in both the accuracy and the speed of the hydrogen additions should make HAAD a useful tool for the detailed study of protein structure and function. Both an executable and the source code of HAAD are freely available at http://zhang.bioinformatics.ku.edu/HAAD
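The accuracy comparison in this abstract is expressed as an RMSD between predicted and experimentally determined hydrogen positions. A minimal sketch of that calculation follows, assuming the two coordinate lists are already matched atom by atom and expressed in the same frame; the function name `rmsd` and the coordinates are hypothetical and unrelated to the HAAD source code.

```python
import math


def rmsd(predicted, reference):
    """Root-mean-square deviation between two matched lists of (x, y, z) points."""
    if len(predicted) != len(reference) or not predicted:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    squared = sum(
        (p - r) ** 2
        for atom_p, atom_r in zip(predicted, reference)
        for p, r in zip(atom_p, atom_r)
    )
    return math.sqrt(squared / len(predicted))


# Hypothetical hydrogen coordinates in angstroms (not data from the paper).
pred = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
ref = [(0.0, 0.0, 1.0), (1.0, 0.0, 0.0)]
deviation = rmsd(pred, ref)
```

No superposition is performed here because hydrogens are placed on a fixed heavy-atom scaffold, so predicted and reference positions are assumed to share one coordinate frame.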

CiteSeerX

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

KU ScholarWorks

PubMed Central

Formulation of Hybrid Knowledge-Based/Molecular Mechanics Potentials for Protein Structure Refinement and a Novel Graph Theoretical Protein Structure Comparison and Analysis Technique

Author: Maus Aaron
Publication venue: ScholarWorks@UNO
Publication date: 05/08/2019

Proteins are the fundamental machinery that enables the functions of life. It is critical to understand them not just for basic biology, but also to enable medical advances. The field of protein structure prediction is concerned with developing computational techniques to predict protein structure and function from a protein’s amino acid sequence, encoded for directly in DNA, alone. Despite much progress since the first computational models in the late 1960’s, techniques for the prediction of protein structure still cannot reliably produce structures of high enough accuracy to enable desired applications such as rational drug design. Protein structure refinement is the process of modifying a predicted model of a protein to bring it closer to its native state. In this dissertation a protein structure refinement technique, that of potential energy minimization using hybrid molecular mechanics/knowledge based potential energy functions is examined in detail. The generation of the knowledge-based component is critically analyzed, and in the end, a potential that is a modest improvement over the original is presented. This dissertation also examines the task of protein structure comparison. In evaluating various protein structure prediction techniques, it is crucial to be able to compare produced models against known structures to understand how well the technique performs. A novel technique is proposed that allows an in-depth yet intuitive evaluation of the local similarities between protein structures. Based on a graph analysis of pairwise atomic distance similarities, multiple regions of structural similarity can be identified between structures independently of relative orientation. Multidomain structures can be evaluated and this technique can be combined with global measures of similarity such as the global distance test. 
This method of comparison is expected to have broad applications in rational drug design, the evolutionary study of protein structures, and in the analysis of the protein structure prediction effort

University of New Orleans

MAGIIC-PRO: detecting functional signatures by efficient discovery of long patterns in protein sequences

Author: Chen Chien-Yu
Hsu Chen-Ming
Liu Baw-Jhiune
Publication venue: Oxford University Press
Publication date: 14/07/2006

This paper presents a web service named MAGIIC-PRO, which aims to discover functional signatures of a query protein by sequential pattern mining. Automatic discovery of patterns from unaligned biological sequences is an important problem in molecular biology. MAGIIC-PRO is different from several previously established methods performing similar tasks in two major ways. The first remarkable feature of MAGIIC-PRO is its efficiency in delivering long patterns. With incorporating a new type of gap constraints and some of the state-of-the-art data mining techniques, MAGIIC-PRO usually identifies satisfied patterns within an acceptable response time. The efficiency of MAGIIC-PRO enables the users to quickly discover functional signatures of which the residues are not from only one region of the protein sequences or are only conserved in few members of a protein family. The second remarkable feature of MAGIIC-PRO is its effort in refining the mining results. Considering large flexible gaps improves the completeness of the derived functional signatures. The users can be directly guided to the patterns with as many blocks as that are conserved simultaneously. In this paper, we show by experiments that MAGIIC-PRO is efficient and effective in identifying ligand-binding sites and hot regions in protein–protein interactions directly from sequences. The web service is available at and a mirror site at

Crossref

PubMed Central

Automatic 13C Chemical Shift Reference Correction of Protein NMR Spectral Data Using Data Mining and Bayesian Statistical Modeling

Author: Chen Xi
Publication venue: UKnowledge
Publication date: 01/01/2019

Nuclear magnetic resonance (NMR) is a highly versatile analytical technique for studying molecular configuration, conformation, and dynamics, especially of biomacromolecules such as proteins. However, due to the intrinsic properties of NMR experiments, results from the NMR instruments require a referencing step before the down-the-line analysis. Poor chemical shift referencing, especially for 13C in protein Nuclear Magnetic Resonance (NMR) experiments, fundamentally limits and even prevents effective study of biomacromolecules via NMR. There is no available method that can rereference carbon chemical shifts from protein NMR without secondary experimental information such as structure or resonance assignment. To solve this problem, we constructed a Bayesian probabilistic framework that circumvents the limitations of previous reference correction methods that required protein resonance assignment and/or three-dimensional protein structure. Our algorithm named Bayesian Model Optimized Reference Correction (BaMORC) can detect and correct 13C chemical shift referencing errors before the protein resonance assignment step of analysis and without a three-dimensional structure. By combining the BaMORC methodology with a new intra-peaklist grouping algorithm, we created a combined method called Unassigned BaMORC that utilizes only unassigned experimental peak lists and the amino acid sequence. Unassigned BaMORC kept all experimental three-dimensional HN(CO)CACB-type peak lists tested within ± 0.4 ppm of the correct 13C reference value. On a much larger unassigned chemical shift test set, the base method kept 13C chemical shift referencing errors to within ± 0.45 ppm at a 90% confidence interval. With chemical shift assignments, Assigned BaMORC can detect and correct 13C chemical shift referencing errors to within ± 0.22 at a 90% confidence interval.
Therefore, Unassigned BaMORC can correct 13C chemical shift referencing errors when it will have the most impact, right before protein resonance assignment and other downstream analyses are started. After assignment, chemical shift reference correction can be further refined with Assigned BaMORC. To further support a broader usage of these new methods, we also created a software package with web-based interface for the NMR community. This software will allow non-NMR experts to detect and correct 13C referencing errors at critical early data analysis steps, lowering the bar of NMR expertise required for effective protein NMR analysis

University of Kentucky

Mass & secondary structure propensity of amino acids explain their mutability and evolutionary replacements

Author: Bohórquez Hugo J.
Patarroyo Manuel Elkin
Suárez Carlos F.
Publication venue
Publication date: 01/01/2017

Why is an amino acid replacement in a protein accepted during evolution? The answer given by bioinformatics relies on the frequency of change of each amino acid by another one and the propensity of each to remain unchanged. We propose that these replacement rules are recoverable from the secondary structural trends of amino acids. A distance measure between high-resolution Ramachandran distributions reveals that structurally similar residues coincide with those found in substitution matrices such as BLOSUM: Asn Asp, Phe Tyr, Lys Arg, Gln Glu, Ile Val, Met → Leu; with Ala, Cys, His, Gly, Ser, Pro, and Thr, as structurally idiosyncratic residues. We also found a high average correlation ($\overline{R} = 0.85$) between thirty amino acid mutability scales and the mutational inertia ($I_X$), which measures the energetic cost weighted by the number of observations at the most probable amino acid conformation. These results indicate that amino acid substitutions follow two optimally-efficient principles: (a) amino acids interchangeability privileges their secondary structural similarity, and (b) the amino acid mutability depends directly on its biosynthetic energy cost, and inversely with its frequency. These two principles are the underlying rules governing the observed amino acid substitutions. © 2017 The Author(s)

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas