Improving the performance of DomainDiscovery of protein domain boundary assignment using inter-domain linker index

Author: A Andreeva
Abdur R Sikder
Albert Y Zomaya
AR Sikder
FMG Pearl
G Pollastri
G Pollastri
HM Berman
J Cheng
J Liu
J Sim
JE Gewehr
L Kong
M Dumontier
M Suyama
N Nagarajan
OV Galzitskaya
RA George
RL Marsden
S Veretnik
SF Altschul
SJ Wheelan
T Joachims
TA Holland
V Vapnik
Publication venue: BioMed Central
Publication date: 01/01/2006

BACKGROUND: Knowledge of protein domain boundaries is critical for the characterisation and understanding of protein function. The ability to identify domains without knowledge of the structure, using sequence information only, is an essential step in many types of protein analysis. In the present study, we demonstrate that the performance of DomainDiscovery is improved significantly by including the inter-domain linker index value for domain identification from sequence-based information. Improved DomainDiscovery uses a Support Vector Machine (SVM) approach and a unique training dataset built on the principle of consensus among experts in defining domains in protein structure. The SVM was trained using a PSSM (Position Specific Scoring Matrix), secondary structure, solvent accessibility information and the inter-domain linker index to detect possible domain boundaries for a target sequence. RESULTS: Improved DomainDiscovery is compared with other methods by benchmarking against a structurally non-redundant dataset and also CASP5 targets. Improved DomainDiscovery achieves 70% accuracy for domain boundary identification in multi-domain proteins. CONCLUSION: Improved DomainDiscovery compares favourably with other methods and excels in the identification of domain boundaries for multi-domain proteins as a result of introducing a support vector machine with the benchmark_2 dataset.

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Feature-based multiple models improve classification of mutation-induced stability changes

Author: Abdul Sattar
Bela Stantic
Lukas Folkman
Publication venue: Springer Nature
Publication date: 01/01/2014

BACKGROUND: Reliable prediction of stability changes in protein variants is an important aspect of computational protein design. A number of machine learning methods have emerged that classify stability changes knowing only the sequence of the protein. However, their performance on amino acid substitutions in previously unseen non-homologous proteins is rather limited. Moreover, the performance varies for different types of mutations, depending on the secondary structure or accessible surface area of the mutation site. RESULTS: We proposed feature-based multiple models, with each model designed for a specific type of mutation. The new method is composed of five models trained for mutations in exposed, buried, helical, sheet, and coil residues. The classification of a mutation as stabilising or destabilising is made as a consensus of two models, one selected based on the predicted accessible surface area and the other based on the predicted secondary structure of the mutation site. We refer to our new method as Evolutionary, Amino acid, and Structural Encodings with Multiple Models (EASE-MM). Cross-validation results show that EASE-MM provides a notable improvement over our previous work, reaching a Matthews correlation coefficient of 0.44. EASE-MM was able to correctly classify 73% and 75% of stabilising and destabilising protein variants, respectively. Using an independent test set of 238 mutations, we confirmed our results in a comparison with related work. CONCLUSIONS: EASE-MM not only outperformed other related methods but also achieved more balanced results for different types of mutations based on the accessible surface area, secondary structure, or magnitude of stability changes. This can be attributed to using multiple models with the most relevant features selected for the given type of mutation. Therefore, our results support the presumption that different interactions govern stability changes in exposed and buried residues or in residues with a different secondary structure.
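
The abstract above reports a Matthews correlation coefficient and per-class correct-classification rates. As a reading aid, here is a minimal sketch of how those quantities are conventionally computed from confusion-matrix counts. The function names and the counts in the usage lines are hypothetical and are not taken from the paper.

```python
import math


def matthews_corrcoef(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts.

    Returns 0.0 when the denominator is zero (a common convention).
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


def per_class_rates(tp, tn, fp, fn):
    """Fraction of positives and of negatives that were classified correctly."""
    pos_rate = tp / (tp + fn) if (tp + fn) else 0.0
    neg_rate = tn / (tn + fp) if (tn + fp) else 0.0
    return pos_rate, neg_rate


if __name__ == "__main__":
    # Hypothetical counts, chosen only to exercise the functions.
    print(matthews_corrcoef(tp=30, tn=60, fp=20, fn=10))
    print(per_class_rates(tp=30, tn=60, fp=20, fn=10))
```

Unlike raw accuracy, the coefficient stays informative when the stabilising and destabilising classes are unbalanced, which is one reason it is often reported alongside per-class rates.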

Springer - Publisher Connector

PubMed Central

New Methods to Improve Protein Structure Modeling

Author: Abdelrasoul Maha
Publication venue: ODU Digital Commons
Publication date: 01/07/2018

Proteins are considered the central compounds necessary for life, as they play a crucial role in governing several life processes by performing the most essential biological and chemical functions in every living cell. Understanding protein structures and functions will lead to significant advances in life science and biology. Such knowledge is vital for fields such as drug development and synthetic biofuel production. Most proteins fold into definite shapes, which are the most stable states they can adopt. Because protein structure provides important insight into function, many research efforts have sought to determine the 3-dimensional structure of a protein from its sequence. Experimental methods for 3-dimensional structure determination are often time-consuming, costly, and not even feasible for some proteins. Accordingly, recent research efforts focus increasingly on computational approaches to predicting protein 3-dimensional structures. Template-based modeling is considered one of the most accurate protein structure prediction methods. Its success relies on correctly identifying one or a few experimentally determined protein structures as structural templates that are likely to resemble the structure of the target sequence, and on accurately producing a sequence alignment that maps the residues in the target sequence to those in the template. In this work, we aim to improve template-based protein structure modeling by identifying the most appropriate templates more reliably and by aligning the target and template sequences more precisely. Firstly, we investigate employing an inter-residue contact score to measure how favorably a target sequence fits the folding topology of a given template. Secondly, we design a multi-objective alignment algorithm that extends the well-known Needleman-Wunsch algorithm to obtain a complete set of Pareto-optimal alignments. We then use protein sequence and structural information as objectives and generate the complete Pareto-optimal front of alignments between the target sequence and the template. The alignments obtained enable one to analyze the trade-offs between the potentially conflicting objectives. These approaches lead to improved accuracy in template-based protein structure modeling.
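
The abstract above builds on two standard ideas, Needleman-Wunsch global alignment and Pareto optimality. The sketch below illustrates each in its textbook, single-purpose form. It is not the thesis's multi-objective algorithm. The scoring parameters (`match`, `mismatch`, `gap`) and the example objective tuples are assumptions chosen for illustration.

```python
def needleman_wunsch_score(a, b, match=1, mismatch=-1, gap=-1):
    """Optimal global alignment score of strings a and b (linear gap penalty)."""
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    # Aligning a prefix against the empty string costs one gap per residue.
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = score[i - 1][j] + gap      # gap in b
            left = score[i][j - 1] + gap    # gap in a
            score[i][j] = max(diag, up, left)
    return score[n][m]


def pareto_front(points):
    """Keep the points not dominated by any other (every objective is maximized)."""
    def dominates(p, q):
        return all(x >= y for x, y in zip(p, q)) and any(x > y for x, y in zip(p, q))

    return [p for p in points if not any(dominates(q, p) for q in points)]


if __name__ == "__main__":
    print(needleman_wunsch_score("AC", "AGC"))
    # Hypothetical (sequence score, structure score) pairs for candidate alignments.
    print(pareto_front([(3, 1), (1, 3), (2, 2), (1, 1)]))
```

A multi-objective aligner would, roughly speaking, carry a set of non-dominated score vectors through the dynamic-programming table rather than a single maximum. The filter above shows only the dominance test that such a set relies on.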

Old Dominion University

DISPLAR: an accurate method for predicting DNA-binding sites on protein surfaces

Author: Ahmad
Ahmad
Albright
Albright
Altschul
Bedell
Buckle
Bullock
Bulyk
Chen
Chen
Chen
Endres
Ferrer-Costa
Fujii
Gnatt
Harianto Tjong
Havranek
Henikoff
Horton
Huan-Xiang Zhou
Jones
Kabsch
Kalodimos
Kalodimos
Keil
Kono
Kummerfeld
Kuznetsov
Lejeune
Lima
Luscombe
Luscombe
Mandel-Gutfreund
Mitton-Fry
Myers
Pabo
Paillard
Rost
Schmeing
Siggers
Singleton
Stawiski
Tsuchiya
van Dijk
van Dijk
Wang
Yan
Zhou
Publication venue: Oxford University Press
Publication date: 01/01/2007

Structural and physical properties of DNA provide important constraints on the binding sites formed on surfaces of DNA-targeting proteins. Characteristics of such binding sites may form the basis for predicting DNA-binding sites from the structures of proteins alone. Such an approach has been successfully developed for predicting protein–protein interfaces. Here this approach is adapted for predicting DNA-binding sites. We used a representative set of 264 protein–DNA complexes from the Protein Data Bank to analyze characteristics and to train and test a neural network predictor of DNA-binding sites. The input to the predictor consisted of PSI-BLAST sequence profiles and solvent accessibilities of each surface residue and 14 of its closest neighboring residues. Predicted DNA-contacting residues cover 60% of actual DNA-contacting residues and have an accuracy of 76%. This method significantly outperforms previous attempts at DNA-binding site prediction. Its application to the prion protein yielded a DNA-binding site that is consistent with recent NMR chemical shift perturbation data, suggesting that it can complement experimental techniques in characterizing protein–DNA interfaces.
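
The abstract above reports coverage and accuracy for predicted DNA-contacting residues. The following minimal sketch shows one common way to compute such figures. It assumes that "coverage" is the fraction of actual contact residues that were predicted and "accuracy" is the fraction of predicted residues that are actual contacts. The abstract does not spell out these definitions, and the residue numbers used are hypothetical.

```python
def coverage_and_accuracy(predicted, actual):
    """Return (coverage, accuracy) for predicted vs. actual binding residues.

    coverage = |predicted ∩ actual| / |actual|     (assumed definition)
    accuracy = |predicted ∩ actual| / |predicted|  (assumed definition)
    """
    predicted, actual = set(predicted), set(actual)
    hits = predicted & actual
    coverage = len(hits) / len(actual) if actual else 0.0
    accuracy = len(hits) / len(predicted) if predicted else 0.0
    return coverage, accuracy


if __name__ == "__main__":
    # Hypothetical residue indices, for illustration only.
    predicted_sites = {12, 15, 16, 40}
    actual_sites = {15, 16, 17, 18, 41, 42, 43, 44}
    print(coverage_and_accuracy(predicted_sites, actual_sites))
```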

CiteSeerX

Crossref

PubMed Central

MUMMALS: multiple sequence alignment improved by using hidden Markov models with local structural information

Author: Altschul
Bahr
Blake
Boutonnet
Chandonia
Chivian
Cline
Dayhoff
de Bakker
Do
Durbin
Eddy
Eddy
Edgar
Edgar
Ginalski
Gotoh
Henikoff
Holm
Holm
Huang
Hubbard
Jimin Pei
Jones
Jones
Kabsch
Katoh
Kinch
Lichtarge
Marchler-Bauer
Miyazawa
Murzin
Needleman
Nick V. Grishin
Notredame
O'Sullivan
O'Sullivan
Pei
Pei
Prlic
Rost
Rychlewski
Sadreyev
Shindyalov
Simossis
Smith
Thompson
Thompson
Thompson
Van Walle
Venclovas
Wallace
Wallace
Wallner
Wang
Zemla
Zhang
Zhou
Publication venue: Oxford University Press
Publication date: 26/08/2006

We have developed MUMMALS, a program to construct multiple protein sequence alignments using probabilistic consistency. MUMMALS improves alignment quality by using pairwise alignment hidden Markov models (HMMs) with multiple match states that describe local structural information without exploiting explicit structure predictions. Parameters for such models have been estimated from a large library of structure-based alignments. We show that (i) on remote homologs, MUMMALS achieves statistically best accuracy among several leading aligners, such as ProbCons, MAFFT and MUSCLE, although the average improvement is small, on the order of several percent; (ii) a large collection (>10 000) of automatically computed pairwise structure alignments of divergent protein domains is superior to smaller but carefully curated datasets for estimation of alignment parameters and performance tests; (iii) reference-independent evaluation of alignment quality using sequence alignment-dependent structure superpositions correlates well with reference-dependent evaluation that compares sequence-based alignments to structure-based reference alignments.

Crossref

PubMed Central

New integrative tools for interactive protein structure modeling and function prediction

Author: Barbato Alessandro
Publication date: 27/02/2013

Pubblicazioni Aperte Digitali Interateneo Sapienza

Archivio della ricerca- Università di Roma La Sapienza