
1,335 research outputs found

Transmembrane helix prediction using amino acid property features and latent semantic analysis

Author: Balakrishnan N
Ganapathiraju Madhavi
Klein-Seetharaman Judith
Reddy Raj
Publication venue: BioMed Central
Publication date: 01/01/2008

Prediction of transmembrane (TM) helices by statistical methods suffers from a lack of sufficient training data. The current best methods use hundreds or even thousands of free parameters in their models, which are tuned to fit the little data available for training. Further, they are often restricted to the generally accepted topology "cytoplasmic-transmembrane-extracellular" and cannot adapt to membrane proteins that do not conform to this topology. Recent crystal structures of channel proteins have revealed novel architectures, showing that the above topology may not be as universal as previously believed. Thus, there is a need for methods that can better predict TM helices even in novel topologies and families.

Springer - Publisher Connector

PubMed Central

Open Access Repository of IISc Research Publications

PROTEUS2: a web server for comprehensive protein structure prediction and structure-based annotation

Author: Bagos
Bendtsen
Canutescu
Carter
Cuff
D. Arndt
D. S. Wishart
Eyrich
Garrow
Hall
J. A. Cruz
Jones
Kernytsky
Krogh
Liu
M. Berjanskii
Mewes
Montgomerie
Nayeem
Pollastri
Riley
Rost
S. Montgomerie
S. Shrivastava
Schwede
Stothard
Ullman
Van Domselaar
Wallner
Walther
Wishart
Publication venue: Oxford University Press

PROTEUS2 is a web server designed to support comprehensive protein structure prediction and structure-based annotation. PROTEUS2 accepts either single sequences (for directed studies) or multiple sequences (for whole proteome annotation) and predicts the secondary and, if possible, tertiary structure of the query protein(s). Unlike most other tools or servers, PROTEUS2 bundles signal peptide identification, transmembrane helix prediction, transmembrane β-strand prediction, secondary structure prediction (for soluble proteins) and homology modeling (i.e. 3D structure generation) into a single prediction pipeline. Using a combination of progressive multi-sequence alignment, structure-based mapping, hidden Markov models, multi-component neural nets and up-to-date databases of known secondary structure assignments, PROTEUS2 is able to achieve among the highest reported levels of predictive accuracy for signal peptides (Q2 = 94%), membrane spanning helices (Q2 = 87%) and secondary structure (Q3 score of 81.3%). PROTEUS2's homology modeling services also provide high quality 3D models that compare favorably with those generated by SWISS-MODEL and 3D JigSaw (within 0.2 Å RMSD). The average PROTEUS2 prediction takes ∼3 min per query sequence. The PROTEUS2 server, along with source code for many of its modules, is accessible at http://wishart.biology.ualberta.ca/proteus2
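The Q2 and Q3 figures quoted in this abstract are per-residue accuracies: the percentage of residues whose predicted state matches the observed state, over two or three states respectively. A minimal sketch of that calculation follows, assuming predictions and observations arrive as equal-length strings of state labels. The function name `q_score` and the example strings are illustrative assumptions and are not taken from PROTEUS2.

```python
def q_score(predicted: str, observed: str) -> float:
    """Return the percentage of positions where predicted and observed states agree."""
    if len(predicted) != len(observed):
        raise ValueError("sequences must have the same length")
    if not observed:
        raise ValueError("sequences must be non-empty")
    matches = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * matches / len(observed)


# Hypothetical three-state example (H = helix, E = strand, C = coil).
observed = "CCHHHHHCCEEEECC"
predicted = "CCHHHHCCCEEEEEC"
q3 = q_score(predicted, observed)  # 13 of 15 positions agree
```

The same function gives a Q2 value when the strings use only two labels, for example membrane versus non-membrane.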

Crossref

PubMed Central

Active machine learning for transmembrane helix prediction

Author: Carbonell Jaime G
Ganapathiraju Madhavi K
Osmanbeyoglu Hatice U
Wehner Jessica A
Publication venue: BioMed Central
Publication date: 01/01/2010

Background: About 30% of genes code for membrane proteins, which are involved in a wide variety of crucial biological functions. Despite their importance, experimentally determined structures correspond to only about 1.7% of protein structures deposited in the Protein Data Bank, due to the difficulty in crystallizing membrane proteins. Algorithms that can identify proteins whose high-resolution structure can aid in predicting the structure of many previously unresolved proteins are therefore of potentially high value. Active machine learning is a supervised machine learning approach that is suitable for this domain, where there are a large number of sequences but only very few have known corresponding structures. In essence, active learning seeks to identify proteins whose structure, if revealed experimentally, is maximally predictive of others. Results: An active learning approach is presented for selection of a minimal set of proteins whose structures can aid in the determination of transmembrane helices for the remaining proteins. TMpro, an algorithm for high accuracy TM helix prediction we previously developed, is coupled with active learning. We show that with a well-designed selection procedure, high accuracy can be achieved with only a few proteins. TMpro, trained with a single protein, achieved an F-score of 94% on benchmark evaluation and 91% on the MPtopo dataset, which correspond to the state-of-the-art accuracies on TM helix prediction that are usually achieved by training with over 100 training proteins. Conclusion: Active learning is suitable for bioinformatics applications, where manually characterized data are not a comprehensive representation of all possible data, and in fact can be a very sparse subset thereof. It aids in selection of data instances which, when characterized experimentally, can improve the accuracy of computational characterization of remaining raw data. The results presented here also demonstrate that the feature extraction method of TMpro is well designed, achieving a very good separation between TM and non-TM segments.
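The F-score reported in this abstract is conventionally the harmonic mean of precision and recall. The sketch below shows that standard formula computed from raw counts. It assumes the counts are already available; the abstract does not say whether the authors count residues or whole TM segments, and the function name `f_score` and the example counts are illustrative assumptions.

```python
def f_score(true_positives: int, false_positives: int, false_negatives: int) -> float:
    """Harmonic mean of precision and recall, computed from raw counts."""
    predicted = true_positives + false_positives
    actual = true_positives + false_negatives
    if predicted == 0 or actual == 0:
        return 0.0  # nothing predicted or nothing to find
    precision = true_positives / predicted
    recall = true_positives / actual
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts: 90 correct TM calls, 10 spurious calls, 5 missed.
example = f_score(90, 10, 5)
```

Because it is a harmonic mean, the score is pulled toward the weaker of precision and recall, so a predictor cannot score well by over-predicting or under-predicting TM regions.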

Crossref

Springer - Publisher Connector

PubMed Central

Carolina Digital Repository

Computational studies for prediction of protein folding and ligand binding

Author: Luo Lingqi
Publication date: 02/02/2018

This dissertation comprises four projects. I) Glycosylation is a post-translational modification that affects many physiological processes, including protein folding, cell interaction and host immune response. PglC, a phosphoglycosyl transferase (PGT) involved in the biosynthesis of N-linked glycoproteins in Campylobacter jejuni, is representative of one of the structurally simplest members of the small bacterial PGT family. The research utilizes sequence similarity network and evolutionary covariance studies to identify the catalytic core of PglC, followed by modeling its three-dimensional structure using the covariance as constraints. II) Rapid growth of fragment-based drug discovery necessitates accurate fragment library screening for targets of interest, finding strong binders with specific binding. While many high-resolution biophysical methods for fragment screening work well, docking-based virtual screening is highly desired due to the speed and cost efficiency. Beyond the key performance-determining factors like score function and search method, the goal is to learn from the experimental fragment bound structures in the PDBbinder database and to evaluate the profile of side-chain flexibility in the interface and its contribution to docking performance. III) Protein docking procedures carry out the task of predicting the structure of a protein–protein complex starting from the known structures of the individual protein components. However, the structure of one or both components frequently must be obtained by homology modeling based on known structures. This work presents a benchmark dataset of experimentally determined target complexes with a large set of sufficiently diverse template complexes identified for each target. The dataset allows developers to test their algorithms combining homology modeling and docking, in order to determine the factors that critically influence the prediction performance. 
IV) Human Eukaryotic Initiation Factor 4AI (heIF4AI) is the enzymatic component of a highly efficient complex, heIF4F. Its helicase activity binds and unwinds the secondary structure of mRNA at the 5' end and thus plays a crucial role in translation initiation. This research focuses on the C-terminal domain of heIF4AI and investigates its potential as an anti-cancer target by integrating the approaches of solvent mapping, docking, crystallization and NMR.

Boston University Institutional Repository (OpenBU)

TMFoldRec: a statistical potential-based transmembrane protein fold recognition tool.

Author: A Fiser
A Lobley
A Oberai
A Ray
AA Canutescu
AJ Heim
AL Hopkins
AS Amin
BE Suzek
BE Weiner
D Fischer
DP Ng
Dániel Kozma
EJ Tarling
F Morcos
F Palmieri
G Studer
GE Tusnády
Gábor E. Tusnády
H Wang
I Sillitoe
J Ma
J Peng
J Stefková
J Söding
K Arnold
M Punta
M Remmert
MI Sadowski
MR Dorwart
P Barth
P Bradley
PA Insel
PD Thomas
RD Finn
S Kalyaanamoorthy
S Rust
SF Altschul
SR Eddy
T Nugent
T Schöneberg
V Yarov-Yarovoy
Y Zhang
Z Dosztányi
Z Wang
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2015

BACKGROUND: Transmembrane proteins (TMPs) are key components of signal transduction, cell-cell adhesion, and the transport of energy and material into and out of cells. For a deep understanding of these processes, structure determination of transmembrane proteins is indispensable. However, due to technical difficulties, only a few transmembrane protein structures have been determined experimentally. Large-scale genomic sequencing provides increasing amounts of sequence information on the proteins and whole proteomes of living organisms, which poses a challenge for bioinformatics: how structural information should be gained from a sequence. RESULTS: Here, we present a novel method, TMFoldRec, for fold prediction of membrane segments in transmembrane proteins. TMFoldRec, which is based on statistical potentials, was tested on a benchmark set containing 124 TMP chains from the PDBTM database. Using a 10-fold jackknife method, the native folds were correctly identified in 77 % of the cases. This accuracy exceeds that of the state-of-the-art methods. In addition, a key feature of the TMFoldRec algorithm is the ability to estimate the reliability of the prediction and to decide, with an accuracy of 70 %, whether the obtained lowest energy structure is the native one. CONCLUSION: These results imply that the membrane-embedded parts of TMPs, rather than the soluble parts, dictate the TM structures. Moreover, the reliability scores attached to the predictions make our algorithm applicable for proteome-wide analyses. AVAILABILITY: The program is available upon request for academic use.

Crossref

Springer - Publisher Connector

PubMed Central

Repository of the Academy's Library

CCTOP: a Consensus Constrained TOPology prediction web server.

Author: Dobson László
Reményi István
Tusnády Gábor
Publication venue: 'Oxford University Press (OUP)'
Publication date: 01/01/2015

The Consensus Constrained TOPology prediction (CCTOP; http://cctop.enzim.ttk.mta.hu) server is a web-based application providing transmembrane topology prediction. In addition to utilizing 10 different state-of-the-art topology prediction methods, the CCTOP server incorporates topology information from existing experimental and computational sources available in the PDBTM, TOPDB and TOPDOM databases, using the probabilistic framework of a hidden Markov model. The server provides the option to precede the topology prediction with signal peptide prediction and transmembrane-globular protein discrimination. The initial result can be recalculated by (de)selecting any of the prediction methods or mapped experiments, or by adding user-specified constraints. CCTOP showed superior performance to existing approaches. The reliability of each prediction is also calculated, and it correlates with the accuracy of the per-protein topology prediction. The prediction results and the collected experimental information are visualized on the CCTOP home page and can be downloaded in XML format. Programmatic access to the CCTOP server is also available, and an example client-side script is provided.
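To give a feel for what combining several topology predictors means, the sketch below takes a per-residue majority vote across methods. This is a deliberately simplified stand-in and not CCTOP's algorithm: CCTOP combines its inputs within a hidden Markov model and can apply experimental constraints, none of which is modelled here. The label alphabet (I = inside, M = membrane, O = outside), the function name `majority_consensus` and the example predictions are all assumptions made for illustration.

```python
from collections import Counter


def majority_consensus(predictions: list[str]) -> str:
    """Pick the most frequent label at each residue position.

    Ties are resolved in favour of the label seen first at that position,
    which follows from Counter.most_common preserving first-seen order.
    """
    if not predictions:
        raise ValueError("at least one prediction is required")
    length = len(predictions[0])
    if any(len(p) != length for p in predictions):
        raise ValueError("all predictions must have the same length")
    consensus = []
    for column in zip(*predictions):
        label, _count = Counter(column).most_common(1)[0]
        consensus.append(label)
    return "".join(consensus)


# Hypothetical outputs of three methods for an 8-residue stretch.
methods = [
    "IIMMMMOO",
    "IIIMMMOO",
    "IIMMMOOO",
]
consensus = majority_consensus(methods)
```

A plain vote treats every method as equally reliable and can produce label sequences that no valid topology allows, which is one reason a constrained probabilistic model is used in practice.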

PubMed Central

Repository of the Academy's Library

The human transmembrane proteome

Author: Dobson L.
Reményi I.
Tusnády Gábor
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2015

Background: Transmembrane proteins have important roles in cells, as they are involved in energy production, signal transduction, cell-cell interaction, cell-cell communication and more. In human cells, they are frequently targets for pharmaceuticals; therefore, knowledge about their properties and structure is crucial. The topology of transmembrane proteins provides low-resolution structural information, which can be a starting point for either laboratory experiments or modelling their 3D structures. Results: Here, we present a database of the human α-helical transmembrane proteome, including the predicted and/or experimentally established topology of each transmembrane protein, together with the reliability of the prediction. In order to distinguish transmembrane proteins in the proteome as well as for topology prediction, we used a newly developed consensus method (CCTOP) that incorporates recent state-of-the-art methods, with tested accuracies on a novel human benchmark protein set. CCTOP utilizes all available structure and topology data as well as bioinformatic evidence for topology prediction in a probabilistic framework provided by the hidden Markov model. This method shows the highest accuracy (98.5 % for discriminating between transmembrane and non-transmembrane proteins and 84 % for per-protein topology prediction) among the dozen tested topology prediction methods. Analysis of the human proteome with CCTOP indicates that it contains 4998 (26 %) transmembrane proteins. Besides predicting topology, the reliability of the predictions is estimated as well, and it is demonstrated that the per-protein prediction accuracies of more than 60 % of the predictions are over 98 % on the benchmark sets and most probably on the predicted human transmembrane proteome too. Conclusions: Here, we present the most accurate prediction of the human transmembrane proteome together with the experimental topology data. These data, as well as various statistics about the human transmembrane proteins and their topologies, can be downloaded from and visualized at the website of the human transmembrane proteome (http://htp.enzim.hu). Reviewers: This article was reviewed by Dr. Sandor Pongor, Dr. Michael Galperin and Dr. Pascale Gaudet (nominated by Dr. Michael Galperin). © 2015 Dobson et al.; licensee BioMed Central

Springer - Publisher Connector

PubMed Central

Repository of the Academy's Library

PPT-DB: the protein property prediction and testing database

Author: A. C. Guo
Bendtsen
Berjanskii
Cheng
Choo
D. Arndt
D. S. Wishart
Dor
G. Lin
Garrow
Guzzo
J. Zhou
Kernytsky
Krogh
Kumar
M. Berjanskii
Maiti
Montgomerie
O'Donovan
Richards
S. Shrivastava
Schlessinger
Siepen
Wang
Willard
Wilmot
Wishart
Y. Shi
Y. Zhou
Zhang
Publication venue: Oxford University Press

The protein property prediction and testing database (PPT-DB) is a database housing nearly 30 carefully curated databases, each of which contains commonly predicted protein property information. These properties include both structural (i.e. secondary structure, contact order, disulfide pairing) and dynamic (i.e. order parameters, B-factors, folding rates) features that have been measured, derived or tabulated from a variety of sources. PPT-DB is designed to serve two purposes. First it is intended to serve as a centralized, up-to-date, freely downloadable and easily queried repository of predictable or ‘derived’ protein property data. In this role, PPT-DB can serve as a one-stop, fully standardized repository for developers to obtain the required training, testing and validation data needed for almost any kind of protein property prediction program they may wish to create. The second role that PPT-DB can play is as a tool for homology-based protein property prediction. Users may query PPT-DB with a sequence of interest and have a specific property predicted using a sequence similarity search against PPT-DB's extensive collection of proteins with known properties. PPT-DB exploits the well-known fact that protein structure and dynamic properties are highly conserved between homologous proteins. Predictions derived from PPT-DB's similarity searches are typically 85–95% correct (for categorical predictions, such as secondary structure) or exhibit correlations of >0.80 (for numeric predictions, such as accessible surface area). This performance is 10–20% better than what is typically obtained from standard ‘ab initio’ predictions. PPT-DB, its prediction utilities and all of its contents are available at http://www.pptdb.c

Crossref

PubMed Central


Scoring functions for protein docking and drug design

Author: Viswanath Shruthi
Publication date: 26/06/2014

Predicting the structure of complexes formed by two interacting proteins is an important problem in computational structural biology. Proteins perform many of their functions by binding to other proteins. The structure of protein-protein complexes provides atomic details about protein function and biochemical pathways, and can help in designing drugs that inhibit binding. Docking computationally models the structure of protein-protein complexes, given three-dimensional structures of the individual chains. Protein docking methods have two phases. In the first phase, a comprehensive, coarse search is performed for optimally docked models. In the second phase, refinement and reranking, the models from the first phase are refined and reranked, with the expectation of extracting a small set of accurate models from the pool of thousands of models obtained from the first phase. In this thesis, new algorithms are developed for the refinement and reranking phase of docking. New scoring functions, or potentials, that rank models are developed. These potentials are learnt using large-scale machine learning methods based on mathematical programming. The procedure for learning these potentials involves examining hundreds of thousands of correct and incorrect models. In this thesis, hierarchical constraints were introduced into the learning algorithm. First, an atomic potential was developed using this learning procedure. A refinement procedure involving side-chain remodeling and conjugate gradient-based minimization was introduced. The refinement procedure combined with the atomic potential was shown to improve docking accuracy significantly. Second, a hydrogen bond potential was developed. Molecular dynamics-based sampling combined with the hydrogen bond potential improved docking predictions. Third, mathematical programming compared favorably to SVMs and neural networks in terms of accuracy, training and test time for the task of designing potentials to rank docking models. The methods described in this thesis are implemented in the docking package DOCK/PIERR. DOCK/PIERR was shown to be among the best automated docking methods in community-wide assessments. Finally, DOCK/PIERR was extended to predict membrane protein complexes. A membrane-based score was added to the reranking phase, and shown to improve the accuracy of docking. This docking algorithm for membrane proteins was used to study the dimers of amyloid precursor protein, implicated in Alzheimer's disease.

Texas ScholarWorks