3,450 research outputs found

Prediction of Carbohydrate-Binding Proteins from Sequences Using Support Vector Machines

Author: Cao Wei
Ge Zhenyi
Hirose Osamu
Kakuta Masanori
Morita Mizuki
Nakamura Shugo
Shimizu Kentaro
Someya Seizi
Sumikoshi Kazuya
Terada Tohru
Publication venue: Hindawi Publishing Corporation
Publication date: 01/01/2010

Carbohydrate-binding proteins are proteins that can interact with sugar chains but do not modify them. They are involved in many physiological functions, and we have developed a method for predicting them from their amino acid sequences. Our method is based on support vector machines (SVMs). We first clarified the definition of carbohydrate-binding proteins and then constructed positive and negative datasets with which the SVMs were trained. In a leave-one-out test on these datasets, our method achieved an area under the receiver operating characteristic (ROC) curve of 0.92. We also examined two amino acid grouping methods that enable effective learning of sequence patterns and evaluated their performance. When we applied our method in combination with a homology-based prediction method to the annotated human genome database, H-invDB, the true positive rate of prediction improved.
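
The area under the ROC curve reported above can be read as the probability that a randomly chosen positive example is scored higher than a randomly chosen negative one. The sketch below computes it that way from a list of classifier scores. The labels and scores are made-up illustrative values, not data from the paper, and the function name `roc_auc` is hypothetical.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via the pairwise (Mann-Whitney) identity."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    # Count positive/negative pairs that are ranked correctly; ties count as half.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Illustrative values only (assumed): e.g. decision values collected
# one held-out sequence at a time during a leave-one-out test.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2]
auc = roc_auc(labels, scores)
print(f"AUC = {auc:.3f}")
```

In a leave-one-out setting, each score would come from a model trained on all other examples, and the AUC is then computed once over the pooled held-out scores.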

Crossref

Directory of Open Access Journals

PubMed Central

The interplay of descriptor-based computational analysis with pharmacophore modeling builds the basis for a novel classification scheme for feruloyl esterases

Author: Akin
Altschul
Andersen
Andreasen
Aurilia
Barnum
Bartolomé
Bendtsen
Benner
Benoit
Benoit
Bhasin
Bhasin
Blum
Cai
Cai
Castanares
Chang
Choi
Crepin
D.B.R.K. Gupta Udatha
Dodd
Donaghy
Donaghy
Dudoit
Dysvik
Ewing
Faulds
Ferguson
Fillingham
Finn
Garcia-Conesa
García-Conesa
Garrigues
Gasteiger
Gasteiger
Gianni Panagiotou
Giuliani
Goldstone
Hall
Han
Hatzakis
Henikoff
Hermoso
Hsu
Humberstone
Huson
Irene Kouskoumvekaki
Kaiser
Karchin
Keerthi
Kheder
Kikuzaki
Kim
Kohavi
Kohonen
Koseki
Koseki
Kroon
Kroon
Kumar
Lao
Larkin
Laszlo
Latha
Lee
Lesage-Meessen
Levasseur
Levasseur
Li
Lima
Lisbeth Olsson
MacKay
Marcotte
McAuley
Meinicke
Morris
Mukherjee
Nielsen
Noble
Nsereko
Oili
Ong
Platt
Prates
Pérez-Bercoff
Rashamuse
Record
Rost
Sancho
Sankararaman
Sankararaman
Schubot
Slavin
Tarbouriech
Teodoro
Thompson
Tomoko
Topakas
Topakas
Topakas
Topakas
Topakas
Tsuchiyama
Tsuchiyama
Uestuen
Vafiadi
Vafiadi
Vafiadi
Vafiadi
Vafiadi
Vafiadi
Wang
Wang
Wang
Wilkinson
Publication date: 11/08/2010

One of the most intriguing groups of enzymes, the feruloyl esterases (FAEs), is ubiquitous in both simple and complex organisms. FAEs have gained importance in the biofuel, medicine and food industries because they act on a large range of substrates, cleaving ester bonds and synthesizing high-added-value molecules through esterification and transesterification reactions. During the past two decades, extensive studies have been carried out on the production and partial characterization of FAEs from fungi, while much less is known about FAEs of bacterial or plant origin. Initial classification studies on FAEs were restricted to sequence similarity and substrate specificity on just four model substrates and considered only a handful of FAEs belonging to the fungal kingdom. This study centers on the descriptor-based classification and structural analysis of experimentally verified and putative FAEs; nevertheless, the framework presented here is applicable to every poorly characterized enzyme family. We collected 365 FAE-related sequences of fungal, bacterial and plant origin and clustered them into distinct groups using Self-Organizing Maps followed by k-means clustering, based on amino acid composition and physico-chemical composition descriptors derived from the respective amino acid sequences. A Support Vector Machine model was subsequently constructed for the classification of new FAEs into the pre-assigned clusters. The model successfully recognized 98.2% of the training sequences and all the sequences of the blind test. The underlying functionality of the 12 proposed FAE families was validated against a combination of prediction tools and published experimental data. Another important aspect of the present work is the development of pharmacophore models for the new FAE families for which sufficient information on known substrates existed. Knowing the pharmacophoric features of a small molecule that are essential for binding to the members of a certain family opens a window of opportunities for tailored applications of FAEs.
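
An amino acid composition descriptor of the kind mentioned above is simply the fraction of each of the 20 standard residues in a sequence. The following is a minimal sketch under that assumption; the function name and the example sequence are hypothetical, and the paper's exact descriptor set (including its physico-chemical descriptors) is not reproduced here.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues, fixed order


def aa_composition(seq):
    """Return a 20-dimensional vector of residue fractions for one sequence."""
    counts = {a: 0 for a in AMINO_ACIDS}
    valid = 0
    for ch in seq.upper():
        if ch in counts:  # skip ambiguous or non-standard symbols such as X
            counts[ch] += 1
            valid += 1
    if valid == 0:
        raise ValueError("sequence contains no standard amino acids")
    return [counts[a] / valid for a in AMINO_ACIDS]


# Hypothetical toy sequence, for illustration only.
vec = aa_composition("MKKLLA")
print(dict(zip(AMINO_ACIDS, vec)))
```

Because every sequence maps to a vector of the same length regardless of its own length, such vectors can be passed directly to clustering methods or to an SVM.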

Crossref

Chalmers Research

Nature Precedings

Online Research Database In Technology

Chalmers Publication Library

HKU Scholars Hub

Identification of Mannose Interacting Residues Using Local Composition

Author: A Garg
A Koch
A Malik
A Malik
Anna Tramontano
C Shionyu-Mitsuyama
C Taroni
E Jeong
F Larsen
F Larsen
FA Quiocho
Gajendra P. S. Raghava
GP Raghava
H Kaur
H Kaur
H Nassif
Harinder Singh
HR Ansari
IB Kuznetsov
JS Chauhan
K Julenius
L Sompayrac
LH Bouwman
M Kulharia
M Kumar
M Kumar
M Muraki
M Patra
M Rashid
M Rashid
MM Gromiha
MS Sujatha
N Bhardwaj
Nitish Kumar Mishra
NK Mishra
RA Bauer
S Ahmad
S Hakomori
Sandhya Agarwal
SF Altschul
T Joachims
V Sobolev
VSR Rao
Publication venue: Public Library of Science
Publication date: 01/01/2011

BACKGROUND: Mannose binding proteins (MBPs) play a vital role in several biological functions such as defense mechanisms. These proteins bind to mannose on the surface of a wide range of pathogens and help eliminate these pathogens from the body. Thus, it is important to identify mannose interacting residues (MIRs) in order to understand how MBPs recognize pathogens. RESULTS: This paper describes modules developed for predicting MIRs in a protein. Support vector machine (SVM) based models were developed on 120 mannose binding protein chains, no two of which share more than 25% sequence similarity. SVM models were developed on two types of datasets: 1) a main dataset consisting of 1029 mannose interacting and 1029 non-interacting residues, and 2) a realistic dataset consisting of 1029 mannose interacting and 10320 non-interacting residues. First, we developed standard modules using binary and PSSM profiles of patterns and obtained a maximum MCC of around 0.32. Second, we developed SVM modules using the composition profile of patterns and achieved a maximum MCC of around 0.74 with an accuracy of 86.64% on the main dataset. Third, we developed a model on the realistic dataset and achieved a maximum MCC of 0.62 with an accuracy of 93.08%. Based on this study, a standalone program and web server have been developed for predicting mannose interacting residues in proteins (http://www.imtech.res.in/raghava/premier/). CONCLUSIONS: Compositional analysis of mannose interacting and non-interacting residues shows that certain types of residues are preferred in mannose interaction. The residues surrounding mannose interacting residues also show a preference for certain residue types. The composition of patterns (peptides or segments) was used for predicting MIRs and achieved reasonably high accuracy. This strategy may also be effective for predicting other types of interacting residues. This study will be useful in annotating protein function as well as in understanding the role of mannose in the immune system.
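
The abstract reports both accuracy and the Matthews correlation coefficient (MCC). On an imbalanced dataset, such as the realistic one with roughly ten non-interacting residues per interacting residue, the two can diverge sharply. The sketch below computes both from confusion-matrix counts. The counts are invented to illustrate the effect and are not taken from the paper.

```python
import math


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0  # common convention when a row or column of the matrix is empty
    return (tp * tn - fp * fn) / denom


def accuracy(tp, tn, fp, fn):
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)


# Hypothetical 1:10 imbalanced set where the classifier predicts
# "non-interacting" for every residue.
counts = dict(tp=0, tn=1000, fp=0, fn=100)
print(f"accuracy = {accuracy(**counts):.3f}, MCC = {mcc(**counts):.3f}")
```

Here accuracy is above 90% even though the classifier finds no interacting residues at all, while MCC is 0. That is why MCC is the more informative figure when comparing results on the main and realistic datasets.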

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

An Introduction to Bioinformatics for Glycomics Research

Author: A Bohne-Lang
A Suga
A Varki
A Varki
B Eisenhaber
B Scholkop
B Scholkopf
CA Cooper
CW von der Lieth
CW von der Lieth
D Goldberg
E Banin
FJ Krambeck
Fran Lewitter
H Tang
HH Freeze
J Irungu
JE Hansen
K Hashimoto
K Hashimoto
K Julenius
K Maass
K Ohtsubo
KF Aoki
KF Aoki
KF Aoki
KF Aoki-Kinoshita
Kiyoko F. Aoki-Kinoshita
KK Lohmann
M Dayhoff
M Diligenti
N Fankhauser
N Ueda
NH Packer
P Umana
R Apweiler
R Gupta
R Raman
RS Green
S Doubet
S Hakomori
S Henikoff
S Kawano
T Kuboyama
T Lütteke
T Lütteke
Y Hizukuri
Y Yamanishi
Publication venue: Public Library of Science
Publication date: 01/05/2008

Crossref

Directory of Open Access Journals

PubMed Central

Analysis and prediction of cancerlectins using evolutionary and domain information

Author: A Garg
A Thies
Bharat Panwar
C Chen
CK Ching
D Damodaran
E Gorelik
EG De Mejia
EM Zdobnov
FT Liu
Gajendra PS Raghava
GD Fasman
GR Vasta
H Ding
H Kaur
H Kaur
H Kaur
H Lis
J Adam
Jagat S Chauhan
KA Mahon
KC Chou
KV Brinda
L Jun-Wei
M Bhasin
M Jogindra Swamy
M Kumar
M Kumar
M Kumar
M Rashid
M Vijayan
N Sharon
P Baldi
R Duncan
R Kaundal
R Lotan
R Verma
R Verma
Ravi Kumar
S Dejun
S Hu
S Nakahara
SF Altschul
SF Altschul
SH Choi
T Joachims
T Shirai
T Szoke
U Schumacher
U Schumacher
V Vapnik
WR Pearson
XF Bai
Y K Song
YK Song
Publication venue: BioMed Central
Publication date: 01/01/2011

Background: Predicting the function of a protein is one of the major challenges in the post-genomic era, in which a large number of protein sequences of unknown function are accumulating rapidly. Lectins are proteins that specifically recognize and bind to carbohydrate moieties present on either proteins or lipids. Cancerlectins are those lectins that play various important roles in tumor cell differentiation and metastasis. Although the two types of proteins are related, no computational method is yet available that can distinguish cancerlectins from the large pool of non-cancerlectins. Hence, it is imperative to develop a method that can distinguish between cancerlectins and non-cancerlectins. Results: All the models developed in this study are based on a non-redundant dataset containing 178 cancerlectins and 226 non-cancerlectins in which no two sequences have more than 50% sequence similarity. We applied a similarity-search technique, BLAST, and achieved a maximum accuracy of 43.25%. Amino acid compositional analysis showed that certain residues (e.g. leucine, proline) were preferred in cancerlectins whereas others (e.g. aspartic acid, asparagine) were preferred in non-cancerlectins. The PROSITE domain "Crystalline beta gamma" was abundant in cancerlectins, whereas domains such as "SUEL-type lectin domain" were found mainly in non-cancerlectins. An SVM-based model using amino acid compositions achieved a maximum Matthews correlation coefficient (MCC) of 0.32 with an accuracy of 64.84%. A model based on dipeptide compositions achieved an MCC of 0.30 with an accuracy of 64.84%. Models based on split compositions (2 and 4 parts) achieved MCC values of 0.31 and 0.32 with accuracies of 65.10% and 66.09%, respectively. An SVM model based on the Position Specific Scoring Matrix (PSSM) generated by PSI-BLAST achieved an MCC of 0.36 with an accuracy of 68.34%. Finally, we integrated the PROSITE domain information with the PSSM and developed an SVM model that achieved an MCC of 0.38 with 69.09% accuracy. Conclusion: BLAST was found to be ineffective at distinguishing between cancerlectins and non-cancerlectins. We analyzed the protein sequences of cancerlectins and non-cancerlectins and identified interesting patterns. We identified PROSITE domains that are preferred in each of the two classes, which provides insight into the two types of proteins. The method developed in this study will be useful for researchers studying cancerlectins, lectins and cancer biology. The web server based on this study is available at http://www.imtech.res.in/raghava/cancer_pred/
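
Dipeptide composition, one of the feature types listed above, extends single-residue composition to ordered residue pairs, giving a fixed 400-dimensional vector per sequence. The sketch below shows one common way to compute it. The function name and the toy sequence are hypothetical, and the paper's exact normalization is an assumption.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# All 20 x 20 ordered residue pairs, in a fixed order.
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]


def dipeptide_composition(seq):
    """Return a 400-dimensional vector of overlapping dipeptide fractions."""
    seq = seq.upper()
    counts = dict.fromkeys(DIPEPTIDES, 0)
    total = 0
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        if pair in counts:  # skip pairs containing non-standard symbols
            counts[pair] += 1
            total += 1
    if total == 0:
        raise ValueError("sequence contains no standard dipeptides")
    return [counts[d] / total for d in DIPEPTIDES]


# Toy sequence, for illustration only: its pairs are AC, CA, AC.
dp = dipeptide_composition("ACAC")
print(dp[DIPEPTIDES.index("AC")], dp[DIPEPTIDES.index("CA")])
```

Unlike plain composition, this representation retains some local order information, at the cost of a much sparser vector for short sequences.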

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

GOPred: GO Molecular Function Prediction by Combined Classifiers

Author: A Arampatzis
A Bairoch
A Ben-Hur
A Fernandes
A Sokolov
A Yildiz
AH Liu
B Vogelstein
BE Engelhardt
BO Bodemann
BYM Cheng
C Altay
C Pasquier
C Zhai
CS Leslie
CZ Cai
D Demos
DMA Martin
DT Holloway
F Wilcoxon
H Hasumi
I Friedberg
I Melvin
J Kittler
JG Shanahan
JTL Wang
K Blekas
L Jensen
MN Wass
Niall James Haslam
O Sasson
OS Sarac
P Rice
PA McChesney
R Eisner
R Karchin
R Schwanbeck
RD King
Rengul Cetin-Atalay
RO Duda
S Tanaka
SF Altschul
SF Altschul
SS Hannenhalli
SY Sohn
T Cover
T Hawkins
V Costa
V Kunik
Volkan Atalay
WR Gilks
WW Colby
X Wang
Y Guermeur
Y jig Cho
Ömer Sinan Saraç
Publication venue: Public Library of Science
Publication date: 01/01/2010

Functional protein annotation is an important matter for in vivo and in silico biology. Several computational methods have been proposed that make use of a wide range of features such as motifs, domains, homology, structure and physicochemical properties. No single method performs best in all functional classification problems, because the information obtained from any of these features depends on the function to be assigned to the protein. In this study, we present a novel approach that combines different methods to better represent protein function. First, we formulated the function annotation problem as a classification problem defined on 300 different Gene Ontology (GO) terms from the molecular function aspect. We presented a method to form positive and negative training examples that takes into account the directed acyclic graph (DAG) structure and evidence codes of GO. We applied three different methods and their combinations. Results show that combining different methods improves prediction accuracy in most cases. The proposed method, GOPred, is available as an online computational annotation tool (http://kinaz.fen.bilkent.edu.tr/gopred).
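
To illustrate why the DAG structure matters when forming training sets, the sketch below builds positive and negative examples for one target term on a toy ontology. The selection rule is an assumption made for illustration and is not necessarily GOPred's rule: a protein is positive if it is annotated with the target or any descendant, negative if it has no relation to the target, and set aside if it is annotated only with an ancestor of the target, since it may simply be annotated less specifically. Evidence codes are not modelled. All term names, protein IDs and function names are hypothetical.

```python
def ancestors(term, parents):
    """All terms reachable from `term` by following parent links in the DAG."""
    seen, stack = set(), [term]
    while stack:
        for p in parents.get(stack.pop(), ()):
            if p not in seen:
                seen.add(p)
                stack.append(p)
    return seen


def training_sets(target, annotations, parents):
    """Split proteins into positives and negatives for one target term."""
    target_ancestors = ancestors(target, parents)
    positives, negatives = set(), set()
    for protein, terms in annotations.items():
        # Propagate each annotation upward: a term implies all its ancestors.
        closure = set(terms)
        for t in terms:
            closure |= ancestors(t, parents)
        if target in closure:
            positives.add(protein)
        elif not (set(terms) & target_ancestors):
            negatives.add(protein)
        # Otherwise the protein is annotated only with an ancestor: ambiguous.
    return positives, negatives


# Toy ontology (child -> parents) and annotations; not real GO identifiers.
parents = {
    "protein_kinase": ["kinase"],
    "kinase": ["catalytic"],
    "catalytic": ["molecular_function"],
    "binding": ["molecular_function"],
}
annotations = {
    "P1": {"protein_kinase"},
    "P2": {"binding"},
    "P3": {"catalytic"},
    "P4": {"kinase"},
}
pos, neg = training_sets("kinase", annotations, parents)
print(sorted(pos), sorted(neg))
```

In this toy case P3 ends up in neither set, which reflects the general point that ignoring the graph structure would mislabel less specifically annotated proteins as negatives.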

CiteSeerX

Crossref

Bilkent University Institutional Repository

PubMed Central

OpenMETU (Middle East Technical University)

Identification of RNA Binding Proteins and RNA Binding Residues Using Effective Machine Learning Techniques

Author: Khanal Reecha
Publication venue: ScholarWorks@UNO
Publication date: 01/04/2019

Identification and annotation of RNA Binding Proteins (RBPs) and RNA binding residues from sequence information alone is one of the most challenging problems in computational biology. RBPs play crucial roles in several fundamental biological functions, including transcriptional regulation of RNAs, RNA metabolism, and splicing. Existing experimental techniques are time-consuming and costly. Thus, efficient computational identification of RBPs directly from the sequence can be useful for annotating RBPs and assisting experimental design. Here, we introduce AIRBP, a computational sequence-based method that uses features extracted from evolutionary information, physicochemical properties, and disordered properties to train a stacking-based machine learning model for effective prediction of RBPs. It makes use of machine learning algorithms such as Support Vector Machine, Logistic Regression, K-Nearest Neighbor and XGBoost (Extreme Gradient Boosting). In this work, we also propose a second predictor for annotating RNA binding residues. This residue predictor also uses stacking and evolutionary algorithms, and likewise draws on evolutionary, physicochemical and disordered properties to train a robust model. This thesis presents a possible solution to the RBP and RNA binding residue prediction problems through two independent predictors, both of which outperform existing state-of-the-art approaches.

University of New Orleans

Bacteriophage-host determinants: identification of bacteriophage receptors through machine learning techniques

Author: Araújo Pedro Henrique Matela Aidos Manso de
Publication date: 01/01/2021

Master's dissertation in Bioinformatics. Bacterial resistance to antibiotics is becoming a major concern. Several reports indicate that bacteria are developing resistance mechanisms to various antibiotics. Moreover, the processes involved in the development of new antibiotics are lengthy and expensive. Therefore, an alternative to antibiotics is needed. One promising alternative is bacteriophages, viruses that specifically infect bacteria and cause their lysis. Hence, it would be valuable to discover which bacteria a specific phage recognizes. Phage specificity is determined by the bacterial surface receptors that a phage can recognize: phages use tail spikes/fibres as receptor-binding proteins to detect carbohydrates or proteins on the bacterial surface. Studying interactions between phage tail spikes/fibres and bacterial receptors can allow the identification of interaction pairs. Machine learning algorithms can be used to find patterns in these interactions and build models to make predictions. In this work, PhageHost, a tool that predicts hosts at the strain level for three species, E. coli, K. pneumoniae and A. baumannii, was developed. Data were extracted from GenBank, including general, protein and coding information for both phages and bacteria. The protein data were used to build a phage protein function database that allowed the classification of protein functions, namely phage tail spikes/fibres. Finally, several machine learning models using relevant protein features were created to predict phage-host strain interactions. Compared with previous work, these models show better predictive power and the ability to perform strain-level predictions. For the best model, a Matthews correlation coefficient (MCC) of 96.6% and an F-score of 98.3% were obtained. The best predictive models were implemented online, on a server under the name PhageHost (https://galaxy.bio.di.uminho.pt).

Universidade do Minho: RepositoriUM