Search CORE

18 research outputs found

Efficacy of different protein descriptors in predicting protein functional families

Author: Cao Z.
Chen Y.Z.
Li Z.R.
Lin H.H.
Ong S.A.K.
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2007

DOI: 10.1186/1471-2105-8-300; BMC Bioinformatics 8; BBMI

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

ScholarBank@NUS

Efficacy of different protein descriptors in predicting protein functional families using support vector machine

Author: ONG AI KIANG SERENE
Publication date: 29/01/2008

Master's thesis, MASTER OF SCIENCE (PHARMACY)

ScholarBank@NUS

Enzyme classification with peptide programs: a comparative study

Author: André O Falcão
António EN Ferreira
Daniel Faria
Publication venue: BioMed Central
Publication date: 01/01/2009

Abstract
Background: Efficient and accurate prediction of protein function from sequence is one of the standing problems in biology. The generalised use of sequence alignments for inferring function promotes the propagation of errors, and there are limits to its applicability. Several machine learning methods have been applied to predict protein function, but they lose much of the information encoded by protein sequences because they need to transform them to obtain data of fixed length.
Results: We have developed a machine learning methodology, called peptide programs (PPs), to deal directly with protein sequences and compared its performance with that of Support Vector Machines (SVMs) and BLAST in detailed enzyme classification tasks. Overall, the PPs and SVMs had similar performance in terms of the Matthews Correlation Coefficient, but the PPs generally had higher precision. BLAST performed better overall than both methodologies, but the PPs had better results than BLAST and SVMs for the smaller datasets.
Conclusion: The higher precision of the PPs in comparison to the SVMs suggests that dealing with sequences is advantageous for detailed protein classification, as precision is essential to avoid annotation errors. The fact that the PPs performed better than BLAST for the smaller datasets demonstrates the potential of the methodology, but the drop in performance observed for the larger datasets indicates that further development is required. Possible strategies to address this issue include partitioning the datasets into smaller subsets and training individual PPs for each subset, or training several PPs for each dataset and combining them using a bagging strategy.
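The abstract compares methods by the Matthews Correlation Coefficient (MCC) and by precision. A short illustration shows how two classifiers can tie on MCC yet differ in precision. It computes both metrics from binary confusion-matrix counts. It is a minimal sketch and not the evaluation code used in the study. The function names and the counts in the usage lines are hypothetical.

```python
import math


def precision(tp: int, fp: int) -> float:
    """Fraction of predicted positives that are correct: TP / (TP + FP)."""
    return tp / (tp + fp) if tp + fp else 0.0


def matthews_corrcoef(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews Correlation Coefficient from binary confusion-matrix counts.

    MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)).
    By a common convention, 0.0 is returned when any marginal sum is zero.
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


if __name__ == "__main__":
    # Hypothetical counts (not from the study): A predicts "positive" sparingly,
    # B predicts it more liberally. Tuples are (tp, tn, fp, fn).
    classifiers = {"A": (40, 138, 2, 20), "B": (50, 128, 12, 10)}
    for name, (tp, tn, fp, fn) in classifiers.items():
        print(name, round(precision(tp, fp), 3), round(matthews_corrcoef(tp, tn, fp, fn), 3))
```

With these made-up counts the two classifiers have nearly the same MCC (about 0.73 vs 0.74). A's precision (about 0.95) is clearly higher than B's (about 0.81). A conservative classifier trades missed positives for fewer false annotations in this way.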

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Universidade de Lisboa: Repositório.UL

Predicting Bevirimat resistance of HIV-1 from genotype

Author: Daniel Hoffmann
Dominik Heider
Jens Verheyen
Publication venue: BioMed Central
Publication date: 01/01/2010

Abstract
Background: Maturation inhibitors are a new class of antiretroviral drugs. Bevirimat (BVM) was the first substance in this class of inhibitors entering clinical trials. While the inhibitory function of BVM is well established, the molecular mechanisms of action and resistance are not well understood. It is known that mutations in the regions CS p24/p2 and p2 can cause phenotypic resistance to BVM. We have investigated a set of p24/p2 sequences of HIV-1 of known phenotypic resistance to BVM to test whether BVM resistance can be predicted from sequence, and to identify possible molecular mechanisms of BVM resistance in HIV-1.
Results: We used artificial neural networks and random forests with different descriptors for the prediction of BVM resistance. Random forests with hydrophobicity as descriptor performed best and classified the sequences with an area under the Receiver Operating Characteristics (ROC) curve of 0.93 ± 0.001. For the collected data we find that p2 sequence positions 369 to 376 have the highest impact on resistance, with positions 370 and 372 being particularly important. These findings are in partial agreement with other recent studies. Apart from the complex machine learning models we derived a number of simple rules that predict BVM resistance from sequence with surprising accuracy. According to computational predictions based on the data set used, cleavage sites are usually not shifted by resistance mutations. However, we found that resistance mutations could shorten and weaken the α-helix in p2, which hints at a possible resistance mechanism.
Conclusions: We found that BVM resistance of HIV-1 can be predicted well from the sequence of the p2 peptide, which may prove useful for personalized therapy if maturation inhibitors reach clinical practice. Results of secondary structure analysis are compatible with a possible route to BVM resistance in which mutations weaken a six-helix bundle discovered in recent experiments, and thus ease Gag cleavage by the retroviral protease.
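The study reports classifier performance as the area under the ROC curve (AUC). AUC can be computed directly from classifier scores through its rank-based (Mann-Whitney) interpretation. It is the probability that a randomly chosen resistant sequence is scored above a randomly chosen susceptible one, with ties counting one half. What follows is a minimal sketch and not the evaluation code used in the study. The function name roc_auc and the example scores are hypothetical.

```python
from typing import Sequence


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Area under the ROC curve via the Mann-Whitney U formulation.

    labels: 1 for the positive class (e.g. resistant), 0 for negative.
    Counts, over all positive/negative pairs, how often the positive gets
    the higher score; ties contribute 0.5. Quadratic time, for illustration.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Hypothetical classifier scores for five sequences (1 = resistant).
    print(roc_auc([0.9, 0.8, 0.4, 0.35, 0.1], [1, 1, 0, 1, 0]))
```

In the usage line, five of the six resistant/susceptible pairs are ordered correctly, so the AUC is 5/6. Production code would normally use a sort-based O(n log n) variant or a library routine instead of the pairwise loop.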

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Machine learning on normalized protein sequences

Author: Daniel Hoffmann
Dominik Heider
Jens Verheyen
Publication venue: BioMed Central
Publication date: 01/01/2011

Abstract
Background: Machine learning techniques have been widely applied to biological sequences, e.g. to predict drug resistance in HIV-1 from sequences of drug target proteins and protein functional classes. As deletions and insertions are frequent in biological sequences, a major limitation of current methods is the inability to handle varying sequence lengths.
Findings: We propose to normalize sequences to uniform length. To this end, we tested one linear and four different non-linear interpolation methods for the normalization of sequence lengths of 19 classification datasets. Classification tasks included prediction of HIV-1 drug resistance from drug target sequences and sequence-based prediction of protein function. We applied random forests to the classification of sequences into "positive" and "negative" samples. Statistical tests showed that the linear interpolation outperforms the non-linear interpolation methods in most of the analyzed datasets, while in a few cases non-linear methods had a small but significant advantage. Compared to other published methods, our prediction scheme leads to an improvement in prediction accuracy by up to 14%.
Conclusions: We found that machine learning on sequences normalized by simple linear interpolation gave better or at least competitive results compared to state-of-the-art procedures, and thus, is a promising alternative to existing methods, especially for protein sequences of variable length.
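The normalization step can be illustrated with a minimal sketch. It assumes each residue is first mapped to a numeric descriptor. Here that descriptor is the Kyte-Doolittle hydropathy scale, an assumption chosen for illustration. The resulting profile is then resampled to a fixed length with linear interpolation, the variant the abstract reports as best in most datasets. The four non-linear methods are not shown. The helper names encode and normalize_length are hypothetical.

```python
from typing import List

# Kyte-Doolittle hydropathy values; using this particular scale is an
# illustrative assumption, not necessarily the descriptor used in the study.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def encode(seq: str) -> List[float]:
    """Map each residue of a protein sequence to its hydropathy value."""
    return [KYTE_DOOLITTLE[aa] for aa in seq.upper()]


def normalize_length(values: List[float], target_len: int) -> List[float]:
    """Linearly interpolate a numeric profile onto target_len evenly spaced points.

    The first and last points of the output coincide with the first and last
    residues; intermediate points blend the two nearest original values.
    """
    n = len(values)
    if n == 0 or target_len < 2:
        raise ValueError("need a non-empty profile and target_len >= 2")
    if n == 1:
        return [values[0]] * target_len
    out = []
    for i in range(target_len):
        x = i * (n - 1) / (target_len - 1)  # position on the original index axis
        lo = int(x)                         # floor, since x >= 0
        hi = min(lo + 1, n - 1)
        frac = x - lo
        out.append(values[lo] * (1 - frac) + values[hi] * frac)
    return out


if __name__ == "__main__":
    print(normalize_length(encode("ACD"), 5))
```

Every profile is resampled to the same number of points. Sequences of different lengths, including those with insertions or deletions, can then be given to a standard classifier such as a random forest without alignment.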

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Machine learning based prediction of esterases' promiscuity

Author: Universitat Autònoma de Barcelona. Facultat de Biociències
Xiang Ruite
Publication date: 01/01/2020

Enzymes are of great interest to most industries; however, their characterization in the laboratory is costly and very laborious, which has driven the development of technologies for predicting enzyme activities. Despite this, industrial enzymes must have very specific properties, such as high specificity, high activity under non-biological conditions and high promiscuity, characteristics that are not well covered by current prediction tools. For this reason, this project attempts to mitigate the problem by creating binary classifiers that can predict the promiscuity of esterases.
Enzymes are of great interest for a vast variety of industries; however, experimental characterization is very time-consuming and expensive. Moreover, industrial enzymes need to adapt to non-biological conditions while maintaining high activity, promiscuity and stereoselectivity, properties that are currently not well covered by prediction technologies, which means that their characterization still relies solely on experimentation. This project aims to mitigate the problem by developing binary classifiers and multi-classifiers that can predict the promiscuity of esterases, one of the many industrially relevant enzymes.

Diposit Digital de Documents de la UAB

Effect of Features Generated from Adjacent and Overlapped Segments in Protein Sequence Classification

Author: Mohammad Reza Faisal
モハマドレザファイサル
Publication date: 26/09/2018

13301, Degree No. Kō 4828, Doctor of Engineering, Kanazawa University. Doctoral dissertation summary (Abstract)

Institutional Repositories DataBase (IRDB)

Kanazawa University Repository for Academic Resources

Prediction of lung tumor types based on protein attributes by machine learning algorithms

Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2013

Crossref

The interplay of descriptor-based computational analysis with pharmacophore modeling builds the basis for a novel classification scheme for feruloyl esterases

Author: D.B.R.K. Gupta Udatha
Gianni Panagiotou
Irene Kouskoumvekaki
Lisbeth Olsson
Publication date: 11/08/2010

One of the most intriguing groups of enzymes, the feruloyl esterases (FAEs), is ubiquitous in both simple and complex organisms. FAEs have gained importance in the biofuel, medicine and food industries due to their capability of acting on a wide range of substrates for cleaving ester bonds and synthesizing high-added-value molecules through esterification and transesterification reactions. During the past two decades extensive studies have been carried out on the production and partial characterization of FAEs from fungi, while much less is known about FAEs of bacterial or plant origin. Initial classification studies on FAEs were restricted to sequence similarity and substrate specificity on just four model substrates and considered only a handful of FAEs belonging to the fungal kingdom. This study centers on the descriptor-based classification and structural analysis of experimentally verified and putative FAEs; nevertheless, the framework presented here is applicable to every poorly characterized enzyme family. In total, 365 FAE-related sequences of fungal, bacterial and plant origin were collected and clustered into distinct groups using Self-Organizing Maps followed by k-means clustering, based on amino acid composition and physico-chemical composition descriptors derived from the respective amino acid sequences. A Support Vector Machine model was subsequently constructed for the classification of new FAEs into the pre-assigned clusters. The model successfully recognized 98.2% of the training sequences and all the sequences of the blind test. The underlying functionality of the 12 proposed FAE families was validated against a combination of prediction tools and published experimental data. Another important aspect of the present work involves the development of pharmacophore models for the new FAE families, for which sufficient information on known substrates existed.
Knowing the pharmacophoric features of a small molecule that are essential for binding to the members of a certain family opens a window of opportunities for tailored applications of FAEs.
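Amino acid composition is one of the descriptor families named in this abstract, and it is simple enough to sketch. The version below assumes the common definition: the percentage of each of the 20 standard residues, with non-standard letters ignored. The physico-chemical composition descriptors, the Self-Organizing Map and k-means clustering, and the SVM used in the study are not reproduced. The function name is hypothetical.

```python
from collections import Counter
from typing import List

# Fixed residue order so every sequence maps to the same 20 vector positions.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_composition(seq: str) -> List[float]:
    """Percentage of each standard amino acid in seq, in AMINO_ACIDS order.

    Letters outside the 20 standard residues (e.g. X, B, Z) are ignored,
    which is an assumption of this sketch.
    """
    counts = Counter(aa for aa in seq.upper() if aa in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return [100.0 * counts[aa] / total for aa in AMINO_ACIDS]


if __name__ == "__main__":
    vec = amino_acid_composition("AAGS")
    print(dict(zip(AMINO_ACIDS, vec)))
```

Each sequence becomes a fixed-length 20-dimensional vector regardless of its length. This fixed length is what lets clustering methods and an SVM compare FAE sequences of different lengths directly.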

Crossref

Chalmers Research

Nature Precedings

Online Research Database In Technology

Chalmers Publication Library

HKU Scholars Hub