Search CORE

13,263 research outputs found

SChloro: directing Viridiplantae proteins to six chloroplastic sub-compartments

Author: Casadio Rita
Fariselli Piero
Martelli Pier Luigi
Savojardo Castrense
Publication venue: 'Oxford University Press (OUP)'
Publication date: 22/10/2016
Field of study

Motivation: Chloroplasts are organelles found in plants and involved in several important cell processes. Similarly to other compartments in the cell, chloroplasts have an internal structure comprising several sub-compartments, where different proteins are targeted to perform their functions. Given the relation between protein function and localization, the availability of effective computational tools to predict protein sub-organelle localizations is crucial for large-scale functional studies. Results: In this paper we present SChloro, a novel machine-learning approach to predict protein sub-chloroplastic localization, based on targeting signal detection and membrane protein information. The proposed approach performs multi-label predictions discriminating six chloroplastic sub-compartments that include inner membrane, outer membrane, stroma, thylakoid lumen, plastoglobule and thylakoid membrane. In comparative benchmarks, the proposed method outperforms current state-of-the-art methods in both single-and multi-compartment predictions, with an overall multi-label accuracy of 74%. The results demonstrate the relevance of the approach that is eligible as a good candidate for integration into more general large-scale annotation pipelines of protein subcellular localization

Crossref

Archivio istituzionale della ricerca - Alma Mater Studiorum Università di Bologna

Archivio istituzionale della ricerca - Università di Padova

Institutional Research Information System University of Turin

ProLanGO: Protein Function Prediction Using Neural~Machine Translation Based on a Recurrent Neural Network

Author: Cao Renzhi
Chan Leong
Chen Zhangxin
Freitas Colton
Jiang Haiqing
Sun Miao
Publication venue
Publication date: 01/10/2017
Field of study

With the development of next generation sequencing techniques, it is fast and cheap to determine protein sequences but relatively slow and expensive to extract useful information from protein sequences because of limitations of traditional biological experimental techniques. Protein function prediction has been a long standing challenge to fill the gap between the huge amount of protein sequences and the known function. In this paper, we propose a novel method to convert the protein function problem into a language translation problem by the new proposed protein sequence language "ProLan" to the protein function language "GOLan", and build a neural machine translation model based on recurrent neural networks to translate "ProLan" language to "GOLan" language. We blindly tested our method by attending the latest third Critical Assessment of Function Annotation (CAFA 3) in 2016, and also evaluate the performance of our methods on selected proteins whose function was released after CAFA competition. The good performance on the training and testing datasets demonstrates that our new proposed method is a promising direction for protein function prediction. In summary, we first time propose a method which converts the protein function prediction problem to a language translation problem and applies a neural machine translation model for protein function prediction.Comment: 13 pages, 5 figure

arXiv.org e-Print Archive

Directory of Open Access Journals

The interplay of descriptor-based computational analysis with pharmacophore modeling builds the basis for a novel classification scheme for feruloyl esterases

Author: Akin
Altschul
Andersen
Andreasen
Aurilia
Barnum
Bartolomé
Bendtsen
Benner
Benoit
Benoit
Bhasin
Bhasin
Blum
Cai
Cai
Castanares
Chang
Choi
Crepin
D.B.R.K. Gupta Udatha
Dodd
Donaghy
Donaghy
Dudoit
Dysvik
Ewing
Faulds
Ferguson
Fillingham
Finn
Garcia-Conesa
García-Conesa
Garrigues
Gasteiger
Gasteiger
Gianni Panagiotou
Giuliani
Goldstone
Hall
Han
Hatzakis
Henikoff
Hermoso
Hsu
Humberstone
Huson
Irene Kouskoumvekaki
Kaiser
Karchin
Keerthi
Kheder
Kikuzaki
Kim
Kohavi
Kohonen
Koseki
Koseki
Kroon
Kroon
Kumar
Lao
Larkin
Laszlo
Latha
Lee
Lesage-Meessen
Levasseur
Levasseur
Li
Lima
Lisbeth Olsson
MacKay
Marcotte
McAuley
Meinicke
Morris
Mukherjee
Nielsen
Noble
Nsereko
Oili
Ong
Platt
Prates
Pérez-Bercoff
Rashamuse
Record
Rost
Sancho
Sankararaman
Sankararaman
Schrödinger Suite 2009
Schubot
Slavin
Tarbouriech
Teodoro
Thompson
Tomoko
Topakas
Topakas
Topakas
Topakas
Topakas
Tsuchiyama
Tsuchiyama
Uestuen
Vafiadi
Vafiadi
Vafiadi
Vafiadi
Vafiadi
Vafiadi
Wang
Wang
Wang
Wilkinson
Publication venue
Publication date: 11/08/2010
Field of study

One of the most intriguing groups of enzymes, the feruloyl esterases (FAEs), is ubiquitous in both simple and complex organisms. FAEs have gained importance in biofuel, medicine and food industries due to their capability of acting on a large range of substrates for cleaving ester bonds and synthesizing high-added value molecules through esterification and transesterification reactions. During the past two decades extensive studies have been carried out on the production and partial characterization of FAEs from fungi, while much less is known about FAEs of bacterial or plant origin. Initial classification studies on FAEs were restricted on sequence similarity and substrate specificity on just four model substrates and considered only a handful of FAEs belonging to the fungal kingdom. This study centers on the descriptor-based classification and structural analysis of experimentally verified and putative FAEs; nevertheless, the framework presented here is applicable to every poorly characterized enzyme family. 365 FAE-related sequences of fungal, bacterial and plantae origin were collected and they were clustered using Self Organizing Maps followed by k-means clustering into distinct groups based on amino acid composition and physico-chemical composition descriptors derived from the respective amino acid sequence. A Support Vector Machine model was subsequently constructed for the classification of new FAEs into the pre-assigned clusters. The model successfully recognized 98.2% of the training sequences and all the sequences of the blind test. The underlying functionality of the 12 proposed FAE families was validated against a combination of prediction tools and published experimental data. Another important aspect of the present work involves the development of pharmacophore models for the new FAE families, for which sufficient information on known substrates existed. Knowing the pharmacophoric features of a small molecule that are essential for binding to the members of a certain family opens a window of opportunities for tailored applications of FAEs

Crossref

Chalmers Research

Nature Precedings

Online Research Database In Technology

Chalmers Publication Library

HKU Scholars Hub

Predicting Drug-Target Interaction Networks Based on Functional Groups and Biological Features

Author: Cai Yu-Dong
Chou Kuo-Chen
He Zhisong
Hu Le-Le
Kong Xiangyin
Shi Xiao-He
Zhang Jian
Publication venue: Public Library of Science
Publication date: 01/01/2010
Field of study

Background: Study of drug-target interaction networks is an important topic for drug development. It is both timeconsuming and costly to determine compound-protein interactions or potential drug-target interactions by experiments alone. As a complement, the in silico prediction methods can provide us with very useful information in a timely manner. Methods/Principal Findings: To realize this, drug compounds are encoded with functional groups and proteins encoded by biological features including biochemical and physicochemical properties. The optimal feature selection procedures are adopted by means of the mRMR (Maximum Relevance Minimum Redundancy) method. Instead of classifying the proteins as a whole family, target proteins are divided into four groups: enzymes, ion channels, G-protein- coupled receptors and nuclear receptors. Thus, four independent predictors are established using the Nearest Neighbor algorithm as their operation engine, with each to predict the interactions between drugs and one of the four protein groups. As a result, the overall success rates by the jackknife cross-validation tests achieved with the four predictors are 85.48%, 80.78%, 78.49%, and 85.66%, respectively. Conclusion/Significance: Our results indicate that the network prediction system thus established is quite promising an

CiteSeerX

Public Library of Science (PLOS)

Directory of Open Access Journals

PubMed Central

Prediction of protein submitochondria locations by hybridizing pseudo-amino acid composition with various physicochemical features of segmented sequence

Author: A Kumar
A Reinhardt
B Matthews
C Guda
C Guda
CH Wu
G Dellaire
G-P Zhou
H Liu
H-B Shen
H-B Shen
HGE Sutherland
J Cedano
JL Heazlewood
K Nakai
K-C Chou
K-C Chou
K-C Chou
K-C Chou
K-C Chou
K-C Chou
K-C Chou
K-C Chou
K-C Chou
K-C Chou
K-C Chou
K-C Chou
K-C Chou
K-C Chou
K-C Chou
K-C Chou
K-C Chou
K-J Park
M Wang
M Wang
MA Andrade
MS Scott
O Emanuelsson
P Lio
Pufeng Du
Q-B Gao
RA Gottlieb
S Hua
S Kawashima
S-Q Wang
W Jassem
W Li
WA BickMore
Y Huang
Y-D Cai
Y-D Cai
Y-D Cai
Yanda Li
Z Lei
Z Yuan
Z-P Feng
Publication venue: BioMed Central
Publication date: 01/01/2006
Field of study

BACKGROUND: Knowing the submitochondria localization of a mitochondria protein is an important step to understand its function. We develop a method which is based on an extended version of pseudo-amino acid composition to predict the protein localization within mitochondria. This work goes one step further than predicting protein subcellular location. We also try to predict the membrane protein type for mitochondrial inner membrane proteins. RESULTS: By using leave-one-out cross validation, the prediction accuracy is 85.5% for inner membrane, 94.5% for matrix and 51.2% for outer membrane. The overall prediction accuracy for submitochondria location prediction is 85.2%. For proteins predicted to localize at inner membrane, the accuracy is 94.6% for membrane protein type prediction. CONCLUSION: Our method is an effective method for predicting protein submitochondria location. But even with our method or the methods at subcellular level, the prediction of protein submitochondria location is still a challenging problem. The online service SubMito is now available at

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Using Support Vector Machine and Evolutionary Profiles to Predict Antifreeze Protein Sequences

Author: Altschul
Breton
Chen
Cheng
Chou
Chou
Chou
Chou
Chou
Chou
Chou
Chou
Chou
Chou
Davies
Davies
Davies
Esmaeili
Ewart
Griffith
Jones
Kandaswamy
Kawashima
Lewitt
Li
Li
Lin
Logsdon
Minghao Yin
Mohabatkar
Moriyama
Ruchi
Schaffer
Scholander
Sformo
Shao
Shen
Shen
Urrutia
Vapnik
Wang
Xiaowei Zhao
Yu
Zeng
Zhao
Zhiqiang Ma
Publication venue: Molecular Diversity Preservation International (MDPI)
Publication date: 01/01/2012
Field of study

Antifreeze proteins (AFPs) are ice-binding proteins. Accurate identification of new AFPs is important in understanding ice-protein interactions and creating novel ice-binding domains in other proteins. In this paper, an accurate method, called AFP_PSSM, has been developed for predicting antifreeze proteins using a support vector machine (SVM) and position specific scoring matrix (PSSM) profiles. This is the first study in which evolutionary information in the form of PSSM profiles has been successfully used for predicting antifreeze proteins. Tested by 10-fold cross validation and independent test, the accuracy of the proposed method reaches 82.67% for the training dataset and 93.01% for the testing dataset, respectively. These results indicate that our predictor is a useful tool for predicting antifreeze proteins. A web server (AFP_PSSM) that implements the proposed predictor is freely available

Multidisciplinary Digital Publishing Institute

CiteSeerX

Crossref

Directory of Open Access Journals

PubMed Central

Classification and Analysis of Regulatory Pathways Using Graph Property, Biochemical and Physicochemical Property, and Functional Property

Author: A Bairoch
A Barabasi
C Chen
C Chen
C Klukas
C Krieger
Cathal Seoighe
CF Gao
D Chakrabarti
D Frishman
DN Georgiou
E Camon
F Chiti
G Pollastri
GF Cooper
GP Zhou
GP Zhou
GY Zhang
H Ding
H Lin
H Mohabatkar
H Mohabatkar
H Ogata
H Peng
I Althaus
I Althaus
I Althaus
I Dubchak
I Dubchak
I Schomburg
I Schomburg
IH Witten
J Andraos
J Cheng
J Cheng
JD Qiu
JM Dale
K Chou
K Chou
K Chou
K Chou
K Chou
KC Chou
KC Chou
KC Chou
KC Chou
KC Chou
KC Chou
KC Chou
KC Chou
KC Chou
KC Chou
Kuo-Chen Chou
L Chen
L Chen
L Chen
L Chen
L Chen
L Lu
L Lu
L Yu
Lei Chen
M Chang
M Esmaeili
M Kanehisa
M Kanehisa
M Kanehisa
M Kanehisa
N Chazal
N Friedman
P Carmona-Saez
P Pharkya
Q Gu
R Caspi
R Caspi
RR Bouckaert
S Salzberg
SS Keerthi
T Denoeux
T Huang
T Huang
T Huang
T Huang
T Huang
Tao Huang
U Stelzl
W Buntine
X Xiao
XB Zhou
Y Cai
Y Cai
Y Cai
Y Qi
YH Zeng
YS Lobanova
Yu-Dong Cai
Z He
ZC Wu
Publication venue: Public Library of Science
Publication date: 01/01/2011
Field of study

Given a regulatory pathway system consisting of a set of proteins, can we predict which pathway class it belongs to? Such a problem is closely related to the biological function of the pathway in cells and hence is quite fundamental and essential in systems biology and proteomics. This is also an extremely difficult and challenging problem due to its complexity. To address this problem, a novel approach was developed that can be used to predict query pathways among the following six functional categories: (i) “Metabolism”, (ii) “Genetic Information Processing”, (iii) “Environmental Information Processing”, (iv) “Cellular Processes”, (v) “Organismal Systems”, and (vi) “Human Diseases”. The prediction method was established trough the following procedures: (i) according to the general form of pseudo amino acid composition (PseAAC), each of the pathways concerned is formulated as a 5570-D (dimensional) vector; (ii) each of components in the 5570-D vector was derived by a series of feature extractions from the pathway system according to its graphic property, biochemical and physicochemical property, as well as functional property; (iii) the minimum redundancy maximum relevance (mRMR) method was adopted to operate the prediction. A cross-validation by the jackknife test on a benchmark dataset consisting of 146 regulatory pathways indicated that an overall success rate of 78.8% was achieved by our method in identifying query pathways among the above six classes, indicating the outcome is quite promising and encouraging. To the best of our knowledge, the current study represents the first effort in attempting to identity the type of a pathway system or its biological function. It is anticipated that our report may stimulate a series of follow-up investigations in this new and challenging area

CiteSeerX

Crossref

Directory of Open Access Journals

PubMed Central

Analysis and Prediction of the Metabolic Stability of Proteins Based on Their Sequential Features, Subcellular Locations and Interaction Networks

Author: A Madkan
A Ruepp
Andreas Hofmann
B Niu
C Chen
C Chothia
CA Minetti
DS Wishart
FM Li
G Pollastri
G Pollastri
H Ding
H Lin
H Lin
H Peng
H Wei
HB Shen
HB Shen
HC Yen
I Dubchak
I Dubchak
J Wang
JF Wang
JF Wang
JF Wang
JF Wang
JJ Chou
JL Fauchere
JR Schnell
K Gong
K Oxenoid
Kai-Yan Feng
KC Chou
KC Chou
KC Chou
KC Chou
KC Chou
KC Chou
KC Chou
KC Chou
KC Chou
KC Chou
KC Chou
KC Chou
KC Chou
KC Chou
KC Chou
KC Chou
KC Chou
KC Chou
KC Chou
KC Chou
KC Chou
KC Chou
Kuo-Chen Chou
L Cristian
L Li
LeLe Hu
LJ Jensen
MM Gromiha
P Martel
P Rice
PA Fields
Ping Wang
QS Du
R Grantham
R Lumry
R Sharan
RB Huang
RM Pielak
SF Altschul
SH White
T Huang
Tao Huang
TJ Kamerzell
TL Zhang
X Xiao
Xiangyin Kong
Xiao-He Shi
Yi-Xue Li
Yu-Dong Cai
Z Qian
Zhisong He
Publication venue: Public Library of Science
Publication date: 04/06/2010
Field of study

The metabolic stability is a very important idiosyncracy of proteins that is related to their global flexibility, intramolecular fluctuations, various internal dynamic processes, as well as many marvelous biological functions. Determination of protein's metabolic stability would provide us with useful information for in-depth understanding of the dynamic action mechanisms of proteins. Although several experimental methods have been developed to measure protein's metabolic stability, they are time-consuming and more expensive. Reported in this paper is a computational method, which is featured by (1) integrating various properties of proteins, such as biochemical and physicochemical properties, subcellular locations, network properties and protein complex property, (2) using the mRMR (Maximum Relevance & Minimum Redundancy) principle and the IFS (Incremental Feature Selection) procedure to optimize the prediction engine, and (3) being able to identify proteins among the four types: “short”, “medium”, “long”, and “extra-long” half-life spans. It was revealed through our analysis that the following seven characters played major roles in determining the stability of proteins: (1) KEGG enrichment scores of the protein and its neighbors in network, (2) subcellular locations, (3) polarity, (4) amino acids composition, (5) hydrophobicity, (6) secondary structure propensity, and (7) the number of protein complexes the protein involved. It was observed that there was an intriguing correlation between the predicted metabolic stability of some proteins and the real half-life of the drugs designed to target them. These findings might provide useful insights for designing protein-stability-relevant drugs. The computational method can also be used as a large-scale tool for annotating the metabolic stability for the avalanche of protein sequences generated in the post-genomic age

Public Library of Science (PLOS)

Crossref

PubMed Central