D-Scholarship@Pitt

FigShare

Computational methods for ubiquitination site prediction using physicochemical properties of protein sequences

Author: A Jammalamadaka
B Cai
Binghuang Cai
C Cortes
C Hans
CJ Bendell
CW Tung
D Dash
D Dash
D Heckerman
DG Kleinbaum
DJ Yu
DS Kirkpatrick
FV Jensen
G Xu
GF Cooper
GF Cooper
I Walsh
J Herrmann
JM Peng
K Tomii
L Li
P Kontkanen
R Kohavi
RE Fan
RE Neapolitan
RL Welchman
S Kawashima
T Hastie
UB Kjaerulff
W Kim
W Wei
X Chen
X Jiang
X Jiang
X Jiang
X Jiang
X Jiang
Xia Jiang
Y Cai
Z Chen
Z Chen
Publication venue: Springer Science and Business Media LLC
Publication date
Field of study

Public Library of Science (PLOS)

Incorporating Distant Sequence Features and Radial Basis Function Networks to Identify Ubiquitin Conjugation Sites

Author: A Catic
A Hershko
A Zanzoni
AL Chernorudskiy
B Boeckmann
C Chothia
C-J Lin
CN Pang
CT Su
CW Tung
CY Ou
D Xie
DM Shien
DT Jones
GE Crooks
GZ Zhang
HM Berman
Hsin-Yi Hung
J Peng
JL Fauchere
K Bryson
K Ron
L Hicke
LJ McGuffin
M Charton
P Radivojac
R Grantham
S Ahmad
SA Chen
SF Altschul
SF Altschul
Shu-An Chen
T Gilon
TA Tatusova
TD Schneider
TL Bailey
Tzong-Yi Lee
V Vacic
Vladimir Uversky
Y-Y Ou
Yu-Yen Ou
YY Ou
Z Hu
ZR Yang
Publication venue: Public Library of Science
Publication date: 09/03/2011
Field of study

Ubiquitin (Ub) is a small protein of 76 amino acids with a mass of about 8.5 kDa. In ubiquitin conjugation, ubiquitin is mainly conjugated to lysine residues of the target protein by Ub-ligating (E3) enzymes. Three major enzymes participate in ubiquitin conjugation: E1, E2 and E3, which are responsible for activating, conjugating and ligating ubiquitin, respectively. Ubiquitin conjugation in eukaryotes is an important mechanism for the proteasome-mediated degradation of proteins and for regulating the activity of transcription factors. Motivated by the importance of ubiquitin conjugation in biological processes, this investigation develops a method, UbSite, which uses an efficient radial basis function (RBF) network to identify protein ubiquitin conjugation (ubiquitylation) sites. This work investigates not only the amino acid composition but also the structural characteristics, physicochemical properties, and evolutionary information of amino acids around ubiquitylation (Ub) sites. With reference to the pathway of ubiquitin conjugation, the substrate sites for E3 recognition, which are distant from ubiquitylation sites, are investigated. Measuring the F-score over a large window (−20 to +20) revealed statistically significant amino acid composition and position-specific scoring matrix (evolutionary information) features, which are mainly located distant from Ub sites. This distant information can be used effectively to differentiate Ub sites from non-Ub sites. As determined by five-fold cross-validation, the model trained on the combination of amino acid composition and evolutionary information performs best in identifying ubiquitin conjugation sites. The prediction sensitivity, specificity, and accuracy are 65.5%, 74.8%, and 74.5%, respectively.
Although the amino acid sequences around the ubiquitin conjugation sites do not contain conserved motifs, the cross-validation result indicates that integrating distant sequence features of Ub sites can improve predictive performance. Additionally, the independent test demonstrates that the proposed method can outperform other ubiquitylation prediction tools.
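The window-based amino acid composition mentioned in this abstract can be illustrated with a minimal sketch. It is not the UbSite implementation: the function names, the padding character `X`, and the choice to ignore padding when computing fractions are all assumptions made for illustration. It extracts a −20 to +20 window around every lysine and computes a 20-dimensional composition vector.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues

def lysine_windows(sequence, flank=20, pad="X"):
    """Yield (position, window) for every lysine in the sequence.

    The sequence is padded on both sides (hypothetical pad symbol 'X')
    so that lysines near the termini still get a full-length window.
    """
    padded = pad * flank + sequence + pad * flank
    for i, residue in enumerate(sequence):
        if residue == "K":
            # Position i in the original maps to i + flank in the padded
            # string, so the window starts at i and spans 2 * flank + 1.
            yield i, padded[i : i + 2 * flank + 1]

def composition(window):
    """Return the fraction of each standard amino acid, ignoring padding."""
    counts = Counter(r for r in window if r in AMINO_ACIDS)
    total = sum(counts.values())
    return [counts[a] / total if total else 0.0 for a in AMINO_ACIDS]
```

Each candidate lysine then yields one feature vector, for example `composition(window)` for every `(position, window)` produced by `lysine_windows(seq)`, which could be combined with other features before training a classifier.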

Journal of Mountain Area Research (Karakoram International University, Gilgit, Pakistan)

UBI-XGB: IDENTIFICATION OF UBIQUITIN PROTEINS USING MACHINE LEARNING MODEL

Author: Farman Ali
Ghulam Ali
Maher Zulfikar Ahmed
Rahu Sikandar
Saba Erum
Talpur Dhani Bux
Talpur Mir Sajjad Hussain
Tunio Saima
Publication venue: Karakoram International University Gilgit, Pakistan
Publication date: 02/12/2022
Field of study

A recent line of research has focused on ubiquitination, a pervasive modification in proteasome-mediated protein degradation that controls apoptosis and plays a major role in the breakdown of proteins and the development of cell disorders. Protein turnover and ubiquitination are related processes. Recent developments in proteomic technology have sparked renewed interest in identifying ubiquitination sites in a number of human disorders, which have been studied experimentally and clinically. Because experimental identification of protein ubiquitination sites is typically labor- and time-intensive, it is vital to develop reliable computational predictors. In this paper, we propose Ubipro-XGBoost, a multi-view, feature-based machine learning method for predicting ubiquitination sites. First, we encode protein sequence features as matrix data using the Dipeptide Deviation from Expected Mean (DDE) feature encoding technique. We also propose a second feature extraction model, the dipeptide composition (DPC) model. These attributes are then fed into the extreme gradient boosting (XGBoost) classifier. With more experimentally verified ubiquitination sites becoming available, we developed a predictive algorithm that can locate lysine ubiquitination sites in large-scale proteome data. In 5-fold cross-validation, Ubipro-XGBoost had an AUC (area under the receiver operating characteristic curve) accuracy of 0.914, sensitivity of 0.836, specificity of 0.992, and MCC of 0.839 with the DPC model, and an accuracy of 0.909, sensitivity of 0.839, specificity of 0.979, and MCC of 0.829 with the DDE model.
The findings demonstrate that the suggested technique, Ubipro-XGBoost, outperforms conventional ubiquitination prediction methods and offers new guidance for ubiquitination site identification.
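As a rough illustration of the dipeptide composition (DPC) encoding named in this abstract, the sketch below maps a sequence to a 400-dimensional vector of dipeptide frequencies. It is an assumption-laden example rather than the authors' code: the function name is hypothetical, and dipeptides containing non-standard residues are simply not counted while the denominator stays at the total number of adjacent pairs.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# All 20 x 20 = 400 ordered residue pairs, in a fixed order.
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]

def dipeptide_composition(sequence):
    """Return the 400-dimensional dipeptide frequency vector of a sequence."""
    total = len(sequence) - 1  # number of adjacent residue pairs
    counts = dict.fromkeys(DIPEPTIDES, 0)
    for i in range(total):
        pair = sequence[i : i + 2]
        if pair in counts:  # skip pairs with non-standard residues
            counts[pair] += 1
    return [counts[d] / total if total > 0 else 0.0 for d in DIPEPTIDES]
```

A matrix with one such row per sequence fragment is the kind of input a gradient-boosted tree classifier could be trained on.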

Prediction of Lysine Ubiquitylation with Ensemble Classifier and Feature Selection

Author: Aguilar
Altschul
Anand
Atchey
Boeckmann
Bordoli
Breiman
Cai
Chou
Chou
Chou
Cover
Denis
Dunker
Fleuret
He
Herrmann
Hershko
Hicke
Hicke
Hitchcock
Jeon
Jones
Kaur
Kawashima
Kim
Kirkpatrick
Levi
Li
Liu
Liu
Liu
Ma
Matsumoto
Minghao Yin
Peng
Peng
Peng
Peng
Pickart
Pugalenthi
Radivojac
Saghatelian
Shen
Sikic
Skurichina
Tompa
Tung
Welchman
Wright
Wu
Xiangtao Li
Xiao
Xiaowei Zhao
Yu
Zheng
Zhiqiang Ma
Publication venue: Molecular Diversity Preservation International (MDPI)
Publication date: 01/11/2011
Field of study

Ubiquitylation is an important process of post-translational modification. Correct identification of protein lysine ubiquitylation sites is of fundamental importance for understanding the molecular mechanism of lysine ubiquitylation in biological systems. This paper develops a novel computational method to effectively identify lysine ubiquitylation sites based on an ensemble approach. In the proposed method, 468 ubiquitylation sites from 323 proteins retrieved from the Swiss-Prot database were encoded into feature vectors using four kinds of protein sequence information. An effective feature selection method was then applied to extract informative feature subsets. After different feature subsets were obtained by setting different starting points in the search procedure, they were used to train multiple random forest classifiers, which were then aggregated into a consensus classifier by majority voting. Evaluated by jackknife tests and independent tests, respectively, the accuracy of the proposed predictor reached 76.82% for the training dataset and 79.16% for the test dataset, indicating that this predictor is a useful tool for predicting lysine ubiquitylation sites. Furthermore, site-specific feature analysis was performed, and it showed that ubiquitylation is intimately correlated with the features of its surrounding sites in addition to features derived from the lysine site itself. The feature selection method is available upon request.
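The aggregation step described above, combining several classifiers into a consensus by majority voting, can be sketched as follows. This is only an illustration of the voting rule: the function name is hypothetical, the per-classifier predictions are assumed to be already computed, and ties (possible with an even number of classifiers) fall to whichever label was seen first.

```python
from collections import Counter

def majority_vote(predictions):
    """Combine per-classifier label lists into one consensus label list.

    predictions: a list with one entry per classifier, each entry being
    that classifier's predicted labels for the same ordered set of sites.
    """
    consensus = []
    for votes in zip(*predictions):  # one tuple of votes per site
        label, _ = Counter(votes).most_common(1)[0]
        consensus.append(label)
    return consensus
```

For example, with three classifiers voting on three candidate sites, `majority_vote([[1, 0, 1], [1, 1, 0], [0, 0, 1]])` labels a site positive only when at least two of the three agree. Using an odd number of classifiers avoids ties for binary labels.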

Multidisciplinary Digital Publishing Institute

Computational identification of ubiquitylation sites from protein sequences

Author: A Dey
AL Chernorudskiy
AL Hitchcock
C Denison
CC Chang
Chun-Wei Tung
CW Tung
D Plewczynski
DS Kirkpatrick
DT Jones
E Tomlinson
GE Crooks
H Kaur
H Meirovitch
HB Jeon
IH Witten
J Cedano
J Herrmann
J Peng
JL Cornette
JR Quinlan
M Matsumoto
NJ Denis
Q Wu
RA George
RL Welchman
SF Altschul
Shinn-Ying Ho
SY Ho
SY Ho
W Li
WL Huang
Y Harpaz
Y Xue
Publication venue: BioMed Central
Publication date: 01/01/2008
Field of study

Background: Ubiquitylation plays an important role in regulating protein functions. Recently, experimental methods were developed for the effective identification of ubiquitylation sites. To efficiently explore more undiscovered ubiquitylation sites, this study aims to develop an accurate sequence-based prediction method to identify promising ubiquitylation sites. Results: We established a ubiquitylation dataset consisting of 157 ubiquitylation sites and 3676 putative non-ubiquitylation sites extracted from 105 proteins in the UbiProt database. This study first evaluates promising sequence-based features and classifiers for the prediction of ubiquitylation sites by assessing three kinds of features (amino acid identity, evolutionary information, and physicochemical property) and three classifiers (support vector machine, k-nearest neighbor, and Naïve Bayes). Results show that the set of 531 physicochemical properties and the support vector machine (SVM) are the best kind of features and the best classifier, respectively, and that their combination has a prediction accuracy of 72.19% using leave-one-out cross-validation. Consequently, an informative physicochemical property mining algorithm (IPMA) is proposed to select an informative subset of the 531 physicochemical properties. A prediction system, UbiPred, was implemented using an SVM with the feature set of 31 informative physicochemical properties selected by IPMA, which improves the accuracy from 72.19% to 84.44%. To further analyze the informative physicochemical properties, the decision tree method C5.0 was used to acquire if-then rule-based knowledge for predicting ubiquitylation sites. UbiPred can screen promising ubiquitylation sites from putative non-ubiquitylation sites using prediction scores. By applying UbiPred, 23 promising ubiquitylation sites were identified from an independent dataset of 3424 putative non-ubiquitylation sites, which were also validated using the obtained prediction rules.
Conclusion: We have proposed an algorithm, IPMA, for mining informative physicochemical properties from protein sequences to build an SVM-based prediction system, UbiPred. UbiPred can predict ubiquitylation sites, each accompanied by a prediction score, to help biologists identify promising sites for experimental verification. UbiPred has been implemented as a web server and is available at http://iclab.life.nctu.edu.tw/ubipred.
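The leave-one-out cross-validation used to report the accuracies above can be sketched in a few lines. This is a generic illustration, not UbiPred: a toy nearest-centroid classifier stands in for the SVM, and every function name here is hypothetical.

```python
def fit_centroids(xs, ys):
    """Toy stand-in for a real classifier: one mean vector per class."""
    sums, counts = {}, {}
    for x, y in zip(xs, ys):
        acc = sums.setdefault(y, [0.0] * len(x))
        for j, value in enumerate(x):
            acc[j] += value
        counts[y] = counts.get(y, 0) + 1
    return {y: [s / counts[y] for s in sums[y]] for y in sums}

def predict_nearest(centroids, x):
    """Return the class whose centroid is closest to x (squared distance)."""
    def dist2(c):
        return sum((a - b) ** 2 for a, b in zip(c, x))
    return min(centroids, key=lambda y: dist2(centroids[y]))

def leave_one_out_accuracy(samples, labels, fit, predict):
    """Hold out each sample once, train on the rest, and score the held-out one."""
    correct = 0
    for i in range(len(samples)):
        train_x = samples[:i] + samples[i + 1:]
        train_y = labels[:i] + labels[i + 1:]
        model = fit(train_x, train_y)
        if predict(model, samples[i]) == labels[i]:
            correct += 1
    return correct / len(samples)
```

Calling `leave_one_out_accuracy(samples, labels, fit_centroids, predict_nearest)` returns the fraction of held-out samples classified correctly. Any feature selection would need to be repeated inside each fold for the estimate to remain unbiased.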

Springer - Publisher Connector

Bioinformatics Approaches to the Functional Profiling of Genetic Variants

Author: Li Biao
Mooney Sean
Radivojac Predrag
Publication venue: IntechOpen
Publication date: 12/10/2012
Field of study

IntechOpen

Public Library of Science (PLOS)

PMeS: Prediction of Methylation Sites Based on Enhanced Feature Encoding Scheme

Author: A Shukla
A Suzuki
AJ Bannister
APL Snijders
B Xiao
BM Turner
C Cortes
C Pang
C Teyssier
CC Chang
CNI Pang
D Plewczynski
DM Shien
DS Johnson
FG Mastronardi
GE Crooks
H Chen
J Sayegh
JF Couture
Jian-Ding Qiu
JJ Gao
JL Fauchere
JL Shao
JL Xu
JM Aleta
KM Daily
L Nanni
LH Dong
LL Hu
M Kiledjian
ME Rudbeck
MR Stallcup
MT Bedford
N Dolzhanskaya
Niall James Haslam
R Predel
RA Varier
Ru-Ping Liang
S Ahmad
S Ahmad
S Niu
S Pahlich
Shao-Ping Shi
Sheng-Bao Suo
Shu-Yun Huang
T Rögnvaldsson
TS Rögnvaldsson
VD Longo
VN Lapko
WK Paik
WK Paik
WL Wooderchak
WZ Li
X Chen
Xing-Yu Sun
ZH Zhang
Publication venue: Public Library of Science
Publication date: 15/06/2012
Field of study

Protein methylation is predominantly found on lysine and arginine residues and carries many important biological functions, including gene regulation and signal transduction. Given their important involvement in gene expression, protein methylation and its regulatory enzymes are implicated in a variety of human disease states such as cancer, coronary heart disease and neurodegenerative disorders. Thus, identification of methylation sites can be very helpful for drug design for various related diseases. In this study, we developed a method called PMeS to improve the prediction of protein methylation sites based on an enhanced feature encoding scheme and a support vector machine. The enhanced feature encoding scheme was composed of sparse property coding, normalized van der Waals volume, position weight amino acid composition and accessible surface area. PMeS achieved a promising performance, with a sensitivity of 92.45%, a specificity of 93.18%, an accuracy of 92.82% and a Matthews correlation coefficient of 85.69% for arginine, as well as a sensitivity of 84.38%, a specificity of 93.94%, an accuracy of 89.16% and a Matthews correlation coefficient of 78.68% for lysine, in 10-fold cross-validation. Compared with other existing methods, PMeS provides better predictive performance and greater robustness. It can be anticipated that PMeS might be useful to guide future experiments needed to identify potential methylation sites in proteins of interest. The online service is available at http://bioinfo.ncu.edu.cn/inquiries_PMeS.aspx
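The four measures reported in this abstract (sensitivity, specificity, accuracy and the Matthews correlation coefficient) follow directly from a confusion matrix. The sketch below uses their standard definitions; the function name is hypothetical, the counts in the usage note are made up for illustration, and the sketch assumes each class has at least one sample.

```python
import math

def classification_metrics(tp, tn, fp, fn):
    """Compute standard binary classification measures from confusion counts."""
    sensitivity = tp / (tp + fn)  # fraction of true sites recovered
    specificity = tn / (tn + fp)  # fraction of non-sites rejected
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # MCC is conventionally set to 0 when any marginal count is zero.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "mcc": mcc,
    }
```

With the illustrative counts `classification_metrics(tp=8, tn=9, fp=1, fn=2)`, sensitivity is 0.8, specificity is 0.9 and accuracy is 0.85. MCC ranges from −1 to 1, so a value quoted as a percentage, as in the abstract, corresponds to that coefficient multiplied by 100.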

Public Library of Science (PLOS)

FigShare

Deriving a mutation index of carcinogenicity using protein structure and protein interfaces

Author: A Custodio
A David
A Dixit
A Hamosh
A Pal
AJ Bass
Anna Tramontano
B Reva
B Vogelstein
CJ Richardson
CM Croce
D Chasman
D Sims
D Talavera
D Xu
E Krissinel
EC Chao
ER Mardis
F Damm
Frances Pearl
G Birrane
G De Baets
H Boutselakis
H Carter
H Makishima
IA Adzhubei
IS Moreira
J Carlsson
Jarle Hakas
JM Hurst
JM Izarzugaza
JR Morris
K Wang
Konstantinos Mitsopoulos
L Breiman
L Ding
M Li
M Magrane
Marketa Zvelebil
MR Stratton
MR Stratton
MS Greenblatt
MW MacArthur
MY Frederic
Octavio Espinosa
P Flicek
P Kumar
P Srivastava
PA Chan
PA Futreal
PB Crowley
PC Ng
PC Ng
PD Stenson
PH Lee
PT Wan
PV Hornbeck
PY Chou
R Ferla
R Rajasekaran
RJ Kinsella
S Jones
S Sunyaev
S Velankar
SA Forbes
TM Anne
V Ramensky
W Huang da
W Kabsch
X Wang
X Wang
Y Bromberg
Publication venue: Public Library of Science (PLoS)
Publication date: 01/01/2014
Field of study

With the advent of Next Generation Sequencing, the identification of mutations in the genomes of healthy and diseased tissues has become commonplace. While much progress has been made in elucidating the aetiology of disease processes in cancer, the contributions to disease that many individual mutations make remain to be characterised, and their downstream consequences for cancer phenotypes remain to be understood. Missense mutations commonly occur in cancers and their consequences remain challenging to predict. However, this knowledge is becoming more vital, both for assessing disease progression and for stratifying drug treatment regimes. Coupled with structural data, comprehensive genomic databases of mutations such as the 1000 Genomes project and COSMIC give an opportunity to investigate general principles of how cancer mutations disrupt proteins and their interactions at the molecular and network level. We describe a comprehensive comparison of cancer and neutral missense mutations; by combining features derived from structural and interface properties, we have developed a carcinogenicity predictor, InCa (Index of Carcinogenicity). Upon comparison with other methods, we observe that InCa can predict mutations that might not be detected by other methods. We also discuss general limitations shared by all predictors that attempt to predict driver mutations and discuss how these could impact high-throughput predictions. A web interface to a server implementation is publicly available at http://inca.icr.ac.uk/

CiteSeerX