
499 research outputs found

SNOSite: Exploiting Maximal Dependence Decomposition to Identify Cysteine S-Nitrosylation with Substrate Site Specificity

Author: AL Weber, AV Finkelstein, B Derakhshan, B Gaston, C Bogdan, C Burge, C Chothia, C Lindermayr, C-C Chang, C-J Lin, CN Pang, CW Tung, D Schwartz, D Seth, D Yao, D-H Cho, DM Shien, DT Hess, E Karpuzoglu, G Hao, GD Fasman, GE Crooks, HM Berman, HR Guy, HR Meirovitch, Hsien-Da Huang, J Janin, JL Fauchere, JS Stamler, KC Chou, L Jia, M Ashburner, M Knipp, M Levitt, M Oobatake, M Punta, M Takano, MA Roseman, MC Romero-Puertas, MT Forrester, P Lane, PAS Karplus, S Ahmad, S Kawashima, SM Marino, SR Jaffrey, SS Rackovsky, T Kuncewicz, T Nakamura, TA Tatusova, TD Schneider, TM Greco, Tsung-Cheng Lu, TY Lee, Tzong-Yi Lee, V Vacic, Vladimir N. Uversky, WC Chang, WR Krigbaum, Y Kidera, Y Xue, Y-J Chen, Yi-Ju Chen, Yu-Ju Chen, YW Lam
Publication venue: Public Library of Science
Publication date: 15/07/2011

S-nitrosylation, the covalent attachment of nitric oxide (NO) to the sulfur atom of cysteine, is a selective and reversible protein post-translational modification (PTM) that regulates protein activity, localization, and stability. Despite its implication in the regulation of protein functions and cell signaling, the substrate specificity of cysteine S-nitrosylation remains unknown. Based on a total of 586 experimentally identified S-nitrosylation sites from SNAP/L-cysteine-stimulated mouse endothelial cells, this work presents an informatics investigation of S-nitrosylation sites, including structural factors such as the flanking amino acid composition, the accessible surface area (ASA), and physicochemical properties, i.e. positive charge and side chain interaction parameter. Because conserved motifs are difficult to obtain by conventional motif analysis, maximal dependence decomposition (MDD) has been applied to obtain statistically significant conserved motifs. A support vector machine (SVM) is applied to generate a predictive model for each MDD-clustered motif. According to five-fold cross-validation, the MDD-clustered SVMs achieve an accuracy of 0.902 and show promising performance on an independent test set. The effectiveness of the model was demonstrated by the correct identification of previously reported S-nitrosylation sites of Bos taurus dimethylarginine dimethylaminohydrolase 1 (DDAH1) and human hemoglobin subunit beta (HBB). Finally, the MDD-clustered model was adopted to construct an effective web-based tool, named SNOSite (http://csb.cse.yzu.edu.tw/SNOSite/), for identifying S-nitrosylation sites on uncharacterized protein sequences.
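The MDD clustering step mentioned in this abstract can be illustrated with a minimal sketch of a single split. It is not the SNOSite implementation: the function names, the toy peptides, and the choice to split on the raw consensus residue are assumptions made for illustration, and a full MDD procedure applies such splits recursively with a significance cutoff.

```python
from collections import Counter


def chi_square(seqs, i, j):
    """Chi-square statistic between 'consensus match at i' and the residue at j."""
    consensus = Counter(s[i] for s in seqs).most_common(1)[0][0]
    n = len(seqs)
    observed = Counter((s[i] == consensus, s[j]) for s in seqs)
    row = Counter(s[i] == consensus for s in seqs)
    col = Counter(s[j] for s in seqs)
    stat = 0.0
    for r in row:
        for c in col:
            expected = row[r] * col[c] / n  # always > 0: r and c were observed
            stat += (observed[(r, c)] - expected) ** 2 / expected
    return stat


def mdd_split(seqs):
    """One MDD split: pick the position most dependent on the others and
    divide the aligned sequences by whether they match its consensus."""
    length = len(seqs[0])
    scores = [
        sum(chi_square(seqs, i, j) for j in range(length) if j != i)
        for i in range(length)
    ]
    best = max(range(length), key=scores.__getitem__)
    consensus = Counter(s[best] for s in seqs).most_common(1)[0][0]
    match = [s for s in seqs if s[best] == consensus]
    rest = [s for s in seqs if s[best] != consensus]
    return best, consensus, match, rest


# Toy aligned peptides (hypothetical): positions 0 and 2 vary together.
toy = ["KAE", "KGE", "KAE", "KGE", "DAC", "DGC"]
position, residue, match, rest = mdd_split(toy)
```

In this sketch each resulting subgroup would then receive its own classifier, which is the role the abstract assigns to the per-motif SVMs.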

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

PlantPhos: using maximal dependence decomposition to identify plant phosphorylation sites with substrate site specificity

Author: C Burge, Cheng-Tsung Lu, DM Shien, E Huala, F Diella, F Gnad, FF Zhou, GE Crooks, H Steen, HD Huang, J Gao, JC Obenauer, JL Heazlewood, JM Stone, KC Chou, LM Iakoucheva, M Schneider, M Steffen, MJ Hubbard, N Blom, Neil Arvin Bretaña, P Diolez, PV Hornbeck, R Aebersold, S Luan, SC Huber, SR Eddy, TD Schneider, TY Lee, Tzong-Yi Lee, V Vacic, Y Xue, YH Wong
Publication venue: BioMed Central
Publication date: 01/01/2011

Background: Protein phosphorylation catalyzed by kinases plays crucial regulatory roles in intracellular signal transduction. Due to the difficulty of performing high-throughput mass spectrometry-based experiments, there is a desire to predict phosphorylation sites using computational methods. However, previous studies of in silico prediction of plant phosphorylation sites do not consider kinase-specific phosphorylation data. Thus, we are motivated to propose a new method that investigates different substrate specificities in plant phosphorylation sites. Results: Experimentally verified phosphorylation data were extracted from TAIR9, a protein database containing 3006 phosphorylation data from the plant species Arabidopsis thaliana. In an attempt to investigate the various substrate motifs in plant phosphorylation, maximal dependence decomposition (MDD) is employed to cluster a large set of phosphorylation data into subgroups containing significantly conserved motifs. A profile hidden Markov model (HMM) is then applied to learn a predictive model for each subgroup. Cross-validation evaluation of the MDD-clustered HMMs yields an average accuracy of 82.4% for serine, 78.6% for threonine, and 89.0% for tyrosine models. Moreover, independent test results using Arabidopsis thaliana phosphorylation data from UniProtKB/Swiss-Prot show that the proposed models correctly predict 81.4% of phosphoserine, 77.1% of phosphothreonine, and 83.7% of phosphotyrosine sites. Interestingly, several MDD-clustered subgroups are observed to have amino acid conservation similar to the substrate motifs of well-known kinases from Phospho.ELM, a database containing kinase-specific phosphorylation data from multiple organisms. Conclusions: This work presents a novel method for identifying plant phosphorylation sites with various substrate motifs.
Based on cross-validation and independent testing, results show that the MDD-clustered models outperform models trained without using MDD. The proposed method has been implemented as a web-based plant phosphorylation prediction tool, PlantPhos (http://csb.cse.yzu.edu.tw/PlantPhos/). Additionally, two case studies have been demonstrated to further evaluate the effectiveness of PlantPhos.

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Exploiting Two-Layered Support Vector Machine to Predict Phosphorylation Sites on Virus Proteins

Publication venue: IACSIT Press
Publication date: 01/01/2013

Crossref

Higher Order PWM for Modeling Transcription Factor Binding Sites

Author: Srinivasan Dhivya
Publication venue: SJSU ScholarWorks
Publication date: 01/10/2013

Traditional Position Weight Matrices (PWMs) that are used to model Transcription Factor Binding Sites (TFBS) assume independence among different positions in the binding site. In reality, this may not necessarily be the case. A better way to model TFBS is to consider the distribution of dinucleotides or trinucleotides instead of just mononucleotides, thus taking neighboring nucleotides into account. We can therefore extend the single-nucleotide PWM to a dinucleotide PWM or an even higher-order PWM to better estimate the dependencies among the nucleotides in a given sequence. The purpose of this project is to develop an algorithm that implements higher-order PWMs to detect TFBS and other biological motifs in DNA, RNA, and proteins.
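The extension from a mononucleotide to a dinucleotide PWM described above can be sketched as follows. This is an illustrative sketch, not the project's algorithm: the function names, the pseudocount of 1, the uniform 1/16 background, and the toy binding sites are all assumptions, and summing overlapping dinucleotide scores is only one of several possible scoring conventions.

```python
import math

ALPHABET = "ACGT"


def train_dinucleotide_pwm(sites, pseudocount=1.0):
    """Build one log-odds column per adjacent position pair of aligned sites.

    Assumes equal-length sites over ACGT and a uniform dinucleotide
    background of 1/16 (an illustrative assumption).
    """
    length = len(sites[0])
    pwm = []
    for pos in range(length - 1):
        counts = {a + b: pseudocount for a in ALPHABET for b in ALPHABET}
        for site in sites:
            counts[site[pos:pos + 2]] += 1
        total = sum(counts.values())
        pwm.append({k: math.log2((v / total) / (1 / 16)) for k, v in counts.items()})
    return pwm


def score(pwm, seq):
    """Sum the log-odds of each overlapping dinucleotide in seq."""
    return sum(column[seq[i:i + 2]] for i, column in enumerate(pwm))


# Toy aligned binding sites (hypothetical data).
sites = ["ACGT", "ACGT", "ACGA", "TCGT"]
pwm = train_dinucleotide_pwm(sites)
site_like = score(pwm, "ACGT")
background_like = score(pwm, "TTTT")
```

Because each column is indexed by a pair of adjacent nucleotides, the model can reward or penalize combinations that a mononucleotide PWM would treat as independent.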

SJSU ScholarWorks

Using Hidden Markov Models to Detect DNA Motifs

Author: Nerli Santrupti
Publication venue: SJSU ScholarWorks
Publication date: 13/05/2015

During gene expression in eukaryotes, mRNA splicing is one of the key processes, carried out by a complex called the spliceosome. The spliceosome guarantees proper removal of introns and joining of exons before translation. Precise splicing is essential for the production of functional proteins. The spliceosome detects specific sequence motifs within an mRNA sequence called splice sites. Two of the splice sites are the 5' and 3' sites that border all introns. If the normal splicing process is disrupted by mutation, fatal diseases may result. In this work, we predict splice sites in the human genome using hidden Markov models (HMMs). Prior to hidden Markov models, we tried to predict splice sites using higher-order position weight matrices. The Position Weight Matrix (PWM) is a conventional computational method used to represent splice sites or any sequence motif. In a set of aligned sequences, a PWM captures the distribution of nucleotides at each position. Simple PWMs perform reasonably well in classifying authentic 5' and 3' splice sites and predicting cryptic splice sites in human genes [1, 2, 3]. However, they are built by making a strong independence assumption between contiguous and non-contiguous nucleotide positions. Therefore, we developed a higher-order PWM method that incorporates the maximal dependence decomposition (MDD) algorithm [4] to identify statistically significant splice sites. A simple PWM also fails to capture sites that lie in both splice site and non-splice site regions. Therefore, we implemented HMMs to overcome this limitation of the PWM. We performed 10-fold cross-validation of all three methods for 5' and 3' authentic human splice sites from the HS3D database [5] and observed that MDD outperforms the other two methods, with areas under the receiver operating characteristic (ROC) curve of 0.96 and 0.93, respectively.
Similarly, we performed classification of 5' and 3' putative cryptic splice sites in the beta-globin (HBB) and breast cancer type 1 susceptibility protein (BRCA1) genes. We observed that MDD performs very well in classifying both BRCA1 and HBB cryptic splice sites, with areas under the ROC curve of 0.99, 0.95, 0.89 and 1.0, respectively. However, we also observed that HMMs perform fairly well in classifying splice sites and cryptic splice sites compared to the traditional PWM method.
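The area under the ROC curve used to compare the methods above can be computed directly from classifier scores. The following is a minimal sketch assuming the pairwise (Mann-Whitney) formulation; the function name and the example scores are hypothetical and are not taken from the thesis.

```python
def roc_auc(pos_scores, neg_scores):
    """Area under the ROC curve as the probability that a randomly chosen
    true site scores higher than a randomly chosen decoy (ties count 0.5)."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


# Hypothetical scores for authentic splice sites and decoy sites.
authentic = [0.9, 0.8, 0.4]
decoys = [0.5, 0.3, 0.1]
auc = roc_auc(authentic, decoys)  # 8 of the 9 pairs are ranked correctly
```

An AUC of 1.0 means every authentic site outscores every decoy, while 0.5 corresponds to random ranking.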

SJSU ScholarWorks

Recognition of short functional motifs in protein sequences

Author: Prytuliak Roman
Publication venue: Ludwig-Maximilians-Universität München
Publication date: 22/06/2018

The main goal of this study was to develop a method for computational de novo prediction of short linear motifs (SLiMs) in protein sequences that would provide advantages over existing solutions for users. The users are typically biological laboratory researchers who want to elucidate a function of a protein that is possibly mediated by a short motif. Such a function can be subcellular localization, secretion, post-translational modification or degradation of proteins. Conducting such studies only with experimental techniques is often associated with high costs and risks of uncertainty. Preliminary prediction of putative motifs with computational methods, which are fast and much less expensive, makes it possible to generate hypotheses and therefore to plan experiments in a more directed and efficient way. To meet this goal, I have developed HH-MOTiF, a web-based tool for de novo discovery of SLiMs in a set of protein sequences. While working on the project, I have also detected patterns in sequence properties of certain SLiMs that make their de novo prediction easier. As some of these patterns are not yet described in the literature, I am sharing them in this thesis. While evaluating and comparing motif prediction results, I have identified conceptual gaps in theoretical studies, as well as in existing practical solutions, for comparing two sets of positional data annotating the same set of biological sequences. To close these gaps and to be able to carry out in-depth performance analyses of HH-MOTiF in comparison to other predictors, I have developed a corresponding statistical method, SLALOM (for StatisticaL Analysis of Locus Overlap Method). It is currently available as a standalone command line tool.

Incorporating substrate sequence motifs and spatial amino acid composition to identify kinase-specific phosphorylation sites on protein three-dimensional structures

Author: A Zanzoni, B Kobe, C-C Chang, C-J Lin, CT Lu, DM Shien, F Gnad, G Manning, GR Mishra, H Dinkel, H Li, HD Huang, HM Berman, J Wan, JC Obenauer, JH Kim, K Bryson, LJ McGuffin, M Steffen, Min-Gang Su, MJ Hubbard, ML Miller, N Blom, N Farriol-Mathis, NA Bretana, NF Saunders, P Durek, PV Hornbeck, R Linding, S Ahmad, SA Chen, TD Schneider, TY Lee, Tzong-Yi Lee, W Kabsch, WC Chang, Y Xue, YH Wong
Publication venue: Springer Science and Business Media LLC

Crossref

Investigation and identification of protein γ-glutamyl carboxylation sites

Author: Bretaña Neil Arvin, Chen Shu-An, Cheng Tzu-Hsiu, Huang Kai-Yao, Lee Tzong-Yi, Lu Cheng-Tsung, Su Min-Gang
Publication venue: BioMed Central
Publication date: 01/01/2011

Background: Carboxylation is a post-translational modification of glutamate (Glu) residues that is catalyzed by γ-glutamyl carboxylase in the lumen of the endoplasmic reticulum. Vitamin K is a critical co-factor in the post-translational conversion of Glu residues to γ-carboxyglutamate (Gla) residues. It has been shown that the process of carboxylation is involved in the blood clotting cascade, bone growth, and extraosseous calcification. However, studies in this field have been limited by the difficulty of experimentally studying substrate site specificity in γ-glutamyl carboxylation. In silico investigations have the potential to characterize carboxylated sites before experiments are carried out. Results: Because of the importance of γ-glutamyl carboxylation in biological mechanisms, this study investigates the substrate site specificity of carboxylation sites. It considers not only the composition of amino acids that surround carboxylation sites, but also the structural characteristics of these sites, including secondary structure and solvent-accessible surface area (ASA). The explored features are used to establish a predictive model for differentiating between carboxylation sites and non-carboxylation sites. A support vector machine (SVM) is employed to establish a predictive model with various features. A five-fold cross-validation evaluation reveals that the SVM model, trained with the combined features of positional weighted matrix (PWM), amino acid composition (AAC), and ASA, yields the highest accuracy (0.892). Furthermore, an independent testing set is constructed to evaluate whether the predictive model is over-fitted to the training set. Conclusions: Independent testing data that did not undergo the cross-validation process show that the proposed model can differentiate between carboxylation sites and non-carboxylation sites.
This investigation is the first to study carboxylation sites and to develop a system for identifying them. The proposed method is a practical means of preliminary analysis and greatly diminishes the total number of potential carboxylation sites requiring further experimental confirmation.
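The amino acid composition (AAC) feature named in this abstract can be sketched as a 20-dimensional frequency vector computed over the residues flanking a candidate Glu. This is a minimal illustration under stated assumptions: the function name, the example window, and the use of "-" as padding for positions beyond the protein termini are hypothetical, and the PWM and ASA features are not shown.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_composition(window):
    """Return the fraction of each of the 20 standard amino acids in a
    sequence window, ignoring padding or non-standard characters."""
    residues = [aa for aa in window if aa in AMINO_ACIDS]
    total = len(residues)
    if total == 0:
        return [0.0] * len(AMINO_ACIDS)
    return [residues.count(aa) / total for aa in AMINO_ACIDS]


# Hypothetical window around a candidate site, padded with "-" at the terminus.
features = amino_acid_composition("AAEK-")
```

A vector like this could be concatenated with other per-site features before being passed to a classifier such as an SVM.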

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Recognition of short functional motifs in protein sequences

Author: Prytuliak Roman
Publication venue: Ludwig-Maximilians-Universität München
Publication date: 22/06/2018

Digitale Hochschulschriften der LMU