
Accurate Prediction of Protein Structural Class

Author: AG Murzin
CA Orengo
CB Anfinsen
G Deleage
H Nakashima
HB Shen
I Bahar
JY Yang
KC Chou
KD Kedarisetti
KD Pruitt
L Dong
L Kurgan
Meng Ge
MJ Mizianty
P Baldi
RY Luo
S Costantini
SE Brenner
SF Altschul
SM Muska
T Liu
TG Liu
Vladimir N. Uversky
W Li
WS Bu
X Xiao
Xia-Yu Xia
Xian-Ming Pan
XM Pan
Y Cai
YD Cai
ZC Li
Zhi-Xin Wang
ZX Wang
Publication venue: Public Library of Science
Publication date: 01/01/2012

Because of the increasing gap between the data from sequencing and structural genomics, accurately predicting the structural class of a protein domain solely from its primary sequence has remained a challenging problem in structural biology. Traditional sequence-based predictors generally select several sequence features and feed them directly into a classification program to identify the structural class. The current best sequence-based predictor achieved an overall accuracy of 74.1% when tested on 25PDB, a widely used non-homologous benchmark dataset. In the present work, we built a multiple linear regression (MLR) model to convert the 440-dimensional (440D) sequence feature vector extracted from the Position Specific Scoring Matrix (PSSM) of a protein domain into a 4-dimensional (4D) structural feature vector, which can then be used to predict the four major structural classes. We performed 10-fold cross-validation and jackknife tests of the method on a large non-homologous dataset containing 8,244 domains distributed among the four major classes. Our approach outperformed all existing sequence-based methods, with an overall accuracy of 83.1%, which is even higher than the results of methods based on predicted secondary structure.
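The MLR step can be illustrated with a minimal sketch, assuming ordinary least squares with an intercept and one coefficient vector per output dimension. The toy 2D features and 4D targets below are hypothetical stand-ins for the 440D PSSM features and the paper's 4D structural vector, and the argmax decision rule in `predict` is an assumption, not necessarily the authors' rule.

```python
# Sketch only: toy 2D features and toy 4D targets stand in for the paper's
# 440D PSSM features and its 4D structural feature vector (both hypothetical here).
CLASSES = ["all-alpha", "all-beta", "alpha/beta", "alpha+beta"]


def solve(a, b):
    """Solve the square linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_mlr(xs, ys):
    """Ordinary least squares via the normal equations, one coefficient vector
    (intercept first) per output dimension."""
    rows = [[1.0] + list(x) for x in xs]  # prepend an intercept column
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    weights = []
    for k in range(len(ys[0])):
        xty = [sum(r[i] * y[k] for r, y in zip(rows, ys)) for i in range(p)]
        weights.append(solve(xtx, xty))
    return weights


def predict(weights, x):
    """Map a feature vector to the 4D output, then pick the class with the largest
    component (an assumed decision rule, not necessarily the paper's)."""
    row = [1.0] + list(x)
    out = [sum(w * v for w, v in zip(wk, row)) for wk in weights]
    return out, CLASSES[out.index(max(out))]


xs = [(0, 0), (1, 0), (0, 1), (2, 3), (-1, 2)]
ys = [[1 + a, 1 - a, 1 + b, 1 - b] for a, b in xs]  # exactly linear toy targets
w = fit_mlr(xs, ys)
print(predict(w, (2, 0.5)))
```

Because the toy targets are exactly linear in the features, the fit recovers them exactly; with real PSSM features, the regression output would only approximate the structural vector.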

CiteSeerX

FigShare

BLProt: prediction of bioluminescent proteins based on support vector machine and relieff feature selection

Author: A Marchler-Bauer
CC Chang
CH Contag
D Aha
E Frank
EH White
EL Sonnhammer
F Sanger
G Pugalenthi
Ganesan Pugalenthi
HC Peng
JE Gonzalez
JE Lloyd
JW Hastings
Kai-Uwe Kalies
KR Muller
Krishna Kumar Kandaswamy
L Breiman
LM DiPilato
M Chalfie
M Haindl
M Kanehisa
Mehrnaz Khodam Hazrati
MJ Cormier
MJ Hayes
R Quinlan
S Hunter
S Kawashima
SF Altschul
SHD Haddock
SR Eddy
SR Kain
T Joachims
T Wilson
Thomas Martinetz
V Vapnik
W Li
WW Ward
XB Zhou
Y Freund
Y Zhang
YD Cai
ZH Zhang
Publication venue: BioMed Central
Publication date: 01/01/2011

Background: Bioluminescence is a process in which light is emitted by a living organism. Most light-emitting creatures live in the sea, but some insects, plants, fungi, etc. also emit light. The biotechnological application of bioluminescence has become routine and is considered essential for many medical and general technological advances. Identifying bioluminescent proteins is challenging because they share little sequence similarity. So far, no specific method has been reported to identify bioluminescent proteins from primary sequence. Results: In this paper, we propose a novel predictive method that uses a Support Vector Machine (SVM) and physicochemical properties to predict bioluminescent proteins. BLProt was trained on a dataset of 300 bioluminescent and 300 non-bioluminescent proteins, and evaluated on an independent set of 141 bioluminescent and 18,202 non-bioluminescent proteins. To identify the most prominent features, we carried out feature selection with three different filter approaches: ReliefF, infogain, and mRMR. We selected five feature subsets of decreasing size and evaluated the performance of each. Conclusion: BLProt achieves 80% accuracy in training (5-fold cross-validation) and 80.06% accuracy in testing. Its performance was compared with that of BLAST and HMM. The high prediction accuracy and successful prediction of hypothetical proteins suggest that BLProt can be a useful approach for identifying bioluminescent proteins from sequence information, irrespective of their sequence similarity. The BLProt software is available at http://www.inb.uni-luebeck.de/tools-demos/bioluminescent%20protein/BLProt
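The idea behind Relief-family filters can be sketched briefly. This is the basic two-class Relief algorithm (one nearest hit and one nearest miss per sample, Manhattan distance), a simplification of the ReliefF variant named above, which averages over k neighbours and handles multiple classes. The sample data are hypothetical and features are assumed pre-scaled to [0, 1].

```python
def relief_weights(samples, labels):
    """Basic two-class Relief: for every sample, find its nearest hit (same class)
    and nearest miss (other class), then raise the weight of features that differ
    on the miss and lower the weight of features that differ on the hit.
    Features are assumed already scaled to [0, 1]."""
    n, d = len(samples), len(samples[0])
    weights = [0.0] * d

    def dist(a, b):
        return sum(abs(x - y) for x, y in zip(a, b))  # Manhattan distance

    for i in range(n):
        xi, yi = samples[i], labels[i]
        others = [j for j in range(n) if j != i]
        hit = min((j for j in others if labels[j] == yi), key=lambda j: dist(xi, samples[j]))
        miss = min((j for j in others if labels[j] != yi), key=lambda j: dist(xi, samples[j]))
        for f in range(d):
            weights[f] += (abs(xi[f] - samples[miss][f]) - abs(xi[f] - samples[hit][f])) / n
    return weights


# Hypothetical data: feature 0 separates the classes, feature 1 is noise.
samples = [[0.0, 0.2], [0.1, 0.9], [0.9, 0.1], [1.0, 0.8]]
labels = [0, 0, 1, 1]
print(relief_weights(samples, labels))
```

A filter approach would then keep the top-ranked features and hand only those to the SVM; the ranking is computed once, independently of the classifier.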

Springer - Publisher Connector

Predicting Anatomical Therapeutic Chemical (ATC) Classification of Drugs by Integrating Chemical-Chemical Interactions and Similarities

Author: DN Georgiou
GA Watson
GP Zhou
H Gurulingappa
H Mohabatkar
IW Althaus
J Andraos
J Lin
Kai-Yan Feng
KC Chou
Kuo-Chen Chou
L Hu
Lei Chen
M Dunkel
M Esmaeili
M Hattori
M Kanehisa
M Kuhn
Ozlem Keskin
P Jaccard
P Wang
Q Gu
R Sharan
T Huang
U Karaoz
Wei-Ming Zeng
WZ Lin
X Xiao
YD Cai
Yu-Dong Cai
ZC Wu
Publication venue: Public Library of Science
Publication date: 13/04/2012

The Anatomical Therapeutic Chemical (ATC) classification system, recommended by the World Health Organization, categorizes drugs into different classes according to their therapeutic and chemical characteristics. For a set of query compounds, how can we identify which ATC class (or classes) they belong to? This is an important and challenging problem because the information thus obtained would be quite useful for drug development and utilization. By hybridizing information on chemical-chemical interactions and chemical-chemical similarities, we developed a novel method for this purpose. In a jackknife test on a benchmark dataset of 3,883 drug compounds, the overall success rate of the prediction method was about 73% in identifying the drugs among the following 14 main ATC classes: (1) alimentary tract and metabolism; (2) blood and blood forming organs; (3) cardiovascular system; (4) dermatologicals; (5) genitourinary system and sex hormones; (6) systemic hormonal preparations, excluding sex hormones and insulins; (7) anti-infectives for systemic use; (8) antineoplastic and immunomodulating agents; (9) musculoskeletal system; (10) nervous system; (11) antiparasitic products, insecticides and repellents; (12) respiratory system; (13) sensory organs; (14) various. This success rate is substantially higher than the roughly 7% (1/14) expected from random guessing. It has not escaped our notice that the current method can be straightforwardly extended to identify the drugs' 2nd-, 3rd-, 4th-, and 5th-level ATC classifications once statistically significant benchmark data become available for these lower levels.
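A jackknife evaluation of this kind of method can be sketched as a leave-one-out nearest-partner predictor. Everything specific here is an assumption for illustration: the hybrid score (a weighted sum `alpha` of an interaction confidence and a similarity, both in [0, 1]), the hit criterion (predicted and true class sets overlap), and the toy drugs and scores. The paper's actual scoring and success-rate definition may differ.

```python
def combined_score(a, b, interaction, similarity, alpha=0.5):
    """Hypothetical hybrid score: weighted sum of an interaction confidence and a
    chemical similarity for the unordered pair (a, b), both assumed in [0, 1]."""
    key = frozenset((a, b))
    return alpha * interaction.get(key, 0.0) + (1 - alpha) * similarity.get(key, 0.0)


def jackknife_accuracy(drugs, classes, interaction, similarity):
    """Leave each drug out in turn, predict the class set of its highest-scoring
    partner among the remaining drugs, and count a hit if the sets overlap."""
    hits = 0
    for q in drugs:
        rest = [d for d in drugs if d != q]
        best = max(rest, key=lambda d: combined_score(q, d, interaction, similarity))
        hits += bool(classes[q] & classes[best])
    return hits / len(drugs)


# Hypothetical toy data: two "nervous system" drugs and two "cardiovascular" drugs.
drugs = ["A", "B", "C", "D"]
classes = {"A": {"N"}, "B": {"N"}, "C": {"C"}, "D": {"C"}}
interaction = {frozenset("AB"): 0.9, frozenset("CD"): 0.8}
similarity = {frozenset("AC"): 0.3}
print(jackknife_accuracy(drugs, classes, interaction, similarity))
print(round(100 / 14, 1))  # random-guess baseline over 14 classes, in percent
```

The last line makes the baseline in the abstract explicit: with 14 equally likely classes, a random guess is right about 7.1% of the time.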

FigShare

'Unite and conquer': enhanced prediction of protein subcellular localization by integrating multiple specialized tools

Author: A Bulashevska
A Krogh
C Andreoli
C Guda
CS Yu
E Badidi
E Frank
GE Tusnady
Gertraud Burger
H Bannai
H Shatkay
HB Shen
I Small
JL Heazlewood
JR Quinlan
JY Shi
KC Chou
KJ Park
L Kall
M Bhasin
M Boden
MG Claros
MS Scott
N Pfanner
N Wiedemann
O Emanuelsson
P Donnes
QB Gao
S Džeroski
S Hua
S Matsuda
SHB Chou KC
T Hirokawa
T Zhang
W Li
X Xiao
Y Huang
Yao Qing Shen
YD Cai
YL Chen
YX Pan
Z Lu
Publication venue: BioMed Central
Publication date: 01/01/2007

Background: Knowing the subcellular location of proteins provides clues to their function as well as to the interconnectivity of biological processes. Dozens of tools are available for predicting protein location in the eukaryotic cell. Each tool performs well on certain data sets, but their predictions often disagree for a given protein. Since each tool has particular strengths, we set out to integrate them in a way that optimally exploits their potential. The method we present here is applicable to various subcellular locations, but tailored for predicting whether or not a protein is localized in mitochondria. Knowledge of the mitochondrial proteome is relevant to understanding the role of this organelle in global cellular processes. Results: To develop a method for enhanced prediction of subcellular localization, we integrated the outputs of available localization prediction tools using several strategies, and tested the performance of each strategy with known mitochondrial proteins. The accuracy obtained (up to 92%) far surpasses that of the individual tools. The method of integration proved crucial to performance. For predicting mitochondrion-located proteins, integration via a two-layer decision tree clearly outperforms simpler methods, as it allows emphasis on biologically relevant features such as the mitochondrial targeting peptide and transmembrane domains. Conclusion: We developed an approach that enhances the prediction accuracy for mitochondrial proteins by uniting the strengths of specialized tools. Combining machine-learning-based integration with biological expert knowledge improves performance. This approach also alleviates the conundrum of how to choose between conflicting predictions. Our approach is easy to implement and applicable to predicting subcellular locations other than mitochondria, as well as other biological features.
For a trial of our approach, we provide a web service for mitochondrial protein prediction (named YimLOC), which can be accessed through the AnaBench suite at http://anabench.bcm.umontreal.ca/anabench/. The source code is provided in Additional file 2 (1471-2105-8-420-S2.pdf), which contains scripts for the online server YimLOC. Please note that these scripts only code for the ready-to-use STACK-mem-DT described in the main text; they do not provide the training process.
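The contrast between a simple integration strategy and one that lets biological features steer the decision can be sketched as follows. The tool names, the feature inputs, and the vote thresholds in `two_layer_rule` are all hypothetical and hand-set for illustration; the paper's tree (STACK-mem-DT) is learned from data and is not reproduced here.

```python
def majority_vote(predictions):
    """Baseline integration: call a protein mitochondrial if more than half of the tools say so."""
    return sum(predictions.values()) > len(predictions) / 2


def two_layer_rule(predictions, has_targeting_peptide, n_tm_segments):
    """Hypothetical two-layer decision: layer 1 branches on biological features,
    layer 2 applies a feature-dependent vote threshold over the tool outputs.
    The thresholds are illustrative assumptions, not learned values."""
    votes = sum(predictions.values())
    if has_targeting_peptide:   # strong biological signal
        return votes >= 1       # accept if any single tool agrees
    if n_tm_segments > 0:       # membrane protein without a targeting peptide
        return votes >= 2
    return votes == len(predictions)  # otherwise require unanimity


# Hypothetical outputs of three localization tools for one protein.
preds = {"tool_a": True, "tool_b": False, "tool_c": False}
print(majority_vote(preds), two_layer_rule(preds, True, 0))
```

The point of the second function is that the same tool outputs can lead to different calls depending on features such as a predicted targeting peptide, which a plain vote cannot express.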

Springer - Publisher Connector

Cell-cycle-dependent transcriptional and translational DNA-damage response of 2 ribonucleotide reductase genes in S. cerevisiae

Author: Altschuler SJ
An X
Bregman A
Cai L
Cerqueira NM
Chabes A
Domkin V
Elledge SJ
Hoeijmakers JH
Jordheim LP
Lee YD
Nick McElhinny SA
Perlstein DL
Raj A
Sabouri N
Tan RZ
Tomar RS
Trcek T
Yao R
Youk H
Zenklusen D
Zhao X
Publication venue: American Society for Microbiology
Publication date: 01/11/2012

The ribonucleotide reductase (RNR) enzyme catalyzes an essential step in the production of deoxyribonucleotide triphosphates (dNTPs) in cells. Bulk biochemical measurements in synchronized Saccharomyces cerevisiae cells suggest that RNR mRNA production is maximal in late G1 and S phases; however, damaged DNA induces RNR transcription throughout the cell cycle. But such en masse measurements reveal neither cell-to-cell heterogeneity in responses nor direct correlations between transcript and protein expression or localization in single cells, which may be central to function. We overcame these limitations by simultaneously detecting single RNR transcripts and Rnr proteins in the same individual asynchronous S. cerevisiae cells, with and without DNA damage by methyl methanesulfonate (MMS). Surprisingly, RNR subunit mRNA levels were comparably low in both damaged and undamaged G1 cells and highly induced in damaged S/G2 cells. Transcript numbers became correlated with both protein levels and localization only upon DNA damage, in a cell cycle-dependent manner. Further, we showed that the differential RNR response to DNA damage correlated with variable Mec1 kinase activity across the cell cycle in single cells. RNR gene transcription was found to be noisy and non-Poissonian. Our results provide vital insight into cell cycle-dependent RNR regulation under conditions of genotoxic stress.

Funding: Massachusetts Institute of Technology. Center for Environmental Health Sciences (deriving from NIH P30-ES002109); National Institutes of Health (U.S.) (grant R01-CA055042); National Institutes of Health (U.S.) (grant DP1-OD006422); Massachusetts Institute of Technology (CSBi Merck-MIT Fellowship).
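One common way to check whether transcript counts are consistent with a Poisson process is the Fano factor (variance divided by mean), which equals 1 for a Poisson distribution. The abstract does not say which statistic the authors used, so this is an illustrative sketch; the per-cell counts below are hypothetical, not data from the study.

```python
from statistics import mean, pvariance


def fano_factor(counts):
    """Variance-to-mean ratio of per-cell transcript counts.
    A Poisson process gives 1; values well above 1 indicate super-Poissonian (bursty) noise."""
    m = mean(counts)
    return pvariance(counts, mu=m) / m


# Hypothetical per-cell mRNA counts, not data from the study.
counts = [0, 0, 0, 1, 10, 0, 12, 1]
print(fano_factor(counts))
```

Here most cells carry almost no transcripts while a few carry many, so the variance far exceeds the mean and the Fano factor is well above 1.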

DSpace@MIT

Aberdeen University Research

The impact of point mutations in the human androgen receptor: classification of mutations on the basis of transcriptional activity

Author: A Godoy
A Haelens
A Monge
AN Vis
AO Brinkmann
B Cinar
B Gottlieb
B He
B Tiwary
BA Evans
C Cai
CA Berrevoets
CA Heinlein
Colin W. Hay
CT Kesler
CT Wu
D Ricketson
DJ Lamb
EB Askew
EP Gelmann
ER Hyytinen
G Buchanan
G Chen
G Verrijdt
GN Brooke
GP Reddy
H Faus
H Takahashi
HI Scher
HJ Dubbink
HT Bruggenwirth
I Ahrens-Fath
Iain J. McEwan
IJ McEwan
J Brodie
J Duff
J Edwards
J Geller
J Reid
J Tan
J Veldscholte
JA Locke
JD Wilson
Jean-Marc A. Lobaccaro
K Evaul
K Haapala
KB Cleutjens
KE Knudsen
KH Chang
KK Waltering
KM Lakshman
LJ Blok
M Fu
M Jagla
M Marcelli
M Sun
M Uemura
M Welsh
ME Baker
ME Taplin
MJ Linja
MM Centenera
MM Shen
N Mononen
OA O'Mahony
PA Watson
PE de Ruiter
Q Wang
R Betney
R Chenna
R Hu
R Schule
RJ Andersen
S Baron
S Haile
S Koochekpour
SM Dehm
SM Powell
SS Dutt
SS Taneja
ST Page
TH Li
TM Tanner
W Li
Y Chen
Y Gluzman
Y Niu
Y Ogino
YD Li
Z Guo
Publication venue: Public Library of Science (PLoS)
Publication date: 01/01/2011

Peer reviewed. Publisher PDF

CiteSeerX

FigShare

Application of amino acid occurrence for discriminating different folding types of globular proteins

Author: AG Murzin
H Zhou
HB Shen
HQD Ding
J Cheng
J Shi
KC Chou
M Michael Gromiha
MM Gromiha
P Klein
QS Du
R Development Core Team
T Hirokawa
TS Kumarevel
WS Bu
Y Ofran
Y-h Taguchi
YD Cai
ZZ Wang
Publication venue: BioMed Central
Publication date: 01/10/2007

Background: Predicting the three-dimensional structure of a protein from its amino acid sequence is a long-standing goal in computational and molecular biology. Discriminating among structural classes and among folding types are intermediate steps in protein structure prediction. Results: In this work, we propose a method based on linear discriminant analysis (LDA) for discriminating 30 different folding types of globular proteins using amino acid occurrence. Tested on a non-redundant set of 1,612 proteins, our method discriminated them with an accuracy of 38%, which is comparable to or better than other methods in the literature. A web server has been developed for discriminating the folding type of a query protein from its amino acid sequence; it is available at http://granular.com/PROLDA/. Conclusion: Amino acid occurrence has been successfully used to discriminate different folding types of globular proteins. The discrimination accuracy obtained with amino acid occurrence is better than that obtained with amino acid composition and/or amino acid properties. In addition, the method obtains results very quickly.
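The difference between the two feature types compared above can be shown in a few lines, assuming the usual meanings: occurrence is the raw count of each of the 20 standard amino acids, and composition is that count divided by sequence length. This sketch only builds the feature vectors; the LDA classifier trained on them is not shown.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues, one-letter codes


def occurrence(seq):
    """Raw count of each standard amino acid in the sequence (occurrence)."""
    return [seq.count(aa) for aa in AMINO_ACIDS]


def composition(seq):
    """Counts normalized by sequence length (composition)."""
    n = len(seq)
    return [c / n for c in occurrence(seq)]


print(occurrence("ACCA")[:3], composition("ACCA")[:3])
```

Two sequences with identical composition but different lengths produce different occurrence vectors, so occurrence retains length information that composition discards.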

eScholarship - University of California


Cosmogenic neutron production at the Sudbury Neutrino Observatory

Author: Aharmim B
Ahmed SN
Anthony AE
Barros N
Beier EW
Bellerive A
Beltran B
Bergevin M
Biller SD
Bonventre R
Boudjemline K
Boulay MG
Cai B
Callaghan EJ
Caravaca J
Chan YD
Chauhan D
Chen M
Cleveland BT
Cox GA
Curley R
Dai X
Deng H
Descamps FB
Detwiler JA
Doe PJ
Doucas G
Drouin PL
Dunford M
Elliott SR
Evans HC
Ewan GT
Farine J
Fergani H
Fleurot F
Ford RJ
Formaggio JA
Gagnon N
Gilje K
Goon JTM
Graham K
Guillian E
Habib S
Hahn RL
Hallin AL
Hallman ED
Harvey PJ
Hazama R
Heintzelman WJ
Heise J
Helmer RL
Hime A
Howard C
Huang M
Jagam P
Jamieson B
Jelley NA
Jerkins M
Keeter KJ
Klein JR
Kormos LL
Kos M
Kraus C
Krauss CB
Krüger A
Kutter T
Kyba CCM
Kéfélian C
Land BJ
Lange R
Law J
Lawson IT
Lesko KT
Leslie JR
Levine I
Loach JC
MacLellan R
Majerus S
Mak HB
Maneira J
Martin RD
Mastbaum A
McCauley N
McDonald AB
McGee SR
Miller ML
Monreal B
Monroe J
Nickel BG
Noble AJ
O'Keeffe HM
Oblath NS
Okada CE
Ollerhead RW
Orebi Gann GD
Oser SM
Ott RA
Peeters SJM
Poon AWP
Prior G
Publication venue: eScholarship, University of California
Publication date: 12/12/2019

Neutrons produced in nuclear interactions initiated by cosmic-ray muons present an irreducible background to many rare-event searches, even in detectors located deep underground. Models for the production of these neutrons have been tested against previous experimental data, but their extrapolation to deeper sites is not well understood. Here we report results from an analysis of cosmogenically produced neutrons at the Sudbury Neutrino Observatory. We present a specific set of observables that can be used to benchmark the validity of Geant4 physics models. In addition, the cosmogenic neutron yield, in units of 10⁻⁴ cm²/(g·μ), is measured to be 7.28 ± 0.09 (stat) +1.59/-1.12 (syst) in pure heavy water and 7.30 ± 0.07 (stat) +1.40/-1.02 (syst) in NaCl-loaded heavy water. These results provide unique insights into this potential background source for experiments at SNOLAB.
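The yield unit cm²/(g·μ) can be read as neutrons per muon per unit column density traversed. Assuming the conventional definition (the abstract does not spell it out), with N_n the efficiency-corrected number of neutrons, N_μ the number of muons, ρ the target density, and ⟨L⟩ the mean muon track length in the target:

```latex
Y = \frac{N_n}{N_\mu \, \rho \, \langle L \rangle},
\qquad
[Y] = \frac{1}{\mu \cdot \mathrm{g\,cm^{-3}} \cdot \mathrm{cm}}
    = \frac{\mathrm{cm^{2}}}{\mathrm{g}\cdot\mu}
```

On this reading, the pure heavy water result corresponds to about 7.28 × 10⁻⁴ neutrons per muon per g/cm² of heavy water traversed.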

Assessing protein similarity with Gene Ontology and its use in subnuclear localization prediction

Author: AK Bjorklund
B Rost
BW Matthews
D Sarda
G Dellaire
H Wu
J Wang
JL Gardy
K Itoh
K Nakai
K Tu
KC Chou
L Cocco
M Bhasin
MA Harris
P Zhang
PW Lord
R Gentleman
R Nair
V Brendel
X Lu
X Wu
Yang Dai
YD Cai
Z Lei
Zhengdeng Lei
ZP Feng
Publication venue: BioMed Central
Publication date: 01/01/2006

BACKGROUND: The completion of the various genome sequencing projects has resulted in the accumulation of a massive amount of gene sequence information. This calls for large-scale computational methods for predicting protein localization from sequence. The localization of a protein can provide valuable information about its molecular function, as well as the biological pathway in which it participates. Predicting the localization of a protein at the subnuclear level is a challenging task. In our previous work we proposed an SVM-based system that uses protein sequence information for this prediction task. In this work, we assess protein similarity with Gene Ontology (GO) and then improve the performance of the system by adding a nearest-neighbor classifier module that uses a similarity measure derived from the GO annotation terms of protein sequences. RESULTS: The performance of the new system was compared with that of our previous system using a set of proteins residing in 6 localizations collected from the Nuclear Protein Database (NPD). The overall MCC (accuracy) rose from 0.284 (50.0%) to 0.519 (66.5%) for single-localization proteins in leave-one-out cross-validation, and from 0.420 (65.2%) to 0.541 (65.2%) for an independent set of multi-localization proteins. The new system is available at . CONCLUSION: The prediction of protein subnuclear localizations can be strongly influenced by how similarity between a pair of proteins is defined, which in turn depends on the similarity measure used for GO terms. Using the sum of similarity scores over the matched GO term pairs of two proteins as the similarity definition produced the best predictive outcome. Substantial improvement in predicting protein subnuclear localizations has been achieved by combining Gene Ontology with sequence information.
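The best-performing similarity definition, a sum of term similarities over matched GO term pairs feeding a nearest-neighbor classifier, can be sketched as below. Two parts are assumptions, labeled in the code: how pairs are "matched" (here, each term of the query is paired with its most similar term of the other protein) and the term-level similarity (here, a placeholder exact-match function). The term IDs and localization labels are hypothetical.

```python
def protein_similarity(terms_a, terms_b, term_sim):
    """Sum of term similarities over matched GO term pairs.
    Assumption: each term of A is matched to its most similar term of B."""
    if not terms_a or not terms_b:
        return 0.0
    return sum(max(term_sim(t, u) for u in terms_b) for t in terms_a)


def nearest_neighbor_location(query_terms, training, term_sim):
    """1-nearest-neighbor: return the localization of the most similar training protein.
    `training` is a list of (set_of_go_terms, localization) pairs."""
    _, best_location = max(
        training, key=lambda item: protein_similarity(query_terms, item[0], term_sim)
    )
    return best_location


def exact_match(t, u):
    """Placeholder term similarity (hypothetical): 1 for identical terms, else 0."""
    return 1.0 if t == u else 0.0


# Hypothetical training proteins annotated with placeholder term IDs.
training = [({"t1", "t2"}, "nucleolus"), ({"t3"}, "speckle")]
print(nearest_neighbor_location({"t1"}, training, exact_match))
```

In practice `exact_match` would be replaced by a graded GO term similarity measure, which is exactly the choice the abstract reports as strongly affecting performance.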

Springer - Publisher Connector