20 research outputs found

Analysis and prediction of cancerlectins using evolutionary and domain information

Author: A Garg
A Thies
Bharat Panwar
C Chen
CK Ching
D Damodaran
E Gorelik
EG De Mejia
EM Zdobnov
FT Liu
Gajendra PS Raghava
GD Fasman
GR Vasta
H Ding
H Kaur
H Lis
J Adam
Jagat S Chauhan
KA Mahon
KC Chou
KV Brinda
L Jun-Wei
M Bhasin
M Jogindra Swamy
M Kumar
M Rashid
M Vijayan
N Sharon
P Baldi
R Duncan
R Kaundal
R Lotan
R Verma
Ravi Kumar
S Dejun
S Hu
S Nakahara
SF Altschul
SH Choi
T Joachims
T Shirai
T Szoke
U Schumacher
V Vapnik
WR Pearson
XF Bai
Y K Song
YK Song
Publication venue: BioMed Central
Publication date: 01/01/2011

Abstract
Background: Predicting the function of a protein is one of the major challenges in the post-genomic era, in which protein sequences of unknown function are accumulating rapidly. Lectins are proteins that specifically recognize and bind to carbohydrate moieties present on proteins or lipids. Cancerlectins are lectins that play important roles in tumor cell differentiation and metastasis. Although the two types of proteins are linked, no computational method is yet available that can distinguish cancerlectins from the large pool of non-cancerlectins. Hence, it is imperative to develop a method that can distinguish between cancerlectins and non-cancerlectins.
Results: All the models developed in this study are based on a non-redundant dataset containing 178 cancerlectins and 226 non-cancerlectins in which no two sequences have more than 50% sequence similarity. We applied a similarity-search technique, BLAST, and achieved a maximum accuracy of 43.25%. Amino acid compositional analysis showed that certain residues (e.g. leucine, proline) were preferred in cancerlectins, whereas others (e.g. aspartic acid, asparagine) were preferred in non-cancerlectins. The PROSITE domain "Crystalline beta gamma" was abundant in cancerlectins, whereas domains such as "SUEL-type lectin domain" were found mainly in non-cancerlectins. An SVM-based model using amino acid composition achieved a maximum Matthews correlation coefficient (MCC) of 0.32 with an accuracy of 64.84%. A model based on dipeptide composition achieved an MCC of 0.30 with an accuracy of 64.84%. Models based on split compositions (2 and 4 parts) achieved MCC values of 0.31 and 0.32 with accuracies of 65.10% and 66.09%, respectively. An SVM model based on the Position Specific Scoring Matrix (PSSM) generated by PSI-BLAST achieved an MCC of 0.36 with an accuracy of 68.34%. Finally, we integrated the PROSITE domain information with the PSSM and developed an SVM model that achieved an MCC of 0.38 with 69.09% accuracy.
Conclusion: BLAST was found to be inefficient at distinguishing between cancerlectins and non-cancerlectins. We analyzed the protein sequences of both classes and identified PROSITE domains that are preferred in each, which provides insight into the two types of proteins. The method developed in this study will be useful for researchers studying cancerlectins, lectins and cancer biology. The web server based on this study is available at http://www.imtech.res.in/raghava/cancer_pred/
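The amino acid and split composition features mentioned in the abstract can be sketched as follows. This is a minimal illustration and not the authors' code: the function names are hypothetical, and the way a sequence is cut into parts (roughly equal contiguous segments) is an assumption, since the abstract does not say how the split was made.

```python
from collections import Counter

# The 20 standard amino acids, in a fixed order so feature vectors line up.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_composition(seq):
    """Return 20 values: the percentage of each standard residue in seq."""
    seq = seq.upper()
    # Non-standard symbols (X, B, Z, gaps) are ignored; this is an assumption.
    counts = Counter(ch for ch in seq if ch in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        return [0.0] * len(AMINO_ACIDS)
    return [100.0 * counts[aa] / total for aa in AMINO_ACIDS]


def split_composition(seq, parts):
    """Concatenate the compositions of `parts` contiguous segments of seq.

    Segment boundaries are placed at roughly equal intervals (assumed).
    The result has 20 * parts values.
    """
    n = len(seq)
    bounds = [round(i * n / parts) for i in range(parts + 1)]
    features = []
    for start, end in zip(bounds, bounds[1:]):
        features.extend(amino_acid_composition(seq[start:end]))
    return features
```

For a 2-part split, each sequence yields a 40-value vector; a 4-part split yields 80 values. Vectors of this kind would then be passed to an SVM trainer.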

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Comparative Study of Radiation Shielding Parameters for Bismuth Borate Glasses

Author: Bashter I
Kaewkhao J
Kaur U
Kirdsiri K
Kurudirek M
Lee CM
Limkitjaroenporn P
Medhat ME
Nogami M
Pathak D
Rajinder Singh Kaundal
Singh C
Singh K
Singh KJ
Singh N
Singh S
Publication venue: 'FapUNIFESP (SciELO)'

Crossref

Support Vector Machine based method to distinguish proteobacterial proteins from eukaryotic plant proteins

Author: B Magnin
CD Creelman
D Emerson
FM Chappell
HR Ansari
IA Naguib
IWY Dondoshansky
J Fletcher
JP Daures
JP Vert
L O'Dwyer
N Balakrishnan
N Zahr
P Baldi
P Dharmasaroja
P Hannequin
Q Lu
R Kaundal
R Verma
R Wiebringhaus
RN Strange
Ruchi Verma
S Algarabel
S Choi
SF Altschul
T Joachims
TS Furey
U Melcher
Ulrich Melcher
X He
X Hu
Y Higashida
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 11/09/2012

Background: Members of the phylum Proteobacteria are the most prominent among the bacteria that cause plant diseases, which reduce the quantity and quality of food produced by agriculture. To reduce these losses, infections need to be identified at an early stage. Recent developments in next-generation nucleic acid sequencing and mass spectrometry open the door to screening plants by the sequences of their macromolecules. Such an approach requires the ability to recognize the organismal origin of unknown DNA or peptide fragments. There are many ways to approach this problem, but none has emerged as the best protocol. Here we attempt a systematic way to determine the organismal origin of peptides using a machine learning algorithm, the Support Vector Machine (SVM).
Result: The amino acid compositions of proteobacterial proteins were found to differ from those of plant proteins. We developed SVM models based on amino acid and dipeptide compositions to distinguish between a proteobacterial protein and a plant protein. The amino acid composition (AAC) based SVM model had an accuracy of 92.44% with a Matthews correlation coefficient (MCC) of 0.85, while the dipeptide composition (DC) based SVM model had a maximum accuracy of 94.67% and an MCC of 0.89. We also developed SVM models based on a hybrid approach (AAC and DC), which gave a maximum accuracy of 94.86% and an MCC of 0.90. The models were tested on unseen datasets that were not used for training to assess their validity.
Conclusion: The results indicate that the SVM based on the AAC and DC hybrid approach can be used to distinguish proteobacterial from plant protein sequences.
Peer reviewed
Biochemistry and Molecular Biolog
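The dipeptide composition feature and the two reported metrics (accuracy and MCC) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, dipeptides containing non-standard residues are skipped by assumption, and the MCC is set to 0 when its denominator is zero, which is a common convention rather than something the abstract specifies.

```python
from itertools import product
from math import sqrt

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# All 400 ordered pairs of standard residues, in a fixed order.
DIPEPTIDES = ["".join(pair) for pair in product(AMINO_ACIDS, repeat=2)]


def dipeptide_composition(seq):
    """Return 400 values: the percentage of each adjacent residue pair."""
    seq = seq.upper()
    counts = dict.fromkeys(DIPEPTIDES, 0)
    total = 0
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        if pair in counts:  # skip pairs with non-standard symbols (assumed)
            counts[pair] += 1
            total += 1
    if total == 0:
        return [0.0] * len(DIPEPTIDES)
    return [100.0 * counts[dp] / total for dp in DIPEPTIDES]


def accuracy_and_mcc(tp, tn, fp, fn):
    """Compute accuracy (in percent) and MCC from confusion-matrix counts."""
    total = tp + tn + fp + fn
    accuracy = 100.0 * (tp + tn) / total
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Convention: report 0 when a row or column of the matrix is empty.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return accuracy, mcc
```

A hybrid feature vector could plausibly be formed by concatenating the 20 AAC values with the 400 DC values, though the abstract does not state how the two were combined.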

Crossref

Springer - Publisher Connector

PubMed Central

SHAREOK repository