
13 research outputs found

Methodology flow.

Author: Dinesh Gupta (19194)
Zeenia Jagga (564251)

Framework of the computational method used to develop prediction models for plant virus-encoded RNA silencing suppressors.

FigShare

Supervised Learning Classification Models for Prediction of Plant Virus Encoded RNA Silencing Suppressors

Author: Dinesh Gupta (19194)
Zeenia Jagga (564251)
Publication date: 14/05/2014

Virus-encoded RNA silencing suppressor proteins interfere with the host RNA silencing machinery, facilitating viral infection by evading host immunity. In plant hosts, these viral proteins have several implications for basic science and applications in biotechnology. However, in silico identification of these proteins is limited by their high sequence diversity. In this study we developed supervised learning based classification models for RNA silencing suppressor proteins encoded by plant viruses. We developed four classifiers based on the supervised learning algorithms J48, Random Forest, LibSVM and Naïve Bayes, with model learning enriched by correlation based feature selection. Structural and physicochemical features calculated for experimentally verified primary protein sequences were used to train the classifiers. The training features include amino acid composition; autocorrelation coefficients; composition, transition, and distribution of various physicochemical properties; and pseudo amino acid composition. Performance analysis of the predictive models, based on 10-fold cross-validation and independent data testing, revealed that the Random Forest based model was the best, achieving 86.11% overall accuracy and 86.22% balanced accuracy with a remarkably high area under the Receiver Operating Characteristic curve of 0.95 for predicting viral RNA silencing suppressor proteins. The prediction models for plant viral RNA silencing suppressors can potentially aid identification of novel viral RNA silencing suppressors, which will provide valuable insights into the mechanism of RNA silencing and could be further explored as potential targets for designing novel antiviral therapeutics. In addition, the key subset of optimal features identified may help in determining compositional patterns in the viral proteins that are important determinants of RNA silencing suppressor activity.
The best prediction model developed in the study is available as a freely accessible web server, pVsupPred, at http://bioinfo.icgeb.res.in/pvsup/.
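Of the training features listed in the abstract, amino acid composition is the simplest to illustrate. The following is a minimal sketch, not the authors' implementation: the function name `amino_acid_composition` and the example peptide are hypothetical, and the sketch assumes composition is defined as the fraction of each of the 20 standard residues in a sequence.

```python
from collections import Counter

# The 20 standard amino acids, one-letter codes.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_composition(sequence):
    """Return a dict mapping each standard residue to its fraction in the sequence.

    Hypothetical helper for illustration; non-standard characters are ignored.
    """
    counts = Counter(ch for ch in sequence.upper() if ch in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return {aa: counts[aa] / total for aa in AMINO_ACIDS}


# Made-up 8-residue peptide, used only to show the shape of the feature vector.
composition = amino_acid_composition("MKVLAAGK")
```

Each sequence maps to a fixed-length vector of 20 fractions regardless of its length, which is what allows highly diverse sequences to be fed to the same classifier.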

Public Library of Science (PLOS)

Directory of Open Access Journals

PubMed Central

FigShare

Number of positive and negative instances in testing and training dataset.

Author: Dinesh Gupta (19194)
Zeenia Jagga (564251)

Number of positive and negative instances in testing and training dataset.

FigShare

Sensitivity and Specificity plot.

Author: Dinesh Gupta (19194)
Zeenia Jagga (564251)

The plot compares the sensitivity and specificity of the developed predictive models to determine which classifier is most effective at identifying positive and negative instances. The Random Forest classifier has the highest sensitivity and specificity values compared with J48, LibSVM and Naïve Bayes.
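For reference, sensitivity, specificity, and the balanced accuracy derived from them can be computed from binary labels as in the sketch below. The function name and the example labels are hypothetical and are not taken from the study's data; the sketch assumes positives are coded as 1 and negatives as 0.

```python
def classification_metrics(y_true, y_pred):
    """Compute sensitivity, specificity, accuracy and balanced accuracy.

    Assumes binary labels with 1 = positive and 0 = negative.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative instance")
    sensitivity = tp / (tp + fn)  # true positive rate
    specificity = tn / (tn + fp)  # true negative rate
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": (tp + tn) / len(pairs),
        "balanced_accuracy": (sensitivity + specificity) / 2,
    }


# Made-up labels: 4 positives and 6 negatives.
metrics = classification_metrics(
    [1, 1, 1, 1, 0, 0, 0, 0, 0, 0],
    [1, 1, 1, 0, 0, 0, 0, 0, 1, 1],
)
```

Balanced accuracy averages the two rates, so it is not inflated when one class outnumbers the other.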

FigShare

Feature distribution of optimal feature subset generated by correlation based feature selection.

Author: Dinesh Gupta (19194)
Zeenia Jagga (564251)

Feature distribution of optimal feature subset generated by correlation based feature selection.
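As background, correlation based feature selection is commonly described as scoring a feature subset with a merit heuristic that rewards correlation with the class and penalises correlation among the features themselves. The sketch below assumes that standard merit formula; the listing does not state which variant or settings the study used, and the function name and correlation values are hypothetical.

```python
import math


def cfs_merit(feature_class_corrs, feature_feature_corrs):
    """Merit of a feature subset under the standard CFS heuristic (assumed here).

    feature_class_corrs: one correlation per feature with the class label.
    feature_feature_corrs: correlations for each pair of features in the subset.
    """
    k = len(feature_class_corrs)
    if k == 0:
        raise ValueError("subset must contain at least one feature")
    mean_rcf = sum(abs(r) for r in feature_class_corrs) / k
    if feature_feature_corrs:
        mean_rff = sum(abs(r) for r in feature_feature_corrs) / len(feature_feature_corrs)
    else:
        mean_rff = 0.0  # a single feature has no inter-feature correlation
    return k * mean_rcf / math.sqrt(k + k * (k - 1) * mean_rff)


# Two made-up subsets with equal class correlation but different redundancy.
merit_low_redundancy = cfs_merit([0.6, 0.5], [0.2])
merit_high_redundancy = cfs_merit([0.6, 0.5], [0.9])
```

With the same feature-class correlations, the subset whose features are less correlated with each other receives the higher merit.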

FigShare

ROC curves for the predictive performance of different cost sensitive classifiers.

Author: Dinesh Gupta (19194)
Zeenia Jagga (564251)

The ROC plot depicts the trade-off between the true positive rate and the false positive rate of the predictions. The diagonal line represents a completely random guess. The corresponding scalar values of the area under the curve are given as auROC in Table 4 (http://www.plosone.org/article/info:doi/10.1371/journal.pone.0097446#pone-0097446-t004).
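The area under the ROC curve can be read as the probability that a randomly chosen positive instance is scored higher than a randomly chosen negative one. The sketch below computes it that way by pairwise comparison; it is an illustration under that assumption, not the procedure used in the study, and the function name, labels, and scores are made up.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of positives and negatives.

    Assumes labels are 1 (positive) or 0 (negative); tied scores count as half.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative instance")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Made-up classifier scores for three positives and three negatives.
auc = roc_auc([1, 1, 1, 0, 0, 0], [0.9, 0.8, 0.4, 0.7, 0.3, 0.2])

# A scorer that cannot separate the classes lands on the diagonal (0.5).
auc_uninformative = roc_auc([1, 1, 0, 0], [0.5, 0.5, 0.5, 0.5])
```

The pairwise loop is quadratic in the number of instances, which is acceptable for a small illustration; sorting-based implementations are preferable for large datasets.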

FigShare

Number of protein sequences after removing redundant proteins at thresholds of 90%, 70% and 40% using CD-HIT.

Author: Dinesh Gupta (19194)
Zeenia Jagga (564251)

Number of protein sequences after removing redundant proteins at thresholds of 90%, 70% and 40% using CD-HIT.
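To illustrate why the number of retained sequences shrinks as the identity threshold is lowered, the sketch below performs a simplified greedy redundancy removal: sequences are processed longest first, and a sequence is kept only if it falls below the threshold against every representative kept so far. This is not CD-HIT itself. The similarity measure (`difflib.SequenceMatcher.ratio`) is a crude stand-in for CD-HIT's sequence identity, and the function name and sequences are hypothetical.

```python
from difflib import SequenceMatcher


def remove_redundant(sequences, threshold):
    """Greedy redundancy removal, longest sequence first.

    Assumption: SequenceMatcher.ratio() stands in for sequence identity;
    real tools such as CD-HIT define and compute identity differently.
    """
    representatives = []
    for seq in sorted(sequences, key=len, reverse=True):
        is_novel = all(
            SequenceMatcher(None, seq, rep, autojunk=False).ratio() < threshold
            for rep in representatives
        )
        if is_novel:
            representatives.append(seq)
    return representatives


# Made-up peptides: two near-duplicates, one moderate variant, one unrelated.
peptides = ["MKVLAAGKTRLS", "MKVLAAGKTKLS", "MKVLWWGKTRLS", "PQRSTVWYHHDE"]
kept_90 = remove_redundant(peptides, 0.90)
kept_70 = remove_redundant(peptides, 0.70)
kept_40 = remove_redundant(peptides, 0.40)
```

In this toy set the near-duplicate is dropped at the 0.90 threshold, and the moderate variant is additionally dropped at 0.70, so a stricter (lower) threshold leaves fewer representatives.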

FigShare

Schematic representation of the dataset generation and filtration steps.

Author: Dinesh Gupta (19194)
Zeenia Jagga (564251)

The figure shows a flowchart of the steps followed for sequence collection, filtering, and redundancy removal to generate the training dataset.

FigShare

Number of protein sequences after removing redundant proteins at thresholds of 90%, 70% and 40% using CD-HIT.

Author: Zeenia Jagga (564251)
Dinesh Gupta (19194)
Publication date: 14/05/2014

Number of protein sequences after removing redundant proteins at thresholds of 90%, 70% and 40% using CD-HIT.

Munin - Open Research Archive

NORA - Norwegian Open Research Archives

FigShare

Summary of statistical measures of the best classifiers on re-evaluation with independent dataset.

Author: Dinesh Gupta (19194)
Zeenia Jagga (564251)

Summary of statistical measures of the best classifiers on re-evaluation with independent dataset.

FigShare