
Improving model predictions for RNA interference activities that use support vector machine regression by combining and filtering features

BACKGROUND: RNA interference (RNAi) is a naturally occurring phenomenon that results in the suppression of a target RNA sequence through a variety of possible methods and pathways. To dissect the factors that result in effective siRNA sequences, a regression kernel Support Vector Machine (SVM) approach was used to quantitatively model RNA interference activities. RESULTS: Eight overall feature mapping methods were compared in their abilities to build SVM regression models that predict published siRNA activities. The primary factors in predictive SVM models are position-specific nucleotide compositions. The secondary factors are position-independent sequence motifs (N-grams) and guide strand to passenger strand sequence thermodynamics. Finally, the factors that are least contributory but are still predictive of efficacy are measures of intramolecular guide strand secondary structure and target strand secondary structure. Of these, the site of the 5' most base of the guide strand is the most informative. CONCLUSION: The capacity of specific feature mapping methods to build predictive models of RNAi activity suggests a relative biological importance of these features. Some feature mapping methods are more informative than others in building predictive models, and overall t-test filtering provides a method to remove some noisy features or to make comparisons among datasets. Together, these features can yield SVM regression models with increased accuracy between predicted and observed activities, both within datasets by cross-validation and between independently collected RNAi activity datasets. Feature filtering should be approached carefully: it is possible to reduce feature set size without substantially reducing model predictive performance, but the features retained in the candidate models become increasingly distinct.
Software to perform feature prediction and SVM training and testing on nucleic acid sequences can be found at the following site: ftp://scitoolsftp.idtdna.com/SEQ2SVM/.
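As a rough illustration of the two sequence-derived feature families named in the abstract above (position-specific nucleotide composition and position-independent N-grams), the sketch below maps a sequence to a numeric vector that a regression model could consume. It is not the SEQ2SVM implementation: the function names, the one-hot encoding, the RNA alphabet, and the default N of 2 are all assumptions made for the example.

```python
from collections import Counter
from itertools import product

ALPHABET = "ACGU"  # assumed RNA alphabet; DNA input is converted below


def position_specific_features(seq):
    """One-hot encode each position: four 0/1 indicators per nucleotide."""
    features = []
    for base in seq:
        features.extend(1 if base == letter else 0 for letter in ALPHABET)
    return features


def ngram_features(seq, n=2):
    """Count every possible N-gram, regardless of where it occurs."""
    counts = Counter(seq[i:i + n] for i in range(len(seq) - n + 1))
    return [counts["".join(gram)] for gram in product(ALPHABET, repeat=n)]


def map_features(seq, n=2):
    """Concatenate both (hypothetical) feature maps into one vector."""
    seq = seq.upper().replace("T", "U")
    return position_specific_features(seq) + ngram_features(seq, n)


# A 19-nt guide strand gives 19 * 4 positional values plus 4**2 bigram counts.
vector = map_features("UUCGAAGUACUCAGCGUAA")
print(len(vector))
```

Vectors built this way could be passed to any SVM regression library; thermodynamic and secondary-structure features mentioned in the abstract would need external tools and are left out of this sketch.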

Springer - Publisher Connector

arXiv.org e-Print Archive

Kernel methods in genomics and computational biology

Author: Vert Jean-Philippe
Publication date: 17/10/2005

Support vector machines and kernel methods are increasingly popular in genomics and computational biology, due to their good performance in real-world applications and strong modularity that makes them suitable to a wide range of problems, from the classification of tumors to the automatic annotation of proteins. Their ability to work in high dimension, to process non-vectorial data, and the natural framework they provide to integrate heterogeneous data are particularly relevant to various problems arising in computational biology. In this chapter we survey some of the most prominent applications published so far, highlighting the particular developments in kernel methods triggered by problems in biology, and mention a few promising research directions likely to expand in the future.
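To make the phrase "process non-vectorial data" concrete, here is a minimal sketch of one well-known string kernel, the k-spectrum kernel, which compares two sequences by the inner product of their k-mer count vectors. The abstract does not say which kernels the chapter covers, so the choice of kernel, the function names, and the default k of 3 are assumptions for illustration only.

```python
from collections import Counter


def kmer_counts(seq, k):
    """Count every overlapping substring of length k in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def spectrum_kernel(x, y, k=3):
    """Inner product of the k-mer count vectors of x and y.

    The vectors are never built explicitly: only k-mers that occur
    in x can contribute, so we iterate over those.
    """
    counts_x, counts_y = kmer_counts(x, k), kmer_counts(y, k)
    return sum(count * counts_y[kmer] for kmer, count in counts_x.items())


# Sequences of different lengths can be compared directly.
print(spectrum_kernel("GATTACA", "ATTACCA", k=3))
```

Because the function is an inner product in k-mer space, a matrix of its values over a set of sequences can be supplied to any SVM implementation that accepts a precomputed kernel.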

HAL-MINES ParisTech

PFRED: A computational platform for siRNA and antisense oligonucleotides design [preprint]

Author: Caffrey Daniel R.
Cao Qing
Cruz Dario
Hughes Jason D.
Lawrence Christine
Rotstein Sergio
Sciabola Simone
Stanton Robert
Xi Hualin
Zhang Tianhong
Publication venue: eScholarship@UMassChan
Publication date: 25/08/2020

PFRED, a software application for the design, analysis, and visualization of antisense oligonucleotides and siRNA, is described. The software provides an intuitive user interface for scientists to design a library of siRNA or antisense oligonucleotides that target a specific gene of interest. Moreover, the tool facilitates the incorporation of various design criteria that have been shown to be important for stability and potency. PFRED has been made available as an open-source project so the code can be easily modified to address the future needs of the oligonucleotide research community. A compiled version is available for downloading at https://github.com/pfred/pfred-gui/releases as a Java JAR file. The source code and the links for downloading the precompiled version can be found at https://github.com/pfred

eScholarship@UMMS

An accurate and interpretable model for siRNA efficacy prediction

Author: A Fire
A Khvorova
A Reynolds
AL Jackson
B Efron
B Haley
B Heale
C Cogoni
Christian Lajaunie
D Huesken
D Semizarov
DC Baulcombe
DH Mathews
DS Schwarz
G Hutvágner
G Meister
GJ Hannon
H Xia
J Harborth
J Ma
Jean-Philippe Vert
K Huppi
K Ui-Tei
M Amarzguioui
M Overhoff
M Zuker
MT McManus
Nicolas Foveau
NJ Caplen
P Saetrom
P Zamore
Q Boese
R Teramoto
R Tibshirani
RA Jorgensen
RM Surabhi
S Schubert
S Shabalina
SI Pai
SM Elbashir
SM Freier
SM Yiu
T Holen
T Tuschl
T Xia
TA Vickers
Y Ren
Yves Vandenbrouck
Publication venue: BioMed Central
Publication date: 01/01/2006

BACKGROUND: The use of exogenous small interfering RNAs (siRNAs) for gene silencing has quickly become a widespread molecular tool providing a powerful means for gene functional study and new drug target identification. Although considerable progress has been made recently in understanding how the RNAi pathway mediates gene silencing, the design of potent siRNAs remains challenging. RESULTS: We propose a simple linear model combining basic features of siRNA sequences for siRNA efficacy prediction. Trained and tested on a large dataset of siRNA sequences made recently available, it performs as well as more complex state-of-the-art models in terms of potency prediction accuracy, with the advantage of being directly interpretable. The analysis of this linear model allows us to detect and quantify the effect of nucleotide preferences at particular positions, including previously known and new observations. We also detect and quantify a strong propensity of potent siRNAs to contain short asymmetric motifs in their sequence, and show that, surprisingly, these motifs alone contain at least as much relevant information for potency prediction as the nucleotide preferences for particular positions. CONCLUSION: The model proposed for prediction of siRNA potency is as accurate as a state-of-the-art nonlinear model and is easily interpretable in terms of biological features. It is freely available on the web a

Springer - Publisher Connector

Public Library of Science (PLOS)

HAL-MINES ParisTech

HAL-CEA

Reconsideration of In-Silico siRNA Design Based on Feature Selection: A Cross-Platform Data Integration Perspective

Author: A Fire
A Khvorova
A Reynolds
AM Chalk
AS Peek
B Jagla
C Phalon
D Castanotto
D Huesken
GR Devi
Han Zhou
J Liu
JP Vert
Juan Cui
JW Klingelhoefer
L Wang
M Ghildiyal
O Matveeva
P Saetrom
PD Zamore
Q Liu
Qi Liu
R Teramoto
S Ji
S Lin
SA Shabalina
SM Elbashir
SM Yiu
T Holen
T Katoh
T Tuschl
V Pihur
VN Kim
VS Gomase
W Gong
Y Naito
Y Pei
Y Ren
Y Shao
Yi Xing
Ying Xu
Zhiwei Cao
ZJ Lu
Publication venue: Public Library of Science
Publication date: 01/01/2012

RNA interference via exogenous short interfering RNAs (siRNAs) is increasingly employed as a tool in gene function studies, drug target discovery and disease treatment. Currently there is a strong need for rational siRNA design to achieve more reliable and specific gene silencing, and to keep up with the increasing needs of a wider range of applications. While progress has been made in the ability to design siRNAs with specific targets, we are clearly still at an early stage in achieving rational design of siRNAs with high efficacy. Among the many obstacles to overcome, two stand out: the lack of a general understanding of which sequence features of siRNAs may affect their silencing efficacy, and the lack of the large-scale homogeneous data needed to carry out such association analyses. To address these issues, we investigated feature-selection-based in-silico siRNA design from a novel cross-platform data integration perspective. An integration analysis of 4,482 siRNAs from ten meta-datasets was conducted for ranking siRNA features according to their possible importance to the silencing efficacy of siRNAs across heterogeneous data sources. Our ranking analysis revealed for the first time the most relevant features based on cross-platform experiments, which compares favorably with the traditional in-silico siRNA feature screening based on the small samples of individual platform data. We believe that our feature ranking analysis can offer more credible suggestions to help improve the design of siRNAs with specific silencing targets. Data and scripts are available at http://csbl.bmb.uga.edu/publications/materials/qiliu/siRNA.html

DigitalCommons@University of Nebraska

Springer - Publisher Connector

Selecting effective siRNA sequences by using radial basis function network and decision tree learning

Author: Kawamura Yoshihiro
Konagaya Akihiko
Takasaki Shigeru
Publication venue: BioMed Central
Publication date: 01/01/2006

BACKGROUND: Although short interfering RNA (siRNA) has been widely used for studying gene functions in mammalian cells, its gene silencing efficacy varies markedly and there are only a few consistencies among the recently reported design rules/guidelines for selecting siRNA sequences effective for mammalian genes. Another shortcoming of the previously reported methods is that they cannot estimate the probability that a candidate sequence will silence the target gene. RESULTS: We propose two prediction methods for selecting effective siRNA target sequences from many possible candidate sequences, one based on the supervised learning of a radial basis function (RBF) network and the other based on decision tree learning. They are quite different from the previous score-based siRNA design techniques and can predict the probability that a candidate siRNA sequence will be effective. The proposed methods were evaluated by applying them to recently reported effective and ineffective siRNA sequences for various genes (15 genes, 196 siRNA sequences). We also propose a combined prediction method that uses both the RBF network and decision tree learning. As the average prediction probabilities of gene silencing for the effective and ineffective siRNA sequences of the reported genes by the proposed three methods were respectively 65% and 32%, 56.6% and 38.1%, and 68.5% and 28.1%, the methods imply high estimation accuracy for selecting candidate siRNA sequences. CONCLUSION: New prediction methods were presented for selecting effective siRNA sequences. As the proposed methods indicated high estimation accuracy for selecting candidate siRNA sequences, they would be useful for many other genes.

Public Library of Science (PLOS)

Comparing Artificial Neural Networks, General Linear Models and Support Vector Machines in Building Predictive Models for Small Interfering RNAs

Author: A Fire
A Henschel
A Khvorova
A Reynolds
AC Hsieh
AM Chalk
Andrew S. Peek
AS Peek
B Jagla
C Nadeau
C Xue
D Huesken
DK Walters
DS Schwarz
G Ge
H Tafer
I Bradac
I Ladunga
JP Vert
K Ui-Tei
Kyle A. McQuisten
L Poliseno
M Amarzguioui
M Ichihara
O Matveeva
P Jia
P Jiang
P Sætrom
R Kretschmer-Kazemi Far
R Teramoto
RS de Almeida
S Takasaki
SA Bohula EA
SA Shabalina
SM Yiu
Stefan Wölfl
T Holen
T Katoh
TA Vickers
TG Dietterich
W Gong
ZJ Lu
Publication venue: Public Library of Science
Publication date: 01/10/2009

Exogenous short interfering RNAs (siRNAs) induce a gene knockdown effect in cells by interacting with naturally occurring RNA processing machinery. However, not all siRNAs induce this effect equally. Several heterogeneous kinds of machine learning techniques and feature sets have been applied to modeling siRNAs and their abilities to induce knockdown. There is growing agreement as to which techniques produce maximally predictive models, and yet there is little consensus on methods to compare among predictive models. Also, there are few comparative studies that address the effect that the choice of learning technique, feature set or cross-validation approach has on finding and discriminating among predictive models. Three learning techniques were used to develop predictive models for effective siRNA sequences: Artificial Neural Networks (ANNs), General Linear Models (GLMs) and Support Vector Machines (SVMs). Five feature mapping methods were also used to generate models of siRNA activities. The two factors of learning technique and feature mapping were evaluated by complete 3x5 factorial ANOVA. Overall, both learning techniques and feature mapping contributed significantly to the observed variance in predictive models, but to differing degrees for precision and accuracy as well as across different kinds and levels of model cross-validation. The methods presented here provide a robust statistical framework to compare among models developed under distinct learning techniques and feature sets for siRNAs. Further comparisons among current or future modeling approaches should apply these or other suitable statistically equivalent methods to critically evaluate the performance of proposed models. ANN and GLM techniques tend to be more sensitive to the inclusion of noisy features, but the SVM technique is more robust under large numbers of features for measures of model precision and accuracy.
Features found to result in maximally predictive models are not consistent across learning techniques, suggesting care should be taken in the interpretation of feature relevance. In the models developed here, there are statistically differentiable combinations of learning techniques and feature mapping methods where the SVM technique under a specific combination of features significantly outperforms all the best combinations of features within the ANN and GLM techniques.
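The variance partition behind a factorial ANOVA of this kind can be sketched as follows. The abstract does not give the design details, so this example assumes a balanced layout with the same number of replicate scores in every (learning technique, feature mapping) cell; the function name, the dictionary keys, and the toy scores are all hypothetical.

```python
from statistics import mean


def two_way_anova_ss(data):
    """Sums of squares for a balanced two-factor design.

    data[i][j] is a list of replicate scores for technique i and
    feature mapping j; every cell must hold the same number of values.
    """
    a, b, n = len(data), len(data[0]), len(data[0][0])
    grand = mean(v for row in data for cell in row for v in cell)
    row_means = [mean(v for cell in row for v in cell) for row in data]
    col_means = [mean(v for row in data for v in row[j]) for j in range(b)]
    cell_means = [[mean(cell) for cell in row] for row in data]
    cells = [(i, j) for i in range(a) for j in range(b)]
    return {
        "ss_technique": b * n * sum((m - grand) ** 2 for m in row_means),
        "ss_feature": a * n * sum((m - grand) ** 2 for m in col_means),
        "ss_interaction": n * sum(
            (cell_means[i][j] - row_means[i] - col_means[j] + grand) ** 2
            for i, j in cells
        ),
        "ss_within": sum(
            (v - cell_means[i][j]) ** 2 for i, j in cells for v in data[i][j]
        ),
        "ss_total": sum(
            (v - grand) ** 2 for row in data for cell in row for v in cell
        ),
        # Degrees of freedom, in the same order as the four components.
        "df": (a - 1, b - 1, (a - 1) * (b - 1), a * b * (n - 1)),
    }


# Toy 3x5 layout with two replicates per cell (made-up numbers, not study data).
toy = [[[0.5 + 0.1 * i + 0.02 * j, 0.55 + 0.1 * i + 0.01 * j] for j in range(5)]
       for i in range(3)]
print(two_way_anova_ss(toy))
```

Dividing each sum of squares by its degrees of freedom gives a mean square, and the ratio of a factor's mean square to the within-cell mean square is the F statistic used to judge that factor's contribution.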