355 research outputs found

    More complete gene silencing by fewer siRNAs: transparent optimized design and biophysical signature

    Get PDF
    Highly accurate knockdown functional analyses based on RNA interference (RNAi) require the possible most complete hydrolysis of the targeted mRNA while avoiding the degradation of untargeted genes (off-target effects). This in turn requires significant improvements to target selection for two reasons. First, the average silencing activity of randomly selected siRNAs is as low as 62%. Second, applying more than five different siRNAs may lead to saturation of the RNA-induced silencing complex (RISC) and to the degradation of untargeted genes. Therefore, selecting a small number of highly active siRNAs is critical for maximizing knockdown and minimizing off-target effects. To satisfy these needs, a publicly available and transparent machine learning tool is presented that ranks all possible siRNAs for each targeted gene. Support vector machines (SVMs) with polynomial kernels and constrained optimization models select and utilize the most predictive effective combinations from 572 sequence, thermodynamic, accessibility and self-hairpin features over 2200 published siRNAs. This tool reaches an accuracy of 92.3% in cross-validation experiments. We fully present the underlying biophysical signature that involves free energy, accessibility and dinucleotide characteristics. We show that while complete silencing is possible at certain structured target sites, accessibility information improves the prediction of the 90% active siRNA target sites. Fast siRNA activity predictions can be performed on our web server at

    Improving model predictions for RNA interference activities that use support vector machine regression by combining and filtering features

    Get PDF
    <p>Abstract</p> <p>Background</p> <p>RNA interference (RNAi) is a naturally occurring phenomenon that results in the suppression of a target RNA sequence utilizing a variety of possible methods and pathways. To dissect the factors that result in effective siRNA sequences a regression kernel Support Vector Machine (SVM) approach was used to quantitatively model RNA interference activities.</p> <p>Results</p> <p>Eight overall feature mapping methods were compared in their abilities to build SVM regression models that predict published siRNA activities. The primary factors in predictive SVM models are position specific nucleotide compositions. The secondary factors are position independent sequence motifs (<it>N</it>-grams) and guide strand to passenger strand sequence thermodynamics. Finally, the factors that are least contributory but are still predictive of efficacy are measures of intramolecular guide strand secondary structure and target strand secondary structure. Of these, the site of the 5' most base of the guide strand is the most informative.</p> <p>Conclusion</p> <p>The capacity of specific feature mapping methods and their ability to build predictive models of RNAi activity suggests a relative biological importance of these features. Some feature mapping methods are more informative in building predictive models and overall <it>t</it>-test filtering provides a method to remove some noisy features or make comparisons among datasets. Together, these features can yield predictive SVM regression models with increased predictive accuracy between predicted and observed activities both within datasets by cross validation, and between independently collected RNAi activity datasets. Feature filtering to remove features should be approached carefully in that it is possible to reduce feature set size without substantially reducing predictive models, but the features retained in the candidate models become increasingly distinct. Software to perform feature prediction and SVM training and testing on nucleic acid sequences can be found at the following site: <url>ftp://scitoolsftp.idtdna.com/SEQ2SVM/</url>.</p

    Predicting siRNA potency with random forests and support vector machines

    Get PDF
    Abstract Background Short interfering RNAs (siRNAs) can be used to knockdown gene expression in functional genomics. For a target gene of interest, many siRNA molecules may be designed, whereas their efficiency of expression inhibition often varies. Results To facilitate gene functional studies, we have developed a new machine learning method to predict siRNA potency based on random forests and support vector machines. Since there were many potential sequence features, random forests were used to select the most relevant features affecting gene expression inhibition. Support vector machine classifiers were then constructed using the selected sequence features for predicting siRNA potency. Interestingly, gene expression inhibition is significantly affected by nucleotide dimer and trimer compositions of siRNA sequence. Conclusions The findings in this study should help design potent siRNAs for functional genomics, and might also provide further insights into the molecular mechanism of RNA interference

    Reconsideration of In-Silico siRNA Design Based on Feature Selection: A Cross-Platform Data Integration Perspective

    Get PDF
    RNA interference via exogenous short interference RNAs (siRNA) is increasingly more widely employed as a tool in gene function studies, drug target discovery and disease treatment. Currently there is a strong need for rational siRNA design to achieve more reliable and specific gene silencing; and to keep up with the increasing needs for a wider range of applications. While progress has been made in the ability to design siRNAs with specific targets, we are clearly at an infancy stage towards achieving rational design of siRNAs with high efficacy. Among the many obstacles to overcome, lack of general understanding of what sequence features of siRNAs may affect their silencing efficacy and of large-scale homogeneous data needed to carry out such association analyses represents two challenges. To address these issues, we investigated a feature-selection based in-silico siRNA design from a novel cross-platform data integration perspective. An integration analysis of 4,482 siRNAs from ten meta-datasets was conducted for ranking siRNA features, according to their possible importance to the silencing efficacy of siRNAs across heterogeneous data sources. Our ranking analysis revealed for the first time the most relevant features based on cross-platform experiments, which compares favorably with the traditional in-silico siRNA feature screening based on the small samples of individual platform data. We believe that our feature ranking analysis can offer more creditable suggestions to help improving the design of siRNA with specific silencing targets. Data and scripts are available at http://csbl.bmb.uga.edu/publications/materials/qiliu/siRNA.html

    Designing of highly effective complementary and mismatch siRNAs for silencing a gene

    Get PDF
    In past, numerous methods have been developed for predicting efficacy of short interfering RNA (siRNA). However these methods have been developed for predicting efficacy of fully complementary siRNA against a gene. Best of author's knowledge no method has been developed for predicting efficacy of mismatch siRNA against a gene. In this study, a systematic attempt has been made to identify highly effective complementary as well as mismatch siRNAs for silencing a gene. Support vector machine (SVM) based models have been developed for predicting efficacy of siRNAs using composition, binary and hybrid pattern siRNAs. We achieved maximum correlation 0.67 between predicted and actual efficacy of siRNAs using hybrid model. All models were trained and tested on a dataset of 2182 siRNAs and performance was evaluated using five-fold cross validation techniques. The performance of our method desiRm is comparable to other well-known methods. In this study, first time attempt has been made to design mutant siRNAs (mismatch siRNAs). In this approach we mutated a given siRNA on all possible sites/positions with all possible nucleotides. Efficacy of each mutated siRNA is predicted using our method desiRm. It is well known from literature that mismatches between siRNA and target affects the silencing efficacy. Thus we have incorporated the rules derived from base mismatches experimental data to find out over all efficacy of mutated or mismatch siRNAs. Finally we developed a webserver, desiRm (http://www.imtech.res.in/raghava/desirm/) for designing highly effective siRNA for silencing a gene. This tool will be helpful to design siRNA to degrade disease isoform of heterozygous single nucleotide polymorphism gene without depleting the wild type protein

    Prediction of guide strand of microRNAs from its sequence and secondary structure

    Get PDF
    <p>Abstract</p> <p>Background</p> <p>MicroRNAs (miRNAs) are produced by the sequential processing of a long hairpin RNA transcript by Drosha and Dicer, an RNase III enzymes, and form transitory small RNA duplexes. One strand of the duplex, which incorporates into RNA-induced silencing complex (RISC) and silences the gene expression is called guide strand, or miRNA; while the other strand of duplex is degraded and called the passenger strand, or miRNA*. Predicting the guide strand of miRNA is important for better understanding the RNA interference pathways.</p> <p>Results</p> <p>This paper describes support vector machine (SVM) models developed for predicting the guide strands of miRNAs. All models were trained and tested on a dataset consisting of 329 miRNA and 329 miRNA* pairs using five fold cross validation technique. Firstly, models were developed using mono-, di-, and tri-nucleotide composition of miRNA strands and achieved the highest accuracies of 0.588, 0.638 and 0.596 respectively. Secondly, models were developed using split nucleotide composition and achieved maximum accuracies of 0.553, 0.641 and 0.602 for mono-, di-, and tri-nucleotide respectively. Thirdly, models were developed using binary pattern and achieved the highest accuracy of 0.708. Furthermore, when integrating the secondary structure features with binary pattern, an accuracy of 0.719 was seen. Finally, hybrid models were developed by combining various features and achieved maximum accuracy of 0.799 with sensitivity 0.781 and specificity 0.818. Moreover, the performance of this model was tested on an independent dataset that achieved an accuracy of 0.80. In addition, we also compared the performance of our method with various siRNA-designing methods on miRNA and siRNA datasets.</p> <p>Conclusion</p> <p>In this study, first time a method has been developed to predict guide miRNA strands, of miRNA duplex. This study demonstrates that guide and passenger strand of miRNA precursors can be distinguished using their nucleotide sequence and secondary structure. This method will be useful in understanding microRNA processing and can be implemented in RNA silencing technology to improve the biological and clinical research. A web server has been developed based on SVM models described in this study <url>http://crdd.osdd.net:8081/RISCbinder/</url>.</p

    Machine learning and data mining frameworks for predicting drug response in cancer:An overview and a novel <i>in silico</i> screening process based on association rule mining

    Get PDF
    corecore