124 research outputs found

    Profiled support vector machines for antisense oligonucleotide efficacy prediction

    BACKGROUND: This paper presents the use of Support Vector Machines (SVMs) for prediction and analysis of antisense oligonucleotide (AO) efficacy. The collected database comprises 315 AO molecules with 68 features each, a problem well suited to SVMs. Feature selection is crucial given the presence of noisy or redundant features and the well-known curse of dimensionality. We propose a two-stage strategy to develop an optimal model: (1) feature selection using correlation analysis, mutual information, and SVM-based recursive feature elimination (SVM-RFE), and (2) AO prediction using standard and profiled SVM formulations. A profiled SVM gives different weights to different parts of the training data to focus the training on the most important regions. RESULTS: In the first stage, the SVM-RFE technique was the most efficient and robust given the low number of samples and the high input space dimension. This method yielded an optimal subset of 14 representative features, all related to energy and sequence motifs. The second stage evaluated the performance of the predictors (overall correlation coefficient between observed and predicted efficacy, r; mean error, ME; and root-mean-square error, RMSE) using 8-fold and minus-one-RNA cross-validation methods. The profiled SVM produced the best results (r = 0.44, ME = 0.022, and RMSE = 0.278) and predicted high-efficacy (>75% inhibition of gene expression) and low-efficacy (<25%) AOs with success rates of 83.3% and 82.9%, respectively, which is better than previous approaches. A web server for AO prediction is available online at . CONCLUSIONS: The SVM approach is well suited to the AO prediction problem and yields a prediction accuracy superior to previous methods. The profiled SVM was found to perform better than the standard SVM, suggesting that it could lead to improvements in other prediction problems as well.
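The three evaluation measures named in this abstract (r, ME, RMSE) can be illustrated with a short sketch. This is not the authors' code: the function name `efficacy_metrics` is hypothetical, and ME is assumed here to be the mean of the signed errors (predicted minus observed), a definition the abstract does not spell out.

```python
import math


def efficacy_metrics(observed, predicted):
    """Return (r, ME, RMSE) for paired observed/predicted efficacy values.

    Assumption: ME is the mean signed error (predicted - observed);
    the abstract does not give its exact definition.
    """
    n = len(observed)
    if n != len(predicted) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")

    # Signed prediction errors, used for both ME and RMSE.
    errors = [p - o for o, p in zip(observed, predicted)]
    me = sum(errors) / n
    rmse = math.sqrt(sum(e * e for e in errors) / n)

    # Pearson correlation coefficient between observed and predicted values.
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    if var_o == 0 or var_p == 0:
        raise ValueError("correlation is undefined for constant input")
    r = cov / math.sqrt(var_o * var_p)

    return r, me, rmse
```

In a cross-validation setting such as the 8-fold scheme mentioned above, a function like this would typically be applied to the pooled held-out predictions, although the abstract does not say how the folds were aggregated.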

    Identification of sequence motifs significantly associated with antisense activity

    Background: Predicting the suppression activity of antisense oligonucleotide sequences is the main goal of the rational design of nucleic acids. To create an effective predictive model, it is important to know which properties of an oligonucleotide sequence associate significantly with antisense activity. For the model to be efficient, we must also know which properties do not associate significantly and can be omitted from the model. This paper discusses the results of a randomization procedure to find motifs that associate significantly with either high or low antisense suppression activity, an analysis of their properties, and the results of support vector machine modelling using these significant motifs as features. Results: We discovered 155 motifs that associate significantly with high antisense suppression activity and 202 motifs that associate significantly with low suppression activity. The motifs range in length from 2 to 5 bases, include several motifs previously reported as associating strongly with antisense activity, and have thermodynamic properties consistent with previous work linking the thermodynamic properties of sequences to their antisense activity. Statistical analysis revealed no correlation between a motif's position within an antisense sequence and that sequence's antisense activity. Many significant motifs also existed as subwords of other significant motifs. Support vector regression (SVR) experiments indicated that the feature set of significant motifs increased correlation compared to all possible motifs as well as several subsets of the significant motifs. Conclusion: The thermodynamic properties of the significantly associated motifs support existing data correlating the thermodynamic properties of the antisense oligonucleotide with antisense efficiency, reinforcing our hypothesis that antisense suppression is strongly associated with probe/target thermodynamics, as there are no enzymatic mediators to speed the process along like the RNA Induced Silencing Complex (RISC) in RNAi. The independence of motif position and antisense activity also allows us to bypass consideration of this feature in the modelling process, promoting model efficiency and reducing the chance of overfitting when predicting antisense activity. The increase in SVR correlation with significant features compared to nearest-neighbour features indicates that thermodynamics is likely not the only factor in determining antisense efficiency.

    PFRED: A computational platform for siRNA and antisense oligonucleotides design [preprint]

    PFRED, a software application for the design, analysis, and visualization of antisense oligonucleotides and siRNA, is described. The software provides an intuitive user interface for scientists to design a library of siRNA or antisense oligonucleotides that target a specific gene of interest. Moreover, the tool facilitates the incorporation of various design criteria that have been shown to be important for stability and potency. PFRED has been made available as an open-source project so the code can be easily modified to address the future needs of the oligonucleotide research community. A compiled version is available for download at https://github.com/pfred/pfred-gui/releases as a Java JAR file. The source code and the links for downloading the precompiled version can be found at https://github.com/pfred

    PFRED: A computational platform for siRNA and antisense oligonucleotides design

    PFRED, a software application for the design, analysis, and visualization of antisense oligonucleotides and siRNA, is described. The software provides an intuitive user interface for scientists to design a library of siRNA or antisense oligonucleotides that target a specific gene of interest. Moreover, the tool facilitates the incorporation of various design criteria that have been shown to be important for stability and potency. PFRED has been made available as an open-source project so the code can be easily modified to address the future needs of the oligonucleotide research community. A compiled version is available for download at https://github.com/pfred/pfred-gui/releases/tag/v1.0 as a Java JAR file. The source code and the links for downloading the precompiled version can be found at https://github.com/pfred

    More complete gene silencing by fewer siRNAs: transparent optimized design and biophysical signature

    Highly accurate knockdown functional analyses based on RNA interference (RNAi) require the most complete possible hydrolysis of the targeted mRNA while avoiding the degradation of untargeted genes (off-target effects). This in turn requires significant improvements to target selection for two reasons. First, the average silencing activity of randomly selected siRNAs is as low as 62%. Second, applying more than five different siRNAs may lead to saturation of the RNA-induced silencing complex (RISC) and to the degradation of untargeted genes. Therefore, selecting a small number of highly active siRNAs is critical for maximizing knockdown and minimizing off-target effects. To satisfy these needs, a publicly available and transparent machine learning tool is presented that ranks all possible siRNAs for each targeted gene. Support vector machines (SVMs) with polynomial kernels and constrained optimization models select and utilize the most predictive effective combinations from 572 sequence, thermodynamic, accessibility and self-hairpin features over 2200 published siRNAs. This tool reaches an accuracy of 92.3% in cross-validation experiments. We fully present the underlying biophysical signature, which involves free energy, accessibility and dinucleotide characteristics. We show that while complete silencing is possible at certain structured target sites, accessibility information improves the prediction of the 90% active siRNA target sites. Fast siRNA activity predictions can be performed on our web server at

    Kernel methods in genomics and computational biology

    Support vector machines and kernel methods are increasingly popular in genomics and computational biology, due to their good performance in real-world applications and strong modularity that makes them suitable to a wide range of problems, from the classification of tumors to the automatic annotation of proteins. Their ability to work in high dimension, to process non-vectorial data, and the natural framework they provide to integrate heterogeneous data are particularly relevant to various problems arising in computational biology. In this chapter we survey some of the most prominent applications published so far, highlighting the particular developments in kernel methods triggered by problems in biology, and mention a few promising research directions likely to expand in the future

    Small RNA Sorting in Drosophila Produces Chemically Distinct Functional RNA-Protein Complexes: A Dissertation

    Small interfering RNAs (siRNAs), microRNAs (miRNAs), and Piwi-associated RNAs (piRNAs) are conserved classes of small single-stranded ~21-30 nucleotide (nt) RNA guides that repress eukaryotic gene expression using distinct RNA Induced Silencing Complexes (RISCs). At its core, RISC is composed of a single-stranded small RNA guide bound to a member of the Argonaute protein family, which together bind and repress complementary target RNA. miRNAs target protein-coding mRNAs, a function essential for normal development and broadly involved in pathways of human disease; siRNAs defend against viruses, but can also be engineered to direct experimental or therapeutic gene silencing; piRNAs protect germline genomes from expansion of parasitic nucleic acids such as transposons. Using the fruit fly, Drosophila melanogaster, as a model organism, we seek to understand how small silencing RNAs are made and how they function. In Drosophila, miRNAs and siRNAs are proposed to have parallel but separate biogenesis and effector machinery. miRNA duplexes are excised from imperfectly paired hairpin precursors by Dicer1 and loaded into Ago1; siRNA duplexes are hewn from perfectly paired long dsRNA by Dicer2 and loaded into Ago2. Contrary to this model, we found that one miRNA, miR-277, is made by Dicer1 but partitions between Ago1 and Ago2 RISCs. These two RISCs are functionally distinct: Ago2 could silence a perfectly paired target, but not a centrally bulged target; Ago1 could silence a bulged target, but not a perfect target. This was surprising since both Ago1 and Ago2 have the endonucleolytic cleavage activity necessary for perfect target cleavage in vitro. Our detailed kinetic studies suggested why: Ago2 is a robust multiple-turnover enzyme, but Ago1 is not. Along with a complementary in vitro study, our data support a duplex sorting mechanism in which Diced duplexes are released and rebind to Ago1 or Ago2 loading machinery, regardless of which Dicer produced them. This allows structural information embedded in small RNA duplexes to direct small RNA loading into Ago1 and/or Ago2, resulting in distinct regulatory outputs. Small RNA sorting also has chemical consequences for the small RNA guide. Although siRNAs were presumed to have the signature 2′, 3′ hydroxyl ends left by Dicer, we found that small RNAs loaded into Ago2 or Piwi proteins, but not Ago1, are modified at their 3´ ends by the RNA 2´-O-methyltransferase DmHen1. In plants, Hen1 modifies the 3´ ends of all small RNA duplexes, protecting and stabilizing them. Implying a similar function in flies, piRNAs are smaller, less abundant, and their function is perturbed in hen1 mutants. But unlike in plants, small RNAs are modified as single strands in RISC rather than as duplexes. This nicely explains why the dsRNA binding domain in plant Hen1 was discarded in animals, and why both dsRNA-derived siRNAs and ssRNA-derived piRNAs are modified. The recent discovery that both piRNAs and siRNAs target transposons links terminal modification and transposon silencing, suggesting that it is specialized for this purpose.

    Development of a Procedure for Genome-wide Expression Profiling from Minute Tissue Samples and Application in Mammary Carcinoma: Gene Activity Patterns Unveiling Molecular Pathways and Predicting Clinical Response

    In this thesis, a novel procedure was developed for linear amplification of messenger RNA (mRNA) molecules and labeling with fluorescently modified nucleotides; it can be used to perform genome-wide expression analysis from minute tissue samples using microarrays of long gene-specific oligonucleotide DNA probes. The procedure was then applied to analyze core needle biopsies taken at time of diagnosis from tumors of female primary breast carcinoma patients. After receiving chemotherapy consisting of gemcitabine, epirubicin and docetaxel, the patients were classified according to their response into responders, defined as patients with a pathological complete remission of the tumor, and non-responders, defined as patients with no change or pathological partial remission. The gene expression profiles of the tumors from these patients were then bioinformatically processed and analyzed to identify a gene expression signature that could be used to predict the response of the patients. Additionally, this gene signature was inspected for significantly enriched pathways and biological processes, and a subset of genes was analyzed in the patients' biopsies with respect to RNA expression, as validated by real-time quantitative polymerase chain reaction (RQ-PCR), and protein expression, as measured by immunohistochemistry (IHC). The gene expression signature contained 512 genes, which allowed prediction of patient response with an overall accuracy of 88%, a sensitivity of 78% and a specificity of 90%. Signaling pathways and biological processes identified with significant enrichment in the gene set were the Ras pathway, TGF β signaling, DNA damage response and apoptosis. From these pathways, the genes DAPK2, BAMBI, LMO4 and SMAD3 could be validated by RQ-PCR, but not SRC. In protein analysis by IHC, BAMBI was strongly associated with patient outcome, while BMP4, LMO4, SMAD3 and SRC were not directly associated. Additionally, BAMBI protein expression showed a strong relationship with BRCA1 expression in primary female breast carcinoma. Taken together, these results show the applicability of the newly developed procedure for amplification and labeling of mRNA for genome-wide gene expression analysis with the long oligonucleotide microarray technique, and its successful use in biological and clinical investigations. The analysis of gene expression profiles of the primary breast tumors revealed an association of the Ras pathway, TGF β signaling, DNA damage response and apoptosis with the outcome of the patients after chemotherapy, as well as associations of several genes within these pathways and biological processes.

    Exploring issues of balanced versus imbalanced samples in mapping grass community in the telperion reserve using high resolution images and selected machine learning algorithms

    ABSTRACT Accurate vegetation mapping is essential for a number of reasons, one of which is conservation. The main objective of this research was to map different grass communities in the game reserve using RapidEye and Sentinel-2 MSI images and two machine learning classifiers, support vector machine (SVM) and random forest (RF), in order to test the impact of balanced and imbalanced training data on the performance and accuracy of SVM and RF in mapping the grass communities, and to test the sensitivity of pixel resolution to balanced and imbalanced training data in image classification. The imbalanced and balanced data sets were obtained through field data collection. The results show that RF and SVM produce a high overall accuracy for Sentinel-2 imagery for both the balanced and imbalanced data sets. The RF classifier yielded an overall accuracy of 79.45% and kappa of 74.38% using imbalanced training data, and an overall accuracy of 76.19% and kappa of 73.21% using balanced training data. The SVM classifier yielded an overall accuracy of 82.54% and kappa of 80.36% using imbalanced training data, and an overall accuracy of 82.21% and kappa of 78.33% using balanced training data. For the RapidEye imagery, the overall accuracy of both RF and SVM was reduced by the balanced data set. The RF algorithm had an overall accuracy that dropped by 6% (from 63.24% to 57.94%) while the SVM dropped by 7% (from 57.31% to 50.79%). The results thereby show that the imbalanced data set is a better option than the balanced data set for image classification of vegetation species. The study recommends implementing ways of handling misclassification among the different grass species to improve classification in future research. Further research can be carried out on other types of high resolution multispectral imagery using different advanced algorithms on different training sample sizes.
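For readers unfamiliar with the two figures reported in this abstract, overall accuracy and Cohen's kappa can both be derived from a classification confusion matrix. The sketch below is illustrative only and is not taken from the study: the function name `accuracy_and_kappa` and the example matrix are hypothetical. It returns fractions, which would be multiplied by 100 to compare with the percentages quoted above.

```python
def accuracy_and_kappa(confusion):
    """Compute overall accuracy and Cohen's kappa from a square confusion matrix.

    confusion[i][j] is the number of samples whose true class is i and
    whose predicted class is j. Both results are fractions, not percentages.
    """
    k = len(confusion)
    total = sum(sum(row) for row in confusion)
    if total == 0:
        raise ValueError("confusion matrix is empty")

    # Observed agreement: share of samples on the diagonal.
    observed = sum(confusion[i][i] for i in range(k)) / total

    # Chance agreement: sum over classes of (row total * column total) / total^2.
    expected = sum(
        sum(confusion[i]) * sum(row[i] for row in confusion) for i in range(k)
    ) / (total * total)
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement equals 1")

    kappa = (observed - expected) / (1.0 - expected)
    return observed, kappa


# Hypothetical two-class example (not data from the study).
example = [[40, 10],
           [10, 40]]
overall_accuracy, kappa = accuracy_and_kappa(example)
```

Because kappa discounts agreement expected by chance, it is never higher than overall accuracy for the same matrix, which matches the pattern in the figures reported above.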

    An update on novel approaches for diagnosis and treatment of SARS-CoV-2 infection

    The ongoing pandemic of coronavirus disease 2019 (COVID-19) has created a serious public health and economic crisis worldwide, which has united global efforts to develop rapid, precise, and cost-efficient diagnostics, vaccines, and therapeutics. Numerous multi-disciplinary studies and techniques have been designed to investigate and develop various approaches to help frontline health workers, policymakers, and populations overcome the disease. While these techniques have been reviewed within individual disciplines, it is now timely to provide a cross-disciplinary overview of novel diagnostic and therapeutic approaches, summarizing complementary efforts across multiple fields of research and technology. Accordingly, we reviewed and summarized various advanced novel approaches used for diagnosis and treatment of COVID-19 to help researchers across diverse disciplines prioritize resources for research and development and to give them a better picture of the latest techniques. These include artificial intelligence, nano-based, CRISPR-based, and mass spectrometry technologies as well as neutralizing factors and traditional medicines. We also reviewed new approaches for vaccine development and developed a dashboard to provide frequent updates on current and future approved vaccines.