1,020 research outputs found

    TargetSpy: a supervised machine learning approach for microRNA target prediction

    [Background] Virtually all currently available microRNA target site prediction algorithms require the presence of a (conserved) seed match to the 5' end of the microRNA. Recently, however, it has been shown that this requirement might be too stringent, leading to a substantial number of missed target sites. [Results] We developed TargetSpy, a novel computational approach for predicting target sites regardless of the presence of a seed match. It is based on machine learning and automatic feature selection using a wide spectrum of compositional, structural, and base pairing features covering current biological knowledge. Our model does not rely on evolutionary conservation, which allows the detection of species-specific interactions and makes TargetSpy suitable for analyzing unconserved genomic sequences. In order to allow for an unbiased comparison of TargetSpy to other methods, we classified all algorithms into three groups: I) no seed match requirement, II) seed match requirement, and III) conserved seed match requirement. TargetSpy predictions for classes II and III are generated by appropriate postfiltering. On a human dataset revealing fold-change in protein production for five selected microRNAs, our method shows superior performance in all classes. In Drosophila melanogaster, not only are our class II and III predictions on par with other algorithms, but notably the class I (no-seed) predictions are only marginally less accurate. We estimate that TargetSpy predicts between 26 and 112 functional target sites without a seed match per microRNA that are missed by all other currently available algorithms. [Conclusion] Only a few algorithms can predict target sites without demanding a seed match, and TargetSpy demonstrates a substantial improvement in prediction accuracy in that class. Furthermore, when conservation and the presence of a seed match are required, the performance is comparable with state-of-the-art algorithms.
TargetSpy was trained on mouse and performs well in human and Drosophila, suggesting that it may be applicable to a broad range of species. Moreover, we have demonstrated that the application of machine learning techniques in combination with upcoming deep sequencing data results in a powerful microRNA target site prediction tool (http://www.targetspy.org). The work of MH was supported by the Spanish Government (Grant number: BIO2008.01353) and by the Junta de Andalucia (Grant number P07-FQM-03613).
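
The "seed match" requirement mentioned in this abstract can be illustrated with a minimal sketch. It assumes the common convention that the seed is nucleotides 2-8 of the microRNA's 5' end and that a match is a perfect Watson-Crick complement in the target sequence. The function names `seed_site` and `find_seed_matches` are hypothetical and are not part of TargetSpy.

```python
# Illustrative only: assumes a 7-nt seed (miRNA nucleotides 2-8) and
# perfect Watson-Crick pairing; not taken from the TargetSpy code base.
COMPLEMENT = str.maketrans("ACGU", "UGCA")


def to_rna(seq: str) -> str:
    """Upper-case a sequence and convert DNA notation (T) to RNA (U)."""
    return seq.upper().replace("T", "U")


def seed_site(mirna: str) -> str:
    """Return the target motif (5'->3') that pairs with miRNA nucleotides 2-8."""
    seed = to_rna(mirna)[1:8]  # 0-based slice covering nucleotides 2..8
    return seed.translate(COMPLEMENT)[::-1]  # reverse complement


def find_seed_matches(mirna: str, target: str) -> list[int]:
    """Return 0-based start positions of every seed match in the target."""
    site = seed_site(mirna)
    target = to_rna(target)
    positions = []
    i = target.find(site)
    while i != -1:
        positions.append(i)
        i = target.find(site, i + 1)  # allow overlapping matches
    return positions


if __name__ == "__main__":
    let7a = "UGAGGUAGUAGGUUGUAUAGUU"
    print(seed_site(let7a))
    print(find_seed_matches(let7a, "AAACTACCTCAGGG"))
```

Under these assumptions, a class II style postfilter could keep only those predicted sites for which `find_seed_matches` returns a non-empty list; a class I prediction would skip this check entirely.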

    MicroRNA Identification Based on Bioinformatics Approaches


    miRCat2: Accurate prediction of plant and animal microRNAs from next-generation sequencing datasets

    Motivation: MicroRNAs are a class of ∼21-22 nucleotide small RNAs which are excised from a stable hairpin-like secondary structure. They have important gene regulatory functions and are involved in many pathways including developmental timing, organogenesis and development in eukaryotes. There are several computational tools for miRNA detection from next-generation sequencing (NGS) datasets. However, many of these tools suffer from high false positive and false negative rates. Here we present a novel miRNA prediction algorithm, miRCat2. miRCat2 incorporates a new entropy-based approach to detect miRNA loci, which is designed to cope with the high sequencing depth of current NGS datasets. It has a user-friendly interface and produces graphical representations of the hairpin structure and plots depicting the alignment of sequences on the secondary structure. Results: We tested miRCat2 on a number of animal and plant datasets and present a comparative analysis with miRCat, miRDeep2, miRPlant and miReap. We also use mutants in the miRNA biogenesis pathway to evaluate the predictions of these tools. Results indicate that miRCat2 has an improved accuracy compared with other methods tested. Moreover, miRCat2 predicts several new miRNAs that are differentially expressed in wildtype versus mutants in the miRNA biogenesis pathway. Availability: miRCat2 is part of the UEA small RNA Workbench and is freely available from http://srnaworkbench.cmp.uea.ac.uk
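
The abstract does not describe how the entropy-based locus detection works, so the following is only a generic sketch of the underlying idea, not miRCat2's algorithm. It assumes that read abundances are tallied per start position within a candidate locus and that a low Shannon entropy (reads concentrated on a few positions) is more consistent with precise processing than a high one (reads spread evenly, as for degradation products). The function name and the example counts are hypothetical.

```python
import math


def shannon_entropy(counts: list[int]) -> float:
    """Shannon entropy (in bits) of a list of non-negative read counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    entropy = 0.0
    for c in counts:
        if c > 0:
            p = c / total
            entropy += -p * math.log2(p)
    return entropy


if __name__ == "__main__":
    # Hypothetical read counts per start position within a locus.
    peaked = [0, 0, 980, 15, 5, 0]     # reads pile up at one position
    diffuse = [160, 170, 165, 175, 160, 170]  # reads spread evenly
    print(shannon_entropy(peaked) < shannon_entropy(diffuse))
```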

    Methods in and Applications of the Sequencing of Short Non-Coding RNAs

    Short non-coding RNAs are important for all domains of life. With the advent of modern molecular biology, their applicability to medicine has become apparent in settings ranging from diagnostic biomarkers to therapeutics and fields ranging from oncology to neurology. In addition, a critical, recent technological development is high-throughput sequencing of nucleic acids. The convergence of modern biotechnology with developments in RNA biology presents opportunities in both basic research and medical settings. Here I present two novel methods for leveraging high-throughput sequencing in the study of short non-coding RNAs, as well as a study in which they are applied to Alzheimer's Disease (AD). The computational methods presented here include High-throughput Annotation of Modified Ribonucleotides (HAMR), which enables researchers to detect post-transcriptional covalent modifications to RNAs in a high-throughput manner. In addition, I describe Classification of RNAs by Analysis of Length (CoRAL), a computational method that allows researchers to characterize the pathways responsible for short non-coding RNA biogenesis. Lastly, I present an application of the study of non-coding RNAs to Alzheimer's disease. When applied to the study of AD, it is apparent that several classes of non-coding RNAs, particularly tRNAs and tRNA fragments, show striking changes in the dorsolateral prefrontal cortex of affected human brains. Interestingly, the nature of these changes differs between mitochondrial and nuclear tRNAs, implicating an association between Alzheimer's disease and perturbation of mitochondrial function. In addition, by combining known genetic factors of AD with genes that are differentially expressed and targets of regulatory RNAs that are differentially expressed, I construct a network of genes that are potentially relevant to the pathogenesis of the disease.
By combining genetics data with novel results from the study of non-coding RNAs, we can further elucidate the molecular mechanisms that underlie Alzheimer's disease pathogenesis.
