16 research outputs found

    Learning from positive examples when the negative class is undetermined- microRNA gene identification

    Background: The application of machine learning to classification problems that depend only on positive examples is gaining attention in the computational biology community. We and others have described the use of two-class machine learning to identify novel miRNAs. These methods require the generation of an artificial negative class. However, designation of the negative class can be problematic and, if it is not done properly, can dramatically affect the performance of the classifier and/or yield a biased estimate of performance. We present a study using one-class machine learning for microRNA (miRNA) discovery and compare one-class to two-class approaches using naïve Bayes and Support Vector Machines. These results are compared to published two-class miRNA prediction approaches. We also examine the ability of the one-class and two-class techniques to identify miRNAs in newly sequenced species.
    Results: Of all methods tested, we found that two-class naïve Bayes and Support Vector Machines gave the best accuracy using our selected features and optimally chosen negative examples. One-class methods showed average accuracies of 70–80% versus 90% for both two-class methods on the same feature sets. However, some one-class methods outperform some recently published two-class approaches with different selected features. Using the EBV genome as an external validation of the method, we found one-class machine learning to work as well as or better than a two-class approach in identifying true miRNAs as well as predicting new miRNAs.
    Conclusion: One-class and two-class methods can both give useful classification accuracies when the negative class is well characterized. The advantage of one-class methods is that they eliminate guessing at the optimal features for the negative class when they are not well defined. In these cases one-class methods can be superior to two-class methods when the features chosen as representative of the positive class are well defined.
    Availability: The OneClassmiRNA program is available at: [1]
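The contrast between one-class and two-class learning described in this abstract can be illustrated with a minimal sketch. It is not the OneClassmiRNA implementation: the independent per-feature Gaussian model, the two features, the threshold rule, and all data values are assumptions made only for illustration.

```python
import math
from statistics import mean, pstdev

def fit_gaussian(rows):
    """Per-feature mean and standard deviation (features treated as independent)."""
    cols = list(zip(*rows))
    return [(mean(c), max(pstdev(c), 1e-6)) for c in cols]

def log_likelihood(model, x):
    """Log density of x under independent per-feature Gaussians."""
    return sum(
        -0.5 * math.log(2 * math.pi * sd * sd) - (v - mu) ** 2 / (2 * sd * sd)
        for (mu, sd), v in zip(model, x)
    )

def one_class_predict(pos_model, threshold, x):
    """One-class rule: accept x if it is at least as likely as the threshold."""
    return log_likelihood(pos_model, x) >= threshold

def two_class_predict(pos_model, neg_model, x):
    """Gaussian naive Bayes with equal priors: pick the more likely class."""
    return log_likelihood(pos_model, x) > log_likelihood(neg_model, x)

# Hypothetical two-feature examples (say, hairpin length and folding energy).
# These values are invented for illustration and are not real data.
positives = [(80, -35.0), (85, -38.0), (90, -40.0), (78, -33.0), (88, -37.0)]
negatives = [(60, -10.0), (120, -15.0), (70, -12.0), (110, -8.0), (65, -14.0)]

pos_model = fit_gaussian(positives)
neg_model = fit_gaussian(negatives)  # only the two-class rule needs this

# Assumed threshold rule: the lowest training log-likelihood, so that every
# training positive is accepted. A real study would tune this on held-out data.
threshold = min(log_likelihood(pos_model, x) for x in positives)

candidate = (84, -36.0)   # resembles the positives
decoy = (115, -9.0)       # resembles the artificial negatives

print(one_class_predict(pos_model, threshold, candidate))
print(one_class_predict(pos_model, threshold, decoy))
print(two_class_predict(pos_model, neg_model, candidate))
print(two_class_predict(pos_model, neg_model, decoy))
```

The one-class rule never looks at `negatives`, so its behaviour does not depend on how the artificial negative class was constructed; the two-class rule does.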

    Comparison of four Ab initio MicroRNA prediction tools

    International Conference on Bioinformatics Models, Methods and Algorithms, BIOINFORMATICS 2013; Barcelona; Spain; 11 February 2013 through 14 February 2013. MicroRNAs are small RNA sequences of 18-24 nucleotides in length, which serve as templates to drive post-transcriptional gene silencing. The canonical microRNA pathway starts with transcription from DNA and is followed by processing by the Microprocessor complex, yielding a hairpin structure. This hairpin is then exported into the cytosol, where it is processed by Dicer and then incorporated into the RNA-induced silencing complex. All of these biogenesis steps add to the overall specificity of miRNA production and effect. Unfortunately, experimental detection of miRNAs is cumbersome and therefore computational tools are necessary. Homology-based miRNA prediction tools are limited by fast miRNA evolution and by the fact that they are template driven. Ab initio miRNA prediction methods have been proposed, but they have not been compared against one another, so their relative performance is largely unknown. Here we implement the features proposed in four miRNA ab initio studies and evaluate them on two data sets. Using the features described in Bentwich 2008 leads to the highest accuracy but still does not provide enough confidence in the results to warrant experimental validation of all predictions in a larger genome like the human genome. Copyright © 2013 SCITEPRESS - Science and Technology Publications. Turkish Academy of Science

    The impact of feature selection on one and two-class classification performance for plant microRNAs

    MicroRNAs (miRNAs) are short nucleotide sequences that form a typical hairpin structure which is recognized by a complex enzyme machinery. This ultimately leads to the incorporation of 18-24 nt long mature miRNAs into RISC, where they act as recognition keys to aid in the regulation of target mRNAs. Determining miRNAs experimentally is involved and, therefore, machine learning is used to complement such endeavors. The success of machine learning mostly depends on proper input data and appropriate features for parameterization of the data. Although two-class classification (TCC) is generally used in the field, negative examples are hard to come by, so one-class classification (OCC) has been tried for pre-miRNA detection. Since both positive and negative examples are currently somewhat limited, feature selection can prove to be vital for furthering the field of pre-miRNA detection. In this study, we compare the performance of OCC and TCC using eight feature selection methods and seven different plant species providing positive pre-miRNA examples. Feature selection was very successful for OCC, where the best feature selection method achieved an average accuracy of 95.6%, ~29 percentage points better than the worst method, which achieved 66.9% accuracy. While the performance is comparable to TCC, which performs up to 3% better than OCC, TCC is much less affected by feature selection and its largest performance gap is ~13%, which only occurs for two of the feature selection methodologies. We conclude that feature selection is crucially important for OCC and that it can perform on par with TCC given the proper set of features. The Scientific and Technological Research Council of Turkey (grant number 113E326)

    One-class models for validation of miRNAs and ERBB2 gene interactions based on sequence features for breast cancer scenarios

    One challenge in miRNA–genes–diseases interaction studies is the difficulty of finding labeled data that indicate a positive or negative relationship between miRNAs and genes. One-class classification methods offer a promising path for validating such relationships. We have applied two one-class classification methods, Isolation Forest and One-class SVM, to validate miRNA interactions with the ERBB2 gene present in breast cancer scenarios using features extracted via sequence-binding. We found that the One-class SVM outperforms the Isolation Forest model, with a sensitivity of 80.49% and a specificity of 86.49%, results that are comparable to previous studies. Additionally, we have demonstrated that features extracted with a sequence-based approach (considering miRNA and gene sequence binding characteristics), combined with one-class models, are a feasible method for validating these genetic molecule interactions.
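As a reminder of how the sensitivity and specificity figures quoted in this abstract are defined, the following minimal sketch computes both from true and predicted labels. The +1/-1 label convention mirrors what scikit-learn's one-class estimators return from `predict`; the label vectors below are hypothetical and are not the study's data.

```python
def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for labels coded +1 (positive) / -1 (negative)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == -1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == -1 and p == -1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == -1 and p == 1)
    sensitivity = tp / (tp + fn)  # share of true interactions that were accepted
    specificity = tn / (tn + fp)  # share of non-interactions that were rejected
    return sensitivity, specificity

# Hypothetical labels: 10 known interactions followed by 10 non-interactions.
y_true = [1] * 10 + [-1] * 10
# Hypothetical predictions: 2 interactions missed, 1 non-interaction accepted.
y_pred = [1] * 8 + [-1] * 2 + [-1] * 9 + [1] * 1

sens, spec = sensitivity_specificity(y_true, y_pred)
print(f"sensitivity={sens:.2%} specificity={spec:.2%}")
```

The function assumes both classes occur in `y_true`; otherwise one of the denominators is zero.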

    MicroRNA Identification Based on Bioinformatics Approaches


    Joint sub-classifiers one class classification model for avian influenza outbreak detection

    H5N1 avian influenza outbreak detection is a significant issue for early warning of epidemics. This paper proposes a domain knowledge-based joint one-class classification model for avian influenza outbreak detection. Instead of focusing on manipulations of the one-class classification model, we delve into the one-class avian influenza dataset, divide it into sub-classes by domain knowledge, train the sub-class classifiers and unify the result of each classifier. The proposed joint method solves the one-class classification and feature selection problems together. The experimental results demonstrate that the proposed joint model clearly outperforms the standard one-class classification model on the animal avian influenza dataset. © 2011 Imperial College Press
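The divide, train and unify steps described in this abstract can be sketched as follows. The abstract does not say which sub-classifier or which unification rule the authors use, so the centroid-and-radius model, the "accept if any sub-classifier accepts" rule, the sub-class names, and all data values are assumptions made only for illustration.

```python
import math
from collections import defaultdict

def fit_centroid(rows):
    """One-class model: centroid plus the largest training distance as radius."""
    dim = len(rows[0])
    centroid = tuple(sum(r[i] for r in rows) / len(rows) for i in range(dim))
    radius = max(math.dist(centroid, r) for r in rows)
    return centroid, radius

def fit_joint(records):
    """Split (sub_class, features) records by sub-class and fit one model each."""
    groups = defaultdict(list)
    for sub_class, features in records:
        groups[sub_class].append(features)
    return {sub_class: fit_centroid(rows) for sub_class, rows in groups.items()}

def joint_predict(models, x):
    """Assumed unification rule: accept x if any sub-class model accepts it."""
    return any(math.dist(c, x) <= r for c, r in models.values())

# Hypothetical outbreak records, split by a hypothetical domain-knowledge key.
records = [
    ("poultry", (1.0, 1.0)), ("poultry", (2.0, 1.0)),
    ("poultry", (1.0, 2.0)), ("poultry", (2.0, 2.0)),
    ("wild_bird", (10.0, 10.0)), ("wild_bird", (11.0, 10.0)),
    ("wild_bird", (10.0, 11.0)), ("wild_bird", (11.0, 11.0)),
]

joint = fit_joint(records)
# Baseline for comparison: a single one-class model over all records.
single = {"all": fit_centroid([features for _, features in records])}

between = (6.0, 6.0)  # lies between the two sub-classes
near = (1.5, 1.6)     # lies inside the "poultry" sub-class

print(joint_predict(single, between), joint_predict(joint, between))
print(joint_predict(single, near), joint_predict(joint, near))
```

In this toy setting the single model accepts the point that falls in the empty region between the sub-classes, while the joint model rejects it and still accepts the point inside a sub-class.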

    A framework for improving microRNA prediction in non-human genomes

    The prediction of novel pre-microRNA (miRNA) from genomic sequence has received considerable attention recently. However, the majority of studies have focused on the human genome. Previous studies have demonstrated that sensitivity (correctly detecting true miRNA) is sustained when human-trained methods are applied to other species; however, they have failed to report the dramatic drop in specificity (the ability to correctly reject non-miRNA sequences) in