    The impact of feature selection on one and two-class classification performance for plant microRNAs

    MicroRNAs (miRNAs) are short nucleotide sequences that form a typical hairpin structure, which is recognized by a complex enzyme machinery. This ultimately leads to the incorporation of 18-24 nt long mature miRNAs into RISC, where they act as recognition keys that aid in the regulation of target mRNAs. Determining miRNAs experimentally is involved and, therefore, machine learning is used to complement such endeavors. The success of machine learning mostly depends on proper input data and appropriate features for parameterization of the data. Although two-class classification (TCC) is generally used in the field, negative examples are hard to come by, so one-class classification (OCC) has been tried for pre-miRNA detection. Since both positive and negative examples are currently somewhat limited, feature selection can prove vital for furthering the field of pre-miRNA detection. In this study, we compare the performance of OCC and TCC using eight feature selection methods and seven different plant species providing positive pre-miRNA examples. Feature selection was very successful for OCC, where the best feature selection method achieved an average accuracy of 95.6%, ~29 percentage points better than the worst method, which achieved 66.9% accuracy. While the performance is comparable to TCC, which performs up to 3% better than OCC, TCC is much less affected by feature selection, and its largest performance gap is ~13%, which only occurs for two of the feature selection methodologies. We conclude that feature selection is crucially important for OCC and that it can perform on par with TCC given the proper set of features. Funding: The Scientific and Technological Research Council of Turkey (grant number 113E326).
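
    The gap between the best and worst feature selection methods for OCC can be read either as an absolute difference in accuracy or as a relative improvement. The short sketch below only redoes that arithmetic from the two accuracies quoted in the abstract; the variable names are illustrative and not taken from the study.

```python
# Accuracies quoted in the abstract (percent); variable names are illustrative.
best_occ = 95.6
worst_occ = 66.9

# Absolute gap, in percentage points.
gap_points = best_occ - worst_occ

# Relative improvement of the best method over the worst one.
relative_gain = gap_points / worst_occ * 100

print(f"gap: {gap_points:.1f} percentage points")
print(f"relative gain: {relative_gain:.1f}%")
```

    The absolute reading matches the "~29%" figure in the abstract, which suggests the abstract reports a difference in percentage points rather than a relative change.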

    Computational construction of microRNA metabolic networks (MikroRNA metabolik ağlarının bilişimsel kurulumu)

    Thesis (Doctoral), Izmir Institute of Technology, Molecular Biology and Genetics, Izmir, 2017. Full text release delayed at author's request until 2020.07.02. Includes bibliographical references (leaves: 40-47). Text in English; abstract in Turkish and English.
    MicroRNAs (miRNAs) are single-stranded, small, non-coding RNAs that control gene expression at the post-transcriptional level through various mechanisms such as translational inhibition, degradation and destabilisation of their target mRNAs. Although thousands of miRNAs have been reported in various species, most still remain unknown. The identification of new miRNAs is therefore essential for analysing miRNA-mediated post-transcriptional regulation mechanisms. Moreover, many biological approaches are limited in their capacity to reveal rare miRNAs and are further restricted to the state of the organism under examination. Such limitations have led to the construction of sophisticated computational tools for identifying possible miRNAs in silico. However, these programs suffer from low sensitivity and/or accuracy and, as a result, do not provide enough confidence to validate all their predictions experimentally. This study aims to overcome these challenges by creating a new and adaptable machine learning based method to predict potential miRNAs in any given sequence. The efficiency of the proposed method is shown by comparison with available tools on various data sets. Using this approach, miRNAs are identified from the genomes of various organisms such as human (Homo sapiens), fly (Drosophila melanogaster) and tomato (Solanum lycopersicum). Moreover, networks between possible viral miRNAs and human genes, as well as miRNA-mediated communication between the nuclear and organelle genomes of Solanum lycopersicum, are investigated. Funding: TUBITAK (EEEAG/113E326).

    Circular RNA–MicroRNA–MRNA interaction predictions in SARS-CoV-2 infection

    Different types of noncoding RNAs, such as microRNAs (miRNAs) and circular RNAs (circRNAs), have been shown to take part in various cellular processes, including post-transcriptional gene regulation during infection. MiRNAs are expressed by more than 200 organisms ranging from viruses to higher eukaryotes. Since miRNAs seem to be involved in host–pathogen interactions, many studies have attempted to identify whether human miRNAs could target severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) mRNAs as an antiviral defence mechanism. In this work, a machine learning based miRNA analysis workflow was developed to predict differential expression patterns of human miRNAs during SARS-CoV-2 infection. To obtain a graphical representation of miRNA hairpins, 36 features were defined based on their secondary structures. Moreover, potential targeting interactions between human circRNAs and miRNAs, as well as between human miRNAs and viral mRNAs, were investigated.

    Delineating the impact of machine learning elements in pre-microRNA detection

    Gene regulation modulates RNA expression via transcription factors. Posttranscriptional gene regulation in turn influences the amount of protein product through, for example, microRNAs (miRNAs). Experimental establishment of miRNAs and their effects is complicated and even futile when aiming to establish the entirety of miRNA target interactions. Therefore, computational approaches have been proposed. Many such tools rely on machine learning (ML), which involves example selection, feature extraction, model training, algorithm selection, and parameter optimization. Different ML algorithms have been used for model training on various example sets, more than 1,000 features describing pre-miRNAs have been proposed, and different training and testing schemes have been used for model establishment. For pre-miRNA detection, negative examples cannot easily be established, which causes a problem for two-class classification algorithms. There is also no consensus on which ML approach works best and, therefore, we set out to establish the impact of the different parts involved in ML on model performance. Furthermore, we established two new negative datasets and analyzed the impact of using them for training and testing. Our aim was to attach an order of importance to the parts involved in ML for pre-miRNA detection, but instead we found that all parts are intricately connected and their contributions cannot be easily untangled. This leads us to suggest that many scenarios need to be explored when attempting ML-based pre-miRNA detection. Funding: Scientific and Technological Research Council of Turkey (113E326).

    Computational analysis of microRNA-mediated interactions in SARS-CoV-2 infection

    MicroRNAs (miRNAs) are post-transcriptional regulators of gene expression found in more than 200 diverse organisms. Although it is still not fully established whether RNA viruses can generate miRNAs, there are examples of miRNA-like sequences from RNA viruses with regulatory functions. In the case of Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2), there are several mechanisms by which miRNAs could affect the virus, such as interfering with viral replication and translation, and even modulating host expression. In this study, we performed a machine learning based miRNA prediction analysis of the SARS-CoV-2 genome to identify miRNA-like hairpins, and searched for potential miRNA-based interactions between viral miRNAs and human genes and between human miRNAs and viral genes. Overall, 950 hairpin-structured sequences were extracted from the virus genome and, based on the prediction results, 29 of them could be precursor miRNAs. Targeting analysis showed that 30 viral mature miRNA-like sequences could target 1,367 different human genes. PANTHER gene function analysis indicated that virus-derived miRNA candidates could target various human genes involved in crucial cellular processes, including transcription, metabolism, the defense system and several signaling pathways such as Wnt and EGFR signaling. Protein class-based grouping of the targeted human genes showed that host transcription might be one of the main targets of the virus, since 96 genes involved in transcriptional processes were potential targets of predicted viral miRNAs. For instance, basal transcription machinery elements, including several components of the human mediator complex (MED1, MED9, MED12L, MED19), basal transcription factors such as TAF4, TAF5 and TAF7L, and site-specific transcription factors such as STAT1, were found to be targeted.
In addition, many known human miRNAs appeared to be able to target viral genes involved in the viral life cycle, such as those encoding the S, M, N and E proteins and ORF1ab, ORF3a, ORF8, ORF7a and ORF10. Given that miRNA-based therapies have received attention, the findings of this study suggest that understanding the modes of action of miRNAs and their possible roles during SARS-CoV-2 infection could create new opportunities for the development and improvement of therapeutics.

    A machine learning approach for microRNA precursor prediction in retro-transcribing virus genomes

    PubMed: 28187417. Identification of microRNA (miRNA) precursors has seen increased efforts in recent years. The difficulty of detecting pre-miRNAs experimentally has increased the use of computational approaches. Most of these approaches rely on machine learning, especially classification. To achieve successful classification, many parameters need to be considered, such as data quality, choice of classifier settings, and feature selection. For the latter, we developed a distributed genetic algorithm on HTCondor to perform feature selection. Moreover, we employed two widely used classification algorithms, libSVM and random forest, with different settings to analyze their influence on the overall classification performance. In this study we analyzed 5 human retro virus genomes; Human endogenous retrovirus K113, Hepatitis B virus (strain ayw), Human T lymphotropic virus 1, Human T lymphotropic virus 2, Human immunodeficiency virus 2, and Human immunodeficiency virus 1. We then predicted pre-miRNAs by using the information from known virus and human pre-miRNAs. Our results indicate that these viruses produce novel, unknown miRNA precursors which warrant further experimental validation.

    On the performance of pre-microRNA detection algorithms

    MicroRNAs are crucial for post-transcriptional gene regulation, and their dysregulation has been associated with diseases like cancer; their analysis has therefore become popular. The experimental discovery of miRNAs is cumbersome and, thus, many computational tools have been proposed. Here we assess 13 ab initio pre-miRNA detection approaches using all relevant, published, and novel data sets while judging algorithm performance based on ten intrinsic performance measures. We present an extensible framework, izMiR, which allows for the unbiased comparison of existing algorithms, adding new ones, and combining multiple approaches into ensemble methods. In an exhaustive attempt, we condense the results of millions of computations and show that no method is clearly superior; however, we provide a guideline for biomedical researchers to select a tool. Finally, we demonstrate that combining all of the methods into one ensemble approach, for the first time, allows reliable purely computational pre-miRNA detection in large eukaryotic genomes. Funding: Scientific Research Council of Turkey (TUBITAK 113E326).
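
    The abstract does not say how the individual detectors are combined into an ensemble. As a rough illustration only, the sketch below assumes simple majority voting over per-detector labels; the function `majority_vote`, the label strings and the example votes are all hypothetical and are not part of izMiR.

```python
from collections import Counter


def majority_vote(predictions):
    """Return the label chosen by most detectors; ties resolve to 'negative'.

    Majority voting is an assumption made for illustration, not the
    combination rule described in the abstract.
    """
    counts = Counter(predictions)
    positives = counts.get("positive", 0)
    negatives = counts.get("negative", 0)
    return "positive" if positives > negatives else "negative"


# Hypothetical outputs of three detectors for two candidate hairpins.
votes = {
    "hairpin_1": ["positive", "positive", "negative"],
    "hairpin_2": ["negative", "positive", "negative"],
}

calls = {name: majority_vote(labels) for name, labels in votes.items()}
print(calls)
```

    Resolving ties to "negative" is a conservative choice for this sketch: a candidate is only called a pre-miRNA when a strict majority of detectors agree.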