633 research outputs found

    Identification of microRNA precursors based on random forest with network-level representation method of stem-loop structure

    Background: MicroRNAs (miRNAs) play a key role in regulating various biological processes, such as participating in the post-transcriptional pathway and affecting the stability and/or the translation of mRNA. Current methods have extracted feature information at different levels, among which the characteristic stem-loop structure makes the greatest contribution to the prediction of putative miRNA precursors (pre-miRNAs). We find that none of these features alone is capable of identifying new pre-miRNAs accurately. Results: In the present work, a pre-miRNA stem-loop secondary structure is translated to a network, which provides a novel perspective for its structural analysis. Network parameters are used to construct a prediction model, achieving an area under the receiver operating characteristic curve (AUC) of 0.956. Moreover, by repeating the same method on two independent datasets, accuracies of 0.976 and 0.913 are achieved, respectively. Conclusions: Network parameters effectively characterize pre-miRNA secondary structure, which improves our prediction model in both prediction ability and computational efficiency. Additionally, as a complement to the feature extraction methods of previous studies, these multifaceted features can reflect natural properties of miRNAs and be used for comprehensive and systematic analysis of miRNAs.
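    As a rough illustration of what translating a stem-loop into a network could look like, the sketch below builds a graph from a dot-bracket secondary structure string and derives a few simple parameters. The abstract does not say how the network is constructed, so the node and edge definitions here (nucleotides as nodes, backbone and base-pair contacts as edges), the function names, and the chosen parameters are all assumptions made for illustration, not the authors' method.

```python
def dot_bracket_to_edges(structure):
    """Build an undirected edge set from a dot-bracket string.

    Assumption: nodes are nucleotide positions; edges are backbone
    neighbours (i, i+1) plus base pairs given by matching brackets.
    """
    edges = set()
    stack = []
    for i, symbol in enumerate(structure):
        if i > 0:
            edges.add((i - 1, i))  # backbone edge to the previous nucleotide
        if symbol == "(":
            stack.append(i)
        elif symbol == ")":
            if not stack:
                raise ValueError("unbalanced structure: unmatched ')'")
            edges.add((stack.pop(), i))  # base-pair edge
    if stack:
        raise ValueError("unbalanced structure: unmatched '('")
    return edges


def network_parameters(structure):
    """Return a few illustrative graph parameters for one structure."""
    n = len(structure)
    edges = dot_bracket_to_edges(structure)
    degree = [0] * n
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    return {
        "nodes": n,
        "edges": len(edges),
        "mean_degree": sum(degree) / n,
        "max_degree": max(degree),
    }


# Toy hairpin: a 4-bp stem closed by a 3-nt loop (hypothetical input).
params = network_parameters("((((...))))")
```

    Parameters such as these could then be fed to any classifier; the paper's title names a random forest, which is not reproduced here.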

    Analysis of Machine Learning Based Methods for Identifying MicroRNA Precursors

    MicroRNAs are a type of non-coding RNA that were discovered less than a decade ago but are now known to be incredibly important in regulating gene expression despite their small size. However, due to their small size, and several other limiting factors, experimental procedures have had limited success in discovering new microRNAs. Computational methods are therefore vital to discovering novel microRNAs. Many different approaches have been used to scan genomic sequences for novel microRNAs with varying degrees of success. This work provides an overview of these computational methods, focusing particularly on those methods based on machine learning techniques. The results of experiments performed on several of the machine learning based microRNA detectors are provided along with an analysis of their performance

    Identification of pre-microRNAs by characterizing their sequence order evolution information and secondary structure graphs

    © 2018 The Author(s). Background: Distinguishing pre-microRNAs (precursor microRNAs) from length-similar pseudo pre-microRNAs can reveal more about the regulatory mechanism of RNA biological processes. Machine learning techniques have been widely applied to this challenging problem. However, most of them focus mainly on the secondary structure information of pre-microRNAs, while ignoring sequence-order information and sequence evolution information. Results: We use new features for the machine learning algorithms to improve classification performance by characterizing both sequence order evolution information and secondary structure graphs. We developed three steps to extract these features of pre-microRNAs. We first extract features from PSI-BLAST profiles and Hilbert-Huang transforms, which contain rich sequence evolution information and sequence-order information, respectively. We then obtain properties of small molecular networks of pre-microRNAs, which contain refined secondary structure information. These structural features are carefully generated so that they can depict both global and local characteristics of pre-microRNAs. In total, our feature space covers 591 features. The maximum relevance and minimum redundancy (mRMR) feature selection method is applied before a support vector machine (SVM) is used as the classifier. The constructed classification model is named MicroRNA-NHPred. The performance of MicroRNA-NHPred is high and stable, and better than that of state-of-the-art methods, achieving an accuracy of up to 94.83% on the same benchmark datasets. Conclusions: The high prediction accuracy achieved by our proposed method is attributed to the design of a comprehensive feature set on the sequences and secondary structures, which is capable of characterizing the sequence evolution information and sequence-order information, as well as global and local information of pre-microRNA secondary structures. MicroRNA-NHPred is a valuable method for pre-microRNA identification. The source code of our method can be downloaded from https://github.com/myl446/MicroRNA-NHPred
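    The mRMR step mentioned above can be sketched as a greedy loop that repeatedly picks the feature with the highest relevance to the labels minus its average redundancy with the features already chosen. The version below is a minimal sketch that assumes discrete feature values and the mutual-information difference criterion; the function names and toy data are hypothetical, and the actual MicroRNA-NHPred pipeline may use a different mRMR variant and implementation.

```python
from collections import Counter
from math import log2


def mutual_information(xs, ys):
    """Mutual information (in bits) between two discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum(
        (c / n) * log2((c / n) / ((px[x] / n) * (py[y] / n)))
        for (x, y), c in pxy.items()
    )


def mrmr(features, labels, k):
    """Greedy mRMR: features maps a name to a list of discrete values."""
    relevance = {
        name: mutual_information(col, labels) for name, col in features.items()
    }
    selected = []
    remaining = list(features)

    def score(name):
        # Relevance minus mean redundancy with already selected features.
        if not selected:
            return relevance[name]
        redundancy = sum(
            mutual_information(features[name], features[s]) for s in selected
        ) / len(selected)
        return relevance[name] - redundancy

    while remaining and len(selected) < k:
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy data (hypothetical): "b" duplicates "a"; "c" is equally relevant
# to the labels but carries no information about "a".
labels = [0, 0, 0, 0, 1, 1, 1, 1]
features = {
    "a": [0, 0, 0, 1, 1, 1, 1, 0],
    "b": [0, 0, 0, 1, 1, 1, 1, 0],
    "c": [0, 0, 1, 0, 1, 1, 0, 1],
}
chosen = mrmr(features, labels, k=2)
```

    On this toy data the duplicate feature is penalized for its redundancy, so the two selected features are the non-redundant pair; the selected subset would then be passed to the classifier.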

    High precision in microRNA prediction: a novel genome-wide approach with convolutional deep residual networks

    MicroRNAs (miRNAs) are small non-coding RNAs that have a key role in the regulation of gene expression. The importance of miRNAs is now widely acknowledged by the community, and computational methods are needed for the precise prediction of novel miRNA candidates. This task can be done by searching for homologs with sequence alignment tools, but results are restricted to sequences that are very similar to the known miRNA precursors (pre-miRNAs). Besides, a very important property of pre-miRNAs, their secondary structure, is not taken into account by these methods. To fill this gap, many machine learning approaches have been proposed in recent years. However, these methods are generally tested under very controlled conditions. When they are used under real conditions, false positives increase and precision falls well below the published values. This work provides a novel approach to the computational prediction of pre-miRNAs: a convolutional deep residual neural network (mirDNN). The model was tested on the full genomes of several animals and plants, achieving a precision up to 5 times larger than other approaches at the same recall rates. Furthermore, a novel validation methodology was used to ensure that the performance reported in this study can be effectively achieved when using mirDNN in novel species. To provide fast and easy access to mirDNN, a web demo is available at http://sinc.unl.edu.ar/web-demo/mirdnn/. The demo can process FASTA files with multiple sequences to calculate the prediction scores and generate the nucleotide importance plots.
    Authors: Yones, Cristian Ariel; Raad, Jonathan; Bugnon, Leandro Ariel; Milone, Diego Humberto; Stegmayer, Georgina. Affiliation (all authors): Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Santa Fe. Instituto de Investigación en Señales, Sistemas e Inteligencia Computacional. Universidad Nacional del Litoral. Facultad de Ingeniería y Ciencias Hídricas. Instituto de Investigación en Señales, Sistemas e Inteligencia Computacional; Argentina.
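    Comparing precision "at the same recall rates", as the mirDNN abstract does, amounts to ranking candidates by score, lowering the threshold until the target recall is reached, and reading off the precision at that point. The sketch below shows one simple way to compute this; the function name and toy data are hypothetical, ties between scores are not treated specially, and it is not taken from the mirDNN evaluation code.

```python
def precision_at_recall(scores, labels, target_recall):
    """Precision at the first score threshold that reaches target_recall.

    scores: predicted scores, higher means more likely positive.
    labels: 1 for true pre-miRNA, 0 for negative.
    """
    total_pos = sum(labels)
    if total_pos == 0:
        raise ValueError("at least one positive example is required")
    if not 0 < target_recall <= 1:
        raise ValueError("target_recall must be in (0, 1]")

    ranked = sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)
    tp = fp = 0
    for _, label in ranked:
        if label:
            tp += 1
        else:
            fp += 1
        if tp / total_pos >= target_recall:
            break
    return tp / (tp + fp)


# Toy example (hypothetical scores): three positives among five candidates.
scores = [0.9, 0.8, 0.7, 0.6, 0.5]
labels = [1, 0, 1, 0, 1]
p60 = precision_at_recall(scores, labels, 0.6)
p100 = precision_at_recall(scores, labels, 1.0)
```

    In genome-wide settings negatives vastly outnumber positives, which is why precision at a fixed recall is a more demanding measure than accuracy on a balanced benchmark.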

    Discovery and annotation of novel microRNAs in the porcine genome by using a semi-supervised transductive learning approach

    Despite the broad variety of available microRNA (miRNA) prediction tools, their application to the discovery and annotation of novel miRNA genes in domestic species is still limited. In this study we designed a comprehensive pipeline (eMIRNA) for miRNA identification in the still poorly annotated porcine genome and demonstrated the usefulness of implementing a motif search positional refinement strategy for the accurate determination of precursor miRNA boundaries. The small RNA fraction from gluteus medius skeletal muscle of 48 Duroc gilts was sequenced and used for the prediction of novel miRNA loci. Additionally, we used the human miRNA annotation for a homology-based search of porcine miRNAs with orthologous genes in the human genome. A total of 20 novel expressed miRNAs were identified in the porcine muscle transcriptome, and 27 additional novel porcine miRNAs were detected by homology-based search using the human miRNA annotation. The existence of three selected novel miRNAs (ssc-miR-483, ssc-miR-484 and ssc-miR-200a) was further confirmed by reverse transcription quantitative real-time PCR analyses in the muscle and liver tissues of Göttingen minipigs. In summary, the eMIRNA pipeline presented in the current work allowed us to expand the catalogue of porcine miRNAs and showed better performance than other commonly used miRNA prediction approaches. More importantly, the flexibility of our pipeline makes its application possible in other poorly annotated non-model species.

    BP Neural Network Could Help Improve Pre-miRNA Identification in Various Species


    Evaluation of blood-based microRNAs toward clinical use as biomarkers in common and rare diseases

    According to the GLOBOCAN project of the International Agency for Research on Cancer, the top three common cancer diseases worldwide in the year 2020 were breast, lung and colorectal cancer. These are usually diagnosed via imaging methods (e.g. computer tomography) or invasive methods (e.g. biopsy). However, these techniques are potentially risky and expensive and thus not accessible to all patients, resulting in most cancers being detected in an advanced stage. Since the discovery of small non-coding RNAs and specifically microRNAs and their role as gene regulators, many researchers investigate their association with disease development. In particular, researchers examine body fluid based microRNAs which could present potential cost-effective and minimally- or non-invasive alternatives to the previously described established diagnosis methods. This dissertation focuses on microRNAs and investigates their suitability as minimally-invasive blood-borne biomarkers for potential diagnostic purposes. More specifically, the goals of this work are (1) to implement a new method to predict novel microRNAs, (2) to understand stability and characteristics of these small non-coding RNAs, possibly relevant for the last goal, (3) to discover potential diagnostic biomarkers in common and rare diseases. The first goal was addressed by developing miRMaster, a web service to predict new microRNAs. The tool uses machine learning and high-throughput sequencing data to find microRNA candidates that follow the known biogenesis pathways. The second goal was pursued in four publications. First, we performed a large scale evaluation of miRMaster by generating a high-resolution map of the human small non-coding RNA transcriptome for which we analyzed and validated potential microRNA candidates. Next, we examined the influence of seasonal effects on microRNA expression profiles and observed the largest difference between spring and the other seasons. 
Additionally, we evaluated the evolutionary conservation of small non-coding RNAs in zoo animals and showed that the distribution of sncRNA classes varies across species, while common microRNA families are present in more diverse organisms than assumed so far. Furthermore, we analyzed whether microRNAs are technically stable, and whether biological variation is preserved when using capillary dried blood spots as an alternative sample collection device to venous blood specimens. Finally, we investigated the suitability of microRNAs as biomarkers for two diseases: lung cancer and Marfan disease. We identified blood-borne biomarker candidates for lung cancer detection in a large-scale multi-center study via machine learning. For the rare Marfan disease we analyzed the paired messenger RNA and microRNA expression levels in whole-blood samples. This highlighted several significantly deregulated microRNAs and messenger RNAs, which we subsequently validated in an independent cohort. In summary, this thesis provides valuable results toward potential clinical use of microRNAs, and the projects described herein represent comprehensive analyses of them from different perspectives: starting with microRNA discovery, addressing various technical and biological questions, and ending with the potential use as biomarkers.