Identification of microRNA precursors based on random forest with network-level representation method of stem-loop structure
Background: MicroRNAs (miRNAs) play a key role in regulating various biological processes, such as participating in the post-transcriptional pathway and affecting the stability and/or translation of mRNA. Current methods have extracted feature information at different levels, among which the characteristic stem-loop structure makes the greatest contribution to the prediction of putative miRNA precursors (pre-miRNAs). We find that none of these features alone is capable of identifying new pre-miRNAs accurately. Results: In the present work, a pre-miRNA stem-loop secondary structure is translated into a network, which provides a novel perspective for its structural analysis. Network parameters are used to construct a prediction model, achieving an area under the receiver operating characteristic curve (AUC) of 0.956. Moreover, by repeating the same method on two independent datasets, accuracies of 0.976 and 0.913 are achieved, respectively. Conclusions: Network parameters effectively characterize pre-miRNA secondary structure, which improves our prediction model in both prediction ability and computational efficiency. Additionally, as a complement to the feature extraction methods of previous studies, these multifaceted features can reflect natural properties of miRNAs and be used for comprehensive and systematic analysis of miRNAs.
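The idea of translating a stem-loop structure into a network can be illustrated with a minimal sketch. It assumes the secondary structure is given in dot-bracket notation, that nodes are nucleotide positions, and that edges are backbone links plus base pairs. The abstract does not specify the paper's actual network construction or its parameter set, so the function names and the parameters computed here are hypothetical.

```python
from collections import deque


def dot_bracket_to_graph(structure):
    """Build an adjacency map from a dot-bracket string.

    Assumption: nodes are nucleotide positions; edges are backbone
    links (i, i+1) and base pairs given by matching parentheses.
    """
    n = len(structure)
    adj = {i: set() for i in range(n)}
    for i in range(n - 1):          # backbone edges
        adj[i].add(i + 1)
        adj[i + 1].add(i)
    stack = []
    for i, ch in enumerate(structure):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            if not stack:
                raise ValueError("unbalanced structure")
            j = stack.pop()         # base-pair edge
            adj[i].add(j)
            adj[j].add(i)
        elif ch != ".":
            raise ValueError(f"unexpected symbol: {ch!r}")
    if stack:
        raise ValueError("unbalanced structure")
    return adj


def network_parameters(adj):
    """Compute a few illustrative network parameters (not the paper's set)."""
    n = len(adj)
    edges = sum(len(nbrs) for nbrs in adj.values()) // 2
    total_dist = 0
    for source in adj:              # breadth-first search from every node
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total_dist += sum(dist.values())
    avg_path = total_dist / (n * (n - 1)) if n > 1 else 0.0
    return {
        "nodes": n,
        "edges": edges,
        "mean_degree": 2 * edges / n if n else 0.0,
        "avg_path_length": avg_path,
    }


if __name__ == "__main__":
    graph = dot_bracket_to_graph("((((...))))")
    print(network_parameters(graph))
```

Parameters of this kind could then serve as a feature vector for a classifier such as a random forest, which is the model family named in the title.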
Analysis of Machine Learning Based Methods for Identifying MicroRNA Precursors
MicroRNAs are a type of non-coding RNA that were discovered less than a decade ago but are now known to be highly important in regulating gene expression despite their small size. However, because of their small size and several other limiting factors, experimental procedures have had limited success in discovering new microRNAs. Computational methods are therefore vital to discovering novel microRNAs. Many different approaches have been used to scan genomic sequences for novel microRNAs, with varying degrees of success. This work provides an overview of these computational methods, focusing particularly on those based on machine learning techniques. The results of experiments performed on several of the machine learning based microRNA detectors are provided, along with an analysis of their performance.
Identification of pre-microRNAs by characterizing their sequence order evolution information and secondary structure graphs
© 2018 The Author(s). Background: Distinguishing pre-microRNAs (precursor microRNAs) from pseudo pre-microRNAs of similar length can reveal more about the regulatory mechanism of RNA biological processes. Machine learning techniques have been widely applied to this challenging problem. However, most of them focus mainly on the secondary structure information of pre-microRNAs, while ignoring sequence-order information and sequence evolution information. Results: We use new features for the machine learning algorithms to improve classification performance by characterizing both sequence order evolution information and secondary structure graphs. We developed three steps to extract these features of pre-microRNAs. We first extract features from PSI-BLAST profiles and Hilbert-Huang transforms, which contain rich sequence evolution information and sequence-order information, respectively. We then obtain properties of small molecular networks of pre-microRNAs, which contain refined secondary structure information. These structural features are carefully generated so that they can depict both global and local characteristics of pre-microRNAs. In total, our feature space covers 591 features. The maximum relevance and minimum redundancy (mRMR) feature selection method is adopted before a support vector machine (SVM) is applied as our classifier. The constructed classification model is named MicroRNA-NHPred. The performance of MicroRNA-NHPred is high and stable, and better than that of state-of-the-art methods, achieving an accuracy of up to 94.83% on the same benchmark datasets. Conclusions: The high prediction accuracy achieved by our proposed method is attributed to the design of a comprehensive feature set on the sequences and secondary structures, which is capable of characterizing the sequence evolution information and sequence-order information, as well as the global and local information of pre-microRNA secondary structures. MicroRNA-NHPred is a valuable method for pre-microRNA identification. The source code of our method can be downloaded from https://github.com/myl446/MicroRNA-NHPred
High precision in microRNA prediction: a novel genome-wide approach with convolutional deep residual networks
MicroRNAs (miRNAs) are small non-coding RNAs that have a key role in the regulation of gene expression. The importance of miRNAs is now widely acknowledged by the community, and computational methods are needed for the precise prediction of novel miRNA candidates. This task can be done by searching for homologs with sequence alignment tools, but the results are restricted to sequences that are very similar to known miRNA precursors (pre-miRNAs). Moreover, a very important property of pre-miRNAs, their secondary structure, is not taken into account by these methods. To fill this gap, many machine learning approaches have been proposed in recent years. However, these methods are generally tested under very controlled conditions. When they are used under real conditions, false positives increase and precision falls well below the published values. This work provides a novel approach to the computational prediction of pre-miRNAs: a convolutional deep residual neural network (mirDNN). The model was tested on the full genomes of several animals and plants, achieving a precision up to 5 times higher than other approaches at the same recall rates. Furthermore, a novel validation methodology was used to ensure that the performance reported in this study can be effectively achieved when using mirDNN in novel species. To provide fast and easy access to mirDNN, a web demo is available at http://sinc.unl.edu.ar/web-demo/mirdnn/. The demo can process FASTA files with multiple sequences to calculate the prediction scores, and it generates the nucleotide importance plots.
Authors: Yones, Cristian Ariel; Raad, Jonathan; Bugnon, Leandro Ariel; Milone, Diego Humberto; Stegmayer, Georgina. Affiliation (all authors): Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Santa Fe. Instituto de Investigación en Señales, Sistemas e Inteligencia Computacional. Universidad Nacional del Litoral. Facultad de Ingeniería y Ciencias Hídricas. Instituto de Investigación en Señales, Sistemas e Inteligencia Computacional; Argentina
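The comparison "precision at the same recall rates" in the mirDNN abstract above can be made concrete with a small sketch. It assumes each candidate sequence has a prediction score and a binary label (1 for a true pre-miRNA, 0 otherwise). The function name and the toy data are hypothetical and are not taken from the mirDNN evaluation code.

```python
def precision_at_recall(scores, labels, target_recall):
    """Best precision over all score thresholds whose recall is at least
    target_recall. Labels are 1 (positive) or 0 (negative)."""
    total_pos = sum(labels)
    if total_pos == 0:
        raise ValueError("at least one positive label is required")
    # Rank candidates from highest to lowest score.
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    tp = fp = 0
    best = 0.0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Consume every candidate tied at this score before evaluating.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1]:
                tp += 1
            else:
                fp += 1
            i += 1
        recall = tp / total_pos
        if recall >= target_recall:
            best = max(best, tp / (tp + fp))
    return best


if __name__ == "__main__":
    # Toy data: 2 positives among 5 candidates (hypothetical values).
    scores = [0.9, 0.8, 0.7, 0.6, 0.5]
    labels = [1, 0, 1, 0, 0]
    print(precision_at_recall(scores, labels, 0.5))
    print(precision_at_recall(scores, labels, 1.0))
```

In a genome-wide setting the negatives vastly outnumber the positives, so even a small false-positive rate adds many false positives to the denominator. This is consistent with the abstract's observation that precision under real conditions falls below the values reported under controlled conditions.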
Discovery and annotation of novel microRNAs in the porcine genome by using a semi-supervised transductive learning approach
Despite the broad variety of available microRNA (miRNA) prediction tools, their application to the discovery and annotation of novel miRNA genes in domestic species is still limited. In this study we designed a comprehensive pipeline (eMIRNA) for miRNA identification in the still poorly annotated porcine genome and demonstrated the usefulness of implementing a motif search positional refinement strategy for the accurate determination of precursor miRNA boundaries. The small RNA fraction from gluteus medius skeletal muscle of 48 Duroc gilts was sequenced and used for the prediction of novel miRNA loci. Additionally, we used the human miRNA annotation for a homology-based search of porcine miRNAs with orthologous genes in the human genome. A total of 20 novel expressed miRNAs were identified in the porcine muscle transcriptome, and 27 additional novel porcine miRNAs were detected by the homology-based search using the human miRNA annotation. The existence of three selected novel miRNAs (ssc-miR-483, ssc-miR-484 and ssc-miR-200a) was further confirmed by reverse transcription quantitative real-time PCR analyses in the muscle and liver tissues of Göttingen minipigs. In summary, the eMIRNA pipeline presented in the current work allowed us to expand the catalogue of porcine miRNAs and showed better performance than other commonly used miRNA prediction approaches. More importantly, the flexibility of our pipeline makes it applicable to other still poorly annotated non-model species.
Evaluation of blood-based microRNAs toward clinical use as biomarkers in common and rare diseases
According to the GLOBOCAN project of the International Agency for Research on Cancer, the three most common cancers worldwide in 2020 were breast, lung and colorectal cancer. These are usually diagnosed via imaging methods (e.g. computed tomography) or invasive methods (e.g. biopsy). However, these techniques are potentially risky and expensive and thus not accessible to all patients, resulting in most cancers being detected at an advanced stage. Since the discovery of small non-coding RNAs, specifically microRNAs, and their role as gene regulators, many researchers have investigated their association with disease development. In particular, researchers examine body-fluid-based microRNAs, which could offer cost-effective and minimally invasive or non-invasive alternatives to the established diagnostic methods described above.
This dissertation focuses on microRNAs and investigates their suitability as minimally invasive blood-borne biomarkers for potential diagnostic purposes. More specifically, the goals of this work are (1) to implement a new method to predict novel microRNAs, (2) to understand the stability and characteristics of these small non-coding RNAs, possibly relevant for the last goal, and (3) to discover potential diagnostic biomarkers in common and rare diseases. The first goal was addressed by developing miRMaster, a web service to predict new microRNAs. The tool uses machine learning and high-throughput sequencing data to find microRNA candidates that follow the known biogenesis pathways. The second goal was pursued in four publications. First, we performed a large-scale evaluation of miRMaster by generating a high-resolution map of the human small non-coding RNA transcriptome, for which we analyzed and validated potential microRNA candidates. Next, we examined the influence of seasonal effects on microRNA expression profiles and observed the largest difference between spring and the other seasons. Additionally, we evaluated the evolutionary conservation of small non-coding RNAs (sncRNAs) in zoo animals and showed that the distribution of sncRNA classes varies across species, while common microRNA families are present in more diverse organisms than assumed so far. Furthermore, we analyzed whether microRNAs are technically stable, and whether biological variation is preserved when using capillary dried blood spots as an alternative sample collection device to venous blood specimens. Finally, we investigated the suitability of microRNAs as biomarkers for two diseases: lung cancer and Marfan disease. We identified blood-borne biomarker candidates for lung cancer detection in a large-scale multi-center study via machine learning. For the rare Marfan disease, we analyzed the paired messenger RNA and microRNA expression levels in whole-blood samples. This highlighted several significantly deregulated microRNAs and messenger RNAs, which we subsequently validated in an independent cohort.
In summary, this thesis provides valuable results toward the potential clinical use of microRNAs, and the projects described herein represent comprehensive analyses of them from different perspectives: starting with microRNA discovery, addressing various technical and biological questions, and ending with their potential use as biomarkers.