439 research outputs found

    Identification of pre-microRNAs by characterizing their sequence order evolution information and secondary structure graphs

    © 2018 The Author(s). Background: Distinguishing pre-microRNAs (precursor microRNAs) from length-similar pseudo pre-microRNAs can reveal more about the regulatory mechanism of RNA biological processes. Machine learning techniques have been widely applied to this challenging problem. However, most of them focus mainly on secondary structure information of pre-microRNAs, while ignoring sequence-order information and sequence evolution information. Results: We use new features for the machine learning algorithms to improve classification performance by characterizing both sequence order evolution information and secondary structure graphs. We developed three steps to extract these features of pre-microRNAs. We first extract features from PSI-BLAST profiles and Hilbert-Huang transforms, which contain rich sequence evolution information and sequence-order information respectively. We then obtain properties of small molecular networks of pre-microRNAs, which contain refined secondary structure information. These structural features are carefully generated so that they can depict both global and local characteristics of pre-microRNAs. In total, our feature space covers 591 features. The maximum relevance and minimum redundancy (mRMR) feature selection method is applied before a support vector machine (SVM) is used as our classifier. The constructed classification model is named MicroRNA-NHPred. MicroRNA-NHPred performs stably and outperforms state-of-the-art methods, achieving an accuracy of up to 94.83% on the same benchmark datasets. Conclusions: The high prediction accuracy achieved by our proposed method is attributed to the design of a comprehensive feature set on the sequences and secondary structures, which is capable of characterizing the sequence evolution information and sequence-order information, as well as the global and local information of pre-microRNA secondary structures. MicroRNA-NHPred is a valuable method for pre-microRNA identification. The source code of our method can be downloaded from https://github.com/myl446/MicroRNA-NHPred
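
The abstract above applies mRMR feature selection before the SVM. As a rough illustration of the greedy idea only (not the authors' implementation), the sketch below scores each candidate feature as relevance minus mean redundancy. It assumes Pearson correlation as a stand-in for the mutual information that mRMR normally uses, and the names `pearson` and `mrmr_select` and the toy data are hypothetical.

```python
from statistics import mean


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences (0.0 if either is constant)."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / (var_x * var_y) ** 0.5


def mrmr_select(columns, labels, k):
    """Greedily pick k feature indices by relevance minus mean redundancy.

    Assumption: |Pearson correlation| replaces mutual information for both
    the relevance (feature vs. label) and redundancy (feature vs. feature) terms.
    """
    relevance = [abs(pearson(col, labels)) for col in columns]
    selected = []
    remaining = list(range(len(columns)))

    def score(j):
        if not selected:
            return relevance[j]
        redundancy = mean([abs(pearson(columns[j], columns[s])) for s in selected])
        return relevance[j] - redundancy

    while remaining and len(selected) < k:
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy data (hypothetical): feature 1 duplicates feature 0; feature 2 is uncorrelated with feature 0.
columns = [[0, 0, 1, 1], [0, 0, 1, 1], [0, 1, 0, 1]]
labels = [0, 0, 0, 1]
chosen = mrmr_select(columns, labels, k=2)
```

On this toy data the duplicated feature is passed over in favour of the non-redundant one. In a full pipeline the selected columns would then be handed to an SVM classifier, which is not shown here.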

    MicroRNA Identification Based on Bioinformatics Approaches


    Identification of clustered microRNAs using an ab initio prediction method

    BACKGROUND: MicroRNAs (miRNAs) are endogenous 21 to 23-nucleotide RNA molecules that regulate protein-coding gene expression in plants and animals via the RNA interference pathway. Hundreds have been identified in the last five years, and very recent work indicates that their total number is larger still. Therefore, miRNA gene discovery remains an important aspect of understanding this new and still largely unknown regulation mechanism. Bioinformatics approaches have proved very useful toward this goal by guiding experimental investigations. RESULTS: In this work we describe our computational method for miRNA prediction and the results of its application to the discovery of novel mammalian miRNAs. We focus on genomic regions around already known miRNAs, in order to exploit the property that miRNAs are occasionally found in clusters. Starting with the known human, mouse and rat miRNAs, we analyze 20 kb of flanking genomic regions for the presence of putative precursor miRNAs (pre-miRNAs). Each genome is analyzed separately, allowing us to study the species-specific identity and genome organization of miRNA loci. We only use cross-species comparisons to make conservative estimates of the number of novel miRNAs. Our ab initio method predicts between fifty and one hundred novel pre-miRNAs for each of the considered species. Around 30% of these already have experimental support in a large set of cloned mammalian small RNAs. The validation rate among predicted cases that are conserved in at least one other species is higher, about 60%, and many of them have not been detected by prediction methods that used cross-species comparisons. A large fraction of the experimentally confirmed predictions correspond to an imprinted locus residing on chromosome 14 in human, 12 in mouse and 6 in rat. Our computational tool can be accessed on the World Wide Web. CONCLUSION: Our results show that the assumption that many miRNAs occur in clusters is fruitful for the discovery of novel miRNAs. Additionally, we show that although the overall miRNA content in the observed clusters is very similar across the three considered species, the internal organization of the clusters changes during evolution.

    Identification and analysis of miRNAs in human breast cancer and teratoma samples using deep sequencing

    Background: MiRNAs play important roles in cellular control and in various disease states such as cancers, where they may serve as markers or possibly even therapeutics. Identifying the whole repertoire of miRNAs and understanding their expression patterns is therefore an important goal. Methods: Here we describe the analysis of 454 pyrosequencing of small RNA from four different tissues: breast cancer, normal adjacent breast, and two teratoma cell lines. We developed a pipeline for identifying new miRNAs, emphasizing extracting and retaining as much data as possible from even noisy sequencing data. We investigated differential expression of miRNAs in the breast cancer and normal adjacent breast samples, and systematically examined the mature sequence end variability of miRNA compared to non-miRNA loci. Results: We identified five novel miRNAs, as well as two putative alternative precursors for known miRNAs. Several miRNAs were differentially expressed between the breast cancer and normal breast samples. The end variability was shown to be significantly different between miRNA and non-miRNA loci. Conclusion: Pyrosequencing of small RNAs, together with a computational pipeline, can be used to identify miRNAs in tumor and other tissues. Measures of miRNA end variability may in the future be incorporated into the discovery pipeline as a discriminatory feature. Breast cancer samples show a distinct miRNA expression profile compared to normal adjacent breast.

    miRFam: an effective automatic miRNA classification method based on n-grams and a multiclass SVM

    Background: MicroRNAs (miRNAs) are ~22 nt long integral elements responsible for post-transcriptional control of gene expression. After the identification of thousands of miRNAs, the challenge is now to explore their specific biological functions. To this end, it will be greatly helpful to construct a reasonable organization of these miRNAs according to their homologous relationships. Given an established miRNA family system (e.g. the miRBase family organization), this paper addresses the problem of automatically and accurately classifying newly found miRNAs into their corresponding families by supervised learning techniques. Concretely, we propose an effective method, miRFam, which uses only primary information of pre-miRNAs or mature miRNAs and a multiclass SVM to automatically classify miRNA genes. Results: An existing miRNA family system prepared by miRBase was downloaded online. We first employed n-grams to extract features from known precursor sequences, and then trained a multiclass SVM classifier to classify new miRNAs (i.e. those whose families are unknown). Compared with miRBase's sequence alignment and manual modification, our study shows that the application of machine learning techniques to miRNA family classification is a general and more effective approach. When the testing dataset contains more than 300 families (each of which holds no fewer than 5 members), the classification accuracy is around 98%. Even with the entire miRBase15 (1056 families, more than 650 of which hold fewer than 5 samples), the accuracy surprisingly reaches 90%. Conclusions: Based on experimental results, we argue that miRFam is suitable for application as an automated method of family classification, and it is an important supplementary tool to the existing alignment-based small non-coding RNA (sncRNA) classification methods, since it only requires primary sequence information. Availability: The source code of miRFam, written in C++, is freely and publicly available at: http://admis.fudan.edu.cn/projects/miRFam.htm
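
The abstract above extracts n-gram features from precursor sequences before training a multiclass SVM. A minimal sketch of what such a feature vector could look like follows. The abstract does not state the value of n, the alphabet handling, or any normalization, so the choices here (frequencies over the RNA alphabet `ACGU`, T mapped to U) and the function name `ngram_features` are assumptions for illustration, not miRFam's actual code.

```python
from collections import Counter
from itertools import product


def ngram_features(seq, n=3, alphabet="ACGU"):
    """Return n-gram frequencies of seq in a fixed order over the alphabet.

    Assumptions: DNA-style input is converted to RNA (T -> U), and counts are
    divided by the number of overlapping windows so sequences of different
    lengths are comparable.
    """
    seq = seq.upper().replace("T", "U")
    windows = max(len(seq) - n + 1, 0)
    counts = Counter(seq[i:i + n] for i in range(windows))
    total = max(windows, 1)  # avoid division by zero for very short sequences
    return [counts["".join(gram)] / total for gram in product(alphabet, repeat=n)]


# Example: the 2-grams of "ACGU" are AC, CG and GU, each with frequency 1/3.
vector = ngram_features("ACGU", n=2)
```

The vector has a fixed length of `len(alphabet) ** n` (16 for 2-grams, 64 for 3-grams), so every sequence maps to the same feature space regardless of its length, which is what a classifier such as an SVM needs as input.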

    Computational and experimental tools of MiRNAs in cancer

    MicroRNAs (miRNAs) are short, single-stranded, non-protein-coding RNA molecules with a critical role in the regulation of gene expression. These molecules are crucial regulatory elements in diverse biological processes such as apoptosis, development, and progression. miRNA genes have been associated with various human diseases, particularly cancer, and are considered new biomarkers. Since the discovery of miRNAs, much research has focused on identifying and characterizing miRNA genes in cancer. Differences in miRNA expression levels between cancer cells and normal cells are crucial to the diagnosis, prognosis, and treatment of many cancers. Many computational and experimental tools have been employed to characterize miRNAs. However, identifying miRNAs with either computational or experimental tools remains challenging owing to the features of miRNAs. This review briefly introduces miRNA biology and certain computational and experimental tools for identifying and profiling miRNAs in cancer. Furthermore, we present the advantages and challenges of these tools. © 2020, Shiraz University of Medical Sciences. All rights reserved.