    Analysis of Machine Learning Based Methods for Identifying MicroRNA Precursors

    MicroRNAs are a type of non-coding RNA that were discovered less than a decade ago but are now known to be incredibly important in regulating gene expression despite their small size. However, due to their small size, and several other limiting factors, experimental procedures have had limited success in discovering new microRNAs. Computational methods are therefore vital to discovering novel microRNAs. Many different approaches have been used to scan genomic sequences for novel microRNAs with varying degrees of success. This work provides an overview of these computational methods, focusing particularly on those methods based on machine learning techniques. The results of experiments performed on several of the machine learning based microRNA detectors are provided along with an analysis of their performance

    MicroRNA Identification Based on Bioinformatics Approaches

    Characterization and Identification of MicroRNA Core Promoters in Four Model Species

    MicroRNAs are short, noncoding RNAs that play important roles in post-transcriptional gene regulation. Although many functions of microRNAs in plants and animals have been revealed in recent years, the transcriptional mechanism of microRNA genes is not well understood. To elucidate the transcriptional regulation of microRNA genes, we study and characterize, on a genome scale, the promoters of intergenic microRNA genes in Caenorhabditis elegans, Homo sapiens, Arabidopsis thaliana, and Oryza sativa. We show that most known microRNA genes in these four species have the same type of promoters as protein-coding genes. To further characterize the promoters of microRNA genes, we developed a novel promoter prediction method, called common query voting (CoVote), which is more effective than available promoter prediction methods. Using this new method, we identify putative core promoters of most known microRNA genes in the four model species. Moreover, we characterize the promoters of microRNA genes in these four species and discover many significant, characteristic sequence motifs in these core promoters, several of which match or resemble the known cis-acting elements for transcription initiation. Among these motifs, some are conserved across different species while others are specific to microRNA genes of individual species.

    Using a kernel density estimation based classifier to predict species-specific microRNA precursors

    Background: MicroRNAs (miRNAs) are short non-coding RNA molecules participating in post-transcriptional regulation of gene expression. There have been many efforts to discover miRNA precursors (pre-miRNAs) over the years. Recently, ab initio approaches have attracted more attention because they can discover species-specific pre-miRNAs. Most ab initio approaches proposed novel features to characterize RNA molecules. However, the associated classification mechanism in a miRNA predictor has received less discussion. Results: This study focuses on the classification algorithm for miRNA prediction. We develop a novel ab initio method, miR-KDE, in which most of the features are collected from previous works. The classification mechanism in miR-KDE is the relaxed variable kernel density estimator (RVKDE) that we have recently proposed. Compared to the well-known support vector machine (SVM), RVKDE exploits more local information from the training dataset. MiR-KDE is evaluated using a training set consisting of only human pre-miRNAs to predict a benchmark collected from 40 species. The experimental results show that miR-KDE delivers favorable performance in predicting human pre-miRNAs and has advantages for pre-miRNAs from genera taxonomically distant from humans. Conclusion: We use a novel classifier whose characteristic of exploiting local information is particularly suitable for predicting species-specific pre-miRNAs. This study also provides a comprehensive analysis from the viewpoint of the classification mechanism. The good performance of miR-KDE encourages more effort on classification methodology as well as feature extraction in miRNA prediction.

    Analysis of Antisense Expression by Whole Genome Tiling Microarrays and siRNAs Suggests Mis-Annotation of Arabidopsis Orphan Protein-Coding Genes

    MicroRNAs (miRNAs) and trans-acting small-interfering RNAs (tasi-RNAs) are small (20-22 nt long) RNAs (smRNAs) generated from hairpin secondary structures or antisense transcripts, respectively, that regulate gene expression by Watson-Crick pairing to a target mRNA and altering expression by mechanisms related to RNA interference. The high sequence homology of plant miRNAs to their targets has been the mainstay of miRNA prediction algorithms, which are limited in their predictive power for other kingdoms because miRNA complementarity is less conserved, yet transitive processes (production of antisense smRNAs) are active in eukaryotes. We hypothesize that antisense transcription and associated smRNAs are biomarkers which can be computationally modeled for gene discovery. We explored rice (Oryza sativa) sense and antisense gene expression in publicly available whole genome tiling array transcriptome data and sequenced smRNA libraries (as well as C. elegans) and found evidence of transitivity of MIRNA genes similar to that found in Arabidopsis. Statistical analysis of antisense transcript abundances, presence of antisense ESTs, and association with smRNAs suggests that several hundred Arabidopsis 'orphan' hypothetical genes are non-coding RNAs. Consistent with this hypothesis, we found novel Arabidopsis homologues of some MIRNA genes on the antisense strand of previously annotated protein-coding genes. A Support Vector Machine (SVM) was applied using thermodynamic energy of binding plus novel expression features of sense/antisense transcription topology and siRNA abundances to build a prediction model of miRNA targets. The SVM, when trained on targets, could predict the "ancient" (deeply conserved) class of validated Arabidopsis MIRNA genes with an accuracy of 84%, and 76% for "new" rapidly-evolving MIRNA genes. Antisense and smRNA expression features and computational methods may identify novel MIRNA genes and other non-coding RNAs in plants and potentially other kingdoms, which can provide insight into antisense transcription, miRNA evolution, and post-transcriptional gene regulation.

    Computational classification of small RNAs and their targets

    Small RNAs, and in particular microRNAs, are currently receiving a great deal of attention due to their important roles in gene regulation and organism development. Recently, new high-throughput technologies have made it possible to sequence hundreds of thousands of small RNAs from a single experimental sample. In this thesis we develop new computational tools to process such high-throughput small RNA datasets in order to identify microRNAs and other biologically interesting small RNA candidates and to predict their target genes. We apply these tools to a variety of plant and animal datasets and present some novel discoveries including miRNAs involved in fruit development in tomato (Solanum lycopersicon). Source: EThOS - Electronic Theses Online Service, United Kingdom.

    Uncovering structural genomic contents of wheat

    The production rate of wheat, an important food source worldwide, is significantly limited by both biotic and abiotic stress factors. Development of stress-resistant cultivars is highly dependent on understanding the molecular mechanisms and structural elements in wheat and/or wheat-interacting species. The huge and complex genome of bread wheat (BBAADD genome) stood as a major obstacle to understanding these molecular mechanisms until the recent availability of the wheat reference genome. In this study, we provided improved and/or novel methodologies to reveal structural elements in plants. These methodologies include miRNA identification, manual curation of lncRNAs, identification of lncRNAs using wheat-specific prediction models, and a comparative analysis of WES data analysis tools. Using these techniques, we focused on uncovering the structural genomic contents of wheat. With improved identification methodologies and manual annotation of lncRNAs, we revealed several miRNAs and lncRNAs in Triticum turgidum species and wheat stem sawfly (WSS), a major pest of wheat. We provided a comprehensive transcriptome analysis of tetraploid wheat varieties and revealed drought-responsive transcripts. Additionally, we presented the first clues of miRNA mobility between WSS larvae and hexaploid wheat. Thereby, besides enriching the genetic information available for wheat species, this study provides important elements driving both abiotic and biotic stress responses in wheat. In this study, we also applied machine learning approaches for the fast and accurate prediction of lncRNAs in wheat species. With annotated genomes of hexaploid and tetraploid wheats, we achieved better accuracy scores (99.81%) than the most popular tools available. Finally, we conducted a comparative analysis of the tools used for variant discovery. Among eight aligners and three callers, we chose the best combination for variant calling in wheat. Later, we performed variant calling in 48 lines of elite wheat cultivars using the best tool sets. Overall, this study focused on improving the identification of miRNAs, lncRNAs and structural variations in wheat.

    In silico prediction of active RNA genes in legumes

    Accumulating evidence suggests that non-coding RNAs (ncRNAs) play key roles in gene regulation and may form the basis of an inter-gene communication system. MicroRNAs are a class of small non-coding RNAs found in both plants and animals that regulate the expression of other genes. Identification and analysis of microRNAs enhances our understanding of the important roles that microRNAs play in this complex regulatory network. The work presented in this thesis constitutes the first large-scale prediction and characterization of both ncRNAs and miRNAs in the model legumes Medicago truncatula and Lotus japonicus, and provides a basis for further research on elucidating ncRNA function in legume genomics.

    Novel Bioinformatic Approaches for Analyzing Next-Generation Sequencing Data

    In general, DNA reconstruction is regarded as key to molecular biology because it reveals how genotype affects phenotype. DNA sequencing technology emerged precisely for this purpose and has greatly promoted molecular biology’s development. The traditional method, Sanger, is effective but extremely expensive on a cost-per-base basis. This shortcoming of the Sanger method led to the rapid development of next-generation sequencing (NGS) technologies. NGS technologies are widely used by virtue of their low-cost, high-throughput, and fast nature. However, they still face major drawbacks such as huge amounts of data as well as relatively short read length compared with traditional methods. The scope of the research mainly focuses upon a quick preliminary analysis of NGS data, identification of genome-wide structural variations (SVs), and microRNA prediction. In terms of preliminary NGS data analysis, the author developed a toolkit named SeqAssist to evaluate genomic library coverage and estimate the redundancy between different sequencing runs. Regarding the genome-wide SV detection, a one-stop pipeline was proposed to identify SVs, which integrates the components of preprocessing, alignment, SV detection, breakpoint revision, and annotation. This pipeline not only detects SVs at the individual sample level, but also identifies consensus SVs at the population and cross-population levels. Finally, miRDisc, a pipeline for microRNA discovery, was developed for the identification of three categories of miRNAs, i.e., known, conserved, and novel microRNAs.